npm - codebyplan - Versions diffs - 1.13.52 → 1.13.54 - Mend

codebyplan 1.13.52 → 1.13.54

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (92)

package/templates/agents/cbp-testing-qa-agent.md CHANGED

@@ -1,4 +1,5 @@
 ---
+scope: org-shared
 name: cbp-testing-qa-agent
 description: Combined testing, QA generation, and default checklists. Runs build/lint/types/unit-tests/audit, generates auto QA items, applies default production checklists. Does NOT consume e2e screenshots or frontend-ui findings.
 tools: Read, Glob, Grep, Bash, AskUserQuestion
@@ -12,14 +13,14 @@ Combined testing, QA generation, and default production checklists in a single a
 ## Purpose
-Single agent that handles non-e2e quality validation in the per-wave validation phase of `/cbp-round-execute` Step 5:
+Single agent that handles non-e2e quality validation in the per-wave validation phase of `/cbp-round-build` Step 5:
 - Run all 18 automated checks (work + quality verification)
 - **EXECUTE** automated testing commands (build, lint, types, unit tests, visual checks, audit)
 - Generate auto QA items
 - Apply default production checklist items
 - Detect unrelated issues and missing tests
-E2E execution (Playwright / Maestro / WebDriverIO / XCUITest / vscode-test) is owned by the `cbp-e2e-*` specialist agents (dispatched per `context/testing/e2e.md`), spawned in parallel with this agent by `/cbp-round-execute` Step 5. **The agents are fully independent — this agent does NOT read `round.context.e2e_outputs` or `round.context.frontend_ui_review`.** This agent emits auto QA items and default checklist items. Baseline-regression findings surface as a BLOCKING gate at `/cbp-round-end` Step 7 (an explicit accept-or-fix user decision; baselines are NEVER auto-accepted).
+E2E execution (Playwright / Maestro / WebDriverIO / XCUITest / vscode-test) is owned by the `cbp-e2e-*` specialist agents (dispatched per `context/testing/e2e.md`), spawned in parallel with this agent by `/cbp-round-build` Step 5. **The agents are fully independent — this agent does NOT read `round.context.e2e_outputs` or `round.context.frontend_ui_review`.** This agent emits auto QA items and default checklist items. Baseline-regression findings surface as a BLOCKING gate at `/cbp-verify` (round scope) (an explicit accept-or-fix user decision; baselines are NEVER auto-accepted).
 ## Input Contract
@@ -146,10 +147,23 @@ Apply `testing_profile` from input before running any checks. When `testing_prof
 | full_matrix | Run all checks |
 | cross_app | Run union of touched apps' checks (intersection by detected files) |
-E2E (Playwright / Maestro / WebDriverIO / XCUITest / vscode-test) is NEVER run by this agent under any profile — it's owned by the `cbp-e2e-*` specialist agents (dispatched per `context/testing/e2e.md`; parallel siblings spawned by `/cbp-round-execute` Step 5).
+E2E (Playwright / Maestro / WebDriverIO / XCUITest / vscode-test) is NEVER run by this agent under any profile — it's owned by the `cbp-e2e-*` specialist agents (dispatched per `context/testing/e2e.md`; parallel siblings spawned by `/cbp-round-build` Step 5).
 **CRITICAL: Within your profile's allowed check set (see Profile Gate Matrix above), every applicable command MUST be executed. No skipping an in-scope check without an explicit, logged reason.**
+**Step 0: Resolve check commands from ci.json (absent-fallback safe)**
+After detecting `$PLATFORM` in Step 1, resolve per-category commands from `.codebyplan/ci.json`:
+```bash
+CI_BUILD_CMD=$(npx codebyplan ci resolve build --platform "$PLATFORM" 2>/dev/null)
+CI_TYPES_CMD=$(npx codebyplan ci resolve typecheck --platform "$PLATFORM" 2>/dev/null)
+CI_UNIT_CMD=$(npx codebyplan ci resolve unit_test --platform "$PLATFORM" 2>/dev/null)
+CI_AUDIT_CMD=$(npx codebyplan ci resolve audit 2>/dev/null)
+```
+Fallback: if `.codebyplan/ci.json` is absent, `codebyplan ci resolve` returns the central default command (exit 0). If the binary is unavailable, the variable is empty and the `${CI_*_CMD:-<literal>}` guards in the command cells below activate the hardcoded fallback, keeping non-migrated repos working.
 **Step 1: Determine project root and platform** — read `.claude/docs/architecture/testing-matrix.md` (when present) for platform-specific commands. Find the correct app directory and detect platform:
 | Signal | Platform | Unit Runner |
@@ -171,9 +185,9 @@ For each check below, you MUST:
 | Check | Command | Hard Fail | Skip Conditions | Skip when profile= |
 |-------|---------|-----------|-----------------|-------------------|
-| **Build** | `cd {app_dir} && npm run build 2>&1` | YES | Only if no app code changed | claude_only, or per app-type exclusion above |
+| **Build** | `cd {app_dir} && ${CI_BUILD_CMD:-npm run build} 2>&1` | YES | Only if no app code changed | claude_only, or per app-type exclusion above |
 | **Lint** | `cd {app_dir} && npm run lint 2>&1` | YES | Only if no app code changed | claude_only |
-| **Types** | `cd {app_dir} && npx tsc --noEmit 2>&1` | YES | Only if no app code changed | claude_only |
+| **Types** | `cd {app_dir} && ${CI_TYPES_CMD:-npx tsc --noEmit} 2>&1` | YES | Only if no app code changed | claude_only |
 **Lint scope expansion on config change (MANDATORY)**: when ANY entry in `files_changed[]` matches `eslint.config.*` / `.eslintrc.*` / a flat-config addition, the lint scope for THIS round expands from "round files" to "every file in `task.files_changed[]` across all completed rounds" (read via MCP `get_file_changes(task_id)` — fall back to `executor_output.files_changed` aggregated with prior-round files from `task.context.cumulative_files_changed[]` if available).
@@ -184,9 +198,9 @@ Procedure:
 4. Treat ANY violation as `hard_fail = true` regardless of which round introduced the file. Surfaces lint regressions on R1 files re-classified by the new R2 config.
 5. Log: `EXECUTED: lint scope expansion (config-change trigger) — N files re-linted`.
-This closes the cycle where R2 adds a flat-config and the QA pass lints only R2 files, only for `/cbp-task-check` to later lint the full task and surface dozens of errors on R1 files — wasting an entire corrective round. Plan-time premise verification does not catch this; only test-time scope expansion does.
+This closes the cycle where R2 adds a flat-config and the QA pass lints only R2 files, only for `/cbp-verify` (task scope) to later lint the full task and surface dozens of errors on R1 files — wasting an entire corrective round. Plan-time premise verification does not catch this; only test-time scope expansion does.
-**Hard fail means: if any of build/lint/types/unit fails or is not executed when applicable, set `totals.hard_fail = true`. The round CANNOT complete.** E2E hard_fail is set independently by the `cbp-e2e-*` specialist agents and surfaced via `round.context.e2e_outputs`; `/cbp-round-execute` Step 6 considers both signals.
+**Hard fail means: if any of build/lint/types/unit fails or is not executed when applicable, set `totals.hard_fail = true`. The round CANNOT complete.** E2E hard_fail is set independently by the `cbp-e2e-*` specialist agents and surfaced via `round.context.e2e_outputs`; `/cbp-round-build` Step 6 considers both signals.
 **Step 3a: Execute conditional unit-test checks (HARD FAIL when applicable):**
@@ -194,12 +208,12 @@ Run the unit-test runners detected in Step 1:
 | Platform | Unit Command |
 |----------|-------------|
-| Next.js | `cd {app_dir} && npx vitest --run 2>&1` |
-| NestJS | `cd {app_dir} && npx jest 2>&1` |
-| Tauri | `cd {app_dir} && npx vitest --run 2>&1` AND `cd {app_dir}/src-tauri && cargo test 2>&1` |
-| Expo | `cd {app_dir} && npx jest 2>&1` |
-| VS Code | `cd {app_dir} && npx vitest --run 2>&1` |
-| Package | `cd {pkg_dir} && npx vitest --run 2>&1` |
+| Next.js | `cd {app_dir} && ${CI_UNIT_CMD:-npx vitest --run} 2>&1` |
+| NestJS | `cd {app_dir} && ${CI_UNIT_CMD:-npx jest} 2>&1` |
+| Tauri | `cd {app_dir} && ${CI_UNIT_CMD:-npx vitest --run} 2>&1` AND `cd {app_dir}/src-tauri && cargo test 2>&1` |
+| Expo | `cd {app_dir} && ${CI_UNIT_CMD:-npx jest} 2>&1` |
+| VS Code | `cd {app_dir} && ${CI_UNIT_CMD:-npx vitest --run} 2>&1` |
+| Package | `cd {pkg_dir} && ${CI_UNIT_CMD:-npx vitest --run} 2>&1` |
 **Hard fail conditions:**
 - Unit tests: YES — when source files in files_changed
@@ -288,7 +302,7 @@ Mandatory dependency vulnerability scan:
 > **Vulnerability fix tasks**: If the current task title matches `/GHSA-|CVE-|vulnerabilit/i`, the audit result IS the primary test. After execution, grep output for the specific advisory ID from the task title and report `advisory_cleared: true/false` in auto_qa.
-1. **Execute**: `cd /path/to/monorepo/root && pnpm audit --json 2>&1` (run from monorepo root, not app subdirectory, so root-level `pnpm.overrides` are reflected)
+1. **Execute**: Run from the monorepo root (so root-level `pnpm.overrides` are reflected): `cd /path/to/monorepo/root && ${CI_AUDIT_CMD:-pnpm audit --json} 2>&1`
 2. **Parse** JSON output, categorize by severity: critical, high, medium, low
 3. **Determine pass/fail**:
    - Critical or high found → `fail`, set `totals.hard_fail = true`
@@ -303,9 +317,9 @@ For each entry in `unrelated_issues[]` with severity `warning` or `critical`, ro
 **Routing logic** (walk top-down; use the first row that fits):
-1. **Trivial inline fix** (≤5 min, mechanical, scope-clean per `cbp-round-end` reference `findings-presentation.md` "Infra Issue Absorption Contract — Trivial-Resolution Exception") — leave the issue in `unrelated_issues[]` with `routing: "inline"` and let the orchestrator absorb it into the current round before `/cbp-round-end`.
+1. **Trivial inline fix** (≤5 min, mechanical, scope-clean per `cbp-verify` reference `findings-presentation.md` "Infra Issue Absorption Contract — Trivial-Resolution Exception") — leave the issue in `unrelated_issues[]` with `routing: "inline"` and let the orchestrator absorb it into the current round before `/cbp-verify`.
-2. **Related to current task's domain** (most cases) — emit the finding in `unrelated_issues[]` with `routing: "new_round_in_current_task"`. The agent does NOT call `create_task`. `/cbp-round-end` consumes these and includes them as requirements for the next round of the current task.
+2. **Related to current task's domain** (most cases) — emit the finding in `unrelated_issues[]` with `routing: "new_round_in_current_task"`. The agent does NOT call `create_task`. `/cbp-verify` (Phase 5) consumes these and includes them as requirements for the next round of the current task.
 3. **Related to current checkpoint but separate from current task** — emit `routing: "new_task_in_current_checkpoint"` with the proposed task title and requirements; orchestrator confirms with user before calling `create_task(checkpoint_id=...)`.
@@ -315,9 +329,9 @@ For each entry in `unrelated_issues[]` with severity `warning` or `critical`, ro
 For routings 1-4, include each finding in `unrelated_issues[]` with the routing tag populated; populate `captured_tasks[]` only for routing 5 (timed re-check) and any routing 4 entries the user later confirms standalone.
-The agent's job is **classification + recommendation**, not unilateral task creation. Standalone creation outside the timed-re-check case requires explicit user confirmation at `/cbp-round-end`.
+The agent's job is **classification + recommendation**, not unilateral task creation. Standalone creation outside the timed-re-check case requires explicit user confirmation at `/cbp-verify`.
-This aligns with `cbp-task-create` Step 3.5 "Immediate Issue Capture Contract" (resolve-in-current-scope by default; standalone is rare) and `cbp-round-end` reference `findings-presentation.md` "Infra Issue Absorption Contract" (absorb-by-default since the flip from defer-by-default).
+This aligns with `cbp-task-create` Step 3.5 "Immediate Issue Capture Contract" (resolve-in-current-scope by default; standalone is rare) and `cbp-verify` reference `findings-presentation.md` "Infra Issue Absorption Contract" (absorb-by-default since the flip from defer-by-default).
 ### Phase 4: QA Generation
@@ -372,6 +386,6 @@ Return complete output contract.
 ## Integration
-- **Spawned by**: `/cbp-round-execute` Step 5 (per-wave; runs in parallel with the `cbp-e2e-*` specialists and may also run in parallel with next wave's executor)
+- **Spawned by**: `/cbp-round-build` Step 5 (per-wave; runs in parallel with the `cbp-e2e-*` specialists and may also run in parallel with next wave's executor)
 - **Parallel siblings**: `cbp-e2e-*` specialist agents (fully independent — no cross-read; all agents complete on their own timeline using only their own inputs)
-- **Output consumed by**: `/cbp-round-execute` Step 6 (hard-fail routing — this agent's `totals.hard_fail` is OR'd across `round.context.e2e_outputs` entries: any `e2e_outputs[f].test_results.failed > 0` or `e2e_outputs[f].status === 'failed'`, plus the `e2e_eligible_skipped` signal), `/cbp-round-end` Step 3 (reads this agent's `auto_qa[]` and `default_checklist[]`). This agent does not emit `user_qa` items; baseline-regression findings surface as a BLOCKING gate at `/cbp-round-end` Step 7 (an explicit accept-or-fix user decision; baselines are NEVER auto-accepted).
+- **Output consumed by**: `/cbp-round-build` Step 6 (hard-fail routing — this agent's `totals.hard_fail` is OR'd across `round.context.e2e_outputs` entries: any `e2e_outputs[f].test_results.failed > 0` or `e2e_outputs[f].status === 'failed'`, plus the `e2e_eligible_skipped` signal), `/cbp-verify` (round scope) (reads this agent's `auto_qa[]` and `default_checklist[]`). This agent does not emit `user_qa` items; baseline-regression findings surface as a BLOCKING gate at `/cbp-verify` (round scope) (an explicit accept-or-fix user decision; baselines are NEVER auto-accepted).
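
Reviewer note on the `${CI_*_CMD:-<literal>}` guards introduced in this file: they rely on standard shell parameter expansion, where `${VAR:-default}` yields `default` when `VAR` is unset or empty. The following is a minimal sketch of that behaviour only. `resolve_or_empty` is a hypothetical stand-in for `npx codebyplan ci resolve`, whose real output is not shown in this diff, and the `pnpm run build` override is an invented example value.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for `npx codebyplan ci resolve <category>`.
# Assumption: it prints a command on success, and prints nothing when the
# binary is unavailable (the template discards stderr with 2>/dev/null).
resolve_or_empty() {
  case "$1" in
    build) echo "pnpm run build" ;;  # pretend ci.json overrides the build command
    *)     return 127 ;;             # pretend the resolver is missing for other categories
  esac
}

# `|| true` is not in the template; it only keeps this sketch safe under `set -e`.
CI_BUILD_CMD=$(resolve_or_empty build 2>/dev/null) || true
CI_TYPES_CMD=$(resolve_or_empty typecheck 2>/dev/null) || true

# ${VAR:-default} expands to default when VAR is unset OR empty.
build_cmd="${CI_BUILD_CMD:-npm run build}"
types_cmd="${CI_TYPES_CMD:-npx tsc --noEmit}"

echo "build: $build_cmd"
echo "types: $types_cmd"
```

In the template's table cells the expansion is left unquoted (`${CI_BUILD_CMD:-npm run build} 2>&1`), so the shell word-splits the resolved string into a command and its arguments.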

package/templates/agents/cbp-verify-reviewer.md ADDED

@@ -0,0 +1,236 @@
+---
+name: cbp-verify-reviewer
+description: Read-only fresh-context diff reviewer. Merges round-level quality review and task-level production review under a scope parameter. Reviews the diff for bugs, logic errors, gaps, requirements/checkpoint alignment, and shippability. Advisory only — proposes fixes, never applies them.
+tools: Read, Glob, Grep, Bash
+model: sonnet
+effort: xhigh
+---
+# Verify Reviewer Agent
+The single fresh-context reviewer spawned by `cbp-verify`. It performs round-level quality
+review and task-level production review under one roof — one agent, two windows, selected by the
+`scope` parameter. Fresh context is the whole
+point — it sees the diff with no memory of how the code was written, which is the blind spot the
+orchestrator that wrote it cannot cover.
+## Scope Parameter
+| `scope` | Review window | Emphasis |
+|---------|---------------|----------|
+| `round` | the current round's diff (`round.files_changed` + `git diff` of the round) | line-level bugs, logic errors, edge cases, in-round gaps |
+| `task` | the full aggregated task diff (all rounds' `files_changed`) | requirements traceability, checkpoint alignment, cross-round integration, shippability |
+`round` is the per-round quality pass; `task` is the holistic cross-round double-check. The phase
+skeleton is shared; phase weight shifts with scope (noted per phase).
+## Read-Only & Advisory Contract (CRITICAL)
+- **Tools**: `Read`, `Glob`, `Grep`, `Bash`. **`Bash` is restricted to read-only git** — `git
+  diff`, `git log`, `git show`, `git ls-files`, `git status` (read). It exists so the reviewer can
+  inspect the actual diff and confirm committed proof artifacts, NOT to mutate anything.
+- **NEVER run `git stash`** — for any reason. `git stash` unstages the user's approved files,
+  which silently destroys their approval signal (`feedback_task-check-agent-runs-git-stash`).
+  Likewise never `git add` / `git checkout` / `git reset` / `git restore` or any mutating command.
+- **NEVER edit files.** This agent returns findings only. The `cbp-verify` orchestrator owns
+  `Edit`/`Write`: it applies in-scope mechanical fixes itself, or routes blocking findings to a
+  `/cbp-round-plan` fix round. A finding is a proposal, not an applied change.
+- **Findings cite `path:line`.** A finding with no concrete location is not actionable — give the
+  file and the line (or line range) for every finding.
+## Spawn-Failure Applies To The Caller
+If `cbp-verify` cannot spawn this agent (provider 5xx, rate-limit / monthly-cap / billing block,
+context overflow, the process dying before output), that is a **HARD GATE FAILURE** for
+`cbp-verify`: it STOPS and surfaces a retry directive
+(`rules/spawn-failure-is-gate-failure.md`). The orchestrator must NEVER walk these phases inline
+and self-certify — a missing review is a STOP, not a self-graded pass. Documented here so the
+contract lives next to the agent it governs.
+## Input Contract
+```yaml
+input:
+  scope: 'round' | 'task'
+  repo_id: string
+  checkpoint: {id, title, goal, context}     # for alignment + divergence detection
+  task: {id, title, requirements, context, files_changed}
+  round: {id, number, requirements, files_changed, context}   # scope=round
+  rounds: [{number, requirements, context, files_changed}]    # scope=task (all rounds)
+  diff_window_files: [{path, action}]        # round.files_changed (round) | aggregated (task)
+  project_path: string
+```
+## Output Contract
+```yaml
+output:
+  status: 'completed' | 'no_findings' | 'failed'
+  scope: 'round' | 'task'
+  verdict: 'READY' | 'NOT_READY'             # task scope: shippable verdict; round scope: clean/needs-fix
+  summary: string
+  findings:
+    - id: number
+      file: string
+      line: number | null                    # path:line is mandatory where a location exists
+      severity: 'critical' | 'high' | 'medium' | 'low'
+      category: 'bug' | 'logic_error' | 'edge_case' | 'missing_validation' | 'race_condition' | 'incomplete' | 'quality'
+      title: string
+      description: string
+      suggested_fix: string
+      requirement_ref: string | null
+      mode: 'code' | 'doc'
+      routing_recommendation: 'inline_in_current_round' | 'new_round_in_current_task' | 'new_task_in_current_checkpoint' | 'standalone_candidate'
+  requirements_check: [{requirement, status, evidence}]   # scope=task
+  checkpoint_alignment: {aligned: boolean, notes: string} # scope=task
+  shippable: {yes: boolean, caveats: []}                  # scope=task
+  scope_divergence_candidates: [{diverges_from, observation, implication}]  # scope=task; user-confirmed by cbp-verify
+  stats: {files_reviewed: number, findings_by_severity: {critical, high, medium, low}}
+```
+`user_satisfaction` is NOT produced here — the single human walkthrough lives in `cbp-verify`
+Phase 6 (task scope). This agent only surfaces `scope_divergence_candidates` it can detect from
+context contradictions (a round decision contradicting a locked `checkpoint.context.decisions[]`,
+a new constraint not in the original requirements); `cbp-verify` confirms them with the user.
+## Workflow
+### Phase 0: Skip-Trivial Gate (scope=round only)
+`scope=task` is never trivial — skip this phase. For `scope=round`, classify from
+`round.files_changed` + `round.context`; if trivial, exit `status: 'no_findings'`,
+`summary: 'skipped: trivial round'`. Trivial when ANY hold:
+| Condition | Detection |
+|-----------|-----------|
+| Empty | `round.files_changed.length === 0` |
+| Assets-only | every path ends `.png` / `.jpg` / `.svg` |
+| Baseline update | `round.context.is_baseline_update === true` |
+### Phase 0.5: Mode Detection
+Inspect `diff_window_files` and pick the review mode (then apply the matching checklist in
+Phase 2):
+- **Docs-Prose Mode** — every path ends `.md`. Use the reduced prose checklist: cross-reference
+  integrity (every `[link](path)` / `rules/{name}.md` mention resolves), requirement completeness
+  (each requirement has a corresponding paragraph), factual contradiction (no two sections/sibling
+  docs claim opposites → `high`), stale callouts (named removed/renamed file/agent/skill → `low`).
+  Findings carry `mode: 'doc'`. Skip the code checklist entirely.
+- **Config-File Mode** — every path matches `eslint.config.*`. Load `context/testing/eslint.md`
+  Compliance Checklist and audit every item in one pass (missing → `medium`, style → `low`).
+- **Code Mode** — any non-`.md`, non-config file. Full code checklist (Phase 2).
+### Phase 1: Load Diff & Context
+1. Read task (and round, scope=round) requirements to understand intent.
+2. `git diff` the review window to see the actual change (not just `files_changed` metadata).
+3. `Read` each changed file in full (up to 500 lines; chunk if longer). For `scope=task`, read
+   the full aggregated set across all rounds.
+### Phase 1.8: Behavioral Claim Verification Gate
+Before any candidate finding enters `findings[]`, verify its premise against the actual code with
+`Read`/`Grep`. A finding that cannot be grounded in a specific Read/Grep result is an unverified
+premise — **DROP it silently**, do not downgrade to `low`. This gate prevents the confident-but-
+false findings (absent-guard, unset-field, dropped-await, race claims, "script does not exist")
+that cost a correction round. Verify by claim type:
+| Claim | Verify before reporting |
+|-------|--------------------------|
+| `Guard absent at L<N>` | Grep the file for the guard expression; if present, drop. |
+| `Field not set in fn X` | Read the whole fn body; if set on any path, drop. |
+| `Awaited promise dropped` | Re-read the call site; if awaited / intentionally fire-and-forget with logging, drop. |
+| `Race condition in handler X` | Check if the shared-state mutation is queued / ref'd / transactional; if serialised, drop. |
+| `Script absent` | Grep root `package.json` + every `apps/*/package.json` for the script; if present, drop. |
+| `Memoization wrap` | If the callable is itself a hook (`use[A-Z]` name, or body invokes a hook), DROP — wrapping a hook in `useMemo` violates Rules of Hooks. |
+### Phase 2: Diff Review
+Apply the checklist for the mode from Phase 0.5. Code Mode checklist, per changed file:
+| Category | What to check |
+|----------|---------------|
+| Bug | null/undefined access, off-by-one, wrong comparison, missing `await`, bad coercion |
+| Logic error | inverted condition, wrong AND/OR, bad state transition, wrong return |
+| Edge case | empty arrays/objects, zero/negative, empty string, boundary, concurrent access |
+| Missing validation | unchecked input, missing null guard at a system boundary, unvalidated API param |
+| Race condition | concurrent mutation, check-then-act without atomicity, async ordering |
+| Incomplete | TODO/FIXME, partial impl, unhandled enum case, missing error path |
+| Quality | dead code, duplicated logic, overly complex conditional, misleading name |
+Respect existing patterns (don't flag a consistently-used codebase convention). Don't flag pure
+formatting/naming unless it causes a bug. Skip test files unless they assert the wrong thing.
+### Phase 2.5: Sibling Peer Audit
+When a `missing_validation` / `incomplete` / `quality` / `logic_error` finding lands on a
+`{verb}{EntityType}`-named function (e.g. `updateMealSlot`), expand the same check across the
+module's peer functions (Glob the same `api/` dir for `*Api.ts` / `*.api.ts`; grep for same-shape
+functions; apply the Phase 1.8 verification to each). Emit verified peer gaps as sibling findings
+tied via `requirement_ref` — preventing an audit-expansion cycle in later rounds. Also fires on
+numeric-coercion at form-field handlers (`parseInt`/`parseFloat`/`Number(`/`+e.target.value`):
+scan ALL coercion sites in the file (both parseInt and parseFloat — shared falsy-zero footgun).
+### Phase 3: Cross-File & Cross-Round Analysis
+- **Data flow** — data passed between changed files keeps type safety + invariants.
+- **API contracts** — callers match changed signatures; new exports consumed; removed exports not
+  still referenced.
+- **Cross-round (scope=task)** — a field/contract/column introduced in one round that a later
+  round broke or never consumed; convention drift where a later round contradicts an earlier
+  round's pattern; orphaned additions left by a refactor.
+### Phase 4: Requirements & Checkpoint Alignment (scope=task emphasis)
+For `scope=task`, parse `task.requirements` into items and grade each `met` / `partially met` /
+`not met` with `path:line` evidence into `requirements_check[]`. Any `not met` → `verdict:
+NOT_READY`. Compare the work to `checkpoint.goal` → `checkpoint_alignment`. Surface
+`scope_divergence_candidates` for any round decision that contradicts a locked
+`checkpoint.context.decisions[]` entry or introduces an unrequested constraint.
+For `scope=round`, this is lighter: confirm the round's own requirements are addressed by the diff.
+### Phase 5: Shippable Gate + Deterministic E2E Note (scope=task)
+Ask "if deployed now, does this feature work end-to-end?" → `shippable {yes, caveats}`; a NO
+flips `verdict: NOT_READY`. This catches integration gaps where requirements are technically met
+but the feature doesn't work whole.
+The deterministic e2e gate (`codebyplan e2e verify-round`) and the unit/lint/type/audit matrix
+(`codebyplan check`) are run by the `cbp-verify` orchestrator, NOT by this agent (no CLI/MCP from
+here). If the diff touches an e2e-eligible UI surface, note it in `summary` so the orchestrator
+confirms its gate ran — but do not assert a build/test result this agent did not run.
+### Phase 6: Build Findings, Verdict & Routing
+Assign severity by impact: `critical` (runtime error / data corruption / security), `high`
+(incorrect behavior users hit), `medium` (conditional edge case), `low` (quality). Each finding
+gets a concrete `description` (what, why it matters, where) + a `suggested_fix`. Populate
+`routing_recommendation` per finding (default `new_round_in_current_task` for exceeding-scope
+findings; `inline_in_current_round` for ≤5-min mechanical scope-clean fixes; `standalone_candidate`
+is rare and orchestrator-confirmed).
+**Corrective-depth advisory** (scope=round, `round.number >= 3` on a corrective round): prepend a
+one-line advisory that successive corrective rounds raise ship-delay risk and low/medium findings
+could be deferred to a follow-up task — findings still listed in full.
+Set `verdict`: `scope=round` → `READY` when no `critical`/`high` findings block; `scope=task` →
+`READY` only when requirements all met, shippable, and aligned. Sort findings critical-first;
+`status: 'no_findings'` when none.
+## Completion Criteria
+- All window files read; cross-file (and cross-round, scope=task) interactions checked.
+- Every reported finding survived the Phase 1.8 verification gate and cites `path:line`.
+- Verdict set with evidence; no file was edited; no mutating git command ran.
+## Integration
+- **Spawned by**: `cbp-verify` (scope=round at round verify; scope=task at task escalation).
+- **Returns to**: `cbp-verify`, which applies in-scope mechanical fixes or routes blocking
+  findings to `/cbp-round-plan`. Baseline regressions are a user-accept gate the orchestrator
+  owns — never auto-accepted by this agent.
+- **Reads**: changed files + git diff (read-only), task/checkpoint/round context (passed via the
+  Input Contract; local `.codebyplan/state/` when `cbp-verify` pre-fetches).
+- **Writes**: none — review only.
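
Reviewer note on Phase 0.5 of the added agent: the mode rules can be read as a small classifier over the changed paths. The sketch below is one possible reading, not the agent's implementation. `detect_mode` is a hypothetical helper name, the list is assumed non-empty (empty rounds exit at Phase 0), and a mix of `.md` and `eslint.config.*` paths is assumed to fall through to Code Mode, which the template does not state explicitly.

```shell
#!/usr/bin/env bash
# Sketch of the Phase 0.5 mode rules; detect_mode is a hypothetical helper name.
# Assumes at least one changed path is passed in.
detect_mode() {
  local all_md=1 all_cfg=1 path
  for path in "$@"; do
    # Docs-Prose Mode requires every path to end in .md
    case "$path" in
      *.md) ;;
      *) all_md=0 ;;
    esac
    # Config-File Mode requires every basename to match eslint.config.*
    case "${path##*/}" in
      eslint.config.*) ;;
      *) all_cfg=0 ;;
    esac
  done
  if [ "$all_md" -eq 1 ]; then
    echo "doc"
  elif [ "$all_cfg" -eq 1 ]; then
    echo "config"
  else
    echo "code"   # assumption: any mixed set is reviewed with the full code checklist
  fi
}

detect_mode README.md docs/guide.md
detect_mode apps/web/eslint.config.mjs
detect_mode README.md src/index.ts
```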

package/templates/context/architecture-map.md CHANGED

@@ -34,7 +34,7 @@ leading dot for git drift tracking.
 ## When To Consult
-### cbp-task-planner — Phase 3: Check Rules and Architecture
+### cbp-round-planner — Phase 3: Check Rules and Architecture
 Before finalizing scope for the target module(s), check whether a map exists:
@@ -44,7 +44,7 @@ Before finalizing scope for the target module(s), check whether a map exists:
 3. Use the `## 4. Dependencies (In / Out)` section to identify cross-module impact that
    the planner's Explore subagent might otherwise miss.
-### cbp-round-executor — Step 2.4: Architecture Map Consultation
+### cbp-round-builder — Step 2.4: Architecture Map Consultation
 Before editing files in a module, check whether a map exists:
@@ -91,8 +91,8 @@ The map is **not a prerequisite** — its absence is NOT a blocker for planning
 | Agent | Phase / Step | Action |
 |-------|-------------|--------|
-| `cbp-task-planner` | Phase 3 — Check Rules and Architecture | Glob for map; read if present before finalizing scope |
-| `cbp-round-executor` | Step 2.4 — Architecture Map Consultation | Glob for map; read if present before editing files in module |
+| `cbp-round-planner` | Phase 3 — Check Rules and Architecture | Glob for map; read if present before finalizing scope |
+| `cbp-round-builder` | Step 2.4 — Architecture Map Consultation | Glob for map; read if present before editing files in module |
 Both agents follow the same read-or-skip pattern: Glob → if found, Read → use signal;
 if absent, continue without blocking.
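
Reviewer note on the read-or-skip pattern both renamed agents follow: the key property is that a missing map never blocks. A minimal sketch, assuming a hypothetical `consult_map` helper and a caller-supplied map path (this diff does not show where architecture maps live):

```shell
#!/usr/bin/env bash
# consult_map is a hypothetical helper; the path passed in is an assumption.
consult_map() {
  local map_path="$1"
  if [ -f "$map_path" ]; then
    cat "$map_path"        # map found: read it and use the signal
  else
    echo "no architecture map at $map_path; continuing" >&2
  fi
  return 0                 # absence is never a blocker
}

tmp_dir=$(mktemp -d)
printf '## 4. Dependencies (In / Out)\n' > "$tmp_dir/map.md"

consult_map "$tmp_dir/map.md"       # prints the map contents
consult_map "$tmp_dir/missing.md"   # logs a note to stderr, still returns 0
rm -rf "$tmp_dir"
```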

package/templates/context/mcp-docs.md CHANGED

@@ -8,7 +8,7 @@ This file is the **consumer contract** for DocsByPlan: what the MCP tools are, w
 ## What DocsByPlan Is
-A DB-backed, version-aware library-doc retrieval service exposed via MCP at `mcp.codebyplan.com/mcp`. It replaces the retired `vendor/` filesystem mirror. Docs are ingested by the `apps/docs-ingest` worker, chunked and ranked by trust score, and served to agents on demand. The DB is the sole source of truth — there are no local files to read.
+A DB-backed, version-aware library-doc retrieval service exposed via MCP at `mcp.codebyplan.com/mcp`. Docs are ingested by the `apps/docs-ingest` worker, chunked and ranked by trust score, and served two ways: as a **local docs mirror** materialized into the repo by `codebyplan docs sync` (read-first), and via the MCP tools (fallback + symbol search). The DB is the authoritative store; the local mirror is a dependency-scoped, version-exact file cache of it.
 Purpose: Claude (planner + executor agents + the orchestrator) consults DocsByPlan **before** writing library-specific code, so that:
@@ -28,23 +28,63 @@ Purpose: Claude (planner + executor agents + the orchestrator) consults DocsByPl
 Typical flow: `resolve_library_id` → `lookup_symbol` (for known symbols) or `search_chunks` + `get_chunk` (for broader queries).
+## Local Docs Mirror — Read This First
+`codebyplan docs sync` materializes the repo's dependency docs into a local folder (default
+`docs/dependencies/`, overridable via `.codebyplan/vendor.json` `vendor_docs_path`). The mirror
+is gitignored, scoped to the repo's actual dependencies at their installed versions, and split
+into many small per-topic markdown files so a consultation reads 1–2 focused files instead of
+making network round-trips.
+Layout:
+```
+docs/dependencies/
+  INDEX.md                       # every mirrored lib: name@version, file count, thin-coverage flags
+  docs.lock.json                 # sync state (content hashes) — not for agent consumption
+  <lib>/                         # npm name, "/" → "__" (e.g. @supabase__supabase-js)
+    <version>/
+      INDEX.md                   # per-lib file list
+      <upstream-doc-path>.md     # one focused file per upstream doc page/section
+```
+**Read ladder** (replaces MCP calls when it succeeds):
+1. Mirror dir exists? If not → MCP flow below, and suggest the user run `codebyplan docs sync`.
+2. Grep top-level `INDEX.md` for the package. Absent or flagged `(thin)` → MCP flow for that lib.
+3. Read the per-lib `INDEX.md`, pick the focused file(s) for the API surface in question, Read them.
+4. Symbol/topic not found in the local files → fall back to `lookup_symbol` / `search_chunks` for
+   that symbol only.
+A local-mirror read satisfies the Mandatory Consultation Contract exactly like an MCP read —
+record it in `library_docs_consulted[]` with `source: local_mirror` and the file paths read.
+The mirror holds exactly one version per library (the installed one); if the mirrored version
+visibly mismatches the version the task targets, treat it as a miss and use MCP with an explicit
+`version` param.
 ## Mandatory Consultation Contract
 This is a **block-with-override** contract. DocsByPlan consultation happens before plan finalization (planner) and before code write (executor). The contract has two branches:
 ### Branch A — Library IS registered (no opt-out)
-`resolve_library_id` returns a match with chunks. Agent MUST call the MCP tools (`resolve_library_id`, then `lookup_symbol` or `search_chunks` + `get_chunk` for relevant surfaces) before proceeding. There is **no override path** when the library is registered — the whole point is using fresh API surface info instead of stale training-data recall.
+The library has docs (local mirror entry, or `resolve_library_id` returns a match with chunks).
+Agent MUST consult before proceeding — **local mirror first** (Read ladder above); MCP tools
+(`resolve_library_id`, then `lookup_symbol` or `search_chunks` + `get_chunk`) when the mirror
+misses. There is **no override path** when the library is registered — the whole point is using
+fresh API surface info instead of stale training-data recall.
 Proof of consultation must appear in the agent's output:
 ```yaml
 library_docs_consulted:
   - library: string            # npm package name
-    library_id: string         # ID returned by resolve_library_id
+    source: local_mirror | mcp # where the docs were read
+    files: [string]            # mirror file paths read (source: local_mirror), OR
+    library_id: string         # ID returned by resolve_library_id (source: mcp)
     chunk_ids: [string]        # IDs of chunks read via get_chunk, OR
     symbols: [string]          # symbols resolved via lookup_symbol
-    version_returned: string   # version the MCP served
+    version_returned: string   # version served (mirror folder version or MCP version)
 ```
 Self-check gate: if `library_docs_consulted[]` is empty when any dependency in `dependencies_identified[]` (planner) or any imported library in changed files (executor) has a registered library_id, the agent MUST fail with `status: failed`, `blocked_reason: "DocsByPlan not consulted for {pkg}"`.
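Reviewer note: the self-check gate in the hunk above can be sketched as a small pure function. This is a minimal illustration, not the package's implementation. The type names, the function name `selfCheckGate`, and the `registered` input are hypothetical; only the field names and the `blocked_reason` string come from the diff. The source states the gate in terms of an empty `library_docs_consulted[]`; checking per package is an assumption suggested by the `{pkg}` placeholder.

```typescript
// Hypothetical shape; field names follow the YAML proof block in the diff.
type Consulted = {
  library: string;
  source: "local_mirror" | "mcp";
  files?: string[];        // set when source is local_mirror
  library_id?: string;     // set when source is mcp
  chunk_ids?: string[];
  symbols?: string[];
  version_returned: string;
};

type GateResult =
  | { status: "ok" }
  | { status: "failed"; blocked_reason: string };

// registered: packages known to have docs. ASSUMPTION: the caller has already
// determined this (mirror INDEX.md entry or a resolve_library_id match).
function selfCheckGate(registered: string[], consulted: Consulted[]): GateResult {
  const seen = new Set(consulted.map((c) => c.library));
  // Per-package check (assumption): fail on the first registered package
  // that has no consultation record.
  const missing = registered.find((pkg) => !seen.has(pkg));
  if (missing === undefined) {
    return { status: "ok" };
  }
  return {
    status: "failed",
    blocked_reason: `DocsByPlan not consulted for ${missing}`,
  };
}
```

With an empty `consulted` list and one registered package, the sketch returns the `failed` result that the contract requires.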
@@ -95,11 +135,13 @@ Version mismatch is NOT a missing-library case (Branch B); the library is regist
 ## Agent Consumption Contract
-### `cbp-task-planner` Phase 2.6 — Mandatory DocsByPlan Pre-Read
+### `cbp-round-planner` Phase 2.6 — Mandatory DocsByPlan Pre-Read
 For every entry in `context_summary.dependencies_identified[]`:
-1. Call `resolve_library_id(pkg)` → get `library_id` + `latest_version`.
+0. Check the **Local Docs Mirror** (Read ladder above) — a mirror hit replaces steps 1–3 for
+   that dependency; record `source: local_mirror` + files read.
+1. On mirror miss: call `resolve_library_id(pkg)` → get `library_id` + `latest_version`.
 2. Apply the **Mandatory Consultation Contract** above:
    - Branch A (registered) → call `lookup_symbol` for specific APIs or `search_chunks` + `get_chunk` for broader surfaces; populate `library_docs_consulted[]`.
    - Branch B (not registered) → AskUserQuestion; populate `vendor_overrides[]` if user picks override; otherwise block.
@@ -108,11 +150,13 @@ For every entry in `context_summary.dependencies_identified[]`:
 5. Incorporate findings into the plan: API names, import paths, version constraints, known pitfalls.
 6. Low-trust chunk (`verify_recommended: true`) → add `risks` entry and WebFetch upstream to confirm.
-### `cbp-round-executor` Step 3.4 — Mandatory DocsByPlan Pre-Read
+### `cbp-round-builder` Step 3.4 — Mandatory DocsByPlan Pre-Read
 Before writing any code that imports a registered library:
-1. Call `resolve_library_id(pkg)` → get `library_id`.
+0. Check the **Local Docs Mirror** (Read ladder above) — a mirror hit replaces steps 1–3 for
+   that library; record `source: local_mirror` + files read.
+1. On mirror miss: call `resolve_library_id(pkg)` → get `library_id`.
 2. Apply the **Mandatory Consultation Contract** above (Branch A or B).
 3. Branch A: call `lookup_symbol` for specific functions/options being used; call `search_chunks` + `get_chunk` for broader API surfaces. Populate `library_docs_consulted[]`.
 4. Use the version-pinned API names from DocsByPlan, not training-memory recall.
@@ -122,8 +166,9 @@ Before writing any code that imports a registered library:
 ## What This File Is NOT
 - Not the ingest pipeline — that is `apps/docs-ingest`.
-- Not a directory of registered libraries — call `resolve_library_id` to check registration.
+- Not a directory of registered libraries — grep the mirror's `INDEX.md` or call `resolve_library_id`.
 - Not how to register a new library — use `/cbp-add-library {pkg}`.
+- Not how the mirror is synced — that is `codebyplan docs sync` (CLI).
 This file answers one question for one audience: **"As an agent (planner or executor), how do I find and use library docs at decision time?"**
@@ -131,9 +176,10 @@ This file answers one question for one audience: **"As an agent (planner or exec
 | Concern | Reference |
 |---------|-----------|
+| Sync the local mirror | `codebyplan docs sync` (`codebyplan docs status` for drift) |
 | Ingest pipeline | `apps/docs-ingest` |
 | Register a new library | `/cbp-add-library {pkg}` |
 | MCP tool endpoint | `mcp.codebyplan.com/mcp` |
 | Loading rule registration | `.claude/rules/context-file-loading.md` (Phase 2.6 / Step 3.4 mapping rows) |
-| Planner integration | `packages/codebyplan-package/templates/agents/task-planner.md` Phase 2.6 |
-| Executor integration | `packages/codebyplan-package/templates/agents/round-executor.md` Step 3.4 |
+| Planner integration | `packages/codebyplan-package/templates/agents/cbp-round-planner.md` Phase 2.6 |
+| Executor integration | `packages/codebyplan-package/templates/agents/cbp-round-builder.md` Step 3.4 |
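
Reviewer note: the Read ladder added in this file's diff can be sketched as follows. This is a minimal sketch under stated assumptions, not the CLI's or the agents' actual logic. The function and type names are hypothetical, and the diff does not specify the line format of the top-level `INDEX.md`; the sketch assumes each library appears as a whitespace-delimited `name@version` token, with `(thin)` on the same line when flagged. Step 4 of the ladder (per-symbol fallback after reading files) is not modeled.

```typescript
// Outcome of the ladder for one package (hypothetical type).
type LadderResult =
  | { route: "mcp"; reason: "no_mirror" | "not_indexed" | "thin" | "version_mismatch" }
  | { route: "local_mirror"; dir: string };

// npm name to mirror folder name: "/" becomes "__", as in the layout in the diff.
function mirrorDirName(pkg: string): string {
  return pkg.split("/").join("__");
}

// topIndex: contents of the top-level INDEX.md, or null when the mirror dir is absent.
// ASSUMPTION: index lines carry a "name@version" token; the real format is unspecified.
function readLadder(
  pkg: string,
  topIndex: string | null,
  targetVersion?: string,
  mirrorRoot: string = "docs/dependencies",
): LadderResult {
  // Step 1: no mirror dir, so use the MCP flow.
  if (topIndex === null) return { route: "mcp", reason: "no_mirror" };

  const prefix = `${pkg}@`;
  for (const line of topIndex.split("\n")) {
    const token = line.split(/\s+/).find((t) => t.startsWith(prefix));
    if (token === undefined) continue;

    // Step 2: a thin-coverage flag sends this lib to MCP.
    if (line.includes("(thin)")) return { route: "mcp", reason: "thin" };

    // The mirror holds one version per lib; a visible mismatch counts as a miss.
    const version = token.slice(prefix.length);
    if (targetVersion !== undefined && targetVersion !== version) {
      return { route: "mcp", reason: "version_mismatch" };
    }

    // Step 3: the caller reads the per-lib INDEX.md under this directory.
    return { route: "local_mirror", dir: `${mirrorRoot}/${mirrorDirName(pkg)}/${version}` };
  }
  // Step 2: package absent from the index.
  return { route: "mcp", reason: "not_indexed" };
}
```

Keeping the function pure (index text in, decision out) leaves file access to the caller, so the same logic applies whether the mirror lives at the default path or at an overridden `vendor_docs_path`.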

package/templates/context/testing/e2e.md CHANGED

@@ -10,7 +10,7 @@ never-silently-skip obligations. Framework-specific commands live in each agent'
 ## Input Contract
-Passed by the dispatching skill (`/cbp-round-execute` Step 5, `/cbp-checkpoint-check`
+Passed by the dispatching skill (`/cbp-round-build` Step 5, `/cbp-checkpoint-check`
 Step 5b, or `/cbp-checkpoint-plan` Step 4 discovery probe). The dispatching skill reads
 `.codebyplan/e2e.json` and injects `framework`, `app`, `platforms`, and credential var
 names — agents do NOT auto-detect platform; the config is authoritative.
@@ -197,7 +197,7 @@ diagnostics only — they are NOT the committed path.
 | webdriverio | `{app-dir}/e2e/screenshots/webdriverio/{spec}-{state}.png` |
 | vscode-test | `{app-dir}/e2e/screenshots/vscode/{suite}-{test}.png` (SD-3: may be empty for behavior-only extensions) |
-SD-3: the vscode-test committed dir may be empty for behavior-only extensions (no visual surface); the agent must still emit `e2e_gallery: []` explicitly. `cbp-task-check` Phase 4 treats an empty `e2e_gallery[]` as allowed when `vscode-test` is the ONLY eligible framework.
+SD-3: the vscode-test committed dir may be empty for behavior-only extensions (no visual surface); the agent must still emit `e2e_gallery: []` explicitly. `cbp-verify-reviewer` treats an empty `e2e_gallery[]` as allowed when `vscode-test` is the ONLY eligible framework.
 **Gitignore caution**: root `.gitignore` ignores `apps/web/e2e/screenshots/`. For the `{app-dir}`-relative frameworks (xcuitest, webdriverio, vscode-test), `{app-dir}` MUST NOT resolve to `apps/web` — committed PNGs there would be silently dropped from git. Remedy: use a non-ignored subdir (e.g. `apps/web/e2e/baselines/<framework>/`). A `.gitignore` negation (`!apps/web/e2e/screenshots/<framework>/`) does NOT work — git does not recurse into an ignored parent directory, so PNGs in that subdir would be silently dropped on a fresh checkout. Maestro (repo-root `e2e/screenshots/maestro/`) is already safe.
@@ -212,7 +212,7 @@ No user gate required for first-run capture.
 **EXISTING baselines that visually diff** (`is_new === false`, `baseline_diff_pct > threshold`):
 classify as `visual_regression`. Do NOT auto-update. Surface as a blocking accept-or-fix gate
-at `/cbp-round-end` Step 7. The user must explicitly approve (`--update-snapshots`) or open a
+at `/cbp-verify` (round scope). The user must explicitly approve (`--update-snapshots`) or open a
 fix task. This relaxes the prior always-manual contract ONLY for new screens.
 ## Screenshot Collection Rule
@@ -223,11 +223,11 @@ entry requires: `{test_name, path, page_or_screen, viewport, is_new, baseline_di
 Every `e2e_gallery[]` entry requires: `{test_name, page_or_screen, framework, committed_path,
 is_new, baseline_diff_pct}`. `committed_path` MUST be a git-tracked path after the run.
-`/cbp-round-execute` Step 5b aggregates `e2e_gallery[]` across all specialists and stores it
+`/cbp-round-build` Step 5b aggregates `e2e_gallery[]` across all specialists and stores it
 in `round.context.e2e_gallery`. TASK-3 / checkpoint-end consumes this aggregated gallery to
 upload images to the DB.
-Screenshots flow to `cbp-frontend-ui` invoked by `/cbp-round-execute` Step 5b with
+Screenshots flow to `cbp-frontend-ui` invoked by `/cbp-round-build` Step 5b with
 `phase: 'screenshot_review'` — NOT inline by `round-executor` Step 3.8 (which runs
 `phase: 'style_only'` without e2e output).
@@ -254,7 +254,7 @@ Otherwise return `status: 'failed'`.
 ## Dispatch / Eligibility Routing Contract
-The dispatching skill (`/cbp-round-execute` Step 5 or `/cbp-checkpoint-check` Step 5b)
+The dispatching skill (`/cbp-round-build` Step 5 or `/cbp-checkpoint-check` Step 5b)
 selects one specialist per app. Config is in `.codebyplan/e2e.json` (authoritative).
 | `framework` in e2e.json | Agent spawned | Typical app type |
@@ -284,7 +284,7 @@ An agent is NOT spawned when ANY of the following hold:
 **Multi-app monorepos**: the dispatching skill reads `e2e.json` per app path and may
 spawn multiple specialists in the same round (one per eligible framework). Agents run in
 parallel with `cbp-testing-qa-agent`. Each specialist's output is stored under
-`round.context.e2e_outputs[framework]` (a framework-keyed map); `/cbp-round-execute` Step 5b
+`round.context.e2e_outputs[framework]` (a framework-keyed map); `/cbp-round-build` Step 5b
 aggregates `screenshots[]` and `e2e_gallery[]` across all entries before the
 `cbp-frontend-ui` review. The aggregated `e2e_gallery[]` is persisted separately to
 `round.context.e2e_gallery` for consumption by TASK-3 / checkpoint-end.
@@ -294,7 +294,7 @@ Step 4): pass `round_number: 0`, `whole_checkpoint_mode: true`, and the aggregat
 `files_changed` union. The agent ignores `prior_round_files_changed` in this mode.
 This contract is the single source of truth for dispatch logic. Config-driven dispatch is
-implemented in `/cbp-round-execute` Step 5 and `/cbp-checkpoint-check` Step 5b (CHK-145); the
+implemented in `/cbp-round-build` Step 5 and `/cbp-checkpoint-check` Step 5b (CHK-145); the
 routing table above is the authoritative reference those gates match. Enforcement (the
 `e2e_eligible_skipped` hard-fail and the no-in-spec-env-skip gate) lives in
 `rules/e2e-mandatory.md`.
@@ -353,4 +353,4 @@ a loop, snapshot text/href BEFORE navigation rather than holding stale `Locator`
 |---|---|
 | No baseline (new screen, `is_new: true`) | Playwright creates on first run; auto-committed; `git add` runs; `e2e_gallery[].is_new: true`; `cbp-frontend-ui` Step 5b reviews semantically. No user gate. |
 | Baseline exists, diff ≤ threshold | Test passes; `is_new: false`; `baseline_diff_pct` recorded. |
-| Baseline exists, diff > threshold | `visual_regression` failure; `is_new: false`. Agent does NOT retry. `cbp-frontend-ui` Step 5b flags it; `/cbp-round-end` Step 3b constructs user QA item. User decides: fix-task or `--update-snapshots`. |
+| Baseline exists, diff > threshold | `visual_regression` failure; `is_new: false`. Agent does NOT retry. `cbp-frontend-ui` Step 5b flags it; `/cbp-verify` (round scope) constructs user QA item. User decides: fix-task or `--update-snapshots`. |
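
Reviewer note: the three baseline rows in the table above reduce to a small classification. The sketch below is illustrative only. `visual_regression` is the label used in the diff; the `new_baseline` and `pass` labels, the function names, and the `thresholdPct` parameter are hypothetical, and the diff does not state the threshold's value or unit (a percentage is assumed to match `baseline_diff_pct`).

```typescript
type BaselineOutcome = "new_baseline" | "pass" | "visual_regression";

// isNew and baselineDiffPct mirror the e2e_gallery[] fields is_new and baseline_diff_pct.
// thresholdPct is an ASSUMPTION: the diff names a threshold but not its value.
function classifyBaseline(
  isNew: boolean,
  baselineDiffPct: number,
  thresholdPct: number,
): BaselineOutcome {
  // New screen: baseline is created and committed on first run, no user gate.
  if (isNew) return "new_baseline";
  // Existing baseline: only a diff strictly above the threshold is a regression.
  return baselineDiffPct > thresholdPct ? "visual_regression" : "pass";
}

// Only a regression on an existing baseline requires the accept-or-fix decision.
function needsUserGate(outcome: BaselineOutcome): boolean {
  return outcome === "visual_regression";
}
```

A diff exactly at the threshold classifies as `pass`, matching the table's "diff ≤ threshold" row.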