
codebyplan 1.13.53 → 1.13.55

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (84)

package/templates/skills/cbp-task-create/SKILL.md CHANGED

@@ -10,13 +10,12 @@ Create a new task within the active checkpoint. Gathers user context, analyzes e
 ## When Used
-- Suggested by `/cbp-task-check` when scope issues require a new task
-- Suggested by `/cbp-task-testing` when major problems need a separate task
+- Suggested by `/cbp-verify` (task scope) when scope issues or major problems require a separate task
 - User manually wants to add a task to the current checkpoint
 ## Identifier Notation
-This skill operates on the **active** checkpoint resolved via MCP `get_current_task` and does not accept a positional identifier argument. The task it creates gets its `number` from the next-available slot within the active checkpoint (checkpoint-bound). Canonical chk-task-round notation — used in prose, error messages, and cross-references — follows `cbp-round-start` Step 0 "CHK / TASK / ROUND Identifier Notation Vocabulary": `108-1` (CHK-108 TASK-1), `108-1-2` (round 2 of CHK-108 TASK-1).
+This skill operates on the **active** checkpoint resolved via MCP `get_current_task` and does not accept a positional identifier argument. The task it creates gets its `number` from the next-available slot within the active checkpoint (checkpoint-bound). Canonical chk-task-round notation — used in prose, error messages, and cross-references — follows `cbp-round-plan` Step 0 "CHK / TASK / ROUND Identifier Notation Vocabulary": `108-1` (CHK-108 TASK-1), `108-1-2` (round 2 of CHK-108 TASK-1).
 **Bare-number argument**: if a bare number (e.g. `42`) is provided with no checkpoint context, this skill cannot resolve it to a checkpoint-bound task:
@@ -44,8 +43,8 @@ Use AskUserQuestion to understand the new task:
 Why is this task needed? What should it accomplish?
-If this was triggered by `/cbp-task-check` or `/cbp-task-testing`, the findings are:
-[pre-loaded context from check/testing findings if available]
+If this was triggered by `/cbp-verify` (task scope), the findings are:
+[pre-loaded context from verify findings if available]
 Please describe:
 1. What the task should accomplish
@@ -70,7 +69,7 @@ Discovered issues MUST be captured. The default target is current scope (round
 | Situation | Action |
 |-----------|--------|
-| Trivial inline fix (≤5 min, mechanical, scope-clean) | Apply in the CURRENT round per `cbp-round-end` reference `findings-presentation.md` "Trivial-Resolution Exception" |
+| Trivial inline fix (≤5 min, mechanical, scope-clean) | Apply in the CURRENT round per `cbp-verify` reference `findings-presentation.md` "Trivial-Resolution Exception" |
 | Related to the current task's domain | Create a new ROUND in the current task |
 | Fits the current checkpoint goal but is meaningfully separate | Create a new TASK in the current checkpoint via `create_task(checkpoint_id)` |
 | Large enough to need multiple tasks AND fits no current checkpoint | Create a NEW CHECKPOINT via `create_checkpoint` |
@@ -193,5 +192,5 @@ Waiting for user to decide next step.
 - **Reads**: Local state `.codebyplan/state/checkpoints/<id>.json` + `.../tasks/<id>.json`; on miss `npx codebyplan sync` once; MCP `get_current_task` / `get_tasks` as documented break-glass when the state dir is absent and sync fails. Step 3.5 dedup `get_tasks(standalone=true)` stays MCP — no local-state equivalent for standalone listing.
 - **Writes**: `codebyplan task create --checkpoint-id <id> ...` (CLI write-through); MCP `create_task` break-glass.
-- **Triggered by**: `/cbp-task-check` (suggested), `/cbp-task-testing` (suggested), user manual
+- **Triggered by**: `/cbp-verify` (task scope, suggested), user manual
 - **Does NOT auto-trigger** next command — user decides

package/templates/skills/cbp-task-start/SKILL.md CHANGED

@@ -1,7 +1,7 @@
 ---
 name: cbp-task-start
 description: Start a task, load context from DB
-triggers: [cbp-round-start]
+triggers: [cbp-round-plan]
 argument-hint: [chk-task]  # e.g. `108-1` (CHK-108 TASK-1)
 effort: xhigh
 ---
@@ -14,7 +14,7 @@ Start a task by loading context from the database and preparing for work.
 ### Step 1: Parse `$ARGUMENTS`
-Parse the argument using the canonical chk-task-round notation (see `cbp-round-start` Step 0 "CHK / TASK / ROUND Identifier Notation Vocabulary"):
+Parse the argument using the canonical chk-task-round notation (see `cbp-round-plan` Step 0 "CHK / TASK / ROUND Identifier Notation Vocabulary"):
 | Shape | Regex | Resolves to |
 |-------|-------|-------------|
@@ -30,7 +30,7 @@ task-start: invalid argument `{value}`. Expected:
   (empty) → next pending task
 For standalone tasks, use `/cbp-standalone-task-start {N}`.
-For a specific round, use `/cbp-round-start 108-1-2`.
+For a specific round, use `/cbp-round-plan 108-1-2`.
 ```
 Error cases: `108-1-2` (that is round-start's shape), `abc`, `108-`, `-1`, `108--1`, anything with whitespace or non-numeric characters.
@@ -40,7 +40,7 @@ Error cases: `108-1-2` (that is round-start's shape), `abc`, `108-`, `-1`, `108-
 - `task-start 108-1` → CHK-108 TASK-1
 - `task-start` (no arg) → next pending via `get_current_task`
 - `task-start 45` → error: "Use /cbp-standalone-task-start 45 instead — bare numbers no longer route to standalone tasks."
-- `task-start 108-1-2` → error: "use `/cbp-round-start 108-1-2`"
+- `task-start 108-1-2` → error: "use `/cbp-round-plan 108-1-2`"
 - `task-start abc` → error: malformed
 - `task-start 108-` → error: malformed
@@ -75,7 +75,7 @@ Ask via AskUserQuestion, naming the resolved task and disclosing the actions:
 > - **Cancel** — do nothing
 - **Proceed** → continue to Step 3.
-- **Cancel** → abort cleanly: make NO writes (no branch switch, no commit, no `update_task`, no `/cbp-round-start` trigger) and exit with one line: `Cancelled by user — TASK-{N} not started.`
+- **Cancel** → abort cleanly: make NO writes (no branch switch, no commit, no `update_task`, no `/cbp-round-plan` trigger) and exit with one line: `Cancelled by user — TASK-{N} not started.`
 ### Step 3: Branch Auto-Handling
@@ -221,17 +221,17 @@ Display context summary:
 ### Step 6: Auto-trigger Round Start
-The Step 2.5 permission gate already covered this hand-off (the user approved running the skill, round-1 auto-start included), so no further prompt is needed here — automatically trigger `/cbp-round-start` for the first round.
+The Step 2.5 permission gate already covered this hand-off (the user approved running the skill, round-1 auto-start included), so no further prompt is needed here — automatically trigger `/cbp-round-plan` for the first round.
 ```
 Starting first round...
 ```
-Trigger `/cbp-round-start` with **no argument**. Do NOT pass the task identifier (`{chk}-{task}`) — round-start's 2-segment form is interpreted as standalone TASK-`{chk}` round `{task}`, not CHK-`{chk}` TASK-`{task}`. Passing no argument causes round-start to derive the active task/round from state, which is the correct path here.
+Trigger `/cbp-round-plan` with **no argument**. Do NOT pass the task identifier (`{chk}-{task}`) — round-start's 2-segment form is interpreted as standalone TASK-`{chk}` round `{task}`, not CHK-`{chk}` TASK-`{task}`. Passing no argument causes round-start to derive the active task/round from state, which is the correct path here.
 ## Integration
 - **Gates**: Step 2.5 permission gate — asks the user to confirm before any side effect; **Cancel** aborts cleanly with no writes. Fires on every invocation (manual, auto-trigger, auto-loop).
 - **Reads**: `.codebyplan/state/checkpoints/*.json`, `checkpoints/<id>/tasks/*.json`, `checkpoints/<id>/tasks/<id>/rounds/*.json`, `todos.json` (local-first; `npx codebyplan sync` on miss; MCP `get_current_task`/`get_tasks`/`get_rounds` break-glass)
 - **Writes**: `codebyplan task update` (CLI write-through; MCP `update_task` break-glass)
-- **Triggers**: `/cbp-round-start` (auto, round 1, no argument)
+- **Triggers**: `/cbp-round-plan` (auto, round 1, no argument)
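
The Step 1 argument rules for `task-start` can be sketched as a small parser. This is a minimal illustration, not code from the package: the diff elides the skill's own regex table, so the three regexes, the `TaskStartArg` type, and the function name `parseTaskStartArg` are all assumptions. Only the accepted and rejected shapes and the two redirect messages are taken from the text above.

```typescript
// Hypothetical result shape for illustration; not taken from the package.
type TaskStartArg =
  | { kind: "next-pending" }
  | { kind: "checkpoint-task"; checkpoint: number; task: number }
  | { kind: "error"; message: string };

// Assumed regexes: the skill's real regex table is not visible in this diff.
const CHK_TASK = /^(\d+)-(\d+)$/;
const CHK_TASK_ROUND = /^\d+-\d+-\d+$/;
const BARE_NUMBER = /^\d+$/;

function parseTaskStartArg(raw: string): TaskStartArg {
  // No argument: resolve the next pending task from state.
  if (raw === "") return { kind: "next-pending" };

  // `108-1` resolves to CHK-108 TASK-1.
  const m = CHK_TASK.exec(raw);
  if (m !== null) {
    return { kind: "checkpoint-task", checkpoint: Number(m[1]), task: Number(m[2]) };
  }

  // Bare numbers no longer route to standalone tasks.
  if (BARE_NUMBER.test(raw)) {
    return { kind: "error", message: `Use /cbp-standalone-task-start ${raw} instead` };
  }

  // Three segments is the round skill's shape, not task-start's.
  if (CHK_TASK_ROUND.test(raw)) {
    return { kind: "error", message: `use /cbp-round-plan ${raw}` };
  }

  // Everything else (`abc`, `108-`, `-1`, `108--1`, whitespace) is malformed.
  return { kind: "error", message: `task-start: invalid argument \`${raw}\`` };
}
```

The input is deliberately not trimmed, since the skill lists anything containing whitespace as an error case.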

package/templates/skills/cbp-todo/SKILL.md CHANGED

@@ -131,19 +131,17 @@ Once the gates pass, load the context the head command needs. This ensures `/cle
 | `/cbp-checkpoint-plan` | Load checkpoint from `.codebyplan/state/checkpoints/<id>.json` + task files under `checkpoints/<id>/tasks/` (fallback MCP `get_checkpoints` + `get_tasks`). Display checkpoint title, goal, ideas, existing task count |
 | `/cbp-checkpoint-start` | Load checkpoint + task files from local state (fallback MCP `get_checkpoints` + `get_tasks`). Display checkpoint title, status, claim state, first pending task |
 | `/cbp-task-start [N]` | Load from `.codebyplan/state/session/current.json` (fallback MCP `get_current_task`). Display checkpoint title + task title/requirements summary |
-| `/cbp-round-start` | Load from local state session + round files (fallback MCP `get_current_task` + `get_rounds`). Display checkpoint + task + round count + last round summary |
-| `/cbp-round-update` | Load from local state session + round files (fallback MCP `get_current_task` + `get_rounds`). Display checkpoint + task + files_changed triage summary (claude_approved, findings, hard_fail) |
-| `/cbp-round-input` | **Full context load** (see Step 2b) |
-| `/cbp-task-check` | Load from local state session (fallback MCP `get_current_task`). Display checkpoint + task + files summary |
-| `/cbp-task-testing` | Load from local state session + round files (fallback MCP `get_current_task` + `get_rounds`). Display checkpoint + task + testing status summary |
+| `/cbp-round-plan` | Load from local state session + round files (fallback MCP `get_current_task` + `get_rounds`). Display checkpoint + task + round count + last round summary |
+| `/cbp-round-plan` | **Full context load** (see Step 2b) |
+| `/cbp-verify` | Load from local state session + round files (fallback MCP `get_current_task` + `get_rounds`). Display checkpoint + task + files_changed triage summary (claude_approved, findings, hard_fail) |
 | `/cbp-task-create` | Load from local state session (fallback MCP `get_current_task`). Display checkpoint + task list summary |
-| `/cbp-task-complete` | Load from local state session (fallback MCP `get_current_task`). Display checkpoint + task summary |
+| `/cbp-finalize` | Load from local state session (fallback MCP `get_current_task`). Display checkpoint + task summary |
 | `/cbp-checkpoint-complete` | Load from local state session (fallback MCP `get_current_task`). Display checkpoint summary |
 | *(no command / idle)* | See Step 3 — suggest `/cbp-session-end` |
 **For any unrecognized command:** Load from local state session (fallback MCP `get_current_task`) as a safe default. Display whatever context is available.
-### Step 2b: Full Context Load (for `/cbp-round-input`)
+### Step 2b: Full Context Load (for `/cbp-round-plan`)
 This is the most context-dependent command. Load everything:
@@ -190,7 +188,7 @@ Reached only when the Step 1.5 ownership gate allowed routing to continue, the S
 ## Integration
-- **Called by**: `/cbp-session-start`, `/cbp-task-complete`, `/cbp-checkpoint-complete`, manual, after `/clear`
+- **Called by**: `/cbp-session-start`, `/cbp-finalize`, `/cbp-checkpoint-complete`, manual, after `/clear`
 - **Resolves**: `npx codebyplan resolve-worktree --json` (worktree id + distress signal), `npx codebyplan whoami --json` (user id)
 - **Reads**: `.codebyplan/state/todos.json`, `session/current.json`, `checkpoints/<id>.json`, `checkpoints/<id>/tasks/<id>.json`, `checkpoints/<id>/tasks/<id>/rounds/<id>.json`, `worktrees.json`. If missing/stale, run `npx codebyplan sync` once and re-read. Break-glass: MCP `get_todos`, `get_current_task`, `get_rounds`, `get_checkpoints`, `get_tasks` when state dir absent and sync fails. `get_worktrees` stays MCP (display-only ownership-block path; no CLI verb).
 - **Triggers**: `rows[0].command` (auto, after the Step 1.5 ownership gate and Step 1.55 stale-entity guard pass, and the Step 1.6 planning gate falls through); Step 1.55 overrides to STOP (stale completed/cancelled entity); Step 1.6 overrides to `/cbp-checkpoint-plan` (unplanned) or `/cbp-checkpoint-start` (planned-but-pending)
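
The read order that recurs in these Integration sections (local state first, one `npx codebyplan sync` on a miss, MCP only as break-glass) can be sketched as below. This is an illustration under stated assumptions: `StateSource` and `readLocalFirst` are hypothetical names, and the four callbacks stand in for the real file reads, the sync command, and the MCP `get_*` calls, none of which are implemented here.

```typescript
// Hypothetical interface: each callback stands in for a real operation.
interface StateSource<T> {
  readLocal: () => T | undefined;  // undefined means a miss in .codebyplan/state
  stateDirPresent: () => boolean;  // whether the state directory exists
  sync: () => boolean;             // true when the sync succeeded
  breakGlass: () => T | undefined; // stands in for an MCP get_* call
}

function readLocalFirst<T>(src: StateSource<T>): T | undefined {
  // 1. Local state is always tried first.
  const first = src.readLocal();
  if (first !== undefined) return first;

  // 2. On a miss, sync exactly once and re-read.
  const synced = src.sync();
  const second = src.readLocal();
  if (second !== undefined) return second;

  // 3. Break-glass only when the state dir is absent AND sync failed.
  if (!synced && !src.stateDirPresent()) return src.breakGlass();

  // Otherwise the entity is treated as not found.
  return undefined;
}
```

Passing the operations in as callbacks keeps the ordering rule testable without touching the file system or a live MCP server.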

package/templates/skills/cbp-verify/SKILL.md ADDED

@@ -0,0 +1,146 @@
+---
+name: cbp-verify
+description: Unified verify stage — deterministic gates, real-execution proof, and a fresh-context diff review at round or task scope. Auto-triggered by cbp-round-build; escalates to task scope on the last clean round.
+argument-hint: [chk-task[-round] | task[-round]]
+triggers: [cbp-round-plan, cbp-round-complete, cbp-finalize]
+effort: xhigh
+---
+# Verify Command
+The single verify stage for the execution half. Collapses automated checks, finished-round
+triage, AI production review, and comprehensive task-level testing into one scope-aware skill.
+The deterministic spine lives in the CLI (`codebyplan check`, `codebyplan e2e verify-round`);
+this skill orchestrates the gates, proves execution, spawns ONE fresh-context reviewer, and
+routes on a single directive.
+Auto-triggered by `/cbp-round-build` after execution. The human gate is NOT here — it is the
+separate `ask`-tier `cbp-round-complete` (round) and the one batched walkthrough in Phase 6
+(task). This skill is model-invocable on purpose.
+## Scope & Kind
+- **SCOPE** (`round` | `task`) — auto-detected: a 3-segment `{chk}-{task}-{round}` (or 2-segment
+  `{task}-{round}` standalone) argument, or an auto-trigger from `cbp-round-build`, is
+  `scope=round`. A 2-segment `{chk}-{task}` (or bare `{task}` standalone) argument, or the Phase 5
+  escalation, is `scope=task`.
+- **KIND** (`checkpoint` | `standalone`) — detected ONCE at the top from identifier shape
+  (3-segment / 2-segment-chk = checkpoint; 2-segment / bare = standalone). KIND selects MCP tool
+  names per the table in `reference/deterministic-gates.md`.
+All reads are local-state-first (`.codebyplan/state/**`); on miss run `npx codebyplan sync` once
+and re-read; MCP `get_*` is the break-glass fallback. All writes go through `codebyplan ... update`
+(CLI write-through), MCP break-glass.
+## HARD GATES — non-negotiable
+- The deterministic-gate JSON **is** the verdict — never narrate "I verified the build". (Phase 2)
+- Empty execution proof on a UI-touching diff = GATE FAILURE. (Phase 3, `rules/execution-proof.md`)
+- Reviewer spawn failure = HARD GATE FAILURE → STOP + retry directive; NEVER self-review inline.
+  (Phase 4, `rules/spawn-failure-is-gate-failure.md`)
+- `gate6` is always hard (never baselined); baseline regressions are a user-accept gate, never
+  auto-accepted. (`rules/two-tier-ci.md`)
+## Phase Skeleton
+### PHASE 1 — RESOLVE
+Parse `$ARGUMENTS` (notation per `cbp-round-plan` identifier vocabulary). Detect SCOPE and KIND
+(above). Resolve the active round/task from local state. If `scope=round` and no in-progress
+round → `No active round. Run /cbp-round-plan first.` and STOP. If `scope=task` and any round is
+still `in_progress` → STOP with "complete the active round first". Full resolution + KIND tool
+table: `reference/deterministic-gates.md`.
+### PHASE 2 — DETERMINISTIC GATES
+Run the unified matrix and capture the JSON:
+```bash
+codebyplan check --scope <round|task> --json
+```
+The JSON `{ results[], any_failed, hard_fail_checks[], no_baseline }` IS the verdict — record
+each result's `check`, `status`, `exit_code`, `new_failures[]`. `gate6` is ALWAYS hard;
+`lint`/`typecheck`/`tests`/`audit` fail only on NEW per-package failures vs the committed
+`.check-baseline.json` (baseline-tolerant soft tier, `rules/two-tier-ci.md`). `any_failed === true`
+(equivalently `hard_fail_checks.length > 0`) → carry into the Phase 5 verdict as a fail. Exact
+contract + the `claude_only` carve-out (deterministic-only path, no agent): see
+`reference/deterministic-gates.md`.
+### PHASE 3 — REAL EXECUTION PROOF
+Produce the committed proof for every tier the diff touches (`rules/execution-proof.md`):
+- **Tier 1** (configured e2e framework whose `app` source changed) — persist `e2e_eligible` /
+  `e2e_outputs` to round context, then:
+  ```bash
+  codebyplan e2e verify-round --round-id <round_id> --task-id <task_id>
+  ```
+  Exit 0 = pass; exit 1 → surface `result.failed_checks[]` (`e2e_eligible_skipped` /
+  `zero_assertion_run` / `empty_gallery`) verbatim and carry as a fail.
+- **Tier 2/3/4** — dev-server screenshot / HTTP trace / command log per the rule.
+**Empty proof on a UI diff = GATE FAILURE.** Verify each screenshot/trace is committed with
+`git ls-files --error-unmatch <path>`. Write the `verify_manifest` (gates + proof, schema in
+`rules/execution-proof.md`). Per-scope detail: `reference/round-scope.md`, `reference/task-scope.md`.
+### PHASE 4 — FRESH-CONTEXT DIFF REVIEW
+Spawn `cbp-verify-reviewer` with `scope` (round → round diff; task → full task diff) and the
+Input Contract from `agents/cbp-verify-reviewer.md`. **SPAWN FAILURE = HARD GATE FAILURE** → STOP
+and surface the retry directive (`rules/spawn-failure-is-gate-failure.md`); record
+`<scope>.context.verify.spawn_failure`; do NOT walk the reviewer's phases inline. A returned
+`NOT_READY` is a successful review — act on it, do not retry.
+Triage the returned findings: in-scope mechanical fixes the orchestrator applies itself
+(`Edit`/`Write`); blocking out-of-scope findings → `/cbp-round-plan` fix round. A baseline
+regression is a **blocking user-accept gate** — never auto-accepted.
+### PHASE 5 — VERDICT + ROUTE (single directive, never an A/B/C menu)
+Combine Phase 2 + 3 + 4. Route on one directive (`feedback-close-out-routing.md`):
+| Result | Route |
+|--------|-------|
+| Any gate/proof/review fail | `Next: /cbp-round-plan` (open a fix round) |
+| Pass + more work wanted | `Next: /cbp-round-plan` (another round) |
+| Pass + LAST round + clean (scope=round) | escalate to `scope=task` → re-enter at Phase 1 |
+| Pass (scope=task) | proceed to Phase 6 finalize |
+### PHASE 6 — FINALIZE
+- **scope=round** — route to the human git-add gate: `Next: /cbp-round-complete`
+  (`ask`-tier; reconciles `sync-approvals` + `complete_round`). cbp-verify does NOT stage files
+  or complete the round. Detail: `reference/round-scope.md`.
+- **scope=task** — whole-repo `codebyplan check --scope task`, holistic `cbp-verify-reviewer`
+  (scope=task) already run in Phase 4, then the ONE genuine human step: a single batched
+  `AskUserQuestion` walkthrough (all user-testable items in one prompt, never one-per-question).
+  On satisfaction, write `task.context.verify_verdict = { verdict: 'READY', manifest, decided_at }`
+  and route `Next: /cbp-finalize`. Detail: `reference/task-scope.md`.
+## Key Rules
+- The JSON verdict from `codebyplan check` / `e2e verify-round` is authoritative — no prose
+  substitution.
+- Reviewer spawn failure STOPS the skill (retry directive); never self-certify inline.
+- Empty proof on a UI diff fails verify; screenshots must be committed.
+- Claude NEVER `git add`s — staging is the user's approval signal at `cbp-round-complete`.
+- Single-directive routing only — never an A/B/C menu.
+- `claude_only` profile is the deterministic-only carve-out (no reviewer spawn expected).
+## Integration
+- **Triggered by**: `/cbp-round-build` (auto, scope=round after execution); self-escalates to
+  scope=task on the last clean round.
+- **Reads**: `.codebyplan/state/**` (local-first; `npx codebyplan sync` on miss; MCP `get_*`
+  break-glass); changed files + git diff via the reviewer.
+- **Writes**: `codebyplan round update` / `codebyplan task update` (CLI write-through; MCP
+  `update_round` / `update_task` break-glass) — `verify_manifest`, `verify_verdict`.
+- **Spawns**: `cbp-verify-reviewer` (scope param); the `cbp-e2e-*` specialists feed Tier-1 proof
+  upstream in `cbp-round-build`.
+- **Triggers**: `/cbp-round-plan` (any fail or more-work), `/cbp-round-complete` (scope=round
+  finalize), `/cbp-finalize` (scope=task READY).
+- **References**: `reference/round-scope.md`, `reference/task-scope.md`,
+  `reference/deterministic-gates.md`.
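
The Phase 5 routing table in the added skill can be read as a first-match decision function. The sketch below is one possible reading, not the package's implementation: `VerifyOutcome`, `Route`, and `routePhase5` are hypothetical names, and the inputs are assumed to be already-computed booleans. The table does not name a route for a passing round that is neither the last clean round nor flagged for more work, so the sketch reports that case as `unspecified` rather than guessing.

```typescript
type Scope = "round" | "task";

// Hypothetical summary of Phases 2-4; field names are illustrative.
interface VerifyOutcome {
  scope: Scope;
  anyFail: boolean;        // any gate, proof, or review failure
  moreWorkWanted: boolean;
  lastRoundClean: boolean; // only meaningful when scope is "round"
}

type Route =
  | { kind: "directive"; next: "/cbp-round-plan"; reason: "fix round" | "another round" }
  | { kind: "escalate"; scope: "task" } // re-enter at Phase 1 with scope=task
  | { kind: "phase6" }                  // proceed to Phase 6 finalize
  | { kind: "unspecified" };            // combination not covered by the table

function routePhase5(o: VerifyOutcome): Route {
  // Rows are checked top to bottom; the first match wins.
  if (o.anyFail) {
    return { kind: "directive", next: "/cbp-round-plan", reason: "fix round" };
  }
  if (o.moreWorkWanted) {
    return { kind: "directive", next: "/cbp-round-plan", reason: "another round" };
  }
  if (o.scope === "round" && o.lastRoundClean) {
    return { kind: "escalate", scope: "task" };
  }
  if (o.scope === "task") {
    return { kind: "phase6" };
  }
  return { kind: "unspecified" };
}
```

Returning a single tagged value mirrors the skill's "single directive, never an A/B/C menu" rule: each outcome maps to exactly one route.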

package/templates/skills/cbp-verify/reference/deterministic-gates.md ADDED

@@ -0,0 +1,114 @@
+# Deterministic Gates — Command Contracts & Manifest
+Authoritative gate-command + manifest detail for `cbp-verify`. The SKILL.md phases point here;
+this file is loaded on demand.
+## KIND tool table
+KIND is detected once at SKILL Phase 1 from the identifier shape. MCP tool names differ by KIND;
+all writes prefer the CLI write-through and fall back to MCP.
+| Operation | `checkpoint` KIND | `standalone` KIND |
+|-----------|------------------|-------------------|
+| Get task | local state (break-glass `get_current_task`) | `get_current_standalone_task(repo_id)` |
+| Get rounds | local state (break-glass `get_rounds`) | `get_standalone_rounds(standalone_task_id)` |
+| Update round | `codebyplan round update` (MCP `update_round`) | MCP `update_standalone_round` |
+| Update task | `codebyplan task update` (MCP `update_task`) | MCP `update_standalone_task` |
+Empty-arg KIND detection: probe `get_current_standalone_task` first; if found → `standalone`;
+else `checkpoint` via `get_current_task`. (KIND detection is MCP-unavoidable — no identifier yet
+means no local path to probe; everything after is local-first.)
+## Phase 1 resolution detail
+| Parse | Resolution |
+|-------|-----------|
+| `{chk}-{task}-{round}` | checkpoint round. Read `.codebyplan/state/checkpoints/*.json` → filter `number==={chk}`; `.../tasks/*.json` → `{task}`; `.../rounds/*.json` → `{round}`. |
+| `{chk}-{task}` | checkpoint task (scope=task). Resolve checkpoint + task; verify all rounds `completed`. |
+| `{task}-{round}` | standalone round (scope=round). |
+| `{task}` (bare) | standalone task (scope=task). |
+| _(empty)_ | the active in-progress task/round from `.codebyplan/state/todos.json`. |
+On any miss: `npx codebyplan sync` once, re-read; MCP `get_*` break-glass only when the state dir
+is absent AND sync fails.
+## Phase 2 — `codebyplan check`
+```bash
+codebyplan check --scope <round|task> --json
+```
+JSON shape (`RunCheckResult`, source `packages/codebyplan-package/src/lib/check.ts:185`):
+```jsonc
+{
+  "results": [
+    { "check": "gate6|lint|typecheck|tests|audit",
+      "status": "pass|fail|skipped",
+      "exit_code": 0,
+      "command": "...",
+      "stdout": "...", "stderr": "...",
+      "executed": true,
+      "new_failures": ["@scope/pkg", "GHSA-xxxx"] }  // omitted for gate6
+  ],
+  "any_failed": false,
+  "hard_fail_checks": [],          // names of checks that failed post-baseline-diff
+  "no_baseline": false
+}
+```
+- **`gate6`** (sibling-identity parity) is ALWAYS hard — never baselined, no `new_failures` field.
+- `lint` / `typecheck` / `tests` / `audit` are **baseline-diffed**: `status: 'pass'` when
+  `new_failures` is `[]` even if the underlying command exited non-zero (pre-existing red is
+  tolerated). `audit.new_failures` lists new GHSA ids not in the allowlist.
+- Verdict: `any_failed === true` (≡ `hard_fail_checks.length > 0`) is a fail — surface each failing
+  result's `new_failures` / `stdout` / `stderr`. **This JSON is the verdict; never substitute prose.**
+- Soft tier uses NO `--no-baseline`. The whole-repo absolute-green tier
+  (`--scope merged --no-baseline`) belongs to checkpoint close, not this skill
+  (`rules/two-tier-ci.md`).
+## Phase 3 — `codebyplan e2e verify-round`
+```bash
+codebyplan e2e verify-round --round-id <uuid> --task-id <uuid>
+```
+Persist `round.context.e2e_eligible[]` + `e2e_outputs{}` FIRST (the CLI reads the round row from
+the DB). Verdict JSON (`VerifyRoundResult`, source `packages/codebyplan-package/src/lib/e2e.ts:127`):
+```jsonc
+{ "round_id": "...", "task_id": "...",
+  "result": { "pass": true, "failed_checks": [], "skipped_validly": [] } }
+```
+Exit 0 = pass. Exit 1 → one or more of `e2e_eligible_skipped` / `zero_assertion_run` /
+`empty_gallery` in `result.failed_checks[]` — surface verbatim, carry as a fail, route to a fix
+round (`rules/e2e-mandatory.md`). When `e2e_eligible[]` is empty, skip the call — nothing to verify.
+## `claude_only` carve-out (deterministic-only path)
+When the resolved profile is `claude_only` (round touched only `.claude/**` / docs / config — no
+app surface), there is **no reviewer to spawn by design**. Proof IS the deterministic set:
+1. `codebyplan check --scope <round|task> --json` (gate6 + matrix as above).
+2. `bash -n <hook>` for each touched `.sh` file.
+3. SKILL/agent/rule structure sanity for touched `.claude/` files (line counts, no `/cbp-*`
+   legacy notation).
+This is a first-class verification path, NOT a banned inline fallback
+(`rules/spawn-failure-is-gate-failure.md` carve-out) — Phase 4's reviewer spawn is skipped, and
+that skip is recorded as `verify_manifest.proof.tier: 4`, not a spawn failure.
+## verify-manifest write
+Write the manifest into round/task context (merge into existing context — the `update_*`
+REPLACE contract requires re-sending the full object):
+```bash
+codebyplan round update --id <round_id> --task-id <uuid> --checkpoint-id <uuid> --context '<json>'
+# break-glass: MCP update_round / update_standalone_round
+```
+Schema (canonical in `rules/execution-proof.md`): `verify_manifest = { scope, gates[], proof{ tier,
+artifacts[], e2e_verify_round }, decided_at }`. Each `proof.artifacts[].path` is proven committed
+via `git ls-files --error-unmatch <path>` before it counts.
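
To make the Phase 2 contract concrete, the sketch below reads the `codebyplan check --json` output and derives a verdict from it. The interfaces are transcribed from the JSON shape shown in the added file; the real `RunCheckResult` type in `check.ts` is not visible in this diff and may differ. `GateVerdict` and `readGateVerdict` are hypothetical names introduced for illustration.

```typescript
// Transcribed from the documented JSON shape; the real types may differ.
interface CheckResult {
  check: "gate6" | "lint" | "typecheck" | "tests" | "audit";
  status: "pass" | "fail" | "skipped";
  exit_code: number;
  command: string;
  stdout: string;
  stderr: string;
  executed: boolean;
  new_failures?: string[]; // omitted for gate6
}

interface RunCheckResult {
  results: CheckResult[];
  any_failed: boolean;
  hard_fail_checks: string[];
  no_baseline: boolean;
}

// Hypothetical summary type for illustration.
interface GateVerdict {
  pass: boolean;
  failing: { check: string; new_failures: string[] }[];
}

function readGateVerdict(json: string): GateVerdict {
  const parsed = JSON.parse(json) as RunCheckResult;
  // The verdict comes from `status`, not `exit_code`: a baseline-diffed check
  // can exit non-zero and still pass when `new_failures` is empty.
  const failing = parsed.results
    .filter((r) => r.status === "fail")
    .map((r) => ({ check: r.check, new_failures: r.new_failures ?? [] }));
  return { pass: !parsed.any_failed, failing };
}
```

Keying on `status` and `any_failed` rather than raw exit codes reflects the baseline-tolerant rule stated above, where pre-existing red is tolerated for `lint`, `typecheck`, `tests`, and `audit`.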

package/templates/skills/{cbp-round-end → cbp-verify}/reference/findings-presentation.md RENAMED

@@ -1,6 +1,8 @@
-# Findings Presentation in `/cbp-round-end` Step 7
+# Findings Presentation & Infra Issue Absorption
-When `improve-round` returns findings, Step 7 presents them grouped by severity, then **auto-applies in-scope findings inline** (manual mode) or defers them to the next loop round (auto-loop mode). There is no findings-decision prompt.
+When `cbp-verify-reviewer` returns findings, `cbp-verify` Phase 4 presents them grouped by
+severity, then **auto-applies in-scope findings inline** (manual mode) or defers them to the next
+loop round (auto-loop mode). There is no findings-decision prompt.
 ## Example output
@@ -24,14 +26,16 @@ When `improve-round` returns findings, Step 7 presents them grouped by severity,
 ## Auto-apply model (manual mode)
-Step 7 auto-applies all **in-scope** findings inline — no user prompt. A finding is *in-scope* when every file it references is within the round's `files_changed[]`; it is *out-of-scope* otherwise.
+`cbp-verify` Phase 4 auto-applies all **in-scope** findings inline — no user prompt. A finding is
+*in-scope* when every file it references is within the round's `files_changed[]`; it is
+*out-of-scope* otherwise.
-- **In-scope** → the round-end orchestrator (main context, has Edit/Write) applies the fix directly via `Edit` / `Write`, re-runs the verification commands (hook syntax check + `cbp-testing-qa-agent` scoped to modified files), and records it in `round.context.inline_fix_log = { findings: [ids], rationale, fixes: [...], applied_at: <ISO> }`. The `cbp-improve-round` agent stays read-only/advisory and never writes.
-- **Out-of-scope** → saved to `round.context.improve_round_findings[]`; Step 8 routes them to `/cbp-round-input` (next round) or a new task per the Infra Issue Absorption Contract below.
+- **In-scope** → the verify orchestrator (main context, has Edit/Write) applies the fix directly via `Edit` / `Write`, re-runs the verification commands (hook syntax check + `cbp-testing-qa-agent` scoped to modified files), and records it in `round.context.inline_fix_log = { findings: [ids], rationale, fixes: [...], applied_at: <ISO> }`. The `cbp-verify-reviewer` agent stays read-only/advisory and never writes.
+- **Out-of-scope** → saved to `round.context.verify_findings[]`; Phase 5 routes them to `/cbp-round-plan` (next round) or a new task per the Infra Issue Absorption Contract below.
-The only user decision in Step 7 is the **baseline-regression accept** gate (baselines are NEVER auto-accepted). Under `auto_loop_mode`, Step 7 does not auto-apply — all findings are accepted into `improve_round_findings[]` and deferred to the next loop round.
+The only user decision in Phase 4 is the **baseline-regression accept** gate (baselines are NEVER auto-accepted). Under `auto_loop_mode`, Phase 4 does not auto-apply — all findings are accepted into `verify_findings[]` and deferred to the next loop round.
-The **Trivial-Resolution Exception** below still governs the deeper bypass cases (skipping executor / testing-qa / improve-round for ≤5-line non-logic corrective rounds); it is referenced by `/cbp-round-execute` and `/cbp-task-testing` for infra-issue absorption.
+The **Trivial-Resolution Exception** below still governs the deeper bypass cases (skipping executor / testing-qa / fresh-context review for ≤5-line non-logic corrective rounds); it is referenced by `/cbp-round-build` and `/cbp-verify` (task scope) for infra-issue absorption.
 ---
@@ -39,7 +43,7 @@ The **Trivial-Resolution Exception** below still governs the deeper bypass cases
 ### Resolve-in-Current-Scope by Default
-When `/cbp-round-execute` Step 5 (per-wave `cbp-testing-qa-agent`) or `/cbp-task-testing` surfaces a pre-existing infra-class issue (critical/high CVE, broken ESLint config-load, Playwright env-loading gap, dead CI pipeline, etc.), the default response is **absorb into current scope** — NOT create a standalone task.
+When `/cbp-round-build` Step 5 (per-wave `cbp-testing-qa-agent`) or `/cbp-verify` (task scope) surfaces a pre-existing infra-class issue (critical/high CVE, broken ESLint config-load, Playwright env-loading gap, dead CI pipeline, etc.), the default response is **absorb into current scope** — NOT create a standalone task.
 Order of preference for routing a finding:
@@ -84,10 +88,10 @@ When the trivial-resolution exception qualifies, the orchestrator MAY bypass the
 | Stage | Bypass allowed when | Document as |
 |-------|--------------------|-------------|
-| `cbp-round-executor` | Single-file Edit fully specified by prior reviewer output | `bypass_log.executor: "single-file edit, used direct Edit"` |
+| `cbp-round-builder` | Single-file Edit fully specified by prior reviewer output | `bypass_log.executor: "single-file edit, used direct Edit"` |
 | `cbp-testing-qa-agent` | Edit is non-code (comment, doc, type-annotation) AND existing test coverage protects the area | `bypass_log.testing_qa: "non-code edit, existing tests cover area"` |
-| `cbp-improve-round` | Diff is ≤5 lines AND no logic changed | `bypass_log.improve_round: "≤5 lines non-logic, skipped"` |
-| `cbp-task-planner` | Path B (the planner's trivial-corrective bypass that keeps repeat fix-rounds cheap) already qualifies | `bypass_log.planner: "Path B trivial-corrective bypass"` |
+| `cbp-verify-reviewer` | Diff is ≤5 lines AND no logic changed | `bypass_log.review: "≤5 lines non-logic, skipped"` |
+| `cbp-round-planner` | Path B (the planner's trivial-corrective bypass that keeps repeat fix-rounds cheap) already qualifies | `bypass_log.planner: "Path B trivial-corrective bypass"` |
 **ALL four bypasses simultaneously** is acceptable for ≤5-line non-logic corrective edits where every premise was verified by a prior reviewer.
@@ -95,7 +99,7 @@ When the trivial-resolution exception qualifies, the orchestrator MAY bypass the
 ### Infra-Class Issue Catalog
-These categories surface from per-wave `cbp-testing-qa-agent` or from `/cbp-task-testing`. Default routing for each is in-scope absorption unless genuinely off-axis from the active checkpoint.
+These categories surface from per-wave `cbp-testing-qa-agent` or from `/cbp-verify` (task scope). Default routing for each is in-scope absorption unless genuinely off-axis from the active checkpoint.
 | Category | Examples |
 |----------|----------|

package/templates/skills/cbp-verify/reference/round-scope.md ADDED

@@ -0,0 +1,62 @@
+# Round-Scope Verify
+Loaded by `cbp-verify` when `scope=round`. This is the per-round quality pass that runs after
+`/cbp-round-build` finishes execution — the soft tier of `rules/two-tier-ci.md`.
+## What round scope verifies
+The review window is THIS round's diff only (`round.files_changed` + `git diff` of the round). It
+covers automated checks, fresh-context review spawn, and finished-round triage routing in one
+scope-aware pass.
+## Phase mapping (round)
+- **Phase 2 — gates**: `codebyplan check --scope round --json`. Baseline-tolerant: only NEW
+  per-package failures fail; `gate6` always hard. The JSON is the verdict.
+- **Phase 3 — proof**: tier from the round's diff (`rules/execution-proof.md`).
+  - Tier 1: the `cbp-e2e-*` specialists already ran inside `cbp-round-build`; here persist
+    `e2e_eligible` / `e2e_outputs` then run `codebyplan e2e verify-round`. Empty gallery /
+    zero-assertion / eligible-skipped → fail.
+  - Tier 2/3: dev-server screenshot or HTTP trace for the round's changed routes/endpoints,
+    committed and proven via `git ls-files --error-unmatch`.
+  - Tier 4 (`claude_only`): deterministic-only path, no reviewer spawn (see
+    `reference/deterministic-gates.md`).
+- **Phase 4 — review**: spawn `cbp-verify-reviewer` with `scope: 'round'`. Spawn failure = HARD
+  GATE FAILURE → STOP + retry directive (`rules/spawn-failure-is-gate-failure.md`). In-scope
+  mechanical findings → orchestrator applies via Edit/Write; blocking findings →
+  `/cbp-round-plan`. A baseline regression surfaced by the reviewer or e2e is a blocking
+  user-accept gate, never auto-accepted.
+## Phase 5 routing (round)
+| Result | Directive |
+|--------|-----------|
+| Any gate / proof / review fail | `Next: /cbp-round-plan` (fix round) |
+| Pass, but more work wanted on the task | `Next: /cbp-round-plan` (another round) |
+| Pass + LAST round + clean | escalate to `scope=task` (re-enter Phase 1) |
+"More work wanted" is signalled the same way the old pipeline did — unstaged files at
+`/cbp-round-complete` mean the user wants more on them. cbp-verify does not decide that; it routes
+to the human gate and lets staging speak.
+## Phase 6 finalize (round) — hand to the human git-add gate
+cbp-verify does NOT complete the round and NEVER `git add`s. On a clean pass it persists the
+`verify_manifest` to round context and routes:
+```
+Next: /cbp-round-complete
+```
+`/cbp-round-complete` is the separate `ask`-tier, `disable-model-invocation` finalizer: the user
+stages the files they approve (`git add`), the skill reconciles via `codebyplan round
+sync-approvals` and `complete_round`, then routes onward (all files approved → escalate to task
+verify; some withheld → `/cbp-round-plan`). The permission prompt on `/cbp-round-complete` IS the
+human confirmation — do not add an AskUserQuestion in cbp-verify at round scope.
+## Writes (round)
+`codebyplan round update --id <round_id> --task-id <uuid> --checkpoint-id <uuid> --context '<json>'`
+(merge `verify_manifest` into existing context; the REPLACE contract requires the full object).
+Break-glass: MCP `update_round` (checkpoint KIND) / `update_standalone_round` (standalone KIND) —
+pass `caller_worktree_id` on locked feat rows.
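The "Writes (round)" note above says `--context` replaces the whole object, so `verify_manifest` has to be merged into the existing context before the call. A minimal sketch of that merge follows. It assumes the round context is a plain JSON object; the helper names and the manifest and context fields used in the usage note are hypothetical, and only the CLI flags are taken from the text above.

```python
import json
import shlex


def merge_verify_manifest(existing_context: dict, verify_manifest: dict) -> dict:
    """Return the FULL context object with verify_manifest set.

    Because --context replaces rather than patches, every existing key
    must be carried over or it would be lost on write.
    """
    merged = dict(existing_context)  # shallow copy; the caller's dict is untouched
    merged["verify_manifest"] = verify_manifest
    return merged


def build_round_update_command(round_id: str, task_id: str,
                               checkpoint_id: str, context: dict) -> str:
    """Render the `codebyplan round update` invocation quoted above.

    Hypothetical helper: it only formats the command string, it does not run it.
    """
    payload = json.dumps(context, separators=(",", ":"), sort_keys=True)
    return " ".join([
        "codebyplan", "round", "update",
        "--id", shlex.quote(round_id),
        "--task-id", shlex.quote(task_id),
        "--checkpoint-id", shlex.quote(checkpoint_id),
        "--context", shlex.quote(payload),
    ])
```

For example, `merge_verify_manifest({"e2e_eligible": ["checkout"]}, {"gates": "pass"})` keeps `e2e_eligible` and adds `verify_manifest` alongside it; passing only the manifest to `--context` would drop the other keys.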

package/templates/skills/cbp-verify/reference/task-scope.md ADDED

@@ -0,0 +1,71 @@
+# Task-Scope Verify
+Loaded by `cbp-verify` when `scope=task` — reached by escalation from the last clean round, or by
+an explicit `{chk}-{task}` / bare-`{task}` argument. This is the holistic cross-round
+double-check — AI production review plus comprehensive task-level testing in one pass.
+## Precondition
+All rounds of the task must be `completed`. If any round is `in_progress`, STOP:
+```
+## Cannot run task verify
+TASK-[N] has an active round (Round [N]). Finish it first (run /cbp-verify at round scope, then
+/cbp-round-complete).
+```
+## What task scope verifies
+The review window is the FULL aggregated task diff — all rounds' `files_changed` deduplicated
+(latest action per path wins). Task scope catches what no single round can see: requirements
+traceability, checkpoint-goal alignment, cross-round integration gaps, whole-repo lint/type/test
+regressions, and shippability.
+## Phase mapping (task)
+- **Phase 2 — gates**: `codebyplan check --scope task --json`. Whole-repo + baseline; only NEW
+  per-package failures fail; `gate6` always hard. This is the cross-package layer invisible to
+  per-round checks (a non-web package edit that slipped past per-round web-only lints surfaces
+  here).
+- **Phase 3 — proof**: aggregate proof across the task diff — every UI surface touched across all
+  rounds must have a committed artifact (`rules/execution-proof.md`). Re-run `codebyplan e2e
+  verify-round` for each round whose `e2e_eligible[]` is non-empty.
+- **Phase 4 — review**: spawn `cbp-verify-reviewer` with `scope: 'task'`. It grades each
+  requirement (`met`/`partially met`/`not met` with `path:line` evidence), checks
+  `checkpoint.goal` alignment, runs the holistic cross-round code review + shippable gate, and
+  surfaces `scope_divergence_candidates`. Spawn failure = HARD GATE FAILURE → STOP + retry
+  (`rules/spawn-failure-is-gate-failure.md`).
+## Phase 6 — the ONE genuine human step
+After the deterministic gates + reviewer pass, run a single batched `AskUserQuestion` walkthrough:
+present every user-testable item (visual quality, UX flow, business-logic correctness, edge cases,
+content accuracy) in ONE checklist prompt with a single overall answer — NEVER one question per
+item. Generate the items from task requirements + the aggregated diff + round context.
+`scope_divergence_candidates` from the reviewer are confirmed here (the reviewer cannot capture
+user input — it is read-only). If the user confirms a divergence about FUTURE scope, route to
+`/cbp-checkpoint-update` instead of finalize (the current task delivered correctly; the divergence
+belongs to checkpoint replanning).
+## Phase 5/6 routing (task)
+| Result | Directive |
+|--------|-----------|
+| Any gate / proof / review fail (fixable) | `Next: /cbp-round-plan` (fix round) |
+| Reviewer NOT_READY — needs new task scope | `Suggest: /cbp-task-create` then STOP (user scope decision) |
+| Confirmed future-scope divergence | `Next: /cbp-checkpoint-update` |
+| Pass + user satisfied | write verdict, `Next: /cbp-finalize` |
+On the pass path, write `task.context.verify_verdict = { verdict: 'READY', manifest, user_tests,
+decided_at }`:
+```bash
+codebyplan task update --id <task_id> --checkpoint-id <uuid> --context '<json>'
+# break-glass: MCP update_task (checkpoint KIND) / update_standalone_task (standalone KIND)
+```
+`/cbp-finalize` (the task-level ship finalizer) reads
+`task.context.verify_verdict` — it must exist with `verdict: 'READY'` before finalize proceeds.
+cbp-verify never edits source at task scope beyond the orchestrator-applied in-scope mechanical
+fixes from Phase 4; it never `git add`s.
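The verdict write and the `/cbp-finalize` precondition described above can be sketched as a pair of small functions. This is an illustration under stated assumptions, not the package's implementation: the function names are hypothetical, `decided_at` is assumed to be an ISO 8601 UTC timestamp, and the shapes of `manifest` and `user_tests` are left open because the text does not specify them.

```python
from datetime import datetime, timezone


def build_verify_verdict(manifest: dict, user_tests: list) -> dict:
    """Build the object stored at task.context.verify_verdict on the pass path.

    The four keys come from the text above; the timestamp format is an assumption.
    """
    return {
        "verdict": "READY",
        "manifest": manifest,
        "user_tests": user_tests,
        "decided_at": datetime.now(timezone.utc).isoformat(),
    }


def finalize_may_proceed(task_context: dict) -> bool:
    """Mirror the stated gate: verify_verdict must exist with verdict 'READY'."""
    verdict = task_context.get("verify_verdict")
    return isinstance(verdict, dict) and verdict.get("verdict") == "READY"
```

A task context with no `verify_verdict`, or with any verdict other than `READY`, fails the check, which matches the requirement that finalize does not proceed until task verify has passed.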