
codebyplan 1.13.52 → 1.13.54

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (92)

package/templates/agents/cbp-task-check.md DELETED

@@ -1,217 +0,0 @@
----
-name: cbp-task-check
-description: Task verification agent. Verifies requirements, checkpoint alignment, QA status, file approvals, code review, shippable gate, round outcome analysis, and user satisfaction discussion.
-tools: Read, Glob, Grep, Bash, AskUserQuestion
-model: sonnet
-effort: xhigh
----
-# Task Check Agent
-AI-driven production readiness review with user satisfaction discussion. This is the **cross-round double-check** layer: per-round QA (build/lint/types per app, the `console.log`/debug scan, the OWASP/secret grep, API auth-enforcement curls, `pnpm audit`) already ran inside each round's `testing-qa-agent` — this agent does NOT re-run it. Its unique value is holistic: verifying all task requirements are met, checkpoint goals are aligned, the aggregated work is shippable, and — for tasks that span many rounds where scope can shift as new ideas/problems surface — detecting scope drift that should update the checkpoint or task rather than re-running per-round checks.
-**Numeric-claim verification (Proposal P6)**: when round summaries assert numeric facts (file counts, package counts, percentage changes, line counts, version numbers), verify each via direct count: `find ... | wc -l`, `grep -c`, `wc -l <file>`. Do NOT accept narrative numbers without a verification command. Mismatches between asserted and actual counts indicate documentation drift; flag as a finding requiring a fix.
-## Input Contract
-```yaml
-input:
-  task_number: number
-  round_number: number  # total rounds
-  checkpoint: {id, title, goal, context}
-  task: {id, title, requirements, context, files_changed, qa}
-  rounds: [{number, requirements, context, qa, files_changed}]
-```
-## Output Contract
-```yaml
-output:
-  status: 'completed'
-  verdict: 'READY' | 'NOT_READY'
-  requirements_check: [{requirement, status, evidence}]
-  checkpoint_alignment: {aligned: boolean, notes: string}
-  qa_summary: {passed, failed, pending}
-  files_summary: {approved, unapproved, list_unapproved}
-  code_review: {pass: boolean, issues: []}
-  shippable: {yes: boolean, caveats: []}
-  round_outcome_analysis: {direction_changes: [], improvements: [], task_data_updates: {}}
-  user_satisfaction: {satisfied: boolean, feedback: string}
-  route_recommendation: string
-```
-## Workflow
-### Phase 1: Completeness Gate
-Verify all rounds are completed (status = `completed`). No in_progress rounds allowed.
-If any round is incomplete:
-- Set verdict = NOT_READY
-- Return immediately with route_recommendation = `/cbp-round-update`
-### Phase 2: Requirements Verification
-Parse `task.requirements` into individual items. For EACH requirement:
-1. Read the requirement text
-2. Search `task.files_changed` for files that address it
-3. Search round summaries and context for implementation evidence
-4. Check QA items related to it
-| # | Requirement | Status | Evidence |
-|---|------------|--------|----------|
-| 1 | [text] | met / partially met / not met | [file paths, round numbers] |
-**Verdict rules:**
-- Any requirement "not met" = automatic NOT_READY
-- Any "partially met" = explain what is missing, whether it blocks shipping
-- All "met" = proceed
-### Phase 3: Checkpoint Goal Alignment
-Compare task work against `checkpoint.goal`:
-- Does this task contribute to the checkpoint goal?
-- Any contradictions between task decisions and checkpoint direction?
-- Flag drift from original intent
-### Phase 4: QA Status Review
-Review all QA items across all rounds:
-- **Auto items**: Verify all passed (build, lint, types, tests)
-- **Default items**: Verify all resolved (pass or skipped with reason)
-**E2E deterministic gate**: For each round where `round.context.e2e_eligible[]` is non-empty, run `codebyplan e2e verify-round --round-id <round_id> --task-id <task_id>`. Exit 0 = pass. Exit 1 = hard-fail — refuse a READY verdict and surface the stdout JSON's `failed_checks[]` verbatim in the verdict text. The CLI deterministically evaluates all three e2e hard-fails that were previously judged manually: `e2e_eligible_skipped` (eligible framework with no specialist output and no valid skip reason), `zero_assertion_run` (`passed === 0 && skipped > 0` on a path touching `files_changed` — "E2E spec authored but assertions did not execute (skip-gated)"), and `empty_gallery` (eligible UI-touching run with zero committed screenshots, per `rules/e2e-mandatory.md` § Committed-Screenshot Enforcement; the sole vscode-test-only exception is honored by the CLI). On any exit-1, route to a fix round per `rules/e2e-mandatory.md`.
-List any pending or failed items. Determine if they are blockers.
-### Phase 5: File Approval Check
-Check `task.files_changed`:
-- Count approved vs not_approved
-- List unapproved files
-- Determine if unapproved files block completion
-### Phase 6: Code Review (holistic spot-check)
-Per-round QA already ran the line-level checks — the `console.log`/debug scan (round `testing-qa-agent` Phase 3.5), the OWASP secret/injection grep (Phase 5), the API auth-enforcement curl (Phase 3.55), and `pnpm audit` (Phase 3.7). Do NOT re-run them here. Phase 6 is a light holistic spot-check across the aggregated diff for what a single round cannot see:
-- No obvious bugs or regressions that emerge only when all rounds' changes are read together
-- No cross-round integration gaps (a field/contract introduced in one round that a later round broke)
-- Error handling present where needed at the feature boundary
-- Consistent with existing codebase patterns across the full task diff
-If the aggregated diff surfaces an obvious issue per-round QA missed, flag it as a finding — but the per-round scans are authoritative for line-level concerns.
-### Phase 7: Shippable Feature Gate
-Ask: "If deployed now, would this feature work end-to-end?"
-- **YES**: Continue
-- **YES with caveats**: List caveats
-- **NO**: Verdict = NOT_READY, list what is broken/incomplete
-Catches integration gaps where requirements are technically met but feature does not work as a whole.
-### Phase 8: Round Outcome Analysis
-Analyze how rounds evolved the work:
-- **Direction changes**: Did user feedback change approach? Document shifts.
-- **Improvements**: What got better across rounds? What patterns emerged?
-- **Task data updates**: Capture actual outcomes vs planned for task context.
-Update `round_outcome_analysis` with findings.
-### Phase 9: User Satisfaction Discussion
-For tasks that ran many rounds, scope drift accumulates quietly — each round may have absorbed a new idea or problem without the checkpoint/task requirements being updated. The satisfaction discussion is where that drift surfaces; treat the scope-divergence scan below as a first-class output, not an afterthought.
-Present findings to user via AskUserQuestion:
-```
-## AI Production Review: TASK-[N]
-### Requirements: [N]/[N] met
-[table]
-### Shippable: [yes/no/caveats]
-### Checkpoint Alignment: [aligned/drift]
-### QA: [passed/failed/pending counts]
-### Files: [approved/unapproved counts]
-### Code Review: [pass/issues]
-### Round Evolution:
-[Brief summary of how work evolved across rounds]
-Are you satisfied with the delivered work? Any concerns or feedback?
-```
-Capture response in `user_satisfaction`.
-**Scope-divergence detection**: after capturing the response, scan it against the active checkpoint's locked context. Set `scope_divergence_detected: true` and populate `divergence_summary` when ANY hold:
-- The response references a different `TASK-N` (e.g., "before TASK-2 starts, we should re-shape findings") implying a re-slicing of upcoming tasks
-- The response contradicts a locked entry in `checkpoint.context.decisions[]` (e.g., user picked option B at checkpoint creation; their answer here implies option A is now correct)
-- The response introduces a new constraint or success criterion not present in the original task or checkpoint requirements
-`divergence_summary` shape:
-```yaml
-scope_divergence_detected: true
-divergence_summary:
-  diverges_from: "checkpoint.context.decisions[2]" | "task.requirements[1]" | "task TASK-N scope"
-  user_statement: "<verbatim quote>"
-  implication: "<one-line: what would need to change>"
-```
-When no divergence is detected, set `scope_divergence_detected: false` and proceed normally.
-### Phase 10: Verdict and Routing
-**READY** (all checks pass + user satisfied) AND `scope_divergence_detected: false`:
-- verdict = READY
-- route_recommendation = `/cbp-task-testing`
-**READY + scope_divergence_detected: true** (work is correct, but user input implies upcoming-scope change):
-- verdict = READY
-- route_recommendation = `/cbp-checkpoint-update`
-- Populate `route_context.divergence_summary` so checkpoint-update sees what changed
-- Rationale: the current task delivered correctly; the divergence is about FUTURE work and belongs to checkpoint replanning, not a fix round
-**NOT_READY — fixable issues:**
-- verdict = NOT_READY
-- route_recommendation = `/cbp-round-input`
-- List specific issues to address
-**NOT_READY — needs new task:**
-- verdict = NOT_READY
-- route_recommendation = `/cbp-task-create`
-- Explain why current task scope is insufficient
-**NOT_READY — approvals missing:**
-- verdict = NOT_READY
-- route_recommendation = "Approve files, re-run `/cbp-task-check`"
-- List unapproved files
-## Key Rules
-- **This is AI review + user discussion** — distinct from automated testing
-- **Read all changed files** — do not just check metadata
-- **Be thorough but practical** — flag real issues, not style preferences
-- **No file changes** — review only, never edit
-- **`/cbp-task-check` is NEVER skippable**
-## Completion Criteria
-- All 10 phases executed
-- All changed files read and reviewed
-- User satisfaction captured
-- Verdict determined with evidence
-- Route recommendation provided
-## Integration
-- **Spawned by**: `/cbp-task-check` command
-- **Returns to**: `/cbp-task-check` which routes based on verdict
-- **Reads**: All task, checkpoint, and rounds data arrives via the Input Contract (passed by `/cbp-task-check`). Local `.codebyplan/state/` files are the preferred source when `/cbp-task-check` pre-fetches context — read `.codebyplan/state/checkpoints/<checkpointId>/tasks/<taskId>.json` and `rounds/*.json` (local-first; break-glass: MCP `get_*` tools when state dir is absent and sync fails). The agent itself reads only filesystem content (changed files) via the Read tool — it never calls MCP or CLI directly.
-- **Writes**: None — review only, never edits.
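
The Phase 1 and Phase 10 routing rules in the deleted `cbp-task-check.md` can be read as a small decision function. The following is a minimal sketch, not the package's implementation: the `ReviewState` class and its flag names are hypothetical, and the precedence among the three NOT_READY branches is an assumption, since the template does not state one.

```python
from dataclasses import dataclass


@dataclass
class ReviewState:
    # Hypothetical flags summarising Phases 1-9; the names are illustrative only.
    rounds_complete: bool
    all_checks_pass: bool
    user_satisfied: bool
    scope_divergence_detected: bool = False
    approvals_missing: bool = False
    needs_new_task: bool = False


def route(state: ReviewState) -> tuple[str, str]:
    """Return (verdict, route_recommendation) per the Phase 1 and Phase 10 rules."""
    # Phase 1: any incomplete round returns immediately.
    if not state.rounds_complete:
        return "NOT_READY", "/cbp-round-update"
    # Phase 10: READY requires all checks passing and a satisfied user.
    if state.all_checks_pass and state.user_satisfied:
        if state.scope_divergence_detected:
            # Work is correct, but future scope changed: replan the checkpoint.
            return "READY", "/cbp-checkpoint-update"
        return "READY", "/cbp-task-testing"
    # Assumption: the order of the NOT_READY branches is not given in the template.
    if state.needs_new_task:
        return "NOT_READY", "/cbp-task-create"
    if state.approvals_missing:
        return "NOT_READY", "Approve files, re-run `/cbp-task-check`"
    return "NOT_READY", "/cbp-round-input"
```

For example, `route(ReviewState(True, True, True, scope_divergence_detected=True))` yields a READY verdict routed to `/cbp-checkpoint-update`, matching the template's rationale that divergence concerns future work rather than a fix round.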

package/templates/skills/cbp-round-check/SKILL.md DELETED

@@ -1,132 +0,0 @@
----
-name: cbp-round-check
-description: Run automated checks standalone for the current round
-effort: low
----
-<!-- Re-read this file before executing. Do not rely on memory. -->
-## Kind Detection
-Inspect the resolved identifier from argument parsing to determine the task kind:
-| Identifier shape | KIND |
-|-----------------|------|
-| `{task}-{round}` (2-segment, e.g. `45-2`) | `standalone` |
-| `{chk}-{task}-{round}` (3-segment, e.g. `141-3-1`) | `checkpoint` |
-| _(empty / free-text)_ | Check `get_current_standalone_task` first; if found → `standalone`. Else → `checkpoint` via `get_current_task`. (Kind-detection is MCP-unavoidable — no identifier yet means no local path to probe; subsequent operations are local-first per the rows below.) |
-Set `KIND` for the rest of this skill. MCP tool names vary by KIND:
-| Operation | `checkpoint` KIND | `standalone` KIND |
-|-----------|------------------|-------------------|
-| Get task | local state (break-glass: `get_current_task`) | `get_current_standalone_task(repo_id)` |
-| Get rounds | local state (break-glass: `get_rounds`) | `get_standalone_rounds(standalone_task_id)` |
-| Add round | `add_round(task_id, ...)` | `add_standalone_round(standalone_task_id, ...)` |
-| Update round | `update_round(round_id, ...)` | `update_standalone_round(standalone_round_id, ...)` |
-| Complete round | `complete_round(round_id, duration_minutes?)` | `complete_standalone_round(standalone_round_id, duration_minutes?, caller_worktree_id)` ⚠️ `caller_worktree_id` is REQUIRED for standalone |
-| Update task | `update_task(task_id, ...)` | `update_standalone_task(standalone_task_id, ...)` |
-# Round Check Command
-Run automated checks independently with mandatory execution. Updates round QA. Hard fails if mandatory checks (gate6/lint/typecheck/tests) fail.
-## Instructions
-### Step 1: Get Current Round
-Use Kind Detection above to set KIND. Then:
-- **checkpoint KIND**: Read `.codebyplan/state/checkpoints/<checkpointId>/tasks/<taskId>.json` (local-first) to find active task, then read `.codebyplan/state/checkpoints/<checkpointId>/tasks/<taskId>/rounds/<roundId>.json` to find the in-progress round. If missing/stale, run `npx codebyplan sync` once and re-read. Break-glass fallback: MCP `get_current_task(repo_id)` + `get_rounds(task_id)` when state dir is absent and sync fails.
-- **standalone KIND**: MCP `get_current_standalone_task(repo_id)` to find active task, then `get_standalone_rounds(standalone_task_id)` to find the in-progress round. (Standalone KIND still uses MCP until a later task.)
-### Step 2: Run Core Check Matrix
-From the repo root, run:
-```bash
-codebyplan check --scope round --json
-```
-Capture the JSON output. The runner is **whole-repo + baseline**: it runs `turbo run lint|typecheck|test` across every package and diffs each per-package result against the committed `.check-baseline.json`, so a pre-existing failure in an unrelated package does NOT fail the check — only a NEW failure does. The result shape is:
-```json
-{
-  "results": [
-    {"check": "gate6"|"lint"|"typecheck"|"tests"|"audit", "status": "pass"|"fail"|"skipped",
-     "exit_code": number|null, "command": string, "stdout": string, "stderr": string,
-     "executed": boolean, "new_failures"?: string[]}
-  ],
-  "any_failed": boolean,
-  "hard_fail_checks": [ ...names of checks that FAILED ]
-}
-```
-Five checks run in order: `gate6` (sibling-identity parity — `node scripts/check-sibling-identity.mjs`), `lint`, `typecheck`, `tests`, `audit`. For the baselined checks (`lint`/`typecheck`/`tests`) `new_failures` lists the packages that newly fail (not in the baseline); `status` is `pass` when `new_failures` is empty **even if the underlying command exited non-zero** (those failures are pre-existing/baselined). `audit.new_failures` lists new GHSA advisory ids not in the allowlist. **`gate6` is ALWAYS hard-fail — it is never baselined**; its `new_failures` field is omitted (absent/`undefined` in the JSON, not `null`), and a sibling-parity divergence fails the round regardless of the baseline.
-`hard_fail_checks` is dynamic — it lists only the checks that failed (`[]` when all pass; e.g. `["gate6"]` or `["typecheck","tests"]`), drawn from `results[].check`. The hard-fail checks for `--scope round` are `gate6`, `lint`, `typecheck`, and `tests` (`audit` is `--scope task` only). If `any_failed === true` (equivalently, `hard_fail_checks` is non-empty), this is a **hard fail** — surface each failing result's `stdout`/`stderr` (and `new_failures`) and stop.
-### Step 3: Execute Conditional Checks
-| Check | Command | Condition |
-|-------|---------|-----------|
-| **A11y** | Static check (aria, alt, focus) | UI files changed |
-| **API Health** | `curl -s -o /dev/null -w "%{http_code}" http://localhost:{PORT}/` | API routes changed |
-| **Visual** | Visual check flow (page-map + visual-check) | UI work + dev server running |
-### Step 4: Analyze Output
-Scan each runner result's `stdout`/`stderr` for:
-- **Warnings** (not just errors)
-- **Deprecation notices** (`grep -i "deprecat"` in output)
-- **Console.log in changed files**: `grep -rn "console\.\(log\|debug\|info\)" {changed_files}` (exclude tests)
-- **Bundle size warnings**
-### Step 5: Save QA Results
-Update round QA:
-- **checkpoint KIND**: `codebyplan round update --id <round_id> --task-id <task_id> --checkpoint-id <checkpoint_id> --qa '<json>'` (CLI write-through: local state file + REST). Break-glass fallback: MCP `update_round(round_id, qa: ...)` when the CLI is unavailable.
-- **standalone KIND**: MCP `update_standalone_round(standalone_round_id, qa: ...)`. (Standalone KIND still uses MCP until a later task.)
-Map each runner result entry to a QA item:
-```json
-{
-  "items": [
-    {"type": "auto", "check": "gate6", "status": "pass", "ran_at": "...", "notes": null, "executed": true},
-    {"type": "auto", "check": "lint", "status": "pass", "ran_at": "...", "notes": null, "executed": true},
-    {"type": "auto", "check": "typecheck", "status": "fail", "ran_at": "...", "notes": "1 new failing package", "executed": true},
-    {"type": "auto", "check": "tests", "status": "pass", "ran_at": "...", "notes": "no new failures (baselined)", "executed": true}
-  ]
-}
-```
-### Step 6: Show Results
-```
-## Round Check Results
-| Check | Status | Executed | Notes |
-|-------|--------|----------|-------|
-| gate6 | pass   | yes      | sibling-identity OK |
-| lint  | pass   | yes      | -     |
-| typecheck | fail | yes    | 1 new failing package |
-| tests | pass   | yes      | no new failures (baselined) |
-| A11y  | pass   | yes      | -     |
-| Visual| pass   | yes      | screenshots saved |
-**Result**: [N] passed, [N] failed, [N] skipped
-**Hard fail**: [yes/no]
-```
-If hard fail: `Mandatory checks failed. Fix issues before continuing.`
-If soft failures only: `Run /cbp-round-start to trigger auto-fix, or fix manually.`
-## Integration
-- **Reads (checkpoint KIND)**: `.codebyplan/state/checkpoints/<id>.json`, `checkpoints/<id>/tasks/<id>.json`, `checkpoints/<id>/tasks/<id>/rounds/<id>.json` (local-first; run `npx codebyplan sync` if missing; break-glass: MCP `get_current_task` / `get_rounds`)
-- **Reads (standalone KIND)**: MCP `get_current_standalone_task` / `get_standalone_rounds` (standalone KIND still uses MCP until a later task)
-- **Writes (checkpoint KIND)**: `codebyplan round update` (qa field). Break-glass: MCP `update_round`.
-- **Writes (standalone KIND)**: MCP `update_standalone_round` (qa field). (Standalone KIND still uses MCP until a later task.)
-- **Runner**: `codebyplan check --scope round --json` (whole-repo + baseline via `turbo run`; runs gate6 + lint + typecheck + tests; `--files` is accepted but ignored in whole-repo mode)
-- **Standalone**: Can be run independently at any time
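
The hard-fail rule described in Step 2 of the deleted `cbp-round-check` skill can be illustrated by parsing the documented result shape. This is a sketch under stated assumptions: `summarize_round_check` and `SAMPLE_REPORT` are hypothetical, the sample values (including the package name `apps/web`) are invented for illustration, and only the field names come from the template.

```python
import json

# Per the template, these are the hard-fail checks for `--scope round`.
ROUND_HARD_FAIL = {"gate6", "lint", "typecheck", "tests"}

# Invented sample in the documented result shape; values are illustrative.
SAMPLE_REPORT = json.dumps({
    "results": [
        # gate6 is never baselined, so it carries no new_failures field.
        {"check": "gate6", "status": "pass", "exit_code": 0, "executed": True},
        # Baselined check: passes despite a non-zero exit, no NEW failures.
        {"check": "lint", "status": "pass", "exit_code": 1, "executed": True,
         "new_failures": []},
        {"check": "typecheck", "status": "fail", "exit_code": 1, "executed": True,
         "new_failures": ["apps/web"]},
        {"check": "tests", "status": "pass", "exit_code": 0, "executed": True,
         "new_failures": []},
    ],
    "any_failed": True,
    "hard_fail_checks": ["typecheck"],
})


def summarize_round_check(report_json: str) -> dict:
    """Reduce the runner output to a hard-fail flag plus the failing checks."""
    report = json.loads(report_json)
    failing = [
        {"check": r["check"], "new_failures": r.get("new_failures", [])}
        for r in report["results"]
        if r["status"] == "fail" and r["check"] in ROUND_HARD_FAIL
    ]
    return {"hard_fail": bool(report.get("any_failed")), "failing": failing}
```

Calling `summarize_round_check(SAMPLE_REPORT)` flags a hard fail and lists only `typecheck`, because `lint` has an empty `new_failures` list even though its command exited non-zero.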

package/templates/skills/cbp-round-end/SKILL.md DELETED

@@ -1,173 +0,0 @@
----
-name: cbp-round-end
-description: Summary wrap-up after testing phase completes
-effort: high
----
-# Round End Command
-Summary phase — presents what was done, then runs code quality review to catch bugs and logic errors that automated checks miss.
-**Inline-fallback for any spawn failure**: when `cbp-improve-round` (or any peer agent) fails to spawn, the orchestrator falls through to an inline procedure that produces equivalent (lower-fidelity but valid) output. The contract: detect failure class → record in `round.context.improve_round_findings.spawn_failure` → walk the agent's Phase checklist inline → continue the skill. Same procedure for every failure class (org/billing, monthly Agent cap, provider 5xx, rate limit, context overflow, tool not available). Pre-emptive skip applies when the same class fired on the prior round.
-See `reference/inline-fallback.md` for full trigger table, procedure, and coverage list.
-## Pipeline
-```
-/cbp-round-execute → /cbp-round-end → [code review + auto-apply in-scope] → /cbp-round-update
-```
-## Identifier Notation
-This skill operates on the **active** task/round resolved via MCP `get_current_task` / `get_rounds` and does not accept a positional identifier argument. Canonical chk-task-round notation is defined in `cbp-round-start` Step 0 "CHK / TASK / ROUND Identifier Notation Vocabulary".
-## Instructions
-### Step 1: Get Current Task and Round
-Read `.codebyplan/state/checkpoints/<checkpointId>/tasks/<taskId>.json` (local-first) to find the active task. If missing/stale, run `npx codebyplan sync` once and re-read. Break-glass fallback: MCP `get_current_task` when the state dir is absent and sync fails (daemon-dead + CLI-unavailable).
-Read `.codebyplan/state/checkpoints/<checkpointId>/tasks/<taskId>/rounds/<roundId>.json` (local-first) to find the in-progress round. Same sync / break-glass pattern (MCP `get_rounds` as fallback).
-Load round context with all outputs (executor_output, testing_qa_output, reviewer_output).
-### Step 2: Collect Files Changed
-Collect all files changed during this round from:
-- Work executor output
-- `git diff --name-status HEAD` for final state
-Build the files list with approval status:
-```json
-[
-  {
-    "path": "src/file.ts",
-    "action": "modified",
-    "claude_approved": true,
-    "user_approved": false
-  }
-]
-```
-**claude_approved**: `true` if cbp-testing-qa-agent passed for this file. `false` if issues remain.
-**user_approved**: Always `false` initially. User approves via git staging or web UI.
-### Step 3: Collect QA Results
-**No QA runs here** — all QA was already executed by per-wave `cbp-testing-qa-agent` inside `/cbp-round-execute` Step 5.
-#### 3a — Collect items from agent outputs
-Collect from round context:
-- **Auto items**: from `testing_qa_output.auto_qa.items`
-- **Default items**: from `testing_qa_output.default_checklist.items`
-Merge with previous rounds (supersede items for re-modified files, preserve verified items).
-### Step 4: Update Task Files and QA
-- **Round files + QA**: `codebyplan round update --id <round-id> --task-id <uuid> --checkpoint-id <uuid> --files-changed <json> --qa <json>` (CLI write-through: local state at `.codebyplan/state/checkpoints/<checkpointId>/tasks/<taskId>/rounds/<roundId>.json` + REST). Break-glass fallback: MCP `update_round` when the CLI is unavailable.
-- **Task files_changed merge**: `codebyplan task update --id <task-id> --checkpoint-id <uuid> --files-changed <json>` (CLI write-through: local state at `.codebyplan/state/checkpoints/<checkpointId>/tasks/<taskId>.json` + REST). Break-glass fallback: MCP `update_task` when the CLI is unavailable.
-- **Task QA aggregated**: `codebyplan task update --id <task-id> --checkpoint-id <uuid> --qa <json>` (same CLI write-through). Break-glass: MCP `update_task`.
-### Step 5: Present Summary
-```
-## Round [N] Complete - Ready for Review
-### Work Done
-[Brief summary from executor_output]
-### Files Changed ([N] files, [N] need approval)
-| File | Action | Claude | User |
-|------|--------|--------|------|
-| src/file.ts | modified | approved | pending |
-### Auto Checks
-| Check | Status |
-|-------|--------|
-| Build | pass |
-| Lint | pass |
-| Types | pass |
-| Tests | pass/skipped |
-```
-### Step 6: Spawn Code Quality Review
-Spawn `cbp-improve-round` agent via Agent tool with:
-```yaml
-input:
-  repo_id: [from config]
-  task: {id, title, requirements, context}
-  round: {id, number, requirements, files_changed, context}
-  project_path: [working directory]
-```
-Wait for agent to complete. If the spawn fails for any reason, apply the inline-fallback procedure documented in `reference/inline-fallback.md` (record `round.context.improve_round_findings.spawn_failure`, walk the agent's Phase checklist inline, continue the skill).
-### Step 7: Present Findings
-**Baseline-regression blocking gate**: before presenting code-review findings, check `round.context.frontend_ui_review.findings[]` for any entry with `category: 'baseline_regression'` (any severity). When one or more such findings exist, surface them as a BLOCKING decision that MUST be resolved before routing to `/cbp-round-update`:
-- Present each regression: screenshot path, `baseline_diff_pct`, affected page/screen.
-- Ask the user via AskUserQuestion to choose:
-  - **(a) Treat as regression** — add a fix-round to address the visual change, OR
-  - **(b) Accept the new baseline** — run `pnpm exec playwright test --update-snapshots` in `apps/{app}` and commit the updated baselines.
-- Do NOT route to `/cbp-round-update` until the decision is recorded. Baselines are NEVER auto-accepted.
-`rendered_visual` critical findings from `round.context.frontend_ui_review.findings[]` are surfaced in the normal findings presentation below (not as a separate gate).
-**If `status: 'no_findings'`:** show `### Code Review\nNo issues found. Code looks good.` and skip to Step 8.
-**If findings exist**, present them grouped by severity (table + per-finding details).
-**Under `auto_loop_mode === true`**: do NOT auto-apply here — Step 8's auto-loop path accepts all findings into `improve_round_findings[]` and defers the fixes to the next loop round. Skip straight to Step 8.
-**Manual mode**: **auto-apply all in-scope findings inline**. A finding is *in-scope* when every file it references is within the round's `files_changed[]`. The round-end orchestrator (main context — it has Edit/Write) applies these fixes directly; the `cbp-improve-round` agent stays read-only/advisory and never writes. Record each applied fix in `round.context.inline_fix_log` (findings indices, rationale, `fixes[]`, applied_at). After applying, re-run the verification scoped to the modified files (hook syntax check for `.sh`; `cbp-testing-qa-agent` for code) per `reference/findings-presentation.md`; if it fails, do NOT record the fix — treat the finding as out-of-scope instead. Findings that reference files OUTSIDE `files_changed[]` are **out-of-scope** — do NOT apply them; save them to `improve_round_findings[]` so Step 8 routes them to `/cbp-round-input` or a new task. There is no findings-decision AskUserQuestion — the round was already approved at the `/cbp-round-execute` permission prompt. The baseline-regression gate above is the ONLY user decision in this step.
-Example tables and the in-scope/out-of-scope classification: see `reference/findings-presentation.md`.
-### Step 8: Route Based on Decisions
-**If `round.context.auto_loop_mode === true`** (auto-loop active):
-- Auto-accept ALL findings into `improve_round_findings[]` regardless of severity (the user opted into the loop).
-- Skip the polish-spiral stop-gate (auto-loop has its own cap-exhausted termination).
-- Skip Step 7's inline auto-apply (findings are deferred to the next loop round, not applied this round).
-- Save findings via `update_round` exactly as in manual mode.
-- Auto-trigger `/cbp-round-update` immediately. round-update triages the round and either routes to `/cbp-round-input` (spawn another round) or **directs the user to run** `/cbp-round-complete` on a clean exit — see cbp-round-update SKILL.md Step 2/3.
-**Else (manual mode — flag absent or false):**
-Step 7 already auto-applied in-scope findings and logged them to `round.context.inline_fix_log`. Now record any out-of-scope findings and route:
-1. **Polish-spiral stop-gate** (round 2+ only): if this is round 2 or later AND the prior round also ended with code-review fixes, surface a one-line stop-gate via AskUserQuestion — *defer remaining polish to a follow-up task* vs *continue with another round*. This is a genuine user decision about scope (it guards against endless low-value polish loops), not a flow-control prompt. Skip on round 1.
-2. Save out-of-scope findings (those NOT auto-applied in Step 7) to round context via `codebyplan round update --id <round-id> --task-id <uuid> --checkpoint-id <uuid> --context <json>` (break-glass: MCP `update_round`):
-   ```json
-   {
-     "context": {
-       "improve_round_findings": [out-of-scope findings]
-     }
-   }
-   ```
-3. Auto-trigger `/cbp-round-update`. round-update triages the round: if out-of-scope findings (or a hard-fail) remain it routes to `/cbp-round-input` (which picks up the findings from round context and includes them in the new round's requirements automatically); if the round is clean it **directs the user to run** `/cbp-round-complete` (the user-invoked finalizer that reconciles the user's `git add`s and completes the round).
-## Key Rules
-- Claude NEVER git adds files — user approval is via git staging at `/cbp-round-complete`
-- Auto-triggers `/cbp-round-update` after findings are handled
-- `/cbp-round-end` is auto-triggered by `/cbp-round-execute` (user does not call it directly)
-- In-scope findings are **auto-applied inline** by the round-end orchestrator (the round was already approved at the `/cbp-round-execute` permission); out-of-scope findings route to `/cbp-round-input`. `cbp-improve-round` stays read-only/advisory. Baseline-regression accept (Step 7 gate) stays a user decision — baselines are NEVER auto-accepted.
-## Integration
-- **Triggered by**: `/cbp-round-execute` (auto, after all waves + testing complete)
-- **Reads**: `.codebyplan/state/checkpoints/<id>/tasks/<id>.json`, `checkpoints/<id>/tasks/<id>/rounds/<id>.json` (local-first; `npx codebyplan sync` on miss; MCP `get_current_task` / `get_rounds` as break-glass)
-- **Writes**: `codebyplan round update` (Step 4 round files/QA, Step 8 findings; break-glass: MCP `update_round`), `codebyplan task update` (Step 4 files_changed + QA aggregated; break-glass: MCP `update_task`)
-- **Spawns**: `cbp-improve-round` (code quality review)
-- **Triggers**: `/cbp-round-update` (auto, after findings handled)
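
Step 7 of the deleted `cbp-round-end` skill defines a finding as in-scope when every file it references is within the round's `files_changed[]`. A minimal sketch of that classification follows. The helper name `split_findings` and the `files` list on each finding are assumptions, as the templates only show a singular `file` field in the findings shape. Treating a finding with no file references as out-of-scope is also an assumption, chosen so that nothing is auto-applied without a known target.

```python
def split_findings(findings: list[dict], files_changed: list[dict]) -> tuple[list[dict], list[dict]]:
    """Partition findings into (in_scope, out_of_scope) against files_changed[]."""
    # files_changed entries carry a "path" key, as in the Step 2 example.
    changed = {entry["path"] for entry in files_changed}
    in_scope, out_of_scope = [], []
    for finding in findings:
        # Hypothetical "files" list; fall back to the singular "file" field.
        files = finding.get("files") or ([finding["file"]] if finding.get("file") else [])
        if files and all(path in changed for path in files):
            in_scope.append(finding)
        else:
            # Assumption: findings touching any unchanged file, or none at all,
            # are deferred rather than applied inline.
            out_of_scope.append(finding)
    return in_scope, out_of_scope
```

In the manual-mode flow described above, the first list would be applied inline and logged, while the second would be saved to `improve_round_findings[]` for routing.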

package/templates/skills/cbp-round-end/reference/inline-fallback.md DELETED

@@ -1,35 +0,0 @@
-# Inline-Fallback for Any Spawn Failure
-When `improve-round` (or any agent spawned by this or peer skills) fails to spawn, the orchestrator falls through to an inline procedure that produces equivalent (lower-fidelity but valid) output. Same contract for every failure class — no special-casing per class.
-## Trigger conditions (any one)
-| Failure class             | Detection signal                                                                      |
-| ------------------------- | ------------------------------------------------------------------------------------- |
-| Org/billing limit         | `API Error: Extra usage is required for 1M context` (the original Proposal U trigger) |
-| Monthly Agent usage cap   | `API Error: This conversation has reached the monthly Agent usage limit` or similar   |
-| Provider 5xx              | Spawn returns `API Error 500` / `502` / `503` — transient or sustained                |
-| Rate limit                | `API Error 429` with retry hint                                                       |
-| Context overflow at spawn | Spawn returns `Context window exceeded` before agent can run                          |
-| Tool not available        | Skill caller's tool surface lacks Agent (rare — only in nested-agent contexts)        |
-## Fallback procedure (uniform across all triggers)
-1. Note the failure: `agent_spawned: false`, `skip_reason: "<one-line failure class>"`. Save to `round.context.improve_round_findings.spawn_failure = { class, error_message, decided_at }`.
-2. Perform the agent's analysis inline using whatever tools the orchestrator has (typically `Read` + `Bash` grep/find/head + `Glob`/`Grep`). Use the agent's documented Phase checklist as the script — agents are essentially curated checklists; following them inline produces equivalent (lower-fidelity but valid) output.
-3. Record findings in the same shape the agent would have returned (`findings[]` array with `severity`, `category`, `file`, `description`, `suggested_fix`). Mark each with `mode: 'inline_fallback'` so analytics can distinguish.
-4. Continue the skill — do NOT abort the round on spawn failure. The fallback is intended to keep the pipeline moving; aborting would force the user to manually re-run when the same failure will recur.
-**Pre-emptive skip**: when the same failure class fired in the previous round of the same task, skip the spawn attempt entirely and go straight to inline. This avoids one wasted API call per round during a sustained outage.
-## Coverage
-This fallback applies to:
-- `improve-round` spawned by `/cbp-round-end` (Step 6) — original case
-- `task-planner` spawned by `/cbp-round-start` Step 7 — orchestrator falls back to inline planning using the planner's Phase checklist
-- `testing-qa-agent` spawned by `/cbp-round-execute` Step 5 (per-wave) — orchestrator runs build/lint/types/tests inline via Bash and aggregates results in the agent's output shape
-- `task-check` spawned by `/cbp-task-check` skill — orchestrator walks the agent's verdict checklist inline
-- `improve-claude` spawned by its caller (when re-enabled) — orchestrator walks the agent's Phase 0-7 inline
-Each spawning skill carries a brief "Inline fallback" section that points back to this contract; this file is the canonical reference.
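
The detection table and step 1 of the fallback procedure in the deleted rule above can be sketched roughly as follows. This is a minimal illustration, not the package's implementation: the function names (`classifySpawnError`, `recordSpawnFailure`) are hypothetical, the class labels are borrowed from the `class` values listed in `cbp-round-execute/reference/inline-fallback.md`, and the regular expressions are an assumed way of matching the documented error strings. The "Tool not available" class has no error string in the table, so it is left out.

```typescript
// Hypothetical sketch: only the error strings and class labels come from the docs.
type FailureClass =
  | "billing_limit"
  | "monthly_agent_usage_limit"
  | "provider_5xx"
  | "rate_limit_429"
  | "context_overflow_at_spawn";

// Shape of the record saved under `spawn_failure` (fields as documented).
interface SpawnFailure {
  class: FailureClass;
  error_message: string; // verbatim error text
  decided_at: string; // ISO timestamp
}

// Map a spawn error message to a failure class, or null when nothing matches.
function classifySpawnError(message: string): FailureClass | null {
  if (/Extra usage is required/i.test(message)) return "billing_limit";
  if (/monthly Agent usage limit/i.test(message)) return "monthly_agent_usage_limit";
  if (/API Error:? 429/i.test(message)) return "rate_limit_429";
  if (/API Error:? 50[023]/i.test(message)) return "provider_5xx";
  if (/Context window exceeded/i.test(message)) return "context_overflow_at_spawn";
  return null;
}

// Build the record for a recognised failure; unrecognised errors yield null
// so the caller can decide how to handle them.
function recordSpawnFailure(
  message: string,
  now: Date = new Date(),
): SpawnFailure | null {
  const cls = classifySpawnError(message);
  if (cls === null) return null;
  return { class: cls, error_message: message, decided_at: now.toISOString() };
}
```

A caller would presumably store the returned object at the documented `spawn_failure` path and then continue with the inline procedure rather than aborting the round.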

package/templates/skills/cbp-round-execute/reference/inline-fallback.md DELETED Viewed

@@ -1,55 +0,0 @@
-# Inline-fallback procedures
-When `round-executor` or `testing-qa-agent` cannot be spawned (env limits, monthly cap, 5xx, rate limit, context overflow), the orchestrator falls through to an inline procedure that walks the agent's Phase checklist using its own tools.
-The two fallback modes are documented separately so the SKILL.md stubs can link the right section.
-## Execution fallback (round-executor spawn failed)
-Triggered when the executor agent spawn returns one of the failure classes documented in `agent-spawn-failure-fallback.md`. Procedure:
-1. Detect failure class from error string. Record:
-   ```yaml
-   round.context.executor_findings.spawn_failure:
-     class: "<failure_class>"  # one of: monthly_agent_usage_limit, provider_5xx, rate_limit_429, context_overflow_at_spawn, billing_limit
-     error_message: "<verbatim>"
-     decided_at: "<ISO>"
-   ```
-2. For `.claude/`-only file sets, fall through to the 3-INLINE branch in `../SKILL.md` Step 3 (orchestrator routes per file-routing.md to the matching build-cc skill or direct Edit).
-3. For non-`.claude/` file sets, walk `agents/round-executor.md` Phase 1–4 inline using Read / Edit / Write / Bash / Glob / Grep. Phase 3 (Implementation) is the load-bearing phase: apply each `files_to_modify[]` deliverable in order, respecting wave boundaries when wave mode is active.
-4. Populate the executor's output contract with `mode: 'inline_fallback'` so analytics distinguishes.
-5. Pre-emptive skip on repeat: if `prior_round.context.executor_findings.spawn_failure.class === current_class`, skip the spawn attempt entirely and go straight to inline.
-## Validation fallback (testing-qa-agent spawn failed OR claude_only profile)
-Triggered when testing-qa-agent spawn returns a failure class, OR when the resolved profile is `claude_only` (in which case the agent should not have been spawned at all). Procedure:
-1. Detect failure class. Record:
-   ```yaml
-   round.context.testing_qa_findings.spawn_failure:
-     class: "<failure_class>"
-     error_message: "<verbatim>"
-     decided_at: "<ISO>"
-   ```
-2. Apply the profile gate matrix from `agents/testing-qa-agent.md` Phase 3 to determine which checks are in-scope:
-   - `claude_only`: only hook bash syntax (`bash -n <hook>`) + skill structure validation (line counts, scope marker, /cbp-* legacy notation absent)
-   - `web`: skip desktop + backend
-   - `backend`: skip web + desktop
-   - `desktop`: skip web + backend
-   - `full_matrix`: all
-   - `cross_app`: union of touched apps
-3. Walk `agents/testing-qa-agent.md` Phase 1 (Setup) + Phase 2 (Discovery) + Phase 3 (Mandatory Automated Testing) inline using Read / Grep / Bash. Aggregate per-check results.
-4. Populate `testing_qa_output` shape with `mode: 'inline_fallback'`. For `claude_only` specifically, use `mode: 'inline_synthesised_for_claude_only_profile'` (the agent was never expected to spawn — this isn't a fallback, it's the documented happy path).
-5. Pre-emptive skip on repeat: if `prior_round.context.testing_qa_findings.spawn_failure.class === current_class`, skip the spawn attempt entirely.
-## Pre-emptive skip rule
-Per the **Pre-emptive skip** paragraph in `agent-spawn-failure-fallback.md`: when the same failure class fired in the previous round of the same task, skip the spawn attempt entirely and go straight to inline. This avoids one wasted API call per round during a sustained outage.
-## Pairs With
-- `../SKILL.md` — points at this reference for procedural detail
-- `agents/round-executor.md` — execution-fallback target agent
-- `agents/testing-qa-agent.md` — validation-fallback target agent + Phase 3 profile gate matrix
-- `rules/agent-spawn-failure-fallback.md` — required-coverage table; canonical failure classes
-- `rules/testing-profile.md` — claude_only profile detail; cross-app union semantics
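
The profile gate matrix in step 2 of the validation fallback above could be expressed along these lines. This is a hedged sketch only: `appsInScope` and `resultMode` are hypothetical names, and it assumes the app set is exactly `web`, `backend`, and `desktop`, which is inferred from the matrix rather than stated in the deleted file. For `claude_only` no app checks run; the hook-syntax and skill-structure checks are outside this sketch.

```typescript
// Hypothetical sketch of the profile gate matrix; names are illustrative.
type App = "web" | "backend" | "desktop";
type Profile =
  | "claude_only"
  | "web"
  | "backend"
  | "desktop"
  | "full_matrix"
  | "cross_app";

// Assumed complete app list (inferred from the matrix, not confirmed).
const ALL_APPS: App[] = ["web", "backend", "desktop"];

// Return the apps whose checks are in scope for a given profile.
function appsInScope(profile: Profile, touchedApps: App[] = []): App[] {
  switch (profile) {
    case "claude_only":
      return []; // only hook syntax and skill structure checks apply
    case "web":
    case "backend":
    case "desktop":
      return [profile]; // single-app profile skips the other two
    case "full_matrix":
      return [...ALL_APPS];
    case "cross_app":
      // union of touched apps, de-duplicated and in a stable order
      return ALL_APPS.filter((app) => touchedApps.includes(app));
  }
}

// Pick the `mode` marker documented for the output shape.
function resultMode(profile: Profile): string {
  return profile === "claude_only"
    ? "inline_synthesised_for_claude_only_profile"
    : "inline_fallback";
}
```

Filtering `ALL_APPS` rather than returning `touchedApps` directly keeps the `cross_app` result free of duplicates regardless of how the touched list was assembled.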