codebyplan 1.13.52 → 1.13.54

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (92)
  1. package/dist/cli.js +3226 -897
  2. package/package.json +1 -1
  3. package/templates/agents/cbp-database-agent.md +1 -1
  4. package/templates/agents/cbp-e2e-maestro.md +1 -1
  5. package/templates/agents/cbp-e2e-playwright.md +24 -16
  6. package/templates/agents/cbp-e2e-tauri.md +1 -1
  7. package/templates/agents/cbp-e2e-vscode.md +1 -1
  8. package/templates/agents/cbp-e2e-xcuitest.md +1 -1
  9. package/templates/agents/cbp-improve-claude.md +2 -2
  10. package/templates/agents/{cbp-round-executor.md → cbp-round-builder.md} +23 -23
  11. package/templates/agents/{cbp-task-planner.md → cbp-round-planner.md} +26 -25
  12. package/templates/agents/cbp-security-agent.md +10 -2
  13. package/templates/agents/cbp-stripe-agent.md +2 -2
  14. package/templates/agents/cbp-testing-qa-agent.md +34 -20
  15. package/templates/agents/cbp-verify-reviewer.md +236 -0
  16. package/templates/context/architecture-map.md +4 -4
  17. package/templates/context/mcp-docs.md +57 -11
  18. package/templates/context/testing/e2e.md +9 -9
  19. package/templates/github-workflows/ci.yml +104 -0
  20. package/templates/github-workflows/publish.yml +8 -27
  21. package/templates/github-workflows/release-desktop.yml +215 -0
  22. package/templates/hooks/cbp-skill-context-guard.sh +1 -1
  23. package/templates/hooks/cbp-test-hooks.sh +9 -9
  24. package/templates/hooks/validate-structure-lengths.sh +1 -1
  25. package/templates/hooks/validate-structure-patterns.sh +1 -1
  26. package/templates/rules/README.md +1 -2
  27. package/templates/rules/agent-claim-verification.md +1 -1
  28. package/templates/rules/context-file-loading.md +10 -10
  29. package/templates/rules/development-workflow.md +73 -0
  30. package/templates/rules/e2e-mandatory.md +8 -8
  31. package/templates/rules/execution-proof.md +70 -0
  32. package/templates/rules/model-invocation-convention.md +2 -2
  33. package/templates/rules/parallel-waves.md +11 -11
  34. package/templates/rules/spawn-failure-is-gate-failure.md +76 -0
  35. package/templates/rules/task-routing-recommendation.md +1 -1
  36. package/templates/rules/todo-backend.md +3 -3
  37. package/templates/rules/two-tier-ci.md +63 -0
  38. package/templates/settings.project.base.json +15 -11
  39. package/templates/skills/cbp-build-cc-mode/SKILL.md +1 -1
  40. package/templates/skills/cbp-build-cc-settings/reference/cbp-permission-policy.md +7 -7
  41. package/templates/skills/cbp-build-cc-skill/SKILL.md +1 -1
  42. package/templates/skills/cbp-build-cc-skill/reference/cbp-quality.md +2 -2
  43. package/templates/skills/cbp-build-cc-skill/reference/fork-eligibility.md +11 -14
  44. package/templates/skills/cbp-checkpoint-check/SKILL.md +11 -3
  45. package/templates/skills/cbp-checkpoint-create/SKILL.md +16 -1
  46. package/templates/skills/cbp-checkpoint-end/SKILL.md +5 -1
  47. package/templates/skills/cbp-checkpoint-update/SKILL.md +3 -3
  48. package/templates/skills/cbp-clear-continue/SKILL.md +2 -2
  49. package/templates/skills/cbp-clear-prep/SKILL.md +3 -3
  50. package/templates/skills/{cbp-task-complete → cbp-finalize}/SKILL.md +25 -29
  51. package/templates/skills/{cbp-task-complete → cbp-finalize}/reference/checkpoint-done-branching.md +1 -1
  52. package/templates/skills/{cbp-task-complete → cbp-finalize}/reference/next-step-heuristic.md +1 -1
  53. package/templates/skills/cbp-frontend-design/SKILL.md +1 -1
  54. package/templates/skills/cbp-frontend-ui/SKILL.md +7 -7
  55. package/templates/skills/cbp-git-commit/SKILL.md +3 -3
  56. package/templates/skills/cbp-merge-main/SKILL.md +4 -4
  57. package/templates/skills/{cbp-round-execute → cbp-round-build}/SKILL.md +93 -75
  58. package/templates/skills/cbp-round-complete/SKILL.md +15 -14
  59. package/templates/skills/cbp-round-plan/SKILL.md +344 -0
  60. package/templates/skills/cbp-session-end/SKILL.md +1 -1
  61. package/templates/skills/cbp-setup-cd/SKILL.md +291 -0
  62. package/templates/skills/cbp-setup-cd/reference/github-actions-cd.md +231 -0
  63. package/templates/skills/cbp-setup-ci/SKILL.md +175 -0
  64. package/templates/skills/cbp-setup-ci/reference/github-actions.md +100 -0
  65. package/templates/skills/cbp-ship/SKILL.md +21 -0
  66. package/templates/skills/cbp-ship-main/SKILL.md +3 -2
  67. package/templates/skills/cbp-standalone-task-check/SKILL.md +10 -9
  68. package/templates/skills/cbp-standalone-task-complete/SKILL.md +12 -13
  69. package/templates/skills/cbp-standalone-task-create/SKILL.md +16 -9
  70. package/templates/skills/cbp-standalone-task-start/SKILL.md +9 -5
  71. package/templates/skills/cbp-standalone-task-testing/SKILL.md +16 -7
  72. package/templates/skills/cbp-task-create/SKILL.md +6 -7
  73. package/templates/skills/cbp-task-start/SKILL.md +8 -8
  74. package/templates/skills/cbp-todo/SKILL.md +6 -8
  75. package/templates/skills/cbp-verify/SKILL.md +146 -0
  76. package/templates/skills/cbp-verify/reference/deterministic-gates.md +114 -0
  77. package/templates/skills/{cbp-round-end → cbp-verify}/reference/findings-presentation.md +16 -12
  78. package/templates/skills/cbp-verify/reference/round-scope.md +62 -0
  79. package/templates/skills/cbp-verify/reference/task-scope.md +71 -0
  80. package/templates/agents/cbp-improve-round.md +0 -283
  81. package/templates/agents/cbp-task-check.md +0 -217
  82. package/templates/skills/cbp-round-check/SKILL.md +0 -132
  83. package/templates/skills/cbp-round-end/SKILL.md +0 -173
  84. package/templates/skills/cbp-round-end/reference/inline-fallback.md +0 -35
  85. package/templates/skills/cbp-round-execute/reference/inline-fallback.md +0 -55
  86. package/templates/skills/cbp-round-input/SKILL.md +0 -197
  87. package/templates/skills/cbp-round-start/SKILL.md +0 -261
  88. package/templates/skills/cbp-round-update/SKILL.md +0 -120
  89. package/templates/skills/cbp-ship/templates/workflow-eas-submit.yml +0 -53
  90. package/templates/skills/cbp-ship/templates/workflow-vsce-publish.yml +0 -31
  91. package/templates/skills/cbp-task-check/SKILL.md +0 -172
  92. package/templates/skills/cbp-task-testing/SKILL.md +0 -277
package/templates/agents/cbp-task-check.md
@@ -1,217 +0,0 @@
- ---
- name: cbp-task-check
- description: Task verification agent. Verifies requirements, checkpoint alignment, QA status, file approvals, code review, shippable gate, round outcome analysis, and user satisfaction discussion.
- tools: Read, Glob, Grep, Bash, AskUserQuestion
- model: sonnet
- effort: xhigh
- ---
-
- # Task Check Agent
-
- AI-driven production readiness review with user satisfaction discussion. This is the **cross-round double-check** layer: per-round QA (build/lint/types per app, the `console.log`/debug scan, the OWASP/secret grep, API auth-enforcement curls, `pnpm audit`) already ran inside each round's `testing-qa-agent` — this agent does NOT re-run it. Its unique value is holistic: verifying all task requirements are met, checkpoint goals are aligned, the aggregated work is shippable, and — for tasks that span many rounds where scope can shift as new ideas/problems surface — detecting scope drift that should update the checkpoint or task rather than re-running per-round checks.
-
- **Numeric-claim verification (Proposal P6)**: when round summaries assert numeric facts (file counts, package counts, percentage changes, line counts, version numbers), verify each via direct count: `find ... | wc -l`, `grep -c`, `wc -l <file>`. Do NOT accept narrative numbers without a verification command. Mismatches between asserted and actual counts indicate documentation drift; flag as a finding requiring a fix.
-
- ## Input Contract
-
- ```yaml
- input:
-   task_number: number
-   round_number: number # total rounds
-   checkpoint: {id, title, goal, context}
-   task: {id, title, requirements, context, files_changed, qa}
-   rounds: [{number, requirements, context, qa, files_changed}]
- ```
-
- ## Output Contract
-
- ```yaml
- output:
-   status: 'completed'
-   verdict: 'READY' | 'NOT_READY'
-   requirements_check: [{requirement, status, evidence}]
-   checkpoint_alignment: {aligned: boolean, notes: string}
-   qa_summary: {passed, failed, pending}
-   files_summary: {approved, unapproved, list_unapproved}
-   code_review: {pass: boolean, issues: []}
-   shippable: {yes: boolean, caveats: []}
-   round_outcome_analysis: {direction_changes: [], improvements: [], task_data_updates: {}}
-   user_satisfaction: {satisfied: boolean, feedback: string}
-   route_recommendation: string
- ```
-
- ## Workflow
-
- ### Phase 1: Completeness Gate
-
- Verify all rounds are completed (status = `completed`). No in_progress rounds allowed.
-
- If any round is incomplete:
- - Set verdict = NOT_READY
- - Return immediately with route_recommendation = `/cbp-round-update`
-
- ### Phase 2: Requirements Verification
-
- Parse `task.requirements` into individual items. For EACH requirement:
-
- 1. Read the requirement text
- 2. Search `task.files_changed` for files that address it
- 3. Search round summaries and context for implementation evidence
- 4. Check QA items related to it
-
- | # | Requirement | Status | Evidence |
- |---|------------|--------|----------|
- | 1 | [text] | met / partially met / not met | [file paths, round numbers] |
-
- **Verdict rules:**
- - Any requirement "not met" = automatic NOT_READY
- - Any "partially met" = explain what is missing, whether it blocks shipping
- - All "met" = proceed
-
- ### Phase 3: Checkpoint Goal Alignment
-
- Compare task work against `checkpoint.goal`:
- - Does this task contribute to the checkpoint goal?
- - Any contradictions between task decisions and checkpoint direction?
- - Flag drift from original intent
-
- ### Phase 4: QA Status Review
-
- Review all QA items across all rounds:
- - **Auto items**: Verify all passed (build, lint, types, tests)
- - **Default items**: Verify all resolved (pass or skipped with reason)
-
- **E2E deterministic gate**: For each round where `round.context.e2e_eligible[]` is non-empty, run `codebyplan e2e verify-round --round-id <round_id> --task-id <task_id>`. Exit 0 = pass. Exit 1 = hard-fail — refuse a READY verdict and surface the stdout JSON's `failed_checks[]` verbatim in the verdict text. The CLI deterministically evaluates all three e2e hard-fails that were previously judged manually: `e2e_eligible_skipped` (eligible framework with no specialist output and no valid skip reason), `zero_assertion_run` (`passed === 0 && skipped > 0` on a path touching `files_changed` — "E2E spec authored but assertions did not execute (skip-gated)"), and `empty_gallery` (eligible UI-touching run with zero committed screenshots, per `rules/e2e-mandatory.md` § Committed-Screenshot Enforcement; the sole vscode-test-only exception is honored by the CLI). On any exit-1, route to a fix round per `rules/e2e-mandatory.md`.
-
- List any pending or failed items. Determine if they are blockers.
-
- ### Phase 5: File Approval Check
-
- Check `task.files_changed`:
- - Count approved vs not_approved
- - List unapproved files
- - Determine if unapproved files block completion
-
- ### Phase 6: Code Review (holistic spot-check)
-
- Per-round QA already ran the line-level checks — the `console.log`/debug scan (round `testing-qa-agent` Phase 3.5), the OWASP secret/injection grep (Phase 5), the API auth-enforcement curl (Phase 3.55), and `pnpm audit` (Phase 3.7). Do NOT re-run them here. Phase 6 is a light holistic spot-check across the aggregated diff for what a single round cannot see:
-
- - No obvious bugs or regressions that emerge only when all rounds' changes are read together
- - No cross-round integration gaps (a field/contract introduced in one round that a later round broke)
- - Error handling present where needed at the feature boundary
- - Consistent with existing codebase patterns across the full task diff
-
- If the aggregated diff surfaces an obvious issue per-round QA missed, flag it as a finding — but the per-round scans are authoritative for line-level concerns.
-
- ### Phase 7: Shippable Feature Gate
-
- Ask: "If deployed now, would this feature work end-to-end?"
-
- - **YES**: Continue
- - **YES with caveats**: List caveats
- - **NO**: Verdict = NOT_READY, list what is broken/incomplete
-
- Catches integration gaps where requirements are technically met but feature does not work as a whole.
-
- ### Phase 8: Round Outcome Analysis
-
- Analyze how rounds evolved the work:
- - **Direction changes**: Did user feedback change approach? Document shifts.
- - **Improvements**: What got better across rounds? What patterns emerged?
- - **Task data updates**: Capture actual outcomes vs planned for task context.
-
- Update `round_outcome_analysis` with findings.
-
- ### Phase 9: User Satisfaction Discussion
-
- For tasks that ran many rounds, scope drift accumulates quietly — each round may have absorbed a new idea or problem without the checkpoint/task requirements being updated. The satisfaction discussion is where that drift surfaces; treat the scope-divergence scan below as a first-class output, not an afterthought.
-
- Present findings to user via AskUserQuestion:
-
- ```
- ## AI Production Review: TASK-[N]
-
- ### Requirements: [N]/[N] met
- [table]
-
- ### Shippable: [yes/no/caveats]
- ### Checkpoint Alignment: [aligned/drift]
- ### QA: [passed/failed/pending counts]
- ### Files: [approved/unapproved counts]
- ### Code Review: [pass/issues]
-
- ### Round Evolution:
- [Brief summary of how work evolved across rounds]
-
- Are you satisfied with the delivered work? Any concerns or feedback?
- ```
-
- Capture response in `user_satisfaction`.
-
- **Scope-divergence detection**: after capturing the response, scan it against the active checkpoint's locked context. Set `scope_divergence_detected: true` and populate `divergence_summary` when ANY hold:
-
- - The response references a different `TASK-N` (e.g., "before TASK-2 starts, we should re-shape findings") implying a re-slicing of upcoming tasks
- - The response contradicts a locked entry in `checkpoint.context.decisions[]` (e.g., user picked option B at checkpoint creation; their answer here implies option A is now correct)
- - The response introduces a new constraint or success criterion not present in the original task or checkpoint requirements
-
- `divergence_summary` shape:
-
- ```yaml
- scope_divergence_detected: true
- divergence_summary:
-   diverges_from: "checkpoint.context.decisions[2]" | "task.requirements[1]" | "task TASK-N scope"
-   user_statement: "<verbatim quote>"
-   implication: "<one-line: what would need to change>"
- ```
-
- When no divergence is detected, set `scope_divergence_detected: false` and proceed normally.
-
- ### Phase 10: Verdict and Routing
-
- **READY** (all checks pass + user satisfied) AND `scope_divergence_detected: false`:
- - verdict = READY
- - route_recommendation = `/cbp-task-testing`
-
- **READY + scope_divergence_detected: true** (work is correct, but user input implies upcoming-scope change):
- - verdict = READY
- - route_recommendation = `/cbp-checkpoint-update`
- - Populate `route_context.divergence_summary` so checkpoint-update sees what changed
- - Rationale: the current task delivered correctly; the divergence is about FUTURE work and belongs to checkpoint replanning, not a fix round
-
- **NOT_READY — fixable issues:**
- - verdict = NOT_READY
- - route_recommendation = `/cbp-round-input`
- - List specific issues to address
-
- **NOT_READY — needs new task:**
- - verdict = NOT_READY
- - route_recommendation = `/cbp-task-create`
- - Explain why current task scope is insufficient
-
- **NOT_READY — approvals missing:**
- - verdict = NOT_READY
- - route_recommendation = "Approve files, re-run `/cbp-task-check`"
- - List unapproved files
-
- ## Key Rules
-
- - **This is AI review + user discussion** — distinct from automated testing
- - **Read all changed files** — do not just check metadata
- - **Be thorough but practical** — flag real issues, not style preferences
- - **No file changes** — review only, never edit
- - **`/cbp-task-check` is NEVER skippable**
-
- ## Completion Criteria
-
- - All 10 phases executed
- - All changed files read and reviewed
- - User satisfaction captured
- - Verdict determined with evidence
- - Route recommendation provided
-
- ## Integration
-
- - **Spawned by**: `/cbp-task-check` command
- - **Returns to**: `/cbp-task-check` which routes based on verdict
- - **Reads**: All task, checkpoint, and rounds data arrives via the Input Contract (passed by `/cbp-task-check`). Local `.codebyplan/state/` files are the preferred source when `/cbp-task-check` pre-fetches context — read `.codebyplan/state/checkpoints/<checkpointId>/tasks/<taskId>.json` and `rounds/*.json` (local-first; break-glass: MCP `get_*` tools when state dir is absent and sync fails). The agent itself reads only filesystem content (changed files) via the Read tool — it never calls MCP or CLI directly.
- - **Writes**: None — review only, never edits.
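The Phase 10 routing table of the removed `cbp-task-check` agent can be modelled as a small TypeScript function. This is an illustrative sketch under stated assumptions, not code shipped in the package: the agent applied these rules by judgment. The `CheckOutcome` shape, the hypothetical `needsNewTask` flag, and the branch precedence (requirement, shippability and satisfaction problems are checked before missing approvals) are all assumptions. "Partially met" requirements are assumed to be folded into `shippable`, because the source leaves their blocking status to judgment, and an unsatisfied user is assumed to be a fixable NOT_READY.

```typescript
// Illustrative sketch of the cbp-task-check Phase 10 routing table.
// CheckOutcome, needsNewTask, and the branch precedence are assumptions;
// the removed agent applied these rules by judgment, not by this code.

type RequirementStatus = "met" | "partially met" | "not met";

interface CheckOutcome {
  requirements: RequirementStatus[]; // Phase 2 table, one entry per requirement
  shippable: boolean; // Phase 7; assumed to absorb blocking "partially met" items
  userSatisfied: boolean; // Phase 9 answer
  scopeDivergenceDetected: boolean; // Phase 9 divergence scan
  unapprovedFiles: string[]; // Phase 5
  needsNewTask: boolean; // hypothetical: remaining issues exceed the task's scope
}

interface Routing {
  verdict: "READY" | "NOT_READY";
  route: string;
}

function routeTaskCheck(o: CheckOutcome): Routing {
  // Any "not met" requirement is an automatic NOT_READY (Phase 2 verdict rules).
  const anyNotMet = o.requirements.some((s) => s === "not met");

  // Assumption: an unsatisfied user is treated like any other fixable issue.
  if (anyNotMet || !o.shippable || !o.userSatisfied) {
    return {
      verdict: "NOT_READY",
      route: o.needsNewTask ? "/cbp-task-create" : "/cbp-round-input",
    };
  }

  if (o.unapprovedFiles.length > 0) {
    return { verdict: "NOT_READY", route: "Approve files, re-run `/cbp-task-check`" };
  }

  // The delivered work is READY; a detected divergence concerns future work,
  // so it is routed to checkpoint replanning instead of a fix round.
  return {
    verdict: "READY",
    route: o.scopeDivergenceDetected ? "/cbp-checkpoint-update" : "/cbp-task-testing",
  };
}

// Usage: complete work whose review surfaced new scope for upcoming tasks.
const example: CheckOutcome = {
  requirements: ["met", "met"],
  shippable: true,
  userSatisfied: true,
  scopeDivergenceDetected: true,
  unapprovedFiles: [],
  needsNewTask: false,
};
console.log(routeTaskCheck(example));
```

Keeping the divergence check last mirrors the rationale in Phase 10: divergence never downgrades a correct task to NOT_READY, it only changes where the READY verdict is routed.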
package/templates/skills/cbp-round-check/SKILL.md
@@ -1,132 +0,0 @@
- ---
- name: cbp-round-check
- description: Run automated checks standalone for the current round
- effort: low
- ---
-
- <!-- Re-read this file before executing. Do not rely on memory. -->
-
- ## Kind Detection
-
- Inspect the resolved identifier from argument parsing to determine the task kind:
-
- | Identifier shape | KIND |
- |-----------------|------|
- | `{task}-{round}` (2-segment, e.g. `45-2`) | `standalone` |
- | `{chk}-{task}-{round}` (3-segment, e.g. `141-3-1`) | `checkpoint` |
- | _(empty / free-text)_ | Check `get_current_standalone_task` first; if found → `standalone`. Else → `checkpoint` via `get_current_task`. (Kind-detection is MCP-unavoidable — no identifier yet means no local path to probe; subsequent operations are local-first per the rows below.) |
-
- Set `KIND` for the rest of this skill. MCP tool names vary by KIND:
-
- | Operation | `checkpoint` KIND | `standalone` KIND |
- |-----------|------------------|-------------------|
- | Get task | local state (break-glass: `get_current_task`) | `get_current_standalone_task(repo_id)` |
- | Get rounds | local state (break-glass: `get_rounds`) | `get_standalone_rounds(standalone_task_id)` |
- | Add round | `add_round(task_id, ...)` | `add_standalone_round(standalone_task_id, ...)` |
- | Update round | `update_round(round_id, ...)` | `update_standalone_round(standalone_round_id, ...)` |
- | Complete round | `complete_round(round_id, duration_minutes?)` | `complete_standalone_round(standalone_round_id, duration_minutes?, caller_worktree_id)` ⚠️ `caller_worktree_id` is REQUIRED for standalone |
- | Update task | `update_task(task_id, ...)` | `update_standalone_task(standalone_task_id, ...)` |
-
- # Round Check Command
-
- Run automated checks independently with mandatory execution. Updates round QA. Hard fails if mandatory checks (gate6/lint/typecheck/tests) fail.
-
- ## Instructions
-
- ### Step 1: Get Current Round
-
- Use Kind Detection above to set KIND. Then:
-
- - **checkpoint KIND**: Read `.codebyplan/state/checkpoints/<checkpointId>/tasks/<taskId>.json` (local-first) to find active task, then read `.codebyplan/state/checkpoints/<checkpointId>/tasks/<taskId>/rounds/<roundId>.json` to find the in-progress round. If missing/stale, run `npx codebyplan sync` once and re-read. Break-glass fallback: MCP `get_current_task(repo_id)` + `get_rounds(task_id)` when state dir is absent and sync fails.
- - **standalone KIND**: MCP `get_current_standalone_task(repo_id)` to find active task, then `get_standalone_rounds(standalone_task_id)` to find the in-progress round. (Standalone KIND still uses MCP until a later task.)
-
- ### Step 2: Run Core Check Matrix
-
- From the repo root, run:
-
- ```bash
- codebyplan check --scope round --json
- ```
-
- Capture the JSON output. The runner is **whole-repo + baseline**: it runs `turbo run lint|typecheck|test` across every package and diffs each per-package result against the committed `.check-baseline.json`, so a pre-existing failure in an unrelated package does NOT fail the check — only a NEW failure does. The result shape is:
-
- ```json
- {
-   "results": [
-     {"check": "gate6"|"lint"|"typecheck"|"tests"|"audit", "status": "pass"|"fail"|"skipped",
-      "exit_code": number|null, "command": string, "stdout": string, "stderr": string,
-      "executed": boolean, "new_failures"?: string[]}
-   ],
-   "any_failed": boolean,
-   "hard_fail_checks": [ ...names of checks that FAILED ]
- }
- ```
-
- Five checks run in order: `gate6` (sibling-identity parity — `node scripts/check-sibling-identity.mjs`), `lint`, `typecheck`, `tests`, `audit`. For the baselined checks (`lint`/`typecheck`/`tests`) `new_failures` lists the packages that newly fail (not in the baseline); `status` is `pass` when `new_failures` is empty **even if the underlying command exited non-zero** (those failures are pre-existing/baselined). `audit.new_failures` lists new GHSA advisory ids not in the allowlist. **`gate6` is ALWAYS hard-fail — it is never baselined**; its `new_failures` field is omitted (absent/`undefined` in the JSON, not `null`), and a sibling-parity divergence fails the round regardless of the baseline.
-
- `hard_fail_checks` is dynamic — it lists only the checks that failed (`[]` when all pass; e.g. `["gate6"]` or `["typecheck","tests"]`), drawn from `results[].check`. The hard-fail checks for `--scope round` are `gate6`, `lint`, `typecheck`, and `tests` (`audit` is `--scope task` only). If `any_failed === true` (equivalently, `hard_fail_checks` is non-empty), this is a **hard fail** — surface each failing result's `stdout`/`stderr` (and `new_failures`) and stop.
-
- ### Step 3: Execute Conditional Checks
-
- | Check | Command | Condition |
- |-------|---------|-----------|
- | **A11y** | Static check (aria, alt, focus) | UI files changed |
- | **API Health** | `curl -s -o /dev/null -w "%{http_code}" http://localhost:{PORT}/` | API routes changed |
- | **Visual** | Visual check flow (page-map + visual-check) | UI work + dev server running |
-
- ### Step 4: Analyze Output
-
- Scan each runner result's `stdout`/`stderr` for:
- - **Warnings** (not just errors)
- - **Deprecation notices** (`grep -i "deprecat"` in output)
- - **Console.log in changed files**: `grep -rn "console\.\(log\|debug\|info\)" {changed_files}` (exclude tests)
- - **Bundle size warnings**
-
- ### Step 5: Save QA Results
-
- Update round QA:
- - **checkpoint KIND**: `codebyplan round update --id <round_id> --task-id <task_id> --checkpoint-id <checkpoint_id> --qa '<json>'` (CLI write-through: local state file + REST). Break-glass fallback: MCP `update_round(round_id, qa: ...)` when the CLI is unavailable.
- - **standalone KIND**: MCP `update_standalone_round(standalone_round_id, qa: ...)`. (Standalone KIND still uses MCP until a later task.)
-
- Map each runner result entry to a QA item:
-
- ```json
- {
-   "items": [
-     {"type": "auto", "check": "gate6", "status": "pass", "ran_at": "...", "notes": null, "executed": true},
-     {"type": "auto", "check": "lint", "status": "pass", "ran_at": "...", "notes": null, "executed": true},
-     {"type": "auto", "check": "typecheck", "status": "fail", "ran_at": "...", "notes": "1 new failing package", "executed": true},
-     {"type": "auto", "check": "tests", "status": "pass", "ran_at": "...", "notes": "no new failures (baselined)", "executed": true}
-   ]
- }
- ```
-
- ### Step 6: Show Results
-
- ```
- ## Round Check Results
-
- | Check | Status | Executed | Notes |
- |-------|--------|----------|-------|
- | gate6 | pass | yes | sibling-identity OK |
- | lint | pass | yes | - |
- | typecheck | fail | yes | 1 new failing package |
- | tests | pass | yes | no new failures (baselined) |
- | A11y | pass | yes | - |
- | Visual| pass | yes | screenshots saved |
-
- **Result**: [N] passed, [N] failed, [N] skipped
- **Hard fail**: [yes/no]
- ```
-
- If hard fail: `Mandatory checks failed. Fix issues before continuing.`
- If soft failures only: `Run /cbp-round-start to trigger auto-fix, or fix manually.`
-
- ## Integration
-
- - **Reads (checkpoint KIND)**: `.codebyplan/state/checkpoints/<id>.json`, `checkpoints/<id>/tasks/<id>.json`, `checkpoints/<id>/tasks/<id>/rounds/<id>.json` (local-first; run `npx codebyplan sync` if missing; break-glass: MCP `get_current_task` / `get_rounds`)
- - **Reads (standalone KIND)**: MCP `get_current_standalone_task` / `get_standalone_rounds` (standalone KIND still uses MCP until a later task)
- - **Writes (checkpoint KIND)**: `codebyplan round update` (qa field). Break-glass: MCP `update_round`.
- - **Writes (standalone KIND)**: MCP `update_standalone_round` (qa field). (Standalone KIND still uses MCP until a later task.)
- - **Runner**: `codebyplan check --scope round --json` (whole-repo + baseline via `turbo run`; runs gate6 + lint + typecheck + tests; `--files` is accepted but ignored in whole-repo mode)
- - **Standalone**: Can be run independently at any time
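The Kind Detection table and the Step 2 and Step 5 handling of the `codebyplan check --scope round --json` report in the removed `cbp-round-check` skill can be modelled in TypeScript roughly as follows. The types transcribe the JSON shape documented in Step 2; the function names, the `"resolve_via_mcp"` sentinel, and the wording of the `notes` field are hypothetical. The sketch only consumes a report that has already been produced: it does not invoke the CLI or any MCP tool.

```typescript
// Sketch of consuming `codebyplan check --scope round --json` output.
// Types follow the shape documented in Step 2; function names, the
// "resolve_via_mcp" sentinel, and the notes wording are hypothetical.

type Kind = "standalone" | "checkpoint" | "resolve_via_mcp";

// Kind Detection: 2 numeric segments -> standalone, 3 -> checkpoint.
// Empty or free-text input falls back to the MCP lookups, not modelled here.
function detectKind(identifier: string): Kind {
  const segments = identifier.trim().split("-");
  const allNumeric = segments.every((s) => /^\d+$/.test(s));
  if (allNumeric && segments.length === 2) return "standalone";
  if (allNumeric && segments.length === 3) return "checkpoint";
  return "resolve_via_mcp";
}

type CheckName = "gate6" | "lint" | "typecheck" | "tests" | "audit";
type CheckStatus = "pass" | "fail" | "skipped";

interface CheckResult {
  check: CheckName;
  status: CheckStatus;
  exit_code: number | null;
  command: string;
  stdout: string;
  stderr: string;
  executed: boolean;
  new_failures?: string[]; // omitted for gate6, which is never baselined
}

interface CheckReport {
  results: CheckResult[];
  any_failed: boolean;
  hard_fail_checks: string[];
}

interface QaItem {
  type: "auto";
  check: CheckName;
  status: CheckStatus;
  ran_at: string;
  notes: string | null;
  executed: boolean;
}

// Step 2: any_failed and a non-empty hard_fail_checks are documented as equivalent.
function isHardFail(report: CheckReport): boolean {
  return report.any_failed || report.hard_fail_checks.length > 0;
}

// Step 5: one auto QA item per runner result. A baselined check can report
// "pass" with a non-zero exit code, so status is copied, never re-derived.
function toQaItems(report: CheckReport, ranAt: string): QaItem[] {
  return report.results.map(
    (r): QaItem => ({
      type: "auto",
      check: r.check,
      status: r.status,
      ran_at: ranAt,
      notes:
        r.new_failures && r.new_failures.length > 0
          ? `${r.new_failures.length} new failure(s): ${r.new_failures.join(", ")}`
          : null,
      executed: r.executed,
    }),
  );
}

// Usage
console.log(detectKind("141-3-1")); // logs: checkpoint
console.log(detectKind("45-2")); // logs: standalone
```

Copying `status` instead of recomputing it from `exit_code` matters here: under the whole-repo baseline, a non-zero exit is expected whenever an unrelated package already fails, and only `new_failures` distinguishes a regression.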
package/templates/skills/cbp-round-end/SKILL.md
@@ -1,173 +0,0 @@
- ---
- name: cbp-round-end
- description: Summary wrap-up after testing phase completes
- effort: high
- ---
-
- # Round End Command
-
- Summary phase — presents what was done, then runs code quality review to catch bugs and logic errors that automated checks miss.
-
- **Inline-fallback for any spawn failure**: when `cbp-improve-round` (or any peer agent) fails to spawn, the orchestrator falls through to an inline procedure that produces equivalent (lower-fidelity but valid) output. The contract: detect failure class → record in `round.context.improve_round_findings.spawn_failure` → walk the agent's Phase checklist inline → continue the skill. Same procedure for every failure class (org/billing, monthly Agent cap, provider 5xx, rate limit, context overflow, tool not available). Pre-emptive skip applies when the same class fired on the prior round.
-
- See `reference/inline-fallback.md` for full trigger table, procedure, and coverage list.
-
- ## Pipeline
-
- ```
- /cbp-round-execute → /cbp-round-end → [code review + auto-apply in-scope] → /cbp-round-update
- ```
-
- ## Identifier Notation
-
- This skill operates on the **active** task/round resolved via MCP `get_current_task` / `get_rounds` and does not accept a positional identifier argument. Canonical chk-task-round notation is defined in `cbp-round-start` Step 0 "CHK / TASK / ROUND Identifier Notation Vocabulary".
-
- ## Instructions
-
- ### Step 1: Get Current Task and Round
-
- Read `.codebyplan/state/checkpoints/<checkpointId>/tasks/<taskId>.json` (local-first) to find the active task. If missing/stale, run `npx codebyplan sync` once and re-read. Break-glass fallback: MCP `get_current_task` when the state dir is absent and sync fails (daemon-dead + CLI-unavailable).
-
- Read `.codebyplan/state/checkpoints/<checkpointId>/tasks/<taskId>/rounds/<roundId>.json` (local-first) to find the in-progress round. Same sync / break-glass pattern (MCP `get_rounds` as fallback).
-
- Load round context with all outputs (executor_output, testing_qa_output, reviewer_output).
-
- ### Step 2: Collect Files Changed
-
- Collect all files changed during this round from:
-
- - Work executor output
- - `git diff --name-status HEAD` for final state
-
- Build the files list with approval status:
-
- ```json
- [
-   {
-     "path": "src/file.ts",
-     "action": "modified",
-     "claude_approved": true,
-     "user_approved": false
-   }
- ]
- ```
-
- **claude_approved**: `true` if cbp-testing-qa-agent passed for this file. `false` if issues remain.
- **user_approved**: Always `false` initially. User approves via git staging or web UI.
-
- ### Step 3: Collect QA Results
-
- **No QA runs here** — all QA was already executed by per-wave `cbp-testing-qa-agent` inside `/cbp-round-execute` Step 5.
-
- #### 3a — Collect items from agent outputs
-
- Collect from round context:
-
- - **Auto items**: from `testing_qa_output.auto_qa.items`
- - **Default items**: from `testing_qa_output.default_checklist.items`
-
- Merge with previous rounds (supersede items for re-modified files, preserve verified items).
-
- ### Step 4: Update Task Files and QA
-
- - **Round files + QA**: `codebyplan round update --id <round-id> --task-id <uuid> --checkpoint-id <uuid> --files-changed <json> --qa <json>` (CLI write-through: local state at `.codebyplan/state/checkpoints/<checkpointId>/tasks/<taskId>/rounds/<roundId>.json` + REST). Break-glass fallback: MCP `update_round` when the CLI is unavailable.
- - **Task files_changed merge**: `codebyplan task update --id <task-id> --checkpoint-id <uuid> --files-changed <json>` (CLI write-through: local state at `.codebyplan/state/checkpoints/<checkpointId>/tasks/<taskId>.json` + REST). Break-glass fallback: MCP `update_task` when the CLI is unavailable.
- - **Task QA aggregated**: `codebyplan task update --id <task-id> --checkpoint-id <uuid> --qa <json>` (same CLI write-through). Break-glass: MCP `update_task`.
-
- ### Step 5: Present Summary
-
- ```
- ## Round [N] Complete - Ready for Review
-
- ### Work Done
- [Brief summary from executor_output]
-
- ### Files Changed ([N] files, [N] need approval)
- | File | Action | Claude | User |
- |------|--------|--------|------|
- | src/file.ts | modified | approved | pending |
-
- ### Auto Checks
- | Check | Status |
- |-------|--------|
- | Build | pass |
- | Lint | pass |
- | Types | pass |
- | Tests | pass/skipped |
- ```
-
- ### Step 6: Spawn Code Quality Review
-
- Spawn `cbp-improve-round` agent via Agent tool with:
-
- ```yaml
- input:
-   repo_id: [from config]
-   task: {id, title, requirements, context}
-   round: {id, number, requirements, files_changed, context}
-   project_path: [working directory]
- ```
-
- Wait for agent to complete. If the spawn fails for any reason, apply the inline-fallback procedure documented in `reference/inline-fallback.md` (record `round.context.improve_round_findings.spawn_failure`, walk the agent's Phase checklist inline, continue the skill).
-
- ### Step 7: Present Findings
-
- **Baseline-regression blocking gate**: before presenting code-review findings, check `round.context.frontend_ui_review.findings[]` for any entry with `category: 'baseline_regression'` (any severity). When one or more such findings exist, surface them as a BLOCKING decision that MUST be resolved before routing to `/cbp-round-update`:
-
- - Present each regression: screenshot path, `baseline_diff_pct`, affected page/screen.
- - Ask the user via AskUserQuestion to choose:
-   - **(a) Treat as regression** — add a fix-round to address the visual change, OR
-   - **(b) Accept the new baseline** — run `pnpm exec playwright test --update-snapshots` in `apps/{app}` and commit the updated baselines.
- - Do NOT route to `/cbp-round-update` until the decision is recorded. Baselines are NEVER auto-accepted.
-
- `rendered_visual` critical findings from `round.context.frontend_ui_review.findings[]` are surfaced in the normal findings presentation below (not as a separate gate).
-
- **If `status: 'no_findings'`:** show `### Code Review\nNo issues found. Code looks good.` and skip to Step 8.
-
- **If findings exist**, present them grouped by severity (table + per-finding details).
-
- **Under `auto_loop_mode === true`**: do NOT auto-apply here — Step 8's auto-loop path accepts all findings into `improve_round_findings[]` and defers the fixes to the next loop round. Skip straight to Step 8.
-
- **Manual mode**: **auto-apply all in-scope findings inline**. A finding is *in-scope* when every file it references is within the round's `files_changed[]`. The round-end orchestrator (main context — it has Edit/Write) applies these fixes directly; the `cbp-improve-round` agent stays read-only/advisory and never writes. Record each applied fix in `round.context.inline_fix_log` (findings indices, rationale, `fixes[]`, applied_at). After applying, re-run the verification scoped to the modified files (hook syntax check for `.sh`; `cbp-testing-qa-agent` for code) per `reference/findings-presentation.md`; if it fails, do NOT record the fix — treat the finding as out-of-scope instead. Findings that reference files OUTSIDE `files_changed[]` are **out-of-scope** — do NOT apply them; save them to `improve_round_findings[]` so Step 8 routes them to `/cbp-round-input` or a new task. There is no findings-decision AskUserQuestion — the round was already approved at the `/cbp-round-execute` permission prompt. The baseline-regression gate above is the ONLY user decision in this step.
132
-
133
- Example tables and the in-scope/out-of-scope classification: see `reference/findings-presentation.md`.
134
-
135
- ### Step 8: Route Based on Decisions
136
-
137
- **If `round.context.auto_loop_mode === true`** (auto-loop active):
138
-
139
- - Auto-accept ALL findings into `improve_round_findings[]` regardless of severity (the user opted into the loop).
140
- - Skip the polish-spiral stop-gate (auto-loop has its own cap-exhausted termination).
141
- - Skip Step 7's inline auto-apply (findings are deferred to the next loop round, not applied this round).
142
- - Save findings via `update_round` exactly as in manual mode.
143
- - Auto-trigger `/cbp-round-update` immediately. round-update triages the round and either routes to `/cbp-round-input` (spawn another round) or **directs the user to run** `/cbp-round-complete` on a clean exit — see cbp-round-update SKILL.md Step 2/3.
144
-
145
- **Else (manual mode — flag absent or false):**
146
-
147
- Step 7 already auto-applied in-scope findings and logged them to `round.context.inline_fix_log`. Now record any out-of-scope findings and route:
148
-
149
- 1. **Polish-spiral stop-gate** (round 2+ only): if this is round 2 or later AND the prior round also ended with code-review fixes, surface a one-line stop-gate via AskUserQuestion — *defer remaining polish to a follow-up task* vs *continue with another round*. This is a genuine user decision about scope (it guards against endless low-value polish loops), not a flow-control prompt. Skip on round 1.
150
- 2. Save out-of-scope findings (those NOT auto-applied in Step 7) to round context via `codebyplan round update --id <round-id> --task-id <uuid> --checkpoint-id <uuid> --context <json>` (break-glass: MCP `update_round`):
151
- ```json
152
- {
153
- "context": {
154
- "improve_round_findings": [out-of-scope findings]
155
- }
156
- }
157
- ```
158
- 3. Auto-trigger `/cbp-round-update`. round-update triages the round: if out-of-scope findings (or a hard-fail) remain it routes to `/cbp-round-input` (which picks up the findings from round context and includes them in the new round's requirements automatically); if the round is clean it **directs the user to run** `/cbp-round-complete` (the user-invoked finalizer that reconciles the user's `git add`s and completes the round).
159
-
160
- ## Key Rules
161
-
162
- - Claude NEVER git adds files — user approval is via git staging at `/cbp-round-complete`
163
- - Auto-triggers `/cbp-round-update` after findings are handled
164
- - `/cbp-round-end` is auto-triggered by `/cbp-round-execute` (user does not call it directly)
165
- - In-scope findings are **auto-applied inline** by the round-end orchestrator (the round was already approved at the `/cbp-round-execute` permission); out-of-scope findings route to `/cbp-round-input`. `cbp-improve-round` stays read-only/advisory. Baseline-regression accept (Step 7 gate) stays a user decision — baselines are NEVER auto-accepted.
166
-
167
- ## Integration
168
-
169
- - **Triggered by**: `/cbp-round-execute` (auto, after all waves + testing complete)
170
- - **Reads**: `.codebyplan/state/checkpoints/<id>/tasks/<id>.json`, `checkpoints/<id>/tasks/<id>/rounds/<id>.json` (local-first; `npx codebyplan sync` on miss; MCP `get_current_task` / `get_rounds` as break-glass)
171
- - **Writes**: `codebyplan round update` (Step 4 round files/QA, Step 8 findings; break-glass: MCP `update_round`), `codebyplan task update` (Step 4 files_changed + QA aggregated; break-glass: MCP `update_task`)
172
- - **Spawns**: `cbp-improve-round` (code quality review)
173
- - **Triggers**: `/cbp-round-update` (auto, after findings handled)
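The Step 7 and Step 8 classification in the hunk above can be sketched in TypeScript. This is a minimal sketch under stated assumptions, not the package's implementation. The `Finding` shape (with a `files` array) and the function names are hypothetical. The sketch applies the documented rule that a finding is in-scope only when every file it references is in the round's `files_changed[]`. It also separates `baseline_regression` entries from the UI review as the blocking gate, and it defers everything under auto-loop mode.

```typescript
// Hypothetical finding shape: the skill only says a finding references files
// and carries a severity and category. Field names are assumptions.
interface Finding {
  severity: "critical" | "high" | "medium" | "low";
  category: string;
  files: string[];
  description: string;
}

interface RoundEndPartition {
  blockingRegressions: Finding[]; // user must decide before /cbp-round-update
  inScope: Finding[]; // auto-applied inline (manual mode only)
  outOfScope: Finding[]; // saved to improve_round_findings[]
}

// In-scope only when EVERY referenced file is in files_changed[].
// Assumption: a finding with no file reference cannot be scope-checked,
// so this sketch treats it as out-of-scope.
function isInScope(finding: Finding, filesChanged: ReadonlySet<string>): boolean {
  return finding.files.length > 0 && finding.files.every((f) => filesChanged.has(f));
}

function partitionFindings(
  reviewFindings: Finding[],
  uiFindings: Finding[],
  filesChanged: string[],
  autoLoopMode: boolean,
): RoundEndPartition {
  const changed = new Set(filesChanged);
  // Baseline regressions block routing at any severity and are never auto-accepted.
  // Assumption: the gate is evaluated in both modes.
  const blockingRegressions = uiFindings.filter((f) => f.category === "baseline_regression");
  if (autoLoopMode) {
    // Auto-loop: nothing is applied inline; all findings go to the next loop round.
    return { blockingRegressions, inScope: [], outOfScope: [...reviewFindings] };
  }
  const inScope: Finding[] = [];
  const outOfScope: Finding[] = [];
  for (const finding of reviewFindings) {
    (isInScope(finding, changed) ? inScope : outOfScope).push(finding);
  }
  return { blockingRegressions, inScope, outOfScope };
}
```

In this sketch, the caller would move a finding from `inScope` to `outOfScope` when the scoped re-verification after applying its fix fails. This matches the rule that a failed fix is not recorded and is treated as out-of-scope instead.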
@@ -1,35 +0,0 @@
1
- # Inline-Fallback for Any Spawn Failure
2
-
3
- When `improve-round` (or any agent spawned by this or peer skills) fails to spawn, the orchestrator falls through to an inline procedure that produces equivalent (lower-fidelity but valid) output. Same contract for every failure class — no special-casing per class.
4
-
5
- ## Trigger conditions (any one)
6
-
7
- | Failure class | Detection signal |
8
- | ------------------------- | ------------------------------------------------------------------------------------- |
9
- | Org/billing limit | `API Error: Extra usage is required for 1M context` (the original Proposal U trigger) |
10
- | Monthly Agent usage cap | `API Error: This conversation has reached the monthly Agent usage limit` or similar |
11
- | Provider 5xx | Spawn returns `API Error 500` / `502` / `503` — transient or sustained |
12
- | Rate limit | `API Error 429` with retry hint |
13
- | Context overflow at spawn | Spawn returns `Context window exceeded` before agent can run |
14
- | Tool not available | Skill caller's tool surface lacks Agent (rare — only in nested-agent contexts) |
15
-
16
- ## Fallback procedure (uniform across all triggers)
17
-
18
- 1. Note the failure: `agent_spawned: false`, `skip_reason: "<one-line failure class>"`. Save to `round.context.improve_round_findings.spawn_failure = { class, error_message, decided_at }`.
19
- 2. Perform the agent's analysis inline using whatever tools the orchestrator has (typically `Read` + `Bash` grep/find/head + `Glob`/`Grep`). Use the agent's documented Phase checklist as the script — agents are essentially curated checklists; following them inline produces equivalent (lower-fidelity but valid) output.
20
- 3. Record findings in the same shape the agent would have returned (`findings[]` array with `severity`, `category`, `file`, `description`, `suggested_fix`). Mark each with `mode: 'inline_fallback'` so analytics can distinguish.
21
- 4. Continue the skill — do NOT abort the round on spawn failure. The fallback is intended to keep the pipeline moving; aborting would force the user to manually re-run when the same failure will recur.
22
-
23
- **Pre-emptive skip**: when the same failure class fired in the previous round of the same task, skip the spawn attempt entirely and go straight to inline. This avoids one wasted API call per round during a sustained outage.
24
-
25
- ## Coverage
26
-
27
- This fallback applies to:
28
-
29
- - `improve-round` spawned by `/cbp-round-end` (Step 6) — original case
30
- - `task-planner` spawned by `/cbp-round-start` Step 7 — orchestrator falls back to inline planning using the planner's Phase checklist
31
- - `testing-qa-agent` spawned by `/cbp-round-execute` Step 5 (per-wave) — orchestrator runs build/lint/types/tests inline via Bash and aggregates results in the agent's output shape
32
- - `task-check` spawned by `/cbp-task-check` skill — orchestrator walks the agent's verdict checklist inline
33
- - `improve-claude` spawned by its caller (when re-enabled) — orchestrator walks the agent's Phase 0-7 inline
34
-
35
- Each spawning skill carries a brief "Inline fallback" section that points back to this contract; this file is the canonical reference.
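The trigger table and the `spawn_failure` record in the hunk above can be sketched as a classifier over the spawn error string. The class names are taken from the execution-fallback record later in this diff. The regular expressions are assumptions about message wording, since the table itself says "or similar". The "Tool not available" row is left out because it is detected from the caller's tool surface, not from an error string. The function names are hypothetical.

```typescript
// Class names as listed in round.context.executor_findings.spawn_failure.
type SpawnFailureClass =
  | "billing_limit"
  | "monthly_agent_usage_limit"
  | "provider_5xx"
  | "rate_limit_429"
  | "context_overflow_at_spawn";

interface SpawnFailureRecord {
  class: SpawnFailureClass;
  error_message: string; // verbatim error text
  decided_at: string; // ISO timestamp
}

// Ordered detection rules keyed on the documented signals.
// The exact patterns are assumptions; real messages may differ.
const RULES: ReadonlyArray<[RegExp, SpawnFailureClass]> = [
  [/Extra usage is required/i, "billing_limit"],
  [/monthly Agent usage limit/i, "monthly_agent_usage_limit"],
  [/API Error:?\s*(500|502|503)\b/, "provider_5xx"],
  [/API Error:?\s*429\b/, "rate_limit_429"],
  [/Context window exceeded/i, "context_overflow_at_spawn"],
];

function classifySpawnFailure(errorMessage: string): SpawnFailureClass | null {
  for (const [pattern, cls] of RULES) {
    if (pattern.test(errorMessage)) return cls;
  }
  return null; // not one of the documented string-detectable triggers
}

// Builds the record saved under round.context.<agent>_findings.spawn_failure.
function recordSpawnFailure(errorMessage: string, now: Date = new Date()): SpawnFailureRecord | null {
  const cls = classifySpawnFailure(errorMessage);
  if (cls === null) return null;
  return { class: cls, error_message: errorMessage, decided_at: now.toISOString() };
}
```

Rule order matters in this sketch. The billing and monthly-cap messages are checked first, so a message that happens to contain a status code is still attributed to the more specific class.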
@@ -1,55 +0,0 @@
1
- # Inline-fallback procedures
2
-
3
- When `round-executor` or `testing-qa-agent` cannot be spawned (env limits, monthly cap, 5xx, rate limit, context overflow), the orchestrator falls through to an inline procedure that walks the agent's Phase checklist using its own tools.
4
-
5
- The two fallback modes are documented separately so the SKILL.md stubs can link the right section.
6
-
7
- ## Execution fallback (round-executor spawn failed)
8
-
9
- Triggered when the executor agent spawn returns one of the failure classes documented in `agent-spawn-failure-fallback.md`. Procedure:
10
-
11
- 1. Detect failure class from error string. Record:
12
- ```yaml
13
- round.context.executor_findings.spawn_failure:
14
- class: "monthly_agent_usage_limit" | "provider_5xx" | "rate_limit_429" | "context_overflow_at_spawn" | "billing_limit"
15
- error_message: "<verbatim>"
16
- decided_at: "<ISO>"
17
- ```
18
- 2. For `.claude/`-only file sets, fall through to the 3-INLINE branch in `../SKILL.md` Step 3 (orchestrator routes per file-routing.md to the matching build-cc skill or direct Edit).
19
- 3. For non-`.claude/` file sets, walk `agents/round-executor.md` Phase 1–4 inline using Read / Edit / Write / Bash / Glob / Grep. Phase 3 (Implementation) is the load-bearing phase: apply each `files_to_modify[]` deliverable in order, respecting wave boundaries when wave mode is active.
20
- 4. Populate the executor's output contract with `mode: 'inline_fallback'` so analytics distinguishes.
21
- 5. Pre-emptive skip on repeat: if `prior_round.context.executor_findings.spawn_failure.class === current_class`, skip the spawn attempt entirely and go straight to inline.
22
-
23
- ## Validation fallback (testing-qa-agent spawn failed OR claude_only profile)
24
-
25
- Triggered when testing-qa-agent spawn returns a failure class, OR when the resolved profile is `claude_only` (in which case the agent should not have been spawned at all). Procedure:
26
-
27
- 1. Detect failure class. Record:
28
- ```yaml
29
- round.context.testing_qa_findings.spawn_failure:
30
- class: "<failure_class>"
31
- error_message: "<verbatim>"
32
- decided_at: "<ISO>"
33
- ```
34
- 2. Apply the profile gate matrix from `agents/testing-qa-agent.md` Phase 3 to determine which checks are in-scope:
35
- - `claude_only`: only hook bash syntax (`bash -n <hook>`) + skill structure validation (line counts, scope marker, /cbp-* legacy notation absent)
36
- - `web`: skip desktop + backend
37
- - `backend`: skip web + desktop
38
- - `desktop`: skip web + backend
39
- - `full_matrix`: all
40
- - `cross_app`: union of touched apps
41
- 3. Walk `agents/testing-qa-agent.md` Phase 1 (Setup) + Phase 2 (Discovery) + Phase 3 (Mandatory Automated Testing) inline using Read / Grep / Bash. Aggregate per-check results.
42
- 4. Populate `testing_qa_output` shape with `mode: 'inline_fallback'`. For `claude_only` specifically, use `mode: 'inline_synthesised_for_claude_only_profile'` (the agent was never expected to spawn — this isn't a fallback, it's the documented happy path).
43
- 5. Pre-emptive skip on repeat: if `prior_round.context.testing_qa_findings.spawn_failure.class === current_class`, skip the spawn attempt entirely.
44
-
45
- ## Pre-emptive skip rule
46
-
47
- Per `agent-spawn-failure-fallback.md` item 5: when the same failure class fired in the previous round of the same task, skip the spawn attempt entirely and go straight to inline. This avoids one wasted API call per round during a sustained outage.
48
-
49
- ## Pairs With
50
-
51
- - `../SKILL.md` — points at this reference for procedural detail
52
- - `agents/round-executor.md` — execution-fallback target agent
53
- - `agents/testing-qa-agent.md` — validation-fallback target agent + Phase 3 profile gate matrix
54
- - `rules/agent-spawn-failure-fallback.md` — required-coverage table; canonical failure classes
55
- - `rules/testing-profile.md` — claude_only profile detail; cross-app union semantics
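The Phase 3 profile gate matrix and the two `mode` labels in the validation fallback above can be sketched as a lookup. This is a sketch under assumptions. The `CheckGroup` names are hypothetical, including `claude_structure`, which stands for the `bash -n` hook check plus skill structure validation. The sketch also assumes web, backend and desktop are the only app surfaces.

```typescript
type Profile = "claude_only" | "web" | "backend" | "desktop" | "full_matrix" | "cross_app";
type AppSurface = "web" | "backend" | "desktop";
// Hypothetical check-group names for illustration.
type CheckGroup = AppSurface | "claude_structure";

const ALL_SURFACES: AppSurface[] = ["web", "backend", "desktop"];

function inScopeChecks(profile: Profile, touchedApps: AppSurface[] = []): CheckGroup[] {
  switch (profile) {
    case "claude_only":
      // Only hook syntax + skill structure validation.
      return ["claude_structure"];
    case "web":
    case "backend":
    case "desktop":
      // Each single-surface profile skips the other two surfaces.
      return [profile];
    case "full_matrix":
      return [...ALL_SURFACES];
    case "cross_app":
      // Union of touched apps, deduplicated, in a stable order.
      return ALL_SURFACES.filter((s) => touchedApps.includes(s));
    default: {
      const unreachable: never = profile;
      throw new Error(`unknown profile: ${String(unreachable)}`);
    }
  }
}

// claude_only is the documented happy path (the agent is never spawned),
// so it gets its own mode label rather than 'inline_fallback'.
function fallbackMode(profile: Profile): string {
  return profile === "claude_only" ? "inline_synthesised_for_claude_only_profile" : "inline_fallback";
}
```

Keeping the matrix as a single `switch` over a closed union lets the compiler flag a newly added profile that has no gate entry.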