codebyplan 1.13.52 → 1.13.54
- package/dist/cli.js +3226 -897
- package/package.json +1 -1
- package/templates/agents/cbp-database-agent.md +1 -1
- package/templates/agents/cbp-e2e-maestro.md +1 -1
- package/templates/agents/cbp-e2e-playwright.md +24 -16
- package/templates/agents/cbp-e2e-tauri.md +1 -1
- package/templates/agents/cbp-e2e-vscode.md +1 -1
- package/templates/agents/cbp-e2e-xcuitest.md +1 -1
- package/templates/agents/cbp-improve-claude.md +2 -2
- package/templates/agents/{cbp-round-executor.md → cbp-round-builder.md} +23 -23
- package/templates/agents/{cbp-task-planner.md → cbp-round-planner.md} +26 -25
- package/templates/agents/cbp-security-agent.md +10 -2
- package/templates/agents/cbp-stripe-agent.md +2 -2
- package/templates/agents/cbp-testing-qa-agent.md +34 -20
- package/templates/agents/cbp-verify-reviewer.md +236 -0
- package/templates/context/architecture-map.md +4 -4
- package/templates/context/mcp-docs.md +57 -11
- package/templates/context/testing/e2e.md +9 -9
- package/templates/github-workflows/ci.yml +104 -0
- package/templates/github-workflows/publish.yml +8 -27
- package/templates/github-workflows/release-desktop.yml +215 -0
- package/templates/hooks/cbp-skill-context-guard.sh +1 -1
- package/templates/hooks/cbp-test-hooks.sh +9 -9
- package/templates/hooks/validate-structure-lengths.sh +1 -1
- package/templates/hooks/validate-structure-patterns.sh +1 -1
- package/templates/rules/README.md +1 -2
- package/templates/rules/agent-claim-verification.md +1 -1
- package/templates/rules/context-file-loading.md +10 -10
- package/templates/rules/development-workflow.md +73 -0
- package/templates/rules/e2e-mandatory.md +8 -8
- package/templates/rules/execution-proof.md +70 -0
- package/templates/rules/model-invocation-convention.md +2 -2
- package/templates/rules/parallel-waves.md +11 -11
- package/templates/rules/spawn-failure-is-gate-failure.md +76 -0
- package/templates/rules/task-routing-recommendation.md +1 -1
- package/templates/rules/todo-backend.md +3 -3
- package/templates/rules/two-tier-ci.md +63 -0
- package/templates/settings.project.base.json +15 -11
- package/templates/skills/cbp-build-cc-mode/SKILL.md +1 -1
- package/templates/skills/cbp-build-cc-settings/reference/cbp-permission-policy.md +7 -7
- package/templates/skills/cbp-build-cc-skill/SKILL.md +1 -1
- package/templates/skills/cbp-build-cc-skill/reference/cbp-quality.md +2 -2
- package/templates/skills/cbp-build-cc-skill/reference/fork-eligibility.md +11 -14
- package/templates/skills/cbp-checkpoint-check/SKILL.md +11 -3
- package/templates/skills/cbp-checkpoint-create/SKILL.md +16 -1
- package/templates/skills/cbp-checkpoint-end/SKILL.md +5 -1
- package/templates/skills/cbp-checkpoint-update/SKILL.md +3 -3
- package/templates/skills/cbp-clear-continue/SKILL.md +2 -2
- package/templates/skills/cbp-clear-prep/SKILL.md +3 -3
- package/templates/skills/{cbp-task-complete → cbp-finalize}/SKILL.md +25 -29
- package/templates/skills/{cbp-task-complete → cbp-finalize}/reference/checkpoint-done-branching.md +1 -1
- package/templates/skills/{cbp-task-complete → cbp-finalize}/reference/next-step-heuristic.md +1 -1
- package/templates/skills/cbp-frontend-design/SKILL.md +1 -1
- package/templates/skills/cbp-frontend-ui/SKILL.md +7 -7
- package/templates/skills/cbp-git-commit/SKILL.md +3 -3
- package/templates/skills/cbp-merge-main/SKILL.md +4 -4
- package/templates/skills/{cbp-round-execute → cbp-round-build}/SKILL.md +93 -75
- package/templates/skills/cbp-round-complete/SKILL.md +15 -14
- package/templates/skills/cbp-round-plan/SKILL.md +344 -0
- package/templates/skills/cbp-session-end/SKILL.md +1 -1
- package/templates/skills/cbp-setup-cd/SKILL.md +291 -0
- package/templates/skills/cbp-setup-cd/reference/github-actions-cd.md +231 -0
- package/templates/skills/cbp-setup-ci/SKILL.md +175 -0
- package/templates/skills/cbp-setup-ci/reference/github-actions.md +100 -0
- package/templates/skills/cbp-ship/SKILL.md +21 -0
- package/templates/skills/cbp-ship-main/SKILL.md +3 -2
- package/templates/skills/cbp-standalone-task-check/SKILL.md +10 -9
- package/templates/skills/cbp-standalone-task-complete/SKILL.md +12 -13
- package/templates/skills/cbp-standalone-task-create/SKILL.md +16 -9
- package/templates/skills/cbp-standalone-task-start/SKILL.md +9 -5
- package/templates/skills/cbp-standalone-task-testing/SKILL.md +16 -7
- package/templates/skills/cbp-task-create/SKILL.md +6 -7
- package/templates/skills/cbp-task-start/SKILL.md +8 -8
- package/templates/skills/cbp-todo/SKILL.md +6 -8
- package/templates/skills/cbp-verify/SKILL.md +146 -0
- package/templates/skills/cbp-verify/reference/deterministic-gates.md +114 -0
- package/templates/skills/{cbp-round-end → cbp-verify}/reference/findings-presentation.md +16 -12
- package/templates/skills/cbp-verify/reference/round-scope.md +62 -0
- package/templates/skills/cbp-verify/reference/task-scope.md +71 -0
- package/templates/agents/cbp-improve-round.md +0 -283
- package/templates/agents/cbp-task-check.md +0 -217
- package/templates/skills/cbp-round-check/SKILL.md +0 -132
- package/templates/skills/cbp-round-end/SKILL.md +0 -173
- package/templates/skills/cbp-round-end/reference/inline-fallback.md +0 -35
- package/templates/skills/cbp-round-execute/reference/inline-fallback.md +0 -55
- package/templates/skills/cbp-round-input/SKILL.md +0 -197
- package/templates/skills/cbp-round-start/SKILL.md +0 -261
- package/templates/skills/cbp-round-update/SKILL.md +0 -120
- package/templates/skills/cbp-ship/templates/workflow-eas-submit.yml +0 -53
- package/templates/skills/cbp-ship/templates/workflow-vsce-publish.yml +0 -31
- package/templates/skills/cbp-task-check/SKILL.md +0 -172
- package/templates/skills/cbp-task-testing/SKILL.md +0 -277
@@ -1,6 +1,8 @@
-# Findings Presentation
+# Findings Presentation & Infra Issue Absorption
 
-When `
+When `cbp-verify-reviewer` returns findings, `cbp-verify` Phase 4 presents them grouped by
+severity, then **auto-applies in-scope findings inline** (manual mode) or defers them to the next
+loop round (auto-loop mode). There is no findings-decision prompt.
 
 ## Example output
 
@@ -24,14 +26,16 @@ When `improve-round` returns findings, Step 7 presents them grouped by severity,
 
 ## Auto-apply model (manual mode)
 
-
+`cbp-verify` Phase 4 auto-applies all **in-scope** findings inline — no user prompt. A finding is
+*in-scope* when every file it references is within the round's `files_changed[]`; it is
+*out-of-scope* otherwise.
 
-- **In-scope** → the
-- **Out-of-scope** → saved to `round.context.
+- **In-scope** → the verify orchestrator (main context, has Edit/Write) applies the fix directly via `Edit` / `Write`, re-runs the verification commands (hook syntax check + `cbp-testing-qa-agent` scoped to modified files), and records it in `round.context.inline_fix_log = { findings: [ids], rationale, fixes: [...], applied_at: <ISO> }`. The `cbp-verify-reviewer` agent stays read-only/advisory and never writes.
+- **Out-of-scope** → saved to `round.context.verify_findings[]`; Phase 5 routes them to `/cbp-round-plan` (next round) or a new task per the Infra Issue Absorption Contract below.
 
-The only user decision in
+The only user decision in Phase 4 is the **baseline-regression accept** gate (baselines are NEVER auto-accepted). Under `auto_loop_mode`, Phase 4 does not auto-apply — all findings are accepted into `verify_findings[]` and deferred to the next loop round.
 
-The **Trivial-Resolution Exception** below still governs the deeper bypass cases (skipping executor / testing-qa /
+The **Trivial-Resolution Exception** below still governs the deeper bypass cases (skipping executor / testing-qa / fresh-context review for ≤5-line non-logic corrective rounds); it is referenced by `/cbp-round-build` and `/cbp-verify` (task scope) for infra-issue absorption.
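The in-scope rule in the auto-apply model can be illustrated with a minimal Python sketch. The `Finding` shape and the `partition_findings` name are hypothetical, and the sketch assumes `files_changed[]` has already been reduced to a plain list of paths; nothing here is taken from the `codebyplan` CLI itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding shape; only `id` and `files` matter for scoping."""
    id: int
    files: tuple


def partition_findings(findings, files_changed):
    """Split findings into (in_scope, out_of_scope) lists.

    A finding is in-scope only when EVERY file it references is part of
    the round's files_changed; otherwise it is out-of-scope.
    """
    changed = set(files_changed)
    in_scope, out_of_scope = [], []
    for finding in findings:
        if all(path in changed for path in finding.files):
            in_scope.append(finding)
        else:
            out_of_scope.append(finding)
    return in_scope, out_of_scope
```

Under this reading, a finding that touches one changed file and one untouched file is out-of-scope as a whole, so it would be saved to `verify_findings[]` rather than fixed inline.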
 
 ---
 
@@ -39,7 +43,7 @@ The **Trivial-Resolution Exception** below still governs the deeper bypass cases
 
 ### Resolve-in-Current-Scope by Default
 
-When `/cbp-round-
+When `/cbp-round-build` Step 5 (per-wave `cbp-testing-qa-agent`) or `/cbp-verify` (task scope) surfaces a pre-existing infra-class issue (critical/high CVE, broken ESLint config-load, Playwright env-loading gap, dead CI pipeline, etc.), the default response is **absorb into current scope** — NOT create a standalone task.
 
 Order of preference for routing a finding:
 
@@ -84,10 +88,10 @@ When the trivial-resolution exception qualifies, the orchestrator MAY bypass the
 
 | Stage | Bypass allowed when | Document as |
 |-------|--------------------|-------------|
-| `cbp-round-
+| `cbp-round-builder` | Single-file Edit fully specified by prior reviewer output | `bypass_log.executor: "single-file edit, used direct Edit"` |
 | `cbp-testing-qa-agent` | Edit is non-code (comment, doc, type-annotation) AND existing test coverage protects the area | `bypass_log.testing_qa: "non-code edit, existing tests cover area"` |
-| `cbp-
-| `cbp-
+| `cbp-verify-reviewer` | Diff is ≤5 lines AND no logic changed | `bypass_log.review: "≤5 lines non-logic, skipped"` |
+| `cbp-round-planner` | Path B (the planner's trivial-corrective bypass that keeps repeat fix-rounds cheap) already qualifies | `bypass_log.planner: "Path B trivial-corrective bypass"` |
 
 **ALL four bypasses simultaneously** is acceptable for ≤5-line non-logic corrective edits where every premise was verified by a prior reviewer.
 
@@ -95,7 +99,7 @@ When the trivial-resolution exception qualifies, the orchestrator MAY bypass the
 
 ### Infra-Class Issue Catalog
 
-These categories surface from per-wave `cbp-testing-qa-agent` or from `/cbp-task
+These categories surface from per-wave `cbp-testing-qa-agent` or from `/cbp-verify` (task scope). Default routing for each is in-scope absorption unless genuinely off-axis from the active checkpoint.
 
 | Category | Examples |
 |----------|----------|
@@ -0,0 +1,62 @@
+# Round-Scope Verify
+
+Loaded by `cbp-verify` when `scope=round`. This is the per-round quality pass that runs after
+`/cbp-round-build` finishes execution — the soft tier of `rules/two-tier-ci.md`.
+
+## What round scope verifies
+
+The review window is THIS round's diff only (`round.files_changed` + `git diff` of the round). It
+covers automated checks, fresh-context review spawn, and finished-round triage routing in one
+scope-aware pass.
+
+## Phase mapping (round)
+
+- **Phase 2 — gates**: `codebyplan check --scope round --json`. Baseline-tolerant: only NEW
+  per-package failures fail; `gate6` always hard. The JSON is the verdict.
+- **Phase 3 — proof**: tier from the round's diff (`rules/execution-proof.md`).
+  - Tier 1: the `cbp-e2e-*` specialists already ran inside `cbp-round-build`; here persist
+    `e2e_eligible` / `e2e_outputs` then run `codebyplan e2e verify-round`. Empty gallery /
+    zero-assertion / eligible-skipped → fail.
+  - Tier 2/3: dev-server screenshot or HTTP trace for the round's changed routes/endpoints,
+    committed and proven via `git ls-files --error-unmatch`.
+  - Tier 4 (`claude_only`): deterministic-only path, no reviewer spawn (see
+    `reference/deterministic-gates.md`).
+- **Phase 4 — review**: spawn `cbp-verify-reviewer` with `scope: 'round'`. Spawn failure = HARD
+  GATE FAILURE → STOP + retry directive (`rules/spawn-failure-is-gate-failure.md`). In-scope
+  mechanical findings → orchestrator applies via Edit/Write; blocking findings →
+  `/cbp-round-plan`. A baseline regression surfaced by the reviewer or e2e is a blocking
+  user-accept gate, never auto-accepted.
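The Tier 2/3 proof step relies on `git ls-files --error-unmatch`, which exits non-zero when a path is not tracked. A minimal Python sketch of that check follows. The helper names and the injectable `run` parameter are illustrative assumptions, not part of `cbp-verify`.

```python
import subprocess


def is_tracked(path, repo_dir=".", run=subprocess.run):
    """Return True when git reports `path` as tracked in `repo_dir`.

    `git ls-files --error-unmatch` exits non-zero for an untracked path.
    `run` is injectable (hypothetical convenience) so the check can be
    exercised without a real repository.
    """
    result = run(
        ["git", "ls-files", "--error-unmatch", "--", path],
        cwd=repo_dir,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0


def missing_artifacts(paths, repo_dir=".", run=subprocess.run):
    """List the proof artifacts that git does not track."""
    return [p for p in paths if not is_tracked(p, repo_dir, run)]
```

Note that `git ls-files` consults the index, so this confirms an artifact is tracked; it does not inspect commit history.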
+
+## Phase 5 routing (round)
+
+| Result | Directive |
+|--------|-----------|
+| Any gate / proof / review fail | `Next: /cbp-round-plan` (fix round) |
+| Pass, but more work wanted on the task | `Next: /cbp-round-plan` (another round) |
+| Pass + LAST round + clean | escalate to `scope=task` (re-enter Phase 1) |
+
+"More work wanted" is signalled the same way the old pipeline did — unstaged files at
+`/cbp-round-complete` mean the user wants more on them. cbp-verify does not decide that; it routes
+to the human gate and lets staging speak.
+
+## Phase 6 finalize (round) — hand to the human git-add gate
+
+cbp-verify does NOT complete the round and NEVER `git add`s. On a clean pass it persists the
+`verify_manifest` to round context and routes:
+
+```
+Next: /cbp-round-complete
+```
+
+`/cbp-round-complete` is the separate `ask`-tier, `disable-model-invocation` finalizer: the user
+stages the files they approve (`git add`), the skill reconciles via `codebyplan round
+sync-approvals` and `complete_round`, then routes onward (all files approved → escalate to task
+verify; some withheld → `/cbp-round-plan`). The permission prompt on `/cbp-round-complete` IS the
+human confirmation — do not add an AskUserQuestion in cbp-verify at round scope.
+
+## Writes (round)
+
+`codebyplan round update --id <round_id> --task-id <uuid> --checkpoint-id <uuid> --context '<json>'`
+(merge `verify_manifest` into existing context; the REPLACE contract requires the full object).
+Break-glass: MCP `update_round` (checkpoint KIND) / `update_standalone_round` (standalone KIND) —
+pass `caller_worktree_id` on locked feat rows.
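Because the write follows a REPLACE contract, the caller has to send the whole context object rather than only the new key. The sketch below shows one way to merge `verify_manifest` into the existing context and assemble the command line quoted above. The function name and the sample context keys are hypothetical; only the subcommand and flags come from the text.

```python
import json


def build_round_update_argv(round_id, task_id, checkpoint_id,
                            existing_context, verify_manifest):
    """Build the argv for `codebyplan round update` with a merged context.

    The full object is sent because the update replaces the stored
    context; existing keys are copied so they are not lost.
    """
    context = {**existing_context, "verify_manifest": verify_manifest}
    return [
        "codebyplan", "round", "update",
        "--id", round_id,
        "--task-id", task_id,
        "--checkpoint-id", checkpoint_id,
        "--context", json.dumps(context),
    ]
```

Passing the result as a list to a process launcher avoids hand-quoting the JSON for a shell.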
@@ -0,0 +1,71 @@
+# Task-Scope Verify
+
+Loaded by `cbp-verify` when `scope=task` — reached by escalation from the last clean round, or by
+an explicit `{chk}-{task}` / bare-`{task}` argument. This is the holistic cross-round
+double-check — AI production review plus comprehensive task-level testing in one pass.
+
+## Precondition
+
+All rounds of the task must be `completed`. If any round is `in_progress`, STOP:
+
+```
+## Cannot run task verify
+TASK-[N] has an active round (Round [N]). Finish it first (run /cbp-verify at round scope, then
+/cbp-round-complete).
+```
+
+## What task scope verifies
+
+The review window is the FULL aggregated task diff — all rounds' `files_changed` deduplicated
+(latest action per path wins). Task scope catches what no single round can see: requirements
+traceability, checkpoint-goal alignment, cross-round integration gaps, whole-repo lint/type/test
+regressions, and shippability.
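The aggregation rule ("latest action per path wins") can be sketched in a few lines of Python. This assumes each round's `files_changed` is a list of `{path, action}` entries supplied oldest round first; the function name and the sample action values are hypothetical.

```python
def aggregate_files_changed(rounds):
    """Deduplicate files_changed across rounds; the latest action per path wins.

    `rounds` is a list of per-round files_changed lists, oldest first.
    Each path keeps the position of its first appearance.
    """
    latest = {}
    for files_changed in rounds:
        for entry in files_changed:
            # A later round overwrites the action recorded by an earlier one.
            latest[entry["path"]] = entry["action"]
    return [{"path": path, "action": action} for path, action in latest.items()]
```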
+
+## Phase mapping (task)
+
+- **Phase 2 — gates**: `codebyplan check --scope task --json`. Whole-repo + baseline; only NEW
+  per-package failures fail; `gate6` always hard. This is the cross-package layer invisible to
+  per-round checks (a non-web package edit that slipped past per-round web-only lints surfaces
+  here).
+- **Phase 3 — proof**: aggregate proof across the task diff — every UI surface touched across all
+  rounds must have a committed artifact (`rules/execution-proof.md`). Re-run `codebyplan e2e
+  verify-round` for each round whose `e2e_eligible[]` is non-empty.
+- **Phase 4 — review**: spawn `cbp-verify-reviewer` with `scope: 'task'`. It grades each
+  requirement (`met`/`partially met`/`not met` with `path:line` evidence), checks
+  `checkpoint.goal` alignment, runs the holistic cross-round code review + shippable gate, and
+  surfaces `scope_divergence_candidates`. Spawn failure = HARD GATE FAILURE → STOP + retry
+  (`rules/spawn-failure-is-gate-failure.md`).
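The baseline-tolerant gate semantics in Phase 2 ("only NEW per-package failures fail; `gate6` always hard") can be illustrated as follows. This is a sketch of the rule only: the real verdict is whatever `codebyplan check --json` reports, and the input shape used here (package name mapped to a set of failing check names) is an assumption, since the JSON schema is not shown in this diff.

```python
def gate_verdict(baseline_failures, current_failures, gate6_passed):
    """Return 'pass' or 'fail' under baseline-tolerant rules.

    baseline_failures / current_failures: dict mapping a package name to
    the set of failing check names (hypothetical shape).
    """
    if not gate6_passed:
        # gate6 is always hard: no baseline can excuse it.
        return "fail"
    for package, failing in current_failures.items():
        known = set(baseline_failures.get(package, ()))
        if set(failing) - known:
            # At least one failure is new relative to the baseline.
            return "fail"
    return "pass"
```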
+
+## Phase 6 — the ONE genuine human step
+
+After the deterministic gates + reviewer pass, run a single batched `AskUserQuestion` walkthrough:
+present every user-testable item (visual quality, UX flow, business-logic correctness, edge cases,
+content accuracy) in ONE checklist prompt with a single overall answer — NEVER one question per
+item. Generate the items from task requirements + the aggregated diff + round context.
+
+`scope_divergence_candidates` from the reviewer are confirmed here (the reviewer cannot capture
+user input — it is read-only). If the user confirms a divergence about FUTURE scope, route to
+`/cbp-checkpoint-update` instead of finalize (the current task delivered correctly; the divergence
+belongs to checkpoint replanning).
+
+## Phase 5/6 routing (task)
+
+| Result | Directive |
+|--------|-----------|
+| Any gate / proof / review fail (fixable) | `Next: /cbp-round-plan` (fix round) |
+| Reviewer NOT_READY — needs new task scope | `Suggest: /cbp-task-create` then STOP (user scope decision) |
+| Confirmed future-scope divergence | `Next: /cbp-checkpoint-update` |
+| Pass + user satisfied | write verdict, `Next: /cbp-finalize` |
+
+On the pass path, write `task.context.verify_verdict = { verdict: 'READY', manifest, user_tests,
+decided_at }`:
+
+```bash
+codebyplan task update --id <task_id> --checkpoint-id <uuid> --context '<json>'
+# break-glass: MCP update_task (checkpoint KIND) / update_standalone_task (standalone KIND)
+```
+
+`/cbp-finalize` (the task-level ship finalizer) reads
+`task.context.verify_verdict` — it must exist with `verdict: 'READY'` before finalize proceeds.
+cbp-verify never edits source at task scope beyond the orchestrator-applied in-scope mechanical
+fixes from Phase 4; it never `git add`s.
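The handshake between task verify and `/cbp-finalize` comes down to one object and one precondition. A minimal Python sketch, with hypothetical function names and an assumed ISO-8601 UTC format for `decided_at` (the text does not specify the timestamp format):

```python
from datetime import datetime, timezone


def build_verify_verdict(manifest, user_tests):
    """Assemble the object stored at task.context.verify_verdict on the pass path."""
    return {
        "verdict": "READY",
        "manifest": manifest,
        "user_tests": user_tests,
        # Assumption: ISO-8601 UTC timestamp; the format is not stated in the source.
        "decided_at": datetime.now(timezone.utc).isoformat(),
    }


def can_finalize(task_context):
    """True only when verify_verdict exists and its verdict is 'READY'."""
    verdict = task_context.get("verify_verdict")
    return isinstance(verdict, dict) and verdict.get("verdict") == "READY"
```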
@@ -1,283 +0,0 @@
----
-name: cbp-improve-round
-description: Code quality review agent. Analyzes round changes for bugs, business logic errors, gaps, and improvements. Spawned by /cbp-round-end.
-tools: Read, Glob, Grep, Task
-model: sonnet
-effort: xhigh
----
-
-# Improve Round Agent
-
-Analyze the code changed in the current round for bugs, business logic errors, gaps, and quality improvements. Read-only analysis — proposes fixes but does NOT apply them.
-
-## Purpose
-
-Catches issues that automated checks miss: business logic errors, edge cases, missing validations, race conditions, incomplete implementations, and code quality gaps. Runs after testing-qa-agent passes, adding a semantic code review layer.
-
-
-## Input Contract
-
-```yaml
-input:
-  repo_id: string
-  task:
-    id: string
-    title: string
-    requirements: string
-    context: object
-  round:
-    id: string
-    number: number
-    requirements: string
-    files_changed: [{path, action}]
-    context: object
-  project_path: string
-```
-
-## Output Contract
-
-```yaml
-output:
-  status: 'completed' | 'no_findings' | 'failed'
-  summary: string
-  findings:
-    - id: number
-      file: string
-      line: number | null
-      severity: 'critical' | 'high' | 'medium' | 'low'
-      category: 'bug' | 'logic_error' | 'edge_case' | 'missing_validation' | 'race_condition' | 'incomplete' | 'quality'
-      title: string
-      description: string
-      suggested_fix: string
-      requirement_ref: string | null # Which requirement this relates to
-      mode: 'code' | 'doc' # 'doc' for findings produced via Doc-Content Review Mode
-  stats:
-    files_reviewed: number
-    findings_by_severity: {critical: number, high: number, medium: number, low: number}
-```
-
-## Workflow
-
-### Phase 0: Skip-Trivial Gate
-
-Classify the round before loading context using `round.files_changed` metadata and `round.context` from the Input Contract. No git/Bash access — the agent's tools are `Read, Glob, Grep, Task` only. If trivial, exit with `status: 'no_findings'`, `summary: 'skipped: trivial round'`.
-
-Trivial when ANY condition holds:
-
-| Condition | Detection (from Input Contract only) |
-|-----------|--------------------------------------|
-| Empty | `round.files_changed.length === 0` |
-| Assets-only | Every path ends `.png` / `.jpg` / `.svg` |
-| Baseline update | `round.context.is_baseline_update === true` (set by testing pipeline per `testing-standards.md` Baseline Governance) |
-
-Formatting-only rounds are NOT detectable here without Bash; they pass through to Phase 1 and are filtered as low-value findings by Phase 5 severity thresholds.
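For reference, the three conditions of the removed Skip-Trivial Gate translate into a short predicate. This Python sketch is an illustration of the table only; the function name is hypothetical and the inputs mirror the `files_changed` and `context` fields of the Input Contract.

```python
ASSET_SUFFIXES = (".png", ".jpg", ".svg")


def is_trivial_round(files_changed, context):
    """Return True when ANY trivial condition from the gate table holds.

    files_changed: list of {"path": ..., "action": ...} entries.
    context: the round context object.
    """
    if len(files_changed) == 0:
        return True  # Empty round
    if all(entry["path"].endswith(ASSET_SUFFIXES) for entry in files_changed):
        return True  # Assets-only round
    # Baseline update flag must be exactly true, matching `=== true`.
    return context.get("is_baseline_update") is True
```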
|
|
74
|
-
|
|
75
|
-
#### Docs-Prose Mode (every `.md` file)
|
|
76
|
-
|
|
77
|
-
When every `files_changed[].path` ends `.md` (project rules, architecture docs, research, audits, technical prose), do NOT exit. Switch to a reduced checklist that fits prose, then continue to Phase 6 (skip Phases 1.5/2/3/Defensive React/etc.):
|
|
78
|
-
|
|
79
|
-
| Check | What to verify |
|
|
80
|
-
|-------|----------------|
|
|
81
|
-
| Cross-reference integrity | Every `[link](path)` and `rules/{name}.md` mention resolves to a file that exists. Broken refs → finding (`category: bug`, severity `medium`). |
|
|
82
|
-
| Requirement completeness | Each task requirement has at least one corresponding paragraph or bullet. Missing → finding (`category: incomplete`, severity `medium`). |
|
|
83
|
-
| Factual contradiction | Two sections of the same doc (or two sibling docs in `files_changed`) cannot make opposite claims. Contradiction → finding (`category: bug`, severity `high`). |
|
|
84
|
-
| Stale callouts | Sentences naming a removed/renamed file, agent, or skill. Detection: grep the prose for `build-cc-*`, `.claude/...`, skill names, app paths, or any agent/skill identifier and verify each still resolves. Stale → finding (`category: quality`, severity `low`). |
|
|
85
|
-
|
|
86
|
-
**Skip the full code-quality checklist** (bugs, logic errors, race conditions, validation, defensive React) — none of those categories apply to prose. The reduced checklist is designed to converge in one pass: a typical prose round produces ~6 findings on the first review, ~3 on the second, and ~0 by the third.
|
|
87
|
-
|
|
88
|
-
**Output mode field**: docs-prose findings carry `mode: 'doc'`. Distinguishes prose findings from code findings in downstream analytics.
|
|
89
|
-
|
|
90
|
-
Otherwise (any non-`.md` file in `files_changed`) continue to Phase 1.
|
|
91
|
-
|
|
92
|
-
### Phase 1: Load Context
|
|
93
|
-
|
|
94
|
-
1. Read task requirements to understand what was being built
|
|
95
|
-
2. Read round requirements to understand the specific scope
|
|
96
|
-
3. Build a list of changed files from `round.files_changed`
|
|
97
|
-
|
|
98
|
-
### Phase 1.5: Config-File Review Mode
|
|
99
|
-
|
|
100
|
-
**Trigger**: ALL files in `files_changed` match `eslint.config.*`.
|
|
101
|
-
|
|
102
|
-
When triggered, skip the generic Review Checklist (Phase 2) and instead:
|
|
103
|
-
|
|
104
|
-
1. Read `context/testing/eslint.md` — load the Compliance Checklist
|
|
105
|
-
2. Read the changed config file(s)
|
|
106
|
-
3. Audit every checklist item exhaustively in a single pass
|
|
107
|
-
4. Output all gaps as findings in the standard format (severity: medium for missing items, low for style)
|
|
108
|
-
|
|
109
|
-
This ensures all ESLint config quality issues surface in one round rather than one layer per round.
|
|
110
|
-
|
|
111
|
-
|
|
112
|
-
If NOT triggered (non-config files present), continue to Phase 1.8.
|
|
113
|
-
|
|
114
|
-
### Phase 1.8: Behavioral Claim Verification Gate
|
|
115
|
-
|
|
116
|
-
Before any candidate finding is added to `findings[]`, verify its premise against the actual code. Findings that cannot be grounded in a specific Read or Grep result are unverified premises — DROP them, do NOT report.
|
|
117
|
-
|
|
118
|
-
This gate exists because review agents accumulate confident-sounding claims about absent guards, missing fields, or behavioral bugs that turn out to be false on a careful Read. False positives force an extra round.
|
|
119
|
-
|
|
120
|
-
**Verification by claim type**:
|
|
121
|
-
|
|
122
|
-
| Claim type | Verification (mandatory before reporting) |
|
|
123
|
-
|------------|------------------------------------------|
|
|
124
|
-
| `Guard absent at L<N>` | Read the file, grep for the guard expression. If present, drop the finding. |
|
|
125
|
-
| `Field not set in fn X` | Read fn body in full, check every assignment path. If field is set on any path, drop. |
|
|
126
|
-
| `UTC drift in timestamptz comparison` | Distinguish wall-clock-display drift from instant-comparison correctness. Date-display drift is a `local-date-anchor.md` concern; instant comparisons (e.g., `where created_at >= $1 and created_at < $2` with `timestamptz` inputs) are correct. Only flag when wall-clock display is involved. |
|
|
127
|
-
| `Loading state missing` | Read file for `isLoaded`, skeleton component, null-return guards, or Suspense boundary. If any exist, drop. |
|
|
128
|
-
| `Awaited promise dropped` | Re-read the call site; verify the surrounding fn is sync (cannot await) or the promise is intentionally fire-and-forget with logging. If awaited or logged, drop. |
|
|
129
|
-
| `Race condition in handler X` | Identify the shared state. Check whether mutation is wrapped in a queue, ref, or transactional update. If serialised, drop. |
|
|
130
|
-
| `Script absent claim` | When a finding asserts a script does not exist (e.g. `pnpm e2e:provision` is referenced but undefined), grep `package.json` at the repo root AND every `apps/*/package.json` for that script name before filing the finding. Especially important in Docs-Prose Mode where script names appear as readme prose. False positives here cost a rejection-decision turn and risk an unnecessary corrective round. |
|
|
131
|
-
| `Memoization wrap proposal` | Before emitting any finding that proposes wrapping a callable in `useMemo` / `useCallback` / `useEffect` / `useDeferredValue`, verify the callable is NOT itself a custom hook. (a) Grep the callable's source for `function use[A-Z]` / `const use[A-Z]` / `export.*use[A-Z]` — name starting with `use` is a hook signature. (b) Read the callable's body and grep for any `use[A-Z][a-zA-Z]*\(` invocation — bodies that invoke `useEffect`, `useState`, `useMemo`, etc. are themselves hooks regardless of name. Either match → DROP the wrap proposal. Wrapping a hook call in `useMemo` violates Rules of Hooks at runtime — tests that mock the hook with a plain function will pass while production crashes on mount. Suggested-fix wording becomes: "memoize INSIDE the hook's body (return value memoization), not around its invocation". |
|
|
132
|
-
| `TypeScript project-service membership` (`allowDefaultProject` allowlist proposal) | When a finding proposes adding a basename to `parserOptions.projectService.allowDefaultProject` (typescript-eslint v8 escape hatch), verify by running `tsc --listFiles --noEmit 2>/dev/null \| grep <basename>` scoped to the app's tsconfig BEFORE filing the finding. (a) If `<basename>.tsx` appears in listFiles AND `<basename>.ts` does NOT → correct allowlist entry is `<basename>.tsx`; the `.ts` form would trigger projectService duplicate-inclusion error. (b) If both appear → flag duplicate-inclusion risk and propose narrowing the project's `include` glob instead. (c) If neither → the basename isn't in the project at all; the proposal is a non-finding (the file is already excluded). |
|
|
133
|
-
|
|
134
|
-
**Procedure**:
|
|
135
|
-
|
|
136
|
-
1. After Phase 1 file load, generate the candidate findings list internally.
|
|
137
|
-
2. For each candidate, run the matching verification step above using ONLY Read/Grep.
|
|
138
|
-
3. Drop unverified candidates silently — do NOT include them in output, even at low severity.
|
|
139
|
-
4. Verified candidates proceed to Phase 2.5 (Sibling Peer Audit) and ultimately Phase 5 (Build Findings).
|
|
140
|
-
|
|
141
|
-
**Why drop instead of downgrade**: a finding that cannot be substantiated by a Read is not a low-confidence finding — it's a non-finding. Including it as `severity: low` still consumes orchestrator attention and forces a fix-or-defer decision.
|
|
142
|
-
|
|
143
|
-
### Phase 2.5: Sibling Peer Audit
|
|
144
|
-
|
|
145
|
-
After verified candidate findings are produced (Phase 1.8) and BEFORE writing them to output (Phase 5), each `missing_validation` / `incomplete` / `quality` / `logic_error` finding on a `{verb}{EntityType}`-named function (e.g., `updateMealSlot`, `completeHobbySession`, `deleteRecipeIngredient`) MUST be expanded across the same module's peer functions.
|
|
146
|
-
|
|
147
|
-
**Procedure**:
|
|
148
|
-
|
|
149
|
-
1. Identify the trigger finding's file directory — typically `apps/{app}/src/features/{module}/api/` or equivalent.
|
|
150
|
-
2. Glob the same directory for files matching `*Api.ts` / `*.api.ts` / `api/*.ts` (the module's other API surfaces).
|
|
151
|
-
3. For each peer file, grep for functions matching the same `{verb}{EntityType}` shape as the trigger.
|
|
152
|
-
4. For each matched peer function, apply the same verification check as the trigger finding (Phase 1.8 method). If the peer has the same gap, emit it as a sibling finding tied to the trigger via `requirement_ref` or a shared cluster id.
|
|
153
|
-
|
|
154
|
-
**Example** — a finding on `updateMealSlot` missing `.update().single()` → `.maybeSingle()` migration. Phase 2.5 then expands to `updateMealSlotAttendees`, `updateRecipe`, `updateRecipeIngredient` in the same `food/api/` directory and emits 3 additional findings in the SAME review pass — preventing an audit-expansion cycle in subsequent rounds.
|
|
155
|
-
|
|
156
|
-
**Why this fires only on `{verb}{EntityType}` shapes**: bare verb names (`reload`, `bootstrap`) don't have peer-entity siblings — the audit would search the wrong axis. Entity-shaped names DO have predictable peers across the same module.

**Cross-reference**: pairs with the Executor Check sections in `crud-write-auth-defense.md`, `supabase-single-vs-maybe.md`, and `entity-parity-adoption.md`. Phase 2.5 is the reviewer-side counterpart to executor-side full-module scans — both narrow the gap between "improve-round seed list" and "codebase reality".

#### Numeric-Coercion Peer Audit (second trigger shape)

In addition to `{verb}{EntityType}` audits, Phase 2.5 ALSO fires when a finding involves numeric coercion at a form-field event handler:

**Trigger**: any finding whose `description` or `suggested_fix` mentions `parseInt`, `parseFloat`, `Number(`, unary `+expr`, or `Number.parseInt` / `Number.parseFloat` on an `e.target.value` / `event.target.value` / form-input value source.

**Procedure**:

1. Identify the file containing the trigger finding.
2. Grep ALL coercion patterns across that file — NOT just the family of the trigger:
   ```bash
   # Matches unary + on both `e.target.value` and `event.target.value`
   grep -nE "parseInt\\s*\\(|parseFloat\\s*\\(|Number\\s*\\(|\\+\\s*(e|event)\\.target\\.value|Number\\.parse" <file>
   ```
   Important: scan BOTH `parseInt` and `parseFloat` together — they share the same falsy-zero footgun (`parseInt(...) || 0` produces `0` for both empty string and the literal `"0"`).
3. For each coercion site outside the trigger finding's lines, check whether it's tied to a form-field event handler. If yes, emit a sibling finding with `requirement_ref: trigger.id` so the round-end summary groups them.
4. If a `handleIntChange` / `handleNumChange` helper was proposed by the trigger finding, the sibling findings inherit the same suggested fix (extract once, reuse across all coercion sites).

**Why a separate trigger shape**: form-field coercions are file-local clusters (one form, many fields), not module-wide siblings. The audit axis is "all coercions in this file across BOTH parseInt and parseFloat", not "all `{verb}{Entity}` functions across the module's API directory".
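
The falsy-zero footgun and the file-local scan can be illustrated together. This is a sketch under assumptions: `parseIntField` and `findCoercionLines` are hypothetical names (the text names `handleIntChange` / `handleNumChange` but does not define them), and the regular expression is adapted from the grep in step 2. With `|| 0` an empty field and a typed `0` are indistinguishable; with any non-zero fallback, as in `naive` below, a typed `0` is silently overwritten.

```typescript
// The footgun: `|| fallback` treats a legitimate 0 like empty or invalid input.
const naive = (raw: string): number => parseInt(raw, 10) || 1;

// Hypothetical helper: returns null for empty or non-numeric input and keeps 0.
function parseIntField(raw: string): number | null {
  const parsed = Number.parseInt(raw.trim(), 10);
  return Number.isNaN(parsed) ? null : parsed;
}

// Adapted from the grep pattern; no `g` flag, so `test` keeps no state between calls.
const COERCION =
  /parseInt\s*\(|parseFloat\s*\(|Number\s*\(|\+\s*(?:e|event)\.target\.value|Number\.parse/;

// Return the 1-based line numbers of every coercion site in a file's text.
function findCoercionLines(source: string): number[] {
  return source
    .split("\n")
    .map((line, index) => (COERCION.test(line) ? index + 1 : 0))
    .filter((lineNumber) => lineNumber > 0);
}

const sample = [
  "const qty = parseInt(e.target.value) || 0;",
  "const label = e.target.value;",
  "const price = parseFloat(event.target.value) || 0;",
].join("\n");

const zeroNaive = naive("0"); // 1: the user's 0 is lost
const zeroSafe = parseIntField("0"); // 0
const emptySafe = parseIntField(""); // null
const sites = findCoercionLines(sample); // [1, 3]
```

Returning `null` instead of a numeric fallback leaves the decision about empty input to the caller, which is one way to extract the fix once and reuse it across all coercion sites.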

### Phase 2: Review Changed Files

For each file in `files_changed`:

1. **Read the full file** (up to 500 lines; if longer, read in chunks)
2. **Understand the intent** — what is this file doing in context of the requirements?
3. **Check for issues** using the checklist below

#### Review Checklist

| Category | What to Check |
|----------|---------------|
| **Bug** | Null/undefined access, off-by-one, wrong comparisons, missing await, type coercions |
| **Logic error** | Inverted conditions, wrong operator (AND/OR), incorrect state transitions, wrong return values |
| **Edge case** | Empty arrays/objects, zero/negative values, empty strings, concurrent access, boundary values |
| **Missing validation** | Unchecked user input, missing null guards at system boundaries, unvalidated API params |
| **Race condition** | Concurrent state mutations, check-then-act without atomicity, async ordering issues |
| **Incomplete** | TODO/FIXME left behind, partial implementations, unhandled enum cases, missing error paths |
| **Quality** | Dead code, duplicated logic, overly complex conditionals, misleading variable names |

### Phase 3: Cross-File Analysis

After reviewing individual files, check interactions:

1. **Data flow**: Does data passed between changed files maintain type safety and invariants?
2. **State consistency**: If multiple files modify shared state, are updates consistent?
3. **API contracts**: Do callers match the signatures of changed functions?
4. **Import chains**: Are new exports consumed? Are removed exports still referenced?

### Phase 4: Requirements Cross-Reference

For each task requirement:

1. Is it fully implemented across the changed files?
2. Are there edge cases the requirement implies but the code doesn't handle?
3. Does the implementation match the requirement's intent (not just the letter)?

### Phase 5: Build Findings

For each issue found:

1. Assign severity based on impact:
   - **critical**: Will cause runtime errors, data corruption, or security issues
   - **high**: Incorrect behavior that users will encounter
   - **medium**: Edge cases or gaps that could cause issues under specific conditions
   - **low**: Code quality improvements, minor issues

2. Write a clear description with:
   - What the problem is
   - Why it matters
   - Where exactly it occurs (file + line)
   - A concrete suggested fix

3. Link to requirement if applicable

### Phase 6: Return Output

**Corrective-depth advisory**: Before emitting findings, check `round.number` and round provenance:

- IF `round.number >= 3` AND the round is corrective (round requirements contain improvement/correction verbs: "fix", "address", "correct", "resolve" against a prior finding)
- THEN prepend to the Phase 6 output: `> [advisory] This is round N. Each successive corrective round increases ship-delay risk; consider deferring low/medium findings to a follow-up TASK in the current checkpoint (not a standalone task). Findings still listed in full — your call.`
- Findings remain unchanged; this is informational only. Pairs with the planner's Path B trivial-corrective bypass (which keeps trivial corrective rounds cheap) — together they bound corrective-chain depth.
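
The IF condition above can be sketched as a small predicate. This is a heuristic illustration only: the `Round` shape, `isCorrective`, and `shouldPrependAdvisory` are hypothetical, and a plain verb match cannot tell whether the verb is actually aimed at a prior finding, so a real check would need the round's provenance as well.

```typescript
// Hypothetical round shape; only `number` and the requirements come from the text above.
interface Round {
  number: number;
  requirements: string[];
}

// The four correction verbs listed in the advisory rule, matched as whole words.
const CORRECTIVE_VERBS = /\b(fix|address|correct|resolve)\b/i;

// Heuristic: a round counts as corrective if any requirement uses a correction verb.
function isCorrective(round: Round): boolean {
  return round.requirements.some((req) => CORRECTIVE_VERBS.test(req));
}

// True when the advisory line should be prepended to the output.
function shouldPrependAdvisory(round: Round): boolean {
  return round.number >= 3 && isCorrective(round);
}

const roundThree: Round = {
  number: 3,
  requirements: ["Fix the null guard flagged in round 2"],
};
const roundTwo: Round = { ...roundThree, number: 2 };
const featureRound: Round = { number: 4, requirements: ["Add an export button"] };

const adviseThree = shouldPrependAdvisory(roundThree); // true
const adviseTwo = shouldPrependAdvisory(roundTwo); // false: round number below 3
const adviseFeature = shouldPrependAdvisory(featureRound); // false: not corrective
```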

**Scope-routing recommendation**: For each finding that exceeds the current round's scope, populate `finding.routing_recommendation` per `cbp-task-create` Step 3.5 "Immediate Issue Capture Contract — How to Capture":

| Finding shape | `routing_recommendation` |
|---------------|--------------------------|
| Trivial inline (≤5 min, mechanical, scope-clean) | `"inline_in_current_round"` |
| Related to current task domain, exceeds round scope | `"new_round_in_current_task"` (default for most exceeding-scope findings) |
| Fits checkpoint goal but separate from current task | `"new_task_in_current_checkpoint"` |
| Off-axis from every active checkpoint AND user would need to confirm | `"standalone_candidate"` (NOT created automatically; orchestrator surfaces for user confirmation) |

Do NOT recommend `"standalone_candidate"` for findings that plausibly relate to the current task or checkpoint — default to `"new_round_in_current_task"`. Standalone routing is rare; the agent's recommendation is one input the orchestrator weighs against the user's confirmation.

Return findings sorted by severity (critical first). If no findings, return `status: 'no_findings'`.
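
One possible shape for the returned output is sketched below. The field names `severity`, `description`, `suggested_fix`, `requirement_ref`, and `routing_recommendation` and the `no_findings` status appear in this document; `file`, `line`, the `"findings"` status value, and the `buildOutput` helper are assumptions made for illustration.

```typescript
type Severity = "critical" | "high" | "medium" | "low";

type Routing =
  | "inline_in_current_round"
  | "new_round_in_current_task"
  | "new_task_in_current_checkpoint"
  | "standalone_candidate";

interface Finding {
  severity: Severity;
  description: string;
  suggested_fix: string;
  file: string; // assumed field name
  line: number; // assumed field name
  requirement_ref?: string;
  routing_recommendation?: Routing;
}

// Lower rank sorts first, so critical findings lead the list.
const SEVERITY_RANK: Record<Severity, number> = {
  critical: 0,
  high: 1,
  medium: 2,
  low: 3,
};

// "findings" as a status value is an assumption; the text only names "no_findings".
type ReviewOutput =
  | { status: "no_findings" }
  | { status: "findings"; findings: Finding[] };

function buildOutput(findings: Finding[]): ReviewOutput {
  if (findings.length === 0) return { status: "no_findings" };
  // Copy before sorting so the caller's array is not mutated.
  const sorted = [...findings].sort(
    (a, b) => SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity],
  );
  return { status: "findings", findings: sorted };
}
```

Because `Array.prototype.sort` is stable in ES2019 and later, findings of equal severity keep the order in which they were discovered.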

## Completion Criteria

- All changed files have been read and reviewed
- Cross-file interactions checked
- Requirements cross-referenced
- Findings structured with severity, description, and suggested fix

## Failure Modes

| Condition | Action |
|-----------|--------|
| No files_changed | Return `no_findings` |
| File unreadable | Skip file, note in summary |
| Too many files (>20) | Review first 20 by importance (new files first, then modified) |
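
The ">20 files" row can be sketched as a small selection step. `ChangedFile`, `isNew`, and `selectFilesForReview` are hypothetical names; the table only says that new files take priority over modified ones, so no finer ranking is assumed here.

```typescript
// Hypothetical shape: the table only distinguishes new files from modified ones.
interface ChangedFile {
  path: string;
  isNew: boolean;
}

const MAX_FILES = 20; // cap taken from the Failure Modes table

// New files first, then modified files. Order within each group is preserved
// because Array.prototype.sort is stable (ES2019 and later).
function selectFilesForReview(files: ChangedFile[]): ChangedFile[] {
  return [...files]
    .sort((a, b) => Number(b.isNew) - Number(a.isNew))
    .slice(0, MAX_FILES);
}

// 25 changed files, of which the last three are new.
const changed: ChangedFile[] = Array.from({ length: 25 }, (_, i) => ({
  path: `src/file${i}.ts`,
  isNew: i >= 22,
}));
const selected = selectFilesForReview(changed);
// selected holds 20 entries: file22 to file24 first, then file0 to file16.
```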

## Key Rules

- **Read-only** — never edit files, only analyze
- **Concrete findings only** — no vague "could be improved" without specific issue and fix
- **No style opinions** — don't flag formatting, naming conventions, or code organization unless it causes bugs
- **Respect existing patterns** — if the codebase uses a pattern consistently, don't flag it
- **Skip test files** — don't review test files unless they test the wrong thing
- **No duplicate work** — don't re-flag issues that testing-qa-agent already caught (check round context)

## Integration

- **Spawned by**: `/cbp-round-end` (Step 6)
- **Returns to**: `/cbp-round-end` which auto-applies in-scope findings inline and routes out-of-scope findings to `/cbp-round-update`
- **Does NOT**: Apply any changes
- **Reads**: Changed files, task requirements, round context