codebyplan 1.13.53 → 1.13.55
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/cli.js +1364 -352
- package/package.json +1 -1
- package/templates/agents/cbp-database-agent.md +1 -1
- package/templates/agents/cbp-e2e-maestro.md +1 -1
- package/templates/agents/cbp-e2e-playwright.md +24 -16
- package/templates/agents/cbp-e2e-tauri.md +1 -1
- package/templates/agents/cbp-e2e-vscode.md +1 -1
- package/templates/agents/cbp-e2e-xcuitest.md +1 -1
- package/templates/agents/cbp-improve-claude.md +2 -2
- package/templates/agents/{cbp-round-executor.md → cbp-round-builder.md} +23 -23
- package/templates/agents/{cbp-task-planner.md → cbp-round-planner.md} +26 -25
- package/templates/agents/cbp-security-agent.md +1 -1
- package/templates/agents/cbp-stripe-agent.md +2 -2
- package/templates/agents/cbp-testing-qa-agent.md +11 -11
- package/templates/agents/cbp-verify-reviewer.md +236 -0
- package/templates/context/architecture-map.md +4 -4
- package/templates/context/mcp-docs.md +57 -11
- package/templates/context/testing/e2e.md +9 -9
- package/templates/github-workflows/ci.yml +58 -0
- package/templates/hooks/cbp-skill-context-guard.sh +1 -1
- package/templates/hooks/cbp-test-hooks.sh +9 -9
- package/templates/hooks/validate-structure-lengths.sh +1 -1
- package/templates/hooks/validate-structure-patterns.sh +1 -1
- package/templates/rules/README.md +1 -2
- package/templates/rules/agent-claim-verification.md +1 -1
- package/templates/rules/context-file-loading.md +10 -10
- package/templates/rules/development-workflow.md +73 -0
- package/templates/rules/e2e-mandatory.md +8 -8
- package/templates/rules/execution-proof.md +70 -0
- package/templates/rules/model-invocation-convention.md +2 -2
- package/templates/rules/parallel-waves.md +11 -11
- package/templates/rules/spawn-failure-is-gate-failure.md +76 -0
- package/templates/rules/task-routing-recommendation.md +1 -1
- package/templates/rules/todo-backend.md +3 -3
- package/templates/rules/two-tier-ci.md +63 -0
- package/templates/settings.project.base.json +8 -10
- package/templates/skills/cbp-build-cc-mode/SKILL.md +1 -1
- package/templates/skills/cbp-build-cc-settings/reference/cbp-permission-policy.md +7 -7
- package/templates/skills/cbp-build-cc-skill/SKILL.md +1 -1
- package/templates/skills/cbp-build-cc-skill/reference/cbp-quality.md +2 -2
- package/templates/skills/cbp-build-cc-skill/reference/fork-eligibility.md +11 -14
- package/templates/skills/cbp-checkpoint-check/SKILL.md +2 -2
- package/templates/skills/cbp-checkpoint-create/SKILL.md +16 -1
- package/templates/skills/cbp-checkpoint-update/SKILL.md +3 -3
- package/templates/skills/cbp-clear-continue/SKILL.md +2 -2
- package/templates/skills/cbp-clear-prep/SKILL.md +3 -3
- package/templates/skills/{cbp-task-complete → cbp-finalize}/SKILL.md +25 -29
- package/templates/skills/{cbp-task-complete → cbp-finalize}/reference/checkpoint-done-branching.md +1 -1
- package/templates/skills/{cbp-task-complete → cbp-finalize}/reference/next-step-heuristic.md +1 -1
- package/templates/skills/cbp-frontend-design/SKILL.md +1 -1
- package/templates/skills/cbp-frontend-ui/SKILL.md +7 -7
- package/templates/skills/cbp-git-commit/SKILL.md +3 -3
- package/templates/skills/cbp-merge-main/SKILL.md +4 -4
- package/templates/skills/{cbp-round-execute → cbp-round-build}/SKILL.md +93 -75
- package/templates/skills/cbp-round-complete/SKILL.md +15 -14
- package/templates/skills/cbp-round-plan/SKILL.md +344 -0
- package/templates/skills/cbp-session-end/SKILL.md +1 -1
- package/templates/skills/cbp-ship-main/SKILL.md +3 -2
- package/templates/skills/cbp-standalone-task-check/SKILL.md +10 -9
- package/templates/skills/cbp-standalone-task-complete/SKILL.md +12 -13
- package/templates/skills/cbp-standalone-task-create/SKILL.md +16 -9
- package/templates/skills/cbp-standalone-task-start/SKILL.md +9 -5
- package/templates/skills/cbp-standalone-task-testing/SKILL.md +5 -5
- package/templates/skills/cbp-task-create/SKILL.md +6 -7
- package/templates/skills/cbp-task-start/SKILL.md +8 -8
- package/templates/skills/cbp-todo/SKILL.md +6 -8
- package/templates/skills/cbp-verify/SKILL.md +146 -0
- package/templates/skills/cbp-verify/reference/deterministic-gates.md +114 -0
- package/templates/skills/{cbp-round-end → cbp-verify}/reference/findings-presentation.md +16 -12
- package/templates/skills/cbp-verify/reference/round-scope.md +62 -0
- package/templates/skills/cbp-verify/reference/task-scope.md +71 -0
- package/templates/agents/cbp-improve-round.md +0 -283
- package/templates/agents/cbp-task-check.md +0 -217
- package/templates/skills/cbp-round-check/SKILL.md +0 -134
- package/templates/skills/cbp-round-end/SKILL.md +0 -173
- package/templates/skills/cbp-round-end/reference/inline-fallback.md +0 -35
- package/templates/skills/cbp-round-execute/reference/inline-fallback.md +0 -55
- package/templates/skills/cbp-round-input/SKILL.md +0 -197
- package/templates/skills/cbp-round-start/SKILL.md +0 -261
- package/templates/skills/cbp-round-update/SKILL.md +0 -120
- package/templates/skills/cbp-ship/templates/workflow-eas-submit.yml +0 -53
- package/templates/skills/cbp-ship/templates/workflow-vsce-publish.yml +0 -31
- package/templates/skills/cbp-task-check/SKILL.md +0 -172
- package/templates/skills/cbp-task-testing/SKILL.md +0 -279
@@ -1,283 +0,0 @@
----
-name: cbp-improve-round
-description: Code quality review agent. Analyzes round changes for bugs, business logic errors, gaps, and improvements. Spawned by /cbp-round-end.
-tools: Read, Glob, Grep, Task
-model: sonnet
-effort: xhigh
----
-
-# Improve Round Agent
-
-Analyze the code changed in the current round for bugs, business logic errors, gaps, and quality improvements. Read-only analysis — proposes fixes but does NOT apply them.
-
-## Purpose
-
-Catches issues that automated checks miss: business logic errors, edge cases, missing validations, race conditions, incomplete implementations, and code quality gaps. Runs after testing-qa-agent passes, adding a semantic code review layer.
-
-
-## Input Contract
-
-```yaml
-input:
-  repo_id: string
-  task:
-    id: string
-    title: string
-    requirements: string
-    context: object
-  round:
-    id: string
-    number: number
-    requirements: string
-    files_changed: [{path, action}]
-    context: object
-  project_path: string
-```
-
-## Output Contract
-
-```yaml
-output:
-  status: 'completed' | 'no_findings' | 'failed'
-  summary: string
-  findings:
-    - id: number
-      file: string
-      line: number | null
-      severity: 'critical' | 'high' | 'medium' | 'low'
-      category: 'bug' | 'logic_error' | 'edge_case' | 'missing_validation' | 'race_condition' | 'incomplete' | 'quality'
-      title: string
-      description: string
-      suggested_fix: string
-      requirement_ref: string | null # Which requirement this relates to
-      mode: 'code' | 'doc' # 'doc' for findings produced via Doc-Content Review Mode
-  stats:
-    files_reviewed: number
-    findings_by_severity: {critical: number, high: number, medium: number, low: number}
-```
-
-## Workflow
-
-### Phase 0: Skip-Trivial Gate
-
-Classify the round before loading context using `round.files_changed` metadata and `round.context` from the Input Contract. No git/Bash access — the agent's tools are `Read, Glob, Grep, Task` only. If trivial, exit with `status: 'no_findings'`, `summary: 'skipped: trivial round'`.
-
-Trivial when ANY condition holds:
-
-| Condition | Detection (from Input Contract only) |
-|-----------|--------------------------------------|
-| Empty | `round.files_changed.length === 0` |
-| Assets-only | Every path ends `.png` / `.jpg` / `.svg` |
-| Baseline update | `round.context.is_baseline_update === true` (set by testing pipeline per `testing-standards.md` Baseline Governance) |
-
-Formatting-only rounds are NOT detectable here without Bash; they pass through to Phase 1 and are filtered as low-value findings by Phase 5 severity thresholds.
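
Reviewer note: the three trivial conditions in the Phase 0 table can be read as one predicate over the Input Contract. The following is a minimal sketch under that reading, not code from the package. The `FileChange` and `RoundContext` types and the `isTrivialRound` name are hypothetical, and the values of `action` are not specified in the removed file, so it is typed as a plain string.

```typescript
// Hypothetical shapes mirroring the Input Contract fields the gate reads.
type FileChange = { path: string; action: string };
type RoundContext = { is_baseline_update?: boolean };

// Extensions listed in the "Assets-only" row of the Phase 0 table.
const ASSET_EXTENSIONS = [".png", ".jpg", ".svg"];

// Returns true when ANY of the three documented trivial conditions holds.
function isTrivialRound(filesChanged: FileChange[], context: RoundContext): boolean {
  // Empty: no files changed at all.
  if (filesChanged.length === 0) return true;

  // Assets-only: every path ends in one of the listed extensions.
  const assetsOnly = filesChanged.every((file) =>
    ASSET_EXTENSIONS.some((ext) => file.path.endsWith(ext)),
  );
  if (assetsOnly) return true;

  // Baseline update: flag set upstream by the testing pipeline.
  return context.is_baseline_update === true;
}
```

A mixed round (one image plus one source file) is not trivial under this sketch, because the assets-only row requires every path to match.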
-
-#### Docs-Prose Mode (every `.md` file)
-
-When every `files_changed[].path` ends `.md` (project rules, architecture docs, research, audits, technical prose), do NOT exit. Switch to a reduced checklist that fits prose, then continue to Phase 6 (skip Phases 1.5/2/3/Defensive React/etc.):
-
-| Check | What to verify |
-|-------|----------------|
-| Cross-reference integrity | Every `[link](path)` and `rules/{name}.md` mention resolves to a file that exists. Broken refs → finding (`category: bug`, severity `medium`). |
-| Requirement completeness | Each task requirement has at least one corresponding paragraph or bullet. Missing → finding (`category: incomplete`, severity `medium`). |
-| Factual contradiction | Two sections of the same doc (or two sibling docs in `files_changed`) cannot make opposite claims. Contradiction → finding (`category: bug`, severity `high`). |
-| Stale callouts | Sentences naming a removed/renamed file, agent, or skill. Detection: grep the prose for `build-cc-*`, `.claude/...`, skill names, app paths, or any agent/skill identifier and verify each still resolves. Stale → finding (`category: quality`, severity `low`). |
-
-**Skip the full code-quality checklist** (bugs, logic errors, race conditions, validation, defensive React) — none of those categories apply to prose. The reduced checklist is designed to converge in one pass: a typical prose round produces ~6 findings on the first review, ~3 on the second, and ~0 by the third.
-
-**Output mode field**: docs-prose findings carry `mode: 'doc'`. Distinguishes prose findings from code findings in downstream analytics.
-
-Otherwise (any non-`.md` file in `files_changed`) continue to Phase 1.
-
-### Phase 1: Load Context
-
-1. Read task requirements to understand what was being built
-2. Read round requirements to understand the specific scope
-3. Build a list of changed files from `round.files_changed`
-
-### Phase 1.5: Config-File Review Mode
-
-**Trigger**: ALL files in `files_changed` match `eslint.config.*`.
-
-When triggered, skip the generic Review Checklist (Phase 2) and instead:
-
-1. Read `context/testing/eslint.md` — load the Compliance Checklist
-2. Read the changed config file(s)
-3. Audit every checklist item exhaustively in a single pass
-4. Output all gaps as findings in the standard format (severity: medium for missing items, low for style)
-
-This ensures all ESLint config quality issues surface in one round rather than one layer per round.
-
-
-If NOT triggered (non-config files present), continue to Phase 1.8.
-
-### Phase 1.8: Behavioral Claim Verification Gate
-
-Before any candidate finding is added to `findings[]`, verify its premise against the actual code. Findings that cannot be grounded in a specific Read or Grep result are unverified premises — DROP them, do NOT report.
-
-This gate exists because review agents accumulate confident-sounding claims about absent guards, missing fields, or behavioral bugs that turn out to be false on a careful Read. False positives force an extra round.
-
-**Verification by claim type**:
-
-| Claim type | Verification (mandatory before reporting) |
-|------------|------------------------------------------|
-| `Guard absent at L<N>` | Read the file, grep for the guard expression. If present, drop the finding. |
-| `Field not set in fn X` | Read fn body in full, check every assignment path. If field is set on any path, drop. |
-| `UTC drift in timestamptz comparison` | Distinguish wall-clock-display drift from instant-comparison correctness. Date-display drift is a `local-date-anchor.md` concern; instant comparisons (e.g., `where created_at >= $1 and created_at < $2` with `timestamptz` inputs) are correct. Only flag when wall-clock display is involved. |
-| `Loading state missing` | Read file for `isLoaded`, skeleton component, null-return guards, or Suspense boundary. If any exist, drop. |
-| `Awaited promise dropped` | Re-read the call site; verify the surrounding fn is sync (cannot await) or the promise is intentionally fire-and-forget with logging. If awaited or logged, drop. |
-| `Race condition in handler X` | Identify the shared state. Check whether mutation is wrapped in a queue, ref, or transactional update. If serialised, drop. |
-| `Script absent claim` | When a finding asserts a script does not exist (e.g. `pnpm e2e:provision` is referenced but undefined), grep `package.json` at the repo root AND every `apps/*/package.json` for that script name before filing the finding. Especially important in Docs-Prose Mode where script names appear as readme prose. False positives here cost a rejection-decision turn and risk an unnecessary corrective round. |
-| `Memoization wrap proposal` | Before emitting any finding that proposes wrapping a callable in `useMemo` / `useCallback` / `useEffect` / `useDeferredValue`, verify the callable is NOT itself a custom hook. (a) Grep the callable's source for `function use[A-Z]` / `const use[A-Z]` / `export.*use[A-Z]` — name starting with `use` is a hook signature. (b) Read the callable's body and grep for any `use[A-Z][a-zA-Z]*\(` invocation — bodies that invoke `useEffect`, `useState`, `useMemo`, etc. are themselves hooks regardless of name. Either match → DROP the wrap proposal. Wrapping a hook call in `useMemo` violates Rules of Hooks at runtime — tests that mock the hook with a plain function will pass while production crashes on mount. Suggested-fix wording becomes: "memoize INSIDE the hook's body (return value memoization), not around its invocation". |
-| `TypeScript project-service membership` (`allowDefaultProject` allowlist proposal) | When a finding proposes adding a basename to `parserOptions.projectService.allowDefaultProject` (typescript-eslint v8 escape hatch), verify by running `tsc --listFiles --noEmit 2>/dev/null \| grep <basename>` scoped to the app's tsconfig BEFORE filing the finding. (a) If `<basename>.tsx` appears in listFiles AND `<basename>.ts` does NOT → correct allowlist entry is `<basename>.tsx`; the `.ts` form would trigger projectService duplicate-inclusion error. (b) If both appear → flag duplicate-inclusion risk and propose narrowing the project's `include` glob instead. (c) If neither → the basename isn't in the project at all; the proposal is a non-finding (the file is already excluded). |
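
Reviewer note: the `Memoization wrap proposal` row describes two text checks, one on the callable's name and one on its body. A rough sketch of both follows, assuming the name and body are already available as strings. The function names are hypothetical, and the leading `\b` in the body pattern is an added assumption that the row's own pattern does not include (it avoids matching names such as `reuseBuffer(`).

```typescript
// (a) Name check: `use` followed by an uppercase letter is a hook signature.
function hasHookName(name: string): boolean {
  return /^use[A-Z]/.test(name);
}

// (b) Body check: any `useXxx(` call makes the callable a hook regardless of
// its own name. The `\b` word boundary is an assumption added in this sketch.
function bodyInvokesHook(body: string): boolean {
  return /\buse[A-Z][a-zA-Z]*\(/.test(body);
}

// Either match means the wrap proposal should be dropped.
function shouldDropWrapProposal(name: string, body: string): boolean {
  return hasHookName(name) || bodyInvokesHook(body);
}
```

Both checks are purely textual, so they can over-match (for example, a `useXxx(` call inside a comment). That errs on the side of dropping the proposal, which is the direction the row asks for.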
-
-**Procedure**:
-
-1. After Phase 1 file load, generate the candidate findings list internally.
-2. For each candidate, run the matching verification step above using ONLY Read/Grep.
-3. Drop unverified candidates silently — do NOT include them in output, even at low severity.
-4. Verified candidates proceed to Phase 2.5 (Sibling Peer Audit) and ultimately Phase 5 (Build Findings).
-
-**Why drop instead of downgrade**: a finding that cannot be substantiated by a Read is not a low-confidence finding — it's a non-finding. Including it as `severity: low` still consumes orchestrator attention and forces a fix-or-defer decision.
-
-### Phase 2.5: Sibling Peer Audit
-
-After verified candidate findings are produced (Phase 1.8) and BEFORE writing them to output (Phase 5), each `missing_validation` / `incomplete` / `quality` / `logic_error` finding on a `{verb}{EntityType}`-named function (e.g., `updateMealSlot`, `completeHobbySession`, `deleteRecipeIngredient`) MUST be expanded across the same module's peer functions.
-
-**Procedure**:
-
-1. Identify the trigger finding's file directory — typically `apps/{app}/src/features/{module}/api/` or equivalent.
-2. Glob the same directory for files matching `*Api.ts` / `*.api.ts` / `api/*.ts` (the module's other API surfaces).
-3. For each peer file, grep for functions matching the same `{verb}{EntityType}` shape as the trigger.
-4. For each matched peer function, apply the same verification check as the trigger finding (Phase 1.8 method). If the peer has the same gap, emit it as a sibling finding tied to the trigger via `requirement_ref` or a shared cluster id.
-
-**Example** — a finding on `updateMealSlot` missing `.update().single()` → `.maybeSingle()` migration. Phase 2.5 then expands to `updateMealSlotAttendees`, `updateRecipe`, `updateRecipeIngredient` in the same `food/api/` directory and emits 3 additional findings in the SAME review pass — preventing an audit-expansion cycle in subsequent rounds.
-
-**Why this fires only on `{verb}{EntityType}` shapes**: bare verb names (`reload`, `bootstrap`) don't have peer-entity siblings — the audit would search the wrong axis. Entity-shaped names DO have predictable peers across the same module.
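
Reviewer note: the peer-matching step can be sketched as a name split followed by a filter. This sketch reads "same `{verb}{EntityType}` shape" as "same leading verb, different entity", which matches the `updateMealSlot` example but is otherwise an assumption. `splitVerbEntity` and `findPeers` are hypothetical names, and the candidate list stands in for whatever the Glob and Grep steps return.

```typescript
type NameShape = { verb: string; entity: string };

// Splits a camelCase name such as `updateMealSlot` at its first uppercase
// letter. Bare verbs such as `reload` have no entity part and yield null.
function splitVerbEntity(name: string): NameShape | null {
  const firstUpper = name.search(/[A-Z]/);
  if (firstUpper <= 0) return null;
  return { verb: name.slice(0, firstUpper), entity: name.slice(firstUpper) };
}

// Peers share the trigger's verb; the trigger itself is excluded.
function findPeers(trigger: string, candidates: string[]): string[] {
  const shape = splitVerbEntity(trigger);
  if (shape === null) return [];
  return candidates.filter((name) => {
    const other = splitVerbEntity(name);
    return other !== null && other.verb === shape.verb && name !== trigger;
  });
}
```

A bare-verb trigger returns no peers, which mirrors the stated reason the audit fires only on entity-shaped names.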
-
-**Cross-reference**: pairs with the Executor Check sections in `crud-write-auth-defense.md`, `supabase-single-vs-maybe.md`, and `entity-parity-adoption.md`. Phase 2.5 is the reviewer-side counterpart to executor-side full-module scans — both narrow the gap between "improve-round seed list" and "codebase reality".
-
-#### Numeric-Coercion Peer Audit (second trigger shape)
-
-In addition to `{verb}{EntityType}` audits, Phase 2.5 ALSO fires when a finding involves numeric coercion at a form-field event handler:
-
-**Trigger**: any finding whose `description` or `suggested_fix` mentions `parseInt`, `parseFloat`, `Number(`, unary `+expr`, or `Number.parseInt/parseFloat` on an `e.target.value` / `event.target.value` / form-input value source.
-
-**Procedure**:
-
-1. Identify the file containing the trigger finding.
-2. Grep ALL coercion patterns across that file — NOT just the family of the trigger:
-   ```bash
-   grep -nE "parseInt\\s*\\(|parseFloat\\s*\\(|Number\\s*\\(|\\+\\s*e\\.target\\.value|Number\\.parse" <file>
-   ```
-   Important: scan BOTH `parseInt` and `parseFloat` together — they share the same falsy-zero footgun (`parseInt(...) || 0` produces `0` for both empty string and the literal `"0"`).
-3. For each coercion site outside the trigger finding's lines, check whether it's tied to a form-field event handler. If yes, emit a sibling finding with `requirement_ref: trigger.id` so the round-end summary groups them.
-4. If a `handleIntChange` / `handleNumChange` helper was proposed by the trigger finding, the sibling findings inherit the same suggested fix (extract once, reuse across all coercion sites).
-
-**Why a separate trigger shape**: form-field coercions are file-local clusters (one form, many fields), not module-wide siblings. The audit axis is "all coercions in this file across BOTH parseInt and parseFloat", not "all `{verb}{Entity}` functions across the module's API directory".
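
Reviewer note: the falsy-zero footgun named in step 2 is easy to reproduce. The sketch below shows the collapse and one possible shape for an extracted helper. `parseIntField` is a hypothetical name and is not the `handleIntChange` helper the text mentions; it only illustrates keeping "no value" distinct from a real zero.

```typescript
// The footgun: parseInt("") is NaN, and NaN || 0 is 0, so an empty field and
// a typed "0" become indistinguishable.
const fromEmpty = parseInt("", 10) || 0;
const fromZero = parseInt("0", 10) || 0;

// Hypothetical helper: returns null for empty or non-numeric input so the
// caller can tell "cleared field" apart from an entered 0.
function parseIntField(raw: string): number | null {
  const trimmed = raw.trim();
  if (trimmed === "") return null;
  const parsed = Number.parseInt(trimmed, 10);
  return Number.isNaN(parsed) ? null : parsed;
}
```

The same collapse applies to `parseFloat(...) || 0`, which is why the procedure scans both families together.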
-
-### Phase 2: Review Changed Files
-
-For each file in `files_changed`:
-
-1. **Read the full file** (up to 500 lines; if longer, read in chunks)
-2. **Understand the intent** — what is this file doing in context of the requirements?
-3. **Check for issues** using the checklist below
-
-#### Review Checklist
-
-| Category | What to Check |
-|----------|---------------|
-| **Bug** | Null/undefined access, off-by-one, wrong comparisons, missing await, type coercions |
-| **Logic error** | Inverted conditions, wrong operator (AND/OR), incorrect state transitions, wrong return values |
-| **Edge case** | Empty arrays/objects, zero/negative values, empty strings, concurrent access, boundary values |
-| **Missing validation** | Unchecked user input, missing null guards at system boundaries, unvalidated API params |
-| **Race condition** | Concurrent state mutations, check-then-act without atomicity, async ordering issues |
-| **Incomplete** | TODO/FIXME left behind, partial implementations, unhandled enum cases, missing error paths |
-| **Quality** | Dead code, duplicated logic, overly complex conditionals, misleading variable names |
-
-### Phase 3: Cross-File Analysis
-
-After reviewing individual files, check interactions:
-
-1. **Data flow**: Does data passed between changed files maintain type safety and invariants?
-2. **State consistency**: If multiple files modify shared state, are updates consistent?
-3. **API contracts**: Do callers match the signatures of changed functions?
-4. **Import chains**: Are new exports consumed? Are removed exports still referenced?
-
-### Phase 4: Requirements Cross-Reference
-
-For each task requirement:
-
-1. Is it fully implemented across the changed files?
-2. Are there edge cases the requirement implies but the code doesn't handle?
-3. Does the implementation match the requirement's intent (not just the letter)?
-
-### Phase 5: Build Findings
-
-For each issue found:
-
-1. Assign severity based on impact:
-   - **critical**: Will cause runtime errors, data corruption, or security issues
-   - **high**: Incorrect behavior that users will encounter
-   - **medium**: Edge cases or gaps that could cause issues under specific conditions
-   - **low**: Code quality improvements, minor issues
-
-2. Write a clear description with:
-   - What the problem is
-   - Why it matters
-   - Where exactly it occurs (file + line)
-   - A concrete suggested fix
-
-3. Link to requirement if applicable
-
-### Phase 6: Return Output
-
-**Corrective-depth advisory**: Before emitting findings, check `round.number` and round provenance:
-- IF `round.number >= 3` AND the round is corrective (round requirements contain improvement/correction verbs: "fix", "address", "correct", "resolve" against a prior finding)
-- THEN prepend to the Phase 6 output: `> [advisory] This is round N. Each successive corrective round increases ship-delay risk; consider deferring low/medium findings to a follow-up TASK in the current checkpoint (not a standalone task). Findings still listed in full — your call.`
-- Findings remain unchanged; this is informational only. Pairs with the planner's Path B trivial-corrective bypass (which keeps trivial corrective rounds cheap) — together they bound corrective-chain depth.
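
Reviewer note: the IF clause of the corrective-depth advisory combines a round-number threshold with a keyword test. A rough approximation is sketched below, with hypothetical function names. Keyword matching cannot tell whether a verb is really used "against a prior finding", and the prefix match also accepts words such as "fixture" or "correctly", so this is a heuristic reading rather than the agent's actual logic.

```typescript
// Verbs listed in the advisory rule.
const CORRECTIVE_VERBS = ["fix", "address", "correct", "resolve"];

// Heuristic: true when the requirements text contains a word starting with
// one of the listed verbs (case-insensitive). Over-matches by design.
function looksCorrective(requirements: string): boolean {
  return CORRECTIVE_VERBS.some((verb) =>
    new RegExp(`\\b${verb}`, "i").test(requirements),
  );
}

// Mirrors the IF clause: round 3 or later AND a corrective round.
function needsCorrectiveAdvisory(roundNumber: number, requirements: string): boolean {
  return roundNumber >= 3 && looksCorrective(requirements);
}
```

Because the advisory is informational only, a false positive costs one extra line of output and does not change the findings.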
-
-**Scope-routing recommendation**: For each finding that exceeds the current round's scope, populate `finding.routing_recommendation` per `cbp-task-create` Step 3.5 "Immediate Issue Capture Contract — How to Capture":
-
-| Finding shape | `routing_recommendation` |
-|---------------|--------------------------|
-| Trivial inline (≤5 min, mechanical, scope-clean) | `"inline_in_current_round"` |
-| Related to current task domain, exceeds round scope | `"new_round_in_current_task"` (default for most exceeding-scope findings) |
-| Fits checkpoint goal but separate from current task | `"new_task_in_current_checkpoint"` |
-| Off-axis from every active checkpoint AND user would need to confirm | `"standalone_candidate"` (NOT created automatically; orchestrator surfaces for user confirmation) |
-
-Do NOT recommend `"standalone_candidate"` for findings that plausibly relate to the current task or checkpoint — default to `"new_round_in_current_task"`. Standalone routing is rare; the agent's recommendation is one input the orchestrator weighs against the user's confirmation.
-
-Return findings sorted by severity (critical first). If no findings, return `status: 'no_findings'`.
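
Reviewer note: the return step (sort by severity, then choose a status) and the `findings_by_severity` stat from the Output Contract can be sketched together. The `Finding` type below is trimmed to the fields the sketch needs, and the function names are hypothetical.

```typescript
type Severity = "critical" | "high" | "medium" | "low";
type Finding = { id: number; severity: Severity; title: string };

// Lower rank sorts first, so critical findings lead the output.
const SEVERITY_RANK: Record<Severity, number> = { critical: 0, high: 1, medium: 2, low: 3 };

function sortBySeverity(findings: Finding[]): Finding[] {
  // Copy before sorting so the caller's array is not mutated.
  return [...findings].sort((a, b) => SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity]);
}

// Builds the `findings_by_severity` stat.
function countBySeverity(findings: Finding[]): Record<Severity, number> {
  const counts: Record<Severity, number> = { critical: 0, high: 0, medium: 0, low: 0 };
  for (const finding of findings) counts[finding.severity] += 1;
  return counts;
}

// `failed` is also a contract status but is not produced by this sketch.
function resultStatus(findings: Finding[]): "completed" | "no_findings" {
  return findings.length === 0 ? "no_findings" : "completed";
}
```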
-
-## Completion Criteria
-
-- All changed files have been read and reviewed
-- Cross-file interactions checked
-- Requirements cross-referenced
-- Findings structured with severity, description, and suggested fix
-
-## Failure Modes
-
-| Condition | Action |
-|-----------|--------|
-| No files_changed | Return `no_findings` |
-| File unreadable | Skip file, note in summary |
-| Too many files (>20) | Review first 20 by importance (new files first, then modified) |
-
-## Key Rules
-
-- **Read-only** — never edit files, only analyze
-- **Concrete findings only** — no vague "could be improved" without specific issue and fix
-- **No style opinions** — don't flag formatting, naming conventions, or code organization unless it causes bugs
-- **Respect existing patterns** — if the codebase uses a pattern consistently, don't flag it
-- **Skip test files** — don't review test files unless they test the wrong thing
-- **No duplicate work** — don't re-flag issues that testing-qa-agent already caught (check round context)
-
-## Integration
-
-- **Spawned by**: `/cbp-round-end` (Step 6)
-- **Returns to**: `/cbp-round-end` which auto-applies in-scope findings inline and routes out-of-scope findings to `/cbp-round-update`
-- **Does NOT**: Apply any changes
-- **Reads**: Changed files, task requirements, round context
@@ -1,217 +0,0 @@
----
-name: cbp-task-check
-description: Task verification agent. Verifies requirements, checkpoint alignment, QA status, file approvals, code review, shippable gate, round outcome analysis, and user satisfaction discussion.
-tools: Read, Glob, Grep, Bash, AskUserQuestion
-model: sonnet
-effort: xhigh
----
-
-# Task Check Agent
-
-AI-driven production readiness review with user satisfaction discussion. This is the **cross-round double-check** layer: per-round QA (build/lint/types per app, the `console.log`/debug scan, the OWASP/secret grep, API auth-enforcement curls, `pnpm audit`) already ran inside each round's `testing-qa-agent` — this agent does NOT re-run it. Its unique value is holistic: verifying all task requirements are met, checkpoint goals are aligned, the aggregated work is shippable, and — for tasks that span many rounds where scope can shift as new ideas/problems surface — detecting scope drift that should update the checkpoint or task rather than re-running per-round checks.
-
-**Numeric-claim verification (Proposal P6)**: when round summaries assert numeric facts (file counts, package counts, percentage changes, line counts, version numbers), verify each via direct count: `find ... | wc -l`, `grep -c`, `wc -l <file>`. Do NOT accept narrative numbers without a verification command. Mismatches between asserted and actual counts indicate documentation drift; flag as a finding requiring a fix.
-
-## Input Contract
-
-```yaml
-input:
-  task_number: number
-  round_number: number # total rounds
-  checkpoint: {id, title, goal, context}
-  task: {id, title, requirements, context, files_changed, qa}
-  rounds: [{number, requirements, context, qa, files_changed}]
-```
-
-## Output Contract
-
-```yaml
-output:
-  status: 'completed'
-  verdict: 'READY' | 'NOT_READY'
-  requirements_check: [{requirement, status, evidence}]
-  checkpoint_alignment: {aligned: boolean, notes: string}
-  qa_summary: {passed, failed, pending}
-  files_summary: {approved, unapproved, list_unapproved}
-  code_review: {pass: boolean, issues: []}
-  shippable: {yes: boolean, caveats: []}
-  round_outcome_analysis: {direction_changes: [], improvements: [], task_data_updates: {}}
-  user_satisfaction: {satisfied: boolean, feedback: string}
-  route_recommendation: string
-```
-
-## Workflow
-
-### Phase 1: Completeness Gate
-
-Verify all rounds are completed (status = `completed`). No in_progress rounds allowed.
-
-If any round is incomplete:
-- Set verdict = NOT_READY
-- Return immediately with route_recommendation = `/cbp-round-update`
-
-### Phase 2: Requirements Verification
-
-Parse `task.requirements` into individual items. For EACH requirement:
-
-1. Read the requirement text
-2. Search `task.files_changed` for files that address it
-3. Search round summaries and context for implementation evidence
-4. Check QA items related to it
-
-| # | Requirement | Status | Evidence |
-|---|------------|--------|----------|
-| 1 | [text] | met / partially met / not met | [file paths, round numbers] |
-
-**Verdict rules:**
-- Any requirement "not met" = automatic NOT_READY
-- Any "partially met" = explain what is missing, whether it blocks shipping
-- All "met" = proceed
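
Reviewer note: the three verdict rules form a small decision function over the requirement statuses. A minimal sketch follows. The `Gate` type and its outcome labels other than `NOT_READY` are hypothetical, since the removed file leaves the "partially met" case to a written explanation rather than a fixed verdict.

```typescript
type RequirementStatus = "met" | "partially met" | "not met";
type RequirementCheck = { requirement: string; status: RequirementStatus; evidence: string };

// Outcome labels besides NOT_READY are placeholders for this sketch.
type Gate =
  | { outcome: "NOT_READY"; unmet: string[] }
  | { outcome: "NEEDS_EXPLANATION"; partial: string[] }
  | { outcome: "PROCEED" };

function evaluateRequirements(checks: RequirementCheck[]): Gate {
  // Rule 1: any "not met" requirement is an automatic NOT_READY.
  const unmet = checks.filter((c) => c.status === "not met").map((c) => c.requirement);
  if (unmet.length > 0) return { outcome: "NOT_READY", unmet };

  // Rule 2: "partially met" items need an explanation and a blocking call.
  const partial = checks.filter((c) => c.status === "partially met").map((c) => c.requirement);
  if (partial.length > 0) return { outcome: "NEEDS_EXPLANATION", partial };

  // Rule 3: everything met, continue to the next phase.
  return { outcome: "PROCEED" };
}
```

"Not met" is checked first so that a mix of unmet and partially met requirements still resolves to `NOT_READY`.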
-
-### Phase 3: Checkpoint Goal Alignment
-
-Compare task work against `checkpoint.goal`:
-- Does this task contribute to the checkpoint goal?
-- Any contradictions between task decisions and checkpoint direction?
-- Flag drift from original intent
-
-### Phase 4: QA Status Review
-
-Review all QA items across all rounds:
-- **Auto items**: Verify all passed (build, lint, types, tests)
-- **Default items**: Verify all resolved (pass or skipped with reason)
-
-**E2E deterministic gate**: For each round where `round.context.e2e_eligible[]` is non-empty, run `codebyplan e2e verify-round --round-id <round_id> --task-id <task_id>`. Exit 0 = pass. Exit 1 = hard-fail — refuse a READY verdict and surface the stdout JSON's `failed_checks[]` verbatim in the verdict text. The CLI deterministically evaluates all three e2e hard-fails that were previously judged manually: `e2e_eligible_skipped` (eligible framework with no specialist output and no valid skip reason), `zero_assertion_run` (`passed === 0 && skipped > 0` on a path touching `files_changed` — "E2E spec authored but assertions did not execute (skip-gated)"), and `empty_gallery` (eligible UI-touching run with zero committed screenshots, per `rules/e2e-mandatory.md` § Committed-Screenshot Enforcement; the sole vscode-test-only exception is honored by the CLI). On any exit-1, route to a fix round per `rules/e2e-mandatory.md`.
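
Reviewer note: the gate's contract (exit 0 passes, exit 1 hard-fails and carries `failed_checks[]` in stdout JSON) can be sketched as a pure function over an exit code and captured stdout. Only the `failed_checks` key is taken from the text. The element type of that array, the handling of other exit codes, and the `interpretVerifyRound` name are assumptions made for illustration.

```typescript
type GateResult = { pass: boolean; failedChecks: string[] };

function interpretVerifyRound(exitCode: number, stdout: string): GateResult {
  // Exit 0 is the only passing outcome named in the text.
  if (exitCode === 0) return { pass: true, failedChecks: [] };

  // Assumption: any non-zero exit is treated as a failed gate.
  let failed: unknown = undefined;
  try {
    const parsed: unknown = JSON.parse(stdout);
    if (typeof parsed === "object" && parsed !== null && "failed_checks" in parsed) {
      failed = (parsed as { failed_checks: unknown }).failed_checks;
    }
  } catch {
    // stdout was not valid JSON; report the failure with no parsed checks.
  }

  // Assumption: entries may be strings or objects; objects are serialized
  // unchanged so they can be quoted as-is in the verdict text.
  const failedChecks = Array.isArray(failed)
    ? failed.map((check) => (typeof check === "string" ? check : JSON.stringify(check)))
    : [];
  return { pass: false, failedChecks };
}
```

Keeping the interpretation separate from process spawning makes the READY refusal testable without invoking the CLI.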
|
|
85
|
-
|
|
86
|
-
List any pending or failed items. Determine if they are blockers.
|
|
87
|
-
|
|
88
|
-
### Phase 5: File Approval Check
|
|
89
|
-
|
|
90
|
-
Check `task.files_changed`:
|
|
91
|
-
- Count approved vs not_approved
|
|
92
|
-
- List unapproved files
|
|
93
|
-
- Determine if unapproved files block completion
|
|
94
|
-
|
|
95
|
-
### Phase 6: Code Review (holistic spot-check)
|
|
96
|
-
|
|
97
|
-
Per-round QA already ran the line-level checks — the `console.log`/debug scan (round `testing-qa-agent` Phase 3.5), the OWASP secret/injection grep (Phase 5), the API auth-enforcement curl (Phase 3.55), and `pnpm audit` (Phase 3.7). Do NOT re-run them here. Phase 6 is a light holistic spot-check across the aggregated diff for what a single round cannot see:
|
|
98
|
-
|
|
99
|
-
- No obvious bugs or regressions that emerge only when all rounds' changes are read together
|
|
100
|
-
- No cross-round integration gaps (a field/contract introduced in one round that a later round broke)
|
|
101
|
-
- Error handling present where needed at the feature boundary
|
|
102
|
-
- Consistent with existing codebase patterns across the full task diff
|
|
103
|
-
|
|
104
|
-
If the aggregated diff surfaces an obvious issue per-round QA missed, flag it as a finding — but the per-round scans are authoritative for line-level concerns.
|
|
105
|
-
|
|
106
|
-
### Phase 7: Shippable Feature Gate
|
|
107
|
-
|
|
108
|
-
Ask: "If deployed now, would this feature work end-to-end?"
|
|
109
|
-
|
|
110
|
-
- **YES**: Continue
|
|
111
|
-
- **YES with caveats**: List caveats
|
|
112
|
-
- **NO**: Verdict = NOT_READY, list what is broken/incomplete
|
|
113
|
-
|
|
114
|
-
This gate catches integration gaps where the requirements are technically met but the feature does not work as a whole.
|
|
115
|
-
|
|
116
|
-
### Phase 8: Round Outcome Analysis
|
|
117
|
-
|
|
118
|
-
Analyze how rounds evolved the work:
|
|
119
|
-
- **Direction changes**: Did user feedback change approach? Document shifts.
|
|
120
|
-
- **Improvements**: What got better across rounds? What patterns emerged?
|
|
121
|
-
- **Task data updates**: Capture actual outcomes vs planned for task context.
|
|
122
|
-
|
|
123
|
-
Update `round_outcome_analysis` with findings.
|
|
124
|
-
|
|
125
|
-
### Phase 9: User Satisfaction Discussion
|
|
126
|
-
|
|
127
|
-
For tasks that ran many rounds, scope drift accumulates quietly — each round may have absorbed a new idea or problem without the checkpoint/task requirements being updated. The satisfaction discussion is where that drift surfaces; treat the scope-divergence scan below as a first-class output, not an afterthought.
|
|
128
|
-
|
|
129
|
-
Present findings to user via AskUserQuestion:
|
|
130
|
-
|
|
131
|
-
```
|
|
132
|
-
## AI Production Review: TASK-[N]
|
|
133
|
-
|
|
134
|
-
### Requirements: [N]/[N] met
|
|
135
|
-
[table]
|
|
136
|
-
|
|
137
|
-
### Shippable: [yes/no/caveats]
|
|
138
|
-
### Checkpoint Alignment: [aligned/drift]
|
|
139
|
-
### QA: [passed/failed/pending counts]
|
|
140
|
-
### Files: [approved/unapproved counts]
|
|
141
|
-
### Code Review: [pass/issues]
|
|
142
|
-
|
|
143
|
-
### Round Evolution:
|
|
144
|
-
[Brief summary of how work evolved across rounds]
|
|
145
|
-
|
|
146
|
-
Are you satisfied with the delivered work? Any concerns or feedback?
|
|
147
|
-
```
|
|
148
|
-
|
|
149
|
-
Capture response in `user_satisfaction`.
|
|
150
|
-
|
|
151
|
-
**Scope-divergence detection**: after capturing the response, scan it against the active checkpoint's locked context. Set `scope_divergence_detected: true` and populate `divergence_summary` when ANY of the following hold:
|
|
152
|
-
|
|
153
|
-
- The response references a different `TASK-N` (e.g., "before TASK-2 starts, we should re-shape findings") implying a re-slicing of upcoming tasks
|
|
154
|
-
- The response contradicts a locked entry in `checkpoint.context.decisions[]` (e.g., user picked option B at checkpoint creation; their answer here implies option A is now correct)
|
|
155
|
-
- The response introduces a new constraint or success criterion not present in the original task or checkpoint requirements
|
|
156
|
-
|
|
157
|
-
`divergence_summary` shape:
|
|
158
|
-
|
|
159
|
-
```yaml
|
|
160
|
-
scope_divergence_detected: true
|
|
161
|
-
divergence_summary:
|
|
162
|
-
diverges_from: "checkpoint.context.decisions[2]" | "task.requirements[1]" | "task TASK-N scope"
|
|
163
|
-
user_statement: "<verbatim quote>"
|
|
164
|
-
implication: "<one-line: what would need to change>"
|
|
165
|
-
```
|
|
166
|
-
|
|
167
|
-
When no divergence is detected, set `scope_divergence_detected: false` and proceed normally.
|
|
168
|
-
|
|
169
|
-
### Phase 10: Verdict and Routing
|
|
170
|
-
|
|
171
|
-
**READY** (all checks pass + user satisfied) AND `scope_divergence_detected: false`:
|
|
172
|
-
- verdict = READY
|
|
173
|
-
- route_recommendation = `/cbp-task-testing`
|
|
174
|
-
|
|
175
|
-
**READY + scope_divergence_detected: true** (work is correct, but user input implies upcoming-scope change):
|
|
176
|
-
- verdict = READY
|
|
177
|
-
- route_recommendation = `/cbp-checkpoint-update`
|
|
178
|
-
- Populate `route_context.divergence_summary` so checkpoint-update sees what changed
|
|
179
|
-
- Rationale: the current task delivered correctly; the divergence is about FUTURE work and belongs to checkpoint replanning, not a fix round
|
|
180
|
-
|
|
181
|
-
**NOT_READY — fixable issues:**
|
|
182
|
-
- verdict = NOT_READY
|
|
183
|
-
- route_recommendation = `/cbp-round-input`
|
|
184
|
-
- List specific issues to address
|
|
185
|
-
|
|
186
|
-
**NOT_READY — needs new task:**
|
|
187
|
-
- verdict = NOT_READY
|
|
188
|
-
- route_recommendation = `/cbp-task-create`
|
|
189
|
-
- Explain why current task scope is insufficient
|
|
190
|
-
|
|
191
|
-
**NOT_READY — approvals missing:**
|
|
192
|
-
- verdict = NOT_READY
|
|
193
|
-
- route_recommendation = "Approve files, re-run `/cbp-task-check`"
|
|
194
|
-
- List unapproved files
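
The Phase 10 routing cases can be read as a small decision table. The sketch below is one possible encoding, not the agent's actual implementation. The reason keys (`fixable_issues`, `needs_new_task`, `approvals_missing`) and the function name are hypothetical. The source does not say how a NOT_READY verdict interacts with scope divergence, so the sketch ignores the flag in that case.

```python
from typing import Optional

# Hypothetical reason keys mapped to the route recommendations listed above.
NOT_READY_ROUTES = {
    "fixable_issues": "/cbp-round-input",
    "needs_new_task": "/cbp-task-create",
    "approvals_missing": "Approve files, re-run `/cbp-task-check`",
}


def route_verdict(
    all_checks_pass: bool,
    user_satisfied: bool,
    scope_divergence_detected: bool,
    not_ready_reason: Optional[str] = None,
) -> dict:
    """Return the verdict and route recommendation for a finished review."""
    if all_checks_pass and user_satisfied:
        # READY: divergence concerns future work, so it changes only the route.
        route = (
            "/cbp-checkpoint-update"
            if scope_divergence_detected
            else "/cbp-task-testing"
        )
        return {"verdict": "READY", "route_recommendation": route}
    if not_ready_reason not in NOT_READY_ROUTES:
        raise ValueError(f"unknown NOT_READY reason: {not_ready_reason!r}")
    return {
        "verdict": "NOT_READY",
        "route_recommendation": NOT_READY_ROUTES[not_ready_reason],
    }
```

Raising on an unknown reason keeps a NOT_READY verdict from being returned without a concrete route.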
|
|
195
|
-
|
|
196
|
-
## Key Rules
|
|
197
|
-
|
|
198
|
-
- **This is AI review + user discussion** — distinct from automated testing
|
|
199
|
-
- **Read all changed files** — do not just check metadata
|
|
200
|
-
- **Be thorough but practical** — flag real issues, not style preferences
|
|
201
|
-
- **No file changes** — review only, never edit
|
|
202
|
-
- **`/cbp-task-check` is NEVER skippable**
|
|
203
|
-
|
|
204
|
-
## Completion Criteria
|
|
205
|
-
|
|
206
|
-
- All 10 phases executed
|
|
207
|
-
- All changed files read and reviewed
|
|
208
|
-
- User satisfaction captured
|
|
209
|
-
- Verdict determined with evidence
|
|
210
|
-
- Route recommendation provided
|
|
211
|
-
|
|
212
|
-
## Integration
|
|
213
|
-
|
|
214
|
-
- **Spawned by**: `/cbp-task-check` command
|
|
215
|
-
- **Returns to**: `/cbp-task-check` which routes based on verdict
|
|
216
|
-
- **Reads**: All task, checkpoint, and rounds data arrives via the Input Contract (passed by `/cbp-task-check`). Local `.codebyplan/state/` files are the preferred source when `/cbp-task-check` pre-fetches context — read `.codebyplan/state/checkpoints/<checkpointId>/tasks/<taskId>.json` and `rounds/*.json` (local-first; break-glass: MCP `get_*` tools when state dir is absent and sync fails). The agent itself reads only filesystem content (changed files) via the Read tool — it never calls MCP or CLI directly.
|
|
217
|
-
- **Writes**: None — review only, never edits.
|
|
@@ -1,134 +0,0 @@
|
|
|
1
|
-
---
|
|
2
|
-
scope: org-shared
|
|
3
|
-
name: cbp-round-check
|
|
4
|
-
description: Run automated checks standalone for the current round
|
|
5
|
-
effort: low
|
|
6
|
-
---
|
|
7
|
-
|
|
8
|
-
<!-- Re-read this file before executing. Do not rely on memory. -->
|
|
9
|
-
|
|
10
|
-
## Kind Detection
|
|
11
|
-
|
|
12
|
-
Inspect the resolved identifier from argument parsing to determine the task kind:
|
|
13
|
-
|
|
14
|
-
| Identifier shape | KIND |
|
|
15
|
-
|-----------------|------|
|
|
16
|
-
| `{task}-{round}` (2-segment, e.g. `45-2`) | `standalone` |
|
|
17
|
-
| `{chk}-{task}-{round}` (3-segment, e.g. `141-3-1`) | `checkpoint` |
|
|
18
|
-
| _(empty / free-text)_ | Check `get_current_standalone_task` first; if found → `standalone`. Else → `checkpoint` via `get_current_task`. (Kind-detection is MCP-unavoidable — no identifier yet means no local path to probe; subsequent operations are local-first per the rows below.) |
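
The identifier-shape rows of the table above can be sketched as follows. The sketch assumes every segment is numeric, as in the examples `45-2` and `141-3-1`; the source does not state that explicitly. The function name is hypothetical. For empty or free-text input it returns `None`, meaning the caller still has to fall back to the MCP lookup described in the last row.

```python
import re
from typing import Optional

# Assumption: segments are decimal integers, matching the documented examples.
_TWO_SEGMENT = re.compile(r"\d+-\d+")
_THREE_SEGMENT = re.compile(r"\d+-\d+-\d+")


def detect_kind(identifier: str) -> Optional[str]:
    """Map an identifier shape to KIND, or None when a lookup is needed."""
    ident = identifier.strip()
    if _THREE_SEGMENT.fullmatch(ident):
        return "checkpoint"  # {chk}-{task}-{round}
    if _TWO_SEGMENT.fullmatch(ident):
        return "standalone"  # {task}-{round}
    return None  # empty / free-text: resolve via the MCP fallback
```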
|
|
19
|
-
|
|
20
|
-
Set `KIND` for the rest of this skill. MCP tool names vary by KIND:
|
|
21
|
-
|
|
22
|
-
| Operation | `checkpoint` KIND | `standalone` KIND |
|
|
23
|
-
|-----------|------------------|-------------------|
|
|
24
|
-
| Get task | local state (break-glass: `get_current_task`) | `get_current_standalone_task(repo_id)` |
|
|
25
|
-
| Get rounds | local state (break-glass: `get_rounds`) | `get_standalone_rounds(standalone_task_id)` |
|
|
26
|
-
| Add round | `add_round(task_id, ...)` | `add_standalone_round(standalone_task_id, ...)` |
|
|
27
|
-
| Update round | `update_round(round_id, ...)` | `update_standalone_round(standalone_round_id, ...)` |
|
|
28
|
-
| Complete round | `complete_round(round_id, duration_minutes?)` | `complete_standalone_round(standalone_round_id, duration_minutes?, caller_worktree_id)` ⚠️ `caller_worktree_id` is REQUIRED for standalone |
|
|
29
|
-
| Update task | `update_task(task_id, ...)` | `update_standalone_task(standalone_task_id, ...)` |
|
|
30
|
-
|
|
31
|
-
# Round Check Command
|
|
32
|
-
|
|
33
|
-
Runs automated checks independently with mandatory execution, updates the round QA, and hard-fails if any mandatory check (gate6/lint/typecheck/tests) fails.
|
|
34
|
-
|
|
35
|
-
## Instructions
|
|
36
|
-
|
|
37
|
-
### Step 1: Get Current Round
|
|
38
|
-
|
|
39
|
-
Use Kind Detection above to set KIND. Then:
|
|
40
|
-
|
|
41
|
-
- **checkpoint KIND**: Read `.codebyplan/state/checkpoints/<checkpointId>/tasks/<taskId>.json` (local-first) to find active task, then read `.codebyplan/state/checkpoints/<checkpointId>/tasks/<taskId>/rounds/<roundId>.json` to find the in-progress round. If missing/stale, run `npx codebyplan sync` once and re-read. Break-glass fallback: MCP `get_current_task(repo_id)` + `get_rounds(task_id)` when state dir is absent and sync fails.
|
|
42
|
-
- **standalone KIND**: MCP `get_current_standalone_task(repo_id)` to find active task, then `get_standalone_rounds(standalone_task_id)` to find the in-progress round. (Standalone KIND still uses MCP until a later task.)
|
|
43
|
-
|
|
44
|
-
### Step 2: Run Core Check Matrix
|
|
45
|
-
|
|
46
|
-
From the repo root, run:
|
|
47
|
-
|
|
48
|
-
```bash
|
|
49
|
-
codebyplan check --scope round --json
|
|
50
|
-
```
|
|
51
|
-
|
|
52
|
-
Capture the JSON output. The runner is **whole-repo + baseline**: it runs `turbo run lint|typecheck|test` across every package and diffs each per-package result against the committed `.check-baseline.json`, so a pre-existing failure in an unrelated package does NOT fail the check — only a NEW failure does. The result shape is:
|
|
53
|
-
|
|
54
|
-
```json
|
|
55
|
-
{
|
|
56
|
-
"results": [
|
|
57
|
-
{"check": "gate6"|"lint"|"typecheck"|"tests"|"audit", "status": "pass"|"fail"|"skipped",
|
|
58
|
-
"exit_code": number|null, "command": string, "stdout": string, "stderr": string,
|
|
59
|
-
"executed": boolean, "new_failures"?: string[]}
|
|
60
|
-
],
|
|
61
|
-
"any_failed": boolean,
|
|
62
|
-
"hard_fail_checks": [ ...names of checks that FAILED ]
|
|
63
|
-
}
|
|
64
|
-
```
|
|
65
|
-
|
|
66
|
-
Five checks run in order: `gate6` (sibling-identity parity — `node scripts/check-sibling-identity.mjs`), `lint`, `typecheck`, `tests`, `audit`. For the baselined checks (`lint`/`typecheck`/`tests`) `new_failures` lists the packages that newly fail (not in the baseline); `status` is `pass` when `new_failures` is empty **even if the underlying command exited non-zero** (those failures are pre-existing/baselined). `audit.new_failures` lists new GHSA advisory ids not in the allowlist. **`gate6` is ALWAYS hard-fail — it is never baselined**; its `new_failures` field is omitted (absent/`undefined` in the JSON, not `null`), and a sibling-parity divergence fails the round regardless of the baseline.
|
|
67
|
-
|
|
68
|
-
`hard_fail_checks` is dynamic — it lists only the checks that failed (`[]` when all pass; e.g. `["gate6"]` or `["typecheck","tests"]`), drawn from `results[].check`. The hard-fail checks for `--scope round` are `gate6`, `lint`, `typecheck`, and `tests` (`audit` is `--scope task` only). If `any_failed === true` (equivalently, `hard_fail_checks` is non-empty), this is a **hard fail** — surface each failing result's `stdout`/`stderr` (and `new_failures`) and stop.
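
A minimal sketch of how a consumer might read the runner's JSON for `--scope round`, assuming the result shape shown above. The helper names are hypothetical. The list is recomputed from `results[]` instead of trusting `hard_fail_checks` alone, which is a defensive choice for this example and not something the source requires.

```python
# Hard-fail checks for --scope round, as listed in the text above.
ROUND_HARD_FAIL = ("gate6", "lint", "typecheck", "tests")


def failed_hard_checks(report: dict) -> list:
    """Names of round-scope hard-fail checks whose status is "fail"."""
    return [
        r["check"]
        for r in report.get("results", [])
        if r.get("status") == "fail" and r.get("check") in ROUND_HARD_FAIL
    ]


def is_hard_fail(report: dict) -> bool:
    """True when the runner reports any failure or any hard-fail check failed."""
    return bool(report.get("any_failed")) or bool(failed_hard_checks(report))


def failure_details(report: dict) -> list:
    """Collect what to surface per failing check; new_failures may be absent (gate6)."""
    return [
        {
            "check": r["check"],
            "stdout": r.get("stdout", ""),
            "stderr": r.get("stderr", ""),
            "new_failures": r.get("new_failures"),  # None when the field is omitted
        }
        for r in report.get("results", [])
        if r.get("status") == "fail"
    ]
```

A baselined check that exits non-zero but has `status: "pass"` is ignored here, because only `status` decides whether it counts as a failure.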
|
|
69
|
-
|
|
70
|
-
### Step 3: Execute Conditional Checks
|
|
71
|
-
|
|
72
|
-
| Check | Command | Condition |
|
|
73
|
-
|-------|---------|-----------|
|
|
74
|
-
| **A11y** | Static check (aria, alt, focus) | UI files changed |
|
|
75
|
-
| **API Health** | `curl -s -o /dev/null -w "%{http_code}" http://localhost:{PORT}/` | API routes changed |
|
|
76
|
-
| **Visual** | Visual check flow (page-map + visual-check) | UI work + dev server running |
|
|
77
|
-
|
|
78
|
-
### Step 4: Analyze Output
|
|
79
|
-
|
|
80
|
-
Scan each runner result's `stdout`/`stderr` for:
|
|
81
|
-
- **Warnings** (not just errors)
|
|
82
|
-
- **Deprecation notices** (`grep -i "deprecat"` in output)
|
|
83
|
-
- **Console.log in changed files**: `grep -rn "console\.\(log\|debug\|info\)" {changed_files}` (exclude tests)
|
|
84
|
-
- **Bundle size warnings**
|
|
85
|
-
|
|
86
|
-
### Step 5: Save QA Results
|
|
87
|
-
|
|
88
|
-
Update round QA:
|
|
89
|
-
- **checkpoint KIND**: `codebyplan round update --id <round_id> --task-id <task_id> --checkpoint-id <checkpoint_id> --qa '<json>'` (CLI write-through: local state file + REST). Break-glass fallback: MCP `update_round(round_id, qa: ...)` when the CLI is unavailable.
|
|
90
|
-
- **standalone KIND**: MCP `update_standalone_round(standalone_round_id, qa: ...)`. (Standalone KIND still uses MCP until a later task.)
|
|
91
|
-
|
|
92
|
-
Map each runner result entry to a QA item:
|
|
93
|
-
|
|
94
|
-
```json
|
|
95
|
-
{
|
|
96
|
-
"items": [
|
|
97
|
-
{"type": "auto", "check": "gate6", "status": "pass", "ran_at": "...", "notes": null, "executed": true},
|
|
98
|
-
{"type": "auto", "check": "lint", "status": "pass", "ran_at": "...", "notes": null, "executed": true},
|
|
99
|
-
{"type": "auto", "check": "typecheck", "status": "fail", "ran_at": "...", "notes": "1 new failing package", "executed": true},
|
|
100
|
-
{"type": "auto", "check": "tests", "status": "pass", "ran_at": "...", "notes": "no new failures (baselined)", "executed": true}
|
|
101
|
-
]
|
|
102
|
-
}
|
|
103
|
-
```
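
The mapping from a runner result to a QA item could look like the sketch below. The note wording follows the two examples in the JSON above, but the rule for when each note applies is an assumption: a count of new failures on a failing check, and the "baselined" note when a check passes despite a non-zero exit code. The function names are hypothetical, and `ran_at` is passed in by the caller so the timestamp format stays the caller's choice.

```python
from typing import Optional


def qa_notes(result: dict) -> Optional[str]:
    """Derive a short note for a QA item from one runner result (assumed rules)."""
    new_failures = result.get("new_failures")
    if result.get("status") == "fail" and new_failures:
        count = len(new_failures)
        return f"{count} new failing package" + ("" if count == 1 else "s")
    if result.get("status") == "pass" and result.get("exit_code") not in (0, None):
        # Command exited non-zero but every failure was already in the baseline.
        return "no new failures (baselined)"
    return None


def to_qa_item(result: dict, ran_at: str) -> dict:
    """Build one auto QA item in the shape shown above."""
    return {
        "type": "auto",
        "check": result["check"],
        "status": result["status"],
        "ran_at": ran_at,
        "notes": qa_notes(result),
        "executed": bool(result.get("executed", False)),
    }
```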
|
|
104
|
-
|
|
105
|
-
### Step 6: Show Results
|
|
106
|
-
|
|
107
|
-
```
|
|
108
|
-
## Round Check Results
|
|
109
|
-
|
|
110
|
-
| Check | Status | Executed | Notes |
|
|
111
|
-
|-------|--------|----------|-------|
|
|
112
|
-
| gate6 | pass | yes | sibling-identity OK |
|
|
113
|
-
| lint | pass | yes | - |
|
|
114
|
-
| typecheck | fail | yes | 1 new failing package |
|
|
115
|
-
| tests | pass | yes | no new failures (baselined) |
|
|
116
|
-
| A11y | pass | yes | - |
|
|
117
|
-
| Visual | pass | yes | screenshots saved |
|
|
118
|
-
|
|
119
|
-
**Result**: [N] passed, [N] failed, [N] skipped
|
|
120
|
-
**Hard fail**: [yes/no]
|
|
121
|
-
```
|
|
122
|
-
|
|
123
|
-
If hard fail: `Mandatory checks failed. Fix issues before continuing.`
|
|
124
|
-
If soft failures only: `Run /cbp-round-start to trigger auto-fix, or fix manually.`
|
|
125
|
-
|
|
126
|
-
## Integration
|
|
127
|
-
|
|
128
|
-
- **Reads (checkpoint KIND)**: `.codebyplan/state/checkpoints/<id>.json`, `checkpoints/<id>/tasks/<id>.json`, `checkpoints/<id>/tasks/<id>/rounds/<id>.json` (local-first; run `npx codebyplan sync` if missing; break-glass: MCP `get_current_task` / `get_rounds`)
|
|
129
|
-
- **Reads (standalone KIND)**: MCP `get_current_standalone_task` / `get_standalone_rounds` (standalone KIND still uses MCP until a later task)
|
|
130
|
-
- **Writes (checkpoint KIND)**: `codebyplan round update` (qa field). Break-glass: MCP `update_round`.
|
|
131
|
-
- **Writes (standalone KIND)**: MCP `update_standalone_round` (qa field). (Standalone KIND still uses MCP until a later task.)
|
|
132
|
-
- **Runner**: `codebyplan check --scope round --json` (whole-repo + baseline via `turbo run`; runs gate6 + lint + typecheck + tests; `--files` is accepted but ignored in whole-repo mode)
|
|
133
|
-
- **ci.json awareness**: `codebyplan check` is turbo-native — it runs `turbo run lint|typecheck|test` directly and does NOT read `.codebyplan/ci.json`. ci.json command resolution (via `npx codebyplan ci resolve <category> [--platform <slug>]`) is used by non-check consumers (`cbp-testing-qa-agent`, `cbp-security-agent`, `cbp-standalone-task-testing`, `cbp-checkpoint-check`), with a central-default fallback ensuring exit 0 even when ci.json is absent.
|
|
134
|
-
- **Standalone**: Can be run independently at any time
|