codebyplan 1.13.52 → 1.13.54
- package/dist/cli.js +3226 -897
- package/package.json +1 -1
- package/templates/agents/cbp-database-agent.md +1 -1
- package/templates/agents/cbp-e2e-maestro.md +1 -1
- package/templates/agents/cbp-e2e-playwright.md +24 -16
- package/templates/agents/cbp-e2e-tauri.md +1 -1
- package/templates/agents/cbp-e2e-vscode.md +1 -1
- package/templates/agents/cbp-e2e-xcuitest.md +1 -1
- package/templates/agents/cbp-improve-claude.md +2 -2
- package/templates/agents/{cbp-round-executor.md → cbp-round-builder.md} +23 -23
- package/templates/agents/{cbp-task-planner.md → cbp-round-planner.md} +26 -25
- package/templates/agents/cbp-security-agent.md +10 -2
- package/templates/agents/cbp-stripe-agent.md +2 -2
- package/templates/agents/cbp-testing-qa-agent.md +34 -20
- package/templates/agents/cbp-verify-reviewer.md +236 -0
- package/templates/context/architecture-map.md +4 -4
- package/templates/context/mcp-docs.md +57 -11
- package/templates/context/testing/e2e.md +9 -9
- package/templates/github-workflows/ci.yml +104 -0
- package/templates/github-workflows/publish.yml +8 -27
- package/templates/github-workflows/release-desktop.yml +215 -0
- package/templates/hooks/cbp-skill-context-guard.sh +1 -1
- package/templates/hooks/cbp-test-hooks.sh +9 -9
- package/templates/hooks/validate-structure-lengths.sh +1 -1
- package/templates/hooks/validate-structure-patterns.sh +1 -1
- package/templates/rules/README.md +1 -2
- package/templates/rules/agent-claim-verification.md +1 -1
- package/templates/rules/context-file-loading.md +10 -10
- package/templates/rules/development-workflow.md +73 -0
- package/templates/rules/e2e-mandatory.md +8 -8
- package/templates/rules/execution-proof.md +70 -0
- package/templates/rules/model-invocation-convention.md +2 -2
- package/templates/rules/parallel-waves.md +11 -11
- package/templates/rules/spawn-failure-is-gate-failure.md +76 -0
- package/templates/rules/task-routing-recommendation.md +1 -1
- package/templates/rules/todo-backend.md +3 -3
- package/templates/rules/two-tier-ci.md +63 -0
- package/templates/settings.project.base.json +15 -11
- package/templates/skills/cbp-build-cc-mode/SKILL.md +1 -1
- package/templates/skills/cbp-build-cc-settings/reference/cbp-permission-policy.md +7 -7
- package/templates/skills/cbp-build-cc-skill/SKILL.md +1 -1
- package/templates/skills/cbp-build-cc-skill/reference/cbp-quality.md +2 -2
- package/templates/skills/cbp-build-cc-skill/reference/fork-eligibility.md +11 -14
- package/templates/skills/cbp-checkpoint-check/SKILL.md +11 -3
- package/templates/skills/cbp-checkpoint-create/SKILL.md +16 -1
- package/templates/skills/cbp-checkpoint-end/SKILL.md +5 -1
- package/templates/skills/cbp-checkpoint-update/SKILL.md +3 -3
- package/templates/skills/cbp-clear-continue/SKILL.md +2 -2
- package/templates/skills/cbp-clear-prep/SKILL.md +3 -3
- package/templates/skills/{cbp-task-complete → cbp-finalize}/SKILL.md +25 -29
- package/templates/skills/{cbp-task-complete → cbp-finalize}/reference/checkpoint-done-branching.md +1 -1
- package/templates/skills/{cbp-task-complete → cbp-finalize}/reference/next-step-heuristic.md +1 -1
- package/templates/skills/cbp-frontend-design/SKILL.md +1 -1
- package/templates/skills/cbp-frontend-ui/SKILL.md +7 -7
- package/templates/skills/cbp-git-commit/SKILL.md +3 -3
- package/templates/skills/cbp-merge-main/SKILL.md +4 -4
- package/templates/skills/{cbp-round-execute → cbp-round-build}/SKILL.md +93 -75
- package/templates/skills/cbp-round-complete/SKILL.md +15 -14
- package/templates/skills/cbp-round-plan/SKILL.md +344 -0
- package/templates/skills/cbp-session-end/SKILL.md +1 -1
- package/templates/skills/cbp-setup-cd/SKILL.md +291 -0
- package/templates/skills/cbp-setup-cd/reference/github-actions-cd.md +231 -0
- package/templates/skills/cbp-setup-ci/SKILL.md +175 -0
- package/templates/skills/cbp-setup-ci/reference/github-actions.md +100 -0
- package/templates/skills/cbp-ship/SKILL.md +21 -0
- package/templates/skills/cbp-ship-main/SKILL.md +3 -2
- package/templates/skills/cbp-standalone-task-check/SKILL.md +10 -9
- package/templates/skills/cbp-standalone-task-complete/SKILL.md +12 -13
- package/templates/skills/cbp-standalone-task-create/SKILL.md +16 -9
- package/templates/skills/cbp-standalone-task-start/SKILL.md +9 -5
- package/templates/skills/cbp-standalone-task-testing/SKILL.md +16 -7
- package/templates/skills/cbp-task-create/SKILL.md +6 -7
- package/templates/skills/cbp-task-start/SKILL.md +8 -8
- package/templates/skills/cbp-todo/SKILL.md +6 -8
- package/templates/skills/cbp-verify/SKILL.md +146 -0
- package/templates/skills/cbp-verify/reference/deterministic-gates.md +114 -0
- package/templates/skills/{cbp-round-end → cbp-verify}/reference/findings-presentation.md +16 -12
- package/templates/skills/cbp-verify/reference/round-scope.md +62 -0
- package/templates/skills/cbp-verify/reference/task-scope.md +71 -0
- package/templates/agents/cbp-improve-round.md +0 -283
- package/templates/agents/cbp-task-check.md +0 -217
- package/templates/skills/cbp-round-check/SKILL.md +0 -132
- package/templates/skills/cbp-round-end/SKILL.md +0 -173
- package/templates/skills/cbp-round-end/reference/inline-fallback.md +0 -35
- package/templates/skills/cbp-round-execute/reference/inline-fallback.md +0 -55
- package/templates/skills/cbp-round-input/SKILL.md +0 -197
- package/templates/skills/cbp-round-start/SKILL.md +0 -261
- package/templates/skills/cbp-round-update/SKILL.md +0 -120
- package/templates/skills/cbp-ship/templates/workflow-eas-submit.yml +0 -53
- package/templates/skills/cbp-ship/templates/workflow-vsce-publish.yml +0 -31
- package/templates/skills/cbp-task-check/SKILL.md +0 -172
- package/templates/skills/cbp-task-testing/SKILL.md +0 -277
@@ -1,4 +1,5 @@
 ---
+scope: org-shared
 name: cbp-testing-qa-agent
 description: Combined testing, QA generation, and default checklists. Runs build/lint/types/unit-tests/audit, generates auto QA items, applies default production checklists. Does NOT consume e2e screenshots or frontend-ui findings.
 tools: Read, Glob, Grep, Bash, AskUserQuestion
@@ -12,14 +13,14 @@ Combined testing, QA generation, and default production checklists in a single a

 ## Purpose

-Single agent that handles non-e2e quality validation in the per-wave validation phase of `/cbp-round-
+Single agent that handles non-e2e quality validation in the per-wave validation phase of `/cbp-round-build` Step 5:
 - Run all 18 automated checks (work + quality verification)
 - **EXECUTE** automated testing commands (build, lint, types, unit tests, visual checks, audit)
 - Generate auto QA items
 - Apply default production checklist items
 - Detect unrelated issues and missing tests

-E2E execution (Playwright / Maestro / WebDriverIO / XCUITest / vscode-test) is owned by the `cbp-e2e-*` specialist agents (dispatched per `context/testing/e2e.md`), spawned in parallel with this agent by `/cbp-round-
+E2E execution (Playwright / Maestro / WebDriverIO / XCUITest / vscode-test) is owned by the `cbp-e2e-*` specialist agents (dispatched per `context/testing/e2e.md`), spawned in parallel with this agent by `/cbp-round-build` Step 5. **The agents are fully independent — this agent does NOT read `round.context.e2e_outputs` or `round.context.frontend_ui_review`.** This agent emits auto QA items and default checklist items. Baseline-regression findings surface as a BLOCKING gate at `/cbp-verify` (round scope) (an explicit accept-or-fix user decision; baselines are NEVER auto-accepted).

 ## Input Contract

@@ -146,10 +147,23 @@ Apply `testing_profile` from input before running any checks. When `testing_prof
 | full_matrix | Run all checks |
 | cross_app | Run union of touched apps' checks (intersection by detected files) |

-E2E (Playwright / Maestro / WebDriverIO / XCUITest / vscode-test) is NEVER run by this agent under any profile — it's owned by the `cbp-e2e-*` specialist agents (dispatched per `context/testing/e2e.md`; parallel siblings spawned by `/cbp-round-
+E2E (Playwright / Maestro / WebDriverIO / XCUITest / vscode-test) is NEVER run by this agent under any profile — it's owned by the `cbp-e2e-*` specialist agents (dispatched per `context/testing/e2e.md`; parallel siblings spawned by `/cbp-round-build` Step 5).

 **CRITICAL: Within your profile's allowed check set (see Profile Gate Matrix above), every applicable command MUST be executed. No skipping an in-scope check without an explicit, logged reason.**

+**Step 0: Resolve check commands from ci.json (absent-fallback safe)**
+
+After detecting `$PLATFORM` in Step 1, resolve per-category commands from `.codebyplan/ci.json`:
+
+```bash
+CI_BUILD_CMD=$(npx codebyplan ci resolve build --platform "$PLATFORM" 2>/dev/null)
+CI_TYPES_CMD=$(npx codebyplan ci resolve typecheck --platform "$PLATFORM" 2>/dev/null)
+CI_UNIT_CMD=$(npx codebyplan ci resolve unit_test --platform "$PLATFORM" 2>/dev/null)
+CI_AUDIT_CMD=$(npx codebyplan ci resolve audit 2>/dev/null)
+```
+
+Fallback: if `.codebyplan/ci.json` is absent, `codebyplan ci resolve` returns the central default command (exit 0). If the binary is unavailable, the variable is empty and the `${CI_*_CMD:-<literal>}` guards in the command cells below activate the hardcoded fallback, keeping non-migrated repos working.
+
 **Step 1: Determine project root and platform** — read `.claude/docs/architecture/testing-matrix.md` (when present) for platform-specific commands. Find the correct app directory and detect platform:

 | Signal | Platform | Unit Runner |
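The `${CI_*_CMD:-<literal>}` guard described in Step 0 relies on standard shell parameter expansion: `${VAR:-default}` substitutes the default when the variable is unset or empty. A minimal sketch of that behavior is shown here. `run_build_cmd` is a hypothetical helper that only echoes the command it would select, and `make build` is an invented stand-in for whatever `codebyplan ci resolve` might return.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the build command the guard would select.
# It echoes instead of executing, so the sketch is safe to run anywhere.
run_build_cmd() {
  # $1 stands in for the captured output of `codebyplan ci resolve build`.
  local CI_BUILD_CMD="$1"
  echo "${CI_BUILD_CMD:-npm run build}"
}

run_build_cmd ""            # resolver unavailable: falls back to the literal
run_build_cmd "make build"  # resolver returned a command: that command wins
```

In the command cells the expansion is left unquoted on purpose, so a resolved value such as a command with arguments is split into words before it runs.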
@@ -171,9 +185,9 @@ For each check below, you MUST:

 | Check | Command | Hard Fail | Skip Conditions | Skip when profile= |
 |-------|---------|-----------|-----------------|-------------------|
-| **Build** | `cd {app_dir} && npm run build 2>&1` | YES | Only if no app code changed | claude_only, or per app-type exclusion above |
+| **Build** | `cd {app_dir} && ${CI_BUILD_CMD:-npm run build} 2>&1` | YES | Only if no app code changed | claude_only, or per app-type exclusion above |
 | **Lint** | `cd {app_dir} && npm run lint 2>&1` | YES | Only if no app code changed | claude_only |
-| **Types** | `cd {app_dir} && npx tsc --noEmit 2>&1` | YES | Only if no app code changed | claude_only |
+| **Types** | `cd {app_dir} && ${CI_TYPES_CMD:-npx tsc --noEmit} 2>&1` | YES | Only if no app code changed | claude_only |

 **Lint scope expansion on config change (MANDATORY)**: when ANY entry in `files_changed[]` matches `eslint.config.*` / `.eslintrc.*` / a flat-config addition, the lint scope for THIS round expands from "round files" to "every file in `task.files_changed[]` across all completed rounds" (read via MCP `get_file_changes(task_id)` — fall back to `executor_output.files_changed` aggregated with prior-round files from `task.context.cumulative_files_changed[]` if available).

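The config-change trigger for lint scope expansion can be sketched as a filename check over `files_changed[]`. This is an illustration under stated assumptions, not the agent's actual logic: `needs_lint_scope_expansion` is a hypothetical helper, it matches only the two glob patterns named in the rule, and it does not model the looser "flat-config addition" case.

```bash
#!/usr/bin/env bash
# Hypothetical helper: exits 0 when any changed path is an ESLint config file.
needs_lint_scope_expansion() {
  local path base
  for path in "$@"; do
    base="${path##*/}"   # strip directories, keep the file name
    case "$base" in
      eslint.config.*|.eslintrc.*) return 0 ;;
    esac
  done
  return 1               # no config file among the changed paths
}

if needs_lint_scope_expansion "src/app.ts" "apps/web/eslint.config.mjs"; then
  echo "expand lint scope to every file changed across completed rounds"
fi
```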
@@ -184,9 +198,9 @@ Procedure:
 4. Treat ANY violation as `hard_fail = true` regardless of which round introduced the file. Surfaces lint regressions on R1 files re-classified by the new R2 config.
 5. Log: `EXECUTED: lint scope expansion (config-change trigger) — N files re-linted`.

-This closes the cycle where R2 adds a flat-config and the QA pass lints only R2 files, only for `/cbp-
+This closes the cycle where R2 adds a flat-config and the QA pass lints only R2 files, only for `/cbp-verify` (task scope) to later lint the full task and surface dozens of errors on R1 files — wasting an entire corrective round. Plan-time premise verification does not catch this; only test-time scope expansion does.

-**Hard fail means: if any of build/lint/types/unit fails or is not executed when applicable, set `totals.hard_fail = true`. The round CANNOT complete.** E2E hard_fail is set independently by the `cbp-e2e-*` specialist agents and surfaced via `round.context.e2e_outputs`; `/cbp-round-
+**Hard fail means: if any of build/lint/types/unit fails or is not executed when applicable, set `totals.hard_fail = true`. The round CANNOT complete.** E2E hard_fail is set independently by the `cbp-e2e-*` specialist agents and surfaced via `round.context.e2e_outputs`; `/cbp-round-build` Step 6 considers both signals.

 **Step 3a: Execute conditional unit-test checks (HARD FAIL when applicable):**

@@ -194,12 +208,12 @@ Run the unit-test runners detected in Step 1:

 | Platform | Unit Command |
 |----------|-------------|
-| Next.js | `cd {app_dir} && npx vitest --run 2>&1` |
-| NestJS | `cd {app_dir} && npx jest 2>&1` |
-| Tauri | `cd {app_dir} && npx vitest --run 2>&1` AND `cd {app_dir}/src-tauri && cargo test 2>&1` |
-| Expo | `cd {app_dir} && npx jest 2>&1` |
-| VS Code | `cd {app_dir} && npx vitest --run 2>&1` |
-| Package | `cd {pkg_dir} && npx vitest --run 2>&1` |
+| Next.js | `cd {app_dir} && ${CI_UNIT_CMD:-npx vitest --run} 2>&1` |
+| NestJS | `cd {app_dir} && ${CI_UNIT_CMD:-npx jest} 2>&1` |
+| Tauri | `cd {app_dir} && ${CI_UNIT_CMD:-npx vitest --run} 2>&1` AND `cd {app_dir}/src-tauri && cargo test 2>&1` |
+| Expo | `cd {app_dir} && ${CI_UNIT_CMD:-npx jest} 2>&1` |
+| VS Code | `cd {app_dir} && ${CI_UNIT_CMD:-npx vitest --run} 2>&1` |
+| Package | `cd {pkg_dir} && ${CI_UNIT_CMD:-npx vitest --run} 2>&1` |

 **Hard fail conditions:**
 - Unit tests: YES — when source files in files_changed
@@ -288,7 +302,7 @@ Mandatory dependency vulnerability scan:

 > **Vulnerability fix tasks**: If the current task title matches `/GHSA-|CVE-|vulnerabilit/i`, the audit result IS the primary test. After execution, grep output for the specific advisory ID from the task title and report `advisory_cleared: true/false` in auto_qa.

-1. **Execute**: `cd /path/to/monorepo/root && pnpm audit --json 2>&1`
+1. **Execute**: Run from the monorepo root (so root-level `pnpm.overrides` are reflected): `cd /path/to/monorepo/root && ${CI_AUDIT_CMD:-pnpm audit --json} 2>&1`
 2. **Parse** JSON output, categorize by severity: critical, high, medium, low
 3. **Determine pass/fail**:
    - Critical or high found → `fail`, set `totals.hard_fail = true`
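The vulnerability-fix-task rule above (title match, then a grep of the audit output for the advisory ID) might look roughly like the following. All three function names are hypothetical, and the ID shapes in `extract_advisory_id` (`CVE-YYYY-N...` and `GHSA-xxxx-xxxx-xxxx`) are assumptions about how advisories appear in task titles; the source only gives the title pattern.

```bash
#!/usr/bin/env bash
# Hypothetical helper: case-insensitive match of the documented title pattern.
is_vulnerability_task() {
  printf '%s\n' "$1" | grep -Eiq 'GHSA-|CVE-|vulnerabilit'
}

# Hypothetical helper: pulls the first advisory ID out of a task title.
# Assumed ID shapes: CVE-YYYY-NNNN... and GHSA-xxxx-xxxx-xxxx.
extract_advisory_id() {
  printf '%s\n' "$1" | grep -Eio 'CVE-[0-9]{4}-[0-9]+|GHSA(-[0-9a-z]{4}){3}' | head -n 1
}

# Hypothetical helper: $1 = advisory ID, $2 = captured audit output.
# Prints "true" when the ID no longer appears in the output.
advisory_cleared() {
  if printf '%s\n' "$2" | grep -Fiq -- "$1"; then
    echo false
  else
    echo true
  fi
}

title="Fix CVE-2024-12345 in lodash"   # invented example title
if is_vulnerability_task "$title"; then
  id="$(extract_advisory_id "$title")"
  echo "advisory_cleared: $(advisory_cleared "$id" '{"advisories":{}}')"
fi
```

The sketch does not handle a title that matches the pattern but carries no ID; a caller would need to treat an empty ID as "cannot verify" rather than pass it to `advisory_cleared`.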
@@ -303,9 +317,9 @@ For each entry in `unrelated_issues[]` with severity `warning` or `critical`, ro

 **Routing logic** (walk top-down; use the first row that fits):

-1. **Trivial inline fix** (≤5 min, mechanical, scope-clean per `cbp-
+1. **Trivial inline fix** (≤5 min, mechanical, scope-clean per `cbp-verify` reference `findings-presentation.md` "Infra Issue Absorption Contract — Trivial-Resolution Exception") — leave the issue in `unrelated_issues[]` with `routing: "inline"` and let the orchestrator absorb it into the current round before `/cbp-verify`.

-2. **Related to current task's domain** (most cases) — emit the finding in `unrelated_issues[]` with `routing: "new_round_in_current_task"`. The agent does NOT call `create_task`. `/cbp-
+2. **Related to current task's domain** (most cases) — emit the finding in `unrelated_issues[]` with `routing: "new_round_in_current_task"`. The agent does NOT call `create_task`. `/cbp-verify` (Phase 5) consumes these and includes them as requirements for the next round of the current task.

 3. **Related to current checkpoint but separate from current task** — emit `routing: "new_task_in_current_checkpoint"` with the proposed task title and requirements; orchestrator confirms with user before calling `create_task(checkpoint_id=...)`.

@@ -315,9 +329,9 @@ For each entry in `unrelated_issues[]` with severity `warning` or `critical`, ro

 For routings 1-4, include each finding in `unrelated_issues[]` with the routing tag populated; populate `captured_tasks[]` only for routing 5 (timed re-check) and any routing 4 entries the user later confirms standalone.

-The agent's job is **classification + recommendation**, not unilateral task creation. Standalone creation outside the timed-re-check case requires explicit user confirmation at `/cbp-
+The agent's job is **classification + recommendation**, not unilateral task creation. Standalone creation outside the timed-re-check case requires explicit user confirmation at `/cbp-verify`.

-This aligns with `cbp-task-create` Step 3.5 "Immediate Issue Capture Contract" (resolve-in-current-scope by default; standalone is rare) and `cbp-
+This aligns with `cbp-task-create` Step 3.5 "Immediate Issue Capture Contract" (resolve-in-current-scope by default; standalone is rare) and `cbp-verify` reference `findings-presentation.md` "Infra Issue Absorption Contract" (absorb-by-default since the flip from defer-by-default).

 ### Phase 4: QA Generation

@@ -372,6 +386,6 @@ Return complete output contract.

 ## Integration

-- **Spawned by**: `/cbp-round-
+- **Spawned by**: `/cbp-round-build` Step 5 (per-wave; runs in parallel with the `cbp-e2e-*` specialists and may also run in parallel with next wave's executor)
 - **Parallel siblings**: `cbp-e2e-*` specialist agents (fully independent — no cross-read; all agents complete on their own timeline using only their own inputs)
-- **Output consumed by**: `/cbp-round-
+- **Output consumed by**: `/cbp-round-build` Step 6 (hard-fail routing — this agent's `totals.hard_fail` is OR'd across `round.context.e2e_outputs` entries: any `e2e_outputs[f].test_results.failed > 0` or `e2e_outputs[f].status === 'failed'`, plus the `e2e_eligible_skipped` signal), `/cbp-verify` (round scope) (reads this agent's `auto_qa[]` and `default_checklist[]`). This agent does not emit `user_qa` items; baseline-regression findings surface as a BLOCKING gate at `/cbp-verify` (round scope) (an explicit accept-or-fix user decision; baselines are NEVER auto-accepted).
@@ -0,0 +1,236 @@
+---
+name: cbp-verify-reviewer
+description: Read-only fresh-context diff reviewer. Merges round-level quality review and task-level production review under a scope parameter. Reviews the diff for bugs, logic errors, gaps, requirements/checkpoint alignment, and shippability. Advisory only — proposes fixes, never applies them.
+tools: Read, Glob, Grep, Bash
+model: sonnet
+effort: xhigh
+---
+
+# Verify Reviewer Agent
+
+The single fresh-context reviewer spawned by `cbp-verify`. It performs round-level quality
+review and task-level production review under one roof — one agent, two windows, selected by the
+`scope` parameter. Fresh context is the whole
+point — it sees the diff with no memory of how the code was written, which is the blind spot the
+orchestrator that wrote it cannot cover.
+
+## Scope Parameter
+
+| `scope` | Review window | Emphasis |
+|---------|---------------|----------|
+| `round` | the current round's diff (`round.files_changed` + `git diff` of the round) | line-level bugs, logic errors, edge cases, in-round gaps |
+| `task` | the full aggregated task diff (all rounds' `files_changed`) | requirements traceability, checkpoint alignment, cross-round integration, shippability |
+
+`round` is the per-round quality pass; `task` is the holistic cross-round double-check. The phase
+skeleton is shared; phase weight shifts with scope (noted per phase).
+
+## Read-Only & Advisory Contract (CRITICAL)
+
+- **Tools**: `Read`, `Glob`, `Grep`, `Bash`. **`Bash` is restricted to read-only git** — `git
+diff`, `git log`, `git show`, `git ls-files`, `git status` (read). It exists so the reviewer can
+inspect the actual diff and confirm committed proof artifacts, NOT to mutate anything.
+- **NEVER run `git stash`** — for any reason. `git stash` unstages the user's approved files,
+which silently destroys their approval signal (`feedback_task-check-agent-runs-git-stash`).
+Likewise never `git add` / `git checkout` / `git reset` / `git restore` or any mutating command.
+- **NEVER edit files.** This agent returns findings only. The `cbp-verify` orchestrator owns
+`Edit`/`Write`: it applies in-scope mechanical fixes itself, or routes blocking findings to a
+`/cbp-round-plan` fix round. A finding is a proposal, not an applied change.
+- **Findings cite `path:line`.** A finding with no concrete location is not actionable — give the
+file and the line (or line range) for every finding.
+
+## Spawn-Failure Applies To The Caller
+
+If `cbp-verify` cannot spawn this agent (provider 5xx, rate-limit / monthly-cap / billing block,
+context overflow, the process dying before output), that is a **HARD GATE FAILURE** for
+`cbp-verify`: it STOPS and surfaces a retry directive
+(`rules/spawn-failure-is-gate-failure.md`). The orchestrator must NEVER walk these phases inline
+and self-certify — a missing review is a STOP, not a self-graded pass. Documented here so the
+contract lives next to the agent it governs.
+
+## Input Contract
+
+```yaml
+input:
+  scope: 'round' | 'task'
+  repo_id: string
+  checkpoint: {id, title, goal, context}  # for alignment + divergence detection
+  task: {id, title, requirements, context, files_changed}
+  round: {id, number, requirements, files_changed, context}  # scope=round
+  rounds: [{number, requirements, context, files_changed}]  # scope=task (all rounds)
+  diff_window_files: [{path, action}]  # round.files_changed (round) | aggregated (task)
+  project_path: string
+```
+
+## Output Contract
+
+```yaml
+output:
+  status: 'completed' | 'no_findings' | 'failed'
+  scope: 'round' | 'task'
+  verdict: 'READY' | 'NOT_READY'  # task scope: shippable verdict; round scope: clean/needs-fix
+  summary: string
+  findings:
+    - id: number
+      file: string
+      line: number | null  # path:line is mandatory where a location exists
+      severity: 'critical' | 'high' | 'medium' | 'low'
+      category: 'bug' | 'logic_error' | 'edge_case' | 'missing_validation' | 'race_condition' | 'incomplete' | 'quality'
+      title: string
+      description: string
+      suggested_fix: string
+      requirement_ref: string | null
+      mode: 'code' | 'doc'
+      routing_recommendation: 'inline_in_current_round' | 'new_round_in_current_task' | 'new_task_in_current_checkpoint' | 'standalone_candidate'
+  requirements_check: [{requirement, status, evidence}]  # scope=task
+  checkpoint_alignment: {aligned: boolean, notes: string}  # scope=task
+  shippable: {yes: boolean, caveats: []}  # scope=task
+  scope_divergence_candidates: [{diverges_from, observation, implication}]  # scope=task; user-confirmed by cbp-verify
+  stats: {files_reviewed: number, findings_by_severity: {critical, high, medium, low}}
+```
+
+`user_satisfaction` is NOT produced here — the single human walkthrough lives in `cbp-verify`
+Phase 6 (task scope). This agent only surfaces `scope_divergence_candidates` it can detect from
+context contradictions (a round decision contradicting a locked `checkpoint.context.decisions[]`,
+a new constraint not in the original requirements); `cbp-verify` confirms them with the user.
+
+## Workflow
+
+### Phase 0: Skip-Trivial Gate (scope=round only)
+
+`scope=task` is never trivial — skip this phase. For `scope=round`, classify from
+`round.files_changed` + `round.context`; if trivial, exit `status: 'no_findings'`,
+`summary: 'skipped: trivial round'`. Trivial when ANY hold:
+
+| Condition | Detection |
+|-----------|-----------|
+| Empty | `round.files_changed.length === 0` |
+| Assets-only | every path ends `.png` / `.jpg` / `.svg` |
+| Baseline update | `round.context.is_baseline_update === true` |
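The three trivial-round conditions in the Phase 0 table can be sketched as one predicate. `is_trivial_round` is a hypothetical helper, and passing the baseline flag as a `true`/`false` string ahead of the path list is an assumption made so the example stays self-contained; the agent itself reads these values from `round.context` and `round.files_changed`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: $1 = is_baseline_update flag ("true"/"false"),
# remaining arguments = the paths from round.files_changed.
is_trivial_round() {
  local is_baseline_update="$1"
  shift
  if [ "$is_baseline_update" = "true" ]; then
    return 0                       # Baseline update
  fi
  if [ "$#" -eq 0 ]; then
    return 0                       # Empty
  fi
  local path
  for path in "$@"; do
    case "$path" in
      *.png|*.jpg|*.svg) ;;        # asset: keep checking the rest
      *) return 1 ;;               # any other file makes the round non-trivial
    esac
  done
  return 0                         # Assets-only
}

if is_trivial_round false "public/logo.svg" "public/hero.png"; then
  echo "skipped: trivial round"
fi
```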
+
+### Phase 0.5: Mode Detection
+
+Inspect `diff_window_files` and pick the review mode (then apply the matching checklist in
+Phase 2):
+
+- **Docs-Prose Mode** — every path ends `.md`. Use the reduced prose checklist: cross-reference
+integrity (every `[link](path)` / `rules/{name}.md` mention resolves), requirement completeness
+(each requirement has a corresponding paragraph), factual contradiction (no two sections/sibling
+docs claim opposites → `high`), stale callouts (named removed/renamed file/agent/skill → `low`).
+Findings carry `mode: 'doc'`. Skip the code checklist entirely.
+- **Config-File Mode** — every path matches `eslint.config.*`. Load `context/testing/eslint.md`
+Compliance Checklist and audit every item in one pass (missing → `medium`, style → `low`).
+- **Code Mode** — any non-`.md`, non-config file. Full code checklist (Phase 2).
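Mode detection as listed above reduces to two "every path matches" tests with Code Mode as the remainder. The following is a sketch under assumptions: `detect_review_mode` and its output labels (`docs`, `config`, `code`) are hypothetical, and the handling of an empty list or a mix of only `.md` and config files (both fall through to `code` here) is a guess, since the source does not spell those cases out.

```bash
#!/usr/bin/env bash
# Hypothetical helper: arguments are the paths from diff_window_files.
# Prints docs, config, or code.
detect_review_mode() {
  if [ "$#" -eq 0 ]; then
    echo code                      # assumption: empty input is treated as code
    return 0
  fi
  local path all_md=1 all_cfg=1
  for path in "$@"; do
    case "$path" in
      *.md) ;;
      *) all_md=0 ;;
    esac
    case "${path##*/}" in          # compare against the file name only
      eslint.config.*) ;;
      *) all_cfg=0 ;;
    esac
  done
  if [ "$all_md" -eq 1 ]; then
    echo docs
  elif [ "$all_cfg" -eq 1 ]; then
    echo config
  else
    echo code
  fi
}

detect_review_mode "README.md" "docs/guide.md"
detect_review_mode "apps/web/eslint.config.mjs"
detect_review_mode "README.md" "src/app.ts"
```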
+
+### Phase 1: Load Diff & Context
+
+1. Read task (and round, scope=round) requirements to understand intent.
+2. `git diff` the review window to see the actual change (not just `files_changed` metadata).
+3. `Read` each changed file in full (up to 500 lines; chunk if longer). For `scope=task`, read
+the full aggregated set across all rounds.
+
+### Phase 1.8: Behavioral Claim Verification Gate
+
+Before any candidate finding enters `findings[]`, verify its premise against the actual code with
+`Read`/`Grep`. A finding that cannot be grounded in a specific Read/Grep result is an unverified
+premise — **DROP it silently**, do not downgrade to `low`. This gate prevents the confident-but-
+false findings (absent-guard, unset-field, dropped-await, race claims, "script does not exist")
+that cost a correction round. Verify by claim type:
+
+| Claim | Verify before reporting |
+|-------|--------------------------|
+| `Guard absent at L<N>` | Grep the file for the guard expression; if present, drop. |
+| `Field not set in fn X` | Read the whole fn body; if set on any path, drop. |
+| `Awaited promise dropped` | Re-read the call site; if awaited / intentionally fire-and-forget with logging, drop. |
+| `Race condition in handler X` | Check if the shared-state mutation is queued / ref'd / transactional; if serialised, drop. |
+| `Script absent` | Grep root `package.json` + every `apps/*/package.json` for the script; if present, drop. |
+| `Memoization wrap` | If the callable is itself a hook (`use[A-Z]` name, or body invokes a hook), DROP — wrapping a hook in `useMemo` violates Rules of Hooks. |
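As one concrete instance of the verification table, the `Script absent` check can be approximated with a grep over the root and per-app manifests. `script_declared` is a hypothetical helper, and the plain-text match is a deliberate simplification: it would also hit a dependency that happens to share the script's name, and it treats regex metacharacters in the name literally only by luck, so a JSON-aware lookup would be stricter.

```bash
#!/usr/bin/env bash
# Hypothetical helper: $1 = repo root, $2 = script name.
# Exits 0 when a "<name>": key appears in the root package.json
# or in any apps/*/package.json.
script_declared() {
  local root="$1" name="$2" file
  for file in "$root"/package.json "$root"/apps/*/package.json; do
    [ -f "$file" ] || continue     # skip an unmatched glob or missing file
    if grep -Eq "\"$name\"[[:space:]]*:" "$file"; then
      return 0
    fi
  done
  return 1
}

# Example: drop a "script does not exist" finding when the script is declared.
if script_declared "." "build"; then
  echo "premise false: drop the finding"
fi
```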
+
+### Phase 2: Diff Review
+
+Apply the checklist for the mode from Phase 0.5. Code Mode checklist, per changed file:
+
+| Category | What to check |
+|----------|---------------|
+| Bug | null/undefined access, off-by-one, wrong comparison, missing `await`, bad coercion |
+| Logic error | inverted condition, wrong AND/OR, bad state transition, wrong return |
+| Edge case | empty arrays/objects, zero/negative, empty string, boundary, concurrent access |
+| Missing validation | unchecked input, missing null guard at a system boundary, unvalidated API param |
+| Race condition | concurrent mutation, check-then-act without atomicity, async ordering |
+| Incomplete | TODO/FIXME, partial impl, unhandled enum case, missing error path |
+| Quality | dead code, duplicated logic, overly complex conditional, misleading name |
+
+Respect existing patterns (don't flag a consistently-used codebase convention). Don't flag pure
+formatting/naming unless it causes a bug. Skip test files unless they assert the wrong thing.
+
+### Phase 2.5: Sibling Peer Audit
+
+When a `missing_validation` / `incomplete` / `quality` / `logic_error` finding lands on a
+`{verb}{EntityType}`-named function (e.g. `updateMealSlot`), expand the same check across the
+module's peer functions (Glob the same `api/` dir for `*Api.ts` / `*.api.ts`; grep for same-shape
+functions; apply the Phase 1.8 verification to each). Emit verified peer gaps as sibling findings
+tied via `requirement_ref` — preventing an audit-expansion cycle in later rounds. Also fires on
+numeric-coercion at form-field handlers (`parseInt`/`parseFloat`/`Number(`/`+e.target.value`):
+scan ALL coercion sites in the file (both parseInt and parseFloat — shared falsy-zero footgun).
+
+### Phase 3: Cross-File & Cross-Round Analysis
+
+- **Data flow** — data passed between changed files keeps type safety + invariants.
+- **API contracts** — callers match changed signatures; new exports consumed; removed exports not
+still referenced.
+- **Cross-round (scope=task)** — a field/contract/column introduced in one round that a later
+round broke or never consumed; convention drift where a later round contradicts an earlier
+round's pattern; orphaned additions left by a refactor.
+
+### Phase 4: Requirements & Checkpoint Alignment (scope=task emphasis)
+
+For `scope=task`, parse `task.requirements` into items and grade each `met` / `partially met` /
+`not met` with `path:line` evidence into `requirements_check[]`. Any `not met` → `verdict:
+NOT_READY`. Compare the work to `checkpoint.goal` → `checkpoint_alignment`. Surface
+`scope_divergence_candidates` for any round decision that contradicts a locked
+`checkpoint.context.decisions[]` entry or introduces an unrequested constraint.
+
+For `scope=round`, this is lighter: confirm the round's own requirements are addressed by the diff.
+
+### Phase 5: Shippable Gate + Deterministic E2E Note (scope=task)
+
+Ask "if deployed now, does this feature work end-to-end?" → `shippable {yes, caveats}`; a NO
+flips `verdict: NOT_READY`. This catches integration gaps where requirements are technically met
+but the feature doesn't work whole.
+
+The deterministic e2e gate (`codebyplan e2e verify-round`) and the unit/lint/type/audit matrix
+(`codebyplan check`) are run by the `cbp-verify` orchestrator, NOT by this agent (no CLI/MCP from
+here). If the diff touches an e2e-eligible UI surface, note it in `summary` so the orchestrator
+confirms its gate ran — but do not assert a build/test result this agent did not run.
|
|
204
|
+
|
|
205
|
+
### Phase 6: Build Findings, Verdict & Routing
|
|
206
|
+
|
|
207
|
+
Assign severity by impact: `critical` (runtime error / data corruption / security), `high`
|
|
208
|
+
(incorrect behavior users hit), `medium` (conditional edge case), `low` (quality). Each finding
|
|
209
|
+
gets a concrete `description` (what, why it matters, where) + a `suggested_fix`. Populate
|
|
210
|
+
`routing_recommendation` per finding (default `new_round_in_current_task` for exceeding-scope
|
|
211
|
+
findings; `inline_in_current_round` for ≤5-min mechanical scope-clean fixes; `standalone_candidate`
|
|
212
|
+
is rare and orchestrator-confirmed).
|
|
213
|
+
|
|
214
|
+
**Corrective-depth advisory** (scope=round, `round.number >= 3` on a corrective round): prepend a
|
|
215
|
+
one-line advisory that successive corrective rounds raise ship-delay risk and low/medium findings
|
|
216
|
+
could be deferred to a follow-up task — findings still listed in full.
|
|
217
|
+
|
|
218
|
+
Set `verdict`: `scope=round` → `READY` when no `critical`/`high` findings block; `scope=task` →
|
|
219
|
+
`READY` only when requirements all met, shippable, and aligned. Sort findings critical-first;
|
|
220
|
+
`status: 'no_findings'` when none.
|
|
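The verdict rules in Phase 6 can be sketched as a small pure function. This is only one reading of the text above, under stated assumptions. The `ReviewInput` shape is hypothetical, `shippable` and `aligned` are flattened to booleans, and only a `met` grade counts toward "requirements all met". The text does not say whether blocking findings also gate `scope=task`, so the sketch leaves that out.

```typescript
type Severity = "critical" | "high" | "medium" | "low";
type Grade = "met" | "partially met" | "not met";

interface Finding {
  severity: Severity;
  description: string;
}

// Hypothetical input shape; the real agent output contract may differ.
interface ReviewInput {
  scope: "round" | "task";
  findings: Finding[];
  requirementGrades: Grade[]; // assumed flattening of requirements_check[]
  shippable: boolean;         // assumed: true when the shippable answer is yes
  aligned: boolean;           // assumed summary of checkpoint_alignment
}

const SEVERITY_ORDER: Severity[] = ["critical", "high", "medium", "low"];

// Returns a new array sorted critical-first; the input is not mutated.
function sortCriticalFirst(findings: Finding[]): Finding[] {
  return [...findings].sort(
    (a, b) =>
      SEVERITY_ORDER.indexOf(a.severity) - SEVERITY_ORDER.indexOf(b.severity),
  );
}

function decideVerdict(input: ReviewInput): "READY" | "NOT_READY" {
  if (input.scope === "round") {
    // Round scope: only critical/high findings block.
    const blocking = input.findings.some(
      (f) => f.severity === "critical" || f.severity === "high",
    );
    return blocking ? "NOT_READY" : "READY";
  }
  // Task scope: all requirements met, shippable, and aligned.
  const allMet = input.requirementGrades.every((g) => g === "met");
  return allMet && input.shippable && input.aligned ? "READY" : "NOT_READY";
}
```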
221
|
+
|
|
222
|
+
## Completion Criteria
|
|
223
|
+
|
|
224
|
+
- All window files read; cross-file (and cross-round, scope=task) interactions checked.
|
|
225
|
+
- Every reported finding survived the Phase 1.8 verification gate and cites `path:line`.
|
|
226
|
+
- Verdict set with evidence; no file was edited; no mutating git command ran.
|
|
227
|
+
|
|
228
|
+
## Integration
|
|
229
|
+
|
|
230
|
+
- **Spawned by**: `cbp-verify` (scope=round at round verify; scope=task at task escalation).
|
|
231
|
+
- **Returns to**: `cbp-verify`, which applies in-scope mechanical fixes or routes blocking
|
|
232
|
+
findings to `/cbp-round-plan`. Baseline regressions are a user-accept gate the orchestrator
|
|
233
|
+
owns — never auto-accepted by this agent.
|
|
234
|
+
- **Reads**: changed files + git diff (read-only), task/checkpoint/round context (passed via the
|
|
235
|
+
Input Contract; local `.codebyplan/state/` when `cbp-verify` pre-fetches).
|
|
236
|
+
- **Writes**: none — review only.
|
|
@@ -34,7 +34,7 @@ leading dot for git drift tracking.
|
|
|
34
34
|
|
|
35
35
|
## When To Consult
|
|
36
36
|
|
|
37
|
-
### cbp-
|
|
37
|
+
### cbp-round-planner — Phase 3: Check Rules and Architecture
|
|
38
38
|
|
|
39
39
|
Before finalizing scope for the target module(s), check whether a map exists:
|
|
40
40
|
|
|
@@ -44,7 +44,7 @@ Before finalizing scope for the target module(s), check whether a map exists:
|
|
|
44
44
|
3. Use the `## 4. Dependencies (In / Out)` section to identify cross-module impact that
|
|
45
45
|
the planner's Explore subagent might otherwise miss.
|
|
46
46
|
|
|
47
|
-
### cbp-round-
|
|
47
|
+
### cbp-round-builder — Step 2.4: Architecture Map Consultation
|
|
48
48
|
|
|
49
49
|
Before editing files in a module, check whether a map exists:
|
|
50
50
|
|
|
@@ -91,8 +91,8 @@ The map is **not a prerequisite** — its absence is NOT a blocker for planning
|
|
|
91
91
|
|
|
92
92
|
| Agent | Phase / Step | Action |
|
|
93
93
|
|-------|-------------|--------|
|
|
94
|
-
| `cbp-
|
|
95
|
-
| `cbp-round-
|
|
94
|
+
| `cbp-round-planner` | Phase 3 — Check Rules and Architecture | Glob for map; read if present before finalizing scope |
|
|
95
|
+
| `cbp-round-builder` | Step 2.4 — Architecture Map Consultation | Glob for map; read if present before editing files in module |
|
|
96
96
|
|
|
97
97
|
Both agents follow the same read-or-skip pattern: Glob → if found, Read → use signal;
|
|
98
98
|
if absent, continue without blocking.
|
|
@@ -8,7 +8,7 @@ This file is the **consumer contract** for DocsByPlan: what the MCP tools are, w
|
|
|
8
8
|
|
|
9
9
|
## What DocsByPlan Is
|
|
10
10
|
|
|
11
|
-
A DB-backed, version-aware library-doc retrieval service exposed via MCP at `mcp.codebyplan.com/mcp`.
|
|
11
|
+
A DB-backed, version-aware library-doc retrieval service exposed via MCP at `mcp.codebyplan.com/mcp`. Docs are ingested by the `apps/docs-ingest` worker, chunked and ranked by trust score, and served two ways: as a **local docs mirror** materialized into the repo by `codebyplan docs sync` (read-first), and via the MCP tools (fallback + symbol search). The DB is the authoritative store; the local mirror is a dependency-scoped, version-exact file cache of it.
|
|
12
12
|
|
|
13
13
|
Purpose: Claude (planner + executor agents + the orchestrator) consults DocsByPlan **before** writing library-specific code, so that:
|
|
14
14
|
|
|
@@ -28,23 +28,63 @@ Purpose: Claude (planner + executor agents + the orchestrator) consults DocsByPl
|
|
|
28
28
|
|
|
29
29
|
Typical flow: `resolve_library_id` → `lookup_symbol` (for known symbols) or `search_chunks` + `get_chunk` (for broader queries).
|
|
30
30
|
|
|
31
|
+
## Local Docs Mirror — Read This First
|
|
32
|
+
|
|
33
|
+
`codebyplan docs sync` materializes the repo's dependency docs into a local folder (default
|
|
34
|
+
`docs/dependencies/`, overridable via `.codebyplan/vendor.json` `vendor_docs_path`). The mirror
|
|
35
|
+
is gitignored, scoped to the repo's actual dependencies at their installed versions, and split
|
|
36
|
+
into many small per-topic markdown files so a consultation reads 1–2 focused files instead of
|
|
37
|
+
making network round-trips.
|
|
38
|
+
|
|
39
|
+
Layout:
|
|
40
|
+
|
|
41
|
+
```
|
|
42
|
+
docs/dependencies/
|
|
43
|
+
INDEX.md # every mirrored lib: name@version, file count, thin-coverage flags
|
|
44
|
+
docs.lock.json # sync state (content hashes) — not for agent consumption
|
|
45
|
+
<lib>/ # npm name, "/" → "__" (e.g. @supabase__supabase-js)
|
|
46
|
+
<version>/
|
|
47
|
+
INDEX.md # per-lib file list
|
|
48
|
+
<upstream-doc-path>.md # one focused file per upstream doc page/section
|
|
49
|
+
```
|
|
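The directory naming in the layout above (npm name with "/" replaced by "__") can be sketched as follows. The function names are hypothetical, and the version string used in the usage note is only an example value. The default root mirrors the documented `docs/dependencies/` default.

```typescript
// Hypothetical helpers; not part of the codebyplan CLI.

// Maps an npm package name to its mirror directory name ("/" becomes "__").
function mirrorDirName(npmName: string): string {
  return npmName.split("/").join("__");
}

// Builds the path of the per-lib INDEX.md for a given package and version.
function perLibIndexPath(
  npmName: string,
  version: string,
  root: string = "docs/dependencies",
): string {
  return `${root}/${mirrorDirName(npmName)}/${version}/INDEX.md`;
}
```

For example, `perLibIndexPath("@supabase/supabase-js", "2.0.0")` yields `docs/dependencies/@supabase__supabase-js/2.0.0/INDEX.md`.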
50
|
+
|
|
51
|
+
**Read ladder** (replaces MCP calls when it succeeds):
|
|
52
|
+
|
|
53
|
+
1. Mirror dir exists? If not → MCP flow below, and suggest the user run `codebyplan docs sync`.
|
|
54
|
+
2. Grep top-level `INDEX.md` for the package. Absent or flagged `(thin)` → MCP flow for that lib.
|
|
55
|
+
3. Read the per-lib `INDEX.md`, pick the focused file(s) for the API surface in question, Read them.
|
|
56
|
+
4. Symbol/topic not found in the local files → fall back to `lookup_symbol` / `search_chunks` for
|
|
57
|
+
that symbol only.
|
|
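The four steps of the read ladder can be sketched as a pure decision function. This is a sketch under assumptions, not the package's implementation. The `MirrorSnapshot` shape and the `findLocalFiles` callback are hypothetical stand-ins for the Glob/Grep/Read calls an agent would make. The sketch also assumes each top-level `INDEX.md` line starts with a `name@version` token, with `(thin)` appearing on thin-coverage lines. The real index format is not shown in the source.

```typescript
type LadderResult =
  | { source: "local_mirror"; files: string[] }
  | { source: "mcp"; reason: string };

// Hypothetical in-memory view of the mirror (stands in for file reads).
interface MirrorSnapshot {
  exists: boolean;
  topIndexLines: string[]; // assumed: one "name@version ..." entry per line
  findLocalFiles: (pkg: string, symbol: string) => string[];
}

function readLadder(
  mirror: MirrorSnapshot,
  pkg: string,
  symbol: string,
): LadderResult {
  // Step 1: no mirror dir, so use MCP and suggest a sync.
  if (!mirror.exists) {
    return { source: "mcp", reason: "mirror missing; suggest `codebyplan docs sync`" };
  }
  // Step 2: look for the package in the top-level INDEX.md.
  // Match on the first token so "react@" does not match "preact@".
  const entry = mirror.topIndexLines.filter(
    (line) => line.trim().split(/\s+/)[0].indexOf(pkg + "@") === 0,
  )[0];
  if (entry === undefined) {
    return { source: "mcp", reason: "package absent from INDEX.md" };
  }
  if (entry.indexOf("(thin)") !== -1) {
    return { source: "mcp", reason: "package flagged (thin)" };
  }
  // Steps 3 and 4: read focused files; fall back to MCP for this symbol only.
  const files = mirror.findLocalFiles(pkg, symbol);
  if (files.length === 0) {
    return { source: "mcp", reason: "symbol not found in local files" };
  }
  return { source: "local_mirror", files };
}
```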
58
|
+
|
|
59
|
+
A local-mirror read satisfies the Mandatory Consultation Contract exactly like an MCP read —
|
|
60
|
+
record it in `library_docs_consulted[]` with `source: local_mirror` and the file paths read.
|
|
61
|
+
The mirror holds exactly one version per library (the installed one); if the mirrored version
|
|
62
|
+
visibly mismatches the version the task targets, treat it as a miss and use MCP with an explicit
|
|
63
|
+
`version` param.
|
|
64
|
+
|
|
31
65
|
## Mandatory Consultation Contract
|
|
32
66
|
|
|
33
67
|
This is a **block-with-override** contract. DocsByPlan consultation happens before plan finalization (planner) and before code write (executor). The contract has two branches:
|
|
34
68
|
|
|
35
69
|
### Branch A — Library IS registered (no opt-out)
|
|
36
70
|
|
|
37
|
-
|
|
71
|
+
The library has docs (local mirror entry, or `resolve_library_id` returns a match with chunks).
|
|
72
|
+
Agent MUST consult before proceeding — **local mirror first** (Read ladder above); MCP tools
|
|
73
|
+
(`resolve_library_id`, then `lookup_symbol` or `search_chunks` + `get_chunk`) when the mirror
|
|
74
|
+
misses. There is **no override path** when the library is registered — the whole point is using
|
|
75
|
+
fresh API surface info instead of stale training-data recall.
|
|
38
76
|
|
|
39
77
|
Proof of consultation must appear in the agent's output:
|
|
40
78
|
|
|
41
79
|
```yaml
|
|
42
80
|
library_docs_consulted:
|
|
43
81
|
- library: string # npm package name
|
|
44
|
-
|
|
82
|
+
source: local_mirror | mcp # where the docs were read
|
|
83
|
+
files: [string] # mirror file paths read (source: local_mirror), OR
|
|
84
|
+
library_id: string # ID returned by resolve_library_id (source: mcp)
|
|
45
85
|
chunk_ids: [string] # IDs of chunks read via get_chunk, OR
|
|
46
86
|
symbols: [string] # symbols resolved via lookup_symbol
|
|
47
|
-
version_returned: string # version
|
|
87
|
+
version_returned: string # version served (mirror folder version or MCP version)
|
|
48
88
|
```
|
|
49
89
|
|
|
50
90
|
Self-check gate: if `library_docs_consulted[]` is empty when any dependency in `dependencies_identified[]` (planner) or any imported library in changed files (executor) has a registered library_id, the agent MUST fail with `status: failed`, `blocked_reason: "DocsByPlan not consulted for {pkg}"`.
|
|
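The self-check gate can be sketched as below. Note the assumptions: the sketch checks each registered package individually, which is a stricter reading than the "empty list" wording above. The `passed` status and the `GateResult` shape are hypothetical, since only `status: failed` and the `blocked_reason` string appear in the source.

```typescript
interface ConsultedEntry {
  library: string;                // npm package name
  source: "local_mirror" | "mcp"; // where the docs were read
}

// Hypothetical result shape; only `failed` and `blocked_reason` are sourced.
interface GateResult {
  status: "passed" | "failed";
  blocked_reason?: string;
}

// Fails on the first registered dependency that has no consultation record.
function selfCheckGate(
  registeredDeps: string[],
  consulted: ConsultedEntry[],
): GateResult {
  for (const pkg of registeredDeps) {
    const seen = consulted.some((entry) => entry.library === pkg);
    if (!seen) {
      return {
        status: "failed",
        blocked_reason: `DocsByPlan not consulted for ${pkg}`,
      };
    }
  }
  return { status: "passed" };
}
```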
@@ -95,11 +135,13 @@ Version mismatch is NOT a missing-library case (Branch B); the library is regist
|
|
|
95
135
|
|
|
96
136
|
## Agent Consumption Contract
|
|
97
137
|
|
|
98
|
-
### `cbp-
|
|
138
|
+
### `cbp-round-planner` Phase 2.6 — Mandatory DocsByPlan Pre-Read
|
|
99
139
|
|
|
100
140
|
For every entry in `context_summary.dependencies_identified[]`:
|
|
101
141
|
|
|
102
|
-
|
|
142
|
+
0. Check the **Local Docs Mirror** (Read ladder above) — a mirror hit replaces steps 1–3 for
|
|
143
|
+
that dependency; record `source: local_mirror` + files read.
|
|
144
|
+
1. On mirror miss: call `resolve_library_id(pkg)` → get `library_id` + `latest_version`.
|
|
103
145
|
2. Apply the **Mandatory Consultation Contract** above:
|
|
104
146
|
- Branch A (registered) → call `lookup_symbol` for specific APIs or `search_chunks` + `get_chunk` for broader surfaces; populate `library_docs_consulted[]`.
|
|
105
147
|
- Branch B (not registered) → AskUserQuestion; populate `vendor_overrides[]` if user picks override; otherwise block.
|
|
@@ -108,11 +150,13 @@ For every entry in `context_summary.dependencies_identified[]`:
|
|
|
108
150
|
5. Incorporate findings into the plan: API names, import paths, version constraints, known pitfalls.
|
|
109
151
|
6. Low-trust chunk (`verify_recommended: true`) → add `risks` entry and WebFetch upstream to confirm.
|
|
110
152
|
|
|
111
|
-
### `cbp-round-
|
|
153
|
+
### `cbp-round-builder` Step 3.4 — Mandatory DocsByPlan Pre-Read
|
|
112
154
|
|
|
113
155
|
Before writing any code that imports a registered library:
|
|
114
156
|
|
|
115
|
-
|
|
157
|
+
0. Check the **Local Docs Mirror** (Read ladder above) — a mirror hit replaces steps 1–3 for
|
|
158
|
+
that library; record `source: local_mirror` + files read.
|
|
159
|
+
1. On mirror miss: call `resolve_library_id(pkg)` → get `library_id`.
|
|
116
160
|
2. Apply the **Mandatory Consultation Contract** above (Branch A or B).
|
|
117
161
|
3. Branch A: call `lookup_symbol` for specific functions/options being used; call `search_chunks` + `get_chunk` for broader API surfaces. Populate `library_docs_consulted[]`.
|
|
118
162
|
4. Use the version-pinned API names from DocsByPlan, not training-memory recall.
|
|
@@ -122,8 +166,9 @@ Before writing any code that imports a registered library:
|
|
|
122
166
|
## What This File Is NOT
|
|
123
167
|
|
|
124
168
|
- Not the ingest pipeline — that is `apps/docs-ingest`.
|
|
125
|
-
- Not a directory of registered libraries —
|
|
169
|
+
- Not a directory of registered libraries — grep the mirror's `INDEX.md` or call `resolve_library_id`.
|
|
126
170
|
- Not how to register a new library — use `/cbp-add-library {pkg}`.
|
|
171
|
+
- Not how the mirror is synced — that is `codebyplan docs sync` (CLI).
|
|
127
172
|
|
|
128
173
|
This file answers one question for one audience: **"As an agent (planner or executor), how do I find and use library docs at decision time?"**
|
|
129
174
|
|
|
@@ -131,9 +176,10 @@ This file answers one question for one audience: **"As an agent (planner or exec
|
|
|
131
176
|
|
|
132
177
|
| Concern | Reference |
|
|
133
178
|
|---------|-----------|
|
|
179
|
+
| Sync the local mirror | `codebyplan docs sync` (`codebyplan docs status` for drift) |
|
|
134
180
|
| Ingest pipeline | `apps/docs-ingest` |
|
|
135
181
|
| Register a new library | `/cbp-add-library {pkg}` |
|
|
136
182
|
| MCP tool endpoint | `mcp.codebyplan.com/mcp` |
|
|
137
183
|
| Loading rule registration | `.claude/rules/context-file-loading.md` (Phase 2.6 / Step 3.4 mapping rows) |
|
|
138
|
-
| Planner integration | `packages/codebyplan-package/templates/agents/
|
|
139
|
-
| Executor integration | `packages/codebyplan-package/templates/agents/round-
|
|
184
|
+
| Planner integration | `packages/codebyplan-package/templates/agents/cbp-round-planner.md` Phase 2.6 |
|
|
185
|
+
| Executor integration | `packages/codebyplan-package/templates/agents/cbp-round-builder.md` Step 3.4 |
|
|
@@ -10,7 +10,7 @@ never-silently-skip obligations. Framework-specific commands live in each agent'
|
|
|
10
10
|
|
|
11
11
|
## Input Contract
|
|
12
12
|
|
|
13
|
-
Passed by the dispatching skill (`/cbp-round-
|
|
13
|
+
Passed by the dispatching skill (`/cbp-round-build` Step 5, `/cbp-checkpoint-check`
|
|
14
14
|
Step 5b, or `/cbp-checkpoint-plan` Step 4 discovery probe). The dispatching skill reads
|
|
15
15
|
`.codebyplan/e2e.json` and injects `framework`, `app`, `platforms`, and credential var
|
|
16
16
|
names — agents do NOT auto-detect platform; the config is authoritative.
|
|
@@ -197,7 +197,7 @@ diagnostics only — they are NOT the committed path.
|
|
|
197
197
|
| webdriverio | `{app-dir}/e2e/screenshots/webdriverio/{spec}-{state}.png` |
|
|
198
198
|
| vscode-test | `{app-dir}/e2e/screenshots/vscode/{suite}-{test}.png` (SD-3: may be empty for behavior-only extensions) |
|
|
199
199
|
|
|
200
|
-
SD-3: the vscode-test committed dir may be empty for behavior-only extensions (no visual surface); the agent must still emit `e2e_gallery: []` explicitly. `cbp-
|
|
200
|
+
SD-3: the vscode-test committed dir may be empty for behavior-only extensions (no visual surface); the agent must still emit `e2e_gallery: []` explicitly. `cbp-verify-reviewer` treats an empty `e2e_gallery[]` as allowed when `vscode-test` is the ONLY eligible framework.
|
|
201
201
|
|
|
202
202
|
**Gitignore caution**: root `.gitignore` ignores `apps/web/e2e/screenshots/`. For the `{app-dir}`-relative frameworks (xcuitest, webdriverio, vscode-test), `{app-dir}` MUST NOT resolve to `apps/web` — committed PNGs there would be silently dropped from git. Remedy: use a non-ignored subdir (e.g. `apps/web/e2e/baselines/<framework>/`). A `.gitignore` negation (`!apps/web/e2e/screenshots/<framework>/`) does NOT work — git does not recurse into an ignored parent directory, so PNGs in that subdir would be silently dropped on a fresh checkout. Maestro (repo-root `e2e/screenshots/maestro/`) is already safe.
|
|
203
203
|
|
|
@@ -212,7 +212,7 @@ No user gate required for first-run capture.
|
|
|
212
212
|
|
|
213
213
|
**EXISTING baselines that visually diff** (`is_new === false`, `baseline_diff_pct > threshold`):
|
|
214
214
|
classify as `visual_regression`. Do NOT auto-update. Surface as a blocking accept-or-fix gate
|
|
215
|
-
at `/cbp-
|
|
215
|
+
at `/cbp-verify` (round scope). The user must explicitly approve (`--update-snapshots`) or open a
|
|
216
216
|
fix task. This relaxes the prior always-manual contract ONLY for new screens.
|
|
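The baseline rule above can be sketched as a classifier over the two `e2e_gallery[]` fields it names. This is a hedged sketch: `visual_regression` is the label used in the text, while the `new_baseline` and `pass` labels, the function name, and the threshold argument are assumptions made for illustration. The actual threshold value is not stated in the source.

```typescript
// `visual_regression` comes from the text; the other labels are hypothetical.
type BaselineOutcome = "new_baseline" | "pass" | "visual_regression";

function classifyBaseline(
  isNew: boolean,
  baselineDiffPct: number,
  threshold: number,
): BaselineOutcome {
  // New screens: baseline is captured on first run, no user gate.
  if (isNew) return "new_baseline";
  // Existing baseline above threshold: blocking accept-or-fix gate.
  if (baselineDiffPct > threshold) return "visual_regression";
  // Existing baseline at or below threshold: test passes.
  return "pass";
}
```

A `visual_regression` result is never auto-updated; per the rule above, the user either approves with `--update-snapshots` or opens a fix task.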
217
217
|
|
|
218
218
|
## Screenshot Collection Rule
|
|
@@ -223,11 +223,11 @@ entry requires: `{test_name, path, page_or_screen, viewport, is_new, baseline_di
|
|
|
223
223
|
Every `e2e_gallery[]` entry requires: `{test_name, page_or_screen, framework, committed_path,
|
|
224
224
|
is_new, baseline_diff_pct}`. `committed_path` MUST be a git-tracked path after the run.
|
|
225
225
|
|
|
226
|
-
`/cbp-round-
|
|
226
|
+
`/cbp-round-build` Step 5b aggregates `e2e_gallery[]` across all specialists and stores it
|
|
227
227
|
in `round.context.e2e_gallery`. TASK-3 / checkpoint-end consumes this aggregated gallery to
|
|
228
228
|
upload images to the DB.
|
|
229
229
|
|
|
230
|
-
Screenshots flow to `cbp-frontend-ui` invoked by `/cbp-round-
|
|
230
|
+
Screenshots flow to `cbp-frontend-ui` invoked by `/cbp-round-build` Step 5b with
|
|
231
231
|
`phase: 'screenshot_review'` — NOT inline by `round-executor` Step 3.8 (which runs
|
|
232
232
|
`phase: 'style_only'` without e2e output).
|
|
233
233
|
|
|
@@ -254,7 +254,7 @@ Otherwise return `status: 'failed'`.
|
|
|
254
254
|
|
|
255
255
|
## Dispatch / Eligibility Routing Contract
|
|
256
256
|
|
|
257
|
-
The dispatching skill (`/cbp-round-
|
|
257
|
+
The dispatching skill (`/cbp-round-build` Step 5 or `/cbp-checkpoint-check` Step 5b)
|
|
258
258
|
selects one specialist per app. Config is in `.codebyplan/e2e.json` (authoritative).
|
|
259
259
|
|
|
260
260
|
| `framework` in e2e.json | Agent spawned | Typical app type |
|
|
@@ -284,7 +284,7 @@ An agent is NOT spawned when ANY of the following hold:
|
|
|
284
284
|
**Multi-app monorepos**: the dispatching skill reads `e2e.json` per app path and may
|
|
285
285
|
spawn multiple specialists in the same round (one per eligible framework). Agents run in
|
|
286
286
|
parallel with `cbp-testing-qa-agent`. Each specialist's output is stored under
|
|
287
|
-
`round.context.e2e_outputs[framework]` (a framework-keyed map); `/cbp-round-
|
|
287
|
+
`round.context.e2e_outputs[framework]` (a framework-keyed map); `/cbp-round-build` Step 5b
|
|
288
288
|
aggregates `screenshots[]` and `e2e_gallery[]` across all entries before the
|
|
289
289
|
`cbp-frontend-ui` review. The aggregated `e2e_gallery[]` is persisted separately to
|
|
290
290
|
`round.context.e2e_gallery` for consumption by TASK-3 / checkpoint-end.
|
|
@@ -294,7 +294,7 @@ Step 4): pass `round_number: 0`, `whole_checkpoint_mode: true`, and the aggregat
|
|
|
294
294
|
`files_changed` union. The agent ignores `prior_round_files_changed` in this mode.
|
|
295
295
|
|
|
296
296
|
This contract is the single source of truth for dispatch logic. Config-driven dispatch is
|
|
297
|
-
implemented in `/cbp-round-
|
|
297
|
+
implemented in `/cbp-round-build` Step 5 and `/cbp-checkpoint-check` Step 5b (CHK-145); the
|
|
298
298
|
routing table above is the authoritative reference those gates match. Enforcement (the
|
|
299
299
|
`e2e_eligible_skipped` hard-fail and the no-in-spec-env-skip gate) lives in
|
|
300
300
|
`rules/e2e-mandatory.md`.
|
|
@@ -353,4 +353,4 @@ a loop, snapshot text/href BEFORE navigation rather than holding stale `Locator`
|
|
|
353
353
|
|---|---|
|
|
354
354
|
| No baseline (new screen, `is_new: true`) | Playwright creates on first run; auto-committed; `git add` runs; `e2e_gallery[].is_new: true`; `cbp-frontend-ui` Step 5b reviews semantically. No user gate. |
|
|
355
355
|
| Baseline exists, diff ≤ threshold | Test passes; `is_new: false`; `baseline_diff_pct` recorded. |
|
|
356
|
-
| Baseline exists, diff > threshold | `visual_regression` failure; `is_new: false`. Agent does NOT retry. `cbp-frontend-ui` Step 5b flags it; `/cbp-
|
|
356
|
+
| Baseline exists, diff > threshold | `visual_regression` failure; `is_new: false`. Agent does NOT retry. `cbp-frontend-ui` Step 5b flags it; `/cbp-verify` (round scope) constructs user QA item. User decides: fix-task or `--update-snapshots`. |
|