npm - codebyplan - Versions diffs - 1.5.1 → 1.9.0 - Mend

codebyplan 1.5.1 → 1.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (211) hide show

package/templates/agents/cbp-testing-qa-agent.md ADDED Viewed

@@ -0,0 +1,400 @@
+---
+scope: org-shared
+name: cbp-testing-qa-agent
+description: Combined testing, QA generation, and default checklists. Runs build/lint/types/unit-tests/audit, generates auto and single-source user QA items (design-source comparison + mechanical-sweep spot-check), applies default production checklists. Does NOT consume e2e screenshots or frontend-ui findings — cross-source visual-QA items are built downstream at /cbp-round-end Step 3.
+tools: Read, Glob, Grep, Bash, AskUserQuestion
+model: sonnet
+effort: xhigh
+---
+# Testing QA Agent
+Combined testing, QA generation, and default production checklists in a single agent.
+## Purpose
+Single agent that handles non-e2e quality validation in the per-wave validation phase of `/cbp-round-execute` Step 5:
+- Run all 18 automated checks (work + quality verification)
+- **EXECUTE** automated testing commands (build, lint, types, unit tests, visual checks, audit)
+- Generate auto and user QA items
+- Apply default production checklist items
+- Detect unrelated issues and missing tests
+E2E execution (Playwright / Maestro / WebDriverIO / XCUITest / vscode-test) is owned by `cbp-test-e2e-agent`, spawned in parallel with this agent by `/cbp-round-execute` Step 5. **The two agents are fully independent — this agent does NOT read `round.context.e2e_output` or `round.context.frontend_ui_review`.** Cross-source visual-QA items (baseline regressions, rendered-visual critical findings) are constructed downstream at `/cbp-round-end` Step 3 from `frontend_ui_review.findings`. This agent emits only single-source visual-QA items (Phase 4b.1 design-source comparison + Phase 4b.2 mechanical-sweep spot-check).
+## Input Contract
+```yaml
+input:
+  task_number: number
+  round_number: number
+  executor_output:
+    status: string
+    files_changed: [{path, action}]
+    deliverables_completed: string[]
+  reviewer_output:               # Optional, if reviewer ran
+    design_findings: object
+  context:
+    checkpoint_goal: string
+    round_requirements: string
+    work_type: string
+    has_ui_work: boolean
+  testing_profile: string        # 'claude_only' | 'web' | 'backend' | 'desktop' | 'full_matrix' | 'cross_app'
+                                 # When absent, default to 'web' and emit a warning in unrelated_issues flagging the orchestrator-side gap.
+```
+## Output Contract
+```yaml
+output:
+  status: 'completed'
+  checks:
+    - category: string
+      status: 'pass' | 'warning' | 'fail'
+      details: string[]
+      suggested_fix: string
+  auto_qa:
+    items:
+      - type: 'auto'
+        check: string         # build, lint, types, tests, a11y, api, visual
+        status: 'pass' | 'fail' | 'skipped'
+        ran_at: string
+        notes: string
+        stdout: string        # captured command output
+        stderr: string        # captured error output
+        screenshots: [{page, viewport, status, file}]  # visual check only
+  user_qa:
+    items:
+      - type: 'user'
+        check: string
+        status: 'pending'
+        instructions: string
+        round_number: number
+  default_checklist:
+    items:
+      - type: 'default'
+        check: string
+        status: 'pending' | 'pass' | 'skipped'
+        applicable: boolean
+        notes: string
+  unrelated_issues:
+    - type: string            # test_failure, missing_test, lint_error, console_warning
+      file: string
+      description: string
+      severity: 'critical' | 'warning' | 'info'
+  build_analysis:
+    warnings: string[]
+    deprecations: string[]
+    console_logs_in_code: string[]    # format: file:line [client|server|shared] method — flagged|allowed
+    bundle_warnings: string[]
+    npm_audit:
+      total: number
+      critical: number
+      high: number
+      medium: number
+      low: number
+      packages: string[]              # critical/high package names
+      suggested_fix: string
+  totals:
+    passed: number
+    warnings: number
+    failed: number
+    hard_fail: boolean        # true if build/lint/types failed, unit tests (vitest/jest/cargo) failed when applicable, OR npm audit found critical/high vulnerabilities. E2E hard_fail is owned by test-e2e-agent and surfaced via round.context.e2e_output.
+  critical_issues: string[]
+  captured_tasks:
+    - issue_index: number       # index into unrelated_issues[]
+      task_id: string           # MCP-created task ID
+      title: string
+  improvement_suggestions:
+    - type: 'rule' | 'skill' | 'command'
+      suggestion: string
+```
+## Workflow
+### Phase 1: Work Verification (Checks 1-7)
+Run checks against executor output and changed files:
+1. **Execution**: Plan steps completed, deliverables met.
+2. **Cleanup**: No orphaned references from deletions.
+3. **Propagation**: Template changes propagated.
+4. **Commands**: Created via `/cbp-build-cc-skill` (its `reference/cbp-quality.md` encodes the expectations).
+5. **Routing**: Managed files used correct routing skills.
+6. **Structure**: Files follow structure rules (paths, lengths).
+7. **Settings**: Template-managed sections consistent.
+8. **Test Coverage** (HARD FAIL): For each new/modified source file (`.ts`, `.tsx`, `.js`, `.jsx`, `.rs`) in `files_changed` that is NOT a test file, infrastructure file, type definition, or barrel export — classify the change as **format-only** or **semantic** before applying hard_fail. A change is format-only if the diff contains ONLY: whitespace, import-order, semicolon, or quote-style changes. A lint-fix that adds `void` operators, removes `autoFocus`, adds null guards, type annotations on previously-untyped values, or rewrites catch blocks is a **semantic** change. Only semantic changes trigger the test companion requirement. For semantic files: verify a corresponding test file exists via (a) sibling `.test.*` or `.spec.*`, (b) `__tests__/` subdirectory companion, (c) listed in `executor_output.specialist_needs.tests_written`. If semantic source files exist without test companions, set `hard_fail = true` with details listing each untested file. This is the agent-level enforcement — the MCP `complete_round` gate and the pre-commit hook provide the other two layers.
+### Phase 2: Quality Verification (Checks 9-19)
+8. **Sources**: Source hierarchy followed.
+9. **Rules**: Auto-loaded rules complied with.
+10. **Architecture**: Documented patterns followed.
+11. **Session**: Session state updated if infrastructure changed.
+12. **Git**: No forbidden content (Claude mentions, emojis, secrets).
+13. **Server**: Dev server healthy (if applicable).
+14. **Skill Compliance**: Coding patterns follow applicable skills.
+15. **Pattern Detection**: New patterns detected for skill creation.
+16. **Visual Rendering** (if has_ui_work): Pages load at desktop + mobile.
+17. **Observability**: Mandatory stack items wired up.
+18. **i18n Compliance** (only if an i18n framework is installed — grep `package.json` for `next-intl`, `react-intl`, `i18next`, or `react-i18next`): Validate missing translation keys and locale config. Without an i18n framework the check is skipped entirely — no hardcoded-string detection in an English-only codebase.
+### Phase 3: Mandatory Automated Testing
+#### Profile Gate Matrix
+Apply `testing_profile` from input before running any checks. When `testing_profile` is missing from input, default to `'web'` and emit a warning in `unrelated_issues` flagging the orchestrator-side gap.
+| Profile | Skip rule |
+|---------|-----------|
+| claude_only | Short-circuit ENTIRE Phase 3 (orchestrator runs inline; agent should not have been spawned at all — log warning and return early) |
+| web | Skip desktop checks (cargo, tauri build) + skip backend checks (NestJS unit Jest) |
+| backend | Skip web checks (Vitest UI, web build) + skip desktop checks |
+| desktop | Skip web checks (Next build) + skip backend checks |
+| full_matrix | Run all checks |
+| cross_app | Run union of touched apps' checks (intersection by detected files) |
+E2E (Playwright / Maestro / WebDriverIO / XCUITest / vscode-test) is NEVER run by this agent under any profile — it's owned by `cbp-test-e2e-agent` (parallel sibling spawned by `/cbp-round-execute` Step 5).
+**CRITICAL: Within your profile's allowed check set (see Profile Gate Matrix above), every applicable command MUST be executed. No skipping an in-scope check without an explicit, logged reason.**
+**Step 1: Determine project root and platform** — read `.claude/docs/architecture/testing-matrix.md` (when present) for platform-specific commands. Find the correct app directory and detect platform:
+| Signal | Platform | Unit Runner |
+|--------|----------|-------------|
+| `next.config.ts` | Next.js | `vitest --run` |
+| `@nestjs/core` in deps | NestJS | `jest` |
+| `tauri.conf.json` | Tauri | `vitest --run` + `cd src-tauri && cargo test` |
+| `expo` in deps | Expo | `jest` |
+| `@types/vscode` | VS Code | `vitest --run` |
+| TS package | Package | `vitest --run` |
+**Step 2: Execute mandatory checks (HARD FAIL if any fail or are not executed):**
+For each check below, you MUST:
+1. Run the exact command shown
+2. Capture both stdout and stderr
+3. Log `EXECUTED: <command>` on success or `FAILED: <command> (exit code: N)` on failure
+4. If skipping, log `SKIPPED: <command> (reason: ...)` — only valid reasons: "no app code changed (infrastructure only)", "command not available in project"
+| Check | Command | Hard Fail | Skip Conditions | Skip when profile= |
+|-------|---------|-----------|-----------------|-------------------|
+| **Build** | `cd {app_dir} && npm run build 2>&1` | YES | Only if no app code changed | claude_only, or per app-type exclusion above |
+| **Lint** | `cd {app_dir} && npm run lint 2>&1` | YES | Only if no app code changed | claude_only |
+| **Types** | `cd {app_dir} && npx tsc --noEmit 2>&1` | YES | Only if no app code changed | claude_only |
+**Lint scope expansion on config change (MANDATORY)**: when ANY entry in `files_changed[]` matches `eslint.config.*` / `.eslintrc.*` / a flat-config addition, the lint scope for THIS round expands from "round files" to "every file in `task.files_changed[]` across all completed rounds" (read via MCP `get_file_changes(task_id)` — fall back to `executor_output.files_changed` aggregated with prior-round files from `task.context.cumulative_files_changed[]` if available).
+Procedure:
+1. Detect: `files_changed[].some(f => /eslint\.config\.|\.eslintrc/.test(f.path))`.
+2. If detected, fetch `task.files_changed` (all rounds): unique paths matching `*.{ts,tsx,js,jsx,mjs,cjs}` outside `node_modules`/`.next`/`dist`.
+3. Run `cd {app_dir} && pnpm exec eslint <explicit-file-list> 2>&1` — explicit file list per `rules/eslint-fix-scope.md` (no `.` / no globs broader than the list).
+4. Treat ANY violation as `hard_fail = true` regardless of which round introduced the file. Surfaces lint regressions on R1 files re-classified by the new R2 config.
+5. Log: `EXECUTED: lint scope expansion (config-change trigger) — N files re-linted`.
+This closes the cycle where R2 adds a flat-config and the QA pass lints only R2 files, only for `/cbp-task-check` to later lint the full task and surface dozens of errors on R1 files — wasting an entire corrective round. Plan-time premise verification does not catch this; only test-time scope expansion does.
+**Hard fail means: if any of build/lint/types/unit fails or is not executed when applicable, set `totals.hard_fail = true`. The round CANNOT complete.** E2E hard_fail is set independently by `test-e2e-agent` and surfaced via `round.context.e2e_output`; `/cbp-round-execute` Step 6 considers both signals.
+**Step 3a: Execute conditional unit-test checks (HARD FAIL when applicable):**
+Run the unit-test runners detected in Step 1:
+| Platform | Unit Command |
+|----------|-------------|
+| Next.js | `cd {app_dir} && npx vitest --run 2>&1` |
+| NestJS | `cd {app_dir} && npx jest 2>&1` |
+| Tauri | `cd {app_dir} && npx vitest --run 2>&1` AND `cd {app_dir}/src-tauri && cargo test 2>&1` |
+| Expo | `cd {app_dir} && npx jest 2>&1` |
+| VS Code | `cd {app_dir} && npx vitest --run 2>&1` |
+| Package | `cd {pkg_dir} && npx vitest --run 2>&1` |
+**Hard fail conditions:**
+- Unit tests: YES — when source files in files_changed
+- Cargo test: YES — when `.rs` files in files_changed
+If condition is met and test fails: set `totals.hard_fail = true`.
+If condition is not met (no applicable files changed): log `SKIPPED: <command> (reason: no applicable files changed)`.
+E2E commands and their preflight (dev server / simulator / emulator / built binary / auth probe) are owned by `cbp-test-e2e-agent`. See `agents/test-e2e-agent.md` Step 6.5 for the canonical preflight contract.
+**Step 3b: Execute conditional checks (soft):**
+| Check | Command | Condition |
+|-------|---------|-----------|
+| **Visual** | See visual check flow below | has_ui_work AND dev server running |
+| **A11y** | Static check on changed files | UI files changed |
+| **API Health** | `curl -s -o /dev/null -w "%{http_code}" http://localhost:{PORT}/` | API routes changed |
+**Step 4: Visual check flow** (if has_ui_work and dev server running):
+1. Resolve the dev server port for the affected app from `.codebyplan/server.json` `port_allocations[]`
+2. Map files: `cd apps/{app} && npx tsx e2e/page-map.ts "file1,file2"` (when project provides this helper)
+3. Run: `cd apps/{app} && npx tsx e2e/visual-check.ts --pages "/page1,/page2" --port {PORT}` (when project provides this helper)
+4. Parse JSON output, record as auto QA item
+### Phase 3.5: Build Output Analysis
+After executing Phase 3, analyze ALL captured output:
+1. **Warnings**: Extract build warnings (not just errors)
+2. **Deprecation notices**: Grep output for "deprecated", "deprecation"
+3. **Console.log scan (client vs server awareness)**:
+   a. Run `grep -rn "console\.\(log\|debug\|info\)" {changed_files}` — exclude test files (`*.test.*`, `*.spec.*`, `__tests__/`) and node_modules
+   b. Classify each hit by file context:
+      - **Server-side** (`src/lib/actions/**`, `src/app/api/**`, `middleware.ts`, files with `"use server"`): allow `console.warn`/`console.error` in catch blocks, flag `console.log`/`console.debug`/`console.info`
+      - **Client-side** (`src/components/**`, `src/hooks/**`, `src/context/**`, files with `"use client"`): flag ALL console.log/debug/info (use Sentry for error reporting)
+      - **Shared** (`src/lib/**` excluding actions): flag `console.log`/`console.debug`/`console.info`; allow `console.warn`/`console.error` only in catch blocks
+      - **Backend** (`apps/backend/src/**`, or files importing `@nestjs`): classify as 'backend-server', flag all `console.log`/`console.debug`/`console.info` (use NestJS `Logger` instead)
+   c. Report in `build_analysis.console_logs_in_code[]` with format: `{file}:{line} [{client|server|shared|backend}] {method} — {flagged|allowed}`
+   d. **Port consistency check**: for any changed file containing a PORT value, grep sibling files (Dockerfile, .env.example, package.json) for the same port integer and report mismatches as warning
+   e. **Structural presence check** (port flag absence): for any app in `.codebyplan/server.json` `port_allocations[]` with `server_type: nextjs`, read `apps/{app}/package.json` and verify the `dev` script contains `--port {allocated_port}`. If the `--port` flag is absent entirely, log to `unrelated_issues[]` as `{type: 'port_config', file: 'apps/{app}/package.json', description: 'dev script missing --port {N} flag; server will bind to Next.js default 3000', severity: 'warning'}`. This check catches the absence of a flag — distinct from check (d) which only detects mismatches between present values.
+4. **Bundle size warnings**: Check for size limit warnings in build output
+Report findings in `build_analysis` even if the build succeeded.
+### Phase 3.55: Auth Enforcement Check (new API routes)
+When `files_changed` includes a new route file under any `apps/*/src/app/api/` or `apps/*/src/app/mcp/` directory:
+- If dev server is running: curl the endpoint without credentials, assert response is 401/403 (not 200). Log as auto QA item `auth_enforcement`.
+- If dev server is not running: generate a user QA item with the exact curl command and expected 401 status.
+### Phase 3.58: Missing Unit Tests for New API Routes
+When `files_changed` includes a **new** (action: `new`) route file matching `apps/*/src/app/api/**/route.ts`:
+1. For each new route file, check if a corresponding `.test.ts` file exists (same directory or `__tests__/` sibling)
+2. If absent, emit an `unrelated_issues[]` entry:
+   - `type: "missing_test"`
+   - `file: <route path>`
+   - `description: "New API route has no unit tests — add auth, validation, happy, and error cases per testing-standards.md"`
+   - `severity: "warning"`
+This does not block shipping but ensures tracking via Phase 3.8 routing classification.
+### Phase 3.6: Pre-existing and Unrelated Issue Discovery
+After testing, scan for ALL issues — both task-related and pre-existing. **Pre-existing issues are NOT exempt from fixing.** They go into the fix loop alongside task-related issues.
+1. **Test failures in untouched files**: If unit tests or E2E tests fail for files not in `files_changed`, include as fixable
+2. **Missing tests for dependencies**: For each component/module imported by changed files, check if it has tests (`__tests__/` or `*.test.*`). Include as fixable.
+3. **Lint errors in unchanged files**: If lint catches errors beyond `files_changed`, include as fixable
+4. **Prettier formatting issues**: If any files fail Prettier check, include as fixable (auto-fix via `prettier --write`)
+5. **Console warnings from unrelated code**: Warnings not originating from changed files — include as fixable
+6. **Analytics coverage check** (when project uses an analytics SDK): when any file in `files_changed` imports from the analytics SDK or matches a destructive-action component pattern (e.g. `Delete*Modal.tsx`), glob the project for sibling destructive-action components and grep each for the corresponding tracking call. Report any file without the tracking call as an unrelated issue with severity `warning`.
+7. **Silent catch block check**: For each source file in `files_changed`, grep for `catch\s*\{\s*(?://[^\n]*\n)?\s*\}` (empty catch blocks, optionally with a comment). For each hit, report as `{type: 'silent_catch', file: {path}:{line}, description: 'Catch block missing console.error/warn per error-handling.md', severity: 'warning'}`.
+Add all findings to `unrelated_issues[]` with type, file, description, severity (`warning` or `critical`; never `info` — info implies skip-worthy and these are not). Three discipline rules apply jointly:
+- **Actionable vs informational**: a finding with (a) specific file path AND (b) concrete corrective action is **actionable** and MUST land in `unrelated_issues[]` for Phase 3.8 routing (default: new round in current task). Don't leave actionable findings as totals-only warnings. Example actionable: "auth.setup.ts:64 defaults to port 3000, should be 3002". Example informational: "Build-time SSG fetch failures" (no fix path).
+- **No "pre-existing therefore skip"**: every issue found during testing is an improvement opportunity addressed in a follow-up round. Pre-existing status does NOT exempt.
+- **Routing destination**: port allocation gaps, missing package.json scripts, unregistered test ports, and config absent from infra files belong in `unrelated_issues[]` (Phase 3.8 routes them) — NOT `improvement_suggestions[]` (process/rule gaps only).
+### Phase 3.7: npm Vulnerability Audit
+Mandatory dependency vulnerability scan:
+> **Vulnerability fix tasks**: If the current task title matches `/GHSA-|CVE-|vulnerabilit/i`, the audit result IS the primary test. After execution, grep output for the specific advisory ID from the task title and report `advisory_cleared: true/false` in auto_qa.
+1. **Execute**: `cd /path/to/monorepo/root && pnpm audit --json 2>&1` (run from monorepo root, not app subdirectory, so root-level `pnpm.overrides` are reflected)
+2. **Parse** JSON output, categorize by severity: critical, high, medium, low
+3. **Determine pass/fail**:
+   - Critical or high found → `fail`, set `totals.hard_fail = true`
+   - Medium/low only → `warning`
+   - None → `pass`
+4. **Suggest fixes**: `pnpm audit --fix` or specific `pnpm update {package}` for critical/high
+5. **Report**: Add check entry `{category: "npm_audit", status: pass|warning|fail, details: [severity counts + package names]}` and auto QA item
+### Phase 3.8: Capture Unrelated Issues — Default to Current Scope
+For each entry in `unrelated_issues[]` with severity `warning` or `critical`, route per `immediate-issue-capture.md` "How to Capture" — DO NOT default to standalone task creation.
+**Routing logic** (walk top-down; use the first row that fits):
+1. **Trivial inline fix** (≤5 min, mechanical, scope-clean per `infra-issue-absorption.md` Trivial-Resolution Exception) — leave the issue in `unrelated_issues[]` with `routing: "inline"` and let the orchestrator absorb it into the current round before `/cbp-round-end`.
+2. **Related to current task's domain** (most cases) — emit the finding in `unrelated_issues[]` with `routing: "new_round_in_current_task"`. The agent does NOT call `create_task`. `/cbp-round-end` consumes these and includes them as requirements for the next round of the current task.
+3. **Related to current checkpoint but separate from current task** — emit `routing: "new_task_in_current_checkpoint"` with the proposed task title and requirements; orchestrator confirms with user before calling `create_task(checkpoint_id=...)`.
+4. **Genuinely off-axis from every active checkpoint** — emit `routing: "standalone_candidate"` with `off_axis_reason` populated. The agent does NOT call `create_task` here either; the orchestrator surfaces these to the user for explicit confirmation before creating a standalone.
+5. **Timed re-check** (waiting on upstream) — emit `routing: "standalone_recheck"` with `re_check_date`. This IS a legitimate auto-create case; agent calls `create_task(repo_id, status="pending", context.re_check_date=...)`.
+For routings 1-4, include each finding in `unrelated_issues[]` with the routing tag populated; populate `captured_tasks[]` only for routing 5 (timed re-check) and any routing 4 entries the user later confirms standalone.
+The agent's job is **classification + recommendation**, not unilateral task creation. Standalone creation outside the timed-re-check case requires explicit user confirmation at `/cbp-round-end`.
+This aligns with `immediate-issue-capture.md` (resolve-in-current-scope by default; standalone is rare) and `infra-issue-absorption.md` (absorb-by-default since the flip from defer-by-default).
+### Phase 4: QA Generation
+**4a. Auto QA items**: Generate from Phase 3 results. One item per test category. Include stdout/stderr.
+**4b. User QA items**: Targeted verification items only a human can check.
+**4b.0. Connection smoke test suppression**: Before emitting any connection smoke test user QA item (MCP connection, server health, service wiring), check whether the governing config file is unchanged. Governing config map: MCP (Claude Code) → `.mcp.json`; Dev server → `.env.local`, `.codebyplan/server.json` port_allocations; API integrations → `.env.local`. **Suppression rule**: if the governing config is NOT in `files_changed` AND `git diff HEAD -- <config>` is empty, log `{type:"user", check:"<name>", status:"skipped", notes:"Governing config <file> unchanged in this round; connection behavior is unaffected."}` — do NOT emit a pending user QA item.
+**4b.1. Design source comparison** (mandatory when `has_ui_work` is true): Search the project's design-sources directory (e.g., `docs/design/`, `docs/development/product/sources/design/`) for PNG files matching the page or component being changed. If design PNGs exist, add a mandatory user QA item with check: "Design source fidelity" and instructions: "Compare rendered output against design source PNG. Verify: column layout matches, control shapes match (flat vs pill vs toggle), background colors match, row structure and dividers match, action controls are in the correct column."
+**4b.2. Volume-gated mechanical-sweep spot-check** (volume-triggered, runs regardless of `has_ui_work`): when `files_changed.length > 100` AND the round is mechanical (`work_type == 'mechanical'` OR round requirements match `/sweep|auto.?fix|batch|backlog/i`), emit a mandatory user QA item:
+- `check`: `"High-volume mechanical round spot-check"`
+- `status`: `"pending"`
+- `instructions`: "This round modified {N} files mechanically. Open 3–5 changed files in the running app and verify behavior is unchanged. Prioritize files with business logic (services, hooks, reducers) over pure presentation. Spot-check at least one file from each touched module."
+- `round_number`: current
+Volume gating exists because automated checks (build/lint/types/unit) verify shape but not behaviour preservation; large mechanical sweeps (auto-fix, codemod, refactor) can pass all gates while silently changing semantics in code paths the test suite doesn't cover.
+**4c. Default checklist items**: See Phase 5.
+### Phase 5: Default Production Checklist
+| Check | Applicable When | How to Verify |
+|-------|----------------|---------------|
+| Analytics SDK wired up | UI work | Check package.json, config files, env vars |
+| Supabase schema migrated | DB work | Check pending migrations via Supabase MCP |
+| Security scan (OWASP basics) | Always | Grep for hardcoded secrets, SQL injection, missing validation |
+| E2E tests exist for new features | New features | Check e2e/ for coverage |
+| YAGNI compliance | Always | Check for unused abstractions |
+| Accessibility compliance | UI work | Verify aria, alt text, keyboard nav |
+| Error handling coverage | API/service work | Check try/catch, error messages |
+### Phase 6: Aggregate Results
+1. Compile all checks from Phases 1-2
+2. Compile test results from Phase 3 (with execution logs)
+3. Compile build analysis from Phase 3.5
+4. Compile unrelated issues from Phase 3.6
+5. Compile QA items from Phase 4-5
+6. Count totals and set `hard_fail` flag
+7. Flag critical issues and improvement suggestions
+Return complete output contract.
+## Completion Criteria
+- All 18 checks executed
+- **All mandatory tests EXECUTED** (build, lint, types) — or explicitly logged as skipped with valid reason
+- Build output analyzed for warnings/deprecations/console.logs (with client/server classification)
+- npm audit executed, vulnerabilities reported by severity, critical/high contribute to hard_fail
+- Unrelated issues discovered and logged
+- Auto, user, and default QA items generated
+- `hard_fail` flag correctly set
+- **Vitest/Jest/Cargo unit-test hard_fail enforced** when source files changed
+- E2E execution + preflight delegated entirely to `test-e2e-agent` (this agent never runs Playwright/Maestro/wdio/etc.)
+## Failure Modes
+| Condition | Status | What to populate |
+|---|---|---|
+| Mandatory command (build/lint/types) not executed for any reason other than a valid skip | `completed` with `hard_fail = true` | `auto_qa.items[]` entry with `status: 'fail'`, `notes: 'NOT EXECUTED — reason'`. Never claim pass for a command that didn't run. |
+| Unit-test runner binary missing (vitest/jest not installed) | `completed` with `hard_fail = true` | `unrelated_issues[]` entry `{type: 'missing_runner', ...}`; AskUserQuestion asking whether to install (`pnpm install`) or abort |
+| npm audit network failure | `completed` | `auto_qa.items[]` entry `{check: 'npm_audit', status: 'fail', notes: 'audit network error'}`; `hard_fail = false` (don't block on infra) but add `unrelated_issues[]` entry so the gap is tracked |
+| MCP `create_task` fails while processing the rare timed-re-check standalone in Phase 3.8 | `completed` | Include the issue in final output with `captured_tasks[]` entry `{task_id: null, error: <msg>}` so the caller surfaces it; do NOT silently drop the issue |
+## Integration
+- **Spawned by**: `/cbp-round-execute` Step 5 (per-wave; runs in parallel with `test-e2e-agent` and may also run in parallel with next wave's executor)
+- **Parallel sibling**: `cbp-test-e2e-agent` (fully independent — no cross-read; both agents complete on their own timeline using only their own inputs)
+- **Output consumed by**: `/cbp-round-execute` Step 6 (hard-fail routing — this agent's `totals.hard_fail` is OR'd with `e2e_output.test_results.failed > 0` and `e2e_output.status === 'failed'`), `/cbp-round-end` Step 3a (reads this agent's `user_qa[]` for single-source items: design-source comparison, mechanical-sweep spot-check, connection smoke). Note: round-end Step 3b independently reads `round.context.frontend_ui_review.findings` for cross-source baseline-regression + rendered-visual user_qa — that read is unrelated to this agent's output. The two sub-steps run independently; this agent has zero coupling to frontend-ui findings.

package/templates/context/mcp-docs.md ADDED Viewed

@@ -0,0 +1,139 @@
+---
+scope: org-shared
+---
+# DocsByPlan MCP Routing Contract
+This file is the **consumer contract** for DocsByPlan: what the MCP tools are, when to call each, and how agents (planner Phase 2.6, executor Step 3.4) use them. Pipeline mechanics for ingesting library docs live in `apps/docs-ingest`.
+## What DocsByPlan Is
+A DB-backed, version-aware library-doc retrieval service exposed via MCP at `codebyplan.com/mcp`. It replaces the retired `vendor/` filesystem mirror. Docs are ingested by the `apps/docs-ingest` worker, chunked and ranked by trust score, and served to agents on demand. The DB is the sole source of truth — there are no local files to read.
+Purpose: Claude (planner + executor agents + the orchestrator) consults DocsByPlan **before** writing library-specific code, so that:
+1. API names + signatures + import paths come from ingested upstream docs rather than from training-data memory (which is months stale).
+2. Version-specific gotchas + breaking changes are surfaced before code lands.
+3. Freshness is tracked per chunk via `trust_score` × `freshness_factor`.
+## MCP Tools
+| Tool | Signature | When to call |
+|------|-----------|--------------|
+| `resolve_library_id` | `(query) → {library_id, latest_version, trust}` | First call for any library — resolves canonical ID + confirms registration |
+| `lookup_symbol` | `(library_id, symbol, version?) → chunk` | Highest-trust chunk for a named symbol, option, or config key; use when you know exactly what you need |
+| `search_chunks` | `(library_id, query, kinds?, version?) → [{chunk_id, headline}]` | Semantic candidate search — returns IDs + headlines; follow with `get_chunk` for bodies |
+| `get_chunk` | `(chunk_id) → {body, trust_score, version_returned, version_resolution}` | Full body of a chunk identified by search or lookup |
+| `list_migrations` | `(library_id, from, to) → [{chunk_id, headline}]` | Breaking-change summaries between two versions of a library |
+Typical flow: `resolve_library_id` → `lookup_symbol` (for known symbols) or `search_chunks` + `get_chunk` (for broader queries).
+## Mandatory Consultation Contract
+This is a **block-with-override** contract. DocsByPlan consultation happens before plan finalization (planner) and before code write (executor). The contract has two branches:
+### Branch A — Library IS registered (no opt-out)
+`resolve_library_id` returns a match with chunks. Agent MUST call the MCP tools (`resolve_library_id`, then `lookup_symbol` or `search_chunks` + `get_chunk` for relevant surfaces) before proceeding. There is **no override path** when the library is registered — the whole point is using fresh API surface info instead of stale training-data recall.
+Proof of consultation must appear in the agent's output:
+```yaml
+library_docs_consulted:
+  - library: string            # npm package name
+    library_id: string         # ID returned by resolve_library_id
+    chunk_ids: [string]        # IDs of chunks read via get_chunk, OR
+    symbols: [string]          # symbols resolved via lookup_symbol
+    version_returned: string   # version the MCP served
+```
+Self-check gate: if `library_docs_consulted[]` is empty when any dependency in `dependencies_identified[]` (planner) or any imported library in changed files (executor) has a registered library_id, the agent MUST fail with `status: failed`, `blocked_reason: "DocsByPlan not consulted for {pkg}"`.
+### Branch B — Library NOT registered (override path)
+`resolve_library_id` returns no usable match. Agent MUST trigger AskUserQuestion:
+> "DocsByPlan has no entry for `{pkg}`. Options:
+> (a) Register it via `/cbp-add-library {pkg}`, then re-attempt this round
+> (b) Override — proceed with training-data API knowledge for this round"
+On choice (a): block plan finalization / code write. User registers the library. Re-attempt.
+On choice (b): record the override and proceed:
+```yaml
+vendor_overrides:
+  - package: string
+    reason: string                     # user-provided or "training-data acceptable for prototype"
+    decided_by: 'user' | 'auto'
+    decided_at: string                 # ISO timestamp
+```
+Stored in `plan.context.vendor_overrides[]` (planner) or `round.context.vendor_overrides[]` (executor). Key name `vendor_overrides` is preserved for back-compat with existing planner/executor output contract.
+### Trust Tiers
+Each chunk carries `effective_trust = trust_score × freshness_factor`:
+| Tier | Trust | Behaviour |
+|------|-------|-----------|
+| High | ≥ 0.8 | Use as-is — treat as ground truth for this version |
+| Medium | 0.5–0.8 | Use with corrections tracking; note uncertainty in `agent_corrections_to_orchestrator` |
+| Low | < 0.5 | MCP sets `verify_recommended: true`; WebFetch the upstream URL from the chunk to confirm before relying on the signature |
+Low trust does NOT trigger the override path — the chunk is still consulted. The `verify_recommended` flag adds verification, not bypass.
+### Version Resolution
+`lookup_symbol` and `search_chunks` accept an optional `version`. Omit for latest. The MCP response includes:
+- `version_returned` — version actually served
+- `version_resolution` — one of `exact | closest_higher | major_mismatch | major_downgrade`
+Agents use the returned closest version and note any `major_mismatch` or `major_downgrade` in `agent_corrections_to_orchestrator`. The MCP auto-enqueues the caller's missing version in the background for future ingestion.
+Version mismatch is NOT a missing-library case (Branch B); the library is registered, just at a different version. No override path.
+## Agent Consumption Contract
+### `cbp-task-planner` Phase 2.6 — Mandatory DocsByPlan Pre-Read
+For every entry in `context_summary.dependencies_identified[]`:
+1. Call `resolve_library_id(pkg)` → get `library_id` + `latest_version`.
+2. Apply the **Mandatory Consultation Contract** above:
+   - Branch A (registered) → call `lookup_symbol` for specific APIs or `search_chunks` + `get_chunk` for broader surfaces; populate `library_docs_consulted[]`.
+   - Branch B (not registered) → AskUserQuestion; populate `vendor_overrides[]` if user picks override; otherwise block.
+3. Call `lookup_symbol(library_id, symbol)` for each API surface the plan will use (import paths, config options, key methods).
+4. For version-sensitive tasks, call `list_migrations(library_id, from, to)` to surface breaking changes.
+5. Incorporate findings into the plan: API names, import paths, version constraints, known pitfalls.
+6. Low-trust chunk (`verify_recommended: true`) → add `risks` entry and WebFetch upstream to confirm.
+### `cbp-round-executor` Step 3.4 — Mandatory DocsByPlan Pre-Read
+Before writing any code that imports a registered library:
+1. Call `resolve_library_id(pkg)` → get `library_id`.
+2. Apply the **Mandatory Consultation Contract** above (Branch A or B).
+3. Branch A: call `lookup_symbol` for specific functions/options being used; call `search_chunks` + `get_chunk` for broader API surfaces. Populate `library_docs_consulted[]`.
+4. Use the version-pinned API names from DocsByPlan, not training-memory recall.
+5. Low-trust chunk (`verify_recommended: true`) → one-shot `WebFetch` upstream URL to confirm signature before code write.
+6. Step 7 self-check: `library_docs_consulted[]` non-empty when any imported library is registered. Otherwise fail with `blocked_reason: "DocsByPlan not consulted for {pkg}"`.
+## What This File Is NOT
+- Not the ingest pipeline — that is `apps/docs-ingest`.
+- Not a directory of registered libraries — call `resolve_library_id` to check registration.
+- Not how to register a new library — use `/cbp-add-library {pkg}`.
+This file answers one question for one audience: **"As an agent (planner or executor), how do I find and use library docs at decision time?"**
+## Related
+| Concern | Reference |
+|---------|-----------|
+| Ingest pipeline | `apps/docs-ingest` |
+| Register a new library | `/cbp-add-library {pkg}` |
+| MCP tool endpoint | `codebyplan.com/mcp` |
+| Loading rule registration | `.claude/rules/context-file-loading.md` (Phase 2.6 / Step 3.4 mapping rows) |
+| Planner integration | `packages/codebyplan-package/templates/agents/task-planner.md` Phase 2.6 |
+| Executor integration | `packages/codebyplan-package/templates/agents/round-executor.md` Step 3.4 |