npm - codebyplan - Versions diffs - 1.13.52 → 1.13.54 - Mend

codebyplan 1.13.52 → 1.13.54

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (92)

package/templates/skills/cbp-ship/templates/workflow-vsce-publish.yml DELETED

@@ -1,31 +0,0 @@
-name: VS Code Marketplace publish
-on:
-  push:
-    tags:
-      - 'vscode-v*.*.*'
-  workflow_dispatch:
-jobs:
-  publish:
-    runs-on: ubuntu-latest
-    steps:
-      - uses: actions/checkout@v4
-      - uses: pnpm/action-setup@v3
-        with:
-          version: 10
-      - uses: actions/setup-node@v4
-        with:
-          node-version: 22
-          cache: pnpm
-      - run: pnpm install --frozen-lockfile
-      - run: pnpm --filter REPLACE_WITH_EXT_NAME build
-      - name: Publish to Marketplace
-        env:
-          VSCE_PAT: ${{ secrets.VSCE_PAT }}
-        run: |
-          cd apps/vscode
-          npx @vscode/vsce publish --no-yarn

package/templates/skills/cbp-task-check/SKILL.md DELETED

@@ -1,172 +0,0 @@
----
-name: cbp-task-check
-description: AI production review for the current task
-argument-hint: [chk-task]
-triggers: [cbp-task-testing, cbp-round-input]
-effort: high
----
-# Task Check Command
-AI-driven production readiness review. Spawns the `cbp-task-check` agent for thorough verification including user satisfaction discussion. This command is a thin orchestrator — the agent does the heavy lifting. It is the **cross-round double-check**: rounds already own per-round QA (debug scan, security grep, audit, per-app build/lint/types), so this layer focuses on holistic concerns visible only across the full task diff — requirements traceability, checkpoint alignment, shippability, holistic code review, and scope drift — never re-running per-round checks.
-## Inline-Fallback for Spawn Failure
-If the `cbp-task-check` agent spawn fails for any reason (`API Error: Extra usage required`, monthly Agent usage cap, provider 5xx, rate limit, context overflow), the orchestrator MUST follow the canonical inline-fallback procedure documented in `skills/cbp-round-end/SKILL.md` "Inline-fallback for any spawn failure".
-Procedure summary (pointer back to canonical):
-1. Detect the failure class from the error string; record `round.context.task_check_findings.spawn_failure = { class, error_message, decided_at }`.
-2. Walk the agent's documented Phase 1-10 checklist inline using `Read` / `Grep` / `Bash` / MCP `get_*` tools — the agent's definition file is the inline script.
-3. Populate the agent's output contract (`verdict`, `route_recommendation`, `requirements_status`, `qa_status`, `code_review_findings`, `user_satisfaction`, `scope_divergence_detected`, etc.) with `mode: 'inline_fallback'` so analytics distinguishes.
-4. Apply the pre-emptive-skip rule: when the same failure class fired in the previous skill of this session, skip the spawn attempt entirely and go straight to inline.
-5. Continue the skill — do NOT abort. Inline-fallback is intended to keep the pipeline moving under sustained outages.
-Inline-fallback is NOT a quality downgrade trapdoor — every Phase from the agent definition MUST be walked, in order, with the same Read/Grep depth the agent would have used. Skipping phases under the banner of fallback is a separate failure mode that `cbp-improve-claude` flags as `inline_fallback_shortcutting`.
-## When Used
-- After all rounds complete and all files approved (auto-triggered by `/cbp-round-complete`)
-- Before `/cbp-task-testing`
-- `/cbp-task-check` is NEVER skippable
-## Instructions
-### Step 1: Parse `$ARGUMENTS`
-Parse the argument using the canonical chk-task-round notation (see `cbp-round-start` Step 0 "CHK / TASK / ROUND Identifier Notation Vocabulary"):
-| Shape | Regex | Resolves to |
-|-------|-------|-------------|
-| `{chk}-{task}` (e.g. `108-1`) | `^[0-9]+-[0-9]+$` | Checkpoint-bound: CHK-{chk} TASK-{task} |
-| _(empty)_ | — | Resolve from local state per Step 1.5/2 (MCP `get_current_task` break-glass) — the active in-progress task |
-| `{task}` (bare number) | — | **Error**: "Use /cbp-standalone-task-check {N} instead — bare numbers no longer route to standalone tasks." |
-Anything else is malformed — surface this error and stop:
-```
-task-check: invalid argument `{value}`. Expected:
-  108-1  → CHK-108 TASK-1 (checkpoint-bound)
-  (empty) → active in-progress task
-For standalone tasks, use `/cbp-standalone-task-check {N}`.
-For a specific round, use `/cbp-round-update 108-1-2`.
-```
-Error cases: `108-1-2` (that is round-update's shape), `abc`, `108-`, `-1`, `108--1`, anything with whitespace or non-numeric characters.
-#### Worked examples
-- `task-check 108-1` → CHK-108 TASK-1
-- `task-check` (no arg) → active in-progress task via `get_current_task`
-- `task-check 45` → error: "Use /cbp-standalone-task-check 45 instead — bare numbers no longer route to standalone tasks."
-- `task-check 108-1-2` → error: "use `/cbp-round-update 108-1-2`"
-- `task-check abc` → error: malformed
-### Step 1.5: Get Current Task
-Given the parse from Step 1:
-| Parse | Resolution path |
-|-------|-----------------|
-| `{chk}-{task}` | Read `.codebyplan/state/checkpoints/*.json` → filter `number === {chk}`. Read `.codebyplan/state/checkpoints/<id>/tasks/*.json` → filter `number === {task}`. If missing/stale, run `npx codebyplan sync` once and re-read. Break-glass fallback: MCP `get_checkpoints`/`get_tasks` when state dir absent and sync fails. |
-| _(empty)_ | Read `.codebyplan/state/todos.json` → find the active in-progress task. If missing/stale, run `npx codebyplan sync` once and re-read. Break-glass fallback: MCP `get_current_task(repo_id)` when state dir absent and sync fails. |
-If no in-progress task, show error and stop.
-### Step 2: Quick Gate — Verify All Rounds Complete
-Read `.codebyplan/state/checkpoints/<checkpointId>/tasks/<taskId>/rounds/*.json` (local-first). If missing/stale, run `npx codebyplan sync` once and re-read. Break-glass fallback: MCP `get_rounds` when state dir absent and sync fails. Verify all rounds are `completed`.
-If any rounds still in_progress:
-```
-## Cannot Run Task Check
-TASK-[N] has an active round (Round [N]). Complete it first:
-- Run `/cbp-round-update` to finish the round
-```
-Stop here.
-### Step 3: Load All Context
-1. Checkpoint details — from `.codebyplan/state/checkpoints/<checkpointId>.json` (already read in Step 1.5)
-2. Task details — from `.codebyplan/state/checkpoints/<checkpointId>/tasks/<taskId>.json` (already read in Step 1.5)
-3. All rounds — from `.codebyplan/state/checkpoints/<checkpointId>/tasks/<taskId>/rounds/*.json` (already read in Step 2)
-### Step 4: Spawn Task Check Agent
-Spawn `cbp-task-check` agent with full context:
-```yaml
-input:
-  task_number: [N]
-  round_number: [total rounds]
-  checkpoint: { id, title, goal, context }
-  task: { id, title, requirements, context, files_changed, qa }
-  rounds: [{ number, requirements, context, qa, files_changed }]
-```
-Wait for agent to complete. Agent handles all 10 phases including user satisfaction discussion.
-### Step 5: Save Agent Output
-Save agent output to task context: `codebyplan task update --id <taskId> --checkpoint-id <checkpointId> --context '{"check_verdict": ...}'` (CLI write-through: local state file + REST). Break-glass fallback: MCP `update_task` when CLI is unavailable.
-- `task.context.check_verdict` = agent output (verdict, requirements_check, etc.)
-### Step 6: Route Based on Verdict
-**READY + satisfied:**
-Starting task testing...
-Invoke `cbp-task-testing` via the Skill tool with the same `{chk-task}` argument. `cbp-task-testing`
-is `allow`-tier — it auto-fires silently. If the `cbp-skill-context-guard.sh` hook detects the
-context window is above the 200K threshold it will block the skill and direct you to run
-`/cbp-clear-prep` first; otherwise testing starts immediately.
-**NOT READY — fixable issues:**
-```
-Issues found that need addressing:
-- [issue 1]
-- [issue 2]
-```
-Invoking `cbp-round-input` to address the issues found during review...
-Invoke `cbp-round-input` via the Skill tool. `cbp-round-input` is `allow`-tier — it auto-fires silently.
-**NOT READY — needs new task:**
-```
-Scope issues identified that require a new task:
-- [scope issue]
-```
-Suggest: `/cbp-task-create`. **STOP HERE** — wait for user (creating a new task is a user scope decision — not auto-triggered).
-**NOT READY — approvals missing:**
-```
-Code review passed but [N] files need user approval.
-```
-Suggest: Approve files, then re-run `/cbp-task-check`. **STOP HERE** — wait for user (approval is a user action — not auto-triggered).
-## Key Rules
-- **`/cbp-task-check` is NEVER skippable** — mandatory before `/cbp-task-testing`
-- **This is AI review + user satisfaction** — not automated testing
-- **Read all changed files** — agent does the heavy lifting
-- **No file changes** — review only, never edit
-- **Checkpoint-bound only** — for standalone tasks use `/cbp-standalone-task-check`
-## Integration
-- **Reads**: `.codebyplan/state/checkpoints/*.json`, `checkpoints/<id>/tasks/*.json`, `checkpoints/<id>/tasks/<id>/rounds/*.json`, `todos.json` (local-first; `npx codebyplan sync` on miss; MCP `get_current_task`/`get_rounds` break-glass), plus all changed files (via agent)
-- **Writes**: `codebyplan task update` (CLI write-through; MCP `update_task` break-glass)
-- **Triggers**: auto-triggers `cbp-task-testing` via Skill tool on READY + satisfied (`allow`-tier, fires silently; the 200K context guard handles oversized contexts via the cbp-clear-prep flow); auto-triggers `cbp-round-input` via Skill tool on NOT READY — fixable issues (`allow`-tier, fires silently)
-- **Triggered by**: `/cbp-round-complete` (auto, when all files approved)
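
Note on the deleted `cbp-task-check/SKILL.md` above: its Step 1 describes an argument grammar (`{chk}-{task}`, empty, bare number, anything else) that the deleted `cbp-task-testing` skill repeats. The following is a minimal TypeScript sketch of that grammar, not code from the package. The names `ParsedArg` and `parseTaskArg`, and the `skill` parameter, are hypothetical and exist only for illustration; the regex for the checkpoint-bound shape and the error wording follow the removed text.

```typescript
// Hypothetical result type; these names are illustrative, not taken from the package.
type ParsedArg =
  | { kind: "checkpoint"; chk: number; task: number }
  | { kind: "active" }
  | { kind: "error"; message: string };

// `skill` is e.g. "task-check" or "task-testing"; it selects the standalone
// command named in the bare-number error.
function parseTaskArg(raw: string, skill: string): ParsedArg {
  if (raw === "") {
    // Empty argument: the caller resolves the active in-progress task from local state.
    return { kind: "active" };
  }
  // Checkpoint-bound shape, using the regex given in the skill's table.
  const bound = /^([0-9]+)-([0-9]+)$/.exec(raw);
  if (bound) {
    return { kind: "checkpoint", chk: Number(bound[1]), task: Number(bound[2]) };
  }
  // A bare number no longer routes to standalone tasks.
  if (/^[0-9]+$/.test(raw)) {
    return { kind: "error", message: `Use /cbp-standalone-${skill} ${raw} instead` };
  }
  // Three numeric segments is the round-update shape.
  if (/^[0-9]+-[0-9]+-[0-9]+$/.test(raw)) {
    return { kind: "error", message: `use /cbp-round-update ${raw}` };
  }
  // Everything else, including any whitespace, is malformed (no trimming on purpose).
  return { kind: "error", message: `${skill}: invalid argument \`${raw}\`` };
}
```

The input is deliberately not trimmed, because the removed text lists "anything with whitespace" among the error cases.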

package/templates/skills/cbp-task-testing/SKILL.md DELETED

@@ -1,277 +0,0 @@
----
-name: cbp-task-testing
-description: Run comprehensive task-level testing after /cbp-task-check passes
-argument-hint: [chk-task]
-triggers: [cbp-task-complete, cbp-round-input]
-effort: xhigh
----
-# Task Testing Command
-Comprehensive task-level testing — runs all automated tests and walks the user through manual testing one-by-one. Distinct from round-level testing (`testing-qa-agent`): this tests the **entire delivered feature holistically** after all rounds are complete. Runs inline — no sub-agent.
-## When Used
-- After `/cbp-task-check` passes with READY verdict (auto-triggered)
-- Before `/cbp-task-complete`
-- **Never skippable**
-## Scope vs Round-Level Validation
-Per-wave `testing-qa-agent` runs inside `/cbp-round-execute` Step 5. This skill adds the cross-cutting layer that is only visible across the full task diff: whole-repo lint, whole-repo typecheck, full test suite, `pnpm audit` (via `codebyplan check --scope task --json`), and full-diff security scan — each run once here, not per-round.
-## Instructions
-### Step 1: Parse `$ARGUMENTS`
-Parse the argument using the canonical chk-task-round notation (see `.claude/rules/notation-consistency.md`):
-| Shape | Regex | Resolves to |
-|-------|-------|-------------|
-| `{chk}-{task}` (e.g. `108-1`) | `^[0-9]+-[0-9]+$` | Checkpoint-bound: CHK-{chk} TASK-{task} |
-| _(empty)_ | — | Resolve from local state per Step 1.5/2 (MCP `get_current_task` break-glass) — the active in-progress task |
-| `{task}` (bare number) | — | **Error**: "Use /cbp-standalone-task-testing {N} instead — bare numbers no longer route to standalone tasks." |
-Anything else is malformed — surface this error and stop:
-```
-task-testing: invalid argument `{value}`. Expected:
-  108-1  → CHK-108 TASK-1 (checkpoint-bound)
-  (empty) → active in-progress task
-For standalone tasks, use `/cbp-standalone-task-testing {N}`.
-For a specific round, use `/cbp-round-update 108-1-2`.
-```
-Error cases: `108-1-2` (that is round-update's shape), `abc`, `108-`, `-1`, `108--1`, anything with whitespace or non-numeric characters.
-#### Worked examples
-- `task-testing 108-1` → CHK-108 TASK-1
-- `task-testing` (no arg) → active in-progress task via `get_current_task`
-- `task-testing 45` → error: "Use /cbp-standalone-task-testing 45 instead — bare numbers no longer route to standalone tasks."
-- `task-testing 108-1-2` → error: "use `/cbp-round-update 108-1-2`"
-- `task-testing abc` → error: malformed
-### Step 1.5: Get Current Task
-Given the parse from Step 1:
-| Parse | Resolution path |
-|-------|-----------------|
-| `{chk}-{task}` | Read `.codebyplan/state/checkpoints/*.json` → filter `number === {chk}`. Read `.codebyplan/state/checkpoints/<id>/tasks/*.json` → filter `number === {task}`. If missing/stale, run `npx codebyplan sync` once and re-read. Break-glass fallback: MCP `get_checkpoints`/`get_tasks` when state dir absent and sync fails. |
-| _(empty)_ | Read `.codebyplan/state/todos.json` → find the active in-progress task. If missing/stale, run `npx codebyplan sync` once and re-read. Break-glass fallback: MCP `get_current_task(repo_id)` when state dir absent and sync fails. |
-If no in-progress task, show error and stop.
-### Step 2: Verify All Rounds Complete
-Read `.codebyplan/state/checkpoints/<checkpointId>/tasks/<taskId>/rounds/*.json` (local-first). If missing/stale, run `npx codebyplan sync` once and re-read. Break-glass fallback: MCP `get_rounds(task_id)` when state dir absent and sync fails. Verify all rounds are `completed`. If any still `in_progress`:
-```
-## Cannot Run Task Testing
-TASK-[N] has an active round (Round [N]). Complete it first:
-- Run `/cbp-round-update` to finish the round
-```
-Stop.
-### Step 3: Verify `/cbp-task-check` Passed
-Check `task.context.check_verdict`: must exist and have `verdict = "READY"`. Otherwise:
-```
-## Cannot Run Task Testing
-`/cbp-task-check` has not passed yet. Run `/cbp-task-check` first.
-```
-Stop.
-### Step 4: Aggregate Files Changed
-Collect all `files_changed` from all rounds, deduplicate (latest action per path wins). Skip deleted files for file-reading in Step 5.
-### Step 5: Read ALL Final Changed Files
-Read every non-deleted file in the aggregated list. Understand the complete delivered work across all rounds. Build a mental model of what was built and how it connects.
-### Step 6: Run Comprehensive Automated Testing
-Capture stdout and stderr for each check.
-**Hard-fail tests** (block completion):
-Run the unified check matrix:
-```bash
-codebyplan check --scope task --json
-```
-Capture the JSON result. The runner is **whole-repo + baseline**: it runs `turbo run lint|typecheck|test` across every package and diffs each per-package result against the committed `.check-baseline.json`, so only NEW per-package failures fail a check. Five checks run for `--scope task`: `gate6` (sibling-identity parity — ALWAYS hard-fail, never baselined), `lint`, `typecheck`, `tests`, and `audit` (`audit.new_failures` lists new GHSA advisory ids not in the allowlist). A baselined check's `status` is `pass` when its `new_failures` array is empty even if the underlying command exited non-zero. If `any_failed === true` (or `hard_fail_checks` is non-empty), this is a hard fail — surface each failing result's `stdout`/`stderr`/`new_failures` and stop.
-For each result entry, record: `category` (from `result.check`), `status` (from `result.status`), `details`, `stdout` (from `result.stdout`), `stderr` (from `result.stderr`), and `new_failures` (from `result.new_failures` — the newly-failing packages / new GHSA ids; the field is omitted/`undefined` for `gate6`, not `null`).
-Additional hard-fail checks (not part of the runner):
-| Category                | Command                         | Condition                        |
-| ----------------------- | ------------------------------- | -------------------------------- |
-| Per-package E2E         | `pnpm --filter <pkg> e2e:test`  | UI files in aggregated_files     |
-| Full-diff security scan | inline grep or `security-agent` | Always                           |
-Per-file lint + format are enforced by `lint-format-on-edit.sh` hook per edit. This step catches cross-package issues invisible to per-wave checks.
-**Soft tests** (report, don't block):
-| Category   | Method                                    | Condition                    |
-| ---------- | ----------------------------------------- | ---------------------------- |
-| Visual     | Screenshot compare via `e2e:visual-check` | UI work + dev server running |
-| API Health | `curl` health endpoint                    | API routes changed           |
-#### Step 6.x: Autonomous Sim Screenshot Validation (mobile / on-device)
-For mobile rounds (Maestro / XCUITest / Tauri-mobile) where unit tests passed but the round touched component-mount code paths (custom hooks, prop signatures, conditional renders, navigation tabs), unit-test green is NOT sufficient evidence that the screen mounts at runtime. Use the autonomous sim screenshot loop to catch runtime crashes invisible to mocked unit tests.
-**Procedure** (when iOS Simulator is the target — adapt the screenshot command for Android/Tauri equivalents):
-1. Confirm the target screen's default state via `Read` of its parent component or store.
-2. If the screen is normally gated behind a tab/route the simulator isn't currently on, temporarily flip the gating state's default value at the screen's entry point (`useState(initialTab) → useState('targetTab')` or equivalent) so the screen mounts on the next reload.
-3. Trigger a Fast Refresh by saving a touched file, then capture: `xcrun simctl io booted screenshot /tmp/codebyplan-task-testing-{task}-{state}.png`.
-4. Read the screenshot via the multimodal Read tool. Confirm the screen rendered (vs blank/crash/red-box error overlay).
-5. Revert the state-default flip from step 2. Confirm the file diff is empty (`git diff <path>`) before proceeding.
-**When to use**: round modifies hook usage, prop signatures, component-tree shape, or store subscriptions on a screen whose unit tests mock the data layer. Skip when the modified code path has no UI surface (pure utilities, server actions).
-**When NOT to use**: don't flip state defaults on the main app entry / auth gate / feature-flag boundaries — the revert risk is too high. Use storybook or a dedicated `__dev__` tab if the screen has cross-cutting state.
-Record the result in `task_testing_output.autonomous_sim_check`:
-```yaml
-autonomous_sim_check:
-  screen: "<screen-name>"
-  status: "rendered" | "crashed" | "blank"
-  screenshot_path: "/tmp/codebyplan-task-testing-..."
-  state_flip_reverted: true
-```
-This technique uniquely catches Rules-of-Hooks violations and prop-shape mismatches that mocked unit tests cannot detect — the runtime hook scheduler is the only oracle.
-### Step 6.5: Cross-Round Code Review
-Round-level code review runs per-round via `improve-round` at `/cbp-round-end`. This step adds the cross-round holistic layer — things only visible once all rounds are aggregated.
-Inline review (no sub-agent) across the aggregated files read in Step 5. Check:
-| Concern                   | What to Look For                                                                                                                                 |
-| ------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ |
-| Leftover debug            | `console.log`, `debugger`, commented-out blocks, `TODO`/`FIXME` added during this task                                                           |
-| Cross-round duplication   | Same helper/logic written independently in 2+ rounds — candidate for extraction                                                                  |
-| Convention drift          | One round introduces a pattern (error handling, naming, file layout) that contradicts a pattern established in an earlier round of the same task |
-| Incomplete follow-through | A round added a type/field/table column that later rounds never consume                                                                          |
-| Orphaned additions        | Exports or utilities added in an early round with no callers after later rounds refactored past them                                             |
-For each finding, record: `{category, file, description, severity: 'low'|'medium'|'high', suggested_fix}`.
-Findings with severity `medium` or `high` feed the Step 9 problem classification. `low` findings are recorded in `task_testing_output` for the record but do not block.
-If any finding points to a need that exceeds task scope (e.g. a utility worth extracting for the wider codebase, a convention the repo should adopt globally), route per `immediate-issue-capture.md` "How to Capture" — default to a NEW TASK in the current checkpoint, not a standalone task. Standalone routing applies only when the finding is genuinely off-axis from every active checkpoint AND the user has confirmed standalone routing.
-### Step 7: Separate Claude-Testable vs User-Testable
-**Claude handles automatically** (Step 6): build, types, unit tests, E2E tests, visual, API health.
-**User must verify** (requires human judgment):
-- Visual appearance quality (does it look good?)
-- UX flow (is the interaction intuitive?)
-- Business logic correctness (does it do the right thing?)
-- Edge cases (unusual inputs, boundary conditions)
-- Cross-browser / real-device behavior
-- Content accuracy (text, labels, messages)
-Generate user test items based on: task requirements, changed files, round context.
-### Step 8: User Testing Walkthrough
-Present all user-testable items as a **single checklist in one `AskUserQuestion` prompt**. Do not ask one question per item — the batched format is preferred.
-Format the question so every item is visible in the checklist, with a single overall answer (e.g., "all pass", "minor issues", "major issues"). Provide the description, how-to-test steps, and expected result per item inside the question body. If the user reports mixed results, collect the specifics in a follow-up.
-Record the aggregate response and any per-item notes.
-### Step 9: Classify Problems
-Collect failures from automated tests (Step 6), cross-round code review (Step 6.5, medium+), and user tests (Step 8). Classify:
-- **Minor** (round-fixable): styling, small bugs, missing edge cases, localized duplication
-- **Major** (new-task-worthy): architectural issues, missing features, fundamental design problems, convention drift that spans multiple files
-### Step 10: Save Results
-`codebyplan task update --id <taskId> --checkpoint-id <checkpointId> --context '<json>'` (CLI write-through: local state file + REST), merging `task_testing_output` into the existing context object. Break-glass fallback: MCP `update_task` when CLI is unavailable.
-```ts
-// context payload to merge:
-{
-  task_testing_output: {
-    claude_tests: [...],
-    cross_round_code_findings: [...],   // from Step 6.5
-    user_tests: [...],
-    problems_found: [...],
-    all_passed: boolean,
-    summary: { total, passed, failed, pending }
-  }
-}
-```
-### Step 11: Route Based on Results
-**ALL PASS:**
-```
-All tests passed for TASK-[N]. Routing to task-complete...
-```
-Invoke `cbp-task-complete` via the Skill tool. `cbp-task-complete` is `ask`-tier — the harness
-permission prompt IS the human gate; the user confirms (or declines) before task commit,
-merge-main, and completion.
-**Minor problems found:**
-Invoking `cbp-round-input` to address the minor issues found during testing...
-Invoke `cbp-round-input` via the Skill tool. `cbp-round-input` is `allow`-tier — it auto-fires
-silently.
-**Major problems found:**
----
-**Next:**
-Run `/cbp-task-create` to:
-- Create a new task for the identified issues
----
-Waiting for user to run `/cbp-task-create`.
-**User wants re-test:** Suggest re-running `/cbp-task-testing`.
-## Key Rules
-- **Never skippable** — mandatory before `/cbp-task-complete`
-- **Must loop until everything passes** — problems must be addressed
-- **No file changes** — testing only, never edit
-- **Batch user tests** — present all user-testable items in a single `AskUserQuestion` checklist; never one-per-question
-- **Read actual files** — do not rely on metadata alone
-- **Run actual commands** — capture real stdout/stderr
-- **Checkpoint-bound only** — for standalone tasks use `/cbp-standalone-task-testing`
-## Integration
-- **Reads**: `.codebyplan/state/checkpoints/*.json`, `checkpoints/<id>/tasks/*.json`, `checkpoints/<id>/tasks/<id>/rounds/*.json`, `todos.json` (local-first; `npx codebyplan sync` on miss; MCP `get_current_task`/`get_rounds` break-glass), plus all aggregated files
-- **Writes**: `codebyplan task update` (CLI write-through; MCP `update_task` break-glass)
-- **Triggers**: `cbp-task-complete` (auto via Skill tool, when ALL PASS — `ask`-tier, permission prompt IS the human gate); `cbp-round-input` (auto via Skill tool, on minor problems — `allow`-tier, fires silently)
-- **Triggered by**: `cbp-task-check` auto-triggers this skill via Skill tool on READY verdict; `cbp-task-testing` is `allow`-tier and fires silently (no permission prompt)
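
Note on the deleted `cbp-task-testing/SKILL.md` above: Step 4 ("latest action per path wins", deleted files skipped for reading) and the Step 6 hard-fail rule (`any_failed === true` or a non-empty `hard_fail_checks`) can be sketched in TypeScript as follows. This is an illustration under stated assumptions, not the package's implementation. The field names `any_failed`, `hard_fail_checks`, `check`, `status`, `stdout`, `stderr`, `new_failures` and `files_changed` come from the removed text; the top-level `results` array name, the `path`/`action` field names, the `"deleted"` action value and all function names are assumptions.

```typescript
// Assumed shapes: see the lead-in for which names come from the removed text.
interface CheckResult {
  check: string;
  status: string;
  stdout?: string;
  stderr?: string;
  new_failures?: string[]; // omitted for gate6 per the removed text
}

interface CheckReport {
  any_failed: boolean;
  hard_fail_checks?: string[];
  results: CheckResult[]; // assumed name for the per-check entries
}

// Hard fail when any_failed is true or hard_fail_checks is non-empty.
function isHardFail(report: CheckReport): boolean {
  return report.any_failed === true || (report.hard_fail_checks?.length ?? 0) > 0;
}

// Entries to surface to the user: every result whose status is not "pass".
function failingResults(report: CheckReport): CheckResult[] {
  return report.results.filter((r) => r.status !== "pass");
}

interface FileChange {
  path: string;   // assumed field name
  action: string; // assumed field name; "deleted" is an assumed value
}

// Deduplicate files across rounds; rounds are assumed ordered oldest to newest,
// so a later entry for the same path overwrites an earlier one.
function aggregateFiles(rounds: { files_changed: FileChange[] }[]): FileChange[] {
  const latest = new Map<string, FileChange>();
  for (const round of rounds) {
    for (const change of round.files_changed) {
      latest.set(change.path, change);
    }
  }
  return Array.from(latest.values());
}

// Files that still exist and should be read in full.
function filesToRead(files: FileChange[]): FileChange[] {
  return files.filter((f) => f.action !== "deleted");
}
```

A baselined check whose `new_failures` array is empty reports `status: "pass"` according to the removed text, so filtering on `status` alone is enough in this sketch; it does not re-derive the baseline comparison.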