codebyplan 1.13.14 → 1.13.16 (npm)

Files changed (20)

package/README.md +77 -2
package/dist/cli.js +726 -69
package/package.json +1 -1
package/templates/agents/cbp-e2e-maestro.md +26 -3
package/templates/agents/cbp-e2e-playwright.md +24 -3
package/templates/agents/cbp-e2e-tauri.md +25 -2
package/templates/agents/cbp-e2e-vscode.md +28 -3
package/templates/agents/cbp-e2e-xcuitest.md +40 -4
package/templates/agents/cbp-task-check.md +2 -0
package/templates/context/testing/e2e.md +57 -9
package/templates/hooks/README.md +23 -1
package/templates/hooks/validate-structure-patterns.sh +1 -1
package/templates/rules/e2e-mandatory.md +19 -2
package/templates/settings.project.base.json +18 -1
package/templates/skills/cbp-checkpoint-end/SKILL.md +18 -1
package/templates/skills/cbp-frontend-ui/SKILL.md +9 -7
package/templates/skills/cbp-round-execute/SKILL.md +49 -7
package/templates/skills/cbp-setup-cmux/SKILL.md +170 -0
package/templates/skills/cbp-task-complete/SKILL.md +14 -0
package/templates/skills/cbp-task-start/SKILL.md +8 -0

package/templates/skills/cbp-round-execute/SKILL.md CHANGED

@@ -57,6 +57,16 @@ Read the plan from round context (`context.planner_output`). If no plan: `No app
 Read effective testing profile: `round.context.testing_profile_override` if set (user override for this round only), else `task.context.testing_profile` (set by planner Phase 4.8), else default `'web'`. Pass the effective profile to all per-wave `cbp-testing-qa-agent` spawns.
+### Step 2a: Auto-Dev-Server (cmux)
+Fire the dev-server hook at round-execution start. Self-no-ops outside cmux or when `auto_dev_server` is disabled in `.codebyplan/cmux.json`.
+```bash
+npx codebyplan cmux-serve --files "<comma-separated approved_plan.files_to_modify[].path>"
+```
+The subcommand reads `.codebyplan/server.json` `port_allocations[]`, resolves which apps' source dirs intersect the round's files, probes each allocated port, and starts a cmux terminal split + browser pane for any app not already serving. Idempotent — if the port is already listening it only opens the browser pane (mitigating the multi-worktree port collision).
 ### Step 3: Route Execution Path
 Inspect `approved_plan.files_to_modify[]` and `approved_plan.round_type`. Four execution paths exist; pick the one that matches BEFORE Step 3a/3b.
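
The Step 2a selection logic in this hunk (intersect the round's files with each app's source dir, probe the port, then either start the server or only open the browser pane) can be sketched roughly as follows. This is an illustration only, not the code in `dist/cli.js`: the `{ app, port, source_dir }` entry shape, the `planDevServerActions` name, and the action labels are assumptions, since the diff does not show the `port_allocations[]` schema. Port probing is passed in as a function so the sketch stays side-effect free.

```javascript
// Hypothetical sketch: the real port_allocations[] schema is not shown in the diff.
// Each entry is ASSUMED to look like { app, port, source_dir }.
function planDevServerActions(portAllocations, changedFiles, isListening) {
  const actions = [];
  for (const { app, port, source_dir } of portAllocations) {
    // Match on a directory prefix so "apps/web" does not also match "apps/web-admin".
    const prefix = source_dir.endsWith('/') ? source_dir : source_dir + '/';
    const touched = changedFiles.some((file) => file.startsWith(prefix));
    if (!touched) continue; // this app's sources are not part of the round

    // Idempotent: a port that is already listening only gets a browser pane.
    const action = isListening(port) ? 'open_browser_only' : 'start_and_open';
    actions.push({ app, port, action });
  }
  return actions;
}

// Example data (all values made up for illustration).
const allocations = [
  { app: 'web', port: 3000, source_dir: 'apps/web' },
  { app: 'docs', port: 3001, source_dir: 'apps/docs' },
  { app: 'admin', port: 3002, source_dir: 'apps/admin' },
];
const files = ['apps/web/src/page.tsx', 'apps/admin/src/nav.tsx'];
const listening = new Set([3000]); // pretend only port 3000 is already serving

const plan = planDevServerActions(allocations, files, (port) => listening.has(port));
console.log(plan);
```

Keeping the probe injectable makes the "already serving" branch easy to exercise without opening real sockets.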
@@ -143,6 +153,14 @@ If the approved plan includes database schema changes, RLS policies, or type gen
 - `status: 'blocked'` → present blocker to user via AskUserQuestion, resolve, re-spawn executor with remaining work
 - Deliverables incomplete → re-spawn executor with remaining deliverables (max 3 re-triggers). After 3 re-triggers, save partial output and proceed.
+### Step 3d: Push cmux QA Status
+Push the round's QA outcome to the cmux workspace sidebar. Self-no-ops outside cmux or when `auto_status` is disabled. Status is one of: `completed`, `blocked`, or `re-triggering`.
+```bash
+npx codebyplan cmux-status --qa "R{round_number} {status}"
+```
 ### Step 4: Dev-Server Probe (rounds 2+, web/desktop profile)
 When `round_number >= 2` AND `testing_profile` is `'web'` or `'desktop'` AND `files_changed` contains any UI file, probe the dev server BEFORE cbp-testing-qa-agent spawns (saves a full agent spawn when the server is down).
@@ -184,11 +202,35 @@ Input contracts: `cbp-testing-qa-agent` receives `executor_output`, `testing_pro
 ### Step 5b: Post-E2E Screenshot Review (cbp-frontend-ui Phase 6.5)
-Aggregate screenshots across ALL specialists that ran: `screenshots = Object.values(round.context.e2e_outputs ?? {}).flatMap(o => o.screenshots ?? [])`. When the aggregated list is non-empty, invoke the `cbp-frontend-ui` skill with `phase: 'screenshot_review'` (input: `files_changed`, `e2e_screenshots: <aggregated screenshots>`, `context: { checkpoint_goal, round_requirements }`). Under this phase the skill runs only Phase 6.5 (Rendered-Output Visual Review) + 7 + 8 — Phases 1-6 (style) already ran inline at executor Step 3.8 with `phase: 'style_only'`.
+Aggregate across ALL specialists that ran:
+```js
+screenshots = Object.values(round.context.e2e_outputs ?? {}).flatMap(o => o.screenshots ?? []);
+e2e_gallery  = Object.values(round.context.e2e_outputs ?? {}).flatMap(o => o.e2e_gallery  ?? []);
+```
+**Auto-new baseline handling**: for each entry in `e2e_gallery` where `is_new === true`, the
+specialist has already run `git add <committed_path>`. No additional user gate is needed.
+**Changed-baseline handling**: entries where `is_new === false` AND `baseline_diff_pct > threshold`
+are `visual_regression` — do NOT auto-accept; surface as blocking gate at Step 7.
+Persist `e2e_gallery` to `round.context.e2e_gallery` (additive alongside existing
+`round.context.e2e_outputs`). This field is consumed by TASK-3 / checkpoint-end for DB upload.
+Note: `e2e_gallery[]` is aggregated and persisted regardless of whether `cbp-frontend-ui` runs — the empty-gallery enforcement lives in `cbp-task-check` Phase 4, while the `screenshots[]` visual review (frontend-ui Phase 6.5) is a separate concern gated on `screenshots[]` being non-empty.
+When the aggregated `screenshots` list is non-empty, invoke the `cbp-frontend-ui` skill with
+`phase: 'screenshot_review'` (input: `files_changed`, `e2e_screenshots: <aggregated screenshots>`,
+`context: { checkpoint_goal, round_requirements }`). Under this phase the skill runs only
+Phase 6.5 (Rendered-Output Visual Review) + 7 + 8 — Phases 1-6 (style) already ran at Step 3.8.
-Persist findings to `round.context.frontend_ui_review` (merge with Step 3.8's style-only output if present). Baseline-regression findings surface as a BLOCKING gate at `/cbp-round-end` Step 7 (an explicit accept-or-fix user decision; baselines are NEVER auto-accepted); rendered_visual critical findings are surfaced in the Step 7 findings presentation. Neither auto-fails the round. cbp-testing-qa-agent does NOT read these findings (full independence per Step 5).
+Persist findings to `round.context.frontend_ui_review` (merge with Step 3.8's style-only output
+if present). Baseline-regression findings surface as a BLOCKING gate at `/cbp-round-end` Step 7
+(an explicit accept-or-fix user decision; changed baselines are NEVER auto-accepted);
+rendered_visual critical findings are surfaced in the Step 7 findings presentation. Neither
+auto-fails the round. cbp-testing-qa-agent does NOT read these findings (full independence).
-**Skip** when `round.context.e2e_outputs` is absent/empty, the aggregated `screenshots` list is empty, or `testing_profile === 'claude_only'`.
+**Skip** when `round.context.e2e_outputs` is absent/empty, the aggregated `screenshots` list
+is empty, or `testing_profile === 'claude_only'`.
 ### Step 6: Hard-Fail Routing
@@ -215,9 +257,9 @@ When `cbp-testing-qa-agent` spawn fails OR the resolved `testing_profile` is `cl
 Update round context via MCP `update_round` / `update_standalone_round` per KIND:
-- `context`: { ...existing, executor_output, testing_qa_output, e2e_eligible, e2e_outputs, frontend_ui_review }
+- `context`: { ...existing, executor_output, testing_qa_output, e2e_eligible, e2e_outputs, e2e_gallery, frontend_ui_review }
-`e2e_outputs` (a framework-keyed map of specialist outputs, e.g. `{ playwright: {...}, maestro: {...} }`) and `frontend_ui_review` are present only when the gates above admitted them (≥1 eligible framework ran AND Step 5b ran). `e2e_eligible[]` records which frameworks were eligible this round and drives the Step 6 `e2e_eligible_skipped` check.
+`e2e_outputs` (a framework-keyed map of specialist outputs, e.g. `{ playwright: {...}, maestro: {...} }`), `e2e_gallery` (aggregated flat array of committed-PNG entries across all specialists — consumed by TASK-3 / checkpoint-end for DB upload), and `frontend_ui_review` are present only when the gates above admitted them (≥1 eligible framework ran AND Step 5b ran). `e2e_eligible[]` records which frameworks were eligible this round and drives the Step 6 `e2e_eligible_skipped` check.
 ### Step 8: Auto-trigger Round End
@@ -234,13 +276,13 @@ Trigger `/cbp-round-end`.
 - `testing_profile` from `task.context` governs which checks run — read it once in Step 2; pass to every testing-qa + e2e specialist spawn
 - `claude_only` profile skips all agent spawns (testing-qa AND `cbp-e2e-*`); runs hook syntax and skill structure checks inline
 - E2E dispatch is **config-driven and opt-out** (`.codebyplan/e2e.json`), not gated on `has_ui_work`/`testing_profile` — an eligible framework that silently does not run is an `e2e_eligible_skipped` hard-fail (`rules/e2e-mandatory.md`)
-- Step 5b (cbp-frontend-ui Phase 6.5) runs only when e2e produced screenshots — gated on the aggregated `e2e_outputs[*].screenshots[]` being non-empty
+- Step 5b (cbp-frontend-ui Phase 6.5) runs only when e2e produced screenshots — gated on the aggregated `e2e_outputs[*].screenshots[]` being non-empty; `e2e_gallery[]` is always aggregated and persisted when any specialist ran
 - Claude NEVER git adds files in round commands
 ## Integration
 - **Reads**: MCP `get_current_task` / `get_current_standalone_task`, `get_rounds` / `get_standalone_rounds` (per KIND)
-- **Writes**: MCP `update_round` / `update_standalone_round` (context with executor_output + testing_qa_output + e2e_eligible + e2e_outputs + frontend_ui_review) — per KIND
+- **Writes**: MCP `update_round` / `update_standalone_round` (context with executor_output + testing_qa_output + e2e_eligible + e2e_outputs + e2e_gallery + frontend_ui_review) — per KIND
 - **Spawns**: `cbp-round-executor` (per wave or single), `cbp-testing-qa-agent` (per wave, parallel sibling of the `cbp-e2e-*` specialists), the `cbp-e2e-*` specialists (config-driven dispatch per `context/testing/e2e.md`, one per eligible framework in `.codebyplan/e2e.json`), `cbp-database-agent` (if DB work), `cbp-security-agent` (if security review needed)
 - **Skill invocations**: `cbp-frontend-ui` at Step 5b with `phase: 'screenshot_review'` (post-e2e)
 - **Triggers**: `/cbp-round-end` (auto)
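
As a reading aid for the new Step 5b text, here is a minimal sketch of the aggregation plus the two baseline rules it describes. The field names `screenshots`, `e2e_gallery`, `is_new`, `baseline_diff_pct`, and `committed_path` come from the diff. The threshold value, the helper names, and the sample paths are assumptions made for illustration.

```javascript
// ASSUMPTION: the diff does not state the threshold or its units; 0.1 is a placeholder.
const DIFF_THRESHOLD_PCT = 0.1;

// Flatten screenshots and gallery entries across every specialist that ran.
function aggregateE2e(e2eOutputs) {
  const outputs = Object.values(e2eOutputs ?? {});
  return {
    screenshots: outputs.flatMap((o) => o.screenshots ?? []),
    e2e_gallery: outputs.flatMap((o) => o.e2e_gallery ?? []),
  };
}

// Split gallery entries into auto-accepted new baselines and blocking regressions.
function classifyGallery(gallery, threshold = DIFF_THRESHOLD_PCT) {
  return {
    autoNew: gallery.filter((e) => e.is_new === true),
    regressions: gallery.filter(
      (e) => e.is_new === false && e.baseline_diff_pct > threshold
    ),
  };
}

// The screenshot review is gated on screenshots[], not on the gallery.
function shouldRunScreenshotReview(screenshots, testingProfile) {
  return screenshots.length > 0 && testingProfile !== 'claude_only';
}

// Example data (paths and numbers are made up).
const e2eOutputs = {
  playwright: {
    screenshots: ['home.png'],
    e2e_gallery: [
      { committed_path: 'e2e/gallery/home.png', is_new: true, baseline_diff_pct: 0 },
      { committed_path: 'e2e/gallery/nav.png', is_new: false, baseline_diff_pct: 2.5 },
    ],
  },
  maestro: {
    e2e_gallery: [
      { committed_path: 'e2e/gallery/login.png', is_new: false, baseline_diff_pct: 0.01 },
    ],
  },
};

const aggregated = aggregateE2e(e2eOutputs);
const verdict = classifyGallery(aggregated.e2e_gallery);
console.log(verdict.autoNew.length, verdict.regressions.length);
```

Entries that are neither new nor over the threshold fall into neither bucket. That matches the text above, which only calls out the two cases needing action.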

package/templates/skills/cbp-setup-cmux/SKILL.md ADDED

@@ -0,0 +1,170 @@
+---
+scope: org-shared
+name: cbp-setup-cmux
+description: Configure .codebyplan/cmux.json — choose workspace color and toggle auto_status / auto_dev_server. Interactive, idempotent.
+argument-hint: "[--force]"
+model: sonnet
+effort: xhigh
+allowed-tools: Read, Write, Edit, Bash(cat *), Bash(jq *), Bash(test *), Bash(mv *), Bash(git check-ignore *), Bash(cmux *), AskUserQuestion
+---
+# cmux Setup
+Configure `.codebyplan/cmux.json` so the cmux workspace integration knows which color to
+apply on session start and whether auto_status / auto_dev_server are enabled.
+Invoke at any time. Existing values are preserved unless `--force` is passed.
+Pass `--force` to re-ask all questions including already-configured fields.
+## Arguments
+Inspect `$ARGUMENTS` for `--force`. If present, set `force_mode = true`.
+Absent: use idempotent mode — preserve existing configured values, skip re-asking
+already-set fields.
+## Step 1 — Parse --force and read existing config
+```bash
+cat .codebyplan/cmux.json 2>/dev/null || echo '{}'
+```
+Capture the result as `EXISTING_JSON`. Extract existing values:
+```bash
+EXISTING_COLOR=$(echo "$EXISTING_JSON" | jq -r '.workspace_color // empty')
+EXISTING_AUTO_STATUS=$(echo "$EXISTING_JSON" | jq -r '.auto_status // empty')
+EXISTING_AUTO_DEV=$(echo "$EXISTING_JSON" | jq -r '.auto_dev_server // empty')
+```
+## Step 2 — Ask for workspace color
+**Idempotency gate**: skip this question if `force_mode = false` AND `EXISTING_COLOR`
+is a non-empty string. Print the preserved value and continue to Step 3.
+Otherwise, AskUserQuestion:
+```
+Workspace color for this cmux workspace
+Choose a named color or enter a custom hex value (#RRGGBB).
+Named colors:
+  A) Red        B) Crimson    C) Orange     D) Amber
+  E) Olive      F) Green      G) Teal       H) Aqua
+  I) Blue       J) Navy       K) Indigo     L) Purple
+  M) Magenta    N) Rose       O) Brown      P) Charcoal
+  Q) Custom hex — enter a value in the format #RRGGBB
+Enter a letter (A-Q) or type a custom hex value directly:
+```
+Validate the input:
+- If a letter A-P: map to the corresponding color name (e.g. "A" → "Red").
+- If "Q": prompt the user to type a custom hex value, then validate the typed value
+  against `^#[0-9A-Fa-f]{6}$`. If invalid, re-prompt once with an error. Never pass the
+  bare letter "Q" to set-color.
+- If the raw input itself already looks like a hex value: validate it against
+  `^#[0-9A-Fa-f]{6}$` and accept on match.
+- Empty input with no existing color: use no color (leave `workspace_color` as `null`).
+Set `NEW_COLOR` to the chosen color name / hex, or `null` if skipped.
+## Step 3 — Ask for auto_status and auto_dev_server toggles
+**Idempotency gate**: skip questions for fields that are already set (non-empty in
+existing config) when `force_mode = false`.
+For each unset field (or all fields when `force_mode = true`), AskUserQuestion:
+```
+cmux integration toggles
+auto_status: automatically run `cmux status` in this workspace on session start?
+  (default: on)
+  A) On
+  B) Off
+auto_dev_server: automatically start the dev server via cmux on session start?
+  (default: on)
+  A) On
+  B) Off
+```
+Ask both in a single question when both are unset. Set `NEW_AUTO_STATUS` and
+`NEW_AUTO_DEV` to `true` or `false` based on the answers.
+## Step 4 — Write .codebyplan/cmux.json
+Build the updated payload using jq deep-merge so sibling fields added by future
+schema versions are preserved. Assemble variables first.
+Backfill any field that the Step 2/3 idempotency gates skipped, so each `jq
+--argjson` always receives valid JSON — an empty shell variable makes `jq` exit 1,
+which would leave `NEW_PAYLOAD` empty and silently no-op the write. Fall back to the
+existing value, then to the default (`true` for the toggles):
+```bash
+NEW_COLOR="${NEW_COLOR:-$EXISTING_COLOR}"
+NEW_AUTO_STATUS="${NEW_AUTO_STATUS:-${EXISTING_AUTO_STATUS:-true}}"
+NEW_AUTO_DEV="${NEW_AUTO_DEV:-${EXISTING_AUTO_DEV:-true}}"
+```
+```bash
+NEW_COLOR_JSON=$(echo "$NEW_COLOR" | jq -R 'if . == "null" or . == "" then null else . end')
+NEW_PAYLOAD=$(jq -n \
+  --argjson color "$NEW_COLOR_JSON" \
+  --argjson auto_status "$NEW_AUTO_STATUS" \
+  --argjson auto_dev_server "$NEW_AUTO_DEV" \
+  '{workspace_color: $color, auto_status: $auto_status, auto_dev_server: $auto_dev_server}')
+```
+Atomic write via jq tmp+mv deep-merge (`. * $new` deep-merges, preserving any
+sibling keys not in the new payload):
+```bash
+jq --argjson new "$NEW_PAYLOAD" '. * $new' \
+  .codebyplan/cmux.json > .codebyplan/cmux.json.tmp \
+  && mv .codebyplan/cmux.json.tmp .codebyplan/cmux.json
+```
+## Step 5 — Apply color immediately
+If `NEW_COLOR` is a non-empty, non-null string AND cmux is available in the current
+environment (`$CMUX_WORKSPACE_ID` is set), apply the color immediately so the change
+is visible without needing to restart the session:
+```bash
+if [ -n "$CMUX_WORKSPACE_ID" ] && [ -n "$NEW_COLOR" ] && [ "$NEW_COLOR" != "null" ]; then
+  CMUX_BIN="${CMUX_BUNDLED_CLI_PATH:-${CMUX_CLAUDE_HOOK_CMUX_BIN:-cmux}}"
+  "$CMUX_BIN" workspace-action --action set-color --color "$NEW_COLOR" 2>/dev/null || true
+fi
+```
+The guard (`2>/dev/null || true`) ensures a missing or non-zero cmux never blocks
+the skill — applying the color is best-effort.
+## Step 6 — Verify and report
+Re-read `.codebyplan/cmux.json` and emit a summary:
+```
+cmux Setup — Complete
+  workspace_color : <value or "(none)">
+  auto_status     : <true|false>
+  auto_dev_server : <true|false>
+Config written to .codebyplan/cmux.json
+Next: on the next SessionStart, codebyplan cmux-sync will apply the color
+automatically. Run `/cbp-setup-cmux --force` at any time to reconfigure.
+```
+## Key Rules
+- Atomic write (tmp + mv) — never leaves cmux.json in a partial state
+- Deep-merge with `. * $new` — sibling keys from future schema versions are preserved
+- Color apply is best-effort — a missing cmux binary never causes an error
+- `auto_status` and `auto_dev_server` default to `true` when absent (documented in
+  `packages/codebyplan-package/src/lib/types.ts` CmuxConfig)
+- cmux.json is COMMITTED (not gitignored) — workspace color is shared across team members
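
The backfill order in Step 4 (new answer, then existing value, then default) can be illustrated with a small sketch. This is not the skill's implementation: `resolveCmuxConfig` and `DEFAULTS` are hypothetical names, and the object spread only approximates jq's `. * $new` for the flat keys shown here. One detail worth checking when reading Step 1: jq's `//` operator falls through on `false` as well as `null`, so a stored `auto_status: false` read with `.auto_status // empty` comes back empty. The sketch uses `??`, which only falls through on `null` and `undefined`, so a stored `false` survives.

```javascript
// Defaults taken from the skill's Key Rules: both toggles default to true.
const DEFAULTS = { auto_status: true, auto_dev_server: true };

// Hypothetical helper: resolve the final config from the file contents and the
// answers collected in Steps 2 and 3 (skipped questions are simply absent).
function resolveCmuxConfig(existing, answers) {
  // `??` keeps an explicit `false`, unlike a truthiness-based fallback.
  const pick = (key, fallback) => answers[key] ?? existing[key] ?? fallback;
  return {
    ...existing, // preserve sibling keys the skill does not know about
    workspace_color: pick('workspace_color', null),
    auto_status: pick('auto_status', DEFAULTS.auto_status),
    auto_dev_server: pick('auto_dev_server', DEFAULTS.auto_dev_server),
  };
}

// Example: an existing file with a disabled toggle and an unknown future key.
const existingConfig = {
  workspace_color: 'Teal',
  auto_status: false,
  future_key: { nested: 1 },
};

// Idempotent run: every question was skipped, so there are no new answers.
const idempotent = resolveCmuxConfig(existingConfig, {});
// --force run: the user re-answers one toggle.
const forced = resolveCmuxConfig(existingConfig, { auto_status: true });

console.log(idempotent, forced);
```

The same helper also covers a missing `cmux.json` when it is called with `{}` as the existing config, in which case every field falls back to its default.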

package/templates/skills/cbp-task-complete/SKILL.md CHANGED

@@ -138,6 +138,20 @@ Skip the push only when nothing was committed in Step 5 AND `/cbp-merge-main` re
 Call `complete_task(task_id)`. The server resolves the caller's worktree identity from the JWT/ctx and enforces the mutate-lock (CHK-140 TASK-3 — `caller_worktree_id` input field removed). The server auto-clears `assigned_user_id` + `assigned_worktree_id` on the task; if this was the last sibling task, it also clears the parent checkpoint's assignment. (Per CHK-104 hard-lock.)
+### Step 7.3: Push cmux Status (task done + progress)
+Push completion status and checkpoint progress to the cmux workspace sidebar. Self-no-ops outside cmux or when `auto_status` is disabled.
+Compute progress from the `get_tasks` data already loaded by the routing step: `completed = count of tasks with status 'completed'`, `total = total task count for the checkpoint`. For standalone tasks (no checkpoint), omit `--progress`.
+```bash
+# Checkpoint-bound task:
+npx codebyplan cmux-status --task "TASK-{N}: {task-title} done" --progress {completed}/{total}
+# Standalone task:
+npx codebyplan cmux-status --task "TASK-{N}: {task-title} done"
+```
 ### Step 8: Run Cleanup + Migration (inline)
 Apply the `cleanup` skill inline to remove orphan references to deleted/modified files. Then apply `migration` to propagate renames/moves to consumers. Both run without sub-agent spawns. Skip cleanup if no deletions/modifications; skip migration if cleanup handled everything.
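
The progress computation added in Step 7.3 amounts to a count over the checkpoint's tasks. A minimal sketch follows, assuming hypothetical task fields `number`, `title`, `status`, and `checkpoint_id`, since the `get_tasks` response shape is not shown in this diff. The `--task` and `--progress` flags are the ones used in the hunk.

```javascript
// Hypothetical helper: build the argument list for `npx codebyplan cmux-status`.
// Task field names are ASSUMED; only the flags come from the diff.
function buildTaskDoneArgs(task, checkpointTasks) {
  const args = ['cmux-status', '--task', `TASK-${task.number}: ${task.title} done`];

  // Standalone tasks have no checkpoint, so --progress is omitted.
  if (task.checkpoint_id != null) {
    const completed = checkpointTasks.filter((t) => t.status === 'completed').length;
    args.push('--progress', `${completed}/${checkpointTasks.length}`);
  }
  return args;
}

// Example data (made up).
const checkpointTasks = [
  { number: 1, title: 'Schema', status: 'completed', checkpoint_id: 140 },
  { number: 2, title: 'API', status: 'completed', checkpoint_id: 140 },
  { number: 3, title: 'UI', status: 'in_progress', checkpoint_id: 140 },
];

const boundArgs = buildTaskDoneArgs(checkpointTasks[1], checkpointTasks);
const standaloneArgs = buildTaskDoneArgs(
  { number: 7, title: 'Hotfix', status: 'completed', checkpoint_id: null },
  []
);
console.log(boundArgs.join(' '), '|', standaloneArgs.join(' '));
```

Returning an argument array rather than a single string avoids shell-quoting problems when a task title contains spaces or quotes.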

package/templates/skills/cbp-task-start/SKILL.md CHANGED

@@ -228,6 +228,14 @@ Display context summary:
 - **Previous rounds**: [count] completed
 ```
+### Step 4.5: Push cmux Status
+Push the active checkpoint and task context to the cmux workspace sidebar. Self-no-ops outside cmux or when `auto_status` is disabled in `.codebyplan/cmux.json`.
+```bash
+npx codebyplan cmux-status --checkpoint "CHK-{NNN}: {checkpoint-title}" --task "TASK-{N}: {task-title}"
+```
 ### Step 5: Set Task Status
 Use MCP `update_task(task_id, status: "in_progress")`.