codebyplan 1.13.14 → 1.13.16

@@ -57,6 +57,16 @@ Read the plan from round context (`context.planner_output`). If no plan: `No app
57
57
 
58
58
  Read effective testing profile: `round.context.testing_profile_override` if set (user override for this round only), else `task.context.testing_profile` (set by planner Phase 4.8), else default `'web'`. Pass the effective profile to all per-wave `cbp-testing-qa-agent` spawns.
59
59
 
60
+ ### Step 2a: Auto-Dev-Server (cmux)
61
+
62
+ Fire the dev-server hook at round-execution start. Self-no-ops outside cmux or when `auto_dev_server` is disabled in `.codebyplan/cmux.json`.
63
+
64
+ ```bash
65
+ npx codebyplan cmux-serve --files "<comma-separated approved_plan.files_to_modify[].path>"
66
+ ```
67
+
68
+ The subcommand reads `.codebyplan/server.json` `port_allocations[]`, resolves which apps' source dirs intersect the round's files, probes each allocated port, and starts a cmux terminal split + browser pane for any app not already serving. Idempotent — if the port is already listening it only opens the browser pane (mitigating the multi-worktree port collision).
69
+
60
70
  ### Step 3: Route Execution Path
61
71
 
62
72
  Inspect `approved_plan.files_to_modify[]` and `approved_plan.round_type`. Four execution paths exist; pick the one that matches BEFORE Step 3a/3b.
@@ -143,6 +153,14 @@ If the approved plan includes database schema changes, RLS policies, or type gen
143
153
  - `status: 'blocked'` → present blocker to user via AskUserQuestion, resolve, re-spawn executor with remaining work
144
154
  - Deliverables incomplete → re-spawn executor with remaining deliverables (max 3 re-triggers). After 3 re-triggers, save partial output and proceed.
145
155
 
156
+ ### Step 3d: Push cmux QA Status
157
+
158
+ Push the round's QA outcome to the cmux workspace sidebar. Self-no-ops outside cmux or when `auto_status` is disabled. Status is one of: `completed`, `blocked`, or `re-triggering`.
159
+
160
+ ```bash
161
+ npx codebyplan cmux-status --qa "R{round_number} {status}"
162
+ ```
163
+
146
164
  ### Step 4: Dev-Server Probe (rounds 2+, web/desktop profile)
147
165
 
148
166
  When `round_number >= 2` AND `testing_profile` is `'web'` or `'desktop'` AND `files_changed` contains any UI file, probe the dev server BEFORE cbp-testing-qa-agent spawns (saves a full agent spawn when the server is down).
@@ -184,11 +202,35 @@ Input contracts: `cbp-testing-qa-agent` receives `executor_output`, `testing_pro
184
202
 
185
203
  ### Step 5b: Post-E2E Screenshot Review (cbp-frontend-ui Phase 6.5)
186
204
 
187
- Aggregate screenshots across ALL specialists that ran: `screenshots = Object.values(round.context.e2e_outputs ?? {}).flatMap(o => o.screenshots ?? [])`. When the aggregated list is non-empty, invoke the `cbp-frontend-ui` skill with `phase: 'screenshot_review'` (input: `files_changed`, `e2e_screenshots: <aggregated screenshots>`, `context: { checkpoint_goal, round_requirements }`). Under this phase the skill runs only Phase 6.5 (Rendered-Output Visual Review) + 7 + 8 — Phases 1-6 (style) already ran inline at executor Step 3.8 with `phase: 'style_only'`.
205
+ Aggregate across ALL specialists that ran:
206
+
207
+ ```js
208
+ screenshots = Object.values(round.context.e2e_outputs ?? {}).flatMap(o => o.screenshots ?? []);
209
+ e2e_gallery = Object.values(round.context.e2e_outputs ?? {}).flatMap(o => o.e2e_gallery ?? []);
210
+ ```
211
+
212
+ **Auto-new baseline handling**: for each entry in `e2e_gallery` where `is_new === true`, the
213
+ specialist has already run `git add <committed_path>`. No additional user gate is needed.
214
+ **Changed-baseline handling**: entries where `is_new === false` AND `baseline_diff_pct > threshold`
215
+ are `visual_regression` — do NOT auto-accept; surface as blocking gate at Step 7.
216
+
217
+ Persist `e2e_gallery` to `round.context.e2e_gallery` (additive alongside existing
218
+ `round.context.e2e_outputs`). This field is consumed by TASK-3 / checkpoint-end for DB upload.
219
+ Note: `e2e_gallery[]` is aggregated and persisted regardless of whether `cbp-frontend-ui` runs — the empty-gallery enforcement lives in `cbp-task-check` Phase 4, while the `screenshots[]` visual review (frontend-ui Phase 6.5) is a separate concern gated on `screenshots[]` being non-empty.
220
+
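As an illustration of the two baseline rules above, the following is a minimal sketch of how the aggregated gallery might be split. The helper name `classifyGallery` and the numeric `threshold` argument are assumptions for illustration; the source does not state the threshold value or where it is configured.

```javascript
// Hypothetical helper (not part of codebyplan): aggregate e2e_gallery entries
// across all specialists and split them by baseline status.
// `threshold` is an assumed percentage; the source does not give its value.
function classifyGallery(e2eOutputs, threshold) {
  const gallery = Object.values(e2eOutputs ?? {}).flatMap((o) => o.e2e_gallery ?? []);
  // Auto-new baselines: already staged by the specialist, no user gate.
  const autoNew = gallery.filter((e) => e.is_new === true);
  // Changed baselines over the threshold: visual_regression, blocking at Step 7.
  const regressions = gallery.filter(
    (e) => e.is_new === false && e.baseline_diff_pct > threshold
  );
  return { gallery, autoNew, regressions };
}

// Sample data with made-up paths and percentages.
const sample = {
  playwright: {
    e2e_gallery: [
      { committed_path: "a.png", is_new: true },
      { committed_path: "b.png", is_new: false, baseline_diff_pct: 4.2 },
    ],
  },
  maestro: {
    e2e_gallery: [{ committed_path: "c.png", is_new: false, baseline_diff_pct: 0.1 }],
  },
};

const result = classifyGallery(sample, 1.0);
console.log(result.autoNew.length, result.regressions.length); // prints: 1 1
```

Entries that are neither new nor over the threshold (like `c.png` here) fall into neither bucket and need no action under the rules above.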
221
+ When the aggregated `screenshots` list is non-empty, invoke the `cbp-frontend-ui` skill with
222
+ `phase: 'screenshot_review'` (input: `files_changed`, `e2e_screenshots: <aggregated screenshots>`,
223
+ `context: { checkpoint_goal, round_requirements }`). Under this phase the skill runs only
224
+ Phase 6.5 (Rendered-Output Visual Review) + 7 + 8 — Phases 1-6 (style) already ran at Step 3.8.
188
225
 
189
- Persist findings to `round.context.frontend_ui_review` (merge with Step 3.8's style-only output if present). Baseline-regression findings surface as a BLOCKING gate at `/cbp-round-end` Step 7 (an explicit accept-or-fix user decision; baselines are NEVER auto-accepted); rendered_visual critical findings are surfaced in the Step 7 findings presentation. Neither auto-fails the round. cbp-testing-qa-agent does NOT read these findings (full independence per Step 5).
226
+ Persist findings to `round.context.frontend_ui_review` (merge with Step 3.8's style-only output
227
+ if present). Baseline-regression findings surface as a BLOCKING gate at `/cbp-round-end` Step 7
228
+ (an explicit accept-or-fix user decision; changed baselines are NEVER auto-accepted);
229
+ rendered_visual critical findings are surfaced in the Step 7 findings presentation. Neither
230
+ auto-fails the round. cbp-testing-qa-agent does NOT read these findings (full independence).
190
231
 
191
- **Skip** when `round.context.e2e_outputs` is absent/empty, the aggregated `screenshots` list is empty, or `testing_profile === 'claude_only'`.
232
+ **Skip** when `round.context.e2e_outputs` is absent/empty, the aggregated `screenshots` list
233
+ is empty, or `testing_profile === 'claude_only'`.
192
234
 
193
235
  ### Step 6: Hard-Fail Routing
194
236
 
@@ -215,9 +257,9 @@ When `cbp-testing-qa-agent` spawn fails OR the resolved `testing_profile` is `cl
215
257
 
216
258
  Update round context via MCP `update_round` / `update_standalone_round` per KIND:
217
259
 
218
- - `context`: { ...existing, executor_output, testing_qa_output, e2e_eligible, e2e_outputs, frontend_ui_review }
260
+ - `context`: { ...existing, executor_output, testing_qa_output, e2e_eligible, e2e_outputs, e2e_gallery, frontend_ui_review }
219
261
 
220
- `e2e_outputs` (a framework-keyed map of specialist outputs, e.g. `{ playwright: {...}, maestro: {...} }`) and `frontend_ui_review` are present only when the gates above admitted them (≥1 eligible framework ran AND Step 5b ran). `e2e_eligible[]` records which frameworks were eligible this round and drives the Step 6 `e2e_eligible_skipped` check.
262
+ `e2e_outputs` (a framework-keyed map of specialist outputs, e.g. `{ playwright: {...}, maestro: {...} }`) and `e2e_gallery` (aggregated flat array of committed-PNG entries across all specialists, consumed by TASK-3 / checkpoint-end for DB upload) are present only when ≥1 eligible framework ran; `e2e_gallery` does not depend on Step 5b. `frontend_ui_review` is present only when the gates above admitted it (≥1 eligible framework ran AND Step 5b ran). `e2e_eligible[]` records which frameworks were eligible this round and drives the Step 6 `e2e_eligible_skipped` check.
221
263
 
222
264
  ### Step 8: Auto-trigger Round End
223
265
 
@@ -234,13 +276,13 @@ Trigger `/cbp-round-end`.
234
276
  - `testing_profile` from `task.context` governs which checks run — read it once in Step 2; pass to every testing-qa + e2e specialist spawn
235
277
  - `claude_only` profile skips all agent spawns (testing-qa AND `cbp-e2e-*`); runs hook syntax and skill structure checks inline
236
278
  - E2E dispatch is **config-driven and opt-out** (`.codebyplan/e2e.json`), not gated on `has_ui_work`/`testing_profile` — an eligible framework that silently does not run is an `e2e_eligible_skipped` hard-fail (`rules/e2e-mandatory.md`)
237
- - Step 5b (cbp-frontend-ui Phase 6.5) runs only when e2e produced screenshots — gated on the aggregated `e2e_outputs[*].screenshots[]` being non-empty
279
+ - Step 5b (cbp-frontend-ui Phase 6.5) runs only when e2e produced screenshots — gated on the aggregated `e2e_outputs[*].screenshots[]` being non-empty; `e2e_gallery[]` is always aggregated and persisted when any specialist ran
238
280
  - Claude NEVER git adds files in round commands
239
281
 
240
282
  ## Integration
241
283
 
242
284
  - **Reads**: MCP `get_current_task` / `get_current_standalone_task`, `get_rounds` / `get_standalone_rounds` (per KIND)
243
- - **Writes**: MCP `update_round` / `update_standalone_round` (context with executor_output + testing_qa_output + e2e_eligible + e2e_outputs + frontend_ui_review) — per KIND
285
+ - **Writes**: MCP `update_round` / `update_standalone_round` (context with executor_output + testing_qa_output + e2e_eligible + e2e_outputs + e2e_gallery + frontend_ui_review) — per KIND
244
286
  - **Spawns**: `cbp-round-executor` (per wave or single), `cbp-testing-qa-agent` (per wave, parallel sibling of the `cbp-e2e-*` specialists), the `cbp-e2e-*` specialists (config-driven dispatch per `context/testing/e2e.md`, one per eligible framework in `.codebyplan/e2e.json`), `cbp-database-agent` (if DB work), `cbp-security-agent` (if security review needed)
245
287
  - **Skill invocations**: `cbp-frontend-ui` at Step 5b with `phase: 'screenshot_review'` (post-e2e)
246
288
  - **Triggers**: `/cbp-round-end` (auto)
@@ -0,0 +1,170 @@
1
+ ---
2
+ scope: org-shared
3
+ name: cbp-setup-cmux
4
+ description: Configure .codebyplan/cmux.json — choose workspace color and toggle auto_status / auto_dev_server. Interactive, idempotent.
5
+ argument-hint: "[--force]"
6
+ model: sonnet
7
+ effort: xhigh
8
+ allowed-tools: Read, Write, Edit, Bash(cat *), Bash(jq *), Bash(test *), Bash(mv *), Bash(git check-ignore *), Bash(cmux *), AskUserQuestion
9
+ ---
10
+
11
+ # cmux Setup
12
+
13
+ Configure `.codebyplan/cmux.json` so the cmux workspace integration knows which color to
14
+ apply on session start and whether auto_status / auto_dev_server are enabled.
15
+
16
+ Invoke at any time. Existing values are preserved unless `--force` is passed.
17
+ Pass `--force` to re-ask all questions including already-configured fields.
18
+
19
+ ## Arguments
20
+
21
+ Inspect `$ARGUMENTS` for `--force`. If present, set `force_mode = true`.
22
+ Absent: use idempotent mode — preserve existing configured values, skip re-asking
23
+ already-set fields.
24
+
25
+ ## Step 1 — Parse --force and read existing config
26
+
27
+ ```bash
28
+ cat .codebyplan/cmux.json 2>/dev/null || echo '{}'
29
+ ```
30
+
31
+ Capture the result as `EXISTING_JSON`. Extract existing values:
32
+
33
+ ```bash
34
+ EXISTING_COLOR=$(echo "$EXISTING_JSON" | jq -r '.workspace_color // empty')
35
+ EXISTING_AUTO_STATUS=$(echo "$EXISTING_JSON" | jq -r '.auto_status | select(. != null)')  # not "// empty": that drops false
36
+ EXISTING_AUTO_DEV=$(echo "$EXISTING_JSON" | jq -r '.auto_dev_server | select(. != null)')
37
+ ```
38
+
39
+ ## Step 2 — Ask for workspace color
40
+
41
+ **Idempotency gate**: skip this question if `force_mode = false` AND `EXISTING_COLOR`
42
+ is a non-empty string. Print the preserved value and continue to Step 3.
43
+
44
+ Otherwise, AskUserQuestion:
45
+
46
+ ```
47
+ Workspace color for this cmux workspace
48
+
49
+ Choose a named color or enter a custom hex value (#RRGGBB).
50
+
51
+ Named colors:
52
+ A) Red B) Crimson C) Orange D) Amber
53
+ E) Olive F) Green G) Teal H) Aqua
54
+ I) Blue J) Navy K) Indigo L) Purple
55
+ M) Magenta N) Rose O) Brown P) Charcoal
56
+ Q) Custom hex — enter a value in the format #RRGGBB
57
+
58
+ Enter a letter (A-Q) or type a custom hex value directly:
59
+ ```
60
+
61
+ Validate the input:
62
+ - If a letter A-P: map to the corresponding color name (e.g. "A" → "Red").
63
+ - If "Q": prompt the user to type a custom hex value, then validate the typed value
64
+ against `^#[0-9A-Fa-f]{6}$`. If invalid, re-prompt once with an error. Never pass the
65
+ bare letter "Q" to set-color.
66
+ - If the raw input itself already looks like a hex value: validate it against
67
+ `^#[0-9A-Fa-f]{6}$` and accept on match.
68
+ - Empty input with no existing color: use no color (leave `workspace_color` as `null`).
69
+
70
+ Set `NEW_COLOR` to the chosen color name / hex, or `null` if skipped.
71
+
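The validation rules above can be sketched as a small pure function. This is an illustrative sketch only: the function name `resolveColor`, the returned `kind` labels, and the acceptance of lowercase letters are assumptions, not part of the skill.

```javascript
// Named colors in menu order A..P, as listed in the prompt above.
const NAMED_COLORS = [
  "Red", "Crimson", "Orange", "Amber",
  "Olive", "Green", "Teal", "Aqua",
  "Blue", "Navy", "Indigo", "Purple",
  "Magenta", "Rose", "Brown", "Charcoal",
];
const HEX_RE = /^#[0-9A-Fa-f]{6}$/;

// Hypothetical helper: map raw user input to a color decision.
function resolveColor(input) {
  const raw = (input ?? "").trim();
  if (raw === "") return { kind: "none", value: null };        // leave workspace_color null
  if (HEX_RE.test(raw)) return { kind: "hex", value: raw };    // direct hex entry
  if (/^[A-Pa-p]$/.test(raw)) {
    // "A" has char code 65, so A..P map to indexes 0..15.
    const index = raw.toUpperCase().charCodeAt(0) - 65;
    return { kind: "named", value: NAMED_COLORS[index] };
  }
  // "Q" means "ask for a hex value next"; the bare letter is never a color.
  if (/^[Qq]$/.test(raw)) return { kind: "needs_hex", value: null };
  return { kind: "invalid", value: null };
}

console.log(resolveColor("C").value); // prints: Orange
```

Returning a `needs_hex` marker for "Q" rather than a value keeps the bare letter from ever reaching `set-color`, which matches the rule stated above.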
72
+ ## Step 3 — Ask for auto_status and auto_dev_server toggles
73
+
74
+ **Idempotency gate**: skip questions for fields that are already set (non-empty in
75
+ existing config) when `force_mode = false`.
76
+
77
+ For each unset field (or all fields when `force_mode = true`), AskUserQuestion:
78
+
79
+ ```
80
+ cmux integration toggles
81
+
82
+ auto_status: automatically run `cmux status` in this workspace on session start?
83
+ (default: on)
84
+ A) On
85
+ B) Off
86
+
87
+ auto_dev_server: automatically start the dev server via cmux on session start?
88
+ (default: on)
89
+ A) On
90
+ B) Off
91
+ ```
92
+
93
+ Ask both in a single question when both are unset. Set `NEW_AUTO_STATUS` and
94
+ `NEW_AUTO_DEV` to `true` or `false` based on the answers.
95
+
96
+ ## Step 4 — Write .codebyplan/cmux.json
97
+
98
+ Build the updated payload using jq deep-merge so sibling fields added by future
99
+ schema versions are preserved. Assemble variables first.
100
+
101
+ Backfill any field that the Step 2/3 idempotency gates skipped, so each `jq
102
+ --argjson` always receives valid JSON — an empty shell variable makes `jq` exit 1,
103
+ which would leave `NEW_PAYLOAD` empty and silently no-op the write. Fall back to the
104
+ existing value, then to the default (`true` for the toggles):
105
+
106
+ ```bash
107
+ NEW_COLOR="${NEW_COLOR:-$EXISTING_COLOR}"
108
+ NEW_AUTO_STATUS="${NEW_AUTO_STATUS:-${EXISTING_AUTO_STATUS:-true}}"
109
+ NEW_AUTO_DEV="${NEW_AUTO_DEV:-${EXISTING_AUTO_DEV:-true}}"
110
+ ```
111
+
112
+ ```bash
113
+ NEW_COLOR_JSON=$(echo "$NEW_COLOR" | jq -R 'if . == "null" or . == "" then null else . end')
114
+ NEW_PAYLOAD=$(jq -n \
115
+ --argjson color "$NEW_COLOR_JSON" \
116
+ --argjson auto_status "$NEW_AUTO_STATUS" \
117
+ --argjson auto_dev_server "$NEW_AUTO_DEV" \
118
+ '{workspace_color: $color, auto_status: $auto_status, auto_dev_server: $auto_dev_server}')
119
+ ```
120
+
121
+ Atomic write via jq tmp+mv deep-merge (`. * $new` deep-merges, preserving any
122
+ sibling keys not in the new payload):
123
+
124
+ ```bash
125
+ jq --argjson new "$NEW_PAYLOAD" '. * $new' \
126
+ <<< "$EXISTING_JSON" > .codebyplan/cmux.json.tmp \
127
+ && mv .codebyplan/cmux.json.tmp .codebyplan/cmux.json
128
+ ```
129
+
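To make the merge semantics concrete, here is a rough JavaScript analogue of jq's recursive object merge. It is a sketch for illustration, not the skill's implementation; `deepMerge` and the `future_field` key are hypothetical.

```javascript
// True for plain JSON objects (not null, not arrays).
function isPlainObject(v) {
  return v !== null && typeof v === "object" && !Array.isArray(v);
}

// Recursive merge: nested objects are merged key by key; for any other
// value (including null) the right-hand side wins.
function deepMerge(base, patch) {
  const out = { ...base };
  for (const [key, value] of Object.entries(patch)) {
    out[key] =
      isPlainObject(out[key]) && isPlainObject(value)
        ? deepMerge(out[key], value)
        : value;
  }
  return out;
}

// `future_field` stands in for a sibling key added by a later schema version.
const existing = { workspace_color: "Red", future_field: { a: 1 } };
const payload = { workspace_color: "Teal", auto_status: true, auto_dev_server: false };
const merged = deepMerge(existing, payload);
console.log(merged.workspace_color, merged.future_field.a); // prints: Teal 1
```

The point of the merge is visible in the last line: the payload's keys overwrite their counterparts, while keys the payload does not mention survive untouched.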
130
+ ## Step 5 — Apply color immediately
131
+
132
+ If `NEW_COLOR` is a non-empty, non-null string AND cmux is available in the current
133
+ environment (`$CMUX_WORKSPACE_ID` is set), apply the color immediately so the change
134
+ is visible without needing to restart the session:
135
+
136
+ ```bash
137
+ if [ -n "$CMUX_WORKSPACE_ID" ] && [ -n "$NEW_COLOR" ] && [ "$NEW_COLOR" != "null" ]; then
138
+ CMUX_BIN="${CMUX_BUNDLED_CLI_PATH:-${CMUX_CLAUDE_HOOK_CMUX_BIN:-cmux}}"
139
+ "$CMUX_BIN" workspace-action --action set-color --color "$NEW_COLOR" 2>/dev/null || true
140
+ fi
141
+ ```
142
+
143
+ The guard (`2>/dev/null || true`) ensures that a missing cmux binary or a non-zero
144
+ exit code never blocks the skill. Applying the color is best-effort.
145
+
146
+ ## Step 6 — Verify and report
147
+
148
+ Re-read `.codebyplan/cmux.json` and emit a summary:
149
+
150
+ ```
151
+ cmux Setup — Complete
152
+
153
+ workspace_color : <value or "(none)">
154
+ auto_status : <true|false>
155
+ auto_dev_server : <true|false>
156
+
157
+ Config written to .codebyplan/cmux.json
158
+
159
+ Next: on the next SessionStart, codebyplan cmux-sync will apply the color
160
+ automatically. Run `/cbp-setup-cmux --force` at any time to reconfigure.
161
+ ```
162
+
163
+ ## Key Rules
164
+
165
+ - Atomic write (tmp + mv) — never leaves cmux.json in a partial state
166
+ - Deep-merge with `. * $new` — sibling keys from future schema versions are preserved
167
+ - Color apply is best-effort — a missing cmux binary never causes an error
168
+ - `auto_status` and `auto_dev_server` default to `true` when absent (documented in
169
+ `packages/codebyplan-package/src/lib/types.ts` CmuxConfig)
170
+ - cmux.json is COMMITTED (not gitignored) — workspace color is shared across team members
@@ -138,6 +138,20 @@ Skip the push only when nothing was committed in Step 5 AND `/cbp-merge-main` re
138
138
 
139
139
  Call `complete_task(task_id)`. The server resolves the caller's worktree identity from the JWT/ctx and enforces the mutate-lock (CHK-140 TASK-3 — `caller_worktree_id` input field removed). The server auto-clears `assigned_user_id` + `assigned_worktree_id` on the task; if this was the last sibling task, it also clears the parent checkpoint's assignment. (Per CHK-104 hard-lock.)
140
140
 
141
+ ### Step 7.3: Push cmux Status (task done + progress)
142
+
143
+ Push completion status and checkpoint progress to the cmux workspace sidebar. Self-no-ops outside cmux or when `auto_status` is disabled.
144
+
145
+ Compute progress from the `get_tasks` data already loaded by the routing step: `completed = count of tasks with status 'completed'`, `total = total task count for the checkpoint`. For standalone tasks (no checkpoint), omit `--progress`.
146
+
147
+ ```bash
148
+ # Checkpoint-bound task:
149
+ npx codebyplan cmux-status --task "TASK-{N}: {task-title} done" --progress {completed}/{total}
150
+
151
+ # Standalone task:
152
+ npx codebyplan cmux-status --task "TASK-{N}: {task-title} done"
153
+ ```
154
+
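The progress computation described above might look like the following sketch. The helper name `buildStatusArgs` and the task fields `number` and `title` are assumptions for illustration; only the `'completed'` status value and the `--task` / `--progress` flags come from the text above.

```javascript
// Hypothetical helper: build the argument list for `npx codebyplan cmux-status`.
// `checkpointTasks` is the task list for the checkpoint, or null/undefined
// for a standalone task (in which case --progress is omitted).
function buildStatusArgs(task, checkpointTasks) {
  const args = ["cmux-status", "--task", `TASK-${task.number}: ${task.title} done`];
  if (Array.isArray(checkpointTasks)) {
    const completed = checkpointTasks.filter((t) => t.status === "completed").length;
    args.push("--progress", `${completed}/${checkpointTasks.length}`);
  }
  return args;
}

// Made-up sample data.
const tasks = [
  { number: 1, title: "Schema", status: "completed" },
  { number: 2, title: "API", status: "completed" },
  { number: 3, title: "UI", status: "in_progress" },
];
console.log(buildStatusArgs(tasks[1], tasks).join(" "));
console.log(buildStatusArgs({ number: 7, title: "Hotfix" }, null).join(" "));
```

Building an argument array rather than a single command string means a task title containing spaces or quotes can be passed to a process spawner without shell escaping.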
141
155
  ### Step 8: Run Cleanup + Migration (inline)
142
156
 
143
157
  Apply the `cleanup` skill inline to remove orphan references to deleted/modified files. Then apply `migration` to propagate renames/moves to consumers. Both run without sub-agent spawns. Skip cleanup if no deletions/modifications; skip migration if cleanup handled everything.
@@ -228,6 +228,14 @@ Display context summary:
228
228
  - **Previous rounds**: [count] completed
229
229
  ```
230
230
 
231
+ ### Step 4.5: Push cmux Status
232
+
233
+ Push the active checkpoint and task context to the cmux workspace sidebar. Self-no-ops outside cmux or when `auto_status` is disabled in `.codebyplan/cmux.json`.
234
+
235
+ ```bash
236
+ npx codebyplan cmux-status --checkpoint "CHK-{NNN}: {checkpoint-title}" --task "TASK-{N}: {task-title}"
237
+ ```
238
+
231
239
  ### Step 5: Set Task Status
232
240
 
233
241
  Use MCP `update_task(task_id, status: "in_progress")`.