codebyplan 1.13.23 → 1.13.25

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (31)
  1. package/dist/cli.js +445 -187
  2. package/package.json +2 -2
  3. package/templates/agents/cbp-cc-executor.md +7 -7
  4. package/templates/agents/cbp-improve-round.md +2 -2
  5. package/templates/agents/cbp-round-executor.md +20 -4
  6. package/templates/agents/cbp-testing-qa-agent.md +3 -3
  7. package/templates/hooks/README.md +1 -1
  8. package/templates/hooks/cbp-statusline.mjs +106 -11
  9. package/templates/hooks/cbp-statusline.py +79 -13
  10. package/templates/hooks/cbp-statusline.sh +97 -17
  11. package/templates/hooks/validate-structure-patterns.sh +1 -1
  12. package/templates/skills/cbp-checkpoint-check/SKILL.md +2 -2
  13. package/templates/skills/cbp-checkpoint-complete/SKILL.md +2 -2
  14. package/templates/skills/cbp-merge-main/SKILL.md +1 -1
  15. package/templates/skills/cbp-round-end/SKILL.md +12 -35
  16. package/templates/skills/cbp-round-end/reference/findings-presentation.md +76 -3
  17. package/templates/skills/cbp-round-execute/SKILL.md +13 -60
  18. package/templates/skills/cbp-round-start/SKILL.md +3 -1
  19. package/templates/skills/cbp-round-update/SKILL.md +1 -1
  20. package/templates/skills/cbp-session-start/SKILL.md +2 -0
  21. package/templates/skills/cbp-ship-configure/SKILL.md +1 -1
  22. package/templates/skills/cbp-ship-configure/reference/supabase.md +2 -2
  23. package/templates/skills/cbp-ship-main/SKILL.md +2 -0
  24. package/templates/skills/cbp-standalone-task-create/SKILL.md +1 -1
  25. package/templates/skills/cbp-task-check/SKILL.md +1 -1
  26. package/templates/skills/cbp-task-complete/SKILL.md +1 -1
  27. package/templates/skills/cbp-task-create/SKILL.md +50 -1
  28. package/templates/skills/cbp-task-start/SKILL.md +2 -2
  29. package/templates/skills/cbp-task-testing/SKILL.md +2 -2
  30. package/templates/skills/cbp-todo/SKILL.md +36 -3
  31. package/templates/skills/cbp-todo/qa-regression.md +8 -1
@@ -54,6 +54,8 @@ Pass `--dry-run` through if the skill was invoked with a dry-run arg.
54
54
 
55
55
  Parse JSON from Step 3. Report `pr_url`, `merge_commit`, `branch_deleted`. If `checks_failed: true`, surface `checks_failure_reason` and stop.
56
56
 
57
+ > **gh false-negative workaround.** `codebyplan ship` can report `checks_failed: true` when the underlying `gh` query reads a stale/mismatched check field (it queries `conclusion`/`state` and the GitHub API can lag). Before treating the stop as final, verify the real status: `gh pr checks <PR> --watch`. If every required check is green, merge manually with `gh pr merge <PR> --merge` — add `--admin` ONLY to escape a transient secondary-rate-limit loop, never to bypass a genuinely-failing gate. Never auto-merge silently on a `checks_failed` report; this verification is a manual decision.
58
+
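The workaround above amounts to a small decision rule, sketched here as a minimal illustration rather than the CLI's actual logic. The `ShipReport` class, the `next_step` function, and the `"pass"` status string are hypothetical names. The sketch assumes the `required_checks` mapping is filled in by hand after running `gh pr checks <PR> --watch`. It only returns a recommendation and never merges anything.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ShipReport:
    """Hypothetical subset of the JSON reported by `codebyplan ship`."""
    checks_failed: bool
    checks_failure_reason: str = ""


def next_step(report: ShipReport, required_checks: dict[str, str]) -> str:
    """Recommend what to do after a ship report.

    `required_checks` maps check name -> status as observed manually;
    the status strings are an assumption of this sketch.
    """
    if not report.checks_failed:
        return "continue"
    # Only a verified all-green set of required checks suggests a false negative.
    if required_checks and all(s == "pass" for s in required_checks.values()):
        return "ask-user-to-merge-manually"
    # Unknown or genuinely failing checks: surface the reason and stop.
    return "stop"
```

Even in the all-green case the result is a prompt for the user, which keeps the manual-decision requirement intact.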
57
59
  If `bumps[]` is present with any non-skipped entry, surface a **Version bumps** line per package — `<name>: <currentVersion> → <nextVersion>` — so the user sees what this PR will publish on merge.
58
60
 
59
61
  If `branch_deleted === true`, run a conditional Supabase preview-branch teardown for the feat branch that was just merged:
@@ -17,7 +17,7 @@ Create a new standalone task — independent of any checkpoint. Gathers user con
17
17
 
18
18
  ## Identifier Notation
19
19
 
20
- Standalone tasks use a bare number (e.g. `45` = standalone TASK-45). There is no checkpoint segment. Canonical notation follows `.claude/rules/notation-consistency.md`.
20
+ Standalone tasks use a bare number (e.g. `45` = standalone TASK-45). There is no checkpoint segment. Canonical notation follows `cbp-round-start` Step 0 "CHK / TASK / ROUND Identifier Notation Vocabulary".
21
21
 
22
22
  ## Instructions
23
23
 
@@ -35,7 +35,7 @@ Inline-fallback is NOT a quality downgrade trapdoor — every Phase from the age
35
35
 
36
36
  ### Step 1: Parse `$ARGUMENTS`
37
37
 
38
- Parse the argument using the canonical chk-task-round notation (see `.claude/rules/notation-consistency.md`):
38
+ Parse the argument using the canonical chk-task-round notation (see `cbp-round-start` Step 0 "CHK / TASK / ROUND Identifier Notation Vocabulary"):
39
39
 
40
40
  | Shape | Regex | Resolves to |
41
41
  |-------|-------|-------------|
@@ -14,7 +14,7 @@ Complete the current task. Auto-triggered by `/cbp-task-testing` when all tests
14
14
 
15
15
  ### Step 1: Parse `$ARGUMENTS`
16
16
 
17
- Parse the argument using the canonical chk-task-round notation (see `.claude/rules/notation-consistency.md`):
17
+ Parse the argument using the canonical chk-task-round notation (see `cbp-round-start` Step 0 "CHK / TASK / ROUND Identifier Notation Vocabulary"):
18
18
 
19
19
  | Shape | Regex | Resolves to |
20
20
  |-------|-------|-------------|
@@ -17,7 +17,7 @@ Create a new task within the active checkpoint. Gathers user context, analyzes e
17
17
 
18
18
  ## Identifier Notation
19
19
 
20
- This skill operates on the **active** checkpoint resolved via MCP `get_current_task` and does not accept a positional identifier argument. The task it creates gets its `number` from the next-available slot within the active checkpoint (checkpoint-bound). Canonical chk-task-round notation — used in prose, error messages, and cross-references — follows `.claude/rules/notation-consistency.md` "CHK / TASK / ROUND Identifier Notation": `108-1` (CHK-108 TASK-1), `108-1-2` (round 2 of CHK-108 TASK-1).
20
+ This skill operates on the **active** checkpoint resolved via MCP `get_current_task` and does not accept a positional identifier argument. The task it creates gets its `number` from the next-available slot within the active checkpoint (checkpoint-bound). Canonical chk-task-round notation — used in prose, error messages, and cross-references — follows `cbp-round-start` Step 0 "CHK / TASK / ROUND Identifier Notation Vocabulary": `108-1` (CHK-108 TASK-1), `108-1-2` (round 2 of CHK-108 TASK-1).
21
21
 
22
22
  **Bare-number argument**: if a bare number (e.g. `42`) is provided with no checkpoint context, this skill cannot resolve it to a checkpoint-bound task:
23
23
 
@@ -61,6 +61,55 @@ Use MCP `get_tasks` for the checkpoint. Review:
61
61
  - Task statuses (completed, in_progress, pending)
62
62
  - Dependencies between tasks
63
63
 
64
+ ### Step 3.5: Immediate Issue Capture Contract
65
+
66
+ #### Default to Current Scope
67
+
68
+ Discovered issues MUST be captured. The default target is current scope (round → task → checkpoint); standalone tasks are RARE.
69
+
70
+ #### How to Capture (top-down routing — use the first row that fits)
71
+
72
+ | Situation | Action |
73
+ |-----------|--------|
74
+ | Trivial inline fix (≤5 min, mechanical, scope-clean) | Apply in the CURRENT round per `cbp-round-end` reference `findings-presentation.md` "Trivial-Resolution Exception" |
75
+ | Related to the current task's domain | Create a new ROUND in the current task |
76
+ | Fits the current checkpoint goal but is meaningfully separate | Create a new TASK in the current checkpoint via `create_task(checkpoint_id)` |
77
+ | Large enough to need multiple tasks AND fits no current checkpoint | Create a NEW CHECKPOINT via `create_checkpoint` |
78
+ | Genuinely orphan work — off-axis from all active checkpoints AND user has explicitly confirmed standalone | Create a STANDALONE task via `create_task(repo_id)` — rare |
79
+ | Timed re-check waiting on upstream fix | Standalone task with `context.re_check_date` |
80
+
81
+ Standalone routing requires explicit user confirmation — silence is NOT confirmation.
82
+
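The top-down routing table can be read as a first-match rule chain. The sketch below is illustrative only: the `Finding` fields, the `route` function, and the returned labels are hypothetical names, and the final `"ask-user"` fallback is an assumption for findings that fit no row.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical flags describing a discovered issue."""
    trivial_inline_fix: bool = False
    related_to_current_task: bool = False
    fits_current_checkpoint: bool = False
    needs_multiple_tasks: bool = False
    off_axis_from_all_checkpoints: bool = False
    user_confirmed_standalone: bool = False
    waiting_on_upstream_fix: bool = False


def route(f: Finding) -> str:
    """Return the first routing row that fits, checked top-down."""
    if f.trivial_inline_fix:
        return "apply-in-current-round"
    if f.related_to_current_task:
        return "new-round"
    if f.fits_current_checkpoint:
        return "new-task-in-checkpoint"
    if f.needs_multiple_tasks:  # fits no current checkpoint at this point
        return "new-checkpoint"
    # Standalone needs explicit confirmation; silence is not confirmation.
    if f.off_axis_from_all_checkpoints and f.user_confirmed_standalone:
        return "standalone-task"
    if f.waiting_on_upstream_fix:
        return "standalone-task-with-re-check-date"
    return "ask-user"  # assumption: no row fits, so escalate
```

Because the rows are checked in order, a finding that is both trivial and task-related is applied inline rather than spawning a new round.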
83
+ #### Consolidation Before Creation (MANDATORY)
84
+
85
+ Before calling `create_task` for a finding, run a two-step dedup + bundle check:
86
+
87
+ **Step 1 — Dedup against pending standalone tasks:**
88
+
89
+ ```
90
+ mcp__codebyplan__get_tasks(repo_id, standalone=true, status="pending")
91
+ ```
92
+
93
+ Compare the proposed task to each pending standalone task on these match dimensions:
94
+
95
+ | Match dimension | Action if matched |
96
+ |-----------------|-------------------|
97
+ | Same target file(s) | STOP — `update_task` to append, do not create new |
98
+ | Same feature / module | STOP — `update_task` to append, do not create new |
99
+ | Same root cause (e.g. "prettier drift", "router bug") | STOP — `update_task` to append, do not create new |
100
+ | Same dependency / advisory | STOP — `update_task` to append, do not create new |
101
+
102
+ If a match is found, surface it to the user before appending:
103
+
104
+ ```
105
+ Found existing pending task TASK-[N]: [title]
106
+ This finding overlaps on [dimension]. Append to TASK-[N] instead of creating new? (yes / no — create separately)
107
+ ```
108
+
109
+ Default to append. Only create a separate task if the user explicitly says no, OR if the existing task is in_progress / completed (in which case use `context.related_task_ids[]` on the new task to cross-reference).
110
+
111
+ **Step 2 — Bundle within the same agent invocation:** If a single agent run surfaces 2+ findings that share any match dimension, MERGE them into a SINGLE `create_task` call. Do NOT loop and create one task per finding.
112
+
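A minimal sketch of the dedup and bundle check, under stated assumptions: findings and pending tasks are plain dicts whose match dimensions (`files`, `feature`, `root_cause`, `advisory`) each hold a list of strings. These field names and all function names are hypothetical. The grouping is a simple greedy pass, so it does not merge two existing groups that a later finding happens to link.

```python
MATCH_DIMENSIONS = ("files", "feature", "root_cause", "advisory")


def overlap(a, b):
    """Return the first dimension on which two records share a value, else None."""
    for dim in MATCH_DIMENSIONS:
        if set(a.get(dim, ())) & set(b.get(dim, ())):
            return dim
    return None


def bundle(findings):
    """Greedily group findings that share any match dimension."""
    groups = []
    for f in findings:
        for group in groups:
            if any(overlap(f, member) for member in group):
                group.append(f)
                break
        else:
            groups.append([f])
    return groups


def first_match(group, pending_tasks):
    """Find a pending standalone task overlapping any finding in the group."""
    for task in pending_tasks:
        for f in group:
            dim = overlap(f, task)
            if dim:
                return task, dim
    return None, None


def plan(findings, pending_tasks):
    """One action per bundle: ask to append to a match, or create one task."""
    actions = []
    for group in bundle(findings):
        task, dim = first_match(group, pending_tasks)
        if task is not None:
            actions.append({"action": "ask-append", "task": task["number"],
                            "dimension": dim, "findings": group})
        else:
            actions.append({"action": "create", "findings": group})
    return actions
```

Each bundle yields exactly one action, so two findings on the same file result in a single `create` entry rather than one task per finding.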
64
113
  ### Step 4: Analyze Codebase Context
65
114
 
66
115
  Brief inline analysis:
@@ -15,7 +15,7 @@ Start a task by loading context from the database and preparing for work.
15
15
 
16
16
  ### Step 1: Parse `$ARGUMENTS`
17
17
 
18
- Parse the argument using the canonical chk-task-round notation (see `.claude/rules/notation-consistency.md`):
18
+ Parse the argument using the canonical chk-task-round notation (see `cbp-round-start` Step 0 "CHK / TASK / ROUND Identifier Notation Vocabulary"):
19
19
 
20
20
  | Shape | Regex | Resolves to |
21
21
  |-------|-------|-------------|
@@ -158,7 +158,7 @@ Skip this step if the task title and requirements contain no CVE ID (`CVE-YYYY-N
158
158
  1. Run `pnpm audit --json` from the monorepo root. If it fails (network, registry), surface the error and stop — do NOT start a CVE task without a clean snapshot.
159
159
  2. Parse the advisory list from the JSON output.
160
160
  3. Call MCP `get_tasks(repo_id)`; for each advisory, match by ID in task title/requirements.
161
- 4. For every advisory with no matching open task, call MCP `create_task` per `immediate-issue-capture.md`.
161
+ 4. For every advisory with no matching open task, call MCP `create_task` per `cbp-task-create` Step 3.5 "Immediate Issue Capture Contract".
162
162
  5. Report the sweep result:
163
163
  ```
164
164
  ## CVE/GHSA Audit Sweep
@@ -25,7 +25,7 @@ Per-wave `testing-qa-agent` runs inside `/cbp-round-execute` Step 5. This skill
25
25
 
26
26
  ### Step 1: Parse `$ARGUMENTS`
27
27
 
28
- Parse the argument using the canonical chk-task-round notation (see `.claude/rules/notation-consistency.md`):
28
+ Parse the argument using the canonical chk-task-round notation (see `cbp-round-start` Step 0 "CHK / TASK / ROUND Identifier Notation Vocabulary"):
29
29
 
30
30
  | Shape | Regex | Resolves to |
31
31
  |-------|-------|-------------|
@@ -170,7 +170,7 @@ For each finding, record: `{category, file, description, severity: 'low'|'medium
170
170
 
171
171
  Findings with severity `medium` or `high` feed the Step 9 problem classification. `low` findings are recorded in `task_testing_output` for the record but do not block.
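As a small illustration of the severity rule above, the split can be sketched as follows. The function name is hypothetical, and findings are assumed to be dicts carrying the `severity` field from the finding record.

```python
def split_findings(findings):
    """Separate findings that feed problem classification from record-only ones."""
    blocking = [f for f in findings if f["severity"] in ("medium", "high")]
    recorded_only = [f for f in findings if f["severity"] == "low"]
    return blocking, recorded_only
```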
172
172
 
173
- If any finding points to a need that exceeds task scope (e.g. a utility worth extracting for the wider codebase, a convention the repo should adopt globally), route per `immediate-issue-capture.md` "How to Capture" — default to a NEW TASK in the current checkpoint, not a standalone task. Standalone routing applies only when the finding is genuinely off-axis from every active checkpoint AND the user has confirmed standalone routing.
173
+ If any finding points to a need that exceeds task scope (e.g. a utility worth extracting for the wider codebase, a convention the repo should adopt globally), route per `cbp-task-create` Step 3.5 "Immediate Issue Capture Contract — How to Capture" — default to a NEW TASK in the current checkpoint, not a standalone task. Standalone routing applies only when the finding is genuinely off-axis from every active checkpoint AND the user has confirmed standalone routing.
174
174
 
175
175
  ### Step 7: Separate Claude-Testable vs User-Testable
176
176
 
@@ -44,7 +44,7 @@ With `USER_ID` resolved, call MCP `get_todos({ repo_id, user_id, worktree_id })`
44
44
 
45
45
  - The head carries `command`, `instructions`, `state`, `metadata`, `worktree_id`, `checkpoint_id`, `task_id`.
46
46
  - The routing context (checkpoint/task) lives in **`rows[0].metadata`**.
47
- - `get_todos` is **pure-read** — `apps/todo-worker` is the sole regen authority. NEVER call `regenerate_todos_for_repo`.
47
+ - `get_todos` is **pure-read** — `apps/todo-worker` is the sole regen authority. NEVER call `get_next_action` or `regenerate_todos_for_repo`.
48
48
  - Empty array, or `USER_ID` unavailable → go to Step 3 (empty-queue fallback).
49
49
 
50
50
  Queue `command` values may use the `/codebyplan:<name>` plugin-namespace form (e.g. `/codebyplan:round-start`); treat each as the matching `/cbp-<name>` skill for the Step 2 matrix.
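The namespace mapping described above can be sketched as a one-step normalization. The function name `to_skill_name` is hypothetical; the two prefixes come from the text.

```python
PLUGIN_PREFIX = "/codebyplan:"
SKILL_PREFIX = "/cbp-"


def to_skill_name(command: str) -> str:
    """Map `/codebyplan:<name>` to `/cbp-<name>`; leave other commands unchanged."""
    if command.startswith(PLUGIN_PREFIX):
        return SKILL_PREFIX + command[len(PLUGIN_PREFIX):]
    return command
```

For example, `to_skill_name("/codebyplan:round-start")` yields `/cbp-round-start`, while a command already in `/cbp-<name>` form passes through untouched.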
@@ -76,6 +76,39 @@ Options:
76
76
 
77
77
  `<short-uuid>` = first 8 chars of the worktree UUID. Caller unresolved → render "this one" as `(unresolved / —)`. Name lookup miss → show the UUID alone. Wait for the user. NEVER propose reassigning the checkpoint to the caller worktree.
78
78
 
79
+ ### Step 1.55: Stale-Entity Guard
80
+
81
+ Ownership passed (Step 1.5). Now guard against a lagging queue routing into already-finished work — the queue head (`rows[0]`) is produced by the todo-worker and can be stale (`health_check.todos_freshness_ok === false` is the signal), so a head command may target a checkpoint or task that was completed/cancelled since the queue was last regenerated. Reuse the `status` + task statuses already loaded in Step 1.5 — no extra reads, and NEVER call `get_next_action` or `regenerate_todos_for_repo` (an empty/stale queue is reconciled by the todo-worker, never re-derived here).
82
+
83
+ Reject the auto-trigger when EITHER holds:
84
+
85
+ - The target checkpoint's `status` is `completed` or `cancelled`.
86
+ - Every task returned by `get_tasks(checkpoint_id)` (loaded in Step 1.5) has status `completed` or `cancelled` — no actionable task remains.
87
+
88
+ On reject, surface the mismatch — naming the head command and the stale entity — then **STOP** (do not auto-trigger the head command). Use the variant matching the trigger condition:
89
+
90
+ - Checkpoint itself `completed`/`cancelled`:
91
+
92
+ ```
93
+ ⚠ Stale queue head: <command> targets CHK-<NNN> which is <status>.
94
+ The todo queue has not caught up. Options:
95
+ A) /cbp-todo — re-resolve the queue
96
+ B) /cbp-checkpoint-create — start new work
97
+ C) /cbp-session-end
98
+ ```
99
+
100
+ - Checkpoint still active but every task `completed`/`cancelled`:
101
+
102
+ ```
103
+ ⚠ Stale queue head: <command> targets CHK-<NNN> — all tasks are completed/cancelled, no actionable work remains.
104
+ The todo queue has not caught up. Options:
105
+ A) /cbp-todo — re-resolve the queue
106
+ B) /cbp-checkpoint-create — start new work
107
+ C) /cbp-session-end
108
+ ```
109
+
110
+ Skip this gate when the routing target has no checkpoint (idle — see Step 3) or the command is `/cbp-session-start`.
111
+
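The guard's reject conditions and its skip rule can be sketched as a pure function over data already loaded in Step 1.5. This is an illustration under assumptions: the function name and return labels are hypothetical, a missing checkpoint is modelled as `None`, and an empty task list is treated as not stale (the text does not say how the guard handles a checkpoint with no tasks).

```python
TERMINAL = {"completed", "cancelled"}


def stale_reason(command, checkpoint_status, task_statuses):
    """Return None when routing may continue, else a short reject reason."""
    # Gate is skipped for idle targets (no checkpoint) and session start.
    if command == "/cbp-session-start" or checkpoint_status is None:
        return None
    if checkpoint_status in TERMINAL:
        return "checkpoint-" + checkpoint_status
    # Assumption: an empty list is not "every task finished".
    if task_statuses and all(s in TERMINAL for s in task_statuses):
        return "all-tasks-finished"
    return None
```

The function performs no reads of its own, which mirrors the requirement that the guard reuse already-loaded statuses instead of re-deriving the queue.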
79
112
  ### Step 1.6: Checkpoint Planning Gate
80
113
 
81
114
  Ownership passed (Step 1.5). Now gate on the checkpoint's planning + activation state — reusing the `plan` + `status` + task count loaded in Step 1.5 — so work never starts on a half-baked or un-activated checkpoint. Evaluate two rules in order (Rule A wins if both could match):
@@ -154,11 +187,11 @@ Wait for the user. Do not auto-trigger `/cbp-session-end` — session wrap-up is
154
187
 
155
188
  ### Step 4: Display and Auto-trigger
156
189
 
157
- Reached only when the Step 1.5 ownership gate allowed routing to continue and the Step 1.6 planning gate fell through (no hand-off). Show `rows[0].instructions` (so the user sees what is happening), then auto-trigger `rows[0].command` (its `/cbp-<name>` form).
190
+ Reached only when the Step 1.5 ownership gate allowed routing to continue, the Step 1.55 stale-entity guard did not reject, and the Step 1.6 planning gate fell through (no hand-off). Show `rows[0].instructions` (so the user sees what is happening), then auto-trigger `rows[0].command` (its `/cbp-<name>` form).
158
191
 
159
192
  ## Integration
160
193
 
161
194
  - **Called by**: `/cbp-session-start`, `/cbp-task-complete`, `/cbp-checkpoint-complete`, manual, after `/clear`
162
195
  - **Resolves**: `npx codebyplan resolve-worktree --json` (worktree id + distress signal), `npx codebyplan whoami --json` (user id)
163
196
  - **Reads**: MCP `get_todos`, `get_current_task`, `get_rounds`, `get_checkpoints`, `get_tasks`, `get_worktrees`
164
- - **Triggers**: `rows[0].command` (auto, after the Step 1.5 ownership gate); Step 1.6 overrides to `/cbp-checkpoint-plan` (unplanned) or `/cbp-checkpoint-start` (planned-but-pending)
197
+ - **Triggers**: `rows[0].command` (auto, after the Step 1.5 ownership gate and Step 1.55 stale-entity guard pass, and the Step 1.6 planning gate falls through); Step 1.55 overrides to STOP (stale completed/cancelled entity); Step 1.6 overrides to `/cbp-checkpoint-plan` (unplanned) or `/cbp-checkpoint-start` (planned-but-pending)
@@ -12,8 +12,9 @@ Repo under test: `2ff6d405-39c5-47b8-a6d1-59f998ac0537`. Resolve a real `user_id
12
12
 
13
13
  ## Preconditions
14
14
 
15
- - `get_todos` is the only Step 1 read — confirm no `regenerate_todos_for_repo` call remains (`grep -n 'regenerate_todos_for_repo' SKILL.md` → no hits).
15
+ - `get_todos` is the only Step 1 read — confirm no `get_next_action` / `regenerate_todos_for_repo` call remains (`grep -n 'get_next_action\|regenerate_todos_for_repo' SKILL.md` → no hits).
16
16
  - Step 0 uses `resolve-worktree --json` and `whoami --json`.
17
+ - Step 1.55 (Stale-Entity Guard) must NOT call `get_next_action` or `regenerate_todos_for_repo` when rejecting a stale head — it reuses the checkpoint `status` + task statuses already loaded in Step 1.5 (the same grep above covers it).
17
18
 
18
19
  ## Scenario A — caller owns the work → auto-trigger
19
20
 
@@ -40,6 +41,12 @@ Repo under test: `2ff6d405-39c5-47b8-a6d1-59f998ac0537`. Resolve a real `user_id
40
41
  2. Step 3 fallback: `get_current_task({ repo_id, worktree_id })` / `get_checkpoints({ repo_id, worktree_id, status: 'active' })` surface a checkpoint whose `worktree_id` differs from the caller.
41
42
  3. **Expected**: the Step 1.5 ownership gate (applied to the fallback target) blocks the discovered work with the same "Work mismatch" message. NO auto-trigger. `regenerate_todos_for_repo` is never called.
42
43
 
44
+ ## Scenario D — stale queue head for a completed/cancelled entity → halt
45
+
46
+ 1. Caller owns the active checkpoint (Scenario A ownership holds), but the todo-worker queue is lagging — `health_check.todos_freshness_ok === false`. The `get_todos` head targets a checkpoint (or task) that has since been completed or cancelled.
47
+ 2. Step 1.5 loads the target checkpoint `status` + `get_tasks(checkpoint_id)` (already required for the ownership/planning gates — no extra reads).
48
+ 3. **Expected**: Step 1.55 rejects the auto-trigger — target checkpoint `status` is `completed`/`cancelled` (or every task is `completed`/`cancelled`) — surfaces the "Stale queue head" message naming the command + `CHK-NNN`[ `TASK-N`] + status, and STOPS. NO auto-trigger. NO `get_next_action` / `regenerate_todos_for_repo` call.
49
+
43
50
  ## Edge — both null
44
51
 
45
52
  Caller `WORKTREE_ID` empty AND target checkpoint `worktree_id` `null` → legitimate main-repo / unassigned work → auto-trigger allowed.