codebyplan 1.13.23 → 1.13.25
- package/dist/cli.js +445 -187
- package/package.json +2 -2
- package/templates/agents/cbp-cc-executor.md +7 -7
- package/templates/agents/cbp-improve-round.md +2 -2
- package/templates/agents/cbp-round-executor.md +20 -4
- package/templates/agents/cbp-testing-qa-agent.md +3 -3
- package/templates/hooks/README.md +1 -1
- package/templates/hooks/cbp-statusline.mjs +106 -11
- package/templates/hooks/cbp-statusline.py +79 -13
- package/templates/hooks/cbp-statusline.sh +97 -17
- package/templates/hooks/validate-structure-patterns.sh +1 -1
- package/templates/skills/cbp-checkpoint-check/SKILL.md +2 -2
- package/templates/skills/cbp-checkpoint-complete/SKILL.md +2 -2
- package/templates/skills/cbp-merge-main/SKILL.md +1 -1
- package/templates/skills/cbp-round-end/SKILL.md +12 -35
- package/templates/skills/cbp-round-end/reference/findings-presentation.md +76 -3
- package/templates/skills/cbp-round-execute/SKILL.md +13 -60
- package/templates/skills/cbp-round-start/SKILL.md +3 -1
- package/templates/skills/cbp-round-update/SKILL.md +1 -1
- package/templates/skills/cbp-session-start/SKILL.md +2 -0
- package/templates/skills/cbp-ship-configure/SKILL.md +1 -1
- package/templates/skills/cbp-ship-configure/reference/supabase.md +2 -2
- package/templates/skills/cbp-ship-main/SKILL.md +2 -0
- package/templates/skills/cbp-standalone-task-create/SKILL.md +1 -1
- package/templates/skills/cbp-task-check/SKILL.md +1 -1
- package/templates/skills/cbp-task-complete/SKILL.md +1 -1
- package/templates/skills/cbp-task-create/SKILL.md +50 -1
- package/templates/skills/cbp-task-start/SKILL.md +2 -2
- package/templates/skills/cbp-task-testing/SKILL.md +2 -2
- package/templates/skills/cbp-todo/SKILL.md +36 -3
- package/templates/skills/cbp-todo/qa-regression.md +8 -1
@@ -54,6 +54,8 @@ Pass `--dry-run` through if the skill was invoked with a dry-run arg.
 
 Parse JSON from Step 3. Report `pr_url`, `merge_commit`, `branch_deleted`. If `checks_failed: true`, surface `checks_failure_reason` and stop.
 
+> **gh false-negative workaround.** `codebyplan ship` can report `checks_failed: true` when the underlying `gh` query reads a stale/mismatched check field (it queries `conclusion`/`state` and the GitHub API can lag). Before treating the stop as final, verify the real status: `gh pr checks <PR> --watch`. If every required check is green, merge manually with `gh pr merge <PR> --merge` — add `--admin` ONLY to escape a transient secondary-rate-limit loop, never to bypass a genuinely-failing gate. Never auto-merge silently on a `checks_failed` report; this verification is a manual decision.
+
 If `bumps[]` is present with any non-skipped entry, surface a **Version bumps** line per package — `<name>: <currentVersion> → <nextVersion>` — so the user sees what this PR will publish on merge.
 
 If `branch_deleted === true`, run a conditional Supabase preview-branch teardown for the feat branch that was just merged:
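The reporting step in the hunk above (parse the ship JSON, stop on `checks_failed`, then list version bumps) can be sketched roughly as follows. This is only an illustration under assumptions: the key names `name`, `currentVersion`, and `nextVersion` are inferred from the placeholders in the text, and the `skipped` flag is a hypothetical name for whatever marks a skipped bump entry.

```python
import json


def summarize_ship_result(raw: str) -> list:
    """Turn the ship JSON into report lines; raise when checks are reported failed."""
    result = json.loads(raw)

    # Stop here: per the workaround note, a human verifies the real status
    # (e.g. with `gh pr checks <PR> --watch`) before any manual merge.
    if result.get("checks_failed") is True:
        raise RuntimeError(result.get("checks_failure_reason", "checks failed"))

    lines = [
        f"pr_url: {result.get('pr_url')}",
        f"merge_commit: {result.get('merge_commit')}",
        f"branch_deleted: {result.get('branch_deleted')}",
    ]

    for bump in result.get("bumps", []):
        # `skipped` is an assumed field name, not confirmed by the source.
        if bump.get("skipped"):
            continue
        lines.append(
            f"Version bump: {bump['name']}: "
            f"{bump['currentVersion']} → {bump['nextVersion']}"
        )
    return lines
```

Raising on `checks_failed` rather than returning a flag keeps the sketch from ever continuing to a merge path silently, which matches the "manual decision" wording in the note.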
@@ -17,7 +17,7 @@ Create a new standalone task — independent of any checkpoint. Gathers user con
 
 ## Identifier Notation
 
-Standalone tasks use a bare number (e.g. `45` = standalone TASK-45). There is no checkpoint segment. Canonical notation follows
+Standalone tasks use a bare number (e.g. `45` = standalone TASK-45). There is no checkpoint segment. Canonical notation follows `cbp-round-start` Step 0 "CHK / TASK / ROUND Identifier Notation Vocabulary".
 
 ## Instructions
 
@@ -35,7 +35,7 @@ Inline-fallback is NOT a quality downgrade trapdoor — every Phase from the age
 
 ### Step 1: Parse `$ARGUMENTS`
 
-Parse the argument using the canonical chk-task-round notation (see
+Parse the argument using the canonical chk-task-round notation (see `cbp-round-start` Step 0 "CHK / TASK / ROUND Identifier Notation Vocabulary"):
 
 | Shape | Regex | Resolves to |
 |-------|-------|-------------|
@@ -14,7 +14,7 @@ Complete the current task. Auto-triggered by `/cbp-task-testing` when all tests
 
 ### Step 1: Parse `$ARGUMENTS`
 
-Parse the argument using the canonical chk-task-round notation (see
+Parse the argument using the canonical chk-task-round notation (see `cbp-round-start` Step 0 "CHK / TASK / ROUND Identifier Notation Vocabulary"):
 
 | Shape | Regex | Resolves to |
 |-------|-------|-------------|
@@ -17,7 +17,7 @@ Create a new task within the active checkpoint. Gathers user context, analyzes e
 
 ## Identifier Notation
 
-This skill operates on the **active** checkpoint resolved via MCP `get_current_task` and does not accept a positional identifier argument. The task it creates gets its `number` from the next-available slot within the active checkpoint (checkpoint-bound). Canonical chk-task-round notation — used in prose, error messages, and cross-references — follows
+This skill operates on the **active** checkpoint resolved via MCP `get_current_task` and does not accept a positional identifier argument. The task it creates gets its `number` from the next-available slot within the active checkpoint (checkpoint-bound). Canonical chk-task-round notation — used in prose, error messages, and cross-references — follows `cbp-round-start` Step 0 "CHK / TASK / ROUND Identifier Notation Vocabulary": `108-1` (CHK-108 TASK-1), `108-1-2` (round 2 of CHK-108 TASK-1).
 
 **Bare-number argument**: if a bare number (e.g. `42`) is provided with no checkpoint context, this skill cannot resolve it to a checkpoint-bound task:
 
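The identifier shapes named in these hunks (`45` for a standalone task, `108-1` for CHK-108 TASK-1, `108-1-2` for round 2 of that task) could be parsed along the following lines. The actual regexes in the skills' "Shape | Regex | Resolves to" table are not visible in this diff, so the pattern, the `Identifier` type, and `parse_identifier` below are all hypothetical stand-ins.

```python
import re
from typing import NamedTuple, Optional


class Identifier(NamedTuple):
    checkpoint: Optional[int]    # None for a standalone (bare-number) task
    task: int
    round_number: Optional[int]  # None when no round segment is given


# Assumed pattern: either CHK-TASK[-ROUND] or a bare task number.
_PATTERN = re.compile(r"(?:(\d+)-(\d+)(?:-(\d+))?|(\d+))")


def parse_identifier(arg: str) -> Identifier:
    """Parse '45', '108-1', or '108-1-2' into its numeric segments."""
    match = _PATTERN.fullmatch(arg.strip())
    if match is None:
        raise ValueError(f"unrecognized identifier: {arg!r}")
    chk, task, rnd, bare = match.groups()
    if bare is not None:
        return Identifier(None, int(bare), None)
    return Identifier(int(chk), int(task), int(rnd) if rnd is not None else None)
```

A checkpoint-bound skill would reject any result whose `checkpoint` is `None`, which corresponds to the bare-number case the text says it cannot resolve.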
@@ -61,6 +61,55 @@ Use MCP `get_tasks` for the checkpoint. Review:
 - Task statuses (completed, in_progress, pending)
 - Dependencies between tasks
 
+### Step 3.5: Immediate Issue Capture Contract
+
+#### Default to Current Scope
+
+Discovered issues MUST be captured. The default target is current scope (round → task → checkpoint); standalone tasks are RARE.
+
+#### How to Capture (top-down routing — use the first row that fits)
+
+| Situation | Action |
+|-----------|--------|
+| Trivial inline fix (≤5 min, mechanical, scope-clean) | Apply in the CURRENT round per `cbp-round-end` reference `findings-presentation.md` "Trivial-Resolution Exception" |
+| Related to the current task's domain | Create a new ROUND in the current task |
+| Fits the current checkpoint goal but is meaningfully separate | Create a new TASK in the current checkpoint via `create_task(checkpoint_id)` |
+| Large enough to need multiple tasks AND fits no current checkpoint | Create a NEW CHECKPOINT via `create_checkpoint` |
+| Genuinely orphan work — off-axis from all active checkpoints AND user has explicitly confirmed standalone | Create a STANDALONE task via `create_task(repo_id)` — rare |
+| Timed re-check waiting on upstream fix | Standalone task with `context.re_check_date` |
+
+Standalone routing requires explicit user confirmation — silence is NOT confirmation.
+
+#### Consolidation Before Creation (MANDATORY)
+
+Before calling `create_task` for a finding, run a two-step dedup + bundle check:
+
+**Step 1 — Dedup against pending standalone tasks:**
+
+```
+mcp__codebyplan__get_tasks(repo_id, standalone=true, status="pending")
+```
+
+Compare the proposed task to each pending standalone task on these match dimensions:
+
+| Match dimension | Action if matched |
+|-----------------|-------------------|
+| Same target file(s) | STOP — `update_task` to append, do not create new |
+| Same feature / module | STOP — `update_task` to append, do not create new |
+| Same root cause (e.g. "prettier drift", "router bug") | STOP — `update_task` to append, do not create new |
+| Same dependency / advisory | STOP — `update_task` to append, do not create new |
+
+If a match is found, surface it to the user before appending:
+
+```
+Found existing pending task TASK-[N]: [title]
+This finding overlaps on [dimension]. Append to TASK-[N] instead of creating new? (yes / no — create separately)
+```
+
+Default to append. Only create a separate task if the user explicitly says no, OR if the existing task is in_progress / completed (in which case use `context.related_task_ids[]` on the new task to cross-reference).
+
+**Step 2 — Bundle within the same agent invocation:** If a single agent run surfaces 2+ findings that share any match dimension, MERGE them into a SINGLE `create_task` call. Do NOT loop and create one task per finding.
+
 ### Step 4: Analyze Codebase Context
 
 Brief inline analysis:
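The two-step consolidation check added in this hunk (dedup against pending standalone tasks, then bundle findings from one agent run) might look roughly like this. It is a sketch under assumptions: the `Finding` record and its fields are hypothetical, "same" is simplified to exact equality or file-set overlap, and no MCP call is made here.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Finding:
    """Hypothetical record for a finding or a pending task."""
    title: str
    files: frozenset = frozenset()
    module: str = ""
    root_cause: str = ""
    advisory: str = ""


def matched_dimension(a: Finding, b: Finding) -> Optional[str]:
    """Return the first match dimension shared by a and b, if any."""
    if a.files & b.files:
        return "target file(s)"
    if a.module and a.module == b.module:
        return "feature / module"
    if a.root_cause and a.root_cause == b.root_cause:
        return "root cause"
    if a.advisory and a.advisory == b.advisory:
        return "dependency / advisory"
    return None


def find_duplicate(
    proposed: Finding, pending: Sequence[Finding]
) -> Optional[Tuple[Finding, str]]:
    """Step 1: first pending task that overlaps, with the overlapping dimension."""
    for task in pending:
        dimension = matched_dimension(proposed, task)
        if dimension is not None:
            return task, dimension
    return None


def bundle(findings: Sequence[Finding]) -> List[List[Finding]]:
    """Step 2: group findings that share any dimension, so each group is one task."""
    groups: List[List[Finding]] = []
    for finding in findings:
        hits = [g for g in groups if any(matched_dimension(finding, m) for m in g)]
        rest = [g for g in groups if not any(g is h for h in hits)]
        merged = [m for g in hits for m in g] + [finding]
        groups = rest + [merged]
    return groups
```

A match from `find_duplicate` would be surfaced to the user before appending, as the hunk requires; the sketch only detects the overlap and does not decide on the user's behalf.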
@@ -15,7 +15,7 @@ Start a task by loading context from the database and preparing for work.
 
 ### Step 1: Parse `$ARGUMENTS`
 
-Parse the argument using the canonical chk-task-round notation (see
+Parse the argument using the canonical chk-task-round notation (see `cbp-round-start` Step 0 "CHK / TASK / ROUND Identifier Notation Vocabulary"):
 
 | Shape | Regex | Resolves to |
 |-------|-------|-------------|
@@ -158,7 +158,7 @@ Skip this step if the task title and requirements contain no CVE ID (`CVE-YYYY-N
 1. Run `pnpm audit --json` from the monorepo root. If it fails (network, registry), surface the error and stop — do NOT start a CVE task without a clean snapshot.
 2. Parse the advisory list from the JSON output.
 3. Call MCP `get_tasks(repo_id)`; for each advisory, match by ID in task title/requirements.
-4. For every advisory with no matching open task, call MCP `create_task` per `
+4. For every advisory with no matching open task, call MCP `create_task` per `cbp-task-create` Step 3.5 "Immediate Issue Capture Contract".
 5. Report the sweep result:
 ```
 ## CVE/GHSA Audit Sweep
@@ -25,7 +25,7 @@ Per-wave `testing-qa-agent` runs inside `/cbp-round-execute` Step 5. This skill
 
 ### Step 1: Parse `$ARGUMENTS`
 
-Parse the argument using the canonical chk-task-round notation (see
+Parse the argument using the canonical chk-task-round notation (see `cbp-round-start` Step 0 "CHK / TASK / ROUND Identifier Notation Vocabulary"):
 
 | Shape | Regex | Resolves to |
 |-------|-------|-------------|
@@ -170,7 +170,7 @@ For each finding, record: `{category, file, description, severity: 'low'|'medium
 
 Findings with severity `medium` or `high` feed the Step 9 problem classification. `low` findings are recorded in `task_testing_output` for the record but do not block.
 
-If any finding points to a need that exceeds task scope (e.g. a utility worth extracting for the wider codebase, a convention the repo should adopt globally), route per `
+If any finding points to a need that exceeds task scope (e.g. a utility worth extracting for the wider codebase, a convention the repo should adopt globally), route per `cbp-task-create` Step 3.5 "Immediate Issue Capture Contract — How to Capture" — default to a NEW TASK in the current checkpoint, not a standalone task. Standalone routing applies only when the finding is genuinely off-axis from every active checkpoint AND the user has confirmed standalone routing.
 
 ### Step 7: Separate Claude-Testable vs User-Testable
 
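The severity rule in this hunk (`medium` and `high` findings feed problem classification, `low` findings are only recorded) amounts to a simple partition. A minimal sketch, assuming each finding is a dict with a `severity` key as in the record shape quoted in the hunk header; the function name is hypothetical.

```python
from typing import Dict, List, Tuple

BLOCKING = ("medium", "high")


def split_by_severity(findings: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Return (blocking, recorded_only) findings, preserving input order."""
    blocking: List[Dict] = []
    recorded_only: List[Dict] = []
    for finding in findings:
        severity = finding["severity"]
        if severity in BLOCKING:
            blocking.append(finding)
        elif severity == "low":
            recorded_only.append(finding)
        else:
            # Unknown values are rejected rather than silently dropped.
            raise ValueError(f"unknown severity: {severity!r}")
    return blocking, recorded_only
```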
@@ -44,7 +44,7 @@ With `USER_ID` resolved, call MCP `get_todos({ repo_id, user_id, worktree_id })`
 
 - The head carries `command`, `instructions`, `state`, `metadata`, `worktree_id`, `checkpoint_id`, `task_id`.
 - The routing context (checkpoint/task) lives in **`rows[0].metadata`**.
-- `get_todos` is **pure-read** — `apps/todo-worker` is the sole regen authority. NEVER call `regenerate_todos_for_repo`.
+- `get_todos` is **pure-read** — `apps/todo-worker` is the sole regen authority. NEVER call `get_next_action` or `regenerate_todos_for_repo`.
 - Empty array, or `USER_ID` unavailable → go to Step 3 (empty-queue fallback).
 
 Queue `command` values may use the `/codebyplan:<name>` plugin-namespace form (e.g. `/codebyplan:round-start`); treat each as the matching `/cbp-<name>` skill for the Step 2 matrix.
@@ -76,6 +76,39 @@ Options:
 
 `<short-uuid>` = first 8 chars of the worktree UUID. Caller unresolved → render "this one" as `(unresolved / —)`. Name lookup miss → show the UUID alone. Wait for the user. NEVER propose reassigning the checkpoint to the caller worktree.
 
+### Step 1.55: Stale-Entity Guard
+
+Ownership passed (Step 1.5). Now guard against a lagging queue routing into already-finished work — the queue head (`rows[0]`) is produced by the todo-worker and can be stale (`health_check.todos_freshness_ok === false` is the signal), so a head command may target a checkpoint or task that was completed/cancelled since the queue was last regenerated. Reuse the `status` + task statuses already loaded in Step 1.5 — no extra reads, and NEVER call `get_next_action` or `regenerate_todos_for_repo` (an empty/stale queue is reconciled by the todo-worker, never re-derived here).
+
+Reject the auto-trigger when EITHER holds:
+
+- The target checkpoint's `status` is `completed` or `cancelled`.
+- Every task returned by `get_tasks(checkpoint_id)` (loaded in Step 1.5) has status `completed` or `cancelled` — no actionable task remains.
+
+On reject, surface the mismatch — naming the head command and the stale entity — then **STOP** (do not auto-trigger the head command). Use the variant matching the trigger condition:
+
+- Checkpoint itself `completed`/`cancelled`:
+
+```
+⚠ Stale queue head: <command> targets CHK-<NNN> which is <status>.
+The todo queue has not caught up. Options:
+A) /cbp-todo — re-resolve the queue
+B) /cbp-checkpoint-create — start new work
+C) /cbp-session-end
+```
+
+- Checkpoint still active but every task `completed`/`cancelled`:
+
+```
+⚠ Stale queue head: <command> targets CHK-<NNN> — all tasks are completed/cancelled, no actionable work remains.
+The todo queue has not caught up. Options:
+A) /cbp-todo — re-resolve the queue
+B) /cbp-checkpoint-create — start new work
+C) /cbp-session-end
+```
+
+Skip this gate when the routing target has no checkpoint (idle — see Step 3) or the command is `/cbp-session-start`.
+
 ### Step 1.6: Checkpoint Planning Gate
 
 Ownership passed (Step 1.5). Now gate on the checkpoint's planning + activation state — reusing the `plan` + `status` + task count loaded in Step 1.5 — so work never starts on a half-baked or un-activated checkpoint. Evaluate two rules in order (Rule A wins if both could match):
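The Step 1.55 stale-entity guard added in this hunk reduces to a pure decision over data already loaded in Step 1.5. The sketch below is illustrative only: `stale_reason` and its parameters are hypothetical names, and treating an empty task list as "not stale" is an assumption (the source does not say how a checkpoint with zero tasks should be handled by this gate).

```python
from typing import Optional, Sequence

TERMINAL = frozenset({"completed", "cancelled"})


def stale_reason(
    command: str,
    checkpoint_status: Optional[str],
    task_statuses: Sequence[str],
) -> Optional[str]:
    """Return why the queue head is stale, or None when routing may continue."""
    # Gate is skipped when there is no checkpoint target or on session start.
    if checkpoint_status is None or command == "/cbp-session-start":
        return None

    # Variant 1: the checkpoint itself is finished.
    if checkpoint_status in TERMINAL:
        return f"checkpoint is {checkpoint_status}"

    # Variant 2: checkpoint still active, but no actionable task remains.
    # Assumption: an empty task list does not count as stale.
    if task_statuses and all(status in TERMINAL for status in task_statuses):
        return "all tasks are completed/cancelled"

    return None
```

Because the function only inspects values passed in, it cannot trigger the reads or regeneration calls the guard forbids; on a non-`None` result the caller would print the matching "Stale queue head" message and stop.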
@@ -154,11 +187,11 @@ Wait for the user. Do not auto-trigger `/cbp-session-end` — session wrap-up is
 
 ### Step 4: Display and Auto-trigger
 
-Reached only when the Step 1.5 ownership gate allowed routing to continue and the Step 1.6 planning gate fell through (no hand-off). Show `rows[0].instructions` (so the user sees what is happening), then auto-trigger `rows[0].command` (its `/cbp-<name>` form).
+Reached only when the Step 1.5 ownership gate allowed routing to continue, the Step 1.55 stale-entity guard did not reject, and the Step 1.6 planning gate fell through (no hand-off). Show `rows[0].instructions` (so the user sees what is happening), then auto-trigger `rows[0].command` (its `/cbp-<name>` form).
 
 ## Integration
 
 - **Called by**: `/cbp-session-start`, `/cbp-task-complete`, `/cbp-checkpoint-complete`, manual, after `/clear`
 - **Resolves**: `npx codebyplan resolve-worktree --json` (worktree id + distress signal), `npx codebyplan whoami --json` (user id)
 - **Reads**: MCP `get_todos`, `get_current_task`, `get_rounds`, `get_checkpoints`, `get_tasks`, `get_worktrees`
-- **Triggers**: `rows[0].command` (auto, after the Step 1.5 ownership gate); Step 1.6 overrides to `/cbp-checkpoint-plan` (unplanned) or `/cbp-checkpoint-start` (planned-but-pending)
+- **Triggers**: `rows[0].command` (auto, after the Step 1.5 ownership gate and Step 1.55 stale-entity guard pass, and the Step 1.6 planning gate falls through); Step 1.55 overrides to STOP (stale completed/cancelled entity); Step 1.6 overrides to `/cbp-checkpoint-plan` (unplanned) or `/cbp-checkpoint-start` (planned-but-pending)
@@ -12,8 +12,9 @@ Repo under test: `2ff6d405-39c5-47b8-a6d1-59f998ac0537`. Resolve a real `user_id
 
 ## Preconditions
 
-- `get_todos` is the only Step 1 read — confirm no `regenerate_todos_for_repo` call remains (`grep -n 'regenerate_todos_for_repo' SKILL.md` → no hits).
+- `get_todos` is the only Step 1 read — confirm no `get_next_action` / `regenerate_todos_for_repo` call remains (`grep -n 'get_next_action\|regenerate_todos_for_repo' SKILL.md` → no hits).
 - Step 0 uses `resolve-worktree --json` and `whoami --json`.
+- Step 1.55 (Stale-Entity Guard) must NOT call `get_next_action` or `regenerate_todos_for_repo` when rejecting a stale head — it reuses the checkpoint `status` + task statuses already loaded in Step 1.5 (the same grep above covers it).
 
 ## Scenario A — caller owns the work → auto-trigger
 
@@ -40,6 +41,12 @@ Repo under test: `2ff6d405-39c5-47b8-a6d1-59f998ac0537`. Resolve a real `user_id
 2. Step 3 fallback: `get_current_task({ repo_id, worktree_id })` / `get_checkpoints({ repo_id, worktree_id, status: 'active' })` surface a checkpoint whose `worktree_id` differs from the caller.
 3. **Expected**: the Step 1.5 ownership gate (applied to the fallback target) blocks the discovered work with the same "Work mismatch" message. NO auto-trigger. `regenerate_todos_for_repo` is never called.
 
+## Scenario D — stale queue head for a completed/cancelled entity → halt
+
+1. Caller owns the active checkpoint (Scenario A ownership holds), but the todo-worker queue is lagging — `health_check.todos_freshness_ok === false`. The `get_todos` head targets a checkpoint (or task) that has since been completed or cancelled.
+2. Step 1.5 loads the target checkpoint `status` + `get_tasks(checkpoint_id)` (already required for the ownership/planning gates — no extra reads).
+3. **Expected**: Step 1.55 rejects the auto-trigger — target checkpoint `status` is `completed`/`cancelled` (or every task is `completed`/`cancelled`) — surfaces the "Stale queue head" message naming the command + `CHK-NNN`[ `TASK-N`] + status, and STOPS. NO auto-trigger. NO `get_next_action` / `regenerate_todos_for_repo` call.
+
 ## Edge — both null
 
 Caller `WORKTREE_ID` empty AND target checkpoint `worktree_id` `null` → legitimate main-repo / unassigned work → auto-trigger allowed.