create-claude-cabinet 0.32.0 → 0.33.0
- package/lib/cli.js +1 -1
- package/package.json +1 -1
- package/templates/README.md +1 -1
- package/templates/cabinet/checkpoint-protocol.md +48 -0
- package/templates/hooks/action-completion-gate.sh +1 -1
- package/templates/skills/cc-upgrade/SKILL.md +6 -0
- package/templates/skills/cc-upgrade/phases/execute-group-workflow-split-detect.md +74 -0
- package/templates/skills/execute-group/SKILL.md +184 -97
- package/templates/workflows/execute-group-complete.js +303 -0
- package/templates/workflows/execute-group-implement.js +243 -0
- package/templates/workflows/execute-group.js +0 -506
package/lib/cli.js
CHANGED
|
@@ -485,7 +485,7 @@ const MODULES = {
|
|
|
485
485
|
mandatory: false,
|
|
486
486
|
default: true,
|
|
487
487
|
lean: true,
|
|
488
|
-
templates: ['skills/plan', 'skills/execute', 'skills/generate-plan-groups', 'skills/execute-group', 'workflows/execute-group.js', 'skills/investigate', 'cabinet/checkpoint-protocol.md'],
|
|
488
|
+
templates: ['skills/plan', 'skills/execute', 'skills/generate-plan-groups', 'skills/execute-group', 'workflows/execute-group-implement.js', 'workflows/execute-group-complete.js', 'skills/investigate', 'cabinet/checkpoint-protocol.md'],
|
|
489
489
|
},
|
|
490
490
|
'compliance': {
|
|
491
491
|
name: 'Compliance Stack (rules + enforcement)',
|
package/package.json
CHANGED
package/templates/README.md
CHANGED
|
@@ -40,7 +40,7 @@ templates, see [EXTENSIONS.md](EXTENSIONS.md).
|
|
|
40
40
|
| `skills/debrief-quick/` | Quick debrief variant — core phases only, skip presentation. |
|
|
41
41
|
| `skills/execute/` | Execute a plan with cabinet member checkpoints. 3-checkpoint protocol (pre-implementation, per-file-group, pre-commit). 5 phase files. |
|
|
42
42
|
| `skills/generate-plan-groups/` | Scheduler: find plans with surface-area declarations, build a conflict graph, persist conflict-free parallel groups as pib-db `grp:` tags. Does not execute — hands each group to /execute-group. |
|
|
43
|
-
| `skills/execute-group/` | Runner: execute one generated group
|
|
43
|
+
| `skills/execute-group/` | Runner: execute one generated group as a 3-stage pipeline — interactive cabinet pre-review (CP1) the operator decides on, then the `execute-group-implement.js` workflow (parallel worktree implementation + sequential merge) and the `execute-group-complete.js` workflow (advisory review + integration + completion report), with operator checkpoints between stages. |
|
|
44
44
|
| `skills/cc-extract/` | Analyze project artifacts and propose upstream extraction candidates for Claude Cabinet. |
|
|
45
45
|
| `skills/investigate/` | Structured codebase exploration: frame, observe, hypothesize, test, conclude. |
|
|
46
46
|
| `skills/cc-link/` | Set up local development linking for Claude Cabinet source repo work. |
|
package/templates/cabinet/checkpoint-protocol.md
CHANGED
|
@@ -27,6 +27,36 @@ set of conflict-free plans implemented concurrently in separate worktrees,
|
|
|
27
27
|
then merged together. `/execute` never exercises that scope — it runs one
|
|
28
28
|
plan at a time and uses only the first three.
|
|
29
29
|
|
|
30
|
+
## Checkpoint modes — who acts on the verdict
|
|
31
|
+
|
|
32
|
+
The scope says *what* is reviewed. The **mode** says *what happens to the
|
|
33
|
+
verdict*. This distinction is load-bearing: an autonomous gate that reverts
|
|
34
|
+
or halts on a false-positive `stop` is fragile and expensive. The default for
|
|
35
|
+
high-stakes reviews is to put judgment in front of the operator.
|
|
36
|
+
|
|
37
|
+
| Mode | Where it runs | What a `stop`/`pause` does | Used by |
|
|
38
|
+
|------|---------------|----------------------------|---------|
|
|
39
|
+
| **Interactive CP** | Main session (skill level) | Surfaced to the operator, who decides (proceed / drop / override / abort). Never automatic. | `/execute-group` CP1 |
|
|
40
|
+
| **Advisory CP** | Workflow | Recorded in the Completion Report as a concern. Never halts or reverts. The only automatic gate alongside it is `/validate`. | `/execute-group` CP3 |
|
|
41
|
+
| **Full CP** | Main session or workflow | Halts on `stop`, escalates 3+ `pause` to a halt, requires explicit override. The classic gate. | `/execute` CP1/CP2/CP3 |
|
|
42
|
+
|
|
43
|
+
**Why Interactive and Advisory exist.** `/execute-group` once ran CP1 and CP3
|
|
44
|
+
as autonomous gates inside a single workflow: a cabinet `stop` halted the run
|
|
45
|
+
or reverted a merge with no human in the loop. False positives there cost real
|
|
46
|
+
money (a CP1 halted twice consecutively — 1.6M+ tokens — on concerns the plan
|
|
47
|
+
text already addressed). Moving CP1 to interactive (operator decides) and CP3
|
|
48
|
+
to advisory (concerns recorded, `/validate` is the only hard gate) keeps the
|
|
49
|
+
review signal while removing the destructive autonomous action.
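The three modes in the table above can be read as a small dispatch on what happens to a set of verdicts. The following is a minimal sketch, not the package's implementation: the function name `resolve_checkpoint` and the returned action strings are hypothetical, and the excerpt does not say what one or two `pause` verdicts do under Full CP, so the sketch simply treats them as non-halting.

```python
from typing import Dict, List


def resolve_checkpoint(mode: str, verdicts: List[Dict[str, str]]) -> str:
    """Map a checkpoint mode plus agent verdicts to an action (hypothetical names)."""
    if mode == "interactive":
        # Interactive CP: always surfaced to the operator, never automatic.
        return "ask_operator"
    if mode == "advisory":
        # Advisory CP: concerns are recorded, nothing halts or reverts.
        return "record_in_report"
    if mode == "full":
        stops = sum(1 for v in verdicts if v.get("verdict") == "stop")
        pauses = sum(1 for v in verdicts if v.get("verdict") == "pause")
        # Full CP: any stop halts, and three or more pauses escalate to a halt.
        if stops >= 1 or pauses >= 3:
            return "halt"
        # Assumption: fewer than three pauses does not halt in this sketch.
        return "proceed"
    raise ValueError(f"unknown checkpoint mode: {mode}")
```

The point of the shape is that only the `full` branch ever inspects the verdict values; the other two modes route the same verdicts to a person or a report.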
|
|
50
|
+
|
|
51
|
+
### Interactive CP adds a required `addressed_by_plan` field
|
|
52
|
+
|
|
53
|
+
At Interactive CP (`/execute-group` CP1, `pre-impl` scope), each agent's
|
|
54
|
+
verdict carries one extra **required** field, `addressed_by_plan` — the list
|
|
55
|
+
of risks the plan already handles. The agent must enumerate these *first*,
|
|
56
|
+
before raising any concern. This forces the plan-first discipline structurally:
|
|
57
|
+
a risk listed in `addressed_by_plan` cannot also be raised as a concern. It is
|
|
58
|
+
the direct fix for the false-positive halts.
|
|
59
|
+
|
|
30
60
|
## Step 1 — Select which members to spawn
|
|
31
61
|
|
|
32
62
|
Spawn one Agent per cabinet member that matches **either**:
|
|
@@ -107,8 +137,26 @@ Each agent returns exactly this shape:
|
|
|
107
137
|
}
|
|
108
138
|
```
|
|
109
139
|
|
|
140
|
+
At **Interactive CP** (`/execute-group` CP1), add the required
|
|
141
|
+
`addressed_by_plan` array described above:
|
|
142
|
+
|
|
143
|
+
```json
|
|
144
|
+
{
|
|
145
|
+
"cabinet_member": "name",
|
|
146
|
+
"addressed_by_plan": ["risks the plan already handles"],
|
|
147
|
+
"verdict": "continue" | "pause" | "stop",
|
|
148
|
+
"concerns": [ ... ]
|
|
149
|
+
}
|
|
150
|
+
```
|
|
151
|
+
|
|
110
152
|
## Step 4 — Apply escalation
|
|
111
153
|
|
|
154
|
+
The escalation below is **Full CP** behavior (used by `/execute`). For
|
|
155
|
+
**Interactive CP** the verdicts are surfaced to the operator severity-first
|
|
156
|
+
and the operator decides — no automatic halt. For **Advisory CP** the concerns
|
|
157
|
+
are recorded in the Completion Report and nothing halts or reverts; `/validate`
|
|
158
|
+
is the only automatic gate. See "Checkpoint modes" above.
|
|
159
|
+
|
|
112
160
|
Collect every verdict, then:
|
|
113
161
|
|
|
114
162
|
- **Any `stop`** → halt. Show the concern. Require an explicit override
|
package/templates/hooks/action-completion-gate.sh
CHANGED
|
@@ -98,7 +98,7 @@ try:
|
|
|
98
98
|
cp3g = cks.get('cp3_group', '')
|
|
99
99
|
if me is None: print('NOT_IN_REPORT')
|
|
100
100
|
elif me.get('status') != 'merged': print('plan-status=' + str(me.get('status')))
|
|
101
|
-
elif cp3g not in ('continue', 'skipped', 'n/a'): print('cp3_group=' + str(cp3g))
|
|
101
|
+
elif cp3g not in ('continue', 'skipped', 'n/a'): print('cp3_group=' + str(cp3g)) # n/a: backward-compat with pre-v0.32 reports
|
|
102
102
|
elif integ.get('validate') != 'pass': print('integration.validate=' + str(integ.get('validate')))
|
|
103
103
|
elif integ.get('breadcrumbs') != 'valid': print('integration.breadcrumbs=' + str(integ.get('breadcrumbs')))
|
|
104
104
|
else: print('OK')
|
package/templates/skills/cc-upgrade/SKILL.md
CHANGED
|
@@ -280,6 +280,12 @@ orphans conversationally:
|
|
|
280
280
|
- **`execute-plans/` → `generate-plan-groups/` + `execute-group/`:** if
|
|
281
281
|
`.claude/skills/execute-plans/` exists, run
|
|
282
282
|
`phases/execute-plans-rename-detect.md`.
|
|
283
|
+
- **`execute-group.js` → `execute-group-implement.js` + `execute-group-complete.js`:**
|
|
284
|
+
if `.claude/workflows/execute-group.js` exists, run
|
|
285
|
+
`phases/execute-group-workflow-split-detect.md`. That phase removes the
|
|
286
|
+
orphaned monolithic workflow once both replacement workflow scripts are
|
|
287
|
+
present (the cleanup loop won't, since the file was deleted upstream rather
|
|
288
|
+
than renamed).
|
|
283
289
|
- **`handoff*` → `engagement*` (+ `.claude/handoff/` infra → `.claude/engagement/`):**
|
|
284
290
|
if any of `.claude/skills/handoff*` or `.claude/handoff/` exists, run
|
|
285
291
|
`phases/handoff-rename-detect.md`. That phase removes the orphaned skill
|
package/templates/skills/cc-upgrade/phases/execute-group-workflow-split-detect.md
ADDED
|
@@ -0,0 +1,74 @@
|
|
|
1
|
+
# execute-group.js workflow split detection
|
|
2
|
+
|
|
3
|
+
In the execute-group redesign, the monolithic `execute-group.js` workflow
|
|
4
|
+
(one script running CP1 → implement → merge with per-plan CP3 → integration →
|
|
5
|
+
group CP3 → completion as autonomous gates) was split into two focused
|
|
6
|
+
workflow scripts plus a skill-level interactive checkpoint:
|
|
7
|
+
|
|
8
|
+
- **`execute-group-implement.js`** — mechanical parallel implementation +
|
|
9
|
+
sequential merge (no cabinet review).
|
|
10
|
+
- **`execute-group-complete.js`** — advisory CP3 + integration + completion
|
|
11
|
+
report.
|
|
12
|
+
- **Interactive CP1** moved into the `/execute-group` SKILL.md (the operator
|
|
13
|
+
decides; it is no longer an autonomous gate).
|
|
14
|
+
|
|
15
|
+
The installer copies the two new workflow files, but the old
|
|
16
|
+
`.claude/workflows/execute-group.js` is **not** removed by the cleanup loop:
|
|
17
|
+
that loop only deletes files still mapping to a current CC template, and
|
|
18
|
+
`execute-group.js` no longer maps to one (it was deleted upstream, not
|
|
19
|
+
renamed). So after an upgrade a project that had it ends up with the stale
|
|
20
|
+
`execute-group.js` sitting next to the two new scripts.
|
|
21
|
+
|
|
22
|
+
This phase detects and removes that orphan.
|
|
23
|
+
|
|
24
|
+
## When this phase runs
|
|
25
|
+
|
|
26
|
+
Only when the orphan workflow file is actually on disk:
|
|
27
|
+
|
|
28
|
+
```bash
|
|
29
|
+
test -f .claude/workflows/execute-group.js && echo "HAS_ORPHAN=1"
|
|
30
|
+
```
|
|
31
|
+
|
|
32
|
+
If it's absent, skip silently — nothing to clean up.
|
|
33
|
+
|
|
34
|
+
## What to do
|
|
35
|
+
|
|
36
|
+
When the orphan is present, explain the split in plain terms:
|
|
37
|
+
|
|
38
|
+
> The `/execute-group` workflow was split. The old single
|
|
39
|
+
> `execute-group.js` script (which ran cabinet review as autonomous
|
|
40
|
+
> halt/revert gates) is replaced by:
|
|
41
|
+
> - **`execute-group-implement.js`** — mechanical implementation + merge
|
|
42
|
+
> - **`execute-group-complete.js`** — advisory review + completion
|
|
43
|
+
> - an **interactive CP1** that now lives in the `/execute-group` skill, so
|
|
44
|
+
> *you* decide on pre-implementation concerns instead of a gate halting
|
|
45
|
+
> automatically.
|
|
46
|
+
>
|
|
47
|
+
> Both new workflow scripts are installed. The old `execute-group.js` is
|
|
48
|
+
> left over from before the split and should be removed.
|
|
49
|
+
|
|
50
|
+
The orphan is only safe to remove once **both** replacement workflows are
|
|
51
|
+
present (otherwise removing it would strand `/execute-group` with no
|
|
52
|
+
orchestrator):
|
|
53
|
+
|
|
54
|
+
```bash
|
|
55
|
+
if [ -f .claude/workflows/execute-group-implement.js ] && \
|
|
56
|
+
[ -f .claude/workflows/execute-group-complete.js ]; then
|
|
57
|
+
rm -f .claude/workflows/execute-group.js
|
|
58
|
+
echo "Removed orphaned .claude/workflows/execute-group.js"
|
|
59
|
+
else
|
|
60
|
+
echo "WARN: replacement workflows not both present — leaving execute-group.js in place"
|
|
61
|
+
fi
|
|
62
|
+
```
|
|
63
|
+
|
|
64
|
+
If either replacement is missing, leave the old script in place and tell the
|
|
65
|
+
user the upgrade didn't fully install the new workflows (re-run the
|
|
66
|
+
installer). A working monolith beats a half-removed split.
|
|
67
|
+
|
|
68
|
+
## Note on in-flight runs
|
|
69
|
+
|
|
70
|
+
Completion Reports already written to `.claude/verification/group-*-report.json`
|
|
71
|
+
by the old workflow remain valid — the completion gate reads the same
|
|
72
|
+
`per_plan[].status`, `checkpoints.cp3_group`, and `checkpoints.integration`
|
|
73
|
+
fields, which the new `execute-group-complete.js` preserves. No report
|
|
74
|
+
migration is needed.
|
package/templates/skills/execute-group/SKILL.md
CHANGED
|
@@ -2,12 +2,12 @@
|
|
|
2
2
|
name: execute-group
|
|
3
3
|
description: |
|
|
4
4
|
Run one parallel plan group produced by /generate-plan-groups. Validates
|
|
5
|
-
the group hasn't drifted,
|
|
6
|
-
|
|
7
|
-
|
|
8
|
-
|
|
5
|
+
the group hasn't drifted, runs an interactive cabinet pre-review (CP1) you
|
|
6
|
+
decide on, then drives a two-workflow pipeline: mechanical parallel
|
|
7
|
+
implementation + merge, an operator checkpoint, then advisory review +
|
|
8
|
+
completion. Use when: "execute group", "run group", "/execute-group".
|
|
9
9
|
disable-model-invocation: true
|
|
10
|
-
argument-hint: "group label — e.g., '2026-05-30-1'"
|
|
10
|
+
argument-hint: "group label — e.g., '2026-05-30-1' (append --advisory to skip the CP1 pause)"
|
|
11
11
|
related:
|
|
12
12
|
- type: skill
|
|
13
13
|
name: generate-plan-groups
|
|
@@ -15,10 +15,13 @@ related:
|
|
|
15
15
|
name: execute
|
|
16
16
|
- type: file
|
|
17
17
|
path: .claude/cabinet/checkpoint-protocol.md
|
|
18
|
-
role: "The cabinet checkpoint mechanism —
|
|
18
|
+
role: "The cabinet checkpoint mechanism — CP1 (interactive) and CP3 (advisory) both read it"
|
|
19
19
|
- type: file
|
|
20
|
-
path: .claude/workflows/execute-group.js
|
|
21
|
-
role: "
|
|
20
|
+
path: .claude/workflows/execute-group-implement.js
|
|
21
|
+
role: "Stage 2 — mechanical parallel implementation + sequential merge"
|
|
22
|
+
- type: file
|
|
23
|
+
path: .claude/workflows/execute-group-complete.js
|
|
24
|
+
role: "Stage 3 — advisory CP3 + integration + completion report"
|
|
22
25
|
---
|
|
23
26
|
|
|
24
27
|
# /execute-group — Run a Generated Parallel Plan Group
|
|
@@ -26,51 +29,64 @@ related:
|
|
|
26
29
|
## Purpose
|
|
27
30
|
|
|
28
31
|
`/generate-plan-groups` decides *what can run in parallel* and persists each
|
|
29
|
-
conflict-free group as pib-db `grp:` tags. This skill *runs one group
|
|
30
|
-
|
|
31
|
-
|
|
32
|
-
|
|
33
|
-
|
|
34
|
-
**
|
|
35
|
-
|
|
36
|
-
|
|
37
|
-
|
|
38
|
-
|
|
39
|
-
|
|
40
|
-
|
|
32
|
+
conflict-free group as pib-db `grp:` tags. This skill *runs one group* as a
|
|
33
|
+
**three-stage pipeline with operator checkpoints between stages**:
|
|
34
|
+
|
|
35
|
+
1. **Interactive CP1** (this skill, main session) — cabinet members pre-review
|
|
36
|
+
the plans; *you* decide whether to proceed, drop plans, or pass overrides.
|
|
37
|
+
2. **Implementation workflow** (`execute-group-implement.js`) — parallel
|
|
38
|
+
worktree implementation + sequential merge. Purely mechanical, no review.
|
|
39
|
+
3. **Review + completion workflow** (`execute-group-complete.js`) — advisory
|
|
40
|
+
cabinet review of the merged diff, integration check, and completion report.
|
|
41
|
+
|
|
42
|
+
**Why this shape (and not one monolithic workflow):** an earlier design ran
|
|
43
|
+
CP1 and CP3 *inside* a single workflow as autonomous gates — a cabinet `stop`
|
|
44
|
+
halted the run or reverted a merge automatically. False positives in those
|
|
45
|
+
gates produced expensive halts (field evidence: a CP1 halted twice in a row,
|
|
46
|
+
1.6M+ tokens, on concerns the plan text already addressed). The fix:
|
|
47
|
+
**judgment belongs to the operator, automation stays mechanical.** CP1 is now
|
|
48
|
+
interactive (you see the findings and decide). CP3 is advisory (concerns are
|
|
49
|
+
recorded, never auto-halt/revert). The only hard, automatic gate is
|
|
50
|
+
`/validate` — a deterministic build check, not a judgment call.
|
|
51
|
+
|
|
52
|
+
**Why two workflows instead of direct Agent-tool spawning:** worktree agents
|
|
53
|
+
cannot spawn sub-agents (no Agent-tool access — empirically verified), so a
|
|
54
|
+
worktree agent cannot run its own checkpoints. CP1 escapes this by running at
|
|
55
|
+
the skill level (main session, where the Agent tool *is* available). Stages 2
|
|
56
|
+
and 3 are workflows because they need to spawn worktree/merge/review agents as
|
|
57
|
+
first-class parallel participants.
|
|
41
58
|
|
|
42
59
|
## Prerequisites
|
|
43
60
|
|
|
44
61
|
- The group must have been produced by `/generate-plan-groups` (its plans
|
|
45
62
|
carry `grp:<label>`, `grp-generated:`, and `grp-hash:` tags).
|
|
46
63
|
- Plans must still have `## Surface Area` sections in their notes.
|
|
47
|
-
- The Workflow tool must be available (
|
|
64
|
+
- The Workflow tool must be available (Stages 2 and 3 run as workflows).
|
|
48
65
|
|
|
49
66
|
## Honest ceiling — read before relying on this
|
|
50
67
|
|
|
51
|
-
|
|
52
|
-
|
|
53
|
-
|
|
54
|
-
|
|
55
|
-
|
|
56
|
-
|
|
57
|
-
`/
|
|
58
|
-
|
|
68
|
+
- **No mid-implementation (CP2) review.** Worktree agents implement without a
|
|
69
|
+
reviewer watching. CP1 reviews before, CP3 reviews after. For a plan whose
|
|
70
|
+
diff is large or touches high-risk surface, run `/execute <plan>`
|
|
71
|
+
individually instead — full `/execute` has the per-file-group checkpoint
|
|
72
|
+
this path sacrifices for parallelism.
|
|
73
|
+
- **CP3 is advisory.** It surfaces concerns; it does not block completion.
|
|
74
|
+
Only `/validate` blocks. A real problem invisible to `/validate` (e.g. a
|
|
75
|
+
subtle behavioral regression) will land on main with an advisory note, not
|
|
76
|
+
a revert. The operator must read the CP3 concerns in the report.
|
|
59
77
|
- **Surface area is intent, not reality.** Under-declared surface area can
|
|
60
|
-
hide a semantic conflict the conflict graph missed
|
|
61
|
-
- **Feature-file "affect" is heuristic.** Behavioral coupling not textually
|
|
62
|
-
referenced may be missed.
|
|
78
|
+
hide a semantic conflict the conflict graph missed.
|
|
63
79
|
|
|
64
80
|
## Workflow
|
|
65
81
|
|
|
66
|
-
### Step 1 — Staleness guard (
|
|
82
|
+
### Step 1 — Staleness guard (BEFORE anything else)
|
|
67
83
|
|
|
68
|
-
The persisted group is a hint, not a contract. Re-validate
|
|
69
|
-
|
|
84
|
+
The persisted group is a hint, not a contract. Re-validate against *current*
|
|
85
|
+
state:
|
|
70
86
|
|
|
71
87
|
1. **Fetch the group's plans.** Query actions whose `tags` contain
|
|
72
|
-
`grp:<label>` (the argument). Use `pib_query`
|
|
73
|
-
query`):
|
|
88
|
+
`grp:<label>` (the argument, minus any `--advisory` flag). Use `pib_query`
|
|
89
|
+
(or `node scripts/pib-db.mjs query`):
|
|
74
90
|
```sql
|
|
75
91
|
SELECT a.fid, a.text, a.notes, a.tags
|
|
76
92
|
FROM actions a
|
|
@@ -81,11 +97,10 @@ The persisted group is a hint, not a contract. Re-validate it against the
|
|
|
81
97
|
2. **Drop plans that are no longer open or lost their surface area.** Report
|
|
82
98
|
each dropped plan and why.
|
|
83
99
|
|
|
84
|
-
3. **Recompute the surface-area hash and compare.** Recompute it **exactly
|
|
85
|
-
|
|
86
|
-
|
|
87
|
-
|
|
88
|
-
plans.
|
|
100
|
+
3. **Recompute the surface-area hash and compare.** Recompute it **exactly as
|
|
101
|
+
`/generate-plan-groups` did**: for every still-open plan in the group, parse
|
|
102
|
+
its `## Surface Area` file/dir list, concatenate all entries across the
|
|
103
|
+
group, sort, and hash. Compare to the `grp-hash:` token stored on the plans.
|
|
89
104
|
- **Hash matches** → the group is current. Proceed.
|
|
90
105
|
- **Hash differs** → a plan's surface area changed since grouping. **HALT:**
|
|
91
106
|
> Group `<label>` has drifted since it was generated (surface areas
|
|
@@ -94,90 +109,162 @@ The persisted group is a hint, not a contract. Re-validate it against the
|
|
|
94
109
|
Do not run a stale group — the conflict-free guarantee no longer holds.
|
|
95
110
|
|
|
96
111
|
4. **Edge cases:**
|
|
97
|
-
- **0 plans survive**
|
|
98
|
-
|
|
99
|
-
- **1 plan survives** → you may still
|
|
100
|
-
group-level
|
|
112
|
+
- **0 plans survive** → tell the user the group is empty (all
|
|
113
|
+
drifted/closed) and stop. Do not launch any workflow.
|
|
114
|
+
- **1 plan survives** → you may still run it (Stages 2/3 skip the
|
|
115
|
+
group-level aggregate review for a single plan), or suggest
|
|
101
116
|
`/execute <plan>` directly. Single-plan groups gain nothing from the
|
|
102
117
|
parallel machinery.
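The hash recomputation in item 3 above (parse each plan's `## Surface Area` list, concatenate, sort, hash) can be sketched as follows. This is an illustration under stated assumptions: the skill text does not name the hash algorithm, the entry separator, or the exact list syntax, so SHA-256, a newline join, and `- ` / `* ` bullets are all guesses, and both function names are hypothetical. A real recomputation has to match whatever `/generate-plan-groups` actually does.

```python
import hashlib
from typing import List


def parse_surface_area(notes: str) -> List[str]:
    """Collect bullet entries found under a '## Surface Area' heading."""
    entries = []
    in_section = False
    for line in notes.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            # Any level-2 heading opens or closes the section.
            in_section = stripped.lower() == "## surface area"
            continue
        if in_section and stripped.startswith(("- ", "* ")):
            # Drop the bullet marker and any surrounding backticks.
            entries.append(stripped[2:].strip().strip("`"))
    return entries


def group_surface_hash(plan_notes: List[str]) -> str:
    """Hash the sorted union of entries across all still-open plans."""
    entries: List[str] = []
    for notes in plan_notes:
        entries.extend(parse_surface_area(notes))
    joined = "\n".join(sorted(entries))  # assumption: newline-joined, sorted
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()  # assumption: SHA-256
```

Sorting before hashing is what makes the result independent of plan order, so the comparison against the stored `grp-hash:` token only fails when an entry was added, removed, or changed.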
|
|
103
118
|
|
|
104
119
|
### Step 2 — Select cabinet members
|
|
105
120
|
|
|
106
|
-
Select the cabinet members
|
|
121
|
+
Select the cabinet members CP1 and CP3 will use. From
|
|
107
122
|
`.claude/skills/_index.json`: members whose `standingMandate` includes
|
|
108
123
|
`execute`, plus any whose file patterns match the group's aggregate surface
|
|
109
|
-
area. For each, collect `{ key, agentType, path, directive }` (
|
|
110
|
-
|
|
111
|
-
|
|
112
|
-
|
|
113
|
-
|
|
124
|
+
area. For each, collect `{ key, agentType, path, directive }` (`agentType` is
|
|
125
|
+
the registered `cabinet-<name>` subagent; `directive` is `directives.execute`
|
|
126
|
+
if present).
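The selection rule above (a standing `execute` mandate, or a file-pattern match against the group's aggregate surface area) can be sketched like this. The layout of `.claude/skills/_index.json` is not shown in this diff, so the sketch assumes a mapping from member key to a record with `standingMandate`, `path`, and `directives`; the `filePatterns` field name and the function name `select_members` are hypothetical, and shell-style matching via `fnmatch` is a stand-in for whatever matching the project uses.

```python
from fnmatch import fnmatch
from typing import Dict, List


def select_members(index: Dict[str, dict], surface_paths: List[str]) -> List[dict]:
    """Pick members with an 'execute' mandate or a matching file pattern."""
    selected = []
    for key, member in index.items():
        has_mandate = "execute" in member.get("standingMandate", [])
        patterns = member.get("filePatterns", [])  # hypothetical field name
        matches_surface = any(
            fnmatch(path, pattern) for path in surface_paths for pattern in patterns
        )
        if has_mandate or matches_surface:
            selected.append({
                "key": key,
                # Assumption: the subagent is registered as cabinet-<key>.
                "agentType": member.get("agentType", f"cabinet-{key}"),
                "path": member.get("path"),
                # directives.execute is optional, so this may be None.
                "directive": member.get("directives", {}).get("execute"),
            })
    return selected
```

An empty result corresponds to the "no cabinet members" case, where CP1 is skipped and the user is told the run proceeds without review.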
|
|
127
|
+
|
|
128
|
+
If the project has no cabinet members, skip CP1 and tell the user the run
|
|
129
|
+
proceeds without review (implementation + `/validate` only).
|
|
130
|
+
|
|
131
|
+
### Step 2.5 — Interactive CP1 (this skill, main session)
|
|
132
|
+
|
|
133
|
+
Spawn one Agent per selected cabinet member **in a single message** (parallel).
|
|
134
|
+
Each agent reads `.claude/cabinet/checkpoint-protocol.md` (interactive CP mode,
|
|
135
|
+
`pre-impl` scope), its own SKILL.md at `path`, the project briefing, and the
|
|
136
|
+
plans' full notes. Each returns this verdict shape (note the **required
|
|
137
|
+
`addressed_by_plan`** field — CP1 only):
|
|
138
|
+
|
|
139
|
+
```
|
|
140
|
+
CP1_VERDICT_SCHEMA:
|
|
141
|
+
{
|
|
142
|
+
"cabinet_member": "name",
|
|
143
|
+
"addressed_by_plan": ["risks the plan already handles — enumerate FIRST"],
|
|
144
|
+
"verdict": "continue" | "pause" | "stop",
|
|
145
|
+
"concerns": [
|
|
146
|
+
{ "description": "...", "evidence": "...", "severity": "blocking" | "advisory" }
|
|
147
|
+
]
|
|
148
|
+
}
|
|
149
|
+
```
|
|
150
|
+
|
|
151
|
+
`addressed_by_plan` is required to force plan-first review: the agent must
|
|
152
|
+
enumerate what the plan already covers *before* raising concerns. A concern
|
|
153
|
+
the plan explicitly handles must not be raised — this is the discipline whose
|
|
154
|
+
absence caused the false-positive halts.
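One way to enforce the plan-first rule above mechanically is to check each returned verdict before showing it to the operator. The sketch below is an assumption-laden illustration: the function name `check_cp1_verdict` is hypothetical, and comparing a concern's `description` to `addressed_by_plan` entries by case-insensitive exact match is a deliberately crude stand-in, since the skill text does not define how overlap is detected.

```python
from typing import List

ALLOWED_VERDICTS = {"continue", "pause", "stop"}


def check_cp1_verdict(verdict: dict) -> List[str]:
    """Return a list of problems with a CP1 verdict; empty means it looks well-formed."""
    problems = []
    addressed = verdict.get("addressed_by_plan")
    if not isinstance(addressed, list):
        # The field is required at Interactive CP, even if the list is empty.
        problems.append("addressed_by_plan is required and must be a list")
        addressed = []
    if verdict.get("verdict") not in ALLOWED_VERDICTS:
        problems.append("verdict must be one of continue, pause, stop")
    covered = {risk.strip().lower() for risk in addressed if isinstance(risk, str)}
    for concern in verdict.get("concerns", []):
        description = str(concern.get("description", "")).strip().lower()
        if description and description in covered:
            # A risk listed as addressed must not be raised again as a concern.
            problems.append(f"concern repeats a risk already addressed by the plan: {description}")
    return problems
```

A verdict that fails this check could be sent back to the agent rather than presented, which keeps malformed reviews from reaching the operator decision.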
|
|
155
|
+
|
|
156
|
+
**Present findings to the operator severity-first**, not verdict-first:
|
|
157
|
+
1. **Blocking concerns** (any `severity: blocking`) — listed first, with
|
|
158
|
+
member + evidence.
|
|
159
|
+
2. **Advisory concerns** — next.
|
|
160
|
+
3. **Addressed-by-plan** — collapsed to a one-line count ("12 risks the plans
|
|
161
|
+
already cover") unless the operator asks to expand.
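The severity-first ordering above can be sketched as a flattening of all verdicts into three buckets. This is only an illustration: the function name `severity_first_summary` and the `BLOCKING` / `ADVISORY` line prefixes are hypothetical, while the field names come from the CP1 verdict schema shown earlier in this step.

```python
from typing import List


def severity_first_summary(verdicts: List[dict]) -> List[str]:
    """Order findings as blocking, then advisory, then a one-line addressed count."""
    blocking, advisory = [], []
    addressed_count = 0
    for verdict in verdicts:
        addressed_count += len(verdict.get("addressed_by_plan", []))
        for concern in verdict.get("concerns", []):
            line = (
                f"[{verdict['cabinet_member']}] {concern['description']} "
                f"(evidence: {concern['evidence']})"
            )
            # Group by concern severity, not by the member's overall verdict.
            if concern.get("severity") == "blocking":
                blocking.append(line)
            else:
                advisory.append(line)
    lines = [f"BLOCKING {item}" for item in blocking]
    lines += [f"ADVISORY {item}" for item in advisory]
    # Addressed-by-plan risks collapse to a single count line.
    lines.append(f"{addressed_count} risks the plans already cover")
    return lines
```

Sorting on concern severity rather than on the member's verdict is what makes the presentation severity-first: a blocking concern from a member who returned `pause` still appears before any advisory item.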
|
|
162
|
+
|
|
163
|
+
Then **the operator decides** — this is the checkpoint, not an automatic gate:
|
|
164
|
+
- If any agent returned `stop` or raised a blocking concern:
|
|
165
|
+
> A cabinet member recommends stopping: [concern]. Proceed anyway, drop the
|
|
166
|
+
> affected plan from this run, or abort?
|
|
167
|
+
- Otherwise summarize and ask: **"Launch implementation?"**
|
|
168
|
+
|
|
169
|
+
Capture the operator's response as:
|
|
170
|
+
- `operatorOverrides`: an array of free-text directives to pass into the
|
|
171
|
+
implementation agents (e.g. "skip plan X", "watch the migration ordering").
|
|
172
|
+
Empty array if none.
|
|
173
|
+
- `cp1Findings`: the structured CP1 verdicts (recorded in the final report).
|
|
174
|
+
|
|
175
|
+
**`--advisory` flag:** if the argument includes `--advisory`, still run the
|
|
176
|
+
CP1 agents and record `cp1Findings`, but **skip the operator pause** — print
|
|
177
|
+
the severity-first summary and proceed straight to Stage 2 with no overrides.
|
|
178
|
+
Use this for low-risk groups where you trust the plans.
|
|
179
|
+
|
|
180
|
+
**All CP1 agents errored:** if every agent failed to return a verdict, do not
|
|
181
|
+
silently proceed. Warn:
|
|
182
|
+
> Cabinet review failed (no agent returned a verdict). Proceed without review
|
|
183
|
+
> or abort?
|
|
184
|
+
|
|
185
|
+
### Step 3a — Launch the Implementation workflow (Stage 2)
|
|
186
|
+
|
|
187
|
+
Invoke the Workflow tool:
|
|
188
|
+
|
|
189
|
+
- **script:** `.claude/workflows/execute-group-implement.js`
|
|
190
|
+
- **args:**
|
|
191
|
+
```json
|
|
192
|
+
{
|
|
193
|
+
"label": "<label>",
|
|
194
|
+
"plans": [{ "fid": "...", "text": "...", "notes": "...", "surfaceArea": "..." }],
|
|
195
|
+
"operatorOverrides": ["...optional operator directives from CP1..."]
|
|
196
|
+
}
|
|
197
|
+
```
|
|
198
|
+
Pass `plans` and `operatorOverrides` as real JSON (not stringified).
|
|
114
199
|
|
|
115
|
-
|
|
116
|
-
|
|
200
|
+
The workflow returns a structured result: `{ label, plans_implemented,
|
|
201
|
+
per_plan, merged, loose_ends }`. **Present it plainly:**
|
|
202
|
+
> N of M plans merged. [list any failed/parked/noop plans and their reasons
|
|
203
|
+
> from `loose_ends`].
|
|
117
204
|
|
|
118
|
-
|
|
205
|
+
Then **operator checkpoint:** "Continue to review + completion?" Wait for
|
|
206
|
+
confirmation before Stage 3. If nothing merged, say so and ask whether to stop
|
|
207
|
+
(Stage 3 has nothing to review).
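The Stage 2 hand-off above ("N of M plans merged", then a checkpoint) can be sketched as a small summary over the workflow result. The result keys `merged`, `per_plan`, and `loose_ends` come from the text, but their element shapes are not shown in this diff: treating `loose_ends` items as records with `fid` and `reason` fields is an assumption, and `summarize_stage2` is a hypothetical name.

```python
from typing import Dict


def summarize_stage2(result: Dict) -> Dict:
    """Build the plain summary lines and flag the nothing-merged case."""
    merged = result.get("merged", [])
    per_plan = result.get("per_plan", [])  # all plans, not just merged ones
    lines = [f"{len(merged)} of {len(per_plan)} plans merged."]
    for item in result.get("loose_ends", []):
        # Assumption: each loose end names a plan and gives a reason.
        lines.append(f"- {item.get('fid', '?')}: {item.get('reason', 'no reason given')}")
    return {"lines": lines, "nothing_merged": len(merged) == 0}
```

When `nothing_merged` is true, the operator prompt would change from "Continue to review + completion?" to a question about stopping, since Stage 3 has nothing to review.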
|
|
119
208
|
|
|
120
|
-
|
|
121
|
-
arguments:
|
|
209
|
+
### Step 3b — Launch the Review + Completion workflow (Stage 3)
|
|
122
210
|
|
|
123
|
-
|
|
211
|
+
Invoke the Workflow tool:
|
|
212
|
+
|
|
213
|
+
- **script:** `.claude/workflows/execute-group-complete.js`
|
|
124
214
|
- **args:**
|
|
125
215
|
```json
|
|
126
216
|
{
|
|
127
217
|
"label": "<label>",
|
|
128
|
-
"
|
|
218
|
+
"mergedPlans": [ ...the `merged` array from Stage 2's result... ],
|
|
219
|
+
"implPerPlan": [ ...the `per_plan` array from Stage 2's result (all plans, not just merged)... ],
|
|
129
220
|
"cabinetMembers": [{ "key": "...", "agentType": "cabinet-...", "path": "...", "directive": "..." }],
|
|
221
|
+
"cp1Findings": [ ...the CP1 verdicts captured in Step 2.5... ],
|
|
130
222
|
"checkpointProtocolPath": ".claude/cabinet/checkpoint-protocol.md",
|
|
131
223
|
"briefingPath": ".claude/cabinet/_briefing.md"
|
|
132
224
|
}
|
|
133
225
|
```
|
|
134
226
|
|
|
135
|
-
|
|
227
|
+
The workflow runs advisory CP3 over the aggregate diff, the integration check,
|
|
228
|
+
and completion (it writes the Completion Report to
|
|
229
|
+
`.claude/verification/group-<label>-report.json` **before** marking plans done,
|
|
230
|
+
then marks merged plans done itself). It returns the Completion Report.
|
|
136
231
|
|
|
137
232
|
### Step 4 — Present the Completion Report
|
|
138
233
|
|
|
139
|
-
|
|
140
|
-
|
|
141
|
-
|
|
142
|
-
|
|
143
|
-
(`action-completion-gate.sh`) checks
|
|
144
|
-
|
|
234
|
+
Present it plainly: which plans completed, the advisory CP3 concerns (call
|
|
235
|
+
these out — they are the operator's to weigh), the integration result, any new
|
|
236
|
+
pib-db actions created for deferred manual ACs, and the `loose_ends`. The
|
|
237
|
+
report on disk is also the evidence the completion gate
|
|
238
|
+
(`action-completion-gate.sh`) checks for `grp:`-tagged plans — don't discard
|
|
239
|
+
it.
|
|
145
240
|
|
|
146
|
-
If
|
|
147
|
-
|
|
148
|
-
halt; on a post-merge CP3 stop, the offending plan was reverted.
|
|
241
|
+
If completion was gated (final `/validate` failed), report exactly why; the
|
|
242
|
+
merged plans are left **open** for the operator to fix and re-run.
|
|
149
243
|
|
|
150
244
|
#### Recovery steps for parked/failed plans
|
|
151
245
|
|
|
152
|
-
|
|
153
|
-
|
|
154
|
-
|
|
155
|
-
|
|
156
|
-
|
|
157
|
-
|
|
158
|
-
|
|
159
|
-
|
|
160
|
-
|
|
161
|
-
|
|
162
|
-
- **Failed implementation** — the worktree agent could not complete the
|
|
163
|
-
plan. Investigate the `deviations` in the report, fix the plan, then
|
|
164
|
-
strip the `grp:` tags and run `/execute <plan>` individually.
|
|
165
|
-
- **No result** — the worktree agent errored entirely. Same recovery:
|
|
166
|
-
strip tags, retry via `/execute`.
|
|
167
|
-
|
|
168
|
-
Re-running `/generate-plan-groups` automatically replaces stale `grp:` tags
|
|
169
|
-
on any plans it re-groups — but only for plans it selects. Plans you retry
|
|
170
|
-
individually should have their tags stripped before running `/execute`.
|
|
246
|
+
- **Merged & completed** — done.
|
|
247
|
+
- **Parked / failed implementation / no result** — the worktree branch (if
|
|
248
|
+
any) is preserved. To retry individually with full cabinet checkpoints
|
|
249
|
+
(including the per-file-group CP2 the group path skips), **strip the `grp:`
|
|
250
|
+
tags first**, then run `/execute <plan>`. If you don't strip the tags, the
|
|
251
|
+
completion gate blocks because the report shows the plan as not-merged.
|
|
252
|
+
Strip with: `pib_update_action --tags "<non-grp-tags-only>"`.
|
|
253
|
+
|
|
254
|
+
Re-running `/generate-plan-groups` automatically replaces stale `grp:` tags on
|
|
255
|
+
any plans it re-groups — but only for plans it selects.
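Computing the `<non-grp-tags-only>` value for the tag-stripping step above amounts to filtering a plan's tag list. A minimal sketch follows, assuming tags are available as a list of strings; how pib-db serializes them is not shown here. Removing `grp-generated:` and `grp-hash:` along with `grp:` is also an assumption, based on the three tags named under Prerequisites, and `strip_group_tags` is a hypothetical helper.

```python
from typing import List

# Tag prefixes written by /generate-plan-groups, per the Prerequisites section.
GROUP_TAG_PREFIXES = ("grp:", "grp-generated:", "grp-hash:")


def strip_group_tags(tags: List[str]) -> List[str]:
    """Return the tags with all group-related tags removed, order preserved."""
    return [tag for tag in tags if not tag.startswith(GROUP_TAG_PREFIXES)]
```

The surviving tags are what would be passed back when updating the action, so that a later individual `/execute` run is not checked against the group report.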
|
|
171
256
|
|
|
172
257
|
## Principles
|
|
173
258
|
|
|
174
|
-
- **
|
|
175
|
-
|
|
176
|
-
- **The
|
|
177
|
-
|
|
178
|
-
|
|
259
|
+
- **Judgment to the operator, automation mechanical.** CP1 is an interactive
|
|
260
|
+
decision; CP3 is advisory; `/validate` is the only automatic gate.
|
|
261
|
+
- **The group is a hint, not a contract.** Always re-validate (Step 1) before
|
|
262
|
+
running. Regenerate freely.
|
|
263
|
+
- **Operator checkpoints between stages.** You see CP1 findings before
|
|
264
|
+
implementation and merge results before review+completion. Each is a real
|
|
265
|
+
decision point, not a rubber stamp.
|
|
179
266
|
- **Sequential merges, parallel everything else.** Merges into main are
|
|
180
|
-
serialized with `/validate` between them; CP1, implementation, and
|
|
181
|
-
|
|
182
|
-
- **Honest about the ceiling.**
|
|
183
|
-
|
|
267
|
+
serialized with `/validate` between them; CP1, implementation, and CP3 run
|
|
268
|
+
in parallel.
|
|
269
|
+
- **Honest about the ceiling.** For high-risk plans, prefer individual
|
|
270
|
+
`/execute`.
|