create-claude-cabinet 0.32.0 → 0.33.0

package/lib/cli.js CHANGED
@@ -485,7 +485,7 @@ const MODULES = {
485
485
  mandatory: false,
486
486
  default: true,
487
487
  lean: true,
488
- templates: ['skills/plan', 'skills/execute', 'skills/generate-plan-groups', 'skills/execute-group', 'workflows/execute-group.js', 'skills/investigate', 'cabinet/checkpoint-protocol.md'],
488
+ templates: ['skills/plan', 'skills/execute', 'skills/generate-plan-groups', 'skills/execute-group', 'workflows/execute-group-implement.js', 'workflows/execute-group-complete.js', 'skills/investigate', 'cabinet/checkpoint-protocol.md'],
489
489
  },
490
490
  'compliance': {
491
491
  name: 'Compliance Stack (rules + enforcement)',
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "create-claude-cabinet",
3
- "version": "0.32.0",
3
+ "version": "0.33.0",
4
4
  "description": "Claude Cabinet — opinionated process scaffolding for Claude Code projects",
5
5
  "bin": {
6
6
  "create-claude-cabinet": "bin/create-claude-cabinet.js"
@@ -40,7 +40,7 @@ templates, see [EXTENSIONS.md](EXTENSIONS.md).
40
40
  | `skills/debrief-quick/` | Quick debrief variant — core phases only, skip presentation. |
41
41
  | `skills/execute/` | Execute a plan with cabinet member checkpoints. 3-checkpoint protocol (pre-implementation, per-file-group, pre-commit). 5 phase files. |
42
42
  | `skills/generate-plan-groups/` | Scheduler: find plans with surface-area declarations, build a conflict graph, persist conflict-free parallel groups as pib-db `grp:` tags. Does not execute — hands each group to /execute-group. |
43
- | `skills/execute-group/` | Runner: execute one generated group via the `execute-group.js` workflow — cabinet pre-review, parallel worktree implementation, sequential merge with per-plan review, integration, informed final review, completion report. |
43
+ | `skills/execute-group/` | Runner: execute one generated group as a 3-stage pipeline: interactive cabinet pre-review (CP1) the operator decides on, then the `execute-group-implement.js` workflow (parallel worktree implementation + sequential merge) and the `execute-group-complete.js` workflow (advisory review + integration + completion report), with operator checkpoints between stages. |
44
44
  | `skills/cc-extract/` | Analyze project artifacts and propose upstream extraction candidates for Claude Cabinet. |
45
45
  | `skills/investigate/` | Structured codebase exploration: frame, observe, hypothesize, test, conclude. |
46
46
  | `skills/cc-link/` | Set up local development linking for Claude Cabinet source repo work. |
@@ -27,6 +27,36 @@ set of conflict-free plans implemented concurrently in separate worktrees,
27
27
  then merged together. `/execute` never exercises that scope — it runs one
28
28
  plan at a time and uses only the first three.
29
29
 
30
+ ## Checkpoint modes — who acts on the verdict
31
+
32
+ The scope says *what* is reviewed. The **mode** says *what happens to the
33
+ verdict*. This distinction is load-bearing: an autonomous gate that reverts
34
+ or halts on a false-positive `stop` is fragile and expensive. The default for
35
+ high-stakes reviews is to put judgment in front of the operator.
36
+
37
+ | Mode | Where it runs | What a `stop`/`pause` does | Used by |
38
+ |------|---------------|----------------------------|---------|
39
+ | **Interactive CP** | Main session (skill level) | Surfaced to the operator, who decides (proceed / drop / override / abort). Never automatic. | `/execute-group` CP1 |
40
+ | **Advisory CP** | Workflow | Recorded in the Completion Report as a concern. Never halts or reverts. The only automatic gate alongside it is `/validate`. | `/execute-group` CP3 |
41
+ | **Full CP** | Main session or workflow | Halts on `stop`, escalates 3+ `pause` to a halt, requires explicit override. The classic gate. | `/execute` CP1/CP2/CP3 |
42
+
43
+ **Why Interactive and Advisory exist.** `/execute-group` once ran CP1 and CP3
44
+ as autonomous gates inside a single workflow: a cabinet `stop` halted the run
45
+ or reverted a merge with no human in the loop. False positives there cost real
46
+ money (a CP1 halted twice consecutively — 1.6M+ tokens — on concerns the plan
47
+ text already addressed). Moving CP1 to interactive (operator decides) and CP3
48
+ to advisory (concerns recorded, `/validate` is the only hard gate) keeps the
49
+ review signal while removing the destructive autonomous action.
50
+
51
+ ### Interactive CP adds a required `addressed_by_plan` field
52
+
53
+ At Interactive CP (`/execute-group` CP1, `pre-impl` scope), each agent's
54
+ verdict carries one extra **required** field, `addressed_by_plan` — the list
55
+ of risks the plan already handles. The agent must enumerate these *first*,
56
+ before raising any concern. This forces the plan-first discipline structurally:
57
+ a risk listed in `addressed_by_plan` cannot also be raised as a concern. It is
58
+ the direct fix for the false-positive halts.
59
+
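Annotation: the `addressed_by_plan` rule added above can be checked mechanically. The sketch below is one possible validator, not code from the package. The function name `validate_cp1_verdict` is hypothetical, and detecting overlap by case-insensitive exact text match on a concern's `description` is an assumption; the diff does not say how overlap is detected.

```python
VALID_VERDICTS = {"continue", "pause", "stop"}


def validate_cp1_verdict(verdict: dict) -> list[str]:
    """Return a list of problems; an empty list means the verdict is well-formed."""
    problems = []

    # Interactive CP requires the addressed_by_plan array.
    addressed = verdict.get("addressed_by_plan")
    if not isinstance(addressed, list):
        problems.append("addressed_by_plan is required and must be a list")
        addressed = []

    if verdict.get("verdict") not in VALID_VERDICTS:
        problems.append("verdict must be one of continue/pause/stop")

    # Assumption: a concern "duplicates" an addressed risk when the texts
    # match exactly after trimming and lowercasing.
    handled = {str(risk).strip().lower() for risk in addressed}
    for concern in verdict.get("concerns", []):
        text = str(concern.get("description", "")).strip().lower()
        if text and text in handled:
            problems.append(f"concern duplicates addressed_by_plan: {text}")

    return problems
```

A verdict that fails this check would be sent back to the agent rather than shown to the operator; that handling is likewise an assumption.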
30
60
  ## Step 1 — Select which members to spawn
31
61
 
32
62
  Spawn one Agent per cabinet member that matches **either**:
@@ -107,8 +137,26 @@ Each agent returns exactly this shape:
107
137
  }
108
138
  ```
109
139
 
140
+ At **Interactive CP** (`/execute-group` CP1), add the required
141
+ `addressed_by_plan` array described above:
142
+
143
+ ```json
144
+ {
145
+ "cabinet_member": "name",
146
+ "addressed_by_plan": ["risks the plan already handles"],
147
+ "verdict": "continue" | "pause" | "stop",
148
+ "concerns": [ ... ]
149
+ }
150
+ ```
151
+
110
152
  ## Step 4 — Apply escalation
111
153
 
154
+ The escalation below is **Full CP** behavior (used by `/execute`). For
155
+ **Interactive CP** the verdicts are surfaced to the operator severity-first
156
+ and the operator decides — no automatic halt. For **Advisory CP** the concerns
157
+ are recorded in the Completion Report and nothing halts or reverts; `/validate`
158
+ is the only automatic gate. See "Checkpoint modes" above.
159
+
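Annotation: the three modes differ only in what happens after the verdicts are collected. A minimal sketch of that dispatch follows. The function name and return labels are hypothetical, and the handling of one or two `pause` verdicts under Full CP is not visible in this hunk, so the sketch only reports that no automatic halt applies in that case.

```python
def resolve_checkpoint(mode: str, verdicts: list[str]) -> str:
    """Map a checkpoint mode plus the collected verdicts to an action label."""
    if mode == "advisory":
        # Concerns go into the Completion Report; nothing halts or reverts.
        return "record"
    if mode == "interactive":
        # Findings are surfaced severity-first and the operator decides.
        return "ask_operator"
    if mode == "full":
        # Classic gate: any stop halts, and 3 or more pauses escalate to a halt.
        if "stop" in verdicts or verdicts.count("pause") >= 3:
            return "halt"
        # What Full CP does with fewer pauses is not shown in this diff.
        return "no_automatic_halt"
    raise ValueError(f"unknown checkpoint mode: {mode}")
```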
112
160
  Collect every verdict, then:
113
161
 
114
162
  - **Any `stop`** → halt. Show the concern. Require an explicit override
@@ -98,7 +98,7 @@ try:
98
98
  cp3g = cks.get('cp3_group', '')
99
99
  if me is None: print('NOT_IN_REPORT')
100
100
  elif me.get('status') != 'merged': print('plan-status=' + str(me.get('status')))
101
- elif cp3g not in ('continue', 'skipped', 'n/a'): print('cp3_group=' + str(cp3g))
101
+ elif cp3g not in ('continue', 'skipped', 'n/a'): print('cp3_group=' + str(cp3g)) # n/a: backward-compat with pre-v0.32 reports
102
102
  elif integ.get('validate') != 'pass': print('integration.validate=' + str(integ.get('validate')))
103
103
  elif integ.get('breadcrumbs') != 'valid': print('integration.breadcrumbs=' + str(integ.get('breadcrumbs')))
104
104
  else: print('OK')
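Annotation: the hunk above shows only the decision ladder of the completion gate. The sketch below restates it as a self-contained function for readers who want to see the whole check in one place. How `me`, `cks`, and `integ` are derived is an assumption (looking the plan up in `per_plan` by a `fid` key, and reading `integration` from under `checkpoints`), as is the function name `gate_status`.

```python
def gate_status(report: dict, fid: str) -> str:
    """Return 'OK' or the first reason the completion gate would block."""
    cks = report.get("checkpoints", {})
    integ = cks.get("integration", {})
    cp3g = cks.get("cp3_group", "")
    # Assumption: per_plan entries identify their plan with a "fid" key.
    me = next((p for p in report.get("per_plan", []) if p.get("fid") == fid), None)

    if me is None:
        return "NOT_IN_REPORT"
    if me.get("status") != "merged":
        return "plan-status=" + str(me.get("status"))
    # 'n/a' is accepted for backward compatibility with older reports.
    if cp3g not in ("continue", "skipped", "n/a"):
        return "cp3_group=" + str(cp3g)
    if integ.get("validate") != "pass":
        return "integration.validate=" + str(integ.get("validate"))
    if integ.get("breadcrumbs") != "valid":
        return "integration.breadcrumbs=" + str(integ.get("breadcrumbs"))
    return "OK"
```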
@@ -280,6 +280,12 @@ orphans conversationally:
280
280
  - **`execute-plans/` → `generate-plan-groups/` + `execute-group/`:** if
281
281
  `.claude/skills/execute-plans/` exists, run
282
282
  `phases/execute-plans-rename-detect.md`.
283
+ - **`execute-group.js` → `execute-group-implement.js` + `execute-group-complete.js`:**
284
+ if `.claude/workflows/execute-group.js` exists, run
285
+ `phases/execute-group-workflow-split-detect.md`. That phase removes the
286
+ orphaned monolithic workflow once both replacement workflow scripts are
287
+ present (the cleanup loop won't, since the file was deleted upstream rather
288
+ than renamed).
283
289
  - **`handoff*` → `engagement*` (+ `.claude/handoff/` infra → `.claude/engagement/`):**
284
290
  if any of `.claude/skills/handoff*` or `.claude/handoff/` exists, run
285
291
  `phases/handoff-rename-detect.md`. That phase removes the orphaned skill
@@ -0,0 +1,74 @@
1
+ # execute-group.js workflow split detection
2
+
3
+ In the execute-group redesign, the monolithic `execute-group.js` workflow
4
+ (one script running CP1 → implement → merge with per-plan CP3 → integration →
5
+ group CP3 → completion as autonomous gates) was split into two focused
6
+ workflow scripts plus a skill-level interactive checkpoint:
7
+
8
+ - **`execute-group-implement.js`** — mechanical parallel implementation +
9
+ sequential merge (no cabinet review).
10
+ - **`execute-group-complete.js`** — advisory CP3 + integration + completion
11
+ report.
12
+ - **Interactive CP1** moved into the `/execute-group` SKILL.md (the operator
13
+ decides; it is no longer an autonomous gate).
14
+
15
+ The installer copies the two new workflow files, but the old
16
+ `.claude/workflows/execute-group.js` is **not** removed by the cleanup loop:
17
+ that loop only deletes files still mapping to a current CC template, and
18
+ `execute-group.js` no longer maps to one (it was deleted upstream, not
19
+ renamed). So after an upgrade a project that had it ends up with the stale
20
+ `execute-group.js` sitting next to the two new scripts.
21
+
22
+ This phase detects and removes that orphan.
23
+
24
+ ## When this phase runs
25
+
26
+ Only when the orphan workflow file is actually on disk:
27
+
28
+ ```bash
29
+ test -f .claude/workflows/execute-group.js && echo "HAS_ORPHAN=1"
30
+ ```
31
+
32
+ If it's absent, skip silently — nothing to clean up.
33
+
34
+ ## What to do
35
+
36
+ When the orphan is present, explain the split in plain terms:
37
+
38
+ > The `/execute-group` workflow was split. The old single
39
+ > `execute-group.js` script (which ran cabinet review as autonomous
40
+ > halt/revert gates) is replaced by:
41
+ > - **`execute-group-implement.js`** — mechanical implementation + merge
42
+ > - **`execute-group-complete.js`** — advisory review + completion
43
+ > - an **interactive CP1** that now lives in the `/execute-group` skill, so
44
+ > *you* decide on pre-implementation concerns instead of a gate halting
45
+ > automatically.
46
+ >
47
+ > Both new workflow scripts are installed. The old `execute-group.js` is
48
+ > left over from before the split and should be removed.
49
+
50
+ The orphan is only safe to remove once **both** replacement workflows are
51
+ present (otherwise removing it would strand `/execute-group` with no
52
+ orchestrator):
53
+
54
+ ```bash
55
+ if [ -f .claude/workflows/execute-group-implement.js ] && \
56
+ [ -f .claude/workflows/execute-group-complete.js ]; then
57
+ rm -f .claude/workflows/execute-group.js
58
+ echo "Removed orphaned .claude/workflows/execute-group.js"
59
+ else
60
+ echo "WARN: replacement workflows not both present — leaving execute-group.js in place"
61
+ fi
62
+ ```
63
+
64
+ If either replacement is missing, leave the old script in place and tell the
65
+ user the upgrade didn't fully install the new workflows (re-run the
66
+ installer). A working monolith beats a half-removed split.
67
+
68
+ ## Note on in-flight runs
69
+
70
+ Completion Reports already written to `.claude/verification/group-*-report.json`
71
+ by the old workflow remain valid — the completion gate reads the same
72
+ `per_plan[].status`, `checkpoints.cp3_group`, and `checkpoints.integration`
73
+ fields, which the new `execute-group-complete.js` preserves. No report
74
+ migration is needed.
@@ -2,12 +2,12 @@
2
2
  name: execute-group
3
3
  description: |
4
4
  Run one parallel plan group produced by /generate-plan-groups. Validates
5
- the group hasn't drifted, then launches the execute-group workflow:
6
- cabinet pre-review, parallel worktree implementation, sequential merge with
7
- per-plan review, integration check, informed final review, and a completion
8
- report. Use when: "execute group", "run group", "/execute-group".
5
+ the group hasn't drifted, runs an interactive cabinet pre-review (CP1) you
6
+ decide on, then drives a two-workflow pipeline: mechanical parallel
7
+ implementation + merge, an operator checkpoint, then advisory review +
8
+ completion. Use when: "execute group", "run group", "/execute-group".
9
9
  disable-model-invocation: true
10
- argument-hint: "group label — e.g., '2026-05-30-1'"
10
+ argument-hint: "group label — e.g., '2026-05-30-1' (append --advisory to skip the CP1 pause)"
11
11
  related:
12
12
  - type: skill
13
13
  name: generate-plan-groups
@@ -15,10 +15,13 @@ related:
15
15
  name: execute
16
16
  - type: file
17
17
  path: .claude/cabinet/checkpoint-protocol.md
18
- role: "The cabinet checkpoint mechanism — the workflow's review agents read and follow it"
18
+ role: "The cabinet checkpoint mechanism — CP1 (interactive) and CP3 (advisory) both read it"
19
19
  - type: file
20
- path: .claude/workflows/execute-group.js
21
- role: "The orchestrator this skill launches"
20
+ path: .claude/workflows/execute-group-implement.js
21
+ role: "Stage 2: mechanical parallel implementation + sequential merge"
22
+ - type: file
23
+ path: .claude/workflows/execute-group-complete.js
24
+ role: "Stage 3 — advisory CP3 + integration + completion report"
22
25
  ---
23
26
 
24
27
  # /execute-group — Run a Generated Parallel Plan Group
@@ -26,51 +29,64 @@ related:
26
29
  ## Purpose
27
30
 
28
31
  `/generate-plan-groups` decides *what can run in parallel* and persists each
29
- conflict-free group as pib-db `grp:` tags. This skill *runs one group*: it
30
- re-checks the group is still safe, then hands off to a **workflow
31
- orchestrator** (`execute-group.js`) that drives implementation and cabinet
32
- review end to end.
33
-
34
- **Why a workflow, not direct Agent-tool spawning:** worktree agents cannot
35
- spawn sub-agents (no Agent-tool access, empirically verified). So a worktree
36
- agent cannot run its own cabinet checkpoints. The workflow script solves this
37
- by being the single orchestrator: it spawns worktree agents for
38
- implementation AND cabinet agents for review as first-class parallel
39
- participants. This is the capability the old all-in-one parallel-execution
40
- skill could not provide.
32
+ conflict-free group as pib-db `grp:` tags. This skill *runs one group* as a
33
+ **three-stage pipeline with operator checkpoints between stages**:
34
+
35
+ 1. **Interactive CP1** (this skill, main session) — cabinet members pre-review
36
+ the plans; *you* decide whether to proceed, drop plans, or pass overrides.
37
+ 2. **Implementation workflow** (`execute-group-implement.js`): parallel
38
+ worktree implementation + sequential merge. Purely mechanical, no review.
39
+ 3. **Review + completion workflow** (`execute-group-complete.js`): advisory
40
+ cabinet review of the merged diff, integration check, and completion report.
41
+
42
+ **Why this shape (and not one monolithic workflow):** an earlier design ran
43
+ CP1 and CP3 *inside* a single workflow as autonomous gates — a cabinet `stop`
44
+ halted the run or reverted a merge automatically. False positives in those
45
+ gates produced expensive halts (field evidence: a CP1 halted twice in a row,
46
+ 1.6M+ tokens, on concerns the plan text already addressed). The fix:
47
+ **judgment belongs to the operator, automation stays mechanical.** CP1 is now
48
+ interactive (you see the findings and decide). CP3 is advisory (concerns are
49
+ recorded, never auto-halt/revert). The only hard, automatic gate is
50
+ `/validate` — a deterministic build check, not a judgment call.
51
+
52
+ **Why two workflows instead of direct Agent-tool spawning:** worktree agents
53
+ cannot spawn sub-agents (no Agent-tool access — empirically verified), so a
54
+ worktree agent cannot run its own checkpoints. CP1 escapes this by running at
55
+ the skill level (main session, where the Agent tool *is* available). Stages 2
56
+ and 3 are workflows because they need to spawn worktree/merge/review agents as
57
+ first-class parallel participants.
41
58
 
42
59
  ## Prerequisites
43
60
 
44
61
  - The group must have been produced by `/generate-plan-groups` (its plans
45
62
  carry `grp:<label>`, `grp-generated:`, and `grp-hash:` tags).
46
63
  - Plans must still have `## Surface Area` sections in their notes.
47
- - The Workflow tool must be available (the orchestrator runs as a workflow).
64
+ - The Workflow tool must be available (Stages 2 and 3 run as workflows).
48
65
 
49
66
  ## Honest ceiling — read before relying on this
50
67
 
51
- The workflow runs the checkpoints; it does not guarantee the *review was
52
- thorough*. Specifically:
53
-
54
- - **No mid-implementation (CP2) review.** Worktree agents implement without
55
- a reviewer looking over their shoulder. CP1 reviews before, CP3 reviews
56
- after. For a plan whose diff is large or touches high-risk surface, run
57
- `/execute <plan>` individually instead of via a group; full `/execute`
58
- has the per-file-group checkpoint this path sacrifices for parallelism.
68
+ - **No mid-implementation (CP2) review.** Worktree agents implement without a
69
+ reviewer watching. CP1 reviews before, CP3 reviews after. For a plan whose
70
+ diff is large or touches high-risk surface, run `/execute <plan>`
71
+ individually instead; full `/execute` has the per-file-group checkpoint
72
+ this path sacrifices for parallelism.
73
+ - **CP3 is advisory.** It surfaces concerns; it does not block completion.
74
+ Only `/validate` blocks. A real problem invisible to `/validate` (e.g. a
75
+ subtle behavioral regression) will land on main with an advisory note, not
76
+ a revert. The operator must read the CP3 concerns in the report.
59
77
  - **Surface area is intent, not reality.** Under-declared surface area can
60
- hide a semantic conflict the conflict graph missed; only CP3 catches it.
61
- - **Feature-file "affect" is heuristic.** Behavioral coupling not textually
62
- referenced may be missed.
78
+ hide a semantic conflict the conflict graph missed.
63
79
 
64
80
  ## Workflow
65
81
 
66
- ### Step 1 — Staleness guard (skill-level, BEFORE launching)
82
+ ### Step 1 — Staleness guard (BEFORE anything else)
67
83
 
68
- The persisted group is a hint, not a contract. Re-validate it against the
69
- *current* state before running:
84
+ The persisted group is a hint, not a contract. Re-validate against *current*
85
+ state:
70
86
 
71
87
  1. **Fetch the group's plans.** Query actions whose `tags` contain
72
- `grp:<label>` (the argument). Use `pib_query` (or `node scripts/pib-db.mjs
73
- query`):
88
+ `grp:<label>` (the argument, minus any `--advisory` flag). Use `pib_query`
89
+ (or `node scripts/pib-db.mjs query`):
74
90
  ```sql
75
91
  SELECT a.fid, a.text, a.notes, a.tags
76
92
  FROM actions a
@@ -81,11 +97,10 @@ The persisted group is a hint, not a contract. Re-validate it against the
81
97
  2. **Drop plans that are no longer open or lost their surface area.** Report
82
98
  each dropped plan and why.
83
99
 
84
- 3. **Recompute the surface-area hash and compare.** Recompute it **exactly
85
- as `/generate-plan-groups` did**: for every still-open plan in the group,
86
- parse its `## Surface Area` file/dir list, concatenate all entries across
87
- the group, sort, and hash. Compare to the `grp-hash:` token stored on the
88
- plans.
100
+ 3. **Recompute the surface-area hash and compare.** Recompute it **exactly as
101
+ `/generate-plan-groups` did**: for every still-open plan in the group, parse
102
+ its `## Surface Area` file/dir list, concatenate all entries across the
103
+ group, sort, and hash. Compare to the `grp-hash:` token stored on the plans.
89
104
  - **Hash matches** → the group is current. Proceed.
90
105
  - **Hash differs** → a plan's surface area changed since grouping. **HALT:**
91
106
  > Group `<label>` has drifted since it was generated (surface areas
@@ -94,90 +109,162 @@ The persisted group is a hint, not a contract. Re-validate it against the
94
109
  Do not run a stale group — the conflict-free guarantee no longer holds.
95
110
 
96
111
  4. **Edge cases:**
97
- - **0 plans survive** the filter → tell the user the group is empty
98
- (all drifted/closed) and stop. Don't launch the workflow.
99
- - **1 plan survives** → you may still launch (the workflow skips
100
- group-level checkpoints for a single plan), or just suggest
112
+ - **0 plans survive** → tell the user the group is empty (all
113
+ drifted/closed) and stop. Do not launch any workflow.
114
+ - **1 plan survives** → you may still run it (Stages 2/3 skip the
115
+ group-level aggregate review for a single plan), or suggest
101
116
  `/execute <plan>` directly. Single-plan groups gain nothing from the
102
117
  parallel machinery.
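Annotation: the hash recomputation in item 3 above (parse each plan's `## Surface Area` list, concatenate across the group, sort, hash) could look like the sketch below. Three details are assumptions because the diff does not specify them: that entries are Markdown bullets, that entries are joined with newlines, and that the digest is SHA-256. A real implementation has to match whatever `/generate-plan-groups` does, or every group would appear to have drifted.

```python
import hashlib


def surface_entries(notes: str) -> list[str]:
    """Collect bullet entries under a '## Surface Area' heading (assumed format)."""
    entries, in_section = [], False
    for line in notes.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            # Any level-2 heading opens or closes the section.
            in_section = stripped == "## Surface Area"
            continue
        if in_section and stripped.startswith(("- ", "* ")):
            entries.append(stripped[2:].strip().strip("`"))
    return entries


def group_hash(plan_notes: list[str]) -> str:
    """Hash the sorted union of surface-area entries across the group's plans."""
    all_entries = sorted(e for notes in plan_notes for e in surface_entries(notes))
    # Assumption: newline-joined entries, SHA-256 hex digest.
    return hashlib.sha256("\n".join(all_entries).encode("utf-8")).hexdigest()
```

Sorting before hashing makes the result independent of the order in which plans are fetched, which is what lets the stored `grp-hash:` token be compared directly.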
103
118
 
104
119
  ### Step 2 — Select cabinet members
105
120
 
106
- Select the cabinet members the workflow's checkpoints will use. Use
121
+ Select the cabinet members CP1 and CP3 will use. From
107
122
  `.claude/skills/_index.json`: members whose `standingMandate` includes
108
123
  `execute`, plus any whose file patterns match the group's aggregate surface
109
- area. For each, collect `{ key, agentType, path, directive }` (the
110
- `agentType` is the registered `cabinet-<name>` subagent; `directive` is
111
- `directives.execute` if present). The workflow's review agents each read
112
- `.claude/cabinet/checkpoint-protocol.md` and follow it, scoped to the
113
- checkpoint they run (group aggregate / pre-impl / post-merge).
124
+ area. For each, collect `{ key, agentType, path, directive }` (`agentType` is
125
+ the registered `cabinet-<name>` subagent; `directive` is `directives.execute`
126
+ if present).
127
+
128
+ If the project has no cabinet members, skip CP1 and tell the user the run
129
+ proceeds without review (implementation + `/validate` only).
130
+
131
+ ### Step 2.5 — Interactive CP1 (this skill, main session)
132
+
133
+ Spawn one Agent per selected cabinet member **in a single message** (parallel).
134
+ Each agent reads `.claude/cabinet/checkpoint-protocol.md` (interactive CP mode,
135
+ `pre-impl` scope), its own SKILL.md at `path`, the project briefing, and the
136
+ plans' full notes. Each returns this verdict shape (note the **required
137
+ `addressed_by_plan`** field — CP1 only):
138
+
139
+ ```
140
+ CP1_VERDICT_SCHEMA:
141
+ {
142
+ "cabinet_member": "name",
143
+ "addressed_by_plan": ["risks the plan already handles — enumerate FIRST"],
144
+ "verdict": "continue" | "pause" | "stop",
145
+ "concerns": [
146
+ { "description": "...", "evidence": "...", "severity": "blocking" | "advisory" }
147
+ ]
148
+ }
149
+ ```
150
+
151
+ `addressed_by_plan` is required to force plan-first review: the agent must
152
+ enumerate what the plan already covers *before* raising concerns. A concern
153
+ the plan explicitly handles must not be raised — this is the discipline whose
154
+ absence caused the false-positive halts.
155
+
156
+ **Present findings to the operator severity-first**, not verdict-first:
157
+ 1. **Blocking concerns** (any `severity: blocking`) — listed first, with
158
+ member + evidence.
159
+ 2. **Advisory concerns** — next.
160
+ 3. **Addressed-by-plan** — collapsed to a one-line count ("12 risks the plans
161
+ already cover") unless the operator asks to expand.
162
+
163
+ Then **the operator decides** — this is the checkpoint, not an automatic gate:
164
+ - If any agent returned `stop` or raised a blocking concern:
165
+ > A cabinet member recommends stopping: [concern]. Proceed anyway, drop the
166
+ > affected plan from this run, or abort?
167
+ - Otherwise summarize and ask: **"Launch implementation?"**
168
+
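Annotation: the severity-first presentation and the stop prompt condition above amount to a small regrouping of the CP1 verdicts. A minimal sketch, assuming the verdict shape shown in `CP1_VERDICT_SCHEMA`; the function name and the keys of the returned summary are hypothetical.

```python
def severity_first(verdicts: list[dict]) -> dict:
    """Regroup CP1 verdicts by concern severity instead of by member verdict."""
    blocking, advisory, addressed_count = [], [], 0
    for verdict in verdicts:
        addressed_count += len(verdict.get("addressed_by_plan", []))
        for concern in verdict.get("concerns", []):
            # Attach the member name so each concern can be shown with its source.
            item = {"member": verdict.get("cabinet_member"), **concern}
            if concern.get("severity") == "blocking":
                blocking.append(item)
            else:
                advisory.append(item)
    return {
        "blocking": blocking,                # shown first, with member + evidence
        "advisory": advisory,                # shown next
        "addressed_count": addressed_count,  # collapsed to a one-line count
        # The stop prompt applies on any stop verdict or any blocking concern.
        "needs_stop_prompt": bool(blocking)
        or any(v.get("verdict") == "stop" for v in verdicts),
    }
```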
169
+ Capture the operator's response as:
170
+ - `operatorOverrides`: an array of free-text directives to pass into the
171
+ implementation agents (e.g. "skip plan X", "watch the migration ordering").
172
+ Empty array if none.
173
+ - `cp1Findings`: the structured CP1 verdicts (recorded in the final report).
174
+
175
+ **`--advisory` flag:** if the argument includes `--advisory`, still run the
176
+ CP1 agents and record `cp1Findings`, but **skip the operator pause** — print
177
+ the severity-first summary and proceed straight to Stage 2 with no overrides.
178
+ Use this for low-risk groups where you trust the plans.
179
+
180
+ **All CP1 agents errored:** if every agent failed to return a verdict, do not
181
+ silently proceed. Warn:
182
+ > Cabinet review failed (no agent returned a verdict). Proceed without review
183
+ > or abort?
184
+
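Annotation: the `--advisory` flag described above travels in the same argument string as the group label, so the label has to be separated from it before the `grp:<label>` query. One way to do that is sketched below. The function name is hypothetical, and treating the argument as whitespace-separated tokens is an assumption.

```python
def parse_group_argument(argument: str) -> tuple[str, bool]:
    """Split the skill argument into (label, advisory_flag)."""
    tokens = argument.split()
    advisory = "--advisory" in tokens
    label_tokens = [t for t in tokens if t != "--advisory"]
    if len(label_tokens) != 1:
        raise ValueError("expected exactly one group label")
    return label_tokens[0], advisory
```

With `advisory` true, the CP1 agents still run and their findings are still recorded; only the operator pause is skipped.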
185
+ ### Step 3a — Launch the Implementation workflow (Stage 2)
186
+
187
+ Invoke the Workflow tool:
188
+
189
+ - **script:** `.claude/workflows/execute-group-implement.js`
190
+ - **args:**
191
+ ```json
192
+ {
193
+ "label": "<label>",
194
+ "plans": [{ "fid": "...", "text": "...", "notes": "...", "surfaceArea": "..." }],
195
+ "operatorOverrides": ["...optional operator directives from CP1..."]
196
+ }
197
+ ```
198
+ Pass `plans` and `operatorOverrides` as real JSON (not stringified).
114
199
 
115
- If the project has no cabinet members, the workflow still runs — it just
116
- skips the checkpoints (implementation + validate only). Say so.
200
+ The workflow returns a structured result: `{ label, plans_implemented,
201
+ per_plan, merged, loose_ends }`. **Present it plainly:**
202
+ > N of M plans merged. [list any failed/parked/noop plans and their reasons
203
+ > from `loose_ends`].
117
204
 
118
- ### Step 3 — Launch the workflow
205
+ Then **operator checkpoint:** "Continue to review + completion?" Wait for
206
+ confirmation before Stage 3. If nothing merged, say so and ask whether to stop
207
+ (Stage 3 has nothing to review).
119
208
 
120
- Invoke the Workflow tool with the orchestrator script and the assembled
121
- arguments:
209
+ ### Step 3b — Launch the Review + Completion workflow (Stage 3)
122
210
 
123
- - **script:** `.claude/workflows/execute-group.js`
211
+ Invoke the Workflow tool:
212
+
213
+ - **script:** `.claude/workflows/execute-group-complete.js`
124
214
  - **args:**
125
215
  ```json
126
216
  {
127
217
  "label": "<label>",
128
- "plans": [{ "fid": "...", "text": "...", "notes": "...", "surfaceArea": "..." }],
218
+ "mergedPlans": [ ...the `merged` array from Stage 2's result... ],
219
+ "implPerPlan": [ ...the `per_plan` array from Stage 2's result (all plans, not just merged)... ],
129
220
  "cabinetMembers": [{ "key": "...", "agentType": "cabinet-...", "path": "...", "directive": "..." }],
221
+ "cp1Findings": [ ...the CP1 verdicts captured in Step 2.5... ],
130
222
  "checkpointProtocolPath": ".claude/cabinet/checkpoint-protocol.md",
131
223
  "briefingPath": ".claude/cabinet/_briefing.md"
132
224
  }
133
225
  ```
134
226
 
135
- Pass `plans` and `cabinetMembers` as real JSON arrays (not stringified).
227
+ The workflow runs advisory CP3 over the aggregate diff, the integration check,
228
+ and completion (it writes the Completion Report to
229
+ `.claude/verification/group-<label>-report.json` **before** marking plans done,
230
+ then marks merged plans done itself). It returns the Completion Report.
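Annotation: Stage 3's arguments are assembled from Stage 2's result plus what the skill captured earlier. The sketch below shows that hand-off as data plumbing only. The two function names are hypothetical; the field names come from the JSON shapes in this diff.

```python
def build_stage3_args(label: str, stage2_result: dict,
                      cabinet_members: list[dict], cp1_findings: list[dict]) -> dict:
    """Assemble the args object for execute-group-complete.js."""
    return {
        "label": label,
        "mergedPlans": stage2_result["merged"],     # only the plans that merged
        "implPerPlan": stage2_result["per_plan"],   # all plans, not just merged
        "cabinetMembers": cabinet_members,
        "cp1Findings": cp1_findings,
        "checkpointProtocolPath": ".claude/cabinet/checkpoint-protocol.md",
        "briefingPath": ".claude/cabinet/_briefing.md",
    }


def nothing_to_review(stage2_result: dict) -> bool:
    """True when Stage 2 merged no plans, so Stage 3 has nothing to review."""
    return not stage2_result.get("merged")
```

Keeping the values as lists and dicts, rather than pre-serialized strings, matches the instruction elsewhere in the skill to pass real JSON.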
136
231
 
137
232
  ### Step 4 — Present the Completion Report
138
233
 
139
- The workflow returns a structured Completion Report. Present it plainly:
140
- which plans merged, which parked/failed, the checkpoint verdicts, the
141
- integration result, any new pib-db actions created for deferred manual ACs,
142
- and the `loose_ends`. The report is also the evidence the completion gate
143
- (`action-completion-gate.sh`) checks when a `grp:`-tagged plan is marked
144
- done — don't discard it.
234
+ Present it plainly: which plans completed, the advisory CP3 concerns (call
235
+ these out; they are the operator's to weigh), the integration result, any new
236
+ pib-db actions created for deferred manual ACs, and the `loose_ends`. The
237
+ report on disk is also the evidence the completion gate
238
+ (`action-completion-gate.sh`) checks for `grp:`-tagged plans; don't discard
239
+ it.
145
240
 
146
- If the workflow halted early (a checkpoint returned `stop`, or integration
147
- failed), report exactly where and why. Nothing was merged on a pre-merge
148
- halt; on a post-merge CP3 stop, the offending plan was reverted.
241
+ If completion was gated (final `/validate` failed), report exactly why; the
242
+ merged plans are left **open** for the operator to fix and re-run.
149
243
 
150
244
  #### Recovery steps for parked/failed plans
151
245
 
152
- After a mixed result, present explicit next steps for each status:
153
-
154
- - **Merged** — done. No action needed.
155
- - **Parked** (a merge was reverted after CP3 rejection, or /validate failed
156
- post-merge): the worktree branch is preserved. To retry this plan
157
- individually with full cabinet checkpoints (including the per-file-group
158
- CP2 that the group path skips), **strip its `grp:` tags first** and then
159
- run `/execute <plan>`. If you don't strip the tags, the completion gate
160
- will block because the Completion Report shows this plan as "parked," not
161
- "merged." Strip with: `pib_update_action --tags "<non-grp-tags-only>"`.
162
- - **Failed implementation** — the worktree agent could not complete the
163
- plan. Investigate the `deviations` in the report, fix the plan, then
164
- strip the `grp:` tags and run `/execute <plan>` individually.
165
- - **No result** — the worktree agent errored entirely. Same recovery:
166
- strip tags, retry via `/execute`.
167
-
168
- Re-running `/generate-plan-groups` automatically replaces stale `grp:` tags
169
- on any plans it re-groups — but only for plans it selects. Plans you retry
170
- individually should have their tags stripped before running `/execute`.
246
+ - **Merged & completed**: done.
247
+ - **Parked / failed implementation / no result** — the worktree branch (if
248
+ any) is preserved. To retry individually with full cabinet checkpoints
249
+ (including the per-file-group CP2 the group path skips), **strip the `grp:`
250
+ tags first**, then run `/execute <plan>`. If you don't strip the tags, the
251
+ completion gate blocks because the report shows the plan as not-merged.
252
+ Strip with: `pib_update_action --tags "<non-grp-tags-only>"`.
253
+
254
+ Re-running `/generate-plan-groups` automatically replaces stale `grp:` tags on
255
+ any plans it re-groups, but only for plans it selects.
171
256
 
172
257
  ## Principles
173
258
 
174
- - **The group is a hint, not a contract.** Always re-validate (Step 1)
175
- before running. Regenerate freely.
176
- - **The workflow is the single orchestrator.** Don't try to run the
177
- checkpoints from this skill — the whole point is that the workflow can
178
- spawn both implementors and reviewers, and a worktree agent cannot.
259
+ - **Judgment to the operator, automation mechanical.** CP1 is an interactive
260
+ decision; CP3 is advisory; `/validate` is the only automatic gate.
261
+ - **The group is a hint, not a contract.** Always re-validate (Step 1) before
262
+ running. Regenerate freely.
263
+ - **Operator checkpoints between stages.** You see CP1 findings before
264
+ implementation and merge results before review+completion. Each is a real
265
+ decision point, not a rubber stamp.
179
266
  - **Sequential merges, parallel everything else.** Merges into main are
180
- serialized with `/validate` between them; CP1, implementation, and
181
- per-plan CP3 run in parallel.
182
- - **Honest about the ceiling.** This runs the checkpoints; it does not prove
183
- the review was deep. For high-risk plans, prefer individual `/execute`.
267
+ serialized with `/validate` between them; CP1, implementation, and CP3 run
268
+ in parallel.
269
+ - **Honest about the ceiling.** For high-risk plans, prefer individual
270
+ `/execute`.