cclaw-cli 8.2.0 → 8.3.0

package/README.md CHANGED
@@ -35,6 +35,15 @@
 
  Three slash commands (`/cc`, `/cc-cancel`, `/cc-idea`). Four stages (`plan → build → review → ship`). Six specialists, all on-demand, all running as sub-agents. Fifteen skills including the always-on `triage-gate`, `flow-resume`, `tdd-cycle`, `conversation-language`, and `anti-slop`. Ten templates including `plan-soft.md` and `build-soft.md` for the soft-mode path. Four runbooks. Eight reference patterns. Three research playbooks. Five recovery playbooks. Eight worked examples. Two mandatory gates in strict mode (AC traceability + TDD phase chain); soft mode keeps both as advisory; inline mode skips both.
 
+ ## What changed in 8.3
+
+ 8.3 is a non-breaking content + UX patch on top of 8.2.
+
+ - **Triage as a structured ask, not a code block.** The orchestrator now uses the harness's structured question tool (`AskUserQuestion` / `AskQuestion` / `prompt`) to render the triage. Two questions, in order: pick the path, then pick the run mode. The fenced form remains as a fallback only.
+ - **Run mode: `step` (default) vs `auto`.** `step` pauses after every stage and waits for `continue` (8.2 behaviour). `auto` chains plan → build → review → ship without pausing; stops only on block findings, cap-reached, security findings, or before `ship`. New optional field `triage.runMode` in `flow-state.json`.
+ - **Explicit parallel-build fan-out in Hop 3.** The `/cc` body now carries a full ASCII fan-out diagram for the strict-mode parallel-build path — `git worktree` per slice, max 5 slices, one `slice-builder` sub-agent per slice, integration reviewer, merge sequence. The skill `parallel-build.md` already had this; the orchestrator now sees it at the dispatch site.
+ - **TDD cycle deepening.** Four new sections in `tdd-cycle.md`: vertical slicing / tracer bullets, stop-the-line rule, Prove-It pattern for bug fixes, writing-good-tests rules (state-not-interactions, DAMP over DRY, real-over-mock, test pyramid). Three new antipatterns: A-13 horizontal slicing, A-14 pushing past a failing test, A-15 mocking what should not be mocked.
+
  ## What changed in 8.2
 
  8.2 is a non-breaking redesign of the `/cc` orchestrator on top of 8.1.
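The `step` vs `auto` run mode described in the 8.3 notes above can be read as a single pause decision between stages. The following is a minimal sketch under stated assumptions, not the package's implementation: `StageOutcome`, `shouldPauseBefore`, and the field names are hypothetical; only the two `runMode` values, the default of `step`, and the listed stop conditions come from the README text.

```typescript
// Hypothetical types for illustration; only "step" / "auto" come from the README.
type RunMode = "step" | "auto";
type Stage = "plan" | "build" | "review" | "ship";

// Assumed summary of the stage that just finished (not a documented structure).
interface StageOutcome {
  blockFindings: number;
  securityFindings: number;
  capReached: boolean;
}

// Decide whether the orchestrator waits for the user before entering `next`.
function shouldPauseBefore(
  next: Stage,
  runMode: RunMode | undefined,
  last: StageOutcome
): boolean {
  // `triage.runMode` is optional; a missing value behaves like `step`.
  const mode: RunMode = runMode ?? "step";
  if (mode === "step") return true;
  // `auto` stops only on block findings, cap-reached, security findings,
  // or before `ship`.
  return (
    last.blockFindings > 0 ||
    last.securityFindings > 0 ||
    last.capReached ||
    next === "ship"
  );
}
```

Under this reading, a clean `plan` in `auto` mode moves straight into `build`, while the transition into `ship` always waits for the user.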
@@ -1,4 +1,4 @@
- export declare const CCLAW_VERSION = "8.2.0";
+ export declare const CCLAW_VERSION = "8.3.0";
  export declare const RUNTIME_ROOT = ".cclaw";
  export declare const STATE_REL_PATH = ".cclaw/state";
  export declare const HOOKS_REL_PATH = ".cclaw/hooks";
package/dist/constants.js CHANGED
@@ -1,4 +1,4 @@
- export const CCLAW_VERSION = "8.2.0";
+ export const CCLAW_VERSION = "8.3.0";
  export const RUNTIME_ROOT = ".cclaw";
  export const STATE_REL_PATH = `${RUNTIME_ROOT}/state`;
  export const HOOKS_REL_PATH = `${RUNTIME_ROOT}/hooks`;
@@ -1 +1 @@
- export declare const ANTIPATTERNS = "# .cclaw/lib/antipatterns.md\n\nPatterns we have seen fail. Each entry is a short symptom, the underlying mistake, and the corrective action. The orchestrator and specialists open this file when a smell is detected; the reviewer cites entries as findings when applicable.\n\n## A-1 \u2014 \"Just one more AC\"\n\n**Symptom.** A plan starts with 4 AC and ends with 11. Most of the additions appeared during build.\n\n**Underlying mistake.** Scope is being expanded mid-flight without going back to plan-stage.\n\n**Correction.** When build encounters new work, surface it as a follow-up in `.cclaw/ideas.md` or a fresh slug. If the new work is genuinely required to satisfy an existing AC, that AC was wrong; cancel the slug and re-plan with a tighter AC set.\n\n## A-2 \u2014 TDD phase integrity broken\n\n**Symptom (any of):**\n\n- Build commits land for AC-N with `--phase=green` but no `--phase=red` recorded earlier.\n- AC has RED + GREEN commits but no `--phase=refactor` (skipped or applied) entry in flow-state.\n- A `--phase=red` commit touches `src/`, `lib/`, or `app/` \u2014 production code slipped into RED.\n- Tests for AC-N appear in a separate commit a few minutes after the AC-N implementation lands.\n\n**Underlying mistake.** The TDD cycle was treated as ceremony, not as the contract. The cycle exists so the failing test encodes the AC; skipping or scrambling phases produces an audit trail that nobody can trust.\n\n**Correction.** `commit-helper.mjs` enforces RED \u2192 GREEN \u2192 REFACTOR per AC. Write a failing test first and commit under `--phase=red` (test files only). Implement the smallest production change that turns it green; commit under `--phase=green`. Either commit a refactor under `--phase=refactor` or skip it explicitly with `--phase=refactor --skipped --message=\"refactor(AC-N) skipped: <reason>\"`. 
The reviewer cites this entry whenever the chain is incomplete.\n\n## A-3 \u2014 Work outside the AC\n\n**Symptom (any of):**\n\n- A small AC commit also restructures an unrelated module.\n- A commit produced by `commit-helper.mjs` contains files that are unrelated to the AC.\n- `git add -A` appears in shell history inside `/cc`.\n\n**Underlying mistake.** Slice-builder absorbed unrelated edits or silently expanded scope. The AC commit no longer maps cleanly to the AC.\n\n**Correction.** Stage AC-related files explicitly: `git add <path>` per file, or `git add -p` to pick hunks. Never `git add -A` inside `/cc`. If a refactor really must happen, capture it as a follow-up; if it really blocks the AC, cancel the slug and re-plan as a refactor + behaviour-change pair.\n\n## A-4 \u2014 AC that mirror sub-tasks\n\n**Symptom.** AC read like \"implement the helper\", \"wire the helper\", \"test the helper\".\n\n**Underlying mistake.** AC are outcomes, not sub-tasks. Outcomes survive refactors; sub-tasks do not.\n\n**Correction.** Rewrite AC as observable outcomes. The helper is an implementation detail, not an AC.\n\n## A-5 \u2014 Over-careful brainstormer\n\n**Symptom.** Brainstormer produces three pages of Context for a small task; planner is then unable to size the work.\n\n**Underlying mistake.** Brainstormer ignored the routing class. Trivial / small-medium tasks should have a one-paragraph Context, not a Frame + Scope + Alternatives sweep.\n\n**Correction.** Brainstormer reads the routing class first and short-circuits when the task is small. Three sentences of Context is enough for AC-1.\n\n## A-6 \u2014 \"I already looked\"\n\n**Symptom.** Reviewer reports a \"clear\" decision without a Five Failure Modes pass.\n\n**Underlying mistake.** The Five Failure Modes pass is the artifact. Skipping it because \"I already looked\" produces no audit trail.\n\n**Correction.** Reviewer always emits the Five Failure Modes block. Each item gets yes / no with citation when yes. 
A \"no\" with no thinking attached is fine; an absent block is not.\n\n## A-7 \u2014 Shipping with a pending AC\n\n**Symptom.** `runCompoundAndShip()` is invoked while flow-state has at least one AC with `status: pending`.\n\n**Underlying mistake.** The agent expected the orchestrator to \"figure it out\" and complete the AC silently.\n\n**Correction.** The AC traceability gate refuses ship. Either complete the AC (slice-builder) or cancel the slug (`/cc-cancel`) and re-plan with the smaller AC set. There is no override.\n\n## A-8 \u2014 Re-creating a shipped slug instead of refining\n\n**Symptom.** A new `/cc` invocation produces a slug whose plan is 80% identical to a slug already in `.cclaw/flows/shipped/`.\n\n**Underlying mistake.** Existing-plan detection was skipped or its output was ignored.\n\n**Correction.** Existing-plan detection is mandatory at the start of every `/cc`. When a shipped match is offered, the user picks **refine shipped** or **new unrelated**, not \"ignore the match\".\n\n## A-9 \u2014 Editing shipped artifacts\n\n**Symptom.** A shipped slug's `plan.md` is edited weeks after ship.\n\n**Underlying mistake.** Shipped artifacts are immutable. Editing them invalidates the knowledge index and breaks refinement chains.\n\n**Correction.** Open a refinement slug. The new slug carries `refines: <old-slug>` and contains the corrections. The old slug stays as it shipped.\n\n## A-10 \u2014 Force-push during ship\n\n**Symptom.** `git push --force` appears in shell history during ship.\n\n**Underlying mistake.** Force-push rewrites the SHAs that flow-state and the AC traceability block reference. The chain breaks silently; nothing in the runtime detects it.\n\n**Correction.** Refuse `git push --force` inside `/cc` unless the user explicitly requested it twice (initial request + confirmation). 
After the force-push, every recorded SHA in the slug must be re-verified by hand and updated.\n\n## A-11 \u2014 Hidden security surface\n\n**Symptom.** A slug ships without `security_flag: true` even though the diff added a new auth-adjacent code path.\n\n**Underlying mistake.** The author judged \"this is mostly UI\" and skipped the security checklist.\n\n**Correction.** `security_flag` is set whenever the diff touches authn / authz / secrets / supply chain / data exposure, even when the change feels small. The cost of a spurious security flag is a few minutes; the cost of a missed one is a CVE.\n\n## A-12 \u2014 Single test green, didn't run the suite\n\n**Symptom.** `flows/<slug>/build.md` GREEN evidence column shows `npm test path/to/single.test` only; full-suite run is missing.\n\n**Underlying mistake.** A passing single test is not GREEN. Production change can break adjacent tests; without running the suite, the AC is shipped on a regression.\n\n**Correction.** GREEN evidence must be the **full relevant suite** for the affected module(s), not the single test. The reviewer cites this as a block finding.\n";
+ export declare const ANTIPATTERNS = "# .cclaw/lib/antipatterns.md\n\nPatterns we have seen fail. Each entry is a short symptom, the underlying mistake, and the corrective action. The orchestrator and specialists open this file when a smell is detected; the reviewer cites entries as findings when applicable.\n\n## A-1 \u2014 \"Just one more AC\"\n\n**Symptom.** A plan starts with 4 AC and ends with 11. Most of the additions appeared during build.\n\n**Underlying mistake.** Scope is being expanded mid-flight without going back to plan-stage.\n\n**Correction.** When build encounters new work, surface it as a follow-up in `.cclaw/ideas.md` or a fresh slug. If the new work is genuinely required to satisfy an existing AC, that AC was wrong; cancel the slug and re-plan with a tighter AC set.\n\n## A-2 \u2014 TDD phase integrity broken\n\n**Symptom (any of):**\n\n- Build commits land for AC-N with `--phase=green` but no `--phase=red` recorded earlier.\n- AC has RED + GREEN commits but no `--phase=refactor` (skipped or applied) entry in flow-state.\n- A `--phase=red` commit touches `src/`, `lib/`, or `app/` \u2014 production code slipped into RED.\n- Tests for AC-N appear in a separate commit a few minutes after the AC-N implementation lands.\n\n**Underlying mistake.** The TDD cycle was treated as ceremony, not as the contract. The cycle exists so the failing test encodes the AC; skipping or scrambling phases produces an audit trail that nobody can trust.\n\n**Correction.** `commit-helper.mjs` enforces RED \u2192 GREEN \u2192 REFACTOR per AC. Write a failing test first and commit under `--phase=red` (test files only). Implement the smallest production change that turns it green; commit under `--phase=green`. Either commit a refactor under `--phase=refactor` or skip it explicitly with `--phase=refactor --skipped --message=\"refactor(AC-N) skipped: <reason>\"`. 
The reviewer cites this entry whenever the chain is incomplete.\n\n## A-3 \u2014 Work outside the AC\n\n**Symptom (any of):**\n\n- A small AC commit also restructures an unrelated module.\n- A commit produced by `commit-helper.mjs` contains files that are unrelated to the AC.\n- `git add -A` appears in shell history inside `/cc`.\n\n**Underlying mistake.** Slice-builder absorbed unrelated edits or silently expanded scope. The AC commit no longer maps cleanly to the AC.\n\n**Correction.** Stage AC-related files explicitly: `git add <path>` per file, or `git add -p` to pick hunks. Never `git add -A` inside `/cc`. If a refactor really must happen, capture it as a follow-up; if it really blocks the AC, cancel the slug and re-plan as a refactor + behaviour-change pair.\n\n## A-4 \u2014 AC that mirror sub-tasks\n\n**Symptom.** AC read like \"implement the helper\", \"wire the helper\", \"test the helper\".\n\n**Underlying mistake.** AC are outcomes, not sub-tasks. Outcomes survive refactors; sub-tasks do not.\n\n**Correction.** Rewrite AC as observable outcomes. The helper is an implementation detail, not an AC.\n\n## A-5 \u2014 Over-careful brainstormer\n\n**Symptom.** Brainstormer produces three pages of Context for a small task; planner is then unable to size the work.\n\n**Underlying mistake.** Brainstormer ignored the routing class. Trivial / small-medium tasks should have a one-paragraph Context, not a Frame + Scope + Alternatives sweep.\n\n**Correction.** Brainstormer reads the routing class first and short-circuits when the task is small. Three sentences of Context is enough for AC-1.\n\n## A-6 \u2014 \"I already looked\"\n\n**Symptom.** Reviewer reports a \"clear\" decision without a Five Failure Modes pass.\n\n**Underlying mistake.** The Five Failure Modes pass is the artifact. Skipping it because \"I already looked\" produces no audit trail.\n\n**Correction.** Reviewer always emits the Five Failure Modes block. Each item gets yes / no with citation when yes. 
A \"no\" with no thinking attached is fine; an absent block is not.\n\n## A-7 \u2014 Shipping with a pending AC\n\n**Symptom.** `runCompoundAndShip()` is invoked while flow-state has at least one AC with `status: pending`.\n\n**Underlying mistake.** The agent expected the orchestrator to \"figure it out\" and complete the AC silently.\n\n**Correction.** The AC traceability gate refuses ship. Either complete the AC (slice-builder) or cancel the slug (`/cc-cancel`) and re-plan with the smaller AC set. There is no override.\n\n## A-8 \u2014 Re-creating a shipped slug instead of refining\n\n**Symptom.** A new `/cc` invocation produces a slug whose plan is 80% identical to a slug already in `.cclaw/flows/shipped/`.\n\n**Underlying mistake.** Existing-plan detection was skipped or its output was ignored.\n\n**Correction.** Existing-plan detection is mandatory at the start of every `/cc`. When a shipped match is offered, the user picks **refine shipped** or **new unrelated**, not \"ignore the match\".\n\n## A-9 \u2014 Editing shipped artifacts\n\n**Symptom.** A shipped slug's `plan.md` is edited weeks after ship.\n\n**Underlying mistake.** Shipped artifacts are immutable. Editing them invalidates the knowledge index and breaks refinement chains.\n\n**Correction.** Open a refinement slug. The new slug carries `refines: <old-slug>` and contains the corrections. The old slug stays as it shipped.\n\n## A-10 \u2014 Force-push during ship\n\n**Symptom.** `git push --force` appears in shell history during ship.\n\n**Underlying mistake.** Force-push rewrites the SHAs that flow-state and the AC traceability block reference. The chain breaks silently; nothing in the runtime detects it.\n\n**Correction.** Refuse `git push --force` inside `/cc` unless the user explicitly requested it twice (initial request + confirmation). 
After the force-push, every recorded SHA in the slug must be re-verified by hand and updated.\n\n## A-11 \u2014 Hidden security surface\n\n**Symptom.** A slug ships without `security_flag: true` even though the diff added a new auth-adjacent code path.\n\n**Underlying mistake.** The author judged \"this is mostly UI\" and skipped the security checklist.\n\n**Correction.** `security_flag` is set whenever the diff touches authn / authz / secrets / supply chain / data exposure, even when the change feels small. The cost of a spurious security flag is a few minutes; the cost of a missed one is a CVE.\n\n## A-12 \u2014 Single test green, didn't run the suite\n\n**Symptom.** `flows/<slug>/build.md` GREEN evidence column shows `npm test path/to/single.test` only; full-suite run is missing.\n\n**Underlying mistake.** A passing single test is not GREEN. Production change can break adjacent tests; without running the suite, the AC is shipped on a regression.\n\n**Correction.** GREEN evidence must be the **full relevant suite** for the affected module(s), not the single test. The reviewer cites this as a block finding.\n\n## A-13 \u2014 Horizontal slicing (RED-batch then GREEN-batch)\n\n**Symptom.** `flows/<slug>/build.md` shows AC-1 RED, AC-2 RED, AC-3 RED committed in a row, then AC-1 GREEN, AC-2 GREEN, AC-3 GREEN. Or the slice-builder describes the build as \"tests written, now I'll implement\".\n\n**Underlying mistake.** Writing all RED tests before any GREEN code means the tests describe the behaviour you *guessed* before you saw the real interface. Tests written this way pass when behaviour breaks (because they test the imagined shape) and fail when behaviour is fine (because the real shape diverged from the imagination). They get rewritten during the next refactor.\n\n**Correction.** One test \u2192 one implementation \u2192 repeat. Each cycle informs the next. The AC-2 test is shaped by what the AC-1 implementation revealed about the real interface. 
`commit-helper.mjs --phase=red` for AC-2 will refuse if AC-1's chain isn't closed yet \u2014 that is the rail. See the Vertical Slicing section in `tdd-cycle.md`.\n\n## A-14 \u2014 Pushing past a failing test\n\n**Symptom.** Build log shows a flaky or unexpected failure on AC-2, then continues into AC-3 with \"I'll come back to AC-2 later\". Or a hook rejection silently retried with a slightly different commit message.\n\n**Underlying mistake.** Errors compound. AC-3 is built on the invariants AC-2 was supposed to establish. If AC-2's RED failed for the wrong reason, you are debugging a stack of broken assumptions, and every cycle past that point makes the diagnosis harder.\n\n**Correction.** Stop the line. Preserve the failure (command + 1\u20133 lines of output verbatim), reproduce in isolation, root-cause to a concrete file:line, fix once, re-run the full relevant suite, then resume the cycle. If the root cause cannot be identified in three attempts, surface a blocker to the orchestrator \u2014 do not \"make it work\" by removing the test or weakening the assertion.\n\n## A-15 \u2014 Mocking what should not be mocked\n\n**Symptom.** A database query test mocks the driver and asserts on `db.query` call shape; the test is green and the actual query never runs in production. Or a service test mocks every collaborator and only verifies which methods were called, in which order.\n\n**Underlying mistake.** Mocking a dependency you control couples the test to the implementation. The test reads green even when the SQL is wrong, the migration is missing, the column is misspelled. Real bugs live in those gaps. 
Interaction-based assertions (`expect(x).toHaveBeenCalledWith(...)`) break on every refactor and provide weaker confidence than state-based assertions on the outcome.\n\n**Correction.** Use a real test database (or an in-memory fake of the same shape) and assert on the **outcome** \u2014 the row that was inserted, the response from the query, the observable side effect \u2014 not on the call. Reach for mocks only for things genuinely outside your control: third-party APIs, time, randomness, the network. Real > Fake (in-memory) > Stub (canned data) > Mock (interaction).\n";
@@ -106,4 +106,28 @@ Patterns we have seen fail. Each entry is a short symptom, the underlying mistak
  **Underlying mistake.** A passing single test is not GREEN. Production change can break adjacent tests; without running the suite, the AC is shipped on a regression.
 
  **Correction.** GREEN evidence must be the **full relevant suite** for the affected module(s), not the single test. The reviewer cites this as a block finding.
+
+ ## A-13 — Horizontal slicing (RED-batch then GREEN-batch)
+
+ **Symptom.** \`flows/<slug>/build.md\` shows AC-1 RED, AC-2 RED, AC-3 RED committed in a row, then AC-1 GREEN, AC-2 GREEN, AC-3 GREEN. Or the slice-builder describes the build as "tests written, now I'll implement".
+
+ **Underlying mistake.** Writing all RED tests before any GREEN code means the tests describe the behaviour you *guessed* before you saw the real interface. Tests written this way pass when behaviour breaks (because they test the imagined shape) and fail when behaviour is fine (because the real shape diverged from the imagination). They get rewritten during the next refactor.
+
+ **Correction.** One test → one implementation → repeat. Each cycle informs the next. The AC-2 test is shaped by what the AC-1 implementation revealed about the real interface. \`commit-helper.mjs --phase=red\` for AC-2 will refuse if AC-1's chain isn't closed yet — that is the rail. See the Vertical Slicing section in \`tdd-cycle.md\`.
+
+ ## A-14 — Pushing past a failing test
+
+ **Symptom.** Build log shows a flaky or unexpected failure on AC-2, then continues into AC-3 with "I'll come back to AC-2 later". Or a hook rejection silently retried with a slightly different commit message.
+
+ **Underlying mistake.** Errors compound. AC-3 is built on the invariants AC-2 was supposed to establish. If AC-2's RED failed for the wrong reason, you are debugging a stack of broken assumptions, and every cycle past that point makes the diagnosis harder.
+
+ **Correction.** Stop the line. Preserve the failure (command + 1–3 lines of output verbatim), reproduce in isolation, root-cause to a concrete file:line, fix once, re-run the full relevant suite, then resume the cycle. If the root cause cannot be identified in three attempts, surface a blocker to the orchestrator — do not "make it work" by removing the test or weakening the assertion.
+
+ ## A-15 — Mocking what should not be mocked
+
+ **Symptom.** A database query test mocks the driver and asserts on \`db.query\` call shape; the test is green and the actual query never runs in production. Or a service test mocks every collaborator and only verifies which methods were called, in which order.
+
+ **Underlying mistake.** Mocking a dependency you control couples the test to the implementation. The test reads green even when the SQL is wrong, the migration is missing, the column is misspelled. Real bugs live in those gaps. Interaction-based assertions (\`expect(x).toHaveBeenCalledWith(...)\`) break on every refactor and provide weaker confidence than state-based assertions on the outcome.
+
+ **Correction.** Use a real test database (or an in-memory fake of the same shape) and assert on the **outcome** — the row that was inserted, the response from the query, the observable side effect — not on the call. Reach for mocks only for things genuinely outside your control: third-party APIs, time, randomness, the network. Real > Fake (in-memory) > Stub (canned data) > Mock (interaction).
  `;
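The A-15 correction above (assert on the outcome, not on the call) can be illustrated with an in-memory fake. This is a hedged sketch only: `UserStore`, `InMemoryUserStore`, and `registerUser` are hypothetical names invented for the example and do not appear in the package.

```typescript
// Hypothetical domain types, invented for this illustration.
interface User {
  id: number;
  email: string;
}

interface UserStore {
  insert(email: string): User;
  findByEmail(email: string): User | undefined;
}

// In-memory fake with the same shape as a real store ("Fake" in the
// Real > Fake > Stub > Mock ordering).
class InMemoryUserStore implements UserStore {
  private rows: User[] = [];

  insert(email: string): User {
    const user: User = { id: this.rows.length + 1, email };
    this.rows.push(user);
    return user;
  }

  findByEmail(email: string): User | undefined {
    for (const row of this.rows) {
      if (row.email === email) return row;
    }
    return undefined;
  }
}

// Code under test: normalises the address before storing it.
function registerUser(store: UserStore, rawEmail: string): User {
  return store.insert(rawEmail.trim().toLowerCase());
}

// State-based check: look at what was actually stored, not at which
// methods were called or with which arguments.
const store = new InMemoryUserStore();
registerUser(store, "  Ada@Example.com ");
const saved = store.findByEmail("ada@example.com");
```

A check on `saved` keeps passing if `registerUser` is refactored to call the store differently, as long as the stored row is the same; an interaction assertion on `insert` would not.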
@@ -5,9 +5,7 @@ trigger: at the start of every new /cc invocation, before any specialist runs
 
  # Skill: triage-gate
 
- Every new flow opens with a **triage gate**. The orchestrator analyses the user's request, picks a complexity class, names an AC mode, proposes a path, and **asks the user to confirm**. Nothing else runs until the user has confirmed (or overridden) the triage.
-
- This skill exists because cclaw v8.1 used to silently pick a path and lock the user into it. v8.2 makes that decision explicit, audit-able, and overridable.
+ Every new flow opens with a **triage gate**. The orchestrator analyses the user's request, picks a complexity class, names an AC mode, proposes a path, and **asks the user to confirm twice**: once for the path, once for the run mode (autopilot or step-by-step). Nothing else runs until both questions are answered.
 
  ## When this skill applies
 
@@ -15,9 +13,37 @@ This skill exists because cclaw v8.1 used to silently pick a path and lock the u
  - Skipped on \`/cc\` (no argument) when an active flow is detected — see \`flow-resume.md\`.
  - Skipped on \`/cc-cancel\` and \`/cc-idea\` (these never open a flow).
 
- ## Output format (mandatory)
+ ## How to render the question — STRUCTURED, not prose
+
+ If the harness exposes a structured question tool — \`AskUserQuestion\` (Claude Code), \`AskQuestion\` (Cursor), an "ask" content block (OpenCode), \`prompt\` (Codex) — **use it**. Two separate calls, in order. Do **not** print the triage as a code block and rely on the user reading numbered options. v8.2 shipped that way and the harness rendered prose; v8.3 fixes it.
+
+ ### Question 1 — path
+
+ Render the analysis as the question prompt and the four choices as options:
+
+ - prompt: \`Triage — Complexity: small/medium (high). Recommended: plan → build → review → ship. Why: 3 modules, ~150 LOC, no auth touch. AC mode: soft. Pick a path.\`
+ - options:
+ - \`Proceed as recommended\`
+ - \`Switch to trivial (inline edit + commit, skip plan/review)\`
+ - \`Escalate to large-risky (add brainstormer/architect, strict AC, parallel slices)\`
+ - \`Custom (let me edit complexity / acMode / path)\`
+
+ The prompt MUST embed the four heuristic facts (complexity + confidence, recommended path, why, ac mode) so the user can decide without reading another block. Keep it under 280 characters; truncate the rationale before truncating the facts.
+
+ ### Question 2 — run mode
+
+ Right after the user picks a path, ask:
+
+ - prompt: \`Run mode for this flow?\`
+ - options:
+ - \`Step (default) — pause after every stage; I type "continue" to advance\`
+ - \`Auto — run plan → build → review → ship without pausing; stop only on block findings or security flag\`
+
+ Default \`step\` if the user dismisses the question or the harness lacks a structured ask facility. Inline / trivial flows skip Question 2 (there are no stages to chain).
 
- Reply with a single fenced block followed by an option list:
+ ## Fallback when no structured ask tool exists
+
+ Only when the harness has no structured ask facility (rare; legacy CLI mode), print the same content as a fenced block plus numbered options:
 
  \`\`\`
  Triage
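The Question 1 rule in the hunk above (embed the four facts, stay within 280 characters, truncate the rationale before the facts) can be sketched as a small prompt builder. Everything here is an assumption made for illustration: `TriageFacts`, `buildTriagePrompt`, and the exact wording of the rendered string are hypothetical, and the sketch treats 280 as an inclusive cap.

```typescript
// Hypothetical shape; the field names are illustrative, not the package's API.
interface TriageFacts {
  complexity: string;
  confidence: string;
  recommendedPath: string[];
  why: string;
  acMode: "inline" | "soft" | "strict";
}

// Assumed inclusive character cap for the question prompt.
const MAX_PROMPT_CHARS = 280;

function buildTriagePrompt(facts: TriageFacts): string {
  // The four facts are always rendered in full; only `why` may shrink.
  const render = (why: string): string =>
    `Triage. Complexity: ${facts.complexity} (${facts.confidence}). ` +
    `Recommended: ${facts.recommendedPath.join(" → ")}. ` +
    `Why: ${why} AC mode: ${facts.acMode}. Pick a path.`;

  const full = render(facts.why);
  if (full.length <= MAX_PROMPT_CHARS) return full;

  // Cut exactly enough of the rationale to fit, leaving room for "…".
  // If the facts alone exceed the cap, the rationale is dropped and the
  // facts are still kept whole.
  const overflow = full.length - MAX_PROMPT_CHARS;
  const keep = Math.max(0, facts.why.length - overflow - 1);
  return render(facts.why.slice(0, keep) + "…");
}
```

The design choice worth noting is that truncation never touches the complexity, path, or AC mode fields, which matches the instruction that the user must be able to decide from the prompt alone.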
@@ -27,8 +53,6 @@ Triage
  ─ AC mode: <inline | soft | strict>
  \`\`\`
 
- Then list the four options verbatim:
-
  \`\`\`
  [1] Proceed as recommended
  [2] Switch to trivial (inline edit + commit, skip plan/review)
@@ -36,6 +60,16 @@ Then list the four options verbatim:
  [4] Custom (let me edit complexity / acMode / path)
  \`\`\`
 
+ Then a separate block for run mode:
+
+ \`\`\`
+ Run mode
+ [s] Step — pause after every stage (default)
+ [a] Auto — chain stages without pausing; stop only on block findings or security flag
+ \`\`\`
+
+ The fenced form is a fallback, not the primary path. Always try the structured tool first.
+
  ## Heuristics — how to pick
 
  Rank the request against these signals. The orchestrator picks the **highest** complexity any signal triggers (escalation is one-way).
@@ -62,7 +96,7 @@ If the heuristic gives \`small/medium\` but the user said something like "featur
 
  ## What the orchestrator records
 
- After the user picks (1)/(2)/(3)/(4), patch \`.cclaw/state/flow-state.json\`:
+ After both questions are answered, patch \`.cclaw/state/flow-state.json\`:
 
  \`\`\`json
  {
@@ -72,14 +106,15 @@ After the user picks (1)/(2)/(3)/(4), patch \`.cclaw/state/flow-state.json\`:
  "path": ["plan", "build", "review", "ship"],
  "rationale": "3 modules, ~150 LOC, no auth touch.",
  "decidedAt": "2026-05-08T12:34:56Z",
- "userOverrode": false
+ "userOverrode": false,
+ "runMode": "step"
  }
  }
  \`\`\`
 
- \`userOverrode\` is \`true\` only when the user picked (2), (3), or a (4) custom that disagrees with the recommendation.
+ \`userOverrode\` is \`true\` only when the user picked (2), (3), or a (4) custom that disagrees with the recommendation. \`runMode\` is \`step\` by default; record \`auto\` only when the user explicitly opted into autopilot in Question 2.
 
- The triage block is **immutable for the lifetime of the flow**. If the user wants to escalate mid-flight (e.g. discovers it is bigger than thought), \`/cc-cancel\` and start a fresh flow with new triage.
+ The triage block is **immutable for the lifetime of the flow**. If the user wants to escalate mid-flight (e.g. discovers it is bigger than thought), \`/cc-cancel\` and start a fresh flow with new triage. Switching from \`step\` to \`auto\` (or vice versa) is also a fresh-flow decision — the orchestrator does not flip mid-flight.
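The recording rules in the hunk above (when `userOverrode` is `true`, and when `runMode` is written as `auto`) can be sketched as two small functions. This is an illustration under assumptions, not the orchestrator's code: `PathChoice`, `TriagePick`, and both function names are hypothetical, and the sketch compares a custom pick on complexity, AC mode, and path because those are the three things option (4) lets the user edit.

```typescript
// Hypothetical types for illustration only.
type PathChoice = 1 | 2 | 3 | 4;
type RunMode = "step" | "auto";

interface TriagePick {
  complexity: string;
  acMode: string;
  path: string[];
}

// true only for (2), (3), or a (4) custom that disagrees with the recommendation.
function userOverrode(
  choice: PathChoice,
  recommended: TriagePick,
  custom?: TriagePick
): boolean {
  if (choice === 1) return false;
  if (choice === 2 || choice === 3) return true;
  // Assumption: a custom pick with no edits counts as agreement.
  if (custom === undefined) return false;
  return (
    custom.complexity !== recommended.complexity ||
    custom.acMode !== recommended.acMode ||
    custom.path.join(">") !== recommended.path.join(">")
  );
}

// `step` unless the user explicitly chose autopilot in Question 2.
function recordedRunMode(answer: RunMode | undefined): RunMode {
  return answer === "auto" ? "auto" : "step";
}
```

A dismissed Question 2 arrives here as `undefined` and is recorded as `step`, which matches the default stated in the skill text.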
 
  ## Path semantics
 
@@ -160,11 +195,12 @@ The user is expected to clarify in (4) Custom or accept (1) Proceed; either way
 
  ## Common pitfalls
 
- - Returning the triage as prose paragraphs instead of the fenced block. The orchestrator expects the structured form so it can parse \`acMode\` and \`path\` reliably.
+ - **Rendering the triage as a code block when a structured ask tool is available.** v8.3 fixes this: try the harness's structured ask facility (\`AskUserQuestion\` / \`AskQuestion\` / \`prompt\` / "ask" content block) first; the fenced form is a fallback only.
  - Stating "I think this is medium-complexity" and then immediately invoking planner. That is the v8.1 bug. Wait for the user's pick.
  - Picking \`large-risky\` for a one-file rename "to be safe". Do not pad the heuristic; the user reads it and learns to ignore your triage.
+ - Forgetting to ask Question 2 (run mode) after Question 1 (path). \`triage.runMode\` controls Hop 4 (pause); a missing value defaults to \`step\` — safe but wastes a click for users who wanted autopilot.
  - Forgetting to write \`triage\` into \`flow-state.json\`. The hook check \`commit-helper.mjs\` and the resume detector both read it; an absent triage breaks both.
- - Re-running the gate on resume. Resume reads the saved triage and continues from \`currentStage\`; it never re-prompts.
+ - Re-running the gate on resume. Resume reads the saved triage (path + runMode) and continues from \`currentStage\`; it never re-prompts.
  `;
  const FLOW_RESUME = `---
  name: flow-resume
@@ -711,6 +747,63 @@ Silence fails the gate.
711
747
 
712
748
  (a) **discovery_complete** — relevant tests / fixtures / helpers / commands cited.\n(b) **impact_check_complete** — affected callbacks / state / interfaces / contracts named.\n(c) **red_test_recorded** — failing test exists, watched-RED proof attached.\n(d) **red_fails_for_right_reason** — RED captured a real assertion failure.\n(e) **green_full_suite** — full relevant suite green after GREEN.\n(f) **refactor_run_or_skipped_with_reason** — REFACTOR ran, or explicitly skipped with reason.\n(g) **traceable_to_plan** — commits reference plan AC ids and the plan's file set.\n(h) **commit_chain_intact** — RED + GREEN + REFACTOR SHAs (or skipped sentinel) recorded in flow-state.
713
749
 
750
+ ## Vertical slicing — tracer bullets, never horizontal waves
+
+ **One test → one impl → repeat.** Even in strict mode, you do not write all RED tests for the slice and then all GREEN code. That horizontal pattern produces tests of *imagined* behaviour: the data shape you guessed, the function signature you guessed, the error message you guessed. The tests pass when behaviour breaks and fail when behaviour is fine.
+
+ The correct pattern is a tracer bullet per AC:
+
+ \`\`\`
+ WRONG (horizontal):
+ RED: AC-1 test, AC-2 test, AC-3 test
+ GREEN: AC-1 impl, AC-2 impl, AC-3 impl
+
+ RIGHT (vertical / tracer bullet):
+ AC-1: RED → GREEN → REFACTOR (commit chain closes here)
+ AC-2: RED → GREEN → REFACTOR (next chain starts here, informed by what you learned in AC-1)
+ AC-3: RED → GREEN → REFACTOR
+ \`\`\`
+
+ Each cycle informs the next. The AC-2 test is shaped by what the AC-1 implementation revealed about the real interface. \`commit-helper.mjs --phase=red\` for AC-2 will refuse if AC-1's chain isn't closed yet — that's the rail.
+
+ In soft mode the same principle applies at feature granularity: write 1–3 tests for the highest-priority condition, implement, then if more tests are needed for adjacent conditions, write them after you've seen the real shape of the GREEN code.
+
+ ## Stop-the-line rule
+
+ When **anything** unexpected happens during build — a test fails for the wrong reason, the build breaks, a prior-green test starts failing, a hook rejects a commit — **stop adding code**. Do not push past the failure to "come back later". Errors compound: a wrong assumption in AC-1 makes AC-2 and AC-3 wrong.
+
+ Procedure:
+
+ 1. Preserve evidence. Capture the failing command + 1–3 lines of output verbatim.
+ 2. Reproduce in isolation. Run only the failing test to confirm it fails reliably.
+ 3. Diagnose root cause. Trace the failing assertion back to a concrete cause (the actual cause, not the first plausible one). Cite the file:line in the build log.
+ 4. Fix. The fix is a refactor of the GREEN code, a correction of the RED test (if it tested the wrong thing), or a new RED that captures the missed behaviour — never silent.
+ 5. Re-run the **full relevant suite**. A passing single test is not GREEN if the suite is red elsewhere.
+ 6. Resume the cycle from where you stopped, with the chain intact.
+
+ If the root cause cannot be identified in three attempts, surface a blocker to the orchestrator. Do not "make it work" by removing the test, weakening the assertion, or commenting out the failure.
+
+ ## Prove-It pattern (bug fixes)
+
+ When the input is a bug fix, the order is non-negotiable:
+
+ 1. **Write a failing test that reproduces the bug.** This is the watched-RED proof. If you cannot reproduce the bug with a test, you cannot fix it with confidence — go gather more context.
+ 2. Confirm the test fails for the right reason — your test captured the bug, not a syntax / fixture / import error.
+ 3. Fix the bug. Smallest possible production diff that turns the new test green.
+ 4. Run the full relevant suite — the fix must not break adjacent behaviour.
+ 5. Refactor.
+
+ Bug-fix RED commits use \`--phase=red\` like any other RED. The AC id is the user's bug-fix slug (e.g. \`AC-1: completing a task sets completedAt\`). In soft mode, the same five steps apply, just with one cycle for the whole fix and a plain \`git commit\`.
+
+ ## Writing good tests (state, not interactions; DAMP, not DRY)
+
+ These rules apply equally to soft and strict modes. They make the difference between tests that survive a refactor and tests that have to be rewritten every time.
+
+ - **Test state, not interactions.** Assert on the *outcome* of the operation — return value, persisted record, observable side effect — not on which methods were called internally. \`expect(result).toEqual(...)\` is good; \`expect(db.query).toHaveBeenCalledWith(...)\` couples the test to the implementation.
+ - **DAMP over DRY in tests.** A test should read like a specification. Each test independently understandable beats a clever shared setup that reads well only after tracing helpers. Duplication in test code is acceptable when it makes each case independently readable.
+ - **Prefer real implementations over mocks.** The more your tests use real code, the more confidence they provide. Mock only what is genuinely outside your control (third-party APIs, time, randomness). Real > Fake (in-memory) > Stub (canned data) > Mock (interaction). Reach for the simplest level that gets the job done.
+ - **Test pyramid: small / medium / large.** Most tests should be small (single process, no I/O, milliseconds). A handful are medium (boundary tests, in-process integration, seconds). E2E / multi-machine tests stay reserved for critical paths only.
+
  ## Anti-patterns
 
  - "The implementation is obvious, skipping RED." A-13 — gate fails immediately.
@@ -719,6 +812,9 @@ Silence fails the gate.
  - "Stage everything with \`git add -A\`." A-16 — staged unrelated edits leak into the AC commit.
  - "Production code in the RED commit." A-17 — RED is test files only.
  - **"Test file named after the AC id" — \`AC-1.test.ts\`, \`tests/AC-2.spec.ts\`, etc.** The reviewer flags this as \`block\`. Mirror the unit under test in the filename; carry the AC id inside the test name and commit message only.
+ - **Horizontal slicing.** A-18 — writing all RED tests first, then all GREEN code, produces tests of imagined behaviour. One test → one impl → repeat. See the Vertical Slicing section above.
+ - **Pushing past a failing test.** A-19 — the next cycle is built on the previous cycle's invariants; if those invariants are broken you are debugging a stack of broken assumptions. Stop the line, root-cause, then resume.
+ - **Mocking what you should not mock.** A-20 — mocking the database for a query test reads green and breaks in production. Use a fake or a real test DB; mock only what is genuinely outside your control.
 
  ## Fix-only flow
 
@@ -1,7 +1,30 @@
  import { CORE_AGENTS } from "./core-agents.js";
  import { ironLawsMarkdown } from "./iron-laws.js";
  const SPECIALIST_LIST = CORE_AGENTS.map((agent) => `- **${agent.id}** (${agent.modes.join(" / ")}) — ${agent.description}`).join("\n");
- const TRIAGE_BLOCK_EXAMPLE = `\`\`\`
+ const TRIAGE_ASK_EXAMPLE = `\`\`\`
+ askUserQuestion(
+   prompt: "Triage — Complexity: small/medium (high). Recommended: plan → build → review → ship. Why: 3 modules, ~150 LOC, no auth touch. AC mode: soft. Pick a path.",
+   options: [
+     "Proceed as recommended",
+     "Switch to trivial (inline edit + commit, skip plan/review)",
+     "Escalate to large-risky (add brainstormer/architect, strict AC, parallel slices)",
+     "Custom (let me edit complexity / acMode / path)"
+   ],
+   multiSelect: false
+ )
+
+ # After the user picks, ask the second question:
+
+ askUserQuestion(
+   prompt: "Run mode for this flow?",
+   options: [
+     "Step (default) — pause after every stage; I type \\"continue\\" to advance",
+     "Auto — chain plan → build → review → ship; stop only on block findings or security flag"
+   ],
+   multiSelect: false
+ )
+ \`\`\``;
+ const TRIAGE_FALLBACK_EXAMPLE = `\`\`\`
  Triage
  ─ Complexity: small/medium (confidence: high)
  ─ Recommended path: plan → build → review → ship
@@ -12,6 +35,12 @@ Triage
  [2] Switch to trivial (inline edit + commit, skip plan/review)
  [3] Escalate to large-risky (add brainstormer/architect, strict AC, parallel slices)
  [4] Custom (let me edit complexity / acMode / path)
+ \`\`\`
+
+ \`\`\`
+ Run mode
+ [s] Step — pause after every stage (default)
+ [a] Auto — chain stages; stop only on block findings or security flag
  \`\`\``;
  const TRIAGE_PERSIST_EXAMPLE = `\`\`\`json
  {
@@ -21,7 +50,8 @@ const TRIAGE_PERSIST_EXAMPLE = `\`\`\`json
      "path": ["plan", "build", "review", "ship"],
      "rationale": "3 modules, ~150 LOC, no auth touch.",
      "decidedAt": "2026-05-08T12:34:56Z",
-     "userOverrode": false
+     "userOverrode": false,
+     "runMode": "step"
    }
  }
  \`\`\``;
@@ -101,17 +131,25 @@ Do not auto-delete state. Do not hand-edit the JSON.
 
  ## Hop 2 — Triage (fresh starts only)
 
- Run the \`triage-gate.md\` skill. The output is a single fenced block followed by four numbered options:
+ Run the \`triage-gate.md\` skill. **Use the harness's structured question tool** (\`AskUserQuestion\` in Claude Code, \`AskQuestion\` in Cursor, the "ask" content block in OpenCode, \`prompt\` in Codex). Two questions, in order:
+
+ ${TRIAGE_ASK_EXAMPLE}
+
+ The first question's prompt MUST embed the four heuristic facts (complexity + confidence, recommended path, why, AC mode) so the user can decide without reading another block. Keep it under 280 characters; truncate the rationale before truncating the facts.
+
+ The second question is skipped on the trivial / inline path (no stages to chain). Default \`runMode\` is \`step\` if the user dismisses the question.
+
+ If the harness lacks a structured ask facility, fall back to the legacy form:
 
- ${TRIAGE_BLOCK_EXAMPLE}
+ ${TRIAGE_FALLBACK_EXAMPLE}
 
- Wait for the user's pick. Then patch \`flow-state.json\`:
+ Once both answers are in, patch \`flow-state.json\`:
 
  ${TRIAGE_PERSIST_EXAMPLE}
 
- The triage decision is **immutable** for the lifetime of the flow. If the user wants a different acMode mid-flight, the path is \`/cc-cancel\` and a fresh \`/cc\` invocation.
+ The triage decision is **immutable** for the lifetime of the flow. If the user wants a different acMode or runMode mid-flight, the path is \`/cc-cancel\` and a fresh \`/cc\` invocation.
 
- After triage, the rest of the orchestrator runs the stages listed in \`triage.path\`, in order, pausing between each.
+ After triage, the rest of the orchestrator runs the stages listed in \`triage.path\`, in order. Pause behaviour between stages is controlled by \`triage.runMode\` — see Hop 4.
 
  ### Trivial path (acMode: inline)
 
@@ -186,11 +224,78 @@ The orchestrator reads only this. The full artifact stays in \`.cclaw/flows/<slu
  - Specialist: \`slice-builder\`.
  - Inputs: \`.cclaw/flows/<slug>/plan.md\`, \`.cclaw/lib/templates/build.md\`, \`.cclaw/lib/skills/tdd-cycle.md\`.
  - Output: \`.cclaw/flows/<slug>/build.md\` with TDD evidence at the granularity dictated by \`acMode\`.
- - Strict mode: full RED GREEN REFACTOR per AC, every commit through \`commit-helper.mjs\`. Parallel-build only if planner declared it AND \`acMode == strict\`.
- - Soft mode: one TDD cycle for the whole feature; tests under \`tests/\` mirroring the production module path; plain \`git commit\`.
+ - Soft mode: one TDD cycle for the whole feature; tests under \`tests/\` mirroring the production module path; plain \`git commit\`. Sequential, single dispatch, no worktrees.
+ - Strict mode, sequential: full RED GREEN REFACTOR per AC, every commit through \`commit-helper.mjs\`. Single \`slice-builder\` dispatch in the main working tree.
+ - Strict mode, parallel: see "Parallel-build fan-out" below — only when planner declared \`topology: parallel-build\` AND ≥4 AC AND ≥2 disjoint touchSurface clusters.
  - Inline mode: not dispatched here — handled in the trivial path of Hop 2.
  - Slim summary: AC committed (strict) or conditions verified (soft), suite-status (passed / failed), open follow-ups.
 
+ ##### Parallel-build fan-out (strict mode + planner topology=parallel-build only)
+
+ When the planner artifact declares \`topology: parallel-build\` with ≥2 slices and \`acMode == strict\`, the orchestrator fans out one \`slice-builder\` sub-agent per slice, **capped at 5**, each in its own \`git worktree\`. This is the only fan-out cclaw uses outside of \`ship\`.
+
+ \`\`\`text
+ flows/<slug>/plan.md
+   topology: parallel-build
+   slices: [s-1, s-2, s-3]  (max 5)
+
+
+ git worktree add .cclaw/worktrees/<slug>-s-1 -b cclaw/<slug>/s-1
+ git worktree add .cclaw/worktrees/<slug>-s-2 -b cclaw/<slug>/s-2
+ git worktree add .cclaw/worktrees/<slug>-s-3 -b cclaw/<slug>/s-3
+
+          ┌───────────────────┼───────────────────┐
+          ▼                   ▼                   ▼
+   slice-builder       slice-builder       slice-builder
+ (s-1; AC-1, AC-2)      (s-2; AC-3)       (s-3; AC-4, AC-5)
+ cwd: …/<slug>-s-1   cwd: …/<slug>-s-2   cwd: …/<slug>-s-3
+ RED→GREEN→REFACTOR  RED→GREEN→REFACTOR  RED→GREEN→REFACTOR
+ per AC, in slice    per AC, in slice    per AC, in slice
+          │                   │                   │
+          └───────────────────┼───────────────────┘
+
+                 reviewer (mode=integration)
+                  reads each branch, checks
+              cross-slice conflicts, AC↔commit
+                    chain across the wave
+
+
+ merge cclaw/<slug>/s-1 → main, then s-2, then s-3
+ (fast-forward when wave was clean; otherwise stop and ask)
+
+
+ git worktree remove .cclaw/worktrees/<slug>-s-N (per slice)
+ \`\`\`
+
+ Dispatch envelope per slice:
+
+ \`\`\`
+ Dispatch slice-builder
+ ─ Stage: build
+ ─ Slug: <slug>
+ ─ Slice: s-N (acIds: [AC-N, AC-N+1])
+ ─ Working tree: .cclaw/worktrees/<slug>-s-N
+ ─ Branch: cclaw/<slug>/s-N
+ ─ AC mode: strict
+ ─ Touch surface (only paths this slice may modify): [<paths from plan>]
+ ─ Output: .cclaw/flows/<slug>/build.md (append, marked with slice id)
+ ─ Forbidden: read or modify any path outside touch surface; read another slice's worktree mid-flight; merge or rebase
+ \`\`\`
+
+ After every slice-builder returns:
+
+ 1. Patch \`flow-state.json\` with the per-slice progress.
+ 2. When **every** slice has reported, dispatch \`reviewer\` mode=\`integration\` (one sub-agent, reads from each branch).
+ 3. On clear integration review, merge slices into main one at a time. On block, dispatch \`slice-builder\` mode=\`fix-only\` against the cited file:line refs, then re-run the integration reviewer.
+ 4. Worktree cleanup happens after merge; the cclaw branches stay until ship.
+
+ Hard rules:
+
+ - **More than 5 parallel slices is forbidden.** If planner produced >5, the planner must merge thinner slices into fatter ones before build; do not generate "wave 2".
+ - Slice-builders never read each other's worktrees mid-flight. A slice that detects a conflict with another stops and raises an integration finding.
+ - If the harness lacks sub-agent dispatch or worktree creation fails (non-git repo, permissions), parallel-build degrades silently to inline-sequential. Record the fallback in \`flows/<slug>/build.md\` frontmatter (\`subAgentDispatch: inline-fallback\`) — not an error.
+ - \`auto\` runMode does **not** affect the integration-reviewer ask: a parallel wave that produces a block finding always asks the user before fix-only.
+
  #### review
 
  - Specialist: \`reviewer\` (mode = \`code\` for sequential build, \`integration\` for parallel-build).
@@ -220,6 +325,10 @@ Each step is a separate dispatch + pause + slim summary. The user can stop after
 
  ## Hop 4 — Pause and resume
 
+ Pause behaviour depends on \`triage.runMode\` (default \`step\`).
+
+ ### \`step\` mode (default; safer; recommended for \`strict\` work)
+
  After every dispatch returns:
 
  1. Render the slim summary back to the user.
@@ -227,7 +336,25 @@ After every dispatch returns:
  3. Wait. Do **not** auto-advance. The user types \`continue\`, \`show\`, \`fix-only\`, or \`cancel\`.
  4. On \`continue\` → next stage in \`triage.path\`. On \`show\` → open the artifact and stop. On \`fix-only\` → re-dispatch slice-builder with mode=fix-only and the cited findings. On \`cancel\` → \`/cc-cancel\`.
 
- Resume from a fresh session works because everything is on disk: \`flow-state.json\` has \`currentStage\` and \`triage\`, \`flows/<slug>/*.md\` carries the artifacts. The next \`/cc\` invocation enters Hop 1 → detect → resume summary → continue from \`currentStage\`.
+ ### \`auto\` mode (autopilot; faster; recommended for \`inline\` / \`soft\` work)
+
+ After every dispatch returns:
+
+ 1. Render the slim summary back to the user (one block, no prompt).
+ 2. **Immediately** dispatch the next stage in \`triage.path\` — no waiting, no question.
+ 3. Stop unconditionally only on these hard gates (autopilot **always** asks here):
+ - \`reviewer\` returned \`block\` decision (open findings) → render the findings, ask \`continue with fix-only\` / \`cancel\`.
+ - \`security-reviewer\` raised any finding → ask before proceeding.
+ - \`reviewer\` returned \`cap-reached\` (5 iterations without convergence) → ask.
+ - About to run \`ship\` (last stage in \`triage.path\`) → ask \`ship now?\` once, then proceed on confirmation. Ship is the only stage that always confirms in autopilot.
+
+ Auto mode never silently skips a hard gate; it just removes the cosmetic pause between green stages. The user typed \`auto\` once during triage and meant it.
+
+ ### Common rules for both modes
+
+ Resume from a fresh session works because everything is on disk: \`flow-state.json\` has \`currentStage\`, \`triage\` (with \`runMode\`), \`flows/<slug>/*.md\` carries the artifacts. The next \`/cc\` invocation enters Hop 1 → detect → resume summary → continue from \`currentStage\` with the saved runMode.
+
+ Resuming a paused \`auto\` flow re-enters auto mode silently. Resuming a paused \`step\` flow renders the slim summary again and waits for \`continue\`.
 
  ## Hop 5 — Compound (automatic)
 
@@ -244,8 +371,9 @@ After ship + compound, move every \`<stage>.md\` from \`flows/<slug>/\` into \`.
 
  ## Always-ask rules
 
- - Always run the triage gate on a fresh \`/cc\`. Never silently pick a path.
- - Always pause after every stage. Never auto-advance through plan → build → review without asking.
+ - Always run the triage gate on a fresh \`/cc\`. Never silently pick a path. Use the harness's structured question tool, not a printed code block.
+ - In \`step\` mode, always pause after every stage. Never auto-advance.
+ - In \`auto\` mode, never auto-advance past a hard gate (block / cap-reached / security finding / ship). The user opted into chaining green stages, not chaining decisions.
  - Always ask before \`git push\` or PR creation. Commit-helper auto-commits in strict mode; everything past commit is opt-in.
  - Always ask before deleting active artifacts (\`/cc-cancel\` is the supported way; do not \`rm\` artifacts directly).
  - Always show the slim summary back to the user; do not summarise from your own memory of the dispatch.
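The Hop 4 pause rules and the always-ask list above can be read as one small decision function. The following is a minimal TypeScript sketch under stated assumptions, not the orchestrator's actual code: `StageResult`, `NextAction`, `nextAction`, and their field names are hypothetical. Only the `step` / `auto` values, the stage names, and the four hard gates (block, cap-reached, security finding, ship) come from the text.

```typescript
// Hypothetical types for illustration; the real cclaw types may differ.
type RunMode = "step" | "auto";
type FlowStage = "plan" | "build" | "review" | "ship";

interface StageResult {
  reviewerDecision?: "clear" | "block" | "cap-reached";
  securityFindings?: number;
}

type NextAction =
  | { kind: "ask"; reason: string }
  | { kind: "dispatch"; stage: FlowStage }
  | { kind: "done" };

function nextAction(
  runMode: RunMode,
  stagePath: readonly FlowStage[],
  current: FlowStage,
  result: StageResult,
): NextAction {
  const index = stagePath.indexOf(current);
  // Assumption: an unknown or final stage means there is nothing left to run.
  const next = index >= 0 ? stagePath[index + 1] : undefined;
  if (next === undefined) return { kind: "done" };

  // step mode pauses after every stage, whatever the result.
  if (runMode === "step") return { kind: "ask", reason: "step mode" };

  // auto mode: the four hard gates always ask.
  if (result.reviewerDecision === "block") return { kind: "ask", reason: "block findings" };
  if (result.reviewerDecision === "cap-reached") return { kind: "ask", reason: "cap-reached" };
  if ((result.securityFindings ?? 0) > 0) return { kind: "ask", reason: "security finding" };
  if (next === "ship") return { kind: "ask", reason: "ship now?" };

  // Otherwise chain straight into the next stage.
  return { kind: "dispatch", stage: next };
}

const examplePath: FlowStage[] = ["plan", "build", "review", "ship"];
console.log(nextAction("auto", examplePath, "plan", {}).kind); // "dispatch"
console.log(nextAction("auto", examplePath, "review", { reviewerDecision: "clear" }).kind); // "ask"
console.log(nextAction("step", examplePath, "plan", {}).kind); // "ask"
```

Checking `step` first keeps the 8.2 behaviour as the default branch, so `auto` only ever removes pauses between green stages and cannot bypass a gate.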
@@ -1,4 +1,4 @@
- import { type AcMode, type AcceptanceCriterionState, type BuildProfile, type DiscoverySpecialistId, type FlowStage, type RoutingClass, type TriageDecision } from "./types.js";
+ import { type AcMode, type AcceptanceCriterionState, type BuildProfile, type DiscoverySpecialistId, type FlowStage, type RoutingClass, type RunMode, type TriageDecision } from "./types.js";
  export declare const FLOW_STATE_SCHEMA_VERSION = 3;
  /** v8.0–v8.1 schema. Auto-migrated to v3 on read. */
  export declare const LEGACY_V8_FLOW_STATE_SCHEMA_VERSION = 2;
@@ -28,10 +28,18 @@ export declare class LegacyFlowStateError extends Error {
  export declare function isFlowStage(value: unknown): value is FlowStage;
  export declare function isRoutingClass(value: unknown): value is RoutingClass;
  export declare function isAcMode(value: unknown): value is AcMode;
+ export declare function isRunMode(value: unknown): value is RunMode;
  export declare function isDiscoverySpecialist(value: unknown): value is DiscoverySpecialistId;
  export declare function createInitialFlowState(nowIso?: string): FlowStateV82;
  /** @deprecated kept for source-level compatibility with v8.1 imports. */
  export declare const createInitialFlowStateV8: typeof createInitialFlowState;
+ /**
+  * Read a triage decision's runMode with the documented default.
+  *
+  * v8.2 state files do not record runMode; treat them as `step` so existing
+  * flows keep their pause-between-stages behaviour byte-for-byte.
+  */
+ export declare function runModeOf(triage: TriageDecision | null | undefined): RunMode;
  /**
   * Validate a flow-state object. Throws on hard schema errors.
   *
@@ -1,4 +1,4 @@
- import { AC_MODES, FLOW_STAGES, ROUTING_CLASSES } from "./types.js";
+ import { AC_MODES, FLOW_STAGES, ROUTING_CLASSES, RUN_MODES } from "./types.js";
  export const FLOW_STATE_SCHEMA_VERSION = 3;
  /** v8.0–v8.1 schema. Auto-migrated to v3 on read. */
  export const LEGACY_V8_FLOW_STATE_SCHEMA_VERSION = 2;
@@ -19,6 +19,9 @@ export function isRoutingClass(value) {
  export function isAcMode(value) {
      return typeof value === "string" && AC_MODES.includes(value);
  }
+ export function isRunMode(value) {
+     return typeof value === "string" && RUN_MODES.includes(value);
+ }
  export function isDiscoverySpecialist(value) {
      return value === "brainstormer" || value === "architect" || value === "planner";
  }
@@ -62,7 +65,8 @@ function inferTriageFromLegacy(state) {
          path: ["plan", "build", "review", "ship"],
          rationale: "Auto-migrated from cclaw 8.0/8.1 flow-state (no triage recorded; preserved as strict).",
          decidedAt: state.startedAt,
-         userOverrode: false
+         userOverrode: false,
+         runMode: "step"
      };
  }
  function assertAcArray(value) {
@@ -116,6 +120,18 @@ function assertTriageOrNull(value) {
      if (typeof triage.userOverrode !== "boolean") {
          throw new Error("triage.userOverrode must be a boolean");
      }
+     if (triage.runMode !== undefined && !isRunMode(triage.runMode)) {
+         throw new Error(`Invalid triage.runMode: ${String(triage.runMode)}`);
+     }
+ }
+ /**
+  * Read a triage decision's runMode with the documented default.
+  *
+  * v8.2 state files do not record runMode; treat them as `step` so existing
+  * flows keep their pause-between-stages behaviour byte-for-byte.
+  */
+ export function runModeOf(triage) {
+     return triage?.runMode ?? "step";
  }
  /**
   * Validate a flow-state object. Throws on hard schema errors.
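To illustrate the backward-compatibility claim in the `runModeOf` comment, here is a minimal, self-contained TypeScript sketch. It re-declares `RUN_MODES`, `isRunMode`, and `runModeOf` with the same bodies as the diff, but `TriageDecision` is trimmed to the single field the sketch needs and the sample objects are hypothetical. It shows a triage record without `runMode` (as an 8.2 state file would have) resolving to `step`.

```typescript
const RUN_MODES = ["step", "auto"] as const;
type RunMode = (typeof RUN_MODES)[number];

// Assumption: trimmed to the one field this sketch needs; the real
// TriageDecision interface carries more fields.
interface TriageDecision {
  runMode?: RunMode;
}

function isRunMode(value: unknown): value is RunMode {
  return typeof value === "string" && (RUN_MODES as readonly string[]).includes(value);
}

function runModeOf(triage: TriageDecision | null | undefined): RunMode {
  return triage?.runMode ?? "step";
}

// A hypothetical triage record written before runMode existed.
const legacyTriage: TriageDecision = JSON.parse('{"userOverrode": false}');

console.log(runModeOf(legacyTriage));         // "step"
console.log(runModeOf({ runMode: "auto" }));  // "auto"
console.log(runModeOf(null));                 // "step"
console.log(isRunMode("turbo"));              // false
```

Keeping the default in one reader function means callers never branch on `undefined` themselves, which matches the "readers MUST default to `step` on absent" note on the type.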
package/dist/types.d.ts CHANGED
@@ -41,6 +41,21 @@ export type RoutingClass = (typeof ROUTING_CLASSES)[number];
   */
  export declare const AC_MODES: readonly ["inline", "soft", "strict"];
  export type AcMode = (typeof AC_MODES)[number];
+ /**
+  * How aggressively the orchestrator advances through the flow.
+  *
+  * - `step` (default): pause after every stage. The orchestrator renders the
+  *   slim summary and waits for the user to type "continue". The original
+  *   v8.2 behaviour, recommended for `strict` and unfamiliar work.
+  * - `auto`: render the slim summary and immediately dispatch the next stage
+  *   without asking. Stops only on hard gates (block findings, security flag,
+  *   ship). Recommended for `inline` / `soft` work the user has already
+  *   scoped tightly.
+  *
+  * Selected at the triage gate; user can override per flow.
+  */
+ export declare const RUN_MODES: readonly ["step", "auto"];
+ export type RunMode = (typeof RUN_MODES)[number];
  /**
   * Decision recorded at the triage gate that opens every new flow.
   * Persisted in flow-state.json so resumes never re-trigger triage.
@@ -56,6 +71,14 @@ export interface TriageDecision {
      decidedAt: string;
      /** Did the user override the orchestrator's recommendation? */
      userOverrode: boolean;
+     /**
+      * Step-by-step (default) or autopilot. Persisted across resumes so the
+      * user only picks once per flow.
+      *
+      * Optional in TypeScript so v8.2 state files (which lack `runMode`) still
+      * validate; readers MUST default to `step` on absent.
+      */
+     runMode?: RunMode;
  }
  export interface CliContext {
      cwd: string;
package/dist/types.js CHANGED
@@ -21,3 +21,17 @@ export const ROUTING_CLASSES = ["trivial", "small-medium", "large-risky"];
   * Selected at the triage gate; user can override.
   */
  export const AC_MODES = ["inline", "soft", "strict"];
+ /**
+  * How aggressively the orchestrator advances through the flow.
+  *
+  * - `step` (default): pause after every stage. The orchestrator renders the
+  *   slim summary and waits for the user to type "continue". The original
+  *   v8.2 behaviour, recommended for `strict` and unfamiliar work.
+  * - `auto`: render the slim summary and immediately dispatch the next stage
+  *   without asking. Stops only on hard gates (block findings, security flag,
+  *   ship). Recommended for `inline` / `soft` work the user has already
+  *   scoped tightly.
+  *
+  * Selected at the triage gate; user can override per flow.
+  */
+ export const RUN_MODES = ["step", "auto"];
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "cclaw-cli",
-   "version": "8.2.0",
+   "version": "8.3.0",
    "description": "Lightweight harness-first flow toolkit for coding agents",
    "type": "module",
    "bin": {