npm - cclaw-cli - Versions diffs - 8.2.0 → 8.3.0 - Mend

cclaw-cli 8.2.0 → 8.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (12)

package/README.md +9 -0
package/dist/constants.d.ts +1 -1
package/dist/constants.js +1 -1
package/dist/content/antipatterns.d.ts +1 -1
package/dist/content/antipatterns.js +24 -0
package/dist/content/skills.js +109 -13
package/dist/content/start-command.js +140 -12
package/dist/flow-state.d.ts +9 -1
package/dist/flow-state.js +18 -2
package/dist/types.d.ts +23 -0
package/dist/types.js +14 -0
package/package.json +1 -1

package/README.md CHANGED

@@ -35,6 +35,15 @@
 Three slash commands (`/cc`, `/cc-cancel`, `/cc-idea`). Four stages (`plan → build → review → ship`). Six specialists, all on-demand, all running as sub-agents. Fifteen skills including the always-on `triage-gate`, `flow-resume`, `tdd-cycle`, `conversation-language`, and `anti-slop`. Ten templates including `plan-soft.md` and `build-soft.md` for the soft-mode path. Four runbooks. Eight reference patterns. Three research playbooks. Five recovery playbooks. Eight worked examples. Two mandatory gates in strict mode (AC traceability + TDD phase chain); soft mode keeps both as advisory; inline mode skips both.
+## What changed in 8.3
+8.3 is a non-breaking content + UX patch on top of 8.2.
+- **Triage as a structured ask, not a code block.** The orchestrator now uses the harness's structured question tool (`AskUserQuestion` / `AskQuestion` / `prompt`) to render the triage. Two questions, in order: pick the path, then pick the run mode. The fenced form remains as a fallback only.
+- **Run mode: `step` (default) vs `auto`.** `step` pauses after every stage and waits for `continue` (8.2 behaviour). `auto` chains plan → build → review → ship without pausing; stops only on block findings, cap-reached, security findings, or before `ship`. New optional field `triage.runMode` in `flow-state.json`.
+- **Explicit parallel-build fan-out in Hop 3.** The `/cc` body now carries a full ASCII fan-out diagram for the strict-mode parallel-build path — `git worktree` per slice, max 5 slices, one `slice-builder` sub-agent per slice, integration reviewer, merge sequence. The skill `parallel-build.md` already had this; the orchestrator now sees it at the dispatch site.
+- **TDD cycle deepening.** Four new sections in `tdd-cycle.md`: vertical slicing / tracer bullets, stop-the-line rule, Prove-It pattern for bug fixes, writing-good-tests rules (state-not-interactions, DAMP over DRY, real-over-mock, test pyramid). Three new antipatterns: A-13 horizontal slicing, A-14 pushing past a failing test, A-15 mocking what should not be mocked.
 ## What changed in 8.2
 8.2 is a non-breaking redesign of the `/cc` orchestrator on top of 8.1.
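The 8.3 notes above describe when `auto` may chain stages and when it must stop. The TypeScript sketch below encodes only those stated rules. The `StageOutcome` shape and the `shouldPause` helper are hypothetical and are not taken from the cclaw-cli source.

```typescript
// Hypothetical model of the 8.3 run-mode rules; names are illustrative and
// do not come from cclaw-cli itself.
type Stage = "plan" | "build" | "review" | "ship";
type RunMode = "step" | "auto";

interface StageOutcome {
  blockFindings: number;    // block-severity findings raised by the stage
  capReached: boolean;      // an iteration cap was hit
  securityFinding: boolean; // the stage surfaced a security finding
}

// Called after a stage finishes. `next` is the following stage on the
// triage path, or null when the path is exhausted.
function shouldPause(runMode: RunMode, next: Stage | null, outcome: StageOutcome): boolean {
  if (next === null) return true;      // nothing left to chain
  if (runMode === "step") return true; // step pauses after every stage
  // auto: stop only on the conditions the README lists
  return (
    outcome.blockFindings > 0 ||
    outcome.capReached ||
    outcome.securityFinding ||
    next === "ship" // always stop before ship
  );
}

const clean: StageOutcome = { blockFindings: 0, capReached: false, securityFinding: false };
console.log(shouldPause("auto", "build", clean)); // false: auto chains into build
console.log(shouldPause("auto", "ship", clean));  // true: auto stops before ship
```

Treating the end of the path (`next === null`) as a stop is an assumption of this sketch; the README only lists the mid-flow stop conditions.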

package/dist/constants.d.ts CHANGED

@@ -1,4 +1,4 @@
-export declare const CCLAW_VERSION = "8.2.0";
+export declare const CCLAW_VERSION = "8.3.0";
 export declare const RUNTIME_ROOT = ".cclaw";
 export declare const STATE_REL_PATH = ".cclaw/state";
 export declare const HOOKS_REL_PATH = ".cclaw/hooks";

package/dist/constants.js CHANGED

@@ -1,4 +1,4 @@
-export const CCLAW_VERSION = "8.2.0";
+export const CCLAW_VERSION = "8.3.0";
 export const RUNTIME_ROOT = ".cclaw";
 export const STATE_REL_PATH = `${RUNTIME_ROOT}/state`;
 export const HOOKS_REL_PATH = `${RUNTIME_ROOT}/hooks`;

package/dist/content/antipatterns.d.ts CHANGED

@@ -1 +1 @@
-export declare const ANTIPATTERNS = "# .cclaw/lib/antipatterns.md\n\nPatterns we have seen fail. Each entry is a short symptom, the underlying mistake, and the corrective action. The orchestrator and specialists open this file when a smell is detected; the reviewer cites entries as findings when applicable.\n\n## A-1 \u2014 \"Just one more AC\"\n\nSymptom. A plan starts with 4 AC and ends with 11. Most of the additions appeared during build.\n\nUnderlying mistake. Scope is being expanded mid-flight without going back to plan-stage.\n\nCorrection. When build encounters new work, surface it as a follow-up in `.cclaw/ideas.md` or a fresh slug. If the new work is genuinely required to satisfy an existing AC, that AC was wrong; cancel the slug and re-plan with a tighter AC set.\n\n## A-2 \u2014 TDD phase integrity broken\n\nSymptom (any of):\n\n- Build commits land for AC-N with `--phase=green` but no `--phase=red` recorded earlier.\n- AC has RED + GREEN commits but no `--phase=refactor` (skipped or applied) entry in flow-state.\n- A `--phase=red` commit touches `src/`, `lib/`, or `app/` \u2014 production code slipped into RED.\n- Tests for AC-N appear in a separate commit a few minutes after the AC-N implementation lands.\n\nUnderlying mistake. The TDD cycle was treated as ceremony, not as the contract. The cycle exists so the failing test encodes the AC; skipping or scrambling phases produces an audit trail that nobody can trust.\n\nCorrection. `commit-helper.mjs` enforces RED \u2192 GREEN \u2192 REFACTOR per AC. Write a failing test first and commit under `--phase=red` (test files only). Implement the smallest production change that turns it green; commit under `--phase=green`. Either commit a refactor under `--phase=refactor` or skip it explicitly with `--phase=refactor --skipped --message=\"refactor(AC-N) skipped: <reason>\"`. 
The reviewer cites this entry whenever the chain is incomplete.\n\n## A-3 \u2014 Work outside the AC\n\nSymptom (any of):\n\n- A small AC commit also restructures an unrelated module.\n- A commit produced by `commit-helper.mjs` contains files that are unrelated to the AC.\n- `git add -A` appears in shell history inside `/cc`.\n\nUnderlying mistake. Slice-builder absorbed unrelated edits or silently expanded scope. The AC commit no longer maps cleanly to the AC.\n\nCorrection. Stage AC-related files explicitly: `git add <path>` per file, or `git add -p` to pick hunks. Never `git add -A` inside `/cc`. If a refactor really must happen, capture it as a follow-up; if it really blocks the AC, cancel the slug and re-plan as a refactor + behaviour-change pair.\n\n## A-4 \u2014 AC that mirror sub-tasks\n\nSymptom. AC read like \"implement the helper\", \"wire the helper\", \"test the helper\".\n\nUnderlying mistake. AC are outcomes, not sub-tasks. Outcomes survive refactors; sub-tasks do not.\n\nCorrection. Rewrite AC as observable outcomes. The helper is an implementation detail, not an AC.\n\n## A-5 \u2014 Over-careful brainstormer\n\nSymptom. Brainstormer produces three pages of Context for a small task; planner is then unable to size the work.\n\nUnderlying mistake. Brainstormer ignored the routing class. Trivial / small-medium tasks should have a one-paragraph Context, not a Frame + Scope + Alternatives sweep.\n\nCorrection. Brainstormer reads the routing class first and short-circuits when the task is small. Three sentences of Context is enough for AC-1.\n\n## A-6 \u2014 \"I already looked\"\n\nSymptom. Reviewer reports a \"clear\" decision without a Five Failure Modes pass.\n\nUnderlying mistake. The Five Failure Modes pass is the artifact. Skipping it because \"I already looked\" produces no audit trail.\n\nCorrection. Reviewer always emits the Five Failure Modes block. Each item gets yes / no with citation when yes. 
A \"no\" with no thinking attached is fine; an absent block is not.\n\n## A-7 \u2014 Shipping with a pending AC\n\nSymptom. `runCompoundAndShip()` is invoked while flow-state has at least one AC with `status: pending`.\n\nUnderlying mistake. The agent expected the orchestrator to \"figure it out\" and complete the AC silently.\n\nCorrection. The AC traceability gate refuses ship. Either complete the AC (slice-builder) or cancel the slug (`/cc-cancel`) and re-plan with the smaller AC set. There is no override.\n\n## A-8 \u2014 Re-creating a shipped slug instead of refining\n\nSymptom. A new `/cc` invocation produces a slug whose plan is 80% identical to a slug already in `.cclaw/flows/shipped/`.\n\nUnderlying mistake. Existing-plan detection was skipped or its output was ignored.\n\nCorrection. Existing-plan detection is mandatory at the start of every `/cc`. When a shipped match is offered, the user picks refine shipped or new unrelated, not \"ignore the match\".\n\n## A-9 \u2014 Editing shipped artifacts\n\nSymptom. A shipped slug's `plan.md` is edited weeks after ship.\n\nUnderlying mistake. Shipped artifacts are immutable. Editing them invalidates the knowledge index and breaks refinement chains.\n\nCorrection. Open a refinement slug. The new slug carries `refines: <old-slug>` and contains the corrections. The old slug stays as it shipped.\n\n## A-10 \u2014 Force-push during ship\n\nSymptom. `git push --force` appears in shell history during ship.\n\nUnderlying mistake. Force-push rewrites the SHAs that flow-state and the AC traceability block reference. The chain breaks silently; nothing in the runtime detects it.\n\nCorrection. Refuse `git push --force` inside `/cc` unless the user explicitly requested it twice (initial request + confirmation). After the force-push, every recorded SHA in the slug must be re-verified by hand and updated.\n\n## A-11 \u2014 Hidden security surface\n\nSymptom. 
A slug ships without `security_flag: true` even though the diff added a new auth-adjacent code path.\n\nUnderlying mistake. The author judged \"this is mostly UI\" and skipped the security checklist.\n\nCorrection. `security_flag` is set whenever the diff touches authn / authz / secrets / supply chain / data exposure, even when the change feels small. The cost of a spurious security flag is a few minutes; the cost of a missed one is a CVE.\n\n## A-12 \u2014 Single test green, didn't run the suite\n\nSymptom. `flows/<slug>/build.md` GREEN evidence column shows `npm test path/to/single.test` only; full-suite run is missing.\n\nUnderlying mistake. A passing single test is not GREEN. Production change can break adjacent tests; without running the suite, the AC is shipped on a regression.\n\nCorrection. GREEN evidence must be the full relevant suite for the affected module(s), not the single test. The reviewer cites this as a block finding.\n";
+export declare const ANTIPATTERNS = "# .cclaw/lib/antipatterns.md\n\nPatterns we have seen fail. Each entry is a short symptom, the underlying mistake, and the corrective action. The orchestrator and specialists open this file when a smell is detected; the reviewer cites entries as findings when applicable.\n\n## A-1 \u2014 \"Just one more AC\"\n\nSymptom. A plan starts with 4 AC and ends with 11. Most of the additions appeared during build.\n\nUnderlying mistake. Scope is being expanded mid-flight without going back to plan-stage.\n\nCorrection. When build encounters new work, surface it as a follow-up in `.cclaw/ideas.md` or a fresh slug. If the new work is genuinely required to satisfy an existing AC, that AC was wrong; cancel the slug and re-plan with a tighter AC set.\n\n## A-2 \u2014 TDD phase integrity broken\n\nSymptom (any of):\n\n- Build commits land for AC-N with `--phase=green` but no `--phase=red` recorded earlier.\n- AC has RED + GREEN commits but no `--phase=refactor` (skipped or applied) entry in flow-state.\n- A `--phase=red` commit touches `src/`, `lib/`, or `app/` \u2014 production code slipped into RED.\n- Tests for AC-N appear in a separate commit a few minutes after the AC-N implementation lands.\n\nUnderlying mistake. The TDD cycle was treated as ceremony, not as the contract. The cycle exists so the failing test encodes the AC; skipping or scrambling phases produces an audit trail that nobody can trust.\n\nCorrection. `commit-helper.mjs` enforces RED \u2192 GREEN \u2192 REFACTOR per AC. Write a failing test first and commit under `--phase=red` (test files only). Implement the smallest production change that turns it green; commit under `--phase=green`. Either commit a refactor under `--phase=refactor` or skip it explicitly with `--phase=refactor --skipped --message=\"refactor(AC-N) skipped: <reason>\"`. 
The reviewer cites this entry whenever the chain is incomplete.\n\n## A-3 \u2014 Work outside the AC\n\nSymptom (any of):\n\n- A small AC commit also restructures an unrelated module.\n- A commit produced by `commit-helper.mjs` contains files that are unrelated to the AC.\n- `git add -A` appears in shell history inside `/cc`.\n\nUnderlying mistake. Slice-builder absorbed unrelated edits or silently expanded scope. The AC commit no longer maps cleanly to the AC.\n\nCorrection. Stage AC-related files explicitly: `git add <path>` per file, or `git add -p` to pick hunks. Never `git add -A` inside `/cc`. If a refactor really must happen, capture it as a follow-up; if it really blocks the AC, cancel the slug and re-plan as a refactor + behaviour-change pair.\n\n## A-4 \u2014 AC that mirror sub-tasks\n\nSymptom. AC read like \"implement the helper\", \"wire the helper\", \"test the helper\".\n\nUnderlying mistake. AC are outcomes, not sub-tasks. Outcomes survive refactors; sub-tasks do not.\n\nCorrection. Rewrite AC as observable outcomes. The helper is an implementation detail, not an AC.\n\n## A-5 \u2014 Over-careful brainstormer\n\nSymptom. Brainstormer produces three pages of Context for a small task; planner is then unable to size the work.\n\nUnderlying mistake. Brainstormer ignored the routing class. Trivial / small-medium tasks should have a one-paragraph Context, not a Frame + Scope + Alternatives sweep.\n\nCorrection. Brainstormer reads the routing class first and short-circuits when the task is small. Three sentences of Context is enough for AC-1.\n\n## A-6 \u2014 \"I already looked\"\n\nSymptom. Reviewer reports a \"clear\" decision without a Five Failure Modes pass.\n\nUnderlying mistake. The Five Failure Modes pass is the artifact. Skipping it because \"I already looked\" produces no audit trail.\n\nCorrection. Reviewer always emits the Five Failure Modes block. Each item gets yes / no with citation when yes. 
A \"no\" with no thinking attached is fine; an absent block is not.\n\n## A-7 \u2014 Shipping with a pending AC\n\nSymptom. `runCompoundAndShip()` is invoked while flow-state has at least one AC with `status: pending`.\n\nUnderlying mistake. The agent expected the orchestrator to \"figure it out\" and complete the AC silently.\n\nCorrection. The AC traceability gate refuses ship. Either complete the AC (slice-builder) or cancel the slug (`/cc-cancel`) and re-plan with the smaller AC set. There is no override.\n\n## A-8 \u2014 Re-creating a shipped slug instead of refining\n\nSymptom. A new `/cc` invocation produces a slug whose plan is 80% identical to a slug already in `.cclaw/flows/shipped/`.\n\nUnderlying mistake. Existing-plan detection was skipped or its output was ignored.\n\nCorrection. Existing-plan detection is mandatory at the start of every `/cc`. When a shipped match is offered, the user picks refine shipped or new unrelated, not \"ignore the match\".\n\n## A-9 \u2014 Editing shipped artifacts\n\nSymptom. A shipped slug's `plan.md` is edited weeks after ship.\n\nUnderlying mistake. Shipped artifacts are immutable. Editing them invalidates the knowledge index and breaks refinement chains.\n\nCorrection. Open a refinement slug. The new slug carries `refines: <old-slug>` and contains the corrections. The old slug stays as it shipped.\n\n## A-10 \u2014 Force-push during ship\n\nSymptom. `git push --force` appears in shell history during ship.\n\nUnderlying mistake. Force-push rewrites the SHAs that flow-state and the AC traceability block reference. The chain breaks silently; nothing in the runtime detects it.\n\nCorrection. Refuse `git push --force` inside `/cc` unless the user explicitly requested it twice (initial request + confirmation). After the force-push, every recorded SHA in the slug must be re-verified by hand and updated.\n\n## A-11 \u2014 Hidden security surface\n\nSymptom. 
A slug ships without `security_flag: true` even though the diff added a new auth-adjacent code path.\n\nUnderlying mistake. The author judged \"this is mostly UI\" and skipped the security checklist.\n\nCorrection. `security_flag` is set whenever the diff touches authn / authz / secrets / supply chain / data exposure, even when the change feels small. The cost of a spurious security flag is a few minutes; the cost of a missed one is a CVE.\n\n## A-12 \u2014 Single test green, didn't run the suite\n\nSymptom. `flows/<slug>/build.md` GREEN evidence column shows `npm test path/to/single.test` only; full-suite run is missing.\n\nUnderlying mistake. A passing single test is not GREEN. Production change can break adjacent tests; without running the suite, the AC is shipped on a regression.\n\nCorrection. GREEN evidence must be the full relevant suite for the affected module(s), not the single test. The reviewer cites this as a block finding.\n\n## A-13 \u2014 Horizontal slicing (RED-batch then GREEN-batch)\n\nSymptom. `flows/<slug>/build.md` shows AC-1 RED, AC-2 RED, AC-3 RED committed in a row, then AC-1 GREEN, AC-2 GREEN, AC-3 GREEN. Or the slice-builder describes the build as \"tests written, now I'll implement\".\n\nUnderlying mistake. Writing all RED tests before any GREEN code means the tests describe the behaviour you guessed before you saw the real interface. Tests written this way pass when behaviour breaks (because they test the imagined shape) and fail when behaviour is fine (because the real shape diverged from the imagination). They get rewritten during the next refactor.\n\nCorrection. One test \u2192 one implementation \u2192 repeat. Each cycle informs the next. The AC-2 test is shaped by what the AC-1 implementation revealed about the real interface. `commit-helper.mjs --phase=red` for AC-2 will refuse if AC-1's chain isn't closed yet \u2014 that is the rail. 
See the Vertical Slicing section in `tdd-cycle.md`.\n\n## A-14 \u2014 Pushing past a failing test\n\nSymptom. Build log shows a flaky or unexpected failure on AC-2, then continues into AC-3 with \"I'll come back to AC-2 later\". Or a hook rejection silently retried with a slightly different commit message.\n\nUnderlying mistake. Errors compound. AC-3 is built on the invariants AC-2 was supposed to establish. If AC-2's RED failed for the wrong reason, you are debugging a stack of broken assumptions, and every cycle past that point makes the diagnosis harder.\n\nCorrection. Stop the line. Preserve the failure (command + 1\u20133 lines of output verbatim), reproduce in isolation, root-cause to a concrete file:line, fix once, re-run the full relevant suite, then resume the cycle. If the root cause cannot be identified in three attempts, surface a blocker to the orchestrator \u2014 do not \"make it work\" by removing the test or weakening the assertion.\n\n## A-15 \u2014 Mocking what should not be mocked\n\nSymptom. A database query test mocks the driver and asserts on `db.query` call shape; the test is green and the actual query never runs in production. Or a service test mocks every collaborator and only verifies which methods were called, in which order.\n\nUnderlying mistake. Mocking a dependency you control couples the test to the implementation. The test reads green even when the SQL is wrong, the migration is missing, the column is misspelled. Real bugs live in those gaps. Interaction-based assertions (`expect(x).toHaveBeenCalledWith(...)`) break on every refactor and provide weaker confidence than state-based assertions on the outcome.\n\nCorrection. Use a real test database (or an in-memory fake of the same shape) and assert on the outcome \u2014 the row that was inserted, the response from the query, the observable side effect \u2014 not on the call. 
Reach for mocks only for things genuinely outside your control: third-party APIs, time, randomness, the network. Real > Fake (in-memory) > Stub (canned data) > Mock (interaction).\n";
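A-13 (and the tracer-bullet section added to `tdd-cycle.md`) says `commit-helper.mjs --phase=red` refuses a RED commit for AC-2 while AC-1's chain is still open. The TypeScript sketch below models that rail under stated assumptions: the `Chain` record, the `checkCommit` function, and the refusal strings are hypothetical, and the real helper may track phases differently (for example by reading recorded SHAs from flow-state).

```typescript
// Hypothetical sketch of the per-AC RED -> GREEN -> REFACTOR rail described
// in A-13. It is not the real commit-helper.mjs logic.
type Phase = "red" | "green" | "refactor";

interface Chain {
  red: boolean;
  green: boolean;
  refactor: boolean; // refactor committed, or explicitly skipped with a reason
}

const isClosed = (c: Chain): boolean => c.red && c.green && c.refactor;

// Returns null if the commit may proceed, otherwise the refusal reason.
// `order` lists AC ids in plan order, e.g. ["AC-1", "AC-2", "AC-3"].
function checkCommit(order: string[], chains: Map<string, Chain>, ac: string, phase: Phase): string | null {
  const idx = order.indexOf(ac);
  if (idx === -1) return `${ac} is not in the plan`;
  const chain = chains.get(ac) ?? { red: false, green: false, refactor: false };
  if (phase === "red") {
    // Vertical slicing: every earlier AC must have a closed chain first.
    for (const prev of order.slice(0, idx)) {
      const p = chains.get(prev);
      if (p === undefined || !isClosed(p)) return `${prev} chain is not closed`;
    }
    return null;
  }
  if (phase === "green") return chain.red ? null : `${ac} has no RED commit yet`;
  return chain.green ? null : `${ac} has no GREEN commit yet`;
}

// AC-1 has RED and GREEN but no REFACTOR entry, so a RED for AC-2 is refused.
const demo = new Map<string, Chain>([["AC-1", { red: true, green: true, refactor: false }]]);
console.log(checkCommit(["AC-1", "AC-2"], demo, "AC-2", "red")); // prints: AC-1 chain is not closed
```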

package/dist/content/antipatterns.js CHANGED

@@ -106,4 +106,28 @@ Patterns we have seen fail. Each entry is a short symptom, the underlying mistak
 **Underlying mistake.** A passing single test is not GREEN. Production change can break adjacent tests; without running the suite, the AC is shipped on a regression.
 **Correction.** GREEN evidence must be the **full relevant suite** for the affected module(s), not the single test. The reviewer cites this as a block finding.
+## A-13 — Horizontal slicing (RED-batch then GREEN-batch)
+**Symptom.** \`flows/<slug>/build.md\` shows AC-1 RED, AC-2 RED, AC-3 RED committed in a row, then AC-1 GREEN, AC-2 GREEN, AC-3 GREEN. Or the slice-builder describes the build as "tests written, now I'll implement".
+**Underlying mistake.** Writing all RED tests before any GREEN code means the tests describe the behaviour you *guessed* before you saw the real interface. Tests written this way pass when behaviour breaks (because they test the imagined shape) and fail when behaviour is fine (because the real shape diverged from the imagination). They get rewritten during the next refactor.
+**Correction.** One test → one implementation → repeat. Each cycle informs the next. The AC-2 test is shaped by what the AC-1 implementation revealed about the real interface. \`commit-helper.mjs --phase=red\` for AC-2 will refuse if AC-1's chain isn't closed yet — that is the rail. See the Vertical Slicing section in \`tdd-cycle.md\`.
+## A-14 — Pushing past a failing test
+**Symptom.** Build log shows a flaky or unexpected failure on AC-2, then continues into AC-3 with "I'll come back to AC-2 later". Or a hook rejection silently retried with a slightly different commit message.
+**Underlying mistake.** Errors compound. AC-3 is built on the invariants AC-2 was supposed to establish. If AC-2's RED failed for the wrong reason, you are debugging a stack of broken assumptions, and every cycle past that point makes the diagnosis harder.
+**Correction.** Stop the line. Preserve the failure (command + 1–3 lines of output verbatim), reproduce in isolation, root-cause to a concrete file:line, fix once, re-run the full relevant suite, then resume the cycle. If the root cause cannot be identified in three attempts, surface a blocker to the orchestrator — do not "make it work" by removing the test or weakening the assertion.
+## A-15 — Mocking what should not be mocked
+**Symptom.** A database query test mocks the driver and asserts on \`db.query\` call shape; the test is green and the actual query never runs in production. Or a service test mocks every collaborator and only verifies which methods were called, in which order.
+**Underlying mistake.** Mocking a dependency you control couples the test to the implementation. The test reads green even when the SQL is wrong, the migration is missing, the column is misspelled. Real bugs live in those gaps. Interaction-based assertions (\`expect(x).toHaveBeenCalledWith(...)\`) break on every refactor and provide weaker confidence than state-based assertions on the outcome.
+**Correction.** Use a real test database (or an in-memory fake of the same shape) and assert on the **outcome** — the row that was inserted, the response from the query, the observable side effect — not on the call. Reach for mocks only for things genuinely outside your control: third-party APIs, time, randomness, the network. Real > Fake (in-memory) > Stub (canned data) > Mock (interaction).
 `;
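A-15 ranks test doubles as Real > Fake (in-memory) > Stub > Mock. The TypeScript sketch below shows the middle option: an in-memory fake with the same shape as a hypothetical `UserRepo`, and an assertion on stored state rather than on which methods were called. All names are invented for illustration; where a real test database is practical, A-15 prefers it over any fake.

```typescript
// Illustrative only: a service that depends on a repository interface,
// tested against an in-memory fake instead of a call-recording mock.
interface User {
  id: number;
  email: string;
}

interface UserRepo {
  insert(email: string): User;
  findByEmail(email: string): User | undefined;
}

// In-memory fake: real storage and lookup behaviour, no driver or network.
class InMemoryUserRepo implements UserRepo {
  private rows: User[] = [];
  private nextId = 1;

  insert(email: string): User {
    const user = { id: this.nextId++, email };
    this.rows.push(user);
    return user;
  }

  findByEmail(email: string): User | undefined {
    return this.rows.find((u) => u.email === email);
  }
}

// Code under test: normalises the email and refuses duplicates.
function registerUser(repo: UserRepo, rawEmail: string): User {
  const email = rawEmail.trim().toLowerCase();
  if (repo.findByEmail(email)) throw new Error(`duplicate: ${email}`);
  return repo.insert(email);
}

// State-based check: look at what was stored, not at the calls made.
const repo = new InMemoryUserRepo();
registerUser(repo, "  Ada@Example.com ");
console.log(repo.findByEmail("ada@example.com")?.id); // 1
```

A mock-based version of the same test would assert something like "`insert` was called once with `ada@example.com`", which keeps passing even if the lookup or normalisation logic breaks in a way the stored state would reveal.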

package/dist/content/skills.js CHANGED

@@ -5,9 +5,7 @@ trigger: at the start of every new /cc invocation, before any specialist runs
 # Skill: triage-gate
-Every new flow opens with a **triage gate**. The orchestrator analyses the user's request, picks a complexity class, names an AC mode, proposes a path, and **asks the user to confirm**. Nothing else runs until the user has confirmed (or overridden) the triage.
-This skill exists because cclaw v8.1 used to silently pick a path and lock the user into it. v8.2 makes that decision explicit, audit-able, and overridable.
+Every new flow opens with a **triage gate**. The orchestrator analyses the user's request, picks a complexity class, names an AC mode, proposes a path, and **asks the user to confirm — twice**: once for the path, once for the run mode (autopilot or step-by-step). Nothing else runs until both questions are answered.
 ## When this skill applies
@@ -15,9 +13,37 @@ This skill exists because cclaw v8.1 used to silently pick a path and lock the u
 - Skipped on \`/cc\` (no argument) when an active flow is detected — see \`flow-resume.md\`.
 - Skipped on \`/cc-cancel\` and \`/cc-idea\` (these never open a flow).
-## Output format (mandatory)
+## How to render the question — STRUCTURED, not prose
+If the harness exposes a structured question tool — \`AskUserQuestion\` (Claude Code), \`AskQuestion\` (Cursor), an "ask" content block (OpenCode), \`prompt\` (Codex) — **use it**. Two separate calls, in order. Do **not** print the triage as a code block and rely on the user reading numbered options. v8.2 shipped that way and the harness rendered prose; v8.3 fixes it.
+### Question 1 — path
+Render the analysis as the question prompt and the four choices as options:
+- prompt: \`Triage — Complexity: small/medium (high). Recommended: plan → build → review → ship. Why: 3 modules, ~150 LOC, no auth touch. AC mode: soft. Pick a path.\`
+- options:
+  - \`Proceed as recommended\`
+  - \`Switch to trivial (inline edit + commit, skip plan/review)\`
+  - \`Escalate to large-risky (add brainstormer/architect, strict AC, parallel slices)\`
+  - \`Custom (let me edit complexity / acMode / path)\`
+The prompt MUST embed the four heuristic facts (complexity + confidence, recommended path, why, ac mode) so the user can decide without reading another block. Keep it under 280 characters; truncate the rationale before truncating the facts.
+### Question 2 — run mode
+Right after the user picks a path, ask:
+- prompt: \`Run mode for this flow?\`
+- options:
+  - \`Step (default) — pause after every stage; I type "continue" to advance\`
+  - \`Auto — run plan → build → review → ship without pausing; stop only on block findings or security flag\`
+Default \`step\` if the user dismisses the question or the harness lacks a structured ask facility. Inline / trivial flows skip Question 2 (there are no stages to chain).
-Reply with a single fenced block followed by an option list:
+## Fallback — when no structured ask tool exists
+Only when the harness has no structured ask facility (rare; legacy CLI mode), print the same content as a fenced block plus numbered options:
 \`\`\`
 Triage
@@ -27,8 +53,6 @@ Triage
 ─ AC mode: <inline | soft | strict>
 \`\`\`
-Then list the four options verbatim:
 \`\`\`
 [1] Proceed as recommended
 [2] Switch to trivial (inline edit + commit, skip plan/review)
@@ -36,6 +60,16 @@ Then list the four options verbatim:
 [4] Custom (let me edit complexity / acMode / path)
 \`\`\`
+Then a separate block for run mode:
+\`\`\`
+Run mode
+[s] Step — pause after every stage (default)
+[a] Auto — chain stages without pausing; stop only on block findings or security flag
+\`\`\`
+The fenced form is a fallback, not the primary path. Always try the structured tool first.
 ## Heuristics — how to pick
 Rank the request against these signals. The orchestrator picks the **highest** complexity any signal triggers (escalation is one-way).
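The triage-gate hunks above require the Question 1 prompt to embed four facts and stay under 280 characters, trimming the rationale before any fact. A minimal TypeScript sketch of that rule follows, assuming a hypothetical `TriageFacts` shape and `buildTriagePrompt` helper (neither is part of the published package). Non-ASCII characters are written as `\u` escapes, as in the compiled `.d.ts` above.

```typescript
// Hypothetical prompt builder for triage Question 1; not part of cclaw-cli.
interface TriageFacts {
  complexity: string; // e.g. "small/medium"
  confidence: string; // e.g. "high"
  path: string[];     // e.g. ["plan", "build", "review", "ship"]
  rationale: string;  // free text; the only part that may be trimmed
  acMode: "inline" | "soft" | "strict";
}

const MAX_PROMPT_CHARS = 280;

function buildTriagePrompt(t: TriageFacts, max: number = MAX_PROMPT_CHARS): string {
  // The four facts are fixed; only the rationale ("Why") absorbs truncation.
  const before =
    `Triage \u2014 Complexity: ${t.complexity} (${t.confidence}). ` +
    `Recommended: ${t.path.join(" \u2192 ")}. Why: `;
  const after = `. AC mode: ${t.acMode}. Pick a path.`;
  const budget = max - before.length - after.length;
  if (budget < 0) throw new Error("facts alone exceed the prompt limit");
  let why = t.rationale.replace(/\.$/, ""); // `after` supplies the full stop
  if (why.length > budget) {
    // Trim the rationale and mark the cut with an ellipsis.
    why = budget > 0 ? why.slice(0, budget - 1) + "\u2026" : "";
  }
  return before + why + after;
}

console.log(
  buildTriagePrompt({
    complexity: "small/medium",
    confidence: "high",
    path: ["plan", "build", "review", "ship"],
    rationale: "3 modules, ~150 LOC, no auth touch.",
    acMode: "soft",
  }),
);
```

With the sample inputs, the output reproduces the example prompt given for Question 1. Throwing when the facts alone exceed the limit is a choice of this sketch; the skill text does not say what should happen in that case.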
@@ -62,7 +96,7 @@ If the heuristic gives \`small/medium\` but the user said something like "featur
 ## What the orchestrator records
-After the user picks (1)/(2)/(3)/(4), patch \`.cclaw/state/flow-state.json\`:
+After both questions are answered, patch \`.cclaw/state/flow-state.json\`:
 \`\`\`json
 {
@@ -72,14 +106,15 @@ After the user picks (1)/(2)/(3)/(4), patch \`.cclaw/state/flow-state.json\`:
     "path": ["plan", "build", "review", "ship"],
     "rationale": "3 modules, ~150 LOC, no auth touch.",
     "decidedAt": "2026-05-08T12:34:56Z",
-    "userOverrode": false
+    "userOverrode": false,
+    "runMode": "step"
   }
 }
 \`\`\`
-\`userOverrode\` is \`true\` only when the user picked (2), (3), or a (4) custom that disagrees with the recommendation.
+\`userOverrode\` is \`true\` only when the user picked (2), (3), or a (4) custom that disagrees with the recommendation. \`runMode\` is \`step\` by default; record \`auto\` only when the user explicitly opted into autopilot in Question 2.
-The triage block is **immutable for the lifetime of the flow**. If the user wants to escalate mid-flight (e.g. discovers it is bigger than thought), \`/cc-cancel\` and start a fresh flow with new triage.
+The triage block is **immutable for the lifetime of the flow**. If the user wants to escalate mid-flight (e.g. discovers it is bigger than thought), \`/cc-cancel\` and start a fresh flow with new triage. Switching from \`step\` to \`auto\` (or vice versa) is also a fresh-flow decision — the orchestrator does not flip mid-flight.
 ## Path semantics
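The 8.3 skill text makes `runMode` part of the triage block and says a missing value means `step`. A minimal TypeScript sketch of a reader that applies that default is below. It assumes a `Triage` shape limited to the fields shown in the JSON example above plus `acMode`; cclaw-cli's actual type in `dist/types.d.ts` is not visible in this diff and may differ.

```typescript
// Hypothetical reader for the `triage` block of .cclaw/state/flow-state.json.
// Only fields visible in the skill text are modelled.
type RunMode = "step" | "auto";

interface Triage {
  acMode: "inline" | "soft" | "strict";
  path: string[];
  rationale: string;
  decidedAt: string;
  userOverrode: boolean;
  runMode?: RunMode; // new in 8.3; absent in flows written by 8.2
}

// No schema validation here: the cast trusts the file's shape.
function readTriage(flowStateJson: string): Triage {
  const state = JSON.parse(flowStateJson) as { triage?: Triage };
  if (state.triage === undefined) {
    throw new Error("flow-state.json has no triage block");
  }
  return state.triage;
}

// Missing means "step" per the skill text; treating any other unexpected
// value as "step" too is an assumption of this sketch.
function effectiveRunMode(triage: Triage): RunMode {
  return triage.runMode === "auto" ? "auto" : "step";
}
```

Because resume reuses the saved triage and never re-prompts, a reader like this is the only place the default would be applied for flows started under 8.2.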
@@ -160,11 +195,12 @@ The user is expected to clarify in (4) Custom or accept (1) Proceed; either way
 ## Common pitfalls
-- Returning the triage as prose paragraphs instead of the fenced block. The orchestrator expects the structured form so it can parse \`acMode\` and \`path\` reliably.
+- **Rendering the triage as a code block when a structured ask tool is available.** v8.3 fixes this: try the harness's structured ask facility (\`AskUserQuestion\` / \`AskQuestion\` / \`prompt\` / "ask" content block) first; the fenced form is a fallback only.
 - Stating "I think this is medium-complexity" and then immediately invoking planner. That is the v8.1 bug. Wait for the user's pick.
 - Picking \`large-risky\` for a one-file rename "to be safe". Do not pad the heuristic; the user reads it and learns to ignore your triage.
+- Forgetting to ask Question 2 (run mode) after Question 1 (path). \`triage.runMode\` controls Hop 4 (pause); a missing value defaults to \`step\` — safe but wastes a click for users who wanted autopilot.
 - Forgetting to write \`triage\` into \`flow-state.json\`. The hook check \`commit-helper.mjs\` and the resume detector both read it; an absent triage breaks both.
-- Re-running the gate on resume. Resume reads the saved triage and continues from \`currentStage\`; it never re-prompts.
+- Re-running the gate on resume. Resume reads the saved triage (path + runMode) and continues from \`currentStage\`; it never re-prompts.
 `;
 const FLOW_RESUME = `---
 name: flow-resume
@@ -711,6 +747,63 @@ Silence fails the gate.
 (a) **discovery_complete** — relevant tests / fixtures / helpers / commands cited.\n(b) **impact_check_complete** — affected callbacks / state / interfaces / contracts named.\n(c) **red_test_recorded** — failing test exists, watched-RED proof attached.\n(d) **red_fails_for_right_reason** — RED captured a real assertion failure.\n(e) **green_full_suite** — full relevant suite green after GREEN.\n(f) **refactor_run_or_skipped_with_reason** — REFACTOR ran, or explicitly skipped with reason.\n(g) **traceable_to_plan** — commits reference plan AC ids and the plan's file set.\n(h) **commit_chain_intact** — RED + GREEN + REFACTOR SHAs (or skipped sentinel) recorded in flow-state.
+## Vertical slicing — tracer bullets, never horizontal waves
+**One test → one impl → repeat.** Even in strict mode, you do not write all RED tests for the slice and then all GREEN code. That horizontal pattern produces tests of *imagined* behaviour: the data shape you guessed, the function signature you guessed, the error message you guessed. The tests pass when behaviour breaks and fail when behaviour is fine.
+The correct pattern is a tracer bullet per AC:
+\`\`\`
+WRONG (horizontal):
+  RED:   AC-1 test, AC-2 test, AC-3 test
+  GREEN: AC-1 impl, AC-2 impl, AC-3 impl
+RIGHT (vertical / tracer bullet):
+  AC-1: RED → GREEN → REFACTOR  (commit chain closes here)
+  AC-2: RED → GREEN → REFACTOR  (next chain starts here, informed by what you learned in AC-1)
+  AC-3: RED → GREEN → REFACTOR
+\`\`\`
+Each cycle informs the next. The AC-2 test is shaped by what the AC-1 implementation revealed about the real interface. \`commit-helper.mjs --phase=red\` for AC-2 will refuse if AC-1's chain isn't closed yet — that's the rail.
+In soft mode the same principle applies at feature granularity: write 1–3 tests for the highest-priority condition, implement, then if more tests are needed for adjacent conditions, write them after you've seen the real shape of the GREEN code.
+## Stop-the-line rule
+When **anything** unexpected happens during build — a test fails for the wrong reason, the build breaks, a prior-green test starts failing, a hook rejects a commit — **stop adding code**. Do not push past the failure to "come back later". Errors compound: a wrong assumption in AC-1 makes AC-2 and AC-3 wrong.
+Procedure:
+1. Preserve evidence. Capture the failing command + 1–3 lines of output verbatim.
+2. Reproduce in isolation. Run only the failing test to confirm it fails reliably.
+3. Diagnose root cause. Trace the failing assertion back to a concrete cause (the actual cause, not the first plausible one). Cite the file:line in the build log.
+4. Fix. The fix is a change to the GREEN code, a correction of the RED test (if it tested the wrong thing), or a new RED that captures the missed behaviour; it is never silent.
+5. Re-run the **full relevant suite**. A passing single test is not GREEN if the suite is red elsewhere.
+6. Resume the cycle from where you stopped, with the chain intact.
+If the root cause cannot be identified in three attempts, surface a blocker to the orchestrator. Do not "make it work" by removing the test, weakening the assertion, or commenting out the failure.
+## Prove-It pattern (bug fixes)
+When the input is a bug fix, the order is non-negotiable:
+1. **Write a failing test that reproduces the bug.** This is the watched-RED proof. If you cannot reproduce the bug with a test, you cannot fix it with confidence — go gather more context.
+2. Confirm the test fails for the right reason — your test captured the bug, not a syntax / fixture / import error.
+3. Fix the bug. Smallest possible production diff that turns the new test green.
+4. Run the full relevant suite — the fix must not break adjacent behaviour.
+5. Refactor.
+Bug-fix RED commits use \`--phase=red\` like any other RED. The AC id is the user's bug-fix slug (e.g. \`AC-1: completing a task sets completedAt\`). In soft mode, the same five steps apply, just with one cycle for the whole fix and a plain \`git commit\`.
+## Writing good tests (state, not interactions; DAMP, not DRY)
+These rules apply equally to soft and strict modes. They make the difference between tests that survive a refactor and tests that have to be rewritten every time.
+- **Test state, not interactions.** Assert on the *outcome* of the operation — return value, persisted record, observable side effect — not on which methods were called internally. \`expect(result).toEqual(...)\` is good; \`expect(db.query).toHaveBeenCalledWith(...)\` couples the test to the implementation.
+- **DAMP over DRY in tests.** A test should read like a specification. A test that is understandable on its own beats a clever shared setup that only reads well after tracing the helpers. Duplication in test code is acceptable when it makes each case independently readable.
+- **Prefer real implementations over mocks.** The more your tests use real code, the more confidence they provide. Mock only what is genuinely outside your control (third-party APIs, time, randomness). Real > Fake (in-memory) > Stub (canned data) > Mock (interaction). Reach for the simplest level that gets the job done.
+- **Test pyramid: small / medium / large.** Most tests should be small (single process, no I/O, milliseconds). A handful are medium (boundary tests, in-process integration, seconds). E2E / multi-machine tests stay reserved for critical paths only.
 ## Anti-patterns
 - "The implementation is obvious, skipping RED." A-13 — gate fails immediately.
@@ -719,6 +812,9 @@ Silence fails the gate.
 - "Stage everything with \`git add -A\`." A-16 — staged unrelated edits leak into the AC commit.
 - "Production code in the RED commit." A-17 — RED is test files only.
 - **"Test file named after the AC id" — \`AC-1.test.ts\`, \`tests/AC-2.spec.ts\`, etc.** The reviewer flags this as \`block\`. Mirror the unit under test in the filename; carry the AC id inside the test name and commit message only.
+- **Horizontal slicing.** A-18 — writing all RED tests first, then all GREEN code, produces tests of imagined behaviour. One test → one impl → repeat. See the Vertical Slicing section above.
+- **Pushing past a failing test.** A-19 — the next cycle is built on the previous cycle's invariants; if those invariants are broken you are debugging a stack of broken assumptions. Stop the line, root-cause, then resume.
+- **Mocking what you should not mock.** A-20 — mocking the database for a query test reads green and breaks in production. Use a fake or a real test DB; mock only what is genuinely outside your control.
 ## Fix-only flow
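The Prove-It, state-over-interactions, and real-over-mock rules in the skill above can be seen together in one small sketch. It is framework-free TypeScript; `Task`, `InMemoryTaskRepo`, and `completeTask` are hypothetical names, and only the "completing a task sets completedAt" scenario comes from the skill text. The test asserts on the stored record through an in-memory fake, so it would survive an internal refactor of `completeTask`.

```typescript
// Hypothetical domain types: not part of cclaw.
interface Task {
  id: string;
  title: string;
  completedAt: string | null;
}

interface TaskRepo {
  get(id: string): Task | undefined;
  save(task: Task): void;
}

// A fake: a real, working in-memory implementation. Tests read its state
// instead of checking which methods were called on it.
class InMemoryTaskRepo implements TaskRepo {
  private readonly tasks = new Map<string, Task>();

  get(id: string): Task | undefined {
    const task = this.tasks.get(id);
    return task === undefined ? undefined : { ...task };
  }

  save(task: Task): void {
    this.tasks.set(task.id, { ...task });
  }
}

// Production code after the fix. The clock is injected so the test is
// deterministic without mocking Date.
function completeTask(repo: TaskRepo, id: string, now: () => Date): Task {
  const task = repo.get(id);
  if (task === undefined) {
    throw new Error(`unknown task: ${id}`);
  }
  const done: Task = { ...task, completedAt: now().toISOString() };
  repo.save(done);
  return done;
}

// Prove-It test for "AC-1: completing a task sets completedAt".
// Written first and watched RED; it checks the persisted record (state),
// not the calls made on the repo (interactions).
function testCompletingATaskSetsCompletedAt(): void {
  const repo = new InMemoryTaskRepo();
  repo.save({ id: "t-1", title: "write docs", completedAt: null });

  completeTask(repo, "t-1", () => new Date("2026-05-08T12:34:56Z"));

  const stored = repo.get("t-1");
  if (stored?.completedAt !== "2026-05-08T12:34:56.000Z") {
    throw new Error(`expected completedAt to be set, got ${String(stored?.completedAt)}`);
  }
}

testCompletingATaskSetsCompletedAt();
```

Against a buggy version that saves the task without setting `completedAt`, the same test fails on its assertion rather than on an import or fixture error, which is the "right reason" step 2 of Prove-It asks for.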

package/dist/content/start-command.js CHANGED

@@ -1,7 +1,30 @@
 import { CORE_AGENTS } from "./core-agents.js";
 import { ironLawsMarkdown } from "./iron-laws.js";
 const SPECIALIST_LIST = CORE_AGENTS.map((agent) => `- **${agent.id}** (${agent.modes.join(" / ")}) — ${agent.description}`).join("\n");
-const TRIAGE_BLOCK_EXAMPLE = `\`\`\`
+const TRIAGE_ASK_EXAMPLE = `\`\`\`
+askUserQuestion(
+  prompt: "Triage — Complexity: small/medium (high). Recommended: plan → build → review → ship. Why: 3 modules, ~150 LOC, no auth touch. AC mode: soft. Pick a path.",
+  options: [
+    "Proceed as recommended",
+    "Switch to trivial (inline edit + commit, skip plan/review)",
+    "Escalate to large-risky (add brainstormer/architect, strict AC, parallel slices)",
+    "Custom (let me edit complexity / acMode / path)"
+  ],
+  multiSelect: false
+)
+# After the user picks, ask the second question:
+askUserQuestion(
+  prompt: "Run mode for this flow?",
+  options: [
+    "Step (default) — pause after every stage; I type \\"continue\\" to advance",
+    "Auto — chain plan → build → review → ship; stop only on block findings or security flag"
+  ],
+  multiSelect: false
+)
+\`\`\``;
+const TRIAGE_FALLBACK_EXAMPLE = `\`\`\`
 Triage
 ─ Complexity: small/medium  (confidence: high)
 ─ Recommended path: plan → build → review → ship
@@ -12,6 +35,12 @@ Triage
 [2] Switch to trivial (inline edit + commit, skip plan/review)
 [3] Escalate to large-risky (add brainstormer/architect, strict AC, parallel slices)
 [4] Custom (let me edit complexity / acMode / path)
+\`\`\`
+\`\`\`
+Run mode
+[s] Step — pause after every stage (default)
+[a] Auto — chain stages; stop only on block findings or security flag
 \`\`\``;
 const TRIAGE_PERSIST_EXAMPLE = `\`\`\`json
 {
@@ -21,7 +50,8 @@ const TRIAGE_PERSIST_EXAMPLE = `\`\`\`json
     "path": ["plan", "build", "review", "ship"],
     "rationale": "3 modules, ~150 LOC, no auth touch.",
     "decidedAt": "2026-05-08T12:34:56Z",
-    "userOverrode": false
+    "userOverrode": false,
+    "runMode": "step"
   }
 }
 \`\`\``;
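The two structured triage questions end up as two fields of the persisted record above: `userOverrode` and `runMode`. A minimal sketch of that mapping, assuming the harness returns the first answer as one of the four options and the second as `step`, `auto`, or `null` when dismissed. `PathChoice`, `TriageAnswers`, and `triagePatch` are hypothetical; treating any non-recommended pick as an override, and omitting `runMode` on the trivial path, are assumptions based on the `TriageDecision` field comments and the Hop 2 text.

```typescript
type RunMode = "step" | "auto";

// The four first-question options, in order. Hypothetical labels.
type PathChoice = "recommended" | "trivial" | "large-risky" | "custom";

interface TriageAnswers {
  path: PathChoice;
  // null when the second question was dismissed or never asked.
  runMode: RunMode | null;
}

// The subset of the persisted triage record these answers decide.
interface TriagePatch {
  userOverrode: boolean;
  runMode?: RunMode;
}

// Question 2 is skipped on the trivial / inline path: no stages to chain.
function shouldAskRunMode(path: PathChoice): boolean {
  return path !== "trivial";
}

function triagePatch(answers: TriageAnswers): TriagePatch {
  // Assumption: any pick other than "Proceed as recommended" is an override.
  const patch: TriagePatch = { userOverrode: answers.path !== "recommended" };
  if (shouldAskRunMode(answers.path)) {
    // A dismissed question falls back to the documented default.
    patch.runMode = answers.runMode ?? "step";
  }
  return patch;
}
```

Omitting `runMode` rather than writing `step` on the trivial path is a choice of this sketch; both read back as `step` because readers default an absent value.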
@@ -101,17 +131,25 @@ Do not auto-delete state. Do not hand-edit the JSON.
 ## Hop 2 — Triage (fresh starts only)
-Run the \`triage-gate.md\` skill. The output is a single fenced block followed by four numbered options:
+Run the \`triage-gate.md\` skill. **Use the harness's structured question tool** (\`AskUserQuestion\` in Claude Code, \`AskQuestion\` in Cursor, the "ask" content block in OpenCode, \`prompt\` in Codex). Two questions, in order:
+${TRIAGE_ASK_EXAMPLE}
+The first question's prompt MUST embed the four heuristic facts (complexity + confidence, recommended path, why, AC mode) so the user can decide without reading another block. Keep it under 280 characters; truncate the rationale before truncating the facts.
+The second question is skipped on the trivial / inline path (no stages to chain). Default \`runMode\` is \`step\` if the user dismisses the question.
+If the harness lacks a structured ask facility, fall back to the legacy form:
-${TRIAGE_BLOCK_EXAMPLE}
+${TRIAGE_FALLBACK_EXAMPLE}
-Wait for the user's pick. Then patch \`flow-state.json\`:
+Once both answers are in, patch \`flow-state.json\`:
 ${TRIAGE_PERSIST_EXAMPLE}
-The triage decision is **immutable** for the lifetime of the flow. If the user wants a different acMode mid-flight, the path is \`/cc-cancel\` and a fresh \`/cc\` invocation.
+The triage decision is **immutable** for the lifetime of the flow. If the user wants a different acMode or runMode mid-flight, the path is \`/cc-cancel\` and a fresh \`/cc\` invocation.
-After triage, the rest of the orchestrator runs the stages listed in \`triage.path\`, in order, pausing between each.
+After triage, the rest of the orchestrator runs the stages listed in \`triage.path\`, in order. Pause behaviour between stages is controlled by \`triage.runMode\` — see Hop 4.
 ### Trivial path (acMode: inline)
@@ -186,11 +224,78 @@ The orchestrator reads only this. The full artifact stays in \`.cclaw/flows/<slu
 - Specialist: \`slice-builder\`.
 - Inputs: \`.cclaw/flows/<slug>/plan.md\`, \`.cclaw/lib/templates/build.md\`, \`.cclaw/lib/skills/tdd-cycle.md\`.
 - Output: \`.cclaw/flows/<slug>/build.md\` with TDD evidence at the granularity dictated by \`acMode\`.
-- Strict mode: full RED → GREEN → REFACTOR per AC, every commit through \`commit-helper.mjs\`. Parallel-build only if planner declared it AND \`acMode == strict\`.
-- Soft mode: one TDD cycle for the whole feature; tests under \`tests/\` mirroring the production module path; plain \`git commit\`.
+- Soft mode: one TDD cycle for the whole feature; tests under \`tests/\` mirroring the production module path; plain \`git commit\`. Sequential, single dispatch, no worktrees.
+- Strict mode, sequential: full RED → GREEN → REFACTOR per AC, every commit through \`commit-helper.mjs\`. Single \`slice-builder\` dispatch in the main working tree.
+- Strict mode, parallel: see "Parallel-build fan-out" below — only when planner declared \`topology: parallel-build\` AND ≥4 AC AND ≥2 disjoint touchSurface clusters.
 - Inline mode: not dispatched here — handled in the trivial path of Hop 2.
 - Slim summary: AC committed (strict) or conditions verified (soft), suite-status (passed / failed), open follow-ups.
+##### Parallel-build fan-out (strict mode + planner topology=parallel-build only)
+When the planner artifact declares \`topology: parallel-build\` with ≥2 slices and \`acMode == strict\`, the orchestrator fans out one \`slice-builder\` sub-agent per slice, **capped at 5**, each in its own \`git worktree\`. This is the only fan-out cclaw uses outside of \`ship\`.
+\`\`\`text
+                                  flows/<slug>/plan.md
+                                  topology: parallel-build
+                                  slices: [s-1, s-2, s-3]   (max 5)
+                                              │
+                                              ▼
+                            git worktree add .cclaw/worktrees/<slug>-s-1 -b cclaw/<slug>/s-1
+                            git worktree add .cclaw/worktrees/<slug>-s-2 -b cclaw/<slug>/s-2
+                            git worktree add .cclaw/worktrees/<slug>-s-3 -b cclaw/<slug>/s-3
+                                              │
+                          ┌───────────────────┼───────────────────┐
+                          ▼                   ▼                   ▼
+                   slice-builder         slice-builder         slice-builder
+                   (s-1; AC-1, AC-2)     (s-2; AC-3)           (s-3; AC-4, AC-5)
+                   cwd: …/<slug>-s-1      cwd: …/<slug>-s-2     cwd: …/<slug>-s-3
+                   RED→GREEN→REFACTOR     RED→GREEN→REFACTOR    RED→GREEN→REFACTOR
+                   per AC, in slice       per AC, in slice      per AC, in slice
+                          │                   │                   │
+                          └───────────────────┼───────────────────┘
+                                              ▼
+                                  reviewer (mode=integration)
+                                  reads each branch, checks
+                                  cross-slice conflicts, AC↔commit
+                                  chain across the wave
+                                              │
+                                              ▼
+                          merge cclaw/<slug>/s-1 → main, then s-2, then s-3
+                          (fast-forward when wave was clean; otherwise stop and ask)
+                                              │
+                                              ▼
+                          git worktree remove .cclaw/worktrees/<slug>-s-N (per slice)
+\`\`\`
+Dispatch envelope per slice:
+\`\`\`
+Dispatch slice-builder
+─ Stage: build
+─ Slug: <slug>
+─ Slice: s-N  (acIds: [AC-N, AC-N+1])
+─ Working tree: .cclaw/worktrees/<slug>-s-N
+─ Branch: cclaw/<slug>/s-N
+─ AC mode: strict
+─ Touch surface (only paths this slice may modify): [<paths from plan>]
+─ Output: .cclaw/flows/<slug>/build.md (append, marked with slice id)
+─ Forbidden: read or modify any path outside touch surface; read another slice's worktree mid-flight; merge or rebase
+\`\`\`
+After every slice-builder returns:
+1. Patch \`flow-state.json\` with the per-slice progress.
+2. When **every** slice has reported, dispatch \`reviewer\` mode=\`integration\` (one sub-agent, reads from each branch).
+3. On clear integration review, merge slices into main one at a time. On block, dispatch \`slice-builder\` mode=\`fix-only\` against the cited file:line refs, then re-run the integration reviewer.
+4. Worktree cleanup happens after merge; the cclaw branches stay until ship.
+Hard rules:
+- **More than 5 parallel slices is forbidden.** If planner produced >5, the planner must merge thinner slices into fatter ones before build; do not generate "wave 2".
+- Slice-builders never read each other's worktrees mid-flight. A slice that detects a conflict with another stops and raises an integration finding.
+- If the harness lacks sub-agent dispatch or worktree creation fails (non-git repo, permissions), parallel-build degrades silently to inline-sequential. Record the fallback in \`flows/<slug>/build.md\` frontmatter (\`subAgentDispatch: inline-fallback\`) — not an error.
+- \`auto\` runMode does **not** affect the integration-reviewer ask: a parallel wave that produces a block finding always asks the user before fix-only.
 #### review
 - Specialist: \`reviewer\` (mode = \`code\` for sequential build, \`integration\` for parallel-build).
@@ -220,6 +325,10 @@ Each step is a separate dispatch + pause + slim summary. The user can stop after
 ## Hop 4 — Pause and resume
+Pause behaviour depends on \`triage.runMode\` (default \`step\`).
+### \`step\` mode (default; safer; recommended for \`strict\` work)
 After every dispatch returns:
 1. Render the slim summary back to the user.
@@ -227,7 +336,25 @@ After every dispatch returns:
 3. Wait. Do **not** auto-advance. The user types \`continue\`, \`show\`, \`fix-only\`, or \`cancel\`.
 4. On \`continue\` → next stage in \`triage.path\`. On \`show\` → open the artifact and stop. On \`fix-only\` → re-dispatch slice-builder with mode=fix-only and the cited findings. On \`cancel\` → \`/cc-cancel\`.
-Resume from a fresh session works because everything is on disk: \`flow-state.json\` has \`currentStage\` and \`triage\`, \`flows/<slug>/*.md\` carries the artifacts. The next \`/cc\` invocation enters Hop 1 → detect → resume summary → continue from \`currentStage\`.
+### \`auto\` mode (autopilot; faster; recommended for \`inline\` / \`soft\` work)
+After every dispatch returns:
+1. Render the slim summary back to the user (one block, no prompt).
+2. **Immediately** dispatch the next stage in \`triage.path\` — no waiting, no question.
+3. Stop unconditionally only on these hard gates (autopilot **always** asks here):
+   - \`reviewer\` returned \`block\` decision (open findings) → render the findings, ask \`continue with fix-only\` / \`cancel\`.
+   - \`security-reviewer\` raised any finding → ask before proceeding.
+   - \`reviewer\` returned \`cap-reached\` (5 iterations without convergence) → ask.
+   - About to run \`ship\` (last stage in \`triage.path\`) → ask \`ship now?\` once, then proceed on confirmation. Ship is the only stage that always confirms in autopilot.
+Auto mode never silently skips a hard gate; it just removes the cosmetic pause between green stages. The user chose \`auto\` once during triage and meant it.
+### Common rules for both modes
+Resume from a fresh session works because everything is on disk: \`flow-state.json\` has \`currentStage\`, \`triage\` (with \`runMode\`), \`flows/<slug>/*.md\` carries the artifacts. The next \`/cc\` invocation enters Hop 1 → detect → resume summary → continue from \`currentStage\` with the saved runMode.
+Resuming a paused \`auto\` flow re-enters auto mode silently. Resuming a paused \`step\` flow renders the slim summary again and waits for \`continue\`.
 ## Hop 5 — Compound (automatic)
@@ -244,8 +371,9 @@ After ship + compound, move every \`<stage>.md\` from \`flows/<slug>/\` into \`.
 ## Always-ask rules
-- Always run the triage gate on a fresh \`/cc\`. Never silently pick a path.
-- Always pause after every stage. Never auto-advance through plan → build → review without asking.
+- Always run the triage gate on a fresh \`/cc\`. Never silently pick a path. Use the harness's structured question tool, not a printed code block.
+- In \`step\` mode, always pause after every stage. Never auto-advance.
+- In \`auto\` mode, never auto-advance past a hard gate (block / cap-reached / security finding / ship). The user opted into chaining green stages, not chaining decisions.
 - Always ask before \`git push\` or PR creation. Commit-helper auto-commits in strict mode; everything past commit is opt-in.
 - Always ask before deleting active artifacts (\`/cc-cancel\` is the supported way; do not \`rm\` artifacts directly).
 - Always show the slim summary back to the user; do not summarise from your own memory of the dispatch.
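The parallel-build preconditions and the five-slice cap can be expressed as a small planning step that builds, but does not run, the `git worktree add` commands shown in the fan-out diagram. This is a sketch: `Slice`, `PlanSketch`, `shouldFanOut`, and `worktreeAddCommands` are hypothetical, the real plan is a markdown artifact rather than a typed object, and the pairwise-disjoint touch-surface check is an assumption drawn from the "disjoint touchSurface clusters" wording.

```typescript
// Hypothetical shapes; the real plan.md is markdown, not a typed object.
interface Slice {
  id: string; // e.g. "s-1"
  acIds: string[]; // e.g. ["AC-1", "AC-2"]
  touchSurface: string[]; // paths this slice may modify
}

interface PlanSketch {
  topology: string; // "parallel-build" is the value that triggers fan-out
  acMode: "inline" | "soft" | "strict";
  slices: Slice[];
}

const MAX_PARALLEL_SLICES = 5;

function shouldFanOut(plan: PlanSketch): boolean {
  return plan.acMode === "strict" && plan.topology === "parallel-build" && plan.slices.length >= 2;
}

// Build (not run) one `git worktree add` command per slice.
function worktreeAddCommands(slug: string, plan: PlanSketch): string[] {
  if (plan.slices.length > MAX_PARALLEL_SLICES) {
    // No "wave 2": the planner must merge thin slices before build.
    throw new Error(`${plan.slices.length} slices exceeds the cap of ${MAX_PARALLEL_SLICES}`);
  }
  // Assumption: touch surfaces must be pairwise disjoint for a clean wave.
  const owner = new Map<string, string>();
  for (const slice of plan.slices) {
    for (const path of slice.touchSurface) {
      const other = owner.get(path);
      if (other !== undefined && other !== slice.id) {
        throw new Error(`${path} is claimed by both ${other} and ${slice.id}`);
      }
      owner.set(path, slice.id);
    }
  }
  return plan.slices.map(
    (s) => `git worktree add .cclaw/worktrees/${slug}-${s.id} -b cclaw/${slug}/${s.id}`,
  );
}
```

A real orchestrator would also need the silent fallback to inline-sequential when worktree creation fails; that path is left out here.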

package/dist/flow-state.d.ts CHANGED

@@ -1,4 +1,4 @@
-import { type AcMode, type AcceptanceCriterionState, type BuildProfile, type DiscoverySpecialistId, type FlowStage, type RoutingClass, type TriageDecision } from "./types.js";
+import { type AcMode, type AcceptanceCriterionState, type BuildProfile, type DiscoverySpecialistId, type FlowStage, type RoutingClass, type RunMode, type TriageDecision } from "./types.js";
 export declare const FLOW_STATE_SCHEMA_VERSION = 3;
 /** v8.0–v8.1 schema. Auto-migrated to v3 on read. */
 export declare const LEGACY_V8_FLOW_STATE_SCHEMA_VERSION = 2;
@@ -28,10 +28,18 @@ export declare class LegacyFlowStateError extends Error {
 export declare function isFlowStage(value: unknown): value is FlowStage;
 export declare function isRoutingClass(value: unknown): value is RoutingClass;
 export declare function isAcMode(value: unknown): value is AcMode;
+export declare function isRunMode(value: unknown): value is RunMode;
 export declare function isDiscoverySpecialist(value: unknown): value is DiscoverySpecialistId;
 export declare function createInitialFlowState(nowIso?: string): FlowStateV82;
 /** @deprecated kept for source-level compatibility with v8.1 imports. */
 export declare const createInitialFlowStateV8: typeof createInitialFlowState;
+/**
+ * Read a triage decision's runMode with the documented default.
+ *
+ * v8.2 state files do not record runMode; treat them as `step` so existing
+ * flows keep their pause-between-stages behaviour byte-for-byte.
+ */
+export declare function runModeOf(triage: TriageDecision | null | undefined): RunMode;
 /**
  * Validate a flow-state object. Throws on hard schema errors.
  *
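`runModeOf` is what Hop 4 branches on. Combined with the hard gates listed in `start-command.js`, the decision to stop after a dispatch might look like the sketch below, assuming the mode has already been resolved (so an absent value is already `step`). `DispatchOutcome`, its fields, and `mustAskBefore` are hypothetical names, not part of the package.

```typescript
type RunMode = "step" | "auto";
type Stage = "plan" | "build" | "review" | "ship";

// Hypothetical summary of a finished dispatch.
interface DispatchOutcome {
  reviewerDecision?: "clear" | "block" | "cap-reached";
  securityFindings?: number;
}

// Should the orchestrator stop and ask before moving on?
// `next` is the following stage in triage.path, or null after the last one.
function mustAskBefore(runMode: RunMode, finished: DispatchOutcome, next: Stage | null): boolean {
  // step: pause after every stage and wait for "continue".
  if (runMode === "step") return true;
  // auto: hard gates always stop the chain.
  if (finished.reviewerDecision === "block") return true;
  if (finished.reviewerDecision === "cap-reached") return true;
  if ((finished.securityFindings ?? 0) > 0) return true;
  // ship is the one stage that always confirms in autopilot.
  if (next === "ship") return true;
  // Green stage, not about to ship: dispatch the next stage immediately.
  return false;
}
```

Keeping the hard gates ahead of the `ship` check means a blocked review is reported as a block, not folded into the generic "ship now?" confirmation.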

package/dist/flow-state.js CHANGED

@@ -1,4 +1,4 @@
-import { AC_MODES, FLOW_STAGES, ROUTING_CLASSES } from "./types.js";
+import { AC_MODES, FLOW_STAGES, ROUTING_CLASSES, RUN_MODES } from "./types.js";
 export const FLOW_STATE_SCHEMA_VERSION = 3;
 /** v8.0–v8.1 schema. Auto-migrated to v3 on read. */
 export const LEGACY_V8_FLOW_STATE_SCHEMA_VERSION = 2;
@@ -19,6 +19,9 @@ export function isRoutingClass(value) {
 export function isAcMode(value) {
     return typeof value === "string" && AC_MODES.includes(value);
 }
+export function isRunMode(value) {
+    return typeof value === "string" && RUN_MODES.includes(value);
+}
 export function isDiscoverySpecialist(value) {
     return value === "brainstormer" || value === "architect" || value === "planner";
 }
@@ -62,7 +65,8 @@ function inferTriageFromLegacy(state) {
         path: ["plan", "build", "review", "ship"],
         rationale: "Auto-migrated from cclaw 8.0/8.1 flow-state (no triage recorded; preserved as strict).",
         decidedAt: state.startedAt,
-        userOverrode: false
+        userOverrode: false,
+        runMode: "step"
     };
 }
 function assertAcArray(value) {
@@ -116,6 +120,18 @@ function assertTriageOrNull(value) {
     if (typeof triage.userOverrode !== "boolean") {
         throw new Error("triage.userOverrode must be a boolean");
     }
+    if (triage.runMode !== undefined && !isRunMode(triage.runMode)) {
+        throw new Error(`Invalid triage.runMode: ${String(triage.runMode)}`);
+    }
+}
+/**
+ * Read a triage decision's runMode with the documented default.
+ *
+ * v8.2 state files do not record runMode; treat them as `step` so existing
+ * flows keep their pause-between-stages behaviour byte-for-byte.
+ */
+export function runModeOf(triage) {
+    return triage?.runMode ?? "step";
 }
 /**
  * Validate a flow-state object. Throws on hard schema errors.
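The backward-compatibility contract is split in two: the validator accepts an absent `runMode` but rejects unknown strings, and `runModeOf` turns absence into `step`. A standalone sketch that folds both into one hypothetical helper, `readRunMode`, operating on a raw triage JSON string. It reuses the `as const` plus `[number]` pattern that `types.d.ts` uses to derive `RunMode` from `RUN_MODES`.

```typescript
const RUN_MODES = ["step", "auto"] as const;
type RunMode = (typeof RUN_MODES)[number]; // "step" | "auto"

function isRunMode(value: unknown): value is RunMode {
  // Widen the literal tuple so includes() accepts an arbitrary string.
  return typeof value === "string" && (RUN_MODES as readonly string[]).includes(value);
}

// Hypothetical helper: validate like assertTriageOrNull, then default like runModeOf.
function readRunMode(triageJson: string): RunMode {
  const parsed: unknown = JSON.parse(triageJson);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("triage must be an object");
  }
  const runMode = (parsed as { runMode?: unknown }).runMode;
  // v8.2 files have no runMode: keep their pause-between-stages behaviour.
  if (runMode === undefined) return "step";
  if (!isRunMode(runMode)) {
    throw new Error(`Invalid triage.runMode: ${String(runMode)}`);
  }
  return runMode;
}
```

The widening cast in `isRunMode` exists only because TypeScript types `includes` on a readonly literal tuple narrowly; the compiled JavaScript in the diff needs no cast.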

package/dist/types.d.ts CHANGED

@@ -41,6 +41,21 @@ export type RoutingClass = (typeof ROUTING_CLASSES)[number];
  */
 export declare const AC_MODES: readonly ["inline", "soft", "strict"];
 export type AcMode = (typeof AC_MODES)[number];
+/**
+ * How aggressively the orchestrator advances through the flow.
+ *
+ * - `step` (default): pause after every stage. The orchestrator renders the
+ *   slim summary and waits for the user to type "continue". The original
+ *   v8.2 behaviour, recommended for `strict` and unfamiliar work.
+ * - `auto`: render the slim summary and immediately dispatch the next stage
+ *   without asking. Stops only on hard gates (block findings, security flag,
+ *   ship). Recommended for `inline` / `soft` work the user has already
+ *   scoped tightly.
+ *
+ * Selected at the triage gate; user can override per flow.
+ */
+export declare const RUN_MODES: readonly ["step", "auto"];
+export type RunMode = (typeof RUN_MODES)[number];
 /**
  * Decision recorded at the triage gate that opens every new flow.
  * Persisted in flow-state.json so resumes never re-trigger triage.
@@ -56,6 +71,14 @@ export interface TriageDecision {
     decidedAt: string;
     /** Did the user override the orchestrator's recommendation? */
     userOverrode: boolean;
+    /**
+     * Step-by-step (default) or autopilot. Persisted across resumes so the
+     * user only picks once per flow.
+     *
+     * Optional in TypeScript so v8.2 state files (which lack `runMode`) still
+     * validate; readers MUST default to `step` on absent.
+     */
+    runMode?: RunMode;
 }
 export interface CliContext {
     cwd: string;

package/dist/types.js CHANGED

@@ -21,3 +21,17 @@ export const ROUTING_CLASSES = ["trivial", "small-medium", "large-risky"];
  * Selected at the triage gate; user can override.
  */
 export const AC_MODES = ["inline", "soft", "strict"];
+/**
+ * How aggressively the orchestrator advances through the flow.
+ *
+ * - `step` (default): pause after every stage. The orchestrator renders the
+ *   slim summary and waits for the user to type "continue". The original
+ *   v8.2 behaviour, recommended for `strict` and unfamiliar work.
+ * - `auto`: render the slim summary and immediately dispatch the next stage
+ *   without asking. Stops only on hard gates (block findings, security flag,
+ *   ship). Recommended for `inline` / `soft` work the user has already
+ *   scoped tightly.
+ *
+ * Selected at the triage gate; user can override per flow.
+ */
+export const RUN_MODES = ["step", "auto"];

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "cclaw-cli",
-  "version": "8.2.0",
+  "version": "8.3.0",
   "description": "Lightweight harness-first flow toolkit for coding agents",
   "type": "module",
   "bin": {