cclaw-cli 8.1.2 → 8.3.0

package/README.md CHANGED
@@ -1,35 +1,62 @@
1
1
  # cclaw
2
2
 
3
- **cclaw is a lightweight harness-first flow toolkit for coding agents.** It installs three slash commands, six on-demand specialists, twelve auto-trigger skills (including TDD cycle and conversation-language), ten artifact templates, four stage runbooks, eight reference patterns, five research playbooks, five recovery playbooks, thirteen worked examples, an antipatterns library, a decision protocol, a meta-skill, an interactive harness picker, and a tiny runtime together a deep content layer wrapped around a runtime under 1 KLOC — so Claude Code, Cursor, OpenCode, or Codex can move from idea to shipped change with a clear plan, AC traceability, TDD per AC, and almost no ceremony.
3
+ **cclaw is a lightweight harness-first flow toolkit for coding agents.** Three slash commands. Five hops (`Detect → Triage → Dispatch → Pause → Compound/Ship`). Four stages (`plan → build → review → ship`, where **build IS a TDD cycle**: RED → GREEN → REFACTOR). Six on-demand specialists, all running as isolated sub-agents. Three Acceptance-Criteria modes (`inline` / `soft` / `strict`) so trivial edits do not pay the price of risky migrations. A deep content layer of skills, templates, runbooks, patterns, examples, and recovery playbooks wrapped around a runtime under 1 KLOC — so Claude Code, Cursor, OpenCode, or Codex can move from idea to shipped change with a clear plan, the right amount of ceremony, and almost no orchestrator bloat.
4
4
 
5
5
  ```text
6
- idea
7
-
8
-
9
- /cc <task>
10
-
11
- ┌─────┴─────────────────────────────────────┐
12
- Phase 0 calibration:
13
- │ targeted change or multi-component? │
14
- └─────┬─────────────────┬───────────────────┘
15
- │trivial │small/medium │large/risky
16
- ▼ ▼ ▼
17
- edit + commit plan build brainstormer
18
- per AC → review ship architect planner
19
- (each is optional)
20
-
21
-
22
- compound (auto, gated)
23
-
24
-
25
- active artifacts → shipped/<slug>/
6
+ idea
7
+
8
+
9
+ /cc <task>
10
+
11
+ ┌─────────┴──────────────────────────────────────────┐
12
+ Hop 1: Detect — fresh start? or resume active flow?
13
+ └─────────┬──────────────────────────────────────────┘
14
+ │ fresh
15
+
16
+ ┌────────────────────────────────────────────────────┐
17
+ Hop 2: Triage auto-classify task, │
18
+ recommend path + acMode, user accepts or overrides │
19
+ └─────────┬──────────────────────────────────────────┘
20
+
21
+ trivial │ small-medium │ large-risky
22
+ acMode │ acMode soft │ acMode strict
23
+ inline
24
+ ▼ ▼
25
+ edit + commit plan → build → review → ship brainstorm? → architect? → plan → build → review → ship
26
+ (no plan) each stage in a fresh sub-agent each stage in a fresh sub-agent, parallel-build allowed
27
+ │ │
28
+ └─────────┬────────────┘
29
+
30
+ compound (auto, gated by quality)
31
+
32
+
33
+ active flows → shipped/<slug>/
26
34
  ```
27
35
 
28
- Three slash commands. Four stages (`plan → build → review → ship`, where **build IS a TDD cycle**: RED → GREEN → REFACTOR per AC). Six specialists. Eleven skills (including a TDD-cycle skill that's always-on while building). Ten templates. Four runbooks. Eight reference patterns. Five research playbooks. Five recovery playbooks. Thirteen worked examples. Two mandatory gates (AC traceability + TDD phase chain).
36
+ Three slash commands (`/cc`, `/cc-cancel`, `/cc-idea`). Four stages (`plan → build → review → ship`). Six specialists, all on-demand, all running as sub-agents. Fifteen skills including the always-on `triage-gate`, `flow-resume`, `tdd-cycle`, `conversation-language`, and `anti-slop`. Ten templates including `plan-soft.md` and `build-soft.md` for the soft-mode path. Four runbooks. Eight reference patterns. Three research playbooks. Five recovery playbooks. Eight worked examples. Two mandatory gates in strict mode (AC traceability + TDD phase chain); soft mode keeps both as advisory; inline mode skips both.
37
+
38
+ ## What changed in 8.3
39
+
40
+ 8.3 is a non-breaking content + UX patch on top of 8.2.
41
+
42
+ - **Triage as a structured ask, not a code block.** The orchestrator now uses the harness's structured question tool (`AskUserQuestion` / `AskQuestion` / `prompt`) to render the triage. Two questions, in order: pick the path, then pick the run mode. The fenced form remains as a fallback only.
43
+ - **Run mode: `step` (default) vs `auto`.** `step` pauses after every stage and waits for `continue` (8.2 behaviour). `auto` chains plan → build → review → ship without pausing; stops only on block findings, cap-reached, security findings, or before `ship`. New optional field `triage.runMode` in `flow-state.json`.
44
+ - **Explicit parallel-build fan-out in Hop 3.** The `/cc` body now carries a full ASCII fan-out diagram for the strict-mode parallel-build path — `git worktree` per slice, max 5 slices, one `slice-builder` sub-agent per slice, integration reviewer, merge sequence. The skill `parallel-build.md` already had this; the orchestrator now sees it at the dispatch site.
45
+ - **TDD cycle deepening.** Four new sections in `tdd-cycle.md`: vertical slicing / tracer bullets, stop-the-line rule, Prove-It pattern for bug fixes, writing-good-tests rules (state-not-interactions, DAMP over DRY, real-over-mock, test pyramid). Three new antipatterns: A-13 horizontal slicing, A-14 pushing past a failing test, A-15 mocking what should not be mocked.
46
+
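The run-mode rule in the 8.3 notes above can be read as a small predicate. The following is a minimal sketch under stated assumptions, not cclaw's actual implementation: only the `step` / `auto` values and the four stop conditions come from the README, while `StageResult`, `nextStage`, and `shouldPause` are hypothetical names introduced for illustration.

```typescript
// Hypothetical sketch of the 8.3 run-mode rule; not cclaw's actual code.
type RunMode = "step" | "auto";
type Stage = "plan" | "build" | "review" | "ship";

// Assumed summary of a finished stage; the field names are illustrative.
interface StageResult {
  stage: Stage;
  blockFindings: number;
  securityFindings: number;
  capReached: boolean;
}

const STAGES: Stage[] = ["plan", "build", "review", "ship"];

// Returns the stage that follows `stage`, or null after `ship`.
function nextStage(stage: Stage): Stage | null {
  const next = STAGES[STAGES.indexOf(stage) + 1];
  return next === undefined ? null : next;
}

// Decides whether the orchestrator should stop and wait for the user.
function shouldPause(mode: RunMode, result: StageResult): boolean {
  if (mode === "step") return true; // step: pause after every stage
  if (result.blockFindings > 0) return true; // auto: stop on block findings
  if (result.securityFindings > 0) return true; // auto: stop on security findings
  if (result.capReached) return true; // auto: stop when a cap is reached
  return nextStage(result.stage) === "ship"; // auto: always stop before ship
}
```

Under this reading, `auto` chains plan into build into review without a prompt when the results are clean, and the final confirmation before `ship` stays with the user.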
47
+ ## What changed in 8.2
48
+
49
+ 8.2 is a non-breaking redesign of the `/cc` orchestrator on top of 8.1.
50
+
51
+ - **Triage gate.** Every fresh flow runs the `triage-gate` skill, which classifies the task as `trivial` / `small-medium` / `large-risky` from six heuristics, recommends a path and an `acMode`, and asks the user to accept or override. The decision is persisted into `flow-state.json` so resumes never re-prompt.
52
+ - **Graduated AC.** Acceptance Criteria are no longer one-size-fits-all. `inline` (trivial) skips them entirely. `soft` (small-medium) uses a bullet list of testable conditions with no AC IDs and an advisory commit-helper. `strict` (large-risky) is the 8.1 behaviour byte-for-byte: AC IDs, mandatory `commit-helper.mjs --ac-id=AC-N --phase=red|green|refactor`, per-AC TDD chain.
53
+ - **Sub-agent dispatch.** `plan`, `build`, `review`, and `ship` each run in a fresh sub-agent invocation. The orchestrator hands a slim envelope (slug / stage / acMode / artifact paths) and gets back a fixed 5-to-7-line summary plus the artifact on disk. No specialist reasoning leaks into the orchestrator context.
54
+ - **Resume.** Invoking `/cc` while a flow is active triggers the `flow-resume` skill: 4-line summary plus `r` resume / `s` show / `c` cancel / `n` start new. The triage decision is preserved across sessions.
55
+ - **Schema bump.** `flow-state.json` is now `schemaVersion: 3` with a `triage` field. Existing v2 files are auto-migrated on first read with `acMode: strict` so existing flows behave exactly as in 8.1.
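The triage classes, the graduated AC modes, and the v2 to v3 migration described in the 8.2 notes above fit together roughly as follows. This is a sketch under assumptions rather than the package's real types: the class names, the mode names, `schemaVersion`, `triage`, `acMode`, and `runMode` appear in the README, but the interfaces, the `slug` field on the state object, and the function name `migrateFlowState` are hypothetical.

```typescript
// Hypothetical sketch; the interfaces and function names are illustrative only.
type TaskClass = "trivial" | "small-medium" | "large-risky";
type AcMode = "inline" | "soft" | "strict";
type GateLevel = "skipped" | "advisory" | "mandatory";

// Recommended acMode per triage class (the user may still override it).
const AC_MODE_BY_CLASS: Record<TaskClass, AcMode> = {
  "trivial": "inline",
  "small-medium": "soft",
  "large-risky": "strict",
};

// How the two gates (AC traceability, TDD phase chain) apply in each mode.
const GATE_LEVEL_BY_MODE: Record<AcMode, GateLevel> = {
  inline: "skipped",
  soft: "advisory",
  strict: "mandatory",
};

interface Triage {
  acMode: AcMode;
  runMode?: "step" | "auto"; // optional field added in 8.3
}

// Assumed minimal shapes of flow-state.json before and after the bump.
interface FlowStateV2 { schemaVersion: 2; slug: string; }
interface FlowStateV3 { schemaVersion: 3; slug: string; triage: Triage; }

// Upgrades a v2 state on first read; v3 states pass through untouched.
function migrateFlowState(state: FlowStateV2 | FlowStateV3): FlowStateV3 {
  if (state.schemaVersion === 3) return state;
  // Existing flows keep the 8.1 behaviour, so they migrate as strict.
  return { schemaVersion: 3, slug: state.slug, triage: { acMode: "strict" } };
}
```

Defaulting migrated flows to `strict` is the conservative choice the README describes: a flow started under 8.1 keeps its mandatory gates instead of silently relaxing them.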
29
56
 
30
57
  ## What changed in v8
31
58
 
32
- cclaw v8.0 is a breaking redesign. We dropped the 7.x stage machine: no more `brainstorm` / `scope` / `design` / `spec` / `tdd` mandatory stages, no more 18 specialists, no more 9 state files, no more 30 stage gates. v7.x runs are not migrated; see [docs/migration-v7-to-v8.md](docs/migration-v7-to-v8.md).
59
+ cclaw v8.0 was a breaking redesign of the v7 stage machine. We dropped the 7.x stage machine: no more `brainstorm` / `scope` / `design` / `spec` / `tdd` mandatory stages, no more 18 specialists, no more 9 state files, no more 30 stage gates. v7.x runs are not migrated; see [docs/migration-v7-to-v8.md](docs/migration-v7-to-v8.md).
33
60
 
34
61
  What we kept and made deeper:
35
62
 
@@ -1,4 +1,4 @@
1
- export declare const CCLAW_VERSION = "8.1.2";
1
+ export declare const CCLAW_VERSION = "8.3.0";
2
2
  export declare const RUNTIME_ROOT = ".cclaw";
3
3
  export declare const STATE_REL_PATH = ".cclaw/state";
4
4
  export declare const HOOKS_REL_PATH = ".cclaw/hooks";
package/dist/constants.js CHANGED
@@ -1,4 +1,4 @@
1
- export const CCLAW_VERSION = "8.1.2";
1
+ export const CCLAW_VERSION = "8.3.0";
2
2
  export const RUNTIME_ROOT = ".cclaw";
3
3
  export const STATE_REL_PATH = `${RUNTIME_ROOT}/state`;
4
4
  export const HOOKS_REL_PATH = `${RUNTIME_ROOT}/hooks`;
@@ -1 +1 @@
1
- export declare const ANTIPATTERNS = "# .cclaw/lib/antipatterns.md\n\nPatterns we have seen fail. Each entry is a short symptom, the underlying mistake, and the corrective action. The orchestrator and specialists open this file when a smell is detected; the reviewer cites entries as findings when applicable.\n\n## A-1 \u2014 \"Just one more AC\"\n\n**Symptom.** A plan starts with 4 AC and ends with 11. Most of the additions appeared during build.\n\n**Underlying mistake.** Scope is being expanded mid-flight without going back to plan-stage.\n\n**Correction.** When build encounters new work, surface it as a follow-up in `.cclaw/ideas.md` or a fresh slug. If the new work is genuinely required to satisfy an existing AC, that AC was wrong; cancel the slug and re-plan with a tighter AC set.\n\n## A-2 \u2014 TDD phase integrity broken\n\n**Symptom (any of):**\n\n- Build commits land for AC-N with `--phase=green` but no `--phase=red` recorded earlier.\n- AC has RED + GREEN commits but no `--phase=refactor` (skipped or applied) entry in flow-state.\n- A `--phase=red` commit touches `src/`, `lib/`, or `app/` \u2014 production code slipped into RED.\n- Tests for AC-N appear in a separate commit a few minutes after the AC-N implementation lands.\n\n**Underlying mistake.** The TDD cycle was treated as ceremony, not as the contract. The cycle exists so the failing test encodes the AC; skipping or scrambling phases produces an audit trail that nobody can trust.\n\n**Correction.** `commit-helper.mjs` enforces RED \u2192 GREEN \u2192 REFACTOR per AC. Write a failing test first and commit under `--phase=red` (test files only). Implement the smallest production change that turns it green; commit under `--phase=green`. Either commit a refactor under `--phase=refactor` or skip it explicitly with `--phase=refactor --skipped --message=\"refactor(AC-N) skipped: <reason>\"`. 
The reviewer cites this entry whenever the chain is incomplete.\n\n## A-3 \u2014 Work outside the AC\n\n**Symptom (any of):**\n\n- A small AC commit also restructures an unrelated module.\n- A commit produced by `commit-helper.mjs` contains files that are unrelated to the AC.\n- `git add -A` appears in shell history inside `/cc`.\n\n**Underlying mistake.** Slice-builder absorbed unrelated edits or silently expanded scope. The AC commit no longer maps cleanly to the AC.\n\n**Correction.** Stage AC-related files explicitly: `git add <path>` per file, or `git add -p` to pick hunks. Never `git add -A` inside `/cc`. If a refactor really must happen, capture it as a follow-up; if it really blocks the AC, cancel the slug and re-plan as a refactor + behaviour-change pair.\n\n## A-4 \u2014 AC that mirror sub-tasks\n\n**Symptom.** AC read like \"implement the helper\", \"wire the helper\", \"test the helper\".\n\n**Underlying mistake.** AC are outcomes, not sub-tasks. Outcomes survive refactors; sub-tasks do not.\n\n**Correction.** Rewrite AC as observable outcomes. The helper is an implementation detail, not an AC.\n\n## A-5 \u2014 Over-careful brainstormer\n\n**Symptom.** Brainstormer produces three pages of Context for a small task; planner is then unable to size the work.\n\n**Underlying mistake.** Brainstormer ignored the routing class. Trivial / small-medium tasks should have a one-paragraph Context, not a Frame + Scope + Alternatives sweep.\n\n**Correction.** Brainstormer reads the routing class first and short-circuits when the task is small. Three sentences of Context is enough for AC-1.\n\n## A-6 \u2014 \"I already looked\"\n\n**Symptom.** Reviewer reports a \"clear\" decision without a Five Failure Modes pass.\n\n**Underlying mistake.** The Five Failure Modes pass is the artifact. Skipping it because \"I already looked\" produces no audit trail.\n\n**Correction.** Reviewer always emits the Five Failure Modes block. Each item gets yes / no with citation when yes. 
A \"no\" with no thinking attached is fine; an absent block is not.\n\n## A-7 \u2014 Shipping with a pending AC\n\n**Symptom.** `runCompoundAndShip()` is invoked while flow-state has at least one AC with `status: pending`.\n\n**Underlying mistake.** The agent expected the orchestrator to \"figure it out\" and complete the AC silently.\n\n**Correction.** The AC traceability gate refuses ship. Either complete the AC (slice-builder) or cancel the slug (`/cc-cancel`) and re-plan with the smaller AC set. There is no override.\n\n## A-8 \u2014 Re-creating a shipped slug instead of refining\n\n**Symptom.** A new `/cc` invocation produces a slug whose plan is 80% identical to a slug already in `.cclaw/flows/shipped/`.\n\n**Underlying mistake.** Existing-plan detection was skipped or its output was ignored.\n\n**Correction.** Existing-plan detection is mandatory at the start of every `/cc`. When a shipped match is offered, the user picks **refine shipped** or **new unrelated**, not \"ignore the match\".\n\n## A-9 \u2014 Editing shipped artifacts\n\n**Symptom.** A shipped slug's `plan.md` is edited weeks after ship.\n\n**Underlying mistake.** Shipped artifacts are immutable. Editing them invalidates the knowledge index and breaks refinement chains.\n\n**Correction.** Open a refinement slug. The new slug carries `refines: <old-slug>` and contains the corrections. The old slug stays as it shipped.\n\n## A-10 \u2014 Force-push during ship\n\n**Symptom.** `git push --force` appears in shell history during ship.\n\n**Underlying mistake.** Force-push rewrites the SHAs that flow-state and the AC traceability block reference. The chain breaks silently; nothing in the runtime detects it.\n\n**Correction.** Refuse `git push --force` inside `/cc` unless the user explicitly requested it twice (initial request + confirmation). 
After the force-push, every recorded SHA in the slug must be re-verified by hand and updated.\n\n## A-11 \u2014 Hidden security surface\n\n**Symptom.** A slug ships without `security_flag: true` even though the diff added a new auth-adjacent code path.\n\n**Underlying mistake.** The author judged \"this is mostly UI\" and skipped the security checklist.\n\n**Correction.** `security_flag` is set whenever the diff touches authn / authz / secrets / supply chain / data exposure, even when the change feels small. The cost of a spurious security flag is a few minutes; the cost of a missed one is a CVE.\n\n## A-12 \u2014 Single test green, didn't run the suite\n\n**Symptom.** `flows/<slug>/build.md` GREEN evidence column shows `npm test path/to/single.test` only; full-suite run is missing.\n\n**Underlying mistake.** A passing single test is not GREEN. Production change can break adjacent tests; without running the suite, the AC is shipped on a regression.\n\n**Correction.** GREEN evidence must be the **full relevant suite** for the affected module(s), not the single test. The reviewer cites this as a block finding.\n";
1
+ export declare const ANTIPATTERNS = "# .cclaw/lib/antipatterns.md\n\nPatterns we have seen fail. Each entry is a short symptom, the underlying mistake, and the corrective action. The orchestrator and specialists open this file when a smell is detected; the reviewer cites entries as findings when applicable.\n\n## A-1 \u2014 \"Just one more AC\"\n\n**Symptom.** A plan starts with 4 AC and ends with 11. Most of the additions appeared during build.\n\n**Underlying mistake.** Scope is being expanded mid-flight without going back to plan-stage.\n\n**Correction.** When build encounters new work, surface it as a follow-up in `.cclaw/ideas.md` or a fresh slug. If the new work is genuinely required to satisfy an existing AC, that AC was wrong; cancel the slug and re-plan with a tighter AC set.\n\n## A-2 \u2014 TDD phase integrity broken\n\n**Symptom (any of):**\n\n- Build commits land for AC-N with `--phase=green` but no `--phase=red` recorded earlier.\n- AC has RED + GREEN commits but no `--phase=refactor` (skipped or applied) entry in flow-state.\n- A `--phase=red` commit touches `src/`, `lib/`, or `app/` \u2014 production code slipped into RED.\n- Tests for AC-N appear in a separate commit a few minutes after the AC-N implementation lands.\n\n**Underlying mistake.** The TDD cycle was treated as ceremony, not as the contract. The cycle exists so the failing test encodes the AC; skipping or scrambling phases produces an audit trail that nobody can trust.\n\n**Correction.** `commit-helper.mjs` enforces RED \u2192 GREEN \u2192 REFACTOR per AC. Write a failing test first and commit under `--phase=red` (test files only). Implement the smallest production change that turns it green; commit under `--phase=green`. Either commit a refactor under `--phase=refactor` or skip it explicitly with `--phase=refactor --skipped --message=\"refactor(AC-N) skipped: <reason>\"`. 
The reviewer cites this entry whenever the chain is incomplete.\n\n## A-3 \u2014 Work outside the AC\n\n**Symptom (any of):**\n\n- A small AC commit also restructures an unrelated module.\n- A commit produced by `commit-helper.mjs` contains files that are unrelated to the AC.\n- `git add -A` appears in shell history inside `/cc`.\n\n**Underlying mistake.** Slice-builder absorbed unrelated edits or silently expanded scope. The AC commit no longer maps cleanly to the AC.\n\n**Correction.** Stage AC-related files explicitly: `git add <path>` per file, or `git add -p` to pick hunks. Never `git add -A` inside `/cc`. If a refactor really must happen, capture it as a follow-up; if it really blocks the AC, cancel the slug and re-plan as a refactor + behaviour-change pair.\n\n## A-4 \u2014 AC that mirror sub-tasks\n\n**Symptom.** AC read like \"implement the helper\", \"wire the helper\", \"test the helper\".\n\n**Underlying mistake.** AC are outcomes, not sub-tasks. Outcomes survive refactors; sub-tasks do not.\n\n**Correction.** Rewrite AC as observable outcomes. The helper is an implementation detail, not an AC.\n\n## A-5 \u2014 Over-careful brainstormer\n\n**Symptom.** Brainstormer produces three pages of Context for a small task; planner is then unable to size the work.\n\n**Underlying mistake.** Brainstormer ignored the routing class. Trivial / small-medium tasks should have a one-paragraph Context, not a Frame + Scope + Alternatives sweep.\n\n**Correction.** Brainstormer reads the routing class first and short-circuits when the task is small. Three sentences of Context is enough for AC-1.\n\n## A-6 \u2014 \"I already looked\"\n\n**Symptom.** Reviewer reports a \"clear\" decision without a Five Failure Modes pass.\n\n**Underlying mistake.** The Five Failure Modes pass is the artifact. Skipping it because \"I already looked\" produces no audit trail.\n\n**Correction.** Reviewer always emits the Five Failure Modes block. Each item gets yes / no with citation when yes. 
A \"no\" with no thinking attached is fine; an absent block is not.\n\n## A-7 \u2014 Shipping with a pending AC\n\n**Symptom.** `runCompoundAndShip()` is invoked while flow-state has at least one AC with `status: pending`.\n\n**Underlying mistake.** The agent expected the orchestrator to \"figure it out\" and complete the AC silently.\n\n**Correction.** The AC traceability gate refuses ship. Either complete the AC (slice-builder) or cancel the slug (`/cc-cancel`) and re-plan with the smaller AC set. There is no override.\n\n## A-8 \u2014 Re-creating a shipped slug instead of refining\n\n**Symptom.** A new `/cc` invocation produces a slug whose plan is 80% identical to a slug already in `.cclaw/flows/shipped/`.\n\n**Underlying mistake.** Existing-plan detection was skipped or its output was ignored.\n\n**Correction.** Existing-plan detection is mandatory at the start of every `/cc`. When a shipped match is offered, the user picks **refine shipped** or **new unrelated**, not \"ignore the match\".\n\n## A-9 \u2014 Editing shipped artifacts\n\n**Symptom.** A shipped slug's `plan.md` is edited weeks after ship.\n\n**Underlying mistake.** Shipped artifacts are immutable. Editing them invalidates the knowledge index and breaks refinement chains.\n\n**Correction.** Open a refinement slug. The new slug carries `refines: <old-slug>` and contains the corrections. The old slug stays as it shipped.\n\n## A-10 \u2014 Force-push during ship\n\n**Symptom.** `git push --force` appears in shell history during ship.\n\n**Underlying mistake.** Force-push rewrites the SHAs that flow-state and the AC traceability block reference. The chain breaks silently; nothing in the runtime detects it.\n\n**Correction.** Refuse `git push --force` inside `/cc` unless the user explicitly requested it twice (initial request + confirmation). 
After the force-push, every recorded SHA in the slug must be re-verified by hand and updated.\n\n## A-11 \u2014 Hidden security surface\n\n**Symptom.** A slug ships without `security_flag: true` even though the diff added a new auth-adjacent code path.\n\n**Underlying mistake.** The author judged \"this is mostly UI\" and skipped the security checklist.\n\n**Correction.** `security_flag` is set whenever the diff touches authn / authz / secrets / supply chain / data exposure, even when the change feels small. The cost of a spurious security flag is a few minutes; the cost of a missed one is a CVE.\n\n## A-12 \u2014 Single test green, didn't run the suite\n\n**Symptom.** `flows/<slug>/build.md` GREEN evidence column shows `npm test path/to/single.test` only; full-suite run is missing.\n\n**Underlying mistake.** A passing single test is not GREEN. Production change can break adjacent tests; without running the suite, the AC is shipped on a regression.\n\n**Correction.** GREEN evidence must be the **full relevant suite** for the affected module(s), not the single test. The reviewer cites this as a block finding.\n\n## A-13 \u2014 Horizontal slicing (RED-batch then GREEN-batch)\n\n**Symptom.** `flows/<slug>/build.md` shows AC-1 RED, AC-2 RED, AC-3 RED committed in a row, then AC-1 GREEN, AC-2 GREEN, AC-3 GREEN. Or the slice-builder describes the build as \"tests written, now I'll implement\".\n\n**Underlying mistake.** Writing all RED tests before any GREEN code means the tests describe the behaviour you *guessed* before you saw the real interface. Tests written this way pass when behaviour breaks (because they test the imagined shape) and fail when behaviour is fine (because the real shape diverged from the imagination). They get rewritten during the next refactor.\n\n**Correction.** One test \u2192 one implementation \u2192 repeat. Each cycle informs the next. The AC-2 test is shaped by what the AC-1 implementation revealed about the real interface. 
`commit-helper.mjs --phase=red` for AC-2 will refuse if AC-1's chain isn't closed yet \u2014 that is the rail. See the Vertical Slicing section in `tdd-cycle.md`.\n\n## A-14 \u2014 Pushing past a failing test\n\n**Symptom.** Build log shows a flaky or unexpected failure on AC-2, then continues into AC-3 with \"I'll come back to AC-2 later\". Or a hook rejection silently retried with a slightly different commit message.\n\n**Underlying mistake.** Errors compound. AC-3 is built on the invariants AC-2 was supposed to establish. If AC-2's RED failed for the wrong reason, you are debugging a stack of broken assumptions, and every cycle past that point makes the diagnosis harder.\n\n**Correction.** Stop the line. Preserve the failure (command + 1\u20133 lines of output verbatim), reproduce in isolation, root-cause to a concrete file:line, fix once, re-run the full relevant suite, then resume the cycle. If the root cause cannot be identified in three attempts, surface a blocker to the orchestrator \u2014 do not \"make it work\" by removing the test or weakening the assertion.\n\n## A-15 \u2014 Mocking what should not be mocked\n\n**Symptom.** A database query test mocks the driver and asserts on `db.query` call shape; the test is green and the actual query never runs in production. Or a service test mocks every collaborator and only verifies which methods were called, in which order.\n\n**Underlying mistake.** Mocking a dependency you control couples the test to the implementation. The test reads green even when the SQL is wrong, the migration is missing, the column is misspelled. Real bugs live in those gaps. 
Interaction-based assertions (`expect(x).toHaveBeenCalledWith(...)`) break on every refactor and provide weaker confidence than state-based assertions on the outcome.\n\n**Correction.** Use a real test database (or an in-memory fake of the same shape) and assert on the **outcome** \u2014 the row that was inserted, the response from the query, the observable side effect \u2014 not on the call. Reach for mocks only for things genuinely outside your control: third-party APIs, time, randomness, the network. Real > Fake (in-memory) > Stub (canned data) > Mock (interaction).\n";
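The rail that A-13 refers to (a RED commit for AC-2 is refused while AC-1's chain is still open) can be sketched as a pure check over a commit log. This is an illustration only and does not reproduce `commit-helper.mjs`: the `PhaseRecord` shape, the `checkCommit` function, and the refusal messages are assumptions.

```typescript
// Hypothetical sketch of a phase-chain rail; not the real commit-helper.mjs.
type Phase = "red" | "green" | "refactor";

// Assumed log entry: one recorded commit (or explicit refactor skip) per phase.
interface PhaseRecord {
  acId: string;
  phase: Phase;
}

const ORDER: Phase[] = ["red", "green", "refactor"];

// Lists the phases already recorded for one AC.
function recordedPhases(log: PhaseRecord[], acId: string): Phase[] {
  return log.filter((r) => r.acId === acId).map((r) => r.phase);
}

// Returns null when the commit is allowed, otherwise the refusal reason.
function checkCommit(log: PhaseRecord[], acId: string, phase: Phase): string | null {
  // Vertical slicing: another AC with an unfinished chain blocks this one.
  const open = log
    .map((r) => r.acId)
    .filter((id) => id !== acId && recordedPhases(log, id).indexOf("refactor") === -1);
  if (open.length > 0) {
    return open[0] + " chain is still open; close it before committing " + acId;
  }
  // Within one AC, phases must arrive in RED, GREEN, REFACTOR order.
  const done = recordedPhases(log, acId);
  const remaining = ORDER.filter((p) => done.indexOf(p) === -1);
  if (remaining.length === 0) return acId + " chain is already closed";
  if (phase !== remaining[0]) {
    return acId + " expects --phase=" + remaining[0] + ", got --phase=" + phase;
  }
  return null;
}
```

With a log that holds only an AC-1 RED entry, `checkCommit(log, "AC-2", "red")` returns a refusal string, which is the horizontal-slicing pattern the antipattern describes; `checkCommit(log, "AC-1", "green")` returns `null`.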
@@ -106,4 +106,28 @@ Patterns we have seen fail. Each entry is a short symptom, the underlying mistak
106
106
  **Underlying mistake.** A passing single test is not GREEN. Production change can break adjacent tests; without running the suite, the AC is shipped on a regression.
107
107
 
108
108
  **Correction.** GREEN evidence must be the **full relevant suite** for the affected module(s), not the single test. The reviewer cites this as a block finding.
109
+
110
+ ## A-13 — Horizontal slicing (RED-batch then GREEN-batch)
111
+
112
+ **Symptom.** \`flows/<slug>/build.md\` shows AC-1 RED, AC-2 RED, AC-3 RED committed in a row, then AC-1 GREEN, AC-2 GREEN, AC-3 GREEN. Or the slice-builder describes the build as "tests written, now I'll implement".
113
+
114
+ **Underlying mistake.** Writing all RED tests before any GREEN code means the tests describe the behaviour you *guessed* before you saw the real interface. Tests written this way pass when behaviour breaks (because they test the imagined shape) and fail when behaviour is fine (because the real shape diverged from the imagination). They get rewritten during the next refactor.
115
+
116
+ **Correction.** One test → one implementation → repeat. Each cycle informs the next. The AC-2 test is shaped by what the AC-1 implementation revealed about the real interface. \`commit-helper.mjs --phase=red\` for AC-2 will refuse if AC-1's chain isn't closed yet — that is the rail. See the Vertical Slicing section in \`tdd-cycle.md\`.
117
+
118
+ ## A-14 — Pushing past a failing test
119
+
120
+ **Symptom.** Build log shows a flaky or unexpected failure on AC-2, then continues into AC-3 with "I'll come back to AC-2 later". Or a hook rejection silently retried with a slightly different commit message.
121
+
122
+ **Underlying mistake.** Errors compound. AC-3 is built on the invariants AC-2 was supposed to establish. If AC-2's RED failed for the wrong reason, you are debugging a stack of broken assumptions, and every cycle past that point makes the diagnosis harder.
123
+
124
+ **Correction.** Stop the line. Preserve the failure (command + 1–3 lines of output verbatim), reproduce in isolation, root-cause to a concrete file:line, fix once, re-run the full relevant suite, then resume the cycle. If the root cause cannot be identified in three attempts, surface a blocker to the orchestrator — do not "make it work" by removing the test or weakening the assertion.
125
+
126
+ ## A-15 — Mocking what should not be mocked
127
+
128
+ **Symptom.** A database query test mocks the driver and asserts on \`db.query\` call shape; the test is green and the actual query never runs in production. Or a service test mocks every collaborator and only verifies which methods were called, in which order.
129
+
130
+ **Underlying mistake.** Mocking a dependency you control couples the test to the implementation. The test reads green even when the SQL is wrong, the migration is missing, the column is misspelled. Real bugs live in those gaps. Interaction-based assertions (\`expect(x).toHaveBeenCalledWith(...)\`) break on every refactor and provide weaker confidence than state-based assertions on the outcome.
131
+
132
+ **Correction.** Use a real test database (or an in-memory fake of the same shape) and assert on the **outcome** — the row that was inserted, the response from the query, the observable side effect — not on the call. Reach for mocks only for things genuinely outside your control: third-party APIs, time, randomness, the network. Real > Fake (in-memory) > Stub (canned data) > Mock (interaction).
109
133
  `;
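The correction in A-15 (assert on the outcome against a real or in-memory store, not on the call shape) might look like the sketch below. Everything here is hypothetical and exists only to illustrate the technique: `UserStore`, `InMemoryUserStore`, and `registerUser` are not part of cclaw.

```typescript
// Hypothetical example of a state-based test using an in-memory fake.
interface User {
  id: number;
  email: string;
}

interface UserStore {
  insert(email: string): User;
  findByEmail(email: string): User | undefined;
}

// In-memory fake with the same shape as a real store.
class InMemoryUserStore implements UserStore {
  private rows: User[] = [];

  insert(email: string): User {
    const user: User = { id: this.rows.length + 1, email: email };
    this.rows.push(user);
    return user;
  }

  findByEmail(email: string): User | undefined {
    return this.rows.filter((u) => u.email === email)[0];
  }
}

// Code under test: normalises the email and rejects duplicates.
function registerUser(store: UserStore, rawEmail: string): User {
  const email = rawEmail.trim().toLowerCase();
  if (store.findByEmail(email) !== undefined) {
    throw new Error("email already registered");
  }
  return store.insert(email);
}

// State-based check: look at what ended up in the store, not at which
// methods were called or in which order.
const store = new InMemoryUserStore();
registerUser(store, "  Ada@Example.com ");
const saved = store.findByEmail("ada@example.com");
if (saved === undefined || saved.id !== 1) {
  throw new Error("expected the normalised row to be stored");
}
```

A refactor that changes how `registerUser` talks to the store leaves this check untouched as long as the stored row is the same, which is the property the antipattern says interaction-based assertions lack.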
@@ -1,5 +1,5 @@
1
1
  export interface ArtifactTemplate {
2
- id: "plan" | "build" | "review" | "ship" | "decisions" | "learnings" | "manifest" | "ideas";
2
+ id: "plan" | "plan-soft" | "build" | "build-soft" | "review" | "ship" | "decisions" | "learnings" | "manifest" | "ideas";
3
3
  fileName: string;
4
4
  description: string;
5
5
  body: string;
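With `plan-soft` and `build-soft` added to the `id` union, a caller presumably picks a template from the stage and the AC mode. The sketch below shows one way that selection could work; `templateIdFor` and `templateFileName` are hypothetical helpers, and returning `null` for `inline` is an assumption based on the README's "(no plan)" note for the trivial path.

```typescript
// Hypothetical selection helper; not part of the published typings.
type AcMode = "inline" | "soft" | "strict";
type StagedTemplateId = "plan" | "plan-soft" | "build" | "build-soft";

// Maps a stage and AC mode to the template id that would be instantiated.
function templateIdFor(stage: "plan" | "build", acMode: AcMode): StagedTemplateId | null {
  if (acMode === "inline") return null; // assumption: trivial edits write no artifact
  if (acMode === "soft") return stage === "plan" ? "plan-soft" : "build-soft";
  return stage; // strict mode keeps the original plan / build templates
}

// File names follow the id, e.g. "plan-soft" maps to "plan-soft.md".
function templateFileName(id: StagedTemplateId): string {
  return id + ".md";
}
```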
@@ -92,6 +92,53 @@ _(Planner topology mode. Default: \`inline\`. \`parallel-build\` is opt-in; see
92
92
 
93
93
  This block is rebuilt by \`commit-helper.mjs\` after every AC commit. Do not edit by hand once a commit is recorded.
94
94
  `;
95
+ const PLAN_TEMPLATE_SOFT = `---
96
+ slug: SLUG-PLACEHOLDER
97
+ stage: plan
98
+ status: active
99
+ ac_mode: soft
100
+ last_specialist: null
101
+ refines: null
102
+ shipped_at: null
103
+ ship_commit: null
104
+ review_iterations: 0
105
+ security_flag: false
106
+ ---
107
+
108
+ # SLUG-PLACEHOLDER
109
+
110
+ > One short paragraph: what we are doing and why. If the goal does not fit in 4 lines, the request is probably too large — split it or re-triage to large-risky.
111
+
112
+ ## Plan
113
+
114
+ _(Planner authors this. One short paragraph describing the change end-to-end. No phases, no AC IDs.)_
115
+
116
+ ## Testable conditions
117
+
118
+ _(Bullet list. Each line is a behaviour the slice-builder's tests must verify. Conditions are observable; if you can't name a test or manual step that proves it, drop the bullet.)_
119
+
120
+ - _Condition 1 — observable behaviour, e.g. "Pill renders the request status (Pending / Approved / Denied)."_
121
+ - _Condition 2._
122
+ - _Condition 3._
123
+
124
+ ## Verification
125
+
126
+ _(One block per layer. Tests file paths, manual steps, runner command.)_
127
+
128
+ - \`tests/unit/<module>.test.ts\` — covers all listed conditions in one test file.
129
+ - Manual: _open <url>, perform <action>, observe <outcome>_.
130
+
131
+ ## Touch surface
132
+
133
+ _(Files the slice-builder is allowed to modify. Used by reviewer to flag scope creep.)_
134
+
135
+ - \`src/<module>/<file>.ts\`
136
+ - \`tests/unit/<module>.test.ts\`
137
+
138
+ ## Notes
139
+
140
+ _(Optional. Brainstormer / architect did NOT run for soft-mode flows; if you discover the work needs structural decisions or threat modelling mid-flight, surface back to the orchestrator and ask to re-triage as large-risky.)_
141
+ `;
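The template body above uses `SLUG-PLACEHOLDER` in both the front matter and the title, so instantiating it comes down to a global substitution. A minimal sketch, assuming a hypothetical `renderTemplate` helper and a made-up slug `status-pill`; the installer's real substitution logic is not shown in this diff.

```typescript
// Hypothetical rendering helper; the real installer logic is not shown here.
const SLUG_PLACEHOLDER = "SLUG-PLACEHOLDER";

// Short excerpt of the soft plan template's front matter and title.
const PLAN_SOFT_EXCERPT = [
  "---",
  "slug: SLUG-PLACEHOLDER",
  "stage: plan",
  "status: active",
  "ac_mode: soft",
  "---",
  "",
  "# SLUG-PLACEHOLDER",
].join("\n");

// Replaces every occurrence of the placeholder with the concrete slug.
function renderTemplate(body: string, slug: string): string {
  return body.split(SLUG_PLACEHOLDER).join(slug);
}

// "status-pill" is an invented slug used only for this example.
const rendered = renderTemplate(PLAN_SOFT_EXCERPT, "status-pill");
```

Using `split` and `join` replaces all occurrences without relying on a regular expression, so a slug containing characters that are special in regex replacement patterns is inserted literally.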
95
142
  const BUILD_TEMPLATE = `---
96
143
  slug: SLUG-PLACEHOLDER
97
144
  stage: build
@@ -162,6 +209,38 @@ _(Append one fix-iteration block per review iteration that returned \`block\`. S
162
209
 
163
210
  _(Surprises, deviations from the plan, tests added, refactors that came up, paths considered and discarded, etc.)_
164
211
  `;
212
+ const BUILD_TEMPLATE_SOFT = `---
213
+ slug: SLUG-PLACEHOLDER
214
+ stage: build
215
+ status: active
216
+ ac_mode: soft
217
+ last_commit: null
218
+ ---
219
+
220
+ # Build log — SLUG-PLACEHOLDER
221
+
222
+ This is the soft-mode build log. One TDD cycle covers all listed conditions; commits are plain \`git commit\` (the commit-helper is advisory in soft mode).
223
+
224
+ > **Iron Law:** NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST. The RED failure is the spec.
225
+
226
+ ## Plan summary
227
+
228
+ _(One paragraph mirroring \`flows/SLUG-PLACEHOLDER/plan.md\` Plan section.)_
229
+
230
+ ## Build log
231
+
232
+ - **Tests added**: _\`tests/unit/<module>.test.ts\` — N tests, mirroring the listed conditions._
233
+ - **Discovery**: _\`src/<module>/<file>.ts:<line>\`, \`tests/unit/<existing>.test.ts:<line>\`._
234
+ - **RED**: _\`<runner command>\` → N failing (expected). Cite the assertion that fails (≤3 lines)._
235
+ - **GREEN**: _One sentence on the minimal change. \`<full-suite command>\` → all passing._
236
+ - **REFACTOR**: _One-line shape change applied, or "skipped: <reason>"._
237
+ - **Commit**: _\`<one-line message>\` (\`<SHA>\`)._
238
+ - **Follow-ups**: _\`info\` items deferred to a separate slug, or "none"._
239
+
240
+ ## Notes
241
+
242
+ _(Surprises, deviations from the plan, paths considered and discarded, etc.)_
243
+ `;
165
244
  const REVIEW_TEMPLATE = `---
166
245
  slug: SLUG-PLACEHOLDER
167
246
  stage: review
@@ -519,8 +598,10 @@ This file is a free-form idea backlog. Entries are appended by \`/cc-idea\` and
519
598
  Each entry begins with an ISO timestamp, then a single-line summary, then the body.
520
599
  `;
521
600
  export const ARTIFACT_TEMPLATES = [
522
- { id: "plan", fileName: "plan.md", description: "Plan template with frontmatter, AC table, and traceability block.", body: PLAN_TEMPLATE },
523
- { id: "build", fileName: "build.md", description: "Build log template with commit table and hook invocation log.", body: BUILD_TEMPLATE },
601
+ { id: "plan", fileName: "plan.md", description: "Strict-mode plan template (AC table, parallelSafe, touchSurface, traceability block).", body: PLAN_TEMPLATE },
602
+ { id: "plan-soft", fileName: "plan-soft.md", description: "Soft-mode plan template (bullet-list testable conditions, no AC IDs).", body: PLAN_TEMPLATE_SOFT },
603
+ { id: "build", fileName: "build.md", description: "Strict-mode build log (six-column TDD table, RED proofs, GREEN suite evidence).", body: BUILD_TEMPLATE },
604
+ { id: "build-soft", fileName: "build-soft.md", description: "Soft-mode build log (single-cycle summary, plain git commit).", body: BUILD_TEMPLATE_SOFT },
524
605
  { id: "review", fileName: "review.md", description: "Review template with iteration table, findings table, and Five Failure Modes pass.", body: REVIEW_TEMPLATE },
525
606
  { id: "ship", fileName: "ship.md", description: "Ship notes template with AC↔commit map, push/PR section, release notes paragraph.", body: SHIP_TEMPLATE },
526
607
  { id: "decisions", fileName: "decisions.md", description: "Architect-style decision record template (D-N entries).", body: DECISIONS_TEMPLATE },
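The registry above now carries a soft variant next to each strict plan and build template. The diff does not show how a template is chosen for a given flow, so the following is only a sketch under an assumed rule: soft mode prefers `<stage>-soft` when that id exists, and everything else falls back to the base id. The helper name `templateFor` and the trimmed `TEMPLATES` array are hypothetical, and how `inline` mode is handled is not confirmed by the source.

```javascript
// Hypothetical, trimmed copy of the registry: ids and file names are taken
// from ARTIFACT_TEMPLATES in the diff, bodies are omitted.
const TEMPLATES = [
  { id: "plan", fileName: "plan.md" },
  { id: "plan-soft", fileName: "plan-soft.md" },
  { id: "build", fileName: "build.md" },
  { id: "build-soft", fileName: "build-soft.md" },
  { id: "review", fileName: "review.md" },
];

// Assumed lookup rule (not confirmed by the diff): soft mode asks for
// "<stage>-soft" first and falls back to the base id when no soft variant exists.
function templateFor(stage, acMode) {
  const wanted = acMode === "soft" ? `${stage}-soft` : stage;
  return (
    TEMPLATES.find((t) => t.id === wanted) ??
    TEMPLATES.find((t) => t.id === stage) ??
    null
  );
}
```

Under this rule, stages without a soft variant (such as `review`) keep using their single template in every mode.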
@@ -20,9 +20,14 @@ if (!state) {
20
20
  process.exit(0);
21
21
  }
22
22
 
23
- if (state.schemaVersion !== 2) {
24
- console.error("[cclaw] flow-state schema is from cclaw 7.x. cclaw v8 cannot resume it.");
25
- console.error("[cclaw] options: 1) finish/abandon the run with cclaw 7.x; 2) delete .cclaw/state/flow-state.json; 3) start a new v8 plan.");
23
+ if (state.schemaVersion === 1 || state.schemaVersion === undefined) {
24
+ console.error("[cclaw] flow-state predates cclaw v8 and cannot be auto-migrated.");
25
+ console.error("[cclaw] options: 1) finish/abandon the run with the older cclaw; 2) delete .cclaw/state/flow-state.json; 3) start a new flow.");
26
+ process.exit(0);
27
+ }
28
+
29
+ if (state.schemaVersion !== 3 && state.schemaVersion !== 2) {
30
+ console.error(\`[cclaw] unknown flow-state schemaVersion \${state.schemaVersion}.\`);
26
31
  process.exit(0);
27
32
  }
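The two guards above sort a flow-state file into three buckets: legacy (version 1 or missing), unknown (anything other than 2 or 3), and supported. A minimal sketch of the same decision as a pure function follows. The name `classifySchemaVersion` and the returned labels are illustrative and do not appear in the package.

```javascript
// Mirrors the order of the checks in the hook: the legacy test runs first,
// then the allow-list of versions 2 and 3. Comparisons are strict, so a
// string such as "2" is treated as unknown.
function classifySchemaVersion(schemaVersion) {
  if (schemaVersion === 1 || schemaVersion === undefined) return "legacy";
  if (schemaVersion !== 3 && schemaVersion !== 2) return "unknown";
  return "supported";
}
```

In the hook itself both non-supported outcomes print a message and exit with status 0, so a stale state file does not fail the session start.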
28
33
 
@@ -31,9 +36,14 @@ if (!state.currentSlug) {
31
36
  process.exit(0);
32
37
  }
33
38
 
34
- const pending = (state.ac || []).filter((item) => item.status !== "committed").length;
35
- const total = (state.ac || []).length;
36
- console.log(\`[cclaw] active: \${state.currentSlug} (stage=\${state.currentStage ?? "n/a"}); AC committed \${total - pending}/\${total}\`);
39
+ const acMode = state.triage?.acMode ?? "strict";
40
+ const ac = state.ac ?? [];
41
+ if (acMode === "strict" && ac.length > 0) {
42
+ const pending = ac.filter((item) => item.status !== "committed").length;
43
+ console.log(\`[cclaw] active: \${state.currentSlug} (stage=\${state.currentStage ?? "n/a"}, mode=strict); AC committed \${ac.length - pending}/\${ac.length}\`);
44
+ } else {
45
+ console.log(\`[cclaw] active: \${state.currentSlug} (stage=\${state.currentStage ?? "n/a"}, mode=\${acMode}).\`);
46
+ }
37
47
  `;
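The session-start change above reports AC progress only in strict mode with at least one AC declared, and prints a shorter line otherwise. The sketch below restates that branch as a function that returns the line instead of logging it, which makes the two shapes easy to compare. The function name `sessionStatusLine` is hypothetical; the string formats are copied from the hook.

```javascript
// Returns the status line the hook would print for a given flow state.
// A missing triage record defaults to strict, as in the hook.
function sessionStatusLine(state) {
  const acMode = state.triage?.acMode ?? "strict";
  const ac = state.ac ?? [];
  const stage = state.currentStage ?? "n/a";
  if (acMode === "strict" && ac.length > 0) {
    const pending = ac.filter((item) => item.status !== "committed").length;
    return `[cclaw] active: ${state.currentSlug} (stage=${stage}, mode=strict); AC committed ${ac.length - pending}/${ac.length}`;
  }
  return `[cclaw] active: ${state.currentSlug} (stage=${stage}, mode=${acMode}).`;
}
```

A strict flow with an empty AC list takes the second branch, so it prints the mode without a `0/0` counter.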
38
48
  const STOP_HANDOFF_HOOK = `#!/usr/bin/env node
39
49
  // cclaw stop-handoff: short reminder when the agent stops mid-flow.
@@ -53,13 +63,30 @@ async function readState() {
53
63
 
54
64
  const state = await readState();
55
65
  if (!state || !state.currentSlug) process.exit(0);
56
- const pending = (state.ac || []).filter((item) => item.status !== "committed");
57
- if (pending.length === 0) process.exit(0);
58
- console.error(\`[cclaw] stopping with \${pending.length} pending AC for \${state.currentSlug}: \${pending.map((item) => item.id).join(", ")}\`);
66
+
67
+ const acMode = state.triage?.acMode ?? "strict";
68
+ if (acMode === "strict") {
69
+ const pending = (state.ac || []).filter((item) => item.status !== "committed");
70
+ if (pending.length === 0) process.exit(0);
71
+ console.error(\`[cclaw] stopping with \${pending.length} pending AC for \${state.currentSlug}: \${pending.map((item) => item.id).join(", ")}\`);
72
+ process.exit(0);
73
+ }
74
+
75
+ console.error(\`[cclaw] stopping mid-flow for \${state.currentSlug} (stage=\${state.currentStage ?? "n/a"}, mode=\${acMode}). Run /cc to resume.\`);
59
76
  `;
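The stop-handoff hook above now has three outcomes: stay silent, list pending ACs (strict mode), or print a generic resume reminder (soft and inline modes). A minimal sketch of that decision as a function returning the message, or `null` for silence, is shown here. The name `stopHandoffMessage` is hypothetical; the message text is copied from the hook.

```javascript
// Returns the reminder the hook would write to stderr, or null when it
// would exit without printing anything.
function stopHandoffMessage(state) {
  if (!state || !state.currentSlug) return null;
  const acMode = state.triage?.acMode ?? "strict";
  if (acMode === "strict") {
    const pending = (state.ac || []).filter((item) => item.status !== "committed");
    if (pending.length === 0) return null;
    const ids = pending.map((item) => item.id).join(", ");
    return `[cclaw] stopping with ${pending.length} pending AC for ${state.currentSlug}: ${ids}`;
  }
  return `[cclaw] stopping mid-flow for ${state.currentSlug} (stage=${state.currentStage ?? "n/a"}, mode=${acMode}). Run /cc to resume.`;
}
```

Note that non-strict flows always produce a reminder while a slug is active, because there is no AC ledger to tell the hook that the work is finished.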
60
77
  const COMMIT_HELPER_HOOK = `#!/usr/bin/env node
61
- // cclaw commit-helper: TDD-aware atomic commit per AC phase
62
- // (RED -> GREEN -> REFACTOR) + AC traceability gate.
78
+ // cclaw commit-helper: ac_mode-aware atomic commit hook.
79
+ //
80
+ // strict mode (large-risky / security-flagged):
81
+ // commit-helper.mjs --ac=AC-N --phase=red|green|refactor [--skipped] [--message="..."]
82
+ // enforces TDD cycle, AC trace, no production files in RED, full chain RED -> GREEN -> REFACTOR.
83
+ //
84
+ // soft / inline mode (small-medium / trivial):
85
+ // commit-helper.mjs --message="..."
86
+ // advisory only — proxies to git commit, prints a one-line note. --ac/--phase ignored.
87
+ //
88
+ // the mode is read from flow-state.json: state.triage.acMode. if no triage is recorded,
89
+ // default to strict (preserves v8.0/v8.1 behaviour for migrated projects).
63
90
  import { execFileSync } from "node:child_process";
64
91
  import fs from "node:fs/promises";
65
92
  import path from "node:path";
@@ -77,19 +104,58 @@ function flag(name) {
77
104
  return process.argv.includes(\`--\${name}\`);
78
105
  }
79
106
 
107
+ let state;
108
+ try {
109
+ state = JSON.parse(await fs.readFile(statePath, "utf8"));
110
+ } catch {
111
+ console.error("[commit-helper] no flow-state.json. Start a flow with /cc first.");
112
+ process.exit(2);
113
+ }
114
+
115
+ if (state.schemaVersion !== 3 && state.schemaVersion !== 2) {
116
+ console.error(\`[commit-helper] unsupported flow-state schemaVersion \${state.schemaVersion}.\`);
117
+ process.exit(2);
118
+ }
119
+
120
+ const acMode = state.triage?.acMode ?? "strict";
121
+
122
+ if (acMode !== "strict") {
123
+ // soft / inline mode: advisory passthrough.
124
+ const message = arg("message");
125
+ if (!message) {
126
+ console.error("[commit-helper] --message=\\"...\\" is required.");
127
+ process.exit(2);
128
+ }
129
+ let staged;
130
+ try {
131
+ staged = execFileSync("git", ["diff", "--cached", "--name-only"], { cwd: root, encoding: "utf8" }).trim();
132
+ } catch (error) {
133
+ console.error(\`[commit-helper] git not available: \${error.message}\`);
134
+ process.exit(2);
135
+ }
136
+ if (!staged) {
137
+ console.error("[commit-helper] nothing staged. Stage your changes before invoking commit-helper.");
138
+ process.exit(2);
139
+ }
140
+ execFileSync("git", ["commit", "-m", message], { cwd: root, stdio: "inherit" });
141
+ console.log(\`[commit-helper] committed in \${acMode} mode (no AC trace recorded).\`);
142
+ process.exit(0);
143
+ }
144
+
145
+ // strict mode below.
80
146
  const acId = arg("ac");
81
147
  const phase = arg("phase");
82
148
  const message = arg("message") ?? \`cclaw: progress on \${acId ?? "AC"}\`;
83
149
  const skipped = flag("skipped");
84
150
 
85
151
  if (!acId || !/^AC-\\d+$/.test(acId)) {
86
- console.error("[commit-helper] usage: commit-helper.mjs --ac=AC-N --phase=red|green|refactor [--skipped] [--message='...']");
152
+ console.error("[commit-helper] strict mode usage: commit-helper.mjs --ac=AC-N --phase=red|green|refactor [--skipped] [--message='...']");
87
153
  process.exit(2);
88
154
  }
89
155
 
90
156
  if (!phase || !["red", "green", "refactor"].includes(phase)) {
91
- console.error("[commit-helper] --phase is required. Allowed: red, green, refactor.");
92
- console.error("[commit-helper] build is a TDD cycle: every AC needs RED -> GREEN -> REFACTOR.");
157
+ console.error("[commit-helper] --phase is required in strict mode. Allowed: red, green, refactor.");
158
+ console.error("[commit-helper] strict-mode build is a TDD cycle: every AC needs RED -> GREEN -> REFACTOR.");
93
159
  process.exit(2);
94
160
  }
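The strict-mode checks above can be read as a small validator: the AC id must match `AC-N`, the phase must be one of `red`, `green`, or `refactor`, and `--skipped` is accepted only with `refactor`. The sketch below collects those rules into one function. `readArg` is a hypothetical stand-in for the helper's own `arg()` (whose body is not shown in this diff) and assumes the `--name=value` form; the `reason` labels are illustrative, not the hook's real messages.

```javascript
// Hypothetical stand-in for arg(): reads "--name=value" from an argv array.
function readArg(argv, name) {
  const prefix = `--${name}=`;
  const hit = argv.find((token) => token.startsWith(prefix));
  return hit === undefined ? undefined : hit.slice(prefix.length);
}

// Applies the strict-mode rules in the same order as the hook.
function validateStrictArgs(argv) {
  const acId = readArg(argv, "ac");
  const phase = readArg(argv, "phase");
  const skipped = argv.includes("--skipped");
  if (!acId || !/^AC-\d+$/.test(acId)) return { ok: false, reason: "bad-ac" };
  if (!phase || !["red", "green", "refactor"].includes(phase)) {
    return { ok: false, reason: "bad-phase" };
  }
  if (skipped && phase !== "refactor") {
    return { ok: false, reason: "skipped-outside-refactor" };
  }
  return { ok: true, acId, phase, skipped };
}
```

In the hook each failed check prints a usage message and exits with status 2. The sketch returns a result object instead so the rules can be exercised without spawning a process.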
95
161
 
@@ -98,19 +164,6 @@ if (skipped && phase !== "refactor") {
98
164
  process.exit(2);
99
165
  }
100
166
 
101
- let state;
102
- try {
103
- state = JSON.parse(await fs.readFile(statePath, "utf8"));
104
- } catch {
105
- console.error("[commit-helper] no flow-state.json. Start a flow with /cc first.");
106
- process.exit(2);
107
- }
108
-
109
- if (state.schemaVersion !== 2) {
110
- console.error("[commit-helper] flow-state schema is not v8.");
111
- process.exit(2);
112
- }
113
-
114
167
  const matching = (state.ac ?? []).find((item) => item.id === acId);
115
168
  if (!matching) {
116
169
  console.error(\`[commit-helper] AC \${acId} is not declared in plan.md / flow-state.\`);