npm - cclaw-cli - Versions diffs - 8.1.2 → 8.3.0 - Mend

cclaw-cli 8.1.2 → 8.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (29)

package/README.md +50 -23
package/dist/constants.d.ts +1 -1
package/dist/constants.js +1 -1
package/dist/content/antipatterns.d.ts +1 -1
package/dist/content/antipatterns.js +24 -0
package/dist/content/artifact-templates.d.ts +1 -1
package/dist/content/artifact-templates.js +83 -2
package/dist/content/node-hooks.js +80 -27
package/dist/content/skills.js +397 -13
package/dist/content/specialist-prompts/architect.d.ts +1 -1
package/dist/content/specialist-prompts/architect.js +30 -6
package/dist/content/specialist-prompts/brainstormer.d.ts +1 -1
package/dist/content/specialist-prompts/brainstormer.js +31 -8
package/dist/content/specialist-prompts/planner.d.ts +1 -1
package/dist/content/specialist-prompts/planner.js +81 -12
package/dist/content/specialist-prompts/reviewer.d.ts +1 -1
package/dist/content/specialist-prompts/reviewer.js +43 -6
package/dist/content/specialist-prompts/security-reviewer.d.ts +1 -1
package/dist/content/specialist-prompts/security-reviewer.js +31 -6
package/dist/content/specialist-prompts/slice-builder.d.ts +1 -1
package/dist/content/specialist-prompts/slice-builder.js +79 -10
package/dist/content/start-command.js +310 -153
package/dist/flow-state.d.ts +46 -6
package/dist/flow-state.js +141 -6
package/dist/run-persistence.d.ts +11 -4
package/dist/run-persistence.js +18 -7
package/dist/types.d.ts +55 -1
package/dist/types.js +28 -0
package/package.json +1 -1

package/README.md CHANGED

@@ -1,35 +1,62 @@
 # cclaw
-**cclaw is a lightweight harness-first flow toolkit for coding agents.** It installs three slash commands, six on-demand specialists, twelve auto-trigger skills (including TDD cycle and conversation-language), ten artifact templates, four stage runbooks, eight reference patterns, five research playbooks, five recovery playbooks, thirteen worked examples, an antipatterns library, a decision protocol, a meta-skill, an interactive harness picker, and a tiny runtime — together a deep content layer wrapped around a runtime under 1 KLOC — so Claude Code, Cursor, OpenCode, or Codex can move from idea to shipped change with a clear plan, AC traceability, TDD per AC, and almost no ceremony.
+**cclaw is a lightweight harness-first flow toolkit for coding agents.** Three slash commands. Five hops (`Detect → Triage → Dispatch → Pause → Compound/Ship`). Four stages (`plan → build → review → ship`, where **build IS a TDD cycle**: RED → GREEN → REFACTOR). Six on-demand specialists, all running as isolated sub-agents. Three Acceptance-Criteria modes (`inline` / `soft` / `strict`) so trivial edits do not pay the price of risky migrations. A deep content layer of skills, templates, runbooks, patterns, examples, and recovery playbooks wrapped around a runtime under 1 KLOC — so Claude Code, Cursor, OpenCode, or Codex can move from idea to shipped change with a clear plan, the right amount of ceremony, and almost no orchestrator bloat.
 ```text
-        idea
-         │
-         ▼
-     /cc <task>
-         │
-   ┌─────┴─────────────────────────────────────┐
-   │ Phase 0 calibration:                      │
-   │ targeted change or multi-component?       │
-   └─────┬─────────────────┬───────────────────┘
-         │trivial          │small/medium       │large/risky
-         ▼                 ▼                   ▼
-    edit + commit     plan → build       brainstormer →
-    per AC            → review → ship    architect → planner
-                                         (each is optional)
-                              │
-                              ▼
-                     compound (auto, gated)
-                              │
-                              ▼
-                  active artifacts → shipped/<slug>/
+            idea
+             │
+             ▼
+         /cc <task>
+             │
+   ┌─────────┴──────────────────────────────────────────┐
+   │ Hop 1: Detect — fresh start? or resume active flow? │
+   └─────────┬──────────────────────────────────────────┘
+             │ fresh
+             ▼
+   ┌────────────────────────────────────────────────────┐
+   │ Hop 2: Triage — auto-classify task,                │
+   │ recommend path + acMode, user accepts or overrides │
+   └─────────┬──────────────────────────────────────────┘
+             │
+   trivial   │   small-medium       │   large-risky
+   acMode    │   acMode soft        │   acMode strict
+   inline    │                      │
+             ▼                      ▼                      ▼
+        edit + commit        plan → build → review → ship   brainstorm? → architect? → plan → build → review → ship
+        (no plan)            each stage in a fresh sub-agent  each stage in a fresh sub-agent, parallel-build allowed
+                                     │                      │
+                                     └─────────┬────────────┘
+                                               ▼
+                                  compound (auto, gated by quality)
+                                               │
+                                               ▼
+                                   active flows → shipped/<slug>/
 ```
-Three slash commands. Four stages (`plan → build → review → ship`, where **build IS a TDD cycle**: RED → GREEN → REFACTOR per AC). Six specialists. Eleven skills (including a TDD-cycle skill that's always-on while building). Ten templates. Four runbooks. Eight reference patterns. Five research playbooks. Five recovery playbooks. Thirteen worked examples. Two mandatory gates (AC traceability + TDD phase chain).
+Three slash commands (`/cc`, `/cc-cancel`, `/cc-idea`). Four stages (`plan → build → review → ship`). Six specialists, all on-demand, all running as sub-agents. Fifteen skills including the always-on `triage-gate`, `flow-resume`, `tdd-cycle`, `conversation-language`, and `anti-slop`. Ten templates including `plan-soft.md` and `build-soft.md` for the soft-mode path. Four runbooks. Eight reference patterns. Three research playbooks. Five recovery playbooks. Eight worked examples. Two mandatory gates in strict mode (AC traceability + TDD phase chain); soft mode keeps both as advisory; inline mode skips both.
+## What changed in 8.3
+8.3 is a non-breaking content + UX patch on top of 8.2.
+- **Triage as a structured ask, not a code block.** The orchestrator now uses the harness's structured question tool (`AskUserQuestion` / `AskQuestion` / `prompt`) to render the triage. Two questions, in order: pick the path, then pick the run mode. The fenced form remains as a fallback only.
+- **Run mode: `step` (default) vs `auto`.** `step` pauses after every stage and waits for `continue` (8.2 behaviour). `auto` chains plan → build → review → ship without pausing; stops only on block findings, cap-reached, security findings, or before `ship`. New optional field `triage.runMode` in `flow-state.json`.
+- **Explicit parallel-build fan-out in Hop 3.** The `/cc` body now carries a full ASCII fan-out diagram for the strict-mode parallel-build path — `git worktree` per slice, max 5 slices, one `slice-builder` sub-agent per slice, integration reviewer, merge sequence. The skill `parallel-build.md` already had this; the orchestrator now sees it at the dispatch site.
+- **TDD cycle deepening.** Four new sections in `tdd-cycle.md`: vertical slicing / tracer bullets, stop-the-line rule, Prove-It pattern for bug fixes, writing-good-tests rules (state-not-interactions, DAMP over DRY, real-over-mock, test pyramid). Three new antipatterns: A-13 horizontal slicing, A-14 pushing past a failing test, A-15 mocking what should not be mocked.
+## What changed in 8.2
+8.2 is a non-breaking redesign of the `/cc` orchestrator on top of 8.1.
+- **Triage gate.** Every fresh flow runs the `triage-gate` skill, which classifies the task as `trivial` / `small-medium` / `large-risky` from six heuristics, recommends a path and an `acMode`, and asks the user to accept or override. The decision is persisted into `flow-state.json` so resumes never re-prompt.
+- **Graduated AC.** Acceptance Criteria are no longer one-size-fits-all. `inline` (trivial) skips them entirely. `soft` (small-medium) uses a bullet list of testable conditions with no AC IDs and an advisory commit-helper. `strict` (large-risky) is the 8.1 behaviour byte-for-byte: AC IDs, mandatory `commit-helper.mjs --ac-id=AC-N --phase=red|green|refactor`, per-AC TDD chain.
+- **Sub-agent dispatch.** `plan`, `build`, `review`, and `ship` each run in a fresh sub-agent invocation. The orchestrator hands a slim envelope (slug / stage / acMode / artifact paths) and gets back a fixed 5-to-7-line summary plus the artifact on disk. No specialist reasoning leaks into the orchestrator context.
+- **Resume.** Invoking `/cc` while a flow is active triggers the `flow-resume` skill: 4-line summary plus `r` resume / `s` show / `c` cancel / `n` start new. The triage decision is preserved across sessions.
+- **Schema bump.** `flow-state.json` is now `schemaVersion: 3` with a `triage` field. Existing v2 files are auto-migrated on first read with `acMode: strict` so existing flows behave exactly as in 8.1.
 ## What changed in v8
-cclaw v8.0 is a breaking redesign. We dropped the 7.x stage machine: no more `brainstorm` / `scope` / `design` / `spec` / `tdd` mandatory stages, no more 18 specialists, no more 9 state files, no more 30 stage gates. v7.x runs are not migrated; see [docs/migration-v7-to-v8.md](docs/migration-v7-to-v8.md).
+cclaw v8.0 was a breaking redesign of the v7 stage machine. We dropped the 7.x stage machine: no more `brainstorm` / `scope` / `design` / `spec` / `tdd` mandatory stages, no more 18 specialists, no more 9 state files, no more 30 stage gates. v7.x runs are not migrated; see [docs/migration-v7-to-v8.md](docs/migration-v7-to-v8.md).
 What we kept and made deeper:
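
The 8.2 and 8.3 notes in this README hunk say that `flow-state.json` is now `schemaVersion: 3` with a `triage` field, that v2 files are migrated on first read with `acMode: strict`, and that `triage.runMode` is optional with `step` as the default. A minimal sketch of that behaviour, assuming hypothetical type and function names (`FlowStateV2`, `FlowStateV3`, `migrateFlowState`, `effectiveRunMode`); the package's actual `flow-state.js` is not shown in this part of the diff and may be organised differently:

```typescript
// Hypothetical shapes. Only schemaVersion, triage, acMode and runMode are
// named in the release notes; every other name here is illustrative.
type AcMode = "inline" | "soft" | "strict";
type RunMode = "step" | "auto";

interface Triage {
  acMode: AcMode;
  runMode?: RunMode; // optional field introduced in 8.3
}

interface FlowStateV2 {
  schemaVersion: 2;
  [key: string]: unknown;
}

interface FlowStateV3 {
  schemaVersion: 3;
  triage: Triage;
  [key: string]: unknown;
}

// Upgrade a v2 state to v3. Existing flows get acMode "strict" so they
// keep behaving as they did in 8.1; v3 states are returned untouched.
function migrateFlowState(state: FlowStateV2 | FlowStateV3): FlowStateV3 {
  if (state.schemaVersion === 3) {
    return state;
  }
  return { ...state, schemaVersion: 3, triage: { acMode: "strict" } };
}

// A missing runMode is read as "step", the documented default.
function effectiveRunMode(state: FlowStateV3): RunMode {
  return state.triage.runMode ?? "step";
}
```

Under these assumptions, a state file written by 8.1 migrates to strict mode with step-by-step pauses, and the user is never re-prompted for a triage decision that was already recorded.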

package/dist/constants.d.ts CHANGED

@@ -1,4 +1,4 @@
-export declare const CCLAW_VERSION = "8.1.2";
+export declare const CCLAW_VERSION = "8.3.0";
 export declare const RUNTIME_ROOT = ".cclaw";
 export declare const STATE_REL_PATH = ".cclaw/state";
 export declare const HOOKS_REL_PATH = ".cclaw/hooks";

package/dist/constants.js CHANGED

@@ -1,4 +1,4 @@
-export const CCLAW_VERSION = "8.1.2";
+export const CCLAW_VERSION = "8.3.0";
 export const RUNTIME_ROOT = ".cclaw";
 export const STATE_REL_PATH = `${RUNTIME_ROOT}/state`;
 export const HOOKS_REL_PATH = `${RUNTIME_ROOT}/hooks`;

package/dist/content/antipatterns.d.ts CHANGED

@@ -1 +1 @@
1	- export declare const ANTIPATTERNS = "# .cclaw/lib/antipatterns.md\n\nPatterns we have seen fail. Each entry is a short symptom, the underlying mistake, and the corrective action. The orchestrator and specialists open this file when a smell is detected; the reviewer cites entries as findings when applicable.\n\n## A-1 \u2014 \"Just one more AC\"\n\nSymptom. A plan starts with 4 AC and ends with 11. Most of the additions appeared during build.\n\nUnderlying mistake. Scope is being expanded mid-flight without going back to plan-stage.\n\nCorrection. When build encounters new work, surface it as a follow-up in `.cclaw/ideas.md` or a fresh slug. If the new work is genuinely required to satisfy an existing AC, that AC was wrong; cancel the slug and re-plan with a tighter AC set.\n\n## A-2 \u2014 TDD phase integrity broken\n\nSymptom (any of):\n\n- Build commits land for AC-N with `--phase=green` but no `--phase=red` recorded earlier.\n- AC has RED + GREEN commits but no `--phase=refactor` (skipped or applied) entry in flow-state.\n- A `--phase=red` commit touches `src/`, `lib/`, or `app/` \u2014 production code slipped into RED.\n- Tests for AC-N appear in a separate commit a few minutes after the AC-N implementation lands.\n\nUnderlying mistake. The TDD cycle was treated as ceremony, not as the contract. The cycle exists so the failing test encodes the AC; skipping or scrambling phases produces an audit trail that nobody can trust.\n\nCorrection. `commit-helper.mjs` enforces RED \u2192 GREEN \u2192 REFACTOR per AC. Write a failing test first and commit under `--phase=red` (test files only). Implement the smallest production change that turns it green; commit under `--phase=green`. Either commit a refactor under `--phase=refactor` or skip it explicitly with `--phase=refactor --skipped --message=\"refactor(AC-N) skipped: <reason>\"`. 
The reviewer cites this entry whenever the chain is incomplete.\n\n## A-3 \u2014 Work outside the AC\n\nSymptom (any of):\n\n- A small AC commit also restructures an unrelated module.\n- A commit produced by `commit-helper.mjs` contains files that are unrelated to the AC.\n- `git add -A` appears in shell history inside `/cc`.\n\nUnderlying mistake. Slice-builder absorbed unrelated edits or silently expanded scope. The AC commit no longer maps cleanly to the AC.\n\nCorrection. Stage AC-related files explicitly: `git add <path>` per file, or `git add -p` to pick hunks. Never `git add -A` inside `/cc`. If a refactor really must happen, capture it as a follow-up; if it really blocks the AC, cancel the slug and re-plan as a refactor + behaviour-change pair.\n\n## A-4 \u2014 AC that mirror sub-tasks\n\nSymptom. AC read like \"implement the helper\", \"wire the helper\", \"test the helper\".\n\nUnderlying mistake. AC are outcomes, not sub-tasks. Outcomes survive refactors; sub-tasks do not.\n\nCorrection. Rewrite AC as observable outcomes. The helper is an implementation detail, not an AC.\n\n## A-5 \u2014 Over-careful brainstormer\n\nSymptom. Brainstormer produces three pages of Context for a small task; planner is then unable to size the work.\n\nUnderlying mistake. Brainstormer ignored the routing class. Trivial / small-medium tasks should have a one-paragraph Context, not a Frame + Scope + Alternatives sweep.\n\nCorrection. Brainstormer reads the routing class first and short-circuits when the task is small. Three sentences of Context is enough for AC-1.\n\n## A-6 \u2014 \"I already looked\"\n\nSymptom. Reviewer reports a \"clear\" decision without a Five Failure Modes pass.\n\nUnderlying mistake. The Five Failure Modes pass is the artifact. Skipping it because \"I already looked\" produces no audit trail.\n\nCorrection. Reviewer always emits the Five Failure Modes block. Each item gets yes / no with citation when yes. 
A \"no\" with no thinking attached is fine; an absent block is not.\n\n## A-7 \u2014 Shipping with a pending AC\n\nSymptom. `runCompoundAndShip()` is invoked while flow-state has at least one AC with `status: pending`.\n\nUnderlying mistake. The agent expected the orchestrator to \"figure it out\" and complete the AC silently.\n\nCorrection. The AC traceability gate refuses ship. Either complete the AC (slice-builder) or cancel the slug (`/cc-cancel`) and re-plan with the smaller AC set. There is no override.\n\n## A-8 \u2014 Re-creating a shipped slug instead of refining\n\nSymptom. A new `/cc` invocation produces a slug whose plan is 80% identical to a slug already in `.cclaw/flows/shipped/`.\n\nUnderlying mistake. Existing-plan detection was skipped or its output was ignored.\n\nCorrection. Existing-plan detection is mandatory at the start of every `/cc`. When a shipped match is offered, the user picks refine shipped or new unrelated, not \"ignore the match\".\n\n## A-9 \u2014 Editing shipped artifacts\n\nSymptom. A shipped slug's `plan.md` is edited weeks after ship.\n\nUnderlying mistake. Shipped artifacts are immutable. Editing them invalidates the knowledge index and breaks refinement chains.\n\nCorrection. Open a refinement slug. The new slug carries `refines: <old-slug>` and contains the corrections. The old slug stays as it shipped.\n\n## A-10 \u2014 Force-push during ship\n\nSymptom. `git push --force` appears in shell history during ship.\n\nUnderlying mistake. Force-push rewrites the SHAs that flow-state and the AC traceability block reference. The chain breaks silently; nothing in the runtime detects it.\n\nCorrection. Refuse `git push --force` inside `/cc` unless the user explicitly requested it twice (initial request + confirmation). After the force-push, every recorded SHA in the slug must be re-verified by hand and updated.\n\n## A-11 \u2014 Hidden security surface\n\nSymptom. 
A slug ships without `security_flag: true` even though the diff added a new auth-adjacent code path.\n\nUnderlying mistake. The author judged \"this is mostly UI\" and skipped the security checklist.\n\nCorrection. `security_flag` is set whenever the diff touches authn / authz / secrets / supply chain / data exposure, even when the change feels small. The cost of a spurious security flag is a few minutes; the cost of a missed one is a CVE.\n\n## A-12 \u2014 Single test green, didn't run the suite\n\nSymptom. `flows/<slug>/build.md` GREEN evidence column shows `npm test path/to/single.test` only; full-suite run is missing.\n\nUnderlying mistake. A passing single test is not GREEN. Production change can break adjacent tests; without running the suite, the AC is shipped on a regression.\n\nCorrection. GREEN evidence must be the full relevant suite for the affected module(s), not the single test. The reviewer cites this as a block finding.\n";
1	+ export declare const ANTIPATTERNS = "# .cclaw/lib/antipatterns.md\n\nPatterns we have seen fail. Each entry is a short symptom, the underlying mistake, and the corrective action. The orchestrator and specialists open this file when a smell is detected; the reviewer cites entries as findings when applicable.\n\n## A-1 \u2014 \"Just one more AC\"\n\nSymptom. A plan starts with 4 AC and ends with 11. Most of the additions appeared during build.\n\nUnderlying mistake. Scope is being expanded mid-flight without going back to plan-stage.\n\nCorrection. When build encounters new work, surface it as a follow-up in `.cclaw/ideas.md` or a fresh slug. If the new work is genuinely required to satisfy an existing AC, that AC was wrong; cancel the slug and re-plan with a tighter AC set.\n\n## A-2 \u2014 TDD phase integrity broken\n\nSymptom (any of):\n\n- Build commits land for AC-N with `--phase=green` but no `--phase=red` recorded earlier.\n- AC has RED + GREEN commits but no `--phase=refactor` (skipped or applied) entry in flow-state.\n- A `--phase=red` commit touches `src/`, `lib/`, or `app/` \u2014 production code slipped into RED.\n- Tests for AC-N appear in a separate commit a few minutes after the AC-N implementation lands.\n\nUnderlying mistake. The TDD cycle was treated as ceremony, not as the contract. The cycle exists so the failing test encodes the AC; skipping or scrambling phases produces an audit trail that nobody can trust.\n\nCorrection. `commit-helper.mjs` enforces RED \u2192 GREEN \u2192 REFACTOR per AC. Write a failing test first and commit under `--phase=red` (test files only). Implement the smallest production change that turns it green; commit under `--phase=green`. Either commit a refactor under `--phase=refactor` or skip it explicitly with `--phase=refactor --skipped --message=\"refactor(AC-N) skipped: <reason>\"`. 
The reviewer cites this entry whenever the chain is incomplete.\n\n## A-3 \u2014 Work outside the AC\n\nSymptom (any of):\n\n- A small AC commit also restructures an unrelated module.\n- A commit produced by `commit-helper.mjs` contains files that are unrelated to the AC.\n- `git add -A` appears in shell history inside `/cc`.\n\nUnderlying mistake. Slice-builder absorbed unrelated edits or silently expanded scope. The AC commit no longer maps cleanly to the AC.\n\nCorrection. Stage AC-related files explicitly: `git add <path>` per file, or `git add -p` to pick hunks. Never `git add -A` inside `/cc`. If a refactor really must happen, capture it as a follow-up; if it really blocks the AC, cancel the slug and re-plan as a refactor + behaviour-change pair.\n\n## A-4 \u2014 AC that mirror sub-tasks\n\nSymptom. AC read like \"implement the helper\", \"wire the helper\", \"test the helper\".\n\nUnderlying mistake. AC are outcomes, not sub-tasks. Outcomes survive refactors; sub-tasks do not.\n\nCorrection. Rewrite AC as observable outcomes. The helper is an implementation detail, not an AC.\n\n## A-5 \u2014 Over-careful brainstormer\n\nSymptom. Brainstormer produces three pages of Context for a small task; planner is then unable to size the work.\n\nUnderlying mistake. Brainstormer ignored the routing class. Trivial / small-medium tasks should have a one-paragraph Context, not a Frame + Scope + Alternatives sweep.\n\nCorrection. Brainstormer reads the routing class first and short-circuits when the task is small. Three sentences of Context is enough for AC-1.\n\n## A-6 \u2014 \"I already looked\"\n\nSymptom. Reviewer reports a \"clear\" decision without a Five Failure Modes pass.\n\nUnderlying mistake. The Five Failure Modes pass is the artifact. Skipping it because \"I already looked\" produces no audit trail.\n\nCorrection. Reviewer always emits the Five Failure Modes block. Each item gets yes / no with citation when yes. 
A \"no\" with no thinking attached is fine; an absent block is not.\n\n## A-7 \u2014 Shipping with a pending AC\n\nSymptom. `runCompoundAndShip()` is invoked while flow-state has at least one AC with `status: pending`.\n\nUnderlying mistake. The agent expected the orchestrator to \"figure it out\" and complete the AC silently.\n\nCorrection. The AC traceability gate refuses ship. Either complete the AC (slice-builder) or cancel the slug (`/cc-cancel`) and re-plan with the smaller AC set. There is no override.\n\n## A-8 \u2014 Re-creating a shipped slug instead of refining\n\nSymptom. A new `/cc` invocation produces a slug whose plan is 80% identical to a slug already in `.cclaw/flows/shipped/`.\n\nUnderlying mistake. Existing-plan detection was skipped or its output was ignored.\n\nCorrection. Existing-plan detection is mandatory at the start of every `/cc`. When a shipped match is offered, the user picks refine shipped or new unrelated, not \"ignore the match\".\n\n## A-9 \u2014 Editing shipped artifacts\n\nSymptom. A shipped slug's `plan.md` is edited weeks after ship.\n\nUnderlying mistake. Shipped artifacts are immutable. Editing them invalidates the knowledge index and breaks refinement chains.\n\nCorrection. Open a refinement slug. The new slug carries `refines: <old-slug>` and contains the corrections. The old slug stays as it shipped.\n\n## A-10 \u2014 Force-push during ship\n\nSymptom. `git push --force` appears in shell history during ship.\n\nUnderlying mistake. Force-push rewrites the SHAs that flow-state and the AC traceability block reference. The chain breaks silently; nothing in the runtime detects it.\n\nCorrection. Refuse `git push --force` inside `/cc` unless the user explicitly requested it twice (initial request + confirmation). After the force-push, every recorded SHA in the slug must be re-verified by hand and updated.\n\n## A-11 \u2014 Hidden security surface\n\nSymptom. 
A slug ships without `security_flag: true` even though the diff added a new auth-adjacent code path.\n\nUnderlying mistake. The author judged \"this is mostly UI\" and skipped the security checklist.\n\nCorrection. `security_flag` is set whenever the diff touches authn / authz / secrets / supply chain / data exposure, even when the change feels small. The cost of a spurious security flag is a few minutes; the cost of a missed one is a CVE.\n\n## A-12 \u2014 Single test green, didn't run the suite\n\nSymptom. `flows/<slug>/build.md` GREEN evidence column shows `npm test path/to/single.test` only; full-suite run is missing.\n\nUnderlying mistake. A passing single test is not GREEN. Production change can break adjacent tests; without running the suite, the AC is shipped on a regression.\n\nCorrection. GREEN evidence must be the full relevant suite for the affected module(s), not the single test. The reviewer cites this as a block finding.\n\n## A-13 \u2014 Horizontal slicing (RED-batch then GREEN-batch)\n\nSymptom. `flows/<slug>/build.md` shows AC-1 RED, AC-2 RED, AC-3 RED committed in a row, then AC-1 GREEN, AC-2 GREEN, AC-3 GREEN. Or the slice-builder describes the build as \"tests written, now I'll implement\".\n\nUnderlying mistake. Writing all RED tests before any GREEN code means the tests describe the behaviour you guessed before you saw the real interface. Tests written this way pass when behaviour breaks (because they test the imagined shape) and fail when behaviour is fine (because the real shape diverged from the imagination). They get rewritten during the next refactor.\n\nCorrection. One test \u2192 one implementation \u2192 repeat. Each cycle informs the next. The AC-2 test is shaped by what the AC-1 implementation revealed about the real interface. `commit-helper.mjs --phase=red` for AC-2 will refuse if AC-1's chain isn't closed yet \u2014 that is the rail. 
See the Vertical Slicing section in `tdd-cycle.md`.\n\n## A-14 \u2014 Pushing past a failing test\n\nSymptom. Build log shows a flaky or unexpected failure on AC-2, then continues into AC-3 with \"I'll come back to AC-2 later\". Or a hook rejection silently retried with a slightly different commit message.\n\nUnderlying mistake. Errors compound. AC-3 is built on the invariants AC-2 was supposed to establish. If AC-2's RED failed for the wrong reason, you are debugging a stack of broken assumptions, and every cycle past that point makes the diagnosis harder.\n\nCorrection. Stop the line. Preserve the failure (command + 1\u20133 lines of output verbatim), reproduce in isolation, root-cause to a concrete file:line, fix once, re-run the full relevant suite, then resume the cycle. If the root cause cannot be identified in three attempts, surface a blocker to the orchestrator \u2014 do not \"make it work\" by removing the test or weakening the assertion.\n\n## A-15 \u2014 Mocking what should not be mocked\n\nSymptom. A database query test mocks the driver and asserts on `db.query` call shape; the test is green and the actual query never runs in production. Or a service test mocks every collaborator and only verifies which methods were called, in which order.\n\nUnderlying mistake. Mocking a dependency you control couples the test to the implementation. The test reads green even when the SQL is wrong, the migration is missing, the column is misspelled. Real bugs live in those gaps. Interaction-based assertions (`expect(x).toHaveBeenCalledWith(...)`) break on every refactor and provide weaker confidence than state-based assertions on the outcome.\n\nCorrection. Use a real test database (or an in-memory fake of the same shape) and assert on the outcome \u2014 the row that was inserted, the response from the query, the observable side effect \u2014 not on the call. 
Reach for mocks only for things genuinely outside your control: third-party APIs, time, randomness, the network. Real > Fake (in-memory) > Stub (canned data) > Mock (interaction).\n";
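
The new A-15 entry contrasts interaction-based assertions (`expect(x).toHaveBeenCalledWith(...)`) with state-based assertions against a real database or an in-memory fake. A minimal sketch of the state-based style follows. `UserStore`, `InMemoryUserStore`, and `registerUser` are hypothetical names invented for this illustration and are not part of cclaw:

```typescript
// Hypothetical domain types, used only to illustrate A-15.
interface User {
  id: number;
  email: string;
}

interface UserStore {
  insert(user: User): void;
  findByEmail(email: string): User | undefined;
  count(): number;
}

// In-memory fake with the same shape as the real store.
class InMemoryUserStore implements UserStore {
  private rows: User[] = [];

  insert(user: User): void {
    this.rows.push({ id: user.id, email: user.email });
  }

  findByEmail(email: string): User | undefined {
    for (const row of this.rows) {
      if (row.email === email) return row;
    }
    return undefined;
  }

  count(): number {
    return this.rows.length;
  }
}

// Code under test: normalises the email and registers it at most once.
function registerUser(store: UserStore, email: string): number {
  const normalized = email.trim().toLowerCase();
  const existing = store.findByEmail(normalized);
  if (existing !== undefined) return existing.id;
  const id = store.count() + 1;
  store.insert({ id, email: normalized });
  return id;
}

// State-based check: inspect the stored row, not which methods were called.
function registerUserStoresNormalizedEmail(): boolean {
  const store = new InMemoryUserStore();
  const id = registerUser(store, "  Ada@Example.com ");
  const row = store.findByEmail("ada@example.com");
  return row !== undefined && row.id === id && store.count() === 1;
}
```

Because the check reads the outcome from the store, it keeps passing if `registerUser` is refactored to call the store differently, and it fails if the normalisation is wrong. That is the property A-15 asks for.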

package/dist/content/antipatterns.js CHANGED

@@ -106,4 +106,28 @@ Patterns we have seen fail. Each entry is a short symptom, the underlying mistak
 **Underlying mistake.** A passing single test is not GREEN. Production change can break adjacent tests; without running the suite, the AC is shipped on a regression.
 **Correction.** GREEN evidence must be the **full relevant suite** for the affected module(s), not the single test. The reviewer cites this as a block finding.
+## A-13 — Horizontal slicing (RED-batch then GREEN-batch)
+**Symptom.** \`flows/<slug>/build.md\` shows AC-1 RED, AC-2 RED, AC-3 RED committed in a row, then AC-1 GREEN, AC-2 GREEN, AC-3 GREEN. Or the slice-builder describes the build as "tests written, now I'll implement".
+**Underlying mistake.** Writing all RED tests before any GREEN code means the tests describe the behaviour you *guessed* before you saw the real interface. Tests written this way pass when behaviour breaks (because they test the imagined shape) and fail when behaviour is fine (because the real shape diverged from the imagination). They get rewritten during the next refactor.
+**Correction.** One test → one implementation → repeat. Each cycle informs the next. The AC-2 test is shaped by what the AC-1 implementation revealed about the real interface. \`commit-helper.mjs --phase=red\` for AC-2 will refuse if AC-1's chain isn't closed yet — that is the rail. See the Vertical Slicing section in \`tdd-cycle.md\`.
+## A-14 — Pushing past a failing test
+**Symptom.** Build log shows a flaky or unexpected failure on AC-2, then continues into AC-3 with "I'll come back to AC-2 later". Or a hook rejection silently retried with a slightly different commit message.
+**Underlying mistake.** Errors compound. AC-3 is built on the invariants AC-2 was supposed to establish. If AC-2's RED failed for the wrong reason, you are debugging a stack of broken assumptions, and every cycle past that point makes the diagnosis harder.
+**Correction.** Stop the line. Preserve the failure (command + 1–3 lines of output verbatim), reproduce in isolation, root-cause to a concrete file:line, fix once, re-run the full relevant suite, then resume the cycle. If the root cause cannot be identified in three attempts, surface a blocker to the orchestrator — do not "make it work" by removing the test or weakening the assertion.
+## A-15 — Mocking what should not be mocked
+**Symptom.** A database query test mocks the driver and asserts on \`db.query\` call shape; the test is green and the actual query never runs in production. Or a service test mocks every collaborator and only verifies which methods were called, in which order.
+**Underlying mistake.** Mocking a dependency you control couples the test to the implementation. The test reads green even when the SQL is wrong, the migration is missing, the column is misspelled. Real bugs live in those gaps. Interaction-based assertions (\`expect(x).toHaveBeenCalledWith(...)\`) break on every refactor and provide weaker confidence than state-based assertions on the outcome.
+**Correction.** Use a real test database (or an in-memory fake of the same shape) and assert on the **outcome** — the row that was inserted, the response from the query, the observable side effect — not on the call. Reach for mocks only for things genuinely outside your control: third-party APIs, time, randomness, the network. Real > Fake (in-memory) > Stub (canned data) > Mock (interaction).
 `;
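
A-13 states that `commit-helper.mjs --phase=red` for AC-2 will refuse if AC-1's chain is not closed, and A-2 describes the RED → GREEN → REFACTOR order within one AC. The sketch below shows one way such a rail could be checked. It is an assumption-laden illustration: `AcRecord`, `isChainClosed`, and `checkPhase` are hypothetical names, and the real helper's data model and messages are not visible in this diff.

```typescript
type Phase = "red" | "green" | "refactor";

// Hypothetical record of the phases committed so far for one AC.
// A skipped refactor is assumed to be recorded as "refactor" too.
interface AcRecord {
  id: string; // e.g. "AC-1"
  phases: Phase[];
}

const ORDER: Phase[] = ["red", "green", "refactor"];

// A chain is closed once all three phases have an entry.
function isChainClosed(ac: AcRecord): boolean {
  return ORDER.every((p) => ac.phases.indexOf(p) !== -1);
}

// Returns null when the commit is allowed, otherwise a refusal reason.
function checkPhase(acs: AcRecord[], acId: string, phase: Phase): string | null {
  let index = -1;
  for (let i = 0; i < acs.length; i++) {
    if (acs[i].id === acId) index = i;
  }
  if (index === -1) return `unknown AC ${acId}`;

  if (phase === "red") {
    // Vertical slicing: every earlier AC must be finished first.
    for (let i = 0; i < index; i++) {
      if (!isChainClosed(acs[i])) return `${acs[i].id} chain is not closed`;
    }
    return null;
  }

  // GREEN needs a prior RED, REFACTOR needs a prior GREEN.
  const previous = ORDER[ORDER.indexOf(phase) - 1];
  if (acs[index].phases.indexOf(previous) === -1) {
    return `${acId} has no ${previous} commit`;
  }
  return null;
}
```

With AC-1 at RED only, `checkPhase(acs, "AC-2", "red")` returns a refusal reason, which is the horizontal-slicing case the antipattern describes.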

package/dist/content/artifact-templates.d.ts CHANGED

@@ -1,5 +1,5 @@
 export interface ArtifactTemplate {
-    id: "plan" | "build" | "review" | "ship" | "decisions" | "learnings" | "manifest" | "ideas";
+    id: "plan" | "plan-soft" | "build" | "build-soft" | "review" | "ship" | "decisions" | "learnings" | "manifest" | "ideas";
     fileName: string;
     description: string;
     body: string;
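
The `id` union gains `plan-soft` and `build-soft`, which the README describes as the templates for the soft-mode path. One way a caller might pick a template id from the stage and `acMode` is sketched below. `templateIdFor` is a hypothetical helper, and returning `null` for `inline` is an assumption based on the README's "edit + commit (no plan)" path, not something this diff confirms:

```typescript
type AcMode = "inline" | "soft" | "strict";

// Mirrors the id union from the updated ArtifactTemplate interface.
type TemplateId =
  | "plan" | "plan-soft" | "build" | "build-soft"
  | "review" | "ship" | "decisions" | "learnings" | "manifest" | "ideas";

// Hypothetical selector for the two stages that have a soft variant.
function templateIdFor(stage: "plan" | "build", acMode: AcMode): TemplateId | null {
  if (acMode === "inline") {
    return null; // assumption: the trivial path writes no plan/build artifact
  }
  if (acMode === "soft") {
    return stage === "plan" ? "plan-soft" : "build-soft";
  }
  return stage; // strict mode keeps the original templates
}
```

Keeping the soft variants as separate ids, rather than a flag on the existing templates, matches how the union is declared in this hunk.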

package/dist/content/artifact-templates.js CHANGED

@@ -92,6 +92,53 @@ _(Planner topology mode. Default: \`inline\`. \`parallel-build\` is opt-in; see
 This block is rebuilt by \`commit-helper.mjs\` after every AC commit. Do not edit by hand once a commit is recorded.
 `;
+const PLAN_TEMPLATE_SOFT = `---
+slug: SLUG-PLACEHOLDER
+stage: plan
+status: active
+ac_mode: soft
+last_specialist: null
+refines: null
+shipped_at: null
+ship_commit: null
+review_iterations: 0
+security_flag: false
+---
+# SLUG-PLACEHOLDER
+> One short paragraph: what we are doing and why. If the goal does not fit in 4 lines, the request is probably too large — split it or re-triage to large-risky.
+## Plan
+_(Planner authors this. One short paragraph describing the change end-to-end. No phases, no AC IDs.)_
+## Testable conditions
+_(Bullet list. Each line is a behaviour the slice-builder's tests must verify. Conditions are observable; if you can't name a test or manual step that proves it, drop the bullet.)_
+- _Condition 1 — observable behaviour, e.g. "Pill renders the request status (Pending / Approved / Denied)."_
+- _Condition 2._
+- _Condition 3._
+## Verification
+_(One block per layer. Tests file paths, manual steps, runner command.)_
+- \`tests/unit/<module>.test.ts\` — covers all listed conditions in one test file.
+- Manual: _open <url>, perform <action>, observe <outcome>_.
+## Touch surface
+_(Files the slice-builder is allowed to modify. Used by reviewer to flag scope creep.)_
+- \`src/<module>/<file>.ts\`
+- \`tests/unit/<module>.test.ts\`
+## Notes
+_(Optional. Brainstormer / architect did NOT run for soft-mode flows; if you discover the work needs structural decisions or threat modelling mid-flight, surface back to the orchestrator and ask to re-triage as large-risky.)_
+`;
 const BUILD_TEMPLATE = `---
 slug: SLUG-PLACEHOLDER
 stage: build
@@ -162,6 +209,38 @@ _(Append one fix-iteration block per review iteration that returned \`block\`. S
 _(Surprises, deviations from the plan, tests added, refactors that came up, paths considered and discarded, etc.)_
 `;
+const BUILD_TEMPLATE_SOFT = `---
+slug: SLUG-PLACEHOLDER
+stage: build
+status: active
+ac_mode: soft
+last_commit: null
+---
+# Build log — SLUG-PLACEHOLDER
+This is the soft-mode build log. One TDD cycle covers all listed conditions; commits are plain \`git commit\` (the commit-helper is advisory in soft mode).
+> **Iron Law:** NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST. The RED failure is the spec.
+## Plan summary
+_(One paragraph mirroring \`flows/SLUG-PLACEHOLDER/plan.md\` Plan section.)_
+## Build log
+- **Tests added**: _\`tests/unit/<module>.test.ts\` — N tests, mirroring the listed conditions._
+- **Discovery**: _\`src/<module>/<file>.ts:<line>\`, \`tests/unit/<existing>.test.ts:<line>\`._
+- **RED**: _\`<runner command>\` → N failing (expected). Cite the assertion that fails (≤3 lines)._
+- **GREEN**: _One sentence on the minimal change. \`<full-suite command>\` → all passing._
+- **REFACTOR**: _One-line shape change applied, or "skipped: <reason>"._
+- **Commit**: _\`<one-line message>\` (\`<SHA>\`)._
+- **Follow-ups**: _\`info\` items deferred to a separate slug, or "none"._
+## Notes
+_(Surprises, deviations from the plan, paths considered and discarded, etc.)_
+`;
 const REVIEW_TEMPLATE = `---
 slug: SLUG-PLACEHOLDER
 stage: review
@@ -519,8 +598,10 @@ This file is a free-form idea backlog. Entries are appended by \`/cc-idea\` and
 Each entry begins with an ISO timestamp, then a single-line summary, then the body.
 `;
 export const ARTIFACT_TEMPLATES = [
-    { id: "plan", fileName: "plan.md", description: "Plan template with frontmatter, AC table, and traceability block.", body: PLAN_TEMPLATE },
-    { id: "build", fileName: "build.md", description: "Build log template with commit table and hook invocation log.", body: BUILD_TEMPLATE },
+    { id: "plan", fileName: "plan.md", description: "Strict-mode plan template (AC table, parallelSafe, touchSurface, traceability block).", body: PLAN_TEMPLATE },
+    { id: "plan-soft", fileName: "plan-soft.md", description: "Soft-mode plan template (bullet-list testable conditions, no AC IDs).", body: PLAN_TEMPLATE_SOFT },
+    { id: "build", fileName: "build.md", description: "Strict-mode build log (six-column TDD table, RED proofs, GREEN suite evidence).", body: BUILD_TEMPLATE },
+    { id: "build-soft", fileName: "build-soft.md", description: "Soft-mode build log (single-cycle summary, plain git commit).", body: BUILD_TEMPLATE_SOFT },
     { id: "review", fileName: "review.md", description: "Review template with iteration table, findings table, and Five Failure Modes pass.", body: REVIEW_TEMPLATE },
     { id: "ship", fileName: "ship.md", description: "Ship notes template with AC↔commit map, push/PR section, release notes paragraph.", body: SHIP_TEMPLATE },
     { id: "decisions", fileName: "decisions.md", description: "Architect-style decision record template (D-N entries).", body: DECISIONS_TEMPLATE },

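The new `plan-soft` and `build-soft` entries suggest a lookup keyed on stage and AC mode. The sketch below is one way a caller might pick a template. `templateFor` and the trimmed `TEMPLATES` array are hypothetical and not part of cclaw-cli, and mapping every non-strict mode to the `-soft` variant is an assumption the diff does not confirm.

```javascript
// Trimmed stand-in for ARTIFACT_TEMPLATES: only ids and file names shown in the diff.
const TEMPLATES = [
  { id: "plan", fileName: "plan.md" },
  { id: "plan-soft", fileName: "plan-soft.md" },
  { id: "build", fileName: "build.md" },
  { id: "build-soft", fileName: "build-soft.md" },
];

// Hypothetical helper. Assumption: any mode other than "strict" uses the soft template.
function templateFor(stage, acMode = "strict") {
  const id = acMode === "strict" ? stage : `${stage}-soft`;
  const found = TEMPLATES.find((template) => template.id === id);
  if (!found) throw new Error(`no template registered for id "${id}"`);
  return found;
}

console.log(templateFor("plan", "soft").fileName);
console.log(templateFor("build").fileName);
```
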
package/dist/content/node-hooks.js CHANGED

@@ -20,9 +20,14 @@ if (!state) {
   process.exit(0);
 }
-if (state.schemaVersion !== 2) {
-  console.error("[cclaw] flow-state schema is from cclaw 7.x. cclaw v8 cannot resume it.");
-  console.error("[cclaw] options: 1) finish/abandon the run with cclaw 7.x; 2) delete .cclaw/state/flow-state.json; 3) start a new v8 plan.");
+if (state.schemaVersion === 1 || state.schemaVersion === undefined) {
+  console.error("[cclaw] flow-state predates cclaw v8 and cannot be auto-migrated.");
+  console.error("[cclaw] options: 1) finish/abandon the run with the older cclaw; 2) delete .cclaw/state/flow-state.json; 3) start a new flow.");
+  process.exit(0);
+}
+if (state.schemaVersion !== 3 && state.schemaVersion !== 2) {
+  console.error(\`[cclaw] unknown flow-state schemaVersion \${state.schemaVersion}.\`);
   process.exit(0);
 }
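The two-step version gate in the hunk above can be read as a three-way classification. The sketch below restates it as a pure function so the branches are easy to check. `classifySchema` and its return labels are hypothetical names introduced here, not identifiers from the package.

```javascript
// Hypothetical helper mirroring the checks in the hunk above.
// "legacy": pre-v8 state that cannot be auto-migrated.
// "unknown": a schemaVersion the hook does not recognise.
// "supported": schemaVersion 2 or 3.
function classifySchema(state) {
  const version = state.schemaVersion;
  if (version === 1 || version === undefined) return "legacy";
  if (version !== 3 && version !== 2) return "unknown";
  return "supported";
}

for (const state of [{}, { schemaVersion: 1 }, { schemaVersion: 2 }, { schemaVersion: 3 }, { schemaVersion: 4 }]) {
  console.log(JSON.stringify(state), classifySchema(state));
}
```

In the hook itself, the legacy and unknown branches both print a message and exit with status 0.
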
@@ -31,9 +36,14 @@ if (!state.currentSlug) {
   process.exit(0);
 }
-const pending = (state.ac || []).filter((item) => item.status !== "committed").length;
-const total = (state.ac || []).length;
-console.log(\`[cclaw] active: \${state.currentSlug} (stage=\${state.currentStage ?? "n/a"}); AC committed \${total - pending}/\${total}\`);
+const acMode = state.triage?.acMode ?? "strict";
+const ac = state.ac ?? [];
+if (acMode === "strict" && ac.length > 0) {
+  const pending = ac.filter((item) => item.status !== "committed").length;
+  console.log(\`[cclaw] active: \${state.currentSlug} (stage=\${state.currentStage ?? "n/a"}, mode=strict); AC committed \${ac.length - pending}/\${ac.length}\`);
+} else {
+  console.log(\`[cclaw] active: \${state.currentSlug} (stage=\${state.currentStage ?? "n/a"}, mode=\${acMode}).\`);
+}
 `;
 const STOP_HANDOFF_HOOK = `#!/usr/bin/env node
 // cclaw stop-handoff: short reminder when the agent stops mid-flow.
@@ -53,13 +63,30 @@ async function readState() {
 const state = await readState();
 if (!state || !state.currentSlug) process.exit(0);
-const pending = (state.ac || []).filter((item) => item.status !== "committed");
-if (pending.length === 0) process.exit(0);
-console.error(\`[cclaw] stopping with \${pending.length} pending AC for \${state.currentSlug}: \${pending.map((item) => item.id).join(", ")}\`);
+const acMode = state.triage?.acMode ?? "strict";
+if (acMode === "strict") {
+  const pending = (state.ac || []).filter((item) => item.status !== "committed");
+  if (pending.length === 0) process.exit(0);
+  console.error(\`[cclaw] stopping with \${pending.length} pending AC for \${state.currentSlug}: \${pending.map((item) => item.id).join(", ")}\`);
+  process.exit(0);
+}
+console.error(\`[cclaw] stopping mid-flow for \${state.currentSlug} (stage=\${state.currentStage ?? "n/a"}, mode=\${acMode}). Run /cc to resume.\`);
 `;
 const COMMIT_HELPER_HOOK = `#!/usr/bin/env node
-// cclaw commit-helper: TDD-aware atomic commit per AC phase
-// (RED -> GREEN -> REFACTOR) + AC traceability gate.
+// cclaw commit-helper: ac_mode-aware atomic commit hook.
+//
+// strict mode (large-risky / security-flagged):
+//   commit-helper.mjs --ac=AC-N --phase=red|green|refactor [--skipped] [--message="..."]
+//   enforces TDD cycle, AC trace, no production files in RED, full chain RED -> GREEN -> REFACTOR.
+//
+// soft / inline mode (small-medium / trivial):
+//   commit-helper.mjs --message="..."
+//   advisory only — proxies to git commit, prints a one-line note. --ac/--phase ignored.
+//
+// the mode is read from flow-state.json: state.triage.acMode. if no triage is recorded,
+// default to strict (preserves v8.0/v8.1 behaviour for migrated projects).
 import { execFileSync } from "node:child_process";
 import fs from "node:fs/promises";
 import path from "node:path";
@@ -77,19 +104,58 @@ function flag(name) {
   return process.argv.includes(\`--\${name}\`);
 }
+let state;
+try {
+  state = JSON.parse(await fs.readFile(statePath, "utf8"));
+} catch {
+  console.error("[commit-helper] no flow-state.json. Start a flow with /cc first.");
+  process.exit(2);
+}
+if (state.schemaVersion !== 3 && state.schemaVersion !== 2) {
+  console.error(\`[commit-helper] unsupported flow-state schemaVersion \${state.schemaVersion}.\`);
+  process.exit(2);
+}
+const acMode = state.triage?.acMode ?? "strict";
+if (acMode !== "strict") {
+  // soft / inline mode: advisory passthrough.
+  const message = arg("message");
+  if (!message) {
+    console.error("[commit-helper] --message=\\"...\\" is required.");
+    process.exit(2);
+  }
+  let staged;
+  try {
+    staged = execFileSync("git", ["diff", "--cached", "--name-only"], { cwd: root, encoding: "utf8" }).trim();
+  } catch (error) {
+    console.error(\`[commit-helper] git not available: \${error.message}\`);
+    process.exit(2);
+  }
+  if (!staged) {
+    console.error("[commit-helper] nothing staged. Stage your changes before invoking commit-helper.");
+    process.exit(2);
+  }
+  execFileSync("git", ["commit", "-m", message], { cwd: root, stdio: "inherit" });
+  console.log(\`[commit-helper] committed in \${acMode} mode (no AC trace recorded).\`);
+  process.exit(0);
+}
+// strict mode below.
 const acId = arg("ac");
 const phase = arg("phase");
 const message = arg("message") ?? \`cclaw: progress on \${acId ?? "AC"}\`;
 const skipped = flag("skipped");
 if (!acId || !/^AC-\\d+$/.test(acId)) {
-  console.error("[commit-helper] usage: commit-helper.mjs --ac=AC-N --phase=red|green|refactor [--skipped] [--message='...']");
+  console.error("[commit-helper] strict mode usage: commit-helper.mjs --ac=AC-N --phase=red|green|refactor [--skipped] [--message='...']");
   process.exit(2);
 }
 if (!phase || !["red", "green", "refactor"].includes(phase)) {
-  console.error("[commit-helper] --phase is required. Allowed: red, green, refactor.");
-  console.error("[commit-helper] build is a TDD cycle: every AC needs RED -> GREEN -> REFACTOR.");
+  console.error("[commit-helper] --phase is required in strict mode. Allowed: red, green, refactor.");
+  console.error("[commit-helper] strict-mode build is a TDD cycle: every AC needs RED -> GREEN -> REFACTOR.");
   process.exit(2);
 }
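The mode-dependent argument rules in the hunk above can be summarised as a small validation function. This is a sketch under stated assumptions: `validateCommitArgs` and its error strings are hypothetical, it takes already-parsed arguments rather than reading `process.argv`, and it leaves out the staged-files check and the `--skipped` rule that the real hook also applies.

```javascript
const PHASES = ["red", "green", "refactor"];

// Hypothetical helper: returns null when the arguments are acceptable,
// otherwise a short error string. Mode defaults to "strict" when no triage is recorded.
function validateCommitArgs(state, args) {
  const acMode = state.triage?.acMode ?? "strict";
  if (acMode !== "strict") {
    // Soft / inline mode: only a commit message is needed; --ac and --phase are ignored.
    return args.message ? null : "--message is required";
  }
  if (!args.ac || !/^AC-\d+$/.test(args.ac)) {
    return "--ac=AC-N is required in strict mode";
  }
  if (!args.phase || !PHASES.includes(args.phase)) {
    return "--phase must be one of red, green, refactor in strict mode";
  }
  return null;
}

console.log(validateCommitArgs({ triage: { acMode: "soft" } }, { message: "fix pill colour" }));
console.log(validateCommitArgs({}, { message: "fix pill colour" }));
```

The second call shows the default in action: with no `triage` recorded the state is treated as strict, so a message alone is rejected.
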
@@ -98,19 +164,6 @@ if (skipped && phase !== "refactor") {
   process.exit(2);
 }
-let state;
-try {
-  state = JSON.parse(await fs.readFile(statePath, "utf8"));
-} catch {
-  console.error("[commit-helper] no flow-state.json. Start a flow with /cc first.");
-  process.exit(2);
-}
-if (state.schemaVersion !== 2) {
-  console.error("[commit-helper] flow-state schema is not v8.");
-  process.exit(2);
-}
 const matching = (state.ac ?? []).find((item) => item.id === acId);
 if (!matching) {
   console.error(\`[commit-helper] AC \${acId} is not declared in plan.md / flow-state.\`);