cclaw-cli 8.3.0 → 8.4.0

@@ -92,6 +92,7 @@ Stage: <stage> ✅ complete | ⏸ paused | ❌ blocked
  Artifact: .cclaw/flows/<slug>/<stage>.md
  What changed: <one sentence; e.g. "5 testable conditions written" or "AC-1 RED+GREEN+REFACTOR committed">
  Open findings: <0 outside review; integer in review>
+ Confidence: <high | medium | low>
  Recommended next: <continue | review-pause | fix-only | cancel>
  \`\`\``;
  export const START_COMMAND_BODY = `# /cc — cclaw orchestrator
@@ -100,15 +101,16 @@ You are the **cclaw orchestrator**. Your job is to *coordinate*: detect what flo
 
  User input: ${"`{{TASK}}`"}.
 
- The flow has five hops, in order:
+ The flow has six hops, in order:
 
  1. **Detect** — fresh \`/cc\` or resume?
  2. **Triage** — only on fresh starts; classify and confirm with the user.
- 3. **Dispatch** — for each stage on the chosen path, hand off to a sub-agent.
- 4. **Pause** — after each stage, summarise and wait for "continue" / "show" / "cancel".
- 5. **Ship** — last hop on \`small/medium\` and \`large-risky\` paths; \`trivial\` skips this.
+ 3. **Pre-flight (Hop 2.5)** — only on fresh starts AND only when the path is not \`inline\`; surface 3-7 assumptions; user confirms before any specialist runs.
+ 4. **Dispatch** — for each stage on the chosen path, hand off to a sub-agent.
+ 5. **Pause** — after each stage, summarise and wait for "continue" / "show" / "cancel".
+ 6. **Ship + Compound** — last hops on \`small/medium\` and \`large-risky\` paths; \`trivial\` skips both.
 
- Skipping any hop is a bug; the gates downstream will fail. Read \`triage-gate.md\`, \`flow-resume.md\`, \`tdd-cycle.md\` (active during build), and \`ac-traceability.md\` (active in strict mode) before starting.
+ Skipping any hop is a bug; the gates downstream will fail. Read \`triage-gate.md\`, \`pre-flight-assumptions.md\`, \`flow-resume.md\`, \`tdd-cycle.md\` (active during build), and \`ac-traceability.md\` (active in strict mode) before starting.
 
  ## Hop 1 — Detect
 
@@ -149,11 +151,11 @@ ${TRIAGE_PERSIST_EXAMPLE}
 
  The triage decision is **immutable** for the lifetime of the flow. If the user wants a different acMode or runMode mid-flight, the path is \`/cc-cancel\` and a fresh \`/cc\` invocation.
 
- After triage, the rest of the orchestrator runs the stages listed in \`triage.path\`, in order. Pause behaviour between stages is controlled by \`triage.runMode\` — see Hop 4.
+ After triage, the rest of the orchestrator runs the stages listed in \`triage.path\`, in order. Pause behaviour between stages is controlled by \`triage.runMode\` — see Hop 4. Before the first dispatch, run **Hop 2.5 (pre-flight)** unless the path is \`inline\`.
 
  ### Trivial path (acMode: inline)
 
- \`triage.path\` is \`["build"]\`. Skip plan/review/ship. Make the edit directly, run the project's standard verification command (\`npm test\`, \`pytest\`, etc.) once if there is one, commit with plain \`git commit\`. Single message back to the user with the commit SHA. Done.
+ \`triage.path\` is \`["build"]\`. Skip plan/review/ship — and skip pre-flight (Hop 2.5) along with them. Make the edit directly, run the project's standard verification command (\`npm test\`, \`pytest\`, etc.) once if there is one, commit with plain \`git commit\`. Single message back to the user with the commit SHA. Done.
 
  This is the only path where the orchestrator writes code itself; everything else dispatches a sub-agent.
 
@@ -163,7 +165,32 @@ Run the \`flow-resume.md\` skill. Render the resume summary:
 
  ${RESUME_SUMMARY_EXAMPLE}
 
- Wait for r/s/c (and n on collision). On \`r\`, jump to Hop 3 with the saved \`currentStage\`. On \`s\`, open the artifact and stop. On \`c\`, run \`/cc-cancel\` semantics (move artifacts to \`cancelled/<slug>/\`, reset state).
+ Wait for r/s/c (and n on collision). On \`r\`, jump to Hop 4 with the saved \`currentStage\` — pre-flight is **not** re-run on resume; the saved \`triage.assumptions\` is read from disk. On \`s\`, open the artifact and stop. On \`c\`, run \`/cc-cancel\` semantics (move artifacts to \`cancelled/<slug>/\`, reset state).
+
+ ## Hop 2.5 — Pre-flight (fresh starts on non-inline paths)
+
+ Run the \`pre-flight-assumptions.md\` skill. Surface 3-7 numbered assumptions covering stack, conventions, architecture defaults, and out-of-scope items. Use the harness's structured ask tool with four options (\`Proceed\` / \`Edit one\` / \`Edit several\` / \`Cancel\`); fall back to a fenced block only when no structured ask is available.
+
+ \`\`\`
+ Pre-flight — I'm about to run with these assumptions:
+
+ 1. <stack: lang version, framework, runtime> (read from <file>)
+ 2. <test convention: location + filename pattern> (read from <file or shipped slug>)
+ 3. <architecture default 1>
+ 4. <architecture default 2>
+ 5. <out-of-scope default>
+
+ Correct me now or I proceed with these.
+ \`\`\`
+
+ Persist the user-confirmed list to \`flow-state.json\` under \`triage.assumptions\` (string array). The list is **immutable** for the lifetime of the flow.
+
+ Skip rules:
+ - \`triage.path == ["build"]\` (inline) → skip Hop 2.5 entirely.
+ - Resume from a paused flow → skip Hop 2.5 (saved \`assumptions\` is already on disk).
+ - \`flow-state.json\` already has \`triage.assumptions\` populated (mid-flight resume) → read but do not re-prompt.
+
+ Every dispatch envelope from Hop 3 onward includes the line \`Pre-flight assumptions: see triage.assumptions in flow-state.json\`. Sub-agents read the list; planner and architect copy it verbatim into their artifacts.
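The three skip rules above reduce to a small decision function. The following is a minimal TypeScript sketch, not code from the package: `TriageSketch`, `PreFlightAction`, and `preFlightAction` are hypothetical names, the stage list in the usage line is illustrative, and treating any non-null `assumptions` array (even an empty one) as "populated" is an assumption based on the reading rule documented for `TriageDecision.assumptions`.

```typescript
// Hypothetical shape: only `path` and `assumptions` are taken from the diff.
interface TriageSketch {
  path: string[];
  assumptions?: string[] | null;
}

type PreFlightAction = "skip-inline" | "reuse-saved" | "prompt";

// Decide what Hop 2.5 should do for a given triage decision.
function preFlightAction(triage: TriageSketch, isResume: boolean): PreFlightAction {
  // Rule 1: the inline path (`triage.path == ["build"]`) never runs pre-flight.
  const isInline = triage.path.length === 1 && triage.path[0] === "build";
  if (isInline) return "skip-inline";

  // Rules 2 and 3: on resume, or when assumptions are already on disk,
  // read the saved list and do not re-prompt.
  const saved = triage.assumptions;
  if (isResume || (saved !== null && saved !== undefined)) return "reuse-saved";

  // Fresh start on a non-inline path: surface 3-7 assumptions and ask.
  return "prompt";
}

// Illustrative stage list; the diff does not spell out the non-inline paths.
console.log(preFlightAction({ path: ["plan", "build", "review", "ship"] }, false)); // prints "prompt"
```

Checking the inline rule first keeps the trivial path from ever reading or writing `assumptions`, which matches the statement that Hop 2.5 is skipped entirely there.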
 
  ## Hop 3 — Dispatch
 
@@ -213,11 +240,11 @@ The orchestrator reads only this. The full artifact stays in \`.cclaw/flows/<slu
  #### plan
 
  - Specialist: \`planner\`.
- - Inputs: triage decision, the user's original prompt, \`.cclaw/lib/templates/plan.md\`, and any matching shipped slug if refining.
- - Output: \`.cclaw/flows/<slug>/plan.md\` with \`status: active\`.
+ - Inputs: triage decision (including \`assumptions\` from Hop 2.5), the user's original prompt, \`.cclaw/lib/templates/plan.md\`, **\`.cclaw/knowledge.jsonl\`** (append-only log of every shipped slug — planner reads up to 3 relevant prior entries and copies their lessons into the plan body), and any matching shipped slug if refining.
+ - Output: \`.cclaw/flows/<slug>/plan.md\` with \`status: active\`. Includes a \`## Assumptions\` block (verbatim from triage) and a \`## Prior lessons\` block (1-3 cross-flow lessons or "No prior shipped slugs apply to this task.").
  - Soft-mode plan body: bullet list of testable conditions, no AC IDs, no commit-trace block.
  - Strict-mode plan body: AC table with IDs, verification lines, touch surfaces, parallel-build topology if it applies.
- - Slim summary: condition / AC count, max touch surface, parallel-build flag, recommended-next.
+ - Slim summary: condition / AC count, max touch surface, parallel-build flag, recommended-next, prior-lesson count.
 
  #### build
 
@@ -307,11 +334,75 @@ Hard rules:
 
  #### ship
 
- - Specialist: \`reviewer\` mode=\`release\` AND \`security-reviewer\` mode=\`threat-model\` if \`security_flag\` is true.
- - Pattern: **parallel fan-out + merge** (the only fan-out cclaw uses). Dispatch both specialists in the same message; merge their summaries in your context.
+ - Specialists fanned out in parallel (the only fan-out cclaw uses):
+ - \`reviewer\` mode=\`release\` always.
+ - \`reviewer\` mode=\`adversarial\` — **strict mode only** (see below).
+ - \`security-reviewer\` mode=\`threat-model\` — when \`security_flag\` is true.
+ - Pattern: **parallel fan-out + merge** (the canonical cclaw fan-out). Dispatch all specialists in the same message; merge their summaries in your context.
  - Inputs: \`.cclaw/flows/<slug>/plan.md\`, build.md, review.md.
- - Output: \`.cclaw/flows/<slug>/ship.md\` with the go/no-go decision, AC↔commit map (strict) or condition checklist (soft), release notes, and rollback plan.
- - After ship, run the compound learning gate (Hop 5).
+ - Output: \`.cclaw/flows/<slug>/ship.md\` with the go/no-go decision, AC↔commit map (strict) or condition checklist (soft), release notes, and rollback plan. Plus, in strict mode, \`.cclaw/flows/<slug>/pre-mortem.md\` written by the adversarial reviewer (see below).
+ - After ship, run the compound learning gate (Hop 6).
+
+ ##### Adversarial pre-mortem (strict mode only)
+
+ Before the ship gate finalises, the orchestrator dispatches \`reviewer\` mode=\`adversarial\` against the diff produced for this slug. The adversarial reviewer's specific job is to **think like the failure**: how would this break in production a week from now?
+
+ The adversarial sweep produces \`.cclaw/flows/<slug>/pre-mortem.md\`:
+
+ \`\`\`markdown
+ ---
+ slug: <slug>
+ stage: ship
+ status: pre-mortem
+ generated_by: reviewer mode=adversarial
+ generated_at: <iso>
+ ---
+
+ # Pre-mortem — <slug>
+
+ It is now <ship-date>+7d. This change shipped, then failed. What was the failure?
+
+ ## Most likely failure modes
+
+ 1. **<class>: <one-line failure>** — trigger: <input/condition>; impact: <user-visible result>; covered by AC: <yes/no, AC-N or "no AC tests this">.
+ 2. **<class>: ...**
+ 3. ...
+
+ ## Underexplored axes
+
+ - <axis (correctness/readability/architecture/security/perf)>: <what reviewer's code-mode pass might have missed>
+ - ...
+
+ ## Recommended pre-ship actions
+
+ - <add a regression test for failure 1: file:line>
+ - <surface decision X to the user before merge>
+ - <none — pre-mortem is satisfied>
+ \`\`\`
+
+ Failure classes the adversarial pass MUST consider (mark each as "covered" / "not covered" / "n/a"):
+
+ - **data-loss** — write paths that could lose user data on rollback or partial failure;
+ - **race** — concurrent operations on shared state without locking / ordering guarantees;
+ - **regression** — prior-shipped behaviour an existing test does not pin;
+ - **rollback impossibility** — schema migration / persisted state shape that cannot be reverted;
+ - **accidental scope** — diff touches files no AC mentions;
+ - **security-edge** — auth bypass, injection, leaked secret in logs, untrusted input.
+
+ The adversarial reviewer treats every "not covered" as a finding (axis varies; severity \`required\` by default, escalated to \`critical\` for data-loss / security-edge). Findings go into the existing Concern Ledger in \`review.md\`; the pre-mortem.md is a parallel artifact summarising the adversarial pass's reasoning so the user can read a one-page rationale.
+
+ Ship gate decision after fan-out:
+
+ | reviewer:release | reviewer:adversarial | security-reviewer | gate |
+ | --- | --- | --- | --- |
+ | clear | clear | clear | clear → ship may proceed |
+ | clear | block | any | block → fix-only loop or user override |
+ | any | any | block | block → fix-only loop |
+ | clear | warn | clear | warn → render adversarial findings, ask user |
+
+ The adversarial pass runs **once per ship attempt**, not iteratively. If it produces \`block\`-level findings, the orchestrator dispatches \`slice-builder\` mode=\`fix-only\` and re-runs the **regular** reviewer (mode=\`code\`) to confirm the fix; the adversarial pass does not re-run unless the user explicitly requests it (the marginal value drops fast on second run).
+
+ In \`soft\` mode the adversarial pass is **skipped** by default — the lighter-weight regular reviewer is enough for small/medium work. The user can opt in with \`/cc <task> --adversarial\` if they want the extra sweep regardless.
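The ship gate table above can be read as an ordered set of checks. A minimal TypeScript sketch under stated assumptions: the `Verdict` and `Gate` types and the `shipGate` function are hypothetical names, specialists that did not run (for example the adversarial pass in soft mode) are passed in as `clear`, and combinations the table does not list return `unspecified` rather than a guessed result.

```typescript
type Verdict = "clear" | "warn" | "block";
type Gate = Verdict | "unspecified";

// Map the three specialist verdicts to a gate decision, row by row.
function shipGate(release: Verdict, adversarial: Verdict, security: Verdict): Gate {
  // Row "any | any | block": a security block always blocks.
  if (security === "block") return "block";
  // Row "clear | block | any": an adversarial block blocks a clear release review.
  if (release === "clear" && adversarial === "block") return "block";
  // Row "clear | warn | clear": render adversarial findings and ask the user.
  if (release === "clear" && adversarial === "warn" && security === "clear") return "warn";
  // Row "clear | clear | clear": ship may proceed.
  if (release === "clear" && adversarial === "clear" && security === "clear") return "clear";
  // Anything else is not covered by the table; the sketch refuses to guess.
  return "unspecified";
}

console.log(shipGate("clear", "warn", "clear")); // prints "warn"
```

Returning `unspecified` makes the gaps in the table visible (for instance a `block` from `reviewer:release`), which a real orchestrator would presumably route through the existing review-loop rules.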
 
  ### Discovery (large-risky only)
 
@@ -346,10 +437,27 @@ After every dispatch returns:
  - \`reviewer\` returned \`block\` decision (open findings) → render the findings, ask \`continue with fix-only\` / \`cancel\`.
  - \`security-reviewer\` raised any finding → ask before proceeding.
  - \`reviewer\` returned \`cap-reached\` (5 iterations without convergence) → ask.
+ - **A returned slim summary has \`Confidence: low\`** → ask before proceeding (covered in detail below).
  - About to run \`ship\` (last stage in \`triage.path\`) → ask \`ship now?\` once, then proceed on confirmation. Ship is the only stage that always confirms in autopilot.
 
  Auto mode never silently skips a hard gate; it just removes the cosmetic pause between green stages. The user typed \`auto\` once during triage and meant it.
 
+ ### Confidence as a hard gate (both modes)
+
+ Every slim summary carries a \`Confidence: high | medium | low\` line. The orchestrator reads it and treats it as a quality signal for the dispatch that just returned, not a prediction of the next stage:
+
+ | Confidence | step mode | auto mode |
+ | --- | --- | --- |
+ | \`high\` | normal pause; render summary, ask continue | normal flow; chain to next stage |
+ | \`medium\` | normal pause; render summary, mention confidence in the user-facing line ("Plan ready (medium confidence — see Notes). Continue?") | render the summary inline ("medium — see Notes"); chain anyway. The Notes line is required when confidence is medium |
+ | \`low\` | hard gate. Render the summary, do **not** offer \`continue\` as a verb. Offer: \`expand <stage>\` (re-dispatch the same specialist with a richer envelope), \`show\` (open the artifact), \`override\` (acknowledge the risk and continue anyway), \`cancel\` | hard gate. Stop chaining. Render the summary, ask the same expand/show/override/cancel question. \`override\` is the only word that resumes auto-chaining |
+
+ A specialist that returns \`Confidence: low\` MUST also write a non-empty \`Notes:\` line that explains the dimension that drove confidence down (missing input, unverified citation, partial coverage, etc.). The orchestrator surfaces that Notes line verbatim — the sub-agent is the only one with the context to explain.
+
+ Repeated low-confidence on the same stage (the second consecutive dispatch returns low) is itself a routing signal: the orchestrator should suggest re-triage with a richer path (e.g. \`small/medium\` → \`large-risky\`) or splitting the slug, rather than dispatching the same specialist a third time.
+
+ Override is sticky to **this stage only** — the next stage starts with the normal high-confidence-default behaviour.
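The confidence table above can be sketched as a routing function. This is an illustration only: `SlimSummarySketch`, `GateOutcome`, and `confidenceGate` are hypothetical names, the step-mode verbs are taken from the Pause hop ("continue" / "show" / "cancel"), and throwing when a required Notes line is missing is this sketch's choice, since the diff does not say how the orchestrator reacts in that case. Other hard gates (block, cap-reached, security findings, ship) are out of scope here.

```typescript
type Confidence = "high" | "medium" | "low";
type RunMode = "step" | "auto";

// Hypothetical slim-summary shape: only the Confidence and Notes lines matter here.
interface SlimSummarySketch {
  confidence: Confidence;
  notes?: string;
}

interface GateOutcome {
  hardGate: boolean;  // true when the orchestrator must stop and ask
  autoChain: boolean; // true when auto mode may proceed to the next stage
  verbs: string[];    // options offered to the user; empty when chaining
}

function confidenceGate(summary: SlimSummarySketch, mode: RunMode): GateOutcome {
  const hasNotes = (summary.notes ?? "").trim().length > 0;
  // Medium and low confidence both call for a Notes line in the text above.
  if (summary.confidence !== "high" && !hasNotes) {
    throw new Error(`Confidence: ${summary.confidence} requires a non-empty Notes line`);
  }
  if (summary.confidence === "low") {
    // Hard gate in both modes; `continue` is deliberately not offered.
    return { hardGate: true, autoChain: false, verbs: ["expand", "show", "override", "cancel"] };
  }
  if (mode === "auto") {
    // High or medium in auto mode: chain to the next stage.
    return { hardGate: false, autoChain: true, verbs: [] };
  }
  // High or medium in step mode: the normal pause.
  return { hardGate: false, autoChain: false, verbs: ["continue", "show", "cancel"] };
}

const outcome = confidenceGate({ confidence: "low", notes: "unverified citation" }, "auto");
console.log(outcome.hardGate, outcome.verbs.includes("continue")); // prints "true false"
```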
+
  ### Common rules for both modes
 
  Resume from a fresh session works because everything is on disk: \`flow-state.json\` has \`currentStage\`, \`triage\` (with \`runMode\`), \`flows/<slug>/*.md\` carries the artifacts. The next \`/cc\` invocation enters Hop 1 → detect → resume summary → continue from \`currentStage\` with the saved runMode.
@@ -373,7 +481,8 @@ After ship + compound, move every \`<stage>.md\` from \`flows/<slug>/\` into \`.
 
  - Always run the triage gate on a fresh \`/cc\`. Never silently pick a path. Use the harness's structured question tool, not a printed code block.
  - In \`step\` mode, always pause after every stage. Never auto-advance.
- - In \`auto\` mode, never auto-advance past a hard gate (block / cap-reached / security finding / ship). The user opted into chaining green stages, not chaining decisions.
+ - In \`auto\` mode, never auto-advance past a hard gate (block / cap-reached / security finding / **Confidence: low** / ship). The user opted into chaining green stages, not chaining decisions.
+ - Always honour \`Confidence: low\` in the slim summary. Stop and ask, both modes. See "Confidence as a hard gate" above.
  - Always ask before \`git push\` or PR creation. Commit-helper auto-commits in strict mode; everything past commit is opt-in.
  - Always ask before deleting active artifacts (\`/cc-cancel\` is the supported way; do not \`rm\` artifacts directly).
  - Always show the slim summary back to the user; do not summarise from your own memory of the dispatch.
@@ -391,6 +500,7 @@ These skills auto-trigger during \`/cc\`. Do not re-explain them; obey them.
  - **conversation-language** — always-on; reply in the user's language but never translate \`AC-N\`, \`D-N\`, \`F-N\`, slugs, paths, frontmatter keys, mode names, or hook output.
  - **anti-slop** — always-on for any code-modifying step; bans redundant verification and environment shims.
  - **triage-gate** — Hop 2 of every fresh \`/cc\`.
+ - **pre-flight-assumptions** — Hop 2.5 of every fresh non-inline \`/cc\`; surfaces 3-7 stack/convention/architecture defaults for user confirmation.
  - **flow-resume** — when \`/cc\` is invoked with no task or with an active flow.
  - **plan-authoring** — on every edit to \`.cclaw/flows/<slug>/plan.md\`.
  - **ac-traceability** — strict mode only; before every commit.
@@ -398,7 +508,8 @@ These skills auto-trigger during \`/cc\`. Do not re-explain them; obey them.
  - **refinement** — when an existing plan match is detected.
  - **parallel-build** — strict mode + planner topology=parallel-build; enforces 5-slice cap and worktree dispatch.
  - **security-review** — when the diff touches sensitive surfaces.
- - **review-loop** — wraps every reviewer / security-reviewer invocation; runs the Concern Ledger + convergence detector.
+ - **review-loop** — wraps every reviewer / security-reviewer invocation; runs the Concern Ledger + Five-axis pass + convergence detector.
+ - **source-driven** — strict mode only (opt-in for soft); architect/planner detect stack version, fetch official doc deep-links, cite URLs, mark UNVERIFIED when docs are missing.
 
  ${ironLawsMarkdown()}
  `;
@@ -33,6 +33,17 @@ export declare function isDiscoverySpecialist(value: unknown): value is Discover
  export declare function createInitialFlowState(nowIso?: string): FlowStateV82;
  /** @deprecated kept for source-level compatibility with v8.1 imports. */
  export declare const createInitialFlowStateV8: typeof createInitialFlowState;
+ /**
+ * Read a triage decision's pre-flight assumptions.
+ *
+ * Returns:
+ * - `[]` when no pre-flight ran (legacy state, trivial path, or older
+ * `step`/`auto` flow-state with no assumptions field). Callers should
+ * treat this as "no captured assumptions, do not surface anything".
+ * - the recorded array (possibly empty if the pre-flight ran but the user
+ * confirmed there were no assumptions to record — rare but valid).
+ */
+ export declare function assumptionsOf(triage: TriageDecision | null | undefined): readonly string[];
  /**
  * Read a triage decision's runMode with the documented default.
  *
@@ -123,6 +123,32 @@ function assertTriageOrNull(value) {
  if (triage.runMode !== undefined && !isRunMode(triage.runMode)) {
  throw new Error(`Invalid triage.runMode: ${String(triage.runMode)}`);
  }
+ if (triage.assumptions !== undefined && triage.assumptions !== null) {
+ if (!Array.isArray(triage.assumptions)) {
+ throw new Error("triage.assumptions must be an array, null, or absent");
+ }
+ for (const entry of triage.assumptions) {
+ if (typeof entry !== "string") {
+ throw new Error("triage.assumptions entries must be strings");
+ }
+ }
+ }
+ }
+ /**
+ * Read a triage decision's pre-flight assumptions.
+ *
+ * Returns:
+ * - `[]` when no pre-flight ran (legacy state, trivial path, or older
+ * `step`/`auto` flow-state with no assumptions field). Callers should
+ * treat this as "no captured assumptions, do not surface anything".
+ * - the recorded array (possibly empty if the pre-flight ran but the user
+ * confirmed there were no assumptions to record — rare but valid).
+ */
+ export function assumptionsOf(triage) {
+ const value = triage?.assumptions;
+ if (value === null || value === undefined)
+ return [];
+ return value;
  }
  /**
  * Read a triage decision's runMode with the documented default.
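The behaviour of `assumptionsOf` added in the hunk above can be illustrated with a short, self-contained TypeScript sketch. The function body mirrors the hunk, but `TriageSketch` is a hypothetical minimal stand-in for `TriageDecision`, `preFlightRan` is a hypothetical helper that is not part of the package, and the sample assumption string is made up.

```typescript
// Hypothetical minimal shape; the real TriageDecision has more fields.
interface TriageSketch {
  assumptions?: string[] | null;
}

// Same logic as the assumptionsOf hunk, retyped so the block stands alone.
function assumptionsOf(triage: TriageSketch | null | undefined): readonly string[] {
  const value = triage?.assumptions;
  if (value === null || value === undefined) return [];
  return value;
}

// Hypothetical helper: distinguishes "no pre-flight ran" from
// "ran, nothing recorded", which assumptionsOf's return value cannot do.
function preFlightRan(triage: TriageSketch | null | undefined): boolean {
  return Array.isArray(triage?.assumptions);
}

const legacy: TriageSketch = {};                          // older flow-state, no field
const confirmedEmpty: TriageSketch = { assumptions: [] }; // ran, user recorded nothing
const confirmed: TriageSketch = { assumptions: ["Node 20, TypeScript strict"] };

console.log(assumptionsOf(legacy).length, preFlightRan(legacy));                 // prints "0 false"
console.log(assumptionsOf(confirmedEmpty).length, preFlightRan(confirmedEmpty)); // prints "0 true"
console.log(assumptionsOf(confirmed).length, preFlightRan(confirmed));           // prints "1 true"
```

Because both the absent and the empty case come back as an empty array, a caller that needs the "ran but empty" distinction described in the doc comment has to inspect the `assumptions` field itself.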
package/dist/types.d.ts CHANGED
@@ -79,6 +79,23 @@ export interface TriageDecision {
  * validate; readers MUST default to `step` on absent.
  */
  runMode?: RunMode;
+ /**
+ * Pre-flight assumptions surfaced at Hop 2.5 (between triage and first
+ * dispatch). Each entry is one short sentence the orchestrator was about
+ * to silently default to (stack pick, lib version, file layout, target
+ * platform, code-style preference). The user either acknowledged or
+ * corrected these before any sub-agent ran.
+ *
+ * Optional and skipped entirely on the inline path. On soft/strict, the
+ * pre-flight skill writes 3-7 entries here; subsequent flows in the same
+ * project may seed defaults from the most recent shipped slug's
+ * `assumptions:` block.
+ *
+ * Reading rule: `null` or absent means "no pre-flight ran" (legacy state
+ * or trivial path). An empty array means "ran and the user accepted no
+ * assumptions are needed", which is rare but valid.
+ */
+ assumptions?: string[] | null;
  }
  export interface CliContext {
  cwd: string;
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "cclaw-cli",
- "version": "8.3.0",
+ "version": "8.4.0",
  "description": "Lightweight harness-first flow toolkit for coding agents",
  "type": "module",
  "bin": {