npm - cclaw-cli - Versions diffs - 0.8.0 → 0.10.0 - Mend

cclaw-cli 0.8.0 → 0.10.0

Files changed (17)

package/dist/content/examples.d.ts +16 -0
package/dist/content/examples.js +364 -55
package/dist/content/harness-tool-refs.d.ts +20 -0
package/dist/content/harness-tool-refs.js +240 -0
package/dist/content/hooks.js +48 -2
package/dist/content/meta-skill.js +72 -4
package/dist/content/skills.d.ts +5 -0
package/dist/content/skills.js +118 -46
package/dist/content/stage-schema.d.ts +9 -3
package/dist/content/stage-schema.js +72 -22
package/dist/content/subagents.js +21 -0
package/dist/content/templates.js +13 -3
package/dist/doctor.js +82 -0
package/dist/harness-adapters.js +11 -3
package/dist/install.js +25 -1
package/dist/policy.js +1 -1
package/package.json +1 -1

package/dist/content/harness-tool-refs.js ADDED

@@ -0,0 +1,240 @@
+/**
+ * Per-harness tool-mapping reference files.
+ *
+ * Addresses A.1#4: the four supported harnesses (claude, cursor, opencode, codex)
+ * expose different primitive names for the same capabilities (ask-user,
+ * delegate/Task, web fetch, file edit, code execution, ...). cclaw's stage skills
+ * need to pick the right name at runtime without bloating every stage with per-harness
+ * if/else ladders.
+ *
+ * Each file below is short (one table per capability), authoritative, and materialised
+ * at `.cclaw/references/harness-tools/<harness>.md`. Stage skills and the meta-skill
+ * cite the folder instead of duplicating the mappings inline.
+ *
+ * When a new harness is added (or an existing one renames a tool), update the
+ * corresponding entry here — do NOT scatter tool names across skill text.
+ */
+export const HARNESS_TOOL_REFS_DIR = "references/harness-tools";
+const CLAUDE_TOOLS_MD = `---
+harness: claude
+name: Claude Code tool map
+description: "Canonical mapping of cclaw capability names → Claude Code tool names. Cited by stage skills; do not duplicate in per-stage text."
+---
+# Claude Code — Tool Map
+Use this file as the single source of truth for which Claude Code tool to call when a cclaw skill references a generic capability.
+## Core capabilities
+| cclaw capability | Claude Code tool | Notes |
+|---|---|---|
+| Ask user a structured question | \`AskUserQuestion\` | Max 4 options; lettered labels ≤12 chars. Fall back to plain-text lettered list on schema error. |
+| Dispatch a subagent (read-only or write) | \`Task\` with \`subagent_type\` | \`explore\` = read-only; \`generalPurpose\` = read-write. Background via \`run_in_background: true\`. |
+| Read file | \`Read\` | Prefer this over \`cat\` / \`head\` / \`tail\`. |
+| Edit file | \`StrReplace\` (exact match) or \`Write\` (overwrite) | Always \`Read\` before editing; avoid \`sed\`/\`awk\` unless asked. |
+| Create file | \`Write\` | Reject if the task can be solved by editing an existing file. |
+| Search file contents | \`Grep\` (ripgrep-backed) | Use \`output_mode: files_with_matches\` for file lists. |
+| Find files by name / glob | \`Glob\` | Pattern matches mtime-sorted. |
+| Shell command | \`Shell\` | Background long-running jobs with \`block_until_ms: 0\`; poll with \`Await\`. |
+| Fetch URL | \`WebFetch\` | Returns markdown. No auth, no binaries. |
+| Web search | \`WebSearch\` | Use for docs, real-time info, version lookups. |
+| Semantic code search | \`SemanticSearch\` | One directory per call; whole-repo via \`[]\`. |
+| Todo tracking | \`TodoWrite\` | Use \`merge: true\` to update; keep one task \`in_progress\`. |
+| Ask tool (multi-question) | \`AskQuestion\` (Cursor-only, unavailable in Claude) | NOT available in Claude — use \`AskUserQuestion\` instead. |
+| MCP tool call | \`CallMcpTool\` | Always read the tool's schema descriptor first. |
+## Decision-protocol mapping
+When a stage skill says "ask the user a structured question", in Claude Code that means:
+\`\`\`
+AskUserQuestion({
+  questions: [{
+    id: "...",
+    prompt: "One-sentence decision, plain English",
+    options: [
+      { id: "a", label: "Short label" },   // ≤12 chars
+      { id: "b", label: "Alt label" },
+      { id: "c", label: "Recommended" }
+    ]
+  }]
+})
+\`\`\`
+One question per call. Never batch.
+## Escalation / fall-back
+If a tool returns a schema error twice in a row (see the meta-skill's Error / Retry Budget), switch to plain-text equivalents:
+- \`AskUserQuestion\` → write a lettered list in the response, wait for reply.
+- \`Task\` (dispatch) → inline the work in the current turn.
+- \`WebFetch\` → ask the user for the URL's content.
+`;
+const CURSOR_TOOLS_MD = `---
+harness: cursor
+name: Cursor tool map
+description: "Canonical mapping of cclaw capability names → Cursor agent tool names. Cited by stage skills; do not duplicate in per-stage text."
+---
+# Cursor — Tool Map
+Use this file as the single source of truth for which Cursor agent tool to call when a cclaw skill references a generic capability.
+## Core capabilities
+| cclaw capability | Cursor tool | Notes |
+|---|---|---|
+| Ask user a structured question | \`AskQuestion\` | \`questions\` is an array; each question has \`id\`, \`prompt\`, \`options\`, optional \`allow_multiple\`. |
+| Dispatch a subagent | \`Task\` with \`subagent_type\` | Available types: \`generalPurpose\`, \`explore\` (readonly), \`shell\`, \`browser-use\`, \`best-of-n-runner\`. |
+| Read file | \`Read\` | Line-numbered output; avoid \`cat\` / \`head\` / \`tail\`. |
+| Edit file | \`StrReplace\` | Unique \`old_string\` required; use \`replace_all: true\` for bulk renames. |
+| Create file | \`Write\` | Prefer editing existing files. |
+| Search file contents | \`Grep\` (ripgrep-backed) | Output modes: \`content\`, \`files_with_matches\`, \`count\`. |
+| Find files by name / glob | \`Glob\` | Auto-prepends \`**/\` when pattern does not start with it. |
+| Shell command | \`Shell\` | Long-running jobs go to background via \`block_until_ms: 0\`; poll with \`Await\`. |
+| Fetch URL | \`WebFetch\` | Markdown output. |
+| Web search | \`WebSearch\` | Use for real-time info, framework docs, news. |
+| Semantic code search | \`SemanticSearch\` | Prefer for exploratory "how does X work?" queries. |
+| Todo tracking | \`TodoWrite\` | Supports \`merge: true\` for partial updates. |
+| Generate image | \`GenerateImage\` | Only on explicit user request. |
+| Ask structured questions (Claude-style) | \`AskUserQuestion\` | NOT available in Cursor — use \`AskQuestion\`. |
+| MCP tool call | \`CallMcpTool\` | Cursor exposes MCP tools via this wrapper; read the descriptor first. |
+| Jupyter notebook edit | \`EditNotebook\` | Use for \`.ipynb\` only; cell-granular edits. |
+| Mode switching | \`SwitchMode\` | Propose plan/agent mode changes when task character shifts. |
+## Decision-protocol mapping
+In Cursor, structured asks look like:
+\`\`\`
+AskQuestion({
+  questions: [{
+    id: "...",
+    prompt: "One-sentence decision",
+    options: [
+      { id: "a", label: "Option A" },
+      { id: "b", label: "Option B" }
+    ]
+  }]
+})
+\`\`\`
+## Escalation / fall-back
+On repeated tool errors, fall back to plain-text equivalents just like Claude — see the meta-skill's Error / Retry Budget.
+`;
+const OPENCODE_TOOLS_MD = `---
+harness: opencode
+name: OpenCode tool map
+description: "Canonical mapping of cclaw capability names → OpenCode primitives. Cited by stage skills; do not duplicate in per-stage text."
+---
+# OpenCode — Tool Map
+OpenCode exposes a leaner tool surface than Claude Code / Cursor. When a cclaw skill describes a capability that OpenCode lacks, fall back to the plain-text equivalent listed below.
+## Core capabilities
+| cclaw capability | OpenCode primitive | Notes |
+|---|---|---|
+| Ask user a structured question | **Not available as a tool.** | Emit a plain-text lettered list: \`A) ... B) ... C) (recommended) ...\`. Wait for the user's letter. |
+| Dispatch a subagent | **Not available as a tool.** | Inline the work in the current turn, or split across multiple turns with the user driving. |
+| Read file | file-read primitive | Same role as \`Read\`. |
+| Edit file | file-edit primitive | Same role as \`StrReplace\`; confirm diff before writing. |
+| Create file | file-write primitive | Prefer editing existing files. |
+| Search file contents | \`rg\` via shell | Cite \`rg\` output verbatim as evidence when a skill requires a grep result. |
+| Find files by name / glob | \`fd\` or \`find\` via shell | Capture the command + output. |
+| Shell command | shell primitive | Long-running jobs require explicit background + polling — check the OpenCode docs for \`&\` semantics. |
+| Fetch URL | \`curl\` via shell | No markdown conversion; extract manually. |
+| Web search | **Not available.** | Ask the user to paste docs or provide a URL, then fetch via shell. |
+| Todo tracking | **Not available as a tool.** | Maintain a \`### TODO\` block inline in your response; keep one item in progress. |
+| MCP tool call | Depends on runtime config. | If MCP is enabled, use the documented invocation; otherwise treat as unavailable. |
+## Decision-protocol mapping
+\`\`\`
+Decision: <one sentence>.
+A) <label> — <trade-off>
+B) <label> — <trade-off>
+C) <label> — <trade-off>  (recommended, because <one-line reason>)
+Please reply with the letter.
+\`\`\`
+## Escalation / fall-back
+Because OpenCode lacks native ask-user and dispatch tools, more of cclaw's protocols degrade to plain text. This is expected — the flow gates and artifacts are identical; only the delivery channel changes.
+`;
+const CODEX_TOOLS_MD = `---
+harness: codex
+name: Codex tool map
+description: "Canonical mapping of cclaw capability names → Codex CLI primitives. Cited by stage skills; do not duplicate in per-stage text."
+---
+# Codex — Tool Map
+Codex (OpenAI Codex CLI) exposes roughly the same core surface as OpenCode: file I/O, shell, no native ask-user, no dispatch. Fall back to plain text for anything else.
+## Core capabilities
+| cclaw capability | Codex primitive | Notes |
+|---|---|---|
+| Ask user a structured question | **Not available as a tool.** | Emit a plain-text lettered list; wait for the user's reply. |
+| Dispatch a subagent | **Not available as a tool.** | Inline the work; split turns if needed. |
+| Read file | \`read\` / \`open\` primitive | Same role as \`Read\`. |
+| Edit file | \`edit\` / \`patch\` primitive | Same role as \`StrReplace\`. |
+| Create file | \`write\` primitive | Prefer editing existing files. |
+| Search file contents | \`rg\` via shell | Capture command + output verbatim. |
+| Find files by name / glob | \`fd\` / \`find\` / \`ls\` via shell | Capture command + output. |
+| Shell command | shell primitive | Codex CLI may restrict some binaries by default — check the effective permissions. |
+| Fetch URL | \`curl\` via shell | Extract markdown manually. |
+| Web search | **Not available.** | Ask user for docs / URL. |
+| Todo tracking | **Not available as a tool.** | Keep an inline \`### TODO\` section; update it as you progress. |
+| MCP tool call | Depends on runtime config. | If MCP is wired, cite the descriptor; otherwise treat as unavailable. |
+## Decision-protocol mapping
+\`\`\`
+Decision: <one sentence>.
+A) <label> — <trade-off>
+B) <label> — <trade-off>  (recommended, because <reason>)
+C) <label> — <trade-off>
+Please reply with the letter.
+\`\`\`
+## Escalation / fall-back
+Treat missing tools as "plain-text required", not "skip the step". The gate still has to pass; only the channel changes.
+`;
+const HARNESS_TOOL_REFS = {
+    claude: CLAUDE_TOOLS_MD,
+    cursor: CURSOR_TOOLS_MD,
+    opencode: OPENCODE_TOOLS_MD,
+    codex: CODEX_TOOLS_MD
+};
+export function harnessToolRefMarkdown(harness) {
+    return HARNESS_TOOL_REFS[harness];
+}
+export const HARNESS_TOOL_REFS_INDEX_MD = `---
+name: Harness tool maps
+description: "Index file. One reference per supported harness — cite the per-harness file instead of hardcoding tool names in stage skills."
+---
+# Harness Tool Maps
+cclaw supports four harnesses; each exposes different primitive names for the same capabilities. Stage skills and utility skills cite the file matching the currently active harness and fall back to plain-text equivalents for capabilities that the harness lacks.
+| Harness | File | Notes |
+|---|---|---|
+| Claude Code | \`.cclaw/${HARNESS_TOOL_REFS_DIR}/claude.md\` | Richest tool surface (AskUserQuestion, Task, WebFetch, WebSearch, MCP, …). |
+| Cursor | \`.cclaw/${HARNESS_TOOL_REFS_DIR}/cursor.md\` | Near-parity with Claude; uses \`AskQuestion\` instead of \`AskUserQuestion\`. |
+| OpenCode | \`.cclaw/${HARNESS_TOOL_REFS_DIR}/opencode.md\` | No native ask-user / dispatch; more plain-text fallbacks. |
+| Codex | \`.cclaw/${HARNESS_TOOL_REFS_DIR}/codex.md\` | No native ask-user / dispatch; shell + file I/O only by default. |
+When a new harness is added or an existing one renames a tool, update the corresponding file (and this index) — do NOT scatter tool names across skill text.
+`;
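
Reviewer note: as written, `harnessToolRefMarkdown` returns `undefined` for any harness outside the four keys. The sketch below shows one way a caller might resolve the materialised path defensively. `SUPPORTED_HARNESSES`, `harnessToolRefPath`, and the `.cclaw` default root are assumptions made for illustration and are not part of the package.

```javascript
// Illustrative sketch only; the helper names here are hypothetical, not cclaw-cli API.
const HARNESS_TOOL_REFS_DIR = "references/harness-tools"; // value copied from the diff
const SUPPORTED_HARNESSES = ["claude", "cursor", "opencode", "codex"];

// Returns the path where the per-harness tool map is materialised,
// or null when the harness is not one of the four supported names.
function harnessToolRefPath(harness, runtimeRoot = ".cclaw") {
  if (!SUPPORTED_HARNESSES.includes(harness)) {
    return null;
  }
  return `${runtimeRoot}/${HARNESS_TOOL_REFS_DIR}/${harness}.md`;
}

console.log(harnessToolRefPath("cursor"));
console.log(harnessToolRefPath("windsurf"));
```

Returning `null` for an unknown name lets the caller fall back to plain text explicitly instead of building a path to a file that was never written.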

package/dist/content/hooks.js CHANGED

@@ -309,14 +309,60 @@ if [ -f "$META_SKILL" ]; then
   META_CONTENT=$(cat "$META_SKILL" 2>/dev/null || echo "")
 fi
-# --- Load knowledge snapshot (canonical JSONL tail) ---
+# --- Load knowledge snapshot (canonical JSONL tail + total count) ---
 KNOWLEDGE_SUMMARY=""
+LEARNINGS_COUNT=0
 if [ -f "$KNOWLEDGE_FILE" ] && [ -s "$KNOWLEDGE_FILE" ]; then
   KNOWLEDGE_SUMMARY=$(tail -n 30 "$KNOWLEDGE_FILE" 2>/dev/null || echo "")
+  LEARNINGS_COUNT=$(grep -c '^{' "$KNOWLEDGE_FILE" 2>/dev/null) || LEARNINGS_COUNT=0
+fi
+# --- Installed cclaw-cli version vs. project's recorded version (one-block
+# upgrade-check, gstack-style). Purely informational — we never block. ---
+VERSION_NOTE=""
+INSTALLED_VERSION=""
+PROJECT_VERSION=""
+# Version lookup is skipped by default — spawning the cli on every session
+# start adds ~10s on Node-based installs. Opt-in via CCLAW_HOOK_VERSION_CHECK=1.
+if [ "\${CCLAW_HOOK_VERSION_CHECK:-0}" = "1" ] && command -v cclaw >/dev/null 2>&1; then
+  INSTALLED_VERSION=$(cclaw --version 2>/dev/null | head -1 | awk '{print $NF}' || echo "")
+fi
+CONFIG_FILE="$ROOT/${RUNTIME_ROOT}/config.json"
+if [ -f "$CONFIG_FILE" ]; then
+  if command -v jq >/dev/null 2>&1; then
+    PROJECT_VERSION=$(jq -r '.version // ""' "$CONFIG_FILE" 2>/dev/null || echo "")
+  else
+    PROJECT_VERSION=$(grep -o '"version"[[:space:]]*:[[:space:]]*"[^"]*"' "$CONFIG_FILE" 2>/dev/null | head -1 | sed 's/.*"\\([^"]*\\)"$/\\1/' || echo "")
+  fi
+fi
+if [ -n "$INSTALLED_VERSION" ] && [ -n "$PROJECT_VERSION" ] && [ "$INSTALLED_VERSION" != "$PROJECT_VERSION" ]; then
+  VERSION_NOTE="cclaw-cli $INSTALLED_VERSION installed; project recorded $PROJECT_VERSION — run 'cclaw sync' to realign."
+fi
+# --- Routing-check: AGENTS.md / CLAUDE.md must contain the cclaw block. ---
+ROUTING_NOTE=""
+ROUTING_MISSING=""
+for routing_file in "$ROOT/AGENTS.md" "$ROOT/CLAUDE.md"; do
+  if [ -f "$routing_file" ]; then
+    if ! grep -q "cclaw-start" "$routing_file" 2>/dev/null; then
+      ROUTING_MISSING="$ROUTING_MISSING $(basename "$routing_file")"
+    fi
+  fi
+done
+if [ -n "$ROUTING_MISSING" ]; then
+  ROUTING_NOTE="Routing block missing from:\${ROUTING_MISSING}. Run 'cclaw sync' to re-inject."
 fi
 # --- Build context message ---
-CTX="cclaw loaded. Flow: stage=$STAGE ($COMPLETED/8 completed, run=$ACTIVE_RUN). Active artifacts: ${RUNTIME_ROOT}/artifacts/"
+CTX="cclaw loaded. Flow: stage=$STAGE ($COMPLETED/8 completed, run=$ACTIVE_RUN). Active artifacts: ${RUNTIME_ROOT}/artifacts/. Learnings: $LEARNINGS_COUNT entries."
+if [ -n "$VERSION_NOTE" ]; then
+  CTX="$CTX
+$VERSION_NOTE"
+fi
+if [ -n "$ROUTING_NOTE" ]; then
+  CTX="$CTX
+$ROUTING_NOTE"
+fi
 if [ -n "$CONTEXT_MODE_NOTE" ]; then
   CTX="$CTX
 $CONTEXT_MODE_NOTE"
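
Reviewer note: the hunk above adds three informational signals to the session-start context (learnings count, version mismatch, missing routing block). The same logic can be sketched in JavaScript to make the rules easier to read. All function names are hypothetical, the message wording is simplified, and file contents are passed in as strings rather than read from disk.

```javascript
// Illustrative sketch of the hook's note-building rules; not the package's code.

// Counts JSONL entries the way the hook does: lines that begin with "{".
function countLearnings(jsonlText) {
  return jsonlText.split("\n").filter((line) => line.startsWith("{")).length;
}

// A note is produced only when both versions are known and they differ.
function versionNote(installed, project) {
  if (!installed || !project || installed === project) return "";
  return `cclaw-cli ${installed} installed; project recorded ${project}. Run 'cclaw sync' to realign.`;
}

// `files` maps a file name to its content, or to null when the file does not exist.
// Only existing files that lack the "cclaw-start" marker are reported.
function routingNote(files) {
  const missing = Object.entries(files)
    .filter(([, content]) => content !== null && !content.includes("cclaw-start"))
    .map(([name]) => name);
  if (missing.length === 0) return "";
  return `Routing block missing from: ${missing.join(" ")}. Run 'cclaw sync' to re-inject.`;
}

// Empty notes are dropped; the rest are appended one per line.
function buildContext(base, notes) {
  return [base, ...notes].filter(Boolean).join("\n");
}

const ctx = buildContext(
  `cclaw loaded. Learnings: ${countLearnings('{"a":1}\n{"b":2}\n')} entries.`,
  [
    versionNote("0.10.0", "0.8.0"),
    routingNote({ "AGENTS.md": "<!-- cclaw-start -->", "CLAUDE.md": null }),
  ]
);
console.log(ctx);
```

None of the notes block the session; each one only adds a line to the context, which matches the "purely informational" comment in the hunk.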

package/dist/content/meta-skill.js CHANGED

@@ -209,10 +209,7 @@ When a stage requires user input (approval, choice, direction):
 1. **State the decision** in one sentence.
 2. **Present options** as labeled choices (A, B, C...), one-line each, with trade-off / consequence.
 3. **Mark one option \`(recommended)\`** with a one-line reason. Do NOT use numeric "Completeness" rubrics — pick the option that best closes the decision with the smallest blast radius, lowest irreversible risk, and clearest evidence.
-4. **Use the harness ask-user tool when available:**
-   - Claude Code: \`AskUserQuestion\`
-   - Cursor: \`AskQuestion\` (options array)
-   - Codex/OpenCode: numbered list in plain text (no native ask tool).
+4. **Use the harness ask-user tool when available.** For the exact tool name and fallback, consult \`.cclaw/references/harness-tools/<harness>.md\` (one file per supported harness — claude, cursor, opencode, codex). Summary: Claude Code → \`AskUserQuestion\`; Cursor → \`AskQuestion\`; OpenCode / Codex → plain-text lettered list.
 5. **Wait for response.** Do not proceed until the user picks.
 6. **Commit to the choice.** Once decided, do not re-argue.
@@ -236,6 +233,43 @@ When a stage requires user input (approval, choice, direction):
 If the same approach fails three times in a row (same verification command, same review finding, same tool invocation), STOP and escalate: summarize what you tried, what evidence you have, what hypothesis you are now testing, and ask the user how to proceed. Do not invent a new angle silently on the fourth attempt.
+### Shared Stage Completion Protocol
+Every stage skill ends with a completion block parameterized by four values: \`next\` (next stage or \`done\`), \`gates\` (gate IDs to mark passed), \`artifact\` (file under \`.cclaw/artifacts/\`), and \`mandatory\` (agents required by delegation enforcement). Stage skills print their **Completion Parameters** and then defer to this procedure — do NOT re-print the full procedure per stage.
+When all required gates are satisfied and the artifact is written, execute **in this exact order**:
+0. **Delegation pre-flight** (BLOCKING, only when \`mandatory\` is non-empty).
+   - For each agent in \`mandatory\`: confirm it was dispatched (via Task/delegate) and completed, OR record an explicit waiver with reason in \`.cclaw/state/delegation-log.json\`.
+   - Write a JSON entry per agent: \`{ "stage": "<stage>", "agent": "<name>", "mode": "mandatory", "status": "completed"|"waived", "waiverReason": "<if waived>", "ts": "<ISO timestamp>" }\`.
+   - If the harness does not support delegation, record status \`"waived"\` with reason \`"harness_limitation"\`.
+   - **Do NOT proceed to step 1 until every mandatory agent has an entry in the delegation log.**
+1. **Update \`.cclaw/state/flow-state.json\`:**
+   - Set \`currentStage\` to \`next\` (or leave unchanged when \`next === "done"\`).
+   - Add the current stage to \`completedStages\`.
+   - Move every gate ID in \`gates\` into \`stageGateCatalog.<stage>.passed\`.
+   - Clear \`stageGateCatalog.<stage>.blocked\`.
+   - For each passed gate, add an entry to \`guardEvidence\`: \`"<gate_id>": "<artifact path or excerpt proving the gate>"\`. Do NOT leave \`guardEvidence\` empty.
+2. **Persist artifact** at \`.cclaw/artifacts/<artifact>\`. Do NOT manually copy into \`.cclaw/runs/\`; archival is handled by \`cclaw archive\`.
+3. **Doctor pre-flight** — run \`npx cclaw doctor\` (or the installed cclaw binary). If any check fails, resolve the issue (missing delegation entry, artifact section, gate evidence) and re-run until all checks pass. Do NOT proceed while doctor reports failures.
+4. **Tell the user** (verbatim when \`next\` is a stage; use the flow-complete variant when \`next === "done"\`):
+   > **Stage \`<stage>\` complete.** Next: **<next>** — <one-line next-stage description>.
+   >
+   > Run \`/cc-next\` to continue.
+   Flow-complete variant:
+   > **Flow complete.** All stages finished. The project is ready for release.
+5. **STOP.** Do not load the next stage skill yourself. The user will run \`/cc-next\` when ready (same session or new session).
+### Shared Resume Protocol
+When resuming a stage in a NEW session (artifact exists but gates are not all passed in \`flow-state.json\`):
+1. Read the existing artifact and mark every gate whose evidence is already present in the artifact.
+2. For each unverified gate, ask the user to confirm ONE gate at a time. Do NOT batch multiple gate confirmations in a single message.
+3. Update \`guardEvidence\` for each confirmed gate before proceeding to the next unverified gate.
 ## </EXTREMELY-IMPORTANT>
 ## Invocation Preamble (per turn, non-trivial tasks)
@@ -255,6 +289,40 @@ The preamble exists to prevent silent drift from the user's ask. If the preamble
 Do not re-emit the preamble on every subsequent tool call — once per user turn is sufficient. If the user message changes the goal mid-execution, emit a fresh preamble before acting on the new direction.
+## Engineering Ethos
+Three guardrails apply to every stage, every turn. Internalise them — they trump speed, cleverness, and novelty:
+### Search Before Building
+Before writing new code, a new skill, a new abstraction, or a new artifact section, spend 60–120 seconds checking whether the thing already exists. Order of search:
+1. **Project artifacts** — \`.cclaw/artifacts/**\`, \`docs/**\`, root-level \`README.md\` / \`SPEC.md\` / \`DESIGN.md\`.
+2. **Project knowledge** — \`.cclaw/knowledge.jsonl\` (lessons with matching \`domain\` / \`trigger\`).
+3. **Codebase** — \`rg\` / \`Grep\` for the symbol, function, test, or comment that describes what you're about to add.
+4. **Framework/library primitives** — prefer a stdlib or framework-native affordance over a handwritten helper.
+5. **Existing skill or stage rule** — \`.cclaw/skills/**/SKILL.md\` and \`.cclaw/commands/**/*.md\`.
+Only after all five turn up nothing do you build. Every duplicate helper, redefined type, parallel-but-incompatible artifact section, or re-discovered lesson is a tax on the next five sessions. Record the negative search result (what you looked for, where, and why nothing fit) in the turn's preamble or the stage artifact so future agents don't repeat the hunt.
+### Boil the Lake (scoped minimum-sweep rule)
+"Boil the lake" normally means wasteful, exhaustive work. **cclaw inverts the phrase**: within the current stage, you are expected to sweep *the defined surface exhaustively* — not to stop at the first plausible answer.
+- In \`brainstorm\` / \`scope\` — enumerate every viable approach in the defined option space; name the ones you rejected and why.
+- In \`design\` — trace every data-flow and failure edge across the chosen component boundary, not just the happy path.
+- In \`spec\` — list every acceptance criterion for the in-scope surface; "and similar" / "etc." is banned.
+- In \`tdd\` — exercise every branch / error path / boundary of the slice under test, not only the canonical case.
+- In \`review\` — audit every file touched in the diff, not just the files named in the spec.
+The sweep is bounded by the stage's declared surface. Expanding the surface is a Decision Protocol question, not a silent enlargement.
+### Do Less, Prove More
+When in doubt between adding code / scope / artifact sections and cutting them, cut. The flow already forces you to justify each stage's output — volume is never a proxy for quality. One acceptance criterion with captured evidence beats five without; one labeled architecture diagram beats three generic boxes-and-arrows; one REFACTOR note explaining a concrete trade-off beats a paragraph of filler.
+If a rule, template section, or agent feels ornamental, flag it in \`Operational Self-Improvement\` and propose removal — cclaw's invariant is that every section must pay its tokens back by preventing a specific failure mode.
 ## Operational Self-Improvement (auto-learn)
 cclaw treats **lived friction** as first-class knowledge. When you observe one of the triggers below during a session, append a single JSONL line to \`.cclaw/knowledge.jsonl\` via \`/cc-learn add\` (or queue it for the next \`/cc-learn\` call) — do NOT let the signal evaporate when the session ends.
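
Reviewer note: step 1 of the Shared Stage Completion Protocol describes a state update in prose. A minimal sketch of that update follows, assuming `flow-state.json` has roughly the shape implied by the field names in the text (`currentStage`, `completedStages`, `stageGateCatalog`, `guardEvidence`). The function name `completeStage`, the gate ID, and the stage ordering in the usage example are hypothetical.

```javascript
// Illustrative sketch; the real flow-state schema is not shown in this diff.
function completeStage(flowState, { stage, next, gates, evidence }) {
  // The protocol forbids empty guardEvidence, so refuse gates without proof.
  for (const gate of gates) {
    if (!evidence[gate]) {
      throw new Error(`missing guardEvidence for gate ${gate}`);
    }
  }
  const catalog = (flowState.stageGateCatalog || {})[stage] || { passed: [], blocked: [] };
  const passed = [...new Set([...(catalog.passed || []), ...gates])];
  const completedStages = [...new Set([...(flowState.completedStages || []), stage])];
  const newEvidence = Object.fromEntries(gates.map((gate) => [gate, evidence[gate]]));

  return {
    ...flowState,
    // currentStage is left unchanged when the flow is finished.
    currentStage: next === "done" ? flowState.currentStage : next,
    completedStages,
    stageGateCatalog: {
      ...flowState.stageGateCatalog,
      [stage]: { ...catalog, passed, blocked: [] },
    },
    guardEvidence: { ...flowState.guardEvidence, ...newEvidence },
  };
}

const before = {
  currentStage: "spec",
  completedStages: ["design"],
  stageGateCatalog: { spec: { passed: [], blocked: ["spec_gate_example"] } },
  guardEvidence: {},
};
const after = completeStage(before, {
  stage: "spec",
  next: "tdd",
  gates: ["spec_gate_example"],
  evidence: { spec_gate_example: ".cclaw/artifacts/spec.md" },
});
console.log(JSON.stringify(after, null, 2));
```

The function returns a new object instead of mutating its input, which keeps the previous state available if the doctor pre-flight in step 3 fails.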

package/dist/content/skills.d.ts CHANGED

@@ -1,3 +1,8 @@
 import type { FlowStage } from "../types.js";
+/**
+ * Long-form Wave Execution walkthrough. Rendered once into
+ * \`.cclaw/references/stages/tdd-wave-walkthrough.md\` by the installer.
+ */
+export declare const TDD_WAVE_WALKTHROUGH_MARKDOWN = "# TDD \u2014 Wave Execution Walkthrough\n\nDetailed RED / GREEN / REFACTOR transcript for a 3-task wave. Illustrative\nonly \u2014 do not copy the command names blindly, match them to your stack.\n\n## Wave 1 example tasks\n\n| Task ID | Description | AC | Verification |\n|---|---|---|---|\n| T-1 `[~3m]` | Add `User.emailNormalized` column | AC-1 | `npm test -- users/schema` |\n| T-2 `[~4m]` | Normalize on write in `UserRepo.save` | AC-1 | `npm test -- users/repo` |\n| T-3 `[~3m]` | Reject duplicates in `UserService.signup` | AC-2 | `npm test -- users/service` |\n\n## Execution transcript\n\n### T-1 \u2014 RED\n\n> Run: `npm test -- users/schema` \u2192 **FAIL** (missing column: `emailNormalized`). Captured the failure stack as RED evidence. No production code touched yet.\n\n### T-1 \u2014 GREEN\n\n> Added the column in the schema module. Re-ran `npm test -- users/schema` \u2192 **PASS**. Ran the full suite `npm test` \u2192 **PASS**. Captured both outputs as GREEN evidence.\n\n### T-1 \u2014 REFACTOR\n\n> Extracted the column definition into a shared `NormalizedEmail` type used by T-2/T-3. Re-ran `npm test` \u2192 **PASS**. Captured REFACTOR note: \"Extracted NormalizedEmail type to keep T-2/T-3 DRY; zero behavior change, all tests still green.\"\n\n### T-2 \u2014 RED / GREEN / REFACTOR\n\nWrite the repo test that expects normalised writes, watch it fail (RED), implement normalisation inside `UserRepo.save` only (GREEN), then refactor the normaliser out of the repo into a helper shared with T-3 (REFACTOR).\n\n### T-3 \u2014 RED / GREEN / REFACTOR\n\nWrite the service-level duplicate test that expects a rejection, watch it fail (RED), add the duplicate check in `UserService.signup` (GREEN), refactor the error message into a named constant (REFACTOR).\n\n## Wave gate check\n\nAfter T-3 REFACTOR, before declaring Wave 1 done:\n\n1. Run the full suite (`npm test`) one final time \u2192 **PASS** captured as wave-exit evidence.\n2. Verify the TDD artifact contains RED, GREEN, and REFACTOR evidence for T-1, T-2, **and** T-3. No partial waves.\n3. Only now mark Wave 1 complete. Wave 2 cannot start until this step.\n\n## When to stop mid-wave (do NOT push through)\n\n- A RED test fails for a reason you did not predict (e.g. an unrelated flaky test) \u2192 **pause**, diagnose, log an operational-self-improvement entry, and decide with the user before proceeding.\n- A GREEN step would require touching code outside the task's acceptance criterion \u2192 **pause**, the task is scoped wrong; adjust the plan or open a follow-up task.\n- The same RED failure reappears after a GREEN change \u2192 **escalate** per the 3-attempts rule; do not keep patching.\n";
 export declare function stageSkillFolder(stage: FlowStage): string;
 export declare function stageSkillMarkdown(stage: FlowStage): string;
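
Reviewer note: the wave gate check embedded in the walkthrough string (full suite passes, every task has RED, GREEN, and REFACTOR evidence, no partial waves) can be expressed as a small predicate. This is a sketch under assumptions: the `evidence` object shape and the name `waveGateCheck` are hypothetical, since the diff does not show how the TDD artifact is parsed.

```javascript
// Illustrative sketch; the evidence shape is an assumption, not the package's format.
const PHASES = ["RED", "GREEN", "REFACTOR"];

// taskIds: e.g. ["T-1", "T-2", "T-3"]
// evidence: { "T-1": { RED: "...", GREEN: "...", REFACTOR: "..." }, ... }
// fullSuitePassed: result of the final full-suite run captured as wave-exit evidence
function waveGateCheck(taskIds, evidence, fullSuitePassed) {
  const missing = [];
  for (const id of taskIds) {
    for (const phase of PHASES) {
      const entry = evidence[id] && evidence[id][phase];
      if (!entry) missing.push(`${id}:${phase}`);
    }
  }
  // A wave is complete only when nothing is missing AND the final suite run passed.
  return { complete: fullSuitePassed && missing.length === 0, missing };
}

const result = waveGateCheck(
  ["T-1", "T-2"],
  {
    "T-1": { RED: "fail log", GREEN: "pass log", REFACTOR: "note" },
    "T-2": { RED: "fail log", GREEN: "pass log" },
  },
  true
);
console.log(result);
```

Reporting the missing `task:phase` pairs, not just a boolean, makes it clear which slice still needs evidence before the next wave can start.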

package/dist/content/skills.js CHANGED

@@ -1,5 +1,5 @@
 import { RUNTIME_ROOT } from "../constants.js";
-import { stageExamples, stageGoodBadExamples } from "./examples.js";
+import { STAGE_EXAMPLES_REFERENCE_DIR, stageDomainExamples, stageExamples, stageGoodBadExamples } from "./examples.js";
 import { selfImprovementBlock } from "./learnings.js";
 import { stageAutoSubagentDispatch, stageSchema } from "./stage-schema.js";
 function rationalizationTable(stage) {
@@ -146,6 +146,12 @@ On session stop or stage completion, the agent should write delegation entries t
 `;
 }
 const VERIFICATION_STAGES = ["tdd", "review", "ship"];
+/**
+ * Short inline summary of Wave Execution Mode. The detailed 3-task
+ * walkthrough (RED/GREEN/REFACTOR transcript per slice) lives in the
+ * companion reference file so the always-rendered skill body stays under
+ * the 400-line soft budget.
+ */
 function waveExecutionModeBlock(stage) {
     const schema = stageSchema(stage);
     if (!schema.waveExecutionAllowed) {
@@ -155,60 +161,103 @@ function waveExecutionModeBlock(stage) {
 After plan approval (**WAIT_FOR_CONFIRM** / \`plan_wait_for_confirm\` satisfied), process **all tasks in the current dependency wave** sequentially: **RED → GREEN → REFACTOR** per task, recording evidence per slice. **Stop** only on **BLOCKED**, a test failure that **requires user input**, or **wave completion** (every task in the wave has the required RED / GREEN / REFACTOR evidence per the plan artifact).
+**Wave gate check (before marking a wave complete):**
+1. Run the **full suite** one final time → PASS, captured as wave-exit evidence.
+2. Verify the TDD artifact contains RED, GREEN, and REFACTOR evidence for every task in the wave. No partial waves.
+3. Only then declare the wave complete. The next wave cannot start until this step.
+**When to stop mid-wave (do NOT push through):**
+- A RED test fails for an unpredicted reason (e.g. an unrelated flaky test) → **pause**, diagnose, log an operational-self-improvement entry.
+- A GREEN step would require touching code outside the task's acceptance criterion → **pause**, the task is scoped wrong.
+- The same RED failure reappears after a GREEN change → **escalate** per the 3-attempts rule.
+> **Full 3-task walkthrough transcript** (RED/GREEN/REFACTOR per slice, with wave gate check): see \`.cclaw/${STAGE_EXAMPLES_REFERENCE_DIR}/tdd-wave-walkthrough.md\`.
 `;
 }
+/**
+ * Long-form Wave Execution walkthrough. Rendered once into
+ * \`.cclaw/references/stages/tdd-wave-walkthrough.md\` by the installer.
+ */
+export const TDD_WAVE_WALKTHROUGH_MARKDOWN = `# TDD — Wave Execution Walkthrough
+Detailed RED / GREEN / REFACTOR transcript for a 3-task wave. Illustrative
+only — do not copy the command names blindly, match them to your stack.
+## Wave 1 example tasks
+| Task ID | Description | AC | Verification |
+|---|---|---|---|
+| T-1 \`[~3m]\` | Add \`User.emailNormalized\` column | AC-1 | \`npm test -- users/schema\` |
+| T-2 \`[~4m]\` | Normalize on write in \`UserRepo.save\` | AC-1 | \`npm test -- users/repo\` |
+| T-3 \`[~3m]\` | Reject duplicates in \`UserService.signup\` | AC-2 | \`npm test -- users/service\` |
+## Execution transcript
+### T-1 — RED
+> Run: \`npm test -- users/schema\` → **FAIL** (missing column: \`emailNormalized\`). Captured the failure stack as RED evidence. No production code touched yet.
+### T-1 — GREEN
+> Added the column in the schema module. Re-ran \`npm test -- users/schema\` → **PASS**. Ran the full suite \`npm test\` → **PASS**. Captured both outputs as GREEN evidence.
+### T-1 — REFACTOR
+> Extracted the column definition into a shared \`NormalizedEmail\` type used by T-2/T-3. Re-ran \`npm test\` → **PASS**. Captured REFACTOR note: "Extracted NormalizedEmail type to keep T-2/T-3 DRY; zero behavior change, all tests still green."
+### T-2 — RED / GREEN / REFACTOR
+Write the repo test that expects normalised writes, watch it fail (RED), implement normalisation inside \`UserRepo.save\` only (GREEN), then refactor the normaliser out of the repo into a helper shared with T-3 (REFACTOR).
+### T-3 — RED / GREEN / REFACTOR
+Write the service-level duplicate test that expects a rejection, watch it fail (RED), add the duplicate check in \`UserService.signup\` (GREEN), refactor the error message into a named constant (REFACTOR).
+## Wave gate check
+After T-3 REFACTOR, before declaring Wave 1 done:
+1. Run the full suite (\`npm test\`) one final time → **PASS** captured as wave-exit evidence.
+2. Verify the TDD artifact contains RED, GREEN, and REFACTOR evidence for T-1, T-2, **and** T-3. No partial waves.
+3. Only now mark Wave 1 complete. Wave 2 cannot start until this step.
+## When to stop mid-wave (do NOT push through)
+- A RED test fails for a reason you did not predict (e.g. an unrelated flaky test) → **pause**, diagnose, log an operational-self-improvement entry, and decide with the user before proceeding.
+- A GREEN step would require touching code outside the task's acceptance criterion → **pause**, the task is scoped wrong; adjust the plan or open a follow-up task.
+- The same RED failure reappears after a GREEN change → **escalate** per the 3-attempts rule; do not keep patching.
+`;
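The wave gate check in the walkthrough above (full suite green, plus RED, GREEN, and REFACTOR evidence for every task) can be sketched as a small predicate. This is only an illustration, not part of the package: the function names `missingWaveEvidence` and `canCloseWave`, and the evidence shape keyed by task ID, are assumptions made for the example.

```javascript
// Hypothetical evidence shape (an assumption, not cclaw's real artifact format):
// { "T-1": { RED: "...", GREEN: "...", REFACTOR: "..." }, "T-2": { ... } }
const PHASES = ["RED", "GREEN", "REFACTOR"];

// Returns "taskId:PHASE" for every phase that has no non-empty evidence string.
function missingWaveEvidence(taskIds, evidence) {
  const missing = [];
  for (const taskId of taskIds) {
    const entry = evidence[taskId] ?? {};
    for (const phase of PHASES) {
      const value = entry[phase];
      if (typeof value !== "string" || value.trim() === "") {
        missing.push(`${taskId}:${phase}`);
      }
    }
  }
  return missing;
}

// A wave may close only if the final full-suite run passed and no evidence is missing.
function canCloseWave(taskIds, evidence, fullSuitePassed) {
  return fullSuitePassed && missingWaveEvidence(taskIds, evidence).length === 0;
}
```

A partial wave, for example one where T-3 has RED evidence but no GREEN or REFACTOR entry, makes `canCloseWave` return `false`, which matches the "No partial waves" rule in step 2.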
 function stageCompletionProtocol(schema) {
     const stage = schema.stage;
     const gateIds = schema.requiredGates.map((g) => g.id);
     const gateList = gateIds.map((id) => `\`${id}\``).join(", ");
-    const nextStage = schema.next === "done" ? null : schema.next;
+    const nextStage = schema.next === "done" ? "done" : schema.next;
     const mandatory = schema.mandatoryDelegations;
-    const delegationLogRel = `${RUNTIME_ROOT}/state/delegation-log.json`;
-    const stateUpdate = nextStage
-        ? `   - Set \`currentStage\` to \`"${nextStage}"\`
-   - Add \`"${stage}"\` to \`completedStages\` array
-   - Move all gate IDs for this stage (${gateList}) into \`stageGateCatalog.${stage}.passed\`
-   - Clear \`stageGateCatalog.${stage}.blocked\``
-        : `   - Add \`"${stage}"\` to \`completedStages\` array
-   - Move all gate IDs for this stage (${gateList}) into \`stageGateCatalog.${stage}.passed\`
-   - Clear \`stageGateCatalog.${stage}.blocked\``;
-    const delegationBlock = mandatory.length > 0
-        ? `0. **Delegation pre-flight** (BLOCKING):
-   - Mandatory agents for this stage: ${mandatory.map((a) => `\`${a}\``).join(", ")}.
-   - For each mandatory agent: confirm it was dispatched (via Task/delegate) and completed, OR record an explicit waiver with reason in \`${delegationLogRel}\`.
-   - Write a JSON entry per agent: \`{ "stage": "${stage}", "agent": "<name>", "mode": "mandatory", "status": "completed"|"waived", "waiverReason": "<if waived>", "ts": "<ISO timestamp>" }\`.
-   - If the harness does not support delegation, record status \`"waived"\` with reason \`"harness_limitation"\`.
-   - **Do NOT proceed to step 1 until every mandatory agent has an entry in the delegation log.**
-`
-        : "";
-    let nextAction;
-    if (nextStage) {
-        const nextSchema = stageSchema(nextStage);
-        const nextDescription = nextSchema.skillDescription.charAt(0).toLowerCase() + nextSchema.skillDescription.slice(1);
-        nextAction = `4. Tell the user:\n\n   > **Stage \`${stage}\` complete.** Next: **${nextStage}** — ${nextDescription}\n   >\n   > Run \`/cc-next\` to continue.`;
-    }
-    else {
-        nextAction = `4. Tell the user:\n\n   > **Flow complete.** All stages finished. The project is ready for release.`;
-    }
+    const mandatoryList = mandatory.length > 0 ? mandatory.map((a) => `\`${a}\``).join(", ") : "none";
+    const nextDescription = schema.next === "done"
+        ? "flow complete — release cut and handoff signed off"
+        : (() => {
+            const nextSchema = stageSchema(schema.next);
+            return nextSchema.skillDescription.charAt(0).toLowerCase() + nextSchema.skillDescription.slice(1);
+        })();
     return `## Stage Completion Protocol
-When all required gates are satisfied and the artifact is written:
+Apply the **Shared Stage Completion Protocol** from \`.cclaw/skills/using-cclaw/SKILL.md\` with these parameters — do NOT re-derive the generic steps here.
-${delegationBlock}1. **Update \`${RUNTIME_ROOT}/state/flow-state.json\`:**
-${stateUpdate}
-   - For each passed gate, add an entry to \`guardEvidence\`: \`"<gate_id>": "<artifact path or excerpt proving the gate>"\`. Do NOT leave \`guardEvidence\` empty.
-2. **Persist artifact** at \`${RUNTIME_ROOT}/artifacts/${schema.artifactFile}\`. Do NOT manually copy into \`${RUNTIME_ROOT}/runs/\`; archival is handled by \`cclaw archive\`.
-3. **Doctor pre-flight** — Run \`npx cclaw doctor\` (or the installed cclaw binary). If any check fails, resolve the issue (missing delegation entry, artifact section, gate evidence) and re-run until all checks pass. Do NOT proceed to the next step while doctor reports failures.
-${nextAction}
+**Completion Parameters**
+- \`stage\` — \`${stage}\`
+- \`next\` — \`${nextStage}\` (${nextDescription})
+- \`gates\` — ${gateList}
+- \`artifact\` — \`${RUNTIME_ROOT}/artifacts/${schema.artifactFile}\`
+- \`mandatory\` — ${mandatoryList}
-**STOP.** Do not load the next stage skill yourself. The user will run \`/cc-next\` when ready (same session or new session).
+When all required gates are satisfied and the artifact is written, execute the shared procedure (delegation pre-flight → flow-state update → artifact persistence → \`npx cclaw doctor\` → user handoff → STOP) using the parameters above. If any check fails, resolve the issue and re-run before proceeding.
 ## Resume Protocol
-When resuming a stage in a NEW session (artifact exists but gates are not all passed in flow-state):
-1. Read the existing artifact and check which gates can be verified from artifact evidence.
-2. For each unverified gate, ask the user to confirm ONE gate at a time. Do NOT batch multiple gate confirmations in a single message.
-3. Update \`guardEvidence\` for each confirmed gate before proceeding.
+When resuming this stage in a NEW session (artifact exists but not all of ${gateList} are passed), follow the **Shared Resume Protocol** in \`.cclaw/skills/using-cclaw/SKILL.md\` — confirm one gate at a time, update \`guardEvidence\` for each, never batch confirmations.
 `;
 }
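To make the new behaviour of `stageCompletionProtocol` easier to picture, here is a minimal standalone sketch of how the **Completion Parameters** list might be rendered for a stage. It is not the package's code: `renderCompletionParameters`, the `runtimeRoot` and `describeNext` arguments, and the colon separator are assumptions introduced for the example, and `describeNext` stands in for the `stageSchema(...)` lookup used in the real function.

```javascript
// Wrap a value in backticks for Markdown inline code.
const tick = (s) => "`" + s + "`";

// Hypothetical helper: builds the parameter list that the stage skill hands
// to the shared completion protocol. `describeNext` is an assumed callback
// that returns a short description of the next stage.
function renderCompletionParameters(schema, runtimeRoot, describeNext) {
  const gateList = schema.requiredGates.map((g) => tick(g.id)).join(", ");
  const mandatoryList = schema.mandatoryDelegations.length > 0
    ? schema.mandatoryDelegations.map((a) => tick(a)).join(", ")
    : "none";
  // The final stage has no successor schema, so it gets a fixed description.
  const nextDescription = schema.next === "done"
    ? "flow complete"
    : describeNext(schema.next);
  return [
    "**Completion Parameters**",
    `- ${tick("stage")}: ${tick(schema.stage)}`,
    `- ${tick("next")}: ${tick(schema.next)} (${nextDescription})`,
    `- ${tick("gates")}: ${gateList}`,
    `- ${tick("artifact")}: ${tick(`${runtimeRoot}/artifacts/${schema.artifactFile}`)}`,
    `- ${tick("mandatory")}: ${mandatoryList}`,
  ].join("\n");
}
```

The point of the design in the diff is that each stage skill now emits only these parameters, while the generic steps (delegation pre-flight, flow-state update, doctor run, handoff) live once in the shared skill file instead of being repeated per stage.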
 function stageTransitionAutoAdvanceBlock(schema) {
@@ -335,6 +384,14 @@ description: "${schema.skillDescription}"
 # ${schema.skillName}
+<EXTREMELY-IMPORTANT>
+**IRON LAW — ${stage.toUpperCase()}:** ${schema.ironLaw}
+If you are about to violate the Iron Law, STOP. No amount of urgency, partial progress, or clever reinterpretation overrides it. Escalate via the Decision Protocol or abandon the stage.
+</EXTREMELY-IMPORTANT>
 ${quickStartBlock(stage)}
 ## Overview
 ${schema.purpose}
@@ -364,6 +421,7 @@ You MUST complete these steps in order:
 ${checklistItems}
 ${stageGoodBadExamples(stage)}
+${stageDomainExamples(stage)}
 ${stageExamples(stage)}
 ${namedAntiPatternBlock(stage)}
 ${cognitivePatternsList(stage)}
@@ -391,11 +449,25 @@ ${decisionRecordBlock(stage)}
 ## Common Rationalizations
 ${rationalizationTable(stage)}
-## Anti-Patterns
-${[...schema.antiPatterns, ...schema.blockers].map((item) => `- ${item}`).join("\n")}
-## Red Flags
-${schema.redFlags.map((item) => `- ${item}`).join("\n")}
+## Anti-Patterns & Red Flags
+> One consolidated list of observable failure modes for this stage. Mix of
+> behavioural anti-patterns (things you might do wrong) and red-flag
+> signals (things you might notice going wrong). Dedup-merged so no item
+> appears twice.
+${(() => {
+        const merged = [];
+        const seen = new Set();
+        for (const item of [...schema.antiPatterns, ...schema.blockers, ...schema.redFlags]) {
+            const key = item.trim().toLowerCase();
+            if (seen.has(key))
+                continue;
+            seen.add(key);
+            merged.push(item);
+        }
+        return merged.map((item) => `- ${item}`).join("\n");
+    })()}
 ${completionStatusBlock(stage)}
 ## Verification