cclaw-cli 0.48.25 → 0.48.26

@@ -43,19 +43,26 @@ same session, or save/discard the backlog.
  1. **Resume check.** Glob \`${IDEATE_ARTIFACT_GLOB}\`. If any artifact
  has been modified within the last ${IDEATE_RESUME_WINDOW_DAYS} days,
  offer the user: continue that backlog, start fresh, or cancel.
- 2. **Scan repo signals:**
- - open TODO/FIXME/XXX/HACK notes,
- - flaky or failing tests,
- - oversized modules / complexity hotspots,
- - docs drift vs changed code,
- - repeated entries in \`${RUNTIME_ROOT}/knowledge.jsonl\`.
- 3. **Produce 5-10 candidates** with impact (High/Medium/Low),
- effort (S/M/L), confidence (High/Medium/Low), and one evidence path
- per candidate.
- 4. **Rank by impact/effort**, recommend the top item.
- 5. **Write the artifact** at
+ 2. **Mode classification.** Explicitly classify subject:
+ \`repo-grounded\` / \`elsewhere-software\` / \`elsewhere-non-software\`.
+ Do not assume repo-grounded by default.
+ 3. **Mode-aware grounding (parallel).**
+ - Repo-grounded: repo signal scan + \`${RUNTIME_ROOT}/knowledge.jsonl\`
+ repetition scan.
+ - Elsewhere-software: docs-first grounding (Context7 and official docs).
+ - Elsewhere-non-software: constraints and objective grounding.
+ 4. **Divergent ideation frames (parallel).** Generate candidates with at least
+ 4 distinct frames: pain/friction, inversion, assumption-break, leverage,
+ cross-domain analogy, constraint-flip.
+ 5. **Adversarial critique pass.** For each candidate, write the strongest
+ counter-argument, kill weak ideas, and keep survivors only.
+ 6. **Produce 5-10 survivors** with impact (High/Medium/Low),
+ effort (S/M/L), confidence (High/Medium/Low), and one evidence path per
+ survivor.
+ 7. **Rank by impact/effort**, recommend the top survivor.
+ 8. **Write the artifact** at
  \`${IDEATE_ARTIFACT_PATTERN}\` using the schema in the skill.
- 6. **Present the handoff prompt** with four concrete options — not A/B/C
+ 9. **Present the handoff prompt** with four concrete options — not A/B/C
  letters. Default = "Start /cc on the top recommendation".

  ## Headless mode
@@ -99,7 +106,7 @@ repository. Will persist a ranked backlog to

  ## Protocol

- ### Phase 0 — Resume check
+ ### Phase 0 — Resume and classify

  1. Use the harness's file-glob tool (\`Glob\` pattern
  \`${IDEATE_ARTIFACT_GLOB}\` or equivalent \`ls\`/\`find\`).
@@ -112,57 +119,102 @@ repository. Will persist a ranked backlog to
  on disk for history.
  - **Cancel** — stop; do not scan or write anything.
  4. If no recent artifact exists, proceed to Phase 1 silently.
-
- ### Phase 1 — Collect evidence
-
- Scan the current repo. Examples of signals (not exhaustive):
-
- - \`rg -n 'TODO|FIXME|XXX|HACK|TBD'\` grouped by file.
- - Test-runner output (\`npm test\`, \`pytest\`, \`go test ./...\`) — note
- failures, timeouts, deprecation warnings.
- - Module size outliers (\`wc -l\` or \`du\`) with weak direct test coverage.
- - Docs drift: check that \`README.md\` / \`docs/\` reference files that
- still exist and flags/APIs that still match \`src/\`.
- - \`${RUNTIME_ROOT}/knowledge.jsonl\` entries with \`type: "heuristic"\`
- or repeated \`subject:\` values.
-
- Record each finding with the exact file path or command that produced it.
-
- ### Phase 2 — Build candidates
-
- For each high-signal finding, construct a candidate:
-
- - **ID** — \`I-1\`, \`I-2\`, …
- - **Title** — one short imperative phrase
- - **Impact** — High / Medium / Low
- - **Effort** — S / M / L
- - **Confidence** — High / Medium / Low
- - **Evidence** — path(s) or command output, inline if short
- - **Proposed handoff** — the exact \`/cc <phrase>\` the user would run
- to act on this candidate
-
- Aim for 5–10 candidates. Do not invent candidates without evidence.
-
- ### Phase 3 — Rank and write the artifact
-
- 1. Sort by impact/effort ratio; break ties with confidence.
- 2. Compute the artifact filename:
+ 5. Classify the ideation mode before grounding:
+ - \`repo-grounded\` — explicitly tied to this repository.
+ - \`elsewhere-software\` — software problem not tied to this repository.
+ - \`elsewhere-non-software\` — process/business/non-software problem.
+ 6. Record the chosen mode in the artifact.
+
+ ### Phase 1 — Mode-aware grounding
+
+ Run grounding in parallel where available:
+
+ - For \`repo-grounded\`:
+ - \`rg -n 'TODO|FIXME|XXX|HACK|TBD'\` grouped by file.
+ - Test-runner output (\`npm test\`, \`pytest\`, \`go test ./...\`) — note
+ failures, timeouts, deprecation warnings.
+ - Module size outliers (\`wc -l\` or \`du\`) with weak direct test coverage.
+ - Docs drift: check that \`README.md\` / \`docs/\` reference files that still
+ exist and flags/APIs that still match \`src/\`.
+ - \`${RUNTIME_ROOT}/knowledge.jsonl\` entries with \`type: "heuristic"\`
+ or repeated \`subject:\` values.
+ - For \`elsewhere-software\`:
+ - Gather current framework/library docs first.
+ - Add one comparison scan for established solutions.
+ - For \`elsewhere-non-software\`:
+ - Capture objective, constraints, and measured friction before proposing fixes.
+
+ Record each finding with exact evidence (path, command, or doc source).
+
+ ### Phase 2 — Divergent ideation
+
+ Generate candidate ideas by frame, in parallel when possible:
+
+ - pain/friction
+ - inversion
+ - assumption-break
+ - leverage
+ - cross-domain analogy
+ - constraint-flip
+
+ Require at least 4 distinct frames in every run. Avoid frame-collapse
+ (same idea rewritten 6 times). Keep raw outputs for auditability.
+
+ ### Phase 3 — Critique all, keep survivors
+
+ For each raw candidate:
+
+ - Write the strongest argument **against** this idea.
+ - Identify disqualifiers (duplicate, weak evidence, poor ROI, wrong timing).
+ - Mark as \`survivor\` or \`critiqued-out\`.
+
+ Only survivors advance to ranking.
+
+ ### Phase 4 — Rank and write the artifact
+
+ 1. Keep 5–10 survivors.
+ 2. For each survivor, include:
+ - **ID** — \`I-1\`, \`I-2\`, …
+ - **Title** — one short imperative phrase
+ - **Impact** — High / Medium / Low
+ - **Effort** — S / M / L
+ - **Confidence** — High / Medium / Low
+ - **Evidence** — path(s) or command output, inline if short
+ - **Counter-argument** — strongest concern that survived
+ - **Proposed handoff** — exact \`/cc <phrase>\`
+ 3. Sort by impact/effort ratio; break ties with confidence.
+ 4. Compute the artifact filename:
  - \`slug\` = first 3–5 words of the top recommendation, lowercase,
  non-alphanumeric collapsed to \`-\`, trimmed. When ideate mode is
  focus-hinted (user passed an argument), use the focus hint instead.
  - \`date\` = today in \`YYYY-MM-DD\` (local time).
  - Path = \`.cclaw/artifacts/ideate-<date>-<slug>.md\`.
- 3. Use the harness's write-file tool (\`Write\`, \`apply_patch\`, or shell
+ 5. Use the harness's write-file tool (\`Write\`, \`apply_patch\`, or shell
  \`cat <<EOF > path\`) to create the artifact with this schema:

  \`\`\`markdown
  # Ideation — <date>

  **Focus:** <user-supplied focus or "open-ended scan">
+ **Mode:** <repo-grounded | elsewhere-software | elsewhere-non-software>
  **Generated:** <ISO-8601 timestamp>
+ **Frames used:** <comma-separated list>
+ **Raw candidates:** <N>
+ **Critiqued out:** <M>
  **Recommendation:** I-1

- ## Ranked backlog
+ ## Grounding evidence
+
+ - <signal and evidence>
+ - ...
+
+ ## Critiqued out
+
+ | Idea | Why it was rejected |
+ |---|---|
+ | ... | ... |
+
+ ## Ranked survivors

  | ID | Improvement | Impact | Effort | Confidence | Evidence |
  |---|---|---|---|---|---|
@@ -173,14 +225,15 @@ Aim for 5–10 candidates. Do not invent candidates without evidence.

  ### I-1 — Fix feature-worktree test timeouts
  - **Evidence:** \`npm test\` hangs 40s on tests/unit/feature-system.test.ts:31.
+ - **Counter-argument:** Fix may hide deeper orchestration race.
  - **Handoff:** \`/cc Fix feature-worktree test timeouts on macOS\`

  ### I-2 — …
  \`\`\`

- 4. Confirm in chat: "Wrote <path>."
+ 6. Confirm in chat: "Wrote <path>."

- ### Phase 4 — Handoff prompt
+ ### Phase 5 — Handoff prompt

  Present **one** structured ask using the harness's native tool
  (${STRUCTURED_ASK_TOOLS}). Each option must name the concrete follow-up —
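The slug-and-path rule from Phase 4 above can be sketched as a small helper. This is an illustrative reconstruction of the rule as stated in the skill text, not code from the package: the function name `ideateArtifactPath` and the fixed five-word cut (the text allows 3–5) are assumptions.

```javascript
// Hypothetical sketch of the Phase 4 filename rule; helper name invented.
function ideateArtifactPath(topRecommendation, date) {
  const slug = topRecommendation
    .toLowerCase()
    .split(/\s+/)
    .slice(0, 5)                  // "first 3-5 words"; 5 chosen here
    .join(" ")
    .replace(/[^a-z0-9]+/g, "-")  // collapse non-alphanumerics to "-"
    .replace(/^-+|-+$/g, "");     // trim leading/trailing dashes
  return `.cclaw/artifacts/ideate-${date}-${slug}.md`;
}

console.log(ideateArtifactPath("Fix feature-worktree test timeouts on macOS", "2024-06-01"));
// → .cclaw/artifacts/ideate-2024-06-01-fix-feature-worktree-test-timeouts-on.md
```

When a focus hint was passed, the skill says to slug the hint instead of the top recommendation; that branch is omitted here.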
@@ -201,7 +254,7 @@ Required options, in this order:
  When the structured-ask tool is unavailable, fall back to a plain-text
  lettered list with the same four labels. Do not invent extra options.

- ### Phase 5 — Execute the choice
+ ### Phase 6 — Execute the choice

  - **Start /cc on I-1** or **different candidate:** announce
  "Handing off to /cc <phrase>" and load the \`using-cclaw\` router
@@ -77,10 +77,8 @@ function reviewSectionsBlock(stage, track) {
  const sections = schema.reviewSections
  .map((sec) => {
  const points = sec.evaluationPoints.map((p) => `- ${p}`).join("\n");
- const stop = sec.stopGate
- ? "\n\n**STOP:** resolve findings in this section before moving forward."
- : "";
- return `### ${sec.title}\n${points}${stop}`;
+ const title = sec.stopGate ? `${sec.title} (STOP gate)` : sec.title;
+ return `### ${title}\n${points}`;
  })
  .join("\n\n");
  return `## Review Sections
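The hunk above replaces the trailing `**STOP:**` banner with a `(STOP gate)` suffix on the section title. A minimal standalone sketch of the new rendering; the two-entry `schema` here is invented for illustration (real `reviewSections` entries carry more fields):

```javascript
// Simplified schema, for illustration only.
const schema = {
  reviewSections: [
    { title: "Architecture Review", stopGate: true, evaluationPoints: ["Boundaries"] },
    { title: "Performance Review", stopGate: false, evaluationPoints: ["N+1 queries"] },
  ],
};

// Same mapping logic as the "+" lines in the diff above.
const sections = schema.reviewSections
  .map((sec) => {
    const points = sec.evaluationPoints.map((p) => `- ${p}`).join("\n");
    const title = sec.stopGate ? `${sec.title} (STOP gate)` : sec.title;
    return `### ${title}\n${points}`;
  })
  .join("\n\n");

console.log(sections);
// A stop-gated section renders as "### Architecture Review (STOP gate)";
// a non-gated one keeps its plain title.
```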
@@ -21,11 +21,17 @@ export const BRAINSTORM = {
  ],
  checklist: [
  "**Explore project context** — check files, docs, recent commits to understand what already exists.",
+ "**Assess depth tier first** — classify the request as Lightweight / Standard / Deep. Lightweight = narrow/localized ask; Standard = cross-module but bounded; Deep = platform or multi-surface product change.",
  "**Assess scope** — if the request covers multiple independent subsystems, flag it and help decompose before deep-diving. Each sub-project gets its own brainstorm cycle.",
+ "**Short-circuit gate** — if requirements are already concrete and unambiguous, write a minimal brainstorm stub (problem + approved intent + constraints) and hand off to scope.",
  "**Ask clarifying questions** — one at a time, understand purpose, constraints, and success criteria. Prefer multiple choice when possible. Each question should change what we build, not just gather trivia.",
- "**Propose 2-3 architecturally distinct approaches** — with real trade-offs and your recommendation. Lead with the recommended option and explain why.",
+ "**Propose 2-3 architecturally distinct approaches** — with real trade-offs and no recommendation yet. At least one option must be a higher-upside challenger that raises ambition vs the user's initial ask.",
+ "**Collect user reaction** — ask which approach feels closest and what concerns remain before stating your recommendation.",
+ "**Recommend only after reaction** — present final recommendation with rationale that explicitly references user feedback.",
  "**Present design by sections** — scale each section to its complexity. Ask after each section whether it looks right so far. Cover: architecture, key components, data flow.",
+ "**Optional visual companion** — when architecture/data flow complexity is medium+, offer a compact diagram (ASCII or Mermaid) before artifact write-up.",
  "**Write artifact** to `.cclaw/artifacts/01-brainstorm.md`.",
+ "**Document-quality pass** — run a brief adversarial review of the artifact (gaps, contradictions, missing trade-offs), then patch before user review.",
  "**Self-review** — scan for placeholders/TODOs, check internal consistency, verify scope is focused, resolve any ambiguity.",
  "**User reviews artifact** — ask the user to review the written artifact and explicitly approve or request changes.",
  "**Handoff** — only then complete stage and point to `/cc-next`."
@@ -37,18 +43,24 @@ export const BRAINSTORM = {
  "After 2-3 questions, summarize your emerging understanding before continuing so the user can correct course early.",
  "Each question should change a concrete design decision. Litmus test: if the two most likely answers do not lead to different architectures, make the choice yourself and state it.",
  "Present design in sections scaled to their complexity — a few sentences for simple aspects, detailed for nuanced ones. Get approval after each section.",
- "When proposing approaches, lead with your recommendation and explain why.",
+ "When proposing approaches, do NOT reveal your recommendation yet. Present options first, gather reaction, then recommend.",
+ "At least one approach must be a higher-upside challenger; avoid three same-altitude variants.",
  "State explicitly what is being approved when requesting approval.",
  "Run a brief self-review (placeholders, contradictions, scope, ambiguity) before presenting the artifact.",
  "**STOP.** Wait for explicit user approval after writing the artifact. Do NOT auto-advance."
  ],
  process: [
  "Explore project context: check files, docs, recent activity.",
+ "Classify depth tier (Lightweight / Standard / Deep) before diving.",
  "Assess scope: flag if request is too broad, help decompose first.",
+ "Apply short-circuit when requirements are already concrete enough for scope.",
  "Ask clarifying questions one at a time — focus on purpose, constraints, success criteria.",
- "Propose 2-3 architecturally distinct approaches with trade-offs and a recommendation.",
+ "Propose 2-3 architecturally distinct approaches with trade-offs (one must be a higher-upside challenger).",
+ "Collect user reaction before giving your recommendation.",
+ "Recommend after reaction and explain how feedback changed the recommendation.",
  "Present design sections incrementally, get approval after each.",
  "Write approved direction to `.cclaw/artifacts/01-brainstorm.md`.",
+ "Run document-quality pass to close contradictions and weak trade-off reasoning.",
  "Self-review: placeholder scan, internal consistency, scope check, ambiguity check.",
  "Request explicit user approval of the artifact.",
  "Handoff to scope only after approval is explicit."
@@ -62,7 +74,9 @@ export const BRAINSTORM = {
  "Artifact written to `.cclaw/artifacts/01-brainstorm.md`.",
  "Project context was explored (files, docs, or recent activity referenced).",
  "Clarifying questions and their answers are captured.",
- "2-3 approaches with trade-offs and recommendation are recorded.",
+ "2-3 approaches with trade-offs are recorded, including one higher-upside challenger option.",
+ "User reaction to approaches is captured before final recommendation.",
+ "Final recommendation explicitly reflects user reaction.",
  "Approved direction and approval marker are present.",
  "Assumptions and open questions are captured (or explicitly marked as none)."
  ],
@@ -96,6 +110,8 @@ export const BRAINSTORM = {
  "Asking questions without exploring existing project context first",
  "Asking bundled or purely informational questions that don't change decisions",
  "Proposing cosmetic option variants instead of architecturally distinct approaches",
+ "Revealing recommendation before collecting user reaction",
+ "Three same-altitude approaches with no higher-upside challenger",
  "Jumping directly into implementation",
  "Requesting approval without stating what decision is being approved",
  "Questions that only gather preferences without design impact",
@@ -122,9 +138,12 @@ export const BRAINSTORM = {
  { section: "Context", required: true, validationRule: "Must reference project state and relevant existing code or patterns." },
  { section: "Problem", required: true, validationRule: "Must define what we're solving, success criteria, and constraints." },
  { section: "Clarifying Questions", required: false, validationRule: "Must capture question, answer, and decision impact for each clarifying question." },
- { section: "Approaches", required: true, validationRule: "Must compare 2-3 architecturally distinct options with real trade-offs and recommendation." },
- { section: "Selected Direction", required: true, validationRule: "Must include the selected approach, rationale, and explicit approval marker." },
+ { section: "Approach Tier", required: false, validationRule: "Must classify depth as Lightweight/Standard/Deep and explain why." },
+ { section: "Approaches", required: true, validationRule: "Must compare 2-3 architecturally distinct options with real trade-offs and include one higher-upside challenger option." },
+ { section: "Approach Reaction", required: false, validationRule: "Must summarize user reaction before recommendation, including concerns that changed direction." },
+ { section: "Selected Direction", required: true, validationRule: "Must include the selected approach, rationale tied to user reaction, and explicit approval marker." },
  { section: "Design", required: false, validationRule: "Must cover architecture, key components, and data flow scaled to complexity." },
+ { section: "Visual Companion", required: false, validationRule: "If architecture/data-flow complexity is medium+, include compact ASCII/Mermaid diagram or explicitly justify omission." },
  { section: "Assumptions and Open Questions", required: false, validationRule: "Must capture unresolved assumptions/open questions, or explicitly state none." }
  ]
  };
@@ -21,16 +21,20 @@ export const DESIGN = {
  ],
  checklist: [
  "Trivial-Change Escape Hatch — If scope artifact shows ≤3 files, zero new interfaces, and no cross-module data flow, skip full review sections. Produce a mini-design: one paragraph of rationale, list of changed files, one risk to watch. Proceed to spec.",
- "Parallel Research Fleet — run `research/research-fleet.md` before architecture lock. Record 4-lens findings in `.cclaw/artifacts/02a-research.md` and summarize resulting decisions in `## Research Fleet Synthesis`.",
+ "Parallel Research Fleet — run `research/research-fleet.md` before architecture lock. Fleet size scales by complexity: Lightweight=1 lens (pitfalls), Standard=2 lenses (architecture+pitfalls), Deep=4 lenses. Record findings in `.cclaw/artifacts/02a-research.md` and summarize resulting decisions in `## Research Fleet Synthesis`.",
  "Design Doc Check — read existing design docs, scope artifact, brainstorm artifact. If a design doc exists that covers this area, check for 'Supersedes:' and use the latest. Use upstream artifacts as source of truth.",
  "Codebase Investigation — Before any design decision, read the actual code in the blast radius. List every file that will be touched, its current responsibilities, and existing patterns (error handling, naming, test style). Design must conform to discovered patterns, not impose new ones without justification.",
  "Step 0: Scope Challenge — what existing code solves sub-problems? Minimum change set? Complexity check: 8+ files or 2+ new services = complexity smell → flag for possible scope reduction.",
  "Search Before Building — For each technical choice (library, pattern, architecture), search for existing solutions. Label findings: Layer 1 (exact match), Layer 2 (partial match, needs adaptation), Layer 3 (inspiration only), EUREKA (unexpected perfect solution). Default to existing before custom.",
- "Architecture Review — system design, component boundaries, data flow, scaling, security architecture. For each new codepath: one realistic production failure scenario. **Mandatory:** produce at least one architecture diagram (ASCII, Mermaid, or tool-generated) showing component boundaries and data flow direction. Include at least one labeled failure edge, e.g. `API -->|timeout| FallbackCache -->|degraded response| User`. Apply the **Visual Communication rules** (see below) — an unlabeled or generic diagram is worse than no diagram, because it pretends to encode decisions it does not.",
- "Code Quality Review — code organization, DRY violations, error handling patterns, over/under-engineering assessment.",
+ "Architecture Review — lock component boundaries and one realistic failure scenario per new codepath. **Mandatory diagrams:** architecture for all tiers; Standard/Deep adds Data-Flow Shadow Paths and Error Flow.",
+ "Security & Threat Model Review — trust boundaries, authn/authz, input validation, secrets handling, data exposure risks, abuse cases, and mitigation ownership.",
+ "Code Quality Review — code organization, DRY violations, error handling patterns, over/under-engineering assessment. Include stale-diagram audit for touched files.",
  "Test Review — diagram every new flow, data path, error path. For each: what test type covers it? Does one exist? What is the gap? Produce test plan artifact.",
  "Performance Review — N+1 queries, memory concerns, caching opportunities, slow code paths. What breaks at 10x load? At 100x?",
+ "Observability & Debuggability Review — logging, metrics, traces, alerts, and on-call diagnosis path for each critical failure mode.",
+ "Deployment & Rollout Review — migration sequencing, flag strategy, rollback plan, compatibility window, and post-deploy verification steps.",
  "Parallelization Strategy — If multiple independent modules, produce dependency table: which can be built in parallel? Where are conflict risks? Flag shared-state modules.",
+ "Outside Voice + Spec Review Loop — run adversarial second-opinion review, reconcile findings, and iterate up to 3 cycles or until quality score >= 0.8.",
  "Unresolved Decisions — List any design decisions that could not be resolved in this session. For each: what information is missing? Who can provide it? What is the default if no answer comes?",
  "Distribution Check — If the plan creates new artifact types (packages, CLI tools, configs), document the build/publish story. How does it reach the user?",
  "Deferred Items Cross-Reference — Collect every item explicitly deferred during design review. Each must appear in the Unresolved Decisions table or in the upstream scope artifact's deferred list. No deferred item may exist only in conversation — it must be written down."
@@ -38,29 +42,35 @@ export const DESIGN = {
  interactionProtocol: [
  "Review architecture decisions section-by-section.",
  "For EACH issue found in a review section, present it ONE AT A TIME. Do NOT batch multiple issues.",
- "For each issue: use the Decision Protocol — describe concretely with file/line references, present labeled options (A/B/C) with trade-offs, effort estimate (S/M/L/XL), risk level (Low/Med/High), and mark one as (recommended). Do NOT use a numeric Completeness rubric; recommend the option that best covers architecture, data-flow, failure-modes, test, and perf review concerns for the issue with the lowest risk. If the harness's native structured-ask tool is available (`AskUserQuestion` / `AskQuestion` / `question` / `request_user_input`), send exactly ONE question per call, validate fields against the runtime schema, and on schema error immediately fall back to a plain-text lettered list instead of retrying guessed payloads.",
+ "For each issue: use the Decision Protocol — describe concretely with file/line references, present labeled options (A/B/C) with trade-offs, effort estimate (S/M/L/XL), risk level (Low/Med/High), and mark one as (recommended). Do NOT use a numeric Completeness rubric. If the harness's native structured-ask tool is available (`AskUserQuestion` / `AskQuestion` / `question` / `request_user_input`), send exactly ONE question per call and fall back to plain-text letters on schema/tool failure.",
  "Only proceed to the next review section after ALL issues in the current section are resolved.",
  "If a section has no issues, say 'No issues found' and move on.",
  "Do not skip failure-mode mapping.",
+ "Use Failure Mode Table columns in fixed order: Method, Exception, Rescue, UserSees. Silent user impact without rescue is treated as critical.",
  "For design baseline approval: present the full baseline. **STOP.** Do NOT proceed until user explicitly approves the design.",
  "**STOP BEFORE ADVANCE.** Mandatory delegation `planner` must be marked completed or explicitly waived in `.cclaw/state/delegation-log.json`. Then close the stage via `node .cclaw/hooks/stage-complete.mjs design` (do not hand-edit `.cclaw/state/flow-state.json`).",
  "Take a firm position on every recommendation. Do NOT hedge with 'it depends' or 'you could do either'. State your opinion, then justify it.",
- "Use pushback patterns for weak framing: if the user says 'it's just a small change', respond with 'small changes to shared interfaces have outsized blast radius — let's map it'. If 'we'll refactor later', respond with 'later never comes — show me the refactor ticket or do it now'.",
+ "Use pushback for weak framing: 'small changes' on shared interfaces can still have large blast radius.",
  "When the user's proposed architecture is suboptimal, say so directly. Offer the alternative with concrete trade-offs, do not bury criticism in praise.",
- "When encountering ambiguity, classify it before acting: (A) ask user for missing info, (B) enumerate interpretations and pick one with justification, (C) propose hypothesis with validation path. Do NOT silently resolve ambiguity."
+ "When encountering ambiguity, classify it before acting: (A) ask user for missing info, (B) enumerate interpretations and pick one with justification, (C) propose hypothesis with validation path. Do NOT silently resolve ambiguity.",
+ "Before final approval, run outside-voice review loop and reconcile each finding (accept/reject/defer) with rationale.",
+ "Bound review-loop retries: max 3 iterations or early stop at quality score >= 0.8."
  ],
  process: [
  "Read upstream artifacts (brainstorm, scope).",
- "Run the research fleet playbook and write `.cclaw/artifacts/02a-research.md` before locking architecture choices.",
+ "Run the research fleet playbook with tiered fleet size and write `.cclaw/artifacts/02a-research.md` before locking architecture choices.",
  "Investigate codebase: read files in blast radius, catalogue current patterns and responsibilities.",
  "Run Step 0 scope challenge: existing code leverage, minimum change set, complexity check.",
  "Walk through each review section interactively.",
  "Define architecture boundaries and ownership.",
- "Describe data flow and state transitions with edge paths.",
- "Map failure modes and recovery strategy.",
+ "Describe data flow and state transitions with edge paths + interaction edge-case matrix.",
+ "Map failure modes and recovery strategy using Method/Exception/Rescue/UserSees table.",
+ "Add security, observability, and deployment reviews for Standard+ changes.",
+ "Run stale-diagram audit in touched files and reconcile drift.",
  "Define test coverage strategy and performance budget.",
- "Produce required outputs: NOT-in-scope section, What-already-exists section, diagrams, failure mode table.",
- "Produce completion dashboard: list every review section with status (clear / issues-found-resolved / issues-open), count of decisions made, and list of unresolved items.",
+ "Produce required outputs: NOT-in-scope section, What-already-exists section, architecture + shadow/error diagrams, failure mode table.",
+ "Run outside-voice spec review loop (up to 3 iterations, quality score target >= 0.8).",
+ "Produce completion dashboard: status per review section, critical/open gap counts, decision count, unresolved items.",
  "Write design lock artifact for downstream spec/plan."
  ],
  requiredGates: [
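The interaction protocol above fixes the Failure Mode Table column order as Method, Exception, Rescue, UserSees. A hypothetical checker for that header row is sketched below; the column names come from the protocol text, but the function name and parsing approach are illustrative and nothing like this ships in the package:

```javascript
// Fixed column order required by the interaction protocol above.
const FAILURE_TABLE_COLUMNS = ["Method", "Exception", "Rescue", "UserSees"];

// Hypothetical helper: validate a markdown table header row like
// "| Method | Exception | Rescue | UserSees |".
function validateFailureTableHeader(headerRow) {
  const cells = headerRow
    .split("|")
    .map((c) => c.trim())
    .filter(Boolean); // drop the empty edge cells around the pipes
  return (
    cells.length === FAILURE_TABLE_COLUMNS.length &&
    cells.every((c, i) => c === FAILURE_TABLE_COLUMNS[i])
  );
}

console.log(validateFailureTableHeader("| Method | Exception | Rescue | UserSees |")); // true
console.log(validateFailureTableHeader("| Method | Rescue | Exception | UserSees |")); // false
```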
@@ -73,11 +83,16 @@ export const DESIGN = {
  requiredEvidence: [
  "Research artifact written to `.cclaw/artifacts/02a-research.md` with stack/features/architecture/pitfalls sections plus synthesis.",
  "Artifact written to `.cclaw/artifacts/03-design.md`.",
- "Failure-mode table exists with mitigations.",
+ "Failure-mode table exists in Method/Exception/Rescue/UserSees format.",
+ "Data-flow shadow and error-flow diagrams are present for Standard+ complexity.",
+ "Security & threat model findings are documented with mitigations.",
+ "Observability and deployment plans are explicit for critical flows.",
+ "Outside-voice findings and dispositions are recorded (accept/reject/defer).",
+ "Spec review loop summary includes iteration count and quality score trajectory.",
  "Test strategy includes unit/integration/e2e expectations.",
  "NOT-in-scope section produced.",
  "What-already-exists section produced.",
- "Completion dashboard lists every review section status, decision count, and unresolved items (or 'None')."
+ "Completion dashboard lists review section status, critical/open gap counts, decision count, and unresolved items (or 'None')."
  ],
  inputs: ["scope agreement artifact", "system constraints", "non-functional requirements"],
  requiredContext: [
@@ -115,7 +130,11 @@ export const DESIGN = {
  commonRationalizations: [
  "Architecture deferred to implementation phase",
  "Missing data-flow edge cases",
+ "No interaction-edge-case matrix (double-click, navigate-away, stale-state, large-result)",
  "No performance budget for critical path",
+ "Failure mode table omits rescue path or user-visible impact",
+ "Skipping security/observability/deployment review for non-trivial change",
+ "Skipping outside-voice review loop and treating first draft as final",
  "Batching multiple design issues into one question",
  "Agreeing with user's architecture choice without evaluating alternatives",
  "No NOT-in-scope output section",
@@ -135,15 +154,19 @@ export const DESIGN = {
  {
  title: "Architecture Review",
  evaluationPoints: [
- "System design and component boundaries",
- "Dependency graph and coupling concerns",
- "Data flow patterns and potential bottlenecks",
- "Scaling characteristics and single points of failure",
- "Security architecture (auth, data access, API boundaries)",
+ "System design, boundaries, coupling, and bottlenecks",
  "For each new codepath: one realistic production failure scenario"
  ],
  stopGate: true
  },
+ {
+ title: "Security & Threat Model",
+ evaluationPoints: [
+ "Trust boundaries, authz rules, and sensitive data flows are explicit",
+ "Mitigation ownership and residual risk are documented"
+ ],
+ stopGate: true
+ },
  {
  title: "Code Quality Review",
  evaluationPoints: [
@@ -155,6 +178,15 @@ export const DESIGN = {
  ],
  stopGate: true
  },
+ {
+ title: "Data Flow & Interaction Edge Cases",
+ evaluationPoints: [
+ "Happy/nil/empty/error paths are explicit",
+ "Interaction edge cases and Standard+ shadow/error diagrams are present",
+ "Error-flow includes rescue path and user-visible outcome"
+ ],
+ stopGate: true
+ },
  {
  title: "Test Review",
  evaluationPoints: [
@@ -176,14 +208,20 @@ export const DESIGN = {
  stopGate: true
  },
  {
- title: "Distribution & Delivery Review",
+ title: "Observability & Debuggability",
+ evaluationPoints: [
+ "Logs/metrics/traces exist for critical failure modes",
+ "Alerting and debug path from symptom to root cause are documented"
+ ],
+ stopGate: true
+ },
+ {
+ title: "Deployment & Rollout Review",
  evaluationPoints: [
- "If new artifact types are created (packages, CLI, configs): is the build/publish story documented?",
- "Are there new dependencies that need version pinning?",
- "Does the change affect existing consumers (APIs, shared modules)?",
- "Is backwards compatibility maintained or is a migration needed?"
+ "Migration sequencing, rollout/rollback, and compatibility window are explicit",
+ "Post-deploy verification and distribution/build story are documented"
  ],
- stopGate: false
+ stopGate: true
  }
  ],
  completionStatus: ["DONE", "DONE_WITH_CONCERNS", "BLOCKED"],
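The review sections above each carry a `stopGate` flag. A minimal sketch of how a gated walk over such sections could behave, assuming `stopGate: true` means the stage pauses until the section is resolved (`resolveSection` is a hypothetical callback, e.g. a user prompt; this is an illustration, not cclaw-cli's actual runner):

```javascript
// Walk review sections in order; a stop-gated section that is not
// resolved blocks progress and reports where the walk halted.
function walkReviewSections(sections, resolveSection) {
  const visited = [];
  for (const s of sections) {
    visited.push(s.title);
    if (s.stopGate && !resolveSection(s)) {
      return { visited, blockedAt: s.title };
    }
  }
  return { visited, blockedAt: null };
}

const sections = [
  { title: "Architecture Review", stopGate: true },
  { title: "Security & Threat Model", stopGate: true },
];

// Only the first gate resolves, so the walk blocks at the second.
console.log(walkReviewSections(sections, (s) => s.title === "Architecture Review"));
// → { visited: ["Architecture Review", "Security & Threat Model"], blockedAt: "Security & Threat Model" }
```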
@@ -201,18 +239,17 @@ export const DESIGN = {
  { section: "Codebase Investigation", required: false, validationRule: "Must list blast-radius files with current responsibilities and discovered patterns." },
  { section: "Search Before Building", required: false, validationRule: "For each technical choice: Layer 1 (exact match), Layer 2 (partial match), Layer 3 (inspiration), EUREKA labels with reuse-first default." },
  { section: "Architecture Boundaries", required: true, validationRule: "Must list component boundaries with ownership." },
- { section: "Architecture Diagram", required: true, validationRule: "At least one diagram (ASCII, Mermaid, or image) showing component boundaries and data flow direction. Diagram must: (1) label every node with a concrete component name (no generic 'Service A/B'), (2) label every arrow with the action or message (no unlabeled arrows), (3) mark direction of data flow explicitly, (4) distinguish synchronous from asynchronous edges (e.g. solid vs dashed, or `sync:` / `async:` prefix), (5) include at least one failure/degraded edge line that contains an arrow plus a failure keyword (`timeout`, `error`, `fallback`, `degraded`, `retry`, etc.)." },
- { section: "Data Flow", required: false, validationRule: "Must include happy path, nil input, empty input, upstream error paths." },
- { section: "Failure Mode Table", required: true, validationRule: "Each failure mode has: trigger, detection, mitigation, user impact." },
+ { section: "Architecture Diagram", required: true, validationRule: "At least one diagram (ASCII, Mermaid, or image) showing component boundaries and data flow direction. Diagram must: (1) label every node with a concrete component name (no generic 'Service A/B'), (2) label every arrow with the action or message (no unlabeled arrows), (3) mark direction of data flow explicitly, (4) distinguish synchronous from asynchronous edges (e.g. solid vs dashed, or `sync:` / `async:` prefix), (5) include at least one failure/degraded edge line that contains an arrow plus a failure keyword (`timeout`, `error`, `fallback`, `degraded`, `retry`, etc.). Standard/Deep complexity must also include `Data-Flow Shadow Paths` and `Error Flow Diagram` sections." },
+ { section: "Data Flow", required: false, validationRule: "Must include happy path, nil input, empty input, upstream error paths, plus interaction edge-case matrix (double-click, navigate-away, stale-state, large-result, background-job abandonment)." },
+ { section: "Failure Mode Table", required: true, validationRule: "Use Method/Exception/Rescue/UserSees columns and treat silent user impact without rescue as critical." },
+ { section: "Security & Threat Model", required: false, validationRule: "Must list trust boundaries, abuse/failure scenarios, mitigations, and residual risks." },
  { section: "Test Strategy", required: false, validationRule: "Must define unit/integration/e2e expectations with coverage targets." },
  { section: "Performance Budget", required: false, validationRule: "For each critical path: metric name, target threshold, and measurement method." },
  { section: "What Already Exists", required: false, validationRule: "For each sub-problem: existing code/library found (Layer 1-3/EUREKA label), reuse decision, and adaptation needed." },
  { section: "NOT in scope", required: false, validationRule: "Work considered and explicitly deferred with one-line rationale." },
  { section: "Parallelization Strategy", required: false, validationRule: "If multi-module: dependency table, parallel lanes, conflict flags." },
  { section: "Unresolved Decisions", required: false, validationRule: "If any: what info is missing, who provides it, default if unanswered." },
- { section: "Interface Contracts", required: false, validationRule: "If present: for each module boundary list produces (outputs) and consumes (inputs) with data types." },
- { section: "Patterns to Mirror", required: false, validationRule: "If present: list discovered codebase patterns to follow, with file references and rationale for each." },
- { section: "Completion Dashboard", required: true, validationRule: "Lists every review section with status (clear / issues-found-resolved / issues-open), decision count, and unresolved items (or 'None')." }
+ { section: "Completion Dashboard", required: true, validationRule: "Lists every review section with status (clear / issues-found-resolved / issues-open), critical/open gap counts, decision count, and unresolved items (or 'None')." }
  ],
  trivialOverrideSections: ["Architecture Boundaries", "NOT in scope", "Completion Dashboard"]
  };
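The `artifactValidation` entries above are declarative. A minimal sketch of how such a table could be enforced against a rendered artifact, assuming required sections appear as H2 headings (`validateArtifact` is a hypothetical helper, not cclaw-cli's actual validator):

```javascript
// Hypothetical enforcement of an artifactValidation table: every entry
// with required: true must have a matching "## <section>" heading.
const rules = [
  { section: "Architecture Boundaries", required: true },
  { section: "Completion Dashboard", required: true },
  { section: "Security & Threat Model", required: false },
];

function validateArtifact(markdown, validationRules) {
  // Collect H2 headings from the artifact markdown.
  const headings = new Set(
    [...markdown.matchAll(/^##\s+(.+)$/gm)].map((m) => m[1].trim())
  );
  const missing = validationRules
    .filter((r) => r.required && !headings.has(r.section))
    .map((r) => r.section);
  return { ok: missing.length === 0, missing };
}

const doc = "## Architecture Boundaries\n...\n## Completion Dashboard\n...";
console.log(validateArtifact(doc, rules)); // → { ok: true, missing: [] }
```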
@@ -20,15 +20,19 @@ export const SCOPE = {
  "The work is a pure implementation or debugging pass within existing scope"
  ],
  checklist: [
+ "**Pre-Scope System Audit** — before premise challenge, gather reality snapshot: recent commits (\`git log -30 --oneline\`), current diff (\`git diff --stat\`), stash state (\`git stash list\`), and deferred debt markers (\`rg -n 'TODO|FIXME|XXX|HACK'\`). Record findings in scope artifact.",
  "**Assess complexity** — Read the brainstorm artifact. If project is simple (single component, clear architecture, personal/prototype), run light-touch scope: mode selection, 3-5 key in/out boundaries, deferred items. Skip Dream State Mapping and Temporal Interrogation. If project is complex (multi-component, team delivery, production), run the full checklist.",
  "**Prime Directives** — Zero silent failures. For each in-scope capability, name concrete failure modes, the exact error surface, and trace all four data-flow paths (happy, nil, empty, upstream error). Include interaction edge cases (double-click, navigate-away, stale state), observability commitments, and explicit deferred-item logging.",
  "**Premise Challenge** — Is this the right problem? What if we do nothing? What are we optimizing for?",
+ "**Landscape Check** — for EXPAND/SELECTIVE candidates, perform a brief external scan of comparable products/patterns to calibrate ambition and avoid local maxima.",
  "**Existing Code Leverage** — Search for existing solutions before deciding to build new.",
+ "**Taste Calibration** — identify 2-3 high-quality files/modules in this codebase and explicitly align scope quality bar to them.",
  "**Dream State Mapping** — (complex projects only) describe the ideal state 12 months out using \`CURRENT STATE -> THIS PLAN -> 12-MONTH IDEAL\`, then verify this scope moves toward that target.",
  "**Implementation Alternatives** — Produce 2-3 distinct approaches. For each: Name, Summary, Effort (S/M/L/XL), Risk (Low/Med/High), 2-3 Pros, 2-3 Cons, and explicit Reuses. One option must be minimal viable, one must be ideal architecture.",
  "**Temporal Interrogation** — (complex projects only) simulate implementation timeline: HOUR 1 foundations, HOUR 2-3 core logic, HOUR 4-5 integration surprises, HOUR 6+ polish/tests. Decide what must be locked now vs safely deferred.",
  "**Mode Selection** — Present expand/selective/hold/reduce with recommendation and default heuristic: greenfield -> expand, feature enhancement -> selective, bugfix/hotfix/refactor -> hold, broad blast radius (>15 files or multi-team impact) -> reduce.",
  "**Mode-Specific Analysis** — After mode is selected, run the matching analysis: EXPAND (10x and delight opportunities), SELECTIVE (hold-scope rigor then cherry-picked expansions), HOLD (minimum-change-set hardening), REDUCE (ruthless cuts and follow-up split).",
+ "**Outside Voice + Spec Review Loop** — run an adversarial second-opinion pass on the scope artifact, reconcile findings, and iterate up to 3 cycles or until quality score >= 0.8.",
  "**Error and Rescue Registry** — For each capability: what breaks, how detected, what fallback."
  ],
  interactionProtocol: [
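The Pre-Scope System Audit step above names four concrete commands. A minimal sketch of how that snapshot could be gathered and tallied (each command would run via `node:child_process` and land in the scope artifact; `summarizeDebt` is a hypothetical helper, not cclaw-cli code):

```javascript
// The four audit commands from the checklist, run before premise challenge.
const AUDIT_COMMANDS = [
  "git log -30 --oneline",
  "git diff --stat",
  "git stash list",
  "rg -n 'TODO|FIXME|XXX|HACK'",
];

// Tally rg-style output lines into per-marker debt counts.
function summarizeDebt(rgOutput) {
  const counts = { TODO: 0, FIXME: 0, XXX: 0, HACK: 0 };
  for (const line of rgOutput.split("\n")) {
    for (const marker of Object.keys(counts)) {
      if (line.includes(marker)) counts[marker] += 1;
    }
  }
  return counts;
}

console.log(summarizeDebt("a.js:3: // TODO refactor\nb.js:9: // FIXME flaky"));
// → { TODO: 1, FIXME: 1, XXX: 0, HACK: 0 }
```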
@@ -40,19 +44,25 @@ export const SCOPE = {
  "Present one structural scope issue at a time for decision. Do NOT batch. Use structured options for each scope boundary question.",
  "Record explicit in-scope and out-of-scope contract.",
  "Once the user accepts or rejects a recommendation, commit fully. Do not re-argue.",
+ "Before final scope approval, run an adversarial outside-voice review and reconcile every finding explicitly (accept/reject/defer with rationale).",
+ "Bound review-loop retries: max 3 iterations or early stop at quality score >= 0.8.",
  "Produce a clean scope summary after all issues are resolved.",
  "**STOP.** Wait for explicit user approval of scope contract before advancing to design.",
  "**STOP BEFORE ADVANCE.** Mandatory delegation \`planner\` must be marked completed or explicitly waived in \`.cclaw/state/delegation-log.json\`. Then close the stage via \`node .cclaw/hooks/stage-complete.mjs scope\` (do not hand-edit \`.cclaw/state/flow-state.json\`)."
  ],
  process: [
+ "Run pre-scope system audit (git log/diff/stash/debt markers).",
  "Run premise challenge and existing-solution leverage check.",
+ "When mode is EXPAND/SELECTIVE, run brief landscape check before final scope lock.",
+ "Calibrate quality bar against 2-3 strong existing modules/files.",
  "Produce 2-3 scope alternatives in a structured format (Name, Summary, Effort, Risk, Pros, Cons, Reuses) with minimum viable and ideal architecture options included.",
  "Choose scope mode with user approval.",
  "Run mode-specific analysis that matches the selected scope mode.",
  "Walk through scope review sections one at a time.",
+ "Run outside-voice spec review loop (up to 3 iterations, quality score target >= 0.8).",
  "Write explicit scope contract, discretion areas, and deferred items.",
  "Freeze non-negotiable boundaries as stable Locked Decisions (D-XX IDs).",
- "Produce scope summary plus completion dashboard (checklist findings, number of resolved decisions, unresolved items or \`None\`)."
+ "Produce scope summary plus completion dashboard (section status, critical gaps, resolved decisions, unresolved items or \`None\`)."
  ],
  requiredGates: [
  { id: "scope_mode_selected", description: "One scope mode was explicitly selected." },
@@ -61,13 +71,16 @@ export const SCOPE = {
  ],
  requiredEvidence: [
  "Artifact written to \`.cclaw/artifacts/02-scope.md\`.",
+ "Pre-Scope System Audit findings are captured (git log/diff/stash/debt markers).",
  "In-scope and out-of-scope lists are explicit.",
  "Discretion areas are explicit (or marked as \`None\`).",
  "Selected mode and rationale are documented.",
  "Locked Decisions section lists stable D-XX IDs for non-negotiable boundaries.",
  "Premise challenge findings documented.",
+ "Outside Voice findings and dispositions are recorded (accept/reject/defer with rationale).",
+ "Spec review loop summary includes iteration count and quality score trajectory.",
  "Deferred items list with one-line rationale for each.",
- "Completion dashboard lists checklist findings, decision count, and unresolved items (or \`None\`)."
+ "Completion dashboard lists per-section status, critical/open gaps, decision count, and unresolved items (or \`None\`)."
  ],
  inputs: ["brainstorm artifact", "timeline constraints", "product priorities"],
  requiredContext: [
@@ -95,6 +108,7 @@ export const SCOPE = {
  "scope summary produced"
  ],
  commonRationalizations: [
+ "Skipping pre-scope audit because the task looks small",
  "Scope silently expanded during discussion",
  "No explicit out-of-scope section",
  "Premise accepted without challenge",
@@ -108,7 +122,8 @@ export const SCOPE = {
  "No discretion section (or explicit \`None\`) in artifact",
  "No deferred/not-in-scope section",
  "No user approval marker",
- "Missing Locked Decisions section or decisions without D-XX IDs"
+ "Missing Locked Decisions section or decisions without D-XX IDs",
+ "Skipping outside-voice review loop and treating first draft as final"
  ],
  policyNeedles: ["Scope mode", "In Scope", "Out of Scope", "Discretion Areas", "NOT in scope", "Premise Challenge", "Locked Decisions"],
  artifactFile: "02-scope.md",
@@ -160,6 +175,16 @@ export const SCOPE = {
  "Is observability (logging, metrics, alerts) explicitly in or out of scope?"
  ],
  stopGate: true
+ },
+ {
+ title: "Outside Voice Reconciliation",
+ evaluationPoints: [
+ "Were adversarial findings categorized as accept/reject/defer with rationale?",
+ "Did any rejected finding still expose a real gap in assumptions?",
+ "Is quality score trajectory improving across iterations?",
+ "Did the review loop stop because quality threshold was met (>=0.8) or because retry budget was exhausted?"
+ ],
+ stopGate: true
  }
  ],
  completionStatus: ["DONE", "DONE_WITH_CONCERNS", "BLOCKED"],
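The review loop referenced throughout this stage is bounded: at most 3 iterations, with early stop once the quality score reaches 0.8. A minimal sketch of that bound, assuming a hypothetical `reviewOnce` hook that scores the current draft (this is an illustration of the stated policy, not cclaw-cli's implementation):

```javascript
// Bounded outside-voice review loop: stop early when quality >= threshold,
// otherwise stop when the retry budget is exhausted. Records the score
// trajectory and the stop reason, as the evaluation points above require.
function runReviewLoop(reviewOnce, { maxIterations = 3, threshold = 0.8 } = {}) {
  const scores = [];
  for (let i = 1; i <= maxIterations; i++) {
    const score = reviewOnce(i);
    scores.push(score);
    if (score >= threshold) {
      return { scores, stopReason: "quality_threshold_met" };
    }
  }
  return { scores, stopReason: "retry_budget_exhausted" };
}

// Example trajectory that improves across iterations and crosses 0.8.
const demo = [0.55, 0.7, 0.85];
console.log(runReviewLoop((i) => demo[i - 1]));
// → { scores: [0.55, 0.7, 0.85], stopReason: "quality_threshold_met" }
```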
@@ -169,8 +194,11 @@ export const SCOPE = {
  traceabilityRule: "Every scope boundary must be traceable to a brainstorm decision. Every downstream design choice must stay within the scope contract."
  },
  artifactValidation: [
+ { section: "Pre-Scope System Audit", required: false, validationRule: "Must capture git log/diff/stash/debt-marker findings before premise challenge." },
  { section: "Prime Directives", required: false, validationRule: "For each scoped capability: named failure modes, explicit error surface, four data-flow paths, interaction edge cases, observability expectations, and deferred-item handling." },
  { section: "Premise Challenge", required: false, validationRule: "Must contain explicit answers to: right problem? direct path? what if nothing?" },
+ { section: "Landscape Check", required: false, validationRule: "When mode is EXPAND/SELECTIVE, include at least one external reference insight and its impact on scope." },
+ { section: "Taste Calibration", required: false, validationRule: "Must reference 2-3 strong in-repo modules/files that define the quality bar or explicitly justify omission." },
  { section: "Requirements", required: false, validationRule: "Table of stable requirement IDs (R1, R2, R3…) one per row with observable outcome, priority, and source. IDs are assigned once and never renumbered across scope/design/spec/plan/review; dropped requirements stay with Priority \`DROPPED\`." },
  { section: "Locked Decisions (D-XX)", required: false, validationRule: "List of stable locked decisions with IDs D-01, D-02... Each ID appears once, includes rationale, and is intended for downstream cross-stage traceability." },
  { section: "Implementation Alternatives", required: false, validationRule: "2-3 options with Name, Summary, Effort, Risk, Pros, Cons, and Reuses. Must include minimal viable and ideal architecture options." },
@@ -180,7 +208,9 @@ export const SCOPE = {
  { section: "Discretion Areas", required: false, validationRule: "Explicit list of implementer decision zones, or 'None' if scope is fully locked." },
  { section: "Deferred Items", required: false, validationRule: "Each item has one-line rationale. If empty, state 'None' explicitly." },
  { section: "Error & Rescue Registry", required: false, validationRule: "Each scoped capability has: failure mode, detection method, fallback decision." },
- { section: "Completion Dashboard", required: true, validationRule: "Lists checklist findings, count of resolved decisions, and unresolved decisions (or 'None')." },
+ { section: "Outside Voice Findings", required: false, validationRule: "Must list external/adversarial findings and disposition (accept/reject/defer) with rationale." },
+ { section: "Spec Review Loop", required: false, validationRule: "Must record iterations (max 3), quality score per iteration, stop reason, and unresolved concerns." },
+ { section: "Completion Dashboard", required: true, validationRule: "Lists per-review-section status, count of critical/open gaps, resolved decisions, and unresolved decisions (or 'None')." },
  { section: "Scope Summary", required: true, validationRule: "Clean summary: mode, strongest challenges, recommended path, accepted scope, deferred, excluded." },
  { section: "Dream State Mapping", required: false, validationRule: "If present (complex projects): CURRENT STATE, THIS PLAN, 12-MONTH IDEAL, and alignment verdict." },
  { section: "Temporal Interrogation", required: false, validationRule: "If present (complex projects): timeline simulation table with decision pressures and lock-now vs defer verdicts." }
@@ -240,6 +240,17 @@ inputs_hash: sha256:pending
  (ASCII, Mermaid, or tool-generated diagram showing component boundaries and data flow direction)
  \`\`\`
 
+ ## Data-Flow Shadow Paths
+ | Path | Trigger | Fallback/Degrade behavior |
+ |---|---|---|
+ | | | |
+
+ ## Error Flow Diagram
+
+ \`\`\`
+ (failure detection -> rescue action -> user-visible outcome)
+ \`\`\`
+
  ## What Already Exists
  | Sub-problem | Existing code/library | Layer | Reuse decision |
  |---|---|---|---|
@@ -251,10 +262,15 @@ inputs_hash: sha256:pending
  - Upstream error path:
  - Timeout/downstream path:
 
+ ## Security & Threat Model
+ | Boundary | Threat | Mitigation | Owner |
+ |---|---|---|---|
+ | | | | |
+
  ## Failure Mode Table
- | Failure mode | Trigger | Detection | Mitigation | User impact |
- |---|---|---|---|---|
- | | | | | |
+ | Method | Exception | Rescue | UserSees |
+ |---|---|---|---|
+ | | | | |
 
  ## Test Strategy
  - Unit:
@@ -266,6 +282,16 @@ inputs_hash: sha256:pending
  |---|---|---|---|
  | | | | |
 
+ ## Observability & Debuggability
+ | Signal | Source | Alert/Debug path |
+ |---|---|---|
+ | | | |
+
+ ## Deployment & Rollout
+ | Step | Strategy | Rollback plan |
+ |---|---|---|
+ | | | |
+
  ## NOT in scope
  -
 
@@ -292,10 +318,13 @@ inputs_hash: sha256:pending
  | Review Section | Status | Issues |
  |---|---|---|
  | Architecture Review | | |
+ | Security & Threat Model | | |
  | Code Quality Review | | |
+ | Data Flow & Interaction Edge Cases | | |
  | Test Review | | |
  | Performance Review | | |
- | Distribution & Delivery Review | | |
+ | Observability & Debuggability | | |
+ | Deployment & Rollout Review | | |
 
  **Decisions made:** 0 | **Unresolved:** 0
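The dashboard footer above tracks decision and unresolved counts. A minimal sketch of deriving that footer from per-section statuses, using the status names from the validation rule (`summarizeDashboard` is a hypothetical illustration, not cclaw-cli code):

```javascript
// Derive open-gap count and the unresolved-items field ("None" when clear)
// from per-section review statuses.
const sections = [
  { name: "Architecture Review", status: "clear" },
  { name: "Security & Threat Model", status: "issues-open" },
  { name: "Deployment & Rollout Review", status: "issues-found-resolved" },
];

function summarizeDashboard(rows) {
  const open = rows.filter((r) => r.status === "issues-open");
  return {
    openGaps: open.length,
    unresolved: open.length === 0 ? "None" : open.map((r) => r.name).join(", "),
  };
}

console.log(summarizeDashboard(sections));
// → { openGaps: 1, unresolved: "Security & Threat Model" }
```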
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "cclaw-cli",
- "version": "0.48.25",
+ "version": "0.48.26",
  "description": "Installer-first flow toolkit for coding agents",
  "type": "module",
  "bin": {