cclaw-cli 0.7.1 → 0.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (18)

package/dist/content/agents.d.ts +9 -0
package/dist/content/agents.js +177 -6
package/dist/content/examples.d.ts +17 -0
package/dist/content/examples.js +275 -4
package/dist/content/harness-tool-refs.d.ts +20 -0
package/dist/content/harness-tool-refs.js +240 -0
package/dist/content/meta-skill.js +203 -33
package/dist/content/skills.js +106 -49
package/dist/content/stage-schema.js +63 -11
package/dist/content/start-command.js +63 -17
package/dist/content/subagents.js +169 -0
package/dist/content/templates.js +44 -6
package/dist/content/utility-skills.d.ts +2 -1
package/dist/content/utility-skills.js +141 -2
package/dist/doctor.js +77 -0
package/dist/harness-adapters.js +55 -16
package/dist/install.js +19 -0
package/package.json +1 -1

package/dist/content/skills.js CHANGED

@@ -1,7 +1,7 @@
 import { RUNTIME_ROOT } from "../constants.js";
-import { stageExamples } from "./examples.js";
+import { stageDomainExamples, stageExamples, stageGoodBadExamples } from "./examples.js";
 import { selfImprovementBlock } from "./learnings.js";
-import { QUESTION_FORMAT_SPEC, ERROR_BUDGET_SPEC, stageAutoSubagentDispatch, stageSchema } from "./stage-schema.js";
+import { stageAutoSubagentDispatch, stageSchema } from "./stage-schema.js";
 function rationalizationTable(stage) {
     const schema = stageSchema(stage);
     return `| Rationalization | Reality |
@@ -67,6 +67,25 @@ function decisionRecordBlock(stage) {
         return "";
     return `## Decision Record Template\n\nUse this format for every non-trivial architecture or scope decision made during this stage:\n\n\`\`\`\n${fmt}\n\`\`\`\n`;
 }
+function visualCommunicationBlock(stage) {
+    if (stage !== "design")
+        return "";
+    return `## Visual Communication Rules
+Diagrams are load-bearing artifacts in the design stage, not decoration. A diagram that encodes structure wrongly (or hides structure behind generic labels) misleads every downstream reader. Apply these rules to **every** diagram in the design artifact:
+1. **Concrete names, never generic.** "Service A → Service B" is not a diagram; it is a shape. Every node must name a real component the team will build or touch (\`NotificationPublisher\`, \`FeedReadModel\`, \`Stripe webhook handler\`). If you cannot name it concretely, the design is not ready.
+2. **Every arrow is labeled.** Label with the message, action, or protocol it carries (\`publishEvent(user_id, payload)\`, \`GET /snapshot\`, \`dedupe-key upsert\`). Unlabeled arrows silently lose the contract between components.
+3. **Direction is explicit.** Use arrowheads, not bare lines; draw the flow of *data* (not "dependency") unless the diagram type is explicitly a dependency graph, in which case say so in a one-line caption.
+4. **Distinguish sync vs async.** Use a convention and state it once in a legend: e.g. solid arrow = synchronous request/response, dashed arrow = async message via queue/bus, double arrow = two-way. Async edges always name the queue or topic.
+5. **Show at least one failure edge.** Every non-trivial diagram needs one branch that represents the degraded or error path (timeout, reconnect, fallback to cache, poison-message routing). A diagram with only the happy path hides the interesting half of the design.
+6. **One level of detail per diagram.** Do not mix "service-level" and "class-level" on the same canvas. If you need both, produce two diagrams — one at the system boundary, one at the internal module — and cross-reference them.
+7. **Caption, not decoration.** Each diagram gets a one-sentence caption below it stating what the reader should take away ("*Publish path with idempotent outbox; SSE stream reads the projection, not the bus directly*"). If you cannot write the caption in one sentence, the diagram is doing two things at once.
+8. **Prefer text-based formats** (Mermaid, ASCII) over binary images in \`.cclaw/artifacts/\` so diffs stay reviewable. Binary/SVG is allowed when the diagram is already the source of truth elsewhere (e.g. \`docs/architecture/\`) and the artifact embeds a link plus a text-based summary.
+If a diagram cannot satisfy rules 1–5, do NOT include it — a missing diagram is honest; a misleading diagram is worse. Surface the gap in **Unresolved Decisions** and proceed without the diagram until the decisions that would populate it are locked.
+`;
+}
 function contextLoadingBlock(stage) {
     const trace = stageSchema(stage).crossStageTrace;
     const readLines = trace.readsFrom.length > 0
@@ -136,60 +155,81 @@ function waveExecutionModeBlock(stage) {
 After plan approval (**WAIT_FOR_CONFIRM** / \`plan_wait_for_confirm\` satisfied), process **all tasks in the current dependency wave** sequentially: **RED → GREEN → REFACTOR** per task, recording evidence per slice. **Stop** only on **BLOCKED**, a test failure that **requires user input**, or **wave completion** (every task in the wave has the required RED / GREEN / REFACTOR evidence per the plan artifact).
+### Walkthrough — Wave 1 with 3 tasks
+The example below is **illustrative only** — do not copy the command names blindly, match them to your stack.
+Assume Wave 1 from the plan artifact contains three tasks:
+| Task ID | Description | AC | Verification |
+|---|---|---|---|
+| T-1 \`[~3m]\` | Add \`User.emailNormalized\` column | AC-1 | \`npm test -- users/schema\` |
+| T-2 \`[~4m]\` | Normalize on write in \`UserRepo.save\` | AC-1 | \`npm test -- users/repo\` |
+| T-3 \`[~3m]\` | Reject duplicates in \`UserService.signup\` | AC-2 | \`npm test -- users/service\` |
+**Execution transcript** (one slice at a time, evidence captured per step):
+**T-1 — RED**
+> Run: \`npm test -- users/schema\` → **FAIL** (missing column: \`emailNormalized\`). Captured the failure stack as RED evidence. No production code touched yet.
+**T-1 — GREEN**
+> Added the column in the schema module. Re-ran \`npm test -- users/schema\` → **PASS**. Ran the full suite \`npm test\` → **PASS**. Captured both outputs as GREEN evidence.
+**T-1 — REFACTOR**
+> Extracted the column definition into a shared \`NormalizedEmail\` type used by T-2/T-3. Re-ran \`npm test\` → **PASS**. Captured REFACTOR note: "Extracted NormalizedEmail type to keep T-2/T-3 DRY; zero behavior change, all tests still green."
+**T-2 — RED / GREEN / REFACTOR**: same shape — write the repo test that expects normalised writes, watch it fail (RED), implement normalisation inside \`UserRepo.save\` only (GREEN), then refactor the normaliser out of the repo into a helper shared with T-3 (REFACTOR).
+**T-3 — RED / GREEN / REFACTOR**: write the service-level duplicate test that expects a rejection, watch it fail (RED), add the duplicate check in \`UserService.signup\` (GREEN), refactor the error message into a named constant (REFACTOR).
+**Wave gate check**
+After T-3 REFACTOR, before declaring Wave 1 done:
+1. Run the **full suite** (\`npm test\`) one final time → **PASS** captured as wave-exit evidence.
+2. Verify the TDD artifact contains RED, GREEN, and REFACTOR evidence for T-1, T-2, **and** T-3. No partial waves.
+3. Only now mark Wave 1 complete. Wave 2 cannot start until this step.
+**When to stop mid-wave (do NOT push through)**
+- A RED test fails for a reason you did not predict (e.g. an unrelated flaky test) → **pause**, diagnose, log an operational-self-improvement entry, and decide with the user before proceeding.
+- A GREEN step would require touching code outside the task's acceptance criterion → **pause**, the task is scoped wrong; adjust the plan or open a follow-up task.
+- The same RED failure reappears after a GREEN change → **escalate** per the 3-attempts rule; do not keep patching.
 `;
 }
 function stageCompletionProtocol(schema) {
     const stage = schema.stage;
     const gateIds = schema.requiredGates.map((g) => g.id);
     const gateList = gateIds.map((id) => `\`${id}\``).join(", ");
-    const nextStage = schema.next === "done" ? null : schema.next;
+    const nextStage = schema.next === "done" ? "done" : schema.next;
     const mandatory = schema.mandatoryDelegations;
-    const delegationLogRel = `${RUNTIME_ROOT}/state/delegation-log.json`;
-    const stateUpdate = nextStage
-        ? `   - Set \`currentStage\` to \`"${nextStage}"\`
-   - Add \`"${stage}"\` to \`completedStages\` array
-   - Move all gate IDs for this stage (${gateList}) into \`stageGateCatalog.${stage}.passed\`
-   - Clear \`stageGateCatalog.${stage}.blocked\``
-        : `   - Add \`"${stage}"\` to \`completedStages\` array
-   - Move all gate IDs for this stage (${gateList}) into \`stageGateCatalog.${stage}.passed\`
-   - Clear \`stageGateCatalog.${stage}.blocked\``;
-    const delegationBlock = mandatory.length > 0
-        ? `0. **Delegation pre-flight** (BLOCKING):
-   - Mandatory agents for this stage: ${mandatory.map((a) => `\`${a}\``).join(", ")}.
-   - For each mandatory agent: confirm it was dispatched (via Task/delegate) and completed, OR record an explicit waiver with reason in \`${delegationLogRel}\`.
-   - Write a JSON entry per agent: \`{ "stage": "${stage}", "agent": "<name>", "mode": "mandatory", "status": "completed"|"waived", "waiverReason": "<if waived>", "ts": "<ISO timestamp>" }\`.
-   - If the harness does not support delegation, record status \`"waived"\` with reason \`"harness_limitation"\`.
-   - **Do NOT proceed to step 1 until every mandatory agent has an entry in the delegation log.**
-`
-        : "";
-    let nextAction;
-    if (nextStage) {
-        const nextSchema = stageSchema(nextStage);
-        const nextDescription = nextSchema.skillDescription.charAt(0).toLowerCase() + nextSchema.skillDescription.slice(1);
-        nextAction = `4. Tell the user:\n\n   > **Stage \`${stage}\` complete.** Next: **${nextStage}** — ${nextDescription}\n   >\n   > Run \`/cc-next\` to continue.`;
-    }
-    else {
-        nextAction = `4. Tell the user:\n\n   > **Flow complete.** All stages finished. The project is ready for release.`;
-    }
+    const mandatoryList = mandatory.length > 0 ? mandatory.map((a) => `\`${a}\``).join(", ") : "none";
+    const nextDescription = schema.next === "done"
+        ? "flow complete — release cut and handoff signed off"
+        : (() => {
+            const nextSchema = stageSchema(schema.next);
+            return nextSchema.skillDescription.charAt(0).toLowerCase() + nextSchema.skillDescription.slice(1);
+        })();
     return `## Stage Completion Protocol
-When all required gates are satisfied and the artifact is written:
+Apply the **Shared Stage Completion Protocol** from \`.cclaw/skills/using-cclaw/SKILL.md\` with these parameters — do NOT re-derive the generic steps here.
-${delegationBlock}1. **Update \`${RUNTIME_ROOT}/state/flow-state.json\`:**
-${stateUpdate}
-   - For each passed gate, add an entry to \`guardEvidence\`: \`"<gate_id>": "<artifact path or excerpt proving the gate>"\`. Do NOT leave \`guardEvidence\` empty.
-2. **Persist artifact** at \`${RUNTIME_ROOT}/artifacts/${schema.artifactFile}\`. Do NOT manually copy into \`${RUNTIME_ROOT}/runs/\`; archival is handled by \`cclaw archive\`.
-3. **Doctor pre-flight** — Run \`npx cclaw doctor\` (or the installed cclaw binary). If any check fails, resolve the issue (missing delegation entry, artifact section, gate evidence) and re-run until all checks pass. Do NOT proceed to the next step while doctor reports failures.
-${nextAction}
+**Completion Parameters**
+- \`stage\` — \`${stage}\`
+- \`next\` — \`${nextStage}\` (${nextDescription})
+- \`gates\` — ${gateList}
+- \`artifact\` — \`${RUNTIME_ROOT}/artifacts/${schema.artifactFile}\`
+- \`mandatory\` — ${mandatoryList}
-**STOP.** Do not load the next stage skill yourself. The user will run \`/cc-next\` when ready (same session or new session).
+When all required gates are satisfied and the artifact is written, execute the shared procedure (delegation pre-flight → flow-state update → artifact persistence → \`npx cclaw doctor\` → user handoff → STOP) using the parameters above. If any check fails, resolve the issue and re-run before proceeding.
 ## Resume Protocol
-When resuming a stage in a NEW session (artifact exists but gates are not all passed in flow-state):
-1. Read the existing artifact and check which gates can be verified from artifact evidence.
-2. For each unverified gate, ask the user to confirm ONE gate at a time. Do NOT batch multiple gate confirmations in a single message.
-3. Update \`guardEvidence\` for each confirmed gate before proceeding.
+When resuming this stage in a NEW session (artifact exists but not all of ${gateList} are passed), follow the **Shared Resume Protocol** in \`.cclaw/skills/using-cclaw/SKILL.md\` — confirm one gate at a time, update \`guardEvidence\` for each, never batch confirmations.
 `;
 }
 function stageTransitionAutoAdvanceBlock(schema) {
@@ -344,15 +384,15 @@ You MUST complete these steps in order:
 ${checklistItems}
+${stageGoodBadExamples(stage)}
+${stageDomainExamples(stage)}
 ${stageExamples(stage)}
 ${namedAntiPatternBlock(stage)}
 ${cognitivePatternsList(stage)}
 ## Interaction Protocol
 ${schema.interactionProtocol.map((item, i) => `${i + 1}. ${item}`).join("\n")}
-${QUESTION_FORMAT_SPEC}
-${ERROR_BUDGET_SPEC}
+**See \`.cclaw/skills/using-cclaw/SKILL.md\` "Shared Decision + Tool-Use Protocol"** for the full AskUserQuestion format, error/retry budget, and the 3-attempt escalation rule. Do not duplicate those rules here — apply them verbatim.
 ${waveExecutionModeBlock(stage)}
 ## Required Gates
@@ -368,15 +408,13 @@ ${reviewSectionsBlock(stage)}
 ${verificationBlock(stage)}
 ${crossStageTraceBlock(stage)}
 ${artifactValidationBlock(stage)}
+${visualCommunicationBlock(stage)}
 ${decisionRecordBlock(stage)}
 ## Common Rationalizations
 ${rationalizationTable(stage)}
-## Blockers
-${schema.blockers.length > 0 ? schema.blockers.map((item) => `- ${item}`).join("\n") : "- None — stage can always proceed"}
 ## Anti-Patterns
-${schema.antiPatterns.map((item) => `- ${item}`).join("\n")}
+${[...schema.antiPatterns, ...schema.blockers].map((item) => `- ${item}`).join("\n")}
 ## Red Flags
 ${schema.redFlags.map((item) => `- ${item}`).join("\n")}
@@ -389,6 +427,25 @@ ${stageTransitionAutoAdvanceBlock(schema)}
 ${progressiveDisclosureBlock(stage)}
 ${selfImprovementBlock(stage)}
 ## Handoff
+Before closing the stage, announce the handoff explicitly so the user can steer. Use the **Handoff Menu** below; never auto-advance silently, even when \`/cc-next\` is available.
+### Handoff Menu
+Offer the user a lettered choice at the end of the stage (use \`AskUserQuestion\` / \`AskQuestion\` when the harness supports it, otherwise plain lettered text):
+- **A) Advance** — run \`/cc-next\` and continue to the next stage. Default when all gates are satisfied and there are no open concerns.
+- **B) Revise this stage** — stay on the current stage; apply the user's feedback, then re-ask for handoff.
+- **C) Pause / park** — save state; stop here. Useful when the user wants to share the artifact with a human reviewer before continuing.
+- **D) Rewind** — move to a prior stage (user names which). Use when downstream work revealed that an earlier stage was wrong.
+- **E) Abandon** — mark the flow as cancelled; no further stages will run. Artifacts remain on disk.
+Recommendation rules:
+- If all required gates are satisfied AND the stage's completion status is \`DONE\`, recommend **A (Advance)**.
+- If completion status is \`DONE_WITH_CONCERNS\`, recommend **B (Revise)** and name the concern.
+- If completion status is \`BLOCKED\`, recommend **B (Revise)** or **C (Pause)** depending on whether the blocker is internal or external.
+Reference data for the user:
 - Next command: \`/cc-next\` (loads whatever stage is current in flow-state)
 - Required artifact: \`.cclaw/artifacts/${schema.artifactFile}\`
 - Stage stays blocked if any required gate is unsatisfied
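
The recommendation rules added to the Handoff section above can be read as a small decision function. The sketch below is illustrative only and is not part of the package: `recommendHandoff`, the `allGatesSatisfied` field, and the `blockerIsExternal` flag are hypothetical names, and the mapping of an external blocker to **C (Pause)** is an assumption, since the diff only says the choice depends on whether the blocker is internal or external.

```javascript
// Hypothetical helper, not part of cclaw-cli: maps a stage's completion
// status to a Handoff Menu letter. Status strings and menu letters come
// from the diff; every identifier here is an assumption.
function recommendHandoff({ status, allGatesSatisfied, blockerIsExternal = false }) {
    if (status === "DONE" && allGatesSatisfied) {
        return "A"; // Advance
    }
    if (status === "DONE_WITH_CONCERNS") {
        return "B"; // Revise, and name the concern
    }
    if (status === "BLOCKED") {
        // Assumption: external blocker -> Pause, internal blocker -> Revise.
        return blockerIsExternal ? "C" : "B";
    }
    // Case not covered by the diff (e.g. DONE with unsatisfied gates):
    // assume the conservative choice and stay on the stage.
    return "B";
}

console.log(recommendHandoff({ status: "DONE", allGatesSatisfied: true })); // prints "A"
console.log(recommendHandoff({ status: "BLOCKED", allGatesSatisfied: false, blockerIsExternal: true })); // prints "C"
```

Options **D (Rewind)** and **E (Abandon)** never appear as recommendations in the rules, so the sketch leaves them to the user.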

package/dist/content/stage-schema.js CHANGED

@@ -195,7 +195,7 @@ const SCOPE = {
         "**Error and Rescue Registry** — For each capability: what breaks, how detected, what fallback."
     ],
     interactionProtocol: [
-        "For scope mode selection: use the Decision Protocol — present expand/selective/hold/reduce as labeled options with trade-offs and mark one as (recommended). **Score each option `Completeness: X/10`** (10 = covers every prime-directive failure mode, four data-flow paths, observability, and deferred handling for the in-scope set; subtract for each gap). Recommend the highest-scoring option; if scores tie, pick the lowest blast radius. Base your recommendation on default heuristics: greenfield -> expand, enhancement -> selective, bugfix/hotfix/refactor -> hold, broad blast radius -> reduce. If AskQuestion/AskUserQuestion is available, send exactly ONE question per call, validate fields against runtime schema, and on schema error immediately fall back to plain-text question instead of retrying guessed payloads.",
+        "For scope mode selection: use the Decision Protocol — present expand/selective/hold/reduce as labeled options with trade-offs and mark one as (recommended). Do NOT use a numeric Completeness rubric; recommend the option that best covers the prime-directive failure modes, four data-flow paths, observability, and deferred handling for the in-scope set with the smallest blast radius. Base your recommendation on default heuristics: greenfield -> expand, enhancement -> selective, bugfix/hotfix/refactor -> hold, broad blast radius -> reduce. If AskQuestion/AskUserQuestion is available, send exactly ONE question per call, validate fields against runtime schema, and on schema error immediately fall back to plain-text question instead of retrying guessed payloads.",
         "Walk through the scope checklist interactively. Each checklist item that surfaces a decision should be presented to the user as a question, not as a monologue. Do not dump all items at once.",
         "Challenge premise and verify the problem framing before anything else.",
         "Take a position on every scope decision. Avoid hedging phrases like 'this could work' or 'there are many ways'; state your recommendation and one concrete condition that would change it.",
@@ -350,6 +350,7 @@ const SCOPE = {
     artifactValidation: [
         { section: "Prime Directives", required: true, validationRule: "For each scoped capability: named failure modes, explicit error surface, four data-flow paths, interaction edge cases, observability expectations, and deferred-item handling." },
         { section: "Premise Challenge", required: true, validationRule: "Must contain explicit answers to: right problem? direct path? what if nothing?" },
+        { section: "Requirements", required: true, validationRule: "Table of stable requirement IDs (R1, R2, R3…) one per row with observable outcome, priority, and source. IDs are assigned once and never renumbered across scope/design/spec/plan/review; dropped requirements stay with Priority `DROPPED`." },
         { section: "Implementation Alternatives", required: true, validationRule: "2-3 options with Name, Summary, Effort, Risk, Pros, Cons, and Reuses. Must include minimal viable and ideal architecture options." },
         { section: "Scope Mode", required: true, validationRule: "Must state selected mode and rationale with default heuristic justification." },
         { section: "Mode-Specific Analysis", required: true, validationRule: "Must document the analysis matching the selected scope mode: EXPAND (10x and delight opportunities), SELECTIVE (hold-scope baseline then cherry-picked expansions), HOLD (minimum-change-set hardening), REDUCE (ruthless cuts and follow-up split)." },
@@ -393,7 +394,7 @@ const DESIGN = {
         "Codebase Investigation — Before any design decision, read the actual code in the blast radius. List every file that will be touched, its current responsibilities, and existing patterns (error handling, naming, test style). Design must conform to discovered patterns, not impose new ones without justification.",
         "Step 0: Scope Challenge — what existing code solves sub-problems? Minimum change set? Complexity check: 8+ files or 2+ new services = complexity smell → flag for possible scope reduction.",
         "Search Before Building — For each technical choice (library, pattern, architecture), search for existing solutions. Label findings: Layer 1 (exact match), Layer 2 (partial match, needs adaptation), Layer 3 (inspiration only), EUREKA (unexpected perfect solution). Default to existing before custom.",
-        "Architecture Review — system design, component boundaries, data flow, scaling, security architecture. For each new codepath: one realistic production failure scenario. **Mandatory:** produce at least one architecture diagram (ASCII, Mermaid, or tool-generated) showing component boundaries and data flow direction.",
+        "Architecture Review — system design, component boundaries, data flow, scaling, security architecture. For each new codepath: one realistic production failure scenario. **Mandatory:** produce at least one architecture diagram (ASCII, Mermaid, or tool-generated) showing component boundaries and data flow direction. Apply the **Visual Communication rules** (see below) — an unlabeled or generic diagram is worse than no diagram, because it pretends to encode decisions it does not.",
         "Code Quality Review — code organization, DRY violations, error handling patterns, over/under-engineering assessment.",
         "Test Review — diagram every new flow, data path, error path. For each: what test type covers it? Does one exist? What is the gap? Produce test plan artifact.",
         "Performance Review — N+1 queries, memory concerns, caching opportunities, slow code paths. What breaks at 10x load? At 100x?",
@@ -405,7 +406,7 @@ const DESIGN = {
     interactionProtocol: [
         "Review architecture decisions section-by-section.",
         "For EACH issue found in a review section, present it ONE AT A TIME. Do NOT batch multiple issues.",
-        "For each issue: use the Decision Protocol — describe concretely with file/line references, present labeled options (A/B/C) with trade-offs, effort estimate (S/M/L/XL), risk level (Low/Med/High), **`Completeness: X/10` per option** (10 = fully addresses architecture/data-flow/failure-modes/test+perf review concerns for the issue, subtract for each unaddressed dimension), and mark one as (recommended). Prefer the highest-scoring option; if scores tie, prefer the lower-risk one. If AskQuestion/AskUserQuestion is available, send exactly ONE question per call, validate fields against runtime schema, and on schema error immediately fall back to plain-text question instead of retrying guessed payloads.",
+        "For each issue: use the Decision Protocol — describe concretely with file/line references, present labeled options (A/B/C) with trade-offs, effort estimate (S/M/L/XL), risk level (Low/Med/High), and mark one as (recommended). Do NOT use a numeric Completeness rubric; recommend the option that best covers architecture, data-flow, failure-modes, test, and perf review concerns for the issue with the lowest risk. If AskQuestion/AskUserQuestion is available, send exactly ONE question per call, validate fields against runtime schema, and on schema error immediately fall back to plain-text question instead of retrying guessed payloads.",
         "Only proceed to the next review section after ALL issues in the current section are resolved.",
         "If a section has no issues, say 'No issues found' and move on.",
         "Do not skip failure-mode mapping.",
@@ -583,7 +584,7 @@ const DESIGN = {
         { section: "Codebase Investigation", required: true, validationRule: "Must list blast-radius files with current responsibilities and discovered patterns." },
         { section: "Search Before Building", required: true, validationRule: "For each technical choice: Layer 1 (exact match), Layer 2 (partial match), Layer 3 (inspiration), EUREKA labels with reuse-first default." },
         { section: "Architecture Boundaries", required: true, validationRule: "Must list component boundaries with ownership." },
-        { section: "Architecture Diagram", required: true, validationRule: "At least one diagram (ASCII, Mermaid, or image) showing component boundaries and data flow direction." },
+        { section: "Architecture Diagram", required: true, validationRule: "At least one diagram (ASCII, Mermaid, or image) showing component boundaries and data flow direction. Diagram must: (1) label every node with a concrete component name (no generic 'Service A/B'), (2) label every arrow with the action or message (no unlabeled arrows), (3) mark direction of data flow explicitly, (4) distinguish synchronous from asynchronous edges (e.g. solid vs dashed, or `sync:` / `async:` prefix), (5) show at least one failure edge or degraded-mode branch when the system has one." },
         { section: "Data Flow", required: true, validationRule: "Must include happy path, nil input, empty input, upstream error paths." },
         { section: "Failure Mode Table", required: true, validationRule: "Each failure mode has: trigger, detection, mitigation, user impact." },
         { section: "Test Strategy", required: true, validationRule: "Must define unit/integration/e2e expectations with coverage targets." },
@@ -748,7 +749,7 @@ const SPEC = {
         traceabilityRule: "Every acceptance criterion must trace to a design decision. Every downstream plan task must trace to a spec criterion."
     },
     artifactValidation: [
-        { section: "Acceptance Criteria", required: true, validationRule: "Each criterion is observable, measurable, and falsifiable. Table should include a Design Decision Ref column tracing back to design artifact." },
+        { section: "Acceptance Criteria", required: true, validationRule: "Each criterion is observable, measurable, and falsifiable. Table must include a Requirement Ref column linking to R# IDs in 02-scope.md and a Design Decision Ref column tracing back to design artifact. AC IDs (AC-1, AC-2…) are stable across revisions — dropped ACs stay with Priority `DROPPED`." },
         { section: "Edge Cases", required: true, validationRule: "At least one boundary and one error condition per criterion." },
         { section: "Constraints and Assumptions", required: true, validationRule: "All implicit assumptions surfaced. Constraints have sources." },
         { section: "Testability Map", required: true, validationRule: "Each criterion maps to a concrete test description with verification approach (unit, integration, e2e, manual) and command or manual steps." },
@@ -864,6 +865,8 @@ const PLAN = {
     cognitivePatterns: [
         { name: "Vertical Slice Thinking", description: "Each task delivers one thin end-to-end slice of value. Horizontal layers (all models, then all controllers) create integration risk. Vertical slices (one feature through all layers) reduce it." },
         { name: "Two-Minute Smell Test", description: "If a competent engineer cannot understand and start a task in two minutes, the task is too large or too vague. Break it down further." },
+        { name: "Five-Minute Budget (hard)", description: "Every plan step MUST fit a 2-to-5-minute execution budget on a competent implementer. If a step plausibly takes longer, it is two steps pretending to be one — split it. Measure by 'keyboard minutes on this slice', not by wall clock. Write the estimated minutes next to each task (e.g. `[~3m]`); when a TDD slice later consumes >2× the estimate, log an operational-self-improvement entry so future plans calibrate better." },
+        { name: "No Placeholders", description: "Plan text must be copy-pasteable. Forbidden tokens anywhere in the artifact: `TODO`, `TBD`, `FIXME`, `<fill-in>`, `<your-*-here>`, `xxx`, `...` (as ellipsis for omitted content — real commands use real args). Every acceptance-criterion link, file path, test command, and verification command must be concrete and runnable as written. A placeholder is a deferred decision masquerading as a plan; decide it now or remove the task." },
         { name: "Make the Change Easy, Then Make the Easy Change", description: "Refactor first, implement second. Never structural + behavioral changes simultaneously. Sequence tasks accordingly." },
         { name: "Diagnose Before Fix", description: "Before decomposing work, understand the current state of the codebase. Read existing code, tests, and conventions. Tasks should reference what exists, not assume a blank slate." },
         { name: "Scrap Signals", description: "If a task description is vague, the acceptance criterion is missing, or the verification command is a placeholder — it is scrap. Either rewrite it or remove it. Half-specified tasks waste more time than no tasks." },
@@ -891,6 +894,16 @@ const PLAN = {
                 "Are there hidden dependencies between tasks in different waves?"
             ],
             stopGate: true
+        },
+        {
+            title: "Five-Minute Budget + No-Placeholders Audit",
+            evaluationPoints: [
+                "Does every task carry an explicit minutes estimate (e.g. `[~3m]`) and does every estimate fit the 2-to-5-minute budget? Estimates >5 minutes must be split.",
+                "Are all file paths, test commands, and verification commands copy-pasteable as written — no `TODO`, `TBD`, `FIXME`, `<fill-in>`, `<your-*-here>`, `xxx`, or ellipsis standing in for omitted args?",
+                "Does every acceptance-criterion reference resolve to a real R# / AC-### in the spec (not a blank link)?",
+                "If an estimate is genuinely uncertain (first-time integration, unfamiliar library), is the uncertainty named explicitly and scheduled as a spike task in wave 0, rather than hidden behind a large estimate?"
+            ],
+            stopGate: true
         }
     ],
     completionStatus: ["DONE", "DONE_WITH_CONCERNS", "BLOCKED"],
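
The first evaluation point of the audit added in the hunk above (every task carries a `[~Nm]` estimate that fits the 2-to-5-minute budget) could be checked mechanically. This is a minimal sketch under stated assumptions, not code from the package: `auditMinuteEstimates` and its verdict strings are hypothetical, and flagging estimates under 2 minutes is an interpretation of "fit a 2-to-5-minute budget" rather than something the diff spells out.

```javascript
// Hypothetical audit, not part of cclaw-cli. The `[~Nm]` notation and the
// 2-to-5-minute range come from the diff; names and verdicts are assumptions.
const ESTIMATE_RE = /\[~(\d+)m\]/;

function auditMinuteEstimates(taskLines) {
    return taskLines.map((line) => {
        const match = ESTIMATE_RE.exec(line);
        if (!match) {
            return { line, verdict: "missing-estimate" };
        }
        const minutes = Number(match[1]);
        if (minutes > 5) {
            return { line, verdict: "split" }; // the diff: estimates >5 minutes must be split
        }
        if (minutes < 2) {
            return { line, verdict: "below-budget" }; // assumption: lower bound is enforced too
        }
        return { line, verdict: "ok" };
    });
}

const verdicts = auditMinuteEstimates([
    "T-1 [~3m] Add User.emailNormalized column",
    "T-9 [~8m] Rework signup flow",
    "T-4 Reject duplicates",
]).map((result) => result.verdict);
console.log(verdicts); // prints [ 'ok', 'split', 'missing-estimate' ]
```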
@@ -902,11 +915,12 @@ const PLAN = {
     artifactValidation: [
         { section: "Dependency Graph", required: true, validationRule: "Ordering and parallel opportunities explicit. No circular dependencies." },
         { section: "Dependency Waves", required: true, validationRule: "Every task belongs to a wave. Each wave has an exit gate and dependency statement." },
-        { section: "Task List", required: true, validationRule: "Each task: ID, description, acceptance criterion link, verification command, and effort estimate (S/M/L)." },
+        { section: "Task List", required: true, validationRule: "Each task row includes ID, description, acceptance criterion, verification command, and effort estimate (S/M/L). Every task must also carry a minutes estimate within the 2-5 minute budget." },
         { section: "Acceptance Mapping", required: true, validationRule: "Every spec criterion is covered by at least one task." },
         { section: "Risk Assessment", required: false, validationRule: "If present: per-task or per-wave risk identification with likelihood, impact, and mitigation strategy." },
         { section: "Boundary Map", required: false, validationRule: "If present: per-wave or per-task interface contracts listing what each task produces (exports) and consumes (imports) from other tasks." },
-        { section: "WAIT_FOR_CONFIRM", required: true, validationRule: "Explicit marker present. Status: pending until user approves." }
+        { section: "WAIT_FOR_CONFIRM", required: true, validationRule: "Explicit marker present. Status: pending until user approves." },
+        { section: "No-Placeholder Scan", required: false, validationRule: "If present: confirmation that a text scan for `TODO`, `TBD`, `FIXME`, `<fill-in>`, `<your-*-here>`, `xxx`, or bare ellipses has zero hits in the task list. A placeholder is a deferred decision masquerading as a plan." }
     ],
     namedAntiPattern: {
         title: "Task Details Can Be Finalized During Coding",
@@ -1037,7 +1051,12 @@ const TDD = {
         { name: "Regression Paranoia", description: "Assume every change breaks something until the full suite proves otherwise. Partial test runs are lies of omission." },
         { name: "Refactor-as-Hygiene", description: "Refactoring is not optional cleanup — it is the third leg of TDD. GREEN without REFACTOR accumulates mess. REFACTOR without GREEN breaks things." },
         { name: "Evidence Over Anecdote", description: "Every claim about test state must be backed by captured output. 'It passed' without terminal evidence is not evidence. 'I saw it fail' without the failure output is not RED. Capture commands, outputs, and results — not summaries from memory." },
-        { name: "Characterization First", description: "Before changing existing behavior, write characterization tests that capture current behavior as-is. These tests document what the system does today — even if that behavior is wrong. Only after the characterization suite is green do you add the new RED test for the desired change. This prevents accidental behavior destruction during refactoring." }
+        { name: "Characterization First", description: "Before changing existing behavior, write characterization tests that capture current behavior as-is. These tests document what the system does today — even if that behavior is wrong. Only after the characterization suite is green do you add the new RED test for the desired change. This prevents accidental behavior destruction during refactoring." },
+        { name: "Test Pyramid Shape", description: "Healthy test suites look like a pyramid: many small fast tests at the base, fewer medium integration tests in the middle, few large end-to-end tests at the top. Each layer catches a different class of bug; none of them substitutes for another. If your suite is top-heavy (mostly E2E) it is slow and flaky; if it is base-only it misses integration contracts. During TDD, default to the smallest layer that can prove the behavior." },
+        { name: "Prove-It Pattern (bug fixes)", description: "For any reported regression or hotfix, the FIRST test is a reproduction — it must fail without your fix, pass with your fix, and fail again if the fix is reverted. This is the only way to prove you fixed the reported bug and not a superficially similar one. Skipping this step is how bugs come back two releases later wearing a different name." },
+        { name: "Test Size Model", description: "Size tests by scope, not by name: Small = pure logic, no I/O, <50ms; Medium = one process boundary, possibly filesystem or an in-memory DB; Large = multi-process / network / real external service. Small tests are the default; escalate to Medium only when a real boundary must be exercised, and to Large only for end-to-end user journeys. Record the size class in the TDD artifact so reviewers can sanity-check the pyramid shape." },
+        { name: "State Over Interaction", description: "Assert on observable outcomes (return values, state changes, persisted data, HTTP responses) — NOT on which helper methods were called, how many times, or in what order. Interaction-style assertions (`expect(mock.foo).toHaveBeenCalledWith(...)` without a state assertion) couple tests to implementation and shatter under harmless refactors. Use mocks only at trust boundaries (network, filesystem, time); for everything inside the module, let state do the asserting. If you cannot observe the outcome without a mock-spy, rework the seam before writing the test." },
+        { name: "Beyoncé Rule", description: "If you liked it, you should have put a test on it. Every surface that a caller can observe — public API, CLI flag, config key, exit code, persisted schema — is a contract, and every contract without a test is a silent regression waiting to happen. When a bug or production incident reveals an uncovered surface, the fix is never 'patch the code'; it is 'patch the code AND add the test that would have caught it'. Untested behavior does not exist for future refactors — it only exists until somebody accidentally removes it." }
     ],
     reviewSections: [
         {
@@ -1061,6 +1080,37 @@ const TDD = {
                 "Is traceability complete: every change links to plan task ID and spec criterion?"
             ],
             stopGate: true
+        },
+        {
+            title: "Test Pyramid + Size Audit",
+            evaluationPoints: [
+                "Is the tests-added count skewed toward Small (unit) tests, with Medium and Large used only when a real boundary justifies the cost?",
+                "Does every newly added test declare a size class (Small / Medium / Large) — either inline in the test file or in the TDD artifact table?",
+                "Are Large tests reserved for genuine end-to-end user journeys (not substitutes for unit coverage)?",
+                "Has the slice avoided using Medium/Large tests to paper over testability problems that should be fixed at the design layer?"
+            ],
+            stopGate: false
+        },
+        {
+            title: "Prove-It Reproduction (bug-fix slices)",
+            evaluationPoints: [
+                "Does the artifact identify this slice as a bug fix, and if so, include a reproduction test checked in alongside the fix?",
+                "Is there captured RED evidence from running the reproduction WITHOUT the fix applied?",
+                "Is there captured GREEN evidence from the same reproduction AFTER the fix was applied?",
+                "Is there a note confirming the reproduction test fails again if the fix is reverted (or equivalent evidence that the test is actually pinned to this fix)?"
+            ],
+            stopGate: false
+        },
+        {
+            title: "State-over-Interaction + Beyoncé Coverage",
+            evaluationPoints: [
+                "Do assertions target observable state (return values, persisted data, HTTP responses, logs) rather than which internal helpers were called?",
+                "Are mocks/spies used only at true trust boundaries (network, filesystem, time, external services), not for module-internal collaborators?",
+                "For every public surface touched in this slice (exported API, CLI flag, config key, env var, exit code, schema field) — does at least one test observe it?",
+                "If a bug or review finding revealed an uncovered surface, was a test added alongside the fix, not just the code change?",
+                "Are interaction-style assertions (e.g. `toHaveBeenCalledWith` without a state assertion) justified by an explicit boundary comment, or flagged for follow-up?"
+            ],
+            stopGate: false
         }
     ],
     completionStatus: ["DONE", "DONE_WITH_CONCERNS", "BLOCKED"],
@@ -1077,7 +1127,9 @@ const TDD = {
         { section: "REFACTOR Notes", required: true, validationRule: "What changed, why, behavior preservation confirmed." },
         { section: "Traceability", required: true, validationRule: "Plan task ID and spec criterion linked." },
         { section: "Verification Ladder", required: false, validationRule: "If present: per-slice verification tier (static, command, behavioral, human) with evidence for highest tier reached." },
-        { section: "Coverage Targets", required: false, validationRule: "If present: per-module or per-code-type coverage thresholds with current values and measurement commands." }
+        { section: "Coverage Targets", required: false, validationRule: "If present: per-module or per-code-type coverage thresholds with current values and measurement commands." },
+        { section: "Test Pyramid Shape", required: false, validationRule: "If present: per-slice count of Small/Medium/Large tests added, to let reviewers verify the suite is not drifting top-heavy." },
+        { section: "Prove-It Reproduction", required: false, validationRule: "Required for bug-fix slices: original failing reproduction test (RED without fix), passing output with fix (GREEN), and a note confirming the test fails again if the fix is reverted." }
     ],
     namedAntiPattern: {
         title: "Code Before Failing Test",
@@ -1125,7 +1177,7 @@ const REVIEW = {
         "Run Layer 1 (spec compliance) completely before starting Layer 2.",
         "In each review section, present findings ONE AT A TIME. Do NOT batch.",
         "Classify every finding as Critical, Important, or Suggestion.",
-        "For each Critical finding: use the Decision Protocol — present resolution options (A/B/C) with trade-offs, **score each option `Completeness: X/10`** (10 = fully closes the finding with no carry-over risk; subtract for partial fixes, deferred follow-ups, or new risk introduced), and mark one as (recommended). Prefer the highest-scoring option; if scores tie, prefer the option with the smallest blast radius. If AskQuestion/AskUserQuestion is available, send exactly ONE question per call, validate fields against runtime schema, and on schema error immediately fall back to plain-text question instead of retrying guessed payloads.",
+        "For each Critical finding: use the Decision Protocol — present resolution options (A/B/C) with trade-offs, and mark one as (recommended). Do NOT use a numeric Completeness rubric; recommend the option that fully closes the finding with no carry-over risk and the smallest blast radius. If AskQuestion/AskUserQuestion is available, send exactly ONE question per call, validate fields against runtime schema, and on schema error immediately fall back to plain-text question instead of retrying guessed payloads.",
         "Resolve all critical blockers before ship.",
         "For final verdict: use AskQuestion/AskUserQuestion only if runtime schema is confirmed; otherwise collect verdict with a plain-text single-choice prompt (APPROVED / APPROVED_WITH_CONCERNS / BLOCKED).",
         "**STOP.** Do NOT proceed to ship until the user provides an explicit verdict."
@@ -1336,7 +1388,7 @@ const SHIP = {
     interactionProtocol: [
         "Run preflight checks before any release action.",
         "Document release notes and rollback plan explicitly.",
-        "For finalization mode: use the Decision Protocol — present modes as labeled options (A/B/C/D) with consequences, **score each option `Completeness: X/10`** (10 = fully addresses release blast-radius, rollback readiness, observability, and stakeholder communication), and mark one as (recommended). Prefer the highest-scoring option; if scores tie, prefer the most reversible one. If AskQuestion/AskUserQuestion is available, send exactly ONE question per call, validate fields against runtime schema, and on schema error immediately fall back to plain-text question instead of retrying guessed payloads.",
+        "For finalization mode: use the Decision Protocol — present modes as labeled options (A/B/C/D) with consequences, and mark one as (recommended). Do NOT use a numeric Completeness rubric; recommend the mode that best addresses release blast-radius, rollback readiness, observability, and stakeholder communication — ties go to the most reversible option. If AskQuestion/AskUserQuestion is available, send exactly ONE question per call, validate fields against runtime schema, and on schema error immediately fall back to plain-text question instead of retrying guessed payloads.",
         "Do not proceed if critical blockers remain from review.",
         "**STOP.** Present finalization options and wait for user selection before executing any finalization action."
     ],
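
The "No-Placeholder Scan" rule added to the PLAN artifact validation above can be sketched as a small text scan. This is an illustrative sketch only, not code from cclaw-cli: `PLACEHOLDER_PATTERNS` and `scanForPlaceholders` are hypothetical names, and reading "bare ellipses" as a line that contains nothing but an ellipsis is an assumption.

```javascript
// Hypothetical helper, not part of cclaw-cli. The pattern list mirrors the
// tokens named in the "No-Placeholder Scan" validation rule.
const PLACEHOLDER_PATTERNS = [
    { label: "TODO", regex: /\bTODO\b/ },
    { label: "TBD", regex: /\bTBD\b/ },
    { label: "FIXME", regex: /\bFIXME\b/ },
    { label: "<fill-in>", regex: /<fill-in>/i },
    { label: "<your-*-here>", regex: /<your-[^>]*-here>/i },
    { label: "xxx", regex: /\bxxx\b/i },
    // Assumption: a "bare ellipsis" is a line holding only "..." or "…".
    { label: "bare ellipsis", regex: /^\s*(\.{3}|…)\s*$/ }
];

// Returns one hit per (line, pattern) match; an empty array means the scan is clean.
function scanForPlaceholders(taskListText) {
    const hits = [];
    taskListText.split("\n").forEach((line, index) => {
        for (const { label, regex } of PLACEHOLDER_PATTERNS) {
            if (regex.test(line)) {
                hits.push({ line: index + 1, label });
            }
        }
    });
    return hits;
}
```

A plan would pass this check only when `scanForPlaceholders(taskList).length === 0`. Plain substring-style patterns can produce false positives, so a real scan may need an allow-list.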

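The "State Over Interaction" principle added to the TDD stage above can be illustrated with a minimal sketch. Everything here is hypothetical: `createCart` and `checkTotalReflectsAddedItems` are invented for illustration and do not appear in cclaw-cli. The check asserts on the observable outcome (the returned total) and never on which internal helper was called.

```javascript
// Hypothetical module under test; names are illustrative only.
function createCart() {
    const items = [];
    return {
        add(name, priceCents) {
            items.push({ name, priceCents });
        },
        total() {
            return items.reduce((sum, item) => sum + item.priceCents, 0);
        }
    };
}

// State-style check: it observes the return value of total(), so it keeps
// passing if add() or total() is refactored internally.
function checkTotalReflectsAddedItems() {
    const cart = createCart();
    cart.add("pen", 150);
    cart.add("notebook", 450);
    return cart.total() === 600;
}
```

An interaction-style version would instead spy on an internal call such as `items.push`, which ties the test to the current implementation rather than to the behavior a caller can observe.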
package/dist/content/start-command.js CHANGED

@@ -25,30 +25,69 @@ This is the **recommended way to start** working with cclaw. Use \`/cc-next\` fo
 ## HARD-GATE
 - **Do not** skip reading \`${flowPath}\` — always check current state before acting.
-- **Do not** start implementation stages directly from \`/cc <prompt>\` — always begin at brainstorm.
+- **Do not** start implementation stages directly from \`/cc <prompt>\` — always begin at the first stage of the resolved track (brainstorm for standard, spec for quick).
+- **Do not** start a stage pipeline for a task that is not a software change (pure question, non-software task, conversation).
 ## Algorithm
 ### With prompt (\`/cc <text>\`)
-1. Read \`${flowPath}\`.
-2. If flow already has completed stages beyond brainstorm, warn the user that starting a new brainstorm will reset progress. Ask for confirmation before proceeding.
-3. **Track heuristic** — classify the idea text and **recommend** a track (the user can override before any state mutation):
+1. **Phase 0 — Task classification.** Before any stage routing, classify the prompt:
+   | Class | Signals | Action |
+   |---|---|---|
+   | **non-software** | legal text / docs / marketing copy / meeting notes / therapy-style conversation | Respond directly, do NOT open a stage, do NOT mutate flow state. |
+   | **pure-question** | "how does X work?", "explain Y", "what are the trade-offs of Z?" | Answer directly, do NOT open a stage. |
+   | **trivial** | typo, one-liner, rename, config tweak, copy change, version bump with zero behavior change | Fast-path: skip \`brainstorm\` and \`scope\`, seed \`00-idea.md\`, move straight to \`design\` or \`spec\` depending on whether an interface change is involved. |
+   | **software — bug fix with repro** | regression / hotfix / named symptom + repro steps | Fast-path: set track to \`quick\`, seed \`04-spec.md\` with the reproduction, enter \`tdd\` with a RED reproduction test first. |
+   | **software — standard** | feature, refactor, migration, integration, architecture change | Full 8-stage flow starting at \`brainstorm\`. |
+   Record the chosen class in \`.cclaw/artifacts/00-idea.md\` on the \`Class:\` line. Do NOT silently treat a non-software task as software.
+2. **Phase 1 — Origin-document discovery.** Before asking the user for context, scan for existing requirements/plan artifacts and merge them into initial context:
+   - \`.cclaw/artifacts/00-idea.md\` if it already exists (resumed flow).
+   - Common origin locations: \`docs/prd/**\`, \`docs/rfcs/**\`, \`docs/adr/**\`, \`docs/design/**\`, \`specs/**\`, \`prd/**\`, \`rfc/**\`, \`design/**\`, root-level \`PRD.md\` / \`SPEC.md\` / \`DESIGN.md\` / \`REQUIREMENTS.md\` / \`ROADMAP.md\`.
+   - Summarize each discovered doc in \`00-idea.md\` under a \`Discovered context\` section with path + 1-line summary.
+   - If an origin doc contradicts the prompt, surface the conflict to the user before routing.
+3. **Phase 2 — Tech-stack + version detection.** Sniff the repo for stack + language versions and record under \`Stack:\`:
+   - Node: \`package.json\` \`engines\` / \`volta\` / \`packageManager\` / \`devDependencies\`.
+   - Python: \`pyproject.toml\` / \`requirements*.txt\` / \`.python-version\`.
+   - Go: \`go.mod\` (module + Go version).
+   - Rust: \`Cargo.toml\` (\`[package]\` + \`rust-version\`).
+   - Java/Kotlin: \`pom.xml\` / \`build.gradle*\` + toolchain version.
+   - Containers: \`Dockerfile\`, \`docker-compose*.yml\`.
+   - CI: \`.github/workflows\`, \`.gitlab-ci.yml\`.
+   Skip detection quietly if no markers are found — do NOT invent a stack.
+4. Read \`${flowPath}\`.
+5. If flow already has completed stages beyond brainstorm, warn the user that starting a new brainstorm will reset progress. Ask for confirmation before proceeding.
+6. **Track heuristic** — classify the idea text and **recommend** a track (the user can override before any state mutation):
    - **quick** (\`spec → tdd → review → ship\`) — single-purpose work where the spec is essentially already known.
      Triggers (case-insensitive substring or close variant): \`bug\`, \`bugfix\`, \`fix\`, \`hotfix\`, \`patch\`, \`typo\`, \`regression\`, \`copy change\`, \`rename\`, \`bump\`, \`upgrade dep\`, \`config tweak\`, \`docs only\`, \`comment\`, \`lint\`, \`format\`, \`small\`, \`tiny\`, \`one-liner\`, \`revert\`.
    - **standard** (full 8 stages — default) — anything that introduces a new capability, touches multiple modules, or has unclear scope.
      Triggers: \`new feature\`, \`add\`, \`build\`, \`design\`, \`refactor\`, \`migration\`, \`platform\`, \`architecture\`, \`endpoint\`, \`schema\`, \`api\`, \`integrate\`, \`workflow\`, \`onboarding\`, or any prompt that does not match quick triggers.
    - When triggers conflict (e.g. "small refactor that touches 5 modules") prefer **standard** — quick is opt-in and only safe when scope is genuinely tiny.
-4. Present the recommendation as a single decision with explicit options:
+7. Present the recommendation as a single decision with explicit options:
    > \`Recommended track: <quick|standard>\` because \`<one-line reason citing matched triggers>\`.
    > Override? (A) keep \`<recommended>\`  (B) switch to \`<other>\`  (C) cancel.
    If \`AskQuestion\`/\`AskUserQuestion\` is available, send exactly ONE question; on schema error, fall back to plain text.
-5. Persist the chosen track to \`${flowPath}\` (\`track\` field). Compute \`skippedStages\` from the track and write that too. Use the **first stage of the chosen track** as \`currentStage\` (quick → \`spec\`, standard → \`brainstorm\`).
-6. Write the prompt to \`.cclaw/artifacts/00-idea.md\` as the raw idea capture, and append a \`Track:\` line referencing the chosen track and the matched heuristic.
-7. Load the **first-stage skill for the chosen track** and its command file:
-   - quick → \`.cclaw/skills/specification-authoring/SKILL.md\` + \`.cclaw/commands/spec.md\`
-   - standard → \`.cclaw/skills/brainstorming/SKILL.md\` + \`.cclaw/commands/brainstorm.md\`
-8. Execute that stage with the prompt as initial context.
+8. Persist the chosen track to \`${flowPath}\` (\`track\` field). Compute \`skippedStages\` from the track and write that too. Use the **first stage of the chosen track** as \`currentStage\` (quick → \`spec\`, standard → \`brainstorm\`, trivial fast-path → \`design\` or \`spec\` per Phase 0).
+9. Write the prompt to \`.cclaw/artifacts/00-idea.md\` with the following header lines: \`Class:\` (from Phase 0), \`Track:\` (chosen track + matched heuristic), \`Stack:\` (from Phase 2 detection, or \`unknown\`), and a \`Discovered context\` section if Phase 1 found origin docs.
+10. Load the **first-stage skill for the chosen track** and its command file:
+    - quick → \`.cclaw/skills/specification-authoring/SKILL.md\` + \`.cclaw/commands/spec.md\`
+    - standard → \`.cclaw/skills/brainstorming/SKILL.md\` + \`.cclaw/commands/brainstorm.md\`
+    - trivial fast-path → design or spec skill per Phase 0 decision.
+11. Execute that stage with the prompt + Phase 1/Phase 2 context as initial input.
+### Reclassification on discovery
+If during any stage the agent discovers evidence that contradicts the initial Phase 0 / track decision (e.g. a supposedly \`trivial\` change turns out to require schema migration, a \`quick\` bug fix turns out to need design discussion, an origin doc reveals scope 3× larger than the prompt), STOP and re-classify:
+1. Surface the new evidence in plain text.
+2. Propose the updated \`Class\` + \`Track\` with a one-line reason.
+3. Use the Decision Protocol to let the user accept, override, or cancel.
+4. On acceptance: update \`00-idea.md\` with a \`Reclassification:\` entry (old → new, reason, ISO timestamp) and update \`flow-state.json\` accordingly — do NOT rewrite prior artifacts, they stay as history.
 ### Without prompt (\`/cc\`)
@@ -88,12 +127,15 @@ Do **not** silently discard an existing flow when the user provides a prompt. If
 ### Path A: \`/cc <prompt>\`
-1. Read \`${flowPath}\`.
-2. If \`completedStages\` is non-empty:
+1. **Task classification (Phase 0).** Decide whether the prompt is \`software-standard\`, \`software-trivial\`, \`software-bugfix\`, \`pure-question\`, or \`non-software\`. Non-software and pure-question exit immediately — answer directly, do not open a stage.
+2. **Origin-document discovery (Phase 1).** Scan for \`docs/prd/**\`, \`docs/rfcs/**\`, \`docs/adr/**\`, \`docs/design/**\`, \`specs/**\`, root-level \`PRD.md\` / \`SPEC.md\` / \`DESIGN.md\` / \`REQUIREMENTS.md\`. Summarize any hits in \`00-idea.md\` under \`Discovered context\`. Surface conflicts with the prompt before routing.
+3. **Stack detection (Phase 2).** Inspect \`package.json\` engines, \`pyproject.toml\`, \`go.mod\`, \`Cargo.toml\`, \`pom.xml\`, \`build.gradle*\`, \`Dockerfile\`, \`docker-compose*.yml\`, and CI configs. Record stack + versions on the \`Stack:\` line. Do not invent stack details.
+4. Read \`${flowPath}\`.
+5. If \`completedStages\` is non-empty:
    - Inform: "You have an active flow at stage **{currentStage}** with {N} completed stages. Starting a new brainstorm will reset progress."
    - Ask: "Continue with reset? (A) Yes, start fresh (B) No, resume current flow"
    - If (B) → switch to Path B behavior.
-3. **Classify the idea** using the heuristic below and present a single track recommendation. Wait for explicit confirmation or override before mutating any state.
+6. **Classify the idea** using the heuristic below and present a single track recommendation. Wait for explicit confirmation or override before mutating any state.
    **Track heuristic** (lowercase substring match against the user prompt):
@@ -104,9 +146,13 @@ Do **not** silently discard an existing flow when the user provides a prompt. If
    - On conflict, prefer \`standard\` (quick is opt-in for genuinely tiny work).
    - Always state the recommendation as a one-line reason citing the matched trigger.
-4. Persist the chosen track in \`${flowPath}\` (\`track\` + \`skippedStages\`). Set \`currentStage\` to the first stage of the chosen track (\`quick\` → \`spec\`, \`standard\` → \`brainstorm\`). Reset gate catalog.
-5. Write \`${RUNTIME_ROOT}/artifacts/00-idea.md\` with the user's prompt and an explicit \`Track:\` line capturing the heuristic decision.
-6. Load and execute the **first stage skill of the chosen track** (\`brainstorming\` for standard, \`specification-authoring\` for quick) plus its matching command file.
+7. Persist the chosen track in \`${flowPath}\` (\`track\` + \`skippedStages\`). Set \`currentStage\` to the first stage of the chosen track (\`quick\` → \`spec\`, \`standard\` → \`brainstorm\`, trivial fast-path → \`design\` or \`spec\`). Reset gate catalog.
+8. Write \`${RUNTIME_ROOT}/artifacts/00-idea.md\` with the user's prompt plus header lines: \`Class:\`, \`Track:\`, \`Stack:\`, and a \`Discovered context\` section from Phase 1.
+9. Load and execute the **first stage skill of the chosen track** (\`brainstorming\` for standard, \`specification-authoring\` for quick) plus its matching command file.
+### Reclassification on discovery
+If mid-stage evidence contradicts the initial Class/Track decision (the "trivial" change needs a migration, the "quick" bug fix needs architecture work, an origin doc multiplies scope), STOP and re-classify using the Decision Protocol. Record \`Reclassification:\` in \`00-idea.md\` with old/new class and a one-line reason. Do NOT rewrite prior artifacts — they stay as history.
 ### Path B: \`/cc\` (no arguments)