npm - @wazir-dev/cli - Versions diffs - 1.1.0 → 1.3.0 - Mend

@wazir-dev/cli 1.1.0 → 1.3.0


Files changed (138)

package/CHANGELOG.md +74 -10
package/README.md +15 -15
package/assets/demo.cast +47 -0
package/assets/demo.gif +0 -0
package/docs/anti-patterns/AP-23-skipping-enabled-workflows.md +28 -0
package/docs/anti-patterns/AP-24-clarifier-deciding-scope.md +34 -0
package/docs/concepts/architecture.md +1 -1
package/docs/concepts/roles-and-workflows.md +2 -0
package/docs/concepts/why-wazir.md +59 -0
package/docs/decisions/2026-03-19-deferred-items.md +564 -0
package/docs/decisions/2026-03-19-enhancement-decisions.md +300 -0
package/docs/readmes/INDEX.md +21 -5
package/docs/readmes/features/expertise/README.md +2 -2
package/docs/readmes/features/exports/README.md +2 -2
package/docs/readmes/features/hooks/pre-compact-summary.md +1 -1
package/docs/readmes/features/schemas/README.md +3 -0
package/docs/readmes/features/skills/README.md +17 -0
package/docs/readmes/features/skills/clarifier.md +5 -0
package/docs/readmes/features/skills/claude-cli.md +5 -0
package/docs/readmes/features/skills/codex-cli.md +5 -0
package/docs/readmes/features/skills/dispatching-parallel-agents.md +5 -0
package/docs/readmes/features/skills/executing-plans.md +5 -0
package/docs/readmes/features/skills/executor.md +5 -0
package/docs/readmes/features/skills/finishing-a-development-branch.md +5 -0
package/docs/readmes/features/skills/gemini-cli.md +5 -0
package/docs/readmes/features/skills/humanize.md +5 -0
package/docs/readmes/features/skills/init-pipeline.md +5 -0
package/docs/readmes/features/skills/receiving-code-review.md +5 -0
package/docs/readmes/features/skills/requesting-code-review.md +5 -0
package/docs/readmes/features/skills/reviewer.md +5 -0
package/docs/readmes/features/skills/subagent-driven-development.md +5 -0
package/docs/readmes/features/skills/using-git-worktrees.md +5 -0
package/docs/readmes/features/skills/wazir.md +5 -0
package/docs/readmes/features/skills/writing-skills.md +5 -0
package/docs/readmes/features/workflows/prepare-next.md +1 -1
package/docs/reference/configuration-reference.md +47 -6
package/docs/reference/hooks.md +1 -0
package/docs/reference/launch-checklist.md +4 -4
package/docs/reference/review-loop-pattern.md +119 -9
package/docs/reference/roles-reference.md +1 -0
package/docs/reference/skill-tiers.md +147 -0
package/docs/reference/tooling-cli.md +3 -1
package/docs/truth-claims.yaml +12 -0
package/expertise/antipatterns/process/ai-coding-antipatterns.md +214 -1
package/exports/hosts/claude/.claude/commands/plan-review.md +3 -1
package/exports/hosts/claude/.claude/commands/verify.md +30 -1
package/exports/hosts/claude/.claude/settings.json +9 -0
package/exports/hosts/claude/CLAUDE.md +1 -1
package/exports/hosts/claude/export.manifest.json +6 -4
package/exports/hosts/claude/host-package.json +3 -1
package/exports/hosts/codex/AGENTS.md +1 -1
package/exports/hosts/codex/export.manifest.json +6 -4
package/exports/hosts/codex/host-package.json +3 -1
package/exports/hosts/cursor/.cursor/hooks.json +4 -0
package/exports/hosts/cursor/.cursor/rules/wazir-core.mdc +1 -1
package/exports/hosts/cursor/export.manifest.json +6 -4
package/exports/hosts/cursor/host-package.json +3 -1
package/exports/hosts/gemini/GEMINI.md +1 -1
package/exports/hosts/gemini/export.manifest.json +6 -4
package/exports/hosts/gemini/host-package.json +3 -1
package/hooks/context-mode-router +191 -0
package/hooks/definitions/context_mode_router.yaml +19 -0
package/hooks/hooks.json +31 -6
package/hooks/protected-path-write-guard +8 -0
package/hooks/routing-matrix.json +45 -0
package/hooks/session-start +62 -1
package/llms-full.txt +937 -134
package/package.json +2 -4
package/schemas/hook.schema.json +2 -1
package/schemas/phase-report.schema.json +89 -0
package/schemas/usage.schema.json +25 -1
package/schemas/wazir-manifest.schema.json +19 -0
package/skills/brainstorming/SKILL.md +32 -157
package/skills/clarifier/SKILL.md +289 -111
package/skills/claude-cli/SKILL.md +320 -0
package/skills/codex-cli/SKILL.md +260 -0
package/skills/debugging/SKILL.md +13 -0
package/skills/design/SKILL.md +13 -0
package/skills/dispatching-parallel-agents/SKILL.md +13 -0
package/skills/executing-plans/SKILL.md +13 -0
package/skills/executor/SKILL.md +139 -19
package/skills/finishing-a-development-branch/SKILL.md +13 -0
package/skills/gemini-cli/SKILL.md +260 -0
package/skills/humanize/SKILL.md +13 -0
package/skills/init-pipeline/SKILL.md +72 -164
package/skills/prepare-next/SKILL.md +81 -10
package/skills/receiving-code-review/SKILL.md +13 -0
package/skills/requesting-code-review/SKILL.md +13 -0
package/skills/reviewer/SKILL.md +369 -24
package/skills/run-audit/SKILL.md +13 -0
package/skills/scan-project/SKILL.md +13 -0
package/skills/self-audit/SKILL.md +217 -16
package/skills/skill-research/SKILL.md +188 -0
package/skills/subagent-driven-development/SKILL.md +13 -0
package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +2 -0
package/skills/subagent-driven-development/implementer-prompt.md +8 -0
package/skills/subagent-driven-development/spec-reviewer-prompt.md +7 -0
package/skills/tdd/SKILL.md +13 -0
package/skills/using-git-worktrees/SKILL.md +13 -0
package/skills/using-skills/SKILL.md +13 -0
package/skills/verification/SKILL.md +54 -3
package/skills/wazir/SKILL.md +464 -381
package/skills/writing-plans/SKILL.md +14 -1
package/skills/writing-skills/SKILL.md +13 -0
package/templates/artifacts/implementation-plan.md +3 -0
package/templates/artifacts/tasks-template.md +133 -0
package/templates/examples/phase-report.example.json +48 -0
package/tooling/src/adapters/composition-engine.js +256 -0
package/tooling/src/adapters/model-router.js +84 -0
package/tooling/src/capture/command.js +41 -2
package/tooling/src/capture/run-config.js +3 -1
package/tooling/src/capture/store.js +56 -0
package/tooling/src/capture/usage.js +106 -0
package/tooling/src/capture/user-input.js +66 -0
package/tooling/src/checks/ac-matrix.js +256 -0
package/tooling/src/checks/command-registry.js +12 -0
package/tooling/src/checks/docs-truth.js +1 -1
package/tooling/src/checks/security-sensitivity.js +69 -0
package/tooling/src/checks/skills.js +111 -0
package/tooling/src/cli.js +31 -20
package/tooling/src/commands/stats.js +161 -0
package/tooling/src/commands/validate.js +5 -1
package/tooling/src/export/compiler.js +33 -37
package/tooling/src/gating/agent.js +145 -0
package/tooling/src/guards/phase-prerequisite-guard.js +185 -0
package/tooling/src/hooks/routing-logic.js +69 -0
package/tooling/src/init/auto-detect.js +258 -0
package/tooling/src/init/command.js +38 -170
package/tooling/src/input/scanner.js +46 -0
package/tooling/src/reports/command.js +103 -0
package/tooling/src/reports/phase-report.js +323 -0
package/tooling/src/state/command.js +160 -0
package/tooling/src/state/db.js +287 -0
package/tooling/src/status/command.js +58 -1
package/tooling/src/verify/proof-collector.js +299 -0
package/wazir.manifest.yaml +26 -14
package/workflows/plan-review.md +3 -1
package/workflows/verify.md +30 -1

package/skills/wazir/SKILL.md CHANGED

@@ -9,6 +9,29 @@ The user typed `/wazir <their request>`. Run the entire pipeline end-to-end, han
 All questions use **numbered interactive options** — one question at a time, defaults marked "(Recommended)", wait for user response before proceeding.
+## User Input Capture
+After every user response (approval, correction, rejection, redirect, instruction), capture it:
+```
+captureUserInput(runDir, { phase: '<current-phase>', type: '<instruction|approval|correction|rejection|redirect>', content: '<user message>', context: '<what prompted the question>' })
+```
+This uses `tooling/src/capture/user-input.js`. The log at `user-input-log.ndjson` feeds the learning system — user corrections are the strongest signal for improvement. At run end, prune logs older than 10 runs via `pruneOldInputLogs(stateRoot, 10)`.
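As a rough illustration of what one entry in `user-input-log.ndjson` might look like, the sketch below serializes a single user response as one NDJSON line. It is not the actual `tooling/src/capture/user-input.js`: the function name `formatInputRecord` and the `timestamp` field are assumptions, and only `phase`, `type`, `content`, and `context` come from the call shown above.

```javascript
// Hypothetical sketch, not the real tooling/src/capture/user-input.js.
// The five response types listed in the added skill text.
const INPUT_TYPES = ['instruction', 'approval', 'correction', 'rejection', 'redirect'];

// Serialize one user response as a single NDJSON line (one JSON object per line).
function formatInputRecord({ phase, type, content, context }, now = new Date()) {
  if (!INPUT_TYPES.includes(type)) {
    throw new Error(`Unknown input type: ${type}`);
  }
  // `timestamp` is an assumed field; the source names only the other four.
  const record = { timestamp: now.toISOString(), phase, type, content, context };
  return JSON.stringify(record) + '\n';
}

const line = formatInputRecord(
  {
    phase: 'clarifier',
    type: 'correction',
    content: 'Use Postgres, not SQLite',
    context: 'Database choice',
  },
  new Date('2026-03-19T10:00:00Z')
);
```

Appending lines like this keeps the log easy to stream and prune, since each line parses on its own.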
+## Command Routing
+Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
+- Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
+- Small commands (git status, ls, pwd, wazir CLI) → native Bash
+- If context-mode unavailable, fall back to native Bash with warning
+## Codebase Exploration
+1. Query `wazir index search-symbols <query>` first
+2. Use `wazir recall file <path> --tier L1` for targeted reads
+3. Fall back to direct file reads ONLY for files identified by index queries
+4. Maximum 10 direct file reads without a justifying index query
+5. If no index exists: `wazir index build && wazir index summarize --tier all`
 ## Subcommand Detection
 Before anything else, check if the request starts with a known subcommand:
@@ -18,11 +41,24 @@ Before anything else, check if the request starts with a known subcommand:
 | `/wazir audit ...` | Jump to **Audit Mode** (see below) |
 | `/wazir prd [run-id]` | Jump to **PRD Mode** (see below) |
 | `/wazir init` | Invoke the `init-pipeline` skill directly, then stop |
-| Anything else | Continue to Step 1 (normal pipeline) |
+| Anything else | Continue to Phase 1 (Init) |
+---
+# 4-Phase Pipeline
+The pipeline has 4 phases. Each phase groups related workflows. Individual workflows within a phase can be enabled/disabled via `workflow_policy` in run-config.
+| Phase | Contains | Owner Skill | Key Output |
+|-------|----------|-------------|------------|
+| **Init** | Setup, prereqs, run directory, input scan | `wz:wazir` (inline) | `run-config.yaml` |
+| **Clarifier** | Research, clarify, specify, brainstorm, plan | `wz:clarifier` | Approved spec + design + plan |
+| **Executor** | Implement, verify | `wz:executor` | Code + verification proof |
+| **Final Review** | Review vs original input, learn, prepare next | `wz:reviewer` | Verdict + learnings + handoff |
 ---
-# Normal Pipeline Mode
+# Phase 1: Init
 ## Step 1: Capture the Request
@@ -37,16 +73,28 @@ If the user provided no text after `/wazir`, ask:
 Save their answer as the briefing, then continue.
+### Scan Input Directory
+Scan both `input/` (project-level) and `.wazir/input/` (state-level) for existing briefing materials. If files exist beyond `briefing.md`, list them:
+> **Found input files:**
+> - `input/2026-03-19-deferred-items.md`
+> - `.wazir/input/briefing.md`
+>
+> Using all found input as context for clarification.
 ### Inline Modifiers
-Parse the request for inline modifiers before the main text. These skip the corresponding interview question:
+Parse the request for inline modifiers before the main text:
 - `/wazir quick fix the login redirect` → depth = quick, intent = bugfix
 - `/wazir deep design a new onboarding flow` → depth = deep, intent = feature
-- `/wazir feature add CSV export` → intent = feature, depth = standard (default)
 Recognized modifiers:
 - **Depth:** `quick`, `deep` (standard is default when omitted)
+- **Interaction mode:** `auto`, `interactive` (guided is default when omitted)
+  - `/wazir auto fix the auth bug` → interaction_mode = auto
+  - `/wazir interactive design the onboarding` → interaction_mode = interactive
 - **Intent:** `bugfix`, `feature`, `refactor`, `docs`, `spike`
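The modifier rules above could be sketched roughly as follows. This is an illustrative parser, not the skill's actual logic: `parseModifiers` is a hypothetical name, and it assumes modifiers appear only as leading words and may come in any order.

```javascript
// Hypothetical sketch of inline-modifier parsing; names are illustrative.
const DEPTHS = ['quick', 'deep'];
const MODES = ['auto', 'interactive'];
const INTENTS = ['bugfix', 'feature', 'refactor', 'docs', 'spike'];

function parseModifiers(request) {
  const words = request.trim().split(/\s+/);
  // Defaults from the text: standard depth, guided interaction mode.
  const result = { depth: 'standard', interaction_mode: 'guided', intent: null };
  // Consume leading words for as long as they are recognized modifiers.
  while (words.length > 0) {
    const word = words[0].toLowerCase();
    if (DEPTHS.includes(word)) result.depth = word;
    else if (MODES.includes(word)) result.interaction_mode = word;
    else if (INTENTS.includes(word)) result.intent = word;
    else break;
    words.shift();
  }
  // Whatever remains is the main request text.
  result.text = words.join(' ');
  return result;
}
```

Here `intent` stays `null` when no intent modifier is given, leaving it to be inferred later. One caveat of this approach: a request that legitimately begins with a modifier word (for example "docs are out of date") would have that word consumed as a modifier.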
 ## Step 2: Check Prerequisites
@@ -58,24 +106,18 @@ Run `which wazir` to check if the CLI is installed.
 **If not installed**, present:
 > **The Wazir CLI is not installed. It's required for event capture, validation, and indexing.**
->
-> **How would you like to install it?**
->
-> 1. **npm** (Recommended) — `npm install -g @wazir-dev/cli`
-> 2. **Local link** — `npm link` from the Wazir project root
-If the user picks 1, run `npm install -g @wazir-dev/cli` and verify with `wazir --version`.
-If the user picks 2, run `npm link` from the project root and verify.
-The CLI is **required** — the pipeline uses `wazir capture`, `wazir validate`, `wazir index`, and `wazir doctor` throughout execution. There is no skip option.
+Ask the user via AskUserQuestion:
+- **Question:** "The Wazir CLI is not installed. How would you like to install it?"
+- **Options:**
+  1. "npm install -g @wazir-dev/cli" *(Recommended)*
+  2. "npm link from the Wazir project root"
-**If installed**, run `wazir doctor --json` to verify repo health.
+Wait for the user's selection before continuing.
-If doctor reports unhealthy:
-> **Repo health check failed:** [details from doctor output]
-> Fix issues before running the pipeline.
+The CLI is **required** — the pipeline uses `wazir capture`, `wazir validate`, `wazir index`, and `wazir doctor` throughout execution.
-Stop. Do NOT continue the pipeline until the health check passes.
+**If installed**, run `wazir doctor --json` to verify repo health. Stop if unhealthy.
 ### Branch Check
@@ -83,13 +125,14 @@ Run `wazir validate branches` to check the current git branch.
 - If on `main` or `develop`:
   > You're on **[branch]**. The pipeline requires a feature branch.
-  >
-  > 1. **Create feat/<slug>** (Recommended) — branch from current
-  > 2. **Continue on [branch]** — not recommended for feature/refactor work
-  Wait for the user to answer before continuing.
+  Ask the user via AskUserQuestion:
+  - **Question:** "You're on a protected branch. Create a feature branch?"
+  - **Options:**
+    1. "Create feat/<slug> from current branch" *(Recommended)*
+    2. "Continue on current branch — not recommended"
-- If branch name is invalid (not `feat/`, `fix/`, `chore/`, etc.): warn but continue.
+  Wait for the user's selection before continuing.
 ### Index Check
@@ -107,87 +150,83 @@ fi
 Check if `.wazir/state/config.json` exists.
-- **If missing** — invoke the `init-pipeline` skill. This will ask the user interactive questions to set up the config.
-- **If exists** — continue to Step 2.5.
+- **If missing** — invoke the `init-pipeline` skill.
+- **If exists** — continue.
-## Step 2.5: Create Run Directory
+## Step 3: Create Run Directory
 Generate a run ID using the current timestamp: `run-YYYYMMDD-HHMMSS`
 ```bash
-mkdir -p .wazir/runs/run-YYYYMMDD-HHMMSS/{sources,tasks,artifacts,reviews}
+mkdir -p .wazir/runs/run-YYYYMMDD-HHMMSS/{sources,tasks,artifacts,reviews,clarified}
 ln -sfn run-YYYYMMDD-HHMMSS .wazir/runs/latest
 ```
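A minimal sketch of generating the `run-YYYYMMDD-HHMMSS` identifier follows. The helper name `makeRunId` is hypothetical, and using UTC is an assumption, since the source does not say which timezone the timestamp uses.

```javascript
// Hypothetical helper; the source does not specify a timezone, UTC is assumed here.
function makeRunId(date = new Date()) {
  const pad = (n) => String(n).padStart(2, '0');
  const day = `${date.getUTCFullYear()}${pad(date.getUTCMonth() + 1)}${pad(date.getUTCDate())}`;
  const time = `${pad(date.getUTCHours())}${pad(date.getUTCMinutes())}${pad(date.getUTCSeconds())}`;
  return `run-${day}-${time}`;
}

const exampleRunId = makeRunId(new Date('2026-03-17T14:30:00Z'));
```

Because the fields are zero-padded and ordered from year down to second, IDs in this format sort chronologically as plain strings.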
-If a previous completed run exists (check for a `completed_at` field in the previous `latest` run's `run-config.yaml`), record its `run_id` as `parent_run_id` in the new run's config.
-After creating the run directory, initialize event capture:
+Initialize event capture:
 ```bash
-wazir capture init --run <run-id> --phase clarify --status starting
+wazir capture init --run <run-id> --phase init --status starting
 ```
-## Step 3: Pre-Flight Configuration
+### Resume Detection
+Check if a previous incomplete run exists (via `latest` symlink pointing to a run without `completed_at`).
-Build the run configuration. Skip questions that were answered via inline modifiers.
+**If previous incomplete run found**, present:
-### Question 1: Depth (if not set via modifier)
+> **A previous incomplete run was detected:** `<previous-run-id>`
-> **How thorough should this run be?**
->
-> 1. **Quick** — Minimal research, single-pass review, fast execution. Good for small fixes and config changes.
-> 2. **Standard** (Recommended) — Balanced research, multi-pass hardening, full review. Good for most features.
-> 3. **Deep** — Extended research, thorough hardening, strict review thresholds. Good for complex or security-critical work.
+Ask the user via AskUserQuestion:
+- **Question:** "A previous incomplete run was detected. Resume or start fresh?"
+- **Options:**
+  1. "Resume from the last completed phase" *(Recommended)*
+  2. "Start fresh with a new empty run"
-### Question 2: Intent (if not set via modifier and not obvious from the request)
+Wait for the user's selection before continuing.
-Only ask this if the request is ambiguous. If the intent is clear from the text (e.g., "fix the bug" → bugfix), infer it and skip.
+**If Resume:**
+- Copy `clarified/` from previous run into new run, EXCEPT `user-feedback.md`.
+- Detect last completed phase by checking which artifacts exist.
+- **Staleness check:** If input files are newer than copied artifacts, warn and offer to re-run clarification.
-> **What kind of work is this?**
->
-> 1. **Feature** (Recommended) — New functionality or enhancement
-> 2. **Bugfix** — Fix broken behavior
-> 3. **Refactor** — Restructure without changing behavior
-> 4. **Docs** — Documentation only
-> 5. **Spike** — Research and exploration, no production code
+## Step 4: Build Run Config
-### Question 3: Agent Teams (conditional)
+**No questions asked.** Depth, intent, and mode are all inferred or defaulted.
-Only ask this if ALL of these are true:
-- The host is Claude Code (not Codex/Gemini/Cursor)
-- Depth is `standard` or `deep`
-- Intent is `feature` or `refactor` (not bugfix/docs/spike)
+### Intent Inference
-> **Would you like to use Agent Teams for parallel execution?**
->
-> 1. **No** (Recommended) — Tasks run sequentially. Predictable, lower cost.
-> 2. **Yes** — Spawns parallel teammates for independent tasks. Potentially faster and richer output.
->
-> *Agent Teams is experimental from Claude's side. Requires Opus model. Higher token consumption.*
+Infer intent from the request text using keyword matching:
+| Keywords in request | Inferred Intent |
+|-------------------|-----------------|
+| fix, bug, broken, crash, error, issue, wrong | `bugfix` |
+| refactor, clean, restructure, reorganize, rename, simplify | `refactor` |
+| doc, document, readme, guide, explain | `docs` |
+| research, spike, explore, investigate, prototype | `spike` |
+| (anything else) | `feature` |
+Depth defaults to `standard`. Override only via inline modifiers (`/wazir quick ...`, `/wazir deep ...`).
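The keyword table above might be implemented along these lines. This is only a sketch: `inferIntent` is a hypothetical name, whole-word matching is an assumption, and so is using the table's row order as precedence when several rows match.

```javascript
// Hypothetical sketch of keyword-based intent inference.
// Rows are checked in table order; the first row with a matching word wins (assumed).
const INTENT_KEYWORDS = [
  ['bugfix', ['fix', 'bug', 'broken', 'crash', 'error', 'issue', 'wrong']],
  ['refactor', ['refactor', 'clean', 'restructure', 'reorganize', 'rename', 'simplify']],
  ['docs', ['doc', 'document', 'readme', 'guide', 'explain']],
  ['spike', ['research', 'spike', 'explore', 'investigate', 'prototype']],
];

function inferIntent(request) {
  // Split into lowercase alphabetic words so "README" and "fix," both match.
  const words = request.toLowerCase().match(/[a-z]+/g) || [];
  for (const [intent, keywords] of INTENT_KEYWORDS) {
    if (words.some((word) => keywords.includes(word))) return intent;
  }
  // Anything without a matching keyword falls through to feature.
  return 'feature';
}
```

Whole-word matching avoids false positives such as "docker" matching `doc`, at the cost of missing inflected forms like "fixing".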
 ### Write Run Config
-Save all decisions to `.wazir/runs/<run-id>/run-config.yaml`:
+Save to `.wazir/runs/<run-id>/run-config.yaml`:
 ```yaml
-# Identity
-run_id: run-20260317-143000
-parent_run_id: null                    # set if resuming/continuing from a prior run
-continuation_reason: null              # e.g. "review found minor fixes"
+run_id: run-YYYYMMDD-HHMMSS
+parent_run_id: null
+continuation_reason: null
-# User request
 request: "the original user request"
-request_summary: "short summary of intent"
-parsed_intent: feature                 # feature | bugfix | refactor | docs | spike
-entry_point: "/wazir"                  # how the user entered the pipeline
+request_summary: "short summary"
+parsed_intent: feature
+entry_point: "/wazir"
-# Configuration
-depth: standard                        # quick | standard | deep
-team_mode: sequential                  # sequential | parallel
-parallel_backend: none                 # none | claude_teams (future: subagents, worktrees)
+depth: standard
+interaction_mode: guided  # auto | guided | interactive
-# Phase policy (system-decided, not user-facing)
-phase_policy:
+# Workflow policy — individual workflows within each phase
+workflow_policy:
+  # Clarifier phase workflows
   discover:       { enabled: true, loop_cap: 10 }
   clarify:        { enabled: true, loop_cap: 10 }
   specify:        { enabled: true, loop_cap: 10 }
@@ -197,333 +236,419 @@ phase_policy:
   design-review:  { enabled: true, loop_cap: 10 }
   plan:           { enabled: true, loop_cap: 10 }
   plan-review:    { enabled: true, loop_cap: 10 }
+  # Executor phase workflows
   execute:        { enabled: true, loop_cap: 10 }
   verify:         { enabled: true, loop_cap: 5 }
+  # Final Review phase workflows
   review:         { enabled: true, loop_cap: 10 }
-  learn:          { enabled: false, loop_cap: 5 }
-  prepare_next:   { enabled: false, loop_cap: 5 }
+  learn:          { enabled: true, loop_cap: 5 }
+  prepare_next:   { enabled: true, loop_cap: 5 }
   run_audit:      { enabled: false, loop_cap: 10 }
-# Research
-research_topics: []                    # populated by researcher phase
+research_topics: []
-# Timestamps
-created_at: 2026-03-17T14:30:00Z
+created_at: "YYYY-MM-DDTHH:MM:SSZ"
 completed_at: null
 ```
-Mutable execution state (current phase, task progress, error counts) lives in `.wazir/runs/<run-id>/status.json`, NOT in `run-config.yaml`. The run config captures setup decisions only.
-### Phase Policy
-Map intent + depth to applicable phases. The system decides — the user does NOT pick phases.
+### Workflow Skip Rules
-**Phase classes:**
+Map intent + depth to applicable workflows. The system decides — the user does NOT pick.
-| Class | Phases | Rules |
-|-------|--------|-------|
-| **Core** (always run) | `clarify`, `verify`, `review` | Never skipped |
+| Class | Workflows | Rules |
+|-------|-----------|-------|
+| **Core** (always run) | `clarify`, `execute`, `verify`, `review` | Never skipped |
 | **Adaptive** (run when evidence says so) | `discover`, `design`, `author`, `specify` | Skipped for bugfix/docs/spike at quick depth |
 | **Scale** (intensity varies) | `spec-challenge`, `plan-review`, `design-review` | Loop cap controls iteration depth |
+| **Post-run** (always run) | `learn`, `prepare_next` | Part of Final Review phase |
-Log skip decisions to the run's `run-config.yaml` with reasons:
-```yaml
-phase_policy:
-  discover:       { enabled: true, loop_cap: 10 }
-  design:         { enabled: false, loop_cap: 10, reason: "bugfix intent — no design needed" }
-  spec-challenge: { enabled: true, loop_cap: 10 }
-```
+Log skip decisions with reasons in `workflow_policy`.
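To make the Adaptive rule concrete, here is a sketch that disables adaptive workflows for bugfix, docs, and spike intents at quick depth and records a reason. The function name `applySkipRules` and the wording of the `reason` string are assumptions; the Core, Scale, and Post-run classes are left untouched because the table gives no skip condition for them.

```javascript
// Hypothetical sketch of the Adaptive skip rule from the table above.
const ADAPTIVE = ['discover', 'design', 'author', 'specify'];
const LIGHT_INTENTS = ['bugfix', 'docs', 'spike'];

// Returns a new policy object; the input policy is not mutated.
function applySkipRules(policy, intent, depth) {
  const result = {};
  for (const [name, entry] of Object.entries(policy)) {
    const skip =
      ADAPTIVE.includes(name) && LIGHT_INTENTS.includes(intent) && depth === 'quick';
    result[name] = skip
      ? { ...entry, enabled: false, reason: `${intent} intent at quick depth` }
      : { ...entry };
  }
  return result;
}

const basePolicy = {
  discover: { enabled: true, loop_cap: 10 },
  clarify: { enabled: true, loop_cap: 10 },
  design: { enabled: true, loop_cap: 10 },
};
const quickBugfix = applySkipRules(basePolicy, 'bugfix', 'quick');
```

In this example `quickBugfix.discover` and `quickBugfix.design` end up disabled with a reason attached, while `clarify` stays enabled.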
 ### Confidence Gate
-After building the run config, evaluate confidence:
-- **High confidence** (clear intent, depth set, no ambiguity) — show a one-line summary and proceed:
-  > **Running: standard depth, feature, sequential. 11 of 15 phases. Proceeding...**
-- **Low confidence** (ambiguous intent, unclear scope) — show the full plan and ask:
-  > **Here's the run plan:**
-  > - Depth: standard
-  > - Intent: feature
-  > - Phases: [list enabled phases]
-  > - Skipped: [list skipped with reasons]
-  >
-  > **Does this look right?**
-  > 1. **Yes, proceed** (Recommended)
-  > 2. **No, let me adjust**
-## Step 4: Run Pipeline Phases
-The full pipeline runs these phases in order. Each phase produces an artifact that must pass its review loop before flowing to the next phase. Review mode is always passed explicitly (`--mode`) -- no auto-detection.
-### 4a: Source Capture
-Before invoking the clarifier, capture all referenced sources locally:
-- Fetch all URLs referenced in `.wazir/input/` briefing files
-- Save fetched content to `.wazir/runs/<run-id>/sources/`
-- Name files as `src-NNN-<slug>.md` (fetched content) or `src-NNN-fetch-failed.json` (failures)
-- Create `.wazir/runs/<run-id>/sources/manifest.json` indexing all captures:
-```json
-[
-  {
-    "id": "src-001",
-    "origin_url": "https://...",
-    "fetch_time": "2026-03-17T14:30:00Z",
-    "content_hash": "sha256:abc...",
-    "status": "captured",
-    "local_path": "src-001-github-readme.md"
-  },
-  {
-    "id": "src-002",
-    "origin_url": "https://...",
-    "status": "failed",
-    "error": "403 Forbidden",
-    "fetch_time": "2026-03-17T14:30:01Z"
-  }
-]
-```
+After building run config:
-Research briefs produced by the researcher must reference local paths (`sources/src-001-...`) instead of live URLs. The original URL is preserved in the manifest for provenance. Failures are recorded explicitly — never silently skipped.
+- **High confidence** — one-line summary and proceed:
+  > **Running: standard depth, feature, sequential. Proceeding...**
-### 4b: Clarify (clarifier role)
+- **Low confidence** — show plan and ask:
-```bash
-wazir capture event --run <run-id> --event phase_enter --phase clarify --status in_progress
-```
+  Ask the user via AskUserQuestion:
+  - **Question:** "Does this run configuration look right?"
+  - **Options:**
+    1. "Yes, proceed" *(Recommended)*
+    2. "No, let me adjust"
-Invoke the clarifier skill for Phase 1A.
-Produces clarification artifact.
-Review: clarification-review loop (`--mode clarification-review`, spec/clarification dimensions).
-Pass count: quick=3, standard=5, deep=7. No extension.
-Checkpoint: user approves clarification.
+  Wait for the user's selection before continuing.
 ```bash
-wazir capture event --run <run-id> --event phase_exit --phase clarify --status completed
+wazir capture event --run <run-id> --event phase_exit --phase init --status completed
 ```
-### 4c: Research (researcher role via discover workflow)
+Run the phase report and display it to the user:
 ```bash
-wazir capture event --run <run-id> --event phase_enter --phase discover --status in_progress
+wazir report phase --run <run-id> --phase init
 ```
-Clarifier delegates to discover workflow (researcher role).
-Produces research artifact.
-Review: research-review loop (`--mode research-review`, research dimensions).
-Pass count: quick=3, standard=5, deep=7. No extension.
-Skip condition: depth=quick AND intent=bugfix.
+Output the report content to the user in the conversation.
-```bash
-wazir capture event --run <run-id> --event phase_exit --phase discover --status completed
-```
+---
-### 4d: Specify (specifier role)
+# Interaction Modes
-```bash
-wazir capture event --run <run-id> --event phase_enter --phase specify --status in_progress
-```
+The `interaction_mode` field in run-config controls how the pipeline interacts with the user:
-Delegate to specify workflow.
-Specifier produces measurable spec from clarification + research.
-Review: spec-challenge loop (`--mode spec-challenge`, spec/clarification dimensions).
-Pass count: quick=3, standard=5, deep=7. No extension.
-Checkpoint: user approves spec.
+| Mode | Inline modifier | Behavior | Best for |
+|------|----------------|----------|----------|
+| **`guided`** | (default) | Pipeline runs, pauses at phase checkpoints for user approval. Current default behavior. | Most work |
+| **`auto`** | `/wazir auto ...` | No human checkpoints. Codex reviews all. Gating agent decides continue/loop_back/escalate. Stops ONLY on escalate. | Overnight, clear spec, well-understood domain |
+| **`interactive`** | `/wazir interactive ...` | More questions, more discussion, co-designs with user. Researcher presents options. Executor checks approach before coding. | Ambiguous requirements, new domain, learning |
-```bash
-wazir capture event --run <run-id> --event phase_exit --phase specify --status completed
-```
+## `auto` mode constraints
-### 4d.5: Author (content-author role) [ADAPTIVE]
+- **Codex REQUIRED** — refuse to start auto mode if `multi_tool.codex` is not configured in `.wazir/state/config.json`. Error: "Auto mode requires an external reviewer (Codex). Configure it first or use guided mode."
+- **On escalate:** STOP immediately, write the escalation reason to `.wazir/runs/<id>/escalations/`, and wait for user input
+- **Wall-clock limit:** default 4 hours. If exceeded, stop with escalation.
+- **Never auto-commits to main** — always work on feature branch
+- All checkpoints (AskUserQuestion) are skipped — gating agent evaluates phase reports and decides
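Two of the auto-mode constraints above lend themselves to a small preflight sketch: the Codex requirement and the wall-clock limit. The function names are hypothetical, and the sketch assumes that any truthy `multi_tool.codex` value in the parsed config counts as "configured", since the source does not describe that entry's shape.

```javascript
// Hypothetical preflight checks for auto mode; names and config shape are assumptions.
// Default wall-clock limit from the text above: 4 hours, in milliseconds.
const AUTO_WALL_CLOCK_LIMIT_MS = 4 * 60 * 60 * 1000;

// `config` is the parsed content of .wazir/state/config.json.
function checkAutoModeAllowed(config) {
  const codex = config && config.multi_tool && config.multi_tool.codex;
  if (!codex) {
    return {
      allowed: false,
      error:
        'Auto mode requires an external reviewer (Codex). Configure it first or use guided mode.',
    };
  }
  return { allowed: true, error: null };
}

// True once the run has lasted longer than the limit and should stop with an escalation.
function wallClockExceeded(startedAtMs, nowMs, limitMs = AUTO_WALL_CLOCK_LIMIT_MS) {
  return nowMs - startedAtMs > limitMs;
}
```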
-```bash
-wazir capture event --run <run-id> --event phase_enter --phase author --status in_progress
-```
+## `guided` mode (default)
-Enabled when `phase_policy.author.enabled = true` (default: false).
-Content-author writes non-code content artifacts.
-Approval gate: human approval required (not a review loop).
-Skip condition: disabled by default. Enable for content-heavy projects.
+Current behavior — no changes needed. Checkpoints at phase boundaries, user approves before advancing.
-```bash
-wazir capture event --run <run-id> --event phase_exit --phase author --status completed
+## `interactive` mode
+- **Clarifier:** asks more detailed questions, presents research findings with options: "I found 3 approaches — which interests you?"
+- **Executor:** checks approach before coding: "I'm about to implement auth with Supabase — sound right?"
+- **Reviewer:** discusses findings with user, not just presents verdict: "I found a potential auth bypass — here's why I think it's high severity, do you agree?"
+- Slower but highest quality for complex/ambiguous work
+## Mode checking in phase skills
+All phase skills check `interaction_mode` from run-config at every checkpoint:
+```
+# Read from run-config
+interaction_mode = run_config.interaction_mode ?? 'guided'
+# At each checkpoint:
+if interaction_mode == 'auto':
+    # Skip checkpoint, let gating agent decide
+elif interaction_mode == 'interactive':
+    # More detailed question, present options, discuss
+else:
+    # guided — standard checkpoint with AskUserQuestion
 ```
-### 4e: Brainstorm (designer role)
+---
+# Two-Level Phase Model
+The pipeline has 4 top-level **phases**, each containing multiple **workflows** with review loops:
+```
+Phase 1: Init
+  └── (inline — no sub-workflows)
+Phase 2: Clarifier
+  ├── discover (research) ← research-review loop
+  ├── clarify ← clarification-review loop
+  ├── specify ← spec-challenge loop
+  ├── author (adaptive) ← approval gate
+  ├── design ← design-review loop
+  └── plan ← plan-review loop
+Phase 3: Executor
+  ├── execute (per-task) ← task-review loop per task
+  └── verify
+Phase 4: Final Review
+  ├── review (final) ← scored review
+  ├── learn
+  └── prepare_next
+```
+**Event capture uses both levels.** When emitting phase events, include `--parent-phase`:
 ```bash
-wazir capture event --run <run-id> --event phase_enter --phase design --status in_progress
+wazir capture event --run <id> --event phase_enter --phase discover --parent-phase clarifier --status in_progress
 ```
-Invoke brainstorming skill for Phase 1B.
-Interactive -- pauses for user approval of design concept.
-After user approval: design-review loop (`--mode design-review`,
-canonical design-review dimensions: spec coverage, design-spec consistency,
-accessibility, visual consistency, exported-code fidelity).
-Pass count: quick=3, standard=5, deep=7. No extension.
-Skip condition: intent=bugfix/docs.
+**Progress markers between workflows:** After each workflow completes, output:
+> Phase 2: Clarifier > Workflow: specify (3 of 6 workflows complete)
+**`wazir status` shows both levels:** "Phase 2: Clarifier > Workflow: specify"
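The two-level model and the progress marker could be represented with a small lookup table, as in the sketch below. The `PHASES` structure and `progressMarker` function are illustrative, and the count assumes workflows finish in the order listed in the tree above with none skipped.

```javascript
// Hypothetical sketch: phases and their workflows, in the order shown in the tree above.
const PHASES = [
  { name: 'Init', workflows: [] },
  { name: 'Clarifier', workflows: ['discover', 'clarify', 'specify', 'author', 'design', 'plan'] },
  { name: 'Executor', workflows: ['execute', 'verify'] },
  { name: 'Final Review', workflows: ['review', 'learn', 'prepare_next'] },
];

// Build the progress line printed after `workflow` completes, or null if it is unknown.
function progressMarker(workflow) {
  for (let i = 0; i < PHASES.length; i++) {
    const index = PHASES[i].workflows.indexOf(workflow);
    if (index !== -1) {
      const total = PHASES[i].workflows.length;
      return `Phase ${i + 1}: ${PHASES[i].name} > Workflow: ${workflow} (${index + 1} of ${total} workflows complete)`;
    }
  }
  return null;
}
```

A real implementation would presumably count only enabled workflows from `workflow_policy`, which this sketch does not model.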
+---
+# Phase 2: Clarifier
+**Before starting this phase, output to the user:**
+> **Clarifier Phase** — About to research your codebase, clarify requirements, harden the spec, brainstorm designs, and produce an execution plan.
+>
+> **Why this matters:** Without this, I'd guess your tech stack, misunderstand constraints, miss edge cases in the spec, and build the wrong architecture. Every ambiguity left unresolved here becomes a bug or rework cycle later.
+>
+> **Looking for:** Unstated assumptions, scope boundaries, conflicting requirements, missing acceptance criteria
 ```bash
-wazir capture event --run <run-id> --event phase_exit --phase design --status completed
+wazir capture event --run <run-id> --event phase_enter --phase clarifier --status in_progress
 ```
-### 4f: Plan (planner role via wz:writing-plans)
+Invoke the `wz:clarifier` skill. It handles all sub-workflows internally:
+1. **Source Capture** — fetch URLs from input
+2. **Research** (discover workflow) — codebase + external research
+3. **Clarify** (clarify workflow) — scope, constraints, assumptions
+4. **Spec Harden** (specify + spec-challenge workflows) — measurable spec
+5. **Brainstorm** (design + design-review workflows) — design approaches
+6. **Plan** (plan + plan-review workflows) — execution plan
+Each sub-workflow has its own review loop. User checkpoints between major steps.
+### Scope Invariant
+**Hard rule:** `items_in_plan >= items_in_input` unless the user explicitly approves scope reduction. The clarifier MUST NOT autonomously tier, defer, or drop items from the user's input. It can suggest prioritization, but the decision belongs to the user.
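One way to check the scope invariant is sketched below. It compares items by identity rather than by count alone, which is a stricter variant that implies `items_in_plan >= items_in_input` when input items are unique. The function name and the idea of representing items as string IDs are assumptions.

```javascript
// Hypothetical sketch of the scope invariant check; items are assumed to be unique string IDs.
function checkScopeInvariant(inputItems, planItems, userApprovedReduction = false) {
  const planned = new Set(planItems);
  // Every input item must appear in the plan unless the user approved a reduction.
  const missing = inputItems.filter((item) => !planned.has(item));
  return {
    ok: missing.length === 0 || userApprovedReduction,
    missing,
  };
}

const scopeCheck = checkScopeInvariant(['auth', 'export', 'audit-log'], ['auth', 'export']);
```

Returning the `missing` list lets the clarifier show the user exactly which items would be dropped before asking for approval.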
+Output: approved spec + design + execution plan in `.wazir/runs/latest/clarified/`.
+**After completing this phase, output to the user:**
+> **Clarifier Phase complete.**
+>
+> **Found:** [N] ambiguities resolved, [N] assumptions made explicit, [N] scope boundaries drawn, [N] acceptance criteria hardened
+>
+> **Without this phase:** Requirements would be interpreted differently across tasks, acceptance criteria would be vague and untestable, the design would be ad-hoc, and the plan would miss dependency ordering
+>
+> **Changed because of this work:** [List spec tightening changes, resolved questions, design decisions, scope adjustments]
 ```bash
-wazir capture event --run <run-id> --event phase_enter --phase plan --status in_progress
+wazir capture event --run <run-id> --event phase_exit --phase clarifier --status completed
 ```
-Delegate to `wz:writing-plans`.
-Planner produces execution plan and task specs.
-Review: plan-review loop (`--mode plan-review`, plan dimensions).
-Pass count: quick=3, standard=5, deep=7. No extension.
-Checkpoint: user approves plan.
+Run the phase report and display savings to the user:
 ```bash
-wazir capture event --run <run-id> --event phase_exit --phase plan --status completed
+wazir report phase --run <run-id> --phase clarifier
+wazir stats --run <run-id>
 ```
-### 4g: Execute (executor role)
+**Show savings in conversation output:**
+> **Context savings this phase:** Used wazir index for [N] queries and context-mode for [M] commands, saving ~[X] tokens ([Y]% reduction). Without these, this phase would have consumed [A] tokens instead of [B].
+Output the report content to the user in the conversation.
+---
+# Phase 3: Executor
+**Before starting this phase, output to the user:**
+> **Executor Phase** — About to implement [N] tasks in dependency order with TDD (test-first), per-task code review, and verification before each commit.
+>
+> **Why this matters:** Without this discipline, tests get skipped, edge cases get missed, integration points break silently, and review catches problems too late when they're expensive to fix.
+>
+> **Looking for:** Correct dependency ordering, test coverage for each task, clean per-task review passes, no implementation drift from the approved plan
+## Phase Gate (Hard Gate)
+Before entering the Executor phase, verify ALL clarifier artifacts exist:
+- [ ] `.wazir/runs/latest/clarified/clarification.md`
+- [ ] `.wazir/runs/latest/clarified/spec-hardened.md`
+- [ ] `.wazir/runs/latest/clarified/design.md`
+- [ ] `.wazir/runs/latest/clarified/execution-plan.md`
+If ANY file is missing, **STOP**:
+> **Cannot enter Executor phase: missing prerequisite artifacts from Clarifier.**
+>
+> Missing: [list missing files]
+>
+> The Clarifier phase must complete before execution can begin. Run `/wazir:clarifier` first.
+**Do NOT skip this check. Do NOT rationalize that the input is "clear enough" to bypass clarification. Every pipeline run must produce these artifacts.**
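One way to make the gate above mechanical is to list whichever artifacts are absent. The sketch below assumes the four file names from the checklist; the function name `missing_clarifier_artifacts` and its optional directory argument are hypothetical and not part of the wazir CLI.

```bash
# Hypothetical helper, not a wazir command.
# Prints one line per missing Clarifier artifact; prints nothing when all exist.
missing_clarifier_artifacts() {
  local dir="${1:-.wazir/runs/latest/clarified}" name
  for name in clarification.md spec-hardened.md design.md execution-plan.md; do
    # Report the path only when the file is absent.
    [ -f "$dir/$name" ] || printf '%s\n' "$dir/$name"
  done
}
```

If the output is non-empty, the gate fails and the listed paths can be pasted into the "Missing:" line of the STOP message.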
 ```bash
-wazir capture event --run <run-id> --event phase_enter --phase execute --status in_progress
+wazir capture event --run <run-id> --event phase_enter --phase executor --status in_progress
 ```
-**Pre-execution gate** — run before the first task:
+**Pre-execution gate:**
 ```bash
 wazir validate manifest && wazir validate hooks
-# If either fails, stop and report the failure. Do NOT proceed to task execution.
+# Hard gate — stop if either fails.
 ```
-Invoke executor skill for Phase 2.
-Per-task review: task-review loop (`--mode task-review --task-id <NNN>`,
-5 task-execution dimensions) before each commit.
-Review logs: `execute-task-<NNN>-review-pass-<N>.md`
-Cap tracking: `wazir capture loop-check --task-id <NNN>`
-Codex error handling: non-zero exit -> codex-unavailable, self-review only.
-NOTE: per-task review is NOT the final review.
+Invoke the `wz:executor` skill. It handles:
-If `team_mode: parallel` in run-config, the executor spawns Agent Teams for independent tasks. Otherwise, tasks run sequentially.
+1. **Execute** (execute workflow) — per-task TDD cycle with review before each commit
+2. **Verify** (verify workflow) — deterministic verification of all claims
-```bash
-wazir capture event --run <run-id> --event phase_exit --phase execute --status completed
-```
+Per-task review: `--mode task-review`, 5 task-execution dimensions.
+Tasks always run sequentially.
-### 4h: Verify (verifier role)
+Output: code changes + verification proof in `.wazir/runs/latest/artifacts/`.
-```bash
-wazir capture event --run <run-id> --event phase_enter --phase verify --status in_progress
-```
+**After completing this phase, output to the user:**
-Deterministic verification of execution claims.
-Not a review loop -- produces proof, not findings.
+> **Executor Phase complete.**
+>
+> **Found:** [N]/[N] tasks implemented, [N] tests written, [N] per-task review passes completed, [N] findings fixed before commit
+>
+> **Without this phase:** Code would ship without tests, review findings would accumulate until final review (10x more expensive to fix), and verification claims would be unsubstantiated
+>
+> **Changed because of this work:** [List of commits with conventional commit messages, test counts, verification evidence collected]
 ```bash
-wazir capture event --run <run-id> --event phase_exit --phase verify --status completed
+wazir capture event --run <run-id> --event phase_exit --phase executor --status completed
 ```
-### 4i: Final Review (reviewer role in final mode)
+Run the phase report and display savings to the user:
 ```bash
-wazir capture event --run <run-id> --event phase_enter --phase review --status in_progress
+wazir report phase --run <run-id> --phase executor
+wazir stats --run <run-id>
 ```
-Invoke reviewer skill with `--mode final`.
-7-dimension scored review (correctness, completeness, wiring, verification,
-drift, quality, documentation). Score 0-70.
-This IS the scored final review gate.
+Output the report content to the user in the conversation.
+**Show savings in conversation output:**
+> **Context savings this phase:** Used wazir index for [N] queries and context-mode for [M] commands, saving ~[X] tokens ([Y]% reduction).
+---
+# Phase 4: Final Review
+**Before starting this phase, output to the user:**
+> **Final Review Phase** — About to run adversarial 7-dimension review comparing the implementation against your original input, extract durable learnings, and prepare the handoff.
+>
+> **Why this matters:** Without this, implementation drift ships undetected, missing acceptance criteria go unnoticed, untested code paths hide bugs, and the same mistakes repeat in the next run.
+>
+> **Looking for:** Spec violations, missing features, dead code paths, unsubstantiated claims, scope creep, security gaps, stale documentation
+## Phase Gate (Hard Gate)
+Before entering the Final Review phase, verify the Executor produced its proof:
+- [ ] `.wazir/runs/latest/artifacts/verification-proof.md`
+If missing, **STOP**:
+> **Cannot enter Final Review: missing verification proof from Executor.**
+>
+> The Executor phase must complete and produce `verification-proof.md` before final review. Run `/wazir:executor` first.
 ```bash
-wazir capture event --run <run-id> --event phase_exit --phase review --status completed
+wazir capture event --run <run-id> --event phase_enter --phase final_review --status in_progress
 ```
-### 4j: Learn (learner role) [ADAPTIVE]
+This phase validates the implementation against the **ORIGINAL INPUT** (not the task specs — the executor's per-task reviewer already covered that).
-Enabled when `phase_policy.learn.enabled = true` (default: false).
-Extract durable learnings from the completed run.
-No review loop. Learnings require explicit scope tags.
-Skip condition: disabled by default. Enable for retrospective runs.
+### 4a: Review (reviewer role in final mode)
-### 4k: Prepare Next (planner role) [ADAPTIVE]
+Invoke `wz:reviewer --mode final`.
+7-dimension scored review comparing implementation against the original user input.
+Score 0-70. Verdicts: PASS (56+), NEEDS MINOR FIXES (42-55), NEEDS REWORK (28-41), FAIL (0-27).
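The verdict bands above map directly onto a threshold lookup. The following is a sketch only: `verdict_for_score` is a hypothetical name, and the thresholds are copied from the line above rather than from the reviewer's implementation.

```bash
# Hypothetical helper, not a wazir command.
# Maps a 0-70 final review score to the verdict bands listed above.
verdict_for_score() {
  local score="$1"
  if [ "$score" -ge 56 ]; then
    echo "PASS"
  elif [ "$score" -ge 42 ]; then
    echo "NEEDS MINOR FIXES"
  elif [ "$score" -ge 28 ]; then
    echo "NEEDS REWORK"
  else
    echo "FAIL"
  fi
}
```

For example, `verdict_for_score 55` prints `NEEDS MINOR FIXES`, since 55 sits just below the PASS threshold of 56.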
-Enabled when `phase_policy.prepare_next.enabled = true` (default: false).
-Prepare context and handoff for the next run.
-No review loop. No implicit carry-forward of unapproved learnings.
-Skip condition: disabled by default.
+### 4b: Learn (learner role)
-`run_audit` is NOT part of the pipeline flow -- it is an on-demand standalone phase invoked separately.
+Extract durable learnings from the completed run:
+- Scan all review findings (internal + Codex)
+- Propose learnings to `memory/learnings/proposed/`
+- Findings that recur across 2+ runs → auto-proposed as learnings
+- Learnings require explicit scope tags (roles, stacks, concerns)
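The "recurs across 2+ runs" rule in the list above could be approximated with standard text tools. This sketch assumes a purely hypothetical input format, one `run-id<TAB>finding-id` pair per line with no spaces in the identifiers; the diff does not say how wazir actually stores or matches findings, and `recurring_findings` is not a wazir command.

```bash
# Hypothetical helper, not a wazir command.
# Reads "run-id<TAB>finding-id" lines on stdin and prints each finding id
# that appears in at least two distinct runs.
recurring_findings() {
  sort -u |                      # count each (run, finding) pair once
    cut -f2 |                    # keep only the finding id
    sort |
    uniq -c |                    # number of distinct runs per finding
    awk '$1 >= 2 { print $2 }'
}
```

Deduplicating the pairs first means a finding reported several times within one run still counts as a single run.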
-### Resume Detection
+### 4c: Prepare Next (planner role)
-If the run has partial progress, detect the latest completed phase and resume:
+Prepare context and handoff for the next run:
+- Write handoff document
+- Compress/archive unneeded files
+- Record what's left to do
-- If clarification exists but no spec: resume at 4d (specify)
-- If spec exists but no design: resume at 4e (brainstorm)
-- If design exists but no plan: resume at 4f (plan)
-- If plan exists but no task artifacts: resume at 4g (execute)
-- If task artifacts exist but no verification: resume at 4h (verify)
-- If verification exists: resume at 4i (final review)
+**After completing this phase, output to the user:**
-Present resume options:
-> **Previous progress detected (completed through [phase]).**
+> **Final Review Phase complete.**
+>
+> **Found:** [N] findings across 7 dimensions, [N] blocking issues, [N] warnings, [N] learnings proposed for future runs
 >
-> **What would you like to do?**
-> 1. **Resume from [next phase]** (Recommended)
-> 2. **Start fresh** — Re-run all phases from scratch
+> **Without this phase:** Implementation drift from the original request would ship undetected, untested paths would hide production bugs, and recurring mistakes would never get captured as learnings
+>
+> **Changed because of this work:** [List of findings fixed, score achieved, learnings extracted, handoff prepared]
+```bash
+wazir capture event --run <run-id> --event phase_exit --phase final_review --status completed
+```
-## Step 5: Present Results
+Run the phase report and display it to the user:
+```bash
+wazir report phase --run <run-id> --phase final_review
+```
-After the reviewer completes, present the verdict and offer next steps with numbered options:
+Output the report content to the user in the conversation.
+---
+## Step 5: CHANGELOG + Gitflow Validation (Hard Gates)
+Before presenting results:
+```bash
+wazir validate changelog --require-entries --base main
+wazir validate commits --base main
+```
+Both must pass before a PR is created. These are hard gates, not warnings.
+## Step 6: Present Results
+After the reviewer completes, present the verdict with numbered options:
 ### If PASS (score 56+):
 > **Result: PASS (score/70)**
->
-> [score breakdown]
->
-> **What would you like to do?**
-> 1. **Create a PR** (Recommended)
-> 2. **Merge directly**
-> 3. **Review the changes first**
+Ask the user via AskUserQuestion:
+- **Question:** "Pipeline passed. What would you like to do next?"
+- **Options:**
+  1. "Create a PR" *(Recommended)*
+  2. "Merge directly"
+  3. "Review the changes first"
+Wait for the user's selection before continuing.
 ### If NEEDS MINOR FIXES (score 42-55):
 > **Result: NEEDS MINOR FIXES (score/70)**
->
-> [findings list]
->
-> **What would you like to do?**
-> 1. **Auto-fix and re-review** (Recommended)
-> 2. **Fix manually**
-> 3. **Accept as-is**
+Ask the user via AskUserQuestion:
+- **Question:** "Minor issues found. How should we handle them?"
+- **Options:**
+  1. "Auto-fix and re-review" *(Recommended)*
+  2. "Fix manually"
+  3. "Accept as-is"
+Wait for the user's selection before continuing.
 ### If NEEDS REWORK (score 28-41):
 > **Result: NEEDS REWORK (score/70)**
->
-> [findings list with affected tasks]
->
-> **What would you like to do?**
-> 1. **Re-run affected tasks** (Recommended)
-> 2. **Review findings in detail**
-> 3. **Abandon this run**
+Ask the user via AskUserQuestion:
+- **Question:** "Significant issues found. How should we proceed?"
+- **Options:**
+  1. "Re-run affected tasks" *(Recommended)*
+  2. "Review findings in detail"
+  3. "Abandon this run"
+Wait for the user's selection before continuing.
 ### If FAIL (score 0-27):
 > **Result: FAIL (score/70)**
 >
-> [full findings]
->
-> Something fundamental went wrong. Review the findings above and decide how to proceed.
+> Something fundamental went wrong. Review the findings above.
 ### Run Summary
-After presenting results (regardless of verdict), capture the run summary:
 ```bash
 wazir capture summary --run <run-id>
 wazir status --run <run-id> --json
@@ -531,19 +656,18 @@ wazir status --run <run-id> --json
 ## Error Handling
-If any phase fails or the user cancels:
-1. Report which phase failed and why
-2. Present recovery options:
+If any phase fails:
 > **Phase [name] failed: [reason]**
->
-> **What would you like to do?**
-> 1. **Retry this phase** (Recommended)
-> 2. **Skip and continue** (only if phase is adaptive, not core)
-> 3. **Abort the run**
-The run config persists, so running `/wazir` again will detect the partial state and offer to resume.
+Ask the user via AskUserQuestion:
+- **Question:** "Phase [name] failed: [reason]. How should we proceed?"
+- **Options:**
+  1. "Retry this phase" *(Recommended)*
+  2. "Skip and continue" *(only if workflows within phase are adaptive)*
+  3. "Abort the run"
+Wait for the user's selection before continuing.
 ---
@@ -551,27 +675,22 @@ The run config persists, so running `/wazir` again will detect the partial state
 Triggered by `/wazir audit` or `/wazir audit <focus>`.
-Runs a structured codebase audit. Invokes the `run-audit` skill with the interactive question flow.
+Runs a structured codebase audit. Invokes the `run-audit` skill.
-## Inline Audit Modifiers
+Parse inline audit types: `/wazir audit security` → skip Question 1.
-Parse for known audit types after `audit`:
+After audit:
-- `/wazir audit security` → audit type = security, skip Question 1
-- `/wazir audit deps` → audit type = dependencies, skip Question 1
-- `/wazir audit` → ask Question 1
+Ask the user via AskUserQuestion:
+- **Question:** "Audit complete. What would you like to do with the findings?"
+- **Options:**
+  1. "Review the findings" *(Recommended)*
+  2. "Generate a fix plan"
+  3. "Run the pipeline on the fix plan"
-Then let the `run-audit` skill handle the rest (scope, output mode). All its questions already follow the interactive numbered pattern.
+Wait for the user's selection before continuing.
-After the audit completes:
-> **Audit complete. What would you like to do?**
->
-> 1. **Review the findings** (Recommended)
-> 2. **Generate a fix plan** — turn findings into implementation tasks
-> 3. **Run the pipeline on the fix plan** — generate plan, then execute and review fixes
-If the user picks option 3, save the findings as the briefing and run the normal pipeline (Steps 3-5) with intent = `bugfix`.
+If the user picks option 3, save the findings as the briefing and run the pipeline with intent = `bugfix`.
 ---
@@ -579,89 +698,53 @@ If the user picks option 3, save the findings as the briefing and run the normal
 Triggered by `/wazir prd` or `/wazir prd <run-id>`.
-Generates a Product Requirements Document from a completed pipeline run.
-## Pre-Flight
-1. If a `<run-id>` was provided, use that run's directory. Otherwise, use `.wazir/runs/latest`.
-2. Verify the run has completed artifacts:
-   - Design doc in the run's tasks or in `docs/plans/`
-   - Task specs in the run's `clarified/`
-   - Review results in the run's `reviews/` (if available)
-3. If the run is incomplete or has no artifacts:
-> **No completed run found. Run `/wazir <your request>` first to create a pipeline run, then use `/wazir prd` to generate the PRD.**
+Generates a PRD from a completed run. Reads approved design, task specs, execution plan, review results. Saves to `docs/prd/YYYY-MM-DD-<topic>-prd.md`.
-## Inputs (read-only)
+After generation:
-Read these artifacts from the completed run:
-- Approved design document
-- Task specs (all `spec.md` files in `clarified/`)
-- Execution plan
-- Review results and verification proofs (if available)
-- Run config (for context on depth, intent, decisions)
-## Output
-Generate a PRD and save to `docs/prd/YYYY-MM-DD-<topic>-prd.md`.
-### PRD Template
-```markdown
-# Product Requirements Document — <Topic>
+Ask the user via AskUserQuestion:
+- **Question:** "PRD generated. What would you like to do?"
+- **Options:**
+  1. "Review the PRD" *(Recommended)*
+  2. "Commit it"
+  3. "Edit before committing"
-**Generated from run:** `<run-id>`
-**Date:** YYYY-MM-DD
+Wait for the user's selection before continuing.
-## Vision & Core Thesis
-[1-2 paragraphs synthesized from the design document's core approach]
-## What We're Building
-### Feature Area 1: <name>
+---
-**What:** [description from task specs]
-**Why:** [rationale from design doc]
-**Requirements:**
-- [ ] [from task spec acceptance criteria]
-- [ ] ...
+## Reasoning Chain Output
-### Feature Area 2: <name>
-...
+Every phase produces reasoning output at two layers:
-## Success Criteria
+### Layer 1: Conversation Output (concise — for the user)
-[From review results and verification proofs — what was tested and confirmed]
+Before each major decision, output one trigger sentence and one reasoning sentence:
-## Technical Constraints
+> "Your request mentions 'overnight autonomous run' — researching how Devin and Karpathy's autoresearch handle this, because unattended runs need different safety constraints than interactive ones."
-[From architecture decisions, run config, and design trade-offs]
+After each phase, output what was found and a counterfactual:
-## What's NOT in Scope
+> "Found: you use Supabase auth (not custom JWT). If I'd skipped research, I would have built JWT middleware — completely wrong."
-[From design doc's rejected alternatives and explicit exclusions]
+### Layer 2: File Output (detailed — for learning and reports)
-## Open Questions
+Save full reasoning chain to `.wazir/runs/<id>/reasoning/phase-<name>-reasoning.md` with entries:
-[From design doc's open questions and review findings]
+```markdown
+### Decision: [title]
+- **Trigger:** What prompted this decision
+- **Options considered:** List of alternatives
+- **Chosen:** The selected option
+- **Reasoning:** Why this option was chosen
+- **Confidence:** high | medium | low
+- **Counterfactual:** What would have gone wrong without this information
 ```
-## After Generation
-> **PRD generated at `docs/prd/YYYY-MM-DD-<topic>-prd.md`.**
->
-> **What would you like to do?**
-> 1. **Review the PRD** (Recommended)
-> 2. **Commit it**
-> 3. **Edit before committing**
----
+Create the `reasoning/` directory during run init. Every phase skill (clarifier, executor, reviewer) writes its own reasoning file. Counterfactuals appear in BOTH conversation output AND reasoning files.
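A reasoning entry in the Layer 2 format could be rendered with a small helper. This is a sketch under stated assumptions: `reasoning_entry` and its positional arguments are hypothetical, and only the field names and the three confidence values come from the template above.

```bash
# Hypothetical helper, not a wazir command.
# Args: title, trigger, options, chosen, reasoning, confidence, counterfactual.
# Prints one entry in the Layer 2 format; rejects unknown confidence values.
reasoning_entry() {
  local title="$1" trigger="$2" options="$3" chosen="$4"
  local reasoning="$5" confidence="$6" counterfactual="$7"
  case "$confidence" in
    high|medium|low) ;;
    *) echo "confidence must be high, medium, or low" >&2; return 1 ;;
  esac
  printf '%s\n' \
    "### Decision: $title" \
    "- **Trigger:** $trigger" \
    "- **Options considered:** $options" \
    "- **Chosen:** $chosen" \
    "- **Reasoning:** $reasoning" \
    "- **Confidence:** $confidence" \
    "- **Counterfactual:** $counterfactual"
}
```

A phase skill would append the output to its own reasoning file under the run's `reasoning/` directory.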
 ## Interaction Rules
-These rules apply to ALL questions in the pipeline, including those asked by sub-skills (clarifier, executor, reviewer) and audit modes:
 - **One question at a time** — never combine multiple questions
 - **Numbered options** — always present choices as numbered lists
 - **Mark defaults** — always show "(Recommended)" on the suggested option