npm - @wazir-dev/cli - Versions diffs - 1.0.0 → 1.1.0 - Mend

@wazir-dev/cli 1.0.0 → 1.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (69) hide show

package/CHANGELOG.md +31 -2
package/docs/plans/2026-03-15-cli-pipeline-integration-plan.md +1 -1
package/docs/reference/review-loop-pattern.md +429 -0
package/docs/reference/tooling-cli.md +2 -0
package/docs/truth-claims.yaml +6 -0
package/exports/hosts/claude/.claude/agents/clarifier.md +3 -0
package/exports/hosts/claude/.claude/agents/designer.md +3 -0
package/exports/hosts/claude/.claude/agents/executor.md +2 -0
package/exports/hosts/claude/.claude/agents/planner.md +3 -0
package/exports/hosts/claude/.claude/agents/researcher.md +2 -0
package/exports/hosts/claude/.claude/agents/reviewer.md +5 -1
package/exports/hosts/claude/.claude/agents/specifier.md +3 -0
package/exports/hosts/claude/.claude/commands/clarify.md +4 -0
package/exports/hosts/claude/.claude/commands/design-review.md +4 -0
package/exports/hosts/claude/.claude/commands/design.md +4 -0
package/exports/hosts/claude/.claude/commands/discover.md +4 -0
package/exports/hosts/claude/.claude/commands/execute.md +4 -0
package/exports/hosts/claude/.claude/commands/plan-review.md +4 -0
package/exports/hosts/claude/.claude/commands/plan.md +4 -0
package/exports/hosts/claude/.claude/commands/spec-challenge.md +4 -0
package/exports/hosts/claude/.claude/commands/specify.md +4 -0
package/exports/hosts/claude/.claude/commands/verify.md +4 -0
package/exports/hosts/claude/export.manifest.json +19 -19
package/exports/hosts/codex/export.manifest.json +19 -19
package/exports/hosts/cursor/export.manifest.json +19 -19
package/exports/hosts/gemini/export.manifest.json +19 -19
package/hooks/definitions/loop_cap_guard.yaml +1 -1
package/hooks/hooks.json +18 -0
package/package.json +3 -2
package/roles/clarifier.md +3 -0
package/roles/designer.md +3 -0
package/roles/executor.md +2 -0
package/roles/planner.md +3 -0
package/roles/researcher.md +2 -0
package/roles/reviewer.md +5 -1
package/roles/specifier.md +3 -0
package/skills/brainstorming/SKILL.md +139 -38
package/skills/clarifier/SKILL.md +219 -0
package/skills/debugging/SKILL.md +11 -1
package/skills/executing-plans/SKILL.md +15 -2
package/skills/executor/SKILL.md +76 -0
package/skills/init-pipeline/SKILL.md +106 -17
package/skills/receiving-code-review/SKILL.md +8 -0
package/skills/requesting-code-review/SKILL.md +25 -5
package/skills/reviewer/SKILL.md +151 -0
package/skills/subagent-driven-development/SKILL.md +25 -2
package/skills/tdd/SKILL.md +8 -0
package/skills/wazir/SKILL.md +250 -43
package/skills/writing-plans/SKILL.md +31 -4
package/templates/examples/wazir-manifest.example.yaml +1 -1
package/tooling/src/capture/command.js +87 -1
package/tooling/src/capture/run-config.js +21 -0
package/tooling/src/checks/brand-truth.js +3 -6
package/tooling/src/checks/command-registry.js +1 -0
package/tooling/src/checks/docs-truth.js +1 -1
package/tooling/src/checks/runtime-surface.js +3 -7
package/tooling/src/cli.js +8 -3
package/tooling/src/init/command.js +201 -0
package/wazir.manifest.yaml +0 -3
package/workflows/clarify.md +4 -0
package/workflows/design-review.md +4 -0
package/workflows/design.md +4 -0
package/workflows/discover.md +4 -0
package/workflows/execute.md +4 -0
package/workflows/plan-review.md +4 -0
package/workflows/plan.md +4 -0
package/workflows/spec-challenge.md +4 -0
package/workflows/specify.md +4 -0
package/workflows/verify.md +4 -0

package/skills/init-pipeline/SKILL.md CHANGED Viewed

@@ -5,27 +5,35 @@ description: Initialize the Wazir pipeline with interactive setup. Creates proje
 # Initialize Pipeline
-Set up the Wazir pipeline for this project. All questions use numbered interactive options — one question at a time.
+Set up the Wazir pipeline for this project.
 ## Step 0: Check Wazir CLI
 Run `which wazir` to check if the CLI is installed.
+**If installed** — run `wazir init` and let it handle the interactive setup (arrow-key selection). If the pipeline was already initialized, use `wazir init --force` to reinitialize. Once it completes, skip to Step 9 (Confirm).
 **If not installed**, present:
 > **The Wazir CLI is not installed. It's required for event capture, validation, and indexing.**
 >
 > **How would you like to install it?**
 >
-> 1. **npm** (Recommended) — `npm install -g wazir`
+> 1. **npm** (Recommended) — `npm install -g @wazir-dev/cli`
 > 2. **Local link** — `npm link` from the Wazir project root
 > 3. **Skip** — Continue without the CLI (some features will be unavailable)
-If the user picks 1, run `npm install -g wazir` and verify with `wazir --version`.
+Wait for the user to answer before continuing.
+If the user picks 1, run `npm install -g @wazir-dev/cli` and verify with `wazir --version`.
 If the user picks 2, run `npm link` from the project root and verify.
-If the user picks 3, warn that `wazir capture`, `wazir validate`, and `wazir index` commands will not work, then continue.
+If the user picks 3, warn that `wazir capture`, `wazir validate`, and `wazir index` commands will not work, then continue to the manual steps below.
+After installing, run `wazir init` and let it handle the rest. Skip to Step 9.
+---
-**If installed**, run `wazir doctor --json` to verify repo health and continue.
+**The steps below are the manual fallback — only used when the CLI is not installed and the user chose to skip installation.**
 ## Step 1: Create Project Directories
@@ -39,9 +47,9 @@ Present this question:
 > **How should Wazir run in this project?**
 >
-> 1. **Claude only** (Recommended) — Everything runs in Claude Code. Single model, slash commands only.
-> 2. **Multi-model** — Still Claude Code, but routes tasks by complexity (Haiku for micro, Sonnet for standard, Opus for complex).
-> 3. **Multi-tool** — Claude Code + external tools for reviews.
+> 1. **Single model** (Recommended) — Everything runs in the current model. Single model, slash commands only.
+> 2. **Multi-model** — Still the current model, but routes tasks by complexity (Haiku for micro, Sonnet for standard, Opus for complex).
+> 3. **Multi-tool** — Current model + external tools for reviews.
 Wait for the user to answer before continuing.
@@ -51,37 +59,118 @@ Only ask this if the user selected option 3:
 > **Which external tools should Wazir use for reviews?**
 >
-> 1. **Codex** — Send reviews to OpenAI Codex
+> 1. **Codex** (Recommended) — Send reviews to OpenAI Codex
 > 2. **Gemini** — Send reviews to Google Gemini
 > 3. **Both** — Use Codex and Gemini as secondary reviewers
 Wait for the user to answer before continuing.
-## Step 4: Write Config
+## Step 3.5: Codex Model (conditional)
+Only ask this if Codex was selected in Step 3:
+> **Which Codex model should Wazir use?**
+>
+> 1. **gpt-5.3-codex-spark** (Recommended) — Fast, good for review loops and grounder work
+> 2. **gpt-5.4** — Slower, deeper analysis for complex reviews
+>
+> *You can change this later in `.wazir/state/config.json` under `multi_tool.codex.model`.*
+Wait for the user to answer before continuing.
+## Step 4: Default Depth
+> **What default depth should runs use?**
+>
+> 1. **Quick** — Minimal research, single-pass review, fast execution. Good for small fixes and config changes.
+> 2. **Standard** (Recommended) — Balanced research, multi-pass hardening, full review. Good for most features.
+> 3. **Deep** — Extended research, thorough hardening, strict review thresholds. Good for complex or security-critical work.
+>
+> *This sets the project default. Individual runs can override via inline modifiers (e.g. `/wazir quick ...`).*
+Wait for the user to answer before continuing.
+## Step 5: Default Intent
+> **What kind of work does this project mostly involve?**
+>
+> 1. **Feature** (Recommended) — New functionality or enhancement
+> 2. **Bugfix** — Fix broken behavior
+> 3. **Refactor** — Restructure without changing behavior
+> 4. **Docs** — Documentation only
+> 5. **Spike** — Research and exploration, no production code
+>
+> *This sets the project default. Individual runs can override via inline modifiers or when intent is obvious from the request.*
+Wait for the user to answer before continuing.
+## Step 6: Agent Teams (conditional)
+Only ask this if ALL of these are true:
+- The host is Claude Code (not Codex/Gemini/Cursor)
+- Default depth is `standard` or `deep`
+- Default intent is `feature` or `refactor` (not bugfix/docs/spike)
+> **Would you like to use Agent Teams for parallel execution?**
+>
+> 1. **No** (Recommended) — Tasks run sequentially. Predictable, lower cost.
+> 2. **Yes** — Spawns parallel teammates for independent tasks. Potentially faster and richer output.
+>
+> *Agent Teams is experimental from Claude's side. Requires Opus model. Higher token consumption.*
+If the conditions above are NOT met, silently default to `team_mode: sequential`.
+If the user selects **Yes**, enable the Agent Teams experimental feature:
+```bash
+claude config set env.CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS 1
+```
+Then inform the user they need to restart their Claude Code session for it to take effect.
+Wait for the user to answer before continuing.
+## Step 7: Write Config
 Create/update `.wazir/state/config.json`:
 - Set `model_mode` to the selected mode (`claude-only`, `multi-model`, or `multi-tool`)
 - If `multi-tool`, set `multi_tool.tools` to the selected tools (e.g. `["codex"]`, `["gemini"]`, or `["codex", "gemini"]`)
+- Set `default_depth` to the selected depth (`quick`, `standard`, or `deep`)
+- Set `default_intent` to the selected intent (`feature`, `bugfix`, `refactor`, `docs`, or `spike`)
+- Set `team_mode` to the selected mode (`sequential` or `parallel`)
+- If `team_mode` is `parallel`, set `parallel_backend` to `claude_teams`
+- If Codex selected, set `multi_tool.codex.model` to the chosen model
-Example for claude-only:
+Example for claude-only with defaults:
 ```json
 {
-  "model_mode": "claude-only"
+  "model_mode": "claude-only",
+  "default_depth": "standard",
+  "default_intent": "feature",
+  "team_mode": "sequential",
+  "parallel_backend": "none"
 }
 ```
-Example for multi-tool with codex:
+Example for multi-tool with teams enabled:
 ```json
 {
   "model_mode": "multi-tool",
   "multi_tool": {
-    "tools": ["codex"]
-  }
+    "tools": ["codex"],
+    "codex": {
+      "model": "gpt-5.3-codex-spark"
+    }
+  },
+  "default_depth": "deep",
+  "default_intent": "feature",
+  "team_mode": "parallel",
+  "parallel_backend": "claude_teams"
 }
 ```
-## Step 5: Runtime-Specific Setup
+## Step 8: Runtime-Specific Setup
 Based on `multi_tool.tools`:
@@ -104,7 +193,7 @@ Based on `multi_tool.tools`:
 - If **both**: Create both files.
-## Step 6: Confirm
+## Step 9: Confirm
 List all files created and show the selected mode. Then present:

package/skills/receiving-code-review/SKILL.md CHANGED Viewed

@@ -11,6 +11,14 @@ Code review requires technical evaluation, not emotional performance.
 **Core principle:** Verify before implementing. Ask before assuming. Technical correctness over social comfort.
+## Loop Tracking
+When receiving review findings, the fix-and-re-review cycle follows the review loop pattern:
+- **Pipeline mode** (`.wazir/runs/latest/` exists): track iterations via `wazir capture loop-check`. If the cap is reached (exit 43), escalate to the user with current state and evidence.
+- **Standalone mode** (no `.wazir/runs/latest/`): the loop runs for `pass_counts[depth]` passes (quick=3, standard=5, deep=7) with no cap guard. Track iteration count manually.
+Reference `docs/reference/review-loop-pattern.md` for the full loop contract.
 ## The Response Pattern
 ```

package/skills/requesting-code-review/SKILL.md CHANGED Viewed

@@ -7,7 +7,7 @@ description: Use when completing tasks, implementing major features, or before m
 Dispatch wz:code-reviewer subagent to catch issues before they cascade. The reviewer gets precisely crafted context for evaluation — never your session's history. This keeps the reviewer focused on the work product, not your thought process, and preserves your own context for continued work.
-**Core principle:** Review early, review often.
+**Core principle:** Review early, review often. Review follows the loop pattern in `docs/reference/review-loop-pattern.md`. Dispatch the reviewer with explicit `--mode` and depth-aware loop parameters.
 ## When to Request Review
@@ -23,22 +23,36 @@ Dispatch wz:code-reviewer subagent to catch issues before they cascade. The revi
 ## How to Request
-**1. Get git SHAs:**
+**1. Get git SHAs and scope the review:**
+Use `--uncommitted` for uncommitted changes, `--base <sha>` for committed changes.
 ```bash
+# For committed changes:
 BASE_SHA=$(git rev-parse HEAD~1)  # or origin/main
 HEAD_SHA=$(git rev-parse HEAD)
+codex review --base $BASE_SHA
+# For uncommitted changes:
+codex review --uncommitted
 ```
-**2. Dispatch code-reviewer subagent:**
+**2. Dispatch code-reviewer subagent with loop config:**
 Use Task tool with wz:code-reviewer type, fill template at `./code-reviewer.md`
+Include explicit loop parameters:
+- `--mode` (e.g., `task-review`, `final`)
+- Depth-aware dimensions and cap from `phase_policy`
+- Review pass number (for log filenames)
 **Placeholders:**
 - `{WHAT_WAS_IMPLEMENTED}` - What you just built
 - `{PLAN_OR_REQUIREMENTS}` - What it should do
 - `{BASE_SHA}` - Starting commit
 - `{HEAD_SHA}` - Ending commit
 - `{DESCRIPTION}` - Brief summary
+- `{REVIEW_MODE}` - Explicit review mode (e.g., task-review)
 **3. Act on feedback:**
 - Fix Critical issues immediately
@@ -46,6 +60,10 @@ Use Task tool with wz:code-reviewer type, fill template at `./code-reviewer.md`
 - Note Minor issues for later
 - Push back if reviewer is wrong (with reasoning)
+### Codex Error Handling
+If codex exits non-zero during review, log the error, mark the pass as codex-unavailable, and use self-review findings only. Do not treat a Codex failure as a clean pass.
 ## Example
 ```
@@ -56,12 +74,13 @@ You: Let me request code review before proceeding.
 BASE_SHA=$(git log --oneline | grep "Task 1" | head -1 | awk '{print $1}')
 HEAD_SHA=$(git rev-parse HEAD)
-[Dispatch wz:code-reviewer subagent]
+[Dispatch wz:code-reviewer subagent with --mode task-review]
   WHAT_WAS_IMPLEMENTED: Verification and repair functions for conversation index
   PLAN_OR_REQUIREMENTS: Task 2 from docs/plans/deployment-plan.md
   BASE_SHA: a7981ec
   HEAD_SHA: 3df7661
   DESCRIPTION: Added verifyIndex() and repairIndex() with 4 issue types
+  REVIEW_MODE: task-review
 [Subagent returns]:
   Strengths: Clean architecture, real tests
@@ -82,7 +101,7 @@ You: [Fix progress indicators]
 - Fix before moving to next task
 **Executing Plans:**
-- Review after each batch (3 tasks)
+- Review after each task (per-task review checkpoint)
 - Get feedback, apply, continue
 **Ad-Hoc Development:**
@@ -96,6 +115,7 @@ You: [Fix progress indicators]
 - Ignore Critical issues
 - Proceed with unfixed Important issues
 - Argue with valid technical feedback
+- Dispatch review without explicit `--mode`
 **If reviewer wrong:**
 - Push back with technical reasoning

package/skills/reviewer/SKILL.md ADDED Viewed

@@ -0,0 +1,151 @@
+---
+name: wz:reviewer
+description: Run the review phase — adversarial review of implementation against the approved spec, plan, and verification evidence.
+---
+# Reviewer
+Run Phase 3 (Review) for the current project.
+The reviewer role owns all review loops across the pipeline: research-review, clarification-review, spec-challenge, design-review, plan-review, per-task execution review, and final review. Each uses phase-specific dimensions from `docs/reference/review-loop-pattern.md`.
+## Review Modes
+The reviewer operates in different modes depending on the phase. Mode MUST be passed explicitly by the caller (`--mode <mode>`). The reviewer does NOT auto-detect mode from artifact availability. If `--mode` is not provided, ask the user which review to run.
+| Mode | Invoked during | Prerequisites | Dimensions | Output |
+|------|---------------|---------------|------------|--------|
+| `final` | After execution + verification | Completed task artifacts, approved spec/plan/design | 7 final-review dims, scored 0-70 | Scored verdict (PASS/FAIL) |
+| `spec-challenge` | After specify | Draft spec artifact | 5 spec/clarification dims | Pass/fix loop, no score |
+| `design-review` | After design approval | Design artifact, approved spec | 5 design-review dims (canonical) | Pass/fix loop, no score |
+| `plan-review` | After planning | Draft plan artifact | 7 plan dims | Pass/fix loop, no score |
+| `task-review` | During execution, per task | Uncommitted changes or `--base` SHA | 5 task-execution dims (correctness, tests, wiring, drift, quality) | Pass/fix loop, no score |
+| `research-review` | During discover | Research artifact | 5 research dims | Pass/fix loop, no score |
+| `clarification-review` | During clarify | Clarification artifact | 5 spec/clarification dims | Pass/fix loop, no score |
+Each mode follows the review loop pattern in `docs/reference/review-loop-pattern.md`. Pass counts are fixed by depth (quick=3, standard=5, deep=7). No extension.
+### CHANGELOG Enforcement
+In `task-review` and `final` modes, flag missing CHANGELOG entries for user-facing changes as **[warning]** severity. User-facing changes include new features, behavior changes, and bug fixes visible to users. Internal changes (refactors, tooling, tests) do not require CHANGELOG entries.
+## Prerequisites
+Prerequisites depend on the review mode:
+### `final` mode
+1. Check `.wazir/runs/latest/artifacts/` has completed task artifacts. If not, tell the user to run `/wazir:executor` first.
+2. Read the approved spec, plan, and design from `.wazir/runs/latest/clarified/`.
+3. Read `.wazir/state/config.json` for depth and multi_tool settings.
+### `task-review` mode
+1. Uncommitted changes exist for the current task, or a `--base` SHA is provided for committed changes.
+2. Read `.wazir/state/config.json` for depth and multi_tool settings.
+### `spec-challenge`, `design-review`, `plan-review`, `research-review`, `clarification-review` modes
+1. The appropriate input artifact for the mode exists.
+2. Read `.wazir/state/config.json` for depth and multi_tool settings.
+## Review Process (`final` mode)
+Perform adversarial review across 7 dimensions:
+1. **Correctness** — Does the code do what the spec says?
+2. **Completeness** — Are all acceptance criteria met?
+3. **Wiring** — Are all paths connected end-to-end?
+4. **Verification** — Is there evidence (tests, type checks) for each claim?
+5. **Drift** — Does the implementation match the approved plan?
+6. **Quality** — Code style, naming, error handling, security
+7. **Documentation** — Changelog entries, commit messages, comments
+## Context Retrieval
+- Read the diff first (primary input)
+- Use `wazir index search-symbols <name>` to locate related code
+- Use `wazir recall symbol <name-or-id> --tier L1` to check structural alignment
+- Read files directly for: logic errors, missing edge cases, integration concerns
+## Scoring (`final` mode)
+Score each dimension 0-10. Total out of 70.
+| Verdict | Score | Action |
+|---------|-------|--------|
+| **PASS** | 56+ | Ready for PR or merge |
+| **NEEDS MINOR FIXES** | 42-55 | Auto-fix and re-review |
+| **NEEDS REWORK** | 28-41 | Re-run affected tasks |
+| **FAIL** | 0-27 | Fundamental issues |
+## Secondary Review
+Read `.wazir/state/config.json`. If `multi_tool.tools` includes external reviewers, run them **after** your own review and **before** producing the final verdict.
+### Codex Review
+If `codex` is in `multi_tool.tools`:
+1. Run Codex review against the current changes:
+   ```bash
+   CODEX_MODEL=$(jq -r '.multi_tool.codex.model // empty' .wazir/state/config.json 2>/dev/null)
+   CODEX_MODEL=${CODEX_MODEL:-gpt-5.4}
+   codex review -c model="$CODEX_MODEL" --uncommitted --title "Wazir review: <brief summary>" \
+     "Review against these acceptance criteria: <paste criteria from spec>" \
+     2>&1 | tee .wazir/runs/latest/reviews/codex-review.md
+   ```
+   Or if changes are committed:
+   ```bash
+   CODEX_MODEL=$(jq -r '.multi_tool.codex.model // empty' .wazir/state/config.json 2>/dev/null)
+   CODEX_MODEL=${CODEX_MODEL:-gpt-5.4}
+   codex review -c model="$CODEX_MODEL" --base <base-branch> --title "Wazir review: <brief summary>" \
+     "Review against these acceptance criteria: <paste criteria from spec>" \
+     2>&1 | tee .wazir/runs/latest/reviews/codex-review.md
+   ```
+2. Read the Codex findings from `.wazir/runs/latest/reviews/codex-review.md`
+3. Incorporate Codex findings into your scoring — if Codex flags something you missed, add it. If you disagree with a Codex finding, note it with your rationale.
+**Codex error handling:** If codex exits non-zero (auth/rate-limit/transport failure), log the full stderr, mark the pass as `codex-unavailable` in the review log, and use self-review findings only for that pass. Do NOT treat a Codex failure as a clean review. Do NOT skip the pass. The next pass still attempts Codex (transient failures may recover).
+**Code review scoping by mode:**
+- Use `--uncommitted` when reviewing uncommitted changes (`task-review` mode).
+- Use `--base <sha>` when reviewing committed changes.
+- Use `codex exec -c model="$CODEX_MODEL"` with stdin pipe for non-code artifacts (`spec-challenge`, `design-review`, `plan-review`, `research-review`, `clarification-review` modes).
+- See `docs/reference/review-loop-pattern.md` for code review scoping rules.
+### Gemini Review
+If `gemini` is in `multi_tool.tools`, follow the same pattern using the Gemini CLI when available.
+### Merging Findings
+The final review report must clearly attribute each finding:
+- `[Wazir]` — found by primary review
+- `[Codex]` — found by Codex secondary review
+- `[Both]` — found independently by both
+## Task-Review Log Filenames
+In `task-review` mode, use task-scoped log filenames and cap tracking:
+- Log filenames: `.wazir/runs/latest/reviews/execute-task-<NNN>-review-pass-<N>.md`
+- Cap tracking: `wazir capture loop-check --task-id <NNN>` (each task has its own independent cap counter)
+## Output
+Save review results to `.wazir/runs/latest/reviews/review.md` with:
+- Findings with severity (blocking, warning, note)
+- Rationale tied to evidence
+- Score breakdown
+- Verdict
+## Done
+Present the verdict and offer next steps:
+> **Review complete: [VERDICT] ([score]/70)**
+>
+> [Score breakdown and findings summary]
+>
+> **What would you like to do?**
+> 1. **Create a PR** (if PASS)
+> 2. **Auto-fix and re-review** (if MINOR FIXES)
+> 3. **Review findings in detail**

package/skills/subagent-driven-development/SKILL.md CHANGED Viewed

@@ -45,6 +45,7 @@ digraph process {
     subgraph cluster_per_task {
         label="Per Task";
+        "Capture PRE_TASK_SHA" [shape=box];
         "Dispatch implementer subagent (./implementer-prompt.md)" [shape=box];
         "Implementer subagent asks questions?" [shape=diamond];
         "Answer questions, provide context" [shape=box];
@@ -63,7 +64,8 @@ digraph process {
     "Dispatch final code reviewer subagent for entire implementation" [shape=box];
     "Use wz:finishing-a-development-branch" [shape=box style=filled fillcolor=lightgreen];
-    "Read plan, extract all tasks with full text, note context, create TodoWrite" -> "Dispatch implementer subagent (./implementer-prompt.md)";
+    "Read plan, extract all tasks with full text, note context, create TodoWrite" -> "Capture PRE_TASK_SHA";
+    "Capture PRE_TASK_SHA" -> "Dispatch implementer subagent (./implementer-prompt.md)";
     "Dispatch implementer subagent (./implementer-prompt.md)" -> "Implementer subagent asks questions?";
     "Implementer subagent asks questions?" -> "Answer questions, provide context" [label="yes"];
     "Answer questions, provide context" -> "Implementer subagent implements, tests, commits, self-reviews";
@@ -78,12 +80,32 @@ digraph process {
     "Code quality reviewer subagent approves?" -> "Implementer subagent fixes quality issues" [label="no"];
     "Implementer subagent fixes quality issues" -> "Dispatch code quality reviewer subagent (./code-quality-reviewer-prompt.md)";
     "Mark task complete in TodoWrite" -> "More tasks remain?";
-    "More tasks remain?" -> "Dispatch implementer subagent (./implementer-prompt.md)" [label="yes"];
+    "More tasks remain?" -> "Capture PRE_TASK_SHA" [label="yes"];
     "More tasks remain?" -> "Dispatch final code reviewer subagent for entire implementation" [label="no"];
     "Dispatch final code reviewer subagent for entire implementation" -> "Use wz:finishing-a-development-branch";
 }
 ```
+### Code Review Scoping
+The implementer subagent commits before review. The spec reviewer and code quality reviewer must use `codex review --base <pre-task-sha>` to scope their review to the task's changes. Capture `PRE_TASK_SHA=$(git rev-parse HEAD)` before dispatching the implementer.
+### Review Loop Alignment
+Both review stages follow the review loop pattern in `docs/reference/review-loop-pattern.md` with explicit `--mode task-review`:
+- **Spec compliance review:** Uses spec dimensions with `--mode task-review`
+- **Code quality review:** Uses 5 task-execution dimensions with `--mode task-review`
+Review logs use task-scoped filenames: `execute-task-<NNN>-review-pass-<N>.md`
+Each review stage respects the loop cap via `wazir capture loop-check --task-id <NNN>`. If the cap is reached (exit 43), escalate to the controller (you) for a decision.
+### Codex Error Handling
+If codex exits non-zero during review, log the error, mark the pass as codex-unavailable, and use self-review findings only. Do not treat a Codex failure as a clean pass.
+**Standalone mode:** When no `.wazir/runs/latest/` exists, review logs go to `docs/plans/`.
 ## Prompt Templates
 - `./implementer-prompt.md` - Dispatch implementer subagent
@@ -137,6 +159,7 @@ digraph process {
 - Let implementer self-review replace actual review (both are needed)
 - **Start code quality review before spec compliance is PASS** (wrong order)
 - Move to next task while either review has open issues
+- **Review the wrong diff -- always scope to the current task's changes using --base**
 **If subagent asks questions:**
 - Answer clearly and completely

package/skills/tdd/SKILL.md CHANGED Viewed

@@ -10,6 +10,12 @@ Sequence:
 1. RED
 Write or update a test that expresses the new behavior or the bug being fixed, then run it and confirm failure.
+**Test quality check (single-pass):** Before proceeding to GREEN, verify:
+- Are these tests testing the right behavior?
+- Are they real assertions, not tautologies?
+- Do they fail for the right reason (not a syntax error or import failure)?
+If any check fails, fix the test before moving on. This is a single-pass quality check, not a full review loop.
 2. GREEN
 Write the smallest implementation change that makes the failing test pass.
@@ -21,3 +27,5 @@ Rules:
 - do not skip the failing-test step when automated verification is feasible
 - do not rewrite tests to fit broken behavior
 - rerun verification after each meaningful refactor
+For the full review loop pattern, see `docs/reference/review-loop-pattern.md`. TDD uses a single-pass quality check, not the full loop.