@wazir-dev/cli 1.0.0 → 1.1.0

Files changed (69)
  1. package/CHANGELOG.md +31 -2
  2. package/docs/plans/2026-03-15-cli-pipeline-integration-plan.md +1 -1
  3. package/docs/reference/review-loop-pattern.md +429 -0
  4. package/docs/reference/tooling-cli.md +2 -0
  5. package/docs/truth-claims.yaml +6 -0
  6. package/exports/hosts/claude/.claude/agents/clarifier.md +3 -0
  7. package/exports/hosts/claude/.claude/agents/designer.md +3 -0
  8. package/exports/hosts/claude/.claude/agents/executor.md +2 -0
  9. package/exports/hosts/claude/.claude/agents/planner.md +3 -0
  10. package/exports/hosts/claude/.claude/agents/researcher.md +2 -0
  11. package/exports/hosts/claude/.claude/agents/reviewer.md +5 -1
  12. package/exports/hosts/claude/.claude/agents/specifier.md +3 -0
  13. package/exports/hosts/claude/.claude/commands/clarify.md +4 -0
  14. package/exports/hosts/claude/.claude/commands/design-review.md +4 -0
  15. package/exports/hosts/claude/.claude/commands/design.md +4 -0
  16. package/exports/hosts/claude/.claude/commands/discover.md +4 -0
  17. package/exports/hosts/claude/.claude/commands/execute.md +4 -0
  18. package/exports/hosts/claude/.claude/commands/plan-review.md +4 -0
  19. package/exports/hosts/claude/.claude/commands/plan.md +4 -0
  20. package/exports/hosts/claude/.claude/commands/spec-challenge.md +4 -0
  21. package/exports/hosts/claude/.claude/commands/specify.md +4 -0
  22. package/exports/hosts/claude/.claude/commands/verify.md +4 -0
  23. package/exports/hosts/claude/export.manifest.json +19 -19
  24. package/exports/hosts/codex/export.manifest.json +19 -19
  25. package/exports/hosts/cursor/export.manifest.json +19 -19
  26. package/exports/hosts/gemini/export.manifest.json +19 -19
  27. package/hooks/definitions/loop_cap_guard.yaml +1 -1
  28. package/hooks/hooks.json +18 -0
  29. package/package.json +3 -2
  30. package/roles/clarifier.md +3 -0
  31. package/roles/designer.md +3 -0
  32. package/roles/executor.md +2 -0
  33. package/roles/planner.md +3 -0
  34. package/roles/researcher.md +2 -0
  35. package/roles/reviewer.md +5 -1
  36. package/roles/specifier.md +3 -0
  37. package/skills/brainstorming/SKILL.md +139 -38
  38. package/skills/clarifier/SKILL.md +219 -0
  39. package/skills/debugging/SKILL.md +11 -1
  40. package/skills/executing-plans/SKILL.md +15 -2
  41. package/skills/executor/SKILL.md +76 -0
  42. package/skills/init-pipeline/SKILL.md +106 -17
  43. package/skills/receiving-code-review/SKILL.md +8 -0
  44. package/skills/requesting-code-review/SKILL.md +25 -5
  45. package/skills/reviewer/SKILL.md +151 -0
  46. package/skills/subagent-driven-development/SKILL.md +25 -2
  47. package/skills/tdd/SKILL.md +8 -0
  48. package/skills/wazir/SKILL.md +250 -43
  49. package/skills/writing-plans/SKILL.md +31 -4
  50. package/templates/examples/wazir-manifest.example.yaml +1 -1
  51. package/tooling/src/capture/command.js +87 -1
  52. package/tooling/src/capture/run-config.js +21 -0
  53. package/tooling/src/checks/brand-truth.js +3 -6
  54. package/tooling/src/checks/command-registry.js +1 -0
  55. package/tooling/src/checks/docs-truth.js +1 -1
  56. package/tooling/src/checks/runtime-surface.js +3 -7
  57. package/tooling/src/cli.js +8 -3
  58. package/tooling/src/init/command.js +201 -0
  59. package/wazir.manifest.yaml +0 -3
  60. package/workflows/clarify.md +4 -0
  61. package/workflows/design-review.md +4 -0
  62. package/workflows/design.md +4 -0
  63. package/workflows/discover.md +4 -0
  64. package/workflows/execute.md +4 -0
  65. package/workflows/plan-review.md +4 -0
  66. package/workflows/plan.md +4 -0
  67. package/workflows/spec-challenge.md +4 -0
  68. package/workflows/specify.md +4 -0
  69. package/workflows/verify.md +4 -0
@@ -12,8 +12,8 @@ Rules:
12
12
  1. Do not write implementation code before the design is reviewed with the operator.
13
13
  2. Ask clarifying questions only when the ambiguity changes scope, architecture, or acceptance criteria.
14
14
  3. Propose 2-3 approaches with trade-offs and a recommendation.
15
- 4. Write the approved design to `docs/plans/YYYY-MM-DD-<topic>-design.md`.
16
- 5. Hand off to `wz:writing-plans` after approval.
15
+ 4. Write the approved design to `.wazir/runs/latest/clarified/design.md` (if inside a pipeline run) or `docs/plans/YYYY-MM-DD-<topic>-design.md` (if standalone).
16
+ 5. After user approves the design concept, the reviewer role runs the design-review loop with `--mode design-review` using canonical design-review dimensions (spec coverage, design-spec consistency, accessibility, visual consistency, exported-code fidelity). See `workflows/design-review.md` and `docs/reference/review-loop-pattern.md`. The designer resolves any findings. If the design-review loop completes all passes clean, hand off to `wz:writing-plans`. Planning does not start until design-review is complete.
17
17
 
18
18
  Required outputs:
19
19
 
@@ -23,55 +23,156 @@ Required outputs:
23
23
 
24
24
  ---
25
25
 
26
- ## Team Mode: Structured Dialogue
26
+ ## Team Mode: Agent Teams Structured Dialogue
27
27
 
28
28
  **Condition:** Only activate when `team_mode: parallel` in `.wazir/runs/latest/run-config.yaml`. Otherwise, use the default single-agent brainstorming above.
29
29
 
30
- When team mode is active, spawn a 3-agent team using Claude Code Agent Teams:
30
+ This mode uses **Agent Teams** (experimental, Claude Code + Opus 4.6) to run a
31
+ multi-agent brainstorming session. Your role is the **Arbiter** — you coordinate
32
+ the dialogue, evaluate convergence, and signal when to stop. You do NOT generate
33
+ design ideas yourself.
34
+
35
+ ### Infrastructure: Claude Code Agent Teams
36
+
37
+ This skill uses **Agent Teams** — not subagents. The distinction matters:
38
+
39
+ | | Subagents (Task tool) | Agent Teams |
+ |---|---|---|
+ | **Lifecycle** | Spawn, return result, die | Full independent sessions that persist for team lifetime |
+ | **Communication** | Report back to parent only | Direct peer-to-peer messaging via `SendMessage` |
+ | **Coordination** | Parent manages everything | Shared task list with self-coordination |
44
+
45
+ **Critical constraint:** Text output from teammates is NOT visible to the team.
46
+ Teammates MUST use `SendMessage` to communicate with each other. Regular text
47
+ output is only visible in a teammate's own terminal pane.
48
+
49
+ ### Prerequisites
50
+
51
+ ```bash
+ # Check if Agent Teams is enabled
+ echo $CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS
+ ```
55
+
56
+ If not set (empty), tell the user:
57
+
58
+ > Agent Teams is not enabled. Run this command and restart Claude Code:
59
+ > ```bash
60
+ > claude config set env.CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS 1
61
+ > ```
62
+
63
+ Then fall back to single-agent brainstorming (rules 1-5 above).
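
The activation rule (team mode only when `team_mode: parallel` is set and the environment variable is non-empty) could be sketched as a small shell check. `brainstorm_mode` is a hypothetical helper, not part of the Wazir CLI; it assumes the caller has already read the `team_mode` value from `run-config.yaml`, and the output labels `agent-teams` and `single-agent` are illustrative names only.

```bash
# Hypothetical helper (not part of the Wazir CLI): pick the brainstorming mode.
# $1 is the team_mode value the caller read from run-config.yaml.
brainstorm_mode() {
  if [ "$1" = "parallel" ] && [ -n "${CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS:-}" ]; then
    echo "agent-teams"
  else
    # Either team mode is off or Agent Teams is not enabled: fall back.
    echo "single-agent"
  fi
}

brainstorm_mode "parallel"
```

The result depends on the current environment, so the call prints `agent-teams` only when the variable is set to a non-empty value.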
64
+
65
+ ### Step 1: Create the Team
66
+
67
+ Derive `<concept-slug>` from the briefing topic (lowercased, hyphens for spaces).
68
+
69
+ Use `TeamCreate` to initialize the team with the name `wazir-brainstorm-<concept-slug>`.
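
The slug rule above (lowercased, hyphens for spaces) might look like this in shell. `concept_slug` is a hypothetical helper, not something the package ships, and collapsing repeated spaces into a single hyphen is an assumption that goes slightly beyond what the skill states. Punctuation is left untouched.

```bash
# Hypothetical helper (not part of the Wazir CLI): derive <concept-slug>.
concept_slug() {
  # Lowercase the topic, then turn each run of spaces into one hyphen.
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -s ' ' '-'
}

team_name="wazir-brainstorm-$(concept_slug "Offline Sync Queue")"
echo "$team_name"   # prints wazir-brainstorm-offline-sync-queue
```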
70
+
71
+ ### Step 2: Spawn Teammates
72
+
73
+ Spawn three teammates using the `Agent` tool with the `team_name` parameter set
74
+ to `wazir-brainstorm-<concept-slug>`. Each agent receives a detailed system
75
+ prompt via the `prompt` parameter.
76
+
77
+ #### Free Thinker
78
+
79
+ ```
80
+ Agent(
81
+ team_name: "wazir-brainstorm-<concept-slug>",
82
+ prompt: "You are the Free Thinker in a Wazir brainstorming session.
83
+ Your job is to propose creative design directions without self-censoring.
84
+ Open new threads, explore possibilities, make connections. Communicate
85
+ ONLY via SendMessage — your text output is not visible to the team.
86
+ After proposing a direction, wait for the Grounder's response before
87
+ opening a new one."
88
+ )
89
+ ```
90
+
91
+ #### Grounder
92
+
93
+ ```
94
+ Agent(
95
+ team_name: "wazir-brainstorm-<concept-slug>",
96
+ prompt: "You are the Grounder in a Wazir brainstorming session. Your
97
+ job is to challenge every proposal from the Free Thinker with practical
98
+ concerns: feasibility, complexity, risk, alternatives. After 3-5
99
+ exchanges on a direction, decide: pursue, park, or redirect.
100
+ Communicate ONLY via SendMessage — your text output is not visible to
101
+ the team."
102
+ )
103
+ ```
104
+
105
+ #### Synthesizer
106
+
107
+ ```
108
+ Agent(
109
+ team_name: "wazir-brainstorm-<concept-slug>",
110
+ prompt: "You are the Synthesizer in a Wazir brainstorming session.
111
+ You NEVER participate in dialogue — only observe. Read all SendMessage
112
+ traffic between Free Thinker and Grounder. When the Arbiter signals
113
+ convergence, write the final design document to
114
+ .wazir/runs/latest/clarified/design.md with: design summary, pursued
115
+ directions, rejected alternatives, open questions, recommendation."
116
+ )
117
+ ```
118
+
119
+ ### Step 3: Coordinate the Dialogue (You Are the Arbiter)
120
+
121
+ 1. Use `SendMessage` to tell the Free Thinker to open the first direction
122
+ based on the briefing, research brief, and hardened spec.
123
+ 2. Monitor exchanges via `SendMessage`. Do NOT generate ideas — only
124
+ coordinate, nudge, and evaluate.
125
+ 3. After each direction is explored (3-5 exchanges), the Grounder decides:
126
+ **pursue**, **park**, or **redirect**.
127
+ 4. After depth-appropriate directions are explored:
128
+
129
+ | Depth | Directions to explore | Exchanges per direction |
+ |-------|-----------------------|------------------------|
+ | Standard | 3-5 | 3-5 exchanges |
+ | Deep | 5-8 | 5-8 exchanges |
133
+
134
+ 5. **Signal convergence:** Use `SendMessage` to tell the Synthesizer to
135
+ produce the final design document.
136
+ 6. Wait for the Synthesizer to write the design to
137
+ `.wazir/runs/latest/clarified/design.md` (if inside a pipeline run) or
138
+ `docs/plans/YYYY-MM-DD-<topic>-design.md` (if standalone).
139
+
140
+ ### Step 4: Convergence Criteria
31
141
 
32
- ### Agents
33
-
34
- | Agent | Role | Cognitive Mode |
35
- |-------|------|----------------|
36
- | **Free Thinker** | Proposes design directions, creative leaps, "what if..." scenarios. Speaks first, opens new threads. | Divergent generation |
37
- | **Grounder** | Challenges proposals, sorts signal from noise, picks winners, redirects dead ends. Responds to Free Thinker. | Convergent editing |
38
- | **Synthesizer** | Observes silently, maintains a running summary, produces the final design document. Never participates in dialogue. | Synthesis only |
39
-
40
- ### Communication Protocol
41
-
42
- - Free Thinker and Grounder exchange via **broadcast** (all agents see every message)
43
- - Synthesizer **NEVER** participates in dialogue — only observes and writes to files
44
- - After each direction is explored (3-5 exchanges), the Grounder decides: pursue, park, or redirect
45
-
46
- ### Dialogue Flow
142
+ The dialogue has converged when:
47
143
 
48
- 1. **Open a direction** Free Thinker proposes (broadcast)
49
- 2. **Deepen** 3-5 exchanges between Free Thinker and Grounder
50
- 3. **Decide** Grounder calls it: pursue, park, or redirect
51
- 4. **Next direction** Free Thinker opens a new thread, aware of connections to prior threads
144
+ 1. Enough directions have been explored for the depth level
145
+ 2. The pursued directions have genuine range (not variations of the same idea)
146
+ 3. The Grounder signals satisfaction
147
+ 4. Further dialogue is producing diminishing returns
52
148
 
53
- Early rounds: more divergent. Later rounds: more convergent.
149
+ Early rounds should be more divergent; later rounds should be more convergent.
54
150
 
55
- ### Depth-Bounded Behavior
151
+ ### Step 5: Clean Up
56
152
 
57
- | Depth | Directions to explore | Exchanges per direction |
58
- |-------|-----------------------|------------------------|
59
- | Standard | 3-5 | 3 exchanges |
60
- | Deep | 5-8 | 5 exchanges |
153
+ Use `TeamDelete` to tear down the team after the Synthesizer has written the
154
+ design document.
61
155
 
62
- ### Convergence
156
+ ### Constraints
63
157
 
64
- The dialogue has converged when:
65
- 1. Enough directions have been explored for the depth level
66
- 2. The pursued directions have genuine range (not variations of the same idea)
67
- 3. The Grounder signals satisfaction
68
- 4. Further dialogue is producing diminishing returns
158
+ - Text output from teammates is NOT visible to the team — they MUST use
159
+ `SendMessage`
160
+ - The Arbiter (you) coordinates but does NOT generate ideas
161
+ - The Synthesizer NEVER sends messages — only reads and writes files
162
+ - Free Thinker and Grounder exchange via broadcast (all agents see every
163
+ message)
69
164
 
70
165
  ### Output
71
166
 
72
- The Synthesizer produces the design document following the same format as single-agent brainstorming:
167
+ The Synthesizer produces the design document following the same format as
168
+ single-agent brainstorming:
169
+
73
170
  - Design summary
171
+ - Pursued directions with rationale
172
+ - Rejected alternatives with reasons
74
173
  - Open questions or resolved assumptions
75
- - Explicit recommendation and rejected alternatives
174
+ - Explicit recommendation
76
175
 
77
- The Synthesizer then writes the design to `docs/plans/YYYY-MM-DD-<topic>-design.md` and hands off to `wz:writing-plans`.
176
+ After the design is written, submit it for the design-review loop
177
+ (`--mode design-review`). After design-review is complete, hand off to
178
+ `wz:writing-plans`.
@@ -0,0 +1,219 @@
1
+ ---
2
+ name: wz:clarifier
3
+ description: Run the clarification pipeline — research, clarify scope, brainstorm design, generate task specs and execution plan. Pauses for user approval between phases.
4
+ ---
5
+
6
+ # Clarifier
7
+
8
+ Run Phase 0 (Research) + Phase 1 (Clarify, Brainstorm, Plan) for the current project.
9
+
10
+ **Pacing rule:** This skill has mandatory user checkpoints between phases. Do NOT skip checkpoints. Do NOT combine phases. Complete each phase fully, present the output, and wait for explicit user approval before advancing.
11
+
12
+ Review loops follow the pattern in `docs/reference/review-loop-pattern.md`. All reviewer invocations use explicit `--mode`.
13
+
14
+ **Standalone mode:** If no `.wazir/runs/latest/` exists, artifacts go to `docs/plans/` and review logs go alongside.
15
+
16
+ ## Prerequisites
17
+
18
+ 1. Check `.wazir/state/config.json` exists. If not, run `wazir init` first.
19
+ 2. Check `.wazir/input/briefing.md` exists. If not, ask the user what they want to build and save it there.
20
+ 3. Read config for `default_depth`, `default_intent`, `team_mode`, and `multi_tool` settings.
21
+ 4. Create a run directory if one doesn't exist:
22
+ ```bash
+ mkdir -p .wazir/runs/run-YYYYMMDD-HHMMSS/{sources,tasks,artifacts,reviews,clarified}
+ ln -sfn run-YYYYMMDD-HHMMSS .wazir/runs/latest
+ ```
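
The `run-YYYYMMDD-HHMMSS` placeholder in the commands above has to be filled with a real timestamp. One way to do that is sketched below. `create_run_dir` is a hypothetical helper, and using `date +%Y%m%d-%H%M%S` for the timestamp is an assumption about the intended format. The demo runs against a scratch directory so it leaves the project untouched.

```bash
# Hypothetical helper (not part of the Wazir CLI): create a timestamped run
# directory under the given root and point "latest" at it.
create_run_dir() {
  root=$1
  run_id="run-$(date +%Y%m%d-%H%M%S)"
  for sub in sources tasks artifacts reviews clarified; do
    mkdir -p "$root/$run_id/$sub"
  done
  # Relative symlink target, as in the skill's `ln -sfn` command.
  ln -sfn "$run_id" "$root/latest"
  echo "$run_id"
}

demo_root=$(mktemp -d)   # scratch root; a real run would pass .wazir/runs
create_run_dir "$demo_root"
```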
26
+
27
+ ---
28
+
29
+ ## Phase 0: Research (delegated)
30
+
31
+ Delegate to the discover workflow (`workflows/discover.md`):
32
+
33
+ 1. The **researcher role** produces the research artifact
34
+ (codebase scan, external sources, source manifest, research brief).
35
+ 2. The **reviewer role** runs the research-review loop
36
+ using research dimensions with `--mode research-review`
37
+ (see `docs/reference/review-loop-pattern.md`).
38
+ 3. The researcher resolves findings from each pass.
39
+ 4. Loop runs for `pass_counts[depth]` passes.
40
+ 5. Research artifact flows back to the clarifier for Checkpoint 0.
41
+
42
+ Save result to `.wazir/runs/latest/clarified/research-brief.md`.
43
+
44
+ ### Checkpoint 0: Research Review
45
+
46
+ Present the research brief to the user:
47
+
48
+ > **Research complete. Here's what I found:**
49
+ >
50
+ > [Summary of existing codebase state, relevant architecture, external context]
51
+ >
52
+ > **Does this match your understanding? Anything to add or correct?**
53
+ > 1. **Looks good, continue** (Recommended)
54
+ > 2. **Missing context** — let me add more information
55
+ > 3. **Wrong direction** — let me clarify the intent
56
+
57
+ **Wait for user response before continuing.**
58
+
59
+ ---
60
+
61
+ ## Phase 1A: Clarify (autonomous, then review, then checkpoint)
62
+
63
+ Read the briefing, research brief, and codebase context. Produce:
64
+
65
+ - **What** we're building — concrete deliverables, not vague descriptions
66
+ - **Why** — the motivation and business value
67
+ - **Constraints** — technical, timeline, dependencies
68
+ - **Assumptions** — what we're taking as given (explicitly stated)
69
+ - **Scope boundaries** — what's IN and what's explicitly OUT
70
+ - **Unresolved questions** — anything ambiguous that could change architecture or acceptance criteria
71
+
72
+ Save to `.wazir/runs/latest/clarified/clarification.md`.
73
+
74
+ Invoke the review loop for the clarification artifact using spec/clarification dimensions with `--mode clarification-review`. The **reviewer role** runs the loop (see `docs/reference/review-loop-pattern.md`). Resolve any findings before presenting to user.
75
+
76
+ ### Checkpoint 1A: Clarification Review
77
+
78
+ Present the full clarification to the user:
79
+
80
+ > **Here's the clarified scope:**
81
+ >
82
+ > [Full clarification with what/why/constraints/assumptions/scope/questions]
83
+ >
84
+ > **Are there any corrections, missing context, or open questions to resolve?**
85
+ > 1. **Approved — continue to spec hardening**
86
+ > 2. **Needs changes** — [user provides corrections]
87
+ > 3. **Missing important context** — [user adds information]
88
+
89
+ **Wait for user response. If the user provides corrections, update the clarification and re-present.**
90
+
91
+ ---
92
+
93
+ ## Phase 1A+: Spec Harden (delegated, then checkpoint)
94
+
95
+ Delegate to the specify workflow (`workflows/specify.md`):
96
+
97
+ 1. The **specifier role** produces a measurable spec from the clarification
98
+ and research artifacts.
99
+ 2. The **reviewer role** runs the spec-challenge loop
100
+ (`workflows/spec-challenge.md`) with `--mode spec-challenge`.
101
+ 3. The specifier resolves findings from each pass.
102
+ 4. Loop runs for `pass_counts[depth]` passes.
103
+
104
+ Save result to `.wazir/runs/latest/clarified/spec-hardened.md`.
105
+
106
+ ### Checkpoint 1A+: Hardened Spec Review
107
+
108
+ Present the changes made during hardening:
109
+
110
+ > **Spec hardened. Changes made:**
111
+ >
112
+ > [List of each gap found and how it was tightened]
113
+ >
114
+ > **Review the hardened spec. Approve or adjust?**
115
+ > 1. **Approved — continue to brainstorming** (Recommended)
116
+ > 2. **Disagree with a change** — [user specifies]
117
+ > 3. **Found more gaps** — [user adds]
118
+
119
+ **Wait for user response before continuing.**
120
+
121
+ ---
122
+
123
+ ## Phase 1B: Brainstorm (interactive — always pauses)
124
+
125
+ Invoke the `brainstorming` skill (`wz:brainstorming`) and follow it.
126
+
127
+ This phase explores design approaches:
128
+ 1. Propose 2-3 viable approaches with explicit trade-offs
129
+ 2. For each approach: effort estimate, risk assessment, what it enables/prevents
130
+ 3. Recommend one approach with rationale
131
+
132
+ If `team_mode: parallel` in config, the brainstorming skill activates its
133
+ **Agent Teams Structured Dialogue** mode:
134
+
135
+ 1. Checks that `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS` is enabled (falls back
136
+ to single-agent brainstorming if not)
137
+ 2. Creates a team via `TeamCreate` (`wazir-brainstorm-<concept-slug>`)
138
+ 3. Spawns three teammates via `Agent` with `team_name`:
139
+ - **Free Thinker** — proposes creative directions via `SendMessage`
140
+ - **Grounder** — challenges each direction with practical concerns via `SendMessage`
141
+ - **Synthesizer** — observes silently, writes the design document on convergence
142
+ 4. You (the Arbiter) coordinate the dialogue, signal convergence, and clean up
143
+ with `TeamDelete`
144
+
145
+ See `skills/brainstorming/SKILL.md` "Team Mode: Agent Teams Structured Dialogue"
146
+ for full spawn prompts, convergence criteria, and constraints.
147
+
148
+ ### Checkpoint 1B: Design Approval
149
+
150
+ > **Proposed design approaches:**
151
+ >
152
+ > [Approaches with trade-offs, recommendation]
153
+ >
154
+ > **Which approach should we implement?**
155
+ > 1. **Approach A** — [one-line summary]
156
+ > 2. **Approach B** — [one-line summary]
157
+ > 3. **Approach C** — [one-line summary]
158
+ > 4. **Modify an approach** — [user specifies changes]
159
+
160
+ **This is the most important checkpoint. Do NOT proceed without explicit design approval.**
161
+
162
+ Save approved design to `.wazir/runs/latest/clarified/design.md`.
163
+
164
+ ### Design Review
165
+
166
+ After the user approves the design concept, invoke the design-review loop with `--mode design-review`. The **reviewer role** validates the design against the approved spec using the canonical design-review dimensions:
167
+
168
+ - Spec coverage
169
+ - Design-spec consistency
170
+ - Accessibility
171
+ - Visual consistency
172
+ - Exported-code fidelity
173
+
174
+ See `workflows/design-review.md` and `docs/reference/review-loop-pattern.md`. The designer resolves findings. Proceed to planning only after all design-review passes complete.
175
+
176
+ ---
177
+
178
+ ## Phase 1C: Plan (delegated, then checkpoint)
179
+
180
+ Delegate to `wz:writing-plans`:
181
+
182
+ 1. `wz:writing-plans` (using **planner role**) produces the execution plan
183
+ and task specs.
184
+ 2. The **reviewer role** runs the plan-review loop
185
+ (`workflows/plan-review.md`) with `--mode plan-review`.
186
+ 3. The planner resolves findings from each pass.
187
+ 4. Loop runs for `pass_counts[depth]` passes.
188
+
189
+ ### Checkpoint 1C: Plan Review
190
+
191
+ > **Implementation plan: [N] tasks**
192
+ >
193
+ > | # | Task | Complexity | Dependencies | Description |
+ > |---|------|-----------|--------------|-------------|
+ > | 1 | ... | S | none | ... |
+ > | 2 | ... | M | task-1 | ... |
197
+ >
198
+ > **Review the plan. Approve or adjust?**
199
+ > 1. **Approved — ready for execution** (Recommended)
200
+ > 2. **Reorder or split tasks** — [user specifies]
201
+ > 3. **Missing tasks** — [user adds]
202
+ > 4. **Too granular / too coarse** — [user adjusts scope]
203
+
204
+ **Wait for user response before completing.**
205
+
206
+ ---
207
+
208
+ ## Done
209
+
210
+ When the plan is approved, present:
211
+
212
+ > **Clarification complete.**
213
+ >
214
+ > - Spec: `.wazir/runs/latest/clarified/spec-hardened.md`
215
+ > - Design: `.wazir/runs/latest/clarified/design.md`
216
+ > - Tasks: [count] tasks in `.wazir/runs/latest/tasks/`
217
+ > - Plan: `.wazir/runs/latest/clarified/execution-plan.md`
218
+ >
219
+ > **Next:** Run `/wazir:executor` to execute the plan.
@@ -43,7 +43,17 @@ Follow this order:
43
43
 
44
44
  Apply the minimum corrective change, then rerun the failing check and the relevant broader verification set.
45
45
 
46
- Rules:
46
+ ## Loop Cap Awareness
47
+
48
+ Debugging loops respect the loop cap when running inside a pipeline:
49
+ - **Pipeline mode** (`.wazir/runs/latest/` exists): use `wazir capture loop-check` to track iteration count. If the cap is reached (exit 43), escalate to the user with all evidence collected so far.
50
+ - **Standalone mode** (no `.wazir/runs/latest/`): the loop runs for `pass_counts[depth]` passes (quick=3, standard=5, deep=7) with no cap guard. Track iteration count manually.
51
+
52
+ In standalone mode, any debug logs go to `docs/plans/` alongside the artifact.
53
+
54
+ See `docs/reference/review-loop-pattern.md` for cap guard integration.
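
The pass counts and the cap signal described above can be expressed as two small lookups. Both functions are hypothetical helpers, not part of the Wazir CLI. The quick/standard/deep values and exit status 43 come from the text. Treating status 0 as "continue" and any other status as an error is an assumption.

```bash
# Hypothetical helpers (not part of the Wazir CLI).

# Map a depth name to the number of passes stated in the skill.
pass_count() {
  case "$1" in
    quick)    echo 3 ;;
    standard) echo 5 ;;
    deep)     echo 7 ;;
    *) echo "unknown depth: $1" >&2; return 1 ;;
  esac
}

# Interpret the exit status of `wazir capture loop-check`.
# 43 = cap reached (from the skill); the handling of 0 and other
# statuses is an assumption.
loop_check_action() {
  case "$1" in
    0)  echo "continue" ;;
    43) echo "escalate" ;;
    *)  echo "error" ;;
  esac
}

pass_count standard      # prints 5
loop_check_action 43     # prints escalate
```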
55
+
56
+ ## Rules
47
57
 
48
58
  - change one thing at a time
49
59
  - keep evidence for each failed hypothesis
@@ -7,7 +7,7 @@ description: Use when you have a written implementation plan to execute in a sep
7
7
 
8
8
  ## Overview
9
9
 
10
- Load plan, review critically, execute all tasks, report when complete.
10
+ Load plan, review critically, execute all tasks with per-task review checkpoints, report when complete.
11
11
 
12
12
  **Announce at start:** "I'm using the executing-plans skill to implement this plan."
13
13
 
@@ -27,7 +27,18 @@ For each task:
27
27
  1. Mark as in_progress
28
28
  2. Follow each step exactly (plan has bite-sized steps)
29
29
  3. Run verifications as specified
30
- 4. Mark as completed
30
+ 4. Review BEFORE marking complete (per-task review, 5 task-execution dimensions):
31
+ - Run task-review loop with `--mode task-review`
32
+ - Use `codex review --uncommitted` for uncommitted changes, or `codex review --base <sha>` if already committed
33
+ - Codex error handling: if codex exits non-zero, log the error, mark the pass as codex-unavailable, and use self-review findings only. Do not treat a Codex failure as a clean pass.
34
+ - Resolve all findings before proceeding
35
+ - Log to: `.wazir/runs/latest/reviews/execute-task-<NNN>-review-pass-<N>.md`
36
+ - Cap tracking: `wazir capture loop-check --task-id <NNN>`
37
+ - This is NOT the final scored review -- it is a per-task gate using 5 task-execution dimensions
38
+ - See `docs/reference/review-loop-pattern.md` for the full review loop contract
39
+ 5. Only after review passes: mark as completed, commit
40
+
41
+ **Standalone mode:** When no `.wazir/runs/latest/` exists, review logs go to `docs/plans/` alongside the artifact. The loop runs for `pass_counts[depth]` passes with no cap guard.
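
The log location rule (pipeline run versus standalone) can be sketched as follows. `review_log_path` is a hypothetical helper. Zero-padding the task number to three digits for `<NNN>` and reusing the same file name under `docs/plans/` in standalone mode are both assumptions, since the skill only says logs go "alongside the artifact".

```bash
# Hypothetical helper (not part of the Wazir CLI): build the review log path.
# $1 = task number (plain integer), $2 = pass number, $3 = project root (default ".").
review_log_path() {
  task=$1
  pass=$2
  root=${3:-.}
  # Assumption: <NNN> is the task number zero-padded to three digits.
  name=$(printf 'execute-task-%03d-review-pass-%d.md' "$task" "$pass")
  if [ -e "$root/.wazir/runs/latest" ]; then
    echo "$root/.wazir/runs/latest/reviews/$name"   # pipeline mode
  else
    echo "$root/docs/plans/$name"                   # standalone mode
  fi
}

review_log_path 7 2
```

Pass task numbers without leading zeros: `printf '%03d'` reads a leading `0` as an octal prefix, so an input such as `008` would be rejected.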
31
42
 
32
43
  ### Step 3: Complete Development
33
44
 
@@ -58,9 +69,11 @@ After all tasks complete and verified:
58
69
  - Review plan critically first
59
70
  - Follow plan steps exactly
60
71
  - Don't skip verifications
72
+ - Don't skip per-task review -- it catches issues before they cascade to later tasks
61
73
  - Reference skills when plan says to
62
74
  - Stop when blocked, don't guess
63
75
  - Never start implementation on main/master branch without explicit user consent
76
+ - Review loop pattern: see `docs/reference/review-loop-pattern.md`
64
77
 
65
78
  ## Integration
66
79
 
@@ -0,0 +1,76 @@
1
+ ---
2
+ name: wz:executor
3
+ description: Run the execution phase — implement the approved plan with TDD, quality gates, and verification.
4
+ ---
5
+
6
+ # Executor
7
+
8
+ Run Phase 2 (Execute) for the current project.
9
+
10
+ ## Prerequisites
11
+
12
+ 1. Check `.wazir/runs/latest/clarified/execution-plan.md` exists. If not, tell the user to run `/wazir:clarifier` first.
13
+ 2. Read the execution plan and task specs from `.wazir/runs/latest/tasks/`.
14
+ 3. Read `.wazir/state/config.json` for team_mode and depth settings.
15
+
16
+ ## Pre-Execution Validation
17
+
18
+ Run these checks before implementing:
19
+ - `wazir validate manifest` — confirm manifest schema is valid
20
+ - `wazir validate hooks` — confirm hook contracts are intact
21
+
22
+ If either fails, surface the failure and do NOT proceed until resolved.
23
+
24
+ ## Execution
25
+
26
+ Implement tasks in the order defined by the execution plan.
27
+
28
+ For each task:
29
+
30
+ 1. **Read** the task spec at `.wazir/runs/latest/tasks/task-NNN/spec.md`
31
+ 2. **Implement** using TDD (write test first, make it pass, refactor)
32
+ 3. **Verify** — run tests, type checks, linting as appropriate
33
+ 4. **Review BEFORE commit** (per-task review, NOT final review):
34
+ - Reviewer runs task-review loop with `--mode task-review` using 5 task-execution dimensions (correctness, tests, wiring, drift, quality)
35
+ - Reads the Codex model from config: `CODEX_MODEL=$(jq -r '.multi_tool.codex.model // empty' .wazir/state/config.json 2>/dev/null); CODEX_MODEL=${CODEX_MODEL:-gpt-5.4}`
36
+ - Uses `codex review -c model="$CODEX_MODEL" --uncommitted` for the current task's changes
37
+ - Codex error handling: if codex exits non-zero, log error, mark pass as `codex-unavailable`, use self-review only for that pass. Do NOT treat a Codex failure as a clean review. Do NOT skip the pass. The next pass still attempts Codex (transient failures may recover).
38
+ - Executor resolves findings, reviewer re-reviews
39
+ - Loop runs for `pass_counts[depth]` passes (quick=3, standard=5, deep=7). No extension.
40
+ - Review logs: `.wazir/runs/latest/reviews/execute-task-<NNN>-review-pass-<N>.md`
41
+ - Loop cap tracking: `wazir capture loop-check --task-id <NNN>` (each task has its own cap counter)
42
+ - See `docs/reference/review-loop-pattern.md` for full protocol
43
+ - NOTE: this is the per-task review (5 dims), not the final scored review (7 dims) which runs later in `/wazir:reviewer --mode final`
44
+ 5. **Commit** — only after review passes, commit with conventional commit format: `<type>(<scope>): <description>`
45
+ 6. **CHANGELOG** — if the change is user-facing (new feature, behavior change, bug fix visible to users), update `CHANGELOG.md` `[Unreleased]` section. If not user-facing (refactor, internal tooling, tests), skip.
46
+ 7. **Record** evidence at `.wazir/runs/latest/artifacts/task-NNN/`
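
The Codex error-handling rule in step 4 (a non-zero exit marks the pass as `codex-unavailable` and never counts as a clean review) might be wrapped like this. `run_review_pass` is a hypothetical helper, and the `codex` label for a successful pass is an illustrative name. The demo uses `true` and `false` as stand-ins for the review command so the sketch runs without Codex installed.

```bash
# Hypothetical helper (not part of the Wazir CLI): run one review command and
# report where the findings for this pass come from.
# $1 = log file; remaining arguments = the review command to run.
run_review_pass() {
  log_file=$1
  shift
  if "$@" >"$log_file" 2>&1; then
    echo "codex"
  else
    status=$?   # exit status of the failed review command
    echo "review command exited with status $status" >>"$log_file"
    # Not a clean pass: the caller should use self-review findings only.
    echo "codex-unavailable"
  fi
}

demo_log=$(mktemp)
run_review_pass "$demo_log" true    # prints codex
run_review_pass "$demo_log" false   # prints codex-unavailable
```

In a real run the stand-in would be replaced by the command from step 4, for example `run_review_pass "$log" codex review -c model="$CODEX_MODEL" --uncommitted`.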
47
+
48
+ Review loops follow the pattern in `docs/reference/review-loop-pattern.md`. Code review scoping: review uncommitted changes before commit. If changes are already committed (subagent workflow), use `codex review -c model="$CODEX_MODEL" --base <pre-task-sha>`.
49
+
50
+ If `team_mode: parallel` in config, spawn Agent Teams for independent tasks. Otherwise, tasks run sequentially.
51
+
52
+ **Standalone mode:** When no `.wazir/runs/latest/` exists, review logs go to `docs/plans/` alongside the artifact.
53
+
54
+ ## Context Retrieval
55
+
56
+ - Use `wazir index search-symbols <query>` to locate relevant code before reading
57
+ - Read full files directly when editing or verifying
58
+ - Use `wazir recall file <path> --tier L1` for files you need to understand but not modify
59
+
60
+ ## Escalation
61
+
62
+ Pause and ask the user when:
63
+ - The plan is blocked or contradictory
64
+ - Implementation would require unapproved scope change
65
+ - A task's acceptance criteria can't be met
66
+
67
+ ## Done
68
+
69
+ When all tasks are complete, present:
70
+
71
+ > **Execution complete.**
72
+ >
73
+ > - Tasks: [completed]/[total] implemented
74
+ > - Artifacts: `.wazir/runs/latest/artifacts/`
75
+ >
76
+ > **Next:** Run `/wazir:reviewer --mode final` to review the changes, or `/wazir` for the full pipeline.