@wazir-dev/cli 1.3.0 → 1.4.0

Files changed (133)
  1. package/CHANGELOG.md +17 -2
  2. package/docs/research/2026-03-20-agents/a18fb002157904af5.txt +187 -0
  3. package/docs/research/2026-03-20-agents/a1d0ac79ac2f11e6f.txt +2 -0
  4. package/docs/research/2026-03-20-agents/a324079de037abd7c.txt +198 -0
  5. package/docs/research/2026-03-20-agents/a357586bccfafb0e5.txt +256 -0
  6. package/docs/research/2026-03-20-agents/a4365394e4d753105.txt +137 -0
  7. package/docs/research/2026-03-20-agents/a492af28bc52d3613.txt +136 -0
  8. package/docs/research/2026-03-20-agents/a4984db0b6a8eee07.txt +124 -0
  9. package/docs/research/2026-03-20-agents/a5b30e59d34bbb062.txt +214 -0
  10. package/docs/research/2026-03-20-agents/a5cf7829dab911586.txt +165 -0
  11. package/docs/research/2026-03-20-agents/a607157c30dd97c9e.txt +96 -0
  12. package/docs/research/2026-03-20-agents/a60b68b1e19d1e16b.txt +115 -0
  13. package/docs/research/2026-03-20-agents/a722af01c5594aba0.txt +166 -0
  14. package/docs/research/2026-03-20-agents/a787bdc516faa5829.txt +181 -0
  15. package/docs/research/2026-03-20-agents/a7c46d1bba1056ed2.txt +132 -0
  16. package/docs/research/2026-03-20-agents/a7e5abbab2b281a0d.txt +100 -0
  17. package/docs/research/2026-03-20-agents/a8dbadc66cd0d7d5a.txt +95 -0
  18. package/docs/research/2026-03-20-agents/a904d9f45d6b86a6d.txt +75 -0
  19. package/docs/research/2026-03-20-agents/a927659a942ee7f60.txt +102 -0
  20. package/docs/research/2026-03-20-agents/a962cb569191f7583.txt +125 -0
  21. package/docs/research/2026-03-20-agents/aab6decea538aac41.txt +148 -0
  22. package/docs/research/2026-03-20-agents/abd58b853dd938a1b.txt +295 -0
  23. package/docs/research/2026-03-20-agents/ac009da573eff7f65.txt +100 -0
  24. package/docs/research/2026-03-20-agents/ac1bc783364405e5f.txt +190 -0
  25. package/docs/research/2026-03-20-agents/aca5e2b57fde152a0.txt +132 -0
  26. package/docs/research/2026-03-20-agents/ad849b8c0a7e95b8b.txt +176 -0
  27. package/docs/research/2026-03-20-agents/adc2b12a4da32c962.txt +258 -0
  28. package/docs/research/2026-03-20-agents/af97caaaa9a80e4cb.txt +146 -0
  29. package/docs/research/2026-03-20-agents/afc5faceee368b3ca.txt +111 -0
  30. package/docs/research/2026-03-20-agents/afdb282d866e3c1e4.txt +164 -0
  31. package/docs/research/2026-03-20-agents/afe9d1f61c02b1e8d.txt +299 -0
  32. package/docs/research/2026-03-20-agents/b4hmkwril.txt +1856 -0
  33. package/docs/research/2026-03-20-agents/b80ptk89g.txt +1856 -0
  34. package/docs/research/2026-03-20-agents/bf54s1jss.txt +1150 -0
  35. package/docs/research/2026-03-20-agents/bhd6kq2kx.txt +1856 -0
  36. package/docs/research/2026-03-20-agents/bmb2fodyr.txt +988 -0
  37. package/docs/research/2026-03-20-agents/bmmsrij8i.txt +826 -0
  38. package/docs/research/2026-03-20-agents/bn4t2ywpu.txt +2175 -0
  39. package/docs/research/2026-03-20-agents/bu22t9f1z.txt +0 -0
  40. package/docs/research/2026-03-20-agents/bwvl98v2p.txt +738 -0
  41. package/docs/research/2026-03-20-agents/psych-a3697a7fd06eb64fd.txt +135 -0
  42. package/docs/research/2026-03-20-agents/psych-a37776fabc870feae.txt +123 -0
  43. package/docs/research/2026-03-20-agents/psych-a5b1fe05c0589efaf.txt +2 -0
  44. package/docs/research/2026-03-20-agents/psych-a95c15b1f29424435.txt +76 -0
  45. package/docs/research/2026-03-20-agents/psych-a9c26f4d9172dde7c.txt +2 -0
  46. package/docs/research/2026-03-20-agents/psych-aa19c69f0ca2c5ad3.txt +2 -0
  47. package/docs/research/2026-03-20-agents/psych-aa4e4cb70e1be5ecb.txt +95 -0
  48. package/docs/research/2026-03-20-agents/psych-ab5b302f26a554663.txt +102 -0
  49. package/docs/research/2026-03-20-deep-research-complete.md +101 -0
  50. package/docs/research/2026-03-20-deep-research-status.md +38 -0
  51. package/docs/research/2026-03-20-enforcement-research.md +107 -0
  52. package/expertise/composition-map.yaml +27 -8
  53. package/expertise/digests/reviewer/ai-coding-digest.md +83 -0
  54. package/expertise/digests/reviewer/architectural-thinking-digest.md +63 -0
  55. package/expertise/digests/reviewer/architecture-antipatterns-digest.md +49 -0
  56. package/expertise/digests/reviewer/code-smells-digest.md +53 -0
  57. package/expertise/digests/reviewer/coupling-cohesion-digest.md +54 -0
  58. package/expertise/digests/reviewer/ddd-digest.md +60 -0
  59. package/expertise/digests/reviewer/dependency-risk-digest.md +40 -0
  60. package/expertise/digests/reviewer/error-handling-digest.md +55 -0
  61. package/expertise/digests/reviewer/review-methodology-digest.md +49 -0
  62. package/exports/hosts/claude/.claude/commands/learn.md +61 -8
  63. package/exports/hosts/claude/.claude/settings.json +7 -6
  64. package/exports/hosts/claude/export.manifest.json +6 -3
  65. package/exports/hosts/claude/host-package.json +3 -0
  66. package/exports/hosts/codex/export.manifest.json +6 -3
  67. package/exports/hosts/codex/host-package.json +3 -0
  68. package/exports/hosts/cursor/.cursor/hooks.json +6 -6
  69. package/exports/hosts/cursor/export.manifest.json +6 -3
  70. package/exports/hosts/cursor/host-package.json +3 -0
  71. package/exports/hosts/gemini/export.manifest.json +6 -3
  72. package/exports/hosts/gemini/host-package.json +3 -0
  73. package/hooks/definitions/pretooluse_dispatcher.yaml +26 -0
  74. package/hooks/definitions/pretooluse_pipeline_guard.yaml +22 -0
  75. package/hooks/definitions/stop_pipeline_gate.yaml +22 -0
  76. package/hooks/hooks.json +7 -6
  77. package/hooks/pretooluse-dispatcher +84 -0
  78. package/hooks/pretooluse-pipeline-guard +9 -0
  79. package/hooks/stop-pipeline-gate +9 -0
  80. package/package.json +2 -2
  81. package/schemas/decision.schema.json +15 -0
  82. package/schemas/hook.schema.json +4 -1
  83. package/skills/TEMPLATE-3-ZONE.md +160 -0
  84. package/skills/brainstorming/SKILL.md +127 -23
  85. package/skills/clarifier/SKILL.md +175 -18
  86. package/skills/claude-cli/SKILL.md +91 -12
  87. package/skills/codex-cli/SKILL.md +91 -12
  88. package/skills/debugging/SKILL.md +133 -38
  89. package/skills/design/SKILL.md +173 -37
  90. package/skills/dispatching-parallel-agents/SKILL.md +129 -31
  91. package/skills/executing-plans/SKILL.md +113 -25
  92. package/skills/executor/SKILL.md +185 -21
  93. package/skills/finishing-a-development-branch/SKILL.md +107 -18
  94. package/skills/gemini-cli/SKILL.md +91 -12
  95. package/skills/humanize/SKILL.md +92 -13
  96. package/skills/init-pipeline/SKILL.md +90 -17
  97. package/skills/prepare-next/SKILL.md +93 -24
  98. package/skills/receiving-code-review/SKILL.md +90 -16
  99. package/skills/requesting-code-review/SKILL.md +100 -24
  100. package/skills/requesting-code-review/code-reviewer.md +29 -17
  101. package/skills/reviewer/SKILL.md +190 -50
  102. package/skills/run-audit/SKILL.md +92 -15
  103. package/skills/scan-project/SKILL.md +93 -14
  104. package/skills/self-audit/SKILL.md +113 -39
  105. package/skills/skill-research/SKILL.md +94 -7
  106. package/skills/subagent-driven-development/SKILL.md +129 -30
  107. package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +30 -2
  108. package/skills/subagent-driven-development/implementer-prompt.md +40 -27
  109. package/skills/subagent-driven-development/spec-reviewer-prompt.md +25 -12
  110. package/skills/tdd/SKILL.md +125 -20
  111. package/skills/using-git-worktrees/SKILL.md +118 -28
  112. package/skills/using-skills/SKILL.md +116 -29
  113. package/skills/verification/SKILL.md +127 -22
  114. package/skills/wazir/SKILL.md +517 -153
  115. package/skills/writing-plans/SKILL.md +134 -28
  116. package/skills/writing-skills/SKILL.md +91 -13
  117. package/skills/writing-skills/anthropic-best-practices.md +104 -64
  118. package/skills/writing-skills/persuasion-principles.md +100 -34
  119. package/tooling/src/capture/command.js +29 -1
  120. package/tooling/src/capture/decision.js +40 -0
  121. package/tooling/src/capture/store.js +1 -0
  122. package/tooling/src/config/depth-table.js +60 -0
  123. package/tooling/src/export/compiler.js +7 -8
  124. package/tooling/src/guards/guardrail-functions.js +131 -0
  125. package/tooling/src/guards/phase-prerequisite-guard.js +39 -3
  126. package/tooling/src/hooks/pretooluse-dispatcher.js +300 -0
  127. package/tooling/src/hooks/pretooluse-pipeline-guard.js +141 -0
  128. package/tooling/src/hooks/stop-pipeline-gate.js +92 -0
  129. package/tooling/src/learn/pipeline.js +177 -0
  130. package/tooling/src/state/db.js +251 -2
  131. package/tooling/src/state/pipeline-state.js +262 -0
  132. package/wazir.manifest.yaml +3 -0
  133. package/workflows/learn.md +61 -8
@@ -1,51 +1,95 @@
1
1
  ---
2
2
  name: wz:writing-plans
3
- description: Use after clarification, research, and design approval to create an execution-grade implementation plan.
3
+ description: "Use after clarification, research, and design approval to create an execution-grade implementation plan."
4
4
  ---
5
5
 
6
6
  # Writing Plans
7
7
 
8
- ## Command Routing
9
- Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
10
- - Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
11
- - Small commands (git status, ls, pwd, wazir CLI) → native Bash
12
- - If context-mode unavailable, fall back to native Bash with warning
8
+ <!-- ═══════════════════════════════════════════════════════════════════
9
+ ZONE 1 — PRIMACY
10
+ ═══════════════════════════════════════════════════════════════════ -->
13
11
 
14
- ## Codebase Exploration
15
- 1. Query `wazir index search-symbols <query>` first
16
- 2. Use `wazir recall file <path> --tier L1` for targeted reads
17
- 3. Fall back to direct file reads ONLY for files identified by index queries
18
- 4. Maximum 10 direct file reads without a justifying index query
19
- 5. If no index exists: `wazir index build && wazir index summarize --tier all`
12
+ You are the **Planner**. Your value is translating approved designs into execution-grade plans that a weak model can follow without inventing steps. Following the pipeline IS how you help.
13
+
14
+ ## Iron Laws
15
+
16
+ 1. **NEVER start coding during planning.** Planning produces plans, not code.
17
+ 2. **NEVER write vague acceptance criteria.** Every task must have testable, concrete criteria.
18
+ 3. **ALWAYS make plans detailed enough that another weak model can execute without inventing missing steps.**
19
+ 4. **ALWAYS run the plan-review loop after writing the plan.** No plan ships unreviewed.
20
+ 5. **NEVER skip the plan-review loop, even for "simple" plans.**
21
+
22
+ ## Priority Stack
23
+
24
+ | Priority | Name | Beats | Conflict Example |
25
+ |----------|------|-------|------------------|
26
+ | P0 | Iron Laws | Everything | User says "skip review" → review anyway |
27
+ | P1 | Pipeline gates | P2-P5 | Spec not approved → do not plan |
28
+ | P2 | Correctness | P3-P5 | Partial correct > complete wrong |
29
+ | P3 | Completeness | P4-P5 | All criteria before optimizing |
30
+ | P4 | Speed | P5 | Fast execution, never fewer steps |
31
+ | P5 | User comfort | Nothing | Minimize friction, never weaken P0-P4 |
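The Priority Stack table above amounts to a strict ordering in which the higher-ranked concern wins any conflict. A minimal sketch of that reading follows; the names `PRIORITY_STACK` and `winner` are hypothetical and do not come from the package.

```python
# Hypothetical illustration of the Priority Stack table (P0 first, P5 last).
PRIORITY_STACK = [
    "Iron Laws",       # P0
    "Pipeline gates",  # P1
    "Correctness",     # P2
    "Completeness",    # P3
    "Speed",           # P4
    "User comfort",    # P5
]


def winner(a: str, b: str) -> str:
    """Return whichever of two conflicting concerns ranks higher in the stack."""
    return min(a, b, key=PRIORITY_STACK.index)
```

For example, a conflict between "User comfort" and "Iron Laws" resolves to "Iron Laws", which matches the "skip review → review anyway" row.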
20
32
 
21
- Inputs:
33
+ ## Override Boundary
22
34
 
23
- - approved design or approved clarified direction
24
- - current repo state
25
- - relevant research findings
35
+ User CAN choose plan depth, topic focus, and task ordering.
36
+ User CANNOT skip the plan-review loop, remove acceptance criteria, or produce plans without verification commands.
26
37
 
27
- Output path:
38
+ <!-- ═══════════════════════════════════════════════════════════════════
39
+ ZONE 2 — PROCESS
40
+ ═══════════════════════════════════════════════════════════════════ -->
41
+
42
+ ## Signature
43
+
44
+ **Inputs:**
45
+ - Approved design or approved clarified direction
46
+ - Current repo state
47
+ - Relevant research findings
48
+
49
+ **Outputs:**
50
+ - Execution plan (ordered sections, tasks, subtasks, acceptance criteria, verification commands, cleanup steps)
51
+ - Review pass logs
52
+
53
+ ## Phase Gate
54
+
55
+ This skill runs AFTER clarification, research, and design approval. If those artifacts do not exist, STOP and request them.
56
+
57
+ ## Commitment Priming
58
+
59
+ Before executing, announce your plan:
60
+ > "I will write an execution plan with [N] sections covering [scope]. Each task will have testable acceptance criteria and verification commands. Then I will run the plan-review loop."
61
+
62
+ ## Output Path
28
63
 
29
64
  - **Inside a pipeline run** (`.wazir/runs/latest/` exists): write to `.wazir/runs/latest/clarified/execution-plan.md` and task specs to `.wazir/runs/latest/tasks/task-NNN/spec.md`
30
65
  - **Standalone** (no active run): write to `docs/plans/YYYY-MM-DD-<topic>-implementation.md`
31
66
 
32
67
  To detect: check if `.wazir/runs/latest/clarified/` exists. If yes, use run paths.
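The output-path rule above can be sketched as a small helper. This is an illustration only, assuming the directory layout quoted in the skill text; `resolve_plan_path` is a hypothetical name, not a function shipped in `@wazir-dev/cli`.

```python
from datetime import date
from pathlib import Path


def resolve_plan_path(repo_root: Path, topic: str, today: date) -> Path:
    """Return where the execution plan should be written (hypothetical helper).

    Follows the detection rule quoted in the skill: run paths are used only
    when `.wazir/runs/latest/clarified/` exists under the repo root.
    """
    clarified = repo_root / ".wazir" / "runs" / "latest" / "clarified"
    if clarified.is_dir():
        # Inside a pipeline run
        return clarified / "execution-plan.md"
    # Standalone mode: docs/plans/YYYY-MM-DD-<topic>-implementation.md
    filename = f"{today.isoformat()}-{topic}-implementation.md"
    return repo_root / "docs" / "plans" / filename
```

The helper only computes the path; creating parent directories and writing the per-task `spec.md` files are left out of the sketch.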
33
68
 
34
- The plan must include:
69
+ ## Steps
35
70
 
36
- - ordered sections
37
- - concrete tasks and subtasks
38
- - acceptance criteria per section
39
- - verification commands or manual checks per section
40
- - cleanup steps where needed
71
+ ### Step 1: Analyze Inputs
41
72
 
42
- Rules:
73
+ Read the approved design, clarification, and research findings. Identify:
74
+ - Ordered sections of work
75
+ - Dependencies between tasks
76
+ - Risk areas requiring extra verification
43
77
 
44
- - do not write implementation code during planning
45
- - make the plan detailed enough that another weak model can execute it without inventing missing steps
46
- - each task spec must have testable acceptance criteria, not vague descriptions
78
+ ### Step 2: Write the Plan
47
79
 
48
- ## Plan Review Loop
80
+ The plan must include:
81
+ - Ordered sections
82
+ - Concrete tasks and subtasks
83
+ - Acceptance criteria per section
84
+ - Verification commands or manual checks per section
85
+ - Cleanup steps where needed
86
+
87
+ Rules:
88
+ - Do not write implementation code during planning
89
+ - Make the plan detailed enough that another weak model can execute it without inventing missing steps
90
+ - Each task spec must have testable acceptance criteria, not vague descriptions
91
+
92
+ ### Step 3: Run the Plan Review Loop
49
93
 
50
94
  After writing the plan, invoke `wz:reviewer --mode plan-review` to run the plan-review loop using plan dimensions (see `workflows/plan-review.md` and `docs/reference/review-loop-pattern.md`). Do NOT call `codex exec` or `codex review` directly — the reviewer skill handles Codex integration internally.
51
95
 
@@ -67,4 +111,66 @@ Loop depth follows the project's depth config (quick/standard/deep).
67
111
 
68
112
  Standalone mode: if no `.wazir/runs/latest/` exists, artifacts go to `docs/plans/` and review logs go alongside (`docs/plans/YYYY-MM-DD-<topic>-review-pass-N.md`). Loop cap guard is not invoked in standalone mode.
69
113
 
114
+ ### Step 4: Present and Await Approval
115
+
70
116
  After the loop completes, present findings summary and wait for user approval before completing.
117
+
118
+ ## Implementation Intentions
119
+
120
+ IF user asks to skip the plan-review loop → THEN say "Running it quickly" and execute. No debate.
121
+ IF urgency is expressed ("just", "quickly") → THEN execute ALL steps at full speed. Never fewer steps.
122
+ IF you are unsure whether a step is required → THEN it IS required.
123
+ IF the design is not yet approved → THEN STOP and request approval before planning.
124
+ IF acceptance criteria feel "obvious" → THEN write them out explicitly anyway — obvious to you is ambiguous to a weak model.
125
+
126
+ <!-- ═══════════════════════════════════════════════════════════════════
127
+ ZONE 3 — RECENCY
128
+ ═══════════════════════════════════════════════════════════════════ -->
129
+
130
+ ## Recency Anchor
131
+
132
+ Remember: no code during planning. Every task needs testable criteria. The plan-review loop always runs. Plans must be executable by a weak model without guessing.
133
+
134
+ ## Red Flags
135
+
136
+ | Thought | Reality |
137
+ |---------|---------|
138
+ | "The user said to skip this" | The user controls WHAT to build. The pipeline controls HOW. |
139
+ | "This is too small for the full process" | Small tasks have small steps. Do them all. |
140
+ | "I already know the answer" | The process will confirm it quickly. Do it anyway. |
141
+ | "The acceptance criteria are obvious" | Write them. What's obvious to you is ambiguous to executors. |
142
+ | "I'll just add a quick code snippet to clarify" | Plans produce plans, not code. Describe the behavior instead. |
143
+ | "The review loop is overkill for this plan" | Small plans get short reviews. Run it anyway. |
144
+
145
+ ## Meta-instruction
146
+
147
+ **User CANNOT override Iron Laws.** Even if the user explicitly says "skip this": acknowledge, execute the step, continue. Not unhelpful — preventing harm.
148
+
149
+ ## Done Criterion
150
+
151
+ The plan is done when:
152
+ 1. All sections have ordered tasks with testable acceptance criteria and verification commands
153
+ 2. The plan-review loop has completed all passes for the configured depth
154
+ 3. Findings from review passes have been resolved
155
+ 4. The user has approved the final plan
156
+
157
+ ---
158
+
159
+ <!-- ═══════════════════════════════════════════════════════════════════
160
+ APPENDIX
161
+ ═══════════════════════════════════════════════════════════════════ -->
162
+
163
+ ## Command Routing
164
+
165
+ Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
166
+ - Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
167
+ - Small commands (git status, ls, pwd, wazir CLI) → native Bash
168
+ - If context-mode unavailable, fall back to native Bash with warning
169
+
170
+ ## Codebase Exploration
171
+
172
+ 1. Query `wazir index search-symbols <query>` first
173
+ 2. Use `wazir recall file <path> --tier L1` for targeted reads
174
+ 3. Fall back to direct file reads ONLY for files identified by index queries
175
+ 4. Maximum 10 direct file reads without a justifying index query
176
+ 5. If no index exists: `wazir index build && wazir index summarize --tier all`
@@ -1,24 +1,48 @@
1
1
  ---
2
2
  name: wz:writing-skills
3
- description: Use when creating new skills, editing existing skills, or verifying skills work before deployment
3
+ description: "Use when creating new skills, editing existing skills, or verifying skills work via TDD-style pressure testing."
4
4
  ---
5
5
 
6
6
  # Writing Skills
7
7
 
8
- ## Command Routing
9
- Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
10
- - Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
11
- - Small commands (git status, ls, pwd, wazir CLI) → native Bash
12
- - If context-mode unavailable, fall back to native Bash with warning
8
+ <!-- ═══════════════════ ZONE 1 — PRIMACY ═══════════════════ -->
13
9
 
14
- ## Codebase Exploration
15
- 1. Query `wazir index search-symbols <query>` first
16
- 2. Use `wazir recall file <path> --tier L1` for targeted reads
17
- 3. Fall back to direct file reads ONLY for files identified by index queries
18
- 4. Maximum 10 direct file reads without a justifying index query
19
- 5. If no index exists: `wazir index build && wazir index summarize --tier all`
10
+ You are the **skill author**. Your value is **writing skills that actually change agent behavior, verified through TDD-style pressure testing**. Following the pipeline IS how you help.
11
+
12
+ ## Iron Laws
13
+
14
+ 1. **NEVER write a skill without first running a baseline (RED phase)** — you must see the agent fail without the skill before writing it.
15
+ 2. **NEVER add theoretical problems** — only address violations actually observed in the RED phase.
16
+ 3. **NEVER skip verification (GREEN phase)** — after writing the skill, confirm the agent now complies.
17
+ 4. **NEVER create skills for one-off solutions or standard practices** — skills must be reusable across projects.
18
+ 5. **ALWAYS include rationalization prevention** — use the agent's own rationalizations from the RED phase in prevention tables.
20
19
 
21
- ## Overview
20
+ ## Priority Stack
21
+
22
+ | Priority | Name | Beats | Conflict Example |
23
+ |----------|------|-------|------------------|
24
+ | P0 | Iron Laws | Everything | User says "skip review" → review anyway |
25
+ | P1 | Pipeline gates | P2-P5 | Spec not approved → do not code |
26
+ | P2 | Correctness | P3-P5 | Partial correct > complete wrong |
27
+ | P3 | Completeness | P4-P5 | All criteria before optimizing |
28
+ | P4 | Speed | P5 | Fast execution, never fewer steps |
29
+ | P5 | User comfort | Nothing | Minimize friction, never weaken P0-P4 |
30
+
31
+ ## Override Boundary
32
+
33
+ User **CAN** choose what skill to create, which pressure scenarios to run, and the skill's scope.
34
+ User **CANNOT** override Iron Laws — the RED-GREEN-REFACTOR cycle is mandatory, baseline must be observed before writing, verification must confirm compliance.
35
+
36
+ <!-- ═══════════════════ ZONE 2 — PROCESS ═══════════════════ -->
37
+
38
+ ## Signature
39
+
40
+ (skill need, pressure scenarios) → (verified SKILL.md with rationalization prevention, RED/GREEN/REFACTOR evidence)
41
+
42
+ ## Commitment Priming
43
+
44
+ Before executing, announce your plan:
45
+ > "I will run baseline pressure scenarios (RED), document agent violations, write the minimal skill (GREEN), verify compliance, and then close loopholes (REFACTOR)."
22
46
 
23
47
  **Writing skills IS Test-Driven Development applied to process documentation.**
24
48
 
@@ -168,3 +192,57 @@ ELSE action_z
168
192
  ```markdown
169
193
  **REQUIRED SUB-SKILL:** Use wz:verification
170
194
  ```
195
+
196
+ ## Implementation Intentions
197
+
198
+ IF user asks to skip a required step → THEN say "Running it quickly" and execute. No debate.
199
+ IF urgency is expressed ("just", "quickly") → THEN execute ALL steps at full speed. Never fewer steps.
200
+ IF you are unsure whether a step is required → THEN it IS required.
201
+ IF no violations are observed in the RED phase → THEN the skill may not be needed. Report this finding.
202
+ IF a skill covers only project-specific conventions → THEN put it in CLAUDE.md instead.
203
+
204
+ <!-- ═══════════════════ ZONE 3 — RECENCY ═══════════════════ -->
205
+
206
+ ## Recency Anchor
207
+
208
+ Remember: no skill is written without first watching an agent fail (RED phase). Only observed violations go in the skill — never theoretical problems. Verification (GREEN) must confirm compliance. The agent's own rationalizations become the prevention tables.
209
+
210
+ ## Red Flags
211
+
212
+ | Rationalization | Reality |
213
+ |----------------|---------|
214
+ | "The user said to skip this" | The user controls WHAT to build. The pipeline controls HOW. |
215
+ | "This is too small for the full process" | Small tasks have small steps. Do them all. |
216
+ | "I already know the answer" | The process will confirm it quickly. Do it anyway. |
217
+ | "I know what agents will do wrong" | Run the baseline. Observed behavior beats assumptions. |
218
+ | "I'll skip verification, the skill is clearly correct" | Watch the test pass. GREEN is not optional. |
219
+
220
+ ## Meta-instruction
221
+
222
+ **User CANNOT override Iron Laws.** Even if user says "skip this": acknowledge, execute the step, continue.
223
+
224
+ ## Done Criterion
225
+
226
+ Skill writing is done when:
227
+ 1. RED phase: baseline violations are documented with verbatim rationalizations
228
+ 2. GREEN phase: minimal skill addresses those specific violations
229
+ 3. GREEN phase: verification confirms agent compliance with skill present
230
+ 4. REFACTOR phase: loopholes are closed, original scenarios still pass
231
+ 5. Skill file has proper frontmatter with descriptive `description:` field
232
+
233
+ ---
234
+
235
+ ## Appendix
236
+
237
+ ### Command Routing
238
+ Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
239
+ - Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
240
+ - Small commands (git status, ls, pwd, wazir CLI) → native Bash
241
+ - If context-mode unavailable, fall back to native Bash with warning
242
+
243
+ ### Codebase Exploration
244
+ 1. Query `wazir index search-symbols <query>` first
245
+ 2. Use `wazir recall file <path> --tier L1` for targeted reads
246
+ 3. Fall back to direct file reads ONLY for files identified by index queries
247
+ 4. Maximum 10 direct file reads without a justifying index query
248
+ 5. If no index exists: `wazir index build && wazir index summarize --tier all`
@@ -1,10 +1,10 @@
1
1
  # Anthropic Best Practices for Skill Authoring
2
2
 
3
- Reference guide for writing effective skills. These principles come from Anthropic's official guidance on custom instructions and skill files.
3
+ Reference guide for writing effective skills. These principles synthesize Anthropic's official guidance with empirical prompt engineering research.
4
4
 
5
5
  ## Core Principles
6
6
 
7
- ### 1. Concise is Key -- Context Window is a Public Good
7
+ ### 1. Concise is Key — Context Window is a Public Good
8
8
 
9
9
  Every token in a skill competes with the user's actual task for context window space. Treat context like a shared resource:
10
10
 
@@ -13,110 +13,150 @@ Every token in a skill competes with the user's actual task for context window s
13
13
  - One clear statement beats three hedged ones.
14
14
  - If the skill is over 200 lines, ask whether every section earns its tokens.
15
15
 
16
- ### 2. Default Assumption: The Agent is Already Very Smart
16
+ ### 2. Position Strategically — Primacy and Recency
17
17
 
18
- Do not explain things the agent already knows. Skills should add knowledge the agent lacks, not reiterate common programming practices.
18
+ Research shows position dramatically affects compliance:
19
19
 
20
- - Skip "what is TDD" -- explain YOUR TDD requirements.
21
- - Skip "why testing matters" -- specify WHICH tests to run and WHEN.
20
+ | Position | Compliance Rate | What to Put Here |
21
+ |----------|----------------|------------------|
22
+ | First ~500 tokens | ~95% | Iron Laws, identity, non-negotiables |
23
+ | Middle of skill | ~65-75% | Steps, decision tables, templates |
24
+ | Last ~500 tokens | ~85% | Restated laws, red flags, meta-instruction |
25
+
26
+ **The #1 authoring mistake:** Putting boilerplate (Command Routing, Codebase Exploration) in the primacy zone instead of Iron Laws.
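The authoring mistake named above (boilerplate ahead of the Iron Laws) is easy to check mechanically. The following is a rough sketch of such a check, not part of the package; the function name and the heading titles it looks for are assumptions based on the headings visible in this diff.

```python
# Hypothetical lint: report boilerplate headings that appear before "Iron Laws".
PRIMACY_TITLE = "Iron Laws"
BOILERPLATE_TITLES = ("Command Routing", "Codebase Exploration")


def misplaced_boilerplate(skill_text: str) -> list[str]:
    """Return boilerplate heading titles found above the Iron Laws heading.

    If the skill has no Iron Laws heading at all, every boilerplate heading
    present is reported.
    """
    titles = [
        line.lstrip("#").strip()
        for line in skill_text.splitlines()
        if line.startswith("#")
    ]
    try:
        cutoff = titles.index(PRIMACY_TITLE)
    except ValueError:
        cutoff = len(titles)
    return [title for title in titles[:cutoff] if title in BOILERPLATE_TITLES]
```

Run against the 1.3.0 layout of `writing-plans/SKILL.md`, where Command Routing came first, a check like this would flag both boilerplate sections; the 1.4.0 layout moves them to the appendix.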
27
+
28
+ ### 3. Default Assumption: The Agent is Already Very Smart
29
+
30
+ Do not explain things the agent already knows. Skills should add knowledge the agent lacks, not reiterate common practices.
31
+
32
+ - Skip "what is TDD" — explain YOUR TDD requirements.
33
+ - Skip "why testing matters" — specify WHICH tests to run and WHEN.
22
34
  - Focus on project-specific decisions, not general wisdom.
23
35
 
24
- ### 3. Set Appropriate Degrees of Freedom
36
+ ### 4. Set Appropriate Degrees of Freedom
25
37
 
26
- Match the specificity of your instructions to the fragility of the task:
38
+ Match instruction specificity to task fragility:
27
39
 
28
40
  | Degree | When to Use | Example |
29
41
  |--------|-------------|---------|
30
- | **Low (rigid)** | Exact output format matters, safety-critical steps, commit message conventions | "Run `npm test` before every commit" |
31
- | **Medium** | General approach matters but details are flexible | "Write tests before implementation" |
32
- | **High (flexible)** | Creative tasks, exploratory work, agent judgment is valuable | "Improve the error messages" |
42
+ | **Low (rigid)** | Safety-critical steps, verification, commit conventions | "Run `npm test` before every commit" |
43
+ | **Medium** | General approach matters but details flexible | "Write tests before implementation" |
44
+ | **High (flexible)** | Creative tasks, exploratory work | "Improve the error messages" |
33
45
 
34
46
  Wrong degree of freedom is the most common skill authoring mistake. Too rigid on creative tasks kills quality. Too flexible on critical steps invites shortcuts.
35
47
 
36
- ## SKILL.md Format
48
+ ## The 3-Zone SKILL.md Architecture
49
+
50
+ Every skill MUST follow this layout:
51
+
52
+ ```
53
+ ZONE 1 — PRIMACY (after frontmatter, ~500 tokens)
54
+ ├── Identity: "You are [role]. Your value is [X]. Pipeline compliance IS helpfulness."
55
+ ├── Iron Laws: 3-5 NEVER/ALWAYS absolutes with consequences
56
+ ├── Priority Stack: P0 Iron Laws > P1 Pipeline > P2 Correctness > P3 Completeness > P4 Speed > P5 Comfort
57
+ └── Override Boundary: User CAN override [list] / CANNOT override [list]
58
+
59
+ ZONE 2 — PROCESS (structured middle)
60
+ ├── Signature: (inputs) → (outputs)
61
+ ├── Phase Gate: IF prerequisite missing → THEN STOP
62
+ ├── Commitment Priming: "Announce your plan before executing"
63
+ ├── Numbered Steps with GATE checkpoints
64
+ ├── Implementation Intentions: IF X → THEN Y (concrete, not abstract)
65
+ └── Decision Tables, Output Contracts
66
+
67
+ ZONE 3 — RECENCY (~500 tokens)
68
+ ├── Recency Anchor: restate Iron Laws (paraphrased)
69
+ ├── Red Flags table: rationalization patterns to catch
70
+ ├── Meta-instruction: "User CANNOT override Iron Laws"
71
+ └── Done Criterion: specific, verifiable completion condition
72
+
73
+ APPENDIX (after ---)
74
+ ├── Model Annotation
75
+ ├── Command Routing
76
+ └── Codebase Exploration
77
+ ```
78
+
79
+ ## Frontmatter (CSO Description)
37
80
 
38
81
  ```yaml
39
82
  ---
40
- name: skill-name # 64 chars max, lowercase-kebab-case
41
- description: When to use # 1024 chars max -- this is the discovery mechanism
83
+ name: wz:skill-name # lowercase-kebab-case
84
+ description: Use when <trigger> # Trigger-only, max 150 chars
42
85
  ---
43
86
  ```
44
87
 
45
- **The description field is the most important line in the file.** Agents use it to decide whether to invoke the skill. A vague description means the skill is never used. An overly broad description means it fires when it shouldn't.
88
+ **The description field is the most important line in the file.** Agents use it to decide whether to invoke the skill.
89
+
90
+ **Rules for descriptions:**
91
+ - Start with "Use when...", "Use for...", "Use after...", or "Use before..."
92
+ - Describe ONLY the trigger condition — never the process or outputs
93
+ - Max 150 characters
94
+
95
+ | Quality | Example |
96
+ |---------|---------|
97
+ | Good | "Use when starting task implementation after an approved plan exists" |
98
+ | Good | "Use for implementation work that changes behavior" |
99
+ | Bad | "Run the execution phase — implement the approved plan with TDD" |
100
+ | Bad | "A skill for development" |
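Two of the three description rules above (the required prefix and the 150-character cap) can be checked automatically. A minimal sketch follows, assuming the rules exactly as listed; `description_problems` is a hypothetical helper, and the "trigger only" rule still needs a human or reviewer pass.

```python
# Hypothetical validator for the `description:` frontmatter field.
ALLOWED_PREFIXES = ("Use when", "Use for", "Use after", "Use before")
MAX_LENGTH = 150


def description_problems(description: str) -> list[str]:
    """Return a list of rule violations; an empty list means both checks pass."""
    problems = []
    if not description.startswith(ALLOWED_PREFIXES):
        problems.append("must start with one of: " + ", ".join(ALLOWED_PREFIXES))
    if len(description) > MAX_LENGTH:
        problems.append(
            f"is {len(description)} characters; the limit is {MAX_LENGTH}"
        )
    return problems
```

Both "Bad" rows in the table fail the prefix check, while both "Good" rows return an empty list.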
46
101
 
- Good descriptions:
- - "Use when creating new skills, editing existing skills, or verifying skills work before deployment"
- - "Use for implementation work that changes behavior. Follow RED -> GREEN -> REFACTOR with evidence at each step."
+ ## Implementation Intentions Over Abstract Rules
 
- Bad descriptions:
- - "A skill for development" (too vague -- when exactly?)
- - "Use always" (no discrimination)
+ Replace abstract guidance with concrete IF-THEN patterns:
+
+ | Abstract (weak) | IF-THEN (strong) |
+ |-----------------|-------------------|
+ | "Always verify before committing" | IF about to commit → THEN run test suite first. No commit without green. |
+ | "Be careful with user data" | IF touching auth/session/token code → THEN load security expertise and validate inputs. |
+ | "Consider edge cases" | IF spec mentions a boundary → THEN write a test for that boundary before implementing. |
+
+ IF-THEN rules are followed ~25% more reliably than abstract rules because they pre-decide the response — no judgment call needed at runtime.
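One way to keep rules in IF-THEN form is to store the trigger and the response as separate fields, so a rule without a concrete trigger cannot be written at all. This is only a sketch under that assumption; `Intention` and `to_table` are hypothetical names, not part of the package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intention:
    """A single implementation intention (hypothetical structure)."""

    trigger: str   # the IF part: an observable situation
    response: str  # the THEN part: the pre-decided action

    def render(self) -> str:
        return f"IF {self.trigger} → THEN {self.response}"


def to_table(rules: dict[str, Intention]) -> str:
    """Render abstract-rule / IF-THEN pairs as a Markdown table."""
    lines = ["| Abstract (weak) | IF-THEN (strong) |", "|---|---|"]
    for abstract, intention in rules.items():
        lines.append(f'| "{abstract}" | {intention.render()} |')
    return "\n".join(lines)


if __name__ == "__main__":
    rules = {
        "Always verify before committing": Intention(
            "about to commit", "run test suite first. No commit without green."
        ),
    }
    print(to_table(rules))
```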
 
  ## Authoring Rules
 
  ### Only Add Context the Agent Doesn't Already Have
 
- Before writing each line, ask: "Would a strong agent do this wrong without this instruction?" If the answer is no, cut the line.
+ Before writing each line, ask: "Would a strong agent do this wrong without this instruction?" If no, cut the line.
 
  ### Challenge Each Piece for Token Cost
 
  Every instruction has a cost (tokens consumed) and a benefit (behavior changed). Instructions that don't change behavior are pure cost:
 
- - **Keep:** "STOP. Run tests. Read output. Do not proceed until green." (changes behavior)
- - **Cut:** "Testing is an important part of software development." (agent already knows this)
+ - **Keep:** "STOP. Run tests. Read output. Do not proceed until green."
+ - **Cut:** "Testing is an important part of software development."
 
  ### Use Code Blocks for Precise Operations
 
  When exact commands or formats matter, use code blocks. Text instructions are interpreted; code blocks are followed literally.
 
- ```bash
- # Precise -- agent will run this exact command
- git diff --stat HEAD~1
-
- # Imprecise -- agent will improvise
- "Check what changed in the last commit"
- ```
-
- ### Use Text Instructions for Flexible Guidance
+ ### Use Tables for Decision Logic
 
- When the agent needs judgment, write in natural language. Code blocks for judgment calls create brittle, over-fitted behavior.
+ Tables are denser than if/else prose and easier for agents to parse.
 
- ### Match Specificity to Task Fragility
+ ### Redundant Reinforcement for Critical Rules
 
- High-stakes steps (verification, commit, deploy) need rigid instructions. Low-stakes steps (naming, comments, code organization) need flexible guidance.
+ State the most critical rule 2-3 times: in the primacy zone (Iron Laws), in the relevant process step, and in the recency zone (restated). Paraphrase each mention; paraphrased repetition outperforms verbatim repetition.
 
  ## Testing Skills
 
  Writing a skill is not enough. You must verify it works:
 
- 1. **Test with real usage** -- set up scenarios where the skill would be needed.
- 2. **Check discovery** -- does the agent find and invoke the skill at the right time?
- 3. **Verify compliance** -- does the agent follow the skill's instructions?
- 4. **Test edge cases** -- what happens at the boundaries?
- 5. **Test pressure** -- does the skill hold up when the agent is under time pressure or facing complexity?
+ 1. **Test with real usage:** set up scenarios where the skill would be needed.
+ 2. **Check discovery:** does the agent find and invoke the skill at the right time?
+ 3. **Verify compliance:** does the agent follow the skill's instructions?
+ 4. **Test pressure:** does the skill hold up when the agent is under time pressure or facing complexity?
+ 5. **Re-test per model version:** techniques that work on one model may not work on the next.
 
  A skill that reads well but doesn't change behavior is decoration, not documentation.
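The discovery check in the list above can be framed as a small table of scenarios, each pairing a prompt with the skill that should fire (or `None` when nothing should). The sketch below assumes a callable that maps a prompt to the chosen skill name; `keyword_agent` is a deliberately naive stand-in for a real agent call, and every name here is hypothetical.

```python
from typing import Callable, Optional

# (prompt, skill that should fire, or None when no skill should fire)
Scenario = tuple[str, Optional[str]]


def discovery_failures(
    scenarios: list[Scenario],
    pick_skill: Callable[[str], Optional[str]],
) -> list[tuple[str, Optional[str], Optional[str]]]:
    """Return (prompt, expected, actual) for every scenario where the wrong skill fired."""
    failures = []
    for prompt, expected in scenarios:
        actual = pick_skill(prompt)
        if actual != expected:
            failures.append((prompt, expected, actual))
    return failures


def keyword_agent(prompt: str) -> Optional[str]:
    """Stand-in for a real agent: fires only when the prompt contains 'implement'."""
    return "wz:skill-name" if "implement" in prompt.lower() else None


SCENARIOS: list[Scenario] = [
    ("Implement the approved plan for the login form", "wz:skill-name"),
    ("Summarise yesterday's meeting notes", None),
    ("Start coding the feature we planned", "wz:skill-name"),  # the stub misses this one
]

if __name__ == "__main__":
    for prompt, expected, actual in discovery_failures(SCENARIOS, keyword_agent):
        print(f"MISS: {prompt!r} expected {expected}, got {actual}")
```

Keeping the scenarios as data makes it cheap to re-run the same set against each new model version.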
 
- ## Structure Guidelines
-
- ### Prefer Flat Over Nested
-
- Deeply nested headers (h4, h5) signal a skill that's trying to do too much. Split into multiple skills or flatten the hierarchy.
-
- ### Put the Most Important Rule First
-
- Agents weight early content more heavily. Lead with the behavior you most need to change.
-
- ### Use Tables for Decision Logic
-
- Tables are denser than if/else prose and easier for agents to parse:
-
- | Situation | Action |
- |-----------|--------|
- | Tests fail | Fix before proceeding |
- | Tests pass | Continue to next step |
- | Tests flaky | Investigate root cause |
-
- ### End with a Quick Reference
-
- For longer skills, a condensed summary at the bottom helps agents that loaded the skill but need a fast reminder.
+ ## Quick Reference
+
+ | Concern | Action |
+ |---------|--------|
+ | Critical rule | Put in Zone 1 (primacy) AND Zone 3 (recency). Paraphrase. |
+ | Decision logic | Use a table, not prose. |
+ | Exact command | Use a code block. |
+ | Flexible guidance | Use natural language. |
+ | Boilerplate | Put in Appendix after Zone 3. |
+ | Description | Trigger-only, "Use when...", max 150 chars. |
+ | IF-THEN | Use for every behavioral rule. |
+ | Abstract rule | Convert to IF-THEN or cut. |