aw-ecc 1.4.31 → 1.4.47

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (259)
  1. package/.claude-plugin/plugin.json +1 -1
  2. package/.codex/hooks/aw-post-tool-use.sh +8 -2
  3. package/.codex/hooks/aw-session-start.sh +11 -4
  4. package/.codex/hooks/aw-stop.sh +8 -2
  5. package/.codex/hooks/aw-user-prompt-submit.sh +10 -2
  6. package/.codex/hooks.json +8 -8
  7. package/.cursor/INSTALL.md +7 -5
  8. package/.cursor/hooks/adapter.js +41 -4
  9. package/.cursor/hooks/after-agent-response.js +62 -0
  10. package/.cursor/hooks/before-submit-prompt.js +7 -1
  11. package/.cursor/hooks/post-tool-use-failure.js +21 -0
  12. package/.cursor/hooks/post-tool-use.js +39 -0
  13. package/.cursor/hooks/shared/aw-phase-definitions.js +53 -0
  14. package/.cursor/hooks/shared/aw-phase-runner.js +3 -1
  15. package/.cursor/hooks/subagent-start.js +22 -4
  16. package/.cursor/hooks/subagent-stop.js +18 -1
  17. package/.cursor/hooks.json +23 -2
  18. package/.opencode/package.json +1 -1
  19. package/AGENTS.md +3 -3
  20. package/README.md +5 -5
  21. package/commands/adk.md +52 -0
  22. package/commands/build.md +22 -9
  23. package/commands/deploy.md +12 -0
  24. package/commands/execute.md +9 -0
  25. package/commands/feature.md +333 -0
  26. package/commands/investigate.md +18 -5
  27. package/commands/plan.md +23 -9
  28. package/commands/publish.md +65 -0
  29. package/commands/review.md +12 -0
  30. package/commands/ship.md +12 -0
  31. package/commands/test.md +12 -0
  32. package/commands/verify.md +9 -0
  33. package/hooks/hooks.json +36 -0
  34. package/manifests/install-components.json +8 -0
  35. package/manifests/install-modules.json +83 -0
  36. package/manifests/install-profiles.json +7 -0
  37. package/package.json +1 -1
  38. package/scripts/ci/validate-rules.js +51 -0
  39. package/scripts/cursor-aw-home/hooks.json +23 -2
  40. package/scripts/cursor-aw-hooks/adapter.js +41 -4
  41. package/scripts/cursor-aw-hooks/before-submit-prompt.js +7 -1
  42. package/scripts/hooks/aw-usage-commit-created.js +32 -0
  43. package/scripts/hooks/aw-usage-post-tool-use-failure.js +56 -0
  44. package/scripts/hooks/aw-usage-post-tool-use.js +242 -0
  45. package/scripts/hooks/aw-usage-prompt-submit.js +112 -0
  46. package/scripts/hooks/aw-usage-session-start.js +48 -0
  47. package/scripts/hooks/aw-usage-stop.js +182 -0
  48. package/scripts/hooks/aw-usage-telemetry-send.js +84 -0
  49. package/scripts/hooks/cost-tracker.js +3 -23
  50. package/scripts/hooks/shared/aw-phase-definitions.js +53 -0
  51. package/scripts/hooks/shared/aw-phase-runner.js +3 -1
  52. package/scripts/lib/aw-hook-contract.js +2 -2
  53. package/scripts/lib/aw-pricing.js +306 -0
  54. package/scripts/lib/aw-usage-telemetry.js +472 -0
  55. package/scripts/lib/codex-hook-config.js +8 -8
  56. package/scripts/lib/cursor-hook-config.js +25 -10
  57. package/scripts/lib/install-targets/codex-home.js +7 -0
  58. package/scripts/lib/install-targets/cursor-project.js +3 -0
  59. package/scripts/lib/install-targets/helpers.js +20 -3
  60. package/skills/aw-adk/SKILL.md +317 -0
  61. package/skills/aw-adk/agents/analyzer.md +113 -0
  62. package/skills/aw-adk/agents/comparator.md +113 -0
  63. package/skills/aw-adk/agents/grader.md +115 -0
  64. package/skills/aw-adk/assets/eval_review.html +76 -0
  65. package/skills/aw-adk/eval-viewer/generate_review.py +164 -0
  66. package/skills/aw-adk/eval-viewer/viewer.html +181 -0
  67. package/skills/aw-adk/evals/eval-colocated-placement.md +84 -0
  68. package/skills/aw-adk/evals/eval-create-agent.md +90 -0
  69. package/skills/aw-adk/evals/eval-create-command.md +98 -0
  70. package/skills/aw-adk/evals/eval-create-eval.md +89 -0
  71. package/skills/aw-adk/evals/eval-create-rule.md +99 -0
  72. package/skills/aw-adk/evals/eval-create-skill.md +97 -0
  73. package/skills/aw-adk/evals/eval-delete-agent.md +79 -0
  74. package/skills/aw-adk/evals/eval-delete-command.md +89 -0
  75. package/skills/aw-adk/evals/eval-delete-rule.md +86 -0
  76. package/skills/aw-adk/evals/eval-delete-skill.md +90 -0
  77. package/skills/aw-adk/evals/eval-meta-eval-coverage.md +78 -0
  78. package/skills/aw-adk/evals/eval-meta-eval-determinism.md +81 -0
  79. package/skills/aw-adk/evals/eval-meta-eval-false-pass.md +81 -0
  80. package/skills/aw-adk/evals/eval-score-accuracy.md +95 -0
  81. package/skills/aw-adk/evals/eval-type-redirect.md +68 -0
  82. package/skills/aw-adk/evals/evals.json +96 -0
  83. package/skills/aw-adk/references/artifact-wiring.md +162 -0
  84. package/skills/aw-adk/references/cross-ide-mapping.md +71 -0
  85. package/skills/aw-adk/references/eval-placement-guide.md +183 -0
  86. package/skills/aw-adk/references/external-resources.md +75 -0
  87. package/skills/aw-adk/references/getting-started.md +66 -0
  88. package/skills/aw-adk/references/registry-structure.md +152 -0
  89. package/skills/aw-adk/references/rubric-agent.md +36 -0
  90. package/skills/aw-adk/references/rubric-command.md +36 -0
  91. package/skills/aw-adk/references/rubric-eval.md +36 -0
  92. package/skills/aw-adk/references/rubric-meta-eval.md +132 -0
  93. package/skills/aw-adk/references/rubric-rule.md +36 -0
  94. package/skills/aw-adk/references/rubric-skill.md +36 -0
  95. package/skills/aw-adk/references/schemas.md +222 -0
  96. package/skills/aw-adk/references/template-agent.md +251 -0
  97. package/skills/aw-adk/references/template-command.md +279 -0
  98. package/skills/aw-adk/references/template-eval.md +176 -0
  99. package/skills/aw-adk/references/template-rule.md +119 -0
  100. package/skills/aw-adk/references/template-skill.md +123 -0
  101. package/skills/aw-adk/references/type-classifier.md +98 -0
  102. package/skills/aw-adk/references/writing-good-agents.md +227 -0
  103. package/skills/aw-adk/references/writing-good-commands.md +258 -0
  104. package/skills/aw-adk/references/writing-good-evals.md +271 -0
  105. package/skills/aw-adk/references/writing-good-rules.md +214 -0
  106. package/skills/aw-adk/references/writing-good-skills.md +159 -0
  107. package/skills/aw-adk/scripts/aggregate-benchmark.py +190 -0
  108. package/skills/aw-adk/scripts/lint-artifact.sh +211 -0
  109. package/skills/aw-adk/scripts/score-artifact.sh +179 -0
  110. package/skills/aw-adk/scripts/trigger-eval.py +192 -0
  111. package/skills/aw-build/SKILL.md +19 -2
  112. package/skills/aw-deploy/SKILL.md +65 -3
  113. package/skills/aw-design/SKILL.md +156 -0
  114. package/skills/aw-design/references/highrise-tokens.md +394 -0
  115. package/skills/aw-design/references/micro-interactions.md +76 -0
  116. package/skills/aw-design/references/prompt-template.md +160 -0
  117. package/skills/aw-design/references/quality-checklist.md +70 -0
  118. package/skills/aw-design/references/self-review.md +497 -0
  119. package/skills/aw-design/references/stitch-workflow.md +127 -0
  120. package/skills/aw-feature/SKILL.md +293 -0
  121. package/skills/aw-investigate/SKILL.md +17 -0
  122. package/skills/aw-plan/SKILL.md +34 -3
  123. package/skills/aw-publish/SKILL.md +300 -0
  124. package/skills/aw-publish/evals/eval-confirmation-gate.md +60 -0
  125. package/skills/aw-publish/evals/eval-intent-detection.md +111 -0
  126. package/skills/aw-publish/evals/eval-push-modes.md +67 -0
  127. package/skills/aw-publish/evals/eval-rules-push.md +60 -0
  128. package/skills/aw-publish/evals/evals.json +29 -0
  129. package/skills/aw-publish/references/push-modes.md +38 -0
  130. package/skills/aw-review/SKILL.md +88 -9
  131. package/skills/aw-rules-review/SKILL.md +124 -0
  132. package/skills/aw-rules-review/agents/openai.yaml +3 -0
  133. package/skills/aw-rules-review/scripts/generate-review-template.mjs +323 -0
  134. package/skills/aw-ship/SKILL.md +16 -0
  135. package/skills/aw-spec/SKILL.md +15 -0
  136. package/skills/aw-tasks/SKILL.md +15 -0
  137. package/skills/aw-test/SKILL.md +16 -0
  138. package/skills/aw-yolo/SKILL.md +4 -0
  139. package/skills/diagnose/SKILL.md +121 -0
  140. package/skills/diagnose/scripts/hitl-loop.template.sh +41 -0
  141. package/skills/finish-only-when-green/SKILL.md +265 -0
  142. package/skills/grill-me/SKILL.md +24 -0
  143. package/skills/grill-with-docs/SKILL.md +92 -0
  144. package/skills/grill-with-docs/adr-format.md +47 -0
  145. package/skills/grill-with-docs/context-format.md +67 -0
  146. package/skills/improve-codebase-architecture/SKILL.md +75 -0
  147. package/skills/improve-codebase-architecture/deepening.md +37 -0
  148. package/skills/improve-codebase-architecture/interface-design.md +44 -0
  149. package/skills/improve-codebase-architecture/language.md +53 -0
  150. package/skills/local-ghl-setup-from-screenshot/SKILL.md +538 -0
  151. package/skills/tdd/SKILL.md +115 -0
  152. package/skills/tdd/deep-modules.md +33 -0
  153. package/skills/tdd/interface-design.md +31 -0
  154. package/skills/tdd/mocking.md +59 -0
  155. package/skills/tdd/refactoring.md +10 -0
  156. package/skills/tdd/tests.md +61 -0
  157. package/skills/to-issues/SKILL.md +62 -0
  158. package/skills/to-prd/SKILL.md +75 -0
  159. package/skills/using-aw-skills/SKILL.md +170 -237
  160. package/skills/using-aw-skills/hooks/session-start.sh +11 -41
  161. package/skills/zoom-out/SKILL.md +24 -0
  162. package/.cursor/rules/common-agents.md +0 -53
  163. package/.cursor/rules/common-aw-routing.md +0 -43
  164. package/.cursor/rules/common-coding-style.md +0 -52
  165. package/.cursor/rules/common-development-workflow.md +0 -33
  166. package/.cursor/rules/common-git-workflow.md +0 -28
  167. package/.cursor/rules/common-hooks.md +0 -34
  168. package/.cursor/rules/common-patterns.md +0 -35
  169. package/.cursor/rules/common-performance.md +0 -59
  170. package/.cursor/rules/common-security.md +0 -33
  171. package/.cursor/rules/common-testing.md +0 -33
  172. package/.cursor/skills/api-and-interface-design/SKILL.md +0 -75
  173. package/.cursor/skills/article-writing/SKILL.md +0 -85
  174. package/.cursor/skills/aw-brainstorm/SKILL.md +0 -115
  175. package/.cursor/skills/aw-build/SKILL.md +0 -152
  176. package/.cursor/skills/aw-build/evals/build-stage-cases.json +0 -28
  177. package/.cursor/skills/aw-debug/SKILL.md +0 -49
  178. package/.cursor/skills/aw-deploy/SKILL.md +0 -101
  179. package/.cursor/skills/aw-deploy/evals/deploy-stage-cases.json +0 -32
  180. package/.cursor/skills/aw-execute/SKILL.md +0 -47
  181. package/.cursor/skills/aw-execute/references/mode-code.md +0 -47
  182. package/.cursor/skills/aw-execute/references/mode-docs.md +0 -28
  183. package/.cursor/skills/aw-execute/references/mode-infra.md +0 -44
  184. package/.cursor/skills/aw-execute/references/mode-migration.md +0 -58
  185. package/.cursor/skills/aw-execute/references/worker-implementer.md +0 -26
  186. package/.cursor/skills/aw-execute/references/worker-parallel-worker.md +0 -23
  187. package/.cursor/skills/aw-execute/references/worker-quality-reviewer.md +0 -23
  188. package/.cursor/skills/aw-execute/references/worker-spec-reviewer.md +0 -23
  189. package/.cursor/skills/aw-execute/scripts/build-worker-bundle.js +0 -229
  190. package/.cursor/skills/aw-finish/SKILL.md +0 -111
  191. package/.cursor/skills/aw-investigate/SKILL.md +0 -109
  192. package/.cursor/skills/aw-plan/SKILL.md +0 -368
  193. package/.cursor/skills/aw-prepare/SKILL.md +0 -118
  194. package/.cursor/skills/aw-review/SKILL.md +0 -118
  195. package/.cursor/skills/aw-ship/SKILL.md +0 -115
  196. package/.cursor/skills/aw-spec/SKILL.md +0 -104
  197. package/.cursor/skills/aw-tasks/SKILL.md +0 -138
  198. package/.cursor/skills/aw-test/SKILL.md +0 -118
  199. package/.cursor/skills/aw-verify/SKILL.md +0 -51
  200. package/.cursor/skills/aw-yolo/SKILL.md +0 -111
  201. package/.cursor/skills/browser-testing-with-devtools/SKILL.md +0 -81
  202. package/.cursor/skills/bun-runtime/SKILL.md +0 -84
  203. package/.cursor/skills/ci-cd-and-automation/SKILL.md +0 -71
  204. package/.cursor/skills/code-simplification/SKILL.md +0 -74
  205. package/.cursor/skills/content-engine/SKILL.md +0 -88
  206. package/.cursor/skills/context-engineering/SKILL.md +0 -74
  207. package/.cursor/skills/deprecation-and-migration/SKILL.md +0 -75
  208. package/.cursor/skills/documentation-and-adrs/SKILL.md +0 -75
  209. package/.cursor/skills/documentation-lookup/SKILL.md +0 -90
  210. package/.cursor/skills/frontend-slides/SKILL.md +0 -184
  211. package/.cursor/skills/frontend-slides/STYLE_PRESETS.md +0 -330
  212. package/.cursor/skills/frontend-ui-engineering/SKILL.md +0 -68
  213. package/.cursor/skills/git-workflow-and-versioning/SKILL.md +0 -75
  214. package/.cursor/skills/idea-refine/SKILL.md +0 -84
  215. package/.cursor/skills/incremental-implementation/SKILL.md +0 -75
  216. package/.cursor/skills/investor-materials/SKILL.md +0 -96
  217. package/.cursor/skills/investor-outreach/SKILL.md +0 -76
  218. package/.cursor/skills/market-research/SKILL.md +0 -75
  219. package/.cursor/skills/mcp-server-patterns/SKILL.md +0 -67
  220. package/.cursor/skills/nextjs-turbopack/SKILL.md +0 -44
  221. package/.cursor/skills/performance-optimization/SKILL.md +0 -77
  222. package/.cursor/skills/security-and-hardening/SKILL.md +0 -70
  223. package/.cursor/skills/using-aw-skills/SKILL.md +0 -290
  224. package/.cursor/skills/using-aw-skills/evals/skill-trigger-cases.tsv +0 -25
  225. package/.cursor/skills/using-aw-skills/evals/test-skill-triggers.sh +0 -171
  226. package/.cursor/skills/using-aw-skills/hooks/hooks.json +0 -9
  227. package/.cursor/skills/using-aw-skills/hooks/session-start.sh +0 -67
  228. package/.cursor/skills/using-platform-skills/SKILL.md +0 -163
  229. package/.cursor/skills/using-platform-skills/evals/platform-selection-cases.json +0 -52
  230. /package/.cursor/rules/{golang-coding-style.md → golang-coding-style.mdc} +0 -0
  231. /package/.cursor/rules/{golang-hooks.md → golang-hooks.mdc} +0 -0
  232. /package/.cursor/rules/{golang-patterns.md → golang-patterns.mdc} +0 -0
  233. /package/.cursor/rules/{golang-security.md → golang-security.mdc} +0 -0
  234. /package/.cursor/rules/{golang-testing.md → golang-testing.mdc} +0 -0
  235. /package/.cursor/rules/{kotlin-coding-style.md → kotlin-coding-style.mdc} +0 -0
  236. /package/.cursor/rules/{kotlin-hooks.md → kotlin-hooks.mdc} +0 -0
  237. /package/.cursor/rules/{kotlin-patterns.md → kotlin-patterns.mdc} +0 -0
  238. /package/.cursor/rules/{kotlin-security.md → kotlin-security.mdc} +0 -0
  239. /package/.cursor/rules/{kotlin-testing.md → kotlin-testing.mdc} +0 -0
  240. /package/.cursor/rules/{php-coding-style.md → php-coding-style.mdc} +0 -0
  241. /package/.cursor/rules/{php-hooks.md → php-hooks.mdc} +0 -0
  242. /package/.cursor/rules/{php-patterns.md → php-patterns.mdc} +0 -0
  243. /package/.cursor/rules/{php-security.md → php-security.mdc} +0 -0
  244. /package/.cursor/rules/{php-testing.md → php-testing.mdc} +0 -0
  245. /package/.cursor/rules/{python-coding-style.md → python-coding-style.mdc} +0 -0
  246. /package/.cursor/rules/{python-hooks.md → python-hooks.mdc} +0 -0
  247. /package/.cursor/rules/{python-patterns.md → python-patterns.mdc} +0 -0
  248. /package/.cursor/rules/{python-security.md → python-security.mdc} +0 -0
  249. /package/.cursor/rules/{python-testing.md → python-testing.mdc} +0 -0
  250. /package/.cursor/rules/{swift-coding-style.md → swift-coding-style.mdc} +0 -0
  251. /package/.cursor/rules/{swift-hooks.md → swift-hooks.mdc} +0 -0
  252. /package/.cursor/rules/{swift-patterns.md → swift-patterns.mdc} +0 -0
  253. /package/.cursor/rules/{swift-security.md → swift-security.mdc} +0 -0
  254. /package/.cursor/rules/{swift-testing.md → swift-testing.mdc} +0 -0
  255. /package/.cursor/rules/{typescript-coding-style.md → typescript-coding-style.mdc} +0 -0
  256. /package/.cursor/rules/{typescript-hooks.md → typescript-hooks.mdc} +0 -0
  257. /package/.cursor/rules/{typescript-patterns.md → typescript-patterns.mdc} +0 -0
  258. /package/.cursor/rules/{typescript-security.md → typescript-security.mdc} +0 -0
  259. /package/.cursor/rules/{typescript-testing.md → typescript-testing.mdc} +0 -0
@@ -0,0 +1,279 @@
1
+ # Command Template
2
+
3
+ Copy the scaffold below as your starting point. Replace all `<placeholder>` tokens.
4
+
5
+ ---
6
+
7
+ ## Scaffold
8
+
9
+ ````markdown
10
+ ---
11
+ name: <namespace>:<command-slug>
12
+ description: "<1-2 sentences. What workflow this automates and when to use it.>"
13
+ argument-hint: "[target] [--flag]"
14
+ mcp: []
15
+ ---
16
+
17
+ # <Command Display Name>
18
+
19
+ <1-2 sentence purpose. What end-to-end workflow does this command automate?>
20
+
21
+ ## Protocol
22
+
23
+ > **AW-PROTOCOL**: This command follows the AW orchestration protocol.
24
+ > All phases execute sequentially. Each phase has defined inputs, outputs,
25
+ > and checkpoints. Failure at any checkpoint triggers the on-failure handler
26
+ > before proceeding.
27
+
28
+ ### Skill Loading Gate
29
+
30
+ > **BLOCKING**: Before executing ANY phase, resolve and load the following skills.
31
+ > Do not proceed until all skills are confirmed loaded.
32
+
33
+ | Skill | Purpose | Required |
34
+ |-------|---------|----------|
35
+ | `<namespace>-<skill-1>` | <what it provides> | Yes |
36
+ | `<namespace>-<skill-2>` | <what it provides> | Yes |
37
+ | `<namespace>-<skill-3>` | <what it provides> | No |
38
+
39
+ ```
40
+ Resolve: <skill-1>, <skill-2>
41
+ Confirm: all loaded
42
+ Proceed: Phase 0
43
+ ```
44
+
45
+ ## Core Principles
46
+
47
+ 1. **<Principle 1>** — <Why this principle matters for this workflow>
48
+ 2. **<Principle 2>** — <Why this principle matters>
49
+ 3. **<Principle 3>** — <Why this principle matters>
50
+
51
+ ## Agent Roster
52
+
53
+ | Agent | Role | Phase(s) | Model |
54
+ |-------|------|----------|-------|
55
+ | `<agent-1>` | <what it does> | <phase numbers> | sonnet |
56
+ | `<agent-2>` | <what it does> | <phase numbers> | sonnet |
57
+ | `<agent-3>` | <what it does> | <phase numbers> | haiku |
58
+
59
+ ## Phase 0: Initialize
60
+
61
+ **Purpose:** Validate inputs, resolve paths, establish workspace.
62
+
63
+ 1. Parse arguments: `<expected arguments>`
64
+ 2. Validate target exists: `<validation command>`
65
+ 3. Create workspace directory:
66
+
67
+ ```bash
68
+ mkdir -p <workspace-path>
69
+ ```
70
+
71
+ 4. Snapshot current state (for rollback):
72
+
73
+ ```bash
74
+ <snapshot command>
75
+ ```
76
+
77
+ **Output:** Validated inputs, workspace path, snapshot reference
78
+ **Checkpoint:** All inputs valid, workspace exists, snapshot saved
79
+ **On-failure:** Report missing inputs with usage example, exit
80
+
81
+ ---
82
+
83
+ ## Phase 1: <Phase Name>
84
+
85
+ **Purpose:** <What this phase accomplishes and why it comes first>
86
+
87
+ **Agent:** `<agent-name>`
88
+ **Input:** <What this phase receives from Phase 0 or prior phase>
89
+
90
+ ### Steps
91
+
92
+ 1. <Step with concrete action>
93
+ 2. <Step with concrete action>
94
+ 3. <Step with concrete action>
95
+
96
+ ```bash
97
+ # Example command for this phase
98
+ <command>
99
+ ```
100
+
101
+ **Output:** <Specific artifacts this phase produces>
102
+ **Checkpoint:** <Verifiable criteria — how to confirm this phase succeeded>
103
+ **On-failure:** <What to do if the checkpoint fails — retry, skip, escalate>
104
+
105
+ ---
106
+
107
+ ## Phase 2: <Phase Name>
108
+
109
+ **Purpose:** <What this phase accomplishes>
110
+
111
+ **Agent:** `<agent-name>`
112
+ **Input:** <Output from Phase 1>
113
+
114
+ ### Steps
115
+
116
+ 1. <Step with concrete action>
117
+ 2. <Step with concrete action>
118
+
119
+ **Output:** <Artifacts produced>
120
+ **Checkpoint:** <Success criteria>
121
+ **On-failure:** <Recovery strategy>
122
+
123
+ ---
124
+
125
+ ## Phase 3: <Phase Name>
126
+
127
+ **Purpose:** <What this phase accomplishes>
128
+
129
+ **Agent:** `<agent-name>`
130
+ **Input:** <Output from Phase 2>
131
+
132
+ ### Steps
133
+
134
+ 1. <Step with concrete action>
135
+ 2. <Step with concrete action>
136
+
137
+ **Output:** <Artifacts produced>
138
+ **Checkpoint:** <Success criteria>
139
+ **On-failure:** <Recovery strategy>
140
+
141
+ ---
142
+
143
+ ## Phase N: <Human Checkpoint> (optional)
144
+
145
+ **Purpose:** Pause for human review before irreversible actions.
146
+
147
+ **Input:** <Summary of all prior phase outputs>
148
+
149
+ ### Review Prompt
150
+
151
+ ```
152
+ The following changes are ready for <action>:
153
+
154
+ <summary of changes>
155
+
156
+ Proceed? [y/n]
157
+ ```
158
+
159
+ **On-approve:** Continue to next phase
160
+ **On-reject:** <What to do — rollback, revise, or exit>
161
+
162
+ ---
163
+
164
+ ## Phase N+1: Deliver
165
+
166
+ **Purpose:** Produce final deliverables and report results.
167
+
168
+ ### Steps
169
+
170
+ 1. Aggregate outputs from all phases
171
+ 2. Generate summary report
172
+ 3. Clean up workspace (if applicable)
173
+
174
+ **Output:** Final deliverables (see table below)
175
+ **Checkpoint:** All required deliverables exist and pass validation
176
+
177
+ ## Compound Learnings
178
+
179
+ <Patterns discovered across phases that should be captured for future runs.
180
+ This section is populated after the first few executions.>
181
+
182
+ - <Learning 1 — e.g., "Phase 2 consistently takes 3x longer than Phase 1; consider parallelizing sub-steps">
183
+ - <Learning 2 — e.g., "Agent X produces better output when given Phase 1 output as structured JSON, not prose">
184
+
185
+ ## Output Format
186
+
187
+ ```
188
+ ## <Command Name> Results
189
+
190
+ **Status:** <COMPLETE | PARTIAL | FAILED>
191
+ **Duration:** <time>
192
+ **Phases completed:** <N/M>
193
+
194
+ ### Phase Summary
195
+ | Phase | Status | Key Output |
196
+ |-------|--------|------------|
197
+ | 0: Init | PASS | Workspace at <path> |
198
+ | 1: <name> | PASS | <output summary> |
199
+ | 2: <name> | PASS | <output summary> |
200
+
201
+ ### Deliverables
202
+ <list of produced artifacts with paths>
203
+
204
+ ### Issues
205
+ <any failures, warnings, or items needing follow-up>
206
+ ```
207
+
208
+ ## Error Handling
209
+
210
+ | Error | Phase | Recovery |
211
+ |-------|-------|----------|
212
+ | <error-type-1> | <phase> | <what to do> |
213
+ | <error-type-2> | <phase> | <what to do> |
214
+ | <error-type-3> | Any | <what to do> |
215
+ | Unrecoverable failure | Any | Rollback to Phase 0 snapshot, report error |
216
+
217
+ ## References
218
+
219
+ - [<skill-name>](../skills/<slug>/SKILL.md) — <what it provides>
220
+ - [<reference-name>](references/<file>.md) — <what it covers>
221
+ ````
222
+
223
+ ---
224
+
225
+ ## Section-by-Section Guide
226
+
227
+ ### Frontmatter
228
+
229
+ - `name` — follows naming convention: `aw:platform-<domain>-<slug>` for platform, `aw:<team>-<sub_team>-<slug>` for teams (or `aw:<team>-<sub_team>-<domain>-<slug>` when domain nesting is used), `aw:<slug>` for stage commands. All hyphens, no colons (except the `aw:` prefix). See [registry-structure.md](registry-structure.md) for the full naming table.
230
+ - `description` — front-load the workflow being automated
231
+ - `argument-hint` — shown in help text; keep it short
232
+ - `mcp` — list of MCP servers this command requires (empty if none)
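As an illustration of the `name` rule above, here is a minimal sketch of a lint check, assuming names are lowercase alphanumeric segments joined by hyphens after the `aw:` prefix. The pattern and the function name `is_valid_command_name` are hypothetical and not part of the package; the authoritative naming table is in registry-structure.md, which may impose further constraints.

```python
import re

# Hypothetical pattern: the `aw:` prefix, then hyphen-separated segments.
# Lowercase-only segments are an assumption, not something the guide states.
NAME_PATTERN = re.compile(r"^aw:[a-z0-9]+(?:-[a-z0-9]+)*$")


def is_valid_command_name(name: str) -> bool:
    """Return True if `name` has the `aw:` prefix and no other colons."""
    return NAME_PATTERN.fullmatch(name) is not None


# Example inputs (made-up names for illustration only).
examples = ["aw:deploy", "aw:platform-billing-sync", "aw:platform:billing", "billing-sync"]
results = {name: is_valid_command_name(name) for name in examples}
```

A check like this only covers the shape of the name; it cannot tell whether the platform, team, or stage segments refer to something that exists in the registry.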
233
+
234
+ ### Protocol & Skill Loading Gate
235
+
236
+ The AW-PROTOCOL reference signals that this is a managed pipeline. The skill loading gate is BLOCKING — the command must not execute any phase until all required skills are confirmed loaded. This prevents partial execution with missing context.
237
+
238
+ ### Core Principles
239
+
240
+ Three to five principles that shape decision-making across all phases. These are not rules (those go in rules/); they are workflow-specific values. Example: "Prefer incremental rollout over big-bang deployment."
241
+
242
+ ### Agent Roster
243
+
244
+ Declares all agents used across phases upfront. This lets the reader understand the full cast before diving into phases. Include the model tier — it affects cost and capability expectations.
245
+
246
+ ### Phase Structure
247
+
248
+ Every phase follows the same contract:
249
+
250
+ - **Purpose** — Why this phase exists (not what it does — the steps cover that)
251
+ - **Agent** — Which agent executes this phase
252
+ - **Input** — Explicit data dependency on prior phases
253
+ - **Steps** — Concrete, numbered actions
254
+ - **Output** — What this phase produces (consumed by later phases)
255
+ - **Checkpoint** — Verifiable success criteria (binary: pass or fail)
256
+ - **On-failure** — Recovery strategy (retry, skip, escalate, rollback)
257
+
258
+ This structure makes commands debuggable. When Phase 3 fails, you check Phase 3's checkpoint and on-failure handler.
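The phase contract above can be sketched as a small sequential runner. This is an illustration under stated assumptions, not the package's implementation: `Phase`, `run_phases`, and the status strings are hypothetical names, and the sketch treats every failed checkpoint as a stop, whereas a real on-failure handler might retry, skip, or escalate.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class Phase:
    name: str
    run: Callable[[Any], Any]          # Steps: consumes Input, returns Output
    checkpoint: Callable[[Any], bool]  # binary pass/fail on the Output
    on_failure: Callable[[Any], None]  # recovery hook, called on a failed checkpoint


def run_phases(phases: List[Phase], initial_input: Any) -> Tuple[str, Optional[str], Any]:
    """Run phases in order and stop at the first failed checkpoint.

    Returns (status, failed_phase_name, last_good_output).
    """
    data = initial_input
    for phase in phases:
        output = phase.run(data)
        if not phase.checkpoint(output):
            phase.on_failure(output)
            return ("FAILED", phase.name, data)
        data = output  # explicit hand-off: this Output is the next phase's Input
    return ("COMPLETE", None, data)


# Usage with two toy phases; the second one fails its checkpoint.
log: List[str] = []
phases = [
    Phase("init", lambda x: x + 1, lambda out: out > 0, lambda out: log.append("init failed")),
    Phase("build", lambda x: -x, lambda out: out > 0, lambda out: log.append("build failed")),
]
status, failed_phase, last_good = run_phases(phases, 0)
```

Because each phase receives only the previous phase's output, the failing phase is identified by name and the last output that passed a checkpoint remains available for rollback or a re-run.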
259
+
260
+ ### Human Checkpoints
261
+
262
+ Insert before irreversible actions (deployments, data migrations, external API calls). The command pauses, presents a summary, and waits for approval. Include a clear on-reject path.
263
+
264
+ ### Compound Learnings
265
+
266
+ Populated after real executions. This is where operational wisdom accumulates. Review after the first 5-10 runs and update the command based on observed patterns.
267
+
268
+ ### Error Handling Table
269
+
270
+ Exhaustive mapping of known failure modes to recovery strategies. The "Any" phase row catches unexpected failures with a universal rollback strategy.
271
+
272
+ ## Anti-Patterns
273
+
274
+ | Pattern | Problem | Fix |
275
+ |---|---|---|
276
+ | No checkpoints between phases | Cascading failures — Phase 3 fails because Phase 1 silently produced bad output | Add checkpoint to every phase |
277
+ | Monolithic single-phase command | Not a command — it's a script. Commands orchestrate multiple agents through phases | Break into 3+ phases or make it a skill |
278
+ | No skill loading gate | Agents execute without required context, producing shallow output | Add BLOCKING gate with required skills |
279
+ | Phase depends on implicit state | Breaks when phases are re-run or reordered | Make all inputs explicit in the Input field |
@@ -0,0 +1,176 @@
1
+ # Eval Template
2
+
3
+ Copy the scaffold below as your starting point. Replace all `<placeholder>` tokens.
4
+
5
+ ---
6
+
7
+ ## Scaffold
8
+
9
+ ````markdown
10
+ ---
11
+ name: eval-<eval-slug>
12
+ target: <parent-artifact-name>
13
+ category: <functional | structural | behavioral | integration>
14
+ difficulty: <basic | intermediate | advanced>
15
+ ---
16
+
17
+ # Eval: <Eval Display Name>
18
+
19
+ ## Task
20
+
21
+ <2-4 sentences describing what the model should do when given this eval's prompt.
22
+ Be specific about the scenario, inputs, and expected workflow. This is the "user request"
23
+ that the executor receives.>
24
+
25
+ ### Prompt
26
+
27
+ ```
28
+ <The exact prompt to give the executor. This must be realistic — something a real user
29
+ would actually ask. Include enough context for the executor to act without follow-up questions.>
30
+ ```
31
+
32
+ ## Context
33
+
34
+ | Field | Value |
35
+ |-------|-------|
36
+ | **Namespace** | `<namespace where the parent artifact lives>` |
37
+ | **Domain** | `<domain: backend, frontend, data, infra, etc.>` |
38
+ | **Target artifact** | `<path to the artifact being tested>` |
39
+ | **Target type** | `<command \| agent \| skill \| rule \| eval>` |
40
+ | **Related work** | `<links to related artifacts, PRs, or docs>` |
41
+
42
+ ## Expected Outcomes
43
+
44
+ The executor's output must satisfy ALL of the following:
45
+
46
+ - [ ] <Outcome 1 — specific, verifiable assertion about the output>
47
+ - [ ] <Outcome 2 — structural check: "file exists at X", "section Y is present">
48
+ - [ ] <Outcome 3 — content check: "contains at least N items", "references skill Z">
49
+ - [ ] <Outcome 4 — quality check: "examples are concrete, not placeholder">
50
+ - [ ] <Outcome 5 — negative check: "does NOT contain X" or "does NOT skip Y">
51
+
52
+ ### Assertion Quality Criteria
53
+
54
+ Each assertion above must be:
55
+ - **Verifiable** — A grader can determine pass/fail from the output alone
56
+ - **Discriminating** — A clearly wrong output would fail this assertion
57
+ - **Stable** — Minor formatting changes don't cause false failures
58
+
59
+ ## Grading Criteria
60
+
61
+ ### PASS (all conditions met)
62
+
63
+ - All expected outcomes checked
64
+ - Output is production-ready (not placeholder/stub content)
65
+ - No critical errors in execution
66
+
67
+ ### PARTIAL (some conditions met)
68
+
69
+ - <N>+ of <M> expected outcomes met
70
+ - Output has correct structure but thin content
71
+ - OR output has rich content but wrong structure
72
+
73
+ ### FAIL (below threshold)
74
+
75
+ - Fewer than <N> expected outcomes met
76
+ - Output is structurally wrong (missing required sections, wrong artifact type)
77
+ - OR executor failed to complete the task
78
+
79
+ ## Evaluation Method
80
+
81
+ **Type:** <deterministic | model-based | hybrid>
82
+
83
+ ### Deterministic Checks
84
+
85
+ <Checks that can be performed by a script — file existence, section headers,
86
+ frontmatter fields, naming patterns.>
87
+
88
+ ```bash
89
+ # Example: verify file exists and has required sections
90
+ test -f "<expected-path>" || echo "FAIL: file not found"
91
+ grep -q "## Core Mission" "<expected-path>" || echo "FAIL: missing Core Mission"
92
+ ```
93
+
94
+ ### Model-Based Checks
95
+
96
+ <Checks that require judgment — content quality, example relevance, reasoning depth.
97
+ These are evaluated by the grader agent.>
98
+
99
+ - Does the output explain WHY, not just WHAT?
100
+ - Are examples concrete and domain-specific (not generic foo/bar)?
101
+ - Would a domain expert find the content useful?
102
+
103
+ ## Variants (optional)
104
+
105
+ <Alternative scenarios that test the same artifact from different angles.>
106
+
107
+ | Variant | Difference | Tests |
108
+ |---------|------------|-------|
109
+ | `eval-<slug>-minimal` | Minimal input, no context | Handles missing info gracefully |
110
+ | `eval-<slug>-complex` | Multi-step request with constraints | Handles complexity without losing accuracy |
111
+ | `eval-<slug>-adversarial` | Intentionally ambiguous or misleading input | Doesn't hallucinate or guess |
112
+
113
+ ## Baseline Expectations
114
+
115
+ <What should happen when the executor runs WITHOUT the target artifact loaded.
116
+ This establishes the value-add of the artifact.>
117
+
118
+ - Without artifact: <expected behavior — generic output, missed requirements, etc.>
119
+ - With artifact: <expected behavior — specific, structured, complete output>
120
+ - **Expected delta:** <quantified improvement, e.g., "+40% pass rate">
121
+ ````
122
+
123
+ ---
124
+
125
+ ## Section-by-Section Guide
126
+
127
+ ### Frontmatter
128
+
129
+ - `name` — Always prefixed with `eval-`. Lives in the colocated `evals/` directory of the parent artifact.
130
+ - `target` — The artifact this eval tests. Must reference an existing artifact.
131
+ - `category` — What aspect is being tested:
132
+ - `functional` — Does the artifact produce correct output?
133
+ - `structural` — Does the output have the right shape?
134
+ - `behavioral` — Does the artifact handle edge cases correctly?
135
+ - `integration` — Does the artifact work with other artifacts?
136
+ - `difficulty` — Affects grading tolerance. Basic evals expect straightforward success. Advanced evals allow more nuanced partial results.
137
+
138
+ ### Task & Prompt
139
+
140
+ The prompt is the most critical field. It must be:
141
+ 1. **Realistic** — something a real user would type
142
+ 2. **Self-contained** — the executor shouldn't need to ask follow-up questions
143
+ 3. **Unambiguous** — one clear correct interpretation
144
+
145
+ Bad prompts produce unreliable evals. If the eval flakes, the prompt is usually the problem.
146
+
147
+ ### Expected Outcomes
148
+
149
+ Four or more assertions, each independently verifiable. Mix structural checks (file exists, section present) with content checks (examples are concrete, references are valid) and at least one negative check (does NOT contain placeholder text).
150
+
151
+ Weak assertions that pass for both good and bad output provide false confidence. Each assertion should discriminate: a clearly wrong output must fail it.
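
One way to find weak assertions is to run each one against a known-good and a known-bad output and flag any that do not separate the two. The sketch below is illustrative only: the assertion names, the sample outputs, and the helper `non_discriminating` are hypothetical and not part of the package.

```python
from typing import Callable

# An assertion takes the executor's output and returns True when it passes.
Assertion = Callable[[str], bool]


def non_discriminating(
    assertions: dict[str, Assertion], good: str, bad: str
) -> list[str]:
    """Return the names of assertions that fail to separate good from bad output."""
    weak = []
    for name, check in assertions.items():
        # A useful assertion passes on the good output AND fails on the bad one.
        if not check(good) or check(bad):
            weak.append(name)
    return weak


# Hypothetical assertions: one structural, one negative, one that always passes.
assertions = {
    "has_rule_section": lambda out: "## Rule" in out,
    "no_placeholder_text": lambda out: "<placeholder>" not in out,
    "is_non_empty": lambda out: len(out) > 0,
}
good = "## Rule\nScope every query by tenant. [MUST]\n"
bad = "## Notes\n<placeholder>\n"

print(non_discriminating(assertions, good, bad))
```

Here `is_non_empty` is flagged because the known-bad output also passes it, which is the "false confidence" case described above.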
152
+
153
+ ### Grading Criteria
154
+
155
+ Three tiers with clear thresholds. PASS/PARTIAL/FAIL must be unambiguous — the grader should not need judgment to classify a result into a tier. Use specific counts ("4+ of 5 outcomes") rather than vague language ("most outcomes").
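
Count-based thresholds can be applied mechanically. The following is a minimal sketch, assuming an eval with five outcomes where 4 or more is PASS and 2 or more is PARTIAL; the function name `classify` and the specific thresholds are illustrative assumptions, not values defined by the template.

```python
def classify(passed: int, total: int, pass_min: int, partial_min: int) -> str:
    """Map a count of satisfied outcomes to PASS, PARTIAL, or FAIL."""
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    if not 0 <= partial_min <= pass_min <= total:
        raise ValueError("thresholds must satisfy 0 <= partial_min <= pass_min <= total")
    if passed >= pass_min:
        return "PASS"
    if passed >= partial_min:
        return "PARTIAL"
    return "FAIL"


# Assumed thresholds for a five-outcome eval: 4+ is PASS, 2-3 is PARTIAL.
for passed in (5, 3, 1):
    print(passed, classify(passed, total=5, pass_min=4, partial_min=2))
```

Because every count maps to exactly one tier, two graders given the same outcome count cannot disagree on the result.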
156
+
157
+ ### Evaluation Method
158
+
159
+ Three options:
160
+ - **Deterministic** — Script-based checks only. Fast, reliable, but can't assess quality.
161
+ - **Model-based** — Grader agent evaluates. Can assess quality, but slower and potentially inconsistent.
162
+ - **Hybrid** — Deterministic for structure, model-based for content. Combines the speed and reliability of scripted checks with a grader's ability to assess quality. Recommended default.
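
A hybrid evaluation might be wired up roughly as below. This is a sketch under stated assumptions: `hybrid_grade` is a hypothetical helper, the model-based grader is represented by a plain callable rather than a real grader agent, and running the deterministic checks first (so the slower grader is skipped on structural failure) is a design choice made here, not something the template prescribes.

```python
from typing import Callable

Check = Callable[[str], bool]


def hybrid_grade(
    output: str, structural_checks: list[Check], content_grader: Check
) -> bool:
    """Run cheap scripted checks first; call the grader only if they all pass."""
    if not all(check(output) for check in structural_checks):
        return False  # structure is wrong, so the slower grader is never invoked
    return content_grader(output)


# Stand-in for a model-based grader; a real one would call a grader agent.
def stub_grader(output: str) -> bool:
    return "tenant" in output


checks = [
    lambda out: out.startswith("## Rule"),   # structural: section present
    lambda out: "<placeholder>" not in out,  # negative: no leftover tokens
]

print(hybrid_grade("## Rule\nScope queries by tenant. [MUST]", checks, stub_grader))
print(hybrid_grade("## Notes\n<placeholder>", checks, stub_grader))
```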
163
+
164
+ ### Baseline Expectations
165
+
166
+ The with/without comparison is how you measure the artifact's value-add. Without a baseline, you can't distinguish "the artifact helped" from "the model would have done this anyway." Always specify the expected delta.
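
The delta can be computed from two sets of runs, one with the artifact loaded and one without. The sketch below assumes each run is reduced to a single pass/fail boolean and reports the difference in pass rate as an absolute difference (percentage points); the run data and function names are made up for illustration.

```python
def pass_rate(results: list[bool]) -> float:
    """Fraction of runs that passed."""
    if not results:
        raise ValueError("need at least one run")
    return sum(results) / len(results)


def artifact_delta(without_artifact: list[bool], with_artifact: list[bool]) -> float:
    """Absolute difference in pass rate attributable to the artifact."""
    return pass_rate(with_artifact) - pass_rate(without_artifact)


# Hypothetical results from five runs of the same eval in each configuration.
baseline_runs = [True, False, False, False, True]
artifact_runs = [True, True, True, False, True]

delta = artifact_delta(baseline_runs, artifact_runs)
print(f"baseline {pass_rate(baseline_runs):.0%}, "
      f"with artifact {pass_rate(artifact_runs):.0%}, delta {delta:+.0%}")
```

A delta near zero suggests the model would have produced the same result without the artifact, which is the case the baseline is meant to expose.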
167
+
168
+ ## Anti-Patterns
169
+
170
+ | Pattern | Problem | Fix |
171
+ |---|---|---|
172
+ | Assertions that always pass | False confidence — bad output also passes | Test assertions against a known-bad output |
173
+ | Ambiguous prompt | Eval flakes — different runs interpret differently | Make prompt self-contained with concrete details |
174
+ | No negative assertions | Doesn't catch hallucination or extra content | Add "does NOT contain" checks |
175
+ | No baseline expectation | Can't measure artifact value-add | Specify without-artifact behavior |
176
+ | Only structural checks | Correct shape with garbage content passes | Add content quality assertions |
@@ -0,0 +1,119 @@
1
+ # Rule Template
2
+
3
+ Copy the scaffold below as your starting point. Replace all `<placeholder>` tokens. Rules are intentionally shorter than other artifact types — a rule that needs 1000 words to explain is probably a skill.
4
+
5
+ ---
6
+
7
+ ## Scaffold
8
+
9
+ ````markdown
10
+ ---
11
+ id: <domain>/<rule-slug>
12
+ severity: <MUST | SHOULD | MAY>
13
+ domains: [<domain-1>, <domain-2>]
14
+ paths: ["<glob-pattern-1>", "<glob-pattern-2>"]
15
+ ---
16
+
17
+ # <Rule Title>
18
+
19
+ ## Rule
20
+
21
+ <requirement-statement> [<MUST|SHOULD|MAY>]
22
+
23
+ **Why:** <1-2 sentences explaining the consequence of violating this rule. What breaks, degrades, or becomes vulnerable? This is the most important part — a model that understands "why" handles edge cases better than one following a directive.>
24
+
25
+ ## WRONG
26
+
27
+ <Real violation — not a toy example. Show code or config that a developer would actually write.>
28
+
29
+ ```<language>
30
+ // <Brief comment explaining what's wrong>
31
+ <violating code>
32
+ ```
33
+
34
+ **Impact:** <What happens if this ships — runtime error, security vulnerability, data corruption, etc.>
35
+
36
+ ## RIGHT
37
+
38
+ <Verified fix — the correct way to write the same code. Must compile/run.>
39
+
40
+ ```<language>
41
+ // <Brief comment explaining why this is correct>
42
+ <correct code>
43
+ ```
44
+
45
+ ## Exceptions
46
+
47
+ <When this rule does NOT apply. Be specific — vague exceptions become loopholes.>
48
+
49
+ - <Exception 1>: <specific condition and why the rule doesn't apply>
50
+ - <Exception 2>: <specific condition>
51
+
52
+ If no exceptions exist, write: "No exceptions. This rule applies universally."
53
+
54
+ ## Enforcement
55
+
56
+ - **Automated:** <How this can be caught automatically — linter rule, CI check, grep pattern>
57
+ - **Manual:** <What a reviewer should look for during code review>
58
+
59
+ ## Severity Justification
60
+
61
+ **<MUST|SHOULD|MAY>** because <reason tied to impact>:
62
+
63
+ - **MUST** — Violation causes correctness failures, security vulnerabilities, or data loss
64
+ - **SHOULD** — Violation degrades quality, maintainability, or developer experience
65
+ - **MAY** — Violation is suboptimal but acceptable in some contexts
66
+
67
+ ## References
68
+
69
+ - [<skill-name>](../skills/<slug>/SKILL.md) — <deeper guidance on the practice>
70
+ - [<external-doc>](<url>) — <authoritative source>
71
+ ````
72
+
73
+ ---
74
+
75
+ ## Section-by-Section Guide
76
+
77
+ ### Frontmatter
78
+
79
+ - `id` — Unique identifier in `<domain>/<slug>` format. Used in `rule-manifest.json` and `AGENTS.md` references.
80
+ - `severity` — One of `MUST` (violation = defect), `SHOULD` (violation = code smell), or `MAY` (recommendation).
81
+ - `domains` — Which platform domains this applies to. Use `["universal"]` for cross-cutting rules.
82
+ - `paths` — Glob patterns for files this rule applies to. Enables automated scoping.
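
The frontmatter constraints above lend themselves to a small validator. The sketch below works on an already-parsed frontmatter dictionary and makes two assumptions that the template does not state: that domain and slug use lowercase letters, digits, and hyphens, and that `domains` and `paths` must be non-empty lists. The function name and sample values are hypothetical.

```python
import re

SEVERITIES = {"MUST", "SHOULD", "MAY"}
# Assumed character set for <domain>/<slug>; the template does not define one.
ID_PATTERN = re.compile(r"^[a-z0-9-]+/[a-z0-9-]+$")


def validate_rule_frontmatter(meta: dict) -> list[str]:
    """Return a list of problems found in parsed rule frontmatter."""
    errors = []
    if not ID_PATTERN.match(str(meta.get("id", ""))):
        errors.append("id must look like <domain>/<slug>")
    if meta.get("severity") not in SEVERITIES:
        errors.append("severity must be MUST, SHOULD, or MAY")
    for field in ("domains", "paths"):
        value = meta.get(field)
        if not isinstance(value, list) or not value:
            errors.append(f"{field} must be a non-empty list")
    return errors


# Hypothetical frontmatter for illustration.
valid = {
    "id": "data/scope-queries",
    "severity": "MUST",
    "domains": ["universal"],
    "paths": ["src/**/*.ts"],
}
print(validate_rule_frontmatter(valid))
print(validate_rule_frontmatter({"id": "scope queries", "severity": "CRITICAL"}))
```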
83
+
84
+ ### Rule Statement
85
+
86
+ One sentence. Active voice. Ends with the severity tag in brackets. The model reads this as the primary constraint.
87
+
88
+ **Good:** `Scope every database query by locationId from auth context. [MUST]`
89
+ **Bad:** `It is recommended that queries should generally include location scoping when possible. [SHOULD]`
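
Part of this guidance can be checked mechanically. The sketch below tests for the trailing severity tag and for hedging phrases like those in the Bad example; the phrase list and the function `lint_rule_statement` are hypothetical, and a heuristic like this cannot judge active voice or whether the statement is a single sentence.

```python
import re

# The statement must end with one of the three severity tags in brackets.
TAG = re.compile(r"\[(MUST|SHOULD|MAY)\]\s*$")
# Assumed list of hedging phrases; extend it to suit your own rules.
HEDGES = ("it is recommended", "generally", "when possible")


def lint_rule_statement(statement: str) -> list[str]:
    """Return a list of problems found in a rule statement."""
    problems = []
    if not TAG.search(statement):
        problems.append("missing trailing severity tag")
    lowered = statement.lower()
    problems += [f"hedging phrase: {hedge!r}" for hedge in HEDGES if hedge in lowered]
    return problems


direct = "Scope every database query by locationId from auth context. [MUST]"
hedged = ("It is recommended that queries should generally include "
          "location scoping when possible. [SHOULD]")

print(lint_rule_statement(direct))
print(lint_rule_statement(hedged))
```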
90
+
91
+ ### Why
92
+
93
+ The single most important section. Models follow rules more reliably when they understand consequences. "Because the style guide says so" is not a reason. "Because unscoped queries return data from other tenants, creating a data leak" is.
94
+
95
+ ### WRONG / RIGHT Examples
96
+
97
+ Real code, not pseudocode. The WRONG example should be something a developer would plausibly write — not a strawman. The RIGHT example must be a direct fix of the WRONG example, not a different scenario.
98
+
99
+ ### Exceptions
100
+
101
+ Explicit exceptions prevent false positives and reduce rule fatigue. If a rule has no exceptions, say so explicitly — ambiguity about exceptions leads to inconsistent enforcement.
102
+
103
+ ### Enforcement
104
+
105
+ Split into automated (CI/linter) and manual (code review). Every rule should have at least one enforcement path. Rules that can only be enforced manually are expensive — prioritize automatable rules.
106
+
107
+ ### Severity Justification
108
+
109
+ Explains why this severity level was chosen, not what the levels mean. "MUST because unscoped queries create cross-tenant data leaks in production" connects the severity to the specific consequence.
110
+
111
+ ## Anti-Patterns
112
+
113
+ | Pattern | Problem | Fix |
114
+ |---|---|---|
115
+ | No "Why" section | Model follows rule mechanically, fails on edge cases | Add consequence-driven explanation |
116
+ | Pseudocode examples | Developer can't map to real code | Use real language, real patterns |
117
+ | WRONG example is a strawman | Nobody would write that; rule feels patronizing | Use a plausible violation from real code |
118
+ | Vague exceptions | "Sometimes this doesn't apply" — when? | List specific conditions or write "No exceptions" |
119
+ | MUST severity without justification | Everything feels critical; severity loses meaning | Justify with specific impact |
@@ -0,0 +1,123 @@
1
+ # Skill Template
2
+
3
+ Copy the scaffold below as your starting point. Replace all `<placeholder>` tokens.
4
+
5
+ ---
6
+
7
+ ## Scaffold
8
+
9
+ ````markdown
10
+ ---
11
+ name: <namespace>-<skill-slug>
12
+ description: "<1-2 sentences. State primary capability first, then 'Use when <trigger scenario>'.>"
13
+ trigger: when the user <trigger condition>
14
+ ---
15
+
16
+ # <Skill Display Name>
17
+
18
+ <1-2 sentence purpose statement. What does this skill teach, and why does it matter?>
19
+
20
+ ## When to Use
21
+
22
+ - <Trigger scenario 1 — specific user intent or request pattern>
23
+ - <Trigger scenario 2 — a different angle or adjacent need>
24
+ - <Trigger scenario 3 — an edge case that should still match>
25
+
26
+ ## Quick Start
27
+
28
+ <Minimal example showing the skill in action. This is the "show, don't tell" section.
29
+ Give a concrete, copy-pasteable example — not a description of what to do.>
30
+
31
+ ```bash
32
+ # Example: a concrete invocation or code snippet
33
+ <command or code>
34
+ ```
35
+
36
+ ## Detailed Guide
37
+
38
+ ### <Topic 1>
39
+
40
+ <Step-by-step instructions with concrete actions. Each step should be:
41
+ 1. Numbered
42
+ 2. Actionable (starts with a verb)
43
+ 3. Specific (includes file paths, commands, or code)>
44
+
45
+ ### <Topic 2>
46
+
47
+ <More guidance. Add as many topic sections as needed, but each should
48
+ earn its place — if a section doesn't change behavior, remove it.>
49
+
50
+ ### <Topic 3 — Common Pitfalls>
51
+
52
+ <What goes wrong and how to fix it. Real failure modes, not hypothetical ones.>
53
+
54
+ ## Checklist
55
+
56
+ - [ ] <Check item 1> — <pass/fail criteria: what to look for and what "done" means>
57
+ - [ ] <Check item 2> — <pass/fail criteria>
58
+ - [ ] <Check item 3> — <pass/fail criteria>
59
+
60
+ ## Output Format
61
+
62
+ <Show the exact structure of what this skill produces. If it produces a file,
63
+ show the file. If it produces a checklist, show the checklist. Be concrete.>
64
+
65
+ ```
66
+ <output structure>
67
+ ```
68
+
69
+ ## References
70
+
71
+ - [<reference-name>](references/<file>.md) — <what it covers>
72
+ - [<external-link>](<url>) — <why it's relevant>
73
+ ````
74
+
75
+ ---
76
+
77
+ ## Section-by-Section Guide
78
+
79
+ ### Frontmatter
80
+
81
+ The three fields (`name`, `description`, `trigger`) control discoverability. The description is what the model reads to decide whether to load this skill. Front-load the capability; put the trigger scenario second.
82
+
83
+ **Good:** `"MongoDB query optimization patterns for Mongoose and native driver. Use when debugging slow queries, reviewing aggregation pipelines, or designing indexes."`
84
+
85
+ **Bad:** `"This skill helps with MongoDB."` (too vague, no trigger signals)
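
A rough check for the "capability first, trigger second" shape could look like this. It is a sketch only: it assumes the trigger clause is introduced by the literal phrase `Use when`, as in the scaffold, and the function name is hypothetical.

```python
def lint_skill_description(description: str) -> list[str]:
    """Flag descriptions that lack a trigger clause or lead with it."""
    problems = []
    position = description.strip().find("Use when")
    if position == -1:
        problems.append("no 'Use when' trigger clause")
    elif position == 0:
        problems.append("trigger clause comes before the capability")
    return problems


good_description = (
    "MongoDB query optimization patterns for Mongoose and native driver. "
    "Use when debugging slow queries, reviewing aggregation pipelines, "
    "or designing indexes."
)
bad_description = "This skill helps with MongoDB."

print(lint_skill_description(good_description))
print(lint_skill_description(bad_description))
```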
86
+
87
+ ### Purpose Statement
88
+
89
+ One to two sentences below the H1. This is the first thing a reader sees. It should answer: "Why does this skill exist?" and "What outcome does it produce?"
90
+
91
+ ### When to Use
92
+
93
+ Three or more trigger scenarios. These help the model (and human readers) decide if this skill matches their situation. Be specific about user intent, not about the skill's internal mechanics.
94
+
95
+ ### Quick Start
96
+
97
+ The most important section for adoption. A developer should be able to copy-paste this and get a working result. If your Quick Start requires reading the Detailed Guide first, it's too complex.
98
+
99
+ ### Detailed Guide
100
+
101
+ Progressive disclosure. Only readers who need depth will reach here. Organize by task or topic, not by internal architecture. Each subsection should be independently useful.
102
+
103
+ ### Checklist
104
+
105
+ Actionable verification items. Each item must have clear pass/fail criteria — "looks good" is not a criterion. These are used by graders and reviewers to validate the skill was applied correctly.
106
+
107
+ ### Output Format
108
+
109
+ Show, don't describe. If the skill produces JSON, show JSON. If it produces a markdown report, show the markdown. The model uses this section to format its output correctly.
110
+
111
+ ### References
112
+
113
+ Link to deeper material: reference files for detailed patterns, external docs for vendor APIs. Keep the skill itself lean; push depth into references.
114
+
115
+ ## Anti-Patterns
116
+
117
+ | Pattern | Problem | Fix |
118
+ |---|---|---|
119
+ | 5000+ word SKILL.md | Model wastes context loading it | Split into SKILL.md (overview) + references/ (depth) |
120
+ | No Quick Start | Low adoption — readers leave before learning | Add a copy-pasteable example |
121
+ | Vague trigger description | Model loads the skill for wrong requests | Add 3+ specific trigger scenarios |
122
+ | Checklist without criteria | Unverifiable — "did I do this?" has no answer | Add pass/fail criteria to every item |
123
+ | Generic examples | Model produces generic output | Use real domain examples, not `foo`/`bar` |