aw-ecc 1.4.32 → 1.4.48

Files changed (259)
  1. package/.claude-plugin/plugin.json +1 -1
  2. package/.cursor/INSTALL.md +7 -5
  3. package/.cursor/hooks/adapter.js +41 -4
  4. package/.cursor/hooks/after-agent-response.js +62 -0
  5. package/.cursor/hooks/before-submit-prompt.js +7 -1
  6. package/.cursor/hooks/post-tool-use-failure.js +21 -0
  7. package/.cursor/hooks/post-tool-use.js +39 -0
  8. package/.cursor/hooks/shared/aw-phase-definitions.js +53 -0
  9. package/.cursor/hooks/shared/aw-phase-runner.js +3 -1
  10. package/.cursor/hooks/subagent-start.js +22 -4
  11. package/.cursor/hooks/subagent-stop.js +18 -1
  12. package/.cursor/hooks.json +23 -2
  13. package/.opencode/package.json +1 -1
  14. package/AGENTS.md +3 -3
  15. package/README.md +5 -5
  16. package/commands/adk.md +52 -0
  17. package/commands/build.md +22 -9
  18. package/commands/deploy.md +12 -0
  19. package/commands/execute.md +9 -0
  20. package/commands/feature.md +333 -0
  21. package/commands/investigate.md +18 -5
  22. package/commands/plan.md +23 -9
  23. package/commands/publish.md +65 -0
  24. package/commands/review.md +12 -0
  25. package/commands/ship.md +12 -0
  26. package/commands/test.md +12 -0
  27. package/commands/verify.md +9 -0
  28. package/hooks/hooks.json +36 -0
  29. package/manifests/install-components.json +8 -0
  30. package/manifests/install-modules.json +83 -0
  31. package/manifests/install-profiles.json +7 -0
  32. package/package.json +2 -2
  33. package/scripts/ci/validate-rules.js +51 -0
  34. package/scripts/cursor-aw-home/hooks.json +23 -2
  35. package/scripts/cursor-aw-hooks/adapter.js +41 -4
  36. package/scripts/cursor-aw-hooks/before-submit-prompt.js +7 -1
  37. package/scripts/hooks/aw-usage-commit-created.js +32 -0
  38. package/scripts/hooks/aw-usage-post-tool-use-failure.js +56 -0
  39. package/scripts/hooks/aw-usage-post-tool-use.js +242 -0
  40. package/scripts/hooks/aw-usage-prompt-submit.js +112 -0
  41. package/scripts/hooks/aw-usage-session-start.js +48 -0
  42. package/scripts/hooks/aw-usage-stop.js +182 -0
  43. package/scripts/hooks/aw-usage-telemetry-send.js +84 -0
  44. package/scripts/hooks/cost-tracker.js +3 -23
  45. package/scripts/hooks/shared/aw-phase-definitions.js +53 -0
  46. package/scripts/hooks/shared/aw-phase-runner.js +3 -1
  47. package/scripts/lib/aw-hook-contract.js +2 -2
  48. package/scripts/lib/aw-pricing.js +306 -0
  49. package/scripts/lib/aw-usage-telemetry.js +472 -0
  50. package/scripts/lib/codex-hook-config.js +8 -8
  51. package/scripts/lib/cursor-hook-config.js +25 -10
  52. package/scripts/lib/install-targets/cursor-project.js +3 -0
  53. package/scripts/lib/install-targets/helpers.js +20 -3
  54. package/skills/aw-adk/SKILL.md +317 -0
  55. package/skills/aw-adk/agents/analyzer.md +113 -0
  56. package/skills/aw-adk/agents/comparator.md +113 -0
  57. package/skills/aw-adk/agents/grader.md +115 -0
  58. package/skills/aw-adk/assets/eval_review.html +76 -0
  59. package/skills/aw-adk/eval-viewer/generate_review.py +164 -0
  60. package/skills/aw-adk/eval-viewer/viewer.html +181 -0
  61. package/skills/aw-adk/evals/eval-colocated-placement.md +84 -0
  62. package/skills/aw-adk/evals/eval-create-agent.md +90 -0
  63. package/skills/aw-adk/evals/eval-create-command.md +98 -0
  64. package/skills/aw-adk/evals/eval-create-eval.md +89 -0
  65. package/skills/aw-adk/evals/eval-create-rule.md +99 -0
  66. package/skills/aw-adk/evals/eval-create-skill.md +97 -0
  67. package/skills/aw-adk/evals/eval-delete-agent.md +79 -0
  68. package/skills/aw-adk/evals/eval-delete-command.md +89 -0
  69. package/skills/aw-adk/evals/eval-delete-rule.md +86 -0
  70. package/skills/aw-adk/evals/eval-delete-skill.md +90 -0
  71. package/skills/aw-adk/evals/eval-meta-eval-coverage.md +78 -0
  72. package/skills/aw-adk/evals/eval-meta-eval-determinism.md +81 -0
  73. package/skills/aw-adk/evals/eval-meta-eval-false-pass.md +81 -0
  74. package/skills/aw-adk/evals/eval-score-accuracy.md +95 -0
  75. package/skills/aw-adk/evals/eval-type-redirect.md +68 -0
  76. package/skills/aw-adk/evals/evals.json +96 -0
  77. package/skills/aw-adk/references/artifact-wiring.md +162 -0
  78. package/skills/aw-adk/references/cross-ide-mapping.md +71 -0
  79. package/skills/aw-adk/references/eval-placement-guide.md +183 -0
  80. package/skills/aw-adk/references/external-resources.md +75 -0
  81. package/skills/aw-adk/references/getting-started.md +66 -0
  82. package/skills/aw-adk/references/registry-structure.md +152 -0
  83. package/skills/aw-adk/references/rubric-agent.md +36 -0
  84. package/skills/aw-adk/references/rubric-command.md +36 -0
  85. package/skills/aw-adk/references/rubric-eval.md +36 -0
  86. package/skills/aw-adk/references/rubric-meta-eval.md +132 -0
  87. package/skills/aw-adk/references/rubric-rule.md +36 -0
  88. package/skills/aw-adk/references/rubric-skill.md +36 -0
  89. package/skills/aw-adk/references/schemas.md +222 -0
  90. package/skills/aw-adk/references/template-agent.md +251 -0
  91. package/skills/aw-adk/references/template-command.md +279 -0
  92. package/skills/aw-adk/references/template-eval.md +176 -0
  93. package/skills/aw-adk/references/template-rule.md +119 -0
  94. package/skills/aw-adk/references/template-skill.md +123 -0
  95. package/skills/aw-adk/references/type-classifier.md +98 -0
  96. package/skills/aw-adk/references/writing-good-agents.md +227 -0
  97. package/skills/aw-adk/references/writing-good-commands.md +258 -0
  98. package/skills/aw-adk/references/writing-good-evals.md +271 -0
  99. package/skills/aw-adk/references/writing-good-rules.md +214 -0
  100. package/skills/aw-adk/references/writing-good-skills.md +159 -0
  101. package/skills/aw-adk/scripts/aggregate-benchmark.py +190 -0
  102. package/skills/aw-adk/scripts/lint-artifact.sh +211 -0
  103. package/skills/aw-adk/scripts/score-artifact.sh +179 -0
  104. package/skills/aw-adk/scripts/trigger-eval.py +192 -0
  105. package/skills/aw-build/SKILL.md +19 -2
  106. package/skills/aw-deploy/SKILL.md +65 -3
  107. package/skills/aw-design/SKILL.md +156 -0
  108. package/skills/aw-design/references/highrise-tokens.md +394 -0
  109. package/skills/aw-design/references/micro-interactions.md +76 -0
  110. package/skills/aw-design/references/prompt-template.md +160 -0
  111. package/skills/aw-design/references/quality-checklist.md +70 -0
  112. package/skills/aw-design/references/self-review.md +497 -0
  113. package/skills/aw-design/references/stitch-workflow.md +127 -0
  114. package/skills/aw-feature/SKILL.md +293 -0
  115. package/skills/aw-investigate/SKILL.md +17 -0
  116. package/skills/aw-plan/SKILL.md +34 -3
  117. package/skills/aw-publish/SKILL.md +300 -0
  118. package/skills/aw-publish/evals/eval-confirmation-gate.md +60 -0
  119. package/skills/aw-publish/evals/eval-intent-detection.md +111 -0
  120. package/skills/aw-publish/evals/eval-push-modes.md +67 -0
  121. package/skills/aw-publish/evals/eval-rules-push.md +60 -0
  122. package/skills/aw-publish/evals/evals.json +29 -0
  123. package/skills/aw-publish/references/push-modes.md +38 -0
  124. package/skills/aw-review/SKILL.md +88 -9
  125. package/skills/aw-rules-review/SKILL.md +124 -0
  126. package/skills/aw-rules-review/agents/openai.yaml +3 -0
  127. package/skills/aw-rules-review/scripts/generate-review-template.mjs +323 -0
  128. package/skills/aw-ship/SKILL.md +16 -0
  129. package/skills/aw-spec/SKILL.md +15 -0
  130. package/skills/aw-tasks/SKILL.md +15 -0
  131. package/skills/aw-test/SKILL.md +16 -0
  132. package/skills/aw-yolo/SKILL.md +4 -0
  133. package/skills/diagnose/SKILL.md +121 -0
  134. package/skills/diagnose/scripts/hitl-loop.template.sh +41 -0
  135. package/skills/finish-only-when-green/SKILL.md +265 -0
  136. package/skills/grill-me/SKILL.md +24 -0
  137. package/skills/grill-with-docs/SKILL.md +92 -0
  138. package/skills/grill-with-docs/adr-format.md +47 -0
  139. package/skills/grill-with-docs/context-format.md +67 -0
  140. package/skills/improve-codebase-architecture/SKILL.md +75 -0
  141. package/skills/improve-codebase-architecture/deepening.md +37 -0
  142. package/skills/improve-codebase-architecture/interface-design.md +44 -0
  143. package/skills/improve-codebase-architecture/language.md +53 -0
  144. package/skills/local-ghl-setup-from-screenshot/SKILL.md +538 -0
  145. package/skills/tdd/SKILL.md +115 -0
  146. package/skills/tdd/deep-modules.md +33 -0
  147. package/skills/tdd/interface-design.md +31 -0
  148. package/skills/tdd/mocking.md +59 -0
  149. package/skills/tdd/refactoring.md +10 -0
  150. package/skills/tdd/tests.md +61 -0
  151. package/skills/to-issues/SKILL.md +62 -0
  152. package/skills/to-prd/SKILL.md +75 -0
  153. package/skills/using-aw-skills/SKILL.md +170 -237
  154. package/skills/using-aw-skills/hooks/session-start.sh +11 -41
  155. package/skills/zoom-out/SKILL.md +24 -0
  156. package/.codex/hooks/aw-post-tool-use.sh +0 -6
  157. package/.codex/hooks/aw-pre-tool-use.sh +0 -6
  158. package/.codex/hooks/aw-session-start.sh +0 -25
  159. package/.codex/hooks/aw-stop.sh +0 -6
  160. package/.codex/hooks/aw-user-prompt-submit.sh +0 -10
  161. package/.codex/hooks.json +0 -62
  162. package/.cursor/rules/common-agents.md +0 -53
  163. package/.cursor/rules/common-aw-routing.md +0 -43
  164. package/.cursor/rules/common-coding-style.md +0 -52
  165. package/.cursor/rules/common-development-workflow.md +0 -33
  166. package/.cursor/rules/common-git-workflow.md +0 -28
  167. package/.cursor/rules/common-hooks.md +0 -34
  168. package/.cursor/rules/common-patterns.md +0 -35
  169. package/.cursor/rules/common-performance.md +0 -59
  170. package/.cursor/rules/common-security.md +0 -33
  171. package/.cursor/rules/common-testing.md +0 -33
  172. package/.cursor/skills/api-and-interface-design/SKILL.md +0 -75
  173. package/.cursor/skills/article-writing/SKILL.md +0 -85
  174. package/.cursor/skills/aw-brainstorm/SKILL.md +0 -115
  175. package/.cursor/skills/aw-build/SKILL.md +0 -152
  176. package/.cursor/skills/aw-build/evals/build-stage-cases.json +0 -28
  177. package/.cursor/skills/aw-debug/SKILL.md +0 -49
  178. package/.cursor/skills/aw-deploy/SKILL.md +0 -101
  179. package/.cursor/skills/aw-deploy/evals/deploy-stage-cases.json +0 -32
  180. package/.cursor/skills/aw-execute/SKILL.md +0 -47
  181. package/.cursor/skills/aw-execute/references/mode-code.md +0 -47
  182. package/.cursor/skills/aw-execute/references/mode-docs.md +0 -28
  183. package/.cursor/skills/aw-execute/references/mode-infra.md +0 -44
  184. package/.cursor/skills/aw-execute/references/mode-migration.md +0 -58
  185. package/.cursor/skills/aw-execute/references/worker-implementer.md +0 -26
  186. package/.cursor/skills/aw-execute/references/worker-parallel-worker.md +0 -23
  187. package/.cursor/skills/aw-execute/references/worker-quality-reviewer.md +0 -23
  188. package/.cursor/skills/aw-execute/references/worker-spec-reviewer.md +0 -23
  189. package/.cursor/skills/aw-execute/scripts/build-worker-bundle.js +0 -229
  190. package/.cursor/skills/aw-finish/SKILL.md +0 -111
  191. package/.cursor/skills/aw-investigate/SKILL.md +0 -109
  192. package/.cursor/skills/aw-plan/SKILL.md +0 -368
  193. package/.cursor/skills/aw-prepare/SKILL.md +0 -118
  194. package/.cursor/skills/aw-review/SKILL.md +0 -118
  195. package/.cursor/skills/aw-ship/SKILL.md +0 -115
  196. package/.cursor/skills/aw-spec/SKILL.md +0 -104
  197. package/.cursor/skills/aw-tasks/SKILL.md +0 -138
  198. package/.cursor/skills/aw-test/SKILL.md +0 -118
  199. package/.cursor/skills/aw-verify/SKILL.md +0 -51
  200. package/.cursor/skills/aw-yolo/SKILL.md +0 -111
  201. package/.cursor/skills/browser-testing-with-devtools/SKILL.md +0 -81
  202. package/.cursor/skills/bun-runtime/SKILL.md +0 -84
  203. package/.cursor/skills/ci-cd-and-automation/SKILL.md +0 -71
  204. package/.cursor/skills/code-simplification/SKILL.md +0 -74
  205. package/.cursor/skills/content-engine/SKILL.md +0 -88
  206. package/.cursor/skills/context-engineering/SKILL.md +0 -74
  207. package/.cursor/skills/deprecation-and-migration/SKILL.md +0 -75
  208. package/.cursor/skills/documentation-and-adrs/SKILL.md +0 -75
  209. package/.cursor/skills/documentation-lookup/SKILL.md +0 -90
  210. package/.cursor/skills/frontend-slides/SKILL.md +0 -184
  211. package/.cursor/skills/frontend-slides/STYLE_PRESETS.md +0 -330
  212. package/.cursor/skills/frontend-ui-engineering/SKILL.md +0 -68
  213. package/.cursor/skills/git-workflow-and-versioning/SKILL.md +0 -75
  214. package/.cursor/skills/idea-refine/SKILL.md +0 -84
  215. package/.cursor/skills/incremental-implementation/SKILL.md +0 -75
  216. package/.cursor/skills/investor-materials/SKILL.md +0 -96
  217. package/.cursor/skills/investor-outreach/SKILL.md +0 -76
  218. package/.cursor/skills/market-research/SKILL.md +0 -75
  219. package/.cursor/skills/mcp-server-patterns/SKILL.md +0 -67
  220. package/.cursor/skills/nextjs-turbopack/SKILL.md +0 -44
  221. package/.cursor/skills/performance-optimization/SKILL.md +0 -77
  222. package/.cursor/skills/security-and-hardening/SKILL.md +0 -70
  223. package/.cursor/skills/using-aw-skills/SKILL.md +0 -290
  224. package/.cursor/skills/using-aw-skills/evals/skill-trigger-cases.tsv +0 -25
  225. package/.cursor/skills/using-aw-skills/evals/test-skill-triggers.sh +0 -171
  226. package/.cursor/skills/using-aw-skills/hooks/hooks.json +0 -9
  227. package/.cursor/skills/using-aw-skills/hooks/session-start.sh +0 -67
  228. package/.cursor/skills/using-platform-skills/SKILL.md +0 -163
  229. package/.cursor/skills/using-platform-skills/evals/platform-selection-cases.json +0 -52
  230. /package/.cursor/rules/{golang-coding-style.md → golang-coding-style.mdc} +0 -0
  231. /package/.cursor/rules/{golang-hooks.md → golang-hooks.mdc} +0 -0
  232. /package/.cursor/rules/{golang-patterns.md → golang-patterns.mdc} +0 -0
  233. /package/.cursor/rules/{golang-security.md → golang-security.mdc} +0 -0
  234. /package/.cursor/rules/{golang-testing.md → golang-testing.mdc} +0 -0
  235. /package/.cursor/rules/{kotlin-coding-style.md → kotlin-coding-style.mdc} +0 -0
  236. /package/.cursor/rules/{kotlin-hooks.md → kotlin-hooks.mdc} +0 -0
  237. /package/.cursor/rules/{kotlin-patterns.md → kotlin-patterns.mdc} +0 -0
  238. /package/.cursor/rules/{kotlin-security.md → kotlin-security.mdc} +0 -0
  239. /package/.cursor/rules/{kotlin-testing.md → kotlin-testing.mdc} +0 -0
  240. /package/.cursor/rules/{php-coding-style.md → php-coding-style.mdc} +0 -0
  241. /package/.cursor/rules/{php-hooks.md → php-hooks.mdc} +0 -0
  242. /package/.cursor/rules/{php-patterns.md → php-patterns.mdc} +0 -0
  243. /package/.cursor/rules/{php-security.md → php-security.mdc} +0 -0
  244. /package/.cursor/rules/{php-testing.md → php-testing.mdc} +0 -0
  245. /package/.cursor/rules/{python-coding-style.md → python-coding-style.mdc} +0 -0
  246. /package/.cursor/rules/{python-hooks.md → python-hooks.mdc} +0 -0
  247. /package/.cursor/rules/{python-patterns.md → python-patterns.mdc} +0 -0
  248. /package/.cursor/rules/{python-security.md → python-security.mdc} +0 -0
  249. /package/.cursor/rules/{python-testing.md → python-testing.mdc} +0 -0
  250. /package/.cursor/rules/{swift-coding-style.md → swift-coding-style.mdc} +0 -0
  251. /package/.cursor/rules/{swift-hooks.md → swift-hooks.mdc} +0 -0
  252. /package/.cursor/rules/{swift-patterns.md → swift-patterns.mdc} +0 -0
  253. /package/.cursor/rules/{swift-security.md → swift-security.mdc} +0 -0
  254. /package/.cursor/rules/{swift-testing.md → swift-testing.mdc} +0 -0
  255. /package/.cursor/rules/{typescript-coding-style.md → typescript-coding-style.mdc} +0 -0
  256. /package/.cursor/rules/{typescript-hooks.md → typescript-hooks.mdc} +0 -0
  257. /package/.cursor/rules/{typescript-patterns.md → typescript-patterns.mdc} +0 -0
  258. /package/.cursor/rules/{typescript-security.md → typescript-security.mdc} +0 -0
  259. /package/.cursor/rules/{typescript-testing.md → typescript-testing.mdc} +0 -0
@@ -0,0 +1,98 @@
1
+ # CASRE Type Classifier
2
+
3
+ Decision tree for classifying user requests into the correct artifact type. Use this before any ADK work to prevent misclassification.
4
+
5
+ ## Decision Tree
6
+
7
+ ```
8
+ What does the user want?
9
+
10
+ ├── "I want to define a standard/constraint/rule"
11
+ │ └── RULE — An enforceable constraint with WRONG/RIGHT examples
12
+
13
+ ├── "I want to test/validate an existing artifact"
14
+ │ └── EVAL — Scenarios that verify an artifact works correctly
15
+
16
+ ├── "I want a multi-step workflow that orchestrates agents"
17
+ │ └── COMMAND — A pipeline with phases, agent assignments, checkpoints
18
+
19
+ ├── "I want a persona that makes decisions and uses tools"
20
+ │ └── AGENT — Has identity, judgment, model tier, and skills
21
+
22
+ ├── "I want reusable knowledge, patterns, or checklists"
23
+ │ └── SKILL — Static knowledge loaded on demand
24
+
25
+ └── Ambiguous?
26
+     → Ask: "Does this involve multiple phases with different agents,
27
+             or is it a single body of knowledge?"
28
+     → Ask: "Does this enforce a standard, or teach a practice?"
29
+ ```
30
+
31
+ ## Quick Classifier Table
32
+
33
+ | Signal | Type | Reasoning |
34
+ |---|---|---|
35
+ | "best practices for X" | Skill | Static knowledge, reference material |
36
+ | "review checklist for X" | Skill | Checklist = static knowledge |
37
+ | "pipeline from spec to deploy" | Command | Multi-phase workflow |
38
+ | "automate the ship process" | Command | Orchestration of agents |
39
+ | "expert in database optimization" | Agent | Persona with judgment |
40
+ | "reviewer that checks security" | Agent | Decision-making persona |
41
+ | "no hardcoded secrets allowed" | Rule | Enforceable constraint |
42
+ | "all PRs must have tests" | Rule | Standard with severity |
43
+ | "verify my agent works correctly" | Eval | Testing existing artifact |
44
+ | "add test cases for this skill" | Eval | Validation scenarios |
45
+
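As a rough illustration of the table above, the signals can be approximated with a keyword lookup. This is only a sketch: the keyword lists and the `classify_request` function are hypothetical and are not part of aw-adk. Anything that matches no keyword is treated as ambiguous and should go back to the clarifying questions in the decision tree.

```python
from typing import Optional

# Hypothetical keyword signals, loosely derived from the table above.
# Order matters: earlier entries win when several keywords match.
SIGNALS = [
    ("eval", ("verify", "test cases", "validate")),
    ("rule", ("must", "allowed", "never")),
    ("command", ("pipeline", "automate", "workflow")),
    ("agent", ("expert in", "reviewer that", "acts as")),
    ("skill", ("best practices", "checklist", "patterns")),
]


def classify_request(text: str) -> Optional[str]:
    """Return the first artifact type whose keywords appear, else None."""
    lowered = text.lower()
    for artifact_type, keywords in SIGNALS:
        if any(keyword in lowered for keyword in keywords):
            return artifact_type
    return None  # ambiguous: ask the user a clarifying question
```

Keyword matching of this kind is brittle by design; it is meant to show how the signals map to types, not to replace the judgment the decision tree asks for.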
46
+ ## Common Misclassifications
47
+
48
+ These are the most frequent mistakes. Catch them before scaffolding:
49
+
50
+ ### "Create a command for X best practices"
51
+ **Wrong:** Command. **Right:** Skill.
52
+ **Why:** "Best practices" is static knowledge, not a multi-phase workflow. Commands orchestrate agents through pipeline phases.
53
+
54
+ ### "Create a command that reviews X"
55
+ **Usually wrong:** Command. **Usually right:** Skill (review checklist).
56
+ **Exception:** If it's a multi-phase pipeline (analyze → review → report → remediate), then it IS a command.
57
+ **Ask:** "Is this a review checklist, or a multi-phase review pipeline?"
58
+
59
+ ### "Create a command that acts as an X expert"
60
+ **Wrong:** Command. **Right:** Agent.
61
+ **Why:** "Acts as" = persona. Commands don't have identity. Agents do.
62
+
63
+ ### "Create a rule for how to write good code"
64
+ **Wrong:** Rule. **Right:** Skill.
65
+ **Why:** Rules enforce specific constraints ("no bare any"). Skills teach practices ("how to write good code").
66
+
67
+ ### "Create an agent that contains all MongoDB patterns"
68
+ **Wrong:** Agent. **Right:** Skill.
69
+ **Why:** A static body of knowledge is a skill. An agent has judgment and decision-making ability. The agent *loads* the skill.
70
+
71
+ ## When to Redirect
72
+
73
+ If you classify and the user disagrees, don't argue. But explain your reasoning:
74
+
75
+ ```
76
+ I'd suggest making this a skill rather than a command because:
77
+ - It's a body of knowledge (MongoDB patterns), not a multi-step workflow
78
+ - Skills are loaded on demand by agents — an agent can use this skill
79
+ - Commands orchestrate agents through phases, which isn't needed here
80
+
81
+ But if you prefer a command, I can scaffold one. What would you like?
82
+ ```
83
+
84
+ ## Type Relationship
85
+
86
+ Understanding how types relate helps classify correctly:
87
+
88
+ ```
89
+ Commands orchestrate → Agents
90
+ Agents load → Skills
91
+ Rules constrain → All types
92
+ Evals validate → All types
93
+ ```
94
+
95
+ - A command USES agents (assigns them to phases)
96
+ - An agent LOADS skills (references them in frontmatter)
97
+ - A rule CONSTRAINS any artifact (applies as a standard)
98
+ - An eval TESTS any artifact (validates it works)
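
The relationships listed above can be written down as a small lookup table. The sketch below is a strict reading of that diagram and nothing more; `RELATIONSHIPS` and `can_reference` are illustrative names, not an aw-adk API.

```python
# Relationship map mirroring the diagram above: verb plus allowed targets.
ALL_TYPES = {"command", "agent", "skill", "rule", "eval"}

RELATIONSHIPS = {
    "command": ("orchestrates", {"agent"}),
    "agent": ("loads", {"skill"}),
    "rule": ("constrains", set(ALL_TYPES)),
    "eval": ("validates", set(ALL_TYPES)),
}


def can_reference(source: str, target: str) -> bool:
    """True if `source` may point at `target` under the map above."""
    _verb, targets = RELATIONSHIPS.get(source, ("", set()))
    return target in targets
```

Under this reading, `can_reference("agent", "skill")` holds while `can_reference("skill", "agent")` does not, which matches the idea that a skill is passive and is loaded rather than loading anything itself.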
@@ -0,0 +1,227 @@
1
+ # Writing Good Agents
2
+
3
+ An agent is an autonomous AI persona with identity, tools, and judgment. Unlike skills (passive knowledge) or rules (enforced constraints), agents reason independently, make decisions, and produce artifacts.
4
+
5
+ ## Before / After: Identity
6
+
7
+ ### Bad — thin identity
8
+
9
+ ```yaml
10
+ identity:
11
+   role: "A code reviewer"
12
+ ```
13
+
14
+ The agent has no personality, no memory model, no domain grounding. It will produce generic reviews indistinguishable from a raw LLM prompt.
15
+
16
+ ### Good — four-field identity
17
+
18
+ ```yaml
19
+ identity:
20
+   role: >
21
+     Senior backend engineer specializing in NestJS microservices
22
+     with 8+ years of production experience in multi-tenant SaaS platforms.
23
+   personality: >
24
+     Thorough but pragmatic. Flags critical issues firmly, treats style
25
+     preferences as suggestions. Explains *why* something matters, not
26
+     just *what* is wrong. Never condescending.
27
+   memory: >
28
+     Retains context about the current PR's changed files, related test
29
+     coverage, and the service's recent incident history when available.
30
+   experience: >
31
+     Has debugged N+1 query issues, auth bypass vulnerabilities, and
32
+     race conditions in event-driven architectures. Knows the difference
33
+     between a real bug and a nitpick.
34
+ ```
35
+
36
+ Why this works: The four fields constrain behavior across multiple axes. The agent knows *what* it is, *how* it communicates, *what* it remembers, and *what* patterns it recognizes from "experience."
37
+
38
+ ## Before / After: Mission
39
+
40
+ ### Bad — vague mission
41
+
42
+ ```yaml
43
+ mission: "Review code and find issues"
44
+ ```
45
+
46
+ No domain scope, no outcome definition, no boundary.
47
+
48
+ ### Good — concrete mission with domain, outcomes, scope
49
+
50
+ ```yaml
51
+ mission:
52
+   domain: "Backend NestJS services in the payments domain"
53
+   outcomes:
54
+     - "Identify security vulnerabilities (auth bypass, injection, data leakage)"
55
+     - "Flag performance issues (N+1 queries, missing indexes, unbounded loops)"
56
+     - "Verify multi-tenancy scoping (locationId from auth context, not client)"
57
+     - "Ensure error handling follows platform patterns (no empty catch, structured responses)"
58
+   scope:
59
+     includes:
60
+       - "Changed files in the current PR"
61
+       - "Test files corresponding to changed source files"
62
+     excludes:
63
+       - "Generated files (*.generated.ts, migrations)"
64
+       - "Third-party library code"
65
+       - "Style/formatting issues (handled by linter)"
66
+ ```
67
+
68
+ ## Before / After: Communication Style
69
+
70
+ ### Bad — no communication guidance
71
+
72
+ The agent dumps findings in an unstructured wall of text with no severity indicators.
73
+
74
+ ### Good — structured output contract
75
+
76
+ ```yaml
77
+ communication:
78
+   format: |
79
+     ## Review Summary
80
+     **Risk Level:** CRITICAL | HIGH | MEDIUM | LOW
81
+
82
+     ### Findings
83
+     For each finding:
84
+     - **Severity:** CRITICAL / HIGH / MEDIUM / LOW
85
+     - **File:** path/to/file.ts:lineNumber
86
+     - **Issue:** One-sentence description
87
+     - **Why:** Why this matters (security risk, data loss, performance)
88
+     - **Fix:** Concrete code suggestion
89
+
90
+     ### Verdict
91
+     BLOCK (has CRITICAL) | APPROVE WITH COMMENTS | APPROVE
92
+   rules:
93
+     - "CRITICAL findings must include BLOCK verdict"
94
+     - "Never say 'looks good' without evidence of what was checked"
95
+     - "Maximum 10 findings per review — prioritize by severity"
96
+ ```
97
+
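To make the output contract concrete, here is a minimal sketch of how a consumer might enforce two of its rules: the 10-finding cap prioritized by severity, and BLOCK whenever a CRITICAL finding is present. The `summarize` function and the finding dict shape are hypothetical. Mapping any non-critical finding to APPROVE WITH COMMENTS is an assumption, since the contract only fixes the CRITICAL case.

```python
SEVERITY_ORDER = ["CRITICAL", "HIGH", "MEDIUM", "LOW"]


def summarize(findings: list, limit: int = 10) -> dict:
    """Sort findings by severity, cap them, and derive a verdict."""
    ranked = sorted(findings, key=lambda f: SEVERITY_ORDER.index(f["severity"]))
    kept = ranked[:limit]
    if any(f["severity"] == "CRITICAL" for f in kept):
        verdict = "BLOCK"
    elif kept:
        verdict = "APPROVE WITH COMMENTS"  # assumption, not stated in the contract
    else:
        verdict = "APPROVE"
    return {"verdict": verdict, "findings": kept}
```

Because the list is sorted before it is truncated, a CRITICAL finding can never be dropped by the cap in favor of a lower-severity one.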
98
+ ## Anti-Pattern Catalog
99
+
100
+ ### 1. God-Agent (Does Everything)
101
+
102
+ **Symptom:** One agent handles code review, testing, deployment, documentation, and security analysis.
103
+
104
+ **Fix:** Split by responsibility. Each agent should have one clear mission. A code reviewer does not deploy. A security reviewer does not write docs.
105
+
106
+ **Test:** If your agent's mission has more than 4 outcomes spanning unrelated domains, it's a god-agent.
107
+
108
+ ### 2. Tool Hoarding
109
+
110
+ **Symptom:** Agent has 15 tools listed, uses 3 regularly.
111
+
112
+ **Fix:** Give agents only the tools they need for their mission. Extra tools waste context window and invite off-task behavior.
113
+
114
+ **Guideline:** 3-6 tools is typical. If an agent needs more than 8, it's probably a god-agent in disguise.
115
+
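The numeric thresholds in the first two anti-patterns lend themselves to a quick lint. The sketch below is illustrative only: `lint_agent_size` is a hypothetical helper, and a count alone cannot tell whether outcomes span unrelated domains, so the first warning is a prompt for human review, not a verdict.

```python
def lint_agent_size(outcomes: list, tools: list) -> list:
    """Return warnings based on the thresholds quoted above."""
    warnings = []
    if len(outcomes) > 4:
        # Count only; whether the outcomes are unrelated needs a human look.
        warnings.append("more than 4 outcomes: check for a god-agent")
    if len(tools) > 8:
        warnings.append("more than 8 tools: probably a god-agent in disguise")
    elif not 3 <= len(tools) <= 6:
        warnings.append("tool count outside the typical 3-6 range")
    return warnings
```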
116
+ ### 3. Missing Communication Style
117
+
118
+ **Symptom:** Agent produces output in unpredictable formats. Sometimes bullet lists, sometimes prose, sometimes JSON.
119
+
120
+ **Fix:** Define an explicit output contract. Specify the structure, severity labels, and verdict format. The consuming command or human needs to parse the output reliably.
121
+
122
+ ### 4. No Measurable Metrics
123
+
124
+ **Symptom:** Agent mission says "improve code quality" with no way to verify.
125
+
126
+ **Fix:** Define observable outcomes:
127
+ - "Flag all uses of bare `any` type" (countable)
128
+ - "Identify missing test files for new source files" (binary per file)
129
+ - "Detect N+1 query patterns in repository methods" (specific pattern)
130
+
131
+ ### 5. Generic Rules Without BLOCK/NEVER
132
+
133
+ **Symptom:** Agent instructions say "be careful with security" without specifying what triggers a block.
134
+
135
+ **Fix:** Use explicit behavioral boundaries:
136
+
137
+ ```yaml
138
+ rules:
139
+   BLOCK:
140
+     - "Hardcoded secrets (API keys, passwords, tokens)"
141
+     - "Missing auth guard on new endpoints"
142
+     - "locationId from client input instead of auth context"
143
+   NEVER:
144
+     - "Never approve a PR with failing tests"
145
+     - "Never suggest disabling TypeScript strict mode"
146
+   PREFER:
147
+     - "Prefer suggesting fixes over just flagging issues"
148
+ ```
149
+
150
+ ### 6. No Error Handling Guidance
151
+
152
+ **Symptom:** Agent crashes or produces garbage when it encounters unexpected input (empty diff, binary files, massive files).
153
+
154
+ **Fix:** Define edge case behavior:
155
+
156
+ ```yaml
157
+ edge_cases:
158
+   empty_diff: "Report 'No changes to review' and exit"
159
+   binary_files: "Skip with note: 'Binary file skipped: {path}'"
160
+   file_over_1000_lines: "Review only changed hunks, note that full file review was skipped"
161
+ ```
162
+
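The three edge cases above amount to a small dispatch over each changed file. A minimal sketch follows, assuming each file is described by a `(path, is_binary, line_count)` tuple. That tuple shape and both function names are hypothetical, chosen only to show the branching.

```python
def plan_review(path: str, is_binary: bool, line_count: int) -> str:
    """Pick a review action for one changed file."""
    if is_binary:
        return f"Binary file skipped: {path}"
    if line_count > 1000:
        return f"Review changed hunks only: {path} (full file review skipped)"
    return f"Full review: {path}"


def review_diff(files: list) -> list:
    """Apply the edge-case rules to a list of (path, is_binary, line_count)."""
    if not files:
        return ["No changes to review"]
    return [plan_review(*entry) for entry in files]
```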
163
+ ## Scope Boundaries
164
+
165
+ ### Agent vs Skill vs Command
166
+
167
+ | Dimension | Agent | Skill | Command |
168
+ |-----------|-------|-------|---------|
169
+ | **Has identity** | Yes | No | No |
170
+ | **Makes decisions** | Yes | No | Orchestrates decisions |
171
+ | **Loaded by** | Command or user | Agent or command | User directly |
172
+ | **Produces** | Findings, artifacts, verdicts | Nothing (passive reference) | End-to-end outcome |
173
+ | **Example** | Security reviewer | NestJS auth patterns | `/review-pr` |
174
+
175
+ ### Decision Guide
176
+
177
+ - **Need autonomous judgment?** → Agent
178
+ - **Need reusable knowledge?** → Skill
179
+ - **Need a multi-step pipeline?** → Command (which uses agents)
180
+ - **Need an enforceable constraint?** → Rule
181
+
182
+ ## Squad Assignment (1-9)
183
+
184
+ Squads group agents by domain for efficient coordination:
185
+
186
+ | Squad | Domain | Example Agents |
187
+ |-------|--------|----------------|
188
+ | 1 | Planning & Architecture | planner, architect |
189
+ | 2 | Implementation | coder, refactorer |
190
+ | 3 | Testing | tdd-guide, e2e-runner |
191
+ | 4 | Review & Quality | code-reviewer, security-reviewer |
192
+ | 5 | DevOps & Infrastructure | deployer, build-resolver |
193
+ | 6 | Documentation | doc-updater |
194
+ | 7 | Data & Analytics | data-reviewer |
195
+ | 8 | Frontend | ui-reviewer, a11y-checker |
196
+ | 9 | Coordination | command coordinators |
197
+
198
+ **Rules:**
199
+ - Agents in the same squad share domain context and can hand off seamlessly.
200
+ - Cross-squad communication goes through the coordinator (squad 9).
201
+ - An agent belongs to exactly one squad.
202
+
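The squad rules can be read as a routing policy: same-squad hand-offs are direct, and everything else passes through squad 9. The sketch below is one possible interpretation; `route` is a hypothetical helper, and treating messages to or from the coordinator squad as direct is an assumption.

```python
COORDINATOR_SQUAD = 9


def route(sender_squad: int, receiver_squad: int) -> list:
    """Return the squad hops a message takes from sender to receiver."""
    for squad in (sender_squad, receiver_squad):
        if not 1 <= squad <= 9:
            raise ValueError(f"unknown squad: {squad}")
    same_squad = sender_squad == receiver_squad
    involves_coordinator = COORDINATOR_SQUAD in (sender_squad, receiver_squad)
    if same_squad or involves_coordinator:
        return [sender_squad, receiver_squad]  # direct hand-off
    return [sender_squad, COORDINATOR_SQUAD, receiver_squad]
```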
203
+ ## Model Tier Selection
204
+
205
+ | Tier | Model | Use For | Cost Signal |
206
+ |------|-------|---------|-------------|
207
+ | **Coordinator** | Opus | Orchestration, judgment calls, architectural decisions, conflict resolution | High |
208
+ | **Worker** | Sonnet | Code generation, implementation, detailed review, refactoring | Medium |
209
+ | **Checker** | Haiku | Checklists, linting-style checks, simple validations, formatting | Low |
210
+
211
+ **Guidelines:**
212
+ - Coordinators (Opus) make judgment calls and resolve ambiguity. They do not write code.
213
+ - Workers (Sonnet) do the heavy lifting. Most agents are workers.
214
+ - Checkers (Haiku) handle mechanical tasks. Use when the task is deterministic and the instructions are clear enough for the smallest model.
215
+ - If a Haiku-tier agent produces inconsistent results, promote to Sonnet. If Sonnet can't handle the judgment, promote to Opus.
216
+
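The promotion guideline describes a simple ladder. As a sketch, with tier names taken from the table and a hypothetical `promote` helper:

```python
TIERS = ["haiku", "sonnet", "opus"]  # cheapest to most capable


def promote(current: str) -> str:
    """Return the next tier up, or the same tier if already at the top."""
    index = TIERS.index(current.lower())
    return TIERS[min(index + 1, len(TIERS) - 1)]
```

Starting every agent at the lowest tier that plausibly fits and calling something like `promote` only after observing inconsistent results keeps the default cost low, which is the intent of the guidelines above.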
217
+ ## Agent Quality Checklist
218
+
219
+ - [ ] Four-field identity (role, personality, memory, experience)
220
+ - [ ] Concrete mission with domain, outcomes, and scope boundaries
221
+ - [ ] 3-6 tools (no hoarding)
222
+ - [ ] Explicit output contract with structure and severity levels
223
+ - [ ] BLOCK/NEVER/PREFER behavioral rules
224
+ - [ ] Edge case handling defined
225
+ - [ ] Model tier justified (not defaulting to Opus for everything)
226
+ - [ ] Squad assignment documented
227
+ - [ ] Tested with representative inputs including edge cases
@@ -0,0 +1,258 @@
1
+ # Writing Good Commands
2
+
3
+ A command is a user-facing pipeline that orchestrates agents, skills, and tools through defined phases to produce an end-to-end outcome. Commands are what users invoke (e.g., `/review-pr`, `/implement-feature`).
4
+
5
+ ## Before / After: Command Structure
6
+
7
+ ### Bad — monolith command
8
+
9
+ ```yaml
10
+ name: review-pr
11
+ phases:
12
+   - name: "Review"
13
+     description: "Review the PR and provide feedback"
14
+     agents: [code-reviewer, security-reviewer, performance-reviewer, test-reviewer, doc-reviewer]
15
+     steps:
16
+       - "Load the PR diff"
17
+       - "Review everything"
18
+       - "Output findings"
19
+ ```
20
+
21
+ Problems: One phase does everything. No checkpoints. No skill loading. Five agents compete for context with no coordination. No error handling. The user sees nothing until the end.
22
+
23
+ ### Good — phased pipeline with checkpoints
24
+
25
+ ```yaml
26
+ name: review-pr
27
+ phases:
28
+   - name: "Context"
29
+     description: "Load PR context and determine review scope"
30
+     agent: coordinator
31
+     model: opus
32
+     steps:
33
+       - "Fetch PR diff and metadata via gh CLI"
34
+       - "Classify changed files by domain (backend, frontend, infra, test)"
35
+       - "Load relevant skills based on file types"
36
+       - "Determine which review agents are needed"
37
+     gate: "coordinator confirms scope and agent roster before proceeding"
38
+
39
+   - name: "Review"
40
+     description: "Parallel domain-specific reviews"
41
+     agents:
42
+       - code-reviewer (changed backend files)
43
+       - security-reviewer (auth/input/secret files)
44
+     model: sonnet
45
+     execution: parallel
46
+     steps:
47
+       - "Each agent reviews its assigned files"
48
+       - "Each agent produces structured findings with severity"
49
+
50
+   - name: "Synthesis"
51
+     description: "Merge findings and produce verdict"
52
+     agent: coordinator
53
+     model: opus
54
+     steps:
55
+       - "Collect findings from all reviewers"
56
+       - "Deduplicate overlapping findings"
57
+       - "Assign final verdict: BLOCK / APPROVE WITH COMMENTS / APPROVE"
58
+     checkpoint: "Present findings to user for confirmation before posting"
59
+
60
+   - name: "Publish"
61
+     description: "Post review to GitHub"
62
+     agent: coordinator
63
+     steps:
64
+       - "Format findings as PR review comments"
65
+       - "Post via gh CLI"
66
+ ```
67
+
68
+ ## Phase Design Patterns
69
+
70
+ ### Linear Pipeline
71
+
72
+ Phases execute sequentially. The output of phase N is the input to phase N+1.
73
+
74
+ ```
+ Context → Plan → Implement → Test → Review → Commit
+ ```
77
+
78
+ **Use when:** Tasks have natural ordering where later phases depend on earlier results.
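
A minimal sketch of the linear pattern, assuming each phase can be modeled as a plain function from the previous phase's output to its own. The `run_linear` helper and the stand-in phases are hypothetical illustrations, not part of any command runtime.

```python
from typing import Any, Callable

Phase = Callable[[Any], Any]


def run_linear(phases: list[tuple[str, Phase]], initial: Any) -> Any:
    """Run phases in order, feeding each phase's output into the next."""
    result = initial
    for _name, phase in phases:
        result = phase(result)
    return result


# Hypothetical stand-in phases; real phases would invoke agents and tools.
pipeline = [
    ("context", lambda pr: {"pr": pr, "files": ["api.ts", "auth.ts"]}),
    ("plan", lambda ctx: {**ctx, "steps": [f"review {f}" for f in ctx["files"]]}),
    ("implement", lambda plan: {**plan, "done": len(plan["steps"])}),
]

outcome = run_linear(pipeline, "example-pr")
```

Because each phase only sees what the previous one returned, a bad result early in the chain propagates to everything after it.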
79
+
80
+ ### Interactive Loop
81
+
82
+ A phase repeats until a condition is met, with human input between iterations.
83
+
84
+ ```
+ Draft → [checkpoint: user reviews] → Revise → [checkpoint] → ... → Approve
+ ```
87
+
88
+ **Use when:** Output quality requires human judgment (e.g., PRD drafting, design review).
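
The loop can be sketched as follows, assuming the reviewer returns `None` to approve and a feedback string otherwise. The `interactive_loop` name, the callback signatures, and the `max_rounds` cap are assumptions added for illustration.

```python
from typing import Callable, Optional


def interactive_loop(
    draft: Callable[[Optional[str], Optional[str]], str],
    review: Callable[[str], Optional[str]],
    max_rounds: int = 5,
) -> str:
    """Redraft until the reviewer approves (returns None) or rounds run out."""
    output = draft(None, None)  # first draft: no previous output, no feedback
    for _ in range(max_rounds):
        feedback = review(output)  # checkpoint: a human inspects the output
        if feedback is None:
            return output
        output = draft(output, feedback)
    raise RuntimeError("no approval after max_rounds revisions")


# Scripted reviewer standing in for a human: one revision request, then approval.
scripted = iter(["add rollback plan", None])
final = interactive_loop(
    draft=lambda prev, fb: "v1" if prev is None else f"{prev} + {fb}",
    review=lambda out: next(scripted),
)
```

Capping the number of rounds is a design choice in this sketch; it keeps a loop from running indefinitely when the human and the agent cannot converge.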
89
+
90
+ ### Parallel Fan-Out
91
+
92
+ Multiple agents work simultaneously on independent subtasks, then results merge.
93
+
94
+ ```
+ Context → [security-review | code-review | perf-review] → Synthesis
+ ```
97
+
98
+ **Use when:** Subtasks are independent and can run concurrently. Always follow with a synthesis phase.
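
A rough sketch of fan-out followed by synthesis, assuming each reviewer is a function that returns a list of finding dicts with `file` and `issue` keys. The `fan_out` and `synthesize` helpers and the finding shape are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Reviewer = Callable[[list[str]], list[dict]]


def fan_out(reviewers: dict[str, Reviewer], files: list[str]) -> dict[str, list[dict]]:
    """Run every reviewer concurrently and collect results by reviewer name."""
    with ThreadPoolExecutor(max_workers=len(reviewers)) as pool:
        futures = {name: pool.submit(fn, files) for name, fn in reviewers.items()}
        return {name: fut.result() for name, fut in futures.items()}


def synthesize(results: dict[str, list[dict]]) -> list[dict]:
    """Merge findings, deduplicating those that share the same file and issue."""
    merged: dict[tuple, dict] = {}
    for name, findings in results.items():
        for finding in findings:
            key = (finding["file"], finding["issue"])
            entry = merged.setdefault(key, {**finding, "reported_by": []})
            entry["reported_by"].append(name)
    return list(merged.values())


# Stand-in reviewers with one overlapping finding.
reviewers = {
    "security-review": lambda files: [{"file": "auth.ts", "issue": "unvalidated input"}],
    "code-review": lambda files: [
        {"file": "auth.ts", "issue": "unvalidated input"},
        {"file": "api.ts", "issue": "dead code"},
    ],
}
findings = synthesize(fan_out(reviewers, ["auth.ts", "api.ts"]))
```

The synthesis step is what turns overlapping reports from independent reviewers into a single list.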
99
+
100
+ ### MCP-Driven
101
+
102
+ Phases interact with external systems (GitHub, Jenkins, Grafana) via MCP tools.
103
+
104
+ ```
+ Fetch PR → Review → Post Comments → Trigger Build → Monitor
+ ```
107
+
108
+ **Use when:** The command integrates with external services and needs to react to their responses.
109
+
110
+ ## Anti-Pattern Catalog
111
+
112
+ ### 1. No Checkpoints
113
+
114
+ **Symptom:** Command runs for 10 minutes and produces the wrong output; the user has no opportunity to correct course.
115
+
116
+ **Fix:** Add at least 1 human checkpoint. Place it after the most consequential decision (usually after planning or before publishing).
117
+
118
+ **Rule of thumb:** Minimum 1 checkpoint, maximum 3. More than 3 creates friction. Zero creates risk.
119
+
120
+ ### 2. No Skill Loading Gate
121
+
122
+ **Symptom:** Agents start working without loading relevant skills. They use generic knowledge instead of your team's patterns.
123
+
124
+ **Fix:** Add a Context phase that classifies the task and loads appropriate skills before agents begin work.
125
+
126
+ ```yaml
+ # BAD: agents start cold
+ phases:
+   - name: "Review"
+     agents: [code-reviewer]
+
+ # GOOD: context phase loads skills first
+ phases:
+   - name: "Context"
+     steps:
+       - "Classify changed files by domain"
+       - "Load skills: nestjs-patterns, mongoose-queries (based on file types)"
+   - name: "Review"
+     agents: [code-reviewer] # now has loaded skills in context
+ ```
141
+
142
+ ### 3. No Error Handling
143
+
144
+ **Symptom:** If `gh pr diff` fails (network error, auth issue), the command crashes with no recovery.
145
+
146
+ **Fix:** Define fallback behavior for each phase:
147
+
148
+ ```yaml
+ error_handling:
+   network_failure: "Retry once, then report error to user with diagnostic info"
+   empty_diff: "Report 'No changes found' and exit gracefully"
+   agent_timeout: "Use partial results, note incomplete review in output"
+ ```
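
The `network_failure` and `empty_diff` fallbacks above could behave roughly like this. It is a sketch under assumptions: `fetch_diff`, its `fetch` callable, and the returned status dicts are hypothetical, and network failures are assumed to surface as `OSError` subclasses such as `ConnectionError`.

```python
from typing import Callable


def fetch_diff(fetch: Callable[[], str], retries: int = 1) -> dict:
    """Fetch a diff, retrying on network errors and handling an empty result."""
    last_error = None
    for _attempt in range(retries + 1):
        try:
            diff = fetch()
        except OSError as exc:  # ConnectionError and TimeoutError are OSError subclasses
            last_error = exc
            continue
        if not diff.strip():
            return {"status": "empty", "message": "No changes found"}
        return {"status": "ok", "diff": diff}
    return {
        "status": "error",
        "message": f"fetch failed after {retries + 1} attempts: {last_error}",
    }


# Stand-in fetcher that fails once, then succeeds.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("network down")
    return "diff --git a/x b/x"


result = fetch_diff(flaky)
```

Returning a status dict instead of raising lets the calling phase decide whether to stop, report, or continue with partial results.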
154
+
155
+ ### 4. Silent Execution
156
+
157
+ **Symptom:** User invokes command and sees nothing for 5 minutes.
158
+
159
+ **Fix:** Each phase should emit a progress signal:
160
+
161
+ ```yaml
+ phases:
+   - name: "Context"
+     on_start: "Fetching PR #{{pr_number}} context..."
+     on_complete: "Scope: {{file_count}} files across {{domains}} domains"
+   - name: "Review"
+     on_start: "Running {{agent_count}} reviewers in parallel..."
+     on_complete: "Found {{finding_count}} findings ({{critical_count}} critical)"
+ ```
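
One way to read the `{{name}}` placeholders above is as template variables filled in from phase state. The guide does not say how they are resolved, so the `render` helper below is a hypothetical sketch, and the PR number is a made-up value.

```python
import re


def render(template: str, values: dict) -> str:
    """Replace {{name}} placeholders with values; leave unknown names untouched."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        return str(values[key]) if key in values else match.group(0)

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


# 128 is an illustrative PR number, not taken from the source.
message = render("Fetching PR #{{pr_number}} context...", {"pr_number": 128})
```

Leaving unknown placeholders in place, rather than raising, makes a missing variable visible in the progress message without aborting the phase.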
170
+
171
+ ### 5. Too Many Agents
172
+
173
+ **Symptom:** Command uses 7 agents, each invoked once. Context window is bloated with identity setup for agents that do minimal work.
174
+
175
+ **Fix:** Follow the agent roster rules below. If an agent does one small task, consider making it a step within another agent's workflow instead.
176
+
177
+ ### 6. No Phase Gates
178
+
179
+ **Symptom:** Phase 2 proceeds even when Phase 1 produced garbage (e.g., empty plan, failed fetch).
180
+
181
+ **Fix:** Add gates between phases:
182
+
183
+ ```yaml
+ gate: "Plan must contain at least 3 implementation steps and a test strategy"
+ ```
186
+
187
+ If the gate fails, the command stops and reports to the user rather than wasting tokens on doomed downstream phases.
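
As a sketch, the example gate could be enforced by a check that raises before the next phase starts. The `Plan` shape, `GateFailed`, and `plan_gate` are hypothetical names; in practice a gate may be evaluated by the coordinating agent rather than by code.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    steps: list[str] = field(default_factory=list)
    test_strategy: str = ""


class GateFailed(Exception):
    """Raised when a phase's output does not meet the gate condition."""


def plan_gate(plan: Plan) -> None:
    """Stop the command unless the plan has >= 3 steps and a test strategy."""
    problems = []
    if len(plan.steps) < 3:
        problems.append(f"need at least 3 implementation steps, got {len(plan.steps)}")
    if not plan.test_strategy.strip():
        problems.append("missing test strategy")
    if problems:
        raise GateFailed("; ".join(problems))


# A plan that passes the gate; an empty Plan() would raise GateFailed.
plan_gate(Plan(steps=["add endpoint", "wire service", "update docs"], test_strategy="unit tests"))
```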
188
+
189
+ ## Agent Roster Rules
190
+
191
+ ### Minimize Unique Agents
192
+
193
+ Every unique agent added to a command costs context window (identity, mission, tools, rules). Use the minimum set.
194
+
195
+ | Command Complexity | Recommended Agent Count |
196
+ |-------------------|------------------------|
197
+ | Simple (single-domain) | 1-2 |
198
+ | Medium (cross-domain) | 2-3 |
199
+ | Complex (full pipeline) | 3-5 |
200
+
201
+ ### Max 3 Uses Per Agent
202
+
203
+ If an agent is used more than 3 times in a command, it's doing too much. Either:
204
+ - Merge those phases into one agent invocation
205
+ - Split the agent's responsibilities
206
+
207
+ ### Coordinator Is Opus
208
+
209
+ The coordinating agent (phase routing, synthesis, conflict resolution) should run on Opus. Worker agents run on Sonnet. Checklist agents run on Haiku.
210
+
211
+ ```yaml
+ # GOOD: tiered model assignment
+ agents:
+   coordinator: { model: opus, uses: [context, synthesis, publish] }
+   code-reviewer: { model: sonnet, uses: [review] }
+   lint-checker: { model: haiku, uses: [formatting-check] }
+ ```
218
+
219
+ ## Human Checkpoint Guidance
220
+
221
+ ### Minimum: 1 Checkpoint
222
+
223
+ Every command that produces artifacts visible to others (PR comments, commits, messages) must have at least one checkpoint before publishing.
224
+
225
+ ### Maximum: 3 Checkpoints
226
+
227
+ More than 3 checkpoints turn an automated command into a manual process. If you need that much human oversight, the command is not well-defined enough.
228
+
229
+ ### Where to Place Checkpoints
230
+
231
+ | Placement | When |
232
+ |-----------|------|
233
+ | After planning | When the plan determines all downstream work |
234
+ | After review/synthesis | When findings will be published externally |
235
+ | Before irreversible or externally visible actions | Commits, deployments, PR comments |
236
+
237
+ ### Checkpoint Format
238
+
239
+ ```yaml
+ checkpoint:
+   display: "Summary of what was done and what will happen next"
+   options:
+     - "proceed"  # continue to next phase
+     - "revise"   # re-run current phase with feedback
+     - "abort"    # stop command, preserve artifacts so far
+ ```
247
+
248
+ ## Command Quality Checklist
249
+
250
+ - [ ] At least 2 phases (context + execution)
251
+ - [ ] Skill loading gate in first phase
252
+ - [ ] 1-3 human checkpoints at consequential decision points
253
+ - [ ] Progress signals on every phase (on_start, on_complete)
254
+ - [ ] Error handling with fallbacks for network, empty input, timeouts
255
+ - [ ] Agent count justified (not exceeding 5 for complex commands)
256
+ - [ ] Coordinator on Opus, workers on Sonnet, checkers on Haiku
257
+ - [ ] Gates between phases to prevent garbage propagation
258
+ - [ ] Each agent used 1-3 times (not more)
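
Some of the checklist items are mechanical enough to lint. The sketch below assumes a command definition has already been parsed into a dict shaped like the YAML examples in this guide (`phases`, `agent`, `agents`, `checkpoint`). The `check_command` helper is hypothetical and covers only phase count, checkpoint count, agent count, and per-agent uses.

```python
def check_command(command: dict) -> list[str]:
    """Return checklist violations for a parsed command definition."""
    problems = []
    phases = command.get("phases", [])

    if len(phases) < 2:
        problems.append("fewer than 2 phases")

    checkpoints = sum(1 for phase in phases if "checkpoint" in phase)
    if not 1 <= checkpoints <= 3:
        problems.append(f"{checkpoints} checkpoints, expected 1-3")

    # Count how many phases each agent appears in.
    uses: dict[str, int] = {}
    for phase in phases:
        names = list(phase.get("agents", []))
        if "agent" in phase:
            names.append(phase["agent"])
        for name in names:
            uses[name] = uses.get(name, 0) + 1

    if len(uses) > 5:
        problems.append(f"{len(uses)} unique agents, expected at most 5")
    for name, count in uses.items():
        if count > 3:
            problems.append(f"agent {name} used {count} times, expected at most 3")
    return problems


# Shaped like the monolith example: one phase, five agents, no checkpoint.
monolith = {
    "phases": [
        {
            "name": "Review",
            "agents": [
                "code-reviewer",
                "security-reviewer",
                "performance-reviewer",
                "test-reviewer",
                "doc-reviewer",
            ],
        }
    ]
}
monolith_problems = check_command(monolith)
```

Items such as gate quality or whether a checkpoint sits at a consequential decision still need human review; a lint of this kind only catches the countable rules.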