aw-ecc 1.4.32 → 1.4.48
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude-plugin/plugin.json +1 -1
- package/.cursor/INSTALL.md +7 -5
- package/.cursor/hooks/adapter.js +41 -4
- package/.cursor/hooks/after-agent-response.js +62 -0
- package/.cursor/hooks/before-submit-prompt.js +7 -1
- package/.cursor/hooks/post-tool-use-failure.js +21 -0
- package/.cursor/hooks/post-tool-use.js +39 -0
- package/.cursor/hooks/shared/aw-phase-definitions.js +53 -0
- package/.cursor/hooks/shared/aw-phase-runner.js +3 -1
- package/.cursor/hooks/subagent-start.js +22 -4
- package/.cursor/hooks/subagent-stop.js +18 -1
- package/.cursor/hooks.json +23 -2
- package/.opencode/package.json +1 -1
- package/AGENTS.md +3 -3
- package/README.md +5 -5
- package/commands/adk.md +52 -0
- package/commands/build.md +22 -9
- package/commands/deploy.md +12 -0
- package/commands/execute.md +9 -0
- package/commands/feature.md +333 -0
- package/commands/investigate.md +18 -5
- package/commands/plan.md +23 -9
- package/commands/publish.md +65 -0
- package/commands/review.md +12 -0
- package/commands/ship.md +12 -0
- package/commands/test.md +12 -0
- package/commands/verify.md +9 -0
- package/hooks/hooks.json +36 -0
- package/manifests/install-components.json +8 -0
- package/manifests/install-modules.json +83 -0
- package/manifests/install-profiles.json +7 -0
- package/package.json +2 -2
- package/scripts/ci/validate-rules.js +51 -0
- package/scripts/cursor-aw-home/hooks.json +23 -2
- package/scripts/cursor-aw-hooks/adapter.js +41 -4
- package/scripts/cursor-aw-hooks/before-submit-prompt.js +7 -1
- package/scripts/hooks/aw-usage-commit-created.js +32 -0
- package/scripts/hooks/aw-usage-post-tool-use-failure.js +56 -0
- package/scripts/hooks/aw-usage-post-tool-use.js +242 -0
- package/scripts/hooks/aw-usage-prompt-submit.js +112 -0
- package/scripts/hooks/aw-usage-session-start.js +48 -0
- package/scripts/hooks/aw-usage-stop.js +182 -0
- package/scripts/hooks/aw-usage-telemetry-send.js +84 -0
- package/scripts/hooks/cost-tracker.js +3 -23
- package/scripts/hooks/shared/aw-phase-definitions.js +53 -0
- package/scripts/hooks/shared/aw-phase-runner.js +3 -1
- package/scripts/lib/aw-hook-contract.js +2 -2
- package/scripts/lib/aw-pricing.js +306 -0
- package/scripts/lib/aw-usage-telemetry.js +472 -0
- package/scripts/lib/codex-hook-config.js +8 -8
- package/scripts/lib/cursor-hook-config.js +25 -10
- package/scripts/lib/install-targets/cursor-project.js +3 -0
- package/scripts/lib/install-targets/helpers.js +20 -3
- package/skills/aw-adk/SKILL.md +317 -0
- package/skills/aw-adk/agents/analyzer.md +113 -0
- package/skills/aw-adk/agents/comparator.md +113 -0
- package/skills/aw-adk/agents/grader.md +115 -0
- package/skills/aw-adk/assets/eval_review.html +76 -0
- package/skills/aw-adk/eval-viewer/generate_review.py +164 -0
- package/skills/aw-adk/eval-viewer/viewer.html +181 -0
- package/skills/aw-adk/evals/eval-colocated-placement.md +84 -0
- package/skills/aw-adk/evals/eval-create-agent.md +90 -0
- package/skills/aw-adk/evals/eval-create-command.md +98 -0
- package/skills/aw-adk/evals/eval-create-eval.md +89 -0
- package/skills/aw-adk/evals/eval-create-rule.md +99 -0
- package/skills/aw-adk/evals/eval-create-skill.md +97 -0
- package/skills/aw-adk/evals/eval-delete-agent.md +79 -0
- package/skills/aw-adk/evals/eval-delete-command.md +89 -0
- package/skills/aw-adk/evals/eval-delete-rule.md +86 -0
- package/skills/aw-adk/evals/eval-delete-skill.md +90 -0
- package/skills/aw-adk/evals/eval-meta-eval-coverage.md +78 -0
- package/skills/aw-adk/evals/eval-meta-eval-determinism.md +81 -0
- package/skills/aw-adk/evals/eval-meta-eval-false-pass.md +81 -0
- package/skills/aw-adk/evals/eval-score-accuracy.md +95 -0
- package/skills/aw-adk/evals/eval-type-redirect.md +68 -0
- package/skills/aw-adk/evals/evals.json +96 -0
- package/skills/aw-adk/references/artifact-wiring.md +162 -0
- package/skills/aw-adk/references/cross-ide-mapping.md +71 -0
- package/skills/aw-adk/references/eval-placement-guide.md +183 -0
- package/skills/aw-adk/references/external-resources.md +75 -0
- package/skills/aw-adk/references/getting-started.md +66 -0
- package/skills/aw-adk/references/registry-structure.md +152 -0
- package/skills/aw-adk/references/rubric-agent.md +36 -0
- package/skills/aw-adk/references/rubric-command.md +36 -0
- package/skills/aw-adk/references/rubric-eval.md +36 -0
- package/skills/aw-adk/references/rubric-meta-eval.md +132 -0
- package/skills/aw-adk/references/rubric-rule.md +36 -0
- package/skills/aw-adk/references/rubric-skill.md +36 -0
- package/skills/aw-adk/references/schemas.md +222 -0
- package/skills/aw-adk/references/template-agent.md +251 -0
- package/skills/aw-adk/references/template-command.md +279 -0
- package/skills/aw-adk/references/template-eval.md +176 -0
- package/skills/aw-adk/references/template-rule.md +119 -0
- package/skills/aw-adk/references/template-skill.md +123 -0
- package/skills/aw-adk/references/type-classifier.md +98 -0
- package/skills/aw-adk/references/writing-good-agents.md +227 -0
- package/skills/aw-adk/references/writing-good-commands.md +258 -0
- package/skills/aw-adk/references/writing-good-evals.md +271 -0
- package/skills/aw-adk/references/writing-good-rules.md +214 -0
- package/skills/aw-adk/references/writing-good-skills.md +159 -0
- package/skills/aw-adk/scripts/aggregate-benchmark.py +190 -0
- package/skills/aw-adk/scripts/lint-artifact.sh +211 -0
- package/skills/aw-adk/scripts/score-artifact.sh +179 -0
- package/skills/aw-adk/scripts/trigger-eval.py +192 -0
- package/skills/aw-build/SKILL.md +19 -2
- package/skills/aw-deploy/SKILL.md +65 -3
- package/skills/aw-design/SKILL.md +156 -0
- package/skills/aw-design/references/highrise-tokens.md +394 -0
- package/skills/aw-design/references/micro-interactions.md +76 -0
- package/skills/aw-design/references/prompt-template.md +160 -0
- package/skills/aw-design/references/quality-checklist.md +70 -0
- package/skills/aw-design/references/self-review.md +497 -0
- package/skills/aw-design/references/stitch-workflow.md +127 -0
- package/skills/aw-feature/SKILL.md +293 -0
- package/skills/aw-investigate/SKILL.md +17 -0
- package/skills/aw-plan/SKILL.md +34 -3
- package/skills/aw-publish/SKILL.md +300 -0
- package/skills/aw-publish/evals/eval-confirmation-gate.md +60 -0
- package/skills/aw-publish/evals/eval-intent-detection.md +111 -0
- package/skills/aw-publish/evals/eval-push-modes.md +67 -0
- package/skills/aw-publish/evals/eval-rules-push.md +60 -0
- package/skills/aw-publish/evals/evals.json +29 -0
- package/skills/aw-publish/references/push-modes.md +38 -0
- package/skills/aw-review/SKILL.md +88 -9
- package/skills/aw-rules-review/SKILL.md +124 -0
- package/skills/aw-rules-review/agents/openai.yaml +3 -0
- package/skills/aw-rules-review/scripts/generate-review-template.mjs +323 -0
- package/skills/aw-ship/SKILL.md +16 -0
- package/skills/aw-spec/SKILL.md +15 -0
- package/skills/aw-tasks/SKILL.md +15 -0
- package/skills/aw-test/SKILL.md +16 -0
- package/skills/aw-yolo/SKILL.md +4 -0
- package/skills/diagnose/SKILL.md +121 -0
- package/skills/diagnose/scripts/hitl-loop.template.sh +41 -0
- package/skills/finish-only-when-green/SKILL.md +265 -0
- package/skills/grill-me/SKILL.md +24 -0
- package/skills/grill-with-docs/SKILL.md +92 -0
- package/skills/grill-with-docs/adr-format.md +47 -0
- package/skills/grill-with-docs/context-format.md +67 -0
- package/skills/improve-codebase-architecture/SKILL.md +75 -0
- package/skills/improve-codebase-architecture/deepening.md +37 -0
- package/skills/improve-codebase-architecture/interface-design.md +44 -0
- package/skills/improve-codebase-architecture/language.md +53 -0
- package/skills/local-ghl-setup-from-screenshot/SKILL.md +538 -0
- package/skills/tdd/SKILL.md +115 -0
- package/skills/tdd/deep-modules.md +33 -0
- package/skills/tdd/interface-design.md +31 -0
- package/skills/tdd/mocking.md +59 -0
- package/skills/tdd/refactoring.md +10 -0
- package/skills/tdd/tests.md +61 -0
- package/skills/to-issues/SKILL.md +62 -0
- package/skills/to-prd/SKILL.md +75 -0
- package/skills/using-aw-skills/SKILL.md +170 -237
- package/skills/using-aw-skills/hooks/session-start.sh +11 -41
- package/skills/zoom-out/SKILL.md +24 -0
- package/.codex/hooks/aw-post-tool-use.sh +0 -6
- package/.codex/hooks/aw-pre-tool-use.sh +0 -6
- package/.codex/hooks/aw-session-start.sh +0 -25
- package/.codex/hooks/aw-stop.sh +0 -6
- package/.codex/hooks/aw-user-prompt-submit.sh +0 -10
- package/.codex/hooks.json +0 -62
- package/.cursor/rules/common-agents.md +0 -53
- package/.cursor/rules/common-aw-routing.md +0 -43
- package/.cursor/rules/common-coding-style.md +0 -52
- package/.cursor/rules/common-development-workflow.md +0 -33
- package/.cursor/rules/common-git-workflow.md +0 -28
- package/.cursor/rules/common-hooks.md +0 -34
- package/.cursor/rules/common-patterns.md +0 -35
- package/.cursor/rules/common-performance.md +0 -59
- package/.cursor/rules/common-security.md +0 -33
- package/.cursor/rules/common-testing.md +0 -33
- package/.cursor/skills/api-and-interface-design/SKILL.md +0 -75
- package/.cursor/skills/article-writing/SKILL.md +0 -85
- package/.cursor/skills/aw-brainstorm/SKILL.md +0 -115
- package/.cursor/skills/aw-build/SKILL.md +0 -152
- package/.cursor/skills/aw-build/evals/build-stage-cases.json +0 -28
- package/.cursor/skills/aw-debug/SKILL.md +0 -49
- package/.cursor/skills/aw-deploy/SKILL.md +0 -101
- package/.cursor/skills/aw-deploy/evals/deploy-stage-cases.json +0 -32
- package/.cursor/skills/aw-execute/SKILL.md +0 -47
- package/.cursor/skills/aw-execute/references/mode-code.md +0 -47
- package/.cursor/skills/aw-execute/references/mode-docs.md +0 -28
- package/.cursor/skills/aw-execute/references/mode-infra.md +0 -44
- package/.cursor/skills/aw-execute/references/mode-migration.md +0 -58
- package/.cursor/skills/aw-execute/references/worker-implementer.md +0 -26
- package/.cursor/skills/aw-execute/references/worker-parallel-worker.md +0 -23
- package/.cursor/skills/aw-execute/references/worker-quality-reviewer.md +0 -23
- package/.cursor/skills/aw-execute/references/worker-spec-reviewer.md +0 -23
- package/.cursor/skills/aw-execute/scripts/build-worker-bundle.js +0 -229
- package/.cursor/skills/aw-finish/SKILL.md +0 -111
- package/.cursor/skills/aw-investigate/SKILL.md +0 -109
- package/.cursor/skills/aw-plan/SKILL.md +0 -368
- package/.cursor/skills/aw-prepare/SKILL.md +0 -118
- package/.cursor/skills/aw-review/SKILL.md +0 -118
- package/.cursor/skills/aw-ship/SKILL.md +0 -115
- package/.cursor/skills/aw-spec/SKILL.md +0 -104
- package/.cursor/skills/aw-tasks/SKILL.md +0 -138
- package/.cursor/skills/aw-test/SKILL.md +0 -118
- package/.cursor/skills/aw-verify/SKILL.md +0 -51
- package/.cursor/skills/aw-yolo/SKILL.md +0 -111
- package/.cursor/skills/browser-testing-with-devtools/SKILL.md +0 -81
- package/.cursor/skills/bun-runtime/SKILL.md +0 -84
- package/.cursor/skills/ci-cd-and-automation/SKILL.md +0 -71
- package/.cursor/skills/code-simplification/SKILL.md +0 -74
- package/.cursor/skills/content-engine/SKILL.md +0 -88
- package/.cursor/skills/context-engineering/SKILL.md +0 -74
- package/.cursor/skills/deprecation-and-migration/SKILL.md +0 -75
- package/.cursor/skills/documentation-and-adrs/SKILL.md +0 -75
- package/.cursor/skills/documentation-lookup/SKILL.md +0 -90
- package/.cursor/skills/frontend-slides/SKILL.md +0 -184
- package/.cursor/skills/frontend-slides/STYLE_PRESETS.md +0 -330
- package/.cursor/skills/frontend-ui-engineering/SKILL.md +0 -68
- package/.cursor/skills/git-workflow-and-versioning/SKILL.md +0 -75
- package/.cursor/skills/idea-refine/SKILL.md +0 -84
- package/.cursor/skills/incremental-implementation/SKILL.md +0 -75
- package/.cursor/skills/investor-materials/SKILL.md +0 -96
- package/.cursor/skills/investor-outreach/SKILL.md +0 -76
- package/.cursor/skills/market-research/SKILL.md +0 -75
- package/.cursor/skills/mcp-server-patterns/SKILL.md +0 -67
- package/.cursor/skills/nextjs-turbopack/SKILL.md +0 -44
- package/.cursor/skills/performance-optimization/SKILL.md +0 -77
- package/.cursor/skills/security-and-hardening/SKILL.md +0 -70
- package/.cursor/skills/using-aw-skills/SKILL.md +0 -290
- package/.cursor/skills/using-aw-skills/evals/skill-trigger-cases.tsv +0 -25
- package/.cursor/skills/using-aw-skills/evals/test-skill-triggers.sh +0 -171
- package/.cursor/skills/using-aw-skills/hooks/hooks.json +0 -9
- package/.cursor/skills/using-aw-skills/hooks/session-start.sh +0 -67
- package/.cursor/skills/using-platform-skills/SKILL.md +0 -163
- package/.cursor/skills/using-platform-skills/evals/platform-selection-cases.json +0 -52
- /package/.cursor/rules/{golang-coding-style.md → golang-coding-style.mdc} +0 -0
- /package/.cursor/rules/{golang-hooks.md → golang-hooks.mdc} +0 -0
- /package/.cursor/rules/{golang-patterns.md → golang-patterns.mdc} +0 -0
- /package/.cursor/rules/{golang-security.md → golang-security.mdc} +0 -0
- /package/.cursor/rules/{golang-testing.md → golang-testing.mdc} +0 -0
- /package/.cursor/rules/{kotlin-coding-style.md → kotlin-coding-style.mdc} +0 -0
- /package/.cursor/rules/{kotlin-hooks.md → kotlin-hooks.mdc} +0 -0
- /package/.cursor/rules/{kotlin-patterns.md → kotlin-patterns.mdc} +0 -0
- /package/.cursor/rules/{kotlin-security.md → kotlin-security.mdc} +0 -0
- /package/.cursor/rules/{kotlin-testing.md → kotlin-testing.mdc} +0 -0
- /package/.cursor/rules/{php-coding-style.md → php-coding-style.mdc} +0 -0
- /package/.cursor/rules/{php-hooks.md → php-hooks.mdc} +0 -0
- /package/.cursor/rules/{php-patterns.md → php-patterns.mdc} +0 -0
- /package/.cursor/rules/{php-security.md → php-security.mdc} +0 -0
- /package/.cursor/rules/{php-testing.md → php-testing.mdc} +0 -0
- /package/.cursor/rules/{python-coding-style.md → python-coding-style.mdc} +0 -0
- /package/.cursor/rules/{python-hooks.md → python-hooks.mdc} +0 -0
- /package/.cursor/rules/{python-patterns.md → python-patterns.mdc} +0 -0
- /package/.cursor/rules/{python-security.md → python-security.mdc} +0 -0
- /package/.cursor/rules/{python-testing.md → python-testing.mdc} +0 -0
- /package/.cursor/rules/{swift-coding-style.md → swift-coding-style.mdc} +0 -0
- /package/.cursor/rules/{swift-hooks.md → swift-hooks.mdc} +0 -0
- /package/.cursor/rules/{swift-patterns.md → swift-patterns.mdc} +0 -0
- /package/.cursor/rules/{swift-security.md → swift-security.mdc} +0 -0
- /package/.cursor/rules/{swift-testing.md → swift-testing.mdc} +0 -0
- /package/.cursor/rules/{typescript-coding-style.md → typescript-coding-style.mdc} +0 -0
- /package/.cursor/rules/{typescript-hooks.md → typescript-hooks.mdc} +0 -0
- /package/.cursor/rules/{typescript-patterns.md → typescript-patterns.mdc} +0 -0
- /package/.cursor/rules/{typescript-security.md → typescript-security.mdc} +0 -0
- /package/.cursor/rules/{typescript-testing.md → typescript-testing.mdc} +0 -0
@@ -0,0 +1,98 @@

# CASRE Type Classifier

Decision tree for classifying user requests into the correct artifact type. Use this before any ADK work to prevent misclassification.

## Decision Tree

```
What does the user want?
│
├── "I want to define a standard/constraint/rule"
│   └── RULE — An enforceable constraint with WRONG/RIGHT examples
│
├── "I want to test/validate an existing artifact"
│   └── EVAL — Scenarios that verify an artifact works correctly
│
├── "I want a multi-step workflow that orchestrates agents"
│   └── COMMAND — A pipeline with phases, agent assignments, checkpoints
│
├── "I want a persona that makes decisions and uses tools"
│   └── AGENT — Has identity, judgment, model tier, and skills
│
├── "I want reusable knowledge, patterns, or checklists"
│   └── SKILL — Static knowledge loaded on demand
│
└── Ambiguous?
    → Ask: "Does this involve multiple phases with different agents,
            or is it a single body of knowledge?"
    → Ask: "Does this enforce a standard, or teach a practice?"
```

## Quick Classifier Table

| Signal | Type | Reasoning |
|---|---|---|
| "best practices for X" | Skill | Static knowledge, reference material |
| "review checklist for X" | Skill | Checklist = static knowledge |
| "pipeline from spec to deploy" | Command | Multi-phase workflow |
| "automate the ship process" | Command | Orchestration of agents |
| "expert in database optimization" | Agent | Persona with judgment |
| "reviewer that checks security" | Agent | Decision-making persona |
| "no hardcoded secrets allowed" | Rule | Enforceable constraint |
| "all PRs must have tests" | Rule | Standard with severity |
| "verify my agent works correctly" | Eval | Testing existing artifact |
| "add test cases for this skill" | Eval | Validation scenarios |
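
The signal table can be read as a lookup from request phrasing to artifact type. The following is a minimal Python sketch, assuming plain substring matching is good enough for a first pass. The names `SIGNALS` and `classify` are hypothetical and are not part of the aw-adk skill; a `None` result corresponds to the "Ambiguous?" branch, where a clarifying question is asked instead of guessing.

```python
from typing import Optional

# Hypothetical signal phrases distilled from the Quick Classifier Table.
# Order matters: earlier entries win when several phrases match.
SIGNALS = [
    ("best practices", "skill"),
    ("checklist", "skill"),
    ("pipeline", "command"),
    ("automate", "command"),
    ("expert in", "agent"),
    ("reviewer that", "agent"),
    ("allowed", "rule"),
    ("must", "rule"),
    ("verify", "eval"),
    ("test cases", "eval"),
]


def classify(request: str) -> Optional[str]:
    """Return the artifact type of the first matching signal, or None if ambiguous."""
    text = request.lower()
    for phrase, artifact_type in SIGNALS:
        if phrase in text:
            return artifact_type
    return None
```

Because "best practices" is checked before any command signal, a request such as "Create a command for X best practices" resolves to a skill, which matches the first misclassification case the document warns about.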

## Common Misclassifications

These are the most frequent mistakes. Catch them before scaffolding:

### "Create a command for X best practices"
**Wrong:** Command. **Right:** Skill.
**Why:** "Best practices" is static knowledge, not a multi-phase workflow. Commands orchestrate agents through pipeline phases.

### "Create a command that reviews X"
**Usually wrong:** Command. **Usually right:** Skill (review checklist).
**Exception:** If it's a multi-phase pipeline (analyze → review → report → remediate), then it IS a command.
**Ask:** "Is this a review checklist, or a multi-phase review pipeline?"

### "Create a command that acts as an X expert"
**Wrong:** Command. **Right:** Agent.
**Why:** "Acts as" = persona. Commands don't have identity. Agents do.

### "Create a rule for how to write good code"
**Wrong:** Rule. **Right:** Skill.
**Why:** Rules enforce specific constraints ("no bare any"). Skills teach practices ("how to write good code").

### "Create an agent that contains all MongoDB patterns"
**Wrong:** Agent. **Right:** Skill.
**Why:** A static body of knowledge is a skill. An agent has judgment and decision-making ability. The agent *loads* the skill.

## When to Redirect

If you classify and the user disagrees, don't argue. But explain your reasoning:

```
I'd suggest making this a skill rather than a command because:
- It's a body of knowledge (MongoDB patterns), not a multi-step workflow
- Skills are loaded on demand by agents — an agent can use this skill
- Commands orchestrate agents through phases, which isn't needed here

But if you prefer a command, I can scaffold one. What would you like?
```

## Type Relationship

Understanding how types relate helps classify correctly:

```
Commands orchestrate → Agents
Agents load → Skills
Rules constrain → All types
Evals validate → All types
```

- A command USES agents (assigns them to phases)
- An agent LOADS skills (references them in frontmatter)
- A rule CONSTRAINS any artifact (applies as a standard)
- An eval TESTS any artifact (validates it works)

@@ -0,0 +1,227 @@

# Writing Good Agents

An agent is an autonomous AI persona with identity, tools, and judgment. Unlike skills (passive knowledge) or rules (enforced constraints), agents reason independently, make decisions, and produce artifacts.

## Before / After: Identity

### Bad — thin identity

```yaml
identity:
  role: "A code reviewer"
```

The agent has no personality, no memory model, no domain grounding. It will produce generic reviews indistinguishable from a raw LLM prompt.

### Good — four-field identity

```yaml
identity:
  role: >
    Senior backend engineer specializing in NestJS microservices
    with 8+ years of production experience in multi-tenant SaaS platforms.
  personality: >
    Thorough but pragmatic. Flags critical issues firmly, treats style
    preferences as suggestions. Explains *why* something matters, not
    just *what* is wrong. Never condescending.
  memory: >
    Retains context about the current PR's changed files, related test
    coverage, and the service's recent incident history when available.
  experience: >
    Has debugged N+1 query issues, auth bypass vulnerabilities, and
    race conditions in event-driven architectures. Knows the difference
    between a real bug and a nitpick.
```

Why this works: The four fields constrain behavior across multiple axes. The agent knows *what* it is, *how* it communicates, *what* it remembers, and *what* patterns it recognizes from "experience."
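
A four-field identity is straightforward to check mechanically. The following is a minimal Python sketch, assuming the agent's `identity` block has already been parsed into a dictionary. The function name `missing_identity_fields` is hypothetical and not part of the package.

```python
from typing import Dict, List

# The four fields named in the "Good" example.
REQUIRED_IDENTITY_FIELDS = ("role", "personality", "memory", "experience")


def missing_identity_fields(identity: Dict[str, str]) -> List[str]:
    """Return the required identity fields that are absent or blank."""
    return [
        field
        for field in REQUIRED_IDENTITY_FIELDS
        if not str(identity.get(field) or "").strip()
    ]


# The "thin identity" example only supplies a role.
thin = {"role": "A code reviewer"}
print(missing_identity_fields(thin))
```

For the thin identity, the three fields other than `role` are reported as missing; a complete identity yields an empty list.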

## Before / After: Mission

### Bad — vague mission

```yaml
mission: "Review code and find issues"
```

No domain scope, no outcome definition, no boundary.

### Good — concrete mission with domain, outcomes, scope

```yaml
mission:
  domain: "Backend NestJS services in the payments domain"
  outcomes:
    - "Identify security vulnerabilities (auth bypass, injection, data leakage)"
    - "Flag performance issues (N+1 queries, missing indexes, unbounded loops)"
    - "Verify multi-tenancy scoping (locationId from auth context, not client)"
    - "Ensure error handling follows platform patterns (no empty catch, structured responses)"
  scope:
    includes:
      - "Changed files in the current PR"
      - "Test files corresponding to changed source files"
    excludes:
      - "Generated files (*.generated.ts, migrations)"
      - "Third-party library code"
      - "Style/formatting issues (handled by linter)"
```

## Before / After: Communication Style

### Bad — no communication guidance

The agent dumps findings in an unstructured wall of text with no severity indicators.

### Good — structured output contract

```yaml
communication:
  format: |
    ## Review Summary
    **Risk Level:** CRITICAL | HIGH | MEDIUM | LOW

    ### Findings
    For each finding:
    - **Severity:** CRITICAL / HIGH / MEDIUM / LOW
    - **File:** path/to/file.ts:lineNumber
    - **Issue:** One-sentence description
    - **Why:** Why this matters (security risk, data loss, performance)
    - **Fix:** Concrete code suggestion

    ### Verdict
    BLOCK (has CRITICAL) | APPROVE WITH COMMENTS | APPROVE
  rules:
    - "CRITICAL findings must include BLOCK verdict"
    - "Never say 'looks good' without evidence of what was checked"
    - "Maximum 10 findings per review — prioritize by severity"
```
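
The output contract implies two mechanical steps: rank findings by severity with a cap of ten, and derive the verdict from the findings. The following is a minimal Python sketch of those steps. The `Finding` class and `summarize` function are hypothetical names, and the mapping of non-critical findings to APPROVE WITH COMMENTS is an assumption; the source only states that CRITICAL findings require BLOCK.

```python
from dataclasses import dataclass
from typing import List, Tuple

SEVERITY_ORDER = ["CRITICAL", "HIGH", "MEDIUM", "LOW"]
MAX_FINDINGS = 10  # "Maximum 10 findings per review"


@dataclass
class Finding:
    severity: str  # one of SEVERITY_ORDER
    file: str      # e.g. "path/to/file.ts:42"
    issue: str     # one-sentence description


def summarize(findings: List[Finding]) -> Tuple[List[Finding], str]:
    """Return the reported findings (most severe first, capped) and the verdict."""
    ranked = sorted(findings, key=lambda f: SEVERITY_ORDER.index(f.severity))
    # The verdict is decided on all findings, before the cap is applied.
    if any(f.severity == "CRITICAL" for f in ranked):
        verdict = "BLOCK"
    elif ranked:
        verdict = "APPROVE WITH COMMENTS"  # assumption, see lead-in
    else:
        verdict = "APPROVE"
    return ranked[:MAX_FINDINGS], verdict
```

Sorting before truncating means the cap drops the least severe findings first, which is one reading of "prioritize by severity".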

## Anti-Pattern Catalog

### 1. God-Agent (Does Everything)

**Symptom:** One agent handles code review, testing, deployment, documentation, and security analysis.

**Fix:** Split by responsibility. Each agent should have one clear mission. A code reviewer does not deploy. A security reviewer does not write docs.

**Test:** If your agent's mission has more than 4 outcomes spanning unrelated domains, it's a god-agent.

### 2. Tool Hoarding

**Symptom:** Agent has 15 tools listed, uses 3 regularly.

**Fix:** Give agents only the tools they need for their mission. Extra tools waste context window and invite off-task behavior.

**Guideline:** 3-6 tools is typical. If an agent needs more than 8, it's probably a god-agent in disguise.

### 3. Missing Communication Style

**Symptom:** Agent produces output in unpredictable formats. Sometimes bullet lists, sometimes prose, sometimes JSON.

**Fix:** Define an explicit output contract. Specify the structure, severity labels, and verdict format. The consuming command or human needs to parse the output reliably.

### 4. No Measurable Metrics

**Symptom:** Agent mission says "improve code quality" with no way to verify.

**Fix:** Define observable outcomes:
- "Flag all uses of bare `any` type" (countable)
- "Identify missing test files for new source files" (binary per file)
- "Detect N+1 query patterns in repository methods" (specific pattern)

### 5. Generic Rules Without BLOCK/NEVER

**Symptom:** Agent instructions say "be careful with security" without specifying what triggers a block.

**Fix:** Use explicit behavioral boundaries:

```yaml
rules:
  BLOCK:
    - "Hardcoded secrets (API keys, passwords, tokens)"
    - "Missing auth guard on new endpoints"
    - "locationId from client input instead of auth context"
  NEVER:
    - "Never approve a PR with failing tests"
    - "Never suggest disabling TypeScript strict mode"
  PREFER:
    - "Prefer suggesting fixes over just flagging issues"
```

### 6. No Error Handling Guidance

**Symptom:** Agent crashes or produces garbage when it encounters unexpected input (empty diff, binary files, massive files).

**Fix:** Define edge case behavior:

```yaml
edge_cases:
  empty_diff: "Report 'No changes to review' and exit"
  binary_files: "Skip with note: 'Binary file skipped: {path}'"
  file_over_1000_lines: "Review only changed hunks, note that full file review was skipped"
```
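
The three edge cases can be expressed as a small pre-check that runs before any review. The following is a minimal Python sketch. The function name `edge_case_note` is hypothetical, and treating a NUL byte as the marker for a binary file is an assumed heuristic, not something the source specifies.

```python
from typing import Optional

LARGE_FILE_LINES = 1000  # threshold from "file_over_1000_lines"


def edge_case_note(path: str, content: bytes, diff_is_empty: bool) -> Optional[str]:
    """Return the note to report for an edge case, or None for a normal review."""
    if diff_is_empty:
        return "No changes to review"
    if b"\x00" in content:  # assumed heuristic for binary content
        return f"Binary file skipped: {path}"
    if len(content.splitlines()) > LARGE_FILE_LINES:
        return f"Reviewing only changed hunks in {path}; full file review was skipped"
    return None
```

Checking the empty diff first mirrors the "report and exit" wording: nothing else is inspected once there are no changes.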

## Scope Boundaries

### Agent vs Skill vs Command

| Dimension | Agent | Skill | Command |
|-----------|-------|-------|---------|
| **Has identity** | Yes | No | No |
| **Makes decisions** | Yes | No | Orchestrates decisions |
| **Loaded by** | Command or user | Agent or command | User directly |
| **Produces** | Findings, artifacts, verdicts | Nothing (passive reference) | End-to-end outcome |
| **Example** | Security reviewer | NestJS auth patterns | `/review-pr` |

### Decision Guide

- **Need autonomous judgment?** → Agent
- **Need reusable knowledge?** → Skill
- **Need a multi-step pipeline?** → Command (which uses agents)
- **Need an enforceable constraint?** → Rule

## Squad Assignment (1-9)

Squads group agents by domain for efficient coordination:

| Squad | Domain | Example Agents |
|-------|--------|----------------|
| 1 | Planning & Architecture | planner, architect |
| 2 | Implementation | coder, refactorer |
| 3 | Testing | tdd-guide, e2e-runner |
| 4 | Review & Quality | code-reviewer, security-reviewer |
| 5 | DevOps & Infrastructure | deployer, build-resolver |
| 6 | Documentation | doc-updater |
| 7 | Data & Analytics | data-reviewer |
| 8 | Frontend | ui-reviewer, a11y-checker |
| 9 | Coordination | command coordinators |

**Rules:**
- Agents in the same squad share domain context and can hand off seamlessly.
- Cross-squad communication goes through the coordinator (squad 9).
- An agent belongs to exactly one squad.
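
The three squad rules amount to a routing decision. The following is a minimal Python sketch, assuming a single coordinator agent stands in for squad 9. The `SQUAD_OF` map, the agent name `coordinator`, and the `handoff_path` function are hypothetical illustrations, not identifiers from the package.

```python
from typing import Dict, List

COORDINATOR_SQUAD = 9

# Hypothetical agent-to-squad map. One key per agent keeps each agent
# in exactly one squad by construction.
SQUAD_OF: Dict[str, int] = {
    "planner": 1,
    "coder": 2,
    "tdd-guide": 3,
    "code-reviewer": 4,
    "security-reviewer": 4,
    "deployer": 5,
    "coordinator": COORDINATOR_SQUAD,
}


def handoff_path(sender: str, receiver: str) -> List[str]:
    """Return the agents a handoff passes through, in order."""
    a, b = SQUAD_OF[sender], SQUAD_OF[receiver]
    # Same squad, or the coordinator is already one endpoint: hand off directly.
    if a == b or COORDINATOR_SQUAD in (a, b):
        return [sender, receiver]
    # Cross-squad traffic is routed through the coordinator.
    return [sender, "coordinator", receiver]
```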

## Model Tier Selection

| Tier | Model | Use For | Cost Signal |
|------|-------|---------|-------------|
| **Coordinator** | Opus | Orchestration, judgment calls, architectural decisions, conflict resolution | High |
| **Worker** | Sonnet | Code generation, implementation, detailed review, refactoring | Medium |
| **Checker** | Haiku | Checklists, linting-style checks, simple validations, formatting | Low |

**Guidelines:**
- Coordinators (Opus) make judgment calls and resolve ambiguity. They do not write code.
- Workers (Sonnet) do the heavy lifting. Most agents are workers.
- Checkers (Haiku) handle mechanical tasks. Use when the task is deterministic and the instructions are clear enough for the smallest model.
- If a Haiku-tier agent produces inconsistent results, promote to Sonnet. If Sonnet can't handle the judgment, promote to Opus.
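
The promotion guideline describes a ladder: start at the cheapest tier and move up only when results are not good enough. The following is a minimal Python sketch of that ladder. The names `promote` and `settle_tier` are hypothetical, and the `passes` callback is an assumed stand-in for whatever evaluation decides that a tier's output is acceptable.

```python
from typing import Callable

# Cheapest to most expensive, per the Model Tier Selection table.
TIERS = ["haiku", "sonnet", "opus"]


def promote(tier: str) -> str:
    """Return the next tier up, or the same tier if already at the top."""
    index = TIERS.index(tier)
    return TIERS[min(index + 1, len(TIERS) - 1)]


def settle_tier(start: str, passes: Callable[[str], bool]) -> str:
    """Promote from `start` until `passes(tier)` is true or the top tier is reached."""
    tier = start
    while not passes(tier) and tier != TIERS[-1]:
        tier = promote(tier)
    return tier
```

Starting from `haiku` and stopping at the first acceptable tier keeps the default cheap, in line with the checklist item about not defaulting to Opus for everything.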

## Agent Quality Checklist

- [ ] Four-field identity (role, personality, memory, experience)
- [ ] Concrete mission with domain, outcomes, and scope boundaries
- [ ] 3-6 tools (no hoarding)
- [ ] Explicit output contract with structure and severity levels
- [ ] BLOCK/NEVER/PREFER behavioral rules
- [ ] Edge case handling defined
- [ ] Model tier justified (not defaulting to Opus for everything)
- [ ] Squad assignment documented
- [ ] Tested with representative inputs including edge cases

@@ -0,0 +1,258 @@

# Writing Good Commands

A command is a user-facing pipeline that orchestrates agents, skills, and tools through defined phases to produce an end-to-end outcome. Commands are what users invoke (e.g., `/review-pr`, `/implement-feature`).

## Before / After: Command Structure

### Bad — monolith command

```yaml
name: review-pr
phases:
  - name: "Review"
    description: "Review the PR and provide feedback"
    agents: [code-reviewer, security-reviewer, performance-reviewer, test-reviewer, doc-reviewer]
    steps:
      - "Load the PR diff"
      - "Review everything"
      - "Output findings"
```

Problems: One phase does everything. No checkpoints. No skill loading. Five agents compete for context with no coordination. No error handling. The user sees nothing until the end.

### Good — phased pipeline with checkpoints

```yaml
name: review-pr
phases:
  - name: "Context"
    description: "Load PR context and determine review scope"
    agent: coordinator
    model: opus
    steps:
      - "Fetch PR diff and metadata via gh CLI"
      - "Classify changed files by domain (backend, frontend, infra, test)"
      - "Load relevant skills based on file types"
      - "Determine which review agents are needed"
    gate: "coordinator confirms scope and agent roster before proceeding"

  - name: "Review"
    description: "Parallel domain-specific reviews"
    agents:
      - code-reviewer (changed backend files)
      - security-reviewer (auth/input/secret files)
    model: sonnet
    execution: parallel
    steps:
      - "Each agent reviews its assigned files"
      - "Each agent produces structured findings with severity"

  - name: "Synthesis"
    description: "Merge findings and produce verdict"
    agent: coordinator
    model: opus
    steps:
      - "Collect findings from all reviewers"
      - "Deduplicate overlapping findings"
      - "Assign final verdict: BLOCK / APPROVE WITH COMMENTS / APPROVE"
    checkpoint: "Present findings to user for confirmation before posting"

  - name: "Publish"
    description: "Post review to GitHub"
    agent: coordinator
    steps:
      - "Format findings as PR review comments"
      - "Post via gh CLI"
```

## Phase Design Patterns

### Linear Pipeline

Phases execute sequentially. Output of phase N is input to phase N+1.

```
Context → Plan → Implement → Test → Review → Commit
```

**Use when:** Tasks have natural ordering where later phases depend on earlier results.
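
A linear pipeline amounts to function composition: each phase takes the previous phase's output. The sketch below illustrates that idea only. `run_linear` and the toy phase functions are hypothetical and do not reflect how the package actually executes phases.

```python
from functools import reduce
from typing import Any, Callable

Phase = Callable[[Any], Any]  # hypothetical: a phase maps its input to an output


def run_linear(phases: list[Phase], initial: Any) -> Any:
    """Run phases in order; each phase receives the previous phase's output."""
    return reduce(lambda value, phase: phase(value), phases, initial)


# Toy phases standing in for Context -> Plan -> Implement.
def context(task):
    return {"task": task}


def plan(ctx):
    return {**ctx, "steps": ["write code", "write tests"]}


def implement(ctx):
    return {**ctx, "done": len(ctx["steps"])}


result = run_linear([context, plan, implement], "add login page")
print(result["done"])  # 2
```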

### Interactive Loop

A phase repeats until a condition is met, with human input between iterations.

```
Draft → [checkpoint: user reviews] → Revise → [checkpoint] → ... → Approve
```

**Use when:** Output quality requires human judgment (e.g., PRD drafting, design review).

### Parallel Fan-Out

Multiple agents work simultaneously on independent subtasks, then results merge.

```
Context → [security-review | code-review | perf-review] → Synthesis
```

**Use when:** Subtasks are independent and can run concurrently. Always follow with a synthesis phase.
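
As a rough sketch of fan-out followed by synthesis, the example below runs two stand-in reviewers concurrently and merges their findings. The reviewer functions, `fan_out`, and `synthesize` are hypothetical names for illustration. Real agents would be model invocations, not local functions.

```python
from concurrent.futures import ThreadPoolExecutor


def security_review(files):
    return [f"security: checked {name}" for name in files]


def code_review(files):
    return [f"code: checked {name}" for name in files]


def fan_out(reviewers, files):
    """Run every reviewer concurrently and return results in reviewer order."""
    with ThreadPoolExecutor(max_workers=len(reviewers)) as pool:
        futures = [pool.submit(review, files) for review in reviewers]
        return [future.result() for future in futures]


def synthesize(results):
    """Synthesis phase: flatten per-reviewer findings into one sorted list."""
    return sorted(finding for findings in results for finding in findings)


findings = synthesize(fan_out([security_review, code_review], ["auth.py"]))
print(findings)  # ['code: checked auth.py', 'security: checked auth.py']
```

Collecting `future.result()` in submission order keeps the merge deterministic even when reviewers finish at different times.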

### MCP-Driven

Phases interact with external systems (GitHub, Jenkins, Grafana) via MCP tools.

```
Fetch PR → Review → Post Comments → Trigger Build → Monitor
```

**Use when:** The command integrates with external services and needs to react to their responses.

## Anti-Pattern Catalog

### 1. No Checkpoints

**Symptom:** Command runs for 10 minutes, produces wrong output, and the user has no opportunity to correct course.

**Fix:** Add at least 1 human checkpoint. Place it after the most consequential decision (usually after planning or before publishing).

**Rule of thumb:** Minimum 1 checkpoint, maximum 3. More than 3 creates friction. None creates risk.

### 2. No Skill Loading Gate

**Symptom:** Agents start working without loading relevant skills. They use generic knowledge instead of your team's patterns.

**Fix:** Add a Context phase that classifies the task and loads appropriate skills before agents begin work.

```yaml
# BAD: agents start cold
phases:
  - name: "Review"
    agents: [code-reviewer]
---
# GOOD: context phase loads skills first
phases:
  - name: "Context"
    steps:
      - "Classify changed files by domain"
      - "Load skills: nestjs-patterns, mongoose-queries (based on file types)"
  - name: "Review"
    agents: [code-reviewer]  # now has loaded skills in context
```

### 3. No Error Handling

**Symptom:** If `gh pr diff` fails (network error, auth issue), the command crashes with no recovery.

**Fix:** Define fallback behavior for each phase:

```yaml
error_handling:
  network_failure: "Retry once, then report error to user with diagnostic info"
  empty_diff: "Report 'No changes found' and exit gracefully"
  agent_timeout: "Use partial results, note incomplete review in output"
```
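
The `network_failure` and `empty_diff` fallbacks could be implemented along these lines. This is a sketch under assumptions: `fetch` is a hypothetical callable standing in for whatever retrieves the diff, `ConnectionError` stands in for a network failure, and the returned dictionary shape is invented for the example. The `agent_timeout` case is not covered here.

```python
def fetch_with_retry(fetch, retries=1):
    """Call fetch(); on ConnectionError retry up to `retries` times, then report."""
    last_error = None
    for _ in range(retries + 1):
        try:
            diff = fetch()
        except ConnectionError as error:  # stand-in for a network failure
            last_error = error
            continue
        if not diff.strip():
            return {"status": "empty", "message": "No changes found"}
        return {"status": "ok", "diff": diff}
    message = f"fetch failed after {retries + 1} attempts: {last_error}"
    return {"status": "error", "message": message}


calls = {"count": 0}


def flaky_fetch():
    """Fail on the first call, succeed on the second."""
    calls["count"] += 1
    if calls["count"] == 1:
        raise ConnectionError("network unreachable")
    return "diff --git a/app.py b/app.py"


print(fetch_with_retry(flaky_fetch)["status"])  # ok
```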

### 4. Silent Execution

**Symptom:** User invokes command and sees nothing for 5 minutes.

**Fix:** Each phase should emit a progress signal:

```yaml
phases:
  - name: "Context"
    on_start: "Fetching PR #{{pr_number}} context..."
    on_complete: "Scope: {{file_count}} files across {{domains}} domains"
  - name: "Review"
    on_start: "Running {{agent_count}} reviewers in parallel..."
    on_complete: "Found {{finding_count}} findings ({{critical_count}} critical)"
```
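
The text does not say how the `{{name}}` placeholders are filled in. One plausible approach, shown purely as an assumption, is a regex substitution over a dictionary of phase values. `render` is a hypothetical helper.

```python
import re


def render(template: str, values: dict) -> str:
    """Replace each {{name}} placeholder with values[name]; leave unknown names intact."""
    def substitute(match):
        name = match.group(1)
        return str(values.get(name, match.group(0)))

    return re.sub(r"\{\{(\w+)\}\}", substitute, template)


print(render("Fetching PR #{{pr_number}} context...", {"pr_number": 42}))
# Fetching PR #42 context...
```

Leaving unknown placeholders untouched makes a missing value visible in the progress message without crashing the phase.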

### 5. Too Many Agents

**Symptom:** Command uses 7 agents, each invoked once. Context window is bloated with identity setup for agents that do minimal work.

**Fix:** Follow the agent roster rules below. If an agent does one small task, consider making it a step within another agent's workflow instead.

### 6. No Phase Gates

**Symptom:** Phase 2 proceeds even when Phase 1 produced garbage (e.g., empty plan, failed fetch).

**Fix:** Add gates between phases:

```yaml
gate: "Plan must contain at least 3 implementation steps and a test strategy"
```

If the gate fails, the command stops and reports to the user rather than wasting tokens on doomed downstream phases.
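
In the YAML above the gate is a natural-language condition. If a gate were enforced in code instead, it might look like the sketch below. The plan shape (`steps`, `test_strategy`) and the names `GateFailed`, `plan_gate`, and `run_with_gate` are assumptions made for the example.

```python
class GateFailed(Exception):
    """Raised when a phase's output does not satisfy its gate."""


def plan_gate(plan: dict) -> None:
    """Gate from the text: at least 3 implementation steps and a test strategy."""
    if len(plan.get("steps", [])) < 3:
        raise GateFailed("plan has fewer than 3 implementation steps")
    if not plan.get("test_strategy"):
        raise GateFailed("plan has no test strategy")


def run_with_gate(plan: dict) -> str:
    try:
        plan_gate(plan)
    except GateFailed as failure:
        return f"stopped: {failure}"  # report to the user, skip later phases
    return "proceeding to next phase"


print(run_with_gate({"steps": ["a"], "test_strategy": "unit tests"}))
# stopped: plan has fewer than 3 implementation steps
```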

## Agent Roster Rules

### Minimize Unique Agents

Every unique agent added to a command costs context window (identity, mission, tools, rules). Use the minimum set.

| Command Complexity | Recommended Agent Count |
|-------------------|------------------------|
| Simple (single-domain) | 1-2 |
| Medium (cross-domain) | 2-3 |
| Complex (full pipeline) | 3-5 |

### Max 2-3 Uses Per Agent

If an agent is used more than 3 times in a command, it's doing too much. Either:
- Merge those phases into one agent invocation
- Split the agent's responsibilities

### Coordinator Is Opus

The coordinating agent (phase routing, synthesis, conflict resolution) should run on Opus. Worker agents run on Sonnet. Checklist agents run on Haiku.

```yaml
# GOOD: tiered model assignment
agents:
  coordinator: { model: opus, uses: [context, synthesis, publish] }
  code-reviewer: { model: sonnet, uses: [review] }
  lint-checker: { model: haiku, uses: [formatting-check] }
```

## Human Checkpoint Guidance

### Minimum: 1 Checkpoint

Every command that produces artifacts visible to others (PR comments, commits, messages) must have at least one checkpoint before publishing.

### Maximum: 3 Checkpoints

More than 3 checkpoints turns an automated command into a manual process. If you need that much human oversight, the command is not well-defined enough.

### Where to Place Checkpoints

| Placement | When |
|-----------|------|
| After planning | When the plan determines all downstream work |
| After review/synthesis | When findings will be published externally |
| Before destructive actions | Commits, deployments, PR comments |

### Checkpoint Format

```yaml
checkpoint:
  display: "Summary of what was done and what will happen next"
  options:
    - "proceed"  # continue to next phase
    - "revise"   # re-run current phase with feedback
    - "abort"    # stop command, preserve artifacts so far
```
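
The three options suggest a small control loop around a phase. The sketch below is one way to read them, not a definitive implementation. `run_phase` and `ask_user` are hypothetical callables, and the `max_revisions` cap is an added assumption so the loop always terminates.

```python
def run_phase_with_checkpoint(run_phase, ask_user, max_revisions=3):
    """Run a phase, then loop on the user's choice: proceed, revise, or abort."""
    feedback = None
    for _ in range(max_revisions + 1):
        artifact = run_phase(feedback)
        choice, feedback = ask_user(artifact)
        if choice == "proceed":
            return ("proceed", artifact)
        if choice == "abort":
            return ("abort", artifact)  # artifacts so far are preserved
        if choice != "revise":
            raise ValueError(f"unknown checkpoint option: {choice}")
    return ("abort", artifact)  # revision budget exhausted


# Scripted user: ask for one revision, then approve.
answers = iter([("revise", "shorter"), ("proceed", None)])


def draft(feedback):
    return f"draft (feedback: {feedback})"


outcome = run_phase_with_checkpoint(draft, lambda artifact: next(answers))
print(outcome)  # ('proceed', 'draft (feedback: shorter)')
```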

## Command Quality Checklist

- [ ] Minimum 2 phases (context + execution at minimum)
- [ ] Skill loading gate in first phase
- [ ] 1-3 human checkpoints at consequential decision points
- [ ] Progress signals on every phase (on_start, on_complete)
- [ ] Error handling with fallbacks for network, empty input, timeouts
- [ ] Agent count justified (not exceeding 5 for complex commands)
- [ ] Coordinator on Opus, workers on Sonnet, checkers on Haiku
- [ ] Gates between phases to prevent garbage propagation
- [ ] Each agent used 1-3 times (not more)
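
The countable items on this checklist could be checked mechanically. The sketch below assumes a command has already been parsed into a dictionary in which every phase lists its agents under an `agents` key. That normalization, and the `lint_command` function itself, are hypothetical. The judgment-based items (justification, skill loading, error handling quality) are out of scope.

```python
from collections import Counter


def lint_command(command: dict) -> list[str]:
    """Return checklist violations for the countable items only."""
    problems = []
    phases = command.get("phases", [])
    if len(phases) < 2:
        problems.append("fewer than 2 phases")
    checkpoints = sum(1 for phase in phases if "checkpoint" in phase)
    if not 1 <= checkpoints <= 3:
        problems.append(f"{checkpoints} checkpoints (expected 1-3)")
    for phase in phases:
        if "on_start" not in phase or "on_complete" not in phase:
            problems.append(f"phase {phase.get('name')!r} lacks progress signals")
    uses = Counter(agent for phase in phases for agent in phase.get("agents", []))
    if len(uses) > 5:
        problems.append(f"{len(uses)} unique agents (expected at most 5)")
    for name, count in uses.items():
        if count > 3:
            problems.append(f"agent {name!r} used {count} times")
    return problems


monolith = {"phases": [{"name": "Review", "agents": ["code-reviewer"]}]}
for problem in lint_command(monolith):
    print(problem)
```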