agent-directives 0.1.0
- package/README.md +385 -0
- package/directives/adaptive-routing.md +361 -0
- package/directives/architecture-boundaries.md +223 -0
- package/directives/codebase-navigation.md +325 -0
- package/directives/context-handoff.md +220 -0
- package/directives/error-memory.md +169 -0
- package/directives/exploration-mode.md +266 -0
- package/directives/session-decisions.md +193 -0
- package/directives/specification-driven-development.md +278 -0
- package/directives/task-framing.md +154 -0
- package/directives/test-driven-development.md +305 -0
- package/directives/type-driven-development.md +173 -0
- package/directives/verification.md +266 -0
- package/directives/workspace-isolation.md +219 -0
- package/dist/cli.d.ts +3 -0
- package/dist/cli.d.ts.map +1 -0
- package/dist/cli.js +232 -0
- package/dist/cli.js.map +1 -0
- package/dist/context-audit.d.ts +30 -0
- package/dist/context-audit.d.ts.map +1 -0
- package/dist/context-audit.js +75 -0
- package/dist/context-audit.js.map +1 -0
- package/dist/install.d.ts +18 -0
- package/dist/install.d.ts.map +1 -0
- package/dist/install.js +28 -0
- package/dist/install.js.map +1 -0
- package/dist/manifest.d.ts +25 -0
- package/dist/manifest.d.ts.map +1 -0
- package/dist/manifest.js +29 -0
- package/dist/manifest.js.map +1 -0
- package/dist/prompt.d.ts +3 -0
- package/dist/prompt.d.ts.map +1 -0
- package/dist/prompt.js +29 -0
- package/dist/prompt.js.map +1 -0
- package/dist/targets.d.ts +10 -0
- package/dist/targets.d.ts.map +1 -0
- package/dist/targets.js +32 -0
- package/dist/targets.js.map +1 -0
- package/manifest.json +387 -0
- package/package.json +74 -0
- package/skills/architecture-boundary-reviewer/SKILL.md +228 -0
- package/skills/code-reviewer/SKILL.md +77 -0
- package/skills/codebase-health-reviewer/SKILL.md +234 -0
- package/skills/harness-hooks-reviewer/SKILL.md +159 -0
- package/skills/implementation-task-planner/SKILL.md +205 -0
- package/skills/mcp-integration-reviewer/SKILL.md +157 -0
- package/skills/product-requirements-writer/SKILL.md +205 -0
- package/skills/production-readiness-reviewer/SKILL.md +240 -0
- package/skills/self-audit/SKILL.md +134 -0
- package/skills/spec-reviewer/SKILL.md +304 -0
- package/skills/subagent-driven-development/SKILL.md +236 -0
- package/skills/systematic-debugging/SKILL.md +313 -0
- package/skills/test-reviewer/SKILL.md +293 -0
- package/templates/AGENTS.md +120 -0
- package/templates/CLAUDE.md +115 -0
- package/templates/copilot-instructions.md +116 -0
- package/templates/decision-log.md +44 -0
@@ -0,0 +1,134 @@

---
name: "self-audit"
description: "Load when implementation is past GREEN/REFACTOR and the user asks for pre-PR verification, self-audit, scope check, weakest-assumption review, anomaly triage, or a confidence check."
version: 1.0.0
required: false
category: review
tools:
  - claude
  - copilot
  - codex
  - cursor
routing:
  triggers:
    - after-refactor
    - before-verification
    - full-path
    - pre-pr
  paths:
    - full-path
---

# Self-Audit

After GREEN/REFACTOR, before verification. This is a triage point: some
findings loop back to TDD, others flow forward into the PR body.

```
TDD (RED → GREEN → REFACTOR)
  │
  ▼
SELF-AUDIT (triage)
  │
  ├─ 🔁 Fix now ──▶ RED (one targeted TDD cycle)
  │                    │
  │                    ▼
  │           SELF-AUDIT pass 2 (document only)
  │                    │
  ├─ 📋 Document ────────┤
  │                    │
  ├─ 🧑 Ask human ───────┤
  │                    ▼
  └──────────────▶ Verification → PR
```

**One-loop-max:** Pass 1 triages. If it sends a fix to RED, pass 2 is
documentation only. There is no pass 3.
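
The one-loop-max rule above can be read as a small decision function. The sketch below is only an illustration: the names `Routing`, `NextStep`, and `applyOneLoopMax` are hypothetical, and downgrading a pass-2 "fix now" to "document" is one interpretation of "pass 2 is documentation only".

```typescript
// Hypothetical names, for illustration only.
type Routing = "fix-now" | "document" | "ask-human";
type NextStep = "red" | "verification";

interface Outcome {
  routing: Routing;
  next: NextStep;
}

// Pass 1 may send one fix back to RED. Pass 2 never loops again:
// anything that would be "fix-now" is recorded as "document" instead.
function applyOneLoopMax(pass: 1 | 2, routing: Routing): Outcome {
  if (routing === "fix-now") {
    return pass === 1
      ? { routing, next: "red" }
      : { routing: "document", next: "verification" };
  }
  return { routing, next: "verification" };
}
```

Because the pass number is typed as `1 | 2`, a third pass cannot be expressed at all, which mirrors "there is no pass 3".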

---

## The Jenga Test (always required)

Name the **single weakest assumption** in your implementation: the block
that, if pulled, collapses the most.

For each entry, state:

- **Weakest assumption**: specific and falsifiable, not vague
- **It would break if**: the concrete condition that makes it false
- **Evidence supporting it**: what you verified, or "none"
- **Routing**: 🔁 Fix now / 📋 Document / 🧑 Ask human

If you can't identify a weak assumption, that *is* the Jenga entry:
"My assumption is that I have no weak assumptions."

### Routing criteria

- **🔁 Fix now**: One TDD cycle. In scope. Shipping without it is irresponsible.
- **📋 Document**: Architectural, out of scope, or multi-cycle. Known gap, not a blocker.
- **🧑 Ask human**: Can't assess fixability, or the fix changes the approach.
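
A Jenga entry and its routing criteria could be modeled roughly as follows. This is a minimal sketch under assumptions: `JengaEntry`, `FixAssessment`, and `routeFinding` are hypothetical names, and the boolean inputs are a simplification of judgments the skill leaves to the agent.

```typescript
type Routing = "fix-now" | "document" | "ask-human";

// One entry per weakest assumption, mirroring the four fields listed above.
interface JengaEntry {
  weakestAssumption: string;
  breaksIf: string;
  evidence: string; // what was verified, or "none"
  routing: Routing;
}

// Hypothetical inputs used to pick a routing.
interface FixAssessment {
  fixable: boolean | "unknown";
  inScope: boolean;
  fitsOneTddCycle: boolean;
  changesApproach: boolean;
}

function routeFinding(a: FixAssessment): Routing {
  // Can't assess fixability, or the fix changes the approach: ask the human.
  if (a.fixable === "unknown" || a.changesApproach) return "ask-human";
  // One TDD cycle and in scope: fix now.
  if (a.fixable && a.inScope && a.fitsOneTddCycle) return "fix-now";
  // Architectural, out of scope, or multi-cycle: document as a known gap.
  return "document";
}

const entry: JengaEntry = {
  weakestAssumption: "I have no weak assumptions",
  breaksIf: "any unverified dependency behaves differently than expected",
  evidence: "none",
  routing: routeFinding({ fixable: "unknown", inScope: true, fitsOneTddCycle: false, changesApproach: false }),
};
```

The `entry` value shows the fallback case from the text: when no weak assumption can be named, that claim itself becomes the entry.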

---

## Anomaly Register (required when anomalies exist)

Log every warning, deprecation notice, flaky test, or unexpected side effect
observed during the TDD cycle. For each, record what it was, whether it's new
or recurring, what it might signal, and a routing decision.

**"It's always been like that" is not a valid disposition.** Recurring anomalies
get the highest suspicion, not the lowest.

A suspiciously empty register is itself a signal.

---

## Diff and Boundary Reality Check (required when code changed)

Before finalizing self-audit, inspect the actual diff. If `difit` is available,
prefer it for a local GitHub-style review:

```bash
npx difit .
npx difit staged
```

Use the diff to look for:

- unrelated edits that expanded beyond the task
- imports or exports that cross an architectural boundary
- missing tests adjacent to changed behavior
- public API changes not reflected in docs or verification
- risky deletions, broad rewrites, or new shared utilities

If Fallow is available in a TypeScript/JavaScript project, use relevant summary
checks as self-audit evidence for architecture drift, dead code, duplication, and
cycles. Route any boundary uncertainty into the Jenga Test.

---

## Sunk Cost Check (required after 3+ TDD cycles in a session)

Assess trajectory across cycles. If two or more of these are true, surface it:

1. Jenga entries are getting more severe each cycle
2. Anomaly Register is growing rather than stabilizing
3. Later cycles work around limitations of earlier cycles

The question to surface: *"If I started fresh with what I know now, would I
choose this same approach?"* The human decides. You surface.
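
The "two or more of three" rule can be written down directly. In this sketch the `TrajectorySignals` shape and the function name are hypothetical; the booleans stand in for judgments the agent makes across cycles.

```typescript
// Hypothetical summary of a session's trajectory.
interface TrajectorySignals {
  tddCycles: number;
  jengaSeverityRising: boolean;
  anomalyRegisterGrowing: boolean;
  laterCyclesWorkAroundEarlier: boolean;
}

// True when the Sunk Cost question should be surfaced to the human.
function shouldSurfaceSunkCost(s: TrajectorySignals): boolean {
  // The check is only required after 3+ TDD cycles in a session.
  if (s.tddCycles < 3) return false;
  const signals = [
    s.jengaSeverityRising,
    s.anomalyRegisterGrowing,
    s.laterCyclesWorkAroundEarlier,
  ];
  return signals.filter(Boolean).length >= 2;
}
```

The function only decides whether to raise the question; consistent with the text, the decision to change approach stays with the human.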

---

## Output Routing

Each destination fires on a specific condition:

| When | Route to | What |
| --- | --- | --- |
| **Always**, when opening a PR and self-audit produced routed findings | `## Self-Audit` in the PR body, **before** `## Verification` | Full Jenga + Anomaly Register + Sunk Cost (if triggered). Reviewer sees uncertainty before proof. |
| **When** self-audit has no routed findings | One-line PR note | `Self-audit completed; no routed findings.` Avoid boilerplate sections with no information. |
| **Always**, when running verification after self-audit | Verification focus areas (same session) | Verification's functional proof must target any 📋 documented Jenga assumption. |
| **When** an anomaly matches one you've seen in a previous PR's self-audit | `docs/ERRORS.md` (error-memory format) | Recurrence across PRs promotes an anomaly from one-time observation to systemic pattern. Check by grepping recent merged PRs for the same warning text. |
| **When** the human decides to change approach after a Sunk Cost Signal | `docs/decisions/` (session-decisions format) | Captures why the approach changed. If the human says "continue," no log needed: the signal is already in the PR body. |
| **When** starting work in a module that has been self-audited before (during codebase navigation) | Read previous `## Self-Audit` sections from recent merged PRs | Previous Jenga entries are the known weak spots. If your change makes a previous break condition more likely, include it in your own self-audit. |
@@ -0,0 +1,304 @@

---
name: "spec-reviewer"
description: "Load when the user asks whether implementation matches a spec, requirements doc, acceptance criteria, or design plan, or says check what is missing, incomplete, or divergent before merge."
version: 1.1.0
required: false
category: review
tools:
  - claude
  - copilot
  - codex
  - cursor
routing:
  triggers:
    - written-spec
    - specification
    - acceptance-criteria
    - design-review
  paths:
    - full-path
    - review-path
---

## Review Depth

Default to the lightest useful review.

### Fast Path
Use only when the change is small, localized, low-risk, and project gates are already passing or not relevant.

Output:
- Top 1-3 material findings only
- `No material findings` if clean
- Verification gaps only when they affect merge confidence

Do not emit the full checklist when there are no findings.

### Deep Path
Use the full review process when the change is high-risk, cross-cutting, production-sensitive, security/data-sensitive, behavior-changing without adequate tests, has failing or missing gates, or is explicitly requested.
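
The choice between the two paths could be sketched as a predicate. This is a simplification under assumptions: `ChangeProfile` and `chooseReviewDepth` are hypothetical names, and the Deep Path triggers (cross-cutting, production-sensitive, and so on) are folded into the negation of the Fast Path conditions rather than listed one by one.

```typescript
type ReviewDepth = "fast" | "deep";

// Hypothetical description of a change; each flag is a reviewer judgment.
interface ChangeProfile {
  small: boolean;
  localized: boolean;
  lowRisk: boolean;
  gatesPassingOrNotRelevant: boolean;
  deepReviewRequested: boolean;
}

function chooseReviewDepth(c: ChangeProfile): ReviewDepth {
  // An explicit request always selects the full process.
  if (c.deepReviewRequested) return "deep";
  // Fast Path requires every condition to hold, not just most of them.
  const fastEligible =
    c.small && c.localized && c.lowRisk && c.gatesPassingOrNotRelevant;
  return fastEligible ? "fast" : "deep";
}
```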

# Spec Reviewer

You are a specialist in reviewing whether an implementation matches its written
specification. Your primary focus is ensuring every requirement has code, every
scenario has coverage, and the implementation follows the design it was built
against.

This skill complements the test-reviewer skill. Test-reviewer catches bad tests.
Spec-reviewer catches missing or divergent implementations.

## Core Principle: The Spec Is the Contract

The specification is the agreement between intent and implementation. If the
code doesn't match the spec, one of them is wrong, and you need to identify
which. The spec is not aspirational; it is the contract.

---

## Three-Dimensional Review

Every spec review checks three dimensions. Each has its own severity level.

### Dimension 1: Completeness (CRITICAL)

**Question:** Is everything the spec requires actually implemented?

#### Check 1: Requirement Coverage

For each requirement in the specification:

1. **Find the requirement**: look for `### Requirement:` or similar markers
2. **Search for implementation evidence**: grep for keywords, class names,
   function names, or behavior described in the requirement
3. **Assess coverage:**
   - **Found**: implementation exists, note the file and line range
   - **Partial**: some aspects implemented, others missing
   - **Missing**: no evidence of implementation

```
### Requirement: User authentication
Status: FOUND
Evidence: src/auth/login.ts:45-82, src/auth/session.ts:12-34

### Requirement: Password reset flow
Status: PARTIAL
Evidence: src/auth/reset.ts:1-30 (token generation only, email sending missing)

### Requirement: Rate limiting on login attempts
Status: MISSING
Evidence: No rate-limiting middleware found in auth routes
```

#### Check 2: Scenario Coverage

For each scenario in the specification:

1. **Find the scenario**: look for `#### Scenario:` or `WHEN/THEN` patterns
2. **Check for test coverage**: does a test verify this scenario?
3. **Check for implementation coverage**: does the code handle this case?

| Scenario Status | Meaning |
| --- | --- |
| Covered | Both test and implementation exist |
| Untested | Implementation exists, no test |
| Unimplemented | Test exists (possibly skipped), no code |
| Missing | Neither test nor implementation exists |
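
The status table above is a two-input truth table, which a small function can make explicit. The function name and boolean parameters are illustrative assumptions; the four status strings come from the table.

```typescript
type ScenarioStatus = "Covered" | "Untested" | "Unimplemented" | "Missing";

// Maps "does a test exist?" and "does the code handle it?" to a status.
function classifyScenario(
  hasTest: boolean,
  hasImplementation: boolean,
): ScenarioStatus {
  if (hasTest && hasImplementation) return "Covered";
  if (hasImplementation) return "Untested";
  if (hasTest) return "Unimplemented"; // test exists (possibly skipped), no code
  return "Missing";
}
```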

### Dimension 2: Correctness (WARNING)

**Question:** Does the code do what the spec says, or something different?

#### Check 3: Implementation-Spec Alignment

For each implemented requirement:

1. Read the specification's description of expected behavior
2. Read the implementation
3. Compare: does the code produce the behavior the spec describes?

```typescript
// Spec says: "The system SHALL return a 409 Conflict when creating
// a user with an email that already exists."

// Implementation review:
// src/users/create.ts:67-72
if (existingUser) {
  return { status: 409, body: { error: "User exists" } };
}

// DIVERGENCE: Error message is generic "User exists" but spec might
// expect "Email already registered"; check spec for exact wording.
```

**Key signals of divergence:**

- Different error messages or status codes than the spec describes
- Different function signatures or return types than the spec defines
- Different ordering or flow than the spec prescribes
- Different edge case handling than the spec requires
- Implementation handles cases the spec doesn't mention (scope creep)
- Implementation skips cases the spec requires (incomplete)

#### Check 4: Scenario Behavior Matching

For each testable scenario:

1. Read the scenario's expected outcome
2. Read the corresponding test (if it exists)
3. Does the test actually verify what the scenario describes?

```
Scenario: "User submits empty registration form"
Expected: "The system SHALL return validation errors for each required field"
Test: it("should reject empty form", () => { ... })

Issue: Test checks that status is 400 but does not verify that ALL
required fields have error messages. Scenario expects per-field errors.
```
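
To make the gap in the example above concrete, the sketch below contrasts a status-only check with a per-field check. Everything here is hypothetical: the field names, `validateRegistration`, and `fieldsMissingErrors` are stand-ins, not part of any real project or test framework.

```typescript
// Hypothetical required fields and validator, for illustration only.
const REQUIRED_FIELDS = ["email", "password", "name"] as const;
type Field = (typeof REQUIRED_FIELDS)[number];

interface ValidationResponse {
  status: number;
  errors: Partial<Record<Field, string>>;
}

function validateRegistration(
  form: Partial<Record<Field, string>>,
): ValidationResponse {
  const errors: Partial<Record<Field, string>> = {};
  for (const field of REQUIRED_FIELDS) {
    if (!form[field]) errors[field] = `${field} is required`;
  }
  return { status: Object.keys(errors).length > 0 ? 400 : 200, errors };
}

// Returns the required fields that the response failed to report on.
function fieldsMissingErrors(res: ValidationResponse): Field[] {
  return REQUIRED_FIELDS.filter((f) => !res.errors[f]);
}

const res = validateRegistration({});
// Weak check, as in the flagged test: status only.
const statusOnlyPasses = res.status === 400;
// Check that matches the scenario: every required field has an error.
const perFieldPasses = fieldsMissingErrors(res).length === 0;
console.log({ statusOnlyPasses, perFieldPasses });
```

A validator that reported only the first missing field would still pass the status-only check, which is exactly the divergence the reviewer is asked to flag.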

### Dimension 3: Coherence (SUGGESTION)

**Question:** Does the implementation follow the design decisions?

#### Check 5: Design Adherence

If a design document exists:

1. Extract key decisions (look for "Decision:", "Approach:", "Architecture:",
   "Pattern:")
2. Verify the implementation follows those decisions
3. If it contradicts a decision, flag it

```
Design says: "Use repository pattern for data access"
Implementation: Direct SQL queries in route handlers

DIVERGENCE: Design specifies repository pattern but implementation
uses inline queries in src/routes/users.ts:34-41
```

#### Check 6: Pattern Consistency

Review new code for consistency with project patterns:

- File naming and directory structure
- Error handling approach
- Logging patterns
- Import/export conventions
- Configuration patterns

---

## Review Process

For every spec review:

1. **Read the specification**: understand all requirements and scenarios
2. **Read the design** (if it exists): understand architectural decisions
3. **Map requirements to code**: completeness check
4. **Map scenarios to tests**: scenario coverage check
5. **Spot-check implementations**: correctness check on critical paths
6. **Check design adherence**: coherence check
7. **Generate the review report**: structured output below

---

## Output Format

### Summary Scorecard

```markdown
## Spec Review: [Change/Feature Name]

### Summary

| Dimension    | Status                          |
| ------------ | ------------------------------- |
| Completeness | X/Y requirements, Z/W scenarios |
| Correctness  | N issues found                  |
| Coherence    | M notes                         |
```

### Issues by Severity

#### CRITICAL (must fix before merge)

```
### CRITICAL: Missing requirement - [requirement name]

**Spec location:** specs/feature/spec.md, line N
**Requirement:** [the requirement text]
**Evidence:** No implementation found in codebase
**Recommendation:** Implement [requirement] in [suggested location]
```

#### WARNING (should fix)

```
### WARNING: Implementation diverges from spec - [requirement name]

**Spec says:** [what the spec expects]
**Code does:** [what the implementation actually does]
**File:** path/to/file.ts:line-range
**Recommendation:** [update code to match spec OR update spec to match code, with reasoning]
```

#### SUGGESTION (nice to fix)

```
### SUGGESTION: Design decision not followed - [decision name]

**Design says:** [the decision]
**Implementation:** [what was done instead]
**File:** path/to/file.ts:line-range
**Recommendation:** [align implementation with design OR update design to reflect reality]
```

### Graceful Degradation

If only partial specifications exist, review what you can and clearly state
what was skipped:

```markdown
### Scope of Review

- ✅ Requirements checked (spec.md found, 8 requirements)
- ✅ Scenarios checked (12 scenarios in spec)
- ⚠️ Design adherence skipped (no design.md found)
```

---

## Severity Guidelines

| Condition | Severity |
| --- | --- |
| Required behavior not implemented | CRITICAL |
| Spec scenario completely uncovered | CRITICAL |
| Implementation contradicts spec | WARNING |
| Spec scenario partially covered | WARNING |
| Design decision ignored | SUGGESTION |
| Pattern inconsistency | SUGGESTION |

**When uncertain:** Prefer the lower severity. False CRITICALs waste time;
missed SUGGESTIONs are low-cost.
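
The severity table and the "prefer the lower severity" tie-break can be sketched as a lookup plus a minimum. The condition identifiers and `severityFor` are hypothetical names chosen for this illustration; only the three severity labels and the mapping come from the table.

```typescript
type Severity = "CRITICAL" | "WARNING" | "SUGGESTION";

// Hypothetical identifiers for the six conditions in the table.
type Condition =
  | "required-behavior-not-implemented"
  | "scenario-completely-uncovered"
  | "implementation-contradicts-spec"
  | "scenario-partially-covered"
  | "design-decision-ignored"
  | "pattern-inconsistency";

const SEVERITY_BY_CONDITION: Record<Condition, Severity> = {
  "required-behavior-not-implemented": "CRITICAL",
  "scenario-completely-uncovered": "CRITICAL",
  "implementation-contradicts-spec": "WARNING",
  "scenario-partially-covered": "WARNING",
  "design-decision-ignored": "SUGGESTION",
  "pattern-inconsistency": "SUGGESTION",
};

// Ordered from lowest to highest severity.
const ORDER: Severity[] = ["SUGGESTION", "WARNING", "CRITICAL"];

// When the reviewer cannot decide between candidate conditions,
// the lowest severity among them wins.
function severityFor(first: Condition, ...rest: Condition[]): Severity {
  const ranks = [first, ...rest].map((c) =>
    ORDER.indexOf(SEVERITY_BY_CONDITION[c]),
  );
  return ORDER[Math.min(...ranks)];
}
```

Requiring at least one candidate in the signature avoids an empty-input case, so the function always returns a defined severity.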

**Every issue must include:** a specific, actionable recommendation with file
and line references where applicable. No vague suggestions like "review this
section."

---

## Forbidden Patterns

| Pattern | Why Forbidden |
| --- | --- |
| Flagging issues without specific recommendations | Issues without fixes are complaints, not reviews |
| Reviewing without reading the spec | You cannot verify against a contract you haven't read |
| Treating spec as suggestions rather than contract | The spec IS the standard; if it's wrong, update it |
| Skipping scenarios during review | Scenarios are the testable surface; skipping them misses bugs |
| Using only CRITICAL severity | Not everything is critical; over-flagging causes alert fatigue |

---

_This skill is used after implementation and before merge to verify that the code matches its specification._
@@ -0,0 +1,236 @@

---
name: "subagent-driven-development"
description: "Load when executing an existing implementation plan with multiple mostly independent tasks using delegated subagents, fresh task context, parent-owned review, and final integration verification."
version: 1.0.0
required: false
category: workflow
tools:
  - claude
  - copilot
  - codex
  - cursor
routing:
  triggers:
    - subagent-orchestration
    - delegated-implementation
    - implementation-plan-execution
    - multi-task-plan
    - parallel-agent-work
  paths:
    - full-path
    - debugging-path
    - policy-path
---

# Subagent-Driven Development

You are an implementation orchestrator. Your job is to execute an existing plan by
splitting safe work into delegated agent tasks while keeping responsibility for
scope, sequencing, review, integration, and final verification.

This skill does not replace planning, TDD, review, or verification. It coordinates
those workflows when fresh subagent contexts are safer than one long-running
implementation session.

## When to Load

Load this skill when all are true:

- an implementation plan, issue task list, PRD-derived task list, or clear staged
  work plan already exists
- the work contains multiple tasks that can be scoped independently or sequenced
  cleanly
- the active client/runtime supports delegated subagents, parallel agents, or
  equivalent isolated worker sessions
- each task can be given self-contained context, constraints, non-goals, and
  verification expectations
- the parent agent can inspect results and run final combined verification

Do not load this skill when:

- requirements are still vague; use `skills/product-requirements-writer/SKILL.md`
  or `skills/implementation-task-planner/SKILL.md` first
- one coherent system model is required before any safe edit can happen
- tasks would edit the same files concurrently or compete for the same mutable
  resources
- the active runtime lacks safe delegation support; use the normal Full Path and
  state that subagent orchestration is unavailable
- the orchestration overhead is larger than the risk of simply doing a small edit

## Core Principle: Parent Owns Scope, Subagents Own Slices

A subagent is a worker with isolated context, not a replacement for the parent
agent's judgment. The parent agent must decide what can be delegated, provide the
right context, and verify that the combined result is safe.

Do not dispatch broad prompts such as "implement the plan." Dispatch narrow,
self-contained task slices with explicit constraints and expected evidence.

## Parent-Agent Responsibilities

Before dispatching any subagent:

1. Read the plan or task list once.
2. Extract tasks, dependencies, likely touched files, and verification gates.
3. Classify each task as parallel-safe, sequential, or not delegable.
4. Identify tasks that may share files, shared state, test fixtures, migrations,
   generated outputs, or external resources.
5. Decide the smallest safe delegation units.

During execution:

1. Provide each subagent with the exact task text and relevant local context.
2. Set allowed edit scope, forbidden areas, constraints, and non-goals.
3. Require status, changed files, verification, and concerns in the subagent
   response.
4. Review subagent results before accepting them.
5. Stop unsafe parallel work if changed-file overlap or shared-state coupling
   appears.

Before completion:

1. Inspect the combined diff.
2. Check changed-file overlap and integration risk.
3. Run the selected review skills from `directives/adaptive-routing.md`.
4. Run relevant project verification and quality gates.
5. Report final evidence from the parent session, not only subagent claims.

## Delegation Decision Rules

Use delegated subagents for tasks that are:

- isolated to different files, modules, packages, tests, or research questions
- small enough to explain in one focused prompt
- independently verifiable
- low-conflict if completed in parallel, or clearly ordered if sequential

Keep work in the parent session or execute sequentially when:

- one task's result determines another task's design
- tasks touch the same files or generated artifacts
- tasks involve migrations, production data, auth/security/privacy, deployment,
  or other high-risk shared state
- broad architecture understanding is required before editing
- the plan itself may be wrong

Parallel delegation is optional. Sequential fresh-context delegation can still be
valuable for long plans when each task needs a clean scope and review checkpoint.
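
One mechanical part of these rules, detecting tasks that touch the same files, could be sketched as below. `PlannedTask`, `findOverlaps`, and `parallelSafeIds` are hypothetical names, and exact string equality on paths is an assumption; the other criteria (design dependencies, high-risk shared state) still need human or parent-agent judgment.

```typescript
// Hypothetical task shape: the paths a task is likely to touch
// (files, fixtures, migrations, generated outputs).
interface PlannedTask {
  id: string;
  touchedPaths: string[];
}

// Returns pairs of task ids whose likely-touched paths intersect.
function findOverlaps(tasks: PlannedTask[]): Array<[string, string]> {
  const overlaps: Array<[string, string]> = [];
  for (let i = 0; i < tasks.length; i++) {
    const seen = new Set(tasks[i].touchedPaths);
    for (let j = i + 1; j < tasks.length; j++) {
      if (tasks[j].touchedPaths.some((p) => seen.has(p))) {
        overlaps.push([tasks[i].id, tasks[j].id]);
      }
    }
  }
  return overlaps;
}

// Tasks with no path overlap are candidates for parallel delegation;
// the rest should be sequenced or kept in the parent session.
function parallelSafeIds(tasks: PlannedTask[]): string[] {
  const conflicted = new Set<string>();
  findOverlaps(tasks).forEach(([a, b]) => {
    conflicted.add(a);
    conflicted.add(b);
  });
  return tasks.filter((t) => !conflicted.has(t.id)).map((t) => t.id);
}

const plan: PlannedTask[] = [
  { id: "T1", touchedPaths: ["src/auth/login.ts"] },
  { id: "T2", touchedPaths: ["src/auth/login.ts", "src/auth/session.ts"] },
  { id: "T3", touchedPaths: ["docs/README.md"] },
];
console.log(findOverlaps(plan), parallelSafeIds(plan));
```

Passing this check only makes a task a candidate for parallel work; it does not establish that the task is safe to delegate.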

## Subagent Prompt Contract

Every implementation subagent prompt should include:

- **Task goal:** the specific task to complete
- **Original task text:** copied from the plan, not summarized from memory
- **Relevant context:** files, commands, existing patterns, dependencies, prior task
  outcomes that matter
- **Edit scope:** allowed files/areas and forbidden areas
- **Constraints and non-goals:** what not to build, refactor, or clean up
- **Workflow expectations:** TDD, type-first work, debugging, or review rules that
  apply to this task
- **Verification:** exact or best-known checks to run, plus what to report if a
  check is unavailable
- **Output contract:** required status, changed files, verification evidence,
  unresolved risks, and questions

Use this status vocabulary:

| Status | Meaning | Parent action |
| --- | --- | --- |
| `DONE` | Task complete with evidence and no material concerns | Review before accepting |
| `DONE_WITH_CONCERNS` | Task complete but the worker found risks, assumptions, or weak evidence | Inspect concerns before review |
| `NEEDS_CONTEXT` | Worker cannot proceed safely without missing information | Provide context or re-scope |
| `BLOCKED` | Worker cannot complete with current plan/tooling/scope | Reassess task size, assumptions, or ask the human |

Never ignore `NEEDS_CONTEXT`, `BLOCKED`, or material concerns. Retrying the same
prompt without changing context usually repeats the failure.
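
The output contract and status vocabulary lend themselves to a typed report. In this sketch `WorkerReport` and `parentAction` are hypothetical names; the four status strings and the parent actions are taken from the table, while treating a `DONE` report that still lists risks as having concerns is an added assumption.

```typescript
type WorkerStatus = "DONE" | "DONE_WITH_CONCERNS" | "NEEDS_CONTEXT" | "BLOCKED";

// Hypothetical shape for the subagent response required by the output contract.
interface WorkerReport {
  status: WorkerStatus;
  changedFiles: string[];
  verificationEvidence: string[];
  unresolvedRisks: string[];
  questions: string[];
}

const PARENT_ACTION: Record<WorkerStatus, string> = {
  DONE: "Review before accepting",
  DONE_WITH_CONCERNS: "Inspect concerns before review",
  NEEDS_CONTEXT: "Provide context or re-scope",
  BLOCKED: "Reassess task size, assumptions, or ask the human",
};

function parentAction(report: WorkerReport): string {
  // Assumption of this sketch: a DONE report that still lists risks
  // is handled as if the worker had reported concerns.
  if (report.status === "DONE" && report.unresolvedRisks.length > 0) {
    return PARENT_ACTION.DONE_WITH_CONCERNS;
  }
  return PARENT_ACTION[report.status];
}

// Re-sending the same prompt is only reasonable if something changed.
function mayRetry(status: WorkerStatus, contextChanged: boolean): boolean {
  const needsChange = status === "NEEDS_CONTEXT" || status === "BLOCKED";
  return needsChange ? contextChanged : true;
}
```

Typing the status as a closed union means the compiler flags any report that invents a fifth status.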

## Review Sequence

For non-trivial delegated implementation, review in this order:

1. **Spec compliance review**
   - Does the change satisfy the original task/spec?
   - Are required paths, APIs, behaviors, and tests present?
   - Did the subagent avoid extra scope?

2. **Quality review**
   - Does the code follow project conventions?
   - Are tests meaningful and behavior-focused?
   - Are error handling, security, data, and operational risks addressed for the
     touched surface?
   - Did the worker introduce unnecessary abstraction, duplication, or broad
     cleanup?

Use existing routed reviewer skills only when their normal routing triggers match
the touched surface or risk:

- `skills/spec-reviewer/SKILL.md` for spec-governed work
- `skills/test-reviewer/SKILL.md` for tests and eval scenarios
- `skills/code-reviewer/SKILL.md` for baseline diff review when a PR, branch,
  local diff, or review checkpoint is in scope
- `skills/architecture-boundary-reviewer/SKILL.md` for imports, exports, moves,
  package boundaries, or shared utilities
- `skills/production-readiness-reviewer/SKILL.md` for production-sensitive work

Do not load every reviewer by default. Implementer self-review is useful but never
replaces parent-side or routed reviewer validation when the risk calls for it.

## Failure Handling

If a subagent finds spec gaps:

1. Re-dispatch a focused fix task or fix in the parent session if safer.
2. Re-run the spec review after the fix.
3. Do not move to quality review until material spec gaps are closed.

If a quality reviewer requests changes:

1. Fix only material issues tied to the task.
2. Re-review when the fix changes behavior, tests, or architecture.
3. Track minor follow-ups separately if they are outside scope.

If subagent outputs conflict:

1. Stop accepting further worker changes.
2. Inspect changed-file overlap and assumptions.
3. Resolve conflicts in the parent session or re-scope sequentially.
4. Run combined verification before continuing.

## Common Pitfalls

1. **Dispatching the whole plan.** Broad delegation recreates the same context
   problem in another agent. Slice the plan first.

2. **Parallelizing shared files.** If two workers touch the same file, generated
   output, fixture, migration, or shared resource, treat the work as sequential
   unless you have explicit isolation.

3. **Letting workers infer constraints.** Subagents do not inherit the parent
   session's hidden context. Provide exact task text, constraints, non-goals, and
   verification requirements.

4. **Trusting claims without evidence.** A worker saying tests passed is weaker
   than parent-side verification. Run final combined checks before reporting done.

5. **Turning every small task into a review gauntlet.** Use review depth that
   matches risk. Tiny mechanical tasks may need parent inspection and targeted
   verification; non-trivial behavior changes need spec and quality review.

6. **Skipping integration review.** Independent task success does not prove the
   combined diff is coherent. Always inspect the integrated result.

## Verification Checklist

Before completing a subagent-driven implementation:

- [ ] The parent agent read and classified the full plan before dispatching work.
- [ ] Each subagent received self-contained context, constraints, non-goals, edit
      scope, and verification expectations.
- [ ] Parallel work did not edit the same files or shared mutable resources.
- [ ] `NEEDS_CONTEXT`, `BLOCKED`, and `DONE_WITH_CONCERNS` statuses were handled
      explicitly.
- [ ] Spec compliance was checked before quality review for non-trivial tasks.
- [ ] Relevant routed reviewer skills were used only when their normal triggers
      matched the touched surfaces or risks.
- [ ] The parent agent inspected the combined diff and ran final verification.