agent-directives 0.1.0

Files changed (57)
  1. package/README.md +385 -0
  2. package/directives/adaptive-routing.md +361 -0
  3. package/directives/architecture-boundaries.md +223 -0
  4. package/directives/codebase-navigation.md +325 -0
  5. package/directives/context-handoff.md +220 -0
  6. package/directives/error-memory.md +169 -0
  7. package/directives/exploration-mode.md +266 -0
  8. package/directives/session-decisions.md +193 -0
  9. package/directives/specification-driven-development.md +278 -0
  10. package/directives/task-framing.md +154 -0
  11. package/directives/test-driven-development.md +305 -0
  12. package/directives/type-driven-development.md +173 -0
  13. package/directives/verification.md +266 -0
  14. package/directives/workspace-isolation.md +219 -0
  15. package/dist/cli.d.ts +3 -0
  16. package/dist/cli.d.ts.map +1 -0
  17. package/dist/cli.js +232 -0
  18. package/dist/cli.js.map +1 -0
  19. package/dist/context-audit.d.ts +30 -0
  20. package/dist/context-audit.d.ts.map +1 -0
  21. package/dist/context-audit.js +75 -0
  22. package/dist/context-audit.js.map +1 -0
  23. package/dist/install.d.ts +18 -0
  24. package/dist/install.d.ts.map +1 -0
  25. package/dist/install.js +28 -0
  26. package/dist/install.js.map +1 -0
  27. package/dist/manifest.d.ts +25 -0
  28. package/dist/manifest.d.ts.map +1 -0
  29. package/dist/manifest.js +29 -0
  30. package/dist/manifest.js.map +1 -0
  31. package/dist/prompt.d.ts +3 -0
  32. package/dist/prompt.d.ts.map +1 -0
  33. package/dist/prompt.js +29 -0
  34. package/dist/prompt.js.map +1 -0
  35. package/dist/targets.d.ts +10 -0
  36. package/dist/targets.d.ts.map +1 -0
  37. package/dist/targets.js +32 -0
  38. package/dist/targets.js.map +1 -0
  39. package/manifest.json +387 -0
  40. package/package.json +74 -0
  41. package/skills/architecture-boundary-reviewer/SKILL.md +228 -0
  42. package/skills/code-reviewer/SKILL.md +77 -0
  43. package/skills/codebase-health-reviewer/SKILL.md +234 -0
  44. package/skills/harness-hooks-reviewer/SKILL.md +159 -0
  45. package/skills/implementation-task-planner/SKILL.md +205 -0
  46. package/skills/mcp-integration-reviewer/SKILL.md +157 -0
  47. package/skills/product-requirements-writer/SKILL.md +205 -0
  48. package/skills/production-readiness-reviewer/SKILL.md +240 -0
  49. package/skills/self-audit/SKILL.md +134 -0
  50. package/skills/spec-reviewer/SKILL.md +304 -0
  51. package/skills/subagent-driven-development/SKILL.md +236 -0
  52. package/skills/systematic-debugging/SKILL.md +313 -0
  53. package/skills/test-reviewer/SKILL.md +293 -0
  54. package/templates/AGENTS.md +120 -0
  55. package/templates/CLAUDE.md +115 -0
  56. package/templates/copilot-instructions.md +116 -0
  57. package/templates/decision-log.md +44 -0
package/skills/self-audit/SKILL.md
@@ -0,0 +1,134 @@
1
+ ---
2
+ name: "self-audit"
3
+ description: "Load when implementation is past GREEN/REFACTOR and the user asks for pre-PR verification, self-audit, scope check, weakest-assumption review, anomaly triage, or a confidence check."
4
+ version: 1.0.0
5
+ required: false
6
+ category: review
7
+ tools:
8
+ - claude
9
+ - copilot
10
+ - codex
11
+ - cursor
12
+ routing:
13
+ triggers:
14
+ - after-refactor
15
+ - before-verification
16
+ - full-path
17
+ - pre-pr
18
+ paths:
19
+ - full-path
20
+ ---
21
+
22
+ # Self-Audit
23
+
24
+ After GREEN/REFACTOR, before verification. This is a triage point — some
25
+ findings loop back to TDD, others flow forward into the PR body.
26
+
27
+ ```
28
+ TDD (RED → GREEN → REFACTOR)
29
+
30
+
31
+ SELF-AUDIT (triage)
32
+
33
+ ├─ 🔁 Fix now ──▶ RED (one targeted TDD cycle)
34
+ │ │
35
+ │ ▼
36
+ │ SELF-AUDIT pass 2 (document only)
37
+ │ │
38
+ ├─ 📋 Document ────────┤
39
+ │ │
40
+ ├─ 🧑 Ask human ───────┤
41
+ │ ▼
42
+ └──────────────▶ Verification → PR
43
+ ```
44
+
45
+ **One-loop-max:** Pass 1 triages. If it sends a fix to RED, pass 2 is
46
+ documentation only. There is no pass 3.
47
+
48
+ ---
49
+
50
+ ## The Jenga Test (always required)
51
+
52
+ Name the **single weakest assumption** in your implementation — the block
53
+ that, if pulled, collapses the most.
54
+
55
+ For each entry, state:
56
+
57
+ - **Weakest assumption** — specific and falsifiable, not vague
58
+ - **It would break if** — the concrete condition that makes it false
59
+ - **Evidence supporting it** — what you verified, or "none"
60
+ - **Routing** — 🔁 Fix now / 📋 Document / 🧑 Ask human
61
+
62
+ If you can't identify a weak assumption, that *is* the Jenga entry:
63
+ "My assumption is that I have no weak assumptions."
64
+
65
+ ### Routing criteria
66
+
67
+ - **🔁 Fix now** — One TDD cycle. In scope. Shipping without it is irresponsible.
68
+ - **📋 Document** — Architectural, out of scope, or multi-cycle. Known gap, not a blocker.
69
+ - **🧑 Ask human** — Can't assess fixability, or the fix changes the approach.
70
+
71
+ ---
72
+
73
+ ## Anomaly Register (required when anomalies exist)
74
+
75
+ Log every warning, deprecation notice, flaky test, or unexpected side effect
76
+ observed during the TDD cycle. For each, record what it was, whether it's new
77
+ or recurring, what it might signal, and a routing decision.
78
+
79
+ **"It's always been like that" is not a valid disposition.** Recurring anomalies
80
+ get the highest suspicion, not the lowest.
81
+
82
+ A suspiciously empty register is itself a signal.
83
+
84
+ ---
85
+
86
+ ## Diff and Boundary Reality Check (required when code changed)
87
+
88
+ Before finalizing self-audit, inspect the actual diff. If `difit` is available,
89
+ prefer it for a local GitHub-style review:
90
+
91
+ ```bash
92
+ npx difit .
93
+ npx difit staged
94
+ ```
95
+
96
+ Use the diff to look for:
97
+
98
+ - unrelated edits that expanded beyond the task
99
+ - imports or exports that cross an architectural boundary
100
+ - missing tests adjacent to changed behavior
101
+ - public API changes not reflected in docs or verification
102
+ - risky deletions, broad rewrites, or new shared utilities
103
+
104
+ If Fallow is available in a TypeScript/JavaScript project, use relevant summary
105
+ checks as self-audit evidence for architecture drift, dead code, duplication, and
106
+ cycles. Route any boundary uncertainty into the Jenga Test.
107
+
108
+ ---
109
+
110
+ ## Sunk Cost Check (required after 3+ TDD cycles in a session)
111
+
112
+ Assess trajectory across cycles. If two or more of these are true, surface it:
113
+
114
+ 1. Jenga entries are getting more severe each cycle
115
+ 2. Anomaly Register is growing rather than stabilizing
116
+ 3. Later cycles work around limitations of earlier cycles
117
+
118
+ The question to surface: *"If I started fresh with what I know now, would I
119
+ choose this same approach?"* The human decides. You surface.
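As a minimal sketch, the "two or more of these are true" rule reduces to counting signals. The interface and function names here are hypothetical, and treating sessions with fewer than three cycles as "do not surface" is an assumption: the skill only says the check is required after 3+ cycles.

```typescript
// Hypothetical shape for the three trajectory signals listed above.
interface TrajectorySignals {
  jengaSeverityRising: boolean;          // 1. Jenga entries more severe each cycle
  anomalyRegisterGrowing: boolean;       // 2. register growing, not stabilizing
  laterCyclesWorkAroundEarlier: boolean; // 3. later cycles work around earlier ones
}

function shouldSurfaceSunkCost(cyclesInSession: number, s: TrajectorySignals): boolean {
  // Assumption: below three TDD cycles the check is treated as not applicable.
  if (cyclesInSession < 3) {
    return false;
  }
  const trueCount = [
    s.jengaSeverityRising,
    s.anomalyRegisterGrowing,
    s.laterCyclesWorkAroundEarlier,
  ].filter((signal) => signal).length;
  return trueCount >= 2;
}
```

The function only decides whether to raise the question; per the skill, the decision about changing approach stays with the human.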
120
+
121
+ ---
122
+
123
+ ## Output Routing
124
+
125
+ Each destination fires on a specific condition:
126
+
127
+ | When | Route to | What |
128
+ | --- | --- | --- |
129
+ | **Always**, when opening a PR and self-audit produced routed findings | `## Self-Audit` in the PR body, **before** `## Verification` | Full Jenga + Anomaly Register + Sunk Cost (if triggered). Reviewer sees uncertainty before proof. |
130
+ | **When** self-audit has no routed findings | One-line PR note | `Self-audit completed; no routed findings.` Avoid boilerplate sections with no information. |
131
+ | **Always**, when running verification after self-audit | Verification focus areas (same session) | Verification's functional proof must target any 📋 documented Jenga assumption. |
132
+ | **When** an anomaly matches one you've seen in a previous PR's self-audit | `docs/ERRORS.md` (error-memory format) | Recurrence across PRs promotes an anomaly from one-time observation to systemic pattern. Check by grepping recent merged PRs for the same warning text. |
133
+ | **When** the human decides to change approach after a Sunk Cost Signal | `docs/decisions/` (session-decisions format) | Captures why the approach changed. If the human says "continue," no log needed — the signal is already in the PR body. |
134
+ | **When** starting work in a module that has been self-audited before (during codebase navigation) | Read previous `## Self-Audit` sections from recent merged PRs | Previous Jenga entries are the known weak spots. If your change makes a previous break condition more likely, include it in your own self-audit. |
package/skills/spec-reviewer/SKILL.md
@@ -0,0 +1,304 @@
1
+ ---
2
+ name: "spec-reviewer"
3
+ description: "Load when the user asks whether implementation matches a spec, requirements doc, acceptance criteria, or design plan, or says check what is missing, incomplete, or divergent before merge."
4
+ version: 1.1.0
5
+ required: false
6
+ category: review
7
+ tools:
8
+ - claude
9
+ - copilot
10
+ - codex
11
+ - cursor
12
+ routing:
13
+ triggers:
14
+ - written-spec
15
+ - specification
16
+ - acceptance-criteria
17
+ - design-review
18
+ paths:
19
+ - full-path
20
+ - review-path
21
+ ---
22
+
23
+ ## Review Depth
24
+
25
+ Default to the lightest useful review.
26
+
27
+ ### Fast Path
28
+ Use only when the change is small, localized, low-risk, and project gates are already passing or not relevant.
29
+
30
+ Output:
31
+ - Top 1-3 material findings only
32
+ - `No material findings` if clean
33
+ - Verification gaps only when they affect merge confidence
34
+
35
+ Do not emit the full checklist when there are no findings.
36
+
37
+ ### Deep Path
38
+ Use the full review process when the change is high-risk, cross-cutting, production-sensitive, security/data-sensitive, behavior-changing without adequate tests, has failing or missing gates, or is explicitly requested.
39
+
40
+ # Spec Reviewer
41
+
42
+ You are a specialist in reviewing whether an implementation matches its written
43
+ specification. Your primary focus is ensuring every requirement has code, every
44
+ scenario has coverage, and the implementation follows the design it was built
45
+ against.
46
+
47
+ This skill complements the test-reviewer skill. Test-reviewer catches bad tests.
48
+ Spec-reviewer catches missing or divergent implementations.
49
+
50
+ ## Core Principle: The Spec Is the Contract
51
+
52
+ The specification is the agreement between intent and implementation. If the
53
+ code doesn't match the spec, one of them is wrong — and you need to identify
54
+ which. The spec is not aspirational; it is the contract.
55
+
56
+ ---
57
+
58
+ ## Three-Dimensional Review
59
+
60
+ Every spec review checks three dimensions. Each has its own severity level.
61
+
62
+ ### Dimension 1: Completeness (CRITICAL)
63
+
64
+ **Question:** Is everything the spec requires actually implemented?
65
+
66
+ #### Check 1: Requirement Coverage
67
+
68
+ For each requirement in the specification:
69
+
70
+ 1. **Find the requirement** — look for `### Requirement:` or similar markers
71
+ 2. **Search for implementation evidence** — grep for keywords, class names,
72
+ function names, or behavior described in the requirement
73
+ 3. **Assess coverage:**
74
+ - **Found** — implementation exists, note the file and line range
75
+ - **Partial** — some aspects implemented, others missing
76
+ - **Missing** — no evidence of implementation
77
+
78
+ ```
79
+ ### Requirement: User authentication
80
+ Status: FOUND
81
+ Evidence: src/auth/login.ts:45-82, src/auth/session.ts:12-34
82
+
83
+ ### Requirement: Password reset flow
84
+ Status: PARTIAL
85
+ Evidence: src/auth/reset.ts:1-30 (token generation only, email sending missing)
86
+
87
+ ### Requirement: Rate limiting on login attempts
88
+ Status: MISSING
89
+ Evidence: No rate-limiting middleware found in auth routes
90
+ ```
91
+
92
+ #### Check 2: Scenario Coverage
93
+
94
+ For each scenario in the specification:
95
+
96
+ 1. **Find the scenario** — look for `#### Scenario:` or `WHEN/THEN` patterns
97
+ 2. **Check for test coverage** — does a test verify this scenario?
98
+ 3. **Check for implementation coverage** — does the code handle this case?
99
+
100
+ | Scenario Status | Meaning |
101
+ | ------------------ | -------------------------------------- |
102
+ | Covered | Both test and implementation exist |
103
+ | Untested | Implementation exists, no test |
104
+ | Unimplemented | Test exists (possibly skipped), no code |
105
+ | Missing | Neither test nor implementation exists |
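The scenario status table is a two-input truth table, which the following sketch encodes directly. The function name and boolean parameters are hypothetical; how a reviewer establishes that a test or an implementation "exists" is outside what the code models.

```typescript
type ScenarioStatus = "Covered" | "Untested" | "Unimplemented" | "Missing";

// Maps the two observations from Check 2 onto the table's four statuses.
function scenarioStatus(hasTest: boolean, hasImplementation: boolean): ScenarioStatus {
  if (hasTest && hasImplementation) {
    return "Covered";
  }
  if (hasImplementation) {
    return "Untested"; // implementation exists, no test
  }
  if (hasTest) {
    return "Unimplemented"; // test exists (possibly skipped), no code
  }
  return "Missing"; // neither test nor implementation exists
}
```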
106
+
107
+ ### Dimension 2: Correctness (WARNING)
108
+
109
+ **Question:** Does the code do what the spec says, or something different?
110
+
111
+ #### Check 3: Implementation-Spec Alignment
112
+
113
+ For each implemented requirement:
114
+
115
+ 1. Read the specification's description of expected behavior
116
+ 2. Read the implementation
117
+ 3. Compare: does the code produce the behavior the spec describes?
118
+
119
+ ```typescript
120
+ // Spec says: "The system SHALL return a 409 Conflict when creating
121
+ // a user with an email that already exists."
122
+
123
+ // Implementation review:
124
+ // src/users/create.ts:67-72
125
+ if (existingUser) {
126
+ return { status: 409, body: { error: "User exists" } };
127
+ }
128
+
129
+ // DIVERGENCE: Error message is generic "User exists" but spec might
130
+ // expect "Email already registered" — check spec for exact wording.
131
+ ```
132
+
133
+ **Key signals of divergence:**
134
+
135
+ - Different error messages or status codes than the spec describes
136
+ - Different function signatures or return types than the spec defines
137
+ - Different ordering or flow than the spec prescribes
138
+ - Different edge case handling than the spec requires
139
+ - Implementation handles cases the spec doesn't mention (scope creep)
140
+ - Implementation skips cases the spec requires (incomplete)
141
+
142
+ #### Check 4: Scenario Behavior Matching
143
+
144
+ For each testable scenario:
145
+
146
+ 1. Read the scenario's expected outcome
147
+ 2. Read the corresponding test (if it exists)
148
+ 3. Does the test actually verify what the scenario describes?
149
+
150
+ ```
151
+ Scenario: "User submits empty registration form"
152
+ Expected: "The system SHALL return validation errors for each required field"
153
+ Test: it("should reject empty form", () => { ... })
154
+
155
+ Issue: Test checks that status is 400 but does not verify that ALL
156
+ required fields have error messages. Scenario expects per-field errors.
157
+ ```
158
+
159
+ ### Dimension 3: Coherence (SUGGESTION)
160
+
161
+ **Question:** Does the implementation follow the design decisions?
162
+
163
+ #### Check 5: Design Adherence
164
+
165
+ If a design document exists:
166
+
167
+ 1. Extract key decisions (look for "Decision:", "Approach:", "Architecture:",
168
+ "Pattern:")
169
+ 2. Verify the implementation follows those decisions
170
+ 3. If it contradicts a decision, flag it
171
+
172
+ ```
173
+ Design says: "Use repository pattern for data access"
174
+ Implementation: Direct SQL queries in route handlers
175
+
176
+ DIVERGENCE: Design specifies repository pattern but implementation
177
+ uses inline queries in src/routes/users.ts:34-41
178
+ ```
179
+
180
+ #### Check 6: Pattern Consistency
181
+
182
+ Review new code for consistency with project patterns:
183
+
184
+ - File naming and directory structure
185
+ - Error handling approach
186
+ - Logging patterns
187
+ - Import/export conventions
188
+ - Configuration patterns
189
+
190
+ ---
191
+
192
+ ## Review Process
193
+
194
+ For every spec review:
195
+
196
+ 1. **Read the specification** — understand all requirements and scenarios
197
+ 2. **Read the design** (if it exists) — understand architectural decisions
198
+ 3. **Map requirements to code** — completeness check
199
+ 4. **Map scenarios to tests** — scenario coverage check
200
+ 5. **Spot-check implementations** — correctness check on critical paths
201
+ 6. **Check design adherence** — coherence check
202
+ 7. **Generate the review report** — structured output below
203
+
204
+ ---
205
+
206
+ ## Output Format
207
+
208
+ ### Summary Scorecard
209
+
210
+ ```markdown
211
+ ## Spec Review: [Change/Feature Name]
212
+
213
+ ### Summary
214
+
215
+ | Dimension | Status |
216
+ | ------------ | ------------------------------- |
217
+ | Completeness | X/Y requirements, Z/W scenarios |
218
+ | Correctness | N issues found |
219
+ | Coherence | M notes |
220
+ ```
221
+
222
+ ### Issues by Severity
223
+
224
+ #### CRITICAL (must fix before merge)
225
+
226
+ ```
227
+ ### CRITICAL: Missing requirement — [requirement name]
228
+
229
+ **Spec location:** specs/feature/spec.md, line N
230
+ **Requirement:** [the requirement text]
231
+ **Evidence:** No implementation found in codebase
232
+ **Recommendation:** Implement [requirement] in [suggested location]
233
+ ```
234
+
235
+ #### WARNING (should fix)
236
+
237
+ ```
238
+ ### WARNING: Implementation diverges from spec — [requirement name]
239
+
240
+ **Spec says:** [what the spec expects]
241
+ **Code does:** [what the implementation actually does]
242
+ **File:** path/to/file.ts:line-range
243
+ **Recommendation:** [update code to match spec OR update spec to match code, with reasoning]
244
+ ```
245
+
246
+ #### SUGGESTION (nice to fix)
247
+
248
+ ```
249
+ ### SUGGESTION: Design decision not followed — [decision name]
250
+
251
+ **Design says:** [the decision]
252
+ **Implementation:** [what was done instead]
253
+ **File:** path/to/file.ts:line-range
254
+ **Recommendation:** [align implementation with design OR update design to reflect reality]
255
+ ```
256
+
257
+ ### Graceful Degradation
258
+
259
+ If only partial specifications exist, review what you can and clearly state
260
+ what was skipped:
261
+
262
+ ```markdown
263
+ ### Scope of Review
264
+
265
+ - ✅ Requirements checked (spec.md found, 8 requirements)
266
+ - ✅ Scenarios checked (12 scenarios in spec)
267
+ - ⚠️ Design adherence skipped (no design.md found)
268
+ ```
269
+
270
+ ---
271
+
272
+ ## Severity Guidelines
273
+
274
+ | Condition | Severity |
275
+ | ---------------------------------- | ----------- |
276
+ | Required behavior not implemented | CRITICAL |
277
+ | Spec scenario completely uncovered | CRITICAL |
278
+ | Implementation contradicts spec | WARNING |
279
+ | Spec scenario partially covered | WARNING |
280
+ | Design decision ignored | SUGGESTION |
281
+ | Pattern inconsistency | SUGGESTION |
282
+
283
+ **When uncertain:** Prefer the lower severity. False CRITICALs waste time;
284
+ missed SUGGESTIONs are low-cost.
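One way to picture the severity table together with the "prefer the lower severity" rule is a lookup plus a minimum. This is a sketch under assumptions: the condition identifiers and `severityFor` are hypothetical labels for the table rows, and "uncertain" is modeled as the reviewer being torn between several candidate conditions.

```typescript
type Severity = "CRITICAL" | "WARNING" | "SUGGESTION";

// Hypothetical identifiers for the six rows of the severity table.
type Condition =
  | "behavior-not-implemented"
  | "scenario-uncovered"
  | "contradicts-spec"
  | "scenario-partially-covered"
  | "design-decision-ignored"
  | "pattern-inconsistency";

const SEVERITY_BY_CONDITION: Record<Condition, Severity> = {
  "behavior-not-implemented": "CRITICAL",
  "scenario-uncovered": "CRITICAL",
  "contradicts-spec": "WARNING",
  "scenario-partially-covered": "WARNING",
  "design-decision-ignored": "SUGGESTION",
  "pattern-inconsistency": "SUGGESTION",
};

const SEVERITY_RANK: Record<Severity, number> = { SUGGESTION: 0, WARNING: 1, CRITICAL: 2 };

// With one condition this is a plain lookup. With several candidate
// conditions (the uncertain case) it returns the lowest severity among them.
function severityFor(first: Condition, ...alternatives: Condition[]): Severity {
  return alternatives.reduce<Severity>((lowest, candidate) => {
    const severity = SEVERITY_BY_CONDITION[candidate];
    return SEVERITY_RANK[severity] < SEVERITY_RANK[lowest] ? severity : lowest;
  }, SEVERITY_BY_CONDITION[first]);
}
```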
285
+
286
+ **Every issue must include:** a specific, actionable recommendation with file
287
+ and line references where applicable. No vague suggestions like "review this
288
+ section."
289
+
290
+ ---
291
+
292
+ ## Forbidden Patterns
293
+
294
+ | Pattern | Why Forbidden |
295
+ | ----------------------------------------------- | --------------------------------------------------------- |
296
+ | Flagging issues without specific recommendations | Issues without fixes are complaints, not reviews |
297
+ | Reviewing without reading the spec | You cannot verify against a contract you haven't read |
298
+ | Treating spec as suggestions rather than contract | The spec IS the standard — if it's wrong, update it |
299
+ | Skipping scenarios during review | Scenarios are the testable surface — skipping them misses bugs |
300
+ | Using only CRITICAL severity | Not everything is critical; over-flagging causes alert fatigue |
301
+
302
+ ---
303
+
304
+ _This skill is used after implementation and before merge to verify that the code matches its specification._
package/skills/subagent-driven-development/SKILL.md
@@ -0,0 +1,236 @@
1
+ ---
2
+ name: "subagent-driven-development"
3
+ description: "Load when executing an existing implementation plan with multiple mostly independent tasks using delegated subagents, fresh task context, parent-owned review, and final integration verification."
4
+ version: 1.0.0
5
+ required: false
6
+ category: workflow
7
+ tools:
8
+ - claude
9
+ - copilot
10
+ - codex
11
+ - cursor
12
+ routing:
13
+ triggers:
14
+ - subagent-orchestration
15
+ - delegated-implementation
16
+ - implementation-plan-execution
17
+ - multi-task-plan
18
+ - parallel-agent-work
19
+ paths:
20
+ - full-path
21
+ - debugging-path
22
+ - policy-path
23
+ ---
24
+
25
+ # Subagent-Driven Development
26
+
27
+ You are an implementation orchestrator. Your job is to execute an existing plan by
28
+ splitting safe work into delegated agent tasks while keeping responsibility for
29
+ scope, sequencing, review, integration, and final verification.
30
+
31
+ This skill does not replace planning, TDD, review, or verification. It coordinates
32
+ those workflows when fresh subagent contexts are safer than one long-running
33
+ implementation session.
34
+
35
+ ## When to Load
36
+
37
+ Load this skill when all are true:
38
+
39
+ - an implementation plan, issue task list, PRD-derived task list, or clear staged
40
+ work plan already exists
41
+ - the work contains multiple tasks that can be scoped independently or sequenced
42
+ cleanly
43
+ - the active client/runtime supports delegated subagents, parallel agents, or
44
+ equivalent isolated worker sessions
45
+ - each task can be given self-contained context, constraints, non-goals, and
46
+ verification expectations
47
+ - the parent agent can inspect results and run final combined verification
48
+
49
+ Do not load this skill when:
50
+
51
+ - requirements are still vague — use `skills/product-requirements-writer/SKILL.md`
52
+ or `skills/implementation-task-planner/SKILL.md` first
53
+ - one coherent system model is required before any safe edit can happen
54
+ - tasks would edit the same files concurrently or compete for the same mutable
55
+ resources
56
+ - the active runtime lacks safe delegation support; use the normal Full Path and
57
+ state that subagent orchestration is unavailable
58
+ - the orchestration overhead is larger than the risk of simply doing a small edit
59
+
60
+ ## Core Principle: Parent Owns Scope, Subagents Own Slices
61
+
62
+ A subagent is a worker with isolated context, not a replacement for the parent
63
+ agent's judgment. The parent agent must decide what can be delegated, provide the
64
+ right context, and verify that the combined result is safe.
65
+
66
+ Do not dispatch broad prompts such as "implement the plan." Dispatch narrow,
67
+ self-contained task slices with explicit constraints and expected evidence.
68
+
69
+ ## Parent-Agent Responsibilities
70
+
71
+ Before dispatching any subagent:
72
+
73
+ 1. Read the plan or task list once.
74
+ 2. Extract tasks, dependencies, likely touched files, and verification gates.
75
+ 3. Classify each task as parallel-safe, sequential, or not delegable.
76
+ 4. Identify tasks that may share files, shared state, test fixtures, migrations,
77
+ generated outputs, or external resources.
78
+ 5. Decide the smallest safe delegation units.
79
+
80
+ During execution:
81
+
82
+ 1. Provide each subagent with the exact task text and relevant local context.
83
+ 2. Set allowed edit scope, forbidden areas, constraints, and non-goals.
84
+ 3. Require status, changed files, verification, and concerns in the subagent
85
+ response.
86
+ 4. Review subagent results before accepting them.
87
+ 5. Stop unsafe parallel work if changed-file overlap or shared-state coupling
88
+ appears.
89
+
90
+ Before completion:
91
+
92
+ 1. Inspect the combined diff.
93
+ 2. Check changed-file overlap and integration risk.
94
+ 3. Run the selected review skills from `directives/adaptive-routing.md`.
95
+ 4. Run relevant project verification and quality gates.
96
+ 5. Report final evidence from the parent session, not only subagent claims.
97
+
98
+ ## Delegation Decision Rules
99
+
100
+ Use delegated subagents for tasks that are:
101
+
102
+ - isolated to different files, modules, packages, tests, or research questions
103
+ - small enough to explain in one focused prompt
104
+ - independently verifiable
105
+ - low-conflict if completed in parallel, or clearly ordered if sequential
106
+
107
+ Keep work in the parent session or execute sequentially when:
108
+
109
+ - one task's result determines another task's design
110
+ - tasks touch the same files or generated artifacts
111
+ - tasks involve migrations, production data, auth/security/privacy, deployment,
112
+ or other high-risk shared state
113
+ - broad architecture understanding is required before editing
114
+ - the plan itself may be wrong
115
+
116
+ Parallel delegation is optional. Sequential fresh-context delegation can still be
117
+ valuable for long plans when each task needs a clean scope and review checkpoint.
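The file-overlap part of these rules can be checked mechanically before dispatch. The sketch below uses hypothetical names (`PlannedTask`, `findFileOverlaps`, `parallelSafeIds`) and covers only the "tasks touch the same files" condition. Design dependencies, shared state, and high-risk surfaces still need the parent agent's judgment.

```typescript
// Hypothetical task shape: an id plus the files the task is likely to touch.
interface PlannedTask {
  id: string;
  touchedFiles: string[];
}

interface Overlap {
  a: string;
  b: string;
  sharedFiles: string[];
}

// Reports every pair of tasks that lists at least one file in common.
function findFileOverlaps(tasks: PlannedTask[]): Overlap[] {
  const overlaps: Overlap[] = [];
  tasks.forEach((first, i) => {
    tasks.slice(i + 1).forEach((second) => {
      const shared = first.touchedFiles.filter(
        (file) => second.touchedFiles.indexOf(file) !== -1
      );
      if (shared.length > 0) {
        overlaps.push({ a: first.id, b: second.id, sharedFiles: shared });
      }
    });
  });
  return overlaps;
}

// Tasks that appear in no overlap are candidates for parallel delegation;
// the remaining tasks should be sequenced or kept in the parent session.
function parallelSafeIds(tasks: PlannedTask[]): string[] {
  const overlaps = findFileOverlaps(tasks);
  return tasks
    .filter((task) => !overlaps.some((o) => o.a === task.id || o.b === task.id))
    .map((task) => task.id);
}
```

Because touched files are only predicted at planning time, the same check is worth repeating against the actual changed files once workers report back.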
118
+
119
+ ## Subagent Prompt Contract
120
+
121
+ Every implementation subagent prompt should include:
122
+
123
+ - **Task goal:** the specific task to complete
124
+ - **Original task text:** copied from the plan, not summarized from memory
125
+ - **Relevant context:** files, commands, existing patterns, dependencies, prior task
126
+ outcomes that matter
127
+ - **Edit scope:** allowed files/areas and forbidden areas
128
+ - **Constraints and non-goals:** what not to build, refactor, or clean up
129
+ - **Workflow expectations:** TDD, type-first work, debugging, or review rules that
130
+ apply to this task
131
+ - **Verification:** exact or best-known checks to run, plus what to report if a
132
+ check is unavailable
133
+ - **Output contract:** required status, changed files, verification evidence,
134
+ unresolved risks, and questions
135
+
136
+ Use this status vocabulary:
137
+
138
+ | Status | Meaning | Parent action |
139
+ | --- | --- | --- |
140
+ | `DONE` | Task complete with evidence and no material concerns | Review before accepting |
141
+ | `DONE_WITH_CONCERNS` | Task complete but the worker found risks, assumptions, or weak evidence | Inspect concerns before review |
142
+ | `NEEDS_CONTEXT` | Worker cannot proceed safely without missing information | Provide context or re-scope |
143
+ | `BLOCKED` | Worker cannot complete with current plan/tooling/scope | Reassess task size, assumptions, or ask the human |
144
+
145
+ Never ignore `NEEDS_CONTEXT`, `BLOCKED`, or material concerns. Retrying the same
146
+ prompt without changing context usually repeats the failure.
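The status vocabulary maps one-to-one onto a parent action, which a small dispatcher can make explicit. All names below (`WorkerReport`, `ParentAction`, `nextParentAction`) are hypothetical. Treating any non-empty `concerns` list on a `DONE` report as material is an assumption made to honor the "never ignore material concerns" rule conservatively.

```typescript
type WorkerStatus = "DONE" | "DONE_WITH_CONCERNS" | "NEEDS_CONTEXT" | "BLOCKED";

// Hypothetical report shape following the output contract described above.
interface WorkerReport {
  status: WorkerStatus;
  changedFiles: string[];
  verification: string[];
  concerns: string[];
}

type ParentAction =
  | "review-before-accepting"
  | "inspect-concerns-before-review"
  | "provide-context-or-rescope"
  | "reassess-or-ask-human";

function nextParentAction(report: WorkerReport): ParentAction {
  if (report.status === "NEEDS_CONTEXT") {
    return "provide-context-or-rescope";
  }
  if (report.status === "BLOCKED") {
    return "reassess-or-ask-human";
  }
  // Assumption: a DONE report that still lists concerns is handled like
  // DONE_WITH_CONCERNS rather than accepted at face value.
  if (report.status === "DONE_WITH_CONCERNS" || report.concerns.length > 0) {
    return "inspect-concerns-before-review";
  }
  return "review-before-accepting";
}
```

Note that no branch returns "retry the same prompt": every non-`DONE` status leads to a change in context, scope, or ownership.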
147
+
148
+ ## Review Sequence
149
+
150
+ For non-trivial delegated implementation, review in this order:
151
+
152
+ 1. **Spec compliance review**
153
+ - Does the change satisfy the original task/spec?
154
+ - Are required paths, APIs, behaviors, and tests present?
155
+ - Did the subagent avoid extra scope?
156
+
157
+ 2. **Quality review**
158
+ - Does the code follow project conventions?
159
+ - Are tests meaningful and behavior-focused?
160
+ - Are error handling, security, data, and operational risks addressed for the
161
+ touched surface?
162
+ - Did the worker introduce unnecessary abstraction, duplication, or broad
163
+ cleanup?
164
+
165
+ Use existing routed reviewer skills only when their normal routing triggers match
166
+ the touched surface or risk:
167
+
168
+ - `skills/spec-reviewer/SKILL.md` for spec-governed work
169
+ - `skills/test-reviewer/SKILL.md` for tests and eval scenarios
170
+ - `skills/code-reviewer/SKILL.md` for baseline diff review when a PR, branch,
171
+ local diff, or review checkpoint is in scope
172
+ - `skills/architecture-boundary-reviewer/SKILL.md` for imports, exports, moves,
173
+ package boundaries, or shared utilities
174
+ - `skills/production-readiness-reviewer/SKILL.md` for production-sensitive work
175
+
176
+ Do not load every reviewer by default. Implementer self-review is useful but never
177
+ replaces parent-side or routed reviewer validation when the risk calls for it.
178
+
179
+ ## Failure Handling
180
+
181
+ If a subagent finds spec gaps:
182
+
183
+ 1. Re-dispatch a focused fix task or fix in the parent session if safer.
184
+ 2. Re-run the spec review after the fix.
185
+ 3. Do not move to quality review until material spec gaps are closed.
186
+
187
+ If a quality reviewer requests changes:
188
+
189
+ 1. Fix only material issues tied to the task.
190
+ 2. Re-review when the fix changes behavior, tests, or architecture.
191
+ 3. Track minor follow-ups separately if they are outside scope.
192
+
193
+ If subagent outputs conflict:
194
+
195
+ 1. Stop accepting further worker changes.
196
+ 2. Inspect changed-file overlap and assumptions.
197
+ 3. Resolve conflicts in the parent session or re-scope sequentially.
198
+ 4. Run combined verification before continuing.
199
+
200
+ ## Common Pitfalls
201
+
202
+ 1. **Dispatching the whole plan.** Broad delegation recreates the same context
203
+ problem in another agent. Slice the plan first.
204
+
205
+ 2. **Parallelizing shared files.** If two workers touch the same file, generated
206
+ output, fixture, migration, or shared resource, treat the work as sequential
207
+ unless you have explicit isolation.
208
+
209
+ 3. **Letting workers infer constraints.** Subagents do not inherit the parent
210
+ session's hidden context. Provide exact task text, constraints, non-goals, and
211
+ verification requirements.
212
+
213
+ 4. **Trusting claims without evidence.** A worker saying tests passed is weaker
214
+ than parent-side verification. Run final combined checks before reporting done.
215
+
216
+ 5. **Turning every small task into a review gauntlet.** Use review depth that
217
+ matches risk. Tiny mechanical tasks may need parent inspection and targeted
218
+ verification; non-trivial behavior changes need spec and quality review.
219
+
220
+ 6. **Skipping integration review.** Independent task success does not prove the
221
+ combined diff is coherent. Always inspect the integrated result.
222
+
223
+ ## Verification Checklist
224
+
225
+ Before completing a subagent-driven implementation:
226
+
227
+ - [ ] The parent agent read and classified the full plan before dispatching work.
228
+ - [ ] Each subagent received self-contained context, constraints, non-goals, edit
229
+ scope, and verification expectations.
230
+ - [ ] Parallel work did not edit the same files or shared mutable resources.
231
+ - [ ] `NEEDS_CONTEXT`, `BLOCKED`, and `DONE_WITH_CONCERNS` statuses were handled
232
+ explicitly.
233
+ - [ ] Spec compliance was checked before quality review for non-trivial tasks.
234
+ - [ ] Relevant routed reviewer skills were used only when their normal triggers
235
+ matched the touched surfaces or risks.
236
+ - [ ] The parent agent inspected the combined diff and ran final verification.