@vpxa/aikit 0.1.2 → 0.1.4
- package/package.json +1 -1
- package/packages/cli/dist/commands/init/constants.d.ts +3 -1
- package/packages/cli/dist/commands/init/constants.js +1 -1
- package/packages/cli/dist/commands/init/index.js +4 -4
- package/packages/cli/dist/commands/init/scaffold.d.ts +8 -1
- package/packages/cli/dist/commands/init/scaffold.js +1 -1
- package/packages/cli/dist/commands/init/user.js +4 -4
- package/packages/cli/dist/commands/upgrade.js +1 -1
- package/packages/core/dist/global-registry.js +1 -1
- package/packages/core/dist/types.d.ts +2 -0
- package/packages/flows/dist/git.js +1 -1
- package/packages/flows/dist/registry.d.ts +3 -3
- package/packages/flows/dist/registry.js +1 -1
- package/packages/flows/dist/symlinks.js +1 -1
- package/packages/indexer/dist/filesystem-crawler.js +1 -1
- package/packages/indexer/dist/hash-cache.js +1 -1
- package/packages/kb-client/dist/direct-client.d.ts +33 -34
- package/packages/kb-client/dist/index.d.ts +5 -4
- package/packages/kb-client/dist/mcp-client.d.ts +18 -18
- package/packages/kb-client/dist/parsers.d.ts +14 -11
- package/packages/kb-client/dist/types.d.ts +50 -47
- package/packages/present/dist/index.html +26 -26
- package/packages/server/dist/config.js +1 -1
- package/packages/server/dist/idle-timer.d.ts +4 -0
- package/packages/server/dist/idle-timer.js +1 -1
- package/packages/server/dist/index.js +1 -1
- package/packages/server/dist/memory-monitor.d.ts +2 -2
- package/packages/server/dist/memory-monitor.js +1 -1
- package/packages/server/dist/server.d.ts +1 -1
- package/packages/server/dist/server.js +2 -2
- package/packages/server/dist/tool-metadata.js +1 -1
- package/packages/server/dist/tools/config.tool.d.ts +8 -0
- package/packages/server/dist/tools/config.tool.js +12 -0
- package/packages/server/dist/tools/flow.tools.js +1 -1
- package/packages/server/dist/tools/present/browser.js +7 -7
- package/packages/server/dist/tools/present/tool.js +4 -4
- package/packages/server/dist/tools/search.tool.js +4 -4
- package/packages/server/dist/tools/status.tool.js +3 -3
- package/packages/store/dist/sqlite-graph-store.d.ts +3 -0
- package/packages/store/dist/sqlite-graph-store.js +3 -3
- package/packages/tools/dist/checkpoint.js +1 -1
- package/packages/tools/dist/evidence-map.js +2 -2
- package/packages/tools/dist/queue.js +1 -1
- package/packages/tools/dist/restore-points.js +1 -1
- package/packages/tools/dist/schema-validate.js +1 -1
- package/packages/tools/dist/snippet.js +1 -1
- package/packages/tools/dist/stash.js +1 -1
- package/packages/tools/dist/workset.js +1 -1
- package/packages/tui/dist/{App-B2-KJPt4.js → App-DpjN3iS-.js} +1 -1
- package/packages/tui/dist/App.js +1 -1
- package/packages/tui/dist/LogPanel-Db-SeZhR.js +3 -0
- package/packages/tui/dist/index.js +1 -1
- package/packages/tui/dist/panels/LogPanel.js +1 -1
- package/scaffold/general/skills/multi-agents-development/SKILL.md +435 -435
- package/scaffold/general/skills/present/SKILL.md +424 -424
- package/packages/kb-client/dist/__tests__/direct-client.test.d.ts +0 -1
- package/packages/kb-client/dist/__tests__/mcp-client.test.d.ts +0 -1
- package/packages/kb-client/dist/__tests__/parsers.test.d.ts +0 -1
- package/packages/tui/dist/LogPanel-E_1Do4-j.js +0 -3

@@ -1,435 +1,435 @@

# Multi-Agent Development

Comprehensive patterns for orchestrating multiple AI agents in parallel development workflows. Covers task decomposition, parallel dispatch, context crafting, status handling, review pipelines, and recovery.

**Core Principle**: Dispatch multiple agents for focused tasks. Each subagent gets fresh, focused context with explicit scope — never inherited session state.

Load this skill when orchestrating multi-agent work: planning parallel batches, crafting delegation prompts, handling implementer status, running review pipelines, or recovering from agent failures.

---

## §1 Agent Roles & Model Selection

### Role Categories

| Role | Agents | When to Use | Parallelizable |
|------|--------|-------------|----------------|
| **Orchestration** | Orchestrator, Planner | Workflow control, planning | No (sequential) |
| **Implementation** | Implementer, Frontend, Refactor | Code creation/modification | Yes (disjoint files only) |
| **Research** | Explorer, Researcher-Alpha/Beta/Gamma/Delta | Codebase exploration, decisions | Yes (always) |
| **Review** | Code-Reviewer-Alpha/Beta, Architect-Reviewer-Alpha/Beta | Quality verification | Yes (always) |
| **Diagnostics** | Debugger, Security | Issue tracing, vulnerability analysis | Yes (read-only) |
| **Documentation** | Documenter | README, API docs, changelog | Yes (disjoint files) |

### Model Selection by Task Complexity

Choose the **least powerful model that can handle the role**:

| Complexity Signal | Model Tier | Example Agents |
|-------------------|------------|----------------|
| Mechanical (rename, move, add field) | Fast model | Explorer (Gemini Flash) |
| Standard (implement spec, write tests) | Mid-tier | Implementer (GPT-5.4), Refactor (GPT-5.4) |
| Judgment-heavy (architecture, security, debug) | Strongest | Debugger (Opus 4.6), Security (Opus 4.6) |
| Multi-model cross-validation | Mixed | Researcher-Alpha/Beta/Gamma/Delta (all different) |

**Upgrade signal**: If an agent returns `BLOCKED` or `DONE_WITH_CONCERNS` on a task classified as "Standard", consider re-dispatching to a stronger model.
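
A minimal TypeScript sketch of the selection table and the upgrade signal as plain lookups. The `ComplexitySignal`, `ModelTier`, and `AgentStatus` names are hypothetical, and concrete model names are deliberately left out so they can live in configuration rather than code.

```typescript
// Hypothetical names mirroring the "Model Selection by Task Complexity" table.
type ComplexitySignal = "mechanical" | "standard" | "judgment" | "cross-validation";
type ModelTier = "fast" | "mid-tier" | "strongest" | "mixed";
type AgentStatus = "DONE" | "DONE_WITH_CONCERNS" | "NEEDS_CONTEXT" | "BLOCKED";

// Least powerful tier that can handle each kind of task.
const TIER_FOR: Record<ComplexitySignal, ModelTier> = {
  mechanical: "fast",
  standard: "mid-tier",
  judgment: "strongest",
  "cross-validation": "mixed",
};

function selectTier(signal: ComplexitySignal): ModelTier {
  return TIER_FOR[signal];
}

// Upgrade signal: a "standard" task that came back BLOCKED or
// DONE_WITH_CONCERNS is a candidate for re-dispatch to a stronger model.
function shouldConsiderUpgrade(signal: ComplexitySignal, status: AgentStatus): boolean {
  return signal === "standard" && (status === "BLOCKED" || status === "DONE_WITH_CONCERNS");
}
```

The upgrade check only flags a candidate; whether to actually re-dispatch stays an orchestrator decision.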

---

## §2 Task Decomposition Rules

### The Golden Rule

> **One task = one focused problem domain = 1-3 files maximum.**

### Decomposition Checklist

For each task, specify ALL of:
- [ ] **Target files** — exact paths to create or modify
- [ ] **Acceptance criteria** — what "done" looks like (testable)
- [ ] **Agent assignment** — which agent handles this
- [ ] **Dependencies** — which tasks must complete first (if any)

### Sizing Guide

| Task Size | Files | Example | Agent |
|-----------|-------|---------|-------|
| **Micro** | 1 file | Add a utility function | Implementer |
| **Small** | 1-2 files | New endpoint + test | Implementer |
| **Standard** | 2-3 files | Feature with service + controller + test | Implementer |
| **Too big** | 4+ files | **SPLIT IT** — decompose further | — |
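
The checklist and sizing guide can be encoded as a hypothetical `TaskSpec` record plus two helpers. Because the guide's file ranges overlap (1, 1-2, 2-3), this sketch assumes a simple rule by file count alone: 1 file is micro, 2 is small, 3 is standard, and 4 or more must be split. Dependencies may legitimately be empty, so they are not reported as a gap.

```typescript
// Hypothetical record holding the four checklist fields.
interface TaskSpec {
  id: string;
  targetFiles: string[];        // exact paths to create or modify
  acceptanceCriteria: string[]; // testable definition of "done"
  agent: string;                // e.g. "Implementer" or "Frontend"
  dependsOn: string[];          // task ids that must finish first; may be empty
}

type TaskSize = "micro" | "small" | "standard" | "too-big";

// File count only; the guide's overlapping ranges are resolved as 1 / 2 / 3 / 4+.
function sizeOf(task: TaskSpec): TaskSize {
  const n = task.targetFiles.length;
  if (n <= 1) return "micro";
  if (n === 2) return "small";
  if (n === 3) return "standard";
  return "too-big";
}

// Lists what is missing before the task can be dispatched; [] means ready.
function checklistGaps(task: TaskSpec): string[] {
  const gaps: string[] = [];
  if (task.targetFiles.length === 0) gaps.push("target files");
  if (task.acceptanceCriteria.length === 0) gaps.push("acceptance criteria");
  if (task.agent.trim() === "") gaps.push("agent assignment");
  if (sizeOf(task) === "too-big") gaps.push("split: 4+ files");
  return gaps;
}
```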

### Splitting Strategies
- **By layer**: Service logic (Implementer) + UI component (Frontend) + tests (Implementer)
- **By feature boundary**: Auth endpoints (Implementer A) + Profile endpoints (Implementer B)
- **By concern**: Data model changes (Implementer) + API route changes (Implementer) + UI updates (Frontend)

---

## §3 Independence Decision Tree

Before marking tasks as parallel, walk this tree:

```
Task A and Task B — can they run in parallel?
│
├─ Do they share ANY files? (create, modify, or delete the same file)
│  ├─ YES → SEQUENTIAL (or merge into one task)
│  └─ NO ↓
│
├─ Do they share mutable state? (env vars, globals, same DB table, shared config)
│  ├─ YES → SEQUENTIAL
│  └─ NO ↓
│
├─ Does B need A's output? (B reads a file A creates, B uses A's new export)
│  ├─ YES → SEQUENTIAL (A before B)
│  └─ NO ↓
│
├─ Would A's result change B's approach? (A discovers something that affects B)
│  ├─ YES → SEQUENTIAL or single agent
│  └─ NO ↓
│
├─ Resource contention? (same port, same build process, same lock file)
│  ├─ YES → SEQUENTIAL
│  └─ NO ↓
│
└─ ✅ SAFE TO PARALLELIZE
```

### Edge Cases

| Situation | Verdict | Why |
|-----------|---------|-----|
| Both import from same module (read-only) | ✅ Parallel | Reading shared code is fine |
| Both add exports to same index file | ❌ Sequential | Concurrent index.ts edits will conflict |
| A creates a type, B uses that type | ❌ Sequential | B depends on A's output |
| Both modify different test files | ✅ Parallel | Disjoint file sets |
| Both touch package.json | ❌ Sequential | Shared file |
| A adds a route, B adds middleware | ⚠️ Check | If B's middleware affects A's route → sequential |
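
One way to make the tree mechanical is to describe each task by what it writes, reads, and holds, then check the branches in order. The `PlannedTask` fields and `canRunInParallel` are illustrative names, not part of AI Kit. The "would A's result change B's approach?" branch is a judgment call, so this sketch takes it as a flag. Edge cases that share `index.ts` or `package.json` fall out of the shared-files check, and "A creates a type, B uses that type" falls out of the produces/consumes check.

```typescript
// Hypothetical task description; each field feeds one branch of the tree.
interface PlannedTask {
  id: string;
  files: string[];        // files it creates, modifies, or deletes
  mutableState: string[]; // env vars, globals, DB tables, shared config it writes
  consumes: string[];     // files or exports it needs from elsewhere
  produces: string[];     // files or exports it creates
  resources: string[];    // ports, build processes, lock files it holds
}

interface Verdict {
  parallel: boolean;
  reason?: string; // first branch that answered YES
}

function overlaps(a: string[], b: string[]): boolean {
  return a.some((x) => b.indexOf(x) !== -1);
}

// Walks the branches in order; the first YES makes the pair sequential.
function canRunInParallel(a: PlannedTask, b: PlannedTask, resultChangesApproach = false): Verdict {
  if (overlaps(a.files, b.files)) return { parallel: false, reason: "shared files" };
  if (overlaps(a.mutableState, b.mutableState)) return { parallel: false, reason: "shared mutable state" };
  // Checked in both directions, since either task may play the role of "B".
  if (overlaps(a.produces, b.consumes) || overlaps(b.produces, a.consumes)) {
    return { parallel: false, reason: "output dependency" };
  }
  if (resultChangesApproach) return { parallel: false, reason: "result changes approach" };
  if (overlaps(a.resources, b.resources)) return { parallel: false, reason: "resource contention" };
  return { parallel: true };
}
```

The "⚠️ Check" middleware row still needs a human or reviewer call; encode the answer through `consumes`/`produces` or the judgment flag once it is known.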

### Integration Verification (after parallel batch completes)

1. **Conflict check**: Did any agent unexpectedly modify a file assigned to another agent?
2. **Import check**: Do all new cross-references resolve?
3. **Full suite**: `check({})` + `test_run({})` — everything must pass
4. **Spot check**: Manually verify at least one task's output matches acceptance criteria
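
The conflict check can be sketched as a comparison between each agent's assigned files and the files it actually touched. How the orchestrator attributes changes to agents (for example, per-agent `git diff --name-only` output) is assumed here, and `findConflicts` is a hypothetical helper.

```typescript
// agent name -> file paths
type FileMap = Record<string, string[]>;

interface Conflict {
  agent: string;        // agent that touched the file
  file: string;
  owner: string | null; // agent the file was assigned to, or null if unassigned
}

// Flags every touched file that was assigned to a different agent or to nobody.
function findConflicts(assigned: FileMap, touched: FileMap): Conflict[] {
  const ownerOf: Record<string, string> = {};
  Object.keys(assigned).forEach((agent) => {
    assigned[agent].forEach((file) => {
      ownerOf[file] = agent;
    });
  });

  const conflicts: Conflict[] = [];
  Object.keys(touched).forEach((agent) => {
    touched[agent].forEach((file) => {
      const owner = Object.prototype.hasOwnProperty.call(ownerOf, file) ? ownerOf[file] : null;
      if (owner !== agent) conflicts.push({ agent, file, owner });
    });
  });
  return conflicts;
}
```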

---

## §4 Parallel Dispatch Patterns

### Dispatch Rules

1. **Max 4 concurrent file-modifying agents** per batch
2. **Read-only agents have no limit** — Explorer, Researcher*, Reviewer*, Security can always run in parallel
3. **Build dependency graph first** — tasks with no dependencies on each other MUST be batched together
4. **Never dispatch two implementers to the same file** — not even to different sections of it

### Batch Strategy

```
Phase Plan:
  Phase 1: [Task A, Task B, Task C]  ← no dependencies between A/B/C
  Phase 2: [Task D, Task E]          ← D depends on A, E depends on B
  Phase 3: [Task F]                  ← F depends on D and E

Execution:
  Batch 1: dispatch(A, B, C) in parallel → review → gate
  Batch 2: dispatch(D, E) in parallel → review → gate
  Batch 3: dispatch(F) → review → gate
```
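
The batch strategy amounts to layering the dependency graph, then capping each layer. This sketch assumes a hypothetical `DispatchTask` shape with a `modifiesFiles` flag. It applies the 4-writer cap from rule 1, lets read-only tasks join the first batch of their phase (rule 2), and queues overflow writers into follow-up batches of the same phase, in line with "Max 4, queue the rest".

```typescript
// Hypothetical task node: id, dependencies, and whether it modifies files.
interface DispatchTask {
  id: string;
  dependsOn: string[];
  modifiesFiles: boolean;
}

const MAX_WRITERS_PER_BATCH = 4; // "Max 4 concurrent file-modifying agents"

// Phases come from dependency depth (Kahn-style layering); each phase is then
// split so no batch holds more than MAX_WRITERS_PER_BATCH file-modifying tasks.
function planBatches(tasks: DispatchTask[]): string[][] {
  const done: Record<string, boolean> = {};
  let remaining = tasks.slice();
  const batches: string[][] = [];

  while (remaining.length > 0) {
    const ready = remaining.filter((t) => t.dependsOn.every((d) => done[d]));
    if (ready.length === 0) throw new Error("dependency cycle or unknown dependency");

    const writers = ready.filter((t) => t.modifiesFiles);
    const readers = ready.filter((t) => !t.modifiesFiles);

    // First batch of the phase: every read-only task plus up to 4 writers.
    batches.push(readers.concat(writers.slice(0, MAX_WRITERS_PER_BATCH)).map((t) => t.id));
    // Overflow writers are queued into follow-up batches of the same phase.
    for (let i = MAX_WRITERS_PER_BATCH; i < writers.length; i += MAX_WRITERS_PER_BATCH) {
      batches.push(writers.slice(i, i + MAX_WRITERS_PER_BATCH).map((t) => t.id));
    }

    ready.forEach((t) => {
      done[t.id] = true;
    });
    remaining = remaining.filter((t) => !done[t.id]);
  }
  return batches;
}
```

For the phase plan above (A, B, C independent; D after A; E after B; F after D and E), this yields the batches `[A, B, C]`, `[D, E]`, `[F]`. Review and gating still happen between batches; the function only decides grouping.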

### Anti-Patterns

| ❌ Don't | ✅ Do Instead |
|----------|--------------|
| Dispatch 6 implementers at once | Max 4, queue the rest |
| Give one agent 10 files | Split into 3-4 focused tasks |
| Let agents read the full plan | Give each agent ONLY its task context |
| Retry same prompt on failure | Diagnose first, then re-prompt with fix |
| Skip review after parallel batch | ALWAYS review + integration verify |
| Inherit session context to subagent | Build fresh, focused context per dispatch |

---

## §5 Context Crafting Guide

### The Controller Principle

> **The Orchestrator provides ALL context. Subagents never need to search for context themselves.**

Each subagent gets a fresh, self-contained prompt. No inherited session state. No "read the plan first."

### The 6-Point Prompt Template

Every delegation prompt MUST include:

```markdown
## 1. Scope
Files to create/modify: [exact paths]
Files to NOT touch: [boundaries]

## 2. Goal
[What the code should do — acceptance criteria, testable outcomes]

## 3. Architectural Context
[Relevant patterns, conventions, existing code structure]
[Include actual code snippets from compact/digest — don't tell agent to "go read X"]

## 4. Constraints
- Follow [pattern/convention]
- Do NOT modify [boundary files]
- Use [specific library/approach]

## 5. FORGE Context
Tier: [Floor/Standard/Critical]
Evidence requirements: [what evidence to collect]

## 6. Self-Review & Status
Before declaring DONE, verify:
- [ ] All acceptance criteria met
- [ ] No files outside scope modified
- [ ] Tests pass (if applicable)
- [ ] Code follows stated conventions

End with status: DONE | DONE_WITH_CONCERNS | NEEDS_CONTEXT | BLOCKED
```
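
A sketch of rendering the 6-point template from structured fields, so every dispatch starts from a fresh, self-contained prompt. The `Delegation` interface and `buildDelegationPrompt` are hypothetical; the section headings, checklist, and status line are copied from the template.

```typescript
// Hypothetical input shape: one field per section of the 6-point template.
interface Delegation {
  filesToModify: string[];
  filesToAvoid: string[];
  goal: string;                 // acceptance criteria, testable outcomes
  architecturalContext: string; // pasted snippets, not "go read X"
  constraints: string[];
  forgeTier: "Floor" | "Standard" | "Critical";
  evidence: string;
}

// Renders a self-contained prompt; no session history is included.
function buildDelegationPrompt(d: Delegation): string {
  return [
    "## 1. Scope",
    `Files to create/modify: ${d.filesToModify.join(", ")}`,
    `Files to NOT touch: ${d.filesToAvoid.join(", ") || "(none listed)"}`,
    "",
    "## 2. Goal",
    d.goal,
    "",
    "## 3. Architectural Context",
    d.architecturalContext,
    "",
    "## 4. Constraints",
    ...d.constraints.map((c) => `- ${c}`),
    "",
    "## 5. FORGE Context",
    `Tier: ${d.forgeTier}`,
    `Evidence requirements: ${d.evidence}`,
    "",
    "## 6. Self-Review & Status",
    "Before declaring DONE, verify:",
    "- [ ] All acceptance criteria met",
    "- [ ] No files outside scope modified",
    "- [ ] Tests pass (if applicable)",
    "- [ ] Code follows stated conventions",
    "",
    "End with status: DONE | DONE_WITH_CONCERNS | NEEDS_CONTEXT | BLOCKED",
  ].join("\n");
}
```

Building the prompt from typed fields makes a missing section a compile-time error rather than something a reviewer has to notice.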

### What to Include vs Omit

| ✅ Include | ❌ Omit |
|-----------|---------|
| Exact file paths and code snippets | Full session history |
| Acceptance criteria | Other agents' tasks |
| Relevant conventions (from KB) | Unrelated architecture context |
| Compact/digest of relevant files | Raw file contents of large files |
| Error messages (if fixing a bug) | Previous failed attempts (unless relevant) |
| FORGE tier and ceremony | Full FORGE protocol explanation |

### Context Size Budget

| Task Complexity | Context Target | Approach |
|-----------------|----------------|----------|
| Micro (1 file) | ~500 tokens | Inline code snippet + goal |
| Small (1-2 files) | ~1000 tokens | `compact` of target files + goal |
| Standard (2-3 files) | ~2000 tokens | `digest` of related files + architectural context |
| Complex (judgment-heavy) | ~3000 tokens | `digest` + relevant decisions from AI Kit |
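
The budget table can double as a pre-dispatch check. This sketch estimates prompt size with a rough 4-characters-per-token heuristic (an assumption, not a real tokenizer) and treats the `~` targets as soft, allowing a hypothetical 25% slack before flagging a prompt as over budget.

```typescript
type Complexity = "micro" | "small" | "standard" | "complex";

// Targets and approaches copied from the budget table.
const BUDGET: Record<Complexity, { tokens: number; approach: string }> = {
  micro: { tokens: 500, approach: "inline code snippet + goal" },
  small: { tokens: 1000, approach: "compact of target files + goal" },
  standard: { tokens: 2000, approach: "digest of related files + architectural context" },
  complex: { tokens: 3000, approach: "digest + relevant decisions from AI Kit" },
};

// Rough heuristic (assumption): about 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// True when the drafted prompt exceeds its target by more than `slack`.
function overBudget(prompt: string, complexity: Complexity, slack = 0.25): boolean {
  return estimateTokens(prompt) > BUDGET[complexity].tokens * (1 + slack);
}
```

Swap `estimateTokens` for the tokenizer of the model you actually dispatch to if exact counts matter.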

---

## §6 Subagent Execution Cycle

### Lifecycle

```
Orchestrator                              Subagent (fresh instance)
│                                         │
├─ Craft focused prompt ─────────────────►│
│  (6-point template)                     │
│                                         ├─ Understand scope
│                                         ├─ Implement changes
│                                         ├─ Self-review (checklist)
│◄──────────────────── Return status ─────┤
│                                         │  (DONE/CONCERNS/NEEDS/BLOCKED)
│                                         │
├─ Handle status (see §7)                 × (subagent terminates)
│
├─ Automated gate (check/test_run)
│
├─ Dispatch reviewers (see §8)
│
└─ FORGE evidence_map gate
```

### Key Rules

1. **One subagent = one task**. Never reuse a subagent for a different task.
2. **Controller provides context**. The subagent's prompt contains everything it needs — it should NOT need to search/explore the codebase.
3. **Self-review before handoff**. Every implementer must complete the self-review checklist before declaring DONE.
4. **Status is mandatory**. Every subagent response MUST end with exactly ONE status code.
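
Key rule 4 is easy to enforce mechanically. This sketch assumes the status code appears as the final non-empty line, either bare or as `Status: X`; that line format and the `parseStatus` name are assumptions to adapt to your agents' actual output.

```typescript
const STATUS_CODES = ["DONE", "DONE_WITH_CONCERNS", "NEEDS_CONTEXT", "BLOCKED"] as const;
type StatusCode = (typeof STATUS_CODES)[number];

// Strips an optional "Status:" prefix (assumed format).
function stripPrefix(line: string): string {
  return line.replace(/^status:\s*/i, "");
}

// Enforces "exactly ONE status code", and that it is the last non-empty line.
function parseStatus(output: string): StatusCode {
  const lines = output.split("\n").map((l) => l.trim()).filter((l) => l !== "");
  const found = lines
    .map(stripPrefix)
    .filter((l): l is StatusCode => (STATUS_CODES as readonly string[]).indexOf(l) !== -1);
  if (found.length !== 1) {
    throw new Error(`expected exactly one status code, found ${found.length}`);
  }
  if (stripPrefix(lines[lines.length - 1]) !== found[0]) {
    throw new Error("status code must be the final line");
  }
  return found[0];
}
```

A thrown error here maps to the §10 red flag "Agent skipped self-review checklist" style of handling: bounce the output back rather than guessing the status.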

---

## §7 Implementer Status Protocol

### Status Codes

Every implementer (Implementer, Frontend, Refactor) MUST end their response with exactly ONE of these status codes:

| Status | Meaning | Orchestrator Action |
|--------|---------|---------------------|
| **DONE** | All tasks complete, self-review passed | → Automated gate → Review pipeline |
| **DONE_WITH_CONCERNS** | Complete but flagging issues: [list] | → Surface concerns as `Assumed` claims in evidence_map → Likely HOLD → Address before review |
| **NEEDS_CONTEXT** | Cannot proceed without: [specific question] | → Provide missing context → Re-dispatch same task (counts as retry) |
| **BLOCKED** | Hit a wall: [description] | → Diagnose (see below) |

### BLOCKED Diagnosis Tree

```
Agent returned BLOCKED
│
├─ Missing context? (needs info not in prompt)
│  → Provide context, re-dispatch
│
├─ Wrong model? (task too complex for assigned model)
│  → Re-dispatch to stronger model (e.g., Implementer → Debugger)
│
├─ Scope too broad? (agent overwhelmed)
│  → Split task further, re-dispatch smaller pieces
│
├─ Plan wrong? (implementation approach won't work)
│  → Re-plan this phase, check AI Kit for alternatives
│
└─ External blocker? (dependency not ready, API unavailable)
   → Park task, proceed with independent work, revisit later
```

### FORGE Composition

Status protocol and FORGE are **independent but composable**:

- **Status** = subjective agent telemetry ("I think I'm done")
- **FORGE** = objective quality evidence ("the evidence says it's done")

```
DONE               → proceed to automated gate → FORGE evidence_map
DONE_WITH_CONCERNS → concerns become 'Assumed' claims → evidence_map likely HOLDs
NEEDS_CONTEXT      → provide context, re-dispatch (no FORGE yet)
BLOCKED            → diagnose:
                       contract/security issue → HARD_BLOCK
                       resource/scope issue    → re-plan, no FORGE
```

**Critical rule**: Every `DONE` status MUST be followed by `evidence_map({ action: "gate" })` before proceeding to review. No shortcuts.
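
The status table, the BLOCKED diagnosis tree, and the FORGE composition can be folded into one dispatcher. `handleStatus`, `BlockedCause`, and `NextStep` are hypothetical names. Diagnosing *why* an agent is blocked stays a judgment call, so the cause is passed in; an undiagnosed BLOCKED falls back to re-planning rather than a blind re-dispatch.

```typescript
type ImplStatus = "DONE" | "DONE_WITH_CONCERNS" | "NEEDS_CONTEXT" | "BLOCKED";

// Causes from the BLOCKED diagnosis tree, plus the contract/security case
// that the FORGE composition maps to HARD_BLOCK.
type BlockedCause =
  | "missing-context"
  | "wrong-model"
  | "scope-too-broad"
  | "plan-wrong"
  | "external"
  | "contract-or-security";

type NextStep =
  | { kind: "gate"; assumedClaims: string[] } // automated gate, then evidence_map gate
  | { kind: "redispatch"; note: string }
  | { kind: "replan"; note: string }
  | { kind: "park" }
  | { kind: "hard-block" };

function handleStatus(status: ImplStatus, concerns: string[] = [], cause?: BlockedCause): NextStep {
  switch (status) {
    case "DONE":
      return { kind: "gate", assumedClaims: [] };
    case "DONE_WITH_CONCERNS":
      // Concerns become 'Assumed' claims, so the gate will likely HOLD.
      return { kind: "gate", assumedClaims: concerns };
    case "NEEDS_CONTEXT":
      // Counts as a retry under the retry policy.
      return { kind: "redispatch", note: "provide the missing context" };
    case "BLOCKED":
      switch (cause) {
        case "missing-context":
          return { kind: "redispatch", note: "provide context" };
        case "wrong-model":
          return { kind: "redispatch", note: "use a stronger model" };
        case "scope-too-broad":
          return { kind: "replan", note: "split task, re-dispatch smaller pieces" };
        case "plan-wrong":
          return { kind: "replan", note: "re-plan this phase" };
        case "external":
          return { kind: "park" };
        case "contract-or-security":
          return { kind: "hard-block" };
        default:
          return { kind: "replan", note: "diagnose before re-dispatching" };
      }
  }
}
```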

---

## §8 Review Pipeline

### Four-Stage Pipeline

```
Stage 1: Implementer Self-Review (embedded in agent output)
  └─ Checklist: scope respected, tests pass, conventions followed
  │
Stage 2: Orchestrator Automated Gate
  ├─ check({}) + test_run({}) MUST pass
  ├─ Validate self-review checklist present in output
  ├─ FAIL → bounce back to implementer with specific gap
  └─ PASS ↓
  │
Stage 3: Dual Code Review (parallel)
  ├─ Code-Reviewer-Alpha (GPT-5.4): code quality + Spec Alignment
  └─ Code-Reviewer-Beta (Opus 4.6): code quality + Spec Alignment
  │    Both review same code, different model perspectives
  │    Spec Alignment = "Does this match what was asked?"
  │
Stage 4: Conditional Reviews (parallel if both needed)
  ├─ Architecture Review — if boundary changes, new modules, pattern shifts
  └─ Security Review — if auth, crypto, input handling, or external data
  │
FORGE Gate: evidence_map({ action: "gate" })
  ├─ YIELD → proceed to commit
  ├─ HOLD → address flagged items → re-gate (max 3 rounds)
  └─ HARD_BLOCK → escalate to user
```

### Spec Alignment Dimension (for Code Reviewers)

Both Code-Reviewer-Alpha and Code-Reviewer-Beta evaluate an explicit **Spec Alignment** dimension:

1. Does the implementation match the acceptance criteria from the task?
2. Are there over-builds (features not requested)?
3. Are there under-builds (requirements missed)?
4. Does the output match the expected file changes?

This catches spec drift that automated tests might miss.

### When to Skip Stages

| Stage | Skip When |
|-------|-----------|
| Architecture Review | No new modules, no boundary changes, no new patterns |
| Security Review | No auth, no crypto, no external input handling |
| FORGE Gate | Floor-tier tasks only (simple, mechanical changes) |
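
Stage selection and the FORGE re-gate loop can be sketched as follows. `ChangeSummary`, `plannedReviews`, and `gateLoop` are hypothetical; `runGate` stands in for `evidence_map({ action: "gate" })`. The pipeline does not say what happens after three HOLD rounds, so this sketch assumes escalation to the user and counts the first gate as round 1.

```typescript
// Hypothetical summary of a change, used to pick conditional stages.
interface ChangeSummary {
  tier: "Floor" | "Standard" | "Critical";
  newModules: boolean;
  boundaryChanges: boolean;
  newPatterns: boolean;
  touchesAuth: boolean;
  touchesCrypto: boolean;
  handlesExternalInput: boolean;
}

// Stages 3-4 plus the FORGE gate, applying the "When to Skip Stages" table.
// Stages 1-2 (self-review, automated gate) always run first.
function plannedReviews(c: ChangeSummary): string[] {
  const stages = ["code-review-alpha", "code-review-beta"]; // run in parallel
  if (c.newModules || c.boundaryChanges || c.newPatterns) stages.push("architecture-review");
  if (c.touchesAuth || c.touchesCrypto || c.handlesExternalInput) stages.push("security-review");
  if (c.tier !== "Floor") stages.push("forge-gate");
  return stages;
}

type GateResult = "YIELD" | "HOLD" | "HARD_BLOCK";
const MAX_GATE_ROUNDS = 3;

function gateLoop(runGate: () => GateResult, addressHolds: () => void): "commit" | "escalate" {
  for (let round = 1; round <= MAX_GATE_ROUNDS; round++) {
    const result = runGate();
    if (result === "YIELD") return "commit";
    if (result === "HARD_BLOCK") return "escalate";
    if (round < MAX_GATE_ROUNDS) addressHolds(); // HOLD: fix flagged items, then re-gate
  }
  return "escalate"; // still HOLD after the last round (assumed: escalate to user)
}
```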

---

## §9 Recovery & Escalation

### Retry Policy

- **Max 2 retries per agent per task** — after that, re-plan or escalate
- Each retry MUST include the specific failure reason in the new prompt
- Never retry with the same prompt — always add diagnostic context

### Loop Detection

If an agent returns the same error/status 2+ times:
1. **STOP** — do not retry again
2. Check if the approach is fundamentally wrong
3. Consider: different agent, different model, different decomposition, or user escalation
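
A small tracker can enforce both the retry cap and loop detection per agent and task. `RetryTracker` is a hypothetical helper; it only decides whether another attempt is allowed, while adding the specific failure reason to the new prompt remains the orchestrator's job.

```typescript
const MAX_RETRIES = 2;

type RetryDecision = "retry" | "stop-loop" | "stop-limit";

class RetryTracker {
  // "agent::task" -> outcomes (error text or status) of every failed attempt
  private history: Record<string, string[]> = {};

  record(agent: string, taskId: string, outcome: string): RetryDecision {
    const key = `${agent}::${taskId}`;
    if (!this.history[key]) this.history[key] = [];
    const past = this.history[key];
    past.push(outcome);

    // Loop detection: the same error/status 2+ times means stop and rethink.
    const repeats = past.filter((o) => o === outcome).length;
    if (repeats >= 2) return "stop-loop";

    // The first attempt is not a retry, so retries used = attempts - 1.
    if (past.length - 1 >= MAX_RETRIES) return "stop-limit";
    return "retry";
  }
}
```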

### Emergency Procedures

When a parallel batch causes cascading failures:

```
STOP     → Halt all running agents immediately
ASSESS   → git diff --stat + check({}) — how bad is it?
CONTAIN  → Limited (1-3 files): fix or re-delegate
           Widespread (10+ files): git stash -u to preserve for analysis (includes new untracked files)
RECOVER  → Partial: git checkout -- {specific files}
           Full: git stash -u (preserves) or git checkout . (discards tracked changes only)
           Nuclear: git reset --hard HEAD (last resort)
DOCUMENT → remember what went wrong, update plan
```

### Scope Tripwires

| Signal | Action |
|--------|--------|
| Agent modified **twice as many files** as planned (or more) | Pause, review before continuing |
| Agent returns `ESCALATE` or `BLOCKED` repeatedly | Do NOT re-delegate unchanged. Diagnose first |
| Agent's output contradicts the plan | Stop, compare with plan, re-align |
| Tests that were passing now fail | Immediate rollback of that agent's changes |
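
The mechanical tripwires can be checked after each agent finishes. This sketch reads "twice as many files as planned" as at least double the planned count, adds the out-of-scope check from the red-flag table in §10, and compares passing test ids before and after. `AgentRun` and `checkTripwires` are hypothetical; judgment-based signals such as output contradicting the plan are left to review.

```typescript
// Hypothetical record of one agent run.
interface AgentRun {
  plannedFiles: string[];
  modifiedFiles: string[];
  testsPassingBefore: string[]; // test ids passing before the agent ran
  testsPassingAfter: string[];
}

type Tripwire =
  | { kind: "file-count"; planned: number; actual: number } // pause and review
  | { kind: "out-of-scope"; files: string[] }               // roll back those files
  | { kind: "regression"; tests: string[] };                // roll back the agent's changes

function checkTripwires(run: AgentRun): Tripwire[] {
  const hits: Tripwire[] = [];
  const planned = run.plannedFiles.length;
  const actual = run.modifiedFiles.length;
  if (planned > 0 && actual >= 2 * planned) hits.push({ kind: "file-count", planned, actual });

  const outside = run.modifiedFiles.filter((f) => run.plannedFiles.indexOf(f) === -1);
  if (outside.length > 0) hits.push({ kind: "out-of-scope", files: outside });

  const broken = run.testsPassingBefore.filter((t) => run.testsPassingAfter.indexOf(t) === -1);
  if (broken.length > 0) hits.push({ kind: "regression", tests: broken });
  return hits;
}
```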

---

## §10 Common Mistakes & Red Flags

### Delegation Anti-Patterns

| ❌ Mistake | Why It Fails | ✅ Fix |
|-----------|-------------|--------|
| **Too broad scope** — "implement the auth system" | Agent lacks clear boundaries, produces sprawling changes | Split: "add JWT middleware to auth.ts" + "add login endpoint to routes.ts" |
| **No constraints** — "add a feature" | Agent invents architecture, conflicts with existing patterns | Include conventions, boundaries, existing patterns in prompt |
| **Vague output** — "make it work" | No way to verify completion | Specific acceptance criteria: "endpoint returns 200 with {schema}" |
| **Session context inheritance** — "continue from where we left off" | Subagent has stale/polluted context | Fresh prompt with 6-point template every time |
| **Skipping reviews** — "it's a small change" | Small changes cause big regressions | ALWAYS run automated gate minimum |
| **Parallel on shared files** — "both agents edit config.ts" | Merge conflicts, lost changes | Sequential, or merge into one task |
| **Trusting the report** — "agent said DONE so it's done" | Agents are optimistic, miss edge cases | Automated gate + dual code review catches this |
| **Brute-force retries** — same prompt 3 times | If it failed twice, it will likely fail a third time | Diagnose, change approach, then retry |
| **Orchestrator implements** — "just this one small fix" | Breaks the delegation contract, no review | ALWAYS delegate, no matter how small |

### Red Flags in Agent Output

| Flag | What It Means | Action |
|------|--------------|--------|
| Agent modified files outside its scope | Scope creep or misunderstanding | Rollback out-of-scope files, re-delegate with tighter constraints |
| Agent added dependencies not in plan | Unauthorized architectural decision | Review necessity, likely rollback |
| Agent skipped self-review checklist | Rushing, likely incomplete | Bounce back with checklist requirement |
| Agent's DONE but tests fail | Didn't actually self-test | Bounce back with failing test output |
| Agent asks questions in output instead of using NEEDS_CONTEXT | Misunderstands status protocol | Treat as NEEDS_CONTEXT, educate in next prompt |

---

## Prompt Template Reference

Detailed prompt templates are provided as sidecar files:

| Template | File | Use When |
|----------|------|----------|
| Implementer dispatch | [`implementer-prompt.md`](implementer-prompt.md) | Dispatching Implementer, Frontend, or Refactor agents |
| Spec compliance review | [`spec-review-prompt.md`](spec-review-prompt.md) | Adversarial spec alignment check (Code-Reviewer-Alpha) |
| Code quality review | [`code-quality-review-prompt.md`](code-quality-review-prompt.md) | Dual code quality review (Code-Reviewer-Beta) |
| Architecture review | [`architecture-review-prompt.md`](architecture-review-prompt.md) | Boundary changes, pattern adherence review |
| Parallel dispatch example | [`parallel-dispatch-example.md`](parallel-dispatch-example.md) | Worked example of decomposing a feature into parallel tasks |
|
|
1
|
+
# Multi-Agent Development
|
|
2
|
+
|
|
3
|
+
Comprehensive patterns for orchestrating multiple AI agents in parallel development workflows. Covers task decomposition, parallel dispatch, context crafting, status handling, review pipelines, and recovery.
|
|
4
|
+
|
|
5
|
+
**Core Principle**: Dispatch multiple agents for focused tasks. Each subagent gets fresh, focused context with explicit scope — never inherited session state.
|
|
6
|
+
|
|
7
|
+
Load this skill when orchestrating multi-agent work: planning parallel batches, crafting delegation prompts, handling implementer status, running review pipelines, or recovering from agent failures.
|
|
8
|
+
|
|
9
|
+
---
|
|
10
|
+
|
|
11
|
+
## §1 Agent Roles & Model Selection
|
|
12
|
+
|
|
13
|
+
### Role Categories
|
|
14
|
+
|
|
15
|
+
| Role | Agents | When to Use | Parallelizable |
|
|
16
|
+
|------|--------|-------------|----------------|
|
|
17
|
+
| **Orchestration** | Orchestrator, Planner | Workflow control, planning | No (sequential) |
|
|
18
|
+
| **Implementation** | Implementer, Frontend, Refactor | Code creation/modification | Yes (disjoint files only) |
|
|
19
|
+
| **Research** | Explorer, Researcher-Alpha/Beta/Gamma/Delta | Codebase exploration, decisions | Yes (always) |
|
|
20
|
+
| **Review** | Code-Reviewer-Alpha/Beta, Architect-Reviewer-Alpha/Beta | Quality verification | Yes (always) |
|
|
21
|
+
| **Diagnostics** | Debugger, Security | Issue tracing, vulnerability analysis | Yes (read-only) |
|
|
22
|
+
| **Documentation** | Documenter | README, API docs, changelog | Yes (disjoint files) |
|
|
23
|
+
|
|
24
|
+
### Model Selection by Task Complexity
|
|
25
|
+
|
|
26
|
+
Choose the **least powerful model that can handle the role**:
|
|
27
|
+
|
|
28
|
+
| Complexity Signal | Model Tier | Example Agents |
|
|
29
|
+
|-------------------|-----------|----------------|
|
|
30
|
+
| Mechanical (rename, move, add field) | Fast model | Explorer (Gemini Flash) |
|
|
31
|
+
| Standard (implement spec, write tests) | Mid-tier | Implementer (GPT-5.4), Refactor (GPT-5.4) |
|
|
32
|
+
| Judgment-heavy (architecture, security, debug) | Strongest | Debugger (Opus 4.6), Security (Opus 4.6) |
|
|
33
|
+
| Multi-model cross-validation | Mixed | Researcher-Alpha/Beta/Gamma/Delta (all different) |
|
|
34
|
+
|
|
35
|
+
**Upgrade signal**: If an agent returns `BLOCKED` or `DONE_WITH_CONCERNS` on a task classified as "Standard", consider re-dispatching to a stronger model.
|
|
36
|
+
|
|
37
|
+
---
|
|
38
|
+
|
|
39
|
+
## §2 Task Decomposition Rules
|
|
40
|
+
|
|
41
|
+
### The Golden Rule
|
|
42
|
+
> **One task = one focused problem domain = 1-3 files maximum.**
|
|
43
|
+
|
|
44
|
+
### Decomposition Checklist
|
|
45
|
+
|
|
46
|
+
For each task, specify ALL of:
|
|
47
|
+
- [ ] **Target files** — exact paths to create or modify
|
|
48
|
+
- [ ] **Acceptance criteria** — what "done" looks like (testable)
|
|
49
|
+
- [ ] **Agent assignment** — which agent handles this
|
|
50
|
+
- [ ] **Dependencies** — which tasks must complete first (if any)
|
|
51
|
+
|
|
52
|
+
### Sizing Guide
|
|
53
|
+
|
|
54
|
+
| Task Size | Files | Example | Agent |
|
|
55
|
+
|-----------|-------|---------|-------|
|
|
56
|
+
| **Micro** | 1 file | Add a utility function | Implementer |
|
|
57
|
+
| **Small** | 1-2 files | New endpoint + test | Implementer |
|
|
58
|
+
| **Standard** | 2-3 files | Feature with service + controller + test | Implementer |
|
|
59
|
+
| **Too big** | 4+ files | **SPLIT IT** — decompose further | — |
|
|
60
|
+
|
|
61
|
+
### Splitting Strategies
|
|
62
|
+
- **By layer**: Service logic (Implementer) + UI component (Frontend) + tests (Implementer)
|
|
63
|
+
- **By feature boundary**: Auth endpoints (Implementer A) + Profile endpoints (Implementer B)
|
|
64
|
+
- **By concern**: Data model changes (Implementer) + API route changes (Implementer) + UI updates (Frontend)
|
|
65
|
+
|
|
66
|
+
---
|
|
67
|
+
|
|
68
|
+
## §3 Independence Decision Tree
|
|
69
|
+
|
|
70
|
+
Before marking tasks as parallel, walk this tree:
|
|
71
|
+
|
|
72
|
+
```
|
|
73
|
+
Task A and Task B — can they run in parallel?
|
|
74
|
+
│
|
|
75
|
+
├─ Do they share ANY files? (create, modify, or delete the same file)
|
|
76
|
+
│ ├─ YES → SEQUENTIAL (or merge into one task)
|
|
77
|
+
│ └─ NO ↓
|
|
78
|
+
│
|
|
79
|
+
├─ Do they share mutable state? (env vars, globals, same DB table, shared config)
|
|
80
|
+
│ ├─ YES → SEQUENTIAL
|
|
81
|
+
│ └─ NO ↓
|
|
82
|
+
│
|
|
83
|
+
├─ Does B need A's output? (B reads a file A creates, B uses A's new export)
|
|
84
|
+
│ ├─ YES → SEQUENTIAL (A before B)
|
|
85
|
+
│ └─ NO ↓
|
|
86
|
+
│
|
|
87
|
+
├─ Would A's result change B's approach? (A discovers something that affects B)
|
|
88
|
+
│ ├─ YES → SEQUENTIAL or single agent
|
|
89
|
+
│ └─ NO ↓
|
|
90
|
+
│
|
|
91
|
+
├─ Resource contention? (same port, same build process, same lock file)
|
|
92
|
+
│ ├─ YES → SEQUENTIAL
|
|
93
|
+
│ └─ NO ↓
|
|
94
|
+
│
|
|
95
|
+
└─ ✅ SAFE TO PARALLELIZE
|
|
96
|
+
```
|
|
97
|
+
|
|
98
|
+
### Edge Cases
|
|
99
|
+
|
|
100
|
+
| Situation | Verdict | Why |
|
|
101
|
+
|-----------|---------|-----|
|
|
102
|
+
| Both import from same module (read-only) | ✅ Parallel | Reading shared code is fine |
|
|
103
|
+
| Both add exports to same index file | ❌ Sequential | Concurrent index.ts edits will conflict |
|
|
104
|
+
| A creates a type, B uses that type | ❌ Sequential | B depends on A's output |
|
|
105
|
+
| Both modify different test files | ✅ Parallel | Disjoint file sets |
|
|
106
|
+
| Both touch package.json | ❌ Sequential | Shared file |
|
|
107
|
+
| A adds a route, B adds middleware | ⚠️ Check | If B's middleware affects A's route → sequential |
|
|
108
|
+
|
|
109
|
+
### Integration Verification (after parallel batch completes)

1. **Conflict check**: Did any agent unexpectedly modify a file assigned to another agent?
2. **Import check**: Do all new cross-references resolve?
3. **Full suite**: `check({})` + `test_run({})`; everything must pass.
4. **Spot check**: Manually verify that at least one task's output matches its acceptance criteria.

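The conflict check (step 1) becomes mechanical once you know which files each agent was assigned and which files it actually touched. This is a minimal sketch that assumes the Orchestrator already has both lists for each agent. How it collects them (for example, from per-agent diffs) is outside the sketch, and the agent names are placeholders.

```typescript
// Agent name → file paths. Both maps are hypothetical inputs assembled by the Orchestrator.
type FileMap = Record<string, string[]>;

interface ScopeViolation {
  agent: string;
  file: string;
  ownedBy: string | null; // the agent this file was assigned to, or null if unassigned
}

// Flags every touched file that was not assigned to the agent that touched it.
function findScopeViolations(planned: FileMap, touched: FileMap): ScopeViolation[] {
  const owner = new Map<string, string>();
  for (const [agent, files] of Object.entries(planned)) {
    for (const file of files) owner.set(file, agent);
  }
  const violations: ScopeViolation[] = [];
  for (const [agent, files] of Object.entries(touched)) {
    for (const file of files) {
      const assignedTo = owner.get(file);
      if (assignedTo !== agent) violations.push({ agent, file, ownedBy: assignedTo ?? null });
    }
  }
  return violations;
}

// Usage: implA also edited a file that belongs to implB.
const crossEdits = findScopeViolations(
  { implA: ["src/a.ts"], implB: ["src/b.ts"] },
  { implA: ["src/a.ts", "src/b.ts"], implB: ["src/b.ts"] },
);
console.log(crossEdits); // one violation: implA touched src/b.ts, which implB owns
```
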
---

## §4 Parallel Dispatch Patterns

### Dispatch Rules

1. **Max 4 concurrent file-modifying agents** per batch.
2. **Read-only agents have no limit**: Explorer, `Researcher*`, `Reviewer*`, and Security can always run in parallel.
3. **Build the dependency graph first**: phases with no dependencies on each other MUST be batched together.
4. **Never dispatch two implementers to the same file**, even if they edit different sections.

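Rules 1, 2, and 4 can be checked before a batch goes out. This is a minimal sketch that assumes a hypothetical `Dispatch` record, where `writes` marks file-modifying agents. Read-only agents never count toward the cap. Rule 3 (batching by dependency graph) is a separate planning step and is not covered here.

```typescript
// Hypothetical dispatch record; `writes` is true for file-modifying agents.
interface Dispatch {
  agent: string;
  writes: boolean;
  files: string[];
}

const MAX_CONCURRENT_WRITERS = 4; // Rule 1; read-only agents are uncapped (Rule 2)

// Returns rule violations for one proposed batch; an empty array means it can be dispatched.
function validateBatch(batch: Dispatch[]): string[] {
  const problems: string[] = [];
  const writers = batch.filter((d) => d.writes);
  if (writers.length > MAX_CONCURRENT_WRITERS) {
    problems.push(`${writers.length} file-modifying agents exceed the cap of ${MAX_CONCURRENT_WRITERS}`);
  }
  // Rule 4: only one writer may claim a file, even for different sections.
  const claimedBy = new Map<string, string>();
  for (const w of writers) {
    for (const file of w.files) {
      const first = claimedBy.get(file);
      if (first === undefined) claimedBy.set(file, w.agent);
      else problems.push(`${file} is assigned to both ${first} and ${w.agent}`);
    }
  }
  return problems;
}

// Usage: two implementers on one file is rejected; the read-only Explorer is ignored.
const proposed: Dispatch[] = [
  { agent: "Implementer-1", writes: true, files: ["src/api.ts"] },
  { agent: "Implementer-2", writes: true, files: ["src/api.ts"] },
  { agent: "Explorer", writes: false, files: [] },
];
console.log(validateBatch(proposed));
```
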
### Batch Strategy

```
Phase Plan:
Phase 1: [Task A, Task B, Task C]   ← no dependencies between A/B/C
Phase 2: [Task D, Task E]           ← D depends on A, E depends on B
Phase 3: [Task F]                   ← F depends on D and E

Execution:
Batch 1: dispatch(A, B, C) in parallel → review → gate
Batch 2: dispatch(D, E) in parallel    → review → gate
Batch 3: dispatch(F)                   → review → gate
```

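The phase plan above follows from a level-by-level topological sort (Kahn's algorithm): every task whose dependencies are finished joins the current batch. This is a minimal sketch that assumes a hypothetical `PlannedTask` shape. It treats every task as file-modifying, so it splits any level wider than the cap of 4 into several batches.

```typescript
// Hypothetical planner output: each task lists the ids it depends on.
interface PlannedTask {
  id: string;
  dependsOn: string[];
}

// Level-by-level Kahn's algorithm. Tasks in the same level have no dependencies on each
// other, so they can share a batch; levels wider than maxParallel are split into chunks.
function buildBatches(tasks: PlannedTask[], maxParallel = 4): string[][] {
  const pending = new Map<string, Set<string>>();
  for (const t of tasks) pending.set(t.id, new Set(t.dependsOn));
  const batches: string[][] = [];
  while (pending.size > 0) {
    const ready: string[] = [];
    pending.forEach((deps, id) => {
      if (deps.size === 0) ready.push(id);
    });
    if (ready.length === 0) throw new Error("dependency cycle or unknown task id");
    for (let i = 0; i < ready.length; i += maxParallel) batches.push(ready.slice(i, i + maxParallel));
    // Mark this level as done: drop it from the queue and from everyone's dependency sets.
    ready.forEach((id) => pending.delete(id));
    pending.forEach((deps) => ready.forEach((id) => deps.delete(id)));
  }
  return batches;
}

// Usage: the Phase Plan example (A/B/C independent, D needs A, E needs B, F needs D and E).
const plan: PlannedTask[] = [
  { id: "A", dependsOn: [] },
  { id: "B", dependsOn: [] },
  { id: "C", dependsOn: [] },
  { id: "D", dependsOn: ["A"] },
  { id: "E", dependsOn: ["B"] },
  { id: "F", dependsOn: ["D", "E"] },
];
console.log(buildBatches(plan)); // batches: [A, B, C], then [D, E], then [F]
```

Level batching waits for a whole batch to clear review and the gate before the next one starts. This matches the batch-then-gate execution above, at the cost of some idle time (D could in principle start as soon as A passes).
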
### Anti-Patterns

| ❌ Don't | ✅ Do Instead |
|----------|--------------|
| Dispatch 6 implementers at once | Max 4, queue the rest |
| Give one agent 10 files | Split into 3-4 focused tasks |
| Let agents read the full plan | Give each agent ONLY its task context |
| Retry the same prompt on failure | Diagnose first, then re-prompt with the fix |
| Skip review after a parallel batch | ALWAYS review + integration verify |
| Pass inherited session context to a subagent | Build fresh, focused context per dispatch |

---

## §5 Context Crafting Guide

### The Controller Principle

> **The Orchestrator provides ALL context. Subagents never need to search for context themselves.**

Each subagent gets a fresh, self-contained prompt. No inherited session state. No "read the plan first."

### The 6-Point Prompt Template

Every delegation prompt MUST include:

```markdown
## 1. Scope
Files to create/modify: [exact paths]
Files to NOT touch: [boundaries]

## 2. Goal
[What the code should do: acceptance criteria, testable outcomes]

## 3. Architectural Context
[Relevant patterns, conventions, existing code structure]
[Include actual code snippets from compact/digest; don't tell the agent to "go read X"]

## 4. Constraints
- Follow [pattern/convention]
- Do NOT modify [boundary files]
- Use [specific library/approach]

## 5. FORGE Context
Tier: [Floor/Standard/Critical]
Evidence requirements: [what evidence to collect]

## 6. Self-Review & Status
Before declaring DONE, verify:
- [ ] All acceptance criteria met
- [ ] No files outside scope modified
- [ ] Tests pass (if applicable)
- [ ] Code follows stated conventions

End with status: DONE | DONE_WITH_CONCERNS | NEEDS_CONTEXT | BLOCKED
```

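Every prompt must carry all six sections, so it can help to render them from structured fields; that way none is silently dropped. This is a minimal sketch: `DelegationSpec` and `renderDelegationPrompt` are hypothetical names, and the file paths and criteria in the usage are placeholders.

```typescript
// Hypothetical structured input mirroring the six template sections.
interface DelegationSpec {
  modify: string[]; // exact paths to create/modify
  doNotTouch: string[]; // boundary files
  acceptance: string[]; // testable outcomes
  context: string; // inline snippets from compact/digest, never "go read X"
  constraints: string[];
  tier: "Floor" | "Standard" | "Critical";
  evidence: string[];
}

const bulletList = (items: string[]): string => items.map((item) => `- ${item}`).join("\n");

// Renders the six sections in order and refuses to emit a prompt without scope or goal.
function renderDelegationPrompt(spec: DelegationSpec): string {
  if (spec.modify.length === 0) throw new Error("Scope must list exact file paths");
  if (spec.acceptance.length === 0) throw new Error("Goal must list acceptance criteria");
  return [
    "## 1. Scope",
    `Files to create/modify: ${spec.modify.join(", ")}`,
    `Files to NOT touch: ${spec.doNotTouch.join(", ") || "(none)"}`,
    "",
    "## 2. Goal",
    bulletList(spec.acceptance),
    "",
    "## 3. Architectural Context",
    spec.context,
    "",
    "## 4. Constraints",
    bulletList(spec.constraints),
    "",
    "## 5. FORGE Context",
    `Tier: ${spec.tier}`,
    `Evidence requirements: ${spec.evidence.join("; ")}`,
    "",
    "## 6. Self-Review & Status",
    "Before declaring DONE, verify:",
    bulletList([
      "[ ] All acceptance criteria met",
      "[ ] No files outside scope modified",
      "[ ] Tests pass (if applicable)",
      "[ ] Code follows stated conventions",
    ]),
    "",
    "End with status: DONE | DONE_WITH_CONCERNS | NEEDS_CONTEXT | BLOCKED",
  ].join("\n");
}

// Usage with placeholder paths and criteria.
const delegation = renderDelegationPrompt({
  modify: ["src/auth/verify-token.ts"],
  doNotTouch: ["src/routes.ts"],
  acceptance: ["Expired tokens are rejected with 401", "Valid tokens attach the user id to the request"],
  context: "Existing middleware signature: (req, res, next) => void",
  constraints: ["Follow the existing error-handling middleware pattern"],
  tier: "Standard",
  evidence: ["unit tests for expired and valid tokens"],
});
console.log(delegation);
```
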
### What to Include vs Omit

| ✅ Include | ❌ Omit |
|-----------|---------|
| Exact file paths and code snippets | Full session history |
| Acceptance criteria | Other agents' tasks |
| Relevant conventions (from KB) | Unrelated architecture context |
| Compact/digest of relevant files | Raw file contents of large files |
| Error messages (if fixing a bug) | Previous failed attempts (unless relevant) |
| FORGE tier and ceremony | Full FORGE protocol explanation |

### Context Size Budget

| Task Complexity | Context Target | Approach |
|-----------------|----------------|----------|
| Micro (1 file) | ~500 tokens | Inline code snippet + goal |
| Small (1-2 files) | ~1000 tokens | `compact` of target files + goal |
| Standard (2-3 files) | ~2000 tokens | `digest` of related files + architectural context |
| Complex (judgment-heavy) | ~3000 tokens | `digest` + relevant decisions from AI Kit |

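To keep prompts near these targets, a quick size check before dispatch is usually enough. This sketch rests on two assumptions. It uses the common rough heuristic of about 4 characters per token (real tokenizers vary by model and content). It also allows an arbitrary 1.5× tolerance, because the targets are approximate.

```typescript
type Complexity = "micro" | "small" | "standard" | "complex";

// Targets from the Context Size Budget table.
const CONTEXT_TARGET: Record<Complexity, number> = {
  micro: 500,
  small: 1000,
  standard: 2000,
  complex: 3000,
};

// Rough estimate only (assumption): ~4 characters per token for English prose and code.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Flags prompts that overshoot their target by more than the tolerance factor.
function checkContextBudget(promptText: string, complexity: Complexity, tolerance = 1.5) {
  const tokens = estimateTokens(promptText);
  const target = CONTEXT_TARGET[complexity];
  return { tokens, target, withinBudget: tokens <= target * tolerance };
}

// Usage: ~600 tokens for a small task is fine; ~2000 tokens for a micro task is not.
console.log(checkContextBudget("x".repeat(2400), "small")); // { tokens: 600, target: 1000, withinBudget: true }
console.log(checkContextBudget("x".repeat(8000), "micro")); // { tokens: 2000, target: 500, withinBudget: false }
```
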
---

## §6 Subagent Execution Cycle

### Lifecycle

```
Orchestrator                           Subagent (fresh instance)
│                                      │
├─ Craft focused prompt ──────────────►│
│  (6-point template)                  │
│                                      ├─ Understand scope
│                                      ├─ Implement changes
│                                      ├─ Self-review (checklist)
│◄─────────────────── Return status ───┤
│                                      │  (DONE/CONCERNS/NEEDS/BLOCKED)
│                                      │
├─ Handle status (see §7)              ×  (subagent terminates)
│
├─ Automated gate (check/test_run)
│
├─ Dispatch reviewers (see §8)
│
└─ FORGE evidence_map gate
```

### Key Rules

1. **One subagent = one task**. Never reuse a subagent for a different task.
2. **Controller provides context**. The subagent's prompt contains everything it needs; it should NOT need to search or explore the codebase.
3. **Self-review before handoff**. Every implementer must complete the self-review checklist before declaring DONE.
4. **Status is mandatory**. Every subagent response MUST end with exactly ONE status code.

---

## §7 Implementer Status Protocol

### Status Codes

Every implementer agent (Implementer, Frontend, Refactor) MUST end its response with exactly ONE status code:

| Status | Meaning | Orchestrator Action |
|--------|---------|---------------------|
| **DONE** | All tasks complete, self-review passed | → Automated gate → Review pipeline |
| **DONE_WITH_CONCERNS** | Complete but flagging issues: [list] | → Surface concerns as `Assumed` claims in evidence_map → Likely HOLD → Address before review |
| **NEEDS_CONTEXT** | Cannot proceed without: [specific question] | → Provide missing context → Re-dispatch same task (counts as retry) |
| **BLOCKED** | Hit a wall: [description] | → Diagnose (see below) |

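The status code drives the next Orchestrator action, so it is worth parsing strictly rather than guessing. This is a minimal sketch that assumes the code appears on the last non-empty line of the reply, as the template's closing "End with status" line asks. A missing or ambiguous code returns `null`, which the Orchestrator treats as a protocol violation.

```typescript
type StatusCode = "DONE" | "DONE_WITH_CONCERNS" | "NEEDS_CONTEXT" | "BLOCKED";

// Longer codes are listed first in the alternation, and \b keeps fragments such as
// "UNDONE" from matching (letters and underscores are word characters, so no boundary).
const STATUS_PATTERN = /\b(DONE_WITH_CONCERNS|NEEDS_CONTEXT|BLOCKED|DONE)\b/g;

// Returns the single status code on the reply's final line, or null if there are none or several.
function parseStatus(reply: string): StatusCode | null {
  const lines = reply.trim().split("\n");
  const lastLine = lines[lines.length - 1] ?? "";
  const found = lastLine.match(STATUS_PATTERN) ?? [];
  return found.length === 1 ? (found[0] as StatusCode) : null;
}

console.log(parseStatus("Added the endpoint and tests.\n\nStatus: DONE")); // DONE
console.log(parseStatus("Status: DONE_WITH_CONCERNS")); // DONE_WITH_CONCERNS
console.log(parseStatus("Should I use DONE or BLOCKED here?")); // null
```
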
### BLOCKED Diagnosis Tree

```
Agent returned BLOCKED
│
├─ Missing context? (needs info not in prompt)
│   → Provide context, re-dispatch
│
├─ Wrong model? (task too complex for assigned model)
│   → Re-dispatch to stronger model (e.g., Implementer → Debugger)
│
├─ Scope too broad? (agent overwhelmed)
│   → Split task further, re-dispatch smaller pieces
│
├─ Plan wrong? (implementation approach won't work)
│   → Re-plan this phase, check AI Kit for alternatives
│
└─ External blocker? (dependency not ready, API unavailable)
    → Park task, proceed with independent work, revisit later
```

### FORGE Composition

The status protocol and FORGE are **independent but composable**:

- **Status** = subjective agent telemetry ("I think I'm done")
- **FORGE** = objective quality evidence ("the evidence says it's done")

```
DONE               → proceed to automated gate → FORGE evidence_map
DONE_WITH_CONCERNS → concerns become 'Assumed' claims → evidence_map likely HOLDs
NEEDS_CONTEXT      → provide context, re-dispatch (no FORGE yet)
BLOCKED            → diagnose:
                       contract/security issue → HARD_BLOCK
                       resource/scope issue    → re-plan, no FORGE
```

**Critical rule**: Every `DONE` status MUST be followed by `evidence_map({ action: "gate" })` before proceeding to review. No shortcuts.

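The composition table maps directly onto a routing function. This is a minimal sketch. The step labels are illustrative strings, and `evidence_map` is only named, not called. The `BlockedCause` categories are an assumption that covers just the two BLOCKED branches listed in the table.

```typescript
type ImplementerStatus = "DONE" | "DONE_WITH_CONCERNS" | "NEEDS_CONTEXT" | "BLOCKED";
type BlockedCause = "contract" | "security" | "resource" | "scope";

// Next Orchestrator step per the FORGE composition table. Every DONE goes through the
// evidence_map gate before review (the critical rule); only contract/security blocks hard-stop.
function nextStep(reported: ImplementerStatus, cause?: BlockedCause): string {
  switch (reported) {
    case "DONE":
      return "automated gate, then evidence_map gate, then review";
    case "DONE_WITH_CONCERNS":
      return "record concerns as Assumed claims, then evidence_map gate (expect HOLD)";
    case "NEEDS_CONTEXT":
      return "provide context and re-dispatch (no FORGE yet)";
    case "BLOCKED":
      return cause === "contract" || cause === "security" ? "HARD_BLOCK" : "re-plan (no FORGE)";
  }
}

console.log(nextStep("BLOCKED", "security")); // HARD_BLOCK
console.log(nextStep("BLOCKED", "scope")); // re-plan (no FORGE)
```
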
---

## §8 Review Pipeline

### Four-Stage Pipeline

```
Stage 1: Implementer Self-Review (embedded in agent output)
  └─ Checklist: scope respected, tests pass, conventions followed
│
Stage 2: Orchestrator Automated Gate
  ├─ check({}) + test_run({}) MUST pass
  ├─ Validate self-review checklist present in output
  ├─ FAIL → bounce back to implementer with specific gap
  └─ PASS ↓
│
Stage 3: Dual Code Review (parallel)
  ├─ Code-Reviewer-Alpha (GPT-5.4): code quality + Spec Alignment
  └─ Code-Reviewer-Beta (Opus 4.6): code quality + Spec Alignment
     Both review the same code from different model perspectives
     Spec Alignment = "Does this match what was asked?"
│
Stage 4: Conditional Reviews (parallel if both needed)
  ├─ Architecture Review: if boundary changes, new modules, pattern shifts
  └─ Security Review: if auth, crypto, input handling, or external data
│
FORGE Gate: evidence_map({ action: "gate" })
  ├─ YIELD → proceed to commit
  ├─ HOLD → address flagged items → re-gate (max 3 rounds)
  └─ HARD_BLOCK → escalate to user
```

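The FORGE gate's HOLD loop needs an explicit bound, or it can re-gate forever. This is a minimal sketch in which `runGate` and `addressHold` are hypothetical injected callbacks. They stand in for `evidence_map({ action: "gate" })` and for re-dispatching fixes. The sketch reads "max 3 rounds" as at most three gate calls. Escalating after the third HOLD is an assumption, since the pipeline does not say what happens then.

```typescript
type GateResult = "YIELD" | "HOLD" | "HARD_BLOCK";

// Gate → (on HOLD: fix → re-gate), with at most `maxRounds` gate calls in total.
async function gateWithRounds(
  runGate: () => Promise<GateResult>,
  addressHold: (round: number) => Promise<void>,
  maxRounds = 3,
): Promise<"commit" | "escalate"> {
  for (let round = 1; ; round++) {
    const result = await runGate();
    if (result === "YIELD") return "commit";
    // HARD_BLOCK goes straight to the user; so does a HOLD that survives the last round.
    if (result === "HARD_BLOCK" || round >= maxRounds) return "escalate";
    await addressHold(round);
  }
}

// Usage: a simulated gate that HOLDs once, then YIELDs.
const simulated: GateResult[] = ["HOLD", "YIELD"];
gateWithRounds(
  async () => simulated.shift() ?? "HARD_BLOCK",
  async (round) => console.log(`round ${round}: addressing flagged items`),
).then((outcome) => console.log(outcome)); // logs the round-1 message, then "commit"
```
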
### Spec Alignment Dimension (for Code Reviewers)

Both Code-Reviewer-Alpha and Code-Reviewer-Beta evaluate an explicit **Spec Alignment** dimension:

1. Does the implementation match the acceptance criteria from the task?
2. Are there over-builds (features not requested)?
3. Are there under-builds (requirements missed)?
4. Does the output match the expected file changes?

This catches spec drift that automated tests might miss.

### When to Skip Stages

| Stage | Skip When |
|-------|-----------|
| Architecture Review | No new modules, no boundary changes, no new patterns |
| Security Review | No auth, no crypto, no external input handling |
| FORGE Gate | Floor-tier tasks only (simple, mechanical changes) |

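The skip rules reduce to a few boolean checks on the change set. This is a minimal sketch that assumes a hypothetical `ChangeProfile`, which the Orchestrator fills in from the plan and the diff. Stages 1 to 3 always run.

```typescript
// Hypothetical summary of a change set; how each flag is derived is outside this sketch.
interface ChangeProfile {
  newModules: boolean;
  boundaryChanges: boolean;
  newPatterns: boolean;
  touchesAuth: boolean;
  touchesCrypto: boolean;
  handlesExternalInput: boolean;
  tier: "Floor" | "Standard" | "Critical";
}

// Stages 1-3 always run; Stage 4 reviews and the FORGE gate are added only when
// their "Skip When" condition does not hold.
function selectStages(change: ChangeProfile): string[] {
  const stages = ["self-review", "automated-gate", "dual-code-review"];
  if (change.newModules || change.boundaryChanges || change.newPatterns) stages.push("architecture-review");
  if (change.touchesAuth || change.touchesCrypto || change.handlesExternalInput) stages.push("security-review");
  if (change.tier !== "Floor") stages.push("forge-gate");
  return stages;
}

const mechanical: ChangeProfile = {
  newModules: false,
  boundaryChanges: false,
  newPatterns: false,
  touchesAuth: false,
  touchesCrypto: false,
  handlesExternalInput: false,
  tier: "Floor",
};
console.log(selectStages(mechanical)); // only the three always-on stages
console.log(selectStages({ ...mechanical, touchesAuth: true, tier: "Critical" })); // adds security-review and forge-gate
```
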
---

## §9 Recovery & Escalation

### Retry Policy

- **Max 2 retries per agent per task**; after that, re-plan or escalate.
- Each retry MUST include the specific failure reason in the new prompt.
- Never retry with the same prompt; always add diagnostic context.

### Loop Detection

If an agent returns the same error or status 2+ times:

1. **STOP**: do not retry again.
2. Check whether the approach is fundamentally wrong.
3. Consider a different agent, a different model, a different decomposition, or escalation to the user.

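Both rules can be enforced with a small per-task ledger. This is a minimal sketch in which `RetryTracker` is a hypothetical helper. It also assumes the Orchestrator reduces each failure to a comparable `signature` string, such as the error message or returned status.

```typescript
type RetryDecision = "retry" | "stop-loop" | "replan-or-escalate";

// Tracks failures per (agent, task) pair.
class RetryTracker {
  private readonly failures = new Map<string, string[]>();
  private readonly maxRetries: number;

  constructor(maxRetries = 2) {
    this.maxRetries = maxRetries;
  }

  // Call once per failed attempt. The first failure is the original dispatch, so
  // "max 2 retries" allows a retry after failures 1 and 2 and stops at failure 3.
  recordFailure(agent: string, task: string, signature: string): RetryDecision {
    const key = `${agent}::${task}`;
    const seen = this.failures.get(key) ?? [];
    seen.push(signature);
    this.failures.set(key, seen);
    // Loop detection: the same error/status twice means STOP, regardless of the retry budget.
    if (seen.filter((s) => s === signature).length >= 2) return "stop-loop";
    return seen.length > this.maxRetries ? "replan-or-escalate" : "retry";
  }
}

const tracker = new RetryTracker();
console.log(tracker.recordFailure("Implementer", "T1", "TS2345 in api.ts")); // retry
console.log(tracker.recordFailure("Implementer", "T1", "login test: 401")); // retry
console.log(tracker.recordFailure("Implementer", "T1", "login test: 401")); // stop-loop
```

The tracker only decides whether a retry is allowed. Adding the failure reason and diagnostic context to the new prompt remains the caller's job.
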
### Emergency Procedures

When a parallel batch causes cascading failures:

```
STOP     → Halt all running agents immediately
ASSESS   → git status --short + git diff --stat + check({}): how bad is it?
CONTAIN  → Limited (1-3 files): fix or re-delegate
           Widespread (10+ files): git stash -u to preserve for analysis
RECOVER  → Partial: git checkout -- {specific files}
           Full: git stash -u (preserves) or git checkout . (discards)
           Nuclear: git reset --hard HEAD (last resort)
           (checkout and reset leave new untracked files; git clean -fd removes them)
DOCUMENT → remember what went wrong, update plan
```

### Scope Tripwires

| Signal | Action |
|--------|--------|
| Agent modified **twice as many files** as planned (or more) | Pause, review before continuing |
| Agent returns `ESCALATE` or `BLOCKED` repeatedly | Do NOT re-delegate unchanged. Diagnose first |
| Agent's output contradicts the plan | Stop, compare with plan, re-align |
| Tests that were passing now fail | Immediately roll back that agent's changes |

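These signals can be checked mechanically after each batch. This is a minimal sketch that assumes a hypothetical per-agent `AgentReport`. It uses two thresholds that are interpretations, not documented values: modifying at least twice the planned number of files trips the first signal, and "repeatedly" means two or more times.

```typescript
// Hypothetical per-agent summary assembled by the Orchestrator after a batch.
interface AgentReport {
  plannedFiles: string[];
  modifiedFiles: string[];
  statusHistory: string[]; // statuses returned across dispatches of this task
  contradictsPlan: boolean; // judged by comparing the output with the plan
  newlyFailingTests: string[]; // tests that passed before this agent's changes
}

// Returns the tripwire actions this report triggers, most urgent first.
function checkTripwires(r: AgentReport): string[] {
  const actions: string[] = [];
  if (r.newlyFailingTests.length > 0) actions.push("roll back this agent's changes");
  if (r.contradictsPlan) actions.push("stop, compare with plan, re-align");
  const blocks = r.statusHistory.filter((s) => s === "BLOCKED" || s === "ESCALATE").length;
  if (blocks >= 2) actions.push("diagnose before re-delegating");
  // Guard against an empty plan, where 2 × 0 would trip on every report.
  const planned = Math.max(r.plannedFiles.length, 1);
  if (r.modifiedFiles.length >= 2 * planned) actions.push("pause and review");
  return actions;
}

console.log(
  checkTripwires({
    plannedFiles: ["src/a.ts", "src/b.ts"],
    modifiedFiles: ["src/a.ts", "src/b.ts", "src/c.ts", "src/d.ts"],
    statusHistory: ["BLOCKED"],
    contradictsPlan: false,
    newlyFailingTests: [],
  }),
); // only "pause and review"
```
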
---

## §10 Common Mistakes & Red Flags

### Delegation Anti-Patterns

| ❌ Mistake | Why It Fails | ✅ Fix |
|-----------|-------------|--------|
| **Too broad scope**: "implement the auth system" | Agent lacks clear boundaries, produces sprawling changes | Split: "add JWT middleware to auth.ts" + "add login endpoint to routes.ts" |
| **No constraints**: "add a feature" | Agent invents architecture, conflicts with existing patterns | Include conventions, boundaries, existing patterns in prompt |
| **Vague output**: "make it work" | No way to verify completion | Specific acceptance criteria: "endpoint returns 200 with {schema}" |
| **Session context inheritance**: "continue from where we left off" | Subagent has stale/polluted context | Fresh prompt with 6-point template every time |
| **Skipping reviews**: "it's a small change" | Small changes can cause big regressions | ALWAYS run at least the automated gate |
| **Parallel on shared files**: "both agents edit config.ts" | Merge conflicts, lost changes | Sequential, or merge into one task |
| **Trusting the report**: "agent said DONE so it's done" | Agents are optimistic, miss edge cases | Automated gate + dual code review catches this |
| **Brute-force retries**: same prompt 3 times | If it failed twice, it will very likely fail a third time | Diagnose, change approach, then retry |
| **Orchestrator implements**: "just this one small fix" | Breaks the delegation contract, no review | ALWAYS delegate, no matter how small |

### Red Flags in Agent Output

| Flag | What It Means | Action |
|------|--------------|--------|
| Agent modified files outside its scope | Scope creep or misunderstanding | Roll back out-of-scope files, re-delegate with tighter constraints |
| Agent added dependencies not in plan | Unauthorized architectural decision | Review necessity, likely roll back |
| Agent skipped self-review checklist | Rushing, likely incomplete | Bounce back with checklist requirement |
| Agent reports DONE but tests fail | Didn't actually self-test | Bounce back with failing test output |
| Agent asks questions in output instead of using NEEDS_CONTEXT | Misunderstands status protocol | Treat as NEEDS_CONTEXT, educate in next prompt |

---

## Prompt Template Reference

Detailed prompt templates are provided as sidecar files:

| Template | File | Use When |
|----------|------|----------|
| Implementer dispatch | [`implementer-prompt.md`](implementer-prompt.md) | Dispatching Implementer, Frontend, or Refactor agents |
| Spec compliance review | [`spec-review-prompt.md`](spec-review-prompt.md) | Adversarial spec alignment check (Code-Reviewer-Alpha) |
| Code quality review | [`code-quality-review-prompt.md`](code-quality-review-prompt.md) | Dual code quality review (Code-Reviewer-Beta) |
| Architecture review | [`architecture-review-prompt.md`](architecture-review-prompt.md) | Boundary changes, pattern adherence review |
| Parallel dispatch example | [`parallel-dispatch-example.md`](parallel-dispatch-example.md) | Worked example of decomposing a feature into parallel tasks |