@vpxa/aikit 0.1.2 → 0.1.3

Files changed (52)
  1. package/package.json +1 -1
  2. package/packages/core/dist/global-registry.js +1 -1
  3. package/packages/core/dist/types.d.ts +2 -0
  4. package/packages/flows/dist/git.js +1 -1
  5. package/packages/flows/dist/registry.d.ts +3 -3
  6. package/packages/flows/dist/registry.js +1 -1
  7. package/packages/flows/dist/symlinks.js +1 -1
  8. package/packages/indexer/dist/filesystem-crawler.js +1 -1
  9. package/packages/indexer/dist/hash-cache.js +1 -1
  10. package/packages/kb-client/dist/direct-client.d.ts +33 -34
  11. package/packages/kb-client/dist/index.d.ts +5 -4
  12. package/packages/kb-client/dist/mcp-client.d.ts +18 -18
  13. package/packages/kb-client/dist/parsers.d.ts +14 -11
  14. package/packages/kb-client/dist/types.d.ts +50 -47
  15. package/packages/present/dist/index.html +26 -26
  16. package/packages/server/dist/config.js +1 -1
  17. package/packages/server/dist/idle-timer.d.ts +4 -0
  18. package/packages/server/dist/idle-timer.js +1 -1
  19. package/packages/server/dist/index.js +1 -1
  20. package/packages/server/dist/memory-monitor.d.ts +2 -2
  21. package/packages/server/dist/memory-monitor.js +1 -1
  22. package/packages/server/dist/server.d.ts +1 -1
  23. package/packages/server/dist/server.js +2 -2
  24. package/packages/server/dist/tool-metadata.js +1 -1
  25. package/packages/server/dist/tools/config.tool.d.ts +8 -0
  26. package/packages/server/dist/tools/config.tool.js +12 -0
  27. package/packages/server/dist/tools/flow.tools.js +1 -1
  28. package/packages/server/dist/tools/present/browser.js +7 -7
  29. package/packages/server/dist/tools/present/tool.js +4 -4
  30. package/packages/server/dist/tools/search.tool.js +4 -4
  31. package/packages/server/dist/tools/status.tool.js +3 -3
  32. package/packages/store/dist/sqlite-graph-store.d.ts +3 -0
  33. package/packages/store/dist/sqlite-graph-store.js +3 -3
  34. package/packages/tools/dist/checkpoint.js +1 -1
  35. package/packages/tools/dist/evidence-map.js +2 -2
  36. package/packages/tools/dist/queue.js +1 -1
  37. package/packages/tools/dist/restore-points.js +1 -1
  38. package/packages/tools/dist/schema-validate.js +1 -1
  39. package/packages/tools/dist/snippet.js +1 -1
  40. package/packages/tools/dist/stash.js +1 -1
  41. package/packages/tools/dist/workset.js +1 -1
  42. package/packages/tui/dist/{App-B2-KJPt4.js → App-DpjN3iS-.js} +1 -1
  43. package/packages/tui/dist/App.js +1 -1
  44. package/packages/tui/dist/LogPanel-Db-SeZhR.js +3 -0
  45. package/packages/tui/dist/index.js +1 -1
  46. package/packages/tui/dist/panels/LogPanel.js +1 -1
  47. package/scaffold/general/skills/multi-agents-development/SKILL.md +435 -435
  48. package/scaffold/general/skills/present/SKILL.md +424 -424
  49. package/packages/kb-client/dist/__tests__/direct-client.test.d.ts +0 -1
  50. package/packages/kb-client/dist/__tests__/mcp-client.test.d.ts +0 -1
  51. package/packages/kb-client/dist/__tests__/parsers.test.d.ts +0 -1
  52. package/packages/tui/dist/LogPanel-E_1Do4-j.js +0 -3
@@ -1,435 +1,435 @@
1
- # Multi-Agent Development
2
-
3
- Comprehensive patterns for orchestrating multiple AI agents in parallel development workflows. Covers task decomposition, parallel dispatch, context crafting, status handling, review pipelines, and recovery.
4
-
5
- **Core Principle**: Dispatch multiple agents for focused tasks. Each subagent gets fresh, focused context with explicit scope — never inherited session state.
6
-
7
- Load this skill when orchestrating multi-agent work: planning parallel batches, crafting delegation prompts, handling implementer status, running review pipelines, or recovering from agent failures.
8
-
9
- ---
10
-
11
- ## §1 Agent Roles & Model Selection
12
-
13
- ### Role Categories
14
-
15
- | Role | Agents | When to Use | Parallelizable |
16
- |------|--------|-------------|----------------|
17
- | **Orchestration** | Orchestrator, Planner | Workflow control, planning | No (sequential) |
18
- | **Implementation** | Implementer, Frontend, Refactor | Code creation/modification | Yes (disjoint files only) |
19
- | **Research** | Explorer, Researcher-Alpha/Beta/Gamma/Delta | Codebase exploration, decisions | Yes (always) |
20
- | **Review** | Code-Reviewer-Alpha/Beta, Architect-Reviewer-Alpha/Beta | Quality verification | Yes (always) |
21
- | **Diagnostics** | Debugger, Security | Issue tracing, vulnerability analysis | Yes (read-only) |
22
- | **Documentation** | Documenter | README, API docs, changelog | Yes (disjoint files) |
23
-
24
- ### Model Selection by Task Complexity
25
-
26
- Choose the **least powerful model that can handle the role**:
27
-
28
- | Complexity Signal | Model Tier | Example Agents |
29
- |-------------------|-----------|----------------|
30
- | Mechanical (rename, move, add field) | Fast model | Explorer (Gemini Flash) |
31
- | Standard (implement spec, write tests) | Mid-tier | Implementer (GPT-5.4), Refactor (GPT-5.4) |
32
- | Judgment-heavy (architecture, security, debug) | Strongest | Debugger (Opus 4.6), Security (Opus 4.6) |
33
- | Multi-model cross-validation | Mixed | Researcher-Alpha/Beta/Gamma/Delta (all different) |
34
-
35
- **Upgrade signal**: If an agent returns `BLOCKED` or `DONE_WITH_CONCERNS` on a task classified as "Standard", consider re-dispatching to a stronger model.
36
-
37
- ---
38
-
39
- ## §2 Task Decomposition Rules
40
-
41
- ### The Golden Rule
42
- > **One task = one focused problem domain = 1-3 files maximum.**
43
-
44
- ### Decomposition Checklist
45
-
46
- For each task, specify ALL of:
47
- - [ ] **Target files** — exact paths to create or modify
48
- - [ ] **Acceptance criteria** — what "done" looks like (testable)
49
- - [ ] **Agent assignment** — which agent handles this
50
- - [ ] **Dependencies** — which tasks must complete first (if any)
51
-
52
- ### Sizing Guide
53
-
54
- | Task Size | Files | Example | Agent |
55
- |-----------|-------|---------|-------|
56
- | **Micro** | 1 file | Add a utility function | Implementer |
57
- | **Small** | 1-2 files | New endpoint + test | Implementer |
58
- | **Standard** | 2-3 files | Feature with service + controller + test | Implementer |
59
- | **Too big** | 4+ files | **SPLIT IT** — decompose further | — |
60
-
61
- ### Splitting Strategies
62
- - **By layer**: Service logic (Implementer) + UI component (Frontend) + tests (Implementer)
63
- - **By feature boundary**: Auth endpoints (Implementer A) + Profile endpoints (Implementer B)
64
- - **By concern**: Data model changes (Implementer) + API route changes (Implementer) + UI updates (Frontend)
65
-
66
- ---
67
-
68
- ## §3 Independence Decision Tree
69
-
70
- Before marking tasks as parallel, walk this tree:
71
-
72
- ```
73
- Task A and Task B — can they run in parallel?
74
-
75
- ├─ Do they share ANY files? (create, modify, or delete the same file)
76
- │ ├─ YES → SEQUENTIAL (or merge into one task)
77
- │ └─ NO ↓
78
-
79
- ├─ Do they share mutable state? (env vars, globals, same DB table, shared config)
80
- │ ├─ YES → SEQUENTIAL
81
- │ └─ NO ↓
82
-
83
- ├─ Does B need A's output? (B reads a file A creates, B uses A's new export)
84
- │ ├─ YES → SEQUENTIAL (A before B)
85
- │ └─ NO ↓
86
-
87
- ├─ Would A's result change B's approach? (A discovers something that affects B)
88
- │ ├─ YES → SEQUENTIAL or single agent
89
- │ └─ NO ↓
90
-
91
- ├─ Resource contention? (same port, same build process, same lock file)
92
- │ ├─ YES → SEQUENTIAL
93
- │ └─ NO ↓
94
-
95
- └─ ✅ SAFE TO PARALLELIZE
96
- ```
97
-
98
- ### Edge Cases
99
-
100
- | Situation | Verdict | Why |
101
- |-----------|---------|-----|
102
- | Both import from same module (read-only) | ✅ Parallel | Reading shared code is fine |
103
- | Both add exports to same index file | ❌ Sequential | Concurrent index.ts edits will conflict |
104
- | A creates a type, B uses that type | ❌ Sequential | B depends on A's output |
105
- | Both modify different test files | ✅ Parallel | Disjoint file sets |
106
- | Both touch package.json | ❌ Sequential | Shared file |
107
- | A adds a route, B adds middleware | ⚠️ Check | If B's middleware affects A's route → sequential |
108
-
109
- ### Integration Verification (after parallel batch completes)
110
-
111
- 1. **Conflict check**: Did any agent unexpectedly modify a file assigned to another agent?
112
- 2. **Import check**: Do all new cross-references resolve?
113
- 3. **Full suite**: `check({})` + `test_run({})` — everything must pass
114
- 4. **Spot check**: Manually verify at least one task's output matches acceptance criteria
115
-
116
- ---
117
-
118
- ## §4 Parallel Dispatch Patterns
119
-
120
- ### Dispatch Rules
121
-
122
- 1. **Max 4 concurrent file-modifying agents** per batch
123
- 2. **Read-only agents have no limit** — Explorer, Researcher*, Reviewer*, Security can always run in parallel
124
- 3. **Build dependency graph first** — phases with no dependencies MUST be batched together
125
- 4. **Never dispatch two implementers to the same file** — even different sections
126
-
127
- ### Batch Strategy
128
-
129
- ```
130
- Phase Plan:
131
- Phase 1: [Task A, Task B, Task C] ← no dependencies between A/B/C
132
- Phase 2: [Task D, Task E] ← D depends on A, E depends on B
133
- Phase 3: [Task F] ← F depends on D and E
134
-
135
- Execution:
136
- Batch 1: dispatch(A, B, C) in parallel → review → gate
137
- Batch 2: dispatch(D, E) in parallel → review → gate
138
- Batch 3: dispatch(F) → review → gate
139
- ```
140
-
141
- ### Anti-Patterns
142
-
143
- | ❌ Don't | ✅ Do Instead |
144
- |----------|--------------|
145
- | Dispatch 6 implementers at once | Max 4, queue the rest |
146
- | Give one agent 10 files | Split into 3-4 focused tasks |
147
- | Let agents read the full plan | Give each agent ONLY its task context |
148
- | Retry same prompt on failure | Diagnose first, then re-prompt with fix |
149
- | Skip review after parallel batch | ALWAYS review + integration verify |
150
- | Inherit session context to subagent | Build fresh, focused context per dispatch |
151
-
152
- ---
153
-
154
- ## §5 Context Crafting Guide
155
-
156
- ### The Controller Principle
157
- > **The Orchestrator provides ALL context. Subagents never need to search for context themselves.**
158
-
159
- Each subagent gets a fresh, self-contained prompt. No inherited session state. No "read the plan first."
160
-
161
- ### The 6-Point Prompt Template
162
-
163
- Every delegation prompt MUST include:
164
-
165
- ```markdown
166
- ## 1. Scope
167
- Files to create/modify: [exact paths]
168
- Files to NOT touch: [boundaries]
169
-
170
- ## 2. Goal
171
- [What the code should do — acceptance criteria, testable outcomes]
172
-
173
- ## 3. Architectural Context
174
- [Relevant patterns, conventions, existing code structure]
175
- [Include actual code snippets from compact/digest — don't tell agent to "go read X"]
176
-
177
- ## 4. Constraints
178
- - Follow [pattern/convention]
179
- - Do NOT modify [boundary files]
180
- - Use [specific library/approach]
181
-
182
- ## 5. FORGE Context
183
- Tier: [Floor/Standard/Critical]
184
- Evidence requirements: [what evidence to collect]
185
-
186
- ## 6. Self-Review & Status
187
- Before declaring DONE, verify:
188
- - [ ] All acceptance criteria met
189
- - [ ] No files outside scope modified
190
- - [ ] Tests pass (if applicable)
191
- - [ ] Code follows stated conventions
192
-
193
- End with status: DONE | DONE_WITH_CONCERNS | NEEDS_CONTEXT | BLOCKED
194
- ```
195
-
196
- ### What to Include vs Omit
197
-
198
- | ✅ Include | ❌ Omit |
199
- |-----------|---------|
200
- | Exact file paths and code snippets | Full session history |
201
- | Acceptance criteria | Other agents' tasks |
202
- | Relevant conventions (from KB) | Unrelated architecture context |
203
- | Compact/digest of relevant files | Raw file contents of large files |
204
- | Error messages (if fixing a bug) | Previous failed attempts (unless relevant) |
205
- | FORGE tier and ceremony | Full FORGE protocol explanation |
206
-
207
- ### Context Size Budget
208
-
209
- | Task Complexity | Context Target | Approach |
210
- |-----------------|---------------|----------|
211
- | Micro (1 file) | ~500 tokens | Inline code snippet + goal |
212
- | Small (1-2 files) | ~1000 tokens | `compact` of target files + goal |
213
- | Standard (2-3 files) | ~2000 tokens | `digest` of related files + architectural context |
214
- | Complex (judgment-heavy) | ~3000 tokens | `digest` + relevant decisions from AI Kit |
215
-
216
- ---
217
-
218
- ## §6 Subagent Execution Cycle
219
-
220
- ### Lifecycle
221
-
222
- ```
223
- Orchestrator Subagent (fresh instance)
224
- │ │
225
- ├─ Craft focused prompt ──────────────►│
226
- │ (6-point template) │
227
- │ ├─ Understand scope
228
- │ ├─ Implement changes
229
- │ ├─ Self-review (checklist)
230
- │◄─────────────────── Return status ───┤
231
- │ │ (DONE/CONCERNS/NEEDS/BLOCKED)
232
- │ │
233
- ├─ Handle status (see §7) × (subagent terminates)
234
-
235
- ├─ Automated gate (check/test_run)
236
-
237
- ├─ Dispatch reviewers (see §8)
238
-
239
- └─ FORGE evidence_map gate
240
- ```
241
-
242
- ### Key Rules
243
-
244
- 1. **One subagent = one task**. Never reuse a subagent for a different task.
245
- 2. **Controller provides context**. The subagent's prompt contains everything it needs — it should NOT need to search/explore the codebase.
246
- 3. **Self-review before handoff**. Every implementer must complete the self-review checklist before declaring DONE.
247
- 4. **Status is mandatory**. Every subagent response MUST end with exactly ONE status code.
248
-
249
- ---
250
-
251
- ## §7 Implementer Status Protocol
252
-
253
- ### Status Codes
254
-
255
- Every implementer (Implementer, Frontend, Refactor) MUST end their response with exactly ONE:
256
-
257
- | Status | Meaning | Orchestrator Action |
258
- |--------|---------|-------------------|
259
- | **DONE** | All tasks complete, self-review passed | → Automated gate → Review pipeline |
260
- | **DONE_WITH_CONCERNS** | Complete but flagging issues: [list] | → Surface concerns as `Assumed` claims in evidence_map → Likely HOLD → Address before review |
261
- | **NEEDS_CONTEXT** | Cannot proceed without: [specific question] | → Provide missing context → Re-dispatch same task (counts as retry) |
262
- | **BLOCKED** | Hit a wall: [description] | → Diagnose (see below) |
263
-
264
- ### BLOCKED Diagnosis Tree
265
-
266
- ```
267
- Agent returned BLOCKED
268
-
269
- ├─ Missing context? (needs info not in prompt)
270
- │ → Provide context, re-dispatch
271
-
272
- ├─ Wrong model? (task too complex for assigned model)
273
- │ → Re-dispatch to stronger model (e.g., Implementer → Debugger)
274
-
275
- ├─ Scope too broad? (agent overwhelmed)
276
- │ → Split task further, re-dispatch smaller pieces
277
-
278
- ├─ Plan wrong? (implementation approach won't work)
279
- │ → Re-plan this phase, check AI Kit for alternatives
280
-
281
- └─ External blocker? (dependency not ready, API unavailable)
282
- → Park task, proceed with independent work, revisit later
283
- ```
284
-
285
- ### FORGE Composition
286
-
287
- Status protocol and FORGE are **independent but composable**:
288
-
289
- - **Status** = subjective agent telemetry ("I think I'm done")
290
- - **FORGE** = objective quality evidence ("the evidence says it's done")
291
-
292
- ```
293
- DONE → proceed to automated gate → FORGE evidence_map
294
- DONE_WITH_CONCERNS → concerns become 'Assumed' claims → evidence_map likely HOLDs
295
- NEEDS_CONTEXT → provide context, re-dispatch (no FORGE yet)
296
- BLOCKED → diagnose:
297
- contract/security issue → HARD_BLOCK
298
- resource/scope issue → re-plan, no FORGE
299
- ```
300
-
301
- **Critical rule**: Every `DONE` status MUST be followed by `evidence_map({ action: "gate" })` before proceeding to review. No shortcuts.
302
-
303
- ---
304
-
305
- ## §8 Review Pipeline
306
-
307
- ### Four-Stage Pipeline
308
-
309
- ```
310
- Stage 1: Implementer Self-Review (embedded in agent output)
311
- └─ Checklist: scope respected, tests pass, conventions followed
312
-
313
- Stage 2: Orchestrator Automated Gate
314
- └─ check({}) + test_run({}) MUST pass
315
- └─ Validate self-review checklist present in output
316
- └─ FAIL → bounce back to implementer with specific gap
317
- └─ PASS ↓
318
-
319
- Stage 3: Dual Code Review (parallel)
320
- ├─ Code-Reviewer-Alpha (GPT-5.4): code quality + Spec Alignment
321
- └─ Code-Reviewer-Beta (Opus 4.6): code quality + Spec Alignment
322
- │ Both review same code, different model perspectives
323
- │ Spec Alignment = "Does this match what was asked?"
324
-
325
- Stage 4: Conditional Reviews (parallel if both needed)
326
- ├─ Architecture Review — if boundary changes, new modules, pattern shifts
327
- └─ Security Review — if auth, crypto, input handling, or external data
328
-
329
- FORGE Gate: evidence_map({ action: "gate" })
330
- └─ YIELD → proceed to commit
331
- └─ HOLD → address flagged items → re-gate (max 3 rounds)
332
- └─ HARD_BLOCK → escalate to user
333
- ```
334
-
335
- ### Spec Alignment Dimension (for Code Reviewers)
336
-
337
- Both Code-Reviewer-Alpha and Code-Reviewer-Beta evaluate an explicit **Spec Alignment** dimension:
338
-
339
- 1. Does the implementation match the acceptance criteria from the task?
340
- 2. Are there over-builds (features not requested)?
341
- 3. Are there under-builds (requirements missed)?
342
- 4. Does the output match the expected file changes?
343
-
344
- This catches spec drift that automated tests might miss.
345
-
346
- ### When to Skip Stages
347
-
348
- | Stage | Skip When |
349
- |-------|-----------|
350
- | Architecture Review | No new modules, no boundary changes, no new patterns |
351
- | Security Review | No auth, no crypto, no external input handling |
352
- | FORGE Gate | Floor-tier tasks only (simple, mechanical changes) |
353
-
354
- ---
355
-
356
- ## §9 Recovery & Escalation
357
-
358
- ### Retry Policy
359
-
360
- - **Max 2 retries per agent per task** — after that, re-plan or escalate
361
- - Each retry MUST include the specific failure reason in the new prompt
362
- - Never retry with the same prompt — always add diagnostic context
363
-
364
- ### Loop Detection
365
-
366
- If an agent returns the same error/status 2+ times:
367
- 1. **STOP** — do not retry again
368
- 2. Check if the approach is fundamentally wrong
369
- 3. Consider: different agent, different model, different decomposition, or user escalation
370
-
371
- ### Emergency Procedures
372
-
373
- When parallel batch causes cascading failures:
374
-
375
- ```
376
- STOP → Halt all running agents immediately
377
- ASSESS → git diff --stat + check({}) — how bad is it?
378
- CONTAIN → Limited (1-3 files): fix or re-delegate
379
- Widespread (10+ files): git stash to preserve for analysis
380
- RECOVER → Partial: git checkout -- {specific files}
381
- Full: git stash (preserves) or git checkout . (discards)
382
- Nuclear: git reset --hard HEAD (last resort)
383
- DOCUMENT → remember what went wrong, update plan
384
- ```
385
-
386
- ### Scope Tripwires
387
-
388
- | Signal | Action |
389
- |--------|--------|
390
- | Agent modified **2x more files** than planned | Pause, review before continuing |
391
- | Agent returns `ESCALATE` or `BLOCKED` repeatedly | Do NOT re-delegate unchanged. Diagnose first |
392
- | Agent's output contradicts the plan | Stop, compare with plan, re-align |
393
- | Tests that were passing now fail | Immediate rollback of that agent's changes |
394
-
395
- ---
396
-
397
- ## §10 Common Mistakes & Red Flags
398
-
399
- ### Delegation Anti-Patterns
400
-
401
- | ❌ Mistake | Why It Fails | ✅ Fix |
402
- |-----------|-------------|--------|
403
- | **Too broad scope** — "implement the auth system" | Agent lacks clear boundaries, produces sprawling changes | Split: "add JWT middleware to auth.ts" + "add login endpoint to routes.ts" |
404
- | **No constraints** — "add a feature" | Agent invents architecture, conflicts with existing patterns | Include conventions, boundaries, existing patterns in prompt |
405
- | **Vague output** — "make it work" | No way to verify completion | Specific acceptance criteria: "endpoint returns 200 with {schema}" |
406
- | **Session context inheritance** — "continue from where we left off" | Subagent has stale/polluted context | Fresh prompt with 6-point template every time |
407
- | **Skipping reviews** — "it's a small change" | Small changes cause big regressions | ALWAYS run automated gate minimum |
408
- | **Parallel on shared files** — "both agents edit config.ts" | Merge conflicts, lost changes | Sequential, or merge into one task |
409
- | **Trusting the report** — "agent said DONE so it's done" | Agents are optimistic, miss edge cases | Automated gate + dual code review catches this |
410
- | **Brute-force retries** — same prompt 3 times | If it failed twice, it'll fail a third time | Diagnose, change approach, then retry |
411
- | **Orchestrator implements** — "just this one small fix" | Breaks the delegation contract, no review | ALWAYS delegate, no matter how small |
412
-
413
- ### Red Flags in Agent Output
414
-
415
- | Flag | What It Means | Action |
416
- |------|--------------|--------|
417
- | Agent modified files outside its scope | Scope creep or misunderstanding | Rollback out-of-scope files, re-delegate with tighter constraints |
418
- | Agent added dependencies not in plan | Unauthorized architectural decision | Review necessity, likely rollback |
419
- | Agent skipped self-review checklist | Rushing, likely incomplete | Bounce back with checklist requirement |
420
- | Agent's DONE but tests fail | Didn't actually self-test | Bounce back with failing test output |
421
- | Agent asks questions in output instead of using NEEDS_CONTEXT | Misunderstands status protocol | Treat as NEEDS_CONTEXT, educate in next prompt |
422
-
423
- ---
424
-
425
- ## Prompt Template Reference
426
-
427
- Detailed prompt templates are provided as sidecar files:
428
-
429
- | Template | File | Use When |
430
- |----------|------|----------|
431
- | Implementer dispatch | [`implementer-prompt.md`](implementer-prompt.md) | Dispatching Implementer, Frontend, or Refactor agents |
432
- | Spec compliance review | [`spec-review-prompt.md`](spec-review-prompt.md) | Adversarial spec alignment check (Code-Reviewer-Alpha) |
433
- | Code quality review | [`code-quality-review-prompt.md`](code-quality-review-prompt.md) | Dual code quality review (Code-Reviewer-Beta) |
434
- | Architecture review | [`architecture-review-prompt.md`](architecture-review-prompt.md) | Boundary changes, pattern adherence review |
435
- | Parallel dispatch example | [`parallel-dispatch-example.md`](parallel-dispatch-example.md) | Worked example of decomposing a feature into parallel tasks |
1
+ # Multi-Agent Development
2
+
3
+ Comprehensive patterns for orchestrating multiple AI agents in parallel development workflows. Covers task decomposition, parallel dispatch, context crafting, status handling, review pipelines, and recovery.
4
+
5
+ **Core Principle**: Dispatch multiple agents for focused tasks. Each subagent gets fresh, focused context with explicit scope — never inherited session state.
6
+
7
+ Load this skill when orchestrating multi-agent work: planning parallel batches, crafting delegation prompts, handling implementer status, running review pipelines, or recovering from agent failures.
8
+
9
+ ---
10
+
11
+ ## §1 Agent Roles & Model Selection
12
+
13
+ ### Role Categories
14
+
15
+ | Role | Agents | When to Use | Parallelizable |
16
+ |------|--------|-------------|----------------|
17
+ | **Orchestration** | Orchestrator, Planner | Workflow control, planning | No (sequential) |
18
+ | **Implementation** | Implementer, Frontend, Refactor | Code creation/modification | Yes (disjoint files only) |
19
+ | **Research** | Explorer, Researcher-Alpha/Beta/Gamma/Delta | Codebase exploration, decisions | Yes (always) |
20
+ | **Review** | Code-Reviewer-Alpha/Beta, Architect-Reviewer-Alpha/Beta | Quality verification | Yes (always) |
21
+ | **Diagnostics** | Debugger, Security | Issue tracing, vulnerability analysis | Yes (read-only) |
22
+ | **Documentation** | Documenter | README, API docs, changelog | Yes (disjoint files) |
23
+
24
+ ### Model Selection by Task Complexity
25
+
26
+ Choose the **least powerful model that can handle the role**:
27
+
28
+ | Complexity Signal | Model Tier | Example Agents |
29
+ |-------------------|-----------|----------------|
30
+ | Mechanical (rename, move, add field) | Fast model | Explorer (Gemini Flash) |
31
+ | Standard (implement spec, write tests) | Mid-tier | Implementer (GPT-5.4), Refactor (GPT-5.4) |
32
+ | Judgment-heavy (architecture, security, debug) | Strongest | Debugger (Opus 4.6), Security (Opus 4.6) |
33
+ | Multi-model cross-validation | Mixed | Researcher-Alpha/Beta/Gamma/Delta (all different) |
34
+
35
+ **Upgrade signal**: If an agent returns `BLOCKED` or `DONE_WITH_CONCERNS` on a task classified as "Standard", consider re-dispatching to a stronger model.
36
+
37
+ ---
38
+
39
+ ## §2 Task Decomposition Rules
40
+
41
+ ### The Golden Rule
42
+ > **One task = one focused problem domain = 1-3 files maximum.**
43
+
44
+ ### Decomposition Checklist
45
+
46
+ For each task, specify ALL of:
47
+ - [ ] **Target files** — exact paths to create or modify
48
+ - [ ] **Acceptance criteria** — what "done" looks like (testable)
49
+ - [ ] **Agent assignment** — which agent handles this
50
+ - [ ] **Dependencies** — which tasks must complete first (if any)
51
+
52
+ ### Sizing Guide
53
+
54
+ | Task Size | Files | Example | Agent |
55
+ |-----------|-------|---------|-------|
56
+ | **Micro** | 1 file | Add a utility function | Implementer |
57
+ | **Small** | 1-2 files | New endpoint + test | Implementer |
58
+ | **Standard** | 2-3 files | Feature with service + controller + test | Implementer |
59
+ | **Too big** | 4+ files | **SPLIT IT** — decompose further | — |
60
+
61
+ ### Splitting Strategies
62
+ - **By layer**: Service logic (Implementer) + UI component (Frontend) + tests (Implementer)
63
+ - **By feature boundary**: Auth endpoints (Implementer A) + Profile endpoints (Implementer B)
64
+ - **By concern**: Data model changes (Implementer) + API route changes (Implementer) + UI updates (Frontend)
65
+
66
+ ---
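As a rough illustration of the §2 sizing guide, the file-count thresholds could be written as a small classifier. The `TaskSize` type and `classifyTask` function are hypothetical names and are not part of the package. The table's ranges overlap (1-2 and 2-3 files), so this sketch assumes the smaller size wins.

```typescript
// Hypothetical helper: maps a task's target-file count onto the sizing guide.
type TaskSize = "micro" | "small" | "standard" | "too-big";

function classifyTask(targetFiles: string[]): TaskSize {
  // De-duplicate so a path listed twice is not counted twice.
  const count = new Set(targetFiles).size;
  if (count === 0) {
    throw new Error("a task must name at least one target file");
  }
  if (count === 1) return "micro";
  if (count === 2) return "small"; // assumption: overlap resolved downward
  if (count === 3) return "standard";
  return "too-big"; // 4+ files: split the task further
}
```

A planner could call this on each task before dispatch and refuse anything classified as `"too-big"`.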
67
+
68
+ ## §3 Independence Decision Tree
69
+
70
+ Before marking tasks as parallel, walk this tree:
71
+
72
+ ```
73
+ Task A and Task B — can they run in parallel?
74
+
75
+ ├─ Do they share ANY files? (create, modify, or delete the same file)
76
+ │ ├─ YES → SEQUENTIAL (or merge into one task)
77
+ │ └─ NO ↓
78
+
79
+ ├─ Do they share mutable state? (env vars, globals, same DB table, shared config)
80
+ │ ├─ YES → SEQUENTIAL
81
+ │ └─ NO ↓
82
+
83
+ ├─ Does B need A's output? (B reads a file A creates, B uses A's new export)
84
+ │ ├─ YES → SEQUENTIAL (A before B)
85
+ │ └─ NO ↓
86
+
87
+ ├─ Would A's result change B's approach? (A discovers something that affects B)
88
+ │ ├─ YES → SEQUENTIAL or single agent
89
+ │ └─ NO ↓
90
+
91
+ ├─ Resource contention? (same port, same build process, same lock file)
92
+ │ ├─ YES → SEQUENTIAL
93
+ │ └─ NO ↓
94
+
95
+ └─ ✅ SAFE TO PARALLELIZE
96
+ ```
97
+
98
+ ### Edge Cases
99
+
100
+ | Situation | Verdict | Why |
101
+ |-----------|---------|-----|
102
+ | Both import from same module (read-only) | ✅ Parallel | Reading shared code is fine |
103
+ | Both add exports to same index file | ❌ Sequential | Concurrent index.ts edits will conflict |
104
+ | A creates a type, B uses that type | ❌ Sequential | B depends on A's output |
105
+ | Both modify different test files | ✅ Parallel | Disjoint file sets |
106
+ | Both touch package.json | ❌ Sequential | Shared file |
107
+ | A adds a route, B adds middleware | ⚠️ Check | If B's middleware affects A's route → sequential |
108
+
109
+ ### Integration Verification (after parallel batch completes)
110
+
111
+ 1. **Conflict check**: Did any agent unexpectedly modify a file assigned to another agent?
112
+ 2. **Import check**: Do all new cross-references resolve?
113
+ 3. **Full suite**: `check({})` + `test_run({})` — everything must pass
114
+ 4. **Spot check**: Manually verify at least one task's output matches acceptance criteria
115
+
116
+ ---
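The mechanical branches of the §3 decision tree can be sketched as a pairwise check. Everything here is an assumption made for illustration: the `TaskSpec` shape, its field names, and `canRunInParallel` do not come from the package. The fourth branch ("Would A's result change B's approach?") is a judgment call and is deliberately left out.

```typescript
// Hypothetical task description; field names are illustrative only.
interface TaskSpec {
  id: string;
  files: string[];         // files the task creates, modifies, or deletes
  mutableState: string[];  // env vars, globals, DB tables, shared config it writes
  needsOutputOf: string[]; // ids of tasks whose output this task consumes
  resources: string[];     // ports, build processes, lock files it needs
}

type Verdict = { parallel: true } | { parallel: false; reason: string };

function intersects(a: string[], b: string[]): boolean {
  const seen = new Set(a);
  return b.some((x) => seen.has(x));
}

// Walks the tree top to bottom and stops at the first YES answer.
function canRunInParallel(a: TaskSpec, b: TaskSpec): Verdict {
  if (intersects(a.files, b.files)) {
    return { parallel: false, reason: "shared files" };
  }
  if (intersects(a.mutableState, b.mutableState)) {
    return { parallel: false, reason: "shared mutable state" };
  }
  if (a.needsOutputOf.includes(b.id) || b.needsOutputOf.includes(a.id)) {
    return { parallel: false, reason: "output dependency" };
  }
  if (intersects(a.resources, b.resources)) {
    return { parallel: false, reason: "resource contention" };
  }
  return { parallel: true };
}
```

Returning a reason alongside the verdict lets the orchestrator record why two tasks were serialized, which is useful when a plan is revisited later.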
117
+
118
+ ## §4 Parallel Dispatch Patterns
119
+
120
+ ### Dispatch Rules
121
+
122
+ 1. **Max 4 concurrent file-modifying agents** per batch
123
+ 2. **Read-only agents have no limit** — Explorer, Researcher*, Reviewer*, Security can always run in parallel
124
+ 3. **Build dependency graph first** — phases with no dependencies MUST be batched together
125
+ 4. **Never dispatch two implementers to the same file** — even different sections
126
+
127
+ ### Batch Strategy
128
+
129
+ ```
130
+ Phase Plan:
131
+ Phase 1: [Task A, Task B, Task C] ← no dependencies between A/B/C
132
+ Phase 2: [Task D, Task E] ← D depends on A, E depends on B
133
+ Phase 3: [Task F] ← F depends on D and E
134
+
135
+ Execution:
136
+ Batch 1: dispatch(A, B, C) in parallel → review → gate
137
+ Batch 2: dispatch(D, E) in parallel → review → gate
138
+ Batch 3: dispatch(F) → review → gate
139
+ ```
140
+
141
+ ### Anti-Patterns
142
+
143
+ | ❌ Don't | ✅ Do Instead |
144
+ |----------|--------------|
145
+ | Dispatch 6 implementers at once | Max 4, queue the rest |
146
+ | Give one agent 10 files | Split into 3-4 focused tasks |
147
+ | Let agents read the full plan | Give each agent ONLY its task context |
148
+ | Retry same prompt on failure | Diagnose first, then re-prompt with fix |
149
+ | Skip review after parallel batch | ALWAYS review + integration verify |
150
+ | Inherit session context to subagent | Build fresh, focused context per dispatch |
151
+
152
+ ---
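The batch strategy in §4 amounts to layering a dependency graph and capping each layer at four tasks. The sketch below shows one way to do that. `planBatches` and its dependency-map input are hypothetical and not the package's API. It also assumes every task in the map has already passed the §3 independence check.

```typescript
// Dispatch rule 1: at most 4 file-modifying agents per batch.
const MAX_CONCURRENT = 4;

// Hypothetical planner: `deps` maps each task id to the ids it depends on.
function planBatches(
  deps: Record<string, string[]>,
  maxPerBatch: number = MAX_CONCURRENT,
): string[][] {
  const done = new Set<string>();
  let remaining = Object.keys(deps);
  const batches: string[][] = [];

  while (remaining.length > 0) {
    // A task is ready once all of its dependencies finished in earlier batches.
    const ready = remaining.filter((t) => (deps[t] ?? []).every((d) => done.has(d)));
    if (ready.length === 0) {
      throw new Error("dependency cycle or unknown dependency");
    }
    // Anything beyond the cap is queued for the next batch.
    const batch = ready.slice(0, maxPerBatch);
    batches.push(batch);
    batch.forEach((t) => done.add(t));
    remaining = remaining.filter((t) => !done.has(t));
  }
  return batches;
}
```

With the phase plan from the example above (`D` depends on `A`, `E` on `B`, `F` on `D` and `E`), this groups the tasks as `A, B, C`, then `D, E`, then `F`. Review and gating between batches are left to the caller.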
153
+
154
+ ## §5 Context Crafting Guide
155
+
156
+ ### The Controller Principle
157
+ > **The Orchestrator provides ALL context. Subagents never need to search for context themselves.**
158
+
159
+ Each subagent gets a fresh, self-contained prompt. No inherited session state. No "read the plan first."
160
+
161
+ ### The 6-Point Prompt Template
162
+
163
+ Every delegation prompt MUST include:
164
+
165
+ ```markdown
166
+ ## 1. Scope
167
+ Files to create/modify: [exact paths]
168
+ Files to NOT touch: [boundaries]
169
+
170
+ ## 2. Goal
171
+ [What the code should do — acceptance criteria, testable outcomes]
172
+
173
+ ## 3. Architectural Context
174
+ [Relevant patterns, conventions, existing code structure]
175
+ [Include actual code snippets from compact/digest — don't tell agent to "go read X"]
176
+
177
+ ## 4. Constraints
178
+ - Follow [pattern/convention]
179
+ - Do NOT modify [boundary files]
180
+ - Use [specific library/approach]
181
+
182
+ ## 5. FORGE Context
183
+ Tier: [Floor/Standard/Critical]
184
+ Evidence requirements: [what evidence to collect]
185
+
186
+ ## 6. Self-Review & Status
187
+ Before declaring DONE, verify:
188
+ - [ ] All acceptance criteria met
189
+ - [ ] No files outside scope modified
190
+ - [ ] Tests pass (if applicable)
191
+ - [ ] Code follows stated conventions
192
+
193
+ End with status: DONE | DONE_WITH_CONCERNS | NEEDS_CONTEXT | BLOCKED
194
+ ```
195
+
196
+ ### What to Include vs Omit
197
+
198
+ | ✅ Include | ❌ Omit |
199
+ |-----------|---------|
200
+ | Exact file paths and code snippets | Full session history |
201
+ | Acceptance criteria | Other agents' tasks |
202
+ | Relevant conventions (from KB) | Unrelated architecture context |
203
+ | Compact/digest of relevant files | Raw file contents of large files |
204
+ | Error messages (if fixing a bug) | Previous failed attempts (unless relevant) |
205
+ | FORGE tier and ceremony | Full FORGE protocol explanation |
206
+
207
+ ### Context Size Budget
208
+
209
+ | Task Complexity | Context Target | Approach |
210
+ |-----------------|---------------|----------|
211
+ | Micro (1 file) | ~500 tokens | Inline code snippet + goal |
212
+ | Small (1-2 files) | ~1000 tokens | `compact` of target files + goal |
213
+ | Standard (2-3 files) | ~2000 tokens | `digest` of related files + architectural context |
214
+ | Complex (judgment-heavy) | ~3000 tokens | `digest` + relevant decisions from AI Kit |
215
+
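The budget table can be checked mechanically before dispatch. The sketch below is illustrative only: the names are hypothetical, and the 4-characters-per-token estimate is a rough assumption rather than how any particular model tokenizes text.

```typescript
type Complexity = "micro" | "small" | "standard" | "complex";

// Targets copied from the Context Size Budget table.
const CONTEXT_TARGET_TOKENS: Record<Complexity, number> = {
  micro: 500,
  small: 1000,
  standard: 2000,
  complex: 3000,
};

// Assumption: roughly 4 characters per token. Real tokenizers vary by model.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

interface BudgetCheck {
  estimated: number;
  target: number;
  overBudget: boolean;
}

// Compare an estimated prompt size against the target for its complexity class.
function checkContextBudget(complexity: Complexity, prompt: string): BudgetCheck {
  const target = CONTEXT_TARGET_TOKENS[complexity];
  const estimated = estimateTokens(prompt);
  return { estimated, target, overBudget: estimated > target };
}
```

Because the table states approximate targets, an `overBudget` result is better treated as a prompt to trim context than as a hard failure.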
216
+ ---
217
+
218
+ ## §6 Subagent Execution Cycle
219
+
220
+ ### Lifecycle
221
+
222
+ ```
223
+ Orchestrator                    Subagent (fresh instance)
224
+ │                                      │
225
+ ├─ Craft focused prompt ──────────────►│
226
+ │  (6-point template)                  │
227
+ │                                      ├─ Understand scope
228
+ │                                      ├─ Implement changes
229
+ │                                      ├─ Self-review (checklist)
230
+ │◄─────────────────── Return status ───┤
231
+ │                                      │  (DONE/CONCERNS/NEEDS/BLOCKED)
232
+ │                                      │
233
+ ├─ Handle status (see §7)              ×  (subagent terminates)
234
+
235
+ ├─ Automated gate (check/test_run)
236
+
237
+ ├─ Dispatch reviewers (see §8)
238
+
239
+ └─ FORGE evidence_map gate
240
+ ```
241
+
242
+ ### Key Rules
243
+
244
+ 1. **One subagent = one task**. Never reuse a subagent for a different task.
245
+ 2. **Controller provides context**. The subagent's prompt contains everything it needs — it should NOT need to search/explore the codebase.
246
+ 3. **Self-review before handoff**. Every implementer must complete the self-review checklist before declaring DONE.
247
+ 4. **Status is mandatory**. Every subagent response MUST end with exactly ONE status code.
248
+
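Rule 4 can be enforced when a response comes back. The sketch below assumes the status code appears on the last non-empty line of the response; the document does not specify the exact format, and `parseFinalStatus` is a hypothetical helper.

```typescript
type Status = "DONE" | "DONE_WITH_CONCERNS" | "NEEDS_CONTEXT" | "BLOCKED";

// Word boundaries keep "DONE" from matching inside "DONE_WITH_CONCERNS",
// because "_" counts as a word character.
const STATUS_PATTERN = /\b(DONE_WITH_CONCERNS|NEEDS_CONTEXT|BLOCKED|DONE)\b/g;

// Returns the status if the final non-empty line carries exactly one code,
// otherwise null (missing status, or several codes on the same line).
function parseFinalStatus(response: string): Status | null {
  const lines = response
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  const last = lines.pop();
  if (last === undefined) return null;
  const matches = last.match(STATUS_PATTERN);
  if (matches === null || matches.length !== 1) return null;
  return matches[0] as Status;
}
```

A `null` result is a signal to bounce the response back rather than guess which status the subagent meant.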
249
+ ---
250
+
251
+ ## §7 Implementer Status Protocol
252
+
253
+ ### Status Codes
254
+
255
+ Every implementer (Implementer, Frontend, Refactor) MUST end their response with exactly ONE:
256
+
257
+ | Status | Meaning | Orchestrator Action |
258
+ |--------|---------|-------------------|
259
+ | **DONE** | All tasks complete, self-review passed | → Automated gate → Review pipeline |
260
+ | **DONE_WITH_CONCERNS** | Complete but flagging issues: [list] | → Surface concerns as `Assumed` claims in evidence_map → Likely HOLD → Address before review |
261
+ | **NEEDS_CONTEXT** | Cannot proceed without: [specific question] | → Provide missing context → Re-dispatch same task (counts as retry) |
262
+ | **BLOCKED** | Hit a wall: [description] | → Diagnose (see below) |
263
+
264
+ ### BLOCKED Diagnosis Tree
265
+
266
+ ```
267
+ Agent returned BLOCKED
268
+
269
+ ├─ Missing context? (needs info not in prompt)
270
+ │ → Provide context, re-dispatch
271
+
272
+ ├─ Wrong model? (task too complex for assigned model)
273
+ │ → Re-dispatch to stronger model (e.g., Implementer → Debugger)
274
+
275
+ ├─ Scope too broad? (agent overwhelmed)
276
+ │ → Split task further, re-dispatch smaller pieces
277
+
278
+ ├─ Plan wrong? (implementation approach won't work)
279
+ │ → Re-plan this phase, check AI Kit for alternatives
280
+
281
+ └─ External blocker? (dependency not ready, API unavailable)
282
+ → Park task, proceed with independent work, revisit later
283
+ ```
284
+
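The diagnosis tree reads naturally as an ordered series of checks. In the sketch below the `Diagnosis` flags and the first-match-wins ordering are assumptions; the tree itself does not say how to resolve a report that fits several branches. The recovery strings are copied from the tree.

```typescript
type BlockCause =
  | "missing_context"
  | "wrong_model"
  | "scope_too_broad"
  | "plan_wrong"
  | "external_blocker";

// Recovery actions, one per branch of the tree.
const BLOCKED_RECOVERY: Record<BlockCause, string> = {
  missing_context: "Provide context, re-dispatch",
  wrong_model: "Re-dispatch to stronger model",
  scope_too_broad: "Split task further, re-dispatch smaller pieces",
  plan_wrong: "Re-plan this phase, check AI Kit for alternatives",
  external_blocker: "Park task, proceed with independent work, revisit later",
};

// Hypothetical answers the Orchestrator fills in after reading the BLOCKED report.
interface Diagnosis {
  needsInfoNotInPrompt: boolean;
  tooComplexForModel: boolean;
  agentOverwhelmed: boolean;
  approachWontWork: boolean;
  dependencyUnavailable: boolean;
}

// Walks the branches top to bottom; the first matching branch wins (assumption).
function diagnoseBlocked(d: Diagnosis): BlockCause | null {
  if (d.needsInfoNotInPrompt) return "missing_context";
  if (d.tooComplexForModel) return "wrong_model";
  if (d.agentOverwhelmed) return "scope_too_broad";
  if (d.approachWontWork) return "plan_wrong";
  if (d.dependencyUnavailable) return "external_blocker";
  return null; // no branch fits: the report needs a closer read
}
```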
285
+ ### FORGE Composition
286
+
287
+ Status protocol and FORGE are **independent but composable**:
288
+
289
+ - **Status** = subjective agent telemetry ("I think I'm done")
290
+ - **FORGE** = objective quality evidence ("the evidence says it's done")
291
+
292
+ ```
293
+ DONE               → proceed to automated gate → FORGE evidence_map
294
+ DONE_WITH_CONCERNS → concerns become 'Assumed' claims → evidence_map likely HOLDs
295
+ NEEDS_CONTEXT      → provide context, re-dispatch (no FORGE yet)
296
+ BLOCKED            → diagnose:
297
+                        contract/security issue → HARD_BLOCK
298
+                        resource/scope issue → re-plan, no FORGE
299
+ ```
300
+
301
+ **Critical rule**: Every `DONE` status MUST be followed by `evidence_map({ action: "gate" })` before proceeding to review. No shortcuts.
302
+
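The status table and the composition rules above amount to a small dispatch function. This is a sketch with hypothetical step names, not the package's API; it only encodes which step follows which status.

```typescript
type Status = "DONE" | "DONE_WITH_CONCERNS" | "NEEDS_CONTEXT" | "BLOCKED";
type BlockKind = "contract_or_security" | "resource_or_scope";
type NextStep =
  | "automated_gate"
  | "record_assumed_claims"
  | "provide_context_and_redispatch"
  | "hard_block"
  | "replan";

// Steps for the statuses that need no further diagnosis.
const DIRECT_STEPS: Record<Exclude<Status, "BLOCKED">, NextStep> = {
  DONE: "automated_gate",
  DONE_WITH_CONCERNS: "record_assumed_claims",
  NEEDS_CONTEXT: "provide_context_and_redispatch",
};

function nextStep(status: Status, blockKind?: BlockKind): NextStep {
  if (status === "BLOCKED") {
    // BLOCKED has no single answer: it must be diagnosed before acting.
    if (blockKind === undefined) {
      throw new Error("BLOCKED must be diagnosed before choosing a step");
    }
    return blockKind === "contract_or_security" ? "hard_block" : "replan";
  }
  return DIRECT_STEPS[status];
}
```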
303
+ ---
304
+
305
+ ## §8 Review Pipeline
306
+
307
+ ### Four-Stage Pipeline
308
+
309
+ ```
310
+ Stage 1: Implementer Self-Review (embedded in agent output)
311
+ └─ Checklist: scope respected, tests pass, conventions followed
312
+
313
+ Stage 2: Orchestrator Automated Gate
314
+ └─ check({}) + test_run({}) MUST pass
315
+ └─ Validate self-review checklist present in output
316
+ └─ FAIL → bounce back to implementer with specific gap
317
+ └─ PASS ↓
318
+
319
+ Stage 3: Dual Code Review (parallel)
320
+ ├─ Code-Reviewer-Alpha (GPT-5.4): code quality + Spec Alignment
321
+ └─ Code-Reviewer-Beta (Opus 4.6): code quality + Spec Alignment
322
+ │ Both review same code, different model perspectives
323
+ │ Spec Alignment = "Does this match what was asked?"
324
+
325
+ Stage 4: Conditional Reviews (parallel if both needed)
326
+ ├─ Architecture Review — if boundary changes, new modules, pattern shifts
327
+ └─ Security Review — if auth, crypto, input handling, or external data
328
+
329
+ FORGE Gate: evidence_map({ action: "gate" })
330
+ └─ YIELD → proceed to commit
331
+ └─ HOLD → address flagged items → re-gate (max 3 rounds)
332
+ └─ HARD_BLOCK → escalate to user
333
+ ```
334
+
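The FORGE gate at the end of the pipeline is a bounded loop. The sketch below injects two hypothetical callbacks: `runGate` stands in for `evidence_map({ action: "gate" })`, and `addressHold` for whatever fixes the flagged items. It reads "max 3 rounds" as three gate evaluations in total and escalates once they are used up; both readings are assumptions, since the pipeline does not spell them out.

```typescript
type GateResult = "YIELD" | "HOLD" | "HARD_BLOCK";
type GateOutcome = "commit" | "escalate";

function runForgeGate(
  runGate: () => GateResult, // stand-in for evidence_map({ action: "gate" })
  addressHold: () => void, // stand-in for fixing the flagged items
  maxRounds: number = 3
): GateOutcome {
  for (let round = 1; round <= maxRounds; round++) {
    const result = runGate();
    if (result === "YIELD") return "commit";
    if (result === "HARD_BLOCK") return "escalate";
    // HOLD: fix the flagged items only if another gate round remains.
    if (round < maxRounds) addressHold();
  }
  // Assumption: running out of rounds is treated like a HARD_BLOCK.
  return "escalate";
}
```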
335
+ ### Spec Alignment Dimension (for Code Reviewers)
336
+
337
+ Both Code-Reviewer-Alpha and Code-Reviewer-Beta evaluate an explicit **Spec Alignment** dimension:
338
+
339
+ 1. Does the implementation match the acceptance criteria from the task?
340
+ 2. Are there over-builds (features not requested)?
341
+ 3. Are there under-builds (requirements missed)?
342
+ 4. Does the output match the expected file changes?
343
+
344
+ This catches spec drift that automated tests might miss.
345
+
346
+ ### When to Skip Stages
347
+
348
+ | Stage | Skip When |
349
+ |-------|-----------|
350
+ | Architecture Review | No new modules, no boundary changes, no new patterns |
351
+ | Security Review | No auth, no crypto, no external input handling |
352
+ | FORGE Gate | Floor-tier tasks only (simple, mechanical changes) |
353
+
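The skip rules can be applied up front to decide which stages a given change needs. The `ChangeProfile` flags below are hypothetical inputs that the Orchestrator would fill in from the plan; the conditions mirror the table above and the Stage 4 triggers.

```typescript
type Tier = "Floor" | "Standard" | "Critical";

interface ChangeProfile {
  tier: Tier;
  newModules: boolean;
  boundaryChanges: boolean;
  patternShifts: boolean;
  touchesAuth: boolean;
  touchesCrypto: boolean;
  handlesExternalInput: boolean;
}

type Stage =
  | "self_review"
  | "automated_gate"
  | "dual_code_review"
  | "architecture_review"
  | "security_review"
  | "forge_gate";

function planReviewStages(c: ChangeProfile): Stage[] {
  // Stages 1 to 3 have no skip condition in the table.
  const stages: Stage[] = ["self_review", "automated_gate", "dual_code_review"];
  if (c.newModules || c.boundaryChanges || c.patternShifts) {
    stages.push("architecture_review");
  }
  if (c.touchesAuth || c.touchesCrypto || c.handlesExternalInput) {
    stages.push("security_review");
  }
  // Only Floor-tier tasks may skip the FORGE gate.
  if (c.tier !== "Floor") stages.push("forge_gate");
  return stages;
}
```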
354
+ ---
355
+
356
+ ## §9 Recovery & Escalation
357
+
358
+ ### Retry Policy
359
+
360
+ - **Max 2 retries per agent per task** — after that, re-plan or escalate
361
+ - Each retry MUST include the specific failure reason in the new prompt
362
+ - Never retry with the same prompt — always add diagnostic context
363
+
364
+ ### Loop Detection
365
+
366
+ If an agent returns the same error/status 2+ times:
367
+ 1. **STOP** — do not retry again
368
+ 2. Check if the approach is fundamentally wrong
369
+ 3. Consider: different agent, different model, different decomposition, or user escalation
370
+
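The retry policy and loop detection combine into one decision. In this sketch the first entry in `failures` is the initial dispatch and each later entry is one retry. Treating "same error/status" as an identical status plus reason pair is an assumption, and all names are hypothetical.

```typescript
interface FailedAttempt {
  status: string;
  reason: string;
}

type RetryDecision = "retry_with_diagnostics" | "stop_and_reassess";

const MAX_RETRIES = 2;

function decideRetry(failures: FailedAttempt[]): RetryDecision {
  // failures[0] is the initial dispatch; every later entry is a retry.
  const retriesUsed = Math.max(0, failures.length - 1);
  if (retriesUsed >= MAX_RETRIES) return "stop_and_reassess";

  // Loop detection: the same status and reason seen twice means STOP.
  const seen: Record<string, boolean> = {};
  for (const failure of failures) {
    const key = failure.status + "\u0000" + failure.reason;
    if (seen[key]) return "stop_and_reassess";
    seen[key] = true;
  }
  return "retry_with_diagnostics";
}

// A retry prompt must differ from the original: append the failure reason.
function buildRetryPrompt(originalPrompt: string, last: FailedAttempt): string {
  return (
    originalPrompt +
    "\n\n## Previous attempt\n" +
    "Status: " + last.status + "\n" +
    "Failure reason: " + last.reason
  );
}
```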
371
+ ### Emergency Procedures
372
+
373
+ When a parallel batch causes cascading failures:
374
+
375
+ ```
376
+ STOP     → Halt all running agents immediately
377
+ ASSESS   → git diff --stat + check({}) (how bad is it?)
378
+ CONTAIN  → Limited (1-3 files): fix or re-delegate
379
+            Widespread (10+ files): git stash to preserve for analysis
380
+ RECOVER  → Partial: git checkout -- {specific files}
381
+            Full: git stash (preserves) or git checkout . (discards)
382
+            Nuclear: git reset --hard HEAD (last resort)
383
+ DOCUMENT → remember what went wrong, update plan
384
+ ```
385
+
386
+ ### Scope Tripwires
387
+
388
+ | Signal | Action |
389
+ |--------|--------|
390
+ | Agent modified **at least twice as many files** as planned | Pause, review before continuing |
391
+ | Agent returns `ESCALATE` or `BLOCKED` repeatedly | Do NOT re-delegate unchanged. Diagnose first |
392
+ | Agent's output contradicts the plan | Stop, compare with plan, re-align |
393
+ | Tests that were passing now fail | Immediate rollback of that agent's changes |
394
+
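Two of the tripwires above are easy to check from data the Orchestrator already has. The sketch below covers the file-count and test-regression signals only. It assumes the file-count signal trips once the modified count reaches twice the planned count, and the `AgentRun` shape is hypothetical.

```typescript
interface AgentRun {
  plannedFiles: string[];
  modifiedFiles: string[];
  testsPassedBefore: boolean;
  testsPassAfter: boolean;
}

type TripwireAction = "pause_and_review" | "rollback_agent_changes";

function checkTripwires(run: AgentRun): TripwireAction[] {
  const actions: TripwireAction[] = [];
  // File-count tripwire: twice the planned number of files, or more.
  const modified = run.modifiedFiles.length;
  if (modified > 0 && modified >= 2 * run.plannedFiles.length) {
    actions.push("pause_and_review");
  }
  // Regression tripwire: tests that passed before the agent ran now fail.
  if (run.testsPassedBefore && !run.testsPassAfter) {
    actions.push("rollback_agent_changes");
  }
  return actions;
}
```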
395
+ ---
396
+
397
+ ## §10 Common Mistakes & Red Flags
398
+
399
+ ### Delegation Anti-Patterns
400
+
401
+ | ❌ Mistake | Why It Fails | ✅ Fix |
402
+ |-----------|-------------|--------|
403
+ | **Too broad scope** — "implement the auth system" | Agent lacks clear boundaries, produces sprawling changes | Split: "add JWT middleware to auth.ts" + "add login endpoint to routes.ts" |
404
+ | **No constraints** — "add a feature" | Agent invents architecture, conflicts with existing patterns | Include conventions, boundaries, existing patterns in prompt |
405
+ | **Vague output** — "make it work" | No way to verify completion | Specific acceptance criteria: "endpoint returns 200 with {schema}" |
406
+ | **Session context inheritance** — "continue from where we left off" | Subagent has stale/polluted context | Fresh prompt with 6-point template every time |
407
+ | **Skipping reviews** — "it's a small change" | Small changes cause big regressions | ALWAYS run automated gate minimum |
408
+ | **Parallel on shared files** — "both agents edit config.ts" | Merge conflicts, lost changes | Sequential, or merge into one task |
409
+ | **Trusting the report** — "agent said DONE so it's done" | Agents are optimistic, miss edge cases | Automated gate + dual code review catches this |
410
+ | **Brute-force retries** — same prompt 3 times | If it failed twice, it'll fail a third time | Diagnose, change approach, then retry |
411
+ | **Orchestrator implements** — "just this one small fix" | Breaks the delegation contract, no review | ALWAYS delegate, no matter how small |
412
+
413
+ ### Red Flags in Agent Output
414
+
415
+ | Flag | What It Means | Action |
416
+ |------|--------------|--------|
417
+ | Agent modified files outside its scope | Scope creep or misunderstanding | Rollback out-of-scope files, re-delegate with tighter constraints |
418
+ | Agent added dependencies not in plan | Unauthorized architectural decision | Review necessity, likely rollback |
419
+ | Agent skipped self-review checklist | Rushing, likely incomplete | Bounce back with checklist requirement |
420
+ | Agent reports DONE but tests fail | Didn't actually self-test | Bounce back with failing test output |
421
+ | Agent asks questions in output instead of using NEEDS_CONTEXT | Misunderstands status protocol | Treat as NEEDS_CONTEXT, educate in next prompt |
422
+
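Most of the red flags above can be detected by comparing the agent's report with the dispatch. The sketch below uses a hypothetical `AgentReport` shape and exact string matching on paths and dependency names. It leaves out the "questions in output" flag, which needs a judgment call rather than a comparison.

```typescript
interface AgentReport {
  status: string;
  scope: string[]; // files the prompt allowed the agent to touch
  modifiedFiles: string[];
  plannedDependencies: string[];
  addedDependencies: string[];
  selfReviewPresent: boolean;
  testsPass: boolean;
}

type RedFlag =
  | "out_of_scope_files"
  | "unplanned_dependencies"
  | "missing_self_review"
  | "done_but_tests_fail";

// Items of `actual` that do not appear in `allowed` (exact match).
function notIn(allowed: string[], actual: string[]): string[] {
  return actual.filter((item) => allowed.indexOf(item) === -1);
}

function findRedFlags(r: AgentReport): RedFlag[] {
  const flags: RedFlag[] = [];
  if (notIn(r.scope, r.modifiedFiles).length > 0) flags.push("out_of_scope_files");
  if (notIn(r.plannedDependencies, r.addedDependencies).length > 0) {
    flags.push("unplanned_dependencies");
  }
  if (!r.selfReviewPresent) flags.push("missing_self_review");
  if (r.status === "DONE" && !r.testsPass) flags.push("done_but_tests_fail");
  return flags;
}
```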
423
+ ---
424
+
425
+ ## Prompt Template Reference
426
+
427
+ Detailed prompt templates are provided as sidecar files:
428
+
429
+ | Template | File | Use When |
430
+ |----------|------|----------|
431
+ | Implementer dispatch | [`implementer-prompt.md`](implementer-prompt.md) | Dispatching Implementer, Frontend, or Refactor agents |
432
+ | Spec compliance review | [`spec-review-prompt.md`](spec-review-prompt.md) | Adversarial spec alignment check (Code-Reviewer-Alpha) |
433
+ | Code quality review | [`code-quality-review-prompt.md`](code-quality-review-prompt.md) | Dual code quality review (Code-Reviewer-Beta) |
434
+ | Architecture review | [`architecture-review-prompt.md`](architecture-review-prompt.md) | Boundary changes, pattern adherence review |
435
+ | Parallel dispatch example | [`parallel-dispatch-example.md`](parallel-dispatch-example.md) | Worked example of decomposing a feature into parallel tasks |