opencode-swarm-plugin 0.12.27 → 0.12.29

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,682 +0,0 @@
---
name: agent-patterns
description: AI agent design patterns from "Patterns for Building AI Agents". Use when designing agent architectures, planning agent capabilities, implementing human-in-the-loop workflows, or setting up agent evals. Covers capability whiteboards, architecture evolution, dynamic agents, context engineering, and multi-agent coordination.
---

# Agent Patterns

Actionable patterns for building production AI agents. From "Patterns for Building AI Agents" by Sam Bhagwat & Michelle Gienow.

## Whiteboard Agent Capabilities

**Problem**: Dozens of potential capabilities, unclear where to start.

**Solution**: Ruthless prioritization using an impact/effort matrix.

### Process

1. **List all possible capabilities** - brainstorm everything the agent could do
2. **Plot on a 2x2 matrix** - Impact (low/high) vs Effort (low/high)
3. **Start with high-impact, low-effort** - quick wins build momentum
4. **Defer low-impact, high-effort** - avoid complexity that doesn't move the needle
5. **Validate with users** - confirm demand instead of building what you assume they want

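To make the plotting step concrete, here is a minimal sketch; the `Capability` type and `prioritize` helper are illustrative names, not something the book prescribes:

```typescript
type Capability = {
  name: string;
  impact: "low" | "high"; // expected user value
  effort: "low" | "high"; // estimated build cost
};

// Bucket capabilities into the four quadrants of the impact/effort matrix.
function prioritize(capabilities: Capability[]) {
  return {
    doFirst: capabilities.filter((c) => c.impact === "high" && c.effort === "low"),
    doLater: capabilities.filter((c) => c.impact === "high" && c.effort === "high"),
    maybe: capabilities.filter((c) => c.impact === "low" && c.effort === "low"),
    dontBuild: capabilities.filter((c) => c.impact === "low" && c.effort === "high"),
  };
}
```
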
### Warning Signs

- Building features users didn't request
- Starting with hardest capability first
- No clear success metrics per capability
- "We'll need this eventually" justifications

### Example: Code Assistant Agent

**High Impact, Low Effort** (DO FIRST):

- Code completion for common patterns
- Syntax error detection
- Import statement fixes

**High Impact, High Effort** (DO LATER):

- Full codebase refactoring
- Architecture recommendations
- Security vulnerability analysis

**Low Impact** (DON'T BUILD):

- Custom color scheme suggestions
- Code formatting opinions
- Editor layout recommendations

## Evolve Your Agent Architecture

**Problem**: Need to ship incrementally without over-engineering.

**Solution**: Start simple, evolve based on real usage.

### Architecture Progression

**Level 1: Single-Shot**

- One prompt, one response
- No memory, no tools
- Use for: Simple classification, text generation

**Level 2: Agentic Workflow**

- LLM + tool calls in a loop (see the sketch after this list)
- Agent decides when to use tools
- Use for: Research, data gathering, simple automation

**Level 3: Reflection**

- Agent critiques its own outputs
- Iterates until quality threshold met
- Use for: Code generation, content writing, analysis

**Level 4: Multi-Agent**

- Specialized agents for different tasks
- Coordinator routes between them
- Use for: Complex workflows, domain expertise

**Level 5: Human-in-the-Loop**

- Agents request human input at decision points
- Deferred execution for safety-critical actions
- Use for: Financial transactions, legal review, medical diagnosis

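A rough sketch of the Level 2 loop, assuming hypothetical `callModel` and `runTool` adapters for whatever model client and tool registry you use:

```typescript
type Message = { role: "user" | "assistant" | "tool"; content: string };
type ModelResponse = { content: string; toolCall?: { name: string; args: unknown } };

// Hypothetical adapters for your model client and tool registry.
declare function callModel(messages: Message[]): Promise<ModelResponse>;
declare function runTool(call: { name: string; args: unknown }): Promise<unknown>;

// Minimal agentic loop: the model either calls a tool or returns a final answer.
async function agenticLoop(userMessage: string, maxSteps = 10): Promise<string> {
  const messages: Message[] = [{ role: "user", content: userMessage }];

  for (let step = 0; step < maxSteps; step++) {
    const response = await callModel(messages);

    if (response.toolCall) {
      // The agent decided to use a tool; feed the result back into the loop.
      const result = await runTool(response.toolCall);
      messages.push({ role: "tool", content: JSON.stringify(result) });
      continue;
    }

    return response.content; // no tool call: treat as the final answer
  }

  throw new Error("Agent did not finish within the step budget");
}
```
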
### Evolution Triggers

**Add tools when**:

- Hallucinations about external data
- Need real-time information
- Require system actions

**Add reflection when**:

- Quality varies too much
- First draft rarely good enough
- Need self-correction

**Add multi-agent when**:

- Single prompt becomes unwieldy (>2000 tokens)
- Specialized expertise needed
- Parallel work possible

**Add HITL when**:

- High cost of errors
- Legal/compliance requirements
- Trust not yet established

### Anti-Patterns

- Skipping straight to multi-agent without validating that a single agent works
- Adding reflection before tools (tools usually higher ROI)
- Building Level 5 when Level 2 would suffice

## Dynamic Agents

**Problem**: Different user types need different agent behaviors, but you don't want to maintain multiple versions.

**Solution**: Agents that adapt based on context signals.

### Adaptation Strategies

**User-Based**:

- Expertise level (beginner/advanced)
- Role (developer/manager/executive)
- Permissions (read-only/edit/admin)

**Context-Based**:

- Time sensitivity (urgent/routine)
- Risk level (safe/review-required/blocked)
- Data sensitivity (public/internal/confidential)

**Task-Based**:

- Complexity (simple/moderate/complex)
- Familiarity (seen before/novel)
- Confidence (high/medium/low)

### Implementation Patterns

**Via System Prompt**:

```typescript
const systemPrompt = buildPrompt({
  userLevel: user.expertiseLevel,
  riskLevel: task.calculateRisk(),
  tone: user.preferences.tone,
});
```

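The skill doesn't define `buildPrompt`; one plausible sketch, assuming the three inputs shown above, assembles the system prompt from conditional sections:

```typescript
type PromptInputs = {
  userLevel: "beginner" | "advanced";
  riskLevel: "low" | "high";
  tone?: string;
};

// Assemble a system prompt from conditional sections based on context signals.
function buildPrompt({ userLevel, riskLevel, tone }: PromptInputs): string {
  const sections = [
    "You are a coding assistant.",
    userLevel === "beginner"
      ? "Explain each step in plain language and avoid jargon."
      : "Be concise and assume familiarity with the codebase.",
    riskLevel === "high"
      ? "Propose changes but do not apply them without explicit confirmation."
      : "Apply low-risk changes directly.",
    tone ? `Match this tone: ${tone}.` : "",
  ];
  return sections.filter(Boolean).join("\n");
}
```
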
**Via Tool Selection**:

```typescript
const tools = selectTools({
  userPermissions: user.permissions,
  taskType: task.type,
  contextSensitivity: context.dataLevel,
});
```

**Via Output Format**:

```typescript
const format = user.isExpert ? "technical_detail" : "executive_summary";
```

### Warning Signs

- Same output for all users regardless of context
- Hard-coded behavior that should be dynamic
- Creating separate agents per use case
- No adaptation based on feedback

## Human-in-the-Loop Patterns

**Problem**: Need human judgment for safety-critical decisions.

**Solution**: Strategic pause points, not blanket approval.

### When to Use HITL

**Always**:

- Financial transactions
- Legal commitments
- Privacy/security decisions
- Irreversible actions

**Sometimes**:

- Low confidence predictions
- Novel scenarios
- High-value decisions
- Learning new domains

**Never**:

- Routine operations
- Read-only queries
- Already-verified patterns
- Low-stakes decisions

### Patterns

**Pause-and-Verify**:

- Agent stops execution
- Requests human decision
- Resumes with human input
- Use when: Decision is blocking, immediate context needed

**Deferred Execution**:

- Agent plans action
- Queues for approval
- Human reviews asynchronously
- Executes on approval
- Use when: Batch review possible, non-urgent

**Confidence Threshold**:

- Agent checks confidence score
- Auto-executes if above threshold
- Requests human if below
- Use when: Most cases are clear-cut

**Explanation-First**:

- Agent provides reasoning
- Human approves/rejects/modifies
- Agent proceeds with decision
- Use when: Teaching agent new patterns

### Implementation

```typescript
async function executeWithApproval(action: Action) {
  if (action.risk === "high") {
    const approval = await requestHumanApproval({
      action,
      reasoning: action.reasoning,
      alternatives: action.alternatives,
    });

    if (!approval.approved) {
      return handleRejection(approval.reason);
    }
  }

  return await executeAction(action);
}
```

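The example above covers the risk-gated case; a similar sketch for the Confidence Threshold pattern, assuming the action carries a confidence score and reusing the same hypothetical helpers, might look like:

```typescript
// Auto-execute clear-cut cases; escalate low-confidence ones to a human.
const CONFIDENCE_THRESHOLD = 0.85; // tune per task from eval data

async function executeWithConfidenceGate(action: Action) {
  if (action.confidence >= CONFIDENCE_THRESHOLD) {
    return await executeAction(action);
  }

  const approval = await requestHumanApproval({
    action,
    reasoning: action.reasoning,
    note: `Confidence ${action.confidence.toFixed(2)} is below the threshold`,
  });

  return approval.approved
    ? await executeAction(action)
    : handleRejection(approval.reason);
}
```
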
### Anti-Patterns

- Requesting approval for every action (approval fatigue)
- No context in approval requests (human can't evaluate)
- Blocking on approvals that could be deferred
- No learning from approval patterns

## Evals: Testing Agent Behavior

**Problem**: Can't tell if changes improve or break agent quality.

**Solution**: Test suite of evaluation criteria with measurable metrics.

### Eval Types

**Unit Evals** (single capability):

- Tool calling accuracy
- Formatting compliance
- Entity extraction precision
- Response time

**Integration Evals** (multi-step):

- End-to-end task completion
- Multi-tool orchestration
- Error recovery
- Context retention

**Production Evals** (real usage):

- User satisfaction scores
- Task success rate
- Escalation frequency
- Cost per task

### Building an Eval Suite

**1. Define Success Criteria**:

```typescript
type EvalCriteria = {
  name: string;
  description: string;
  passing: (output: string) => boolean;
  weight: number; // 0-1
};

const criteria: EvalCriteria[] = [
  {
    name: "correct_format",
    description: "Output is valid JSON",
    passing: (out) => isValidJSON(out),
    weight: 1.0, // must pass
  },
  {
    name: "includes_reasoning",
    description: "Explanation provided",
    passing: (out) => out.includes("because"),
    weight: 0.7, // nice to have
  },
];
```

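One way to turn these criteria into a score is a weighted pass rate; the scoring rule below is a sketch, not something the skill prescribes:

```typescript
// Run every criterion against an output and compute a weighted pass rate.
function scoreOutput(output: string, criteria: EvalCriteria[]) {
  const results = criteria.map((c) => ({
    name: c.name,
    passed: c.passing(output),
    weight: c.weight,
  }));

  const totalWeight = results.reduce((sum, r) => sum + r.weight, 0);
  const earned = results.reduce((sum, r) => sum + (r.passed ? r.weight : 0), 0);

  return {
    results,
    score: totalWeight > 0 ? earned / totalWeight : 0,
    // Weight 1.0 is treated as "must pass": any such failure fails the whole case.
    passed: results.every((r) => r.passed || r.weight < 1.0),
  };
}
```
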
**2. Create Test Cases**:

- Representative samples (80% common cases)
- Edge cases (15% unusual scenarios)
- Adversarial cases (5% intentionally tricky)

**3. Establish Baselines**:

- Measure current performance
- Set minimum acceptable thresholds
- Track regression

**4. Automate Runs**:

- Run on every prompt change
- Run on every code deploy
- Run on schedule (daily/weekly)

### Metrics to Track

**Accuracy**:

- Precision: True positives / (True positives + False positives)
- Recall: True positives / (True positives + False negatives)
- F1: Harmonic mean of precision and recall

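For example (hypothetical numbers): if an entity extractor returns 8 entities, 6 of which are correct, and misses 4 real ones, then precision = 6/8 = 0.75, recall = 6/10 = 0.6, and F1 = 2 × (0.75 × 0.6) / (0.75 + 0.6) ≈ 0.67.
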
**Quality**:

- Hallucination rate
- Instruction following
- Output format compliance

**Efficiency**:

- Token usage
- Latency
- Cost per task

**Safety**:

- False approval rate (approved when it should reject)
- False rejection rate (rejected when it should approve)
- Out-of-bounds attempts

### Anti-Patterns

- Manual testing only (doesn't scale)
- No quantitative metrics (can't track progress)
- Testing only happy paths (misses edge cases)
- Evals that always pass (not catching regressions)

## Context Engineering

**Problem**: The context window is precious; you must optimize what you send.

**Solution**: Systematic approach to context construction.

### Context Budget Strategy

**Allocate token budget** (example for 128k window):

- System prompt: 2k tokens (1.5%)
- Tool definitions: 5k tokens (4%)
- Conversation history: 30k tokens (23%)
- Retrieved context: 40k tokens (31%)
- User message: 1k tokens (0.8%)
- **Reserve**: 50k tokens (39%) for output + safety margin

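A minimal way to enforce a budget like this; the component shape and the `countTokens`/`truncateToTokens` helpers are assumptions for illustration:

```typescript
type ContextPart = { name: string; text: string; budget: number }; // budget in tokens

// Hypothetical helpers: a tokenizer and a truncation strategy for your model.
declare function countTokens(text: string): number;
declare function truncateToTokens(text: string, maxTokens: number): string;

// Keep each component within its allocated share of the window.
function fitToBudget(parts: ContextPart[]): string {
  return parts
    .map((part) =>
      countTokens(part.text) <= part.budget
        ? part.text
        : truncateToTokens(part.text, part.budget),
    )
    .join("\n\n");
}
```
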
### What to Include

**Always**:

- Current task description
- User's explicit context
- Error messages from previous attempts

**Usually**:

- Relevant conversation history (last N turns)
- Retrieved documentation snippets
- Related code snippets

**Sometimes**:

- Full file contents (only if directly editing)
- Entire conversation thread (only for handoffs)
- Tangential context (only if user mentioned)

**Never**:

- Entire codebase dumps
- All conversation history
- Unused tool definitions
- Redundant information

### Context Compression

**Summarization**:

```typescript
const summary = await summarizeConversation({
  messages: history.slice(0, -5), // all but last 5
  maxTokens: 500,
});

const context = [
  { role: "system", content: summary },
  ...history.slice(-5), // keep last 5 verbatim
];
```

**Selective Detail**:

- Full detail for active files
- Signatures only for reference files
- Summaries for background context

**Chunking**:

- Break large documents into semantic chunks
- Embed and retrieve only relevant chunks
- Include chunk context (surrounding text)

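A bare-bones chunker for the pattern above, using fixed-size chunks with overlap so each chunk carries some surrounding context; real implementations usually split on semantic boundaries such as headings or paragraphs:

```typescript
// chunkSize and overlap are in characters; keep overlap smaller than chunkSize.
function chunkDocument(text: string, chunkSize = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
  }
  return chunks;
}
```
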
### Anti-Patterns

- Dumping entire files into context "just in case"
- No context trimming as conversation grows
- Including tool definitions for tools not available
- Repeating same context every turn

## Multi-Agent Coordination

**Problem**: Complex tasks need specialized agents, but coordination is hard.

**Solution**: Clear patterns for routing, handoffs, and shared context.

### Coordination Patterns

**Router (Orchestrator)**:

- Single coordinator routes to specialized agents
- Each specialist returns to coordinator
- Coordinator synthesizes final answer

```typescript
async function route(task: Task) {
  const specialist = selectAgent(task.type);
  const result = await specialist.execute(task);
  return synthesize(result);
}
```

**Use when**:

- Clear task categorization
- Specialists don't need to communicate
- Single final output needed

**Sequential Handoff**:

- Agent A completes its part
- Passes context + output to Agent B
- Agent B continues from there

```typescript
const research = await researchAgent.execute(query);
const draft = await writerAgent.execute({
  research,
  query,
});
const final = await editorAgent.execute(draft);
```

**Use when**:

- Linear workflow
- Each step builds on previous
- No backtracking needed

**Parallel Execution**:

- Multiple agents work simultaneously
- Results aggregated at end
- Faster but requires independence

```typescript
const [research, examples, tests] = await Promise.all([
  researchAgent.execute(task),
  exampleAgent.execute(task),
  testAgent.execute(task),
]);

return combine(research, examples, tests);
```

**Use when**:

- Tasks are independent
- Speed matters
- No dependencies between agents

**Swarm (Peer-to-Peer)**:

- Agents can communicate directly
- Emergent coordination
- More flexible but harder to debug

**Use when**:

- Unpredictable workflow
- Agents need to negotiate
- Exploration over optimization

### Shared Context Strategies

**Minimal** (default):

- Only pass task description + previous output
- Each agent constructs own context
- Fastest, but agents may lack context

**Selective**:

- Pass task + output + key decisions
- Include "why" not just "what"
- Balance between speed and context

**Full**:

- Pass entire conversation thread
- All agents see everything
- Slowest, but maximum context

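A small sketch of what a Selective handoff payload might carry; the field names are illustrative, not a prescribed schema:

```typescript
// Selective handoff: enough context to act, without the full conversation thread.
type HandoffContext = {
  task: string;            // what the next agent should do
  previousOutput: string;  // what the last agent produced
  keyDecisions: {          // the "why", not just the "what"
    decision: string;
    reason: string;
  }[];
};
```
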
### Anti-Patterns

- Creating multi-agent system when single agent would work
- No clear ownership (all agents can do everything)
- Agents passing full context when summary would work
- No error handling when specialist fails
- Circular delegation (A → B → A)

## Agent Observability

**Problem**: Can't debug what you can't see.

**Solution**: Structured logging and tracing throughout agent execution.

### What to Log

**Every LLM Call**:

- Timestamp
- Model used
- System prompt hash (not full text)
- User message
- Tool calls
- Response
- Token counts
- Latency
- Cost

**Every Tool Call**:

- Tool name
- Arguments (sanitized)
- Return value (sanitized)
- Success/failure
- Latency

**Every Decision Point**:

- What options were considered
- Which was chosen
- Why (confidence scores, rules triggered)

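For instance, a structured record covering the "Every LLM Call" fields above could look like this; the field names are one plausible schema, not a required format:

```typescript
type LlmCallLog = {
  timestamp: string;          // ISO 8601
  model: string;
  system_prompt_hash: string; // hash only, never the full prompt text
  user_message: string;
  tool_calls: { name: string; args_sanitized: string }[];
  response: string;
  tokens: { prompt: number; completion: number };
  latency_ms: number;
  cost_usd: number;
};
```
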
### Tracing Multi-Agent Flows

```typescript
const trace = {
  trace_id: generateId(),
  span_id: generateId(), // this agent's own span; children reference it as parent_id
  parent_id: null,
  agent: "coordinator",
  task: "research_and_write",
  children: [],
};

// Child agent inherits trace context
const childTrace = {
  trace_id: trace.trace_id,
  span_id: generateId(),
  parent_id: trace.span_id,
  agent: "researcher",
  task: "gather_sources",
};
```

### Debugging Patterns

**When output is wrong**:

1. Check tool calls - were the right tools called with the right args?
2. Check retrieved context - was relevant info available?
3. Check prompt - was the instruction clear?
4. Check examples - were they representative?

**When agent gets stuck**:

1. Check for infinite loops (same tool called repeatedly)
2. Check for a missing tool (agent trying to do something impossible)
3. Check for ambiguous instructions (agent can't decide)

**When cost is too high**:

1. Check context size - are you sending too much?
2. Check retry logic - are you retrying failures more than necessary?
3. Check model selection - using GPT-4 when 3.5 would work?

### Anti-Patterns

- Logging passwords/API keys/PII
- No structured format (makes analysis hard)
- Logging everything (noise drowns signal)
- No sampling (tracing 100% of high-volume traffic)
- Logs not searchable/aggregatable

## Quick Reference

**Starting a new agent**:

1. Whiteboard capabilities → prioritize ruthlessly
2. Start with Level 1 or 2 architecture
3. Build eval suite early
4. Add HITL for safety-critical paths
5. Evolve architecture based on real usage

**Agent not performing well**:

1. Check evals - which criteria are failing?
2. Improve prompt - clearer instructions, better examples
3. Add tools - reduce hallucination
4. Add reflection - improve quality
5. Add retrieval - expand knowledge

**Scaling to production**:

1. Observability first - can't debug blind
2. Evals in CI/CD - prevent regressions
3. Context budget discipline - trim aggressively
4. Dynamic behavior - adapt to user/task
5. HITL for uncertainty - safety over speed

**Multi-agent complexity**:

1. Prove a single agent won't work
2. Start with router pattern
3. Minimize shared context
4. Trace end-to-end flows
5. Fall back to a human when agents get stuck