opencode-multiagent 0.2.1 → 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (160)
  1. package/AGENTS.md +83 -0
  2. package/CHANGELOG.md +31 -0
  3. package/CONTRIBUTING.md +36 -0
  4. package/README.md +44 -168
  5. package/README.tr.md +84 -0
  6. package/RELEASE.md +68 -0
  7. package/agents/AGENTS.md +91 -0
  8. package/agents/auditor.md +67 -23
  9. package/agents/{worker.md → coder.md} +24 -17
  10. package/agents/docmaster.md +91 -0
  11. package/agents/executor.md +63 -79
  12. package/agents/planner.md +78 -58
  13. package/agents/reviewer.md +31 -15
  14. package/agents/scout.md +25 -17
  15. package/agents/sec-coder.md +83 -0
  16. package/agents/ui-coder.md +77 -0
  17. package/commands/board.md +17 -0
  18. package/commands/execute.md +9 -7
  19. package/commands/init-deep.md +7 -6
  20. package/commands/init.md +5 -5
  21. package/commands/inspect.md +6 -5
  22. package/commands/plan.md +8 -6
  23. package/commands/quality.md +4 -3
  24. package/commands/review.md +5 -3
  25. package/commands/status.md +5 -3
  26. package/defaults/AGENTS.md +48 -0
  27. package/defaults/opencode-multiagent.json +180 -0
  28. package/defaults/opencode-multiagent.schema.json +265 -0
  29. package/dist/control-plane.d.ts +4 -0
  30. package/dist/control-plane.d.ts.map +1 -0
  31. package/dist/index.d.ts +5 -0
  32. package/dist/index.d.ts.map +1 -0
  33. package/dist/index.js +1916 -0
  34. package/dist/opencode-multiagent/compiler.d.ts +25 -0
  35. package/dist/opencode-multiagent/compiler.d.ts.map +1 -0
  36. package/dist/opencode-multiagent/constants.d.ts +128 -0
  37. package/dist/opencode-multiagent/constants.d.ts.map +1 -0
  38. package/dist/opencode-multiagent/correlation.d.ts +21 -0
  39. package/dist/opencode-multiagent/correlation.d.ts.map +1 -0
  40. package/dist/opencode-multiagent/defaults.d.ts +10 -0
  41. package/dist/opencode-multiagent/defaults.d.ts.map +1 -0
  42. package/dist/opencode-multiagent/hooks.d.ts +62 -0
  43. package/dist/opencode-multiagent/hooks.d.ts.map +1 -0
  44. package/dist/opencode-multiagent/log.d.ts +2 -0
  45. package/dist/opencode-multiagent/log.d.ts.map +1 -0
  46. package/dist/opencode-multiagent/markdown.d.ts +8 -0
  47. package/dist/opencode-multiagent/markdown.d.ts.map +1 -0
  48. package/dist/opencode-multiagent/mcp.d.ts +3 -0
  49. package/dist/opencode-multiagent/mcp.d.ts.map +1 -0
  50. package/dist/opencode-multiagent/policy.d.ts +5 -0
  51. package/dist/opencode-multiagent/policy.d.ts.map +1 -0
  52. package/dist/opencode-multiagent/quality.d.ts +18 -0
  53. package/dist/opencode-multiagent/quality.d.ts.map +1 -0
  54. package/dist/opencode-multiagent/runtime.d.ts +7 -0
  55. package/dist/opencode-multiagent/runtime.d.ts.map +1 -0
  56. package/dist/opencode-multiagent/session-tracker.d.ts +32 -0
  57. package/dist/opencode-multiagent/session-tracker.d.ts.map +1 -0
  58. package/dist/opencode-multiagent/skills.d.ts +17 -0
  59. package/dist/opencode-multiagent/skills.d.ts.map +1 -0
  60. package/dist/opencode-multiagent/supervision.d.ts +26 -0
  61. package/dist/opencode-multiagent/supervision.d.ts.map +1 -0
  62. package/dist/opencode-multiagent/task-manager.d.ts +54 -0
  63. package/dist/opencode-multiagent/task-manager.d.ts.map +1 -0
  64. package/dist/opencode-multiagent/telemetry.d.ts +28 -0
  65. package/dist/opencode-multiagent/telemetry.d.ts.map +1 -0
  66. package/dist/opencode-multiagent/tools.d.ts +87 -0
  67. package/dist/opencode-multiagent/tools.d.ts.map +1 -0
  68. package/dist/opencode-multiagent/types.d.ts +36 -0
  69. package/dist/opencode-multiagent/types.d.ts.map +1 -0
  70. package/dist/opencode-multiagent/utils.d.ts +9 -0
  71. package/dist/opencode-multiagent/utils.d.ts.map +1 -0
  72. package/docs/agents.md +148 -0
  73. package/docs/agents.tr.md +149 -0
  74. package/docs/configuration.md +244 -0
  75. package/docs/configuration.tr.md +244 -0
  76. package/docs/usage-guide.md +224 -0
  77. package/docs/usage-guide.tr.md +225 -0
  78. package/examples/opencode.with-overrides.json +3 -7
  79. package/package.json +23 -13
  80. package/skills/AGENTS.md +51 -0
  81. package/skills/advanced-evaluation/SKILL.md +37 -21
  82. package/skills/advanced-evaluation/manifest.json +2 -13
  83. package/skills/cek-context-engineering/SKILL.md +159 -87
  84. package/skills/cek-context-engineering/manifest.json +1 -3
  85. package/skills/cek-prompt-engineering/SKILL.md +13 -10
  86. package/skills/cek-prompt-engineering/manifest.json +1 -3
  87. package/skills/cek-test-prompt/SKILL.md +38 -28
  88. package/skills/cek-test-prompt/manifest.json +1 -3
  89. package/skills/cek-thought-based-reasoning/SKILL.md +75 -21
  90. package/skills/cek-thought-based-reasoning/manifest.json +1 -3
  91. package/skills/context-degradation/SKILL.md +14 -13
  92. package/skills/context-degradation/manifest.json +1 -3
  93. package/skills/debate/SKILL.md +23 -78
  94. package/skills/debate/manifest.json +2 -12
  95. package/skills/design-first/manifest.json +2 -13
  96. package/skills/dispatching-parallel-agents/SKILL.md +14 -3
  97. package/skills/dispatching-parallel-agents/manifest.json +1 -4
  98. package/skills/drift-analysis/SKILL.md +50 -29
  99. package/skills/drift-analysis/manifest.json +2 -12
  100. package/skills/evaluation/manifest.json +2 -12
  101. package/skills/executing-plans/SKILL.md +15 -8
  102. package/skills/executing-plans/manifest.json +1 -3
  103. package/skills/handoff-protocols/manifest.json +2 -12
  104. package/skills/parallel-investigation/SKILL.md +25 -12
  105. package/skills/parallel-investigation/manifest.json +1 -4
  106. package/skills/reflexion-critique/SKILL.md +21 -10
  107. package/skills/reflexion-critique/manifest.json +1 -3
  108. package/skills/reflexion-reflect/SKILL.md +36 -34
  109. package/skills/reflexion-reflect/manifest.json +2 -10
  110. package/skills/root-cause-analysis/manifest.json +2 -13
  111. package/skills/sadd-judge-with-debate/SKILL.md +50 -26
  112. package/skills/sadd-judge-with-debate/manifest.json +1 -3
  113. package/skills/structured-code-review/manifest.json +2 -11
  114. package/skills/task-decomposition/manifest.json +2 -13
  115. package/skills/verification-before-completion/manifest.json +2 -15
  116. package/skills/verification-gates/SKILL.md +27 -19
  117. package/skills/verification-gates/manifest.json +2 -12
  118. package/agents/advisor.md +0 -57
  119. package/agents/critic.md +0 -127
  120. package/agents/deep-worker.md +0 -65
  121. package/agents/devil.md +0 -36
  122. package/agents/heavy-worker.md +0 -68
  123. package/agents/lead.md +0 -155
  124. package/agents/librarian.md +0 -62
  125. package/agents/qa.md +0 -50
  126. package/agents/quick.md +0 -65
  127. package/agents/scribe.md +0 -78
  128. package/agents/strategist.md +0 -63
  129. package/agents/ui-heavy-worker.md +0 -62
  130. package/agents/ui-worker.md +0 -69
  131. package/agents/validator.md +0 -47
  132. package/defaults/agent-settings.json +0 -102
  133. package/defaults/agent-settings.schema.json +0 -25
  134. package/defaults/flags.json +0 -35
  135. package/defaults/flags.schema.json +0 -119
  136. package/defaults/mcp-defaults.json +0 -47
  137. package/defaults/mcp-defaults.schema.json +0 -38
  138. package/defaults/profiles.json +0 -53
  139. package/defaults/profiles.schema.json +0 -60
  140. package/defaults/team-profiles.json +0 -83
  141. package/src/control-plane.ts +0 -21
  142. package/src/index.ts +0 -8
  143. package/src/opencode-multiagent/compiler.ts +0 -168
  144. package/src/opencode-multiagent/constants.ts +0 -178
  145. package/src/opencode-multiagent/file-lock.ts +0 -90
  146. package/src/opencode-multiagent/hooks.ts +0 -599
  147. package/src/opencode-multiagent/log.ts +0 -12
  148. package/src/opencode-multiagent/mailbox.ts +0 -287
  149. package/src/opencode-multiagent/markdown.ts +0 -99
  150. package/src/opencode-multiagent/mcp.ts +0 -35
  151. package/src/opencode-multiagent/policy.ts +0 -67
  152. package/src/opencode-multiagent/quality.ts +0 -140
  153. package/src/opencode-multiagent/runtime.ts +0 -55
  154. package/src/opencode-multiagent/skills.ts +0 -144
  155. package/src/opencode-multiagent/supervision.ts +0 -156
  156. package/src/opencode-multiagent/task-manager.ts +0 -148
  157. package/src/opencode-multiagent/team-manager.ts +0 -219
  158. package/src/opencode-multiagent/team-tools.ts +0 -359
  159. package/src/opencode-multiagent/telemetry.ts +0 -124
  160. package/src/opencode-multiagent/utils.ts +0 -54
@@ -83,7 +83,7 @@ The file system itself provides structure that agents can navigate. File sizes s
 
  ### Hybrid Strategies
 
- The most effective agents employ hybrid strategies. Pre-load some context for speed (like CLAUDE.md files or project rules), but enable autonomous exploration for additional context as needed. The decision boundary depends on task characteristics and context dynamics.
+ The most effective agents employ hybrid strategies. Pre-load some context for speed (like AGENTS.md files or project rules), but enable autonomous exploration for additional context as needed. The decision boundary depends on task characteristics and context dynamics.
 
  For contexts with less dynamic content, pre-loading more upfront makes sense. For rapidly changing or highly specific information, just-in-time loading avoids stale context.
 
@@ -96,6 +96,7 @@ Effective context budgeting requires understanding not just raw token counts but
  ## Examples
 
  **Example 1: Organizing System Prompts**
+
  ```markdown
  <BACKGROUND_INFORMATION>
  You are a Python expert helping a development team.
@@ -121,23 +122,29 @@ Explain the reasoning behind suggestions.
  ```
 
  **Example 2: Progressive Document Loading**
+
  ```markdown
  # Instead of loading all documentation at once:
 
  # Step 1: Load summary
- docs/architecture_overview.md # Lightweight overview
+
+ docs/architecture_overview.md # Lightweight overview
 
  # Step 2: Load specific section as needed
- docs/api/endpoints.md # Only when API work needed
- docs/database/schemas.md # Only when data layer work needed
+
+ docs/api/endpoints.md # Only when API work needed
+ docs/database/schemas.md # Only when data layer work needed
  ```
 
  **Example 3: Skill Description Design**
+
  ```markdown
  # Bad: Vague description that loads into context but provides little signal
+
  description: Helps with code things
 
  # Good: Specific description that helps model decide when to activate
+
  description: Analyze code quality and suggest refactoring patterns. Use when reviewing pull requests or improving existing code structure.
  ```
 
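Example 2's progressive loading can be sketched in Python. This is a minimal illustration under stated assumptions, not code from the package: the `SECTION_DOCS` mapping, the area keys `"api"` and `"data"`, and the `docs_to_load` helper are hypothetical; only the three document paths come from the example.

```python
# Step 1 doc from Example 2: always loaded first as a lightweight overview.
OVERVIEW = "docs/architecture_overview.md"

# Hypothetical mapping from the kind of work a task needs to the section
# docs that Example 2 loads only on demand.
SECTION_DOCS = {
    "api": "docs/api/endpoints.md",      # only when API work is needed
    "data": "docs/database/schemas.md",  # only when data-layer work is needed
}


def docs_to_load(task_areas: set[str]) -> list[str]:
    """Return the overview plus only the section docs the task touches."""
    selected = [OVERVIEW]
    for area in sorted(task_areas):  # sorted for a deterministic order
        doc = SECTION_DOCS.get(area)
        if doc is not None:  # areas without a section doc get the overview only
            selected.append(doc)
    return selected
```

A task tagged only `{"api"}` then pulls in the overview and the endpoints doc, leaving the schema doc out of context.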
@@ -273,64 +280,76 @@ Implement these strategies through specific architectural patterns. Use just-in-
  ## Examples
 
  **Example 1: Detecting Degradation in Prompt Design**
+
  ```markdown
  # Signs your command/skill prompt may be too large:
 
  Early signs (context ~50-70% utilized):
+
  - Agent occasionally misses instructions
  - Responses become less focused
  - Some guidelines ignored
 
  Warning signs (context ~70-85% utilized):
+
  - Inconsistent behavior across runs
  - Agent "forgets" earlier instructions
  - Quality varies significantly
 
  Critical signs (context >85% utilized):
+
  - Agent ignores key constraints
  - Hallucinations increase
  - Task completion fails
  ```
 
  **Example 2: Mitigating Lost-in-Middle in Prompt Structure**
+
  ```markdown
  # Organize prompts with critical info at edges
 
- <CRITICAL_CONSTRAINTS> # At start (high attention)
+ <CRITICAL_CONSTRAINTS> # At start (high attention)
+
  - Never modify production files directly
  - Always run tests before committing
  - Maximum file size: 500 lines
- </CRITICAL_CONSTRAINTS>
+ </CRITICAL_CONSTRAINTS>
+
+ <DETAILED_GUIDELINES> # Middle (lower attention)
 
- <DETAILED_GUIDELINES> # Middle (lower attention)
  - Code style preferences
  - Documentation templates
  - Review checklists
  - Example patterns
- </DETAILED_GUIDELINES>
+ </DETAILED_GUIDELINES>
+
+ <KEY_REMINDERS> # At end (high attention)
 
- <KEY_REMINDERS> # At end (high attention)
  - Run tests: npm run test
  - Format code: npm run format
  - Create PR with description
- </KEY_REMINDERS>
+ </KEY_REMINDERS>
  ```
 
  **Example 3: Sub-Agent Context Isolation**
+
  ```markdown
  # Instead of one agent handling everything:
 
  ## Coordinator Agent (lean context)
+
  - Understands task decomposition
  - Delegates to specialized sub-agents
  - Synthesizes results
 
  ## Code Review Sub-Agent (isolated context)
+
  - Loaded only with code review guidelines
  - Focuses solely on review task
  - Returns structured findings
 
  ## Test Writer Sub-Agent (isolated context)
+
  - Loaded only with testing patterns
  - Focuses solely on test creation
  - Returns test files
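The utilization bands in Example 1 lend themselves to a mechanical pre-flight check. A minimal Python sketch, assuming token counts come from whatever tokenizer you already use; `degradation_band` and its `"ok"` label for utilization under 50% are hypothetical, and treating the approximate band edges as half-open intervals is an assumption.

```python
def degradation_band(used_tokens: int, window_tokens: int) -> str:
    """Map context utilization to the bands listed in Example 1.

    The source gives approximate edges (~50-70%, ~70-85%, >85%); here they
    are treated as half-open intervals, and anything under 50% is labeled
    "ok", a label the source does not use.
    """
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    utilization = used_tokens / window_tokens
    if utilization > 0.85:
        return "critical"
    if utilization >= 0.70:
        return "warning"
    if utilization >= 0.50:
        return "early"
    return "ok"
```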
@@ -378,12 +397,13 @@ Extract all factual claims from the following output. List each claim on a separ
  </TASK>
 
  <FOCUS_AREAS>
+
  - File paths and their existence
  - Function/class/method names referenced
  - Code behavior assertions ("this function returns X")
  - External facts about APIs, libraries, or specifications
  - Numerical values and metrics
- </FOCUS_AREAS>
+ </FOCUS_AREAS>
 
  <OUTPUT_TO_ANALYZE>
  {agent_output}
@@ -412,11 +432,12 @@ Verify this claim by checking the actual codebase and context.
  </CLAIM>
 
  <VERIFICATION_APPROACH>
+
  - For file paths: Use file tools to check existence
  - For code claims: Read the actual code and verify behavior
  - For external facts: Cross-reference with documentation or web search
  - For metrics: Analyze the code structure
- </VERIFICATION_APPROACH>
+ </VERIFICATION_APPROACH>
 
  <RESPONSE_FORMAT>
  STATUS: [VERIFIED | FALSE | UNVERIFIABLE]
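A caller could act on this RESPONSE_FORMAT by reading each verifier's STATUS line and collecting the claims that the regeneration prompt would list. This Python sketch is illustrative only: `parse_status`, `claims_needing_regeneration`, and the choice to treat a missing STATUS line as UNVERIFIABLE are assumptions, not part of the skill.

```python
import re

# Matches a STATUS line in a verifier response, e.g. "STATUS: FALSE".
STATUS_RE = re.compile(r"^\s*STATUS:\s*(VERIFIED|FALSE|UNVERIFIABLE)\b", re.MULTILINE)


def parse_status(verifier_response: str) -> str:
    """Extract the verdict from one verification-agent response."""
    match = STATUS_RE.search(verifier_response)
    if match is None:
        # Assumption: a response without a valid STATUS line is not trusted.
        return "UNVERIFIABLE"
    return match.group(1)


def claims_needing_regeneration(responses: dict[str, str]) -> list[str]:
    """Return claims whose verdict is FALSE or UNVERIFIABLE, in input order."""
    return [claim for claim, response in responses.items()
            if parse_status(response) != "VERIFIED"]
```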
@@ -452,10 +473,11 @@ Specific issues:
  {list of FALSE and UNVERIFIABLE claims with evidence}
 
  Please regenerate your response. For each factual claim:
+
  1. Explicitly verify it using tools before stating it
  2. If you cannot verify, state "I cannot verify..." instead of asserting
  3. Cite the specific file/line/source for verifiable facts
- </REGENERATION_PROMPT>
+ </REGENERATION_PROMPT>
  ```
 
  ## Lost-in-Middle Detection Workflow
@@ -477,6 +499,7 @@ Extract all critical instructions from your prompt that the agent MUST follow:
 
  ```markdown
  Critical instructions to verify:
+
  1. "Never modify files in /production"
  2. "Always run tests before committing"
  3. "Use TypeScript strict mode"
@@ -497,10 +520,11 @@ Prompt: {your_full_prompt_being_tested}
  Task: {representative_task_that_exercises_all_instructions}
 
  For each run, save:
+
  - run_id: unique identifier
  - agent_output: complete response from agent
  - timestamp: when run completed
- </AGENT_RUN_CONFIG>
+ </AGENT_RUN_CONFIG>
  ```
 
  **Step 3: Verify Each Output Against Original Prompt**
@@ -527,16 +551,18 @@ You are a compliance verification agent. Analyze whether the agent output follow
 
  <VERIFICATION_APPROACH>
  For each critical instruction:
+
  1. Determine if the instruction was applicable to this task
  2. If applicable, check whether the output complies
  3. Look for both explicit violations and omissions
  4. Note any partial compliance
- </VERIFICATION_APPROACH>
+ </VERIFICATION_APPROACH>
 
  <OUTPUT_FORMAT>
  RUN_ID: {run_id}
 
  INSTRUCTION_COMPLIANCE:
+
  - Instruction 1: "Never modify files in /production"
  STATUS: [FOLLOWED | VIOLATED | NOT_APPLICABLE]
  EVIDENCE: {quote from output or explanation}
@@ -548,11 +574,12 @@ INSTRUCTION_COMPLIANCE:
  [... continue for all instructions ...]
 
  SUMMARY:
+
  - Instructions followed: {count}
  - Instructions violated: {count}
  - Not applicable: {count}
- </OUTPUT_FORMAT>
- </VERIFICATION_AGENT_PROMPT>
+ </OUTPUT_FORMAT>
+ </VERIFICATION_AGENT_PROMPT>
  ```
 
  **Step 4: Aggregate Results and Identify At-Risk Parts**
@@ -562,18 +589,19 @@ Collect verification results from all runs and identify instructions that were i
  ```markdown
  <AGGREGATION_LOGIC>
  For each instruction:
- followed_count = number of runs where STATUS == FOLLOWED
- violated_count = number of runs where STATUS == VIOLATED
- applicable_runs = total_runs - (runs where STATUS == NOT_APPLICABLE)
+ followed_count = number of runs where STATUS == FOLLOWED
+ violated_count = number of runs where STATUS == VIOLATED
+ applicable_runs = total_runs - (runs where STATUS == NOT_APPLICABLE)
+
+ compliance_rate = followed_count / applicable_runs
 
- compliance_rate = followed_count / applicable_runs
+ Classification:
 
- Classification:
- - compliance_rate == 1.0: RELIABLE (always followed)
- - compliance_rate >= 0.8: MOSTLY_RELIABLE (minor inconsistency)
- - compliance_rate >= 0.5: AT_RISK (inconsistent - likely lost-in-middle)
- - compliance_rate < 0.5: FREQUENTLY_IGNORED (severe issue)
- - compliance_rate == 0.0: ALWAYS_IGNORED (critical failure)
+ - compliance_rate == 1.0: RELIABLE (always followed)
+ - compliance_rate >= 0.8: MOSTLY_RELIABLE (minor inconsistency)
+ - compliance_rate >= 0.5: AT_RISK (inconsistent - likely lost-in-middle)
+ - compliance_rate < 0.5: FREQUENTLY_IGNORED (severe issue)
+ - compliance_rate == 0.0: ALWAYS_IGNORED (critical failure)
 
  AT_RISK instructions are the primary signal for lost-in-middle problems.
  These are instructions that work sometimes but not consistently, indicating
@@ -583,22 +611,23 @@ they are in attention-weak positions.
  <AGGREGATION_OUTPUT_FORMAT>
  INSTRUCTION COMPLIANCE SUMMARY:
 
- | Instruction | Followed | Violated | Compliance Rate | Status |
- |-------------|----------|----------|-----------------|--------|
- | 1. Never modify /production | 5/5 | 0/5 | 100% | RELIABLE |
- | 2. Run tests before commit | 3/5 | 2/5 | 60% | AT_RISK |
- | 3. TypeScript strict mode | 4/5 | 1/5 | 80% | MOSTLY_RELIABLE |
- | 4. Max function length 50 | 2/5 | 3/5 | 40% | FREQUENTLY_IGNORED |
- | 5. Include JSDoc | 5/5 | 0/5 | 100% | RELIABLE |
- | 6. Format as JSON | 1/5 | 4/5 | 20% | ALWAYS_IGNORED |
- | 7. Log modifications | 3/5 | 2/5 | 60% | AT_RISK |
+ | Instruction | Followed | Violated | Compliance Rate | Status |
+ | --------------------------- | -------- | -------- | --------------- | ------------------ |
+ | 1. Never modify /production | 5/5 | 0/5 | 100% | RELIABLE |
+ | 2. Run tests before commit | 3/5 | 2/5 | 60% | AT_RISK |
+ | 3. TypeScript strict mode | 4/5 | 1/5 | 80% | MOSTLY_RELIABLE |
+ | 4. Max function length 50 | 2/5 | 3/5 | 40% | FREQUENTLY_IGNORED |
+ | 5. Include JSDoc | 5/5 | 0/5 | 100% | RELIABLE |
+ | 6. Format as JSON | 1/5 | 4/5 | 20% | ALWAYS_IGNORED |
+ | 7. Log modifications | 3/5 | 2/5 | 60% | AT_RISK |
 
  AT-RISK INSTRUCTIONS (likely in lost-in-middle zone):
+
  - Instruction 2: "Run tests before commit" (60% compliance)
  - Instruction 4: "Max function length 50" (40% compliance)
  - Instruction 6: "Format as JSON" (20% compliance)
  - Instruction 7: "Log modifications" (60% compliance)
- </AGGREGATION_OUTPUT_FORMAT>
+ </AGGREGATION_OUTPUT_FORMAT>
  ```
 
  **Step 5: Output Recommendations**
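The AGGREGATION_LOGIC from the two hunks above reduces to a few lines of Python. In this sketch the function names are hypothetical, and the exact-zero check runs before the `< 0.5` check, since a strict top-to-bottom reading of the listed rules would never reach ALWAYS_IGNORED.

```python
from collections import Counter


def classify(rate: float) -> str:
    """Apply the Classification thresholds from AGGREGATION_LOGIC."""
    if rate == 1.0:
        return "RELIABLE"
    if rate >= 0.8:
        return "MOSTLY_RELIABLE"
    if rate >= 0.5:
        return "AT_RISK"
    if rate == 0.0:  # checked before the "< 0.5" catch-all
        return "ALWAYS_IGNORED"
    return "FREQUENTLY_IGNORED"


def compliance(statuses: list[str]) -> tuple[float, str]:
    """Compute compliance_rate over applicable runs and classify it."""
    counts = Counter(statuses)  # missing statuses count as 0
    applicable_runs = len(statuses) - counts["NOT_APPLICABLE"]
    if applicable_runs == 0:
        raise ValueError("instruction was not applicable in any run")
    rate = counts["FOLLOWED"] / applicable_runs
    return rate, classify(rate)
```

Under these thresholds a 1/5 result (20%) classifies as FREQUENTLY_IGNORED; only a 0% rate reaches ALWAYS_IGNORED.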
@@ -626,10 +655,10 @@ SPECIFIC RECOMMENDATIONS:
  Restructure at-risk instructions with emphasis:
 
  Before: "Always run tests before committing"
- After: "**CRITICAL:** You MUST run tests before committing. Never skip this step."
+ After: "**CRITICAL:** You MUST run tests before committing. Never skip this step."
 
  Before: "Maximum function length: 50 lines"
- After: "3. [REQUIRED] Maximum function length: 50 lines"
+ After: "3. [REQUIRED] Maximum function length: 50 lines"
 
  Use numbered lists, bold markers, or explicit tags like [REQUIRED], [CRITICAL], [MUST].
 
@@ -644,7 +673,7 @@ SPECIFIC RECOMMENDATIONS:
  - Moving 2-3 most critical items to edges
  - Converting remaining middle items to a numbered checklist
  - Adding explicit "verify these items" reminder at end
- </RECOMMENDATIONS_OUTPUT>
+ </RECOMMENDATIONS_OUTPUT>
  ```
 
  ### Complete Workflow Example
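The restructuring recommendations (critical items at the edges, the rest as a numbered checklist, and a closing "verify these items" reminder) can be sketched as a small prompt builder. The tag names are borrowed from the lost-in-middle example earlier in the skill; `edge_weighted_prompt` is hypothetical, and repeating the critical items in the closing reminder is one reading of the recommendation, not the only one.

```python
def edge_weighted_prompt(critical: list[str], other: list[str]) -> str:
    """Lay out instructions so critical items sit at the prompt edges."""
    # Start of prompt (high attention): the critical constraints.
    lines = ["<CRITICAL_CONSTRAINTS>"]
    lines += [f"- {item}" for item in critical]
    lines.append("</CRITICAL_CONSTRAINTS>")
    # Middle (lower attention): remaining items as a numbered checklist.
    lines.append("<DETAILED_GUIDELINES>")
    lines += [f"{i}. {item}" for i, item in enumerate(other, start=1)]
    lines.append("</DETAILED_GUIDELINES>")
    # End of prompt (high attention): explicit reminder to verify the critical items.
    lines.append("<KEY_REMINDERS>")
    lines.append("Verify these items before finishing:")
    lines += [f"- {item}" for item in critical]
    lines.append("</KEY_REMINDERS>")
    return "\n".join(lines)
```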
@@ -653,31 +682,37 @@ SPECIFIC RECOMMENDATIONS:
  # Example: Testing a Code Review Command
 
  ## Original Prompt Being Tested:
+
  "Review the code for: security issues, performance problems,
  code style, test coverage, documentation completeness,
  error handling, and logging practices."
 
  ## Run 5 Agents:
+
  Each agent reviews the same code sample with this prompt.
 
  ## Verification Results:
- | Instruction | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Rate |
- |-------------|-------|-------|-------|-------|-------|------|
- | Security | Y | Y | Y | Y | Y | 100% |
- | Performance | Y | X | Y | X | Y | 60% |
- | Code style | X | X | Y | X | X | 20% |
- | Test coverage | X | Y | X | X | Y | 40% |
- | Documentation | X | X | X | Y | X | 20% |
- | Error handling | Y | Y | X | Y | Y | 80% |
- | Logging | Y | Y | Y | Y | Y | 100% |
+
+ | Instruction | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Rate |
+ | -------------- | ----- | ----- | ----- | ----- | ----- | ---- |
+ | Security | Y | Y | Y | Y | Y | 100% |
+ | Performance | Y | X | Y | X | Y | 60% |
+ | Code style | X | X | Y | X | X | 20% |
+ | Test coverage | X | Y | X | X | Y | 40% |
+ | Documentation | X | X | X | Y | X | 20% |
+ | Error handling | Y | Y | X | Y | Y | 80% |
+ | Logging | Y | Y | Y | Y | Y | 100% |
 
  ## Analysis:
+
  - RELIABLE: Security, Logging (at edges of list)
  - AT_RISK: Performance, Error handling
  - FREQUENTLY_IGNORED: Code style, Test coverage, Documentation (middle of list)
 
  ## Remediation Applied:
+
  "**CRITICAL REVIEW AREAS:**
+
  1. Security vulnerabilities
  2. Test coverage gaps
  3. Documentation completeness
@@ -706,6 +741,7 @@ Record the output of each agent in your chain:
 
  ```markdown
  Agent Chain Record:
+
  - Agent 1 (Analyzer): {output_1}
  - Agent 2 (Planner): {output_2}
  - Agent 3 (Implementer): {output_3}
@@ -758,11 +794,12 @@ Agent 4 Output: {output_4}
 
  <ANALYSIS_APPROACH>
  For each agent output (starting from the last):
+
  1. Does this output contain the error?
  2. If yes, was the error present in the input to this agent?
  3. If error is in output but not input: This agent INTRODUCED the error
  4. If error is in both: This agent PROPAGATED the error
- </ANALYSIS_APPROACH>
+ </ANALYSIS_APPROACH>
 
  <OUTPUT_FORMAT>
  ERROR: {error_id}
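The ANALYSIS_APPROACH can be expressed as a backward walk over the chain. This Python sketch assumes a strictly linear chain in which agent k receives agent k-1's output, and it uses a substring check as a stand-in for the judgment a verification agent would make; `attribute_error` is a hypothetical name.

```python
def attribute_error(outputs: list[str], error: str, task_input: str = "") -> dict[int, str]:
    """Label each agent (1-based) that INTRODUCED or PROPAGATED an error.

    Agent 1's input is task_input; agent k's input is agent k-1's output.
    """
    labels: dict[int, str] = {}
    for k in range(len(outputs), 0, -1):  # start from the last agent
        output = outputs[k - 1]
        agent_input = outputs[k - 2] if k > 1 else task_input
        if error not in output:
            continue  # this agent's output is clean
        labels[k] = "PROPAGATED" if error in agent_input else "INTRODUCED"
    return labels
```

For a chain whose planner first mentions a wrong API version, the planner is labeled INTRODUCED and every later agent that repeats it is labeled PROPAGATED.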
@@ -803,7 +840,7 @@ After Agent {N} completes:
  - Or: Regenerate Agent N output with explicit guidance
 
  3. Only proceed to Agent {N+1} if verification passes
- </ERROR_BOUNDARY_TEMPLATE>
+ </ERROR_BOUNDARY_TEMPLATE>
  ```
 
  ## Context Relevance Scoring Workflow
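The ERROR_BOUNDARY_TEMPLATE's regenerate-with-guidance branch and its rule of proceeding only once verification passes might be wired up as below. This is a sketch under stated assumptions: agents and the verifier are plain callables, `run_with_boundaries` and `max_retries` are hypothetical, and stopping with an exception after the retry budget is one possible policy.

```python
from typing import Callable, Optional

Agent = Callable[[str, str], str]          # (input, guidance) -> output
Verifier = Callable[[str], Optional[str]]  # output -> problem text, or None if it passes


def run_with_boundaries(agents: list[Agent], verify: Verifier, task: str,
                        max_retries: int = 1) -> str:
    """Run a linear agent chain, verifying each output before handing it on."""
    current = task
    for n, agent in enumerate(agents, start=1):
        guidance = ""
        for _attempt in range(max_retries + 1):
            output = agent(current, guidance)
            problem = verify(output)
            if problem is None:
                break
            # Regenerate with the verifier's finding as explicit guidance.
            guidance = f"Previous output failed verification: {problem}"
        else:
            raise RuntimeError(f"Agent {n} failed verification: {problem}")
        current = output  # only proceed to agent n+1 once verification passes
    return current
```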
@@ -813,7 +850,7 @@ Not all parts of a prompt contribute equally to task completion. This workflow i
  ### When to Use
 
  - When optimizing prompt length and content
- - When deciding what to include in CLAUDE.md
+ - When deciding what to include in AGENTS.md
  - When a prompt feels bloated but you are unsure what to cut
  - When debugging agents that ignore provided context
  - Before deploying new commands, skills, or agent prompts
@@ -827,35 +864,32 @@ Divide the prompt (command/skill/agent) into logical sections. Each part should
  ```markdown
  <PROMPT_PARTS>
  PART_1:
- ID: background
- CONTENT: |
- You are a Python expert helping a development team.
- Current project: Data processing pipeline in Python 3.9+
+ ID: background
+ CONTENT: |
+ You are a Python expert helping a development team.
+ Current project: Data processing pipeline in Python 3.9+
 
  PART_2:
- ID: code_style_rules
- CONTENT: |
- - Write clean, idiomatic Python code
- - Include type hints for function signatures
- - Add docstrings for public functions
- - Follow PEP 8 style guidelines
+ ID: code_style_rules
+ CONTENT: | - Write clean, idiomatic Python code - Include type hints for function signatures - Add docstrings for public functions - Follow PEP 8 style guidelines
 
  PART_3:
- ID: historical_context
- CONTENT: |
- The project was migrated from Python 2.7 in 2019.
- Original team used camelCase naming but we now use snake_case.
- Legacy modules in /legacy folder are frozen.
+ ID: historical_context
+ CONTENT: |
+ The project was migrated from Python 2.7 in 2019.
+ Original team used camelCase naming but we now use snake_case.
+ Legacy modules in /legacy folder are frozen.
 
  PART_4:
- ID: output_format
- CONTENT: |
- Provide actionable feedback with specific line references.
- Explain the reasoning behind suggestions.
+ ID: output_format
+ CONTENT: |
+ Provide actionable feedback with specific line references.
+ Explain the reasoning behind suggestions.
  </PROMPT_PARTS>
  ```
 
  Splitting guidelines:
+
  - Each XML section or Markdown header becomes a part
  - Separate conceptually distinct instructions into their own parts
  - Keep related instructions together (do not split mid-thought)
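The first splitting guideline (each XML section becomes a part) can be automated with a regular expression. This Python sketch is hypothetical: it handles only top-level `<TAG>...</TAG>` sections with uppercase names, ignores Markdown headers, and derives part IDs by lowercasing the tag, whereas the PROMPT_PARTS example uses hand-picked IDs.

```python
import re

# An XML-style section such as <OUTPUT_FORMAT> ... </OUTPUT_FORMAT>;
# the backreference \1 requires the closing tag to match the opening one.
SECTION_RE = re.compile(r"<([A-Z][A-Z0-9_]*)>\s*(.*?)\s*</\1>", re.DOTALL)


def split_prompt(prompt: str) -> list[dict[str, str]]:
    """Split a prompt into parts, one per top-level XML-style section."""
    return [{"ID": tag.lower(), "CONTENT": body}
            for tag, body in SECTION_RE.findall(prompt)]
```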
@@ -883,26 +917,30 @@ Example: "Review a pull request for code quality issues and suggest improvements
  Score 0-10 based on these criteria:
 
  ESSENTIAL (8-10):
+
  - Part directly enables task completion
  - Removing this part would cause task failure
  - Part contains critical constraints that prevent errors
  - Part defines required output format or structure
 
  HELPFUL (5-7):
+
  - Part improves output quality but is not strictly required
  - Part provides useful context that guides better decisions
  - Part contains preferences that affect style but not correctness
 
  MARGINAL (2-4):
+
  - Part has tangential relevance to the task
  - Part might occasionally be useful but usually is not
  - Part provides historical context rarely needed
 
  DISTRACTOR (0-1):
+
  - Part is irrelevant to the task
  - Part could confuse the agent about what to focus on
  - Part competes for attention without contributing value
- </SCORING_CRITERIA>
+ </SCORING_CRITERIA>
 
  <OUTPUT_FORMAT>
  RELEVANCE_SCORE: [0-10]
@@ -943,12 +981,14 @@ Apply the distractor threshold (score < 5):
  DISTRACTOR_ANALYSIS:
 
  Identified Distractors:
+
  1. PART: historical_context
  SCORE: 3/10
  JUSTIFICATION: "Migration history from Python 2.7 is rarely relevant to reviewing current code. The naming convention note is useful but should be in code_style_rules instead."
  RECOMMENDATION: REMOVE or RELOCATE
 
  Summary:
+
  - Total parts: 4
  - High-relevance parts (>=5): 3
  - Distractor parts (<5): 1
@@ -956,6 +996,7 @@ Summary:
  - Average relevance: 6.75
 
  Token Impact:
+
  - Distractor tokens: ~45 (historical_context)
  - Potential savings: 45 tokens (11% of prompt)
  ```
@@ -980,6 +1021,7 @@ OPTIMIZATION_RECOMMENDATIONS:
  Savings: ~15 tokens
 
  OPTIMIZED PROMPT STRUCTURE:
+
  - background (condensed): 8 tokens
  - code_style_rules (with snake_case added): 52 tokens
  - output_format: 28 tokens
@@ -991,13 +1033,14 @@ OPTIMIZED PROMPT STRUCTURE:
 
  The default threshold of 5 balances comprehensiveness against efficiency:
 
- | Threshold | Use Case |
- |-----------|----------|
- | < 3 | Aggressive pruning for token-constrained contexts |
- | < 5 | Standard optimization (recommended default) |
- | < 7 | Conservative pruning for critical prompts |
+ | Threshold | Use Case |
+ | --------- | ------------------------------------------------- |
+ | < 3 | Aggressive pruning for token-constrained contexts |
+ | < 5 | Standard optimization (recommended default) |
+ | < 7 | Conservative pruning for critical prompts |
 
  Adjust threshold based on:
+
  - **Context budget pressure**: Lower threshold when approaching limits
  - **Task criticality**: Higher threshold for production prompts
  - **Prompt stability**: Lower threshold for experimental prompts
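Applied mechanically, the score bands and the `score < threshold` rule look like this in Python. The helper names and all scores except historical_context's 3/10 are hypothetical.

```python
def score_band(score: int) -> str:
    """Map a 0-10 relevance score to the SCORING_CRITERIA bands."""
    if not 0 <= score <= 10:
        raise ValueError("score must be in 0..10")
    if score >= 8:
        return "ESSENTIAL"
    if score >= 5:
        return "HELPFUL"
    if score >= 2:
        return "MARGINAL"
    return "DISTRACTOR"


def distractors(scores: dict[str, int], threshold: int = 5) -> list[str]:
    """Return part IDs scoring below the threshold (5 by default)."""
    return [part for part, score in scores.items() if score < threshold]


# Illustrative scores; only historical_context's 3 comes from the example.
SCORES = {"background": 6, "code_style_rules": 9,
          "historical_context": 3, "output_format": 8}
```

With a `score < threshold` cutoff, raising the threshold flags more parts for removal: on these scores `threshold=7` flags two parts and `threshold=3` flags none.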
@@ -1008,14 +1051,16 @@ For efficiency, parallelize scoring agents:
 
  ```markdown
  # Parallel execution pattern
+
  spawn_parallel([
- scoring_agent(part_1, task_description),
- scoring_agent(part_2, task_description),
- scoring_agent(part_3, task_description),
- ...
+ scoring_agent(part_1, task_description),
+ scoring_agent(part_2, task_description),
+ scoring_agent(part_3, task_description),
+ ...
  ])
 
  # Collect and aggregate
+
  scores = await_all(scoring_agents)
  analysis = aggregate_scores(scores)
  ```
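The `spawn_parallel` / `await_all` pseudocode maps naturally onto `asyncio.gather` in Python, which runs the coroutines concurrently and returns results in submission order. In this sketch `scoring_agent` is a hypothetical offline stand-in that scores by word overlap so the example runs; a real version would call a model.

```python
import asyncio


async def scoring_agent(part: str, task_description: str) -> int:
    """Placeholder scorer: word overlap with the task, capped at 10."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    task_words = set(task_description.lower().split())
    overlap = len(task_words & set(part.lower().split()))
    return min(10, overlap)


async def score_all(parts: list[str], task_description: str) -> list[int]:
    """Score every part concurrently; results keep the order of `parts`."""
    return await asyncio.gather(
        *(scoring_agent(part, task_description) for part in parts)
    )
```

Call it from synchronous code with `asyncio.run(score_all(parts, task))`.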
@@ -1052,30 +1097,35 @@ Analyze the recent conversation history for signs of context degradation.
  Check for these degradation symptoms:
 
  LOST_IN_MIDDLE:
+
  - [ ] Agent missing instructions from early in conversation
  - [ ] Critical constraints being ignored
  - [ ] Agent asking for information already provided
 
  CONTEXT_POISONING:
+
  - [ ] Same error appearing repeatedly
  - [ ] Agent referencing incorrect information as fact
  - [ ] Hallucinations that persist despite correction
 
  CONTEXT_DISTRACTION:
+
  - [ ] Responses becoming unfocused
  - [ ] Agent using irrelevant context inappropriately
  - [ ] Quality declining on previously-successful tasks
 
  CONTEXT_CONFUSION:
+
  - [ ] Agent mixing up different task requirements
  - [ ] Wrong tool selections for obvious tasks
  - [ ] Outputs that blend requirements from different tasks
 
  CONTEXT_CLASH:
+
  - [ ] Agent expressing uncertainty about conflicting information
  - [ ] Inconsistent behavior between turns
  - [ ] Agent asking for clarification on resolved issues
- </SYMPTOM_CHECKLIST>
+ </SYMPTOM_CHECKLIST>
 
  <OUTPUT_FORMAT>
  HEALTH_STATUS: [HEALTHY | DEGRADED | CRITICAL]
@@ -1091,10 +1141,11 @@ Based on health status, trigger appropriate intervention:
1091
1141
 
1092
1142
  ```markdown
1093
1143
  IF HEALTH_STATUS == "DEGRADED" or HEALTH_STATUS == "CRITICAL":
1094
- <RESTART_INTERVENTION>
1095
- 1. Extract the essential state to preserve and save it to a file
1096
- 2. Ask the user to start a new session with a clean context, then load the preserved state from that file once the new session has started
1097
- </RESTART_INTERVENTION>
1144
+ <RESTART_INTERVENTION>
1145
+
1146
+ 1. Extract the essential state to preserve and save it to a file
1147
+ 2. Ask the user to start a new session with a clean context, then load the preserved state from that file once the new session has started
1148
+ </RESTART_INTERVENTION>
1098
1149
  ```
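The symptom checklist and the restart intervention can be wired together as follows. This TypeScript sketch is illustrative only. The prompt leaves the HEALTHY / DEGRADED / CRITICAL judgment to the agent, so the cutoffs used here (no affected category is healthy, three or more is critical) are assumptions. `EssentialState`, `healthStatus`, and `restartIntervention` are hypothetical names. Writing `fileContents` to disk is left to the caller.

```typescript
type Category =
  | "LOST_IN_MIDDLE"
  | "CONTEXT_POISONING"
  | "CONTEXT_DISTRACTION"
  | "CONTEXT_CONFUSION"
  | "CONTEXT_CLASH";

type HealthStatus = "HEALTHY" | "DEGRADED" | "CRITICAL";

// Checked symptoms per category, as reported by the health-check agent.
type Checklist = Partial<Record<Category, string[]>>;

// Assumed cutoffs: 0 affected categories is healthy, 3 or more is critical.
function healthStatus(checklist: Checklist): HealthStatus {
  const affected = Object.values(checklist).filter((s) => s !== undefined && s.length > 0).length;
  if (affected === 0) return "HEALTHY";
  return affected >= 3 ? "CRITICAL" : "DEGRADED";
}

// Hypothetical shape of the state worth carrying into a fresh session.
interface EssentialState {
  goal: string;
  decisions: string[];
  openTasks: string[];
}

interface RestartPlan {
  fileContents: string; // JSON to save at `path`
  userMessage: string;  // instruction shown to the user
}

// RESTART_INTERVENTION: only triggered for DEGRADED or CRITICAL.
function restartIntervention(status: HealthStatus, state: EssentialState, path: string): RestartPlan | null {
  if (status === "HEALTHY") return null;
  return {
    fileContents: JSON.stringify(state, null, 2),
    userMessage: `Context is ${status}. Start a new session and load ${path} to resume.`,
  };
}
```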
1099
1150
 
1100
1151
  ## Guidelines for Multi-Agent Verification
@@ -1153,16 +1204,19 @@ Observation masking replaces verbose tool outputs with compact references. The i
1153
1204
  Not all observations should be masked equally:
1154
1205
 
1155
1206
  **Never mask:**
1207
+
1156
1208
  - Observations critical to current task
1157
1209
  - Observations from the most recent turn
1158
1210
  - Observations used in active reasoning
1159
1211
 
1160
1212
  **Consider masking:**
1213
+
1161
1214
  - Observations from 3+ turns ago
1162
1215
  - Verbose outputs with key points extractable
1163
1216
  - Observations whose purpose has been served
1164
1217
 
1165
1218
  **Always mask:**
1219
+
1166
1220
  - Repeated outputs
1167
1221
  - Boilerplate headers/footers
1168
1222
  - Outputs already summarized in conversation
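The three masking tiers can be encoded as a single decision function. In this sketch the `Observation` fields are hypothetical bookkeeping. The tiers can overlap, for example a repeated output from the most recent turn. Checking the "never mask" rules first in those cases is an assumption.

```typescript
type MaskAction = "keep" | "consider" | "mask";

// Hypothetical metadata tracked for each tool observation.
interface Observation {
  turn: number;
  criticalToTask: boolean;
  usedInActiveReasoning: boolean;
  repeated: boolean;
  boilerplate: boolean;
  alreadySummarized: boolean;
  purposeServed: boolean;
  verboseWithKeyPoints: boolean;
}

function maskAction(obs: Observation, currentTurn: number): MaskAction {
  const age = currentTurn - obs.turn;
  // Never mask: critical to the task, from the most recent turn, or in active reasoning.
  if (obs.criticalToTask || obs.usedInActiveReasoning || age === 0) return "keep";
  // Always mask: repeated outputs, boilerplate, or content already summarized.
  if (obs.repeated || obs.boilerplate || obs.alreadySummarized) return "mask";
  // Consider masking: 3+ turns old, purpose served, or verbose with extractable key points.
  if (age >= 3 || obs.purposeServed || obs.verboseWithKeyPoints) return "consider";
  return "keep";
}
```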
@@ -1176,6 +1230,7 @@ This approach achieves separation of concerns--the detailed search context remai
1176
1230
 
1177
1231
  **When to Partition**
1178
1232
  Consider partitioning when:
1233
+
1179
1234
  - Task naturally decomposes into independent subtasks
1180
1235
  - Different subtasks require different specialized context
1181
1236
  - Context accumulation threatens to exceed limits
@@ -1183,22 +1238,24 @@ Consider partitioning when:
1183
1238
 
1184
1239
  **Result Aggregation**
1185
1240
  Aggregate results from partitioned subtasks by:
1241
+
1186
1242
  1. Validating all partitions completed
1187
1243
  2. Merging compatible results
1188
1244
  3. Summarizing if the combined results are still too large
1189
1245
  4. Resolving conflicts between partition outputs
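The four aggregation steps might look like this when sub-agents report key/value findings. This is a sketch under stated assumptions. `PartitionResult` is a hypothetical shape, and truncation stands in for real summarization. Step 4 is reduced to detecting conflicting values and surfacing them for the coordinator rather than resolving them automatically.

```typescript
// Hypothetical result reported by one partitioned subtask.
interface PartitionResult {
  partition: string;
  done: boolean;
  findings: Record<string, string>;
}

interface Aggregate {
  merged: Record<string, string>;      // keys with a single consistent value
  conflicts: Record<string, string[]>; // keys with differing values, left for the coordinator
  summarized: boolean;
}

function aggregate(results: PartitionResult[], expected: string[], maxChars: number): Aggregate {
  // 1. Validate that every expected partition reported back and finished.
  const missing = expected.filter((p) => !results.some((r) => r.partition === p && r.done));
  if (missing.length > 0) {
    throw new Error(`incomplete partitions: ${missing.join(", ")}`);
  }

  // Collect the distinct values reported for each key across partitions.
  const values: Record<string, string[]> = {};
  for (const r of results) {
    for (const [key, value] of Object.entries(r.findings)) {
      if (!values[key]) values[key] = [];
      if (!values[key].includes(value)) values[key].push(value);
    }
  }

  // 2. Merge compatible results; 4. surface conflicting ones.
  const merged: Record<string, string> = {};
  const conflicts: Record<string, string[]> = {};
  for (const [key, vals] of Object.entries(values)) {
    if (vals.length === 1) merged[key] = vals[0];
    else conflicts[key] = vals;
  }

  // 3. Summarize if still too large; truncation is a stand-in for a real summarizer.
  let summarized = false;
  if (JSON.stringify(merged).length > maxChars) {
    for (const key of Object.keys(merged)) {
      if (merged[key].length > 40) merged[key] = merged[key].slice(0, 40) + "...";
    }
    summarized = true;
  }

  return { merged, conflicts, summarized };
}
```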
1190
1246
 
1191
-
1192
1247
  ## Practical Guidance
1193
1248
 
1194
1249
  ### Optimization Decision Framework
1195
1250
 
1196
1251
  **When to optimize:**
1252
+
1197
1253
  - Response quality degrades as conversations extend
1198
1254
  - Costs increase due to long contexts
1199
1255
  - Latency increases with conversation length
1200
1256
 
1201
1257
  **What to apply:**
1258
+
1202
1259
  - Tool outputs dominate: observation masking
1203
1260
  - Retrieved documents dominate: summarization or partitioning
1204
1261
  - Message history dominates: compaction with summarization
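The "what to apply" guidance amounts to a lookup keyed on whichever context component is largest. This sketch assumes the caller can attribute token counts to each component. `ContextBreakdown` and `pickStrategy` are hypothetical names. Ties resolve to the first key in declaration order.

```typescript
// Token counts per context component; measuring them is left to the caller.
interface ContextBreakdown {
  toolOutputs: number;
  retrievedDocs: number;
  messageHistory: number;
}

// Mapping taken from the "What to apply" list.
const STRATEGY: Record<keyof ContextBreakdown, string> = {
  toolOutputs: "observation masking",
  retrievedDocs: "summarization or partitioning",
  messageHistory: "compaction with summarization",
};

function pickStrategy(ctx: ContextBreakdown): string {
  const keys = Object.keys(STRATEGY) as (keyof ContextBreakdown)[];
  // Strict ">" keeps the earlier key on ties.
  const dominant = keys.reduce((a, b) => (ctx[b] > ctx[a] ? b : a));
  return STRATEGY[dominant];
}
```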
@@ -1208,29 +1265,41 @@ Aggregate results from partitioned subtasks by:
1208
1265
 
1209
1266
  **Command Optimization**
1210
1267
  Commands load on demand, so keep each individual command narrowly scoped:
1268
+
1211
1269
  ```markdown
1212
1270
  # Good: Focused command with clear scope
1271
+
1213
1272
  ---
1273
+
1214
1274
  name: review-security
1215
1275
  description: Review code for security vulnerabilities
1276
+
1216
1277
  ---
1278
+
1217
1279
  # Specific security review instructions only
1218
1280
 
1219
1281
  # Avoid: Overloaded command trying to do everything
1282
+
1220
1283
  ---
1284
+
1221
1285
  name: review-all
1222
1286
  description: Review code for everything
1287
+
1223
1288
  ---
1289
+
1224
1290
  # 50 different review checklists crammed together
1225
1291
  ```
1226
1292
 
1227
1293
  **Skill Optimization**
1228
1294
  Skills load their descriptions by default, so descriptions must be concise:
1295
+
1229
1296
  ```markdown
1230
1297
  # Good: Concise description
1298
+
1231
1299
  description: Analyze code architecture. Use for design reviews.
1232
1300
 
1233
1301
  # Avoid: Verbose description that wastes context budget
1302
+
1234
1303
  description: This skill provides comprehensive analysis of code
1235
1304
  architecture including but not limited to class hierarchies,
1236
1305
  dependency graphs, coupling metrics, cohesion analysis...
@@ -1238,12 +1307,15 @@ dependency graphs, coupling metrics, cohesion analysis...
1238
1307
 
1239
1308
  **Sub-Agent Context Design**
1240
1309
  When spawning sub-agents, provide focused context:
1310
+
1241
1311
  ```markdown
1242
1312
  # Coordinator provides minimal handoff:
1313
+
1243
1314
  "Review authentication module for security issues.
1244
1315
  Return findings in structured format."
1245
1316
 
1246
1317
  # NOT this verbose handoff:
1318
+
1247
1319
  "I need you to look at the authentication module which is
1248
1320
  located in src/auth/ and contains several files including
1249
1321
  login.ts, session.ts, tokens.ts... [500 more tokens of context]"