opencode-multiagent 0.2.1 → 0.4.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/AGENTS.md +83 -0
- package/CHANGELOG.md +31 -0
- package/CONTRIBUTING.md +36 -0
- package/README.md +44 -168
- package/README.tr.md +84 -0
- package/RELEASE.md +68 -0
- package/agents/AGENTS.md +91 -0
- package/agents/auditor.md +67 -23
- package/agents/{worker.md → coder.md} +24 -17
- package/agents/docmaster.md +91 -0
- package/agents/executor.md +63 -79
- package/agents/planner.md +78 -58
- package/agents/reviewer.md +31 -15
- package/agents/scout.md +25 -17
- package/agents/sec-coder.md +83 -0
- package/agents/ui-coder.md +77 -0
- package/commands/board.md +17 -0
- package/commands/execute.md +9 -7
- package/commands/init-deep.md +7 -6
- package/commands/init.md +5 -5
- package/commands/inspect.md +6 -5
- package/commands/plan.md +8 -6
- package/commands/quality.md +4 -3
- package/commands/review.md +5 -3
- package/commands/status.md +5 -3
- package/defaults/AGENTS.md +48 -0
- package/defaults/opencode-multiagent.json +180 -0
- package/defaults/opencode-multiagent.schema.json +265 -0
- package/dist/control-plane.d.ts +4 -0
- package/dist/control-plane.d.ts.map +1 -0
- package/dist/index.d.ts +5 -0
- package/dist/index.d.ts.map +1 -0
- package/dist/index.js +1916 -0
- package/dist/opencode-multiagent/compiler.d.ts +25 -0
- package/dist/opencode-multiagent/compiler.d.ts.map +1 -0
- package/dist/opencode-multiagent/constants.d.ts +128 -0
- package/dist/opencode-multiagent/constants.d.ts.map +1 -0
- package/dist/opencode-multiagent/correlation.d.ts +21 -0
- package/dist/opencode-multiagent/correlation.d.ts.map +1 -0
- package/dist/opencode-multiagent/defaults.d.ts +10 -0
- package/dist/opencode-multiagent/defaults.d.ts.map +1 -0
- package/dist/opencode-multiagent/hooks.d.ts +62 -0
- package/dist/opencode-multiagent/hooks.d.ts.map +1 -0
- package/dist/opencode-multiagent/log.d.ts +2 -0
- package/dist/opencode-multiagent/log.d.ts.map +1 -0
- package/dist/opencode-multiagent/markdown.d.ts +8 -0
- package/dist/opencode-multiagent/markdown.d.ts.map +1 -0
- package/dist/opencode-multiagent/mcp.d.ts +3 -0
- package/dist/opencode-multiagent/mcp.d.ts.map +1 -0
- package/dist/opencode-multiagent/policy.d.ts +5 -0
- package/dist/opencode-multiagent/policy.d.ts.map +1 -0
- package/dist/opencode-multiagent/quality.d.ts +18 -0
- package/dist/opencode-multiagent/quality.d.ts.map +1 -0
- package/dist/opencode-multiagent/runtime.d.ts +7 -0
- package/dist/opencode-multiagent/runtime.d.ts.map +1 -0
- package/dist/opencode-multiagent/session-tracker.d.ts +32 -0
- package/dist/opencode-multiagent/session-tracker.d.ts.map +1 -0
- package/dist/opencode-multiagent/skills.d.ts +17 -0
- package/dist/opencode-multiagent/skills.d.ts.map +1 -0
- package/dist/opencode-multiagent/supervision.d.ts +26 -0
- package/dist/opencode-multiagent/supervision.d.ts.map +1 -0
- package/dist/opencode-multiagent/task-manager.d.ts +54 -0
- package/dist/opencode-multiagent/task-manager.d.ts.map +1 -0
- package/dist/opencode-multiagent/telemetry.d.ts +28 -0
- package/dist/opencode-multiagent/telemetry.d.ts.map +1 -0
- package/dist/opencode-multiagent/tools.d.ts +87 -0
- package/dist/opencode-multiagent/tools.d.ts.map +1 -0
- package/dist/opencode-multiagent/types.d.ts +36 -0
- package/dist/opencode-multiagent/types.d.ts.map +1 -0
- package/dist/opencode-multiagent/utils.d.ts +9 -0
- package/dist/opencode-multiagent/utils.d.ts.map +1 -0
- package/docs/agents.md +148 -0
- package/docs/agents.tr.md +149 -0
- package/docs/configuration.md +244 -0
- package/docs/configuration.tr.md +244 -0
- package/docs/usage-guide.md +224 -0
- package/docs/usage-guide.tr.md +225 -0
- package/examples/opencode.with-overrides.json +3 -7
- package/package.json +23 -13
- package/skills/AGENTS.md +51 -0
- package/skills/advanced-evaluation/SKILL.md +37 -21
- package/skills/advanced-evaluation/manifest.json +2 -13
- package/skills/cek-context-engineering/SKILL.md +159 -87
- package/skills/cek-context-engineering/manifest.json +1 -3
- package/skills/cek-prompt-engineering/SKILL.md +13 -10
- package/skills/cek-prompt-engineering/manifest.json +1 -3
- package/skills/cek-test-prompt/SKILL.md +38 -28
- package/skills/cek-test-prompt/manifest.json +1 -3
- package/skills/cek-thought-based-reasoning/SKILL.md +75 -21
- package/skills/cek-thought-based-reasoning/manifest.json +1 -3
- package/skills/context-degradation/SKILL.md +14 -13
- package/skills/context-degradation/manifest.json +1 -3
- package/skills/debate/SKILL.md +23 -78
- package/skills/debate/manifest.json +2 -12
- package/skills/design-first/manifest.json +2 -13
- package/skills/dispatching-parallel-agents/SKILL.md +14 -3
- package/skills/dispatching-parallel-agents/manifest.json +1 -4
- package/skills/drift-analysis/SKILL.md +50 -29
- package/skills/drift-analysis/manifest.json +2 -12
- package/skills/evaluation/manifest.json +2 -12
- package/skills/executing-plans/SKILL.md +15 -8
- package/skills/executing-plans/manifest.json +1 -3
- package/skills/handoff-protocols/manifest.json +2 -12
- package/skills/parallel-investigation/SKILL.md +25 -12
- package/skills/parallel-investigation/manifest.json +1 -4
- package/skills/reflexion-critique/SKILL.md +21 -10
- package/skills/reflexion-critique/manifest.json +1 -3
- package/skills/reflexion-reflect/SKILL.md +36 -34
- package/skills/reflexion-reflect/manifest.json +2 -10
- package/skills/root-cause-analysis/manifest.json +2 -13
- package/skills/sadd-judge-with-debate/SKILL.md +50 -26
- package/skills/sadd-judge-with-debate/manifest.json +1 -3
- package/skills/structured-code-review/manifest.json +2 -11
- package/skills/task-decomposition/manifest.json +2 -13
- package/skills/verification-before-completion/manifest.json +2 -15
- package/skills/verification-gates/SKILL.md +27 -19
- package/skills/verification-gates/manifest.json +2 -12
- package/agents/advisor.md +0 -57
- package/agents/critic.md +0 -127
- package/agents/deep-worker.md +0 -65
- package/agents/devil.md +0 -36
- package/agents/heavy-worker.md +0 -68
- package/agents/lead.md +0 -155
- package/agents/librarian.md +0 -62
- package/agents/qa.md +0 -50
- package/agents/quick.md +0 -65
- package/agents/scribe.md +0 -78
- package/agents/strategist.md +0 -63
- package/agents/ui-heavy-worker.md +0 -62
- package/agents/ui-worker.md +0 -69
- package/agents/validator.md +0 -47
- package/defaults/agent-settings.json +0 -102
- package/defaults/agent-settings.schema.json +0 -25
- package/defaults/flags.json +0 -35
- package/defaults/flags.schema.json +0 -119
- package/defaults/mcp-defaults.json +0 -47
- package/defaults/mcp-defaults.schema.json +0 -38
- package/defaults/profiles.json +0 -53
- package/defaults/profiles.schema.json +0 -60
- package/defaults/team-profiles.json +0 -83
- package/src/control-plane.ts +0 -21
- package/src/index.ts +0 -8
- package/src/opencode-multiagent/compiler.ts +0 -168
- package/src/opencode-multiagent/constants.ts +0 -178
- package/src/opencode-multiagent/file-lock.ts +0 -90
- package/src/opencode-multiagent/hooks.ts +0 -599
- package/src/opencode-multiagent/log.ts +0 -12
- package/src/opencode-multiagent/mailbox.ts +0 -287
- package/src/opencode-multiagent/markdown.ts +0 -99
- package/src/opencode-multiagent/mcp.ts +0 -35
- package/src/opencode-multiagent/policy.ts +0 -67
- package/src/opencode-multiagent/quality.ts +0 -140
- package/src/opencode-multiagent/runtime.ts +0 -55
- package/src/opencode-multiagent/skills.ts +0 -144
- package/src/opencode-multiagent/supervision.ts +0 -156
- package/src/opencode-multiagent/task-manager.ts +0 -148
- package/src/opencode-multiagent/team-manager.ts +0 -219
- package/src/opencode-multiagent/team-tools.ts +0 -359
- package/src/opencode-multiagent/telemetry.ts +0 -124
- package/src/opencode-multiagent/utils.ts +0 -54
@@ -83,7 +83,7 @@ The file system itself provides structure that agents can navigate. File sizes s
 
 ### Hybrid Strategies
 
-The most effective agents employ hybrid strategies. Pre-load some context for speed (like
+The most effective agents employ hybrid strategies. Pre-load some context for speed (like AGENTS.md files or project rules), but enable autonomous exploration for additional context as needed. The decision boundary depends on task characteristics and context dynamics.
 
 For contexts with less dynamic content, pre-loading more upfront makes sense. For rapidly changing or highly specific information, just-in-time loading avoids stale context.
 
@@ -96,6 +96,7 @@ Effective context budgeting requires understanding not just raw token counts but
 ## Examples
 
 **Example 1: Organizing System Prompts**
+
 ```markdown
 <BACKGROUND_INFORMATION>
 You are a Python expert helping a development team.
@@ -121,23 +122,29 @@ Explain the reasoning behind suggestions.
 ```
 
 **Example 2: Progressive Document Loading**
+
 ```markdown
 # Instead of loading all documentation at once:
 
 # Step 1: Load summary
-
+
+docs/architecture_overview.md # Lightweight overview
 
 # Step 2: Load specific section as needed
-
-docs/
+
+docs/api/endpoints.md # Only when API work needed
+docs/database/schemas.md # Only when data layer work needed
 ```
 
 **Example 3: Skill Description Design**
+
 ```markdown
 # Bad: Vague description that loads into context but provides little signal
+
 description: Helps with code things
 
 # Good: Specific description that helps model decide when to activate
+
 description: Analyze code quality and suggest refactoring patterns. Use when reviewing pull requests or improving existing code structure.
 ```
 
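The progressive loading pattern in Example 2 above can be sketched in Python. This is a minimal sketch, not code from the package: the three paths are the ones shown in the hunk, while the `"api"` and `"database"` keys and the `docs_to_load` helper are hypothetical names introduced for illustration.

```python
# Paths are taken from Example 2 above; the mapping keys are hypothetical
# labels for the kind of work a task involves.
SUMMARY_DOC = "docs/architecture_overview.md"
SECTION_DOCS = {
    "api": "docs/api/endpoints.md",
    "database": "docs/database/schemas.md",
}


def docs_to_load(work_kinds):
    """Return the summary first, then only the sections this task needs."""
    selected = [SUMMARY_DOC]  # Step 1: always load the lightweight overview
    for kind in work_kinds:   # Step 2: add a section only when it is needed
        path = SECTION_DOCS.get(kind)
        if path is not None and path not in selected:
            selected.append(path)
    return selected


print(docs_to_load(["api"]))
```

Unknown kinds of work are ignored, so a task that touches neither area loads only the overview.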
@@ -273,64 +280,76 @@ Implement these strategies through specific architectural patterns. Use just-in-
 ## Examples
 
 **Example 1: Detecting Degradation in Prompt Design**
+
 ```markdown
 # Signs your command/skill prompt may be too large:
 
 Early signs (context ~50-70% utilized):
+
 - Agent occasionally misses instructions
 - Responses become less focused
 - Some guidelines ignored
 
 Warning signs (context ~70-85% utilized):
+
 - Inconsistent behavior across runs
 - Agent "forgets" earlier instructions
 - Quality varies significantly
 
 Critical signs (context >85% utilized):
+
 - Agent ignores key constraints
 - Hallucinations increase
 - Task completion fails
 ```
 
 **Example 2: Mitigating Lost-in-Middle in Prompt Structure**
+
 ```markdown
 # Organize prompts with critical info at edges
 
-<CRITICAL_CONSTRAINTS>
+<CRITICAL_CONSTRAINTS> # At start (high attention)
+
 - Never modify production files directly
 - Always run tests before committing
 - Maximum file size: 500 lines
-</CRITICAL_CONSTRAINTS>
+</CRITICAL_CONSTRAINTS>
+
+<DETAILED_GUIDELINES> # Middle (lower attention)
 
-<DETAILED_GUIDELINES> # Middle (lower attention)
 - Code style preferences
 - Documentation templates
 - Review checklists
 - Example patterns
-</DETAILED_GUIDELINES>
+</DETAILED_GUIDELINES>
+
+<KEY_REMINDERS> # At end (high attention)
 
-<KEY_REMINDERS> # At end (high attention)
 - Run tests: npm run test
 - Format code: npm run format
 - Create PR with description
-</KEY_REMINDERS>
+</KEY_REMINDERS>
 ```
 
 **Example 3: Sub-Agent Context Isolation**
+
 ```markdown
 # Instead of one agent handling everything:
 
 ## Coordinator Agent (lean context)
+
 - Understands task decomposition
 - Delegates to specialized sub-agents
 - Synthesizes results
 
 ## Code Review Sub-Agent (isolated context)
+
 - Loaded only with code review guidelines
 - Focuses solely on review task
 - Returns structured findings
 
 ## Test Writer Sub-Agent (isolated context)
+
 - Loaded only with testing patterns
 - Focuses solely on test creation
 - Returns test files
@@ -378,12 +397,13 @@ Extract all factual claims from the following output. List each claim on a separ
 </TASK>
 
 <FOCUS_AREAS>
+
 - File paths and their existence
 - Function/class/method names referenced
 - Code behavior assertions ("this function returns X")
 - External facts about APIs, libraries, or specifications
 - Numerical values and metrics
-</FOCUS_AREAS>
+</FOCUS_AREAS>
 
 <OUTPUT_TO_ANALYZE>
 {agent_output}
@@ -412,11 +432,12 @@ Verify this claim by checking the actual codebase and context.
 </CLAIM>
 
 <VERIFICATION_APPROACH>
+
 - For file paths: Use file tools to check existence
 - For code claims: Read the actual code and verify behavior
 - For external facts: Cross-reference with documentation or web search
 - For metrics: Analyze the code structure
-</VERIFICATION_APPROACH>
+</VERIFICATION_APPROACH>
 
 <RESPONSE_FORMAT>
 STATUS: [VERIFIED | FALSE | UNVERIFIABLE]
@@ -452,10 +473,11 @@ Specific issues:
 {list of FALSE and UNVERIFIABLE claims with evidence}
 
 Please regenerate your response. For each factual claim:
+
 1. Explicitly verify it using tools before stating it
 2. If you cannot verify, state "I cannot verify..." instead of asserting
 3. Cite the specific file/line/source for verifiable facts
-</REGENERATION_PROMPT>
+</REGENERATION_PROMPT>
 ```
 
 ## Lost-in-Middle Detection Workflow
@@ -477,6 +499,7 @@ Extract all critical instructions from your prompt that the agent MUST follow:
 
 ```markdown
 Critical instructions to verify:
+
 1. "Never modify files in /production"
 2. "Always run tests before committing"
 3. "Use TypeScript strict mode"
@@ -497,10 +520,11 @@ Prompt: {your_full_prompt_being_tested}
 Task: {representative_task_that_exercises_all_instructions}
 
 For each run, save:
+
 - run_id: unique identifier
 - agent_output: complete response from agent
 - timestamp: when run completed
-</AGENT_RUN_CONFIG>
+</AGENT_RUN_CONFIG>
 ```
 
 **Step 3: Verify Each Output Against Original Prompt**
@@ -527,16 +551,18 @@ You are a compliance verification agent. Analyze whether the agent output follow
 
 <VERIFICATION_APPROACH>
 For each critical instruction:
+
 1. Determine if the instruction was applicable to this task
 2. If applicable, check whether the output complies
 3. Look for both explicit violations and omissions
 4. Note any partial compliance
-</VERIFICATION_APPROACH>
+</VERIFICATION_APPROACH>
 
 <OUTPUT_FORMAT>
 RUN_ID: {run_id}
 
 INSTRUCTION_COMPLIANCE:
+
 - Instruction 1: "Never modify files in /production"
 STATUS: [FOLLOWED | VIOLATED | NOT_APPLICABLE]
 EVIDENCE: {quote from output or explanation}
@@ -548,11 +574,12 @@ INSTRUCTION_COMPLIANCE:
 [... continue for all instructions ...]
 
 SUMMARY:
+
 - Instructions followed: {count}
 - Instructions violated: {count}
 - Not applicable: {count}
-</OUTPUT_FORMAT>
-</VERIFICATION_AGENT_PROMPT>
+</OUTPUT_FORMAT>
+</VERIFICATION_AGENT_PROMPT>
 ```
 
 **Step 4: Aggregate Results and Identify At-Risk Parts**
@@ -562,18 +589,19 @@ Collect verification results from all runs and identify instructions that were i
 ```markdown
 <AGGREGATION_LOGIC>
 For each instruction:
-
-
-
+followed_count = number of runs where STATUS == FOLLOWED
+violated_count = number of runs where STATUS == VIOLATED
+applicable_runs = total_runs - (runs where STATUS == NOT_APPLICABLE)
+
+compliance_rate = followed_count / applicable_runs
 
-
+Classification:
 
-
-
-
-
-
-- compliance_rate == 0.0: ALWAYS_IGNORED (critical failure)
+- compliance_rate == 1.0: RELIABLE (always followed)
+- compliance_rate >= 0.8: MOSTLY_RELIABLE (minor inconsistency)
+- compliance_rate >= 0.5: AT_RISK (inconsistent - likely lost-in-middle)
+- compliance_rate < 0.5: FREQUENTLY_IGNORED (severe issue)
+- compliance_rate == 0.0: ALWAYS_IGNORED (critical failure)
 
 AT_RISK instructions are the primary signal for lost-in-middle problems.
 These are instructions that work sometimes but not consistently, indicating
@@ -583,22 +611,23 @@ they are in attention-weak positions.
 <AGGREGATION_OUTPUT_FORMAT>
 INSTRUCTION COMPLIANCE SUMMARY:
 
-| Instruction
-
-| 1. Never modify /production | 5/5
-| 2. Run tests before commit
-| 3. TypeScript strict mode
-| 4. Max function length 50
-| 5. Include JSDoc
-| 6. Format as JSON
-| 7. Log modifications
+| Instruction | Followed | Violated | Compliance Rate | Status |
+| --------------------------- | -------- | -------- | --------------- | ------------------ |
+| 1. Never modify /production | 5/5 | 0/5 | 100% | RELIABLE |
+| 2. Run tests before commit | 3/5 | 2/5 | 60% | AT_RISK |
+| 3. TypeScript strict mode | 4/5 | 1/5 | 80% | MOSTLY_RELIABLE |
+| 4. Max function length 50 | 2/5 | 3/5 | 40% | FREQUENTLY_IGNORED |
+| 5. Include JSDoc | 5/5 | 0/5 | 100% | RELIABLE |
+| 6. Format as JSON | 1/5 | 4/5 | 20% | ALWAYS_IGNORED |
+| 7. Log modifications | 3/5 | 2/5 | 60% | AT_RISK |
 
 AT-RISK INSTRUCTIONS (likely in lost-in-middle zone):
+
 - Instruction 2: "Run tests before commit" (60% compliance)
 - Instruction 4: "Max function length 50" (40% compliance)
 - Instruction 6: "Format as JSON" (20% compliance)
 - Instruction 7: "Log modifications" (60% compliance)
-</AGGREGATION_OUTPUT_FORMAT>
+</AGGREGATION_OUTPUT_FORMAT>
 ```
 
 **Step 5: Output Recommendations**
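The Step 4 aggregation rules shown in the hunks above (the `compliance_rate` formula and the five classification labels) can be sketched in Python. This is a minimal sketch under stated assumptions, not code from the package: the `classify` function and its `(rate, label)` return shape are hypothetical, and the handling of an instruction with no applicable runs is an assumption, since the source does not cover that case. The listed rules put `== 0.0` after `< 0.5`, where it would never be reached if evaluated top to bottom, so the sketch checks the exact `0.0` case first.

```python
def classify(statuses):
    """Return (compliance_rate, label) for one instruction across runs."""
    followed = statuses.count("FOLLOWED")
    applicable = len(statuses) - statuses.count("NOT_APPLICABLE")
    if applicable == 0:
        # Assumption: the source does not say what to do when no run applies.
        return None, "NOT_APPLICABLE"
    rate = followed / applicable
    if rate == 1.0:
        return rate, "RELIABLE"
    if rate == 0.0:
        # Checked before the "< 0.5" rule so this label stays reachable.
        return rate, "ALWAYS_IGNORED"
    if rate >= 0.8:
        return rate, "MOSTLY_RELIABLE"
    if rate >= 0.5:
        return rate, "AT_RISK"
    return rate, "FREQUENTLY_IGNORED"


runs = ["FOLLOWED", "VIOLATED", "FOLLOWED", "NOT_APPLICABLE", "FOLLOWED"]
print(classify(runs))  # (0.75, 'AT_RISK'): 3 of 4 applicable runs followed
```

Under these thresholds an instruction followed in 1 of 5 runs (20%) comes out as FREQUENTLY_IGNORED, which differs from the ALWAYS_IGNORED label on row 6 of the sample table.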
@@ -626,10 +655,10 @@ SPECIFIC RECOMMENDATIONS:
 Restructure at-risk instructions with emphasis:
 
 Before: "Always run tests before committing"
-After:
+After: "**CRITICAL:** You MUST run tests before committing. Never skip this step."
 
 Before: "Maximum function length: 50 lines"
-After:
+After: "3. [REQUIRED] Maximum function length: 50 lines"
 
 Use numbered lists, bold markers, or explicit tags like [REQUIRED], [CRITICAL], [MUST].
 
@@ -644,7 +673,7 @@ SPECIFIC RECOMMENDATIONS:
 - Moving 2-3 most critical items to edges
 - Converting remaining middle items to a numbered checklist
 - Adding explicit "verify these items" reminder at end
-</RECOMMENDATIONS_OUTPUT>
+</RECOMMENDATIONS_OUTPUT>
 ```
 
 ### Complete Workflow Example
@@ -653,31 +682,37 @@ SPECIFIC RECOMMENDATIONS:
 # Example: Testing a Code Review Command
 
 ## Original Prompt Being Tested:
+
 "Review the code for: security issues, performance problems,
 code style, test coverage, documentation completeness,
 error handling, and logging practices."
 
 ## Run 5 Agents:
+
 Each agent reviews the same code sample with this prompt.
 
 ## Verification Results:
-
-
-|
-|
-|
-|
-|
-|
-|
+
+| Instruction | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Rate |
+| -------------- | ----- | ----- | ----- | ----- | ----- | ---- |
+| Security | Y | Y | Y | Y | Y | 100% |
+| Performance | Y | X | Y | X | Y | 60% |
+| Code style | X | X | Y | X | X | 20% |
+| Test coverage | X | Y | X | X | Y | 40% |
+| Documentation | X | X | X | Y | X | 20% |
+| Error handling | Y | Y | X | Y | Y | 80% |
+| Logging | Y | Y | Y | Y | Y | 100% |
 
 ## Analysis:
+
 - RELIABLE: Security, Logging (at edges of list)
 - AT_RISK: Performance, Error handling
 - FREQUENTLY_IGNORED: Code style, Test coverage, Documentation (middle of list)
 
 ## Remediation Applied:
+
 "**CRITICAL REVIEW AREAS:**
+
 1. Security vulnerabilities
 2. Test coverage gaps
 3. Documentation completeness
@@ -706,6 +741,7 @@ Record the output of each agent in your chain:
 
 ```markdown
 Agent Chain Record:
+
 - Agent 1 (Analyzer): {output_1}
 - Agent 2 (Planner): {output_2}
 - Agent 3 (Implementer): {output_3}
@@ -758,11 +794,12 @@ Agent 4 Output: {output_4}
 
 <ANALYSIS_APPROACH>
 For each agent output (starting from the last):
+
 1. Does this output contain the error?
 2. If yes, was the error present in the input to this agent?
 3. If error is in output but not input: This agent INTRODUCED the error
 4. If error is in both: This agent PROPAGATED the error
-</ANALYSIS_APPROACH>
+</ANALYSIS_APPROACH>
 
 <OUTPUT_FORMAT>
 ERROR: {error_id}
@@ -803,7 +840,7 @@ After Agent {N} completes:
 - Or: Regenerate Agent N output with explicit guidance
 
 3. Only proceed to Agent {N+1} if verification passes
-</ERROR_BOUNDARY_TEMPLATE>
+</ERROR_BOUNDARY_TEMPLATE>
 ```
 
 ## Context Relevance Scoring Workflow
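The four-step `<ANALYSIS_APPROACH>` in the error-tracing hunk above (walk the chain from the last agent and label each one INTRODUCED or PROPAGATED) can be sketched in Python. This is a minimal sketch, not code from the package: `trace_error` and the `has_error` predicate are hypothetical names, and the sketch assumes that Agent N's input is exactly Agent N-1's output and that the first agent's input is error-free.

```python
def trace_error(outputs, has_error):
    """Label each agent whose output contains the error.

    outputs: agent outputs in chain order (Agent 1 first).
    has_error: hypothetical predicate telling whether a text contains the error.
    """
    roles = {}
    for index in range(len(outputs) - 1, -1, -1):  # start from the last agent
        if not has_error(outputs[index]):
            continue
        # Assumption: Agent N's input is Agent N-1's output, and the
        # first agent's input is error-free.
        in_input = index > 0 and has_error(outputs[index - 1])
        roles[index + 1] = "PROPAGATED" if in_input else "INTRODUCED"
    return roles


chain = ["analyze utils.py", "plan edit to helper.py", "edit helper.py", "test helper.py"]
print(trace_error(chain, lambda text: "helper.py" in text))
# {4: 'PROPAGATED', 3: 'PROPAGATED', 2: 'INTRODUCED'}
```

In the sample chain the wrong file name first appears in Agent 2's output, so Agent 2 is the one to fix and Agents 3 and 4 only carried the error forward.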
@@ -813,7 +850,7 @@ Not all parts of a prompt contribute equally to task completion. This workflow i
 ### When to Use
 
 - When optimizing prompt length and content
-- When deciding what to include in
+- When deciding what to include in AGENTS.md
 - When a prompt feels bloated but you are unsure what to cut
 - When debugging agents that ignore provided context
 - Before deploying new commands, skills, or agent prompts
@@ -827,35 +864,32 @@ Divide the prompt (command/skill/agent) into logical sections. Each part should
 ```markdown
 <PROMPT_PARTS>
 PART_1:
-
-
-
-
+ID: background
+CONTENT: |
+You are a Python expert helping a development team.
+Current project: Data processing pipeline in Python 3.9+
 
 PART_2:
-
-
-- Write clean, idiomatic Python code
-- Include type hints for function signatures
-- Add docstrings for public functions
-- Follow PEP 8 style guidelines
+ID: code_style_rules
+CONTENT: | - Write clean, idiomatic Python code - Include type hints for function signatures - Add docstrings for public functions - Follow PEP 8 style guidelines
 
 PART_3:
-
-
-
-
-
+ID: historical_context
+CONTENT: |
+The project was migrated from Python 2.7 in 2019.
+Original team used camelCase naming but we now use snake_case.
+Legacy modules in /legacy folder are frozen.
 
 PART_4:
-
-
-
-
+ID: output_format
+CONTENT: |
+Provide actionable feedback with specific line references.
+Explain the reasoning behind suggestions.
 </PROMPT_PARTS>
 ```
 
 Splitting guidelines:
+
 - Each XML section or Markdown header becomes a part
 - Separate conceptually distinct instructions into their own parts
 - Keep related instructions together (do not split mid-thought)
@@ -883,26 +917,30 @@ Example: "Review a pull request for code quality issues and suggest improvements
 Score 0-10 based on these criteria:
 
 ESSENTIAL (8-10):
+
 - Part directly enables task completion
 - Removing this part would cause task failure
 - Part contains critical constraints that prevent errors
 - Part defines required output format or structure
 
 HELPFUL (5-7):
+
 - Part improves output quality but is not strictly required
 - Part provides useful context that guides better decisions
 - Part contains preferences that affect style but not correctness
 
 MARGINAL (2-4):
+
 - Part has tangential relevance to the task
 - Part might occasionally be useful but usually is not
 - Part provides historical context rarely needed
 
 DISTRACTOR (0-1):
+
 - Part is irrelevant to the task
 - Part could confuse the agent about what to focus on
 - Part competes for attention without contributing value
-</SCORING_CRITERIA>
+</SCORING_CRITERIA>
 
 <OUTPUT_FORMAT>
 RELEVANCE_SCORE: [0-10]
@@ -943,12 +981,14 @@ Apply the distractor threshold (score < 5):
 DISTRACTOR_ANALYSIS:
 
 Identified Distractors:
+
 1. PART: historical_context
 SCORE: 3/10
 JUSTIFICATION: "Migration history from Python 2.7 is rarely relevant to reviewing current code. The naming convention note is useful but should be in code_style_rules instead."
 RECOMMENDATION: REMOVE or RELOCATE
 
 Summary:
+
 - Total parts: 4
 - High-relevance parts (>=5): 3
 - Distractor parts (<5): 1
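The distractor threshold applied in the hunk above (score < 5) can be sketched in Python. This is a minimal sketch, not code from the package: `distractor_analysis` is a hypothetical helper, and only the 3/10 score for `historical_context` comes from the source. The other three scores are hypothetical, chosen so the summary matches the counts shown above and the 6.75 average reported in the following hunk.

```python
def distractor_analysis(scores, threshold=5):
    """Split parts at the threshold and summarize the result."""
    distractors = [part for part, score in scores.items() if score < threshold]
    return {
        "total_parts": len(scores),
        "high_relevance": len(scores) - len(distractors),
        "distractors": distractors,
        "average_relevance": sum(scores.values()) / len(scores),
    }


# Only historical_context's score of 3 comes from the source; the other
# three scores are hypothetical.
scores = {
    "background": 8,
    "code_style_rules": 9,
    "historical_context": 3,
    "output_format": 7,
}
print(distractor_analysis(scores))
```

Lowering `threshold` to 3 would keep `historical_context`, and raising it to 8 would also flag `output_format` with these sample scores.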
@@ -956,6 +996,7 @@ Summary:
 - Average relevance: 6.75
 
 Token Impact:
+
 - Distractor tokens: ~45 (historical_context)
 - Potential savings: 45 tokens (11% of prompt)
 ```
@@ -980,6 +1021,7 @@ OPTIMIZATION_RECOMMENDATIONS:
 Savings: ~15 tokens
 
 OPTIMIZED PROMPT STRUCTURE:
+
 - background (condensed): 8 tokens
 - code_style_rules (with snake_case added): 52 tokens
 - output_format: 28 tokens
@@ -991,13 +1033,14 @@ OPTIMIZED PROMPT STRUCTURE:
 
 The default threshold of 5 balances comprehensiveness against efficiency:
 
-| Threshold | Use Case
-
-| < 3
-| < 5
-| < 7
+| Threshold | Use Case |
+| --------- | ------------------------------------------------- |
+| < 3 | Aggressive pruning for token-constrained contexts |
+| < 5 | Standard optimization (recommended default) |
+| < 7 | Conservative pruning for critical prompts |
 
 Adjust threshold based on:
+
 - **Context budget pressure**: Lower threshold when approaching limits
 - **Task criticality**: Higher threshold for production prompts
 - **Prompt stability**: Lower threshold for experimental prompts
@@ -1008,14 +1051,16 @@ For efficiency, parallelize scoring agents:
 
 ```markdown
 # Parallel execution pattern
+
 spawn_parallel([
-
-
-
-
+scoring_agent(part_1, task_description),
+scoring_agent(part_2, task_description),
+scoring_agent(part_3, task_description),
+...
 ])
 
 # Collect and aggregate
+
 scores = await_all(scoring_agents)
 analysis = aggregate_scores(scores)
 ```
 
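The parallel execution pattern in the hunk above is pseudocode: `spawn_parallel` and `await_all` are not defined anywhere in the diff. One way to realize it in Python is `asyncio.gather`, sketched below. This is a minimal sketch, not code from the package: `scoring_agent` here is a hypothetical stand-in that uses a toy word-overlap heuristic so the example runs offline, and `score_parts` and the shape of `aggregate_scores` are assumptions.

```python
import asyncio


async def scoring_agent(part, task_description):
    """Hypothetical stand-in for a sub-agent call that returns a 0-10 score."""
    await asyncio.sleep(0)  # placeholder for the real model round-trip
    # Toy heuristic so the sketch runs offline: count words shared with the task.
    shared = set(part.lower().split()) & set(task_description.lower().split())
    return min(10, len(shared))


async def score_parts(parts, task_description):
    """Score every part concurrently and return {part_id: score}."""
    calls = [scoring_agent(text, task_description) for text in parts.values()]
    results = await asyncio.gather(*calls)  # results come back in input order
    return dict(zip(parts, results))


def aggregate_scores(scores):
    """Average the scores and name the lowest-scoring part."""
    return {
        "average": sum(scores.values()) / len(scores),
        "lowest": min(scores, key=scores.get),
    }


parts = {
    "code_style_rules": "follow pep 8 in python code",
    "historical_context": "migrated in 2019",
}
scores = asyncio.run(score_parts(parts, "review python code quality"))
print(aggregate_scores(scores))
```

Because `asyncio.gather` returns results in the order the calls were passed, zipping them back onto the part ids is safe even when the calls finish out of order.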
@@ -1052,30 +1097,35 @@ Analyze the recent conversation history for signs of context degradation.
 Check for these degradation symptoms:
 
 LOST_IN_MIDDLE:
+
 - [ ] Agent missing instructions from early in conversation
 - [ ] Critical constraints being ignored
 - [ ] Agent asking for information already provided
 
 CONTEXT_POISONING:
+
 - [ ] Same error appearing repeatedly
 - [ ] Agent referencing incorrect information as fact
 - [ ] Hallucinations that persist despite correction
 
 CONTEXT_DISTRACTION:
+
 - [ ] Responses becoming unfocused
 - [ ] Agent using irrelevant context inappropriately
 - [ ] Quality declining on previously-successful tasks
 
 CONTEXT_CONFUSION:
+
 - [ ] Agent mixing up different task requirements
 - [ ] Wrong tool selections for obvious tasks
 - [ ] Outputs that blend requirements from different tasks
 
 CONTEXT_CLASH:
+
 - [ ] Agent expressing uncertainty about conflicting information
 - [ ] Inconsistent behavior between turns
 - [ ] Agent asking for clarification on resolved issues
-</SYMPTOM_CHECKLIST>
+</SYMPTOM_CHECKLIST>
 
 <OUTPUT_FORMAT>
 HEALTH_STATUS: [HEALTHY | DEGRADED | CRITICAL]
@@ -1091,10 +1141,11 @@ Based on health status, trigger appropriate intervention:
 
 ```markdown
 IF HEALTH_STATUS == "DEGRADED" or HEALTH_STATUS == "CRITICAL":
-
-
-
-
+<RESTART_INTERVENTION>
+
+1. Extract essential state to preserve and save to a file
+2. Ask user to start a new session with clean context and load the preserved state from the file after the new session is started
+</RESTART_INTERVENTION>
 ```
 
 ## Guidelines for Multi-Agent Verification
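The symptom checklist and the restart intervention in the two hunks above could be wired together roughly as follows. This is a sketch under stated assumptions: the diff does not show how `HEALTH_STATUS` is derived from the checklist, so the rule used here (no symptoms is HEALTHY, one affected category is DEGRADED, two or more is CRITICAL) is hypothetical, as are the function names and the JSON file format.

```python
import json
from pathlib import Path
from typing import Optional


def health_status(checklist: dict[str, list[bool]]) -> str:
    """Map ticked checklist boxes to a status (assumed rule, see lead-in)."""
    affected = [name for name, boxes in checklist.items() if any(boxes)]
    if not affected:
        return "HEALTHY"
    return "DEGRADED" if len(affected) == 1 else "CRITICAL"


def restart_intervention(status: str, essential_state: dict, path: str) -> Optional[str]:
    """Save state and return a message for the user, or None if healthy."""
    if status not in ("DEGRADED", "CRITICAL"):
        return None
    # Step 1: extract essential state to preserve and save it to a file.
    Path(path).write_text(json.dumps(essential_state, indent=2))
    # Step 2: ask the user to start a clean session and reload the state.
    return (
        f"Context is {status}. Please start a new session and load the "
        f"preserved state from {path}."
    )
```

Writing the state to disk before prompting the user means the preserved state survives even if the degraded session is closed immediately.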
@@ -1153,16 +1204,19 @@ Observation masking replaces verbose tool outputs with compact references. The i
 Not all observations should be masked equally:
 
 **Never mask:**
+
 - Observations critical to current task
 - Observations from the most recent turn
 - Observations used in active reasoning
 
 **Consider masking:**
+
 - Observations from 3+ turns ago
 - Verbose outputs with key points extractable
 - Observations whose purpose has been served
 
 **Always mask:**
+
 - Repeated outputs
 - Boilerplate headers/footers
 - Outputs already summarized in conversation
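The three masking tiers in the hunk above can be expressed as a classifier over observations. The sketch below is an illustration, not the package's code: the `Observation` fields, the `masking_decision` name, the extra "keep" outcome, and the choice to let the "never mask" rules win over the others are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    turn: int
    text: str
    critical: bool = False
    in_active_reasoning: bool = False
    repeated: bool = False
    boilerplate: bool = False
    summarized: bool = False
    verbose: bool = False
    purpose_served: bool = False


def masking_decision(obs: Observation, current_turn: int) -> str:
    """Return 'never', 'always', 'consider', or 'keep' for one observation."""
    # Never mask: critical, most recent turn, or used in active reasoning.
    if obs.critical or obs.in_active_reasoning or obs.turn == current_turn:
        return "never"
    # Always mask: repeats, boilerplate, or already summarized output.
    if obs.repeated or obs.boilerplate or obs.summarized:
        return "always"
    # Consider masking: 3+ turns old, verbose, or purpose already served.
    if current_turn - obs.turn >= 3 or obs.verbose or obs.purpose_served:
        return "consider"
    return "keep"  # assumed default when no rule applies


def mask(obs: Observation, ref_id: str) -> str:
    """Replace an observation with a compact reference (hypothetical format)."""
    return f"[masked observation {ref_id}: {len(obs.text)} chars from turn {obs.turn}]"
```

Checking the "never mask" rules first is the conservative ordering: a repeated output from the current turn stays visible until the turn has passed.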
@@ -1176,6 +1230,7 @@ This approach achieves separation of concerns--the detailed search context remai
 
 **When to Partition**
 Consider partitioning when:
+
 - Task naturally decomposes into independent subtasks
 - Different subtasks require different specialized context
 - Context accumulation threatens to exceed limits
@@ -1183,22 +1238,24 @@ Consider partitioning when:
 
 **Result Aggregation**
 Aggregate results from partitioned subtasks by:
+
 1. Validating all partitions completed
 2. Merging compatible results
 3. Summarizing if combined results still too large
 4. Resolving conflicts between partition outputs
 
-
 ## Practical Guidance
 
 ### Optimization Decision Framework
 
 **When to optimize:**
+
 - Response quality degrades as conversations extend
 - Costs increase due to long contexts
 - Latency increases with conversation length
 
 **What to apply:**
+
 - Tool outputs dominate: observation masking
 - Retrieved documents dominate: summarization or partitioning
 - Message history dominates: compaction with summarization
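The "What to apply" rules in the hunk above amount to picking the technique for whichever component uses the most context. A minimal sketch, assuming token counts per component are already available; the dictionary keys and the `choose_technique` name are hypothetical and cover only the three rules visible in this hunk.

```python
# Technique per context component, copied from the three rules shown above.
TECHNIQUES = {
    "tool_outputs": "observation masking",
    "retrieved_documents": "summarization or partitioning",
    "message_history": "compaction with summarization",
}


def choose_technique(token_counts: dict[str, int]) -> str:
    """Return the technique for the component with the most tokens.

    Components missing from token_counts are treated as zero; ties resolve
    to the first entry in TECHNIQUES (an arbitrary, assumed tie-break).
    """
    dominant = max(TECHNIQUES, key=lambda name: token_counts.get(name, 0))
    return TECHNIQUES[dominant]
```

For example, a context with 40,000 tokens of tool output and 5,000 tokens of message history would select observation masking.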
@@ -1208,29 +1265,41 @@ Aggregate results from partitioned subtasks by:
 
 **Command Optimization**
 Commands load on-demand, so focus on keeping individual commands focused:
+
 ```markdown
 # Good: Focused command with clear scope
+
 ---
+
 name: review-security
 description: Review code for security vulnerabilities
+
 ---
+
 # Specific security review instructions only
 
 # Avoid: Overloaded command trying to do everything
+
 ---
+
 name: review-all
 description: Review code for everything
+
 ---
+
 # 50 different review checklists crammed together
 ```
 
 **Skill Optimization**
 Skills load their descriptions by default, so descriptions must be concise:
+
 ```markdown
 # Good: Concise description
+
 description: Analyze code architecture. Use for design reviews.
 
 # Avoid: Verbose description that wastes context budget
+
 description: This skill provides comprehensive analysis of code
 architecture including but not limited to class hierarchies,
 dependency graphs, coupling metrics, cohesion analysis...
@@ -1238,12 +1307,15 @@ dependency graphs, coupling metrics, cohesion analysis...
 
 **Sub-Agent Context Design**
 When spawning sub-agents, provide focused context:
+
 ```markdown
 # Coordinator provides minimal handoff:
+
 "Review authentication module for security issues.
 Return findings in structured format."
 
 # NOT this verbose handoff:
+
 "I need you to look at the authentication module which is
 located in src/auth/ and contains several files including
 login.ts, session.ts, tokens.ts... [500 more tokens of context]"
|