npm - opencode-multiagent - Versions diffs - 0.2.1 → 0.4.0 - Mend

opencode-multiagent 0.2.1 → 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (160)

package/AGENTS.md +83 -0
package/CHANGELOG.md +31 -0
package/CONTRIBUTING.md +36 -0
package/README.md +44 -168
package/README.tr.md +84 -0
package/RELEASE.md +68 -0
package/agents/AGENTS.md +91 -0
package/agents/auditor.md +67 -23
package/agents/{worker.md → coder.md} +24 -17
package/agents/docmaster.md +91 -0
package/agents/executor.md +63 -79
package/agents/planner.md +78 -58
package/agents/reviewer.md +31 -15
package/agents/scout.md +25 -17
package/agents/sec-coder.md +83 -0
package/agents/ui-coder.md +77 -0
package/commands/board.md +17 -0
package/commands/execute.md +9 -7
package/commands/init-deep.md +7 -6
package/commands/init.md +5 -5
package/commands/inspect.md +6 -5
package/commands/plan.md +8 -6
package/commands/quality.md +4 -3
package/commands/review.md +5 -3
package/commands/status.md +5 -3
package/defaults/AGENTS.md +48 -0
package/defaults/opencode-multiagent.json +180 -0
package/defaults/opencode-multiagent.schema.json +265 -0
package/dist/control-plane.d.ts +4 -0
package/dist/control-plane.d.ts.map +1 -0
package/dist/index.d.ts +5 -0
package/dist/index.d.ts.map +1 -0
package/dist/index.js +1916 -0
package/dist/opencode-multiagent/compiler.d.ts +25 -0
package/dist/opencode-multiagent/compiler.d.ts.map +1 -0
package/dist/opencode-multiagent/constants.d.ts +128 -0
package/dist/opencode-multiagent/constants.d.ts.map +1 -0
package/dist/opencode-multiagent/correlation.d.ts +21 -0
package/dist/opencode-multiagent/correlation.d.ts.map +1 -0
package/dist/opencode-multiagent/defaults.d.ts +10 -0
package/dist/opencode-multiagent/defaults.d.ts.map +1 -0
package/dist/opencode-multiagent/hooks.d.ts +62 -0
package/dist/opencode-multiagent/hooks.d.ts.map +1 -0
package/dist/opencode-multiagent/log.d.ts +2 -0
package/dist/opencode-multiagent/log.d.ts.map +1 -0
package/dist/opencode-multiagent/markdown.d.ts +8 -0
package/dist/opencode-multiagent/markdown.d.ts.map +1 -0
package/dist/opencode-multiagent/mcp.d.ts +3 -0
package/dist/opencode-multiagent/mcp.d.ts.map +1 -0
package/dist/opencode-multiagent/policy.d.ts +5 -0
package/dist/opencode-multiagent/policy.d.ts.map +1 -0
package/dist/opencode-multiagent/quality.d.ts +18 -0
package/dist/opencode-multiagent/quality.d.ts.map +1 -0
package/dist/opencode-multiagent/runtime.d.ts +7 -0
package/dist/opencode-multiagent/runtime.d.ts.map +1 -0
package/dist/opencode-multiagent/session-tracker.d.ts +32 -0
package/dist/opencode-multiagent/session-tracker.d.ts.map +1 -0
package/dist/opencode-multiagent/skills.d.ts +17 -0
package/dist/opencode-multiagent/skills.d.ts.map +1 -0
package/dist/opencode-multiagent/supervision.d.ts +26 -0
package/dist/opencode-multiagent/supervision.d.ts.map +1 -0
package/dist/opencode-multiagent/task-manager.d.ts +54 -0
package/dist/opencode-multiagent/task-manager.d.ts.map +1 -0
package/dist/opencode-multiagent/telemetry.d.ts +28 -0
package/dist/opencode-multiagent/telemetry.d.ts.map +1 -0
package/dist/opencode-multiagent/tools.d.ts +87 -0
package/dist/opencode-multiagent/tools.d.ts.map +1 -0
package/dist/opencode-multiagent/types.d.ts +36 -0
package/dist/opencode-multiagent/types.d.ts.map +1 -0
package/dist/opencode-multiagent/utils.d.ts +9 -0
package/dist/opencode-multiagent/utils.d.ts.map +1 -0
package/docs/agents.md +148 -0
package/docs/agents.tr.md +149 -0
package/docs/configuration.md +244 -0
package/docs/configuration.tr.md +244 -0
package/docs/usage-guide.md +224 -0
package/docs/usage-guide.tr.md +225 -0
package/examples/opencode.with-overrides.json +3 -7
package/package.json +23 -13
package/skills/AGENTS.md +51 -0
package/skills/advanced-evaluation/SKILL.md +37 -21
package/skills/advanced-evaluation/manifest.json +2 -13
package/skills/cek-context-engineering/SKILL.md +159 -87
package/skills/cek-context-engineering/manifest.json +1 -3
package/skills/cek-prompt-engineering/SKILL.md +13 -10
package/skills/cek-prompt-engineering/manifest.json +1 -3
package/skills/cek-test-prompt/SKILL.md +38 -28
package/skills/cek-test-prompt/manifest.json +1 -3
package/skills/cek-thought-based-reasoning/SKILL.md +75 -21
package/skills/cek-thought-based-reasoning/manifest.json +1 -3
package/skills/context-degradation/SKILL.md +14 -13
package/skills/context-degradation/manifest.json +1 -3
package/skills/debate/SKILL.md +23 -78
package/skills/debate/manifest.json +2 -12
package/skills/design-first/manifest.json +2 -13
package/skills/dispatching-parallel-agents/SKILL.md +14 -3
package/skills/dispatching-parallel-agents/manifest.json +1 -4
package/skills/drift-analysis/SKILL.md +50 -29
package/skills/drift-analysis/manifest.json +2 -12
package/skills/evaluation/manifest.json +2 -12
package/skills/executing-plans/SKILL.md +15 -8
package/skills/executing-plans/manifest.json +1 -3
package/skills/handoff-protocols/manifest.json +2 -12
package/skills/parallel-investigation/SKILL.md +25 -12
package/skills/parallel-investigation/manifest.json +1 -4
package/skills/reflexion-critique/SKILL.md +21 -10
package/skills/reflexion-critique/manifest.json +1 -3
package/skills/reflexion-reflect/SKILL.md +36 -34
package/skills/reflexion-reflect/manifest.json +2 -10
package/skills/root-cause-analysis/manifest.json +2 -13
package/skills/sadd-judge-with-debate/SKILL.md +50 -26
package/skills/sadd-judge-with-debate/manifest.json +1 -3
package/skills/structured-code-review/manifest.json +2 -11
package/skills/task-decomposition/manifest.json +2 -13
package/skills/verification-before-completion/manifest.json +2 -15
package/skills/verification-gates/SKILL.md +27 -19
package/skills/verification-gates/manifest.json +2 -12
package/agents/advisor.md +0 -57
package/agents/critic.md +0 -127
package/agents/deep-worker.md +0 -65
package/agents/devil.md +0 -36
package/agents/heavy-worker.md +0 -68
package/agents/lead.md +0 -155
package/agents/librarian.md +0 -62
package/agents/qa.md +0 -50
package/agents/quick.md +0 -65
package/agents/scribe.md +0 -78
package/agents/strategist.md +0 -63
package/agents/ui-heavy-worker.md +0 -62
package/agents/ui-worker.md +0 -69
package/agents/validator.md +0 -47
package/defaults/agent-settings.json +0 -102
package/defaults/agent-settings.schema.json +0 -25
package/defaults/flags.json +0 -35
package/defaults/flags.schema.json +0 -119
package/defaults/mcp-defaults.json +0 -47
package/defaults/mcp-defaults.schema.json +0 -38
package/defaults/profiles.json +0 -53
package/defaults/profiles.schema.json +0 -60
package/defaults/team-profiles.json +0 -83
package/src/control-plane.ts +0 -21
package/src/index.ts +0 -8
package/src/opencode-multiagent/compiler.ts +0 -168
package/src/opencode-multiagent/constants.ts +0 -178
package/src/opencode-multiagent/file-lock.ts +0 -90
package/src/opencode-multiagent/hooks.ts +0 -599
package/src/opencode-multiagent/log.ts +0 -12
package/src/opencode-multiagent/mailbox.ts +0 -287
package/src/opencode-multiagent/markdown.ts +0 -99
package/src/opencode-multiagent/mcp.ts +0 -35
package/src/opencode-multiagent/policy.ts +0 -67
package/src/opencode-multiagent/quality.ts +0 -140
package/src/opencode-multiagent/runtime.ts +0 -55
package/src/opencode-multiagent/skills.ts +0 -144
package/src/opencode-multiagent/supervision.ts +0 -156
package/src/opencode-multiagent/task-manager.ts +0 -148
package/src/opencode-multiagent/team-manager.ts +0 -219
package/src/opencode-multiagent/team-tools.ts +0 -359
package/src/opencode-multiagent/telemetry.ts +0 -124
package/src/opencode-multiagent/utils.ts +0 -54

package/skills/cek-context-engineering/SKILL.md CHANGED

@@ -83,7 +83,7 @@ The file system itself provides structure that agents can navigate. File sizes s
 ### Hybrid Strategies
-The most effective agents employ hybrid strategies. Pre-load some context for speed (like CLAUDE.md files or project rules), but enable autonomous exploration for additional context as needed. The decision boundary depends on task characteristics and context dynamics.
+The most effective agents employ hybrid strategies. Pre-load some context for speed (like AGENTS.md files or project rules), but enable autonomous exploration for additional context as needed. The decision boundary depends on task characteristics and context dynamics.
 For contexts with less dynamic content, pre-loading more upfront makes sense. For rapidly changing or highly specific information, just-in-time loading avoids stale context.
@@ -96,6 +96,7 @@ Effective context budgeting requires understanding not just raw token counts but
 ## Examples
 **Example 1: Organizing System Prompts**
 ```markdown
 <BACKGROUND_INFORMATION>
 You are a Python expert helping a development team.
@@ -121,23 +122,29 @@ Explain the reasoning behind suggestions.
 ```
 **Example 2: Progressive Document Loading**
 ```markdown
 # Instead of loading all documentation at once:
 # Step 1: Load summary
-docs/architecture_overview.md     # Lightweight overview
+docs/architecture_overview.md # Lightweight overview
 # Step 2: Load specific section as needed
-docs/api/endpoints.md             # Only when API work needed
-docs/database/schemas.md          # Only when data layer work needed
+docs/api/endpoints.md # Only when API work needed
+docs/database/schemas.md # Only when data layer work needed
 ```
 **Example 3: Skill Description Design**
 ```markdown
 # Bad: Vague description that loads into context but provides little signal
 description: Helps with code things
 # Good: Specific description that helps model decide when to activate
 description: Analyze code quality and suggest refactoring patterns. Use when reviewing pull requests or improving existing code structure.
 ```
@@ -273,64 +280,76 @@ Implement these strategies through specific architectural patterns. Use just-in-
 ## Examples
 **Example 1: Detecting Degradation in Prompt Design**
 ```markdown
 # Signs your command/skill prompt may be too large:
 Early signs (context ~50-70% utilized):
 - Agent occasionally misses instructions
 - Responses become less focused
 - Some guidelines ignored
 Warning signs (context ~70-85% utilized):
 - Inconsistent behavior across runs
 - Agent "forgets" earlier instructions
 - Quality varies significantly
 Critical signs (context >85% utilized):
 - Agent ignores key constraints
 - Hallucinations increase
 - Task completion fails
 ```
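The utilization bands in this example can be expressed as a small classifier. This is a sketch only. The function name `degradation_band` and the band labels are hypothetical. The boundaries (50%, 70%, 85%) are the approximate values the example gives, and assigning exactly 70% and 85% to the "warning" band is an arbitrary choice, because the example's ranges touch at those points.

```python
def degradation_band(used_tokens: int, context_window: int) -> str:
    """Map context utilization to the example's approximate warning bands (hypothetical helper)."""
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    utilization = used_tokens / context_window
    if utilization > 0.85:
        return "critical"  # key constraints ignored, hallucinations increase
    if utilization >= 0.70:
        return "warning"   # inconsistent behavior, earlier instructions "forgotten"
    if utilization >= 0.50:
        return "early"     # occasional missed instructions, less focused responses
    return "ok"            # below ~50%: the example lists no symptoms


print(degradation_band(60_000, 100_000))  # early
```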
 **Example 2: Mitigating Lost-in-Middle in Prompt Structure**
 ```markdown
 # Organize prompts with critical info at edges
-<CRITICAL_CONSTRAINTS>                    # At start (high attention)
+<CRITICAL_CONSTRAINTS> # At start (high attention)
 - Never modify production files directly
 - Always run tests before committing
 - Maximum file size: 500 lines
-</CRITICAL_CONSTRAINTS>
+  </CRITICAL_CONSTRAINTS>
+<DETAILED_GUIDELINES> # Middle (lower attention)
-<DETAILED_GUIDELINES>                     # Middle (lower attention)
 - Code style preferences
 - Documentation templates
 - Review checklists
 - Example patterns
-</DETAILED_GUIDELINES>
+  </DETAILED_GUIDELINES>
+<KEY_REMINDERS> # At end (high attention)
-<KEY_REMINDERS>                           # At end (high attention)
 - Run tests: npm run test
 - Format code: npm run format
 - Create PR with description
-</KEY_REMINDERS>
+  </KEY_REMINDERS>
 ```
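When prompts are generated programmatically, the edge-placement idea can be enforced in code. The following is a minimal sketch that assumes constraints, guidelines, and reminders arrive as plain string lists. `assemble_prompt` is a hypothetical helper, not part of opencode-multiagent. The tag names are copied from the example above.

```python
def assemble_prompt(critical: list[str], details: list[str], reminders: list[str]) -> str:
    """Put critical constraints first and key reminders last (the high-attention edges)."""

    def section(tag: str, items: list[str]) -> str:
        body = "\n".join(f"- {item}" for item in items)
        return f"<{tag}>\n{body}\n</{tag}>"

    return "\n\n".join([
        section("CRITICAL_CONSTRAINTS", critical),  # start: high attention
        section("DETAILED_GUIDELINES", details),    # middle: lower attention
        section("KEY_REMINDERS", reminders),        # end: high attention
    ])


prompt = assemble_prompt(
    ["Never modify production files directly", "Always run tests before committing"],
    ["Code style preferences", "Review checklists"],
    ["Run tests: npm run test"],
)
print(prompt)
```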
 **Example 3: Sub-Agent Context Isolation**
 ```markdown
 # Instead of one agent handling everything:
 ## Coordinator Agent (lean context)
 - Understands task decomposition
 - Delegates to specialized sub-agents
 - Synthesizes results
 ## Code Review Sub-Agent (isolated context)
 - Loaded only with code review guidelines
 - Focuses solely on review task
 - Returns structured findings
 ## Test Writer Sub-Agent (isolated context)
 - Loaded only with testing patterns
 - Focuses solely on test creation
 - Returns test files
@@ -378,12 +397,13 @@ Extract all factual claims from the following output. List each claim on a separ
 </TASK>
 <FOCUS_AREAS>
 - File paths and their existence
 - Function/class/method names referenced
 - Code behavior assertions ("this function returns X")
 - External facts about APIs, libraries, or specifications
 - Numerical values and metrics
-</FOCUS_AREAS>
+  </FOCUS_AREAS>
 <OUTPUT_TO_ANALYZE>
 {agent_output}
@@ -412,11 +432,12 @@ Verify this claim by checking the actual codebase and context.
 </CLAIM>
 <VERIFICATION_APPROACH>
 - For file paths: Use file tools to check existence
 - For code claims: Read the actual code and verify behavior
 - For external facts: Cross-reference with documentation or web search
 - For metrics: Analyze the code structure
-</VERIFICATION_APPROACH>
+  </VERIFICATION_APPROACH>
 <RESPONSE_FORMAT>
 STATUS: [VERIFIED | FALSE | UNVERIFIABLE]
@@ -452,10 +473,11 @@ Specific issues:
 {list of FALSE and UNVERIFIABLE claims with evidence}
 Please regenerate your response. For each factual claim:
 1. Explicitly verify it using tools before stating it
 2. If you cannot verify, state "I cannot verify..." instead of asserting
 3. Cite the specific file/line/source for verifiable facts
-</REGENERATION_PROMPT>
+   </REGENERATION_PROMPT>
 ```
 ## Lost-in-Middle Detection Workflow
@@ -477,6 +499,7 @@ Extract all critical instructions from your prompt that the agent MUST follow:
 ```markdown
 Critical instructions to verify:
 1. "Never modify files in /production"
 2. "Always run tests before committing"
 3. "Use TypeScript strict mode"
@@ -497,10 +520,11 @@ Prompt: {your_full_prompt_being_tested}
 Task: {representative_task_that_exercises_all_instructions}
 For each run, save:
 - run_id: unique identifier
 - agent_output: complete response from agent
 - timestamp: when run completed
-</AGENT_RUN_CONFIG>
+  </AGENT_RUN_CONFIG>
 ```
 **Step 3: Verify Each Output Against Original Prompt**
@@ -527,16 +551,18 @@ You are a compliance verification agent. Analyze whether the agent output follow
 <VERIFICATION_APPROACH>
 For each critical instruction:
 1. Determine if the instruction was applicable to this task
 2. If applicable, check whether the output complies
 3. Look for both explicit violations and omissions
 4. Note any partial compliance
-</VERIFICATION_APPROACH>
+   </VERIFICATION_APPROACH>
 <OUTPUT_FORMAT>
 RUN_ID: {run_id}
 INSTRUCTION_COMPLIANCE:
 - Instruction 1: "Never modify files in /production"
   STATUS: [FOLLOWED | VIOLATED | NOT_APPLICABLE]
   EVIDENCE: {quote from output or explanation}
@@ -548,11 +574,12 @@ INSTRUCTION_COMPLIANCE:
 [... continue for all instructions ...]
 SUMMARY:
 - Instructions followed: {count}
 - Instructions violated: {count}
 - Not applicable: {count}
-</OUTPUT_FORMAT>
-</VERIFICATION_AGENT_PROMPT>
+  </OUTPUT_FORMAT>
+  </VERIFICATION_AGENT_PROMPT>
 ```
 **Step 4: Aggregate Results and Identify At-Risk Parts**
@@ -562,18 +589,19 @@ Collect verification results from all runs and identify instructions that were i
 ```markdown
 <AGGREGATION_LOGIC>
 For each instruction:
-  followed_count = number of runs where STATUS == FOLLOWED
-  violated_count = number of runs where STATUS == VIOLATED
-  applicable_runs = total_runs - (runs where STATUS == NOT_APPLICABLE)
+followed_count = number of runs where STATUS == FOLLOWED
+violated_count = number of runs where STATUS == VIOLATED
+applicable_runs = total_runs - (runs where STATUS == NOT_APPLICABLE)
+compliance_rate = followed_count / applicable_runs
-  compliance_rate = followed_count / applicable_runs
+Classification:
-  Classification:
-  - compliance_rate == 1.0: RELIABLE (always followed)
-  - compliance_rate >= 0.8: MOSTLY_RELIABLE (minor inconsistency)
-  - compliance_rate >= 0.5: AT_RISK (inconsistent - likely lost-in-middle)
-  - compliance_rate < 0.5: FREQUENTLY_IGNORED (severe issue)
-  - compliance_rate == 0.0: ALWAYS_IGNORED (critical failure)
+- compliance_rate == 1.0: RELIABLE (always followed)
+- compliance_rate >= 0.8: MOSTLY_RELIABLE (minor inconsistency)
+- compliance_rate >= 0.5: AT_RISK (inconsistent - likely lost-in-middle)
+- compliance_rate < 0.5: FREQUENTLY_IGNORED (severe issue)
+- compliance_rate == 0.0: ALWAYS_IGNORED (critical failure)
 AT_RISK instructions are the primary signal for lost-in-middle problems.
 These are instructions that work sometimes but not consistently, indicating
@@ -583,22 +611,23 @@ they are in attention-weak positions.
 <AGGREGATION_OUTPUT_FORMAT>
 INSTRUCTION COMPLIANCE SUMMARY:
-| Instruction | Followed | Violated | Compliance Rate | Status |
-|-------------|----------|----------|-----------------|--------|
-| 1. Never modify /production | 5/5 | 0/5 | 100% | RELIABLE |
-| 2. Run tests before commit | 3/5 | 2/5 | 60% | AT_RISK |
-| 3. TypeScript strict mode | 4/5 | 1/5 | 80% | MOSTLY_RELIABLE |
-| 4. Max function length 50 | 2/5 | 3/5 | 40% | FREQUENTLY_IGNORED |
-| 5. Include JSDoc | 5/5 | 0/5 | 100% | RELIABLE |
-| 6. Format as JSON | 1/5 | 4/5 | 20% | ALWAYS_IGNORED |
-| 7. Log modifications | 3/5 | 2/5 | 60% | AT_RISK |
+| Instruction                 | Followed | Violated | Compliance Rate | Status             |
+| --------------------------- | -------- | -------- | --------------- | ------------------ |
+| 1. Never modify /production | 5/5      | 0/5      | 100%            | RELIABLE           |
+| 2. Run tests before commit  | 3/5      | 2/5      | 60%             | AT_RISK            |
+| 3. TypeScript strict mode   | 4/5      | 1/5      | 80%             | MOSTLY_RELIABLE    |
+| 4. Max function length 50   | 2/5      | 3/5      | 40%             | FREQUENTLY_IGNORED |
+| 5. Include JSDoc            | 5/5      | 0/5      | 100%            | RELIABLE           |
+| 6. Format as JSON           | 1/5      | 4/5      | 20%             | ALWAYS_IGNORED     |
+| 7. Log modifications        | 3/5      | 2/5      | 60%             | AT_RISK            |
 AT-RISK INSTRUCTIONS (likely in lost-in-middle zone):
 - Instruction 2: "Run tests before commit" (60% compliance)
 - Instruction 4: "Max function length 50" (40% compliance)
 - Instruction 6: "Format as JSON" (20% compliance)
 - Instruction 7: "Log modifications" (60% compliance)
-</AGGREGATION_OUTPUT_FORMAT>
+  </AGGREGATION_OUTPUT_FORMAT>
 ```
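The AGGREGATION_LOGIC in this hunk can be sketched as runnable code. This is an illustration under stated assumptions, not the package's implementation, and `classify` and `summarize` are hypothetical names. The sketch evaluates the `== 0.0` case before the `< 0.5` case. In the order the hunk lists them, `compliance_rate < 0.5` would match 0.0 first, so ALWAYS_IGNORED could never be assigned. Under these rules, a 20% rate (1/5) is FREQUENTLY_IGNORED, and ALWAYS_IGNORED is reserved for 0%.

```python
from collections import Counter


def classify(rate: float) -> str:
    # The 0.0 check precedes the "< 0.5" fallback so ALWAYS_IGNORED is reachable.
    if rate == 1.0:
        return "RELIABLE"
    if rate >= 0.8:
        return "MOSTLY_RELIABLE"
    if rate >= 0.5:
        return "AT_RISK"
    if rate == 0.0:
        return "ALWAYS_IGNORED"
    return "FREQUENTLY_IGNORED"


def summarize(statuses: list[str]) -> tuple[float, str]:
    """statuses holds one FOLLOWED / VIOLATED / NOT_APPLICABLE value per run."""
    counts = Counter(statuses)  # missing keys count as 0
    applicable_runs = len(statuses) - counts["NOT_APPLICABLE"]
    if applicable_runs == 0:
        raise ValueError("instruction was not applicable in any run")
    rate = counts["FOLLOWED"] / applicable_runs
    return rate, classify(rate)


print(summarize(["FOLLOWED", "FOLLOWED", "FOLLOWED", "VIOLATED", "VIOLATED"]))  # (0.6, 'AT_RISK')
```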
 **Step 5: Output Recommendations**
@@ -626,10 +655,10 @@ SPECIFIC RECOMMENDATIONS:
    Restructure at-risk instructions with emphasis:
    Before: "Always run tests before committing"
-   After:  "**CRITICAL:** You MUST run tests before committing. Never skip this step."
+   After: "**CRITICAL:** You MUST run tests before committing. Never skip this step."
    Before: "Maximum function length: 50 lines"
-   After:  "3. [REQUIRED] Maximum function length: 50 lines"
+   After: "3. [REQUIRED] Maximum function length: 50 lines"
    Use numbered lists, bold markers, or explicit tags like [REQUIRED], [CRITICAL], [MUST].
@@ -644,7 +673,7 @@ SPECIFIC RECOMMENDATIONS:
    - Moving 2-3 most critical items to edges
    - Converting remaining middle items to a numbered checklist
    - Adding explicit "verify these items" reminder at end
-</RECOMMENDATIONS_OUTPUT>
+     </RECOMMENDATIONS_OUTPUT>
 ```
 ### Complete Workflow Example
@@ -653,31 +682,37 @@ SPECIFIC RECOMMENDATIONS:
 # Example: Testing a Code Review Command
 ## Original Prompt Being Tested:
 "Review the code for: security issues, performance problems,
 code style, test coverage, documentation completeness,
 error handling, and logging practices."
 ## Run 5 Agents:
 Each agent reviews the same code sample with this prompt.
 ## Verification Results:
-| Instruction | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Rate |
-|-------------|-------|-------|-------|-------|-------|------|
-| Security | Y | Y | Y | Y | Y | 100% |
-| Performance | Y | X | Y | X | Y | 60% |
-| Code style | X | X | Y | X | X | 20% |
-| Test coverage | X | Y | X | X | Y | 40% |
-| Documentation | X | X | X | Y | X | 20% |
-| Error handling | Y | Y | X | Y | Y | 80% |
-| Logging | Y | Y | Y | Y | Y | 100% |
+| Instruction    | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Rate |
+| -------------- | ----- | ----- | ----- | ----- | ----- | ---- |
+| Security       | Y     | Y     | Y     | Y     | Y     | 100% |
+| Performance    | Y     | X     | Y     | X     | Y     | 60%  |
+| Code style     | X     | X     | Y     | X     | X     | 20%  |
+| Test coverage  | X     | Y     | X     | X     | Y     | 40%  |
+| Documentation  | X     | X     | X     | Y     | X     | 20%  |
+| Error handling | Y     | Y     | X     | Y     | Y     | 80%  |
+| Logging        | Y     | Y     | Y     | Y     | Y     | 100% |
 ## Analysis:
 - RELIABLE: Security, Logging (at edges of list)
 - AT_RISK: Performance, Error handling
 - FREQUENTLY_IGNORED: Code style, Test coverage, Documentation (middle of list)
 ## Remediation Applied:
 "**CRITICAL REVIEW AREAS:**
 1. Security vulnerabilities
 2. Test coverage gaps
 3. Documentation completeness
@@ -706,6 +741,7 @@ Record the output of each agent in your chain:
 ```markdown
 Agent Chain Record:
 - Agent 1 (Analyzer): {output_1}
 - Agent 2 (Planner): {output_2}
 - Agent 3 (Implementer): {output_3}
@@ -758,11 +794,12 @@ Agent 4 Output: {output_4}
 <ANALYSIS_APPROACH>
 For each agent output (starting from the last):
 1. Does this output contain the error?
 2. If yes, was the error present in the input to this agent?
 3. If error is in output but not input: This agent INTRODUCED the error
 4. If error is in both: This agent PROPAGATED the error
-</ANALYSIS_APPROACH>
+   </ANALYSIS_APPROACH>
 <OUTPUT_FORMAT>
 ERROR: {error_id}
@@ -803,7 +840,7 @@ After Agent {N} completes:
    - Or: Regenerate Agent N output with explicit guidance
 3. Only proceed to Agent {N+1} if verification passes
-</ERROR_BOUNDARY_TEMPLATE>
+   </ERROR_BOUNDARY_TEMPLATE>
 ```
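The introduced-versus-propagated rule from the ANALYSIS_APPROACH block in the previous hunk can be made concrete with the sketch below. It assumes a linear chain in which agent k receives agent k-1's output and agent 1 receives the original task. It also assumes that error detection is available as a predicate `has_error`, a hypothetical stand-in for the verification agent. The loop walks forward rather than "starting from the last". This does not change the labels, because each label depends only on that agent's own input/output pair.

```python
from typing import Callable


def trace_error(task: str, outputs: list[str], has_error: Callable[[str], bool]) -> dict[int, str]:
    """Label each agent (1-based) as INTRODUCED, PROPAGATED, or CLEAN for a single error."""
    labels: dict[int, str] = {}
    previous = task  # input to agent 1
    for index, output in enumerate(outputs, start=1):
        in_input = has_error(previous)
        in_output = has_error(output)
        if in_output and not in_input:
            labels[index] = "INTRODUCED"   # error appears here first
        elif in_output:
            labels[index] = "PROPAGATED"   # error was already in this agent's input
        else:
            labels[index] = "CLEAN"        # error absent from this agent's output
        previous = output                  # this output feeds the next agent
    return labels


print(trace_error("fix bug", ["plan A", "edit /prod/x", "commit /prod/x"], lambda t: "/prod" in t))
# {1: 'CLEAN', 2: 'INTRODUCED', 3: 'PROPAGATED'}
```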
 ## Context Relevance Scoring Workflow
@@ -813,7 +850,7 @@ Not all parts of a prompt contribute equally to task completion. This workflow i
 ### When to Use
 - When optimizing prompt length and content
-- When deciding what to include in CLAUDE.md
+- When deciding what to include in AGENTS.md
 - When a prompt feels bloated but you are unsure what to cut
 - When debugging agents that ignore provided context
 - Before deploying new commands, skills, or agent prompts
@@ -827,35 +864,32 @@ Divide the prompt (command/skill/agent) into logical sections. Each part should
 ```markdown
 <PROMPT_PARTS>
 PART_1:
-  ID: background
-  CONTENT: |
-    You are a Python expert helping a development team.
-    Current project: Data processing pipeline in Python 3.9+
+ID: background
+CONTENT: |
+You are a Python expert helping a development team.
+Current project: Data processing pipeline in Python 3.9+
 PART_2:
-  ID: code_style_rules
-  CONTENT: |
-    - Write clean, idiomatic Python code
-    - Include type hints for function signatures
-    - Add docstrings for public functions
-    - Follow PEP 8 style guidelines
+ID: code_style_rules
+CONTENT: | - Write clean, idiomatic Python code - Include type hints for function signatures - Add docstrings for public functions - Follow PEP 8 style guidelines
 PART_3:
-  ID: historical_context
-  CONTENT: |
-    The project was migrated from Python 2.7 in 2019.
-    Original team used camelCase naming but we now use snake_case.
-    Legacy modules in /legacy folder are frozen.
+ID: historical_context
+CONTENT: |
+The project was migrated from Python 2.7 in 2019.
+Original team used camelCase naming but we now use snake_case.
+Legacy modules in /legacy folder are frozen.
 PART_4:
-  ID: output_format
-  CONTENT: |
-    Provide actionable feedback with specific line references.
-    Explain the reasoning behind suggestions.
+ID: output_format
+CONTENT: |
+Provide actionable feedback with specific line references.
+Explain the reasoning behind suggestions.
 </PROMPT_PARTS>
 ```
 Splitting guidelines:
 - Each XML section or Markdown header becomes a part
 - Separate conceptually distinct instructions into their own parts
 - Keep related instructions together (do not split mid-thought)
@@ -883,26 +917,30 @@ Example: "Review a pull request for code quality issues and suggest improvements
 Score 0-10 based on these criteria:
 ESSENTIAL (8-10):
 - Part directly enables task completion
 - Removing this part would cause task failure
 - Part contains critical constraints that prevent errors
 - Part defines required output format or structure
 HELPFUL (5-7):
 - Part improves output quality but is not strictly required
 - Part provides useful context that guides better decisions
 - Part contains preferences that affect style but not correctness
 MARGINAL (2-4):
 - Part has tangential relevance to the task
 - Part might occasionally be useful but usually is not
 - Part provides historical context rarely needed
 DISTRACTOR (0-1):
 - Part is irrelevant to the task
 - Part could confuse the agent about what to focus on
 - Part competes for attention without contributing value
-</SCORING_CRITERIA>
+  </SCORING_CRITERIA>
 <OUTPUT_FORMAT>
 RELEVANCE_SCORE: [0-10]
@@ -943,12 +981,14 @@ Apply the distractor threshold (score < 5):
 DISTRACTOR_ANALYSIS:
 Identified Distractors:
 1. PART: historical_context
    SCORE: 3/10
    JUSTIFICATION: "Migration history from Python 2.7 is rarely relevant to reviewing current code. The naming convention note is useful but should be in code_style_rules instead."
    RECOMMENDATION: REMOVE or RELOCATE
 Summary:
 - Total parts: 4
 - High-relevance parts (>=5): 3
 - Distractor parts (<5): 1
@@ -956,6 +996,7 @@ Summary:
 - Average relevance: 6.75
 Token Impact:
 - Distractor tokens: ~45 (historical_context)
 - Potential savings: 45 tokens (11% of prompt)
 ```
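The distractor summary above is simple arithmetic over per-part scores and token counts. Here is a minimal sketch assuming those values are already collected into dictionaries; `distractor_report` is a hypothetical name. Only the 3/10 score for `historical_context`, its ~45 tokens, and the 6.75 average come from the example. The other per-part scores and token counts are invented so that the figures reproduce the example.

```python
def distractor_report(scores: dict[str, int], tokens: dict[str, int], threshold: int = 5) -> dict:
    """Split prompt parts at the distractor threshold (score < threshold) and estimate savings."""
    distractors = [part for part, score in scores.items() if score < threshold]
    total_tokens = sum(tokens.values())
    distractor_tokens = sum(tokens[part] for part in distractors)
    return {
        "distractors": distractors,
        "high_relevance": len(scores) - len(distractors),
        "average_relevance": sum(scores.values()) / len(scores),
        "savings_pct": round(100 * distractor_tokens / total_tokens),
    }


# Illustrative inputs: values other than historical_context's are made up.
report = distractor_report(
    {"background": 7, "code_style_rules": 9, "historical_context": 3, "output_format": 8},
    {"background": 60, "code_style_rules": 200, "historical_context": 45, "output_format": 100},
)
print(report)
```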
@@ -980,6 +1021,7 @@ OPTIMIZATION_RECOMMENDATIONS:
    Savings: ~15 tokens
 OPTIMIZED PROMPT STRUCTURE:
 - background (condensed): 8 tokens
 - code_style_rules (with snake_case added): 52 tokens
 - output_format: 28 tokens
@@ -991,13 +1033,14 @@ OPTIMIZED PROMPT STRUCTURE:
 The default threshold of 5 balances comprehensiveness against efficiency:
-| Threshold | Use Case |
-|-----------|----------|
-| < 3 | Aggressive pruning for token-constrained contexts |
-| < 5 | Standard optimization (recommended default) |
-| < 7 | Conservative pruning for critical prompts |
+| Threshold | Use Case                                          |
+| --------- | ------------------------------------------------- |
+| < 3       | Aggressive pruning for token-constrained contexts |
+| < 5       | Standard optimization (recommended default)       |
+| < 7       | Conservative pruning for critical prompts         |
 Adjust threshold based on:
 - **Context budget pressure**: Lower threshold when approaching limits
 - **Task criticality**: Higher threshold for production prompts
 - **Prompt stability**: Lower threshold for experimental prompts
@@ -1008,14 +1051,16 @@ For efficiency, parallelize scoring agents:
 ```markdown
 # Parallel execution pattern
 spawn_parallel([
-  scoring_agent(part_1, task_description),
-  scoring_agent(part_2, task_description),
-  scoring_agent(part_3, task_description),
-  ...
+scoring_agent(part_1, task_description),
+scoring_agent(part_2, task_description),
+scoring_agent(part_3, task_description),
+...
 ])
 # Collect and aggregate
 scores = await_all(scoring_agents)
 analysis = aggregate_scores(scores)
 ```
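`spawn_parallel` and `await_all` in this hunk are pseudocode. One way to realize the same fan-out/fan-in pattern is with Python's standard `asyncio.gather`, which is swapped in here as an illustration. `scoring_agent` is a hypothetical stub that returns canned scores, standing in for a model call that would apply the SCORING_CRITERIA.

```python
import asyncio

# Hypothetical canned scores standing in for real model calls.
FAKE_SCORES = {"background": 7, "historical_context": 3}


async def scoring_agent(part_id: str, task_description: str) -> tuple[str, int]:
    # A real agent would prompt a model with the part and task, then parse RELEVANCE_SCORE.
    await asyncio.sleep(0)  # yield control, as a network call would
    return part_id, FAKE_SCORES.get(part_id, 5)


async def score_all(part_ids: list[str], task_description: str) -> dict[str, int]:
    # gather() runs all scoring coroutines concurrently and returns results in input order.
    results = await asyncio.gather(
        *(scoring_agent(part_id, task_description) for part_id in part_ids)
    )
    return dict(results)


scores = asyncio.run(
    score_all(["background", "historical_context", "output_format"], "Review a pull request")
)
print(scores)  # {'background': 7, 'historical_context': 3, 'output_format': 5}
```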
@@ -1052,30 +1097,35 @@ Analyze the recent conversation history for signs of context degradation.
 Check for these degradation symptoms:
 LOST_IN_MIDDLE:
 - [ ] Agent missing instructions from early in conversation
 - [ ] Critical constraints being ignored
 - [ ] Agent asking for information already provided
 CONTEXT_POISONING:
 - [ ] Same error appearing repeatedly
 - [ ] Agent referencing incorrect information as fact
 - [ ] Hallucinations that persist despite correction
 CONTEXT_DISTRACTION:
 - [ ] Responses becoming unfocused
 - [ ] Agent using irrelevant context inappropriately
 - [ ] Quality declining on previously-successful tasks
 CONTEXT_CONFUSION:
 - [ ] Agent mixing up different task requirements
 - [ ] Wrong tool selections for obvious tasks
 - [ ] Outputs that blend requirements from different tasks
 CONTEXT_CLASH:
 - [ ] Agent expressing uncertainty about conflicting information
 - [ ] Inconsistent behavior between turns
 - [ ] Agent asking for clarification on resolved issues
-</SYMPTOM_CHECKLIST>
+      </SYMPTOM_CHECKLIST>
 <OUTPUT_FORMAT>
 HEALTH_STATUS: [HEALTHY | DEGRADED | CRITICAL]
@@ -1091,10 +1141,11 @@ Based on health status, trigger appropriate intervention:
 ```markdown
 IF HEALTH_STATUS == "DEGRADED" or HEALTH_STATUS == "CRITICAL":
-  <RESTART_INTERVENTION>
-  1. Extract essential state to preserve and save to a file
-  2. Ask user to start a new session with clean context and load the preserved state from the file after the new session is started
-  </RESTART_INTERVENTION>
+<RESTART_INTERVENTION>
+1. Extract essential state to preserve and save to a file
+2. Ask user to start a new session with clean context and load the preserved state from the file after the new session is started
+   </RESTART_INTERVENTION>
 ```
 ## Guidelines for Multi-Agent Verification
@@ -1153,16 +1204,19 @@ Observation masking replaces verbose tool outputs with compact references. The i
 Not all observations should be masked equally:
 **Never mask:**
 - Observations critical to current task
 - Observations from the most recent turn
 - Observations used in active reasoning
 **Consider masking:**
 - Observations from 3+ turns ago
 - Verbose outputs with key points extractable
 - Observations whose purpose has been served
 **Always mask:**
 - Repeated outputs
 - Boilerplate headers/footers
 - Outputs already summarized in conversation
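The three masking tiers can be read as a decision function. This is a sketch under assumptions the hunk does not state. `Observation` and `masking_decision` are hypothetical names. `current_turn` is taken to be the most recent completed turn. When tiers overlap (for example, a repeated output from the most recent turn), "never mask" wins. Stripping boilerplate headers and footers is a partial edit rather than a whole-observation decision, so it is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    turn: int          # turn that produced the tool output
    critical: bool     # needed for the current task or active reasoning
    repeated: bool     # duplicate of an earlier output
    summarized: bool   # already summarized in the conversation


def masking_decision(obs: Observation, current_turn: int) -> str:
    age = current_turn - obs.turn
    if obs.critical or age == 0:
        return "keep"        # never mask: critical, or from the most recent turn
    if obs.repeated or obs.summarized:
        return "mask"        # always mask: repeats and already-summarized outputs
    if age >= 3:
        return "candidate"   # consider masking: 3+ turns old
    return "keep"


print(masking_decision(Observation(turn=1, critical=False, repeated=False, summarized=False), 5))  # candidate
```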
@@ -1176,6 +1230,7 @@ This approach achieves separation of concerns--the detailed search context remai
 **When to Partition**
 Consider partitioning when:
 - Task naturally decomposes into independent subtasks
 - Different subtasks require different specialized context
 - Context accumulation threatens to exceed limits
@@ -1183,22 +1238,24 @@ Consider partitioning when:
 **Result Aggregation**
 Aggregate results from partitioned subtasks by:
 1. Validating all partitions completed
 2. Merging compatible results
 3. Summarizing if combined results still too large
 4. Resolving conflicts between partition outputs
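The four aggregation steps can be sketched for the simple case where each partition returns key/value findings. That representation is an assumption, and `aggregate_partitions` is a hypothetical helper. The sketch only surfaces conflicts and flags oversized results. How to resolve a conflict, or how to summarize, is left to the caller because the hunk does not specify it.

```python
from typing import Optional


def aggregate_partitions(
    results: dict[str, Optional[dict[str, str]]], max_chars: int
) -> dict[str, object]:
    # Step 1: validate that every partition completed (None marks a missing result).
    missing = sorted(name for name, result in results.items() if result is None)
    if missing:
        raise RuntimeError(f"partitions did not complete: {missing}")
    merged: dict[str, str] = {}
    conflicts: dict[str, list[str]] = {}
    for result in results.values():
        for key, value in result.items():
            if key in merged and merged[key] != value:
                # Step 4: record disagreeing values so they can be resolved explicitly.
                conflicts.setdefault(key, [merged[key]]).append(value)
            else:
                # Step 2: merge compatible (new or identical) findings.
                merged.setdefault(key, value)
    # Step 3: flag for summarization if the combined result is still too large.
    size = sum(len(value) for value in merged.values())
    return {"merged": merged, "conflicts": conflicts, "needs_summary": size > max_chars}
```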
 ## Practical Guidance
 ### Optimization Decision Framework
 **When to optimize:**
 - Response quality degrades as conversations extend
 - Costs increase due to long contexts
 - Latency increases with conversation length
 **What to apply:**
 - Tool outputs dominate: observation masking
 - Retrieved documents dominate: summarization or partitioning
 - Message history dominates: compaction with summarization
@@ -1208,29 +1265,41 @@ Aggregate results from partitioned subtasks by:
 **Command Optimization**
 Commands load on-demand, so focus on keeping individual commands focused:
 ```markdown
 # Good: Focused command with clear scope
 ---
 name: review-security
 description: Review code for security vulnerabilities
 ---
 # Specific security review instructions only
 # Avoid: Overloaded command trying to do everything
 ---
 name: review-all
 description: Review code for everything
 ---
 # 50 different review checklists crammed together
 ```
 **Skill Optimization**
 Skills load their descriptions by default, so descriptions must be concise:
 ```markdown
 # Good: Concise description
 description: Analyze code architecture. Use for design reviews.
 # Avoid: Verbose description that wastes context budget
 description: This skill provides comprehensive analysis of code
 architecture including but not limited to class hierarchies,
 dependency graphs, coupling metrics, cohesion analysis...
@@ -1238,12 +1307,15 @@ dependency graphs, coupling metrics, cohesion analysis...
 **Sub-Agent Context Design**
 When spawning sub-agents, provide focused context:
 ```markdown
 # Coordinator provides minimal handoff:
 "Review authentication module for security issues.
 Return findings in structured format."
 # NOT this verbose handoff:
 "I need you to look at the authentication module which is
 located in src/auth/ and contains several files including
 login.ts, session.ts, tokens.ts... [500 more tokens of context]"