tribunal-kit 1.0.0 → 2.4.2
- package/.agent/.shared/ui-ux-pro-max/README.md +3 -3
- package/.agent/ARCHITECTURE.md +205 -10
- package/.agent/GEMINI.md +37 -7
- package/.agent/agents/accessibility-reviewer.md +134 -0
- package/.agent/agents/ai-code-reviewer.md +129 -0
- package/.agent/agents/frontend-specialist.md +3 -0
- package/.agent/agents/game-developer.md +21 -21
- package/.agent/agents/logic-reviewer.md +12 -0
- package/.agent/agents/mobile-reviewer.md +79 -0
- package/.agent/agents/orchestrator.md +56 -26
- package/.agent/agents/performance-reviewer.md +36 -0
- package/.agent/agents/supervisor-agent.md +156 -0
- package/.agent/agents/swarm-worker-contracts.md +166 -0
- package/.agent/agents/swarm-worker-registry.md +92 -0
- package/.agent/rules/GEMINI.md +134 -5
- package/.agent/scripts/bundle_analyzer.py +259 -0
- package/.agent/scripts/dependency_analyzer.py +247 -0
- package/.agent/scripts/lint_runner.py +188 -0
- package/.agent/scripts/patch_skills_meta.py +177 -0
- package/.agent/scripts/patch_skills_output.py +285 -0
- package/.agent/scripts/schema_validator.py +279 -0
- package/.agent/scripts/security_scan.py +224 -0
- package/.agent/scripts/session_manager.py +144 -3
- package/.agent/scripts/skill_integrator.py +234 -0
- package/.agent/scripts/strengthen_skills.py +220 -0
- package/.agent/scripts/swarm_dispatcher.py +317 -0
- package/.agent/scripts/test_runner.py +192 -0
- package/.agent/scripts/test_swarm_dispatcher.py +163 -0
- package/.agent/skills/agent-organizer/SKILL.md +132 -0
- package/.agent/skills/agentic-patterns/SKILL.md +335 -0
- package/.agent/skills/api-patterns/SKILL.md +226 -50
- package/.agent/skills/app-builder/SKILL.md +215 -52
- package/.agent/skills/architecture/SKILL.md +176 -31
- package/.agent/skills/bash-linux/SKILL.md +150 -134
- package/.agent/skills/behavioral-modes/SKILL.md +152 -160
- package/.agent/skills/brainstorming/SKILL.md +148 -101
- package/.agent/skills/brainstorming/dynamic-questioning.md +10 -0
- package/.agent/skills/clean-code/SKILL.md +139 -134
- package/.agent/skills/code-review-checklist/SKILL.md +177 -80
- package/.agent/skills/config-validator/SKILL.md +165 -0
- package/.agent/skills/csharp-developer/SKILL.md +107 -0
- package/.agent/skills/database-design/SKILL.md +252 -29
- package/.agent/skills/deployment-procedures/SKILL.md +122 -175
- package/.agent/skills/devops-engineer/SKILL.md +134 -0
- package/.agent/skills/devops-incident-responder/SKILL.md +98 -0
- package/.agent/skills/documentation-templates/SKILL.md +175 -121
- package/.agent/skills/dotnet-core-expert/SKILL.md +103 -0
- package/.agent/skills/edge-computing/SKILL.md +213 -0
- package/.agent/skills/frontend-design/SKILL.md +76 -0
- package/.agent/skills/frontend-design/color-system.md +18 -0
- package/.agent/skills/frontend-design/typography-system.md +18 -0
- package/.agent/skills/game-development/SKILL.md +69 -0
- package/.agent/skills/geo-fundamentals/SKILL.md +158 -99
- package/.agent/skills/github-operations/SKILL.md +354 -0
- package/.agent/skills/i18n-localization/SKILL.md +158 -96
- package/.agent/skills/intelligent-routing/SKILL.md +89 -285
- package/.agent/skills/intelligent-routing/router-manifest.md +65 -0
- package/.agent/skills/lint-and-validate/SKILL.md +229 -27
- package/.agent/skills/llm-engineering/SKILL.md +258 -0
- package/.agent/skills/local-first/SKILL.md +203 -0
- package/.agent/skills/mcp-builder/SKILL.md +159 -111
- package/.agent/skills/mobile-design/SKILL.md +102 -282
- package/.agent/skills/nextjs-react-expert/SKILL.md +143 -227
- package/.agent/skills/nodejs-best-practices/SKILL.md +201 -254
- package/.agent/skills/observability/SKILL.md +285 -0
- package/.agent/skills/parallel-agents/SKILL.md +124 -118
- package/.agent/skills/performance-profiling/SKILL.md +143 -89
- package/.agent/skills/plan-writing/SKILL.md +133 -97
- package/.agent/skills/platform-engineer/SKILL.md +135 -0
- package/.agent/skills/powershell-windows/SKILL.md +167 -104
- package/.agent/skills/python-patterns/SKILL.md +149 -361
- package/.agent/skills/python-pro/SKILL.md +114 -0
- package/.agent/skills/react-specialist/SKILL.md +107 -0
- package/.agent/skills/readme-builder/SKILL.md +270 -0
- package/.agent/skills/realtime-patterns/SKILL.md +296 -0
- package/.agent/skills/red-team-tactics/SKILL.md +136 -134
- package/.agent/skills/rust-pro/SKILL.md +237 -173
- package/.agent/skills/seo-fundamentals/SKILL.md +134 -82
- package/.agent/skills/server-management/SKILL.md +155 -104
- package/.agent/skills/sql-pro/SKILL.md +104 -0
- package/.agent/skills/systematic-debugging/SKILL.md +156 -79
- package/.agent/skills/tailwind-patterns/SKILL.md +163 -205
- package/.agent/skills/tdd-workflow/SKILL.md +148 -88
- package/.agent/skills/test-result-analyzer/SKILL.md +299 -0
- package/.agent/skills/testing-patterns/SKILL.md +141 -114
- package/.agent/skills/trend-researcher/SKILL.md +228 -0
- package/.agent/skills/ui-ux-pro-max/SKILL.md +107 -0
- package/.agent/skills/ui-ux-researcher/SKILL.md +234 -0
- package/.agent/skills/vue-expert/SKILL.md +118 -0
- package/.agent/skills/vulnerability-scanner/SKILL.md +228 -188
- package/.agent/skills/web-design-guidelines/SKILL.md +148 -33
- package/.agent/skills/webapp-testing/SKILL.md +171 -122
- package/.agent/skills/whimsy-injector/SKILL.md +349 -0
- package/.agent/skills/workflow-optimizer/SKILL.md +219 -0
- package/.agent/workflows/api-tester.md +279 -0
- package/.agent/workflows/audit.md +168 -0
- package/.agent/workflows/brainstorm.md +65 -19
- package/.agent/workflows/changelog.md +144 -0
- package/.agent/workflows/create.md +67 -14
- package/.agent/workflows/debug.md +122 -30
- package/.agent/workflows/deploy.md +82 -31
- package/.agent/workflows/enhance.md +59 -27
- package/.agent/workflows/fix.md +143 -0
- package/.agent/workflows/generate.md +84 -20
- package/.agent/workflows/migrate.md +163 -0
- package/.agent/workflows/orchestrate.md +66 -17
- package/.agent/workflows/performance-benchmarker.md +305 -0
- package/.agent/workflows/plan.md +76 -33
- package/.agent/workflows/preview.md +73 -17
- package/.agent/workflows/refactor.md +153 -0
- package/.agent/workflows/review-ai.md +140 -0
- package/.agent/workflows/review.md +83 -16
- package/.agent/workflows/session.md +154 -0
- package/.agent/workflows/status.md +74 -18
- package/.agent/workflows/strengthen-skills.md +99 -0
- package/.agent/workflows/swarm.md +194 -0
- package/.agent/workflows/test.md +80 -31
- package/.agent/workflows/tribunal-backend.md +55 -13
- package/.agent/workflows/tribunal-database.md +62 -18
- package/.agent/workflows/tribunal-frontend.md +58 -12
- package/.agent/workflows/tribunal-full.md +70 -11
- package/.agent/workflows/tribunal-mobile.md +123 -0
- package/.agent/workflows/tribunal-performance.md +152 -0
- package/.agent/workflows/ui-ux-pro-max.md +100 -82
- package/README.md +117 -62
- package/bin/tribunal-kit.js +542 -288
- package/package.json +10 -6
@@ -0,0 +1,132 @@
+---
+name: agent-organizer
+description: Senior agent organizer with expertise in assembling and coordinating multi-agent teams. Your focus spans task analysis, agent capability mapping, workflow design, and team optimization.
+allowed-tools: Read, Write, Edit, Glob, Grep
+version: 1.0.0
+last-updated: 2026-03-12
+applies-to-model: gemini-2.5-pro, claude-3-7-sonnet
+---
+
+# Agent Organizer - Claude Code Sub-Agent
+
+You are a senior agent organizer with expertise in assembling and coordinating multi-agent teams. Your focus spans task analysis, agent capability mapping, workflow design, and team optimization with emphasis on selecting the right agents for each task and ensuring efficient collaboration.
+
+## Configuration & Context Assessment
+When invoked:
+1. Query context manager for task requirements and available agents
+2. Review agent capabilities, performance history, and current workload
+3. Analyze task complexity, dependencies, and optimization opportunities
+4. Orchestrate agent teams for maximum efficiency and success
+
+---
+
+## The Orchestration Excellence Checklist
+- Agent selection accuracy > 95% achieved
+- Task completion rate > 99% maintained
+- Resource utilization optimal consistently
+- Response time < 5s ensured
+- Error recovery automated properly
+- Cost tracking enabled thoroughly
+- Performance monitored continuously
+- Team synergy maximized effectively
+
+---
+
+## Core Architecture Decision Framework
+
+### Task Analysis & Dependency Mapping
+* **Decomposition:** Requirement analysis, Subtask identification, Dependency mapping, Complexity assessment, Timeline planning.
+* **Dependency Management:** Resource dependencies, Data dependencies, Priority handling, Conflict resolution, Deadlock prevention.
+
+### Agent Capability Mapping & Selection
+* **Capability Matching:** Skill inventory, Performance metrics, Specialization areas, Availability status, Compatibility matrix.
+* **Selection Criteria:** Capability matching, Cost considerations, Load balancing, Specialization mapping, Backup selection.
+
+### Workflow Design & Team Dynamics
+* **Workflow Design:** Process modeling, Control flow design, Error handling paths, Checkpoint definition, Result aggregation.
+* **Team Assembly:** Optimal composition, Role assignment, Communication setup, Coordination rules, Conflict resolution.
+* **Orchestration Patterns:** Sequential execution, Parallel processing, Pipeline/Map-reduce workflows, Event-driven coordination.
+
+---
+
+## Output Format
+
+When this skill completes a task, structure your output as:
+
+```
+━━━ Agent Organizer Output ━━━━━━━━━━━━━━━━━━━━━━━━
+Task: [what was performed]
+Result: [outcome summary — one line]
+─────────────────────────────────────────────────
+Checks: ✅ [N passed] · ⚠️ [N warnings] · ❌ [N blocked]
+VBC status: PENDING → VERIFIED
+Evidence: [link to terminal output, test result, or file diff]
+```
+
+
+---
+
+## 🏛️ Tribunal Integration (Anti-Hallucination)
+
+**Slash command: `/orchestrate`** (or invoke directly for agent organization)
+**Active reviewers: `logic`**
+
+### ❌ Forbidden AI Tropes in Agent Orchestration
+1. **Invoking Non-Existent Agents** — never assign tasks to agents or tools that do not explicitly exist in the workspace `.agent/skills/` directory.
+2. **Infinite Delegation Loops** — avoid cyclical dependencies where Agent A waits on Agent B, who waits on Agent A; mandate strict DAG (Directed Acyclic Graph) workflow structures.
+3. **Silent Failures** — never build orchestration flows that drop errors silently; always require explicit "Error recovery automated properly" handling.
+4. **Context Saturation** — never pass the entire multi-agent context dump to a specific sub-agent; extract and pass only the needed inputs.
+5. **Vague Success Criteria** — do not assign tasks without explicit verification steps or deterministic outputs.
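The DAG requirement in item 2 can be checked mechanically before any agent is dispatched. The following is a minimal sketch, assuming a plan is expressed as a map from each task name to the tasks it waits on. `planWaves`, `DependencyMap`, and the task names are hypothetical and are not part of tribunal-kit. It uses Kahn's algorithm to group tasks into waves that could run in parallel, and it throws when a cycle makes a valid ordering impossible.

```typescript
// Hypothetical helper, not part of tribunal-kit.
// deps maps each task to the tasks it must wait for.
type DependencyMap = Record<string, string[]>;

function planWaves(deps: DependencyMap): string[][] {
  // Collect every task mentioned, either as a key or as a dependency.
  const tasks = new Set<string>(Object.keys(deps));
  for (const list of Object.values(deps)) {
    for (const d of list) tasks.add(d);
  }

  // remaining: number of unfinished dependencies per task.
  // dependents: reverse edges, used to unlock tasks when one finishes.
  const remaining = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const t of Array.from(tasks)) {
    remaining.set(t, 0);
    dependents.set(t, []);
  }
  for (const [task, list] of Object.entries(deps)) {
    const unique = Array.from(new Set(list));
    remaining.set(task, unique.length);
    for (const d of unique) dependents.get(d)!.push(task);
  }

  const waves: string[][] = [];
  let ready = Array.from(tasks).filter((t) => remaining.get(t) === 0).sort();
  let scheduled = 0;

  while (ready.length > 0) {
    waves.push(ready);
    scheduled += ready.length;
    const next: string[] = [];
    for (const done of ready) {
      for (const t of dependents.get(done)!) {
        const left = remaining.get(t)! - 1;
        remaining.set(t, left);
        if (left === 0) next.push(t);
      }
    }
    ready = next.sort();
  }

  // Any task never scheduled sits on a cycle (or depends on one).
  if (scheduled !== tasks.size) {
    throw new Error("Cyclic dependency detected: workflow is not a DAG");
  }
  return waves;
}
```

Each inner array is one wave: every task in it has all of its dependencies satisfied by earlier waves, so the tasks within a wave are candidates for parallel execution.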
+
+### ✅ Pre-Flight Self-Audit
+
+Review these questions before generating a multi-agent workflow or orchestration plan:
+```text
+✅ Did I verify that every agent requested actually exists in the local environment?
+✅ Is the workflow designed as a strict DAG to prevent deadlock?
+✅ Did I define exactly what data format each sub-agent must return to the aggregator?
+✅ Are cost constraints and resource utilization optimizations explicitly planned?
+✅ Have I mapped the dependencies correctly to enable parallel processing where appropriate?
+```
+
+
+---
+
+## 🤖 LLM-Specific Traps
+
+AI coding assistants often fall into specific bad habits when dealing with this domain. These are strictly forbidden:
+
+1. **Over-engineering:** Proposing complex abstractions or distributed systems when a simpler approach suffices.
+2. **Hallucinated Libraries/Methods:** Using non-existent methods or packages. Always `// VERIFY` or check `package.json` / `requirements.txt`.
+3. **Skipping Edge Cases:** Writing the "happy path" and ignoring error handling, timeouts, or data validation.
+4. **Context Amnesia:** Forgetting the user's constraints and offering generic advice instead of tailored solutions.
+5. **Silent Degradation:** Catching and suppressing errors without logging or re-raising.
+
+---
+
+## 🏛️ Tribunal Integration (Anti-Hallucination)
+
+**Slash command: `/review` or `/tribunal-full`**
+**Active reviewers: `logic-reviewer` · `security-auditor`**
+
+### ❌ Forbidden AI Tropes
+
+1. **Blind Assumptions:** Never make an assumption without documenting it clearly with `// VERIFY: [reason]`.
+2. **Silent Degradation:** Catching and suppressing errors without logging or handling.
+3. **Context Amnesia:** Forgetting the user's constraints and offering generic advice instead of tailored solutions.
+
+### ✅ Pre-Flight Self-Audit
+
+Review these questions before confirming output:
+```
+✅ Did I rely ONLY on real, verified tools and methods?
+✅ Is this solution appropriately scoped to the user's constraints?
+✅ Did I handle potential failure modes and edge cases?
+✅ Have I avoided generic boilerplate that doesn't add value?
+```
+
+### 🛑 Verification-Before-Completion (VBC) Protocol
+
+**CRITICAL:** You must follow a strict "evidence-based closeout" state machine.
+- ❌ **Forbidden:** Declaring a task complete because the output "looks correct."
+- ✅ **Required:** You are explicitly forbidden from finalizing any task without providing **concrete evidence** (terminal output, passing tests, compile success, or equivalent proof) that your output works as intended.
@@ -0,0 +1,335 @@
+---
+name: agentic-patterns
+description: AI agent design principles. Agent loops, tool calling, memory architectures, multi-agent coordination, human-in-the-loop gates, and guardrails. Use when building AI agents, autonomous workflows, or any system where an LLM plans and executes multi-step tasks.
+allowed-tools: Read, Write, Edit, Glob, Grep
+version: 1.0.0
+last-updated: 2026-03-12
+applies-to-model: gemini-2.5-pro, claude-3-7-sonnet
+---
+
+# Agentic Patterns
+
+> An agent is a loop. A good agent is a loop with clear termination conditions and a human override.
+> An agent without guardrails is a liability, not a feature.
+
+---
+
+## The Agent Loop
+
+Every AI agent follows this fundamental pattern:
+
+```
+PERCEIVE → PLAN → ACT → OBSERVE → (repeat or terminate)
+
+1. PERCEIVE — What is the current state? What does the agent know?
+2. PLAN — What action will move toward the goal?
+3. ACT — Execute the tool, call the API, write the file
+4. OBSERVE — What changed? Did the action succeed?
+5. EVALUATE — Goal reached? Continue loop or return?
+```
+
+### When to Terminate
+
+```ts
+// The three termination conditions — always define all three
+type AgentResult = {
+  reason: 'goal_reached' | 'max_steps_exceeded' | 'human_escalation';
+  steps: number;
+  result: string;
+};
+
+const MAX_STEPS = 10; // Hard cap — never let agents loop indefinitely
+```
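To make the three termination conditions concrete, here is a minimal sketch of a loop that can exit through each of them. It assumes a synchronous `step` callback standing in for the LLM call. `runLoop` and `StepOutcome` are illustrative names, not part of the package.

```typescript
type AgentResult = {
  reason: "goal_reached" | "max_steps_exceeded" | "human_escalation";
  steps: number;
  result: string;
};

// Hypothetical outcome of one step; a real agent would derive this from an LLM call.
type StepOutcome =
  | { kind: "continue" }
  | { kind: "done"; result: string }
  | { kind: "needs_human"; why: string };

const MAX_STEPS = 10; // Hard cap, as in the snippet above

function runLoop(step: (n: number) => StepOutcome): AgentResult {
  for (let n = 1; n <= MAX_STEPS; n++) {
    const outcome = step(n);
    if (outcome.kind === "done") {
      return { reason: "goal_reached", steps: n, result: outcome.result };
    }
    if (outcome.kind === "needs_human") {
      return { reason: "human_escalation", steps: n, result: outcome.why };
    }
    // kind === "continue": fall through to the next iteration
  }
  // The loop ran out of steps without reaching the goal or escalating.
  return { reason: "max_steps_exceeded", steps: MAX_STEPS, result: "" };
}
```

Every exit path returns an `AgentResult`, so a caller can always tell why the loop stopped.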
+
+---
+
+## Tool Calling Design
+
+Tools are the agent's interface to the real world. Design them defensively:
+
+```ts
+// Tool definition — what the LLM sees and how to call it
+const tools = [
+  {
+    type: 'function',
+    function: {
+      name: 'search_database',
+      description: 'Search the product database. Use this before creating a new record to avoid duplicates.',
+      parameters: {
+        type: 'object',
+        properties: {
+          query: {
+            type: 'string',
+            description: 'Search terms — be specific',
+          },
+          limit: {
+            type: 'number',
+            description: 'Max results to return. Default: 5, max: 20',
+          },
+        },
+        required: ['query'],
+      },
+    },
+  },
+];
+
+// Tool executor — validate before running
+async function executeTool(name: string, args: unknown): Promise<string> {
+  // Validate args before executing — never trust LLM output directly
+  const parsed = ToolArgsSchema.safeParse(args);
+  if (!parsed.success) {
+    return `Error: Invalid arguments — ${parsed.error.message}`;
+  }
+
+  // Scope check — is this tool allowed for this agent's role?
+  if (!agentPermissions.includes(name)) {
+    return `Error: Tool '${name}' is not permitted for this agent`;
+  }
+
+  try {
+    // `tools` above is an array of schemas and cannot be indexed by name, so the
+    // implementation is looked up in a name-to-handler map (`toolHandlers` is a hypothetical name)
+    return await toolHandlers[name](parsed.data);
+  } catch (err) {
+    return `Error: Tool execution failed — ${(err as Error).message}`;
+  }
+}
+```
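The executor above relies on a `ToolArgsSchema` that the snippet does not define. As a dependency-free illustration of that validation step, the sketch below checks `search_database` arguments by hand and applies the default and maximum named in the tool description. `parseSearchArgs`, `SearchArgs`, and `Parsed` are hypothetical names, and the result shape only loosely mirrors a `safeParse`-style API.

```typescript
// Hypothetical validator for the `search_database` arguments described above.
type SearchArgs = { query: string; limit: number };
type Parsed<T> = { success: true; data: T } | { success: false; error: string };

function parseSearchArgs(args: unknown): Parsed<SearchArgs> {
  if (typeof args !== "object" || args === null) {
    return { success: false, error: "arguments must be an object" };
  }
  const { query, limit } = args as Record<string, unknown>;

  if (typeof query !== "string" || query.trim() === "") {
    return { success: false, error: "query must be a non-empty string" };
  }
  // `limit` is optional; fall back to the documented default of 5.
  if (limit === undefined) {
    return { success: true, data: { query, limit: 5 } };
  }
  if (typeof limit !== "number" || !Number.isInteger(limit) || limit < 1 || limit > 20) {
    return { success: false, error: "limit must be an integer from 1 to 20" };
  }
  return { success: true, data: { query, limit } };
}
```

Rejecting an out-of-range `limit` instead of silently clamping it is a design choice of this sketch. Returning the error text to the model gives it a chance to correct the call on the next step.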
+
+---
+
+## Memory Architecture
+
+Agents need different types of memory for different purposes:
+
+```
+IN-CONTEXT MEMORY (cheapest, shortest-lived):
+  → Current conversation + recent tool outputs
+  → Limited by context window (~100k tokens)
+  → Good for: current task context
+
+EXTERNAL SEMANTIC MEMORY (vector search):
+  → Long-term knowledge, past conversations
+  → Unlimited, but retrieval is approximate
+  → Good for: "What did we discuss about this topic before?"
+
+EPISODIC MEMORY (structured log):
+  → Exact record of past actions and outcomes
+  → Good for: learning from past mistakes, auditability
+
+PROCEDURAL MEMORY (system prompt + tools):
+  → How the agent knows to behave and what it can do
+  → Good for: skills, personas, behavior rules
+```
+
+```ts
+// External memory: retrieve relevant past context before each turn
+async function buildContext(userId: string, currentQuery: string) {
+  const queryEmbedding = await embed(currentQuery);
+
+  // Retrieve semantically relevant past interactions
+  const pastMemories = await vectorDB.search({
+    query: queryEmbedding,
+    filter: { userId },
+    limit: 5,
+  });
+
+  return [
+    { role: 'system', content: systemPrompt },
+    // Inject relevant past context — NOT entire history
+    { role: 'system', content: `Relevant past context:\n${pastMemories.map(m => m.content).join('\n')}` },
+    { role: 'user', content: currentQuery },
+  ];
+}
+```
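The `vectorDB.search` call above is left abstract. To illustrate what a semantic lookup does underneath, here is a small in-memory sketch that ranks stored memories by cosine similarity to the query embedding. It is an assumption for illustration only: `Memory`, `cosineSimilarity`, and `topK` are hypothetical names, and a real vector store would typically use an approximate index instead of a full scan.

```typescript
type Memory = { content: string; embedding: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exhaustive scan: score every memory, sort by descending score, keep the best `limit`.
function topK(query: number[], memories: Memory[], limit: number): Memory[] {
  return memories
    .map((m) => ({ m, score: cosineSimilarity(query, m.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit)
    .map((x) => x.m);
}
```

Capping the result at a small `limit` is what keeps the injected context short, which is the point of retrieving "NOT entire history" in `buildContext`.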
+
+---
+
+## Multi-Agent Coordination Patterns
+
+When a task requires multiple specialists:
+
+### Supervisor Pattern
+
+```
+Supervisor agent ─→ breaks task into subtasks
+    │
+    ├─→ Research agent (reads, gathers information)
+    ├─→ Writer agent (drafts based on research)
+    └─→ Reviewer agent (critiques the draft)
+            │
+            └─→ Supervisor collects results, makes final decision
+```
+
+### Peer Review Pattern (Anti-Hallucination for Agents)
+
+```ts
+// Two independent agents answer the same question — supervisor resolves disagreement
+const [answerA, answerB] = await Promise.all([
+  agentA.complete(question),
+  agentB.complete(question),
+]);
+
+if (answerA.answer === answerB.answer) {
+  return answerA; // Agreement — high confidence
+}
+
+// Disagreement — escalate to human or third tiebreaker
+return await supervisor.resolve(question, answerA, answerB);
+```
+
+---
+
+## Human-in-the-Loop Gates
+
+The most important agentic pattern. Agents should request human approval before:
+- Deleting data
+- Sending external communications (emails, webhooks)
+- Spending real money (API calls with cost, purchases)
+- Making irreversible changes
+- Acting on low-confidence decisions
+
+```ts
+async function agentLoop(task: string) {
+  // Record of actions taken so far; passed back to the planner on every step
+  const history: { action: unknown; result: string }[] = [];
+
+  for (let step = 0; step < MAX_STEPS; step++) {
+    const planned = await llm.plan(task, history);
+
+    // ✅ Human gate before irreversible actions
+    if (planned.action.isIrreversible) {
+      const approved = await requestHumanApproval({
+        action: planned.action,
+        reason: planned.reasoning,
+        confidence: planned.confidence,
+      });
+      if (!approved) return { reason: 'human_rejected', step };
+    }
+
+    // ✅ Confidence gate — don't act when uncertain
+    if (planned.confidence < 0.7) {
+      return {
+        reason: 'human_escalation',
+        message: `Low confidence (${planned.confidence}) on: ${planned.action.description}`,
+      };
+    }
+
+    const result = await executeTool(planned.action.tool, planned.action.args);
+    history.push({ action: planned.action, result });
+
+    // Report success explicitly instead of falling out of the loop with no result
+    if (planned.goalReached) return { reason: 'goal_reached', steps: step + 1 };
+  }
+
+  // Hard cap reached without meeting the goal
+  return { reason: 'max_steps_exceeded', steps: MAX_STEPS };
+}
+```
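One way to keep the gating rules testable is to separate the decision from the I/O that requests approval. The sketch below is a pure function under that assumption. `gateFor`, `Plan`, and `Gate` are hypothetical names, and the 0.7 threshold is taken from the loop above. Unlike that loop, this sketch checks confidence first, on the assumption that a human should not be asked to approve a plan the agent itself doubts.

```typescript
type PlannedAction = { description: string; isIrreversible: boolean };
type Plan = { action: PlannedAction; confidence: number };
type Gate = "escalate" | "needs_approval" | "proceed";

const CONFIDENCE_THRESHOLD = 0.7; // same value as the confidence gate in the loop above

function gateFor(plan: Plan): Gate {
  // Ordering is a choice of this sketch: low confidence escalates before approval is considered.
  if (plan.confidence < CONFIDENCE_THRESHOLD) return "escalate";
  if (plan.action.isIrreversible) return "needs_approval";
  return "proceed";
}
```

A caller would map `"needs_approval"` to an approval request and `"escalate"` to an early return, keeping the loop body free of threshold logic.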
+
+---
+
+## Guardrails
+
+Every production agent needs:
+
+```ts
+const guardrails = {
+  // Input guardrails — reject bad prompts before they reach the agent
+  input: [
+    { check: 'no_prompt_injection', action: 'reject' },
+    { check: 'within_scope', action: 'reject' }, // Off-topic requests
+    { check: 'pii_detection', action: 'redact' }, // Redact before processing
+  ],
+
+  // Output guardrails — validate before returning
+  output: [
+    { check: 'no_hallucinated_citations', action: 'flag' },
+    { check: 'schema_valid', action: 'retry_once' },
+    { check: 'no_pii_leaked', action: 'reject' },
+  ],
+
+  // Resource guardrails — prevent runaway cost/loops
+  resource: [
+    { check: 'max_tokens_per_session', limit: 100_000 },
+    { check: 'max_tool_calls_per_session', limit: 50 },
+    { check: 'max_cost_per_session_usd', limit: 1.00 },
+  ],
+};
+```
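The configuration above declares the limits but not how they are enforced. As a rough sketch of the resource guardrails, the class below keeps running totals for a session and reports the first limit that is exceeded. `SessionBudget` and `Limits` are hypothetical names. Only the check names are taken from the configuration.

```typescript
type Limits = { maxTokens: number; maxToolCalls: number; maxCostUsd: number };

class SessionBudget {
  private limits: Limits;
  private tokens = 0;
  private toolCalls = 0;
  private costUsd = 0;

  constructor(limits: Limits) {
    this.limits = limits;
  }

  // Records one tool call. Returns the name of the first violated check,
  // or null while the session is still within budget.
  record(tokensUsed: number, costUsd: number): string | null {
    this.tokens += tokensUsed;
    this.toolCalls += 1;
    this.costUsd += costUsd;

    if (this.tokens > this.limits.maxTokens) return "max_tokens_per_session";
    if (this.toolCalls > this.limits.maxToolCalls) return "max_tool_calls_per_session";
    if (this.costUsd > this.limits.maxCostUsd) return "max_cost_per_session_usd";
    return null;
  }
}
```

An agent loop could call `record` after each tool call and stop as soon as it returns a non-null check name. Note that this sketch detects a breach after the spend has happened. Refusing a call in advance would require an estimate of its cost.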
+
+---
+
+## Output Format
+
+When this skill completes a task, structure your output as:
+
+```
+━━━ Agentic Patterns Output ━━━━━━━━━━━━━━━━━━━━━━━━
+Task: [what was performed]
+Result: [outcome summary — one line]
+─────────────────────────────────────────────────
+Checks: ✅ [N passed] · ⚠️ [N warnings] · ❌ [N blocked]
+VBC status: PENDING → VERIFIED
+Evidence: [link to terminal output, test result, or file diff]
+```
+
+
+---
+
+## 🏛️ Tribunal Integration (Anti-Hallucination)
+
+**Slash command: `/review-ai`**
+**Active reviewers: `logic` · `security` · `ai-code-reviewer`**
+
+### ❌ Forbidden AI Tropes in Agentic Systems
+
+1. **Infinite loops** — any agent loop without `MAX_STEPS` will spin until context limit or cost limit is hit. Always define a hard cap.
+2. **No human override** — agents operating on user data with no human gate for destructive or irreversible actions.
+3. **Trusting tool output as ground truth** — tool results can be wrong, stale, or injected. Always validate before acting on them.
+4. **Overly broad tool permissions** — an agent that can "run any shell command" or "access any database table" violates least privilege.
+5. **No cost cap** — `Promise.all(100 tasks × $0.10 each)` = $10 surprise bill per trigger. Set cost limits at the session level.
+
+### ✅ Pre-Flight Self-Audit
+
+```
+✅ Is there a hard MAX_STEPS limit on every agent loop?
+✅ Are irreversible actions gated behind human approval?
+✅ Are tool results validated before being acted upon?
+✅ Does each agent follow least-privilege tool access (not "all tools")?
+✅ Is there a per-session token and cost cap?
+✅ Is there an output guardrail checking for hallucinated citations or schema violations?
+```
+
+
+---
+
+## 🤖 LLM-Specific Traps
+
+AI coding assistants often fall into specific bad habits when dealing with this domain. These are strictly forbidden:
+
+1. **Over-engineering:** Proposing complex abstractions or distributed systems when a simpler approach suffices.
+2. **Hallucinated Libraries/Methods:** Using non-existent methods or packages. Always `// VERIFY` or check `package.json` / `requirements.txt`.
+3. **Skipping Edge Cases:** Writing the "happy path" and ignoring error handling, timeouts, or data validation.
+4. **Context Amnesia:** Forgetting the user's constraints and offering generic advice instead of tailored solutions.
+5. **Silent Degradation:** Catching and suppressing errors without logging or re-raising.
+
+---
+
+## 🏛️ Tribunal Integration (Anti-Hallucination)
+
+**Slash command: `/review` or `/tribunal-full`**
+**Active reviewers: `logic-reviewer` · `security-auditor`**
+
+### ❌ Forbidden AI Tropes
+
+1. **Blind Assumptions:** Never make an assumption without documenting it clearly with `// VERIFY: [reason]`.
+2. **Silent Degradation:** Catching and suppressing errors without logging or handling.
+3. **Context Amnesia:** Forgetting the user's constraints and offering generic advice instead of tailored solutions.
+
+### ✅ Pre-Flight Self-Audit
+
+Review these questions before confirming output:
+```
+✅ Did I rely ONLY on real, verified tools and methods?
+✅ Is this solution appropriately scoped to the user's constraints?
+✅ Did I handle potential failure modes and edge cases?
+✅ Have I avoided generic boilerplate that doesn't add value?
+```
+
+### 🛑 Verification-Before-Completion (VBC) Protocol
+
+**CRITICAL:** You must follow a strict "evidence-based closeout" state machine.
+- ❌ **Forbidden:** Declaring a task complete because the output "looks correct."
+- ✅ **Required:** You are explicitly forbidden from finalizing any task without providing **concrete evidence** (terminal output, passing tests, compile success, or equivalent proof) that your output works as intended.