loki-mode 4.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +21 -0
- package/README.md +691 -0
- package/SKILL.md +191 -0
- package/VERSION +1 -0
- package/autonomy/.loki/dashboard/index.html +2634 -0
- package/autonomy/CONSTITUTION.md +508 -0
- package/autonomy/README.md +201 -0
- package/autonomy/config.example.yaml +152 -0
- package/autonomy/loki +526 -0
- package/autonomy/run.sh +3636 -0
- package/bin/loki-mode.js +26 -0
- package/bin/postinstall.js +60 -0
- package/docs/ACKNOWLEDGEMENTS.md +234 -0
- package/docs/COMPARISON.md +325 -0
- package/docs/COMPETITIVE-ANALYSIS.md +333 -0
- package/docs/INSTALLATION.md +547 -0
- package/docs/auto-claude-comparison.md +276 -0
- package/docs/cursor-comparison.md +225 -0
- package/docs/dashboard-guide.md +355 -0
- package/docs/screenshots/README.md +149 -0
- package/docs/screenshots/dashboard-agents.png +0 -0
- package/docs/screenshots/dashboard-tasks.png +0 -0
- package/docs/thick2thin.md +173 -0
- package/package.json +48 -0
- package/references/advanced-patterns.md +453 -0
- package/references/agent-types.md +243 -0
- package/references/agents.md +1043 -0
- package/references/business-ops.md +550 -0
- package/references/competitive-analysis.md +216 -0
- package/references/confidence-routing.md +371 -0
- package/references/core-workflow.md +275 -0
- package/references/cursor-learnings.md +207 -0
- package/references/deployment.md +604 -0
- package/references/lab-research-patterns.md +534 -0
- package/references/mcp-integration.md +186 -0
- package/references/memory-system.md +467 -0
- package/references/openai-patterns.md +647 -0
- package/references/production-patterns.md +568 -0
- package/references/prompt-repetition.md +192 -0
- package/references/quality-control.md +437 -0
- package/references/sdlc-phases.md +410 -0
- package/references/task-queue.md +361 -0
- package/references/tool-orchestration.md +691 -0
- package/skills/00-index.md +120 -0
- package/skills/agents.md +249 -0
- package/skills/artifacts.md +174 -0
- package/skills/github-integration.md +218 -0
- package/skills/model-selection.md +125 -0
- package/skills/parallel-workflows.md +526 -0
- package/skills/patterns-advanced.md +188 -0
- package/skills/production.md +292 -0
- package/skills/quality-gates.md +180 -0
- package/skills/testing.md +149 -0
- package/skills/troubleshooting.md +109 -0
package/package.json
ADDED
@@ -0,0 +1,48 @@
{
  "name": "loki-mode",
  "version": "4.2.0",
  "description": "Multi-agent autonomous startup system for Claude Code",
  "keywords": [
    "claude",
    "claude-code",
    "ai",
    "agent",
    "autonomous",
    "startup",
    "multi-agent"
  ],
  "homepage": "https://github.com/asklokesh/loki-mode",
  "bugs": {
    "url": "https://github.com/asklokesh/loki-mode/issues"
  },
  "repository": {
    "type": "git",
    "url": "git+https://github.com/asklokesh/loki-mode.git"
  },
  "license": "MIT",
  "author": "Lokesh",
  "bin": {
    "loki": "./autonomy/loki",
    "loki-mode": "./bin/loki-mode.js"
  },
  "files": [
    "SKILL.md",
    "VERSION",
    "autonomy/",
    "skills/",
    "references/",
    "docs/",
    "bin/"
  ],
  "scripts": {
    "postinstall": "node bin/postinstall.js",
    "test": "echo \"Tests run via autonomy/run.sh\""
  },
  "engines": {
    "node": ">=16.0.0"
  },
  "os": [
    "darwin",
    "linux"
  ]
}
package/references/advanced-patterns.md
ADDED
@@ -0,0 +1,453 @@
# Advanced Agentic Patterns Reference

Research-backed patterns from the 2025-2026 literature for enhanced multi-agent orchestration.

---

## Memory Architecture (MIRIX/A-Mem/MemGPT Research)

### Three-Layer Memory System

```
+------------------------------------------------------------------+
| EPISODIC MEMORY (Specific Events)                                |
| - What happened, when, where                                     |
| - Full interaction traces with timestamps                        |
| - Stored in: .loki/memory/episodic/                              |
+------------------------------------------------------------------+
| SEMANTIC MEMORY (Generalized Knowledge)                          |
| - Abstracted patterns and facts                                  |
| - Context-independent knowledge                                  |
| - Stored in: .loki/memory/semantic/                              |
+------------------------------------------------------------------+
| PROCEDURAL MEMORY (Learned Skills)                               |
| - How to do things                                               |
| - Successful action sequences                                    |
| - Stored in: .loki/memory/skills/                                |
+------------------------------------------------------------------+
```
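
As a minimal sketch of how a note could be routed into the layer directories above (the `save_note` helper and one-JSON-file-per-note layout are illustrative assumptions, not the package's actual persistence code):

```python
import json
import time
from pathlib import Path

# Hypothetical mapping from memory layer to the directories shown above.
LAYER_DIRS = {
    "episodic": ".loki/memory/episodic",
    "semantic": ".loki/memory/semantic",
    "procedural": ".loki/memory/skills",
}

def save_note(layer: str, note: dict, root: str = ".") -> Path:
    """Write a note as a timestamped JSON file in its layer's directory."""
    target = Path(root) / LAYER_DIRS[layer]
    target.mkdir(parents=True, exist_ok=True)
    path = target / f"{int(time.time() * 1000)}.json"
    path.write_text(json.dumps(note, indent=2))
    return path

# An episodic trace lands in .loki/memory/episodic/
save_note("episodic", {"task_id": "T-1", "outcome": "passed tests"})
```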

### Episodic-to-Semantic Consolidation

**Protocol:** After completing tasks, consolidate specific experiences into general knowledge.

```python
def consolidate_memory(task_result):
    """
    Transform episodic (what happened) to semantic (how things work).
    Based on MemGPT and Voyager patterns.
    """
    # 1. Store raw episodic trace
    episodic_entry = {
        "timestamp": now(),
        "task_id": task_result.id,
        "context": task_result.context,
        "actions": task_result.action_log,
        "outcome": task_result.outcome,
        "errors": task_result.errors
    }
    save_to_episodic(episodic_entry)

    # 2. Extract generalizable patterns
    if task_result.success:
        pattern = extract_pattern(task_result)
        if pattern.is_generalizable():
            semantic_entry = {
                "pattern": pattern.description,
                "conditions": pattern.when_to_apply,
                "actions": pattern.steps,
                "confidence": pattern.success_rate,
                "source_episodes": [task_result.id]
            }
            save_to_semantic(semantic_entry)

    # 3. If error, create anti-pattern
    if task_result.errors:
        anti_pattern = {
            "what_failed": task_result.errors[0].message,
            "why_failed": analyze_root_cause(task_result),
            "prevention": generate_prevention_rule(task_result),
            "severity": classify_severity(task_result.errors)
        }
        save_to_learnings(anti_pattern)
```

### Zettelkasten-Inspired Note Linking (A-Mem Pattern)

Each memory note is atomic and linked to related notes:

```json
{
  "id": "note-2026-01-06-001",
  "content": "Express route handlers need explicit return types in strict mode",
  "type": "semantic",
  "links": [
    {"to": "note-2026-01-05-042", "relation": "derived_from"},
    {"to": "note-2026-01-06-003", "relation": "related_to"}
  ],
  "tags": ["typescript", "express", "strict-mode"],
  "confidence": 0.95,
  "usage_count": 12
}
```
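
A sketch of how such links might be followed at retrieval time; the in-memory `notes` index and breadth-first traversal are illustrative assumptions, not the package's actual retrieval code:

```python
from collections import deque

# Illustrative in-memory index: note id -> note dict, shaped like the JSON above.
notes = {
    "note-2026-01-06-001": {
        "content": "Express route handlers need explicit return types in strict mode",
        "links": [{"to": "note-2026-01-05-042", "relation": "derived_from"}],
    },
    "note-2026-01-05-042": {"content": "tsconfig strict-mode learnings", "links": []},
}

def related_notes(start_id, max_depth=2):
    """Collect note ids reachable from start_id by following links breadth-first."""
    seen = {start_id}
    queue = deque([(start_id, 0)])
    found = []
    while queue:
        note_id, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for link in notes.get(note_id, {}).get("links", []):
            if link["to"] not in seen:
                seen.add(link["to"])
                found.append(link["to"])
                queue.append((link["to"], depth + 1))
    return found

print(related_notes("note-2026-01-06-001"))  # ['note-2026-01-05-042']
```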

---

## Multi-Agent Reflexion (MAR Pattern)

### Problem: Degeneration-of-Thought

Single-agent self-critique leads to repeating the same flawed reasoning across iterations.

### Solution: Structured Debate Among Persona-Based Critics

```
+------------------+     +------------------+     +------------------+
| IMPLEMENTER      |     | SKEPTIC          |     | ADVOCATE         |
| (Creates work)   | --> | (Challenges it)  | --> | (Defends merits) |
+------------------+     +------------------+     +------------------+
         |                      |                      |
         v                      v                      v
+------------------------------------------------------------------+
| SYNTHESIZER                                                      |
| - Weighs all perspectives                                        |
| - Identifies valid concerns vs. false negatives                  |
| - Produces final verdict with evidence                           |
+------------------------------------------------------------------+
```
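
Read as code, one debate round might look like the sketch below; the `ask` callable standing in for an LLM invocation and the persona prompts are assumptions for illustration:

```python
PERSONAS = {
    "skeptic": "Challenge this work. List concrete flaws:\n{work}",
    "advocate": "Defend the merits of this work against critique:\n{work}",
}

def debate_round(work, ask):
    """Run skeptic and advocate over the implementer's work, then synthesize.

    `ask` is a stand-in: a callable taking a prompt string and returning text.
    """
    views = {name: tpl.format(work=work) for name, tpl in PERSONAS.items()}
    views = {name: ask(prompt) for name, prompt in views.items()}
    synthesis_prompt = (
        "Weigh these perspectives, separate valid concerns from false "
        "negatives, and give a final verdict with evidence.\n"
        + "\n".join(f"[{name}] {view}" for name, view in views.items())
    )
    return ask(synthesis_prompt)

# Toy usage with a 'model' that just echoes the head of its prompt:
print(debate_round("def add(a, b): return a - b", lambda p: p[:40]))
```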

### Anti-Sycophancy Protocol (CONSENSAGENT)

**Problem:** Agents reinforce each other's responses instead of critically engaging.

**Solution:**

```python
def anti_sycophancy_review(implementation, reviewers):
    """
    Prevent reviewers from just agreeing with each other.
    Based on CONSENSAGENT research.
    """
    # 1. Independent review phase (no visibility of other reviews)
    independent_reviews = []
    for reviewer in reviewers:
        review = reviewer.review(
            implementation,
            visibility="blind",  # Cannot see other reviews
            prompt_suffix="Be skeptical. List specific concerns."
        )
        independent_reviews.append(review)

    # 2. Debate phase (now reveal reviews)
    if has_disagreement(independent_reviews):
        debate_result = structured_debate(
            reviews=independent_reviews,
            max_rounds=2,
            require_evidence=True  # Must cite specific code/lines
        )
    else:
        # All agreed - run devil's advocate check
        devil_review = devil_advocate_agent.review(
            implementation,
            prompt="Find problems the other reviewers missed. Be contrarian."
        )
        independent_reviews.append(devil_review)

    # 3. Synthesize with validity check
    return synthesize_with_validity_alignment(independent_reviews)

def synthesize_with_validity_alignment(reviews):
    """
    Research shows validity-aligned reasoning most strongly predicts improvement.
    """
    findings = []
    for review in reviews:
        for concern in review.concerns:
            findings.append({
                "concern": concern.description,
                "evidence": concern.code_reference,  # Must have evidence
                "severity": concern.severity,
                "is_valid": verify_concern_is_actionable(concern)
            })

    # Filter to only valid, evidenced concerns
    return [f for f in findings if f["is_valid"] and f["evidence"]]
```

### Heterogeneous Team Composition

**Research finding:** Diverse teams outperform homogeneous ones by 4-6%.

```yaml
review_team:
  - role: "security_analyst"
    model: opus
    expertise: ["OWASP", "auth", "injection"]
    personality: "paranoid"

  - role: "performance_engineer"
    model: sonnet
    expertise: ["complexity", "caching", "async"]
    personality: "pragmatic"

  - role: "maintainability_advocate"
    model: opus
    expertise: ["SOLID", "patterns", "readability"]
    personality: "perfectionist"
```
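
A sketch of loading this config into reviewer descriptors, assuming PyYAML is available; the `Reviewer` dataclass and loader are illustrative, not the package's actual code:

```python
from dataclasses import dataclass

import yaml  # PyYAML, an assumed dependency for this sketch

@dataclass
class Reviewer:
    role: str
    model: str
    expertise: list
    personality: str

def load_review_team(path):
    """Parse review_team entries from the YAML above into Reviewer objects."""
    with open(path) as f:
        config = yaml.safe_load(f)
    return [Reviewer(**entry) for entry in config["review_team"]]

# team = load_review_team("review_team.yaml")
# Distinct models and personalities are what make the team heterogeneous.
```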

---

## Hierarchical Planning (GoalAct/TMS Patterns)

### Global Planning with Hierarchical Execution

**Research:** GoalAct achieved a 12.22% improvement in success rate using this pattern.

```
+------------------------------------------------------------------+
| GLOBAL PLANNER                                                   |
| - Maintains overall goal and strategy                            |
| - Continuously updates plan based on progress                    |
| - Decomposes into high-level skills                              |
+------------------------------------------------------------------+
                                |
                                v
+------------------------------------------------------------------+
| HIGH-LEVEL SKILLS                                                |
| - searching, coding, testing, writing, deploying                 |
| - Each skill has defined entry/exit conditions                   |
| - Reduces planning complexity at execution level                 |
+------------------------------------------------------------------+
                                |
                                v
+------------------------------------------------------------------+
| LOCAL EXECUTORS                                                  |
| - Execute specific actions within skill context                  |
| - Report progress back to global planner                         |
| - Can request skill escalation if blocked                        |
+------------------------------------------------------------------+
```

### Thought Management System (TMS)

**For long-horizon tasks:**

```python
class ThoughtManagementSystem:
    """
    Based on TMS research for long-horizon autonomous tasks.
    Enables dynamic prioritization and adaptive strategy.
    """

    def __init__(self, completion_promise):
        self.goal_hierarchy = self.decompose_goal(completion_promise)
        self.active_thoughts = PriorityQueue()
        self.completed_thoughts = []
        self.blocked_thoughts = []

    def decompose_goal(self, goal):
        """
        Hierarchical goal decomposition with self-critique.
        """
        # Level 0: Ultimate goal
        hierarchy = {"goal": goal, "subgoals": []}

        # Level 1: Phase-level subgoals
        phases = self.identify_phases(goal)
        for phase in phases:
            phase_node = {"goal": phase, "subgoals": []}

            # Level 2: Task-level subgoals
            tasks = self.identify_tasks(phase)
            for task in tasks:
                phase_node["subgoals"].append({"goal": task, "subgoals": []})

            hierarchy["subgoals"].append(phase_node)

        return hierarchy

    def iterate(self):
        """
        Single iteration with self-critique.
        """
        # 1. Select highest priority thought
        thought = self.active_thoughts.pop()

        # 2. Execute thought
        result = self.execute(thought)

        # 3. Self-critique: Did this make progress?
        critique = self.self_critique(thought, result)

        # 4. Adapt strategy based on critique
        if critique.made_progress:
            self.completed_thoughts.append(thought)
            self.generate_next_thoughts(thought, result)
        elif critique.is_blocked:
            self.blocked_thoughts.append(thought)
            self.escalate_or_decompose(thought)
        else:
            # No progress, not blocked - need different approach
            thought.attempts += 1
            thought.alternative_strategy = critique.suggested_alternative
            self.active_thoughts.push(thought)
```

---

## Iter-VF: Iterative Verification-First

**Key insight:** Verify the extracted answer only, not the whole thinking process.

```python
def iterative_verify_first(task, max_iterations=3):
    """
    Based on Iter-VF research: verify answer, maintain Markovian process.
    Avoids context overflow and error accumulation.
    """
    for iteration in range(max_iterations):
        # 1. Generate solution
        solution = generate_solution(task)

        # 2. Extract concrete answer/output
        answer = extract_answer(solution)

        # 3. Verify ONLY the answer (not reasoning chain)
        verification = verify_answer(
            answer=answer,
            spec=task.spec,
            tests=task.tests
        )

        if verification.passes:
            return solution

        # 4. Markovian retry: fresh context with just error info
        task = create_fresh_task(
            original=task,
            error=verification.error,
            attempt=iteration + 1
            # NOTE: Do NOT include previous reasoning chain
        )

    return FailedResult(task, "Max iterations reached")
```

---

## Collaboration Structures

### When to Use Each Structure

| Structure | Use When | Loki Mode Application |
|-----------|----------|----------------------|
| **Centralized** | Need consistency, single source of truth | Orchestrator for phase management |
| **Decentralized** | Need fault tolerance, parallel execution | Agent swarms for implementation |
| **Hierarchical** | Complex tasks with clear decomposition | Global planner -> Skill -> Executor |
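
As a dispatch sketch, the table's conditions could map to a structure choice like this; the predicate names and precedence order are illustrative assumptions:

```python
def choose_structure(needs_consistency, needs_fault_tolerance, decomposes_cleanly):
    """Map the conditions in the table above to a collaboration structure."""
    if decomposes_cleanly:
        return "hierarchical"   # global planner -> skill -> executor
    if needs_fault_tolerance:
        return "decentralized"  # agent swarm, no single point of failure
    if needs_consistency:
        return "centralized"    # one orchestrator owns phase state
    return "centralized"        # default to the simplest coordination model

assert choose_structure(True, False, False) == "centralized"
assert choose_structure(False, True, False) == "decentralized"
```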

### Coopetition Pattern

**Agents compete on alternatives, cooperate on consensus:**

```python
def coopetition_decision(agents, decision_point):
    """
    Competition phase: Generate diverse alternatives
    Cooperation phase: Reach consensus on best option
    """
    # COMPETITION: Each agent proposes solution independently
    proposals = []
    for agent in agents:
        proposal = agent.propose(
            decision_point,
            visibility="blind"  # No peeking at other proposals
        )
        proposals.append(proposal)

    # COOPERATION: Collaborative evaluation
    if len(set(p.approach for p in proposals)) == 1:
        # Unanimous - likely good solution
        return proposals[0]

    # Multiple approaches - structured debate
    for proposal in proposals:
        proposal.pros = evaluate_pros(proposal)
        proposal.cons = evaluate_cons(proposal)
        proposal.evidence = gather_evidence(proposal)

    # Vote with reasoning requirement
    winner = ranked_choice_vote(
        proposals,
        require_justification=True
    )

    return winner
```

---

## Progressive Complexity Escalation

**Start simple, escalate only when needed:**

```
Level 1: Single Agent, Direct Execution
    |
    +-- Success? --> Done
    |
    +-- Failure? --> Escalate
         |
         v
Level 2: Single Agent + Self-Verification Loop
    |
    +-- Success? --> Done
    |
    +-- Failure after 3 attempts? --> Escalate
         |
         v
Level 3: Multi-Agent Review
    |
    +-- Success? --> Done
    |
    +-- Persistent issues? --> Escalate
         |
         v
Level 4: Hierarchical Planning + Decomposition
    |
    +-- Success? --> Done
    |
    +-- Fundamental blocker? --> Human escalation
```
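
Read as a control loop, the ladder above might look like this sketch; the level callables and their return convention (result string on success, None on failure) are assumptions for illustration:

```python
def run_with_escalation(task, levels):
    """Walk the complexity ladder, escalating only after the current level fails."""
    for attempt_level in levels:
        result = attempt_level(task)
        if result is not None:
            return result  # Success at this level: stop escalating
    raise RuntimeError(f"Human escalation required for: {task}")

# Toy levels: only the multi-agent review level 'succeeds' here.
levels = [
    lambda t: None,               # Level 1: single agent, direct execution
    lambda t: None,               # Level 2: + self-verification loop
    lambda t: f"reviewed({t})",   # Level 3: multi-agent review
    lambda t: f"decomposed({t})", # Level 4: hierarchical planning
]
print(run_with_escalation("fix flaky test", levels))
```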

---

## Key Research Findings Summary

### What Works

1. **Heterogeneous teams** outperform homogeneous ones by 4-6%
2. **Iter-VF** (verify answer only) prevents context overflow
3. **Episodic-to-semantic consolidation** enables genuine learning
4. **Anti-sycophancy measures** (blind review, devil's advocate) improve accuracy by 30%+
5. **Global planning** with local execution improves success rate by 12%+

### What Doesn't Work

1. **Deep debate chains** - diminishing returns after 1-2 rounds
2. **Confidence visibility** - causes over-confidence cascades
3. **Full reasoning chain review** - leads to error accumulation
4. **Homogeneous reviewer teams** - miss diverse failure modes
5. **Over-engineered orchestration** - model upgrades outpace gains

---

## Sources

- [Multi-Agent Collaboration Mechanisms Survey](https://arxiv.org/abs/2501.06322)
- [CONSENSAGENT: Anti-Sycophancy Framework](https://aclanthology.org/2025.findings-acl.1141/)
- [GoalAct: Global Planning + Hierarchical Execution](https://arxiv.org/abs/2504.16563)
- [A-Mem: Agentic Memory System](https://arxiv.org/html/2502.12110v11)
- [Multi-Agent Reflexion (MAR)](https://arxiv.org/html/2512.20845)
- [Iter-VF: Iterative Verification-First](https://arxiv.org/html/2511.21734v1)
- [Awesome Agentic Patterns](https://github.com/nibzard/awesome-agentic-patterns)