@moon791017/neo-skills 1.1.10 → 1.1.12
- package/README.md +2 -0
- package/package.json +1 -1
- package/skills/neo-agent-harness/reference/loop-engineering.md +91 -91
- package/skills/neo-agentic-design/SKILL.md +89 -0
- package/skills/neo-agentic-design/evals/eval_queries.json +58 -0
- package/skills/neo-agentic-design/evals/evals.json +27 -0
- package/skills/neo-agentic-design/references/advanced-safety.md +158 -0
- package/skills/neo-agentic-design/references/base-workflows.md +219 -0
- package/skills/neo-agentic-design/references/resilience-hitl.md +105 -0
- package/skills/neo-agentic-design/references/system-components.md +93 -0
package/README.md
CHANGED
|
@@ -59,6 +59,7 @@
|
|
|
59
59
|
| TypeScript | `neo-typescript` | Handles TypeScript, tsconfig, strict mode, generics, conditional/mapped/template literal types, ESM/CJS, and runtime boundaries. |
|
|
60
60
|
| Vue | `neo-vue` | Build, debug, refactor, or review Vue 3, SFCs, the Composition API, Pinia, Vue Router, Vite, and Vue + TypeScript. |
|
|
61
61
|
| Agent architecture | `neo-sub-agent` | Design, create, review, or convert sub-agents, custom agents, worker/reviewer/planner agents, or multi-agent workflows. |
|
|
62
|
+
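| Agent architecture | `neo-agentic-design` | Design, evaluate, or implement framework-agnostic patterns such as agent workflows, prompt chains, routing, planning, reflection, multi-agent collaboration, and memory management. |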
|
|
62
63
|
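| Text polishing | `neo-stop-slop` | Remove AI-sounding phrasing, filler words, and formulaic sentence patterns from Traditional Chinese or English text; supports documents, comments, commit messages, and PR descriptions. |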
|
|
63
64
|
|
|
64
65
|
## Installation
|
|
@@ -153,6 +154,7 @@ npx -p @moon791017/neo-skills install-system-instructions \
|
|
|
153
154
|
| Build a Vue 3 component | `neo-vue` | `Help me refactor this SFC and avoid reactivity pitfalls` |
|
|
154
155
|
| Improve the AI development workflow | `neo-agent-harness` | `Assess how reliably coding agents can collaborate in this project` |
|
|
155
156
|
| Create a sub-agent | `neo-sub-agent` | `Help me add a Codex code-reviewer sub agent` |
|
|
157
|
+
| Design an agent orchestration architecture | `neo-agentic-design` | `Help me design the topology and retry mechanism for a multi-agent customer service system` |
|
|
156
158
|
| Remove the AI tone | `neo-stop-slop` | `Make this PR description more natural and direct` |
|
|
157
159
|
|
|
158
160
|
## Development
|
package/package.json
CHANGED
|
package/skills/neo-agent-harness/reference/loop-engineering.md
CHANGED
|
@@ -2,11 +2,11 @@
|
|
|
2
2
|
|
|
3
3
|
Use this reference when designing loop architectures that automate agent-driven workflows beyond a single session.
|
|
4
4
|
|
|
5
|
-
##
|
|
5
|
+
## Relationship Between Loops and Harnesses
|
|
6
6
|
|
|
7
|
-
- Harness =
|
|
8
|
-
- Loop = harness
|
|
9
|
-
-
|
|
7
|
+
- Harness = the working environment for a single agent (guides + sensors + gates)
|
|
8
|
+
- Loop = the scheduling layer on top of the harness that lets the harness run itself
|
|
9
|
+
- Designing a loop does not replace prompts; it systematizes repetitive prompt actions
|
|
10
10
|
|
|
11
11
|
```text
|
|
12
12
|
Loop = Automations + Worktrees + Skills + Connectors + Sub-agents + State
|
|
@@ -14,136 +14,136 @@ Loop = Automations + Worktrees + Skills + Connectors + Sub-agents + State
|
|
|
14
14
|
running on top of the Harness
|
|
15
15
|
```
|
|
16
16
|
|
|
17
|
-
##
|
|
17
|
+
## Five Primitives + State
|
|
18
18
|
|
|
19
|
-
### 1. Automations
|
|
19
|
+
### 1. Automations (Heartbeat)
|
|
20
20
|
|
|
21
|
-
|
|
21
|
+
A loop without automations runs only once; automations make it repeat.
|
|
22
22
|
|
|
23
|
-
-
|
|
24
|
-
-
|
|
25
|
-
-
|
|
26
|
-
- `/loop`
|
|
23
|
+
- Schedule-driven triggers that periodically run exploration and classification.
|
|
24
|
+
- Findings go to the triage inbox; non-findings are auto-archived.
|
|
25
|
+
- Pair with skills to keep scheduled tasks maintainable—invoke `$skill-name` instead of pasting a wall of instructions.
|
|
26
|
+
- `/loop` repeats at a set frequency; `/goal` runs until a stop condition is met, with an independent model judging completion.
|
|
27
27
|
|
|
28
|
-
|
|
28
|
+
Tool mapping:
|
|
29
29
|
|
|
30
|
-
- Codex
|
|
31
|
-
- Claude Code
|
|
30
|
+
- Codex: Automations tab (select project, prompt, frequency, environment); results go to Triage inbox; `/goal` for run-until-done.
|
|
31
|
+
- Claude Code: `/loop`, `/goal`, hooks, cron, GitHub Actions.
|
|
32
32
|
|
|
33
|
-
### 2. Worktrees
|
|
33
|
+
### 2. Worktrees (Isolation)
|
|
34
34
|
|
|
35
|
-
|
|
35
|
+
Prevent file conflicts when multiple agents run in parallel.
|
|
36
36
|
|
|
37
|
-
-
|
|
38
|
-
-
|
|
39
|
-
-
|
|
37
|
+
- Each agent works in its own git worktree, sharing repo history.
|
|
38
|
+
- One agent's edits never touch another agent's checkout.
|
|
39
|
+
- Human review bandwidth is still the bottleneck—worktrees solve mechanical conflicts, but the number of agents you can run is limited by how many threads you can review simultaneously (orchestration tax).
|
|
40
40
|
|
|
41
|
-
|
|
41
|
+
Tool mapping:
|
|
42
42
|
|
|
43
|
-
- Codex
|
|
44
|
-
- Claude Code
|
|
43
|
+
- Codex: Built-in worktree per thread.
|
|
44
|
+
- Claude Code: `git worktree`, `--worktree` flag, subagent `isolation: worktree` setting.
|
|
45
45
|
|
|
46
|
-
### 3. Skills
|
|
46
|
+
### 3. Skills (Crystallized Knowledge)
|
|
47
47
|
|
|
48
|
-
|
|
48
|
+
Write repeatedly explained project context into a SKILL.md.
|
|
49
49
|
|
|
50
|
-
-
|
|
51
|
-
-
|
|
52
|
-
-
|
|
50
|
+
- Eliminate intent debt: on every cold start, an agent fills intent gaps with confident guesses. A skill externalizes intent so the agent reads it every time instead of reconstructing it.
|
|
51
|
+
- A loop without skills re-derives your entire project from scratch each cycle; a loop with skills carries forward knowledge from the last run.
|
|
52
|
+
- A skill is an authoring format; a plugin is a distribution format—package skills as plugins when sharing across repos.
|
|
53
53
|
|
|
54
|
-
|
|
54
|
+
Tool mapping:
|
|
55
55
|
|
|
56
|
-
- Codex
|
|
57
|
-
- Claude Code
|
|
56
|
+
- Codex: Agent Skills (`SKILL.md`), invoked via `$name` or `/skills`, or auto-triggered by description.
|
|
57
|
+
- Claude Code: Agent Skills (`SKILL.md`).
|
|
58
58
|
|
|
59
|
-
### 4. Plugins / Connectors
|
|
59
|
+
### 4. Plugins / Connectors (External Integration)
|
|
60
60
|
|
|
61
|
-
|
|
61
|
+
Connect external tools via MCP so the loop can act in real environments.
|
|
62
62
|
|
|
63
|
-
-
|
|
64
|
-
- Codex
|
|
65
|
-
- Plugins
|
|
63
|
+
- Can connect to issue trackers, databases, staging APIs, Slack.
|
|
64
|
+
- Both Codex and Claude Code use MCP; connectors are generally cross-tool portable.
|
|
65
|
+
- Plugins bundle connectors and skills together for one-step team installation.
|
|
66
66
|
|
|
67
|
-
|
|
67
|
+
A loop without connectors can only output suggestions; a loop with connectors can open PRs, link tickets, and ping channels directly.
|
|
68
68
|
|
|
69
|
-
### 5. Sub-agents
|
|
69
|
+
### 5. Sub-agents (Separating Generation from Verification)
|
|
70
70
|
|
|
71
|
-
|
|
71
|
+
The structural premise of a loop is separating maker from checker.
|
|
72
72
|
|
|
73
|
-
-
|
|
74
|
-
- `/goal`
|
|
75
|
-
-
|
|
76
|
-
- Sub-agents
|
|
73
|
+
- The model that writes the code grades its own work too leniently. A second agent with different instructions (sometimes a different model) catches issues the first agent convinced itself to accept.
|
|
74
|
+
- `/goal` also uses maker/checker separation under the hood—an independent small model judges whether the loop is done, rather than letting the working agent declare itself finished.
|
|
75
|
+
- Common division of labor: one explores, one implements, one verifies against spec.
|
|
76
|
+
- Sub-agents burn more tokens; spend them where a second opinion is worthwhile.
|
|
77
77
|
|
|
78
|
-
>
|
|
78
|
+
> **Responsibility boundary**: This section only covers the design rationale for why loops need maker/checker separation. For implementation details such as sub-agent definition format, instruction writing, and model selection, use the `neo-sub-agent` skill.
|
|
79
79
|
|
|
80
|
-
|
|
80
|
+
Tool mapping:
|
|
81
81
|
|
|
82
|
-
- Codex
|
|
83
|
-
- Claude Code
|
|
82
|
+
- Codex: TOML definition files under `.codex/agents/`, each with name, description, instructions, optional model, and reasoning effort.
|
|
83
|
+
- Claude Code: Subagent definitions under `.claude/agents/` + agent teams.
|
|
84
84
|
|
|
85
|
-
### 6. State
|
|
85
|
+
### 6. State (External Memory)
|
|
86
86
|
|
|
87
|
-
|
|
87
|
+
Models forget between conversations; progress must be written to the repo.
|
|
88
88
|
|
|
89
|
-
-
|
|
90
|
-
- State
|
|
89
|
+
- Format: markdown files, Linear boards, or any persistent store outside the conversation.
|
|
90
|
+
- State tracks what was done, what passed, and what remains. Every long-running agent depends on it: agents forget, repos don't.
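The state idea above can be sketched as a small JSON file in the repo. This is only an illustration: the field names `done`, `passed`, and `remaining` are assumptions chosen for the example, not something the reference prescribes.

```python
import json
from pathlib import Path


def load_state(path: Path) -> dict:
    """Return the saved loop state, or an empty state on the first run."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    # Hypothetical schema: what was attempted, what passed, what is left.
    return {"done": [], "passed": [], "remaining": []}


def record_result(state: dict, task: str, passed: bool) -> dict:
    """Mark a task as attempted and note whether its checks passed."""
    if task in state["remaining"]:
        state["remaining"].remove(task)
    if task not in state["done"]:
        state["done"].append(task)
    if passed and task not in state["passed"]:
        state["passed"].append(task)
    return state


def save_state(path: Path, state: dict) -> None:
    """Persist state to disk so the next run can pick up from it."""
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")
```

A scheduled run would call `load_state` first, work through `remaining`, and call `save_state` before it exits, so the next conversation starts from the file rather than from memory.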
|
|
91
91
|
|
|
92
|
-
##
|
|
92
|
+
## Primitives Comparison Table
|
|
93
93
|
|
|
94
|
-
|
|
|
94
|
+
| Primitive | Role in Loop | Codex | Claude Code |
|
|
95
95
|
|:--|:--|:--|:--|
|
|
96
|
-
| Automations |
|
|
97
|
-
| Worktrees |
|
|
98
|
-
| Skills |
|
|
99
|
-
| Plugins / Connectors |
|
|
100
|
-
| Sub-agents |
|
|
101
|
-
| State |
|
|
96
|
+
| Automations | Scheduled exploration and classification | Automations tab, `/goal` | `/loop`, `/goal`, hooks, cron, GitHub Actions |
|
|
97
|
+
| Worktrees | Parallel isolation | Built-in worktree per thread | `git worktree`, `--worktree`, `isolation: worktree` |
|
|
98
|
+
| Skills | Crystallized project knowledge | Agent Skills (`SKILL.md`), `$name` | Agent Skills (`SKILL.md`) |
|
|
99
|
+
| Plugins / Connectors | External tool integration | Connectors (MCP) + Plugins | MCP servers + Plugins |
|
|
100
|
+
| Sub-agents | Separating generation from verification | `.codex/agents/` TOML | `.claude/agents/` + agent teams |
|
|
101
|
+
| State | Cross-conversation progress | Markdown / Linear connector | Markdown (`AGENTS.md`, progress files) / Linear MCP |
|
|
102
102
|
|
|
103
|
-
##
|
|
103
|
+
## Example: A Complete Loop Flow
|
|
104
104
|
|
|
105
|
-
1. **Automation**
|
|
106
|
-
2. Triage skill
|
|
107
|
-
3.
|
|
108
|
-
4.
|
|
109
|
-
5.
|
|
110
|
-
6.
|
|
111
|
-
7. **Connectors**
|
|
112
|
-
8.
|
|
113
|
-
9. **
|
|
114
|
-
10.
|
|
105
|
+
1. **Automation** runs on the repo every morning; prompt invokes the triage skill.
|
|
106
|
+
2. Triage skill reads yesterday's CI failures, open issues, and recent commits.
|
|
107
|
+
3. Noteworthy findings are written to a **state file** or Linear board.
|
|
108
|
+
4. For each finding, an isolated **worktree** is created.
|
|
109
|
+
5. A **sub-agent** (maker) is sent into the worktree to draft a fix.
|
|
110
|
+
6. A second **sub-agent** (checker) reviews the draft using project **skills** and existing tests.
|
|
111
|
+
7. **Connectors** open a PR, update the ticket, and ping the channel once CI passes.
|
|
112
|
+
8. Findings that cannot be handled are sent to the triage inbox for humans.
|
|
113
|
+
9. The **state file** records what was attempted, what passed, and what remains open.
|
|
114
|
+
10. Tomorrow morning's run picks up from state.
|
|
115
115
|
|
|
116
|
-
|
|
116
|
+
You design it once; after that, you never manually prompt any step.
|
|
117
117
|
|
|
118
|
-
## Loop
|
|
118
|
+
## Three Major Loop Risks
|
|
119
119
|
|
|
120
|
-
### 1.
|
|
120
|
+
### 1. Verification Is Still on You
|
|
121
121
|
|
|
122
|
-
|
|
122
|
+
An unattended loop also makes mistakes unattended. Maker/checker separation is necessary but not sufficient—"done" is a claim, not a proof. Your job is to ship code you have confirmed works.
|
|
123
123
|
|
|
124
|
-
### 2.
|
|
124
|
+
### 2. Comprehension Debt
|
|
125
125
|
|
|
126
|
-
|
|
126
|
+
The faster a loop produces code you didn't write, the larger your understanding gap grows. Unless you read what the loop produces, comprehension debt only accelerates.
|
|
127
127
|
|
|
128
|
-
### 3.
|
|
128
|
+
### 3. Cognitive Surrender
|
|
129
129
|
|
|
130
|
-
|
|
130
|
+
When a loop runs itself, people easily stop having opinions and accept everything at face value. The same loop design, used by someone with judgment, accelerates deeply understood work; used by someone without judgment, it becomes a way to avoid understanding the work itself—same action, opposite outcomes.
|
|
131
131
|
|
|
132
|
-
###
|
|
132
|
+
### Risk Mitigation Strategies
|
|
133
133
|
|
|
134
|
-
-
|
|
135
|
-
-
|
|
136
|
-
-
|
|
137
|
-
-
|
|
138
|
-
-
|
|
134
|
+
- Periodically spot-check loop output; don't rely solely on green CI.
|
|
135
|
+
- Set output volume caps on the loop to prevent review backlog from spiraling.
|
|
136
|
+
- Record the timestamp of the last human review in the state file.
|
|
137
|
+
- Force high-risk changes (security, compliance, product scope) to exit the loop and wait for a human.
|
|
138
|
+
- Regularly feed loop error patterns back to improve the harness (agentic flywheel).
|
|
139
139
|
|
|
140
|
-
##
|
|
140
|
+
## When to Introduce a Loop vs. Stay with the Harness
|
|
141
141
|
|
|
142
|
-
|
|
|
142
|
+
| Condition | Recommendation |
|
|
143
143
|
|:--|:--|
|
|
144
|
-
|
|
|
145
|
-
| CI
|
|
146
|
-
|
|
|
147
|
-
| Maturity Level < 3 |
|
|
148
|
-
|
|
|
149
|
-
|
|
|
144
|
+
| Project lacks reliable local verification commands | Build the harness first |
|
|
145
|
+
| CI is unstable or frequently red | Fix CI first |
|
|
146
|
+
| Team has no review process for agent output | Establish a review process first |
|
|
147
|
+
| Maturity Level < 3 | Upgrade the harness first |
|
|
148
|
+
| Highly repetitive, low-risk tasks (triage, format fixes, dependency updates) | Good fit for a loop |
|
|
149
|
+
| Changes involve product scope, security, or architecture trade-offs | Not suitable for a fully automated loop |
|
|
package/skills/neo-agentic-design/SKILL.md
|
@@ -0,0 +1,89 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: neo-agentic-design
|
|
3
|
+
description: >
|
|
4
|
+
Use this skill when designing, evaluating, or implementing Agent workflows, prompt chains, routing, planning, reflection, multi-agent collaboration, memory management, or other framework-agnostic LLM orchestration patterns.
|
|
5
|
+
license: MIT
|
|
6
|
+
compatibility: No specific language runtime required; conceptual-only patterns.
|
|
7
|
+
metadata:
|
|
8
|
+
version: "1.0.0"
|
|
9
|
+
type: "conceptual-design"
|
|
10
|
+
---
|
|
11
|
+
|
|
12
|
+
# Neo Agentic Design
|
|
13
|
+
|
|
14
|
+
This skill provides architectural concepts and orchestration patterns for building LLM Agent systems. It covers 21 core design patterns categorized into four themes. The orchestration logic remains abstract and independent of specific programming languages or frameworks.
|
|
15
|
+
|
|
16
|
+
## Gotchas
|
|
17
|
+
* **Over-engineering**: Start with simple prompt chains (Chapter 1) or routing (Chapter 2). Reserve multi-agent collaboration (Chapter 7) and hierarchical networks for cases that genuinely need them, since they add token overhead.
|
|
18
|
+
* **Reflection Infinite Loops**: When implementing reflection (Chapter 4) or self-correction (Chapter 12), enforce a maximum iteration limit (e.g., 3-5 iterations) to prevent the LLM from getting stuck in an infinite loop.
|
|
19
|
+
* **Blocking Operations**: High-risk operations (such as direct database deletions or large fund transfers) must include a Human-in-the-Loop review gate (Chapter 13).
|
|
20
|
+
* **Context Pruning State Loss**: When compressing context, protect critical agent instructions from being pruned to prevent behavioral degradation.
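The iteration cap from the reflection gotcha can be sketched as follows. The `critique` and `revise` callables stand in for LLM calls and are hypothetical; the default of 3 comes from the range suggested above.

```python
from typing import Callable, Optional


def reflect(
    draft: str,
    critique: Callable[[str], Optional[str]],
    revise: Callable[[str, str], str],
    max_iterations: int = 3,
) -> tuple[str, bool]:
    """Revise a draft until the critic has no feedback or the cap is hit.

    Returns the final draft and whether the critic accepted it.
    """
    for _ in range(max_iterations):
        feedback = critique(draft)
        if feedback is None:
            return draft, True           # critic is satisfied
        draft = revise(draft, feedback)  # apply feedback and try again
    # Cap reached: the last revision is unverified, so report it as not accepted.
    return draft, False
```

Returning the accepted flag lets the caller escalate an unaccepted draft to a human reviewer instead of silently shipping it.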
|
|
21
|
+
|
|
22
|
+
## Workflow Checklist
|
|
23
|
+
Progress:
|
|
24
|
+
- [ ] Step 1: Analyze Requirements (define objectives, inputs, constraints, and complexity levels).
|
|
25
|
+
- [ ] Step 2: Select Orchestration Patterns (load corresponding reference documents based on requirements).
|
|
26
|
+
- [ ] Step 3: Plan System Components (determine memory, learning mechanisms, and protocol specifications).
|
|
27
|
+
- [ ] Step 4: Define Resilience and Safety (establish exception handling, human review gates, and input/output guardrails).
|
|
28
|
+
- [ ] Step 5: Draft Design Proposal (create system topology diagrams and describe the architecture).
|
|
29
|
+
|
|
30
|
+
## Detailed Guidelines
|
|
31
|
+
|
|
32
|
+
### Step 1 — Analyze Requirements
|
|
33
|
+
Evaluate problem complexity (Level 1, 2, or 3) and confirm:
|
|
34
|
+
1. **Latency Sensitivity**: For low-latency requirements, prioritize parallelization (Chapter 3) and routing (Chapter 2).
|
|
35
|
+
2. **Task Fragility**: For strict sequential tasks or error-prone processes, use chaining (Chapter 1) or planning (Chapter 6).
|
|
36
|
+
|
|
37
|
+
### Step 2 — Load Design Patterns (Progressive Loading)
|
|
38
|
+
Load specific reference files as needed to avoid loading all concepts at once:
|
|
39
|
+
* Base workflows (Prompt Chaining, Routing, Parallelization, Reflection, Tool Use, Planning, Multi-Agent Collaboration):
|
|
40
|
+
👉 **Load [base-workflows](references/base-workflows.md)**
|
|
41
|
+
* System infrastructure (Memory Management, Learning and Adaptation, MCP, Goal Setting and Monitoring):
|
|
42
|
+
👉 **Load [system-components](references/system-components.md)**
|
|
43
|
+
* Exception handling, HITL, RAG fact-grounding:
|
|
44
|
+
👉 **Load [resilience-hitl](references/resilience-hitl.md)**
|
|
45
|
+
* Advanced safety, evaluation, prioritization, A2A communication, exploration and discovery:
|
|
46
|
+
👉 **Load [advanced-safety](references/advanced-safety.md)**
|
|
47
|
+
|
|
48
|
+
### Step 3 — System Architecture Planning
|
|
49
|
+
The design document must clearly specify:
|
|
50
|
+
1. **State Space**: Context window management method and division of short-term and long-term memory (cognitive/procedural memory).
|
|
51
|
+
2. **Tool Boundaries**: Tool call schema protocols and sandbox rules.
|
|
52
|
+
3. **Safety Boundaries**: Specific conditions for triggering human approval (HITL) or falling back to backup models.
|
|
53
|
+
|
|
54
|
+
---
|
|
55
|
+
|
|
56
|
+
## Output Template (Agentic Architecture Design Proposal)
|
|
57
|
+
|
|
58
|
+
When presenting agent designs to users, use this template format:
|
|
59
|
+
|
|
60
|
+
```markdown
|
|
61
|
+
# Agentic System Design Proposal: [System Name]
|
|
62
|
+
|
|
63
|
+
## 1. Executive Summary
|
|
64
|
+
* **Complexity Level**: [Level 1 / Level 2 / Level 3]
|
|
65
|
+
* **Target Objective**: [System Goal]
|
|
66
|
+
* **Key Constraints**: [Constraints such as latency, cost, security, etc.]
|
|
67
|
+
|
|
68
|
+
## 2. Core Orchestration Architecture
|
|
69
|
+
* **Selected Patterns**: [e.g., Router -> Parallel Agents -> Synthesizer]
|
|
70
|
+
* **Workflow Description**: [System data flow and control flow description]
|
|
71
|
+
|
|
72
|
+
### Topology Diagram (Mermaid)
|
|
73
|
+
```mermaid
|
|
74
|
+
[Mermaid diagram representing the Agent Loop / Topology]
|
|
75
|
+
```
|
|
76
|
+
|
|
77
|
+
## 3. Reference Patterns Applied
|
|
78
|
+
* **[Pattern Name] (Chapter X)**: [Specific application and rationale in the system]
|
|
79
|
+
* **[Pattern Name] (Chapter Y)**: [Specific application and rationale in the system]
|
|
80
|
+
|
|
81
|
+
## 4. Resilience, Safety & HITL Rules
|
|
82
|
+
* **Exception Recovery**: [Handling flow for API timeouts, rate limits, and JSON formatting errors]
|
|
83
|
+
* **Human-in-the-Loop Gates**: [Conditions triggering human review]
|
|
84
|
+
* **Guardrails**: [Input filtering and output validation mechanisms]
|
|
85
|
+
|
|
86
|
+
## 5. Next Steps / Implementation Roadmap
|
|
87
|
+
1. [Step 1]
|
|
88
|
+
2. [Step 2]
|
|
89
|
+
```
|
|
package/skills/neo-agentic-design/evals/eval_queries.json
|
@@ -0,0 +1,58 @@
|
|
|
1
|
+
[
|
|
2
|
+
{
|
|
3
|
+
"query": "I need to design a system that routes incoming user queries to specialized LLM prompts depending on their category.",
|
|
4
|
+
"should_trigger": true
|
|
5
|
+
},
|
|
6
|
+
{
|
|
7
|
+
"query": "How do I implement reflection and self-correction in a multi-agent system to make it write better code?",
|
|
8
|
+
"should_trigger": true
|
|
9
|
+
},
|
|
10
|
+
{
|
|
11
|
+
"query": "Can you review my LLM orchestration workflow? It currently uses prompt chaining but has high latency.",
|
|
12
|
+
"should_trigger": true
|
|
13
|
+
},
|
|
14
|
+
{
|
|
15
|
+
"query": "I want to set up a Model Context Protocol (MCP) server for my agent so it can read local files.",
|
|
16
|
+
"should_trigger": true
|
|
17
|
+
},
|
|
18
|
+
{
|
|
19
|
+
"query": "What is the best way to handle long-term semantic memory and episodic memory in an autonomous agent?",
|
|
20
|
+
"should_trigger": true
|
|
21
|
+
},
|
|
22
|
+
{
|
|
23
|
+
"query": "Please design a pipeline workflow for generating technical reports, with a human-in-the-loop validation step.",
|
|
24
|
+
"should_trigger": true
|
|
25
|
+
},
|
|
26
|
+
{
|
|
27
|
+
"query": "How does dynamic re-prioritization work when an agent has conflicting goals?",
|
|
28
|
+
"should_trigger": true
|
|
29
|
+
},
|
|
30
|
+
{
|
|
31
|
+
"query": "Review the exception handling and recovery mechanism in my LLM agent loop.",
|
|
32
|
+
"should_trigger": true
|
|
33
|
+
},
|
|
34
|
+
{
|
|
35
|
+
"query": "I need to write a Python script that calculates the Fibonacci sequence using recursion.",
|
|
36
|
+
"should_trigger": false
|
|
37
|
+
},
|
|
38
|
+
{
|
|
39
|
+
"query": "What is the difference between supervised learning and reinforcement learning in traditional machine learning?",
|
|
40
|
+
"should_trigger": false
|
|
41
|
+
},
|
|
42
|
+
{
|
|
43
|
+
"query": "How do I configure my local PostgreSQL database on macOS?",
|
|
44
|
+
"should_trigger": false
|
|
45
|
+
},
|
|
46
|
+
{
|
|
47
|
+
"query": "Write a CSS stylesheet for a dark mode website.",
|
|
48
|
+
"should_trigger": false
|
|
49
|
+
},
|
|
50
|
+
{
|
|
51
|
+
"query": "I want to build a simple web scraper in Python using beautifulsoup4.",
|
|
52
|
+
"should_trigger": false
|
|
53
|
+
},
|
|
54
|
+
{
|
|
55
|
+
"query": "How do I write a prompt to make ChatGPT act like a professional English translator?",
|
|
56
|
+
"should_trigger": false
|
|
57
|
+
}
|
|
58
|
+
]
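One way such a query file could be used is to score a trigger predicate against the `should_trigger` labels. This is a sketch under assumptions: `keyword_trigger` is a hypothetical stand-in for whatever actually decides skill activation, and the two sample entries are copied from the file above.

```python
import json
from typing import Callable


def trigger_accuracy(queries: list[dict], should_trigger_fn: Callable[[str], bool]) -> float:
    """Fraction of queries where the predicate agrees with `should_trigger`."""
    hits = sum(1 for q in queries if should_trigger_fn(q["query"]) == q["should_trigger"])
    return hits / len(queries)


def keyword_trigger(query: str) -> bool:
    """Hypothetical stand-in for the real skill-selection step."""
    keywords = ("agent", "orchestration", "llm", "workflow")
    return any(word in query.lower() for word in keywords)


sample = json.loads("""[
  {"query": "Review the exception handling and recovery mechanism in my LLM agent loop.", "should_trigger": true},
  {"query": "Write a CSS stylesheet for a dark mode website.", "should_trigger": false}
]""")
print(trigger_accuracy(sample, keyword_trigger))  # prints 1.0
```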
|
|
package/skills/neo-agentic-design/evals/evals.json
|
@@ -0,0 +1,27 @@
|
|
|
1
|
+
{
|
|
2
|
+
"skill_name": "neo-agentic-design",
|
|
3
|
+
"evals": [
|
|
4
|
+
{
|
|
5
|
+
"id": 1,
|
|
6
|
+
"prompt": "Design an agentic system that generates monthly financial reports. It must parse raw transaction data, categorize expenses, draft a report, let a human reviewer approve/edit the draft, and then output a final PDF. Minimize latency and ensure high accuracy.",
|
|
7
|
+
"expected_output": "An Agentic System Design Proposal containing Routing, Chaining, and Human-in-the-Loop patterns, structured with the standard output template.",
|
|
8
|
+
"assertions": [
|
|
9
|
+
"The output starts with 'Agentic System Design Proposal' or matches the template format",
|
|
10
|
+
"The proposal mentions Routing, Chaining, and Human-in-the-Loop patterns",
|
|
11
|
+
"The proposal contains a Mermaid sequence or flowchart diagram representing the topology",
|
|
12
|
+
"The proposal lists specific Gotchas or risks like latency and cost control"
|
|
13
|
+
]
|
|
14
|
+
},
|
|
15
|
+
{
|
|
16
|
+
"id": 2,
|
|
17
|
+
"prompt": "I need to design a system that reviews incoming code commits for potential security vulnerabilities and performance bottlenecks. It needs to check thousands of commits daily and must fail safely if any analysis tool crashes.",
|
|
18
|
+
"expected_output": "An Agentic System Design Proposal containing Parallelization, Routing, Guardrails, and Exception Recovery patterns, structured with the standard output template.",
|
|
19
|
+
"assertions": [
|
|
20
|
+
"The proposal includes Parallelization and Exception Recovery patterns",
|
|
21
|
+
"The proposal provides a Mermaid topology diagram showing parallel evaluation and a merge point",
|
|
22
|
+
"The proposal includes specific Exception Handling rules for crashed analysis tools",
|
|
23
|
+
"The proposal includes Guardrails policies for input/output sanitization"
|
|
24
|
+
]
|
|
25
|
+
}
|
|
26
|
+
]
|
|
27
|
+
}
|
|
package/skills/neo-agentic-design/references/advanced-safety.md
|
@@ -0,0 +1,158 @@
|
|
|
1
|
+
# Advanced Execution, Guardrails & Safety
|
|
2
|
+
|
|
3
|
+
This document provides conceptual designs for advanced execution, guardrails, and safety patterns, covering agent-to-agent (A2A) communication, resource-aware optimization, reasoning techniques, guardrails, evaluation and monitoring, prioritization, and scientific exploration.
|
|
4
|
+
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## Chapter 15: Inter-Agent Communication (A2A)
|
|
8
|
+
|
|
9
|
+
### 1. Definition
|
|
10
|
+
An open agent communication protocol across frameworks and technology stacks. Uses standard HTTP and JSON-RPC formats to enable agent declaration, task delegation, and data exchange across different networks.
|
|
11
|
+
|
|
12
|
+
### 2. Core Components
|
|
13
|
+
* **Agent Card**: A JSON declaration containing the agent name, version, endpoint URL, multimodal capabilities, and skills.
|
|
14
|
+
* **Task Mechanism**: Defines collaboration as a "Task" with a lifecycle state (Submitted, Working, Completed, Failed), tracked using a `contextId` for multi-turn conversation context.
|
|
15
|
+
* **Communication Modes**:
|
|
16
|
+
* **Synchronous**: Direct invocation with immediate response.
|
|
17
|
+
* **Asynchronous Polling**: Submit a task to obtain a Task ID and periodically query status.
|
|
18
|
+
* **Streaming (SSE)**: Receive partial outputs in real time via Server-Sent Events.
|
|
19
|
+
* **Webhook**: Actively push notifications to a specified URL upon task completion.
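The task lifecycle and the asynchronous polling mode can be sketched with a toy in-memory agent. Nothing here talks to a real A2A endpoint: the card fields are simplified and are not the official schema, the URL is a placeholder, and `InMemoryAgent` is a hypothetical stand-in for a remote agent.

```python
from enum import Enum


class TaskState(Enum):
    SUBMITTED = "submitted"
    WORKING = "working"
    COMPLETED = "completed"
    FAILED = "failed"


# Illustrative card only; field names are simplified assumptions.
AGENT_CARD = {
    "name": "report-writer",
    "version": "0.1.0",
    "url": "https://agents.example.com/report-writer",
    "skills": ["summarize", "draft-report"],
}


class InMemoryAgent:
    """Toy remote agent: each status query advances the task one step."""

    _NEXT = {
        TaskState.SUBMITTED: TaskState.WORKING,
        TaskState.WORKING: TaskState.COMPLETED,
    }

    def __init__(self) -> None:
        self._tasks: dict[str, TaskState] = {}

    def submit(self, task_id: str) -> str:
        self._tasks[task_id] = TaskState.SUBMITTED
        return task_id

    def status(self, task_id: str) -> TaskState:
        state = self._tasks[task_id]
        self._tasks[task_id] = self._NEXT.get(state, state)
        return state


def poll_until_done(agent: InMemoryAgent, task_id: str, max_polls: int = 10) -> TaskState:
    """Asynchronous polling: query status until a terminal state or the poll cap."""
    state = TaskState.SUBMITTED
    for _ in range(max_polls):
        state = agent.status(task_id)
        if state in (TaskState.COMPLETED, TaskState.FAILED):
            break
    return state
```

A real client would sleep between polls and send the queries over HTTP; the poll cap plays the same role as a timeout.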
|
|
20
|
+
|
|
21
|
+
### 3. Problems Addressed
|
|
22
|
+
* Heterogeneous framework silos: Solves communication barriers between different agent frameworks.
|
|
23
|
+
* Distributed collaboration barriers: Enables agents on different servers to safely delegate tasks.
|
|
24
|
+
|
|
25
|
+
---
|
|
26
|
+
|
|
27
|
+
## Chapter 16: Resource-Aware Optimization
|
|
28
|
+
|
|
29
|
+
### 1. Definition
|
|
30
|
+
Monitors computation, latency, and financial costs (tokens/API calls) in real time during agent execution. Dynamically switches between models with different capabilities or prunes context based on budget and latency constraints.
|
|
31
|
+
|
|
32
|
+
### 2. Problems Addressed
|
|
33
|
+
* API cost overruns: Avoids using expensive reasoning models for simple queries.
|
|
34
|
+
* Rate limits and overload: Executes fallbacks and backup plans when the primary model is limited or overloaded.
|
|
35
|
+
|
|
36
|
+
### 3. Workflow
|
|
37
|
+
```mermaid
|
|
38
|
+
graph TD
|
|
39
|
+
Query[User Query] --> Router{Router LLM}
|
|
40
|
+
Router -->|1. Simple Query| CheapModel[Lightweight Model]
|
|
41
|
+
Router -->|2. Complex Reasoning| ExpensiveModel[High-tier Reasoning Model]
|
|
42
|
+
Router -->|3. Real-time Info| SearchTool[Real-time Search Tool]
|
|
43
|
+
CheapModel --> Checker[Critique Agent: Quality Eval]
|
|
44
|
+
ExpensiveModel --> Checker
|
|
45
|
+
Checker -->|Fail| Fallback[Fallback Plan]
|
|
46
|
+
Checker -->|Pass| Output[Final Output]
|
|
47
|
+
```
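A simplified sketch of the routing flow in the diagram, assuming every model is just a callable from query to answer. It covers only the cheap and expensive branches plus the checker and fallback; the search-tool branch is left out, and all parameter names are hypothetical.

```python
from typing import Callable

Model = Callable[[str], str]


def answer(
    query: str,
    classify: Callable[[str], str],
    cheap: Model,
    expensive: Model,
    check: Callable[[str, str], bool],
    fallback: Model,
) -> str:
    """Route by complexity, verify the draft, and fall back if the check fails."""
    # Only queries classified as "complex" pay for the high-tier model.
    model = expensive if classify(query) == "complex" else cheap
    draft = model(query)
    return draft if check(query, draft) else fallback(query)
```

Passing the models in as callables keeps the routing logic independent of any particular provider or SDK.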
|
|
48
|
+
|
|
49
|
+
---
|
|
50
|
+
|
|
51
|
+
## Chapter 17: Reasoning Techniques
|
|
52
|
+
|
|
53
|
+
### 1. Definition
|
|
54
|
+
Architectural techniques that allocate more computational resources at inference time to explicitly expand the agent's thought process. Covers step-by-step decomposition, tree-search path planning, code-assisted execution, and ReAct loops.
|
|
55
|
+
|
|
56
|
+
### 2. Six Core Reasoning Patterns
|
|
57
|
+
* **Chain of Thought (CoT)**: Guides the model to reason step-by-step to decompose complex problems.
|
|
58
|
+
* **Tree of Thoughts (ToT)**: Represents the reasoning space as a tree, supporting backtracking and multi-path parallel evaluation.
|
|
59
|
+
* **Reasoning and Acting (ReAct)**: Interleaves tool execution with reasoning steps (Thought -> Action -> Observation -> Thought ... -> Finish).
|
|
60
|
+
* **Program-Aided Language Models (PALMs)**: Offloads precise mathematical calculations to a secure code sandbox and interprets the results to eliminate calculation hallucinations.
|
|
61
|
+
* **Multi-Agent Debate (Chain/Graph of Debates)**: Employs multiple agents to debate a problem across several turns, using consensus or strong logical conclusions as the final answer.
|
|
62
|
+
* **Scaling Inference Law**: Uses multi-path generation, self-correction, or extended thinking paths during the inference stage, allowing smaller models to achieve performance comparable to a single generation of a larger model.
|
|
63
|
+
|
|
64
|
+
---
|
|
65
|
+
|
|
66
|
+
## Chapter 18: Guardrails & Safety Patterns
|
|
67
|
+
|
|
68
|
+
### 1. Definition
|
|
69
|
+
Deploys multiple layers of filtering and defense at the input, tool execution, and output stages to ensure system compliance, safety, and protection against jailbreak attacks, prompt injection, and tool privilege escalation.
|
|
70
|
+
|
|
71
|
+
### 2. Multi-Layer Defense Flow
|
|
72
|
+
```mermaid
|
|
73
|
+
graph TD
|
|
74
|
+
Input[User Input] --> InputGuard[1. Input Guardrails: Jailbreak/Injection Detection]
|
|
75
|
+
InputGuard -->|Violation| Block[Access Denied]
|
|
76
|
+
InputGuard -->|Safe| LLM_Core[2. Core Agent Reasoning]
|
|
77
|
+
LLM_Core -->|Call Tool| ToolCallback[3. Pre-execution Tool Validation]
|
|
78
|
+
ToolCallback -->|Reject| LLM_Core
|
|
79
|
+
ToolCallback -->|Approve| ToolExec[Tool Execution]
|
|
80
|
+
ToolExec --> OutputGen[Output Generation]
|
|
81
|
+
OutputGen --> OutputGuard[4. Output Guardrails: PII/Toxicity Filter]
|
|
82
|
+
OutputGuard -->|Safe| User[Deliver to User]
|
|
83
|
+
OutputGuard -->|Violation| Redaction[Redaction/Block/Self-Correction]
|
|
84
|
+
```
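The three guard layers in the diagram can be sketched as plain functions. The marker phrases, the tool allowlist, and the email-only redaction are deliberately naive assumptions for illustration; a production system would need far stronger detection at each layer.

```python
import re

# Hypothetical examples; real injection detection needs more than substring checks.
INJECTION_MARKERS = ("ignore previous instructions", "disregard the system prompt")
# Hypothetical least-privilege allowlist of tools the agent may call.
ALLOWED_TOOLS = {"search", "read_file"}
# Rough email pattern used as a stand-in for PII detection.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def input_guard(text: str) -> bool:
    """Layer 1: return False when the input looks like a prompt injection."""
    lowered = text.lower()
    return not any(marker in lowered for marker in INJECTION_MARKERS)


def tool_guard(tool_name: str) -> bool:
    """Layer 3: approve a tool call only if the tool is on the allowlist."""
    return tool_name in ALLOWED_TOOLS


def output_guard(text: str) -> str:
    """Layer 4: redact email addresses before the output reaches the user."""
    return EMAIL.sub("[redacted email]", text)
```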
|
|
85
|
+
|
|
86
|
+
### 3. Problems Addressed
|
|
87
|
+
* Prompt jailbreaks: Prevents users from guiding the agent to perform unauthorized or harmful actions.
|
|
88
|
+
* Privilege escalation: Follows the principle of least privilege to prevent agents from unauthorized data modification or account deletion.
|
|
89
|
+
|
|
90
|
+
---
|
|
91
|
+
|
|
92
|
+
## Chapter 19: Evaluation and Monitoring
|
|
93
|
+
|
|
94
|
+
### 1. Definition
|
|
95
|
+
Systematically measures and audits agent execution quality, trajectories, resource consumption, and drift. Because agent systems are non-deterministic, it evaluates the execution trajectory rather than only the final answer.
|
|
96
|
+
|
|
97
|
+
### 2. Three Core Evaluation Aspects
|
|
98
|
+
* **Objective Metrics Monitoring**: Logs latency, token consumption, and API success rates.
|
|
99
|
+
* **Trajectory Evaluation**: Compares action sequences with standard SOPs using exact matching, ordered matching, or unordered matching.
|
|
100
|
+
* **LLM-as-a-Judge**: Uses an independent LLM to score answers based on specific rubrics and outputs structured feedback.
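The three trajectory-matching modes can be sketched as comparisons between the actions an agent took and the expected steps. The precise definitions here are one reasonable reading, not something the text fixes: ordered matching tolerates extra steps between expected ones, and unordered matching only requires that every expected step appears.

```python
def exact_match(actual: list[str], expected: list[str]) -> bool:
    """Same steps, same order, nothing extra."""
    return actual == expected


def in_order_match(actual: list[str], expected: list[str]) -> bool:
    """Expected steps appear in order; extra steps in between are allowed."""
    remaining = iter(actual)
    # `step in remaining` consumes the iterator, so order is enforced.
    return all(step in remaining for step in expected)


def any_order_match(actual: list[str], expected: list[str]) -> bool:
    """Every expected step appears somewhere, regardless of order."""
    return set(expected) <= set(actual)
```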
|
|
101
|
+
|
|
102
|
+
### 3. Advanced Pattern: AI Contractor / Contract Pattern
|
|
103
|
+
Resolves prompt drift and responsibility ambiguity:
|
|
104
|
+
```mermaid
|
|
105
|
+
graph TD
|
|
106
|
+
User[User] -->|1. Initiate Draft Contract| Contractor[Contractor Agent]
|
|
107
|
+
Contractor -->|2. Self-Analysis & Evaluation| Analyze[Analyze clauses, scope, cost, dependencies]
|
|
108
|
+
Analyze -->|3. Negotiate Feedback| User
|
|
109
|
+
User -->|4. Approve & Sign| Execute[5. Execution: Self-test & verify]
|
|
110
|
+
Execute -->|6. Decompose Tasks| SubContracts[Sub-contracts]
|
|
111
|
+
SubContracts --> SubAgents[Sub-agents]
|
|
112
|
+
Execute -->|7. Deliver Deliverables| User
|
|
113
|
+
```
|
|
114
|
+
|
|
115
|
+
---
|
|
116
|
+
|
|
117
|
+
## Chapter 20: Prioritization
|
|
118
|
+
|
|
119
|
+
### 1. Definition
|
|
120
|
+
Sorts and dynamically schedules the execution order of multiple goals and tasks when the agent is faced with resource constraints or limited budgets.
|
|
121
|
+
|
|
122
|
+
### 2. Problems Addressed
|
|
123
|
+
* Deadlocks and lack of focus: Prevents delays in critical tasks caused by prioritizing minor ones.
|
|
124
|
+
* Inadequate crisis response: Ensures the agent can dynamically switch task context when high-priority events (e.g., safety alerts) occur.
|
|
125
|
+
|
|
126
|
+
### 3. Prioritization Metrics
|
|
127
|
+
* **Urgency**: Time sensitivity (closeness to deadline).
|
|
128
|
+
* **Importance**: Impact on accomplishing the ultimate goal.
|
|
129
|
+
* **Dependencies**: Whether the task is a prerequisite for other tasks.
|
|
130
|
+
* **Cost-Benefit Ratio**: Expected payoff relative to consumed resources.
|
|
131
|
+
|
|
132
|
+
### 4. Mechanism
|
|
133
|
+
Tasks are scored and placed in a Priority Queue, and the planner executes them in priority order. When the environmental state changes, the system recalculates weights and re-orders the queue (Dynamic Re-prioritization) or interrupts the current task.
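
A minimal sketch of this mechanism, assuming each metric is normalized to the range 0 to 1 and combined with a fixed weighted sum. The `Task` fields, the weights, and the `TaskQueue` class are illustrative assumptions, not part of the skill.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    urgency: float = 0.0       # closeness to deadline
    importance: float = 0.0    # impact on the ultimate goal
    dependency: float = 0.0    # how many other tasks this one unblocks
    cost_benefit: float = 0.0  # expected payoff relative to cost


def score(task: Task, weights=(0.4, 0.3, 0.2, 0.1)) -> float:
    """Weighted sum of the four metrics (weights are assumed, not prescribed)."""
    w_u, w_i, w_d, w_c = weights
    return (w_u * task.urgency + w_i * task.importance
            + w_d * task.dependency + w_c * task.cost_benefit)


class TaskQueue:
    def __init__(self) -> None:
        self._heap = []
        self._counter = itertools.count()  # tie-breaker so tasks are never compared

    def push(self, task: Task) -> None:
        # heapq is a min-heap, so the score is negated to pop the highest first.
        heapq.heappush(self._heap, (-score(task), next(self._counter), task))

    def pop(self) -> Task:
        return heapq.heappop(self._heap)[2]

    def reprioritize(self) -> None:
        """Re-score every queued task after the environment has changed."""
        tasks = [entry[2] for entry in self._heap]
        self._heap = []
        for task in tasks:
            self.push(task)
```

Rebuilding the heap costs O(n log n) per re-prioritization, which is usually acceptable for the small queues an agent planner holds.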
|
|
134
|
+
|
|
135
|
+
---
|
|
136
|
+
|
|
137
|
+
## Chapter 21: Exploration and Discovery
|
|
138
|
+
|
|
139
|
+
### 1. Definition
|
|
140
|
+
Enables the agent to proactively explore unknown domains (Unknown Unknowns), generate new knowledge, design experiments, and test hypotheses.
|
|
141
|
+
|
|
142
|
+
### 2. Multi-Agent Scientific Discovery Flow
|
|
143
|
+
```mermaid
|
|
144
|
+
graph TD
|
|
145
|
+
Goal[Exploration Goal] --> GenAgent[1. Generation Agent]
|
|
146
|
+
GenAgent -->|Propose Hypothesis| RefAgent[2. Reflection Agent]
|
|
147
|
+
RefAgent -->|Peer Review / Correction Suggestions| GenAgent
|
|
148
|
+
RefAgent -->|Accepted Draft| RankAgent[3. Ranking Agent]
|
|
149
|
+
RankAgent -->|Elo Tournament Debate| BestHypotheses[Select Best Hypotheses]
|
|
150
|
+
BestHypotheses --> EvoAgent[4. Evolution Agent]
|
|
151
|
+
EvoAgent -->|Concept Merging & Non-linear Exploration| AdvancedHypo[Advanced Hypotheses]
|
|
152
|
+
AdvancedHypo --> LabAgent[5. Lab Agent]
|
|
153
|
+
LabAgent -->|Execute Code/Simulation/Analysis| FinalReport[6. Final LaTeX Report]
|
|
154
|
+
```
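
The diagram does not say how the Ranking Agent turns debate outcomes into ratings. One plausible reading is the standard Elo update, sketched below; the K-factor of 32 and the function names are assumptions.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that hypothesis A wins a debate against hypothesis B under Elo."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_ratings(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return new ratings; score_a is 1.0 if A won, 0.0 if A lost, 0.5 for a draw."""
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    # B's actual and expected scores are the complements of A's.
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b
```

After enough pairwise debates, the hypotheses with the highest ratings would be the ones passed on to the Evolution Agent.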
|
|
155
|
+
|
|
156
|
+
### 3. Trade-offs
|
|
157
|
+
* **Pros**: Explores unknown topics autonomously, discovering insights that exceed human experience.
|
|
158
|
+
* **Cons**: High uncertainty and heavy token consumption; requires strict safety guardrails to prevent generating hazardous protocols.
|
|
@@ -0,0 +1,219 @@
|
|
|
1
|
+
# Base Patterns & Workflows
|
|
2
|
+
|
|
3
|
+
This document provides conceptual designs for basic agentic orchestration patterns, covering prompt chaining, routing, parallelization, reflection, tool use, planning, and multi-agent collaboration.
|
|
4
|
+
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## Chapter 1: Prompt Chaining
|
|
8
|
+
|
|
9
|
+
### 1. Definition
|
|
10
|
+
Decomposes a complex task into multiple **sequentially dependent subtasks**. The structured output of the previous step serves as the input for the next step. Each step focuses on a single, clear objective.
|
|
11
|
+
|
|
12
|
+
### 2. Problems Addressed
|
|
13
|
+
* Context dilution: Prevents the LLM from losing focus when processing large, complex tasks.
|
|
14
|
+
* Instruction drift: Avoids failures in a single prompt that contains too many rules.
|
|
15
|
+
|
|
16
|
+
### 3. Workflow
|
|
17
|
+
```mermaid
|
|
18
|
+
graph LR
|
|
19
|
+
Input[Raw Input] --> Step1[LLM Step A]
|
|
20
|
+
Step1 -->|Structured Output A| Step2[LLM Step B]
|
|
21
|
+
Step2 -->|Structured Output B| Step3[LLM Step C]
|
|
22
|
+
Step3 --> Output[Final Answer]
|
|
23
|
+
```
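
The chain above can be sketched as a list of step functions that each take and return a dictionary. The three steps here are plain-Python stand-ins for LLM calls, and all names (`run_chain`, `outline`, `draft`, `polish`) are hypothetical.

```python
from typing import Callable

Step = Callable[[dict], dict]


def run_chain(steps: list[Step], payload: dict) -> dict:
    """Feed the structured output of each step into the next one."""
    for step in steps:
        payload = step(payload)
        if not isinstance(payload, dict):
            # Fail fast so a malformed step output does not cascade downstream.
            raise TypeError(f"{step.__name__} must return a dict")
    return payload


# Stand-ins for LLM calls; a real step would prompt a model and parse its output.
def outline(payload: dict) -> dict:
    return {**payload, "outline": [f"Intro to {payload['topic']}", "Details", "Summary"]}


def draft(payload: dict) -> dict:
    return {**payload, "draft": " ".join(f"[{h}]" for h in payload["outline"])}


def polish(payload: dict) -> dict:
    return {**payload, "final": payload["draft"].strip()}
```

Because each step is an ordinary function, it can be unit-tested in isolation, which is the predictability benefit the trade-offs section describes.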
|
|
24
|
+
|
|
25
|
+
### 4. Trade-offs
|
|
26
|
+
* **Pros**: High predictability; easy to optimize prompts and perform unit testing for individual steps.
|
|
27
|
+
* **Cons**: High total latency due to sequential execution; errors in earlier steps propagate downstream (Error Cascade).
|
|
28
|
+
|
|
29
|
+
### 5. Use Cases
|
|
30
|
+
* Multi-step article generation (Outline -> Draft -> Polish -> Format).
|
|
31
|
+
* Data extraction and compliance analysis.
|
|
32
|
+
|
|
33
|
+
---
|
|
34
|
+
|
|
35
|
+
## Chapter 2: Routing
|
|
36
|
+
|
|
37
|
+
### 1. Definition
|
|
38
|
+
Dynamically redirects tasks to the most suitable execution path, specialized tool, or sub-agent based on input characteristics. Routing decisions are made by rule engines, semantic similarity, or LLM classifiers.
|
|
39
|
+
|
|
40
|
+
### 2. Problems Addressed
|
|
41
|
+
* Resource waste: Avoids using expensive, slow high-tier models for simple queries.
|
|
42
|
+
* Tool clutter: Avoids crowding too many unrelated tools into a single agent's context window.
|
|
43
|
+
|
|
44
|
+
### 3. Workflow
|
|
45
|
+
```mermaid
|
|
46
|
+
graph TD
|
|
47
|
+
Input[User Input] --> Router[Routing Classifier / LLM]
|
|
48
|
+
Router -->|Path A| AgentA[Specialized Agent A / Tool A]
|
|
49
|
+
Router -->|Path B| AgentB[Specialized Agent B / Tool B]
|
|
50
|
+
Router -->|Path C| AgentC[Specialized Agent C / Tool C]
|
|
51
|
+
```
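
A rule-engine router is the simplest of the three decision mechanisms. The sketch below uses keyword rules with a default path; the route names and keywords are made up for illustration, and a semantic or LLM classifier could replace `route` without changing the callers.

```python
# Hypothetical routes and trigger keywords; tune these for a real deployment.
ROUTES = {
    "billing": ("invoice", "refund", "charge"),
    "tech_support": ("error", "crash", "login"),
    "returns": ("return", "exchange"),
}


def route(message: str, default: str = "general") -> str:
    """Return the name of the first route whose keyword appears in the message."""
    text = message.lower()
    for name, keywords in ROUTES.items():
        if any(keyword in text for keyword in keywords):
            return name
    # No rule matched: fall back to a general-purpose path instead of guessing.
    return default
```

An explicit default path limits the damage of a routing miss, which the trade-offs below identify as the main risk of this pattern.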
|
|
52
|
+
|
|
53
|
+
### 4. Trade-offs
|
|
54
|
+
* **Pros**: High modularity; reduces average system latency and token consumption.
|
|
55
|
+
* **Cons**: Routing errors directly cause downstream task failures; an extra routing decision layer adds minor latency.
|
|
56
|
+
|
|
57
|
+
### 5. Use Cases
|
|
58
|
+
* Customer support dispatching (e.g., routing to billing, tech support, or returns agents).
|
|
59
|
+
* Pre-filtering for tool calls.
|
|
60
|
+
|
|
61
|
+
---
|
|
62
|
+
|
|
63
|
+
## Chapter 3: Parallelization
|
|
64
|
+
|
|
65
|
+
### 1. Definition
|
|
66
|
+
Splits a large task into multiple **independent subtasks** executed in parallel (Fork) and aggregates the results at a single point (Join).
|
|
67
|
+
|
|
68
|
+
### 2. Problems Addressed
|
|
69
|
+
* Cumulative linear latency: Solves the high time cost associated with sequential multi-step execution.
|
|
70
|
+
* Single-perspective limitation: Collects diverse solutions to the same problem simultaneously for synthesis.
|
|
71
|
+
|
|
72
|
+
### 3. Workflow
|
|
73
|
+
```mermaid
|
|
74
|
+
graph TD
|
|
75
|
+
Input[Raw Query] --> Splitter[Splitter]
|
|
76
|
+
Splitter --> TaskA[Parallel Task A]
|
|
77
|
+
Splitter --> TaskB[Parallel Task B]
|
|
78
|
+
Splitter --> TaskC[Parallel Task C]
|
|
79
|
+
TaskA --> Syn[Synthesis / Aggregator]
|
|
80
|
+
TaskB --> Syn
|
|
81
|
+
TaskC --> Syn
|
|
82
|
+
Syn --> Output[Aggregated Output]
|
|
83
|
+
```
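
A minimal fork/join sketch using the standard library's thread pool. The two checks are trivial stand-ins for independent LLM or tool calls, and their names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


# Stand-ins for independent subtasks; real ones would call an LLM or a linter.
def check_security(code: str) -> dict:
    return {"check": "security", "issues": ["uses eval"] if "eval(" in code else []}


def check_style(code: str) -> dict:
    too_long = any(len(line) > 79 for line in code.splitlines())
    return {"check": "style", "issues": ["line too long"] if too_long else []}


def analyze(code: str, checks) -> dict:
    """Fork: run every check concurrently. Join: merge results into one report."""
    with ThreadPoolExecutor(max_workers=len(checks)) as pool:
        futures = [pool.submit(check, code) for check in checks]
        results = [future.result() for future in futures]  # blocks until all finish
    return {result["check"]: result["issues"] for result in results}
```

Capping `max_workers` is one simple way to stay under the API rate limits mentioned in the trade-offs.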
|
|
84
|
+
|
|
85
|
+
### 4. Trade-offs
|
|
86
|
+
* **Pros**: Significantly reduces elapsed time; suitable for large-scale parallel filtering.
|
|
87
|
+
* **Cons**: High spikes in token usage, easily triggering API rate limits; reconciling inconsistent results requires additional algorithms or LLM overhead.
|
|
88
|
+
|
|
89
|
+
### 5. Use Cases
|
|
90
|
+
* Static code analysis (checking security, performance, and style simultaneously).
|
|
91
|
+
* Large-scale information retrieval and cross-document comparison.
|
|
92
|
+
|
|
93
|
+
---
|
|
94
|
+
|
|
95
|
+
## Chapter 4: Reflection (Self-Correction)
|
|
96
|
+
|
|
97
|
+
### 1. Definition
|
|
98
|
+
Introduces a dual-entity feedback mechanism: a Generator and a Critic. The Generator produces an initial draft, the Critic evaluates it for quality and provides feedback, and the Generator iteratively refines the output until termination conditions are met.
|
|
99
|
+
|
|
100
|
+
### 2. Problems Addressed
|
|
101
|
+
* Unstable output quality: Prevents logical gaps, factual errors, or formatting anomalies.
|
|
102
|
+
* Overconfidence: Counters the blind spots of single-turn generation through an independent critique mechanism.
|
|
103
|
+
|
|
104
|
+
### 3. Workflow
|
|
105
|
+
```mermaid
|
|
106
|
+
graph TD
|
|
107
|
+
Input[Task Goal] --> Gen[Generator]
|
|
108
|
+
Gen --> Draft[Initial Draft]
|
|
109
|
+
Draft --> Critic[Critic / Evaluator]
|
|
110
|
+
Critic --> Decision{Is Acceptable?}
|
|
111
|
+
Decision -->|No| Feedback[Feedback/Suggestions]
|
|
112
|
+
Feedback -->|Guide Correction| Gen
|
|
113
|
+
Decision -->|Yes| Output[Final Accepted Output]
|
|
114
|
+
```
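
The loop above can be sketched with two callables and a hard round limit. The names `reflect`, `generate`, and `critique` are hypothetical, and the convention that the critic returns `None` to accept is an assumption of this sketch.

```python
from typing import Callable, Optional


def reflect(
    generate: Callable[[Optional[str]], str],
    critique: Callable[[str], Optional[str]],
    max_rounds: int = 3,
):
    """Run generate -> critique until the critic accepts or rounds run out.

    Returns (draft, accepted). The critic returns None to accept,
    or a feedback string to request another revision.
    """
    feedback = None
    draft = ""
    for _ in range(max_rounds):
        draft = generate(feedback)
        feedback = critique(draft)
        if feedback is None:
            return draft, True
    # Round limit reached: return the last draft and flag it as unaccepted.
    return draft, False
```

The `max_rounds` cap is the termination condition that guards against the infinite-loop risk listed in the trade-offs.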
|
|
115
|
+
|
|
116
|
+
### 4. Trade-offs
|
|
117
|
+
* **Pros**: Highly stable output quality, significantly reducing logical and formatting errors.
|
|
118
|
+
* **Cons**: Higher token consumption; extended execution time; potential for infinite loops if termination conditions are poorly defined.
|
|
119
|
+
|
|
120
|
+
### 5. Use Cases
|
|
121
|
+
* Automated code generation and testing (write code -> run tests -> fix based on errors -> re-test).
|
|
122
|
+
* Strict compliance document drafting.
|
|
123
|
+
|
|
124
|
+
---
|
|
125
|
+
|
|
126
|
+
## Chapter 5: Tool Use / Function Calling
|
|
127
|
+
|
|
128
|
+
### 1. Definition
|
|
129
|
+
The LLM reads the schema that describes each external tool, autonomously decides when to call a tool, and generates the parameters. The agent executes the tool in a sandbox or external system and feeds the results back to the LLM for interpretation.
|
|
130
|
+
|
|
131
|
+
### 2. Problems Addressed
|
|
132
|
+
* Information lag: Connects the model to real-time data.
|
|
133
|
+
* Lack of computation: Solves difficulties in mathematics and precise logical operations.
|
|
134
|
+
* Inability to affect external systems: Allows agents to send emails, write to databases, or call APIs.
|
|
135
|
+
|
|
136
|
+
### 3. Workflow
|
|
137
|
+
```mermaid
|
|
138
|
+
sequenceDiagram
|
|
139
|
+
participant U as User
|
|
140
|
+
participant L as LLM (Core Reasoning)
|
|
141
|
+
participant A as Agent Execution Sandbox
|
|
142
|
+
participant T as External Tool / API
|
|
143
|
+
U->>L: Query
|
|
144
|
+
L->>L: Identify context, decide to use clock tool
|
|
145
|
+
L-->>A: Return tool name & structured parameters
|
|
146
|
+
A->>T: Call external API
|
|
147
|
+
T-->>A: Return real-time data
|
|
148
|
+
A-->>L: Send execution result back as context
|
|
149
|
+
L->>L: Synthesize and reason
|
|
150
|
+
L->>U: Respond to user
|
|
151
|
+
```
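
The agent-side half of this sequence (receive a tool name and parameters, validate, execute, return a result as context) might look like the sketch below. The registry layout, the tool names, and the `{"name": ..., "arguments": ...}` call format are assumptions for illustration, not any specific vendor's API.

```python
import json
from datetime import datetime, timezone

# Hypothetical tool registry: parameter names mapped to expected Python types.
TOOLS = {
    "add": {"parameters": {"a": int, "b": int}, "handler": lambda a, b: a + b},
    "utc_now": {
        "parameters": {},
        "handler": lambda: datetime.now(timezone.utc).isoformat(),
    },
}


def execute_tool_call(raw_call: str) -> dict:
    """Validate an LLM-produced tool call and run it; never raise to the caller."""
    try:
        call = json.loads(raw_call)
        tool = TOOLS[call["name"]]
        args = call.get("arguments", {})
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        return {"ok": False, "error": f"invalid tool call: {exc!r}"}
    if not isinstance(args, dict) or set(args) != set(tool["parameters"]):
        return {"ok": False, "error": f"expected parameters {sorted(tool['parameters'])}"}
    for param, expected in tool["parameters"].items():
        if not isinstance(args[param], expected):
            return {"ok": False, "error": f"parameter '{param}' must be {expected.__name__}"}
    return {"ok": True, "result": tool["handler"](**args)}
```

Returning the error as data lets the LLM see what went wrong and regenerate the parameters, which addresses the parameter-generation risk noted in the trade-offs.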
|
|
152
|
+
|
|
153
|
+
### 4. Trade-offs
|
|
154
|
+
* **Pros**: Greatly expands the action capabilities and data retrieval scope of the agent.
|
|
155
|
+
* **Cons**: Risk of parameter generation errors; security risks with external tools (requires strict sandboxing); vulnerability to external API instability.
|
|
156
|
+
|
|
157
|
+
### 5. Use Cases
|
|
158
|
+
* Real-time data queries (weather, stock market, ERP systems).
|
|
159
|
+
* Data entry and control (sending notifications, database updates).
|
|
160
|
+
|
|
161
|
+
---
|
|
162
|
+
|
|
163
|
+
## Chapter 6: Planning
|
|
164
|
+
|
|
165
|
+
### 1. Definition
|
|
166
|
+
Decomposes a high-level goal into an ordered set of dependent execution steps. The planner dynamically rewrites the remaining steps (replanning) based on environmental feedback and new information to ensure the goal is reached.
|
|
167
|
+
|
|
168
|
+
### 2. Problems Addressed
|
|
169
|
+
* Goal drift: Prevents the agent from losing sight of the ultimate goal during multi-step execution.
|
|
170
|
+
* Dynamic environment changes: Automatically searches for alternative solutions if a step fails.
|
|
171
|
+
|
|
172
|
+
### 3. Workflow
|
|
173
|
+
```mermaid
|
|
174
|
+
graph TD
|
|
175
|
+
Goal[Ultimate Goal] --> Planner[Planner: Task Decomposition]
|
|
176
|
+
Planner --> Plan[Generate Step List 1, 2, 3...]
|
|
177
|
+
Plan --> Executor[Executor: Call tools / sub-steps sequentially]
|
|
178
|
+
Executor --> EnvFeedback[Environmental Feedback]
|
|
179
|
+
EnvFeedback --> Checker{Encounter obstacles/failure?}
|
|
180
|
+
Checker -->|Yes| Replanner[Dynamic Replanner: Update plan]
|
|
181
|
+
Replanner --> Plan
|
|
182
|
+
Checker -->|No| Next{All steps completed?}
|
|
183
|
+
Next -->|No| Executor
|
|
184
|
+
Next -->|Yes| Output[Goal Accomplished]
|
|
185
|
+
```
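
A compact sketch of the plan, execute, and replan loop above. The `plan`, `execute`, and `replan` callables stand in for LLM or tool calls, and the `max_replans` budget is an assumed safeguard rather than something the diagram specifies.

```python
from typing import Callable, List


def run_plan(
    goal: str,
    plan: Callable[[str], List[str]],
    execute: Callable[[str], bool],
    replan: Callable[[str, List[str], str], List[str]],
    max_replans: int = 2,
) -> List[str]:
    """Execute steps in order; on failure, ask the replanner for new remaining steps."""
    completed: List[str] = []
    queue = list(plan(goal))
    replans = 0
    while queue:
        step = queue.pop(0)
        if execute(step):
            completed.append(step)
            continue
        if replans == max_replans:
            raise RuntimeError(f"gave up on {goal!r} after {replans} replans")
        replans += 1
        # The replanner sees what is already done and which step failed.
        queue = list(replan(goal, completed, step))
    return completed
```

Bounding the number of replans keeps the planning cost, which the trade-offs flag as very high, from growing without limit.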
|
|
186
|
+
|
|
187
|
+
### 4. Trade-offs
|
|
188
|
+
* **Pros**: Highly adaptable; capable of autonomously handling complex, unstructured tasks.
|
|
189
|
+
* **Cons**: Very high cost in LLM calls for planning and replanning; errors in the plan propagate, causing downstream actions to drift away from the target.
|
|
190
|
+
|
|
191
|
+
### 5. Use Cases
|
|
192
|
+
* Autonomous research assistants (Deep Research: dynamically selecting keywords, assessing information quality, diving deep into unknown domains).
|
|
193
|
+
* Automated software development (architecture design -> module division -> sequential development).
|
|
194
|
+
|
|
195
|
+
---
|
|
196
|
+
|
|
197
|
+
## Chapter 7: Multi-Agent Collaboration
|
|
198
|
+
|
|
199
|
+
### 1. Definition
|
|
200
|
+
Distributes a large task among multiple **specialized agents with distinct personas and skills**. These agents coordinate task handoffs, discussions, and integration through a predefined collaboration topology.
|
|
201
|
+
|
|
202
|
+
### 2. Problems Addressed
|
|
203
|
+
* Cognitive limits of a single core: Avoids overloading a single system prompt with too many instructions and roles.
|
|
204
|
+
* Unclear division of labor: Emulates human teams by dedicating specialists to specific tasks.
|
|
205
|
+
|
|
206
|
+
### 3. Workflow
|
|
207
|
+
Four main collaboration topologies:
|
|
208
|
+
* **Handoffs (Network)**: Agent A finishes its task and hands over the context and control to Agent B.
|
|
209
|
+
* **Supervisor**: A central Supervisor agent coordinates, assigns tasks to specialists, and aggregates results.
|
|
210
|
+
* **Hierarchy**: Supervisors oversee sub-supervisors, delegating and aggregating tasks hierarchically.
|
|
211
|
+
* **Blackboard**: Agents read and write to a shared state space (blackboard), intervening autonomously as the state changes.
|
|
212
|
+
|
|
213
|
+
### 4. Trade-offs
|
|
214
|
+
* **Pros**: Modular and scalable; allows mixing different model sizes/strengths to optimize costs.
|
|
215
|
+
* **Cons**: High communication overhead (multi-turn dialogues between agents); complex state management; risk of infinite discussion loops or unclear ownership.
|
|
216
|
+
|
|
217
|
+
### 5. Use Cases
|
|
218
|
+
* Simulated software development teams (Product Manager -> Architect -> Engineer -> QA).
|
|
219
|
+
* Creative content generation and peer review.
|
|
@@ -0,0 +1,105 @@
|
|
|
1
|
+
# Resilience, Exceptions & HITL
|
|
2
|
+
|
|
3
|
+
This document provides conceptual designs for system resilience, human interaction, and knowledge grounding, covering exception handling, Human-in-the-Loop (HITL) gates, and Retrieval-Augmented Generation (RAG).
|
|
4
|
+
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## Chapter 12: Exception Handling and Recovery
|
|
8
|
+
|
|
9
|
+
### 1. Definition
|
|
10
|
+
Designs automatic detection, retry, fallback, and state rollback mechanisms for exceptions that may occur during agent execution (such as API timeouts, network disconnections, LLM format errors, and invalid tool parameters).
|
|
11
|
+
|
|
12
|
+
### 2. Problems Addressed
|
|
13
|
+
* System fragility: Prevents long-cycle workflows from breaking due to transient network or API issues.
|
|
14
|
+
* Format pollution: Guides the LLM to self-heal when its output does not conform to the expected JSON schema.
|
|
15
|
+
|
|
16
|
+
### 3. Workflow
|
|
17
|
+
```mermaid
|
|
18
|
+
graph TD
|
|
19
|
+
Step[Execute Tool / Call LLM] --> Success{Successful?}
|
|
20
|
+
Success -->|Yes| Next[Proceed to Next Step]
|
|
21
|
+
Success -->|No: Exception| Detector[Exception Detector]
|
|
22
|
+
Detector --> RuleCheck{Evaluate Exception Type}
|
|
23
|
+
RuleCheck -->|Network/Timeout| Retry[Auto Retry with Backoff]
|
|
24
|
+
RuleCheck -->|Format Error| Refine[Guide LLM to Self-Correct]
|
|
25
|
+
RuleCheck -->|Tool Failure| Fallback[Route to Fallback/Alternative Tool]
|
|
26
|
+
RuleCheck -->|Critical Error| Rollback[Rollback State to Checkpoint]
|
|
27
|
+
Retry --> Step
|
|
28
|
+
Refine --> Step
|
|
29
|
+
Fallback --> Step
|
|
30
|
+
Rollback --> UserEscalation[Human Intervention]
|
|
31
|
+
```
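
Two branches of this flow, retry with backoff and fallback routing, can be sketched as follows. The exception classes and `call_with_recovery` are hypothetical names, and falling back after retries are exhausted is a design choice of this sketch rather than something the diagram prescribes.

```python
import time
from typing import Callable, Optional


class TransientError(Exception):
    """Network or timeout problem that is worth retrying."""


class ToolFailure(Exception):
    """The tool itself is broken; retrying the same call will not help."""


def call_with_recovery(
    primary: Callable[[], str],
    fallback: Optional[Callable[[], str]] = None,
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    for attempt in range(max_retries + 1):
        try:
            return primary()
        except TransientError:
            if attempt == max_retries:
                break
            sleep(base_delay * 2 ** attempt)  # exponential backoff: 0.5s, 1s, 2s, ...
        except ToolFailure:
            break  # skip retries and go straight to the fallback
    if fallback is not None:
        return fallback()
    raise RuntimeError("primary call failed and no fallback is configured")
```

Logging each retry and fallback would help avoid the problem noted below, where recovery quietly masks an underlying bug.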
|
|
32
|
+
|
|
33
|
+
### 4. Trade-offs
|
|
34
|
+
* **Pros**: Improves system robustness and reduces manual maintenance costs.
|
|
35
|
+
* **Cons**: Excessive retries or fallbacks can mask underlying bugs or quietly degrade output quality.
|
|
36
|
+
|
|
37
|
+
---
|
|
38
|
+
|
|
39
|
+
## Chapter 13: Human-in-the-Loop (HITL)
|
|
40
|
+
|
|
41
|
+
### 1. Definition
|
|
42
|
+
Strategically embeds human review, intervention, and authorization mechanisms into the agent's autonomous decision-making workflow, combining human common sense, ethics, and legal judgment with AI automation.
|
|
43
|
+
|
|
44
|
+
### 2. Problems Addressed
|
|
45
|
+
* High-risk operations: Prevents agent errors when performing large financial transactions, deleting sensitive data, or executing legally sensitive actions.
|
|
46
|
+
* Automation boundaries: Requests human guidance when decision confidence falls below a set threshold.
|
|
47
|
+
|
|
48
|
+
### 3. Three Core Interaction Modes
|
|
49
|
+
````carousel
|
|
50
|
+
### 1. Human-in-the-Loop (HITL)
|
|
51
|
+
* **Mechanism**: The agent pauses when reaching a high-risk step (e.g., large bank transfer), suspends the task, and sends it to a pending review queue.
|
|
52
|
+
* **Workflow**: Agent pauses -> Human reviews (Approve/Reject/Modify) -> Agent receives input and resumes execution.
|
|
53
|
+
* **Key Characteristic**: Human approval is a mandatory gate.
|
|
54
|
+
<!-- slide -->
|
|
55
|
+
### 2. Human-on-the-Loop (HOTL)
|
|
56
|
+
* **Mechanism**: The agent executes tasks autonomously while a human supervisor monitors and adjusts strategies.
|
|
57
|
+
* **Workflow**: Human sets macro rules (e.g., transaction limits) -> Agent trades automatically -> Human monitors metrics -> Human intervenes via a Kill Switch if necessary.
|
|
58
|
+
* **Key Characteristic**: Human does not intervene in individual decisions but maintains macro-level oversight.
|
|
59
|
+
<!-- slide -->
|
|
60
|
+
### 3. Decision Augmentation
|
|
61
|
+
* **Mechanism**: The agent acts as an analytical assistant, gathering data and presenting candidates. Decision-making and execution are performed entirely by a human.
|
|
62
|
+
* **Workflow**: Human poses a query -> Agent collects and analyzes data -> Agent proposes options A, B, and C with pros/cons -> Human selects and executes.
|
|
63
|
+
* **Key Characteristic**: Agent provides cognitive augmentation without execution authority.
|
|
64
|
+
````
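
The first mode, a mandatory approval gate, can be sketched as below. The `Action` shape, the transfer limit, and the `ask_human` callback are assumptions; a production system would persist the paused task in a review queue instead of calling the reviewer synchronously.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Tuple


class Decision(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    MODIFY = "modify"


@dataclass
class Action:
    kind: str
    amount: float


def needs_review(action: Action, limit: float = 10_000.0) -> bool:
    """Assumed risk rule: large transfers always require a human."""
    return action.kind == "transfer" and action.amount >= limit


def run_action(
    action: Action,
    execute: Callable[[Action], str],
    ask_human: Callable[[Action], Tuple[Decision, Optional[Action]]],
    limit: float = 10_000.0,
) -> Optional[str]:
    if needs_review(action, limit):
        decision, revised = ask_human(action)  # agent pauses here
        if decision is Decision.REJECT:
            return None
        if decision is Decision.MODIFY and revised is not None:
            action = revised
    return execute(action)
```

Low-risk actions skip the gate entirely, which keeps the scalability cost described in the trade-offs confined to the steps that need it.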
|
|
65
|
+
|
|
66
|
+
### 4. Trade-offs
|
|
67
|
+
* **Pros**: Provides a safety net and compliance guarantee for high-risk decisions; collects human feedback to optimize agent alignment.
|
|
68
|
+
* **Cons**: Human intervention limits system scalability and speed; designing human-in-the-loop review queues increases development costs.
|
|
69
|
+
|
|
70
|
+
---
|
|
71
|
+
|
|
72
|
+
## Chapter 14: Knowledge Retrieval / RAG
|
|
73
|
+
|
|
74
|
+
### 1. Definition
|
|
75
|
+
Retrieves relevant information from a knowledge base before the LLM generates a response, injecting the retrieved text chunks into the prompt context to guide the LLM toward producing factually grounded answers.
|
|
76
|
+
|
|
77
|
+
### 2. Advanced Agentic RAG Variants
|
|
78
|
+
```mermaid
|
|
79
|
+
graph TD
|
|
80
|
+
subgraph Traditional RAG
|
|
81
|
+
Query[User Query] --> VectorSearch[Vector Similarity Search]
|
|
82
|
+
VectorSearch --> Context[Concatenate Context Chunks]
|
|
83
|
+
Context --> LLMGen[LLM Generates Response]
|
|
84
|
+
end
|
|
85
|
+
subgraph Graph RAG
|
|
86
|
+
GQuery[User Query] --> GraphSearch[Navigate Knowledge Graph Nodes & Edges]
|
|
87
|
+
GraphSearch --> UnifiedContext[Cross-document Context Linkage]
|
|
88
|
+
end
|
|
89
|
+
subgraph Agentic RAG
|
|
90
|
+
AQuery[User Query] --> AgentLayer[Agent Decision Layer]
|
|
91
|
+
AgentLayer -->|1. Decompose Task| SubQueries[Multi-step Sub-retrieval Tasks]
|
|
92
|
+
AgentLayer -->|2. Self-Reflection| SourceVal[Source Timeliness & Quality Check]
|
|
93
|
+
AgentLayer -->|3. Resolve Conflicts| ConflictRecon[Active Conflict Reconciliation]
|
|
94
|
+
AgentLayer -->|4. Tool Call| WebSearch[Web Search for Knowledge Gaps]
|
|
95
|
+
end
|
|
96
|
+
```
|
|
97
|
+
|
|
98
|
+
### 3. Problems Addressed
|
|
99
|
+
* Outdated knowledge: Bypasses the temporal limits of static training data.
|
|
100
|
+
* Hallucination: Constrains the model to factual boundaries by grounding it in verified document contexts.
|
|
101
|
+
* Fragmented information: Addresses the limitation that plain vector search struggles to answer comprehensive questions spanning multiple documents.
|
|
102
|
+
|
|
103
|
+
### 4. Trade-offs
|
|
104
|
+
* **Pros**: Minimizes factual errors; supports precise citations; imports private knowledge without retraining models.
|
|
105
|
+
* **Cons**: Highly sensitive to the quality of text chunking and embeddings; multi-step reasoning in Agentic RAG increases response latency.
|
|
@@ -0,0 +1,93 @@
|
|
|
1
|
+
# System Components & Protocols
|
|
2
|
+
|
|
3
|
+
This document provides conceptual designs for system architecture components, resources, and protocols, covering memory management, learning and adaptation, Model Context Protocol (MCP), and goal setting and monitoring.
|
|
4
|
+
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## Chapter 8: Memory Management
|
|
8
|
+
|
|
9
|
+
### 1. Definition
|
|
10
|
+
Provides agents with the ability to store and retrieve information across sessions and tasks through persistence mechanisms. The memory system is generally divided into short-term and long-term memory, managed by a unified Memory Service.
|
|
11
|
+
|
|
12
|
+
### 2. Memory Classification
|
|
13
|
+
| Memory Type | Medium | Function | Eviction & Retrieval Mechanism |
|
|
14
|
+
| :--- | :--- | :--- | :--- |
|
|
15
|
+
| **Short-term** | Current Context Window | Stores current conversation context and task execution trajectory | Sliding window, context pruning, and summarization |
|
|
16
|
+
| **Long-term Semantic** | Vector Database / Knowledge Base | Retains factual knowledge, concepts, and external rules | Vector semantic retrieval based on user input |
|
|
17
|
+
| **Long-term Episodic** | Structured Database / Log Store | Records past task execution experiences and outcomes | Used for few-shot learning or similar scenario matching |
|
|
18
|
+
| **Long-term Procedural** | Codebase / Tool Definitions / Prompt Templates | Records Standard Operating Procedures (SOPs) and toolbox definitions for specific tasks | Dynamically loaded based on task type |
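
The short-term row of the table (sliding window plus summarization) can be sketched as follows. The class name is hypothetical, and the "summary" here is a naive truncation that stands in for a real LLM summarization call.

```python
from collections import deque


class ShortTermMemory:
    """Keep the last `max_turns` turns verbatim and fold older ones into a summary."""

    def __init__(self, max_turns: int = 4) -> None:
        self.max_turns = max_turns
        self.turns = deque()
        self.summary = ""

    def add(self, role: str, text: str) -> None:
        self.turns.append((role, text))
        while len(self.turns) > self.max_turns:
            old_role, old_text = self.turns.popleft()
            # Placeholder: a real system would ask an LLM to summarize evicted turns.
            self.summary = f"{self.summary} {old_role}: {old_text[:40]}".strip()

    def context(self) -> str:
        """Render the prompt context: summary first, then the recent turns."""
        lines = [f"{role}: {text}" for role, text in self.turns]
        if self.summary:
            lines.insert(0, f"[summary of earlier turns] {self.summary}")
        return "\n".join(lines)
```

The long-term memory types in the table would sit behind separate stores (vector database, log store, template files) and are out of scope for this sketch.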
|
|
19
|
+
|
|
20
|
+
### 3. Problems Addressed
|
|
21
|
+
* Amnesia (Context limits): Prevents long conversations from causing the LLM to lose critical history.
|
|
22
|
+
* Repeated errors: Ensures the agent learns from past executions to improve decision success rates.
|
|
23
|
+
|
|
24
|
+
---
|
|
25
|
+
|
|
26
|
+
## Chapter 9: Learning and Adaptation
|
|
27
|
+
|
|
28
|
+
### 1. Definition
|
|
29
|
+
Enables the agent to autonomously revise its prompts, or to modify its own execution code inside a code sandbox (as in SICA, the Self-Improving Coding Agent), based on behavioral feedback and rewards collected from interactions with the environment, users, or other agents.
|
|
30
|
+
|
|
31
|
+
### 2. Problems Addressed
|
|
32
|
+
* Static configuration lag: Solves the issue of agents failing to adjust when environmental rules change.
|
|
33
|
+
* High development cost: Reduces the manual effort of tuning prompts by hand.
|
|
34
|
+
|
|
35
|
+
### 3. Workflow
|
|
36
|
+
```mermaid
|
|
37
|
+
graph TD
|
|
38
|
+
Interaction[Agent-Environment Interaction] --> Result[Execution Results & Metrics]
|
|
39
|
+
Result --> evaluator[Evaluator / Scoring System]
|
|
40
|
+
evaluator -->|Feedback/Score| Learner[Learning Engine]
|
|
41
|
+
Learner -->|Self-Optimize Prompts or Refactor Code| AgentUpgrade[Upgraded Agent]
|
|
42
|
+
AgentUpgrade -->|Next Task Turn| Interaction
|
|
43
|
+
```
|
|
44
|
+
|
|
45
|
+
### 4. Trade-offs
|
|
46
|
+
* **Pros**: High potential for long-term self-evolution; can discover high-quality logic not designed by humans in specific vertical disciplines (e.g., mathematical proofs, code generation).
|
|
47
|
+
* **Cons**: Unpredictable evolution paths, which may generate harmful mutations; self-modifying prompts can lead to privilege escalation or security vulnerabilities; extremely high overhead for training and testing iterations.
|
|
48
|
+
|
|
49
|
+
---
|
|
50
|
+
|
|
51
|
+
## Chapter 10: Model Context Protocol (MCP)
|
|
52
|
+
|
|
53
|
+
### 1. Definition
|
|
54
|
+
A standardized **Client-Server communication protocol** that establishes a plug-and-play integration standard between LLMs/Agents (Clients) and external data sources, development tools, and API services (Servers). MCP standardizes three core types of context exchange: **Resources**, **Prompts**, and **Tools**.
|
|
55
|
+
|
|
56
|
+
```mermaid
|
|
57
|
+
graph LR
|
|
58
|
+
subgraph Agentic Client
|
|
59
|
+
Agent[AI Agent / LLM]
|
|
60
|
+
end
|
|
61
|
+
subgraph MCP Server
|
|
62
|
+
Res[Resources: Files/Databases]
|
|
63
|
+
Pmt[Prompts: Templates]
|
|
64
|
+
Tls[Tools: APIs/Sandboxes]
|
|
65
|
+
end
|
|
66
|
+
Agent <-->|Standard JSON-RPC 2.0| MCP_Link[MCP Protocol Layer]
|
|
67
|
+
MCP_Link <--> Res
|
|
68
|
+
MCP_Link <--> Pmt
|
|
69
|
+
MCP_Link <--> Tls
|
|
70
|
+
```
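
To make the JSON-RPC 2.0 layer concrete, the sketch below builds a request and unwraps a response by hand. `tools/call` is the MCP method for invoking a tool; the tool name `get_weather` and its arguments are hypothetical, and a real client would normally use an MCP SDK and a transport (stdio or HTTP) instead of raw JSON strings.

```python
import itertools
import json

_request_ids = itertools.count(1)


def make_request(method: str, params=None) -> str:
    """Serialize a JSON-RPC 2.0 request with a fresh numeric id."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


def parse_response(raw: str):
    """Return the `result` of a JSON-RPC 2.0 response or raise on an error object."""
    message = json.loads(raw)
    if message.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 message")
    if "error" in message:
        error = message["error"]
        raise RuntimeError(f"{error['code']}: {error['message']}")
    return message["result"]


# Hypothetical tool invocation.
request = make_request("tools/call", {"name": "get_weather", "arguments": {"city": "Taipei"}})
```

Because every server speaks the same envelope, the client code above does not change when a new tool or data source is added.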
|
|
71
|
+
|
|
72
|
+
### 2. Problems Addressed
|
|
73
|
+
* Tedious integration: Avoids repeatedly writing custom wrapper code when developing new agents or integrating new tools.
|
|
74
|
+
* Fragmented context acquisition: Provides external data and actions to the model in a unified interface format.
|
|
75
|
+
|
|
76
|
+
### 3. Trade-offs
|
|
77
|
+
* **Pros**: Reduces integration costs for multiple tools and data sources; decouples data sources from reasoning entities; supports dynamic discovery.
|
|
78
|
+
* **Cons**: Protocol serialization and JSON-RPC wrapping introduce minor performance overhead; requires tool providers to actively adopt the protocol.
|
|
79
|
+
|
|
80
|
+
---
|
|
81
|
+
|
|
82
|
+
## Chapter 11: Goal Setting and Monitoring
|
|
83
|
+
|
|
84
|
+
### 1. Definition
|
|
85
|
+
Sets structured, quantifiable goals (following SMART principles) before the agent is initialized, and introduces an independent monitor during execution to track progress in real time through Progress Checkpoints, detect blockers, and escalate to a human when necessary.
|
|
86
|
+
|
|
87
|
+
### 2. Problems Addressed
|
|
88
|
+
* Blind execution: Prevents agents from wasting budget in infinite retry loops when they encounter logical obstacles.
|
|
89
|
+
* Lack of observability: Solves the black-box execution problem, providing a clear progress path.
|
|
90
|
+
|
|
91
|
+
### 3. Use Cases
|
|
92
|
+
* Automated marketing campaign execution.
|
|
93
|
+
* Long-cycle autonomous codebase refactoring.
|