@moon791017/neo-skills 1.1.10 → 1.1.12

package/README.md CHANGED
@@ -59,6 +59,7 @@
59
59
  | TypeScript | `neo-typescript` | 處理 TypeScript、tsconfig、strict mode、泛型、conditional/mapped/template literal types、ESM/CJS 與 runtime boundaries。 |
60
60
  | Vue | `neo-vue` | 建置、除錯、重構或審查 Vue 3、SFC、Composition API、Pinia、Vue Router、Vite 與 Vue+TypeScript。 |
61
61
  | Agent 架構 | `neo-sub-agent` | 設計、建立、審查或轉換 sub-agent、custom agent、worker/reviewer/planner agent 或 multi-agent workflow。 |
62
+ | Agent 架構 | `neo-agentic-design` | 設計、評估或實作 Agent 工作流、提示詞鏈、路由、規劃、反思、多 Agent 協作與記憶體管理等框架無關模式。 |
62
63
  | 文字潤飾 | `neo-stop-slop` | 去除繁中或英文中的 AI 腔、贅詞、公式化句式,支援文件、註解、commit message 與 PR 說明。 |
63
64
 
64
65
  ## 安裝
@@ -153,6 +154,7 @@ npx -p @moon791017/neo-skills install-system-instructions \
153
154
  | 建 Vue 3 元件 | `neo-vue` | `幫我重構這個 SFC,避免響應式踩坑` |
154
155
  | 改善 AI 開發流程 | `neo-agent-harness` | `評估這個專案讓 coding agent 協作的可靠度` |
155
156
  | 建立 sub-agent | `neo-sub-agent` | `幫我新增一個 Codex code-reviewer sub agent` |
157
+ | 設計 Agent 編排架構 | `neo-agentic-design` | `幫我設計一個多 Agent 客服系統的拓撲結構與重試機制` |
156
158
  | 去掉 AI 腔 | `neo-stop-slop` | `把這段 PR 說明改得自然、直接一點` |
157
159
 
158
160
  ## 開發
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@moon791017/neo-skills",
3
- "version": "1.1.10",
3
+ "version": "1.1.12",
4
4
  "type": "module",
5
5
  "description": "Neo Skills: A Universal AI Agent Skills Extension",
6
6
  "homepage": "https://neo-blog-iota.vercel.app/",
@@ -2,11 +2,11 @@
2
2
 
3
3
  Use this reference when designing loop architectures that automate agent-driven workflows beyond a single session.
4
4
 
5
- ## Loop 與 Harness 的關係
5
+ ## Relationship Between Loops and Harnesses
6
6
 
7
- - Harness = 單一 agent 的工作環境(guides + sensors + gates)
8
- - Loop = harness 之上的排程驅動層,讓 harness 自己跑
9
- - 設計 loop 不是取代 prompt,而是把反覆的 prompt 動作系統化
7
+ - Harness = the working environment for a single agent (guides + sensors + gates)
8
+ - Loop = the scheduling layer on top of the harness that lets the harness run itself
9
+ - Designing a loop does not replace prompts; it systematizes repetitive prompt actions
10
10
 
11
11
  ```text
12
12
  Loop = Automations + Worktrees + Skills + Connectors + Sub-agents + State
@@ -14,136 +14,136 @@ Loop = Automations + Worktrees + Skills + Connectors + Sub-agents + State
14
14
  running on top of the Harness
15
15
  ```
16
16
 
17
- ## 五個基本原件 + State
17
+ ## Five Primitives + State
18
18
 
19
- ### 1. Automations(心跳)
19
+ ### 1. Automations (Heartbeat)
20
20
 
21
- 沒有 automations,loop 只跑一次;有了它才會重複。
21
+ A loop without automations runs only once; automations make it repeat.
22
22
 
23
- - 排程式觸發,定時執行探索與分類。
24
- - 找到問題的送 triage inbox,沒發現的自動歸檔。
25
- - 可搭配 skills 維護排程任務的可維護性——呼叫 `$skill-name` 而非貼一大段指令。
26
- - `/loop` 按頻率重複執行;`/goal` 持續執行直到停止條件成立,且由獨立模型判斷是否完成。
23
+ - Schedule-driven triggers that periodically run exploration and classification.
24
+ - Findings go to the triage inbox; non-findings are auto-archived.
25
+ - Pair with skills to keep scheduled tasks maintainable—invoke `$skill-name` instead of pasting a wall of instructions.
26
+ - `/loop` repeats at a set frequency; `/goal` runs until a stop condition is met, with an independent model judging completion.
27
27
 
28
- 工具對應:
28
+ Tool mapping:
29
29
 
30
- - Codex:Automations tab(選專案、prompt、頻率、環境),結果進 Triage inbox;`/goal` run-until-done。
31
- - Claude Code:`/loop`、`/goal`、hooks、cron、GitHub Actions。
30
+ - Codex: Automations tab (select project, prompt, frequency, environment); results go to Triage inbox; `/goal` for run-until-done.
31
+ - Claude Code: `/loop`, `/goal`, hooks, cron, GitHub Actions.
32
32
 
33
- ### 2. Worktrees(隔離)
33
+ ### 2. Worktrees (Isolation)
34
34
 
35
- agent 並行時避免檔案衝突。
35
+ Prevent file conflicts when multiple agents run in parallel.
36
36
 
37
- - 每個 agent 在獨立的 git worktree 工作,共享 repo history
38
- - 一個 agent 的編輯不會碰到另一個的 checkout
39
- - 人的 review bandwidth 仍是瓶頸——worktree 解決機械衝突,但你能同時審幾條線決定了你能跑幾個 agent(orchestration tax)。
37
+ - Each agent works in its own git worktree, sharing repo history.
38
+ - One agent's edits never touch another agent's checkout.
39
+ - Human review bandwidth is still the bottleneck—worktrees solve mechanical conflicts, but the number of agents you can run is limited by how many threads you can review simultaneously (orchestration tax).
40
40
 
41
- 工具對應:
41
+ Tool mapping:
42
42
 
43
- - Codex:內建 worktree per thread
44
- - Claude Code:`git worktree`、`--worktree` flag、subagent `isolation: worktree` 設定。
43
+ - Codex: Built-in worktree per thread.
44
+ - Claude Code: `git worktree`, `--worktree` flag, subagent `isolation: worktree` setting.
45
45
 
46
- ### 3. Skills(知識固化)
46
+ ### 3. Skills (Crystallized Knowledge)
47
47
 
48
- 把反覆解釋的專案上下文寫成 SKILL.md
48
+ Write repeatedly explained project context into a SKILL.md.
49
49
 
50
- - 消除 intent debt:每次冷啟動,agent 會用自信的猜測填補意圖缺口。Skill 把意圖寫在外面,agent 每次讀取,不需重建。
51
- - 沒有 skills,loop 每個 cycle 從零推導你的整個專案;有 skills,loop 每次都帶著上次的知識跑。
52
- - Skill 是創作格式,Plugin 是發布格式——跨 repo 分享時打包成 plugin
50
+ - Eliminate intent debt: on every cold start, an agent fills intent gaps with confident guesses. A skill externalizes intent so the agent reads it every time instead of reconstructing it.
51
+ - A loop without skills re-derives your entire project from scratch each cycle; a loop with skills carries forward knowledge from the last run.
52
+ - A skill is an authoring format; a plugin is a distribution format—package skills as plugins when sharing across repos.
53
53
 
54
- 工具對應:
54
+ Tool mapping:
55
55
 
56
- - Codex:Agent Skills (`SKILL.md`),用 `$name` 或 `/skills` 呼叫,或由 description 自動觸發。
57
- - Claude Code:Agent Skills (`SKILL.md`)。
56
+ - Codex: Agent Skills (`SKILL.md`), invoked via `$name` or `/skills`, or auto-triggered by description.
57
+ - Claude Code: Agent Skills (`SKILL.md`).
58
58
 
59
- ### 4. Plugins / Connectors(外部整合)
59
+ ### 4. Plugins / Connectors (External Integration)
60
60
 
61
- 透過 MCP 連接外部工具,讓 loop 能在真實環境中行動。
61
+ Connect external tools via MCP so the loop can act in real environments.
62
62
 
63
- - 可連接 issue tracker、database、staging API、Slack
64
- - Codex 與 Claude Code 都用 MCP,connector 通常跨工具可用。
65
- - Plugins 把 connectors 與 skills 打包在一起,方便團隊成員一次安裝。
63
+ - Can connect to issue trackers, databases, staging APIs, Slack.
64
+ - Both Codex and Claude Code use MCP; connectors are generally cross-tool portable.
65
+ - Plugins bundle connectors and skills together for one-step team installation.
66
66
 
67
- 沒有 connectors,loop 只能輸出建議;有 connectors,loop 能直接開 PR、連 ticket、ping channel。
67
+ A loop without connectors can only output suggestions; a loop with connectors can open PRs, link tickets, and ping channels directly.
68
68
 
69
- ### 5. Sub-agents(生成與驗證分離)
69
+ ### 5. Sub-agents (Separating Generation from Verification)
70
70
 
71
- Loop 的結構性前提是把 maker 與 checker 分開。
71
+ The structural premise of a loop is separating maker from checker.
72
72
 
73
- - 寫程式碼的 model 對自己的作業打分數太寬容。第二個 agent 用不同指令(有時不同 model)才能抓到第一個說服自己接受的問題。
74
- - `/goal` 底層也是 maker/checker 分離——用獨立的小模型判斷 loop 是否完成,而不是讓做事的 agent 自己說完成了。
75
- - 常見分工:一個 explore、一個 implement、一個 verify against spec
76
- - Sub-agents 會燒更多 token,花在值得第二意見的地方。
73
+ - The model that writes the code grades its own work too leniently. A second agent with different instructions (sometimes a different model) catches issues the first agent convinced itself to accept.
74
+ - `/goal` also uses maker/checker separation under the hood—an independent small model judges whether the loop is done, rather than letting the working agent declare itself finished.
75
+ - Common division of labor: one explores, one implements, one verifies against spec.
76
+ - Sub-agents burn more tokens; spend them where a second opinion is worthwhile.
77
77
 
78
- > **職責邊界**:本段只講「為什麼 loop 需要 maker/checker 分離」這個設計決策。具體 sub-agent 的定義格式、指令撰寫、model 選擇等實作細節,請使用 `neo-sub-agent` 技能。
78
+ > **Responsibility boundary**: This section only covers the design rationale for why loops need maker/checker separation. For implementation details such as sub-agent definition format, instruction writing, and model selection, use the `neo-sub-agent` skill.
79
79
 
80
- 工具對應:
80
+ Tool mapping:
81
81
 
82
- - Codex:`.codex/agents/` 下的 TOML 定義檔,每個有 name、description、instructions、optional model 與 reasoning effort。
83
- - Claude Code:`.claude/agents/` 下的 subagent 定義 + agent teams
82
+ - Codex: TOML definition files under `.codex/agents/`, each with name, description, instructions, optional model, and reasoning effort.
83
+ - Claude Code: Subagent definitions under `.claude/agents/` + agent teams.
84
84
 
85
- ### 6. State(外部記憶)
85
+ ### 6. State (External Memory)
86
86
 
87
- 模型在對話之間會遺忘,進度必須寫在 repo 裡。
87
+ Models forget between conversations; progress must be written to the repo.
88
88
 
89
- - 格式:markdown 檔、Linear board、或任何對話外的持久化儲存。
90
- - State 負責記住做過什麼、通過什麼、還剩什麼。每個 long-running agent 都依賴它:agent 會忘,repo 不會。
89
+ - Format: markdown files, Linear boards, or any persistent store outside the conversation.
90
+ - State tracks what was done, what passed, and what remains. Every long-running agent depends on it: agents forget, repos don't.
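
The State primitive can be pictured as a small file the loop reads at the start of a run and rewrites at the end. The following is a minimal sketch only: the JSON layout, the key names (`attempted`, `passed`, `open`, `last_human_review`), and the helper functions are assumptions for illustration, since the reference document allows markdown files, Linear boards, or any persistent store.

```python
import json
import tempfile
from pathlib import Path


def load_state(path: Path) -> dict:
    """Read the state file, or start with an empty state on the first run."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    # Hypothetical layout: what was tried, what passed, what is still open.
    return {"attempted": [], "passed": [], "open": [], "last_human_review": None}


def record_result(state: dict, finding: str, passed: bool) -> dict:
    """Record one finding as attempted and move it between 'open' and 'passed'."""
    if finding not in state["attempted"]:
        state["attempted"].append(finding)
    target, other = ("passed", "open") if passed else ("open", "passed")
    if finding not in state[target]:
        state[target].append(finding)
    if finding in state[other]:
        state[other].remove(finding)
    return state


def save_state(path: Path, state: dict) -> None:
    """Persist the state so the next run can pick up where this one stopped."""
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")
```

A run would call `load_state`, work through the findings while calling `record_result`, then call `save_state`; the next scheduled run starts from the saved file rather than from the conversation history.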
91
91
 
92
- ## 原件對照表
92
+ ## Primitives Comparison Table
93
93
 
94
- | 原件 | Loop 中的職責 | Codex | Claude Code |
94
+ | Primitive | Role in Loop | Codex | Claude Code |
95
95
  |:--|:--|:--|:--|
96
- | Automations | 排程探索與分類 | Automations tab, `/goal` | `/loop`, `/goal`, hooks, cron, GitHub Actions |
97
- | Worktrees | 隔離並行 | 內建 worktree per thread | `git worktree`, `--worktree`, `isolation: worktree` |
98
- | Skills | 固化專案知識 | Agent Skills (`SKILL.md`), `$name` | Agent Skills (`SKILL.md`) |
99
- | Plugins / Connectors | 外部工具整合 | Connectors (MCP) + Plugins | MCP servers + Plugins |
100
- | Sub-agents | 生成與驗證分離 | `.codex/agents/` TOML | `.claude/agents/` + agent teams |
101
- | State | 跨對話進度 | Markdown / Linear connector | Markdown (`AGENTS.md`, progress files) / Linear MCP |
96
+ | Automations | Scheduled exploration and classification | Automations tab, `/goal` | `/loop`, `/goal`, hooks, cron, GitHub Actions |
97
+ | Worktrees | Parallel isolation | Built-in worktree per thread | `git worktree`, `--worktree`, `isolation: worktree` |
98
+ | Skills | Crystallized project knowledge | Agent Skills (`SKILL.md`), `$name` | Agent Skills (`SKILL.md`) |
99
+ | Plugins / Connectors | External tool integration | Connectors (MCP) + Plugins | MCP servers + Plugins |
100
+ | Sub-agents | Separating generation from verification | `.codex/agents/` TOML | `.claude/agents/` + agent teams |
101
+ | State | Cross-conversation progress | Markdown / Linear connector | Markdown (`AGENTS.md`, progress files) / Linear MCP |
102
102
 
103
- ## 範例:一個完整 loop 的流程
103
+ ## Example: A Complete Loop Flow
104
104
 
105
- 1. **Automation** 每天早上在 repo 上執行,prompt 呼叫 triage skill
106
- 2. Triage skill 讀取昨天的 CI failures、open issues、recent commits。
107
- 3. 發現值得處理的 findings,寫入 **state file** 或 Linear board。
108
- 4. 對每個 finding,開一個隔離的 **worktree**。
109
- 5. 送一個 **sub-agent**(maker)進 worktree 草擬修復。
110
- 6. 送第二個 **sub-agent**(checker)用專案 **skills** 和現有 tests 審查草稿。
111
- 7. **Connectors** 開 PR、更新 ticket、CI 通過後 ping channel。
112
- 8. 無法處理的 finding 送到 triage inbox 給人。
113
- 9. **State file** 記錄什麼被嘗試了、什麼通過了、什麼還開著。
114
- 10. 明天早上的 run 從 state 接續。
105
+ 1. **Automation** runs on the repo every morning; prompt invokes the triage skill.
106
+ 2. Triage skill reads yesterday's CI failures, open issues, and recent commits.
107
+ 3. Noteworthy findings are written to a **state file** or Linear board.
108
+ 4. For each finding, an isolated **worktree** is created.
109
+ 5. A **sub-agent** (maker) is sent into the worktree to draft a fix.
110
+ 6. A second **sub-agent** (checker) reviews the draft using project **skills** and existing tests.
111
+ 7. **Connectors** open a PR, update the ticket, and ping the channel once CI passes.
112
+ 8. Findings that cannot be handled are sent to the triage inbox for humans.
113
+ 9. The **state file** records what was attempted, what passed, and what remains open.
114
+ 10. Tomorrow morning's run picks up from state.
115
115
 
116
- 你設計了一次,之後不再手動 prompt 任何步驟。
116
+ You design it once; after that, you never manually prompt any step.
117
117
 
118
- ## Loop 三大風險
118
+ ## Three Major Loop Risks
119
119
 
120
- ### 1. 驗證仍在你身上
120
+ ### 1. Verification Is Still on You
121
121
 
122
- Loop 無人值守時也會無人值守地犯錯。Maker/checker 分離是必要但不充分的——「done」是一個 claim,不是 proof。你的工作是 ship 你確認有效的程式碼。
122
+ An unattended loop also makes mistakes unattended. Maker/checker separation is necessary but not sufficient—"done" is a claim, not a proof. Your job is to ship code you have confirmed works.
123
123
 
124
- ### 2. 理解債(Comprehension Debt)
124
+ ### 2. Comprehension Debt
125
125
 
126
- Loop 越快產出你沒寫的程式碼,你對系統的理解缺口越大。除非你讀 loop 產出的東西,否則理解債只會加速累積。
126
+ The faster a loop produces code you didn't write, the larger your understanding gap grows. Unless you read what the loop produces, comprehension debt only accelerates.
127
127
 
128
- ### 3. 認知投降(Cognitive Surrender)
128
+ ### 3. Cognitive Surrender
129
129
 
130
- loop 自己跑,人很容易停止有主見、照單全收。同一個 loop 設計,有判斷力的人用來加速理解深入的工作,沒有判斷力的人用來迴避理解工作本身——同一動作,相反結果。
130
+ When a loop runs itself, people easily stop having opinions and accept everything at face value. The same loop design, used by someone with judgment, accelerates deeply understood work; used by someone without judgment, it becomes a way to avoid understanding the work itself—same action, opposite outcomes.
131
131
 
132
- ### 風險防護策略
132
+ ### Risk Mitigation Strategies
133
133
 
134
- - 定期抽查 loop 產出,不要只看 CI 綠燈。
135
- - 設定 loop 的產出量上限,避免 review backlog 失控。
136
- - state file 記錄人類最後審查的時間點。
137
- - 高風險變更(安全、合規、產品 scope)強制跳出 loop 等人。
138
- - 定期用 loop 的錯誤模式回饋改善 harness(agentic flywheel)。
134
+ - Periodically spot-check loop output; don't rely solely on green CI.
135
+ - Set output volume caps on the loop to prevent review backlog from spiraling.
136
+ - Record the timestamp of the last human review in the state file.
137
+ - Force high-risk changes (security, compliance, product scope) to exit the loop and wait for a human.
138
+ - Regularly feed loop error patterns back to improve the harness (agentic flywheel).
139
139
 
140
- ## 何時適合引入 Loop vs 留在 Harness
140
+ ## When to Introduce a Loop vs. Stay with the Harness
141
141
 
142
- | 條件 | 建議 |
142
+ | Condition | Recommendation |
143
143
  |:--|:--|
144
- | 專案沒有可靠的本地驗證指令 | 先建 harness |
145
- | CI 不穩定或經常紅燈 | 先修 CI |
146
- | 團隊對 agent 產出沒有 review 流程 | 先建 review 流程 |
147
- | Maturity Level < 3 | 先升級 harness |
148
- | 重複性高、風險低的任務(triage、格式修復、依賴更新) | 適合 loop |
149
- | 變更涉及產品 scope、安全、架構取捨 | 不適合全自動 loop |
144
+ | Project lacks reliable local verification commands | Build the harness first |
145
+ | CI is unstable or frequently red | Fix CI first |
146
+ | Team has no review process for agent output | Establish a review process first |
147
+ | Maturity Level < 3 | Upgrade the harness first |
148
+ | Highly repetitive, low-risk tasks (triage, format fixes, dependency updates) | Good fit for a loop |
149
+ | Changes involve product scope, security, or architecture trade-offs | Not suitable for a fully automated loop |
@@ -0,0 +1,89 @@
1
+ ---
2
+ name: neo-agentic-design
3
+ description: >
4
+ Use this skill when designing, evaluating, or implementing Agent workflows, prompt chains, routing, planning, reflection, multi-agent collaboration, memory management, or other framework-agnostic LLM orchestration patterns.
5
+ license: MIT
6
+ compatibility: No specific language runtime required; conceptual-only patterns.
7
+ metadata:
8
+ version: "1.0.0"
9
+ type: "conceptual-design"
10
+ ---
11
+
12
+ # Neo Agentic Design
13
+
14
+ This skill provides architectural concepts and orchestration patterns for building LLM Agent systems. It covers 21 core design patterns categorized into four themes. The orchestration logic remains abstract and independent of specific programming languages or frameworks.
15
+
16
+ ## Gotchas
17
+ * **Over-engineering**: Prioritize simple prompt chains (Chapter 1) or routing (Chapter 2). Use complex multi-agent collaboration (Chapter 7) or hierarchical networks only when necessary to reduce token overhead.
18
+ * **Reflection Infinite Loops**: When implementing reflection (Chapter 4) or self-correction (Chapter 12), enforce a maximum iteration limit (e.g., 3-5 iterations) to prevent the LLM from getting stuck in an infinite loop.
19
+ * **Blocking Operations**: High-risk operations (such as direct database deletions or large fund transfers) must include a Human-in-the-Loop review gate (Chapter 13).
20
+ * **Context Pruning State Loss**: When compressing context, protect critical agent instructions from being pruned to prevent behavioral degradation.
21
+
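
The iteration cap from the Reflection Infinite Loops gotcha can be sketched in a few lines of Python. This is an illustration under stated assumptions, not code shipped by the skill: `critique` and `revise` are hypothetical callables standing in for LLM calls, and the toy versions below exist only to make the sketch runnable.

```python
from typing import Callable, Tuple


def reflect_until_accepted(
    draft: str,
    critique: Callable[[str], Tuple[bool, str]],
    revise: Callable[[str, str], str],
    max_iterations: int = 3,
) -> Tuple[str, bool]:
    """Run critique/revise rounds, stopping after max_iterations at the latest."""
    for _ in range(max_iterations):
        accepted, feedback = critique(draft)
        if accepted:
            return draft, True
        draft = revise(draft, feedback)
    # Cap reached: return the last draft flagged as unaccepted so the caller
    # can escalate (for example to a human reviewer) instead of looping on.
    return draft, False


# Toy stand-ins for LLM calls (hypothetical): accept once the text has 3 "!".
def toy_critique(text: str) -> Tuple[bool, str]:
    return (text.count("!") >= 3, "add emphasis")


def toy_revise(text: str, feedback: str) -> str:
    return text + "!"
```

Returning an explicit accepted flag, rather than raising or silently returning the last draft, keeps the "cap reached" case visible to whatever gate sits after the reflection step.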
22
+ ## Workflow Checklist
23
+ Progress:
24
+ - [ ] Step 1: Analyze Requirements (define objectives, inputs, constraints, and complexity levels).
25
+ - [ ] Step 2: Select Orchestration Patterns (load corresponding reference documents based on requirements).
26
+ - [ ] Step 3: Plan System Components (determine memory, learning mechanisms, and protocol specifications).
27
+ - [ ] Step 4: Define Resilience and Safety (establish exception handling, human review gates, and input/output guardrails).
28
+ - [ ] Step 5: Draft Design Proposal (create system topology diagrams and describe the architecture).
29
+
30
+ ## Detailed Guidelines
31
+
32
+ ### Step 1 — Analyze Requirements
33
+ Evaluate problem complexity (Level 1, 2, or 3) and confirm:
34
+ 1. **Latency Sensitivity**: For low-latency requirements, prioritize parallelization (Chapter 3) and routing (Chapter 2).
35
+ 2. **Task Fragility**: For strict sequential tasks or error-prone processes, use chaining (Chapter 1) or planning (Chapter 6).
36
+
37
+ ### Step 2 — Load Design Patterns (Progressive Loading)
38
+ Load specific reference files as needed to avoid loading all concepts at once:
39
+ * Base workflows (Prompt Chaining, Routing, Parallelization, Reflection, Tool Use, Planning, Multi-Agent Collaboration):
40
+ 👉 **Load [base-workflows](references/base-workflows.md)**
41
+ * System infrastructure (Memory Management, Learning and Adaptation, MCP, Goal Setting and Monitoring):
42
+ 👉 **Load [system-components](references/system-components.md)**
43
+ * Exception handling, HITL, RAG fact-grounding:
44
+ 👉 **Load [resilience-hitl](references/resilience-hitl.md)**
45
+ * Advanced safety, evaluation, prioritization, A2A communication, exploration and discovery:
46
+ 👉 **Load [advanced-safety](references/advanced-safety.md)**
47
+
48
+ ### Step 3 — System Architecture Planning
49
+ The design document must clearly document:
50
+ 1. **State Space**: Context window management method and division of short-term and long-term memory (cognitive/procedural memory).
51
+ 2. **Tool Boundaries**: Tool call schema protocols and sandbox rules.
52
+ 3. **Safety Boundaries**: Specific conditions for triggering human approval (HITL) or falling back to backup models.
53
+
54
+ ---
55
+
56
+ ## Output Template (Agentic Architecture Design Proposal)
57
+
58
+ When presenting agent designs to users, use this template format:
59
+
60
+ ```markdown
61
+ # Agentic System Design Proposal: [System Name]
62
+
63
+ ## 1. Executive Summary
64
+ * **Complexity Level**: [Level 1 / Level 2 / Level 3]
65
+ * **Target Objective**: [System Goal]
66
+ * **Key Constraints**: [Constraints such as latency, cost, security, etc.]
67
+
68
+ ## 2. Core Orchestration Architecture
69
+ * **Selected Patterns**: [e.g., Router -> Parallel Agents -> Synthesizer]
70
+ * **Workflow Description**: [System data flow and control flow description]
71
+
72
+ ### Topology Diagram (Mermaid)
73
+ ```mermaid
74
+ [Mermaid diagram representing the Agent Loop / Topology]
75
+ ```
76
+
77
+ ## 3. Reference Patterns Applied
78
+ * **[Pattern Name] (Chapter X)**: [Specific application and rationale in the system]
79
+ * **[Pattern Name] (Chapter Y)**: [Specific application and rationale in the system]
80
+
81
+ ## 4. Resilience, Safety & HITL Rules
82
+ * **Exception Recovery**: [Handling flow for API timeouts, rate limits, and JSON formatting errors]
83
+ * **Human-in-the-Loop Gates**: [Conditions triggering human review]
84
+ * **Guardrails**: [Input filtering and output validation mechanisms]
85
+
86
+ ## 5. Next Steps / Implementation Roadmap
87
+ 1. [Step 1]
88
+ 2. [Step 2]
89
+ ```
@@ -0,0 +1,58 @@
1
+ [
2
+ {
3
+ "query": "I need to design a system that routes incoming user queries to specialized LLM prompts depending on their category.",
4
+ "should_trigger": true
5
+ },
6
+ {
7
+ "query": "How do I implement reflection and self-correction in a multi-agent system to make it write better code?",
8
+ "should_trigger": true
9
+ },
10
+ {
11
+ "query": "Can you review my LLM orchestration workflow? It currently uses prompt chaining but has high latency.",
12
+ "should_trigger": true
13
+ },
14
+ {
15
+ "query": "I want to set up a Model Context Protocol (MCP) server for my agent so it can read local files.",
16
+ "should_trigger": true
17
+ },
18
+ {
19
+ "query": "What is the best way to handle long-term semantic memory and episodic memory in an autonomous agent?",
20
+ "should_trigger": true
21
+ },
22
+ {
23
+ "query": "Please design a pipeline workflow for generating technical reports, with a human-in-the-loop validation step.",
24
+ "should_trigger": true
25
+ },
26
+ {
27
+ "query": "How does dynamic re-prioritization work when an agent has conflicting goals?",
28
+ "should_trigger": true
29
+ },
30
+ {
31
+ "query": "Review the exception handling and recovery mechanism in my LLM agent loop.",
32
+ "should_trigger": true
33
+ },
34
+ {
35
+ "query": "I need to write a Python script that calculates the Fibonacci sequence using recursion.",
36
+ "should_trigger": false
37
+ },
38
+ {
39
+ "query": "What is the difference between supervised learning and reinforcement learning in traditional machine learning?",
40
+ "should_trigger": false
41
+ },
42
+ {
43
+ "query": "How do I configure my local PostgreSQL database on macOS?",
44
+ "should_trigger": false
45
+ },
46
+ {
47
+ "query": "Write a CSS stylesheet for a dark mode website.",
48
+ "should_trigger": false
49
+ },
50
+ {
51
+ "query": "I want to build a simple web scraper in Python using beautifulsoup4.",
52
+ "should_trigger": false
53
+ },
54
+ {
55
+ "query": "How do I write a prompt to make ChatGPT act like a professional English translator?",
56
+ "should_trigger": false
57
+ }
58
+ ]
@@ -0,0 +1,27 @@
1
+ {
2
+ "skill_name": "neo-agentic-design",
3
+ "evals": [
4
+ {
5
+ "id": 1,
6
+ "prompt": "Design an agentic system that generates monthly financial reports. It must parse transaction raw data, categorize expenses, draft a report, let a human reviewer approve/edit the draft, and then output a final PDF. Minimize latency and ensure high accuracy.",
7
+ "expected_output": "An Agentic System Design Proposal containing Routing, Chaining, and Human-in-the-Loop patterns, structured with the standard output template.",
8
+ "assertions": [
9
+ "The output starts with 'Agentic System Design Proposal' or matches the template format",
10
+ "The proposal mentions Routing, Chaining, and Human-in-the-Loop patterns",
11
+ "The proposal contains a Mermaid sequence or flowchart diagram representing the topology",
12
+ "The proposal lists specific Gotchas or risks like latency and cost control"
13
+ ]
14
+ },
15
+ {
16
+ "id": 2,
17
+ "prompt": "I need to design a system that reviews incoming code commits for potential security vulnerabilities and performance bottlenecks. It needs to check thousands of commits daily and must fail-safely if any analysis tool crashes.",
18
+ "expected_output": "An Agentic System Design Proposal containing Parallelization, Routing, Guardrails, and Exception Recovery patterns, structured with the standard output template.",
19
+ "assertions": [
20
+ "The proposal includes Parallelization and Exception Recovery patterns",
21
+ "The proposal provides a Mermaid topology diagram showing parallel evaluation and a merge point",
22
+ "The proposal includes specific Exception Handling rules for crashed analysis tools",
23
+ "The proposal includes Guardrails policies for input/output sanitization"
24
+ ]
25
+ }
26
+ ]
27
+ }
@@ -0,0 +1,158 @@
1
+ # Advanced Execution, Guardrails & Safety
2
+
3
+ This document provides conceptual designs for advanced execution, guardrails, and safety patterns, covering agent-to-agent (A2A) communication, resource-aware optimization, reasoning techniques, guardrails, evaluation and monitoring, prioritization, and scientific exploration.
4
+
5
+ ---
6
+
7
+ ## Chapter 15: Inter-Agent Communication (A2A)
8
+
9
+ ### 1. Definition
10
+ An open agent communication protocol across frameworks and technology stacks. Uses standard HTTP and JSON-RPC formats to enable agent declaration, task delegation, and data exchange across different networks.
11
+
12
+ ### 2. Core Components
13
+ * **Agent Card**: A JSON declaration containing the agent name, version, endpoint URL, multimodal capabilities, and skills.
14
+ * **Task Mechanism**: Defines collaboration as a "Task" with a lifecycle state (Submitted, Working, Completed, Failed), tracked using a `contextId` for multi-turn conversation context.
15
+ * **Communication Modes**:
16
+ * **Synchronous**: Direct invocation with immediate response.
17
+ * **Asynchronous Polling**: Submit a task to obtain a Task ID and periodically query status.
18
+ * **Streaming (SSE)**: Receive partial outputs in real time via Server-Sent Events.
19
+ * **Webhook**: Actively push notifications to a specified URL upon task completion.
20
+
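
The task lifecycle named above (Submitted, Working, Completed, Failed) can be modeled as a small state machine. The sketch below is an assumption-laden illustration: the four state names and `contextId` come from the text, but the `Task` class, the field names, and the allowed-transition table are hypothetical and are not taken from the A2A specification.

```python
from dataclasses import dataclass, field
from enum import Enum


class TaskState(Enum):
    SUBMITTED = "submitted"
    WORKING = "working"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumed transition table: terminal states allow no further moves.
ALLOWED = {
    TaskState.SUBMITTED: {TaskState.WORKING, TaskState.FAILED},
    TaskState.WORKING: {TaskState.COMPLETED, TaskState.FAILED},
    TaskState.COMPLETED: set(),
    TaskState.FAILED: set(),
}


@dataclass
class Task:
    task_id: str
    context_id: str  # groups tasks that belong to one multi-turn conversation
    state: TaskState = TaskState.SUBMITTED
    history: list = field(default_factory=list)

    def transition(self, new_state: TaskState) -> None:
        """Move to new_state, rejecting moves the table does not allow."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} -> {new_state.name}"
            )
        self.history.append(self.state)
        self.state = new_state

    @property
    def is_terminal(self) -> bool:
        return not ALLOWED[self.state]
```

A polling client would keep querying until `is_terminal` is true; a streaming or webhook client would receive the same transitions as events.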
21
+ ### 3. Problems Addressed
22
+ * Heterogeneous framework silos: Solves communication barriers between different agent frameworks.
23
+ * Distributed collaboration barriers: Enables agents on different servers to safely delegate tasks.
24
+
25
+ ---
26
+
27
+ ## Chapter 16: Resource-Aware Optimization
28
+
29
+ ### 1. Definition
30
+ Monitors computation, latency, and financial costs (tokens/API calls) in real time during agent execution. Dynamically switches between models with different capabilities or prunes context based on budget and latency constraints.
31
+
32
+ ### 2. Problems Addressed
33
+ * API cost overruns: Avoids using expensive reasoning models for simple queries.
34
+ * Rate limits and overload: Executes fallbacks and backup plans when the primary model is limited or overloaded.
35
+
36
+ ### 3. Workflow
37
+ ```mermaid
38
+ graph TD
39
+ Query[User Query] --> Router{Router LLM}
40
+ Router -->|1. Simple Query| CheapModel[Lightweight Model]
41
+ Router -->|2. Complex Reasoning| ExpensiveModel[High-tier Reasoning Model]
42
+ Router -->|3. Real-time Info| SearchTool[Real-time Search Tool]
43
+ CheapModel --> Checker[Critique Agent: Quality Eval]
44
+ ExpensiveModel --> Checker
45
+ Checker -->|Fail| Fallback[Fallback Plan]
46
+ Checker -->|Pass| Output[Final Output]
47
+ ```
48
+
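
The routing decision in the diagram above can be reduced to a budget check. This is a sketch under assumptions: the tier names, the per-call costs, and the `is_complex` flag (which a router LLM would normally produce) are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    cost_per_call: float  # assumed flat cost, in arbitrary currency units


# Hypothetical tiers; real names and prices depend on the provider.
CHEAP = ModelTier("lightweight-model", 0.001)
EXPENSIVE = ModelTier("reasoning-model", 0.05)


def choose_model(is_complex: bool, remaining_budget: float) -> ModelTier:
    """Pick the expensive tier only for complex queries the budget can cover."""
    if is_complex and remaining_budget >= EXPENSIVE.cost_per_call:
        return EXPENSIVE
    # Simple query, or budget exhausted: fall back to the lightweight tier.
    return CHEAP
```

For example, `choose_model(True, 0.01)` falls back to the lightweight tier because the remaining budget cannot cover one call to the expensive model.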
49
+ ---
50
+
51
+ ## Chapter 17: Reasoning Techniques
52
+
53
+ ### 1. Definition
54
+ Architectural techniques that allocate more computational resources at inference time to explicitly expand the agent's thought process. Covers step-by-step decomposition, tree-search path planning, code-assisted execution, and ReAct loops.
55
+
56
+ ### 2. Six Core Reasoning Patterns
57
+ * **Chain of Thought (CoT)**: Guides the model to reason step-by-step to decompose complex problems.
58
+ * **Tree of Thoughts (ToT)**: Represents the reasoning space as a tree, supporting backtracking and multi-path parallel evaluation.
59
+ * **Reasoning and Action (ReAct)**: Interleaves tool execution with reasoning steps (Thought -> Action -> Observation -> Thought ... -> Finish).
60
+ * **Program-Aided Language Models (PALMs)**: Offloads precise mathematical calculations to a secure code sandbox and interprets the results to eliminate calculation hallucinations.
61
+ * **Multi-Agent Debate (Chain/Graph of Debates)**: Employs multiple agents to debate a problem across several turns, using consensus or strong logical conclusions as the final answer.
62
+ * **Scaling Inference Law**: Uses multi-path generation, self-correction, or extended thinking paths during the inference stage, allowing smaller models to achieve performance comparable to a single generation of a larger model.
63
+
64
+ ---
65
+
66
+ ## Chapter 18: Guardrails & Safety Patterns
67
+
68
+ ### 1. Definition
69
+ Deploys multiple layers of filtering and defense at the input, tool execution, and output stages to ensure system compliance, safety, and protection against jailbreak attacks, prompt injection, and tool privilege escalation.
70
+
71
+ ### 2. Multi-Layer Defense Flow
72
+ ```mermaid
73
+ graph TD
74
+ Input[User Input] --> InputGuard[1. Input Guardrails: Jailbreak/Injection Detection]
75
+ InputGuard -->|Violation| Block[Access Denied]
76
+ InputGuard -->|Safe| LLM_Core[2. Core Agent Reasoning]
77
+ LLM_Core -->|Call Tool| ToolCallback[3. Pre-execution Tool Validation]
78
+ ToolCallback -->|Reject| LLM_Core
79
+ ToolCallback -->|Approve| ToolExec[Tool Execution]
80
+ ToolExec --> OutputGen[Output Generation]
81
+ OutputGen --> OutputGuard[4. Output Guardrails: PII/Toxicity Filter]
82
+ OutputGuard -->|Safe| User[Deliver to User]
83
+ OutputGuard -->|Violation| Redaction[Redaction/Block/Self-Correction]
84
+ ```
85
+
86
+ ### 3. Problems Addressed
87
+ * Prompt jailbreaks: Prevents users from guiding the agent to perform unauthorized or harmful actions.
88
+ * Privilege escalation: Follows the principle of least privilege to prevent agents from unauthorized data modification or account deletion.
89
+
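
A stripped-down version of the input and output layers of the defense flow might look like the sketch below. It is illustrative only: the deny-list phrase, the email regex used as a stand-in for PII detection, and the `agent` callable are assumptions, and the pre-execution tool validation layer is omitted for brevity. Production guardrails usually rely on dedicated classifiers rather than string matching.

```python
import re
from typing import Callable

# Hypothetical deny-list; real injection detection is far more involved.
BLOCKED_PHRASES = ("ignore previous instructions",)
# Simplified email pattern standing in for a PII detector.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def input_guard(text: str) -> bool:
    """Return True when the input passes the (toy) injection check."""
    lowered = text.lower()
    return not any(phrase in lowered for phrase in BLOCKED_PHRASES)


def output_guard(text: str) -> str:
    """Redact email addresses before the text reaches the user."""
    return EMAIL.sub("[REDACTED]", text)


def guarded_call(user_input: str, agent: Callable[[str], str]) -> str:
    """Input guardrail -> core agent -> output guardrail."""
    if not input_guard(user_input):
        return "Access denied"
    return output_guard(agent(user_input))
```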
90
+ ---
91
+
92
+ ## Chapter 19: Evaluation and Monitoring
93
+
94
+ ### 1. Definition
95
+ Systematically measures and audits agent execution quality, trajectories, resource consumption, and drift. Evaluates the execution trajectory rather than just the final answer for non-deterministic systems.
96
+
97
+ ### 2. Three Core Evaluation Aspects
98
+ * **Objective Metrics Monitoring**: Logs latency, token consumption, and API success rates.
99
+ * **Trajectory Evaluation**: Compares action sequences with standard SOPs using exact matching, ordered matching, or unordered matching.
100
+ * **LLM-as-a-Judge**: Uses an independent LLM to score answers based on specific rubrics and outputs structured feedback.
101
+
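
The three trajectory matching modes can be made concrete with short predicates over lists of action names. The definitions below are one reasonable reading, stated as assumptions: "ordered" means the expected steps appear in order with extra steps allowed in between, and "unordered" means every expected step appears regardless of position.

```python
from collections import Counter
from typing import Sequence


def exact_match(actual: Sequence[str], expected: Sequence[str]) -> bool:
    """The agent took exactly the expected steps, in the expected order."""
    return list(actual) == list(expected)


def ordered_match(actual: Sequence[str], expected: Sequence[str]) -> bool:
    """Expected steps appear in order; extra steps between them are allowed."""
    remaining = iter(actual)
    # 'step in remaining' advances the iterator, so order is enforced.
    return all(step in remaining for step in expected)


def unordered_match(actual: Sequence[str], expected: Sequence[str]) -> bool:
    """Every expected step appears often enough, in any order."""
    need = Counter(expected)
    have = Counter(actual)
    return all(have[step] >= count for step, count in need.items())
```

With `actual = ["search", "read", "summarize", "cite"]`, the SOP `["search", "summarize"]` fails the exact check but passes the ordered and unordered checks, while `["summarize", "search"]` passes only the unordered one.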
102
+ ### 3. Advanced Pattern: AI Contractor / Contract Pattern
103
+ Resolves prompt drift and responsibility ambiguity:
104
+ ```mermaid
105
+ graph TD
106
+ User[User] -->|1. Initiate Draft Contract| Contractor[Contractor Agent]
107
+ Contractor -->|2. Self-Analysis & Evaluation| Analyze[Analyze clauses, scope, cost, dependencies]
108
+ Analyze -->|3. Negotiate Feedback| User
109
+ User -->|4. Approve & Sign| Execute[5. Execution: Self-test & verify]
110
+ Execute -->|6. Decompose Tasks| SubContracts[Sub-contracts]
111
+ SubContracts --> SubAgents[Sub-agents]
112
+ Execute -->|7. Deliver Deliverables| User
113
+ ```
114
+
115
+ ---
116
+
117
+ ## Chapter 20: Prioritization
118
+
119
+ ### 1. Definition
120
+ Sorts and dynamically schedules the execution order of multiple goals and tasks when the agent is faced with resource constraints or limited budgets.
121
+
122
+ ### 2. Problems Addressed
123
+ * Deadlocks and lack of focus: Prevents delays in critical tasks caused by prioritizing minor ones.
124
+ * Inadequate crisis response: Ensures the agent can dynamically switch task context when high-priority events (e.g., safety alerts) occur.
125
+
126
+ ### 3. Prioritization Metrics
127
+ * **Urgency**: Time sensitivity (closeness to deadline).
128
+ * **Importance**: Impact on accomplishing the ultimate goal.
129
+ * **Dependencies**: Whether the task is a prerequisite for other tasks.
130
+ * **Cost-Benefit Ratio**: Expected payoff relative to consumed resources.
131
+
132
+ ### 4. Mechanism
133
+ Tasks are scored and placed in a priority queue, from which the planner executes them in order. When the environment state changes, the system recalculates the weights and reorders the queue (dynamic re-prioritization) or interrupts the current task.
134
+
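A minimal sketch of this mechanism, using Python's `heapq` as the priority queue. The scoring weights, the `score` formula, and the task names are assumptions made for illustration; the source does not prescribe a formula.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Task:
    priority: float                    # lower value runs earlier (heapq is a min-heap)
    name: str = field(compare=False)


def score(urgency: float, importance: float, unblocks: int, cost: float) -> float:
    """Hypothetical weighting: more urgent, important, or unblocking work ranks first."""
    benefit = 2.0 * urgency + 1.5 * importance + 1.0 * unblocks
    return -benefit / max(cost, 1e-9)  # negate so the best task has the lowest value


def build_queue(tasks: dict[str, tuple[float, float, int, float]]) -> list[Task]:
    heap = [Task(score(*metrics), name) for name, metrics in tasks.items()]
    heapq.heapify(heap)
    return heap


tasks = {
    "write_report": (0.2, 0.5, 0, 1.0),
    "fix_outage": (0.9, 0.9, 2, 1.0),
    "refactor": (0.1, 0.3, 1, 2.0),
}
first = heapq.heappop(build_queue(tasks)).name

# Dynamic re-prioritization: the outage is resolved and the report deadline
# moved up, so the metrics are updated and the queue is rebuilt from scratch.
tasks.pop("fix_outage")
tasks["write_report"] = (1.0, 0.5, 0, 1.0)
second = heapq.heappop(build_queue(tasks)).name
```

Rebuilding the heap on every state change is the simplest option; a system with many tasks would more likely update entries in place.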
135
+ ---
136
+
137
+ ## Chapter 21: Exploration and Discovery
138
+
139
+ ### 1. Definition
140
+ Enables the agent to proactively explore unknown domains (Unknown Unknowns), generate new knowledge, design experiments, and test hypotheses.
141
+
142
+ ### 2. Multi-Agent Scientific Discovery Flow
143
+ ```mermaid
144
+ graph TD
145
+ Goal[Exploration Goal] --> GenAgent[1. Generation Agent]
146
+ GenAgent -->|Propose Hypothesis| RefAgent[2. Reflection Agent]
147
+ RefAgent -->|Peer Review / Correction Suggestions| GenAgent
148
+ RefAgent -->|Accepted Draft| RankAgent[3. Ranking Agent]
149
+ RankAgent -->|Elo Tournament Debate| BestHypotheses[Select Best Hypotheses]
150
+ BestHypotheses --> EvoAgent[4. Evolution Agent]
151
+ EvoAgent -->|Concept Merging & Non-linear Exploration| AdvancedHypo[Advanced Hypotheses]
152
+ AdvancedHypo --> LabAgent[5. Lab Agent]
153
+ LabAgent -->|Execute Code/Simulation/Analysis| FinalReport[6. Final LaTeX Report]
154
+ ```
155
+
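The Ranking Agent's Elo tournament can be illustrated with the standard Elo update. This is only a sketch of the rating arithmetic: how the debate decides a winner, the starting rating of 1200, and the K-factor of 32 are assumptions, not details given in the source.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a: float, rating_b: float, a_wins: bool,
                   k: float = 32.0) -> tuple[float, float]:
    """Return the new (A, B) ratings after one pairwise debate."""
    expected_a = expected_score(rating_a, rating_b)
    actual_a = 1.0 if a_wins else 0.0
    delta = k * (actual_a - expected_a)
    return rating_a + delta, rating_b - delta  # zero-sum: B loses what A gains


# Two hypotheses start level; hypothesis A wins the debate.
new_a, new_b = update_ratings(1200.0, 1200.0, a_wins=True)
```

After enough pairwise debates, sorting hypotheses by rating gives the shortlist passed to the Evolution Agent.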
156
+ ### 3. Trade-offs
157
+ * **Pros**: Explores unfamiliar topics autonomously and can surface insights that human experts have not yet considered.
158
+ * **Cons**: High uncertainty and heavy token consumption; requires strict safety guardrails to prevent generating hazardous protocols.
@@ -0,0 +1,219 @@
1
+ # Base Patterns & Workflows
2
+
3
+ This document provides conceptual designs for basic agentic orchestration patterns, covering prompt chaining, routing, parallelization, reflection, tool use, planning, and multi-agent collaboration.
4
+
5
+ ---
6
+
7
+ ## Chapter 1: Prompt Chaining
8
+
9
+ ### 1. Definition
10
+ Decomposes a complex task into multiple **sequentially dependent subtasks**. The structured output of the previous step serves as the input for the next step. Each step focuses on a single, clear objective.
11
+
12
+ ### 2. Problems Addressed
13
+ * Context dilution: Prevents the LLM from losing focus when processing large, complex tasks.
14
+ * Instruction drift: Avoids failures in a single prompt that contains too many rules.
15
+
16
+ ### 3. Workflow
17
+ ```mermaid
18
+ graph LR
19
+ Input[Raw Input] --> Step1[LLM Step A]
20
+ Step1 -->|Structured Output A| Step2[LLM Step B]
21
+ Step2 -->|Structured Output B| Step3[LLM Step C]
22
+ Step3 --> Output[Final Answer]
23
+ ```
24
+
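The chain above can be sketched as a loop that feeds each step's output into the next prompt template. The `run_chain` helper and the `fake_llm` stand-in are hypothetical; a real system would call a model API and validate the structured output between steps.

```python
from typing import Callable

LLM = Callable[[str], str]


def run_chain(llm: LLM, raw_input: str, step_prompts: list[str]) -> str:
    """Run the steps in sequence; each output becomes the next step's input."""
    current = raw_input
    for template in step_prompts:
        current = llm(template.format(input=current))
    return current


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a model call: wraps the prompt in brackets.
    return f"[{prompt}]"


result = run_chain(fake_llm, "notes", ["Outline: {input}", "Draft: {input}"])
```

Because every step is an ordinary function call, each prompt template can be unit-tested in isolation, which is the predictability benefit listed under trade-offs.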
25
+ ### 4. Trade-offs
26
+ * **Pros**: High predictability; easy to optimize prompts and perform unit testing for individual steps.
27
+ * **Cons**: High total latency due to sequential execution; errors in earlier steps propagate downstream (Error Cascade).
28
+
29
+ ### 5. Use Cases
30
+ * Multi-step article generation (Outline -> Draft -> Polish -> Format).
31
+ * Data extraction and compliance analysis.
32
+
33
+ ---
34
+
35
+ ## Chapter 2: Routing
36
+
37
+ ### 1. Definition
38
+ Dynamically redirects tasks to the most suitable execution path, specialized tool, or sub-agent based on input characteristics. Routing decisions are made by rule engines, semantic similarity, or LLM classifiers.
39
+
40
+ ### 2. Problems Addressed
41
+ * Resource waste: Avoids using expensive, slow high-tier models for simple queries.
42
+ * Tool clutter: Avoids crowding too many unrelated tools into a single agent's context window.
43
+
44
+ ### 3. Workflow
45
+ ```mermaid
46
+ graph TD
47
+ Input[User Input] --> Router[Routing Classifier / LLM]
48
+ Router -->|Path A| AgentA[Specialized Agent A / Tool A]
49
+ Router -->|Path B| AgentB[Specialized Agent B / Tool B]
50
+ Router -->|Path C| AgentC[Specialized Agent C / Tool C]
51
+ ```
52
+
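A rule-engine router is the simplest of the three routing options. The sketch below assumes keyword rules and three hypothetical handler agents; a semantic-similarity or LLM classifier would replace `route` without changing the handlers.

```python
from typing import Callable

Handler = Callable[[str], str]


def billing_agent(query: str) -> str:
    return f"billing: {query}"


def tech_agent(query: str) -> str:
    return f"tech: {query}"


def general_agent(query: str) -> str:
    return f"general: {query}"


# Hypothetical keyword rules, checked in order; the first match wins.
RULES: list[tuple[frozenset[str], Handler]] = [
    (frozenset({"invoice", "refund", "charge"}), billing_agent),
    (frozenset({"error", "crash", "login"}), tech_agent),
]


def route(query: str) -> Handler:
    """Pick a handler by keyword overlap, falling back to the general agent."""
    words = set(query.lower().split())
    for keywords, handler in RULES:
        if words & keywords:
            return handler
    return general_agent
```

The explicit fallback matters: without it, a routing miss would fail the request outright instead of degrading to a general answer.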
53
+ ### 4. Trade-offs
54
+ * **Pros**: High modularity; reduces average system latency and token consumption.
55
+ * **Cons**: Routing errors directly cause downstream task failures; an extra routing decision layer adds minor latency.
56
+
57
+ ### 5. Use Cases
58
+ * Customer support dispatching (e.g., routing to billing, tech support, or returns agents).
59
+ * Pre-filtering for tool calls.
60
+
61
+ ---
62
+
63
+ ## Chapter 3: Parallelization
64
+
65
+ ### 1. Definition
66
+ Splits a large task into multiple **independent subtasks** executed in parallel (Fork) and aggregates the results at a single point (Join).
67
+
68
+ ### 2. Problems Addressed
69
+ * Cumulative linear latency: Solves the high time cost associated with sequential multi-step execution.
70
+ * Single-perspective limitation: Collects diverse solutions to the same problem simultaneously for synthesis.
71
+
72
+ ### 3. Workflow
73
+ ```mermaid
74
+ graph TD
75
+ Input[Raw Query] --> Splitter[Splitter]
76
+ Splitter --> TaskA[Parallel Task A]
77
+ Splitter --> TaskB[Parallel Task B]
78
+ Splitter --> TaskC[Parallel Task C]
79
+ TaskA --> Syn[Synthesis / Aggregator]
80
+ TaskB --> Syn
81
+ TaskC --> Syn
82
+ Syn --> Output[Aggregated Output]
83
+ ```
84
+
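A fork-join sketch using a thread pool. The two check functions are hypothetical stand-ins for independent LLM or tool calls; threads are used here on the assumption that the real subtasks are I/O-bound.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def check_security(code: str) -> str:
    return "security: eval() found" if "eval(" in code else "security: ok"


def check_style(code: str) -> str:
    too_long = any(len(line) > 79 for line in code.splitlines())
    return "style: long line" if too_long else "style: ok"


def fork_join(code: str, checks: list[Callable[[str], str]]) -> list[str]:
    """Fork: submit every check at once. Join: collect results in submission order."""
    with ThreadPoolExecutor(max_workers=len(checks)) as pool:
        futures = [pool.submit(check, code) for check in checks]
        return [future.result() for future in futures]


report = fork_join("x = eval('1')", [check_security, check_style])
```

Capping `max_workers` below the number of subtasks is one simple way to stay under the API rate limits mentioned in the trade-offs.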
85
+ ### 4. Trade-offs
86
+ * **Pros**: Significantly reduces elapsed time; suitable for large-scale parallel filtering.
87
+ * **Cons**: High spikes in token usage, easily triggering API rate limits; reconciling inconsistent results requires additional algorithms or LLM overhead.
88
+
89
+ ### 5. Use Cases
90
+ * Static code analysis (checking security, performance, and style simultaneously).
91
+ * Large-scale information retrieval and cross-document comparison.
92
+
93
+ ---
94
+
95
+ ## Chapter 4: Reflection (Self-Correction)
96
+
97
+ ### 1. Definition
98
+ Introduces a dual-entity feedback mechanism: a Generator and a Critic. The Generator produces an initial draft, the Critic evaluates it for quality and provides feedback, and the Generator iteratively refines the output until termination conditions are met.
99
+
100
+ ### 2. Problems Addressed
101
+ * Unstable output quality: Prevents logical gaps, factual errors, or formatting anomalies.
102
+ * Overconfidence: An independent critique step exposes blind spots that single-pass generation misses.
103
+
104
+ ### 3. Workflow
105
+ ```mermaid
106
+ graph TD
107
+ Input[Task Goal] --> Gen[Generator]
108
+ Gen --> Draft[Initial Draft]
109
+ Draft --> Critic[Critic / Evaluator]
110
+ Critic --> Decision{Is Acceptable?}
111
+ Decision -->|No| Feedback[Feedback/Suggestions]
112
+ Feedback -->|Guide Correction| Gen
113
+ Decision -->|Yes| Output[Final Accepted Output]
114
+ ```
115
+
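The generator-critic loop can be sketched as follows. The `reflect` helper and both toy callables are hypothetical; the point of the sketch is the `max_rounds` bound, which prevents the infinite loop named in the trade-offs.

```python
from typing import Callable, Optional


def reflect(generate: Callable[[str, Optional[str]], str],
            critique: Callable[[str], Optional[str]],
            task: str, max_rounds: int = 3) -> str:
    """Refine a draft until the critic returns None or the round budget runs out."""
    draft = generate(task, None)
    for _ in range(max_rounds):
        feedback = critique(draft)
        if feedback is None:           # the critic accepts the draft
            return draft
        draft = generate(task, feedback)
    return draft  # budget exhausted: the last draft is returned without a final critique


def toy_generate(task: str, feedback: Optional[str]) -> str:
    # Hypothetical generator: adds the missing period once it receives feedback.
    return task.capitalize() + ("." if feedback else "")


def toy_critique(draft: str) -> Optional[str]:
    return None if draft.endswith(".") else "End the sentence with a period."


final = reflect(toy_generate, toy_critique, "hello world")
```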
116
+ ### 4. Trade-offs
117
+ * **Pros**: More stable output quality, with fewer logical and formatting errors.
118
+ * **Cons**: Higher token consumption; extended execution time; potential for infinite loops if termination conditions are poorly defined.
119
+
120
+ ### 5. Use Cases
121
+ * Automated code generation and testing (write code -> run tests -> fix based on errors -> re-test).
122
+ * Strict compliance document drafting.
123
+
124
+ ---
125
+
126
+ ## Chapter 5: Tool Use / Function Calling
127
+
128
+ ### 1. Definition
129
+ The LLM reads the schema that describes each external tool, autonomously decides when to call a tool, and generates the call parameters. The agent runtime executes the tool in a sandbox or external system and feeds the result back to the LLM for interpretation.
130
+
131
+ ### 2. Problems Addressed
132
+ * Information lag: Connects the model to real-time data.
133
+ * Lack of computation: Solves difficulties in mathematics and precise logical operations.
134
+ * Inability to affect external systems: Allows agents to send emails, write databases, or call APIs.
135
+
136
+ ### 3. Workflow
137
+ ```mermaid
138
+ sequenceDiagram
139
+ participant U as User
140
+ participant L as LLM (Core Reasoning)
141
+ participant A as Agent Execution Sandbox
142
+ participant T as External Tool / API
143
+ U->>L: Query
144
+ L->>L: Identify context, decide to use clock tool
145
+ L-->>A: Return tool name & structured parameters
146
+ A->>T: Call external API
147
+ T-->>A: Return real-time data
148
+ A-->>L: Send execution result back as context
149
+ L->>L: Synthesize and reason
150
+ L->>U: Respond to user
151
+ ```
152
+
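The agent-side half of this sequence can be sketched as a small dispatcher. The JSON shape of the tool call, the `TOOLS` registry, and the `add` tool are assumptions for illustration; real model APIs each define their own tool-call format.

```python
import json
from typing import Any, Callable

# Hypothetical tool registry: tool name -> callable.
TOOLS: dict[str, Callable[..., Any]] = {
    "add": lambda a, b: a + b,
}


def execute_tool_call(raw_call: str) -> dict[str, Any]:
    """Parse the model's tool call, run the tool, and return a result for the LLM."""
    try:
        call = json.loads(raw_call)
        tool = TOOLS[call["name"]]
        return {"ok": True, "result": tool(**call["arguments"])}
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        # Malformed JSON, unknown tool, or wrong parameters: report the error
        # back as context so the model can correct its next call.
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}


good = execute_tool_call('{"name": "add", "arguments": {"a": 2, "b": 3}}')
bad = execute_tool_call('{"name": "subtract", "arguments": {}}')
```

Returning errors as data rather than raising keeps the loop alive, which addresses the parameter-generation risk listed under trade-offs.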
153
+ ### 4. Trade-offs
154
+ * **Pros**: Greatly expands the action capabilities and data retrieval scope of the agent.
155
+ * **Cons**: Risk of parameter generation errors; security risks with external tools (requires strict sandboxing); vulnerability to external API instability.
156
+
157
+ ### 5. Use Cases
158
+ * Real-time data queries (weather, stock market, ERP systems).
159
+ * Data entry and control (sending notifications, database updates).
160
+
161
+ ---
162
+
163
+ ## Chapter 6: Planning
164
+
165
+ ### 1. Definition
166
+ Decomposes a high-level goal into an ordered set of dependent execution steps. The planner dynamically rewrites the remaining steps (replanning) based on environmental feedback and new information to ensure the goal is reached.
167
+
168
+ ### 2. Problems Addressed
169
+ * Goal drift: Prevents the agent from losing sight of the ultimate goal during multi-step execution.
170
+ * Dynamic environment changes: Automatically searches for alternative solutions if a step fails.
171
+
172
+ ### 3. Workflow
173
+ ```mermaid
174
+ graph TD
175
+ Goal[Ultimate Goal] --> Planner[Planner: Task Decomposition]
176
+ Planner --> Plan[Generate Step List 1, 2, 3...]
177
+ Plan --> Executor[Executor: Call tools / sub-steps sequentially]
178
+ Executor --> EnvFeedback[Environmental Feedback]
179
+ EnvFeedback --> Checker{Encounter obstacles/failure?}
180
+ Checker -->|Yes| Replanner[Dynamic Replanner: Update plan]
181
+ Replanner --> Plan
182
+ Checker -->|No| Next{All steps completed?}
183
+ Next -->|No| Executor
184
+ Next -->|Yes| Output[Goal Accomplished]
185
+ ```
186
+
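The execute-check-replan loop in the diagram can be sketched as below. `run_plan`, the step names, and the toy `execute` and `replan` callables are hypothetical; the `max_replans` cap is an assumption added to bound the replanning cost.

```python
from typing import Callable


def run_plan(plan: list[str],
             execute: Callable[[str], bool],
             replan: Callable[[str, list[str]], list[str]],
             max_replans: int = 2) -> list[str]:
    """Execute steps in order; on failure, ask the replanner for a new remaining plan."""
    done: list[str] = []
    steps = list(plan)
    replans = 0
    while steps:
        step = steps.pop(0)
        if execute(step):
            done.append(step)
        elif replans < max_replans:
            steps = replan(step, steps)   # rewrite the remaining steps
            replans += 1
        else:
            raise RuntimeError(f"gave up at step {step!r}")
    return done


# Toy environment: the API step always fails, so the replanner swaps in scraping.
completed = run_plan(
    ["search", "use_api", "summarize"],
    execute=lambda step: step != "use_api",
    replan=lambda failed, rest: ["scrape_page"] + rest,
)
```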
187
+ ### 4. Trade-offs
188
+ * **Pros**: Highly adaptable; capable of autonomously handling complex, unstructured tasks.
189
+ * **Cons**: Planning and replanning require many LLM calls, which is costly; errors in the plan propagate and pull downstream actions away from the goal.
190
+
191
+ ### 5. Use Cases
192
+ * Autonomous research assistants (Deep Research: dynamically selecting keywords, assessing information quality, diving deep into unknown domains).
193
+ * Automated software development (architecture design -> module division -> sequential development).
194
+
195
+ ---
196
+
197
+ ## Chapter 7: Multi-Agent Collaboration
198
+
199
+ ### 1. Definition
200
+ Distributes a large task among multiple **specialized agents with distinct personas and skills**. These agents coordinate task handoffs, discussions, and integration through a predefined collaboration topology.
201
+
202
+ ### 2. Problems Addressed
203
+ * Cognitive limits of a single agent: Avoids overloading one system prompt with too many instructions and roles.
204
+ * Unclear division of labor: Emulates human teams by dedicating specialists to specific tasks.
205
+
206
+ ### 3. Workflow
207
+ Four main collaboration topologies:
208
+ * **Handoffs (Network)**: Agent A finishes its task and hands over the context and control to Agent B.
209
+ * **Supervisor**: A central Supervisor agent coordinates, assigns tasks to specialists, and aggregates results.
210
+ * **Hierarchy**: Supervisors oversee sub-supervisors, delegating and aggregating tasks hierarchically.
211
+ * **Blackboard**: Agents read and write to a shared state space (blackboard), intervening autonomously as the state changes.
212
+
213
+ ### 4. Trade-offs
214
+ * **Pros**: Modular and scalable; allows mixing different model sizes/strengths to optimize costs.
215
+ * **Cons**: High communication overhead (multi-turn dialogues between agents); complex state management; risk of infinite discussion loops or unclear ownership.
216
+
217
+ ### 5. Use Cases
218
+ * Simulated software development teams (Product Manager -> Architect -> Engineer -> QA).
219
+ * Creative content generation and peer review.
@@ -0,0 +1,105 @@
1
+ # Resilience, Exceptions & HITL
2
+
3
+ This document provides conceptual designs for system resilience, human interaction, and knowledge grounding, covering exception handling, Human-in-the-Loop (HITL) gates, and Retrieval-Augmented Generation (RAG).
4
+
5
+ ---
6
+
7
+ ## Chapter 12: Exception Handling and Recovery
8
+
9
+ ### 1. Definition
10
+ Designs automatic detection, retry, fallback, and state rollback mechanisms for exceptions that may occur during agent execution (such as API timeouts, network disconnections, LLM format errors, and invalid tool parameters).
11
+
12
+ ### 2. Problems Addressed
13
+ * System fragility: Prevents long-cycle workflows from breaking due to transient network or API issues.
14
+ * Format pollution: Guides the LLM to self-heal when its output does not conform to the expected JSON schema.
15
+
16
+ ### 3. Workflow
17
+ ```mermaid
18
+ graph TD
19
+ Step[Execute Tool / Call LLM] --> Success{Successful?}
20
+ Success -->|Yes| Next[Proceed to Next Step]
21
+ Success -->|No: Exception| Detector[Exception Detector]
22
+ Detector --> RuleCheck{Evaluate Exception Type}
23
+ RuleCheck -->|Network/Timeout| Retry[Auto Retry with Backoff]
24
+ RuleCheck -->|Format Error| Refine[Guide LLM to Self-Correct]
25
+ RuleCheck -->|Tool Failure| Fallback[Route to Fallback/Alternative Tool]
26
+ RuleCheck -->|Critical Error| Rollback[Rollback State to Checkpoint]
27
+ Retry --> Step
28
+ Refine --> Step
29
+ Fallback --> Step
30
+ Rollback --> UserEscalation[Human Intervention]
31
+ ```
32
+
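The retry and fallback branches of the diagram can be sketched as follows. `TransientError`, `call_with_recovery`, and the injected `sleep` parameter are hypothetical names; the self-correction and rollback branches are omitted to keep the sketch short.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures such as timeouts."""


def call_with_recovery(primary: Callable[[], T], fallback: Callable[[], T],
                       max_retries: int = 3, base_delay: float = 0.5,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry transient failures with exponential backoff, then use the fallback."""
    for attempt in range(max_retries):
        try:
            return primary()
        except TransientError:
            sleep(base_delay * 2 ** attempt)   # waits 0.5s, 1s, 2s, ...
        except Exception:
            break                              # non-transient: stop retrying
    return fallback()


attempts = {"count": 0}


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("timeout")
    return "primary result"


# `sleep` is replaced with a no-op so the example runs instantly.
result = call_with_recovery(flaky, lambda: "fallback result", sleep=lambda s: None)
```

Logging each retry and fallback is advisable in practice, since silent recovery is exactly how the masked-bug risk in the trade-offs arises.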
33
+ ### 4. Trade-offs
34
+ * **Pros**: Improves system robustness and reduces manual maintenance costs.
35
+ * **Cons**: Excessive retries or fallbacks can mask underlying bugs or quietly degrade output quality.
36
+
37
+ ---
38
+
39
+ ## Chapter 13: Human-in-the-Loop (HITL)
40
+
41
+ ### 1. Definition
42
+ Strategically embeds human review, intervention, and authorization mechanisms into the agent's autonomous decision-making workflow, combining human common sense, ethics, and legal judgment with AI automation.
43
+
44
+ ### 2. Problems Addressed
45
+ * High-risk operations: Prevents agent errors when performing large financial transactions, deleting sensitive data, or executing legally sensitive actions.
46
+ * Automation boundaries: Requests human guidance when decision confidence falls below a set threshold.
47
+
48
+ ### 3. Three Core Interaction Modes
49
+ ````carousel
50
+ ### 1. Human-in-the-Loop (HITL)
51
+ * **Mechanism**: The agent pauses when reaching a high-risk step (e.g., large bank transfer), suspends the task, and sends it to a pending review queue.
52
+ * **Workflow**: Agent pauses -> Human reviews (Approve/Reject/Modify) -> Agent receives input and resumes execution.
53
+ * **Key Characteristic**: Human approval is a mandatory gate.
54
+ <!-- slide -->
55
+ ### 2. Human-on-the-Loop (HOTL)
56
+ * **Mechanism**: The agent executes tasks autonomously while a human supervisor monitors and adjusts strategies.
57
+ * **Workflow**: Human sets macro rules (e.g., transaction limits) -> Agent trades automatically -> Human monitors metrics -> Human intervenes via a Kill Switch if necessary.
58
+ * **Key Characteristic**: Human does not intervene in individual decisions but maintains macro-level oversight.
59
+ <!-- slide -->
60
+ ### 3. Decision Augmentation
61
+ * **Mechanism**: The agent acts as an analytical assistant, gathering data and presenting candidates. Decision-making and execution are performed entirely by a human.
62
+ * **Workflow**: Human asks query -> Agent collects and analyzes data -> Agent proposes options A, B, and C with pros/cons -> Human selects and executes.
63
+ * **Key Characteristic**: Agent provides cognitive augmentation without execution authority.
64
+ ````
65
+
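The mandatory approval gate of the first mode can be sketched as below. The `Action` type, the transfer limit, and the `approve` callback are hypothetical; in a real system the callback would block on a review queue rather than return immediately.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    kind: str
    amount: float


def needs_approval(action: Action, limit: float = 1000.0) -> bool:
    """Hypothetical risk rule: only large transfers require a human decision."""
    return action.kind == "transfer" and action.amount > limit


def run_with_gate(action: Action, approve: Callable[[Action], bool]) -> str:
    """Pause for human approval on high-risk actions; run low-risk ones directly."""
    if needs_approval(action) and not approve(action):
        return "rejected"
    return "executed"


outcome = run_with_gate(Action("transfer", 5000.0), approve=lambda action: False)
```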
66
+ ### 4. Trade-offs
67
+ * **Pros**: Provides a safety net and a compliance checkpoint for high-risk decisions; the collected human feedback can be used to improve agent alignment.
68
+ * **Cons**: Human intervention limits system scalability and speed; designing human-in-the-loop review queues increases development costs.
69
+
70
+ ---
71
+
72
+ ## Chapter 14: Knowledge Retrieval / RAG
73
+
74
+ ### 1. Definition
75
+ Retrieves relevant information from a knowledge base before the LLM generates a response, injecting the retrieved text chunks into the prompt context to guide the LLM toward producing factually grounded answers.
76
+
77
+ ### 2. RAG Variants: Traditional, Graph, and Agentic
78
+ ```mermaid
79
+ graph TD
80
+ subgraph Traditional RAG
81
+ Query[User Query] --> VectorSearch[Vector Similarity Search]
82
+ VectorSearch --> Context[Concatenate Context Chunks]
83
+ Context --> LLMGen[LLM Generates Response]
84
+ end
85
+ subgraph Graph RAG
86
+ GQuery[User Query] --> GraphSearch[Navigate Knowledge Graph Nodes & Edges]
87
+ GraphSearch --> UnifiedContext[Cross-document Context Linkage]
88
+ end
89
+ subgraph Agentic RAG
90
+ AQuery[User Query] --> AgentLayer[Agent Decision Layer]
91
+ AgentLayer -->|1. Decompose Task| SubQueries[Multi-step Sub-retrieval Tasks]
92
+ AgentLayer -->|2. Self-Reflection| SourceVal[Source Timeliness & Quality Check]
93
+ AgentLayer -->|3. Resolve Conflicts| ConflictRecon[Active Conflict Reconciliation]
94
+ AgentLayer -->|4. Tool Call| WebSearch[Web Search for Knowledge Gaps]
95
+ end
96
+ ```
97
+
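The traditional RAG path (retrieve, concatenate, generate) can be sketched with a toy retriever. The bag-of-words `embed` function is a stand-in for a real embedding model, and the prompt wording is an assumption; only the retrieve-then-inject structure is the point.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would use a learned embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    context = "\n".join(f"- {chunk}" for chunk in retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


docs = [
    "refunds are processed within five days",
    "the office is closed on sunday",
    "shipping takes two days",
]
top = retrieve("how are refunds processed", docs, k=1)
```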
98
+ ### 3. Problems Addressed
99
+ * Outdated knowledge: Bypasses the temporal limits of static training data.
100
+ * Hallucination: Constrains the model to facts found in verified document context.
101
+ * Fragmented information: Addresses a limitation of plain vector search, which struggles with broad questions that span multiple documents.
102
+
103
+ ### 4. Trade-offs
104
+ * **Pros**: Reduces factual errors; supports precise citations; incorporates private knowledge without retraining the model.
105
+ * **Cons**: Highly sensitive to the quality of text chunking and embeddings; multi-step reasoning in Agentic RAG increases response latency.
@@ -0,0 +1,93 @@
1
+ # System Components & Protocols
2
+
3
+ This document provides conceptual designs for system architecture components, resources, and protocols, covering memory management, learning and adaptation, Model Context Protocol (MCP), and goal setting and monitoring.
4
+
5
+ ---
6
+
7
+ ## Chapter 8: Memory Management
8
+
9
+ ### 1. Definition
10
+ Provides agents with the ability to store and retrieve information across sessions and tasks through persistence mechanisms. The memory system is generally divided into short-term and long-term memory, managed by a unified Memory Service.
11
+
12
+ ### 2. Memory Classification
13
+ | Memory Type | Medium | Function | Eviction & Retrieval Mechanism |
14
+ | :--- | :--- | :--- | :--- |
15
+ | **Short-term** | Current Context Window | Stores current conversation context and task execution trajectory | Sliding window, context pruning, and summarization |
16
+ | **Long-term Semantic** | Vector Database / Knowledge Base | Retains factual knowledge, concepts, and external rules | Vector semantic retrieval based on user input |
17
+ | **Long-term Episodic** | Structured Database / Log Store | Records past task execution experiences and outcomes | Used for few-shot learning or similar scenario matching |
18
+ | **Long-term Procedural** | Codebase / Tool Definitions / Prompt Templates | Records Standard Operating Procedures (SOPs) and toolbox definitions for specific tasks | Dynamically loaded based on task type |
19
+
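The short-term row of the table can be sketched as a sliding window over conversation turns. The `ShortTermMemory` class is hypothetical, and the `evicted` list only marks where a real system would summarize or archive old turns into long-term storage.

```python
from collections import deque


class ShortTermMemory:
    """Keeps the most recent turns; older turns are evicted for later summarization."""

    def __init__(self, max_turns: int = 3) -> None:
        self.turns: deque[str] = deque(maxlen=max_turns)
        self.evicted: list[str] = []

    def add(self, turn: str) -> None:
        if len(self.turns) == self.turns.maxlen:
            # The oldest turn is about to fall out of the window.
            self.evicted.append(self.turns[0])
        self.turns.append(turn)

    def context(self) -> list[str]:
        return list(self.turns)


memory = ShortTermMemory(max_turns=3)
for turn in ["hi", "book a flight", "to Taipei", "on Friday"]:
    memory.add(turn)
```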
20
+ ### 3. Problems Addressed
21
+ * Amnesia (Context limits): Prevents long conversations from causing the LLM to lose critical history.
22
+ * Repeated errors: Lets the agent learn from past executions and improve its decision success rate.
23
+
24
+ ---
25
+
26
+ ## Chapter 9: Learning and Adaptation
27
+
28
+ ### 1. Definition
29
+ Enables the agent to improve itself by collecting behavioral feedback and rewards from interactions with the environment, users, or other agents. Based on that feedback, the agent autonomously revises its prompts or modifies its own execution code inside a code sandbox (SICA, Self-Improving Coding Agent).
30
+
31
+ ### 2. Problems Addressed
32
+ * Static configuration lag: Solves the issue of agents failing to adjust when environmental rules change.
33
+ * High development cost: Reduces the manual effort of tuning prompts by hand.
34
+
35
+ ### 3. Workflow
36
+ ```mermaid
37
+ graph TD
38
+ Interaction[Agent-Environment Interaction] --> Result[Execution Results & Metrics]
39
+ Result --> Evaluator[Evaluator / Scoring System]
40
+ Evaluator -->|Feedback/Score| Learner[Learning Engine]
41
+ Learner -->|Self-Optimize Prompts or Refactor Code| AgentUpgrade[Upgraded Agent]
42
+ AgentUpgrade -->|Next Task Turn| Interaction
43
+ ```
44
+
45
+ ### 4. Trade-offs
46
+ * **Pros**: High potential for long-term self-evolution; can discover high-quality logic not designed by humans in specific vertical disciplines (e.g., mathematical proofs, code generation).
47
+ * **Cons**: Unpredictable evolution paths, which may generate harmful mutations; self-modifying prompts can lead to privilege escalation or security vulnerabilities; extremely high overhead for training and testing iterations.
48
+
49
+ ---
50
+
51
+ ## Chapter 10: Model Context Protocol (MCP)
52
+
53
+ ### 1. Definition
54
+ A standardized **Client-Server communication protocol** that establishes a plug-and-play integration standard between LLMs/Agents (Clients) and external data sources, development tools, and API services (Servers). MCP standardizes three core types of context exchange: **Resources**, **Prompts**, and **Tools**.
55
+
56
+ ```mermaid
57
+ graph LR
58
+ subgraph Agentic Client
59
+ Agent[AI Agent / LLM]
60
+ end
61
+ subgraph MCP Server
62
+ Res[Resources: Files/Databases]
63
+ Pmt[Prompts: Templates]
64
+ Tls[Tools: APIs/Sandboxes]
65
+ end
66
+ Agent <-->|Standard JSON-RPC 2.0| MCP_Link[MCP Protocol Layer]
67
+ MCP_Link <--> Res
68
+ MCP_Link <--> Pmt
69
+ MCP_Link <--> Tls
70
+ ```
71
+
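The JSON-RPC 2.0 layer in the diagram can be illustrated by building two request messages. `tools/list` and `tools/call` are MCP method names; the `jsonrpc_request` helper and the `get_weather` tool are hypothetical, and a real session would first complete the protocol's initialization handshake over a transport.

```python
import json
from itertools import count
from typing import Any, Optional

_request_ids = count(1)


def jsonrpc_request(method: str, params: Optional[dict[str, Any]] = None) -> str:
    """Serialize one JSON-RPC 2.0 request with a fresh numeric id."""
    message: dict[str, Any] = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": method,
    }
    if params is not None:
        message["params"] = params
    return json.dumps(message)


# Discover the server's tools, then call one of them (tool name is made up).
list_tools = jsonrpc_request("tools/list")
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "get_weather", "arguments": {"city": "Taipei"}},
)
```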
72
+ ### 2. Problems Addressed
73
+ * Tedious integration: Avoids repeatedly writing custom wrapper code when developing new agents or integrating new tools.
74
+ * Fragmented context acquisition: Provides external data and actions to the model in a unified interface format.
75
+
76
+ ### 3. Trade-offs
77
+ * **Pros**: Reduces integration costs for multiple tools and data sources; decouples data sources from reasoning entities; supports dynamic discovery.
78
+ * **Cons**: Protocol serialization and JSON-RPC wrapping introduce minor performance overhead; requires tool providers to actively adopt the protocol.
79
+
80
+ ---
81
+
82
+ ## Chapter 11: Goal Setting and Monitoring
83
+
84
+ ### 1. Definition
85
+ Sets structured, quantifiable goals (following the SMART principles) before the agent is initialized, and adds an independent monitor during execution that tracks progress in real time through progress checkpoints, detects blockers, and escalates to a human when necessary.
86
+
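The monitor's stall detection can be sketched as a check over recorded progress checkpoints. The `check_progress` function, the progress scale from 0.0 to 1.0, and the `stall_limit` default are assumptions for illustration.

```python
def check_progress(history: list[float], stall_limit: int = 3) -> str:
    """Classify a run from its progress checkpoints (0.0 to 1.0).

    Returns "done" when the goal is reached, "escalate" when the last
    `stall_limit` checkpoints show no improvement, otherwise "continue".
    """
    if history and history[-1] >= 1.0:
        return "done"
    recent = history[-(stall_limit + 1):]
    # Stalled: none of the newer checkpoints beat the oldest one in the window.
    if len(recent) > stall_limit and max(recent[1:]) <= recent[0]:
        return "escalate"
    return "continue"


status = check_progress([0.2, 0.4, 0.4, 0.4, 0.4])
```

Running this check outside the agent, as the definition suggests, means a stuck agent cannot suppress its own escalation.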
87
+ ### 2. Problems Addressed
88
+ * Blind execution: Prevents agents from wasting budget in infinite retry loops when they hit logical obstacles.
89
+ * Lack of observability: Solves the black-box execution problem, providing a clear progress path.
90
+
91
+ ### 3. Use Cases
92
+ * Automated marketing campaign execution.
93
+ * Long-cycle autonomous codebase refactoring.