npm - @moon791017/neo-skills - Versions diffs - 1.1.10 → 1.1.12 - Mend

@moon791017/neo-skills 1.1.10 → 1.1.12

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (10)

package/README.md +2 -0
package/package.json +1 -1
package/skills/neo-agent-harness/reference/loop-engineering.md +91 -91
package/skills/neo-agentic-design/SKILL.md +89 -0
package/skills/neo-agentic-design/evals/eval_queries.json +58 -0
package/skills/neo-agentic-design/evals/evals.json +27 -0
package/skills/neo-agentic-design/references/advanced-safety.md +158 -0
package/skills/neo-agentic-design/references/base-workflows.md +219 -0
package/skills/neo-agentic-design/references/resilience-hitl.md +105 -0
package/skills/neo-agentic-design/references/system-components.md +93 -0

package/README.md CHANGED

@@ -59,6 +59,7 @@
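 | TypeScript | `neo-typescript` | Handles TypeScript, tsconfig, strict mode, generics, conditional/mapped/template literal types, ESM/CJS, and runtime boundaries. |
 | Vue | `neo-vue` | Builds, debugs, refactors, or reviews Vue 3, SFCs, the Composition API, Pinia, Vue Router, Vite, and Vue + TypeScript. |
 | Agent architecture | `neo-sub-agent` | Designs, creates, reviews, or converts sub-agents, custom agents, worker/reviewer/planner agents, or multi-agent workflows. |
+| Agent architecture | `neo-agentic-design` | Designs, evaluates, or implements framework-agnostic patterns such as agent workflows, prompt chaining, routing, planning, reflection, multi-agent collaboration, and memory management. |
 | Text polishing | `neo-stop-slop` | Removes AI-sounding phrasing, filler words, and formulaic sentence patterns from Traditional Chinese or English text; supports documents, comments, commit messages, and PR descriptions. |
 ## Installation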
@@ -153,6 +154,7 @@ npx -p @moon791017/neo-skills install-system-instructions \
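 | Build Vue 3 components | `neo-vue` | `Help me refactor this SFC and avoid reactivity pitfalls` |
 | Improve the AI development workflow | `neo-agent-harness` | `Assess how reliably this project supports coding-agent collaboration` |
 | Create a sub-agent | `neo-sub-agent` | `Help me add a Codex code-reviewer sub agent` |
+| Design an agent orchestration architecture | `neo-agentic-design` | `Help me design the topology and retry mechanism for a multi-agent customer service system` |
 | Remove AI-sounding phrasing | `neo-stop-slop` | `Make this PR description more natural and direct` |
 ## Development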

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@moon791017/neo-skills",
-  "version": "1.1.10",
+  "version": "1.1.12",
   "type": "module",
   "description": "Neo Skills: A Universal AI Agent Skills Extension",
   "homepage": "https://neo-blog-iota.vercel.app/",

package/skills/neo-agent-harness/reference/loop-engineering.md CHANGED

@@ -2,11 +2,11 @@
 Use this reference when designing loop architectures that automate agent-driven workflows beyond a single session.
-## Loop 與 Harness 的關係
+## Relationship Between Loops and Harnesses
-- Harness = 單一 agent 的工作環境（guides + sensors + gates）
-- Loop = harness 之上的排程驅動層，讓 harness 自己跑
-- 設計 loop 不是取代 prompt，而是把反覆的 prompt 動作系統化
+- Harness = the working environment for a single agent (guides + sensors + gates)
+- Loop = the scheduling layer on top of the harness that lets the harness run itself
+- Designing a loop does not replace prompts; it systematizes repetitive prompt actions
 ```text
 Loop = Automations + Worktrees + Skills + Connectors + Sub-agents + State
@@ -14,136 +14,136 @@ Loop = Automations + Worktrees + Skills + Connectors + Sub-agents + State
                           running on top of the Harness
 ```
-## 五個基本原件 + State
+## Five Primitives + State
-### 1. Automations（心跳）
+### 1. Automations (Heartbeat)
-沒有 automations 的 loop 只跑一次；有了它才會重複。
+A loop without automations runs only once; automations make it repeat.
-- 排程式觸發，定時執行探索與分類。
-- 找到問題的送 triage inbox，沒發現的自動歸檔。
-- 可搭配 skills 維護排程任務的可維護性——呼叫 `$skill-name` 而非貼一大段指令。
-- `/loop` 按頻率重複執行；`/goal` 持續執行直到停止條件成立，且由獨立模型判斷是否完成。
+- Schedule-driven triggers that periodically run exploration and classification.
+- Findings go to the triage inbox; non-findings are auto-archived.
+- Pair with skills to keep scheduled tasks maintainable—invoke `$skill-name` instead of pasting a wall of instructions.
+- `/loop` repeats at a set frequency; `/goal` runs until a stop condition is met, with an independent model judging completion.
-工具對應：
+Tool mapping:
-- Codex：Automations tab（選專案、prompt、頻率、環境），結果進 Triage inbox；`/goal` run-until-done。
-- Claude Code：`/loop`、`/goal`、hooks、cron、GitHub Actions。
+- Codex: Automations tab (select project, prompt, frequency, environment); results go to Triage inbox; `/goal` for run-until-done.
+- Claude Code: `/loop`, `/goal`, hooks, cron, GitHub Actions.
-### 2. Worktrees（隔離）
+### 2. Worktrees (Isolation)
-多 agent 並行時避免檔案衝突。
+Prevent file conflicts when multiple agents run in parallel.
-- 每個 agent 在獨立的 git worktree 工作，共享 repo history。
-- 一個 agent 的編輯不會碰到另一個的 checkout。
-- 人的 review bandwidth 仍是瓶頸——worktree 解決機械衝突，但你能同時審幾條線決定了你能跑幾個 agent（orchestration tax）。
+- Each agent works in its own git worktree, sharing repo history.
+- One agent's edits never touch another agent's checkout.
+- Human review bandwidth is still the bottleneck—worktrees solve mechanical conflicts, but the number of agents you can run is limited by how many threads you can review simultaneously (orchestration tax).
-工具對應：
+Tool mapping:
-- Codex：內建 worktree per thread。
-- Claude Code：`git worktree`、`--worktree` flag、subagent 的 `isolation: worktree` 設定。
+- Codex: Built-in worktree per thread.
+- Claude Code: `git worktree`, `--worktree` flag, subagent `isolation: worktree` setting.
-### 3. Skills（知識固化）
+### 3. Skills (Crystallized Knowledge)
-把反覆解釋的專案上下文寫成 SKILL.md。
+Write repeatedly explained project context into a SKILL.md.
-- 消除 intent debt：每次冷啟動，agent 會用自信的猜測填補意圖缺口。Skill 把意圖寫在外面，agent 每次讀取，不需重建。
-- 沒有 skills 的 loop 每個 cycle 從零推導你的整個專案；有 skills 的 loop 每次都帶著上次的知識跑。
-- Skill 是創作格式，Plugin 是發布格式——跨 repo 分享時打包成 plugin。
+- Eliminate intent debt: on every cold start, an agent fills intent gaps with confident guesses. A skill externalizes intent so the agent reads it every time instead of reconstructing it.
+- A loop without skills re-derives your entire project from scratch each cycle; a loop with skills carries forward knowledge from the last run.
+- A skill is an authoring format; a plugin is a distribution format—package skills as plugins when sharing across repos.
-工具對應：
+Tool mapping:
-- Codex：Agent Skills (`SKILL.md`)，用 `$name` 或 `/skills` 呼叫，或由 description 自動觸發。
-- Claude Code：Agent Skills (`SKILL.md`)。
+- Codex: Agent Skills (`SKILL.md`), invoked via `$name` or `/skills`, or auto-triggered by description.
+- Claude Code: Agent Skills (`SKILL.md`).
-### 4. Plugins / Connectors（外部整合）
+### 4. Plugins / Connectors (External Integration)
-透過 MCP 連接外部工具，讓 loop 能在真實環境中行動。
+Connect external tools via MCP so the loop can act in real environments.
-- 可連接 issue tracker、database、staging API、Slack。
-- Codex 和 Claude Code 都用 MCP，connector 通常跨工具可用。
-- Plugins 把 connectors 和 skills 打包在一起，方便團隊成員一次安裝。
+- Can connect to issue trackers, databases, staging APIs, Slack.
+- Both Codex and Claude Code use MCP; connectors are generally cross-tool portable.
+- Plugins bundle connectors and skills together for one-step team installation.
-沒有 connectors 的 loop 只能輸出建議；有 connectors 的 loop 能直接開 PR、連 ticket、ping channel。
+A loop without connectors can only output suggestions; a loop with connectors can open PRs, link tickets, and ping channels directly.
-### 5. Sub-agents（生成與驗證分離）
+### 5. Sub-agents (Separating Generation from Verification)
-Loop 的結構性前提是把 maker 和 checker 分開。
+The structural premise of a loop is separating maker from checker.
-- 寫程式碼的 model 對自己的作業打分數太寬容。第二個 agent 用不同指令（有時不同 model）才能抓到第一個說服自己接受的問題。
-- `/goal` 底層也是 maker/checker 分離——用獨立的小模型判斷 loop 是否完成，而不是讓做事的 agent 自己說完成了。
-- 常見分工：一個 explore、一個 implement、一個 verify against spec。
-- Sub-agents 會燒更多 token，花在值得第二意見的地方。
+- The model that writes the code grades its own work too leniently. A second agent with different instructions (sometimes a different model) catches issues the first agent convinced itself to accept.
+- `/goal` also uses maker/checker separation under the hood—an independent small model judges whether the loop is done, rather than letting the working agent declare itself finished.
+- Common division of labor: one explores, one implements, one verifies against spec.
+- Sub-agents burn more tokens; spend them where a second opinion is worthwhile.
-> **職責邊界**：本段只講「為什麼 loop 需要 maker/checker 分離」這個設計決策。具體 sub-agent 的定義格式、指令撰寫、model 選擇等實作細節，請使用 `neo-sub-agent` 技能。
+> **Responsibility boundary**: This section only covers the design rationale for why loops need maker/checker separation. For implementation details such as sub-agent definition format, instruction writing, and model selection, use the `neo-sub-agent` skill.
-工具對應：
+Tool mapping:
-- Codex：`.codex/agents/` 下的 TOML 定義檔，每個有 name、description、instructions、optional model 和 reasoning effort。
-- Claude Code：`.claude/agents/` 下的 subagent 定義 + agent teams。
+- Codex: TOML definition files under `.codex/agents/`, each with name, description, instructions, optional model, and reasoning effort.
+- Claude Code: Subagent definitions under `.claude/agents/` + agent teams.
-### 6. State（外部記憶）
+### 6. State (External Memory)
-模型在對話之間會遺忘，進度必須寫在 repo 裡。
+Models forget between conversations; progress must be written to the repo.
-- 格式：markdown 檔、Linear board、或任何對話外的持久化儲存。
-- State 負責記住做過什麼、通過什麼、還剩什麼。每個 long-running agent 都依賴它：agent 會忘，repo 不會。
+- Format: markdown files, Linear boards, or any persistent store outside the conversation.
+- State tracks what was done, what passed, and what remains. Every long-running agent depends on it: agents forget, repos don't.
-## 原件對照表
+## Primitives Comparison Table
-| 原件 | Loop 中的職責 | Codex | Claude Code |
+| Primitive | Role in Loop | Codex | Claude Code |
 |:--|:--|:--|:--|
-| Automations | 排程探索與分類 | Automations tab, `/goal` | `/loop`, `/goal`, hooks, cron, GitHub Actions |
-| Worktrees | 隔離並行 | 內建 worktree per thread | `git worktree`, `--worktree`, `isolation: worktree` |
-| Skills | 固化專案知識 | Agent Skills (`SKILL.md`), `$name` | Agent Skills (`SKILL.md`) |
-| Plugins / Connectors | 外部工具整合 | Connectors (MCP) + Plugins | MCP servers + Plugins |
-| Sub-agents | 生成與驗證分離 | `.codex/agents/` TOML | `.claude/agents/` + agent teams |
-| State | 跨對話進度 | Markdown / Linear connector | Markdown (`AGENTS.md`, progress files) / Linear MCP |
+| Automations | Scheduled exploration and classification | Automations tab, `/goal` | `/loop`, `/goal`, hooks, cron, GitHub Actions |
+| Worktrees | Parallel isolation | Built-in worktree per thread | `git worktree`, `--worktree`, `isolation: worktree` |
+| Skills | Crystallized project knowledge | Agent Skills (`SKILL.md`), `$name` | Agent Skills (`SKILL.md`) |
+| Plugins / Connectors | External tool integration | Connectors (MCP) + Plugins | MCP servers + Plugins |
+| Sub-agents | Separating generation from verification | `.codex/agents/` TOML | `.claude/agents/` + agent teams |
+| State | Cross-conversation progress | Markdown / Linear connector | Markdown (`AGENTS.md`, progress files) / Linear MCP |
-## 範例：一個完整 loop 的流程
+## Example: A Complete Loop Flow
-1. **Automation** 每天早上在 repo 上執行，prompt 呼叫 triage skill。
-2. Triage skill 讀取昨天的 CI failures、open issues、recent commits。
-3. 發現值得處理的 findings，寫入 **state file** 或 Linear board。
-4. 對每個 finding，開一個隔離的 **worktree**。
-5. 送一個 **sub-agent**（maker）進 worktree 草擬修復。
-6. 送第二個 **sub-agent**（checker）用專案 **skills** 和現有 tests 審查草稿。
-7. **Connectors** 開 PR、更新 ticket、CI 通過後 ping channel。
-8. 無法處理的 finding 送到 triage inbox 給人。
-9. **State file** 記錄什麼被嘗試了、什麼通過了、什麼還開著。
-10. 明天早上的 run 從 state 接續。
+1. **Automation** runs on the repo every morning; prompt invokes the triage skill.
+2. Triage skill reads yesterday's CI failures, open issues, and recent commits.
+3. Noteworthy findings are written to a **state file** or Linear board.
+4. For each finding, an isolated **worktree** is created.
+5. A **sub-agent** (maker) is sent into the worktree to draft a fix.
+6. A second **sub-agent** (checker) reviews the draft using project **skills** and existing tests.
+7. **Connectors** open a PR, update the ticket, and ping the channel once CI passes.
+8. Findings that cannot be handled are sent to the triage inbox for humans.
+9. The **state file** records what was attempted, what passed, and what remains open.
+10. Tomorrow morning's run picks up from state.
-你設計了一次，之後不再手動 prompt 任何步驟。
+You design it once; after that, you never manually prompt any step.
-## Loop 三大風險
+## Three Major Loop Risks
-### 1. 驗證仍在你身上
+### 1. Verification Is Still on You
-Loop 無人值守時也會無人值守地犯錯。Maker/checker 分離是必要但不充分的——「done」是一個 claim，不是 proof。你的工作是 ship 你確認有效的程式碼。
+An unattended loop also makes mistakes unattended. Maker/checker separation is necessary but not sufficient—"done" is a claim, not a proof. Your job is to ship code you have confirmed works.
-### 2. 理解債（Comprehension Debt）
+### 2. Comprehension Debt
-Loop 越快產出你沒寫的程式碼，你對系統的理解缺口越大。除非你讀 loop 產出的東西，否則理解債只會加速累積。
+The faster a loop produces code you didn't write, the larger your understanding gap grows. Unless you read what the loop produces, comprehension debt only accelerates.
-### 3. 認知投降（Cognitive Surrender）
+### 3. Cognitive Surrender
-當 loop 自己跑，人很容易停止有主見、照單全收。同一個 loop 設計，有判斷力的人用來加速理解深入的工作，沒有判斷力的人用來迴避理解工作本身——同一動作，相反結果。
+When a loop runs itself, people easily stop having opinions and accept everything at face value. The same loop design, used by someone with judgment, accelerates deeply understood work; used by someone without judgment, it becomes a way to avoid understanding the work itself—same action, opposite outcomes.
-### 風險防護策略
+### Risk Mitigation Strategies
-- 定期抽查 loop 產出，不要只看 CI 綠燈。
-- 設定 loop 的產出量上限，避免 review backlog 失控。
-- 在 state file 記錄人類最後審查的時間點。
-- 高風險變更（安全、合規、產品 scope）強制跳出 loop 等人。
-- 定期用 loop 的錯誤模式回饋改善 harness（agentic flywheel）。
+- Periodically spot-check loop output; don't rely solely on green CI.
+- Set output volume caps on the loop to prevent review backlog from spiraling.
+- Record the timestamp of the last human review in the state file.
+- Force high-risk changes (security, compliance, product scope) to exit the loop and wait for a human.
+- Regularly feed loop error patterns back to improve the harness (agentic flywheel).
-## 何時適合引入 Loop vs 留在 Harness
+## When to Introduce a Loop vs. Stay with the Harness
-| 條件 | 建議 |
+| Condition | Recommendation |
 |:--|:--|
-| 專案沒有可靠的本地驗證指令 | 先建 harness |
-| CI 不穩定或經常紅燈 | 先修 CI |
-| 團隊對 agent 產出沒有 review 流程 | 先建 review 流程 |
-| Maturity Level < 3 | 先升級 harness |
-| 重複性高、風險低的任務（triage、格式修復、依賴更新） | 適合 loop |
-| 變更涉及產品 scope、安全、架構取捨 | 不適合全自動 loop |
+| Project lacks reliable local verification commands | Build the harness first |
+| CI is unstable or frequently red | Fix CI first |
+| Team has no review process for agent output | Establish a review process first |
+| Maturity Level < 3 | Upgrade the harness first |
+| Highly repetitive, low-risk tasks (triage, format fixes, dependency updates) | Good fit for a loop |
+| Changes involve product scope, security, or architecture trade-offs | Not suitable for a fully automated loop |
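As a rough illustration of the ten-step loop flow in `loop-engineering.md`, the following Python sketch runs one cycle: it resumes from a state file, sends each new finding to a maker, lets a separate checker judge the draft, and persists the outcome. The function name `run_loop_cycle`, the JSON state layout, and the `maker`/`checker` callables are hypothetical stand-ins chosen for this example; worktree creation and connectors are left out.

```python
import json
from pathlib import Path
from typing import Callable


def run_loop_cycle(
    state_path: Path,
    findings: list[str],
    maker: Callable[[str], str],
    checker: Callable[[str, str], bool],
) -> dict:
    """Run one loop cycle: resume from state, draft, verify, persist."""
    # Resume from the state file if an earlier run left one behind.
    if state_path.exists():
        state = json.loads(state_path.read_text())
    else:
        state = {"passed": [], "needs_human": []}

    for finding in findings:
        if finding in state["passed"] or finding in state["needs_human"]:
            continue  # already handled in an earlier cycle
        draft = maker(finding)
        # The checker, not the maker, decides whether the draft is acceptable.
        if checker(finding, draft):
            state["passed"].append(finding)
        else:
            # Hypothetical triage bucket: rejected drafts wait for a human.
            state["needs_human"].append(finding)

    state_path.write_text(json.dumps(state, indent=2))
    return state
```

Calling the function again with the same state file skips findings that were already recorded, which mirrors step 10 of the example flow (the next run picks up from state).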

package/skills/neo-agentic-design/SKILL.md ADDED

@@ -0,0 +1,89 @@
+---
+name: neo-agentic-design
+description: >
+  Use this skill when designing, evaluating, or implementing Agent workflows, prompt chains, routing, planning, reflection, multi-agent collaboration, memory management, or other framework-agnostic LLM orchestration patterns.
+license: MIT
+compatibility: No specific language runtime required; conceptual-only patterns.
+metadata:
+  version: "1.0.0"
+  type: "conceptual-design"
+---
+# Neo Agentic Design
+This skill provides architectural concepts and orchestration patterns for building LLM Agent systems. It covers 21 core design patterns categorized into four themes. The orchestration logic remains abstract and independent of specific programming languages or frameworks.
+## Gotchas
+* **Over-engineering**: Prioritize simple prompt chains (Chapter 1) or routing (Chapter 2). To limit token overhead, use complex multi-agent collaboration (Chapter 7) or hierarchical networks only when necessary.
+* **Reflection Infinite Loops**: When implementing reflection (Chapter 4) or self-correction (Chapter 12), enforce a maximum iteration limit (e.g., 3-5 iterations) to prevent the LLM from getting stuck in an infinite loop.
+* **Blocking Operations**: High-risk operations (such as direct database deletions or large fund transfers) must include a Human-in-the-Loop review gate (Chapter 13).
+* **Context Pruning State Loss**: When compressing context, protect critical agent instructions from being pruned to prevent behavioral degradation.
+## Workflow Checklist
+Progress:
+- [ ] Step 1: Analyze Requirements (define objectives, inputs, constraints, and complexity levels).
+- [ ] Step 2: Select Orchestration Patterns (load corresponding reference documents based on requirements).
+- [ ] Step 3: Plan System Components (determine memory, learning mechanisms, and protocol specifications).
+- [ ] Step 4: Define Resilience and Safety (establish exception handling, human review gates, and input/output guardrails).
+- [ ] Step 5: Draft Design Proposal (create system topology diagrams and describe the architecture).
+## Detailed Guidelines
+### Step 1 — Analyze Requirements
+Evaluate problem complexity (Level 1, 2, or 3) and confirm:
+1. **Latency Sensitivity**: For low-latency requirements, prioritize parallelization (Chapter 3) and routing (Chapter 2).
+2. **Task Fragility**: For strict sequential tasks or error-prone processes, use chaining (Chapter 1) or planning (Chapter 6).
+### Step 2 — Load Design Patterns (Progressive Loading)
+Load specific reference files as needed to avoid loading all concepts at once:
+* Base workflows (Prompt Chaining, Routing, Parallelization, Reflection, Tool Use, Planning, Multi-Agent Collaboration):
+  👉 **Load [base-workflows](references/base-workflows.md)**
+* System infrastructure (Memory Management, Learning and Adaptation, MCP, Goal Setting and Monitoring):
+  👉 **Load [system-components](references/system-components.md)**
+* Exception handling, HITL, RAG fact-grounding:
+  👉 **Load [resilience-hitl](references/resilience-hitl.md)**
+* Advanced safety, evaluation, prioritization, A2A communication, exploration and discovery:
+  👉 **Load [advanced-safety](references/advanced-safety.md)**
+### Step 3 — System Architecture Planning
+The design document must clearly specify:
+1. **State Space**: Context window management method and division of short-term and long-term memory (cognitive/procedural memory).
+2. **Tool Boundaries**: Tool call schema protocols and sandbox rules.
+3. **Safety Boundaries**: Specific conditions for triggering human approval (HITL) or falling back to backup models.
+---
+## Output Template (Agentic Architecture Design Proposal)
+When presenting agent designs to users, use this template format:
+```markdown
+# Agentic System Design Proposal: [System Name]
+## 1. Executive Summary
+* **Complexity Level**: [Level 1 / Level 2 / Level 3]
+* **Target Objective**: [System Goal]
+* **Key Constraints**: [Constraints such as latency, cost, security, etc.]
+## 2. Core Orchestration Architecture
+* **Selected Patterns**: [e.g., Router -> Parallel Agents -> Synthesizer]
+* **Workflow Description**: [System data flow and control flow description]
+### Topology Diagram (Mermaid)
+```mermaid
+[Mermaid diagram representing the Agent Loop / Topology]
+```
+## 3. Reference Patterns Applied
+* **[Pattern Name] (Chapter X)**: [Specific application and rationale in the system]
+* **[Pattern Name] (Chapter Y)**: [Specific application and rationale in the system]
+## 4. Resilience, Safety & HITL Rules
+* **Exception Recovery**: [Handling flow for API timeouts, rate limits, and JSON formatting errors]
+* **Human-in-the-Loop Gates**: [Conditions triggering human review]
+* **Guardrails**: [Input filtering and output validation mechanisms]
+## 5. Next Steps / Implementation Roadmap
+1. [Step 1]
+2. [Step 2]
+```
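The "Reflection Infinite Loops" gotcha in `SKILL.md` asks for a maximum iteration limit. A minimal Python sketch of such a cap is shown below, assuming the generator and critic are plain callables; `reflect_until_accepted`, its parameters, and the convention that the critic returns `None` to accept a draft are hypothetical choices for this example and do not come from the package.

```python
from typing import Callable, Optional


def reflect_until_accepted(
    task: str,
    generate: Callable[[str, Optional[str]], str],
    critique: Callable[[str], Optional[str]],
    max_iterations: int = 3,
) -> tuple[str, int, bool]:
    """Return (last_draft, iterations_used, accepted)."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")

    feedback: Optional[str] = None
    draft = ""
    for iteration in range(1, max_iterations + 1):
        draft = generate(task, feedback)
        # Assumed convention: None means the critic accepts the draft.
        feedback = critique(draft)
        if feedback is None:
            return draft, iteration, True

    # Cap reached: return the last draft and flag it as not accepted,
    # so the caller can escalate instead of looping forever.
    return draft, max_iterations, False
```

Returning the `accepted` flag rather than raising keeps the decision about what to do with an unaccepted draft (retry later, escalate to a human) in the caller.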

package/skills/neo-agentic-design/evals/eval_queries.json ADDED

@@ -0,0 +1,58 @@
+[
+  {
+    "query": "I need to design a system that routes incoming user queries to specialized LLM prompts depending on their category.",
+    "should_trigger": true
+  },
+  {
+    "query": "How do I implement reflection and self-correction in a multi-agent system to make it write better code?",
+    "should_trigger": true
+  },
+  {
+    "query": "Can you review my LLM orchestration workflow? It currently uses prompt chaining but has high latency.",
+    "should_trigger": true
+  },
+  {
+    "query": "I want to set up a Model Context Protocol (MCP) server for my agent so it can read local files.",
+    "should_trigger": true
+  },
+  {
+    "query": "What is the best way to handle long-term semantic memory and episodic memory in an autonomous agent?",
+    "should_trigger": true
+  },
+  {
+    "query": "Please design a pipeline workflow for generating technical reports, with a human-in-the-loop validation step.",
+    "should_trigger": true
+  },
+  {
+    "query": "How does dynamic re-prioritization work when an agent has conflicting goals?",
+    "should_trigger": true
+  },
+  {
+    "query": "Review the exception handling and recovery mechanism in my LLM agent loop.",
+    "should_trigger": true
+  },
+  {
+    "query": "I need to write a Python script that calculates the Fibonacci sequence using recursion.",
+    "should_trigger": false
+  },
+  {
+    "query": "What is the difference between supervised learning and reinforcement learning in traditional machine learning?",
+    "should_trigger": false
+  },
+  {
+    "query": "How do I configure my local PostgreSQL database on macOS?",
+    "should_trigger": false
+  },
+  {
+    "query": "Write a CSS stylesheet for a dark mode website.",
+    "should_trigger": false
+  },
+  {
+    "query": "I want to build a simple web scraper in Python using beautifulsoup4.",
+    "should_trigger": false
+  },
+  {
+    "query": "How do I write a prompt to make ChatGPT act like a professional English translator?",
+    "should_trigger": false
+  }
+]

package/skills/neo-agentic-design/evals/evals.json ADDED

@@ -0,0 +1,27 @@
+{
+  "skill_name": "neo-agentic-design",
+  "evals": [
+    {
+      "id": 1,
+      "prompt": "Design an agentic system that generates monthly financial reports. It must parse transaction raw data, categorize expenses, draft a report, let a human reviewer approve/edit the draft, and then output a final PDF. Minimize latency and ensure high accuracy.",
+      "expected_output": "An Agentic System Design Proposal containing Routing, Chaining, and Human-in-the-Loop patterns, structured with the standard output template.",
+      "assertions": [
+        "The output starts with 'Agentic System Design Proposal' or matches the template format",
+        "The proposal mentions Routing, Chaining, and Human-in-the-Loop patterns",
+        "The proposal contains a Mermaid sequence or flowchart diagram representing the topology",
+        "The proposal lists specific Gotchas or risks like latency and cost control"
+      ]
+    },
+    {
+      "id": 2,
+      "prompt": "I need to design a system that reviews incoming code commits for potential security vulnerabilities and performance bottlenecks. It needs to check thousands of commits daily and must fail safely if any analysis tool crashes.",
+      "expected_output": "An Agentic System Design Proposal containing Parallelization, Routing, Guardrails, and Exception Recovery patterns, structured with the standard output template.",
+      "assertions": [
+        "The proposal includes Parallelization and Exception Recovery patterns",
+        "The proposal provides a Mermaid topology diagram showing parallel evaluation and a merge point",
+        "The proposal includes specific Exception Handling rules for crashed analysis tools",
+        "The proposal includes Guardrails policies for input/output sanitization"
+      ]
+    }
+  ]
+}

package/skills/neo-agentic-design/references/advanced-safety.md ADDED

@@ -0,0 +1,158 @@
+# Advanced Execution, Guardrails & Safety
+This document provides conceptual designs for advanced execution, guardrails, and safety patterns, covering agent-to-agent (A2A) communication, resource-aware optimization, reasoning techniques, guardrails, evaluation and monitoring, prioritization, and scientific exploration.
+---
+## Chapter 15: Inter-Agent Communication (A2A)
+### 1. Definition
+An open agent communication protocol across frameworks and technology stacks. Uses standard HTTP and JSON-RPC formats to enable agent declaration, task delegation, and data exchange across different networks.
+### 2. Core Components
+* **Agent Card**: A JSON declaration containing the agent name, version, endpoint URL, multimodal capabilities, and skills.
+* **Task Mechanism**: Defines collaboration as a "Task" with a lifecycle state (Submitted, Working, Completed, Failed), tracked using a `contextId` for multi-turn conversation context.
+* **Communication Modes**:
+  * **Synchronous**: Direct invocation with immediate response.
+  * **Asynchronous Polling**: Submit a task to obtain a Task ID and periodically query status.
+  * **Streaming (SSE)**: Receive partial outputs in real time via Server-Sent Events.
+  * **Webhook**: Actively push notifications to a specified URL upon task completion.
+### 3. Problems Addressed
+* Heterogeneous framework silos: Solves communication barriers between different agent frameworks.
+* Distributed collaboration barriers: Enables agents on different servers to safely delegate tasks.
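To make the Agent Card, task lifecycle, and asynchronous polling mode from Chapter 15 more concrete, here is an in-memory Python sketch. It is only an illustration: the class and field names (`AgentCard`, `Task`, `InMemoryAgent`, `submit`, `poll`) are hypothetical and are not taken from the A2A specification, no HTTP or JSON-RPC is involved, and the example URL is a placeholder.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class TaskState(Enum):
    SUBMITTED = "submitted"
    WORKING = "working"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class AgentCard:
    """Self-description of an agent (illustrative fields only)."""
    name: str
    version: str
    url: str
    skills: list[str] = field(default_factory=list)


@dataclass
class Task:
    task_id: str
    context_id: str  # ties the task to a multi-turn conversation
    state: TaskState = TaskState.SUBMITTED
    result: Optional[str] = None


class InMemoryAgent:
    """Stands in for a remote agent; a real one would sit behind HTTP."""

    def __init__(self, card: AgentCard) -> None:
        self.card = card
        self._tasks: dict[str, tuple[Task, str]] = {}

    def submit(self, text: str, context_id: str) -> str:
        task = Task(task_id=str(uuid.uuid4()), context_id=context_id)
        self._tasks[task.task_id] = (task, text)
        return task.task_id

    def poll(self, task_id: str) -> Task:
        task, text = self._tasks[task_id]
        # Advance one lifecycle step per poll to imitate background work.
        if task.state is TaskState.SUBMITTED:
            task.state = TaskState.WORKING
        elif task.state is TaskState.WORKING:
            task.result = text.upper()
            task.state = TaskState.COMPLETED
        return task
```

A caller would submit once, keep the returned task ID, and poll until the state is `COMPLETED` or `FAILED`; the streaming and webhook modes replace that polling with pushed updates.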
+---
+## Chapter 16: Resource-Aware Optimization
+### 1. Definition
+Monitors computation, latency, and financial costs (tokens/API calls) in real time during agent execution. Dynamically switches between models with different capabilities or prunes context based on budget and latency constraints.
+### 2. Problems Addressed
+* API cost overruns: Avoids using expensive reasoning models for simple queries.
+* Rate limits and overload: Executes fallbacks and backup plans when the primary model is limited or overloaded.
+### 3. Workflow
+```mermaid
+graph TD
+    Query[User Query] --> Router{Router LLM}
+    Router -->|1. Simple Query| CheapModel[Lightweight Model]
+    Router -->|2. Complex Reasoning| ExpensiveModel[High-tier Reasoning Model]
+    Router -->|3. Real-time Info| SearchTool[Real-time Search Tool]
+    CheapModel --> Checker[Critique Agent: Quality Eval]
+    ExpensiveModel --> Checker
+    Checker -->|Fail| Fallback[Fallback Plan]
+    Checker -->|Pass| Output[Final Output]
+```
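The Chapter 16 workflow can be approximated in a few lines of Python. This sketch assumes every model is a plain callable and that an overloaded or rate-limited model raises `RuntimeError`; both are simplifications made for the example. The name `answer_with_budget` and the route labels are hypothetical, and the real-time search branch of the diagram is omitted.

```python
from typing import Callable

Model = Callable[[str], str]


def answer_with_budget(
    query: str,
    classify: Callable[[str], str],
    cheap: Model,
    expensive: Model,
    passes_check: Callable[[str], bool],
    fallback: Model,
) -> tuple[str, str]:
    """Return (answer, route_taken)."""
    route = classify(query)
    # Only queries labeled "complex" pay for the high-tier model.
    model = expensive if route == "complex" else cheap
    try:
        answer = model(query)
    except RuntimeError:
        # Assumed failure signal for rate limits or overload.
        return fallback(query), "fallback"
    # The critique step decides whether the answer is good enough.
    if passes_check(answer):
        return answer, route
    return fallback(query), "fallback"
```

Returning the route alongside the answer makes it easy to log how often the fallback fires, which is one input to the cost monitoring the chapter describes.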
+---
+## Chapter 17: Reasoning Techniques
+### 1. Definition
+Architectural techniques that allocate more computational resources at inference time to explicitly expand the agent's thought process. Covers step-by-step decomposition, tree-search path planning, code-assisted execution, and ReAct loops.
+### 2. Six Core Reasoning Patterns
+* **Chain of Thought (CoT)**: Guides the model to reason step-by-step to decompose complex problems.
+* **Tree of Thoughts (ToT)**: Represents the reasoning space as a tree, supporting backtracking and multi-path parallel evaluation.
+* **Reasoning and Action (ReAct)**: Interleaves tool execution with reasoning steps (Thought -> Action -> Observation -> Thought ... -> Finish).
+* **Program-Aided Language Models (PALMs)**: Offloads precise mathematical calculations to a secure code sandbox and interprets the results to eliminate calculation hallucinations.
+* **Multi-Agent Debate (Chain/Graph of Debates)**: Employs multiple agents to debate a problem across several turns, using consensus or strong logical conclusions as the final answer.
+* **Scaling Inference Law**: Uses multi-path generation, self-correction, or extended thinking paths during the inference stage, allowing smaller models to achieve performance comparable to a single generation of a larger model.
+---
+## Chapter 18: Guardrails & Safety Patterns
+### 1. Definition
+Deploys multiple layers of filtering and defense at the input, tool execution, and output stages to ensure system compliance, safety, and protection against jailbreak attacks, prompt injection, and tool privilege escalation.
+### 2. Multi-Layer Defense Flow
+```mermaid
+graph TD
+    Input[User Input] --> InputGuard[1. Input Guardrails: Jailbreak/Injection Detection]
+    InputGuard -->|Violation| Block[Access Denied]
+    InputGuard -->|Safe| LLM_Core[2. Core Agent Reasoning]
+    LLM_Core -->|Call Tool| ToolCallback[3. Pre-execution Tool Validation]
+    ToolCallback -->|Reject| LLM_Core
+    ToolCallback -->|Approve| ToolExec[Tool Execution]
+    ToolExec --> OutputGen[Output Generation]
+    OutputGen --> OutputGuard[4. Output Guardrails: PII/Toxicity Filter]
+    OutputGuard -->|Safe| User[Deliver to User]
+    OutputGuard -->|Violation| Redaction[Redaction/Block/Self-Correction]
+```
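A deliberately small Python sketch of the layered defense flow follows. It assumes keyword matching for injection detection, an allow-list for tools, and an email regex for PII redaction; these are placeholders for the far more robust checks a production system would need. All names (`INJECTION_MARKERS`, `ALLOWED_TOOLS`, `guarded_run`) are hypothetical, tool execution itself is not modeled, and a rejected tool call ends the run here instead of returning control to the agent as in the diagram.

```python
import re
from typing import Callable, Optional

# Illustrative values only; real deployments need stronger detection.
INJECTION_MARKERS = ("ignore previous instructions", "disregard the system prompt")
ALLOWED_TOOLS = {"search", "read_file"}
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def input_guard(user_input: str) -> bool:
    """Layer 1: reject inputs containing known injection phrases."""
    lowered = user_input.lower()
    return not any(marker in lowered for marker in INJECTION_MARKERS)


def tool_guard(tool_name: str) -> bool:
    """Layer 3: least privilege via an explicit allow-list."""
    return tool_name in ALLOWED_TOOLS


def output_guard(text: str) -> str:
    """Layer 4: redact email addresses before delivery."""
    return EMAIL_PATTERN.sub("[REDACTED]", text)


def guarded_run(
    user_input: str,
    agent: Callable[[str], tuple[Optional[str], str]],
) -> str:
    """Run the agent between guards; agent returns (requested_tool, output)."""
    if not input_guard(user_input):
        return "Access denied"
    tool_name, output = agent(user_input)
    if tool_name is not None and not tool_guard(tool_name):
        return "Tool call rejected"
    return output_guard(output)
```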
+### 3. Problems Addressed
+* Prompt jailbreaks: Prevents users from guiding the agent to perform unauthorized or harmful actions.
+* Privilege escalation: Follows the principle of least privilege to prevent agents from unauthorized data modification or account deletion.
+---
+## Chapter 19: Evaluation and Monitoring
+### 1. Definition
+Systematically measures and audits agent execution quality, trajectories, resource consumption, and drift. Evaluates the execution trajectory rather than just the final answer for non-deterministic systems.
+### 2. Three Core Evaluation Aspects
+* **Objective Metrics Monitoring**: Logs latency, token consumption, and API success rates.
+* **Trajectory Evaluation**: Compares action sequences with standard SOPs using exact matching, ordered matching, or unordered matching.
+* **LLM-as-a-Judge**: Uses an independent LLM to score answers based on specific rubrics and outputs structured feedback.
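The three trajectory matching modes can be sketched in Python as below. The source does not define them precisely, so this example assumes: exact means identical sequences, ordered means the expected steps appear in order with extra steps allowed in between, and unordered means every expected step appears regardless of order. The function names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def exact_match(actual: Sequence[str], expected: Sequence[str]) -> bool:
    """Same steps, same order, nothing extra."""
    return list(actual) == list(expected)


def ordered_match(actual: Sequence[str], expected: Sequence[str]) -> bool:
    """Expected steps occur in order; extra steps in between are tolerated."""
    remaining = iter(actual)
    # `step in remaining` consumes the iterator up to the first match,
    # so later expected steps must be found after earlier ones.
    return all(step in remaining for step in expected)


def unordered_match(actual: Sequence[str], expected: Sequence[str]) -> bool:
    """Every expected step occurs at least as often as required, in any order."""
    missing = Counter(expected) - Counter(actual)
    return not missing
```

For an expected trajectory `["search", "read", "summarize"]`, an agent that logs an extra step between `search` and `read` fails the exact check but passes the ordered one.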
+### 3. Advanced Pattern: AI Contractor / Contract Pattern
+Resolves prompt drift and responsibility ambiguity:
+```mermaid
+graph TD
+    User[User] -->|1. Initiate Draft Contract| Contractor[Contractor Agent]
+    Contractor -->|2. Self-Analysis & Evaluation| Analyze[Analyze clauses, scope, cost, dependencies]
+    Analyze -->|3. Negotiate Feedback| User
+    User -->|4. Approve & Sign| Execute[5. Execution: Self-test & verify]
+    Execute -->|6. Decompose Tasks| SubContracts[Sub-contracts]
+    SubContracts --> SubAgents[Sub-agents]
+    Execute -->|7. Deliver Deliverables| User
+```
+---
+## Chapter 20: Prioritization
+### 1. Definition
+Sorts and dynamically schedules the execution order of multiple goals and tasks when the agent is faced with resource constraints or limited budgets.
+### 2. Problems Addressed
+* Deadlocks and lack of focus: Prevents delays in critical tasks caused by prioritizing minor ones.
+* Inadequate crisis response: Ensures the agent can dynamically switch task context when high-priority events (e.g., safety alerts) occur.
+### 3. Prioritization Metrics
+* **Urgency**: Time sensitivity (closeness to deadline).
+* **Importance**: Impact on accomplishing the ultimate goal.
+* **Dependencies**: Whether the task is a prerequisite for other tasks.
+* **Cost-Benefit Ratio**: Expected payoff relative to consumed resources.
+### 4. Mechanism
+Tasks are scored and entered into a Priority Queue, executed sequentially by the planner. The system recalculates weights and re-orders the queue (Dynamic Re-prioritization) or interrupts the current task when the environmental state changes.
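The scoring and re-prioritization mechanism could look roughly like the following Python sketch. The weights, the 0 to 1 scales, and the formula that divides a weighted benefit by cost are assumptions made for illustration, and `QueuedTask`, `score`, and `PriorityScheduler` are hypothetical names. The heap is rebuilt on every pick so that changed scores take effect, which is simple but not efficient for large queues.

```python
import heapq
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueuedTask:
    name: str
    urgency: float          # 0..1, closeness to deadline
    importance: float       # 0..1, impact on the ultimate goal
    blocks_others: bool = False
    cost: float = 1.0       # must be positive


def score(task: QueuedTask, weights=(0.4, 0.4, 0.2)) -> float:
    """Weighted benefit divided by cost (an assumed formula)."""
    w_urgency, w_importance, w_dependency = weights
    benefit = (
        w_urgency * task.urgency
        + w_importance * task.importance
        + w_dependency * (1.0 if task.blocks_others else 0.0)
    )
    return benefit / task.cost


class PriorityScheduler:
    def __init__(self) -> None:
        self._tasks: dict[str, QueuedTask] = {}

    def add(self, task: QueuedTask) -> None:
        self._tasks[task.name] = task

    def reprioritize(self, name: str, **changes: object) -> None:
        """Update task attributes when the environment changes."""
        for attribute, value in changes.items():
            setattr(self._tasks[name], attribute, value)

    def next_task(self) -> Optional[QueuedTask]:
        if not self._tasks:
            return None
        # Negate scores because heapq is a min-heap.
        heap = [(-score(task), name) for name, task in self._tasks.items()]
        heapq.heapify(heap)
        _, name = heapq.heappop(heap)
        return self._tasks.pop(name)
```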
+---
+## Chapter 21: Exploration and Discovery
+### 1. Definition
+Enables the agent to proactively explore unknown domains (Unknown Unknowns), generate new knowledge, design experiments, and prove hypotheses.
+### 2. Multi-Agent Scientific Discovery Flow
+```mermaid
+graph TD
+    Goal[Exploration Goal] --> GenAgent[1. Generation Agent]
+    GenAgent -->|Propose Hypothesis| RefAgent[2. Reflection Agent]
+    RefAgent -->|Peer Review / Correction Suggestions| GenAgent
+    RefAgent -->|Accepted Draft| RankAgent[3. Ranking Agent]
+    RankAgent -->|Elo Tournament Debate| BestHypotheses[Select Best Hypotheses]
+    BestHypotheses --> EvoAgent[4. Evolution Agent]
+    EvoAgent -->|Concept Merging & Non-linear Exploration| AdvancedHypo[Advanced Hypotheses]
+    AdvancedHypo --> LabAgent[5. Lab Agent]
+    LabAgent -->|Execute Code/Simulation/Analysis| FinalReport[6. Final LaTeX Report]
+```
+### 3. Trade-offs
+* **Pros**: Explores unknown topics autonomously, discovering insights that exceed human experience.
+* **Cons**: High uncertainty and heavy token consumption; requires strict safety guardrails to prevent generating hazardous protocols.

package/skills/neo-agentic-design/references/base-workflows.md ADDED

@@ -0,0 +1,219 @@
+# Base Patterns & Workflows
+This document provides conceptual designs for basic agentic orchestration patterns, covering prompt chaining, routing, parallelization, reflection, tool use, planning, and multi-agent collaboration.
+---
+## Chapter 1: Prompt Chaining
+### 1. Definition
+Decomposes a complex task into multiple **sequentially dependent subtasks**. The structured output of the previous step serves as the input for the next step. Each step focuses on a single, clear objective.
+### 2. Problems Addressed
+* Context dilution: Prevents the LLM from losing focus when processing large, complex tasks.
+* Instruction drift: Avoids failures in a single prompt that contains too many rules.
+### 3. Workflow
+```mermaid
+graph LR
+    Input[Raw Input] --> Step1[LLM Step A]
+    Step1 -->|Structured Output A| Step2[LLM Step B]
+    Step2 -->|Structured Output B| Step3[LLM Step C]
+    Step3 --> Output[Final Answer]
+```
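The chain above can be sketched as a list of steps that pass a structured state forward. The three step functions are stand-ins for LLM calls, and the per-step output check is an assumed way to stop an error cascade early; neither comes from the source.

```python
def make_outline(state):
    # Stand-in for an LLM call that returns a structured outline.
    return {**state, "outline": [f"Intro to {state['topic']}", "Details", "Summary"]}


def write_draft(state):
    # Stand-in for an LLM call that expands the outline into a draft.
    return {**state, "draft": " ".join(f"[{heading}]" for heading in state["outline"])}


def polish(state):
    # Stand-in for an LLM call that polishes the draft.
    return {**state, "final": state["draft"] + " (polished)"}


# Each entry pairs a step with the key it must add to the state.
CHAIN = [(make_outline, "outline"), (write_draft, "draft"), (polish, "final")]


def run_chain(chain, state):
    """Run the steps in order, validating each step's output before the next one."""
    for step, required_key in chain:
        state = step(state)
        if not state.get(required_key):
            raise ValueError(f"{step.__name__} produced no {required_key!r}")
    return state
```

Because each step is a plain function over a dictionary, it can be unit-tested in isolation, which matches the predictability benefit listed in the trade-offs.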
+### 4. Trade-offs
+* **Pros**: High predictability; easy to optimize prompts and perform unit testing for individual steps.
+* **Cons**: High total latency due to sequential execution; errors in earlier steps propagate downstream (Error Cascade).
+### 5. Use Cases
+* Multi-step article generation (Outline -> Draft -> Polish -> Format).
+* Data extraction and compliance analysis.
+---
+## Chapter 2: Routing
+### 1. Definition
+Dynamically redirects tasks to the most suitable execution path, specialized tool, or sub-agent based on input characteristics. Routing decisions are made by rule engines, semantic similarity, or LLM classifiers.
+### 2. Problems Addressed
+* Resource waste: Avoids using expensive, slow high-tier models for simple queries.
+* Tool clutter: Avoids crowding too many unrelated tools into a single agent's context window.
+### 3. Workflow
+```mermaid
+graph TD
+    Input[User Input] --> Router[Routing Classifier / LLM]
+    Router -->|Path A| AgentA[Specialized Agent A / Tool A]
+    Router -->|Path B| AgentB[Specialized Agent B / Tool B]
+    Router -->|Path C| AgentC[Specialized Agent C / Tool C]
+```
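A minimal sketch of the rule-engine variant of routing, assuming a hypothetical keyword table (`ROUTES`) and a `general` fallback path. A semantic-similarity or LLM classifier could replace `route` without changing its callers.

```python
import re

# Hypothetical keyword rules for a customer-support dispatcher.
ROUTES = {
    "billing": {"invoice", "refund", "charge"},
    "tech_support": {"error", "crash", "login"},
    "returns": {"return", "exchange"},
}


def route(message, default="general"):
    """Pick the route whose keywords overlap most with the message."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    scores = {name: len(words & keywords) for name, keywords in ROUTES.items()}
    best = max(scores, key=scores.get)
    # Fall back to a default path when no rule matches at all.
    return best if scores[best] > 0 else default
```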
+### 4. Trade-offs
+* **Pros**: High modularity; reduces average system latency and token consumption.
+* **Cons**: Routing errors directly cause downstream task failures; an extra routing decision layer adds minor latency.
+### 5. Use Cases
+* Customer support dispatching (e.g., routing to billing, tech support, or returns agents).
+* Pre-filtering for tool calls.
+---
+## Chapter 3: Parallelization
+### 1. Definition
+Splits a large task into multiple **independent subtasks** executed in parallel (Fork) and aggregates the results at a single point (Join).
+### 2. Problems Addressed
+* Cumulative linear latency: Solves the high time cost associated with sequential multi-step execution.
+* Single-perspective limitation: Collects diverse solutions to the same problem simultaneously for synthesis.
+### 3. Workflow
+```mermaid
+graph TD
+    Input[Raw Query] --> Splitter[Splitter]
+    Splitter --> TaskA[Parallel Task A]
+    Splitter --> TaskB[Parallel Task B]
+    Splitter --> TaskC[Parallel Task C]
+    TaskA --> Syn[Synthesis / Aggregator]
+    TaskB --> Syn
+    TaskC --> Syn
+    Syn --> Output[Aggregated Output]
+```
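The fork/join flow above can be sketched with a thread pool. The three checks are toy stand-ins for independent LLM or tool calls, and `max_workers` is shown as one assumed way to cap concurrency so that bursts stay under API rate limits.

```python
from concurrent.futures import ThreadPoolExecutor


def check_security(code):
    return ["avoid eval()"] if "eval(" in code else []


def check_style(code):
    return [
        f"line {number} exceeds 79 characters"
        for number, line in enumerate(code.splitlines(), 1)
        if len(line) > 79
    ]


def check_todo(code):
    return ["unresolved TODO"] if "TODO" in code else []


def analyze(code, max_workers=3):
    """Fork independent checks, then join their findings into one report."""
    checks = {"security": check_security, "style": check_style, "todo": check_todo}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(check, code) for name, check in checks.items()}
        # Collecting the results is the join point.
        return {name: future.result() for name, future in futures.items()}
```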
+### 4. Trade-offs
+* **Pros**: Significantly reduces elapsed time; suitable for large-scale parallel filtering.
+* **Cons**: High spikes in token usage, easily triggering API rate limits; reconciling inconsistent results requires additional algorithms or LLM overhead.
+### 5. Use Cases
+* Static code analysis (checking security, performance, and style simultaneously).
+* Large-scale information retrieval and cross-document comparison.
+---
+## Chapter 4: Reflection (Self-Correction)
+### 1. Definition
+Introduces a dual-entity feedback mechanism: a Generator and a Critic. The Generator produces an initial draft, the Critic evaluates it for quality and provides feedback, and the Generator iteratively refines the output until termination conditions are met.
+### 2. Problems Addressed
+* Unstable output quality: Prevents logical gaps, factual errors, or formatting anomalies.
+* Overconfidence: Breaks cognitive blind spots of a single-turn generation via an independent critique mechanism.
+### 3. Workflow
+```mermaid
+graph TD
+    Input[Task Goal] --> Gen[Generator]
+    Gen --> Draft[Initial Draft]
+    Draft --> Critic[Critic / Evaluator]
+    Critic --> Decision{Is Acceptable?}
+    Decision -->|No| Feedback[Feedback/Suggestions]
+    Feedback -->|Guide Correction| Gen
+    Decision -->|Yes| Output[Final Accepted Output]
+```
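A minimal sketch of the Generator/Critic loop above with a hard round limit, which guards against the infinite-loop risk noted in the trade-offs. `generate` and `critique` are toy stand-ins for LLM calls, and `max_rounds` is an assumed termination condition.

```python
def generate(task, feedback):
    """Stand-in generator: produces a draft and applies the critic's feedback."""
    draft = task
    if "capitalize first letter" in feedback:
        draft = draft[:1].upper() + draft[1:]
    if "end with a period" in feedback:
        draft = draft + "."
    return draft


def critique(draft):
    """Stand-in critic: returns a list of issues, empty when the draft is acceptable."""
    issues = []
    if not draft[:1].isupper():
        issues.append("capitalize first letter")
    if not draft.endswith("."):
        issues.append("end with a period")
    return issues


def reflect(task, max_rounds=3):
    """Iterate generate -> critique until accepted or the round budget runs out."""
    feedback = []
    draft = ""
    for round_no in range(1, max_rounds + 1):
        draft = generate(task, feedback)
        feedback = critique(draft)
        if not feedback:
            return draft, round_no, True
    return draft, max_rounds, False
```

Returning the acceptance flag alongside the draft lets the caller decide what to do with output that never passed review.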
+### 4. Trade-offs
+* **Pros**: Highly stable output quality, significantly reducing logical and formatting errors.
+* **Cons**: Higher token consumption; extended execution time; potential for infinite loops if termination conditions are poorly defined.
+### 5. Use Cases
+* Automated code generation and testing (write code -> run tests -> fix based on errors -> re-test).
+* Strict compliance document drafting.
+---
+## Chapter 5: Tool Use / Function Calling
+### 1. Definition
+The LLM reads the schema (description format) of the available external tools, autonomously decides when to call a tool, and generates the call parameters. The agent executes the tool in a sandbox or external system and feeds the result back to the LLM for interpretation.
+### 2. Problems Addressed
+* Information lag: Connects the model to real-time data.
+* Lack of computation: Solves difficulties in mathematics and precise logical operations.
+* Inability to affect external systems: Allows agents to send emails, write databases, or call APIs.
+### 3. Workflow
+```mermaid
+sequenceDiagram
+    participant U as User
+    participant L as LLM (Core Reasoning)
+    participant A as Agent Execution Sandbox
+    participant T as External Tool / API
+    U->>L: Query
+    L->>L: Identify context, decide to use clock tool
+    L-->>A: Return tool name & structured parameters
+    A->>T: Call external API
+    T-->>A: Return real-time data
+    A-->>L: Send execution result back as context
+    L->>L: Synthesize and reason
+    L->>U: Respond to user
+```
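The execution side of this sequence can be sketched as a registry that validates the model's structured call before running anything. The `TOOLS` table, the `add` tool, and the `{"name": ..., "arguments": ...}` call shape are assumptions for illustration, not a specific vendor's function-calling format.

```python
import json

# Hypothetical tool registry: each tool declares its parameters and accepted types.
TOOLS = {
    "add": {
        "fn": lambda a, b: a + b,
        "params": {"a": (int, float), "b": (int, float)},
    },
}


def execute_tool_call(raw_call):
    """Validate and run a tool call that the model emitted as a JSON string."""
    try:
        call = json.loads(raw_call)
    except json.JSONDecodeError as exc:
        return {"ok": False, "error": f"invalid JSON: {exc.msg}"}
    if not isinstance(call, dict):
        return {"ok": False, "error": "tool call must be a JSON object"}
    tool = TOOLS.get(call.get("name"))
    if tool is None:
        return {"ok": False, "error": f"unknown tool: {call.get('name')!r}"}
    args = call.get("arguments")
    if not isinstance(args, dict):
        return {"ok": False, "error": "arguments must be a JSON object"}
    if set(args) != set(tool["params"]):
        return {"ok": False, "error": "unexpected or missing parameters"}
    for name, types in tool["params"].items():
        if not isinstance(args[name], types):
            return {"ok": False, "error": f"parameter {name!r} has the wrong type"}
    return {"ok": True, "result": tool["fn"](**args)}
```

Returning errors as data rather than raising lets the agent send them back to the LLM as context so it can regenerate the parameters.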
+### 4. Trade-offs
+* **Pros**: Greatly expands the action capabilities and data retrieval scope of the agent.
+* **Cons**: Risk of parameter generation errors; security risks with external tools (requires strict sandboxing); vulnerability to external API instability.
+### 5. Use Cases
+* Real-time data queries (weather, stock market, ERP systems).
+* Data entry and control (sending notifications, database updates).
+---
+## Chapter 6: Planning
+### 1. Definition
+Decomposes a high-level goal into an ordered set of dependent execution steps. The planner dynamically rewrites the remaining steps (replanning) based on environmental feedback and new information to ensure the goal is reached.
+### 2. Problems Addressed
+* Goal drift: Prevents the agent from losing sight of the ultimate goal during multi-step execution.
+* Dynamic environment changes: Automatically searches for alternative solutions if a step fails.
+### 3. Workflow
+```mermaid
+graph TD
+    Goal[Ultimate Goal] --> Planner[Planner: Task Decomposition]
+    Planner --> Plan[Generate Step List 1, 2, 3...]
+    Plan --> Executor[Executor: Call tools / sub-steps sequentially]
+    Executor --> EnvFeedback[Environmental Feedback]
+    EnvFeedback --> Checker{Encounter obstacles/failure?}
+    Checker -->|Yes| Replanner[Dynamic Replanner: Update plan]
+    Replanner --> Plan
+    Checker -->|No| Next{All steps completed?}
+    Next -->|No| Executor
+    Next -->|Yes| Output[Goal Accomplished]
+```
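A minimal sketch of the execute/check/replan loop above. The `ALTERNATIVES` table is a toy stand-in for an LLM replanner, and `max_replans` is an assumed budget that bounds the planning cost.

```python
# Hypothetical replanning knowledge: which step can substitute for a failed one.
ALTERNATIVES = {"search_api": "search_cache"}


def run_plan(plan, execute, max_replans=2):
    """Execute steps in order; when a step fails, rewrite the remaining plan and retry."""
    plan = list(plan)
    completed = []
    replans = 0
    index = 0
    while index < len(plan):
        step = plan[index]
        if execute(step):
            completed.append(step)
            index += 1
            continue
        alternative = ALTERNATIVES.get(step)
        if alternative is None or replans >= max_replans:
            raise RuntimeError(f"cannot complete step {step!r}")
        # Replanning only touches the not-yet-completed part of the plan.
        plan[index] = alternative
        replans += 1
    return completed
```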
+### 4. Trade-offs
+* **Pros**: Highly adaptable; capable of autonomously handling complex, unstructured tasks.
+* **Cons**: Planning and replanning require many LLM calls, which makes the pattern expensive; errors in the plan propagate and pull downstream actions away from the goal.
+### 5. Use Cases
+* Autonomous research assistants (Deep Research: dynamically selecting keywords, assessing information quality, diving deep into unknown domains).
+* Automated software development (architecture design -> module division -> sequential development).
+---
+## Chapter 7: Multi-Agent Collaboration
+### 1. Definition
+Distributes a large task among multiple **specialized agents with distinct personas and skills**. These agents coordinate task handoffs, discussions, and integration through a predefined collaboration topology.
+### 2. Problems Addressed
+* Cognitive limits of a single core: Avoids overloading a single system prompt with too many instructions and roles.
+* Unclear division of labor: Emulates human teams by dedicating specialists to specific tasks.
+### 3. Workflow
+Four main collaboration topologies:
+* **Handoffs (Network)**: Agent A finishes its task and hands over the context and control to Agent B.
+* **Supervisor**: A central Supervisor agent coordinates, assigns tasks to specialists, and aggregates results.
+* **Hierarchy**: Supervisors oversee sub-supervisors, delegating and aggregating tasks hierarchically.
+* **Blackboard**: Agents read and write to a shared state space (blackboard), intervening autonomously as the state changes.
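As an illustration of the Blackboard topology, the sketch below lets three toy agents react to a shared dictionary. The agent functions and the `max_turns` guard are assumptions; the guard is one way to avoid the discussion loops mentioned in the trade-offs.

```python
def researcher(board):
    if "topic" in board and "notes" not in board:
        board["notes"] = f"facts about {board['topic']}"
        return True
    return False


def writer(board):
    if "notes" in board and "draft" not in board:
        board["draft"] = f"Article based on {board['notes']}"
        return True
    return False


def reviewer(board):
    if "draft" in board and "approved" not in board:
        board["approved"] = True
        return True
    return False


def run_blackboard(board, agents, max_turns=10):
    """Sweep over the agents until none of them acts or the turn budget is spent."""
    log = []
    for _ in range(max_turns):
        acted = [agent.__name__ for agent in agents if agent(board)]
        if not acted:
            break
        log.extend(acted)
    return board, log
```

The agents never call each other: each one only checks whether the shared state is ready for its contribution, so the registration order does not change the outcome here.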
+### 4. Trade-offs
+* **Pros**: Modular and scalable; allows mixing different model sizes/strengths to optimize costs.
+* **Cons**: High communication overhead (multi-turn dialogues between agents); complex state management; risk of infinite discussion loops or unclear ownership.
+### 5. Use Cases
+* Simulated software development teams (Product Manager -> Architect -> Engineer -> QA).
+* Creative content generation and peer review.

package/skills/neo-agentic-design/references/resilience-hitl.md ADDED

@@ -0,0 +1,105 @@
+# Resilience, Exceptions & HITL
+This document provides conceptual designs for system resilience, human interaction, and knowledge grounding, covering exception handling, Human-in-the-Loop (HITL) gates, and Retrieval-Augmented Generation (RAG).
+---
+## Chapter 12: Exception Handling and Recovery
+### 1. Definition
+Designs automatic detection, retry, fallback, and state rollback mechanisms for exceptions that may occur during agent execution (such as API timeouts, network disconnections, LLM format errors, and invalid tool parameters).
+### 2. Problems Addressed
+* System fragility: Prevents long-cycle workflows from breaking due to transient network or API issues.
+* Format pollution: Guides the LLM to self-heal when its output does not conform to the expected JSON schema.
+### 3. Workflow
+```mermaid
+graph TD
+    Step[Execute Tool / Call LLM] --> Success{Successful?}
+    Success -->|Yes| Next[Proceed to Next Step]
+    Success -->|No: Exception| Detector[Exception Detector]
+    Detector --> RuleCheck{Evaluate Exception Type}
+    RuleCheck -->|Network/Timeout| Retry[Auto Retry with Backoff]
+    RuleCheck -->|Format Error| Refine[Guide LLM to Self-Correct]
+    RuleCheck -->|Tool Failure| Fallback[Route to Fallback/Alternative Tool]
+    RuleCheck -->|Critical Error| Rollback[Rollback State to Checkpoint]
+    Retry --> Step
+    Refine --> Step
+    Fallback --> Step
+    Rollback --> UserEscalation[Human Intervention]
+```
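Two branches of the flow above, retry with backoff and fallback to an alternative tool, can be sketched as follows. `ToolFailure`, the parameter names, and the injectable `sleep` are assumptions for illustration; the format-error and rollback branches are left out to keep the example short.

```python
import time


class ToolFailure(Exception):
    """Hypothetical error type raised when a tool itself is broken."""


def run_with_recovery(primary, fallback=None, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Retry transient errors with exponential backoff; use a fallback on tool failure."""
    for attempt in range(max_retries):
        try:
            return primary()
        except (TimeoutError, ConnectionError):
            # Transient network problem: wait 1x, 2x, 4x ... the base delay, then retry.
            sleep(base_delay * 2 ** attempt)
        except ToolFailure:
            if fallback is None:
                raise
            return fallback()
    raise RuntimeError("retries exhausted; escalate to a human")
```

Capping retries and raising at the end keeps persistent failures visible instead of letting them be silently masked, which is the risk named in the trade-offs.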
+### 4. Trade-offs
+* **Pros**: Improves system robustness and reduces manual maintenance costs.
+* **Cons**: Excessive retries or fallbacks can mask underlying bugs or quietly degrade output quality.
+---
+## Chapter 13: Human-in-the-Loop (HITL)
+### 1. Definition
+Strategically embeds human review, intervention, and authorization mechanisms into the agent's autonomous decision-making workflow, combining human common sense, ethics, and legal judgment with AI automation.
+### 2. Problems Addressed
+* High-risk operations: Prevents agent errors when performing large financial transactions, deleting sensitive data, or executing legally sensitive actions.
+* Automation boundaries: Requests human guidance when decision confidence falls below a set threshold.
+### 3. Three Core Interaction Modes
+````carousel
+### 1. Human-in-the-Loop (HITL)
+* **Mechanism**: The agent pauses when reaching a high-risk step (e.g., large bank transfer), suspends the task, and sends it to a pending review queue.
+* **Workflow**: Agent pauses -> Human reviews (Approve/Reject/Modify) -> Agent receives input and resumes execution.
+* **Key Characteristic**: Human approval is a mandatory gate.
+<!-- slide -->
+### 2. Human-on-the-Loop (HOTL)
+* **Mechanism**: The agent executes tasks autonomously while a human supervisor monitors and adjusts strategies.
+* **Workflow**: Human sets macro rules (e.g., transaction limits) -> Agent trades automatically -> Human monitors metrics -> Human intervenes via a Kill Switch if necessary.
+* **Key Characteristic**: Human does not intervene in individual decisions but maintains macro-level oversight.
+<!-- slide -->
+### 3. Decision Augmentation
+* **Mechanism**: The agent acts as an analytical assistant, gathering data and presenting candidates. Decision-making and execution are performed entirely by a human.
+* **Workflow**: Human asks query -> Agent collects and analyzes data -> Agent proposes options A, B, and C with pros/cons -> Human selects and executes.
+* **Key Characteristic**: Agent provides cognitive augmentation without execution authority.
+````
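The first mode, where human approval is a mandatory gate, can be sketched as a pending-review queue. `APPROVAL_THRESHOLD` and the `TransferGate` class are hypothetical names, and the bank-transfer scenario follows the example in the text.

```python
from dataclasses import dataclass, field

APPROVAL_THRESHOLD = 10_000  # hypothetical limit above which a human must approve


@dataclass
class TransferGate:
    executed: list = field(default_factory=list)
    pending: dict = field(default_factory=dict)
    _next_ticket: int = 1

    def request(self, amount, recipient):
        """Execute low-risk transfers directly; park high-risk ones for review."""
        if amount < APPROVAL_THRESHOLD:
            self.executed.append((amount, recipient))
            return "executed"
        ticket = self._next_ticket
        self._next_ticket += 1
        self.pending[ticket] = (amount, recipient)
        return f"pending:{ticket}"

    def review(self, ticket, approve):
        """Human decision: resume the paused action or drop it."""
        action = self.pending.pop(ticket)
        if approve:
            self.executed.append(action)
            return "executed"
        return "rejected"
```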
+### 4. Trade-offs
+* **Pros**: Provides a safety net and compliance guarantee for high-risk decisions; collects human feedback to optimize agent alignment.
+* **Cons**: Human intervention limits system scalability and speed; designing human-in-the-loop review queues increases development costs.
+---
+## Chapter 14: Knowledge Retrieval / RAG
+### 1. Definition
+Retrieves relevant information from a knowledge base before the LLM generates a response, injecting the retrieved text chunks into the prompt context to guide the LLM toward producing factually grounded answers.
+### 2. Advanced Agentic RAG Variants
+```mermaid
+graph TD
+    subgraph Traditional RAG
+        Query[User Query] --> VectorSearch[Vector Similarity Search]
+        VectorSearch --> Context[Concatenate Context Chunks]
+        Context --> LLMGen[LLM Generates Response]
+    end
+    subgraph Graph RAG
+        GQuery[User Query] --> GraphSearch[Navigate Knowledge Graph Nodes & Edges]
+        GraphSearch --> UnifiedContext[Cross-document Context Linkage]
+    end
+    subgraph Agentic RAG
+        AQuery[User Query] --> AgentLayer[Agent Decision Layer]
+        AgentLayer -->|1. Decompose Task| SubQueries[Multi-step Sub-retrieval Tasks]
+        AgentLayer -->|2. Self-Reflection| SourceVal[Source Timeliness & Quality Check]
+        AgentLayer -->|3. Resolve Conflicts| ConflictRecon[Active Conflict Reconciliation]
+        AgentLayer -->|4. Tool Call| WebSearch[Web Search for Knowledge Gaps]
+    end
+```
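The Traditional RAG path in the diagram can be sketched end to end with a toy retriever. Word-count vectors stand in for a real embedding model and vector database, and the prompt wording is an assumption.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(chunks, key=lambda chunk: cosine(query_vector, embed(chunk)), reverse=True)
    return ranked[:k]


def build_prompt(query, chunks, k=2):
    """Inject the retrieved chunks into the prompt, numbered so answers can cite them."""
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(retrieve(query, chunks, k), 1))
    return f"Answer using only the context below.\n{context}\nQuestion: {query}"
```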
+### 3. Problems Addressed
+* Outdated knowledge: Bypasses the temporal limits of static training data.
+* Hallucination: Constrains the model to factual boundaries by grounding it in verified document context.
+* Fragmented information: Addresses the limitation that plain vector search struggles to answer comprehensive questions spanning multiple documents.
+### 4. Trade-offs
+* **Pros**: Minimizes factual errors; supports precise citations; imports private knowledge without retraining models.
+* **Cons**: Highly sensitive to the quality of text chunking and embeddings; multi-step reasoning in Agentic RAG increases response latency.

package/skills/neo-agentic-design/references/system-components.md ADDED

@@ -0,0 +1,93 @@
+# System Components & Protocols
+This document provides conceptual designs for system architecture components, resources, and protocols, covering memory management, learning and adaptation, Model Context Protocol (MCP), and goal setting and monitoring.
+---
+## Chapter 8: Memory Management
+### 1. Definition
+Provides agents with the ability to store and retrieve information across sessions and tasks through persistence mechanisms. The memory system is generally divided into short-term and long-term memory, managed by a unified Memory Service.
+### 2. Memory Classification
+| Memory Type | Medium | Function | Eviction & Retrieval Mechanism |
+| :--- | :--- | :--- | :--- |
+| **Short-term** | Current Context Window | Stores current conversation context and task execution trajectory | Sliding window, context pruning, and summarization |
+| **Long-term Semantic** | Vector Database / Knowledge Base | Retains factual knowledge, concepts, and external rules | Vector semantic retrieval based on user input |
+| **Long-term Episodic** | Structured Database / Log Store | Records past task execution experiences and outcomes | Used for few-shot learning or similar scenario matching |
+| **Long-term Procedural** | Codebase / Tool Definitions / Prompt Templates | Records Standard Operating Procedures (SOPs) and toolbox definitions for specific tasks | Dynamically loaded based on task type |
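The short-term row of the table can be sketched as a sliding window that summarizes what it evicts. The `ShortTermMemory` class is hypothetical, and truncating a turn to 12 characters is only a placeholder for an LLM-written summary.

```python
from collections import deque


class ShortTermMemory:
    def __init__(self, max_turns=3):
        self.max_turns = max_turns
        self.turns = deque()
        self.summary = []

    def add(self, turn):
        """Append a turn; evict the oldest ones once the window is full."""
        self.turns.append(turn)
        while len(self.turns) > self.max_turns:
            evicted = self.turns.popleft()
            # Placeholder for summarization: keep only a short prefix of the turn.
            self.summary.append(evicted[:12])

    def context(self):
        """Build the context lines to send to the model."""
        lines = []
        if self.summary:
            lines.append("Summary of earlier turns: " + "; ".join(self.summary))
        lines.extend(self.turns)
        return lines
```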
+### 3. Problems Addressed
+* Amnesia (Context limits): Prevents long conversations from causing the LLM to lose critical history.
+* Repeated errors: Ensures the agent learns from past executions to improve decision success rates.
+---
+## Chapter 9: Learning and Adaptation
+### 1. Definition
+Enables the agent to improve itself by collecting behavioral feedback and rewards from its interactions with the environment, users, or other agents, and then autonomously revising its prompts or modifying its own execution code inside a code sandbox (SICA, Self-Improving Coding Agent).
+### 2. Problems Addressed
+* Static configuration lag: Solves the issue of agents failing to adjust when environmental rules change.
+* High development cost: Reduces the manual effort of tuning prompts by hand.
+### 3. Workflow
+```mermaid
+graph TD
+    Interaction[Agent-Environment Interaction] --> Result[Execution Results & Metrics]
+    Result --> evaluator[Evaluator / Scoring System]
+    evaluator -->|Feedback/Score| Learner[Learning Engine]
+    Learner -->|Self-Optimize Prompts or Refactor Code| AgentUpgrade[Upgraded Agent]
+    AgentUpgrade -->|Next Task Turn| Interaction
+```
+### 4. Trade-offs
+* **Pros**: High potential for long-term self-evolution; can discover high-quality logic not designed by humans in specific vertical disciplines (e.g., mathematical proofs, code generation).
+* **Cons**: Unpredictable evolution paths, which may generate harmful mutations; self-modifying prompts can lead to privilege escalation or security vulnerabilities; extremely high overhead for training and testing iterations.
+---
+## Chapter 10: Model Context Protocol (MCP)
+### 1. Definition
+A standardized **Client-Server communication protocol** that establishes a plug-and-play integration standard between LLMs/Agents (Clients) and external data sources, development tools, and API services (Servers). MCP standardizes three core types of context exchange: **Resources**, **Prompts**, and **Tools**.
+```mermaid
+graph LR
+    subgraph Agentic Client
+        Agent[AI Agent / LLM]
+    end
+    subgraph MCP Server
+        Res[Resources: Files/Databases]
+        Pmt[Prompts: Templates]
+        Tls[Tools: APIs/Sandboxes]
+    end
+    Agent <-->|Standard JSON-RPC 2.0| MCP_Link[MCP Protocol Layer]
+    MCP_Link <--> Res
+    MCP_Link <--> Pmt
+    MCP_Link <--> Tls
+```
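The JSON-RPC 2.0 layer in the diagram can be sketched as a pair of helpers that build requests and unwrap responses. The envelope fields (`jsonrpc`, `id`, `method`, `params`, `result`, `error`) come from JSON-RPC 2.0. The `tools/call` method name and its `name`/`arguments` parameters reflect the MCP specification as far as I know and should be checked against the current spec. The `get_weather` tool in the test is hypothetical, and the transport is omitted.

```python
import json
from itertools import count

_request_ids = count(1)


def make_request(method, params=None):
    """Serialize a JSON-RPC 2.0 request with a fresh id."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


def parse_response(raw):
    """Return the result of a JSON-RPC 2.0 response, or raise on an error object."""
    message = json.loads(raw)
    if message.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 message")
    if "error" in message:
        error = message["error"]
        raise RuntimeError(f"{error['code']}: {error['message']}")
    return message["result"]
```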
+### 2. Problems Addressed
+* Tedious integration: Avoids repeatedly writing custom wrapper code when developing new agents or integrating new tools.
+* Fragmented context acquisition: Provides external data and actions to the model in a unified interface format.
+### 3. Trade-offs
+* **Pros**: Reduces integration costs for multiple tools and data sources; decouples data sources from reasoning entities; supports dynamic discovery.
+* **Cons**: Protocol serialization and JSON-RPC wrapping introduce minor performance overhead; requires tool providers to actively adopt the protocol.
+---
+## Chapter 11: Goal Setting and Monitoring
+### 1. Definition
+Sets structured, quantifiable goals (following the SMART principles) before the agent is initialized, and introduces an independent monitor during execution that observes progress in real time (Progress Checkpoints), detects blockers, and escalates to a human when necessary.
+### 2. Problems Addressed
+* Blind execution: Prevents agents from entering infinite retry loops and wasting budget when they encounter logical obstacles.
+* Lack of observability: Solves the black-box execution problem, providing a clear progress path.
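A minimal sketch of an independent monitor with progress checkpoints. The `ProgressMonitor` class, the numeric progress metric, and the stall limit are assumptions; the source only says that a monitor detects blockers and escalates.

```python
class ProgressMonitor:
    def __init__(self, target, max_stalled_checks=3):
        self.target = target
        self.max_stalled_checks = max_stalled_checks
        self.best = 0.0
        self.stalled = 0

    def checkpoint(self, progress):
        """Classify the latest progress reading as done, continue, or escalate."""
        if progress >= self.target:
            return "done"
        if progress > self.best:
            self.best = progress
            self.stalled = 0
            return "continue"
        # No improvement since the best reading: count it as a stalled check.
        self.stalled += 1
        return "escalate" if self.stalled >= self.max_stalled_checks else "continue"
```

Because the monitor only sees progress readings, it stays independent of the agent's own retry logic and can cut off a loop that the agent itself would not notice.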
+### 3. Use Cases
+* Automated marketing campaign execution.
+* Long-cycle autonomous codebase refactoring.