@moon791017/neo-skills 1.1.11 → 1.1.12

package/README.md CHANGED
@@ -59,6 +59,7 @@
59
59
  | TypeScript | `neo-typescript` | Handle TypeScript, tsconfig, strict mode, generics, conditional/mapped/template literal types, ESM/CJS, and runtime boundaries. |
60
60
  | Vue | `neo-vue` | Build, debug, refactor, or review Vue 3, SFCs, the Composition API, Pinia, Vue Router, Vite, and Vue + TypeScript. |
61
61
  | Agent architecture | `neo-sub-agent` | Design, create, review, or convert sub-agents, custom agents, worker/reviewer/planner agents, or multi-agent workflows. |
62
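+ | Agent architecture | `neo-agentic-design` | Design, evaluate, or implement framework-agnostic patterns such as agent workflows, prompt chains, routing, planning, reflection, multi-agent collaboration, and memory management. |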
62
63
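  | Text polishing | `neo-stop-slop` | Remove AI-sounding phrasing, filler words, and formulaic sentence patterns from Traditional Chinese or English text; supports documents, comments, commit messages, and PR descriptions. |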
63
64
 
64
65
  ## Installation
@@ -153,6 +154,7 @@ npx -p @moon791017/neo-skills install-system-instructions \
153
154
  | Build Vue 3 components | `neo-vue` | `Help me refactor this SFC and avoid reactivity pitfalls` |
154
155
  | Improve the AI development workflow | `neo-agent-harness` | `Assess how reliably coding agents can collaborate in this project` |
155
156
  | Create a sub-agent | `neo-sub-agent` | `Help me add a Codex code-reviewer sub agent` |
157
+ | Design an agent orchestration architecture | `neo-agentic-design` | `Help me design the topology and retry mechanism for a multi-agent customer service system` |
156
158
  | Remove the AI tone | `neo-stop-slop` | `Make this PR description more natural and direct` |
157
159
 
158
160
  ## Development
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@moon791017/neo-skills",
3
- "version": "1.1.11",
3
+ "version": "1.1.12",
4
4
  "type": "module",
5
5
  "description": "Neo Skills: A Universal AI Agent Skills Extension",
6
6
  "homepage": "https://neo-blog-iota.vercel.app/",
@@ -0,0 +1,89 @@
1
+ ---
2
+ name: neo-agentic-design
3
+ description: >
4
+ Use this skill when designing, evaluating, or implementing Agent workflows, prompt chains, routing, planning, reflection, multi-agent collaboration, memory management, or other framework-agnostic LLM orchestration patterns.
5
+ license: MIT
6
+ compatibility: No specific language runtime required; conceptual-only patterns.
7
+ metadata:
8
+ version: "1.0.0"
9
+ type: "conceptual-design"
10
+ ---
11
+
12
+ # Neo Agentic Design
13
+
14
+ This skill provides architectural concepts and orchestration patterns for building LLM Agent systems. It covers 21 core design patterns categorized into four themes. The orchestration logic remains abstract and independent of specific programming languages or frameworks.
15
+
16
+ ## Gotchas
17
+ * **Over-engineering**: Start with simple prompt chains (Chapter 1) or routing (Chapter 2). Add multi-agent collaboration (Chapter 7) or hierarchical networks only when the task requires them, because they increase token overhead.
18
+ * **Reflection Infinite Loops**: When implementing reflection (Chapter 4) or self-correction (Chapter 12), enforce a maximum iteration limit (e.g., 3-5 iterations) to prevent the LLM from getting stuck in an infinite loop.
19
+ * **Blocking Operations**: High-risk operations (such as direct database deletions or large fund transfers) must include a Human-in-the-Loop review gate (Chapter 13).
20
+ * **Context Pruning State Loss**: When compressing context, protect critical agent instructions from being pruned to prevent behavioral degradation.
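
One way to honor the context-pruning gotcha is to mark critical instructions as pinned and prune only around them. The sketch below is a minimal Python illustration; the `pinned` flag, the message dict shape, and the `prune_context` name are assumptions made for this example and are not part of the skill.

```python
def prune_context(messages: list[dict], max_messages: int) -> list[dict]:
    """Keep every pinned message plus the most recent unpinned ones.

    Each message is a dict with "role", "content", and an optional
    "pinned" flag (hypothetical shape). Original order is preserved.
    """
    pinned_count = sum(1 for m in messages if m.get("pinned"))
    keep_recent = max(max_messages - pinned_count, 0)
    unpinned_idx = [i for i, m in enumerate(messages) if not m.get("pinned")]
    # Guard the zero case: a slice of [-0:] would keep everything.
    kept = set(unpinned_idx[-keep_recent:]) if keep_recent else set()
    return [m for i, m in enumerate(messages) if m.get("pinned") or i in kept]
```

Pinned messages are never dropped, even when they alone exceed `max_messages`, so the budget is a target for the unpinned history rather than a hard cap.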
21
+
22
+ ## Workflow Checklist
23
+ Progress:
24
+ - [ ] Step 1: Analyze Requirements (define objectives, inputs, constraints, and complexity levels).
25
+ - [ ] Step 2: Select Orchestration Patterns (load corresponding reference documents based on requirements).
26
+ - [ ] Step 3: Plan System Components (determine memory, learning mechanisms, and protocol specifications).
27
+ - [ ] Step 4: Define Resilience and Safety (establish exception handling, human review gates, and input/output guardrails).
28
+ - [ ] Step 5: Draft Design Proposal (create system topology diagrams and describe the architecture).
29
+
30
+ ## Detailed Guidelines
31
+
32
+ ### Step 1 — Analyze Requirements
33
+ Evaluate problem complexity (Level 1, 2, or 3) and confirm:
34
+ 1. **Latency Sensitivity**: For low-latency requirements, prioritize parallelization (Chapter 3) and routing (Chapter 2).
35
+ 2. **Task Fragility**: For strict sequential tasks or error-prone processes, use chaining (Chapter 1) or planning (Chapter 6).
36
+
37
+ ### Step 2 — Load Design Patterns (Progressive Loading)
38
+ Load specific reference files as needed to avoid loading all concepts at once:
39
+ * Base workflows (Prompt Chaining, Routing, Parallelization, Reflection, Tool Use, Planning, Multi-Agent Collaboration):
40
+ 👉 **Load [base-workflows](references/base-workflows.md)**
41
+ * System infrastructure (Memory Management, Learning and Adaptation, MCP, Goal Setting and Monitoring):
42
+ 👉 **Load [system-components](references/system-components.md)**
43
+ * Exception handling, HITL, RAG fact-grounding:
44
+ 👉 **Load [resilience-hitl](references/resilience-hitl.md)**
45
+ * Advanced safety, evaluation, prioritization, A2A communication, exploration and discovery:
46
+ 👉 **Load [advanced-safety](references/advanced-safety.md)**
47
+
48
+ ### Step 3 — System Architecture Planning
49
+ The design document must clearly specify:
50
+ 1. **State Space**: How the context window is managed and how memory is divided between short-term and long-term stores (cognitive/procedural memory).
51
+ 2. **Tool Boundaries**: Tool call schema protocols and sandbox rules.
52
+ 3. **Safety Boundaries**: Specific conditions for triggering human approval (HITL) or falling back to backup models.
53
+
54
+ ---
55
+
56
+ ## Output Template (Agentic Architecture Design Proposal)
57
+
58
+ When presenting agent designs to users, use this template format:
59
+
60
+ ```markdown
61
+ # Agentic System Design Proposal: [System Name]
62
+
63
+ ## 1. Executive Summary
64
+ * **Complexity Level**: [Level 1 / Level 2 / Level 3]
65
+ * **Target Objective**: [System Goal]
66
+ * **Key Constraints**: [Constraints such as latency, cost, security, etc.]
67
+
68
+ ## 2. Core Orchestration Architecture
69
+ * **Selected Patterns**: [e.g., Router -> Parallel Agents -> Synthesizer]
70
+ * **Workflow Description**: [System data flow and control flow description]
71
+
72
+ ### Topology Diagram (Mermaid)
73
+ ```mermaid
74
+ [Mermaid diagram representing the Agent Loop / Topology]
75
+ ```
76
+
77
+ ## 3. Reference Patterns Applied
78
+ * **[Pattern Name] (Chapter X)**: [Specific application and rationale in the system]
79
+ * **[Pattern Name] (Chapter Y)**: [Specific application and rationale in the system]
80
+
81
+ ## 4. Resilience, Safety & HITL Rules
82
+ * **Exception Recovery**: [Handling flow for API timeouts, rate limits, and JSON formatting errors]
83
+ * **Human-in-the-Loop Gates**: [Conditions triggering human review]
84
+ * **Guardrails**: [Input filtering and output validation mechanisms]
85
+
86
+ ## 5. Next Steps / Implementation Roadmap
87
+ 1. [Step 1]
88
+ 2. [Step 2]
89
+ ```
@@ -0,0 +1,58 @@
1
+ [
2
+ {
3
+ "query": "I need to design a system that routes incoming user queries to specialized LLM prompts depending on their category.",
4
+ "should_trigger": true
5
+ },
6
+ {
7
+ "query": "How do I implement reflection and self-correction in a multi-agent system to make it write better code?",
8
+ "should_trigger": true
9
+ },
10
+ {
11
+ "query": "Can you review my LLM orchestration workflow? It currently uses prompt chaining but has high latency.",
12
+ "should_trigger": true
13
+ },
14
+ {
15
+ "query": "I want to set up a Model Context Protocol (MCP) server for my agent so it can read local files.",
16
+ "should_trigger": true
17
+ },
18
+ {
19
+ "query": "What is the best way to handle long-term semantic memory and episodic memory in an autonomous agent?",
20
+ "should_trigger": true
21
+ },
22
+ {
23
+ "query": "Please design a pipeline workflow for generating technical reports, with a human-in-the-loop validation step.",
24
+ "should_trigger": true
25
+ },
26
+ {
27
+ "query": "How does dynamic re-prioritization work when an agent has conflicting goals?",
28
+ "should_trigger": true
29
+ },
30
+ {
31
+ "query": "Review the exception handling and recovery mechanism in my LLM agent loop.",
32
+ "should_trigger": true
33
+ },
34
+ {
35
+ "query": "I need to write a Python script that calculates the Fibonacci sequence using recursion.",
36
+ "should_trigger": false
37
+ },
38
+ {
39
+ "query": "What is the difference between supervised learning and reinforcement learning in traditional machine learning?",
40
+ "should_trigger": false
41
+ },
42
+ {
43
+ "query": "How do I configure my local PostgreSQL database on macOS?",
44
+ "should_trigger": false
45
+ },
46
+ {
47
+ "query": "Write a CSS stylesheet for a dark mode website.",
48
+ "should_trigger": false
49
+ },
50
+ {
51
+ "query": "I want to build a simple web scraper in Python using beautifulsoup4.",
52
+ "should_trigger": false
53
+ },
54
+ {
55
+ "query": "How do I write a prompt to make ChatGPT act like a professional English translator?",
56
+ "should_trigger": false
57
+ }
58
+ ]
@@ -0,0 +1,27 @@
1
+ {
2
+ "skill_name": "neo-agentic-design",
3
+ "evals": [
4
+ {
5
+ "id": 1,
6
+ "prompt": "Design an agentic system that generates monthly financial reports. It must parse transaction raw data, categorize expenses, draft a report, let a human reviewer approve/edit the draft, and then output a final PDF. Minimize latency and ensure high accuracy.",
7
+ "expected_output": "An Agentic System Design Proposal containing Routing, Chaining, and Human-in-the-Loop patterns, structured with the standard output template.",
8
+ "assertions": [
9
+ "The output starts with 'Agentic System Design Proposal' or matches the template format",
10
+ "The proposal mentions Routing, Chaining, and Human-in-the-Loop patterns",
11
+ "The proposal contains a Mermaid sequence or flowchart diagram representing the topology",
12
+ "The proposal lists specific Gotchas or risks like latency and cost control"
13
+ ]
14
+ },
15
+ {
16
+ "id": 2,
17
+ "prompt": "I need to design a system that reviews incoming code commits for potential security vulnerabilities and performance bottlenecks. It needs to check thousands of commits daily and must fail-safely if any analysis tool crashes.",
18
+ "expected_output": "An Agentic System Design Proposal containing Parallelization, Routing, Guardrails, and Exception Recovery patterns, structured with the standard output template.",
19
+ "assertions": [
20
+ "The proposal includes Parallelization and Exception Recovery patterns",
21
+ "The proposal provides a Mermaid topology diagram showing parallel evaluation and a merge point",
22
+ "The proposal includes specific Exception Handling rules for crashed analysis tools",
23
+ "The proposal includes Guardrails policies for input/output sanitization"
24
+ ]
25
+ }
26
+ ]
27
+ }
@@ -0,0 +1,158 @@
1
+ # Advanced Execution, Guardrails & Safety
2
+
3
+ This document provides conceptual designs for advanced execution, guardrails, and safety patterns, covering agent-to-agent (A2A) communication, resource-aware optimization, reasoning techniques, guardrails, evaluation and monitoring, prioritization, and scientific exploration.
4
+
5
+ ---
6
+
7
+ ## Chapter 15: Inter-Agent Communication (A2A)
8
+
9
+ ### 1. Definition
10
+ An open protocol for agent communication across frameworks and technology stacks. It uses standard HTTP and JSON-RPC to support agent declaration, task delegation, and data exchange across different networks.
11
+
12
+ ### 2. Core Components
13
+ * **Agent Card**: A JSON declaration containing the agent name, version, endpoint URL, multimodal capabilities, and skills.
14
+ * **Task Mechanism**: Defines collaboration as a "Task" with a lifecycle state (Submitted, Working, Completed, Failed), tracked using a `contextId` for multi-turn conversation context.
15
+ * **Communication Modes**:
16
+ * **Synchronous**: Direct invocation with immediate response.
17
+ * **Asynchronous Polling**: Submit a task to obtain a Task ID and periodically query status.
18
+ * **Streaming (SSE)**: Receive partial outputs in real time via Server-Sent Events.
19
+ * **Webhook**: Actively push notifications to a specified URL upon task completion.
20
+
21
+ ### 3. Problems Addressed
22
+ * Heterogeneous framework silos: Solves communication barriers between different agent frameworks.
23
+ * Distributed collaboration barriers: Enables agents on different servers to safely delegate tasks.
24
+
25
+ ---
26
+
27
+ ## Chapter 16: Resource-Aware Optimization
28
+
29
+ ### 1. Definition
30
+ Monitors computation, latency, and financial costs (tokens/API calls) in real time during agent execution. Dynamically switches between models with different capabilities or prunes context based on budget and latency constraints.
31
+
32
+ ### 2. Problems Addressed
33
+ * API cost overruns: Avoids using expensive reasoning models for simple queries.
34
+ * Rate limits and overload: Executes fallbacks and backup plans when the primary model is limited or overloaded.
35
+
36
+ ### 3. Workflow
37
+ ```mermaid
38
+ graph TD
39
+ Query[User Query] --> Router{Router LLM}
40
+ Router -->|1. Simple Query| CheapModel[Lightweight Model]
41
+ Router -->|2. Complex Reasoning| ExpensiveModel[High-tier Reasoning Model]
42
+ Router -->|3. Real-time Info| SearchTool[Real-time Search Tool]
43
+ CheapModel --> Checker[Critique Agent: Quality Eval]
44
+ ExpensiveModel --> Checker
45
+ Checker -->|Fail| Fallback[Fallback Plan]
46
+ Checker -->|Pass| Output[Final Output]
47
+ ```
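
The routing decision in this workflow can be sketched in Python as a budget-aware model picker. This is a minimal illustration under stated assumptions: the tier names, the flat per-call costs, and the keyword heuristic in `looks_complex` are all hypothetical, and the diagram above uses a router LLM where this sketch uses a toy rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    cost_per_call: float  # hypothetical flat cost, in dollars


CHEAP = ModelTier("lightweight-model", 0.001)
EXPENSIVE = ModelTier("reasoning-model", 0.05)


def looks_complex(query: str) -> bool:
    """Toy stand-in for the router LLM's complexity judgment."""
    return len(query.split()) > 20 or "prove" in query.lower()


def pick_model(query: str, remaining_budget: float) -> ModelTier:
    """Use the expensive tier only for complex queries the budget can cover."""
    if looks_complex(query) and remaining_budget >= EXPENSIVE.cost_per_call:
        return EXPENSIVE
    return CHEAP
```

A fuller version would also route real-time questions to a search tool and send failed quality checks to the fallback plan, as the diagram shows.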
48
+
49
+ ---
50
+
51
+ ## Chapter 17: Reasoning Techniques
52
+
53
+ ### 1. Definition
54
+ Architectural techniques that allocate more computational resources at inference time to explicitly expand the agent's thought process. Covers step-by-step decomposition, tree-search path planning, code-assisted execution, and ReAct loops.
55
+
56
+ ### 2. Six Core Reasoning Patterns
57
+ * **Chain of Thought (CoT)**: Guides the model to reason step-by-step to decompose complex problems.
58
+ * **Tree of Thoughts (ToT)**: Represents the reasoning space as a tree, supporting backtracking and multi-path parallel evaluation.
59
+ * **Reasoning and Acting (ReAct)**: Interleaves tool execution with reasoning steps (Thought -> Action -> Observation -> Thought ... -> Finish).
60
+ * **Program-Aided Language Models (PALMs)**: Offloads precise mathematical calculations to a secure code sandbox and interprets the results to eliminate calculation hallucinations.
61
+ * **Multi-Agent Debate (Chain/Graph of Debates)**: Employs multiple agents to debate a problem across several turns, using consensus or strong logical conclusions as the final answer.
62
+ * **Scaling Inference Law**: Spends additional inference-time compute on multi-path generation, self-correction, or longer reasoning paths, so that a smaller model can approach the quality of a single pass from a larger model.
63
+
64
+ ---
65
+
66
+ ## Chapter 18: Guardrails & Safety Patterns
67
+
68
+ ### 1. Definition
69
+ Deploys multiple layers of filtering and defense at the input, tool execution, and output stages to ensure system compliance, safety, and protection against jailbreak attacks, prompt injection, and tool privilege escalation.
70
+
71
+ ### 2. Multi-Layer Defense Flow
72
+ ```mermaid
73
+ graph TD
74
+ Input[User Input] --> InputGuard[1. Input Guardrails: Jailbreak/Injection Detection]
75
+ InputGuard -->|Violation| Block[Access Denied]
76
+ InputGuard -->|Safe| LLM_Core[2. Core Agent Reasoning]
77
+ LLM_Core -->|Call Tool| ToolCallback[3. Pre-execution Tool Validation]
78
+ ToolCallback -->|Reject| LLM_Core
79
+ ToolCallback -->|Approve| ToolExec[Tool Execution]
80
+ ToolExec --> OutputGen[Output Generation]
81
+ OutputGen --> OutputGuard[4. Output Guardrails: PII/Toxicity Filter]
82
+ OutputGuard -->|Safe| User[Deliver to User]
83
+ OutputGuard -->|Violation| Redaction[Redaction/Block/Self-Correction]
84
+ ```
85
+
86
+ ### 3. Problems Addressed
87
+ * Prompt jailbreaks: Prevents users from guiding the agent to perform unauthorized or harmful actions.
88
+ * Privilege escalation: Follows the principle of least privilege to prevent agents from unauthorized data modification or account deletion.
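
The input and output layers of the defense flow above can be sketched as plain Python functions wrapped around the agent. This is an illustrative sketch only: the regex patterns are toy examples chosen for this snippet, email redaction stands in for a real PII filter, and `guarded_reply` is a hypothetical name. Production systems usually rely on dedicated classifiers rather than keyword rules.

```python
import re
from typing import Callable

# Toy injection patterns; assumptions for this example only.
INJECTION_PATTERNS = [
    r"ignore (all|any|previous) .*instructions",
    r"reveal .*system prompt",
]
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def input_guard(text: str) -> bool:
    """Return True when the input passes the injection check."""
    lowered = text.lower()
    return not any(re.search(p, lowered) for p in INJECTION_PATTERNS)


def output_guard(text: str) -> str:
    """Redact email addresses as a stand-in for PII filtering."""
    return EMAIL.sub("[REDACTED]", text)


def guarded_reply(user_input: str, agent: Callable[[str], str]) -> str:
    """Run the agent only on safe input and filter what it returns."""
    if not input_guard(user_input):
        return "Access denied."
    return output_guard(agent(user_input))
```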
89
+
90
+ ---
91
+
92
+ ## Chapter 19: Evaluation and Monitoring
93
+
94
+ ### 1. Definition
95
+ Systematically measures and audits agent execution quality, trajectories, resource consumption, and drift. Because agent systems are non-deterministic, it evaluates the execution trajectory and not only the final answer.
96
+
97
+ ### 2. Three Core Evaluation Aspects
98
+ * **Objective Metrics Monitoring**: Logs latency, token consumption, and API success rates.
99
+ * **Trajectory Evaluation**: Compares the agent's action sequence with a reference SOP using exact matching, ordered matching, or unordered matching.
100
+ * **LLM-as-a-Judge**: Uses an independent LLM to score answers based on specific rubrics and outputs structured feedback.
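
The three trajectory-matching modes can be sketched in a few lines of Python. The exact semantics are an assumption here: this sketch treats ordered matching as "expected actions appear in order, with extra actions allowed in between" and unordered matching as "every expected action appears at least once". Evaluation frameworks may define them differently.

```python
def exact_match(actual: list[str], expected: list[str]) -> bool:
    """Same actions, same order, nothing extra."""
    return actual == expected


def in_order_match(actual: list[str], expected: list[str]) -> bool:
    """Expected actions appear in order; extra actions in between are allowed."""
    remaining = iter(actual)
    # `step in remaining` advances the iterator, so order is enforced.
    return all(step in remaining for step in expected)


def any_order_match(actual: list[str], expected: list[str]) -> bool:
    """Every expected action appears at least once, in any order."""
    return set(expected) <= set(actual)
```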
101
+
102
+ ### 3. Advanced Pattern: AI Contractor / Contract Pattern
103
+ Resolves prompt drift and responsibility ambiguity:
104
+ ```mermaid
105
+ graph TD
106
+ User[User] -->|1. Initiate Draft Contract| Contractor[Contractor Agent]
107
+ Contractor -->|2. Self-Analysis & Evaluation| Analyze[Analyze clauses, scope, cost, dependencies]
108
+ Analyze -->|3. Negotiate Feedback| User
109
+ User -->|4. Approve & Sign| Execute[5. Execution: Self-test & verify]
110
+ Execute -->|6. Decompose Tasks| SubContracts[Sub-contracts]
111
+ SubContracts --> SubAgents[Sub-agents]
112
+ Execute -->|7. Deliver Deliverables| User
113
+ ```
114
+
115
+ ---
116
+
117
+ ## Chapter 20: Prioritization
118
+
119
+ ### 1. Definition
120
+ Sorts and dynamically schedules the execution order of multiple goals and tasks when the agent is faced with resource constraints or limited budgets.
121
+
122
+ ### 2. Problems Addressed
123
+ * Deadlocks and lack of focus: Prevents delays in critical tasks caused by prioritizing minor ones.
124
+ * Inadequate crisis response: Ensures the agent can dynamically switch task context when high-priority events (e.g., safety alerts) occur.
125
+
126
+ ### 3. Prioritization Metrics
127
+ * **Urgency**: Time sensitivity (closeness to deadline).
128
+ * **Importance**: Impact on accomplishing the ultimate goal.
129
+ * **Dependencies**: Whether the task is a prerequisite for other tasks.
130
+ * **Cost-Benefit Ratio**: Expected payoff relative to consumed resources.
131
+
132
+ ### 4. Mechanism
133
+ Tasks are scored and placed in a priority queue, and the planner executes them in priority order. When the environmental state changes, the system recalculates weights and re-orders the queue (dynamic re-prioritization) or interrupts the current task.
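
A minimal Python sketch of this mechanism, built on the standard-library `heapq` module. The weights, the task dict shape, and the `TaskQueue` class are assumptions for illustration; the four weighted fields loosely mirror the metrics listed above (urgency, importance, dependencies, cost-benefit).

```python
import heapq
import itertools

# Illustrative weights; a real system would tune these.
WEIGHTS = {"urgency": 2, "importance": 3, "unblocks": 4, "benefit": 1}


def score(task: dict) -> int:
    """Weighted sum of the task's priority metrics."""
    return sum(weight * task[key] for key, weight in WEIGHTS.items())


class TaskQueue:
    def __init__(self) -> None:
        self._heap: list[tuple[int, int, dict]] = []
        self._counter = itertools.count()  # tie-breaker: earlier pushes win

    def push(self, task: dict) -> None:
        # heapq is a min-heap, so negate the score to pop the highest first.
        heapq.heappush(self._heap, (-score(task), next(self._counter), task))

    def pop(self) -> dict:
        return heapq.heappop(self._heap)[2]

    def reprioritize(self) -> None:
        """Re-score all queued tasks after their attributes change."""
        tasks = [t for _, _, t in sorted(self._heap, key=lambda e: e[1])]
        self._heap = []
        for task in tasks:
            self.push(task)
```

Calling `reprioritize()` after an environmental change (for example, raising the urgency of a queued safety alert) moves that task to the front without rebuilding the queue by hand.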
134
+
135
+ ---
136
+
137
+ ## Chapter 21: Exploration and Discovery
138
+
139
+ ### 1. Definition
140
+ Enables the agent to proactively explore unknown domains (Unknown Unknowns), generate new knowledge, design experiments, and test hypotheses.
141
+
142
+ ### 2. Multi-Agent Scientific Discovery Flow
143
+ ```mermaid
144
+ graph TD
145
+ Goal[Exploration Goal] --> GenAgent[1. Generation Agent]
146
+ GenAgent -->|Propose Hypothesis| RefAgent[2. Reflection Agent]
147
+ RefAgent -->|Peer Review / Correction Suggestions| GenAgent
148
+ RefAgent -->|Accepted Draft| RankAgent[3. Ranking Agent]
149
+ RankAgent -->|Elo Tournament Debate| BestHypotheses[Select Best Hypotheses]
150
+ BestHypotheses --> EvoAgent[4. Evolution Agent]
151
+ EvoAgent -->|Concept Merging & Non-linear Exploration| AdvancedHypo[Advanced Hypotheses]
152
+ AdvancedHypo --> LabAgent[5. Lab Agent]
153
+ LabAgent -->|Execute Code/Simulation/Analysis| FinalReport[6. Final LaTeX Report]
154
+ ```
155
+
156
+ ### 3. Trade-offs
157
+ * **Pros**: Explores unknown topics autonomously and can surface insights that human experts have not yet considered.
158
+ * **Cons**: High uncertainty and heavy token consumption; requires strict safety guardrails to prevent generating hazardous protocols.
@@ -0,0 +1,219 @@
1
+ # Base Patterns & Workflows
2
+
3
+ This document provides conceptual designs for basic agentic orchestration patterns, covering prompt chaining, routing, parallelization, reflection, tool use, planning, and multi-agent collaboration.
4
+
5
+ ---
6
+
7
+ ## Chapter 1: Prompt Chaining
8
+
9
+ ### 1. Definition
10
+ Decomposes a complex task into multiple **sequentially dependent subtasks**. The structured output of the previous step serves as the input for the next step. Each step focuses on a single, clear objective.
11
+
12
+ ### 2. Problems Addressed
13
+ * Context dilution: Prevents the LLM from losing focus when processing large, complex tasks.
14
+ * Instruction drift: Avoids failures in a single prompt that contains too many rules.
15
+
16
+ ### 3. Workflow
17
+ ```mermaid
18
+ graph LR
19
+ Input[Raw Input] --> Step1[LLM Step A]
20
+ Step1 -->|Structured Output A| Step2[LLM Step B]
21
+ Step2 -->|Structured Output B| Step3[LLM Step C]
22
+ Step3 --> Output[Final Answer]
23
+ ```
24
+
25
+ ### 4. Trade-offs
26
+ * **Pros**: High predictability; easy to optimize prompts and perform unit testing for individual steps.
27
+ * **Cons**: High total latency due to sequential execution; errors in earlier steps propagate downstream (Error Cascade).
28
+
29
+ ### 5. Use Cases
30
+ * Multi-step article generation (Outline -> Draft -> Polish -> Format).
31
+ * Data extraction and compliance analysis.
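
The chain above can be sketched in Python as a list of single-purpose steps, each consuming the previous step's output. This is a minimal sketch: `call_llm` is a hypothetical deterministic stub that stands in for a real model call, and the step names are illustrative.

```python
from typing import Callable

Step = Callable[[str], str]


def call_llm(instruction: str, payload: str) -> str:
    """Hypothetical stand-in for a model call; tags the payload so the flow is visible."""
    return f"[{instruction}] {payload}"


def make_step(instruction: str) -> Step:
    """Build a step with one clear objective."""
    return lambda payload: call_llm(instruction, payload)


def run_chain(steps: list[Step], raw_input: str) -> str:
    """Feed each step's output into the next step, in order."""
    data = raw_input
    for step in steps:
        data = step(data)
    return data


chain = [make_step("outline"), make_step("draft"), make_step("polish")]
result = run_chain(chain, "agent design patterns")
```

Because each step is an ordinary function, it can be unit-tested in isolation, which is the predictability benefit listed under Trade-offs.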
32
+
33
+ ---
34
+
35
+ ## Chapter 2: Routing
36
+
37
+ ### 1. Definition
38
+ Dynamically redirects tasks to the most suitable execution path, specialized tool, or sub-agent based on input characteristics. Routing decisions are made by rule engines, semantic similarity, or LLM classifiers.
39
+
40
+ ### 2. Problems Addressed
41
+ * Resource waste: Avoids using expensive, slow high-tier models for simple queries.
42
+ * Tool clutter: Avoids crowding too many unrelated tools into a single agent's context window.
43
+
44
+ ### 3. Workflow
45
+ ```mermaid
46
+ graph TD
47
+ Input[User Input] --> Router[Routing Classifier / LLM]
48
+ Router -->|Path A| AgentA[Specialized Agent A / Tool A]
49
+ Router -->|Path B| AgentB[Specialized Agent B / Tool B]
50
+ Router -->|Path C| AgentC[Specialized Agent C / Tool C]
51
+ ```
52
+
53
+ ### 4. Trade-offs
54
+ * **Pros**: High modularity; reduces average system latency and token consumption.
55
+ * **Cons**: Routing errors directly cause downstream task failures; an extra routing decision layer adds minor latency.
56
+
57
+ ### 5. Use Cases
58
+ * Customer support dispatching (e.g., routing to billing, tech support, or returns agents).
59
+ * Pre-filtering for tool calls.
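
A rule-based router for the customer-support use case might look like the Python sketch below. The keyword table, route names, and handlers are assumptions for illustration; a semantic-similarity or LLM classifier could replace `classify` without changing the dispatch code.

```python
from typing import Callable

Handler = Callable[[str], str]

# Illustrative keyword rules, checked in insertion order.
ROUTES: dict[str, tuple[str, ...]] = {
    "billing": ("invoice", "refund", "charge"),
    "tech_support": ("error", "crash", "login"),
}

HANDLERS: dict[str, Handler] = {
    "billing": lambda q: f"billing agent handles: {q}",
    "tech_support": lambda q: f"tech support agent handles: {q}",
    "general": lambda q: f"general agent handles: {q}",
}


def classify(query: str) -> str:
    """Return the first route whose keywords appear in the query, else 'general'."""
    text = query.lower()
    for route_name, keywords in ROUTES.items():
        if any(word in text for word in keywords):
            return route_name
    return "general"


def route(query: str) -> str:
    """Dispatch the query to the handler chosen by the classifier."""
    return HANDLERS[classify(query)](query)
```

The explicit `general` fallback limits the damage from a routing miss, which is the main risk listed under Trade-offs.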
60
+
61
+ ---
62
+
63
+ ## Chapter 3: Parallelization
64
+
65
+ ### 1. Definition
66
+ Splits a large task into multiple **independent subtasks** executed in parallel (Fork) and aggregates the results at a single point (Join).
67
+
68
+ ### 2. Problems Addressed
69
+ * Cumulative linear latency: Solves the high time cost associated with sequential multi-step execution.
70
+ * Single-perspective limitation: Collects diverse solutions to the same problem simultaneously for synthesis.
71
+
72
+ ### 3. Workflow
73
+ ```mermaid
74
+ graph TD
75
+ Input[Raw Query] --> Splitter[Splitter]
76
+ Splitter --> TaskA[Parallel Task A]
77
+ Splitter --> TaskB[Parallel Task B]
78
+ Splitter --> TaskC[Parallel Task C]
79
+ TaskA --> Syn[Synthesis / Aggregator]
80
+ TaskB --> Syn
81
+ TaskC --> Syn
82
+ Syn --> Output[Aggregated Output]
83
+ ```
84
+
85
+ ### 4. Trade-offs
86
+ * **Pros**: Significantly reduces elapsed time; suitable for large-scale parallel filtering.
87
+ * **Cons**: High spikes in token usage, easily triggering API rate limits; reconciling inconsistent results requires additional algorithms or LLM overhead.
88
+
89
+ ### 5. Use Cases
90
+ * Static code analysis (checking security, performance, and style simultaneously).
91
+ * Large-scale information retrieval and cross-document comparison.
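
The fork/join shape can be sketched with Python's standard `concurrent.futures` module, using the static code analysis use case. The three check functions are toy stand-ins invented for this example; in practice each would be an LLM or tool call.

```python
from concurrent.futures import ThreadPoolExecutor


def check_security(code: str) -> str:
    return "security: eval() found" if "eval(" in code else "security: ok"


def check_style(code: str) -> str:
    too_long = any(len(line) > 79 for line in code.splitlines())
    return "style: line too long" if too_long else "style: ok"


def check_performance(code: str) -> str:
    return "performance: ok"


def analyze(code: str) -> list[str]:
    checks = [check_security, check_style, check_performance]
    # Fork: submit every independent subtask at once.
    with ThreadPoolExecutor(max_workers=len(checks)) as pool:
        futures = [pool.submit(check, code) for check in checks]
        # Join: collect results in submission order for a stable report.
        return [future.result() for future in futures]


report = analyze("x = eval(input())")
```

With real model calls, capping `max_workers` is one simple way to stay under API rate limits, the main cost listed under Trade-offs.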
92
+
93
+ ---
94
+
95
+ ## Chapter 4: Reflection (Self-Correction)
96
+
97
+ ### 1. Definition
98
+ Introduces a dual-entity feedback mechanism: a Generator and a Critic. The Generator produces an initial draft, the Critic evaluates it for quality and provides feedback, and the Generator iteratively refines the output until termination conditions are met.
99
+
100
+ ### 2. Problems Addressed
101
+ * Unstable output quality: Prevents logical gaps, factual errors, or formatting anomalies.
102
+ * Overconfidence: Uses an independent critic to expose blind spots that single-pass generation misses.
103
+
104
+ ### 3. Workflow
105
+ ```mermaid
106
+ graph TD
107
+ Input[Task Goal] --> Gen[Generator]
108
+ Gen --> Draft[Initial Draft]
109
+ Draft --> Critic[Critic / Evaluator]
110
+ Critic --> Decision{Is Acceptable?}
111
+ Decision -->|No| Feedback[Feedback/Suggestions]
112
+ Feedback -->|Guide Correction| Gen
113
+ Decision -->|Yes| Output[Final Accepted Output]
114
+ ```
115
+
116
+ ### 4. Trade-offs
117
+ * **Pros**: Highly stable output quality, significantly reducing logical and formatting errors.
118
+ * **Cons**: Higher token consumption; extended execution time; potential for infinite loops if termination conditions are poorly defined.
119
+
120
+ ### 5. Use Cases
121
+ * Automated code generation and testing (write code -> run tests -> fix based on errors -> re-test).
122
+ * Strict compliance document drafting.
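
A bounded version of the Generator/Critic loop can be sketched in Python as follows. The function signatures are assumptions for illustration: `critique` returns `None` to accept a draft or a feedback string to request a revision, and `max_iterations` is the hard cap that prevents the infinite loop noted under Trade-offs.

```python
from typing import Callable, Optional


def reflect(
    generate: Callable[[str, Optional[str]], str],
    critique: Callable[[str], Optional[str]],
    task: str,
    max_iterations: int = 3,
) -> tuple[str, int]:
    """Run generate -> critique until accepted or the cap is reached.

    Returns the last draft and the number of generation rounds used.
    """
    feedback: Optional[str] = None
    draft = ""
    for round_no in range(1, max_iterations + 1):
        draft = generate(task, feedback)
        feedback = critique(draft)
        if feedback is None:  # critic accepted the draft
            return draft, round_no
    return draft, max_iterations  # cap reached: return the best effort


# Toy generator and critic so the loop can run without a model.
def toy_generate(task: str, feedback: Optional[str]) -> str:
    return task if feedback is None else task + "."


def toy_critique(draft: str) -> Optional[str]:
    return None if draft.endswith(".") else "End the sentence with a period."


final, rounds = reflect(toy_generate, toy_critique, "Agents need limits")
```

When the cap is reached without acceptance, a caller may prefer to escalate to a human reviewer instead of returning the last draft.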
123
+
124
+ ---
125
+
126
+ ## Chapter 5: Tool Use / Function Calling
127
+
128
+ ### 1. Definition
129
+ The LLM reads the schema (description format) of external tools, autonomously decides when to call a tool, and generates the call parameters. The agent runtime executes the tool in a sandbox or external system and feeds the result back to the LLM for interpretation.
130
+
131
+ ### 2. Problems Addressed
132
+ * Information lag: Connects the model to real-time data.
133
+ * Lack of computation: Solves difficulties in mathematics and precise logical operations.
134
+ * Inability to affect external systems: Allows agents to send emails, write to databases, or call APIs.
135
+
136
+ ### 3. Workflow
137
+ ```mermaid
138
+ sequenceDiagram
139
+ participant U as User
140
+ participant L as LLM (Core Reasoning)
141
+ participant A as Agent Execution Sandbox
142
+ participant T as External Tool / API
143
+ U->>L: Query
144
+ L->>L: Identify context, decide to use clock tool
145
+ L-->>A: Return tool name & structured parameters
146
+ A->>T: Call external API
147
+ T-->>A: Return real-time data
148
+ A-->>L: Send execution result back as context
149
+ L->>L: Synthesize and reason
150
+ L->>U: Respond to user
151
+ ```
152
+
153
+ ### 4. Trade-offs
154
+ * **Pros**: Greatly expands the action capabilities and data retrieval scope of the agent.
155
+ * **Cons**: Risk of parameter generation errors; security risks with external tools (requires strict sandboxing); vulnerability to external API instability.
156
+
157
+ ### 5. Use Cases
158
+ * Real-time data queries (weather, stock market, ERP systems).
159
+ * Data entry and control (sending notifications, database updates).
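
The agent-side half of this loop (validate the model's proposed call, execute it, return a result or an error to feed back) can be sketched in Python. The call format `{"name": ..., "arguments": {...}}`, the registry layout, and the two toy tools are assumptions for this example and do not reflect any specific vendor's function-calling API.

```python
import json
from typing import Any, Callable

# Hypothetical registry: tool name -> (callable, required parameter names).
TOOLS: dict[str, tuple[Callable[..., Any], tuple[str, ...]]] = {
    "add": (lambda a, b: a + b, ("a", "b")),
    "upper": (lambda text: text.upper(), ("text",)),
}


def execute_tool_call(raw_call: str) -> dict[str, Any]:
    """Validate and run a model-proposed call; never raise on bad model output."""
    try:
        call = json.loads(raw_call)
        name, args = call["name"], call["arguments"]
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        return {"ok": False, "error": f"malformed call: {exc}"}
    if not isinstance(name, str) or name not in TOOLS:
        return {"ok": False, "error": f"unknown tool: {name}"}
    func, required = TOOLS[name]
    if not isinstance(args, dict) or set(args) != set(required):
        return {"ok": False, "error": f"expected arguments {sorted(required)}"}
    return {"ok": True, "result": func(**args)}
```

Returning errors as data lets the LLM see what went wrong and regenerate its parameters, which addresses the parameter-generation risk listed under Trade-offs.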
160
+
161
+ ---
162
+
163
+ ## Chapter 6: Planning
164
+
165
+ ### 1. Definition
166
+ Decomposes a high-level goal into an ordered set of dependent execution steps. The planner dynamically rewrites the remaining steps (replanning) based on environmental feedback and new information to ensure the goal is reached.
167
+
168
+ ### 2. Problems Addressed
169
+ * Goal drift: Prevents the agent from losing sight of the ultimate goal during multi-step execution.
170
+ * Dynamic environment changes: Automatically searches for alternative solutions if a step fails.
171
+
172
+ ### 3. Workflow
173
+ ```mermaid
174
+ graph TD
175
+ Goal[Ultimate Goal] --> Planner[Planner: Task Decomposition]
176
+ Planner --> Plan[Generate Step List 1, 2, 3...]
177
+ Plan --> Executor[Executor: Call tools / sub-steps sequentially]
178
+ Executor --> EnvFeedback[Environmental Feedback]
179
+ EnvFeedback --> Checker{Encounter obstacles/failure?}
180
+ Checker -->|Yes| Replanner[Dynamic Replanner: Update plan]
181
+ Replanner --> Plan
182
+ Checker -->|No| Next{All steps completed?}
183
+ Next -->|No| Executor
184
+ Next -->|Yes| Output[Goal Accomplished]
185
+ ```
186
+
187
+ ### 4. Trade-offs
188
+ * **Pros**: Highly adaptable; capable of autonomously handling complex, unstructured tasks.
189
+ * **Cons**: Very high cost in LLM calls for planning and replanning; plan errors propagate, drifting downstream actions away from the target.
190
+
191
+ ### 5. Use Cases
192
+ * Autonomous research assistants (Deep Research: dynamically selecting keywords, assessing information quality, diving deep into unknown domains).
193
+ * Automated software development (architecture design -> module division -> sequential development).
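
The plan, execute, and replan cycle can be sketched in Python as a loop over the remaining steps. The `execute` and `replan` callables and the `max_replans` cap are assumptions for illustration; in a real system both callables would wrap LLM or tool calls.

```python
from typing import Callable


def run_plan(
    plan: list[str],
    execute: Callable[[str], bool],
    replan: Callable[[str, list[str]], list[str]],
    max_replans: int = 2,
) -> list[str]:
    """Execute steps in order; on failure, ask `replan` for new remaining steps.

    Returns the steps that completed. Raises RuntimeError once the
    replanning budget is exhausted.
    """
    remaining = list(plan)
    done: list[str] = []
    replans = 0
    while remaining:
        step = remaining.pop(0)
        if execute(step):
            done.append(step)
            continue
        if replans >= max_replans:
            raise RuntimeError(f"gave up at step: {step}")
        replans += 1
        remaining = replan(step, remaining)
    return done


# Toy run: the API step fails, and the replanner swaps in a cache lookup.
log = run_plan(
    ["fetch", "use_api", "summarize"],
    execute=lambda step: step != "use_api",
    replan=lambda failed, rest: ["use_cache", *rest],
)
```

The `max_replans` cap bounds the planning cost flagged under Trade-offs.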
194
+
195
+ ---
196
+
197
+ ## Chapter 7: Multi-Agent Collaboration
198
+
199
+ ### 1. Definition
200
+ Distributes a large task among multiple **specialized agents with distinct personas and skills**. These agents coordinate task handoffs, discussions, and integration through a predefined collaboration topology.
201
+
202
+ ### 2. Problems Addressed
203
+ * Cognitive limits of a single core: Avoids overloading a single system prompt with too many instructions and roles.
204
+ * Unclear division of labor: Emulates human teams by dedicating specialists to specific tasks.
205
+
206
+ ### 3. Workflow
207
+ Four main collaboration topologies:
208
+ * **Handoffs (Network)**: Agent A finishes its task and hands over the context and control to Agent B.
209
+ * **Supervisor**: A central Supervisor agent coordinates, assigns tasks to specialists, and aggregates results.
210
+ * **Hierarchy**: Supervisors oversee sub-supervisors, delegating and aggregating tasks hierarchically.
211
+ * **Blackboard**: Agents read and write to a shared state space (blackboard), intervening autonomously as the state changes.
212
+
213
+ ### 4. Trade-offs
214
+ * **Pros**: Modular and scalable; allows mixing different model sizes/strengths to optimize costs.
215
+ * **Cons**: High communication overhead (multi-turn dialogues between agents); complex state management; risk of infinite discussion loops or unclear ownership.
216
+
217
+ ### 5. Use Cases
218
+ * Simulated software development teams (Product Manager -> Architect -> Engineer -> QA).
219
+ * Creative content generation and peer review.
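
Of the four topologies, the Supervisor pattern is the simplest to sketch in Python: one coordinator delegates to named specialists and aggregates their results. The role names and lambda specialists below are toy assumptions; real specialists would be separate agents with their own prompts and tools.

```python
from typing import Callable

Specialist = Callable[[str], str]

# Hypothetical specialists keyed by role.
SPECIALISTS: dict[str, Specialist] = {
    "architect": lambda task: f"design for {task}",
    "engineer": lambda task: f"code for {task}",
    "qa": lambda task: f"tests for {task}",
}


def supervisor(task: str, roles: list[str]) -> dict[str, str]:
    """Delegate the task to each named specialist and collect results by role."""
    unknown = [role for role in roles if role not in SPECIALISTS]
    if unknown:
        raise ValueError(f"no specialist for roles: {unknown}")
    return {role: SPECIALISTS[role](task) for role in roles}
```

Keeping ownership explicit in one place (the `roles` list) is one way to avoid the unclear-ownership risk listed under Trade-offs.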
@@ -0,0 +1,105 @@
1
+ # Resilience, Exceptions & HITL
2
+
3
+ This document provides conceptual designs for system resilience, human interaction, and knowledge grounding, covering exception handling, Human-in-the-Loop (HITL) gates, and Retrieval-Augmented Generation (RAG).
4
+
5
+ ---
6
+
7
+ ## Chapter 12: Exception Handling and Recovery
8
+
9
+ ### 1. Definition
10
+ Designs automatic detection, retry, fallback, and state rollback mechanisms for exceptions that may occur during agent execution (such as API timeouts, network disconnections, LLM format errors, and invalid tool parameters).
11
+
12
+ ### 2. Problems Addressed
13
+ * System fragility: Prevents long-cycle workflows from breaking due to transient network or API issues.
14
+ * Format pollution: Guides the LLM to self-heal when its output does not conform to the expected JSON schema.
15
+
16
+ ### 3. Workflow
17
+ ```mermaid
18
+ graph TD
19
+ Step[Execute Tool / Call LLM] --> Success{Successful?}
20
+ Success -->|Yes| Next[Proceed to Next Step]
21
+ Success -->|No: Exception| Detector[Exception Detector]
22
+ Detector --> RuleCheck{Evaluate Exception Type}
23
+ RuleCheck -->|Network/Timeout| Retry[Auto Retry with Backoff]
24
+ RuleCheck -->|Format Error| Refine[Guide LLM to Self-Correct]
25
+ RuleCheck -->|Tool Failure| Fallback[Route to Fallback/Alternative Tool]
26
+ RuleCheck -->|Critical Error| Rollback[Rollback State to Checkpoint]
27
+ Retry --> Step
28
+ Refine --> Step
29
+ Fallback --> Step
30
+ Rollback --> UserEscalation[Human Intervention]
31
+ ```
32
+
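The routing in the diagram above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `FormatError`, `ToolFailure`, and `run_step` are hypothetical names, and the fallback branch returns the alternative tool's result directly rather than re-entering the step. Any other exception propagates to the caller, which is expected to roll back to a checkpoint and escalate.

```python
import random
import time
from typing import Callable, Optional


class FormatError(Exception):
    """Hypothetical: the LLM output did not match the expected schema."""


class ToolFailure(Exception):
    """Hypothetical: the primary tool could not complete the request."""


def run_step(
    step: Callable[[Optional[str]], str],
    fallback: Optional[Callable[[], str]] = None,
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run one step, routing each failure type to its own recovery path."""
    hint: Optional[str] = None  # correction hint passed back to the LLM
    for attempt in range(max_attempts):
        try:
            return step(hint)
        except (TimeoutError, ConnectionError):
            # Transient failure: exponential backoff plus jitter, then retry.
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
        except FormatError as exc:
            # Format error: retry with a hint so the model can self-correct.
            hint = f"Previous output was invalid: {exc}. Return valid JSON only."
        except ToolFailure:
            # Tool failure: switch to the alternative tool if one exists.
            if fallback is None:
                raise
            return fallback()
    # Attempts exhausted: the caller should roll back and escalate to a human.
    raise RuntimeError("step failed after retries; roll back and escalate")
```

Capping `max_attempts` matters here: without a ceiling, the retry and self-correction branches can loop indefinitely and hide the underlying bug.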
33
+ ### 4. Trade-offs
34
+ * **Pros**: Improves system robustness and reduces manual maintenance costs.
35
+ * **Cons**: Excessive retries or fallbacks can mask underlying bugs or quietly degrade output quality.
36
+
37
+ ---
38
+
39
+ ## Chapter 13: Human-in-the-Loop (HITL)
40
+
41
+ ### 1. Definition
42
+ Strategically embeds human review, intervention, and authorization mechanisms into the agent's autonomous decision-making workflow, combining human common sense, ethics, and legal judgment with AI automation.
43
+
44
+ ### 2. Problems Addressed
45
+ * High-risk operations: Prevents agent errors when performing large financial transactions, deleting sensitive data, or executing legally sensitive actions.
46
+ * Automation boundaries: Requests human guidance when decision confidence falls below a set threshold.
47
+
48
+ ### 3. Three Core Interaction Modes
49
+ ````carousel
50
+ ### 1. Human-in-the-Loop (HITL)
51
+ * **Mechanism**: The agent pauses when reaching a high-risk step (e.g., large bank transfer), suspends the task, and sends it to a pending review queue.
52
+ * **Workflow**: Agent pauses -> Human reviews (Approve/Reject/Modify) -> Agent receives input and resumes execution.
53
+ * **Key Characteristic**: Human approval is a mandatory gate.
54
+ <!-- slide -->
55
+ ### 2. Human-on-the-Loop (HOTL)
56
+ * **Mechanism**: The agent executes tasks autonomously while a human supervisor monitors and adjusts strategies.
57
+ * **Workflow**: Human sets macro rules (e.g., transaction limits) -> Agent trades automatically -> Human monitors metrics -> Human intervenes via a Kill Switch if necessary.
58
+ * **Key Characteristic**: Human does not intervene in individual decisions but maintains macro-level oversight.
59
+ <!-- slide -->
60
+ ### 3. Decision Augmentation
61
+ * **Mechanism**: The agent acts as an analytical assistant, gathering data and presenting candidates. Decision-making and execution are performed entirely by a human.
62
+ * **Workflow**: Human asks a question -> Agent collects and analyzes data -> Agent proposes options A, B, and C with pros/cons -> Human selects and executes.
63
+ * **Key Characteristic**: Agent provides cognitive augmentation without execution authority.
64
+ ````
65
+
66
+ ### 4. Trade-offs
67
+ * **Pros**: Provides a safety net and compliance guarantee for high-risk decisions; collects human feedback to optimize agent alignment.
68
+ * **Cons**: Human intervention limits system scalability and speed; designing human-in-the-loop review queues increases development costs.
69
+
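The HITL mode (mandatory approval gate) could look roughly like the sketch below. All names (`ReviewGate`, `PendingAction`, `Decision`) and the `risk_threshold` value are assumptions made for illustration; a real system would persist the queue and authenticate reviewers.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict


class Decision(Enum):
    APPROVE = "approve"
    REJECT = "reject"


@dataclass
class PendingAction:
    action_id: int
    description: str
    execute: Callable[[], str]


@dataclass
class ReviewGate:
    """Hypothetical gate: high-risk actions wait for a human decision."""
    risk_threshold: float = 0.7  # assumed cut-off, tune per domain
    pending: Dict[int, PendingAction] = field(default_factory=dict)
    next_id: int = 1

    def submit(self, description: str, risk: float, execute: Callable[[], str]) -> str:
        # Low-risk actions run immediately; high-risk ones are suspended.
        if risk < self.risk_threshold:
            return execute()
        action = PendingAction(self.next_id, description, execute)
        self.pending[action.action_id] = action
        self.next_id += 1
        return f"pending:{action.action_id}"

    def review(self, action_id: int, decision: Decision) -> str:
        # The human decision releases or discards the suspended action.
        action = self.pending.pop(action_id)
        if decision is Decision.APPROVE:
            return action.execute()
        return "rejected"
```

Usage: `gate.submit("transfer $50,000", 0.95, do_transfer)` returns a ticket such as `pending:1`, and nothing executes until a reviewer calls `gate.review(1, Decision.APPROVE)`.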
70
+ ---
71
+
72
+ ## Chapter 14: Knowledge Retrieval / RAG
73
+
74
+ ### 1. Definition
75
+ Retrieves relevant information from a knowledge base before the LLM generates a response, injecting the retrieved text chunks into the prompt context to guide the LLM toward producing factually grounded answers.
76
+
77
+ ### 2. Advanced Agentic RAG Variants
78
+ ```mermaid
79
+ graph TD
80
+ subgraph Traditional RAG
81
+ Query[User Query] --> VectorSearch[Vector Similarity Search]
82
+ VectorSearch --> Context[Concatenate Context Chunks]
83
+ Context --> LLMGen[LLM Generates Response]
84
+ end
85
+ subgraph Graph RAG
86
+ GQuery[User Query] --> GraphSearch[Navigate Knowledge Graph Nodes & Edges]
87
+ GraphSearch --> UnifiedContext[Cross-document Context Linkage]
88
+ end
89
+ subgraph Agentic RAG
90
+ AQuery[User Query] --> AgentLayer[Agent Decision Layer]
91
+ AgentLayer -->|1. Decompose Task| SubQueries[Multi-step Sub-retrieval Tasks]
92
+ AgentLayer -->|2. Self-Reflection| SourceVal[Source Timeliness & Quality Check]
93
+ AgentLayer -->|3. Resolve Conflicts| ConflictRecon[Active Conflict Reconciliation]
94
+ AgentLayer -->|4. Tool Call| WebSearch[Web Search for Knowledge Gaps]
95
+ end
96
+ ```
97
+
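As a rough illustration of the Traditional RAG path in the diagram (search, concatenate chunks, generate), the sketch below uses a toy bag-of-words vector and cosine similarity. This is an assumption made to keep the example self-contained: a real pipeline would use learned embeddings and a vector database, and the function names here are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[str]:
    # Rank every chunk by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, chunks: List[str], k: int = 2) -> str:
    # Inject the retrieved chunks into the prompt, numbered for citation.
    context = "\n".join(
        f"[{i + 1}] {c}" for i, c in enumerate(retrieve(query, chunks, k))
    )
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
```

Agentic RAG would wrap `retrieve` in a decision loop (decompose the query, check sources, search again on gaps) instead of calling it once.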
98
+ ### 3. Problems Addressed
99
+ * Outdated knowledge: Bypasses the temporal limits of static training data.
100
+ * Hallucination: Keeps the model within factual boundaries by grounding its answers in verified document context.
101
+ * Fragmented information: Addresses the weakness of plain vector search, which struggles to answer comprehensive questions that span multiple documents.
102
+
103
+ ### 4. Trade-offs
104
+ * **Pros**: Reduces factual errors; supports precise citations; incorporates private knowledge without retraining models.
105
+ * **Cons**: Highly sensitive to the quality of text chunking and embeddings; multi-step reasoning in Agentic RAG increases response latency.
@@ -0,0 +1,93 @@
1
+ # System Components & Protocols
2
+
3
+ This document provides conceptual designs for system architecture components, resources, and protocols, covering memory management, learning and adaptation, Model Context Protocol (MCP), and goal setting and monitoring.
4
+
5
+ ---
6
+
7
+ ## Chapter 8: Memory Management
8
+
9
+ ### 1. Definition
10
+ Provides agents with the ability to store and retrieve information across sessions and tasks through persistence mechanisms. The memory system is generally divided into short-term and long-term memory, managed by a unified Memory Service.
11
+
12
+ ### 2. Memory Classification
13
+ | Memory Type | Medium | Function | Eviction & Retrieval Mechanism |
14
+ | :--- | :--- | :--- | :--- |
15
+ | **Short-term** | Current Context Window | Stores current conversation context and task execution trajectory | Sliding window, context pruning, and summarization |
16
+ | **Long-term Semantic** | Vector Database / Knowledge Base | Retains factual knowledge, concepts, and external rules | Vector semantic retrieval based on user input |
17
+ | **Long-term Episodic** | Structured Database / Log Store | Records past task execution experiences and outcomes | Used for few-shot learning or similar scenario matching |
18
+ | **Long-term Procedural** | Codebase / Tool Definitions / Prompt Templates | Records Standard Operating Procedures (SOPs) and toolbox definitions for specific tasks | Dynamically loaded based on task type |
19
+
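The short-term row of the table (sliding window plus summarization) might be sketched as follows. `ShortTermMemory` and the default `join_summary` function are hypothetical; in practice the summarizer would be an LLM call rather than string concatenation.

```python
from collections import deque
from typing import Callable, Deque, List


def join_summary(old: str, evicted: str) -> str:
    # Placeholder summarizer (assumption): a real one would call an LLM.
    return evicted if not old else f"{old}; {evicted}"


class ShortTermMemory:
    """Keeps the last `max_turns` turns verbatim and summarizes the rest."""

    def __init__(
        self,
        max_turns: int = 4,
        summarize: Callable[[str, str], str] = join_summary,
    ) -> None:
        self.max_turns = max_turns
        self.turns: Deque[str] = deque()
        self.summary = ""
        self._summarize = summarize

    def add(self, turn: str) -> None:
        self.turns.append(turn)
        # Evict the oldest turns into the running summary.
        while len(self.turns) > self.max_turns:
            self.summary = self._summarize(self.summary, self.turns.popleft())

    def context(self) -> List[str]:
        # What gets sent to the model: summary first, then recent turns.
        prefix = [f"Summary of earlier turns: {self.summary}"] if self.summary else []
        return prefix + list(self.turns)
```

The window bounds the prompt size, while the summary preserves a compressed trace of what was evicted.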
20
+ ### 3. Problems Addressed
21
+ * Amnesia (Context limits): Prevents long conversations from causing the LLM to lose critical history.
22
+ * Repeated errors: Ensures the agent learns from past executions to improve decision success rates.
23
+
24
+ ---
25
+
26
+ ## Chapter 9: Learning and Adaptation
27
+
28
+ ### 1. Definition
29
+ Enables the agent to autonomously revise its prompts, or to modify its own execution code inside a sandbox (as in SICA, the Self-Improving Coding Agent), based on behavioral feedback and rewards collected from interactions with the environment, users, or other agents.
30
+
31
+ ### 2. Problems Addressed
32
+ * Static configuration lag: Solves the issue of agents failing to adjust when environmental rules change.
33
+ * High development cost: Reduces the manual effort of hand-tuning prompts.
34
+
35
+ ### 3. Workflow
36
+ ```mermaid
37
+ graph TD
38
+ Interaction[Agent-Environment Interaction] --> Result[Execution Results & Metrics]
39
+ Result --> Evaluator[Evaluator / Scoring System]
40
+ Evaluator -->|Feedback/Score| Learner[Learning Engine]
41
+ Learner -->|Self-Optimize Prompts or Refactor Code| AgentUpgrade[Upgraded Agent]
42
+ AgentUpgrade -->|Next Task Turn| Interaction
43
+ ```
44
+
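The loop above can be reduced to a propose, score, keep-the-best cycle. The sketch below is only one simple reading of it (a greedy hill-climb over prompt variants); `improve_prompt`, `propose`, and `score` are hypothetical names, and a real evaluator would score task outcomes rather than the prompt text itself.

```python
from typing import Callable, Tuple


def improve_prompt(
    prompt: str,
    propose: Callable[[str, float], str],
    score: Callable[[str], float],
    rounds: int = 3,
) -> Tuple[str, float]:
    """Greedy loop: keep a candidate only if the evaluator scores it higher."""
    best, best_score = prompt, score(prompt)
    for _ in range(rounds):
        # Learning engine: derive a new variant from the current best.
        candidate = propose(best, best_score)
        # Evaluator: measure the candidate before adopting it.
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score
```

Accepting only strict improvements limits harmful mutations, but it does not remove the need to sandbox and review any self-modified code or prompts.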
45
+ ### 4. Trade-offs
46
+ * **Pros**: High potential for long-term self-evolution; can discover high-quality logic not designed by humans in specific vertical disciplines (e.g., mathematical proofs, code generation).
47
+ * **Cons**: Unpredictable evolution paths, which may generate harmful mutations; self-modifying prompts can lead to privilege escalation or security vulnerabilities; extremely high overhead for training and testing iterations.
48
+
49
+ ---
50
+
51
+ ## Chapter 10: Model Context Protocol (MCP)
52
+
53
+ ### 1. Definition
54
+ A standardized **Client-Server communication protocol** that establishes a plug-and-play integration standard between LLMs/Agents (Clients) and external data sources, development tools, and API services (Servers). MCP standardizes three core types of context exchange: **Resources**, **Prompts**, and **Tools**.
55
+
56
+ ```mermaid
57
+ graph LR
58
+ subgraph Agentic Client
59
+ Agent[AI Agent / LLM]
60
+ end
61
+ subgraph MCP Server
62
+ Res[Resources: Files/Databases]
63
+ Pmt[Prompts: Templates]
64
+ Tls[Tools: APIs/Sandboxes]
65
+ end
66
+ Agent <-->|Standard JSON-RPC 2.0| MCP_Link[MCP Protocol Layer]
67
+ MCP_Link <--> Res
68
+ MCP_Link <--> Pmt
69
+ MCP_Link <--> Tls
70
+ ```
71
+
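To make the JSON-RPC 2.0 link in the diagram concrete, the sketch below builds and parses message envelopes in plain Python. It covers message shapes only: transport and the initialization handshake are omitted, the helper names are hypothetical, and the tool name `get_weather` is invented for the example. `tools/call` with `name` and `arguments` parameters follows the MCP specification.

```python
import itertools
import json
from typing import Any, Dict, Optional

_ids = itertools.count(1)  # monotonically increasing request ids


def jsonrpc_request(method: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request envelope."""
    message: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def parse_response(raw: str, expected_id: int) -> Any:
    """Return the result of a JSON-RPC 2.0 response or raise on error."""
    message = json.loads(raw)
    if message.get("jsonrpc") != "2.0" or message.get("id") != expected_id:
        raise ValueError("not a matching JSON-RPC 2.0 response")
    if "error" in message:
        err = message["error"]
        raise RuntimeError(f"server error {err.get('code')}: {err.get('message')}")
    return message["result"]


# Example: ask a server to run a (hypothetical) tool.
request = jsonrpc_request(
    "tools/call",
    {"name": "get_weather", "arguments": {"city": "Taipei"}},
)
```

Because every server speaks the same envelope, the client needs one code path like this instead of one wrapper per tool.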
72
+ ### 2. Problems Addressed
73
+ * Tedious integration: Avoids repeatedly writing custom wrapper code when developing new agents or integrating new tools.
74
+ * Fragmented context acquisition: Provides external data and actions to the model in a unified interface format.
75
+
76
+ ### 3. Trade-offs
77
+ * **Pros**: Reduces integration costs for multiple tools and data sources; decouples data sources from reasoning entities; supports dynamic discovery.
78
+ * **Cons**: Protocol serialization and JSON-RPC wrapping introduce minor performance overhead; requires tool providers to actively adopt the protocol.
79
+
80
+ ---
81
+
82
+ ## Chapter 11: Goal Setting and Monitoring
83
+
84
+ ### 1. Definition
85
+ Sets structured, quantifiable goals (following SMART principles) before the agent starts, and adds an independent monitor during execution that tracks progress at checkpoints in real time, detects blockers, and escalates to a human when necessary.
86
+
87
+ ### 2. Problems Addressed
88
+ * Blind execution: Prevents agents from wasting budget in infinite retry loops when they hit logical obstacles.
89
+ * Lack of observability: Solves the black-box execution problem, providing a clear progress path.
90
+
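An independent monitor of this kind can be sketched as a small state tracker. `GoalMonitor`, its fields, and the returned status strings are assumptions for illustration; progress is taken to be a single number reported at each checkpoint.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GoalMonitor:
    """Hypothetical monitor: tracks checkpoints and decides when to escalate."""
    target: float          # quantified goal, e.g. fraction of tests passing
    max_steps: int         # budget: maximum number of checkpoints
    stall_limit: int = 3   # checkpoints allowed without improvement
    history: List[float] = field(default_factory=list)

    def record(self, progress: float) -> str:
        self.history.append(progress)
        if progress >= self.target:
            return "done"
        if len(self.history) >= self.max_steps:
            return "escalate: budget exhausted"
        # Stalled if the last `stall_limit` checkpoints never beat
        # the checkpoint recorded just before them.
        recent = self.history[-(self.stall_limit + 1):]
        if len(recent) > self.stall_limit and max(recent[1:]) <= recent[0]:
            return "escalate: no progress"
        return "continue"
```

The agent loop calls `record` after each step and stops or hands off to a human as soon as the result is not `"continue"`.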
91
+ ### 3. Use Cases
92
+ * Automated marketing campaign execution.
93
+ * Long-cycle autonomous codebase refactoring.