agentme 0.16.0 → 0.17.0

@@ -82,7 +82,7 @@ Direct installation of project-required Go CLIs with `go install ...@latest` as
82
82
  - Outbound adapters live under `adapters/connectors/` with one subfolder per external resource, named descriptively (e.g., `postgres/`, `stripe-api/`, `redis-cache/`).
83
83
  - `shared/` must contain only infrastructure-agnostic utilities — not business rules or domain logic.
84
84
  - Packages are flat by default; sub-packages are only introduced when a feature package itself exceeds ~400 lines or has clearly separable sub-concerns.
85
- - Application MAY import from Adapters when it simplifies the design (pragmatic coupling per edr-021 rule 05).
85
+ - Application MAY import from Adapters when it simplifies the design (pragmatic coupling per edr-022 rule 05).
86
86
  - Consumer examples for reusable libraries belong in a sibling `examples/` folder and MUST import the public module path rather than reaching into internal source paths. Because Go libraries are not typically consumed from a local packaged artifact, local example validation may use a temporary module replacement for resolution, but the import path MUST remain the public module path.
87
87
 
88
88
  #### go.mod
@@ -1,6 +1,6 @@
1
1
  ---
2
2
  name: agentme-edr-policy-018-ai-llm-development-standards
3
- description: Defines the standard framework, provider configuration, observability approach, and LLM mocking patterns for simple LLM calls in Python. Use when building, reviewing, or scaffolding any code that makes direct LLM calls using LangChain, manages prompt context, or handles conversation history. For agentic patterns see agentme-edr-019, for workflow patterns see agentme-edr-020.
3
+ description: Defines the standard framework, provider configuration, observability approach, and LLM mocking patterns for simple LLM calls in Python. Use when building, reviewing, or scaffolding any code that makes direct LLM calls using LangChain, manages prompt context, or handles conversation history. For agentic patterns see agentme-edr-019, for workflow patterns see agentme-edr-021.
4
4
  apply-to: Python projects that make direct LLM calls, manage prompt context, or handle conversation threads
5
5
  valid-from: 2026-06-05
6
6
  ---
@@ -29,7 +29,7 @@ Three distinct tiers of LLM-based computation are recognized in this policy. Eve
29
29
 
30
30
  These tiers nest: in general, a Workflow may contain Agent nodes; an Agent uses LLM calls internally. The tier of a component is determined by its outermost controlling structure.
31
31
 
32
- See [agentme-edr-019](019-ai-agents-development-standards.md) for Agent implementation standards and [agentme-edr-020](020-ai-workflow-development-standards.md) for Workflow implementation standards.
32
+ See [agentme-edr-019](019-ai-agents-development-standards.md) for Agent implementation standards and [agentme-edr-021](021-ai-workflow-development-standards.md) for Workflow implementation standards.
33
33
 
34
34
  ### Details
35
35
 
@@ -174,8 +174,8 @@ prompt = PromptTemplate.from_file(
174
174
  ## References
175
175
 
176
176
  - [agentme-edr-019](019-ai-agents-development-standards.md) — Agent implementation standards (deepagents, tool-invocation loops)
177
- - [agentme-edr-020](020-ai-workflow-development-standards.md) — Workflow implementation standards (LangGraph, MLflow run-level tracking)
177
+ - [agentme-edr-021](021-ai-workflow-development-standards.md) — Workflow implementation standards (LangGraph, MLflow run-level tracking)
178
178
  - [agentme-edr-004](../principles/004-unit-test-requirements.md) — Unit test requirements including external API mocking guidance
179
179
  - [agentme-edr-014](014-python-project-tooling.md) — Python project tooling and structure
180
180
  - [agentme-edr-007](../principles/007-project-quality-standards.md) — Project quality standards including AI-tier testing requirements (rule `09-ai-project-testing-requirements`)
181
- - [agentme-edr-021](021-ai-eval-standards.md) — AI eval standards: folder structure, script requirements, and MLflow tracking
181
+ - [agentme-edr-028](028-ai-eval-standards.md) — AI eval standards: folder structure, script requirements, and MLflow tracking
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  name: agentme-edr-policy-019-ai-agents-development-standards
3
- description: Defines the standard framework and patterns for building AI agents with tool-invocation loops using the deepagents framework. Use when building agents where the LLM autonomously decides which tools to call and when to stop. For simple LLM calls see agentme-edr-018, for workflow orchestration see agentme-edr-020.
4
- apply-to: AI agent projects that use tool-invocation loops where the LLM decides which tools to call and when to stop
3
+ description: Defines the structural patterns and design decisions for building AI agents with tool-invocation loops using the deepagents framework: framework selection, sandbox setup, state naming, agent naming, composition patterns, and system prompt structure. Use when designing or scaffolding a new agent. For tool definitions, error handling, observability, and testing see agentme-edr-020. For simple LLM calls see agentme-edr-018, for workflow orchestration see agentme-edr-021.
4
+ apply-to: AI agent projects; consult when designing agent structure, choosing a sandbox approach, defining naming conventions, and composing multi-agent systems
5
5
  valid-from: 2026-06-05
6
6
  ---
7
7
 
@@ -9,9 +9,9 @@ valid-from: 2026-06-05
9
9
 
10
10
  ## Context and Problem Statement
11
11
 
12
- AI applications often need to give LLMs the ability to autonomously choose and invoke tools to accomplish tasks. Without standardized patterns for agent implementation, projects end up with incompatible approaches to tool definition, state management, and runtime environments.
12
+ AI applications often need to give LLMs the ability to autonomously choose and invoke tools to accomplish tasks. Without standardized structural patterns for agent design, projects end up with incompatible approaches to framework selection, sandbox setup, state management, naming, composition, and system prompts.
13
13
 
14
- Which framework should be used for building agents with tool-invocation loops, and what are the essential patterns for agent state, tools, and execution environments?
14
+ Which framework should be used for building agents, how should agents be sandboxed, named, and composed, and how should their system prompts be structured?
15
15
 
16
16
  ## Decision Outcome
17
17
 
@@ -87,84 +87,9 @@ def run_file_analysis_agent(input_files: List[Path]) -> AnalysisResult:
87
87
  **State type naming:**
88
88
 
89
89
  - Agent state types MUST end with `_agent_state` suffix (e.g., `file_analyzer_agent_state`)
90
- - Follow [agentme-edr-020](020-ai-workflow-development-standards.md) rule `11-state-type-conventions` when agents are used as workflow nodes
90
+ - Follow [agentme-edr-021](021-ai-workflow-development-standards.md) rule `11-state-type-conventions` when agents are used as workflow nodes
91
91
 
92
- #### 04-tool-definition-patterns
93
-
94
- Tools provided to agents MUST follow these patterns:
95
-
96
- **Tool signature:**
97
-
98
- ```python
99
- from typing import Any, Dict
100
-
101
- def tool_name(arg1: str, arg2: int) -> Dict[str, Any]:
102
- """
103
- Brief description of what the tool does.
104
-
105
- Args:
106
- arg1: Description of arg1
107
- arg2: Description of arg2
108
-
109
- Returns:
110
- Dictionary with tool execution results
111
- """
112
- # Tool implementation
113
- return {"status": "success", "result": ...}
114
- ```
115
-
116
- **Tool requirements:**
117
-
118
- - Tool names MUST be descriptive action verbs (e.g., `search_files`, `execute_command`, `read_document`)
119
- - Tool docstrings MUST clearly describe the tool's purpose, arguments, and return value (the LLM reads these)
120
- - Tools MUST return structured data (dictionaries or dataclasses), not bare strings or untyped values
121
- - Tools MUST handle errors gracefully and return error information in the result structure, not raise exceptions
122
- - Tools that interact with external systems MUST be placed in `adapters/connectors/` per [agentme-edr-026](026-pragmatic-hexagonal-architecture.md)
123
-
124
- **Error handling in tools:**
125
-
126
- ```python
127
- def search_files(pattern: str, directory: str = ".") -> Dict[str, Any]:
128
- """Search for files matching a glob pattern."""
129
- try:
130
- matches = list(Path(directory).glob(pattern))
131
- return {
132
- "status": "success",
133
- "matches": [str(m) for m in matches],
134
- "count": len(matches)
135
- }
136
- except Exception as e:
137
- return {
138
- "status": "error",
139
- "error_message": str(e),
140
- "error_type": type(e).__name__
141
- }
142
- ```
143
-
144
- #### 05-agent-error-handling-and-recovery
145
-
146
- Agents MUST implement robust error handling:
147
-
148
- **Maximum iteration limits:**
149
-
150
- - Every agent MUST have a maximum iteration limit to prevent infinite loops
151
- - The default maximum SHOULD be configurable and logged when reached
152
- - When the maximum is reached, the agent MUST return a structured failure result, not raise an exception
153
-
154
- **Tool failure handling:**
155
-
156
- - When a tool returns an error, the agent MUST be able to observe the error and decide on recovery actions
157
- - Tools MUST NOT raise exceptions for expected failures (network errors, file not found, etc.)
158
- - Agents MAY implement retry logic with exponential backoff for transient failures
159
-
160
- **Terminal states:**
161
-
162
- Agents MUST recognize and handle three terminal states:
163
- - **Success**: Goal achieved, task complete
164
- - **Failure**: Goal cannot be achieved, give up gracefully
165
- - **Timeout**: Maximum iterations reached, return partial results if possible
166
-
167
- #### 06-agent-naming-conventions
92
+ #### 04-agent-naming-conventions
168
93
 
169
94
  Agent class names MUST follow the pattern `<Purpose>Agent` where `<Purpose>` describes what the agent does:
170
95
 
@@ -179,93 +104,14 @@ Agent class names MUST follow the pattern `<Purpose>Agent` where `<Purpose>` des
179
104
  - `MyAgent` (not descriptive)
180
105
  - `Agent1` (numbered, not semantic)
181
106
 
182
- When agents are used as nodes in workflows, the node name MUST use the `_agent` suffix per [agentme-edr-020](020-ai-workflow-development-standards.md) rule `09-node-naming-conventions`.
183
-
184
- #### 07-agent-observability
185
-
186
- Agent execution MUST be observable through logging and tracing:
187
-
188
- - Log each iteration of the perceive → plan → act → observe cycle with iteration number and tool selection.
189
- - Use structured logging (JSON) with fields: `iteration`, `tool_selected`, `tool_result_status`, `decision`.
190
- - For LLM calls within agents, follow [agentme-edr-018](018-ai-llm-development-standards.md) rule `03-llm-observability`.
191
- - When agents run as workflow nodes, MLflow tracking from the parent workflow automatically captures agent-level traces.
192
- - The project Makefile MUST expose a `dev-mlflow` target to start a local MLflow tracking server for development inspection, per [agentme-edr-008](../devops/008-common-targets.md) rule `09-ai-project-dev-targets`.
107
+ When agents are used as nodes in workflows, the node name MUST use the `_agent` suffix per [agentme-edr-021](021-ai-workflow-development-standards.md) rule `09-node-naming-conventions`.
193
108
 
194
- **Example structured log entry:**
195
-
196
- ```json
197
- {
198
- "timestamp": "2026-06-05T10:30:45Z",
199
- "agent": "FileAnalyzerAgent",
200
- "iteration": 3,
201
- "tool_selected": "search_files",
202
- "tool_args": {"pattern": "*.py"},
203
- "tool_result_status": "success",
204
- "decision": "continue"
205
- }
206
- ```
207
-
208
- #### 08-agent-unit-testing
209
-
210
- Agent LLM calls are external API calls and MUST be mocked in unit tests per [agentme-edr-018](018-ai-llm-development-standards.md) rule `04-unit-test-mocking`.
211
-
212
- Because agents drive a tool-invocation loop — where the LLM decides which tools to call — the fake model must return **tool-call messages** followed by a final answer. Use **`GenericFakeChatModel`** for this:
213
-
214
- ```python
215
- from langchain_core.language_models.fake_chat_models import GenericFakeChatModel
216
- from langchain_core.messages import AIMessage
217
-
218
- def test_file_analyzer_agent_calls_search_then_stops():
219
- # Iteration 1: LLM requests a tool call
220
- tool_call_msg = AIMessage(
221
- content="",
222
- tool_calls=[{
223
- "name": "search_files",
224
- "args": {"pattern": "*.py", "directory": "/workspace"},
225
- "id": "call_1"
226
- }]
227
- )
228
- # Iteration 2: LLM produces a final answer after observing the tool result
229
- final_msg = AIMessage(content="Found 3 Python files matching the pattern.")
230
-
231
- fake_model = GenericFakeChatModel(messages=iter([tool_call_msg, final_msg]))
232
-
233
- agent = FileAnalyzerAgent(model=fake_model)
234
- result = agent.run(directory="/workspace")
235
-
236
- assert result.status == "success"
237
- assert "3 Python files" in result.summary
238
- ```
239
-
240
- Agents MUST be designed so that the LLM instance is injectable (constructor parameter) to allow test doubles. See [agentme-edr-018](018-ai-llm-development-standards.md) rule `04-unit-test-mocking` for the injectable LLM pattern.
241
-
242
- **`mock_deep_agent`**
243
-
244
- Place `mock_deep_agent` in a shared test utilities module (e.g., `tests/helpers.py`) so all test files that need it can import it from one location and mock deep_agent instances when needed.
245
-
246
- **Example usage:**
247
-
248
- ```python
249
- from tests.helpers import mock_deep_agent
250
-
251
- def test_workflow_calls_subagent(mocker):
252
- mock_deep_agent(
253
- mocker,
254
- "mypackage.nodes.analysis_node.create_workflow_agent",
255
- output={"status": "success", "findings": ["issue A"]}
256
- )
257
-
258
- result = run_analysis_workflow(input_data)
259
-
260
- assert result.findings == ["issue A"]
261
- ```
262
-
263
- #### 09-agent-composition
109
+ #### 05-agent-composition
264
110
 
265
111
  When multiple agents are needed:
266
112
 
267
113
  - **Single agent with multiple tools:** Use when tools share a common goal and context (e.g., a code analysis agent with `read_file`, `search_code`, and `analyze_pattern` tools).
268
- - **Multiple agents as workflow nodes:** Use when agents have distinct responsibilities and outputs that feed into each other. Orchestrate them using LangGraph per [agentme-edr-020](020-ai-workflow-development-standards.md).
114
+ - **Multiple agents as workflow nodes:** Use when agents have distinct responsibilities and outputs that feed into each other. Orchestrate them using LangGraph per [agentme-edr-021](021-ai-workflow-development-standards.md).
269
115
  - Do NOT create nested agent loops (agent calling agent autonomously). Use workflows for multi-agent orchestration.
270
116
 
271
117
  **Decision criteria:**
@@ -276,11 +122,74 @@ When multiple agents are needed:
276
122
  | Multiple workflow-orchestrated agents | Each agent has a distinct goal; outputs flow between agents; deterministic sequencing needed |
277
123
  | Nested agents (FORBIDDEN) | Never — always use workflow orchestration instead |
278
124
 
125
+ #### 06-agent-system-prompt-structure
126
+
127
+ Every agent system prompt MUST follow this XML-section template. Sections must appear in this order. Required sections must always be present; optional sections may be omitted when they genuinely do not apply; never reorder them.
128
+
129
+ ```xml
130
+ <OBJECTIVE>
131
+ [A one or two-sentence summary of the agent's main deliverable.
132
+ e.g.: Split the incoming file list into logical batches for parallel processing.]
133
+ </OBJECTIVE>
134
+
135
+ <ROLE>
136
+ [Defines who the agent is, its area of expertise, and its core persona.
137
+ If running inside a workflow, define exactly which node in WORKFLOW_CONTEXT this agent corresponds to.
138
+ e.g.: You are the batch_planning_agent (see WORKFLOW_CONTEXT). You are an expert at partitioning large file sets into balanced, directory-aware batches.]
139
+ </ROLE>
140
+
141
+ <INPUT>
142
+ [All inputs for this agent invocation. For standalone agents: list only the agent-specific inputs. For workflow agents: list workflow-level inputs shared across all agents first, then agent-specific inputs such as judge outcomes, counters, or intermediate results from upstream nodes.]
143
+ </INPUT>
144
+
145
+ <!-- Optional: include when the agent follows a non-trivial sequence of steps -->
146
+ <STEPS>
147
+ [Numbered list or chronological steps detailing how the agent should process an incoming request.
148
+ e.g.:
149
+ 1. Analyse the file list and group files by directory.
150
+ 2. Assign files to batches respecting the size constraints.
151
+ 3. Emit the JSON output described in OUTPUT_FORMAT.]
152
+ </STEPS>
153
+
154
+ <!-- Optional: include when tool use needs explicit guidance -->
155
+ <TOOL_GUIDANCE>
156
+ [Explicit instructions on when and how to use external tools, preventing the agent from guessing or using the wrong sequence.
157
+ e.g.: Do not call any tools. All reasoning is done in-context using the INPUT fields only.]
158
+ </TOOL_GUIDANCE>
159
+
160
+ <!-- Optional: include when hard constraints need to be stated explicitly -->
161
+ <GUARDRAILS>
162
+ [Hard, non-negotiable constraints the agent must never violate.
163
+ e.g.: NEVER create batches with fewer than 5 or more than 20 files. NEVER split files from the same directory across different batches unless unavoidable.]
164
+ </GUARDRAILS>
165
+
166
+ <OUTPUT_FORMAT>
167
+ [A templated example or JSON schema specifying exactly how the final response should look.
168
+ e.g.: Respond with a JSON object matching this schema: ...]
169
+ </OUTPUT_FORMAT>
170
+
171
+ <!-- Workflow-only: omit this section for standalone (non-workflow) agents -->
172
+ <WORKFLOW_CONTEXT>
173
+ [A detailed prose or diagram description of the overall workflow graph so the agent understands its role within the larger flow. Reference the specific node name that maps to this agent. Include enough detail about upstream and downstream nodes so the agent can reason about its context.]
174
+ </WORKFLOW_CONTEXT>
175
+ ```
176
+
177
+ **Rules:**
178
+
179
+ | Section | Required? | Notes |
+ |---|---|---|
+ | `<OBJECTIVE>` | Required | One or two sentences summarising the agent's main deliverable. |
+ | `<ROLE>` | Required | Agent persona and expertise. When inside a workflow, MUST reference its node name from `<WORKFLOW_CONTEXT>`. |
+ | `<INPUT>` | Required | List ALL inputs. For workflow agents: workflow-level inputs first, then agent-specific inputs. |
+ | `<STEPS>` | Optional | Include when the agent follows a non-trivial numbered sequence of steps. |
+ | `<TOOL_GUIDANCE>` | Optional | Include when tool use order or conditions need explicit direction. |
+ | `<GUARDRAILS>` | Optional | Hard constraints that must never be violated. |
+ | `<OUTPUT_FORMAT>` | Required | MUST include a concrete schema or templated example; do not leave it vague. |
+ | `<WORKFLOW_CONTEXT>` | Conditional | MUST be omitted for standalone agents. MUST be present when the agent runs as a node inside a LangGraph workflow. |
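
The section rules in the table above could be checked mechanically, for example in a unit test for prompt templates. The following is a minimal sketch under stated assumptions: `check_prompt_sections`, `SECTION_ORDER`, and `REQUIRED` are hypothetical names that do not come from the policy or from deepagents, and the check only looks at opening tags, so a prompt that mentions a tag literally inside prose would need a stricter parser.

```python
import re
from typing import List

# Section order and required set as listed in rule 06 (names here are illustrative).
SECTION_ORDER = [
    "OBJECTIVE", "ROLE", "INPUT", "STEPS",
    "TOOL_GUIDANCE", "GUARDRAILS", "OUTPUT_FORMAT", "WORKFLOW_CONTEXT",
]
REQUIRED = {"OBJECTIVE", "ROLE", "INPUT", "OUTPUT_FORMAT"}


def check_prompt_sections(prompt: str, in_workflow: bool) -> List[str]:
    """Return a list of problems; an empty list means the layout looks valid."""
    # Opening tags only: closing tags contain "/" and do not match this pattern.
    found = [name for name in re.findall(r"<([A-Z_]+)>", prompt) if name in SECTION_ORDER]
    problems: List[str] = []
    for name in sorted(REQUIRED - set(found)):
        problems.append(f"missing required section <{name}>")
    if in_workflow and "WORKFLOW_CONTEXT" not in found:
        problems.append("missing <WORKFLOW_CONTEXT> for a workflow agent")
    if not in_workflow and "WORKFLOW_CONTEXT" in found:
        problems.append("<WORKFLOW_CONTEXT> must be omitted for a standalone agent")
    # Sections must appear in the template order; optional ones may be skipped.
    positions = [SECTION_ORDER.index(name) for name in found]
    if positions != sorted(positions):
        problems.append("sections are out of order")
    return problems
```

A check like this only covers presence and ordering; whether `<OUTPUT_FORMAT>` contains a concrete schema still needs human review.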
189
+
279
190
  ## References
280
191
 
281
192
  - [agentme-edr-018](018-ai-llm-development-standards.md) — LLM development standards (LangChain configuration, mocking patterns)
282
- - [agentme-edr-020](020-ai-workflow-development-standards.md) — Workflow development standards (using agents as workflow nodes)
283
- - [agentme-edr-026](026-pragmatic-hexagonal-architecture.md) — Hexagonal architecture (tool placement in adapters/connectors)
193
+ - [agentme-edr-021](021-ai-workflow-development-standards.md) — Workflow development standards (using agents as workflow nodes)
194
+ - [agentme-edr-020](020-ai-agents-quality-standards.md) — Agent implementation quality standards (tool definitions, error handling, observability, unit testing)
284
195
  - [agentme-edr-014](014-python-project-tooling.md) — Python project tooling and structure
285
- - [agentme-edr-007](../principles/007-project-quality-standards.md) — Project quality standards including AI-tier testing requirements (rule `09-ai-project-testing-requirements`)
286
- - [agentme-edr-021](021-ai-eval-standards.md) — AI eval standards: folder structure, script requirements, and MLflow tracking
@@ -0,0 +1,182 @@
1
+ ---
2
+ name: agentme-edr-policy-020-ai-agents-quality-standards
3
+ description: Defines implementation quality standards for AI agents: tool definition patterns, error handling and recovery, observability, and unit testing. Apply alongside agentme-edr-019 when implementing or reviewing agent code. For agent architecture and structural decisions (framework, sandbox, naming, composition, system prompts) see agentme-edr-019.
4
+ apply-to: AI agent implementation code — apply when writing tools, error handlers, logging, and unit tests for agents
5
+ valid-from: 2026-06-09
6
+ ---
7
+
8
+ # agentme-edr-policy-020: AI agents quality standards
9
+
10
+ ## Context and Problem Statement
11
+
12
+ Beyond selecting the right framework and structural patterns, agent implementations require consistent standards for how tools are defined, how errors are handled, how execution is observed, and how agents are tested. Without these standards, agents become hard to debug, unreliable, and untestable.
13
+
14
+ How should agent tools be defined, what error handling must agents implement, how should agent execution be observed, and how should agents be unit tested?
15
+
16
+ ## Decision Outcome
17
+
18
+ **Agent implementations MUST follow the tool definition, error handling, observability, and unit testing standards defined here, alongside the structural decisions in [agentme-edr-019](019-ai-agents-development-standards.md).**
19
+
20
+ ### Details
21
+
22
+ #### 01-tool-definition-patterns
23
+
24
+ Tools provided to agents MUST follow these patterns:
25
+
26
+ **Tool signature:**
27
+
28
+ ```python
+ from typing import Any, Dict
+
+ def tool_name(arg1: str, arg2: int) -> Dict[str, Any]:
+     """
+     Brief description of what the tool does.
+
+     Args:
+         arg1: Description of arg1
+         arg2: Description of arg2
+
+     Returns:
+         Dictionary with tool execution results
+     """
+     # Tool implementation
+     return {"status": "success", "result": ...}
+ ```
45
+
46
+ **Tool requirements:**
47
+
48
+ - Tool names MUST be descriptive action verbs (e.g., `search_files`, `execute_command`, `read_document`)
49
+ - Tool docstrings MUST clearly describe the tool's purpose, arguments, and return value (the LLM reads these)
50
+ - Tools MUST return structured data (dictionaries or dataclasses), not bare strings or untyped values
51
+ - Tools MUST handle errors gracefully and return error information in the result structure, not raise exceptions
52
+ - Tools that interact with external systems MUST be placed in `adapters/connectors/` per [agentme-edr-026](026-pragmatic-hexagonal-architecture.md)
53
+
54
+ **Error handling in tools:**
55
+
56
+ ```python
+ from pathlib import Path
+ from typing import Any, Dict
+
+ def search_files(pattern: str, directory: str = ".") -> Dict[str, Any]:
+     """Search for files matching a glob pattern."""
+     try:
+         matches = list(Path(directory).glob(pattern))
+         return {
+             "status": "success",
+             "matches": [str(m) for m in matches],
+             "count": len(matches)
+         }
+     except Exception as e:
+         # Report the failure in the result structure instead of raising
+         return {
+             "status": "error",
+             "error_message": str(e),
+             "error_type": type(e).__name__
+         }
+ ```
73
+
74
+ #### 02-agent-error-handling-and-recovery
75
+
76
+ Agents MUST implement robust error handling:
77
+
78
+ **Maximum iteration limits:**
79
+
80
+ - Every agent MUST have a maximum iteration limit to prevent infinite loops
81
+ - The default maximum SHOULD be configurable and logged when reached
82
+ - When the maximum is reached, the agent MUST return a structured failure result, not raise an exception
83
+
84
+ **Tool failure handling:**
85
+
86
+ - When a tool returns an error, the agent MUST be able to observe the error and decide on recovery actions
87
+ - Tools MUST NOT raise exceptions for expected failures (network errors, file not found, etc.)
88
+ - Agents MAY implement retry logic with exponential backoff for transient failures
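
As a rough illustration of the optional retry rule above, the sketch below retries a tool that reports failures in its result structure rather than raising. The names `call_with_backoff`, `is_transient`, `TRANSIENT_ERRORS`, and the injectable `sleep` parameter are hypothetical and are not part of this policy or of deepagents; the result shape follows the `status` / `error_type` convention from rule `01-tool-definition-patterns`.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical set of error types treated as transient; adjust per project.
TRANSIENT_ERRORS = {"TimeoutError", "ConnectionError"}


def is_transient(result: Dict[str, Any]) -> bool:
    """Return True when a tool result describes a failure worth retrying."""
    return result.get("status") == "error" and result.get("error_type") in TRANSIENT_ERRORS


def call_with_backoff(
    tool: Callable[[], Dict[str, Any]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call `tool`, waiting base_delay * 2**attempt between transient failures."""
    result: Dict[str, Any] = {
        "status": "error",
        "error_type": "NotCalled",
        "error_message": "max_attempts must be at least 1",
    }
    for attempt in range(max_attempts):
        result = tool()
        if not is_transient(result):
            return result  # success, or a failure that retrying will not fix
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return result  # last transient failure, left for the agent to observe
```

Passing `sleep` as a parameter keeps unit tests fast, since a test can record the requested delays instead of actually waiting.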
89
+
90
+ **Terminal states:**
91
+
92
+ Agents MUST recognize and handle three terminal states:
93
+ - **Success**: Goal achieved, task complete
94
+ - **Failure**: Goal cannot be achieved, give up gracefully
95
+ - **Timeout**: Maximum iterations reached, return partial results if possible
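
The iteration limit and the three terminal states could be combined along the following lines. This is only a sketch under assumptions: `AgentResult`, `run_agent_loop`, and the shape of the `step` callback are hypothetical stand-ins for one perceive/plan/act/observe cycle and do not reflect the deepagents API.

```python
import logging
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

logger = logging.getLogger(__name__)


@dataclass
class AgentResult:
    """Hypothetical structured result covering the three terminal states."""
    status: str  # "success", "failure", or "timeout"
    output: Optional[Any] = None
    partial_results: List[Any] = field(default_factory=list)


def run_agent_loop(
    step: Callable[[int], Dict[str, Any]],
    max_iterations: int = 10,
) -> AgentResult:
    """Drive `step` until it reports a terminal state or the limit is reached.

    `step` is assumed to return {"done": bool, "ok": bool, "value": Any}.
    """
    partial: List[Any] = []
    for iteration in range(1, max_iterations + 1):
        outcome = step(iteration)
        if outcome.get("done"):
            if outcome.get("ok"):
                return AgentResult("success", outcome.get("value"), partial)
            return AgentResult("failure", None, partial)
        partial.append(outcome.get("value"))
    # Limit reached: log it and return a structured result instead of raising.
    logger.warning("max_iterations=%d reached", max_iterations)
    return AgentResult("timeout", None, partial)
```

Callers can then branch on `result.status` without wrapping the agent call in exception handling.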
96
+
97
+ #### 03-agent-observability
98
+
99
+ Agent execution MUST be observable through logging and tracing:
100
+
101
+ - Log each iteration of the perceive → plan → act → observe cycle with iteration number and tool selection.
102
+ - Use structured logging (JSON) with fields: `iteration`, `tool_selected`, `tool_result_status`, `decision`.
103
+ - For LLM calls within agents, follow [agentme-edr-018](018-ai-llm-development-standards.md) rule `03-llm-observability`.
104
+ - When agents run as workflow nodes, MLflow tracking from the parent workflow automatically captures agent-level traces.
105
+ - The project Makefile MUST expose a `dev-mlflow` target to start a local MLflow tracking server for development inspection, per [agentme-edr-008](../devops/008-common-targets.md) rule `09-ai-project-dev-targets`.
106
+
107
+ **Example structured log entry:**
108
+
109
+ ```json
+ {
+   "timestamp": "2026-06-05T10:30:45Z",
+   "agent": "FileAnalyzerAgent",
+   "iteration": 3,
+   "tool_selected": "search_files",
+   "tool_args": {"pattern": "*.py"},
+   "tool_result_status": "success",
+   "decision": "continue"
+ }
+ ```
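
One way to produce an entry of this shape with only the Python standard library is sketched below. The helper names `build_iteration_record` and `log_iteration` and the logger name `agent` are assumptions made for illustration; the policy specifies the fields, not how they are emitted.

```python
import json
import logging
from datetime import datetime, timezone
from typing import Any, Dict

logger = logging.getLogger("agent")  # hypothetical logger name


def build_iteration_record(
    agent: str,
    iteration: int,
    tool_selected: str,
    tool_args: Dict[str, Any],
    tool_result_status: str,
    decision: str,
) -> Dict[str, Any]:
    """Assemble one log record with the fields listed in this rule."""
    return {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "agent": agent,
        "iteration": iteration,
        "tool_selected": tool_selected,
        "tool_args": tool_args,
        "tool_result_status": tool_result_status,
        "decision": decision,
    }


def log_iteration(**fields: Any) -> str:
    """Serialize the record as one JSON line, log it, and return the line."""
    line = json.dumps(build_iteration_record(**fields))
    logger.info(line)
    return line
```

Calling `log_iteration(...)` once per loop iteration yields one JSON line per cycle, which keeps the output easy to filter by `agent` or `iteration`.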
120
+
121
+ #### 04-agent-unit-testing
122
+
123
+ Agent LLM calls are external API calls and MUST be mocked in unit tests per [agentme-edr-018](018-ai-llm-development-standards.md) rule `04-unit-test-mocking`.
124
+
125
+ Because agents drive a tool-invocation loop — where the LLM decides which tools to call — the fake model must return **tool-call messages** followed by a final answer. Use **`GenericFakeChatModel`** for this:
126
+
127
+ ```python
+ from langchain_core.language_models.fake_chat_models import GenericFakeChatModel
+ from langchain_core.messages import AIMessage
+
+ def test_file_analyzer_agent_calls_search_then_stops():
+     # Iteration 1: LLM requests a tool call
+     tool_call_msg = AIMessage(
+         content="",
+         tool_calls=[{
+             "name": "search_files",
+             "args": {"pattern": "*.py", "directory": "/workspace"},
+             "id": "call_1"
+         }]
+     )
+     # Iteration 2: LLM produces a final answer after observing the tool result
+     final_msg = AIMessage(content="Found 3 Python files matching the pattern.")
+
+     fake_model = GenericFakeChatModel(messages=iter([tool_call_msg, final_msg]))
+
+     agent = FileAnalyzerAgent(model=fake_model)
+     result = agent.run(directory="/workspace")
+
+     assert result.status == "success"
+     assert "3 Python files" in result.summary
+ ```
152
+
153
+ Agents MUST be designed so that the LLM instance is injectable (constructor parameter) to allow test doubles. See [agentme-edr-018](018-ai-llm-development-standards.md) rule `04-unit-test-mocking` for the injectable LLM pattern.
154
+
155
+ **`mock_deep_agent`**
156
+
157
+ Place `mock_deep_agent` in a shared test utilities module (e.g., `tests/helpers.py`) so all test files that need it can import it from one location and mock deep_agent instances when needed.
158
+
159
+ **Example usage:**
160
+
161
+ ```python
+ from tests.helpers import mock_deep_agent
+
+ def test_workflow_calls_subagent(mocker):
+     mock_deep_agent(
+         mocker,
+         "mypackage.nodes.analysis_node.create_workflow_agent",
+         output={"status": "success", "findings": ["issue A"]}
+     )
+
+     result = run_analysis_workflow(input_data)
+
+     assert result.findings == ["issue A"]
+ ```
175
+
176
+ ## References
177
+
178
+ - [agentme-edr-019](019-ai-agents-development-standards.md) — Agent development standards (framework, sandbox, naming, composition, system prompts)
179
+ - [agentme-edr-018](018-ai-llm-development-standards.md) — LLM development standards (LangChain configuration, mocking patterns)
180
+ - [agentme-edr-026](026-pragmatic-hexagonal-architecture.md) — Hexagonal architecture (tool placement in adapters/connectors)
181
+ - [agentme-edr-007](../principles/007-project-quality-standards.md) — Project quality standards including AI-tier testing requirements (rule `09-ai-project-testing-requirements`)
182
+ - [agentme-edr-028](028-ai-eval-standards.md) — AI eval standards: folder structure, script requirements, and MLflow tracking
@@ -1,11 +1,11 @@
1
1
  ---
2
- name: agentme-edr-policy-020-ai-workflow-development-standards
2
+ name: agentme-edr-policy-021-ai-workflow-development-standards
3
3
  description: Defines the standard toolchain, framework, observability, and workflow patterns for building LangGraph workflows in Python. Use when scaffolding, reviewing, or extending AI workflow projects that orchestrate LLM calls, agents, and algorithmic nodes. For simple LLM calls see agentme-edr-018, for agentic patterns see agentme-edr-019.
4
4
  apply-to: AI workflow projects using LangGraph StateGraph built with Python
5
5
  valid-from: 2026-06-05
6
6
  ---
7
7
 
8
- # agentme-edr-policy-020: AI workflow development standards
8
+ # agentme-edr-policy-021: AI workflow development standards
9
9
 
10
10
  ## Context and Problem Statement
11
11
 
@@ -37,7 +37,7 @@ Use **MLflow** for all workflow observability and evaluation:
37
37
 
38
38
  #### 04-dataset-driven-accuracy-measurement
39
39
 
40
- Eval dataset and implementation requirements are defined in [agentme-edr-021](021-ai-eval-standards.md). Testing requirements (when evals are required, release gates) are defined in [agentme-edr-007](../principles/007-project-quality-standards.md) rule `09-ai-project-testing-requirements`.
40
+ Eval dataset and implementation requirements are defined in [agentme-edr-028](028-ai-eval-standards.md). Testing requirements (when evals are required, release gates) are defined in [agentme-edr-007](../principles/007-project-quality-standards.md) rule `09-ai-project-testing-requirements`.
41
41
 
42
42
  #### 05-flow-documentation
43
43
 
@@ -101,7 +101,7 @@ lib/src/<package_name>/
101
101
 
102
102
  #### 08-workflow-evals
103
103
 
104
- Eval folder structure and script requirements are defined in [agentme-edr-021](021-ai-eval-standards.md).
104
+ Eval folder structure and script requirements are defined in [agentme-edr-028](028-ai-eval-standards.md).
105
105
 
106
106
  #### 09-node-naming-conventions
107
107
 
@@ -258,5 +258,5 @@ result = graph.invoke(input_state, config={"thread_id": "session-123"})
  - [agentme-edr-026](026-pragmatic-hexagonal-architecture.md) — Adapter/application layer separation that defines the project layout
  - [agentme-edr-014](014-python-project-tooling.md) — Python project tooling and structure
  - [agentme-edr-024](024-ml-dataset-structure.md) — ML dataset structure for eval datasets
- - [agentme-edr-021](021-ai-eval-standards.md) — AI eval standards: folder structure, script requirements, and MLflow tracking
+ - [agentme-edr-028](028-ai-eval-standards.md) — AI eval standards: folder structure, script requirements, and MLflow tracking
  - [agentme-edr-007](../principles/007-project-quality-standards.md) — Project quality standards including AI-tier testing requirements (rule `09-ai-project-testing-requirements`)
@@ -17,7 +17,7 @@ How should an AI agent project integrate XDRS as its runtime source of truth for
 
  **Embed XDRS documents in `lib/data/.xdrs/`, instruct the agent to consult them via `AGENTS.md`, equip the agent with sandboxed file tools, and use the deepagents framework when a local sandbox is required.**
 
- This policy MUST only be applied when the project explicitly chooses XDRS as its knowledge governance layer. It is not required by [agentme-edr-019](019-ai-agents-development-standards.md) or [agentme-edr-020](020-ai-workflow-development-standards.md) in general.
+ This policy MUST only be applied when the project explicitly chooses XDRS as its knowledge governance layer. It is not required by [agentme-edr-019](019-ai-agents-development-standards.md) or [agentme-edr-021](021-ai-workflow-development-standards.md) in general.
 
  ### Details
 
@@ -59,7 +59,7 @@ Read /AGENTS.md and follow all instructions in it before proceeding.
 
  #### 02-agent-file-tools
 
- Every agent that uses the XDRS knowledge layer MUST use the file tools provided by the deepagents framework. Do not implement hand-rolled alternatives — see [agentme-edr-policy-018-ai-agent-development-standards.[09-local-sandbox]](018-ai-agent-development-standards.md) for the full sandbox and tool requirements.
+ Every agent that uses the XDRS knowledge layer MUST use the file tools provided by the deepagents framework. Do not implement hand-rolled alternatives — see [agentme-edr-019 rule 02-local-sandbox](019-ai-agents-development-standards.md) for the full sandbox and tool requirements.
 
  These tools operate over two sandboxed roots (configured in rule `03-local-sandbox`):
 
@@ -72,7 +72,7 @@ These tools operate over two sandboxed roots (configured in rule `03-local-sandb
 
  #### 03-local-sandbox
 
- Follow [agentme-edr-policy-018-ai-agent-development-standards.[09-local-sandbox]](018-ai-agent-development-standards.md) for the general deepagents sandbox setup. When XDRS is in use, add the following mounts to the sandbox configuration:
+ Follow [agentme-edr-019 rule 02-local-sandbox](019-ai-agents-development-standards.md) for the general deepagents sandbox setup. When XDRS is in use, add the following mounts to the sandbox configuration:
 
  | Source | Content | Deepagents sandbox path |
  |---|---|---|
@@ -92,8 +92,13 @@ agents_md = Path(temp_root) / "AGENTS.md"
  agents_md.write_text(_AGENTS_MD) # content from xdrs-core AGENTS.md template; see rule 01-xdrs-knowledge-layer
 
  # Add these mounts alongside the base mounts from agentme-edr-019 rule 02-local-sandbox:
- xdrs_mounts = [
-     {"src": f"{data_root}/.xdrs", "dst": "/.xdrs", "readonly": True},
-     {"src": str(agents_md), "dst": "/AGENTS.md", "readonly": True},
- ]
+ # (mount_paths uses {src: dst} dict format per agentme-edr-019)
+ sandbox = Sandbox(
+     mount_paths={
+         tmp_dir: "/workspace",
+         f"{data_root}/.xdrs": "/.xdrs",
+         str(agents_md): "/AGENTS.md",
+     },
+     virtual_mode=True,
+ )
  ```
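Note on the hunk above: the mount configuration moves from a list of `{"src", "dst", "readonly"}` entries to a single `{src: dst}` mapping passed as `mount_paths`. The sketch below shows one way existing list-style mounts could be converted to the new shape. The helper name `to_mount_paths`, the `LegacyMount` type, and the sample paths are hypothetical and are not part of agentme or deepagents. The per-mount `readonly` flag has no counterpart in the mapping, and this diff does not show how read-only access is expressed in the new format, so the sketch simply drops it.

```python
from typing import Dict, List, TypedDict


class LegacyMount(TypedDict):
    """Hypothetical type for the old list-style mount entry."""
    src: str
    dst: str
    readonly: bool


def to_mount_paths(mounts: List[LegacyMount]) -> Dict[str, str]:
    """Convert list-style mounts into a {src: dst} mapping.

    The readonly flag is intentionally ignored (assumption: the mapping
    format has no per-mount flag).
    """
    mount_paths: Dict[str, str] = {}
    for mount in mounts:
        src, dst = mount["src"], mount["dst"]
        # A mapping can hold only one destination per source path.
        if src in mount_paths and mount_paths[src] != dst:
            raise ValueError(f"conflicting destinations for {src!r}")
        mount_paths[src] = dst
    return mount_paths


# Illustrative paths only; real values come from data_root and agents_md.
legacy = [
    {"src": "/data/.xdrs", "dst": "/.xdrs", "readonly": True},
    {"src": "/tmp/agent/AGENTS.md", "dst": "/AGENTS.md", "readonly": True},
]
mount_paths = to_mount_paths(legacy)
print(mount_paths)
```

The resulting dictionary could then be merged with the base `/workspace` mount before being passed as `mount_paths`, as the added lines in the hunk do inline.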
@@ -1,11 +1,11 @@
  ---
- name: agentme-edr-policy-021-ai-eval-standards
+ name: agentme-edr-policy-028-ai-eval-standards
  description: Defines how to structure, write, and run eval tests for AI projects — folder layout, script requirements, and MLflow tracking. Use when implementing evals for LLM, Agent, or Workflow projects. For when evals are required see agentme-edr-007 rule 09-ai-project-testing-requirements.
  apply-to: Python AI projects (LLM, Agent, or Workflow tier) that implement eval testing
  valid-from: 2026-06-05
  ---
 
- # agentme-edr-policy-021: AI eval standards
+ # agentme-edr-policy-028: AI eval standards
 
  ## Context and Problem Statement
 
@@ -86,5 +86,6 @@ with mlflow.start_run():
  - [agentme-edr-007](../principles/007-project-quality-standards.md) — Project quality standards: when evals are required per AI tier (rule `09-ai-project-testing-requirements`) and statistical model eval targets (rule `07-statistical-models-must-have-eval-targets`)
  - [agentme-edr-018](018-ai-llm-development-standards.md) — LLM development standards: LangChain framework and observability
  - [agentme-edr-019](019-ai-agents-development-standards.md) — Agent development standards
- - [agentme-edr-020](020-ai-workflow-development-standards.md) — Workflow development standards
+ - [agentme-edr-021](021-ai-workflow-development-standards.md) — Workflow development standards
+
  - [agentme-edr-024](024-ml-dataset-structure.md) — ML dataset structure for eval datasets
@@ -32,9 +32,10 @@ Language and framework-specific tooling and project structure.
  - [agentme-edr-014](application/014-python-project-tooling.md) - **Python project tooling and structure** - Scaffold Python packages and CLIs with the standard layout *(includes skill: [005-create-python-project](application/skills/005-create-python-project/SKILL.md))*
  - [agentme-edr-015](application/015-cli-tool-standards.md) - **CLI tool standards** - Define command UX and behavior for CLI tools
  - [agentme-edr-018](application/018-ai-llm-development-standards.md) - **AI LLM development standards** - Standard framework (LangChain) and patterns for simple LLM calls with explicit configuration (no environment variables)
- - [agentme-edr-019](application/019-ai-agents-development-standards.md) - **AI agents development standards** - Standard framework (deepagents) and patterns for agentic tool-invocation loops
- - [agentme-edr-020](application/020-ai-workflow-development-standards.md) - **AI workflow development standards** - Standard toolchain (LangGraph), evaluation, and testing patterns for workflow projects
- - [agentme-edr-021](application/021-ai-eval-standards.md) - **AI eval standards** - Folder structure, script requirements, and MLflow tracking for eval tests across LLM, Agent, and Workflow tiers
+ - [agentme-edr-019](application/019-ai-agents-development-standards.md) - **AI agents development standards** - Structural patterns for agents: framework selection, sandbox setup, naming conventions, composition, and system prompt structure
+ - [agentme-edr-020](application/020-ai-agents-quality-standards.md) - **AI agents implementation quality standards** - Tool definition patterns, error handling, observability, and unit testing for agents
+ - [agentme-edr-021](application/021-ai-workflow-development-standards.md) - **AI workflow development standards** - Standard toolchain (LangGraph), evaluation, and testing patterns for workflow projects
+ - [agentme-edr-028](application/028-ai-eval-standards.md) - **AI eval standards** - Folder structure, script requirements, and MLflow tracking for eval tests across LLM, Agent, and Workflow tiers
  - [agentme-edr-024](application/024-ml-dataset-structure.md) - **ML dataset structure** - Standard folder layout and file conventions for ML datasets
  - [agentme-edr-025](application/025-ai-agent-xdrs-knowledge-layer.md) - **AI agent XDRS knowledge layer** - How to integrate XDRS as the runtime source of truth for policies and skills in AI agents (apply only when the project explicitly uses XDRS)
  - [agentme-edr-026](application/026-pragmatic-hexagonal-architecture.md) - **Pragmatic hexagonal architecture** - Organize application layers as External/Adapters/Application with practical coupling rules
@@ -244,7 +244,7 @@ AI projects are classified into three tiers — LLM, Agent, and Workflow — def
  |---|---|---|---|
  | **LLM** ([agentme-edr-018](../application/018-ai-llm-development-standards.md)) | Not required | Not required; SHOULD be used when critical prompts are in use to measure accuracy and detect model drift | Not required |
  | **Agent** ([agentme-edr-019](../application/019-ai-agents-development-standards.md)) | Not required | Not required; MAY be used | Not required |
- | **Workflow** ([agentme-edr-020](../application/020-ai-workflow-development-standards.md)) | **Required** — see below | **Required** before every release; failed evals block release | Advised |
+ | **Workflow** ([agentme-edr-021](../application/021-ai-workflow-development-standards.md)) | **Required** — see below | **Required** before every release; failed evals block release | Advised |
 
  **Workflow unit test requirements:**
 
@@ -260,4 +260,4 @@ AI projects are classified into three tiers — LLM, Agent, and Workflow — def
  - Evals MUST be executed before every release.
  - Accuracy below project-defined thresholds MUST block the release. Thresholds MUST be documented in the eval Makefile or README.
  - Evals MUST run against real LLM providers (not mocks) to capture model drift.
- - For eval folder structure and script requirements, see [agentme-edr-021](../application/021-ai-eval-standards.md).
+ - For eval folder structure and script requirements, see [agentme-edr-028](../application/028-ai-eval-standards.md).
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "agentme",
- "version": "0.16.0",
+ "version": "0.17.0",
  "description": "",
  "dependencies": {
  "filedist": "^0.35.0"