agentme 0.14.0 → 0.15.0

Files changed (20)
  1. package/.filedist-package.yml +1 -1
  2. package/.xdrs/agentme/edrs/application/003-javascript-project-tooling.md +3 -3
  3. package/.xdrs/agentme/edrs/application/010-golang-project-tooling.md +3 -3
  4. package/.xdrs/agentme/edrs/application/014-python-project-tooling.md +2 -2
  5. package/.xdrs/agentme/edrs/application/015-cli-tool-standards.md +2 -2
  6. package/.xdrs/agentme/edrs/application/018-ai-llm-development-standards.md +180 -0
  7. package/.xdrs/agentme/edrs/application/019-ai-agents-development-standards.md +284 -0
  8. package/.xdrs/agentme/edrs/application/020-ai-workflow-development-standards.md +261 -0
  9. package/.xdrs/agentme/edrs/application/021-ai-eval-standards.md +90 -0
  10. package/.xdrs/agentme/edrs/application/{019-ml-dataset-structure.md → 024-ml-dataset-structure.md} +2 -2
  11. package/.xdrs/agentme/edrs/application/{020-ai-agent-xdrs-knowledge-layer.md → 025-ai-agent-xdrs-knowledge-layer.md} +4 -4
  12. package/.xdrs/agentme/edrs/application/{021-pragmatic-hexagonal-architecture.md → 026-pragmatic-hexagonal-architecture.md} +2 -2
  13. package/.xdrs/agentme/edrs/application/skills/001-create-javascript-project/SKILL.md +2 -2
  14. package/.xdrs/agentme/edrs/application/skills/003-create-golang-project/SKILL.md +2 -2
  15. package/.xdrs/agentme/edrs/application/skills/005-create-python-project/SKILL.md +3 -3
  16. package/.xdrs/agentme/edrs/index.md +7 -5
  17. package/.xdrs/agentme/edrs/principles/007-project-quality-standards.md +29 -1
  18. package/package.json +3 -3
  19. package/.xdrs/agentme/edrs/application/018-ai-agent-development-standards.md +0 -309
  20. package/.xdrs/agentme/edrs/application/024-llm-development-standards.md +0 -116
@@ -1,5 +1,5 @@
1
1
  sets:
2
- - package: xdrs-core@0.28.1
2
+ - package: xdrs-core@0.28.3
3
3
  # - package: git:https://github.com/flaviostutz/xdrs-core.git@main
4
4
  selector:
5
5
  files:
@@ -82,7 +82,7 @@ Builds that miss the threshold must not be merged.
82
82
  │ ├── dist/ # compiled files and packed .tgz artifacts
83
83
  │ └── src/ # all TypeScript source files
84
84
  │ ├── index.ts # public API re-exports from app/
85
- │ ├── adapters/ # I/O boundary layer (following agentme-edr-021)
85
+ │ ├── adapters/ # I/O boundary layer (following agentme-edr-026)
86
86
  │ │ ├── cli/ # inbound: CLI bootstrap and entry point
87
87
  │ │ ├── http/ # inbound: HTTP server bootstrap and handlers
88
88
  │ │ └── connectors/ # outbound: one folder per external resource
@@ -101,7 +101,7 @@ Builds that miss the threshold must not be merged.
101
101
 
102
102
  The root `Makefile` delegates every target to `/lib` then `/examples` in sequence. Parent Makefiles should call child Makefiles directly, and each module Makefile is responsible for running its actual tool commands through `mise exec --`.
103
103
 
104
- Internal source code MUST be organized following [agentme-edr-021](021-pragmatic-hexagonal-architecture.md): `adapters/` (inbound and outbound I/O boundaries), `app/` (business logic), and `shared/` (infrastructure-agnostic utilities). The public API entry point (`index.ts`) re-exports from `app/`.
104
+ Internal source code MUST be organized following [agentme-edr-026](026-pragmatic-hexagonal-architecture.md): `adapters/` (inbound and outbound I/O boundaries), `app/` (business logic), and `shared/` (infrastructure-agnostic utilities). The public API entry point (`index.ts`) re-exports from `app/`.
105
105
 
106
106
  When a repository contains multiple JavaScript/TypeScript packages, each package MUST live in its own module folder such as `lib/my-package/` or `services/my-service/`, each with its own `Makefile`, `README.md`, `dist/`, and `.cache/`.
107
107
 
@@ -155,6 +155,6 @@ The examples folder MUST exist for any libraries and utilities that are publishe
155
155
  ## References
156
156
 
157
157
  - [agentme-edr-004](../principles/004-unit-test-requirements.md) — Coverage and unit-test baseline
158
- - [agentme-edr-021](021-pragmatic-hexagonal-architecture.md) — Internal adapter/application layer separation for applications
158
+ - [agentme-edr-026](026-pragmatic-hexagonal-architecture.md) — Internal adapter/application layer separation for applications
159
159
  - [001-create-javascript-project](skills/001-create-javascript-project/SKILL.md) — scaffolds a new project following this structure
160
160
 
@@ -47,7 +47,7 @@ Direct installation of project-required Go CLIs with `go install ...@latest` as
47
47
  ├── main.go # binary entry point — argument dispatch only, no logic
48
48
  ├── .cache/ # GOCACHE, GOMODCACHE, golangci-lint cache, coverage
49
49
  ├── dist/ # built binaries and packaged outputs
50
- ├── adapters/ # I/O boundary layer (following agentme-edr-021)
50
+ ├── adapters/ # I/O boundary layer (following agentme-edr-026)
51
51
  │ ├── cli/ # inbound: CLI wiring — flag parsing, output formatting
52
52
  │ │ └── *.go # subfolders per feature only when complexity warrants it
53
53
  │ ├── http/ # inbound: HTTP server bootstrap and handlers
@@ -73,7 +73,7 @@ Direct installation of project-required Go CLIs with `go install ...@latest` as
73
73
 
74
74
  **Key layout rules:**
75
75
 
76
- - Internal source code is organized following [agentme-edr-021](021-pragmatic-hexagonal-architecture.md): `adapters/` (inbound and outbound I/O boundaries), `app/` (business logic), and `shared/` (infrastructure-agnostic utilities).
76
+ - Internal source code is organized following [agentme-edr-026](026-pragmatic-hexagonal-architecture.md): `adapters/` (inbound and outbound I/O boundaries), `app/` (business logic), and `shared/` (infrastructure-agnostic utilities).
77
77
  - One Go module per project (`go.mod` at the project root). In a monorepo, each Go project has its own `go.mod` in its subdirectory. No nested modules within a single project unless explicitly justified.
78
78
  - In a multi-module repository, each Go module MUST live in its own folder root with its own `Makefile`, `README.md`, `dist/`, and `.cache/`.
79
79
  - `main.go` is solely an argument dispatcher — it reads `os.Args[1]` and delegates to an `adapters/cli/<feature>/Run*()` function. No domain logic lives in `main.go`.
@@ -178,5 +178,5 @@ Use the standard library `flag` package for CLI flags. Each `adapters/cli/<featu
178
178
 
179
179
  ## References
180
180
 
181
- - [agentme-edr-021](021-pragmatic-hexagonal-architecture.md) — Defines the adapter/application separation that this layout follows
181
+ - [agentme-edr-026](026-pragmatic-hexagonal-architecture.md) — Defines the adapter/application separation that this layout follows
182
182
  - [003-create-golang-project](skills/003-create-golang-project/SKILL.md) — scaffolds a new Go project following this structure
@@ -71,7 +71,7 @@ No tool MUST write cache or state files to the project root, `src/`, `tests/`, o
71
71
  │ ├── src/
72
72
  │ │ └── <package_name>/
73
73
  │ │ ├── __init__.py
74
- │ │ ├── adapters/ # I/O boundary layer (following agentme-edr-021)
74
+ │ │ ├── adapters/ # I/O boundary layer (following agentme-edr-026)
75
75
  │ │ │ ├── cli/ # inbound: CLI bootstrap and entry point
76
76
  │ │ │ ├── http/ # inbound: HTTP server bootstrap
77
77
  │ │ │ └── connectors/ # outbound: one folder per external resource
@@ -96,7 +96,7 @@ Keep the repository root clean: source code, tests, distribution artifacts, and
96
96
 
97
97
  Use the `lib/src/` layout for import safety and packaging clarity. Keep tests under `lib/tests/` and shared test setup in `lib/tests/conftest.py`. Do not introduce `requirements.txt`, `setup.py`, `setup.cfg`, `tox.ini`, `ruff.toml`, or `ty.toml` by default; keep project metadata and tool configuration in `lib/pyproject.toml`.
98
98
 
99
- Internal source code MUST be organized following [agentme-edr-021](021-pragmatic-hexagonal-architecture.md): `adapters/` (inbound and outbound I/O boundaries), `app/` (business logic), and `shared/` (infrastructure-agnostic utilities).
99
+ Internal source code MUST be organized following [agentme-edr-026](026-pragmatic-hexagonal-architecture.md): `adapters/` (inbound and outbound I/O boundaries), `app/` (business logic), and `shared/` (infrastructure-agnostic utilities).
100
100
 
101
101
  Libraries and shared utilities must include an `examples/` folder and wire example execution into the root `test` flow, following [agentme-edr-007](../principles/007-project-quality-standards.md). Each example directory is its own Python project with its own `pyproject.toml`, and examples must import the library as a consumer would rather than reaching back into `lib/src/` with relative imports. Local example verification must install the wheel built into `lib/dist/`; do not use editable or path-based dependencies back to `lib/`.
102
102
 
@@ -34,7 +34,7 @@ This keeps the user-facing command predictable while preserving a clean library
34
34
 
35
35
  #### CLI to application separation
36
36
 
37
- - Structure the software as `cli -> app` — the CLI adapter delegates to the application layer, following [agentme-edr-021](021-pragmatic-hexagonal-architecture.md).
37
+ - Structure the software as `cli -> app` — the CLI adapter delegates to the application layer, following [agentme-edr-026](026-pragmatic-hexagonal-architecture.md).
38
38
  - The CLI layer must only parse arguments, load config, call the application layer, and format output.
39
39
  - Domain logic must live in the application layer and be usable without CLI globals such as `argv`, `stdout`, or process exit handlers.
40
40
  - Every feature available through the CLI must also be available through the application API.
@@ -99,7 +99,7 @@ This keeps the user-facing command predictable while preserving a clean library
99
99
 
100
100
  ## References
101
101
 
102
- - [agentme-edr-021](021-pragmatic-hexagonal-architecture.md) - Defines the adapter/application separation that the CLI layer follows
102
+ - [agentme-edr-026](026-pragmatic-hexagonal-architecture.md) - Defines the adapter/application separation that the CLI layer follows
103
103
  - [agentme-edr-003](003-javascript-project-tooling.md) - JavaScript project packaging and structure
104
104
  - [agentme-edr-007](../principles/007-project-quality-standards.md) - README and examples baseline
105
105
  - [agentme-edr-008](../devops/008-common-targets.md) - Standard command names for project entry points
@@ -0,0 +1,180 @@
1
+ ---
2
+ name: agentme-edr-policy-018-ai-llm-development-standards
3
+ description: Defines the standard framework, provider configuration, observability approach, and LLM mocking patterns for simple LLM calls in Python. Use when building, reviewing, or scaffolding any code that makes direct LLM calls using LangChain, manages prompt context, or handles conversation history. For agentic patterns see agentme-edr-019, for workflow patterns see agentme-edr-020.
4
+ apply-to: Python projects that make direct LLM calls, manage prompt context, or handle conversation threads
5
+ valid-from: 2026-06-05
6
+ ---
7
+
8
+ # agentme-edr-policy-018: AI LLM development standards
9
+
10
+ ## Context and Problem Statement
11
+
12
+ LLM-based applications can be built at different levels of abstraction — from a single prompt call to a full autonomous agent or a complex multi-step workflow. Without a shared vocabulary and a prescribed framework, projects mix incompatible patterns for invoking models, managing context, and tracing requests.
13
+
14
+ Which framework should be used for LLM calls, how should providers be configured, and what is the canonical meaning of "LLM", "agent", and "workflow" in this codebase?
15
+
16
+ ## Decision Outcome
17
+
18
+ **Use LangChain as the standard framework for all direct LLM interactions. Adopt a strict three-tier conceptual model — LLM / Agent / Workflow — that maps each tier to a specific library.**
19
+
20
+ ### Conceptual model
21
+
22
+ Three distinct tiers of LLM-based computation are recognized in this policy. Every component MUST be classified into exactly one tier:
23
+
24
+ | Tier | What it is | Library |
25
+ |---|---|---|
26
+ | **LLM** | A request → response prompt exchange with a model. May include a conversation history or thread. No autonomous decision-making. | `langchain` / `langchain-openai` |
27
+ | **Agent** | An LLM-based flow driven by a tool-invocation loop that the LLM itself plans and executes. The LLM decides which tools to call and when to stop. | `deepagents` |
28
+ | **Workflow** | A directed graph of nodes that mixes LLM-based nodes (simple LLM calls or agentic loops) with deterministic algorithmic nodes. The graph topology is defined in code, not chosen by the LLM at runtime. | `langgraph` |
29
+
30
+ These tiers nest: in general, a Workflow may contain Agent nodes; an Agent uses LLM calls internally. The tier of a component is determined by its outermost controlling structure.
31
+
32
+ See [agentme-edr-019](019-ai-agents-development-standards.md) for Agent implementation standards and [agentme-edr-020](020-ai-workflow-development-standards.md) for Workflow implementation standards.
33
+
34
+ ### Details
35
+
36
+ #### 01-conceptual-model
37
+
38
+ Every component that interacts with an LLM MUST be classified as exactly one of the three tiers defined in the conceptual model table above: **LLM**, **Agent**, or **Workflow**.
39
+
40
+ - Do NOT use the word "agent" to describe a component that only makes a single LLM call without a tool-invocation loop.
41
+ - Do NOT use the word "workflow" to describe a component that is purely an LLM call with no graph structure.
42
+ - When designing a new feature, identify the correct tier first. The tier determines which library and patterns apply (LangChain, deepagents, or LangGraph).
43
+
44
+ **Function calling boundary:**
45
+
46
+ - A **single** function call decided by the LLM (e.g., "call get_weather(location)") is still an LLM-tier interaction if the function is called once and the result is returned to the user.
47
+ - An **iterative** function-calling loop where the LLM observes results and decides next actions autonomously is an Agent (see [agentme-edr-019](019-ai-agents-development-standards.md)).
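The classification rule above (the tier follows the outermost controlling structure) can be sketched as a small decision function. This is only an illustration: `Tier` and `classify_tier` are hypothetical names that do not appear in the policy, and the two boolean inputs are an assumed simplification of how a component would be described.

```python
from enum import Enum


class Tier(Enum):
    LLM = "llm"
    AGENT = "agent"
    WORKFLOW = "workflow"


def classify_tier(has_code_defined_graph: bool, has_autonomous_tool_loop: bool) -> Tier:
    """Classify a component by its outermost controlling structure."""
    # A graph whose topology is fixed in code wins, even if some nodes are agents.
    if has_code_defined_graph:
        return Tier.WORKFLOW
    # An iterative loop where the model picks tools and decides when to stop.
    if has_autonomous_tool_loop:
        return Tier.AGENT
    # Everything else, including a single one-shot function call, stays LLM-tier.
    return Tier.LLM
```

A component that wraps an agent inside a code-defined graph is classified as `Tier.WORKFLOW`, which matches the nesting rule stated in the conceptual model.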
48
+
49
+ #### 02-llm-framework
50
+
51
+ All direct LLM calls MUST use **LangChain** via the `langchain` packages.
52
+
53
+ - Use `langchain-openai` as the provider integration layer. It supports both OpenAI and Azure OpenAI.
54
+ - **Always configure LLM providers using explicit library attributes** such as `api_key`, `base_url`, `model`, `api_version`, etc. Never rely on environment variables for LLM configuration.
55
+ - Configuration MUST be passed via constructor parameters or configuration objects, making dependencies explicit and testable.
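One way to read the "configuration objects" requirement is a frozen dataclass that carries every provider setting explicitly. The sketch below uses only the standard library; `AzureLLMSettings` and `provider_kwargs` are hypothetical names, and the field names are assumed to mirror the constructor arguments used by the provider class.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AzureLLMSettings:
    """Hypothetical settings object; field names mirror the constructor arguments."""

    api_key: str
    azure_endpoint: str
    api_version: str
    azure_deployment: str


def provider_kwargs(settings: AzureLLMSettings) -> dict:
    """Return explicit keyword arguments for the provider constructor."""
    # Nothing is read from os.environ here: every value arrives through `settings`.
    return asdict(settings)


settings = AzureLLMSettings(
    api_key="test-key",
    azure_endpoint="https://example.invalid",
    api_version="2024-02-15-preview",
    azure_deployment="my-deployment",
)
kwargs = provider_kwargs(settings)
```

The resulting `kwargs` dictionary could then be unpacked into the provider class, which keeps the dependency visible in tests and avoids hidden environment lookups.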
56
+
57
+ **Example of explicit configuration:**
58
+
+ ```python
+ from langchain_openai import AzureChatOpenAI
+
+ # Azure OpenAI configuration (explicit); `config` is the application's own settings object
+ llm = AzureChatOpenAI(
+     api_key=config.azure_api_key,
+     azure_endpoint=config.azure_endpoint,
+     api_version="2024-02-15-preview",
+     azure_deployment=config.azure_deployment,
+ )
+ ```
68
+
69
+ #### 03-llm-observability
70
+
71
+ Enable LangChain auto-tracing at every application entry point by calling `mlflow.langchain.autolog()` during startup, before any LLM call is made.
72
+
73
+ - This captures inputs, outputs, token counts, and latency for every LangChain chain or runnable automatically.
74
+
75
+ #### 04-unit-test-mocking
76
+
77
+ LLM provider calls are external API calls and MUST be mocked in unit tests. Mocking LLM providers enables offline test execution while testing the logic, routing, and state management of LLM calls, agents, and workflows.
78
+
79
+ Use LangChain's built-in fake models from `langchain_core.language_models.fake_chat_models`. Choose the utility based on what the code under test expects from the model:
80
+
81
+ | Utility | When to use |
82
+ |---|---|
83
+ | `FakeListChatModel` | The code only reads the text content of the response (`AIMessage.content`). Returns plain-text `AIMessage` objects from a pre-defined list, in order. |
84
+ | `GenericFakeChatModel` | The code expects tool calls, structured outputs, or needs to inspect the message type beyond plain text. Accepts a list of pre-built `AIMessage` (or `AIMessageChunk`) objects, giving full control over the response structure. |
85
+
86
+ **`FakeListChatModel` — plain text responses:**
87
+
+ ```python
+ from langchain_core.language_models.fake_chat_models import FakeListChatModel
+
+ def test_document_approval_routing():
+     fake_model = FakeListChatModel(responses=[
+         "APPROVE",
+         "The document meets all quality criteria."
+     ])
+
+     workflow = DocumentWorkflow(model=fake_model)
+     result = workflow.run(input_doc)
+
+     assert result.status == "approved"
+     assert "quality criteria" in result.reasoning
+ ```
103
+
104
+ **`GenericFakeChatModel` — tool-call or structured responses:**
105
+
+ ```python
+ from langchain_core.language_models.fake_chat_models import GenericFakeChatModel
+ from langchain_core.messages import AIMessage
+
+ def test_agent_tool_invocation():
+     # Simulate the LLM requesting a tool call, then producing a final answer
+     tool_call_msg = AIMessage(
+         content="",
+         tool_calls=[{
+             "name": "search_files",
+             "args": {"pattern": "*.py"},
+             "id": "call_1"
+         }]
+     )
+     final_msg = AIMessage(content="Found 3 Python files.")
+
+     fake_model = GenericFakeChatModel(messages=iter([tool_call_msg, final_msg]))
+
+     agent = FileAnalyzerAgent(model=fake_model)
+     result = agent.run()
+
+     assert result.summary == "Found 3 Python files."
+ ```
130
+
131
+ **Injectable LLM pattern (required for testability):**
132
+
133
+ Whenever a workflow, agent, or node makes LLM calls, it MUST accept the LLM instance as a constructor parameter or configuration field so that unit tests can inject a fake:
134
+
+ ```python
+ from typing import Optional
+
+ from langchain_core.language_models import BaseChatModel
+ from langchain_openai import ChatOpenAI
+
+ class DocumentWorkflow:
+     def __init__(self, model: Optional[BaseChatModel] = None):
+         self.model = model or ChatOpenAI(
+             api_key=config.openai_api_key,
+             model="gpt-4"
+         )
+ ```
143
+
144
+ This allows unit tests to inject `FakeListChatModel` or `GenericFakeChatModel` while production code uses the real provider.
145
+
146
+ #### 05-prompt-management
147
+
148
+ Prompt templates MUST be managed explicitly and versioned:
149
+
150
+ - Store prompt templates as separate files in `prompts/` directory when they exceed 10 lines or are reused across multiple components.
151
+ - Use LangChain `PromptTemplate` or `ChatPromptTemplate` for parameterized prompts.
152
+
153
+ **Example prompt file structure:**
154
+
+ ```text
+ lib/src/<package_name>/
+   prompts/
+     summarize.txt
+     extract_entities.txt
+ ```
161
+
162
+ **Example usage:**
163
+
+ ```python
+ from langchain.prompts import PromptTemplate
+
+ prompt = PromptTemplate.from_file(
+     "prompts/summarize_v1.0.0.txt",
+     input_variables=["document"]
+ )
+ ```
172
+
173
+ ## References
174
+
175
+ - [agentme-edr-019](019-ai-agents-development-standards.md) — Agent implementation standards (deepagents, tool-invocation loops)
176
+ - [agentme-edr-020](020-ai-workflow-development-standards.md) — Workflow implementation standards (LangGraph, MLflow run-level tracking)
177
+ - [agentme-edr-004](../principles/004-unit-test-requirements.md) — Unit test requirements including external API mocking guidance
178
+ - [agentme-edr-014](014-python-project-tooling.md) — Python project tooling and structure
179
+ - [agentme-edr-007](../principles/007-project-quality-standards.md) — Project quality standards including AI-tier testing requirements (rule `09-ai-project-testing-requirements`)
180
+ - [agentme-edr-021](021-ai-eval-standards.md) — AI eval standards: folder structure, script requirements, and MLflow tracking
@@ -0,0 +1,284 @@
1
+ ---
2
+ name: agentme-edr-policy-019-ai-agents-development-standards
3
+ description: Defines the standard framework and patterns for building AI agents with tool-invocation loops using the deepagents framework. Use when building agents where the LLM autonomously decides which tools to call and when to stop. For simple LLM calls see agentme-edr-018, for workflow orchestration see agentme-edr-020.
4
+ apply-to: AI agent projects that use tool-invocation loops where the LLM decides which tools to call and when to stop
5
+ valid-from: 2026-06-05
6
+ ---
7
+
8
+ # agentme-edr-policy-019: AI agents development standards
9
+
10
+ ## Context and Problem Statement
11
+
12
+ AI applications often need to give LLMs the ability to autonomously choose and invoke tools to accomplish tasks. Without standardized patterns for agent implementation, projects end up with incompatible approaches to tool definition, state management, and runtime environments.
13
+
14
+ Which framework should be used for building agents with tool-invocation loops, and what are the essential patterns for agent state, tools, and execution environments?
15
+
16
+ ## Decision Outcome
17
+
18
+ **Use the deepagents framework for all agent implementations where an LLM autonomously decides which tools to call and when to stop.**
19
+
20
+ ### Conceptual model
21
+
22
+ An **Agent** is an LLM-based flow driven by a tool-invocation loop that the LLM itself plans and executes. The LLM decides which tools to call and when to stop. The agent follows a perceive → plan → act → observe cycle autonomously until it reaches a terminal state.
23
+
24
+ ### Details
25
+
26
+ #### 01-agent-framework
27
+
28
+ All agent implementations MUST use the **deepagents** framework.
29
+
30
+ - Use deepagents whenever the LLM needs to autonomously select and invoke tools to accomplish a task.
31
+ - The agent MUST follow the perceive → plan → act → observe cycle where the LLM observes tool outputs and decides the next action.
32
+ - All LLM calls within agents MUST follow [agentme-edr-018](018-ai-llm-development-standards.md) for LangChain configuration and observability.
33
+
34
+ **When to use agents vs workflows:**
35
+
36
+ - Use an **agent** when the LLM should autonomously decide the sequence of tool calls based on runtime observations.
37
+ - Use a **workflow** when the execution path is predefined in code, even if individual nodes involve LLM calls or agent subgraphs.
38
+ - When in doubt, prefer workflows (explicit control flow) over agents (autonomous control flow) for maintainability and predictability.
39
+
40
+ #### 02-local-sandbox
41
+
42
+ When an agent requires a **local sandbox** — an isolated environment where the agent can read files, glob-search directories, and execute shell commands — use the **[deepagents](https://github.com/deepagents/deepagents) framework** to provide that sandbox.
43
+
44
+ **When to apply this rule:**
45
+
46
+ Use deepagents sandbox whenever ANY of the following is true:
47
+ - The agent needs to execute shell commands or scripts in a controlled environment.
48
+ - The agent needs to list, read, or search files across multiple directories at runtime.
49
+ - The agent operates on user-supplied or generated file trees that must not escape a sandboxed boundary.
50
+
51
+ **Integration requirements:**
52
+
53
+ - Initialize the sandbox at the start of the agent run and shut it down in the same `try/finally` block.
54
+ - Pass the sandbox handle into the agent's state so all tool calls share the same sandbox instance.
55
+ - If the host-side code needs to pass files into the sandbox (e.g. generated config or input data), create a temporary directory with `tempfile.mkdtemp()`, write the files there, and mount it into the sandbox. Clean it up in the `finally` block.
56
+ - Replace hand-rolled `read_file`, `search_files`, and `grep_file` tool implementations with the equivalent tools provided by deepagents.
57
+
58
+ **Example:**
59
+
+ ```python
+ import shutil
+ import tempfile
+ from pathlib import Path
+ from typing import List
+
+ from deepagents import Sandbox
+
+ def run_file_analysis_agent(input_files: List[Path]) -> AnalysisResult:
+     tmp_dir = tempfile.mkdtemp()
+     sandbox = None  # stays None if setup fails before the sandbox is created
+     try:
+         # Copy input files to temp directory
+         for f in input_files:
+             shutil.copy(f, tmp_dir)
+
+         # Initialize sandbox with mounted directory
+         sandbox = Sandbox(mount_paths={tmp_dir: "/workspace"})
+
+         # Run agent with sandbox
+         agent = FileAnalysisAgent(sandbox=sandbox)
+         result = agent.run()
+
+         return result
+     finally:
+         # Shut down only a sandbox that was actually started, then remove the temp directory
+         if sandbox is not None:
+             sandbox.shutdown()
+         shutil.rmtree(tmp_dir)
+ ```
83
+
84
+ #### 03-agent-state-management
85
+
86
+ **State type naming:**
87
+
88
+ - Agent state types MUST end with `_agent_state` suffix (e.g., `file_analyzer_agent_state`)
89
+ - Follow [agentme-edr-020](020-ai-workflow-development-standards.md) rule `11-state-type-conventions` when agents are used as workflow nodes
90
+
91
+ #### 04-tool-definition-patterns
92
+
93
+ Tools provided to agents MUST follow these patterns:
94
+
95
+ **Tool signature:**
96
+
+ ```python
+ from typing import Any, Dict
+
+ def tool_name(arg1: str, arg2: int) -> Dict[str, Any]:
+     """
+     Brief description of what the tool does.
+
+     Args:
+         arg1: Description of arg1
+         arg2: Description of arg2
+
+     Returns:
+         Dictionary with tool execution results
+     """
+     # Tool implementation
+     return {"status": "success", "result": ...}
+ ```
114
+
115
+ **Tool requirements:**
116
+
117
+ - Tool names MUST be descriptive action verbs (e.g., `search_files`, `execute_command`, `read_document`)
118
+ - Tool docstrings MUST clearly describe the tool's purpose, arguments, and return value (the LLM reads these)
119
+ - Tools MUST return structured data (dictionaries or dataclasses), not bare strings or untyped values
120
+ - Tools MUST handle errors gracefully and return error information in the result structure, not raise exceptions
121
+ - Tools that interact with external systems MUST be placed in `adapters/connectors/` per [agentme-edr-026](026-pragmatic-hexagonal-architecture.md)
122
+
123
+ **Error handling in tools:**
124
+
+ ```python
+ from pathlib import Path
+ from typing import Any, Dict
+
+ def search_files(pattern: str, directory: str = ".") -> Dict[str, Any]:
+     """Search for files matching a glob pattern."""
+     try:
+         matches = list(Path(directory).glob(pattern))
+         return {
+             "status": "success",
+             "matches": [str(m) for m in matches],
+             "count": len(matches)
+         }
+     except Exception as e:
+         # Report the failure in the result structure instead of raising
+         return {
+             "status": "error",
+             "error_message": str(e),
+             "error_type": type(e).__name__
+         }
+ ```
142
+
143
+ #### 05-agent-error-handling-and-recovery
144
+
145
+ Agents MUST implement robust error handling:
146
+
147
+ **Maximum iteration limits:**
148
+
149
+ - Every agent MUST have a maximum iteration limit to prevent infinite loops
150
+ - The default maximum SHOULD be configurable and logged when reached
151
+ - When the maximum is reached, the agent MUST return a structured failure result, not raise an exception
152
+
153
+ **Tool failure handling:**
154
+
155
+ - When a tool returns an error, the agent MUST be able to observe the error and decide on recovery actions
156
+ - Tools MUST NOT raise exceptions for expected failures (network errors, file not found, etc.)
157
+ - Agents MAY implement retry logic with exponential backoff for transient failures
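The optional retry behaviour can be sketched against the structured error results that tools return. This is a minimal illustration, not part of the policy: `call_with_backoff` is a hypothetical helper, the `sleep` parameter is injected only so tests can avoid real waiting, and the sketch retries every `"error"` status, whereas a real agent would probably inspect `error_type` to retry only transient failures.

```python
import time
from typing import Any, Callable, Dict


def call_with_backoff(
    tool: Callable[[], Dict[str, Any]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call `tool` until it stops reporting an error or attempts run out."""
    result: Dict[str, Any] = {"status": "error", "error_message": "tool was never called"}
    for attempt in range(max_attempts):
        result = tool()
        if result.get("status") != "error":
            return result
        # Wait base_delay, 2*base_delay, 4*base_delay, ... between attempts.
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    # All attempts failed: hand the last structured error back to the agent.
    return result
```

Because the helper returns the last error dictionary instead of raising, the agent can still observe the failure and decide on a recovery action.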
158
+
159
+ **Terminal states:**
160
+
161
+ Agents MUST recognize and handle three terminal states:
162
+ - **Success**: Goal achieved, task complete
163
+ - **Failure**: Goal cannot be achieved, give up gracefully
164
+ - **Timeout**: Maximum iterations reached, return partial results if possible
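The iteration limit and the three terminal states can be sketched as a plain loop. Everything here is an assumption for illustration: `AgentResult`, `run_agent_loop`, and the `"done"` / `"give_up"` decision strings are hypothetical, and `step` stands in for one perceive, plan, act, observe cycle.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class AgentResult:
    status: str  # "success", "failure", or "timeout"
    iterations: int
    partial_results: List[Any] = field(default_factory=list)


def run_agent_loop(
    step: Callable[[int], Dict[str, Any]],
    max_iterations: int = 10,
) -> AgentResult:
    """Run `step` until it reports a terminal decision or the limit is hit."""
    collected: List[Any] = []
    for iteration in range(1, max_iterations + 1):
        outcome = step(iteration)
        if "result" in outcome:
            collected.append(outcome["result"])
        decision = outcome.get("decision")
        if decision == "done":
            return AgentResult("success", iteration, collected)
        if decision == "give_up":
            return AgentResult("failure", iteration, collected)
    # Limit reached: return a structured timeout with partial results, never raise.
    return AgentResult("timeout", max_iterations, collected)
```

The timeout branch returns whatever was collected so far, which is one way to satisfy the "return partial results if possible" requirement.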
165
+
166
+ #### 06-agent-naming-conventions
167
+
168
+ Agent class names MUST follow the pattern `<Purpose>Agent` where `<Purpose>` describes what the agent does:
169
+
170
+ **Good names:**
171
+ - `FileAnalyzerAgent` — analyzes files
172
+ - `CodeReviewerAgent` — reviews code
173
+ - `DataExtractorAgent` — extracts data from documents
174
+
175
+ **Bad names (FORBIDDEN):**
176
+ - `Agent` (too generic)
177
+ - `MainAgent` (not descriptive)
178
+ - `MyAgent` (not descriptive)
179
+ - `Agent1` (numbered, not semantic)
180
+
181
+ When agents are used as nodes in workflows, the node name MUST use the `_agent` suffix per [agentme-edr-020](020-ai-workflow-development-standards.md) rule `09-node-naming-conventions`.
182
+
183
+ #### 07-agent-observability
184
+
185
+ Agent execution MUST be observable through logging and tracing:
186
+
187
+ - Log each iteration of the perceive → plan → act → observe cycle with iteration number and tool selection.
188
+ - Use structured logging (JSON) with fields: `iteration`, `tool_selected`, `tool_result_status`, `decision`.
189
+ - For LLM calls within agents, follow [agentme-edr-018](018-ai-llm-development-standards.md) rule `03-llm-observability`.
190
+ - When agents run as workflow nodes, MLflow tracking from the parent workflow automatically captures agent-level traces.
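The structured logging fields listed above could be produced with the standard library alone. This is a sketch under assumptions: `format_iteration_log` and `log_iteration` are hypothetical helpers, and the logger name `"agent"` is an arbitrary choice.

```python
import json
import logging
from datetime import datetime, timezone
from typing import Any, Dict

logger = logging.getLogger("agent")  # logger name is an arbitrary choice


def format_iteration_log(
    agent: str,
    iteration: int,
    tool_selected: str,
    tool_args: Dict[str, Any],
    tool_result_status: str,
    decision: str,
) -> str:
    """Serialize one agent iteration as a single JSON log line."""
    entry = {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "agent": agent,
        "iteration": iteration,
        "tool_selected": tool_selected,
        "tool_args": tool_args,
        "tool_result_status": tool_result_status,
        "decision": decision,
    }
    return json.dumps(entry)


def log_iteration(**fields: Any) -> None:
    """Emit the JSON line at INFO level."""
    logger.info(format_iteration_log(**fields))
```

Keeping the formatting in a pure function makes the log shape easy to assert on in unit tests without capturing logger output.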
191
+
192
+ **Example structured log entry:**
193
+
+ ```json
+ {
+   "timestamp": "2026-06-05T10:30:45Z",
+   "agent": "FileAnalyzerAgent",
+   "iteration": 3,
+   "tool_selected": "search_files",
+   "tool_args": {"pattern": "*.py"},
+   "tool_result_status": "success",
+   "decision": "continue"
+ }
+ ```
205
+
206
+ #### 08-agent-unit-testing
207
+
208
+ Agent LLM calls are external API calls and MUST be mocked in unit tests per [agentme-edr-018](018-ai-llm-development-standards.md) rule `04-unit-test-mocking`.
209
+
210
+ Because agents drive a tool-invocation loop — where the LLM decides which tools to call — the fake model must return **tool-call messages** followed by a final answer. Use **`GenericFakeChatModel`** for this:
211
+
+ ```python
+ from langchain_core.language_models.fake_chat_models import GenericFakeChatModel
+ from langchain_core.messages import AIMessage
+
+ def test_file_analyzer_agent_calls_search_then_stops():
+     # Iteration 1: LLM requests a tool call
+     tool_call_msg = AIMessage(
+         content="",
+         tool_calls=[{
+             "name": "search_files",
+             "args": {"pattern": "*.py", "directory": "/workspace"},
+             "id": "call_1"
+         }]
+     )
+     # Iteration 2: LLM produces a final answer after observing the tool result
+     final_msg = AIMessage(content="Found 3 Python files matching the pattern.")
+
+     fake_model = GenericFakeChatModel(messages=iter([tool_call_msg, final_msg]))
+
+     agent = FileAnalyzerAgent(model=fake_model)
+     result = agent.run(directory="/workspace")
+
+     assert result.status == "success"
+     assert "3 Python files" in result.summary
+ ```
237
+
238
+ Agents MUST be designed so that the LLM instance is injectable (constructor parameter) to allow test doubles. See [agentme-edr-018](018-ai-llm-development-standards.md) rule `04-unit-test-mocking` for the injectable LLM pattern.
239
+
240
+ **`mock_deep_agent`**
241
+
242
+ Place `mock_deep_agent` in a shared test utilities module (e.g., `tests/helpers.py`) so all test files that need it can import it from one location and mock deep_agent instances when needed.
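The policy names `mock_deep_agent` without showing its body, so the following is only a guess at what such a helper might look like. It assumes `mocker` is the pytest-mock fixture (anything with a `patch(target)` method returning a mock works), that `target` is the dotted path of the factory that builds the agent, and that the agent object exposes an `invoke` method; none of these details are confirmed by the source.

```python
from typing import Any, Dict
from unittest.mock import MagicMock


def mock_deep_agent(mocker: Any, target: str, output: Dict[str, Any]) -> MagicMock:
    """Patch the agent factory at `target` so it yields a fake agent.

    The fake agent returns `output` from `invoke` (assumed method name).
    """
    fake_agent = MagicMock(name="fake_deep_agent")
    fake_agent.invoke.return_value = output
    # Replace the factory; calling it now returns the fake agent.
    factory = mocker.patch(target)
    factory.return_value = fake_agent
    return fake_agent
```

Returning the fake agent lets a test also assert on how the code under test called it, for example with `fake_agent.invoke.assert_called_once()`.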
243
+
244
+ **Example usage:**
245
+
+ ```python
+ from tests.helpers import mock_deep_agent
+
+ def test_workflow_calls_subagent(mocker):
+     mock_deep_agent(
+         mocker,
+         "mypackage.nodes.analysis_node.create_workflow_agent",
+         output={"status": "success", "findings": ["issue A"]}
+     )
+
+     result = run_analysis_workflow(input_data)
+
+     assert result.findings == ["issue A"]
+ ```
260
+
261
+ #### 09-agent-composition
262
+
263
+ When multiple agents are needed:
264
+
265
+ - **Single agent with multiple tools:** Use when tools share a common goal and context (e.g., a code analysis agent with `read_file`, `search_code`, and `analyze_pattern` tools).
266
+ - **Multiple agents as workflow nodes:** Use when agents have distinct responsibilities and outputs that feed into each other. Orchestrate them using LangGraph per [agentme-edr-020](020-ai-workflow-development-standards.md).
267
+ - Do NOT create nested agent loops (agent calling agent autonomously). Use workflows for multi-agent orchestration.
268
+
269
+ **Decision criteria:**
270
+
271
+ | Pattern | When to use |
272
+ |---|---|
273
+ | Single agent + tools | All tools serve the same goal; agent completes in one session |
274
+ | Multiple workflow-orchestrated agents | Each agent has a distinct goal; outputs flow between agents; deterministic sequencing needed |
275
+ | Nested agents (FORBIDDEN) | Never — always use workflow orchestration instead |
276
+
277
+ ## References
278
+
279
+ - [agentme-edr-018](018-ai-llm-development-standards.md) — LLM development standards (LangChain configuration, mocking patterns)
280
+ - [agentme-edr-020](020-ai-workflow-development-standards.md) — Workflow development standards (using agents as workflow nodes)
281
+ - [agentme-edr-026](026-pragmatic-hexagonal-architecture.md) — Hexagonal architecture (tool placement in adapters/connectors)
282
+ - [agentme-edr-014](014-python-project-tooling.md) — Python project tooling and structure
283
+ - [agentme-edr-007](../principles/007-project-quality-standards.md) — Project quality standards including AI-tier testing requirements (rule `09-ai-project-testing-requirements`)
284
+ - [agentme-edr-021](021-ai-eval-standards.md) — AI eval standards: folder structure, script requirements, and MLflow tracking