agentme 0.14.0 → 0.15.0
- package/.filedist-package.yml +1 -1
- package/.xdrs/agentme/edrs/application/003-javascript-project-tooling.md +3 -3
- package/.xdrs/agentme/edrs/application/010-golang-project-tooling.md +3 -3
- package/.xdrs/agentme/edrs/application/014-python-project-tooling.md +2 -2
- package/.xdrs/agentme/edrs/application/015-cli-tool-standards.md +2 -2
- package/.xdrs/agentme/edrs/application/018-ai-llm-development-standards.md +180 -0
- package/.xdrs/agentme/edrs/application/019-ai-agents-development-standards.md +284 -0
- package/.xdrs/agentme/edrs/application/020-ai-workflow-development-standards.md +261 -0
- package/.xdrs/agentme/edrs/application/021-ai-eval-standards.md +90 -0
- package/.xdrs/agentme/edrs/application/{019-ml-dataset-structure.md → 024-ml-dataset-structure.md} +2 -2
- package/.xdrs/agentme/edrs/application/{020-ai-agent-xdrs-knowledge-layer.md → 025-ai-agent-xdrs-knowledge-layer.md} +4 -4
- package/.xdrs/agentme/edrs/application/{021-pragmatic-hexagonal-architecture.md → 026-pragmatic-hexagonal-architecture.md} +2 -2
- package/.xdrs/agentme/edrs/application/skills/001-create-javascript-project/SKILL.md +2 -2
- package/.xdrs/agentme/edrs/application/skills/003-create-golang-project/SKILL.md +2 -2
- package/.xdrs/agentme/edrs/application/skills/005-create-python-project/SKILL.md +3 -3
- package/.xdrs/agentme/edrs/index.md +7 -5
- package/.xdrs/agentme/edrs/principles/007-project-quality-standards.md +29 -1
- package/package.json +3 -3
- package/.xdrs/agentme/edrs/application/018-ai-agent-development-standards.md +0 -309
- package/.xdrs/agentme/edrs/application/024-llm-development-standards.md +0 -116
package/.filedist-package.yml
CHANGED
|
@@ -82,7 +82,7 @@ Builds that miss the threshold must not be merged.
|
|
|
82
82
|
│ ├── dist/ # compiled files and packed .tgz artifacts
|
|
83
83
|
│ └── src/ # all TypeScript source files
|
|
84
84
|
│ ├── index.ts # public API re-exports from app/
|
|
85
|
-
│ ├── adapters/ # I/O boundary layer (following agentme-edr-
|
|
85
|
+
│ ├── adapters/ # I/O boundary layer (following agentme-edr-026)
|
|
86
86
|
│ │ ├── cli/ # inbound: CLI bootstrap and entry point
|
|
87
87
|
│ │ ├── http/ # inbound: HTTP server bootstrap and handlers
|
|
88
88
|
│ │ └── connectors/ # outbound: one folder per external resource
|
|
@@ -101,7 +101,7 @@ Builds that miss the threshold must not be merged.
|
|
|
101
101
|
|
|
102
102
|
The root `Makefile` delegates every target to `/lib` then `/examples` in sequence. Parent Makefiles should call child Makefiles directly, and each module Makefile is responsible for running its actual tool commands through `mise exec --`.
|
|
103
103
|
|
|
104
|
-
Internal source code MUST be organized following [agentme-edr-
|
|
104
|
+
Internal source code MUST be organized following [agentme-edr-026](026-pragmatic-hexagonal-architecture.md): `adapters/` (inbound and outbound I/O boundaries), `app/` (business logic), and `shared/` (infrastructure-agnostic utilities). The public API entry point (`index.ts`) re-exports from `app/`.
|
|
105
105
|
|
|
106
106
|
When a repository contains multiple JavaScript/TypeScript packages, each package MUST live in its own module folder such as `lib/my-package/` or `services/my-service/`, each with its own `Makefile`, `README.md`, `dist/`, and `.cache/`.
|
|
107
107
|
|
|
@@ -155,6 +155,6 @@ The examples folder MUST exist for any libraries and utilities that are publishe
|
|
|
155
155
|
## References
|
|
156
156
|
|
|
157
157
|
- [agentme-edr-004](../principles/004-unit-test-requirements.md) — Coverage and unit-test baseline
|
|
158
|
-
- [agentme-edr-
|
|
158
|
+
- [agentme-edr-026](026-pragmatic-hexagonal-architecture.md) — Internal adapter/application layer separation for applications
|
|
159
159
|
- [001-create-javascript-project](skills/001-create-javascript-project/SKILL.md) — scaffolds a new project following this structure
|
|
160
160
|
|
|
@@ -47,7 +47,7 @@ Direct installation of project-required Go CLIs with `go install ...@latest` as
|
|
|
47
47
|
├── main.go # binary entry point — argument dispatch only, no logic
|
|
48
48
|
├── .cache/ # GOCACHE, GOMODCACHE, golangci-lint cache, coverage
|
|
49
49
|
├── dist/ # built binaries and packaged outputs
|
|
50
|
-
├── adapters/ # I/O boundary layer (following agentme-edr-
|
|
50
|
+
├── adapters/ # I/O boundary layer (following agentme-edr-026)
|
|
51
51
|
│ ├── cli/ # inbound: CLI wiring — flag parsing, output formatting
|
|
52
52
|
│ │ └── *.go # subfolders per feature only when complexity warrants it
|
|
53
53
|
│ ├── http/ # inbound: HTTP server bootstrap and handlers
|
|
@@ -73,7 +73,7 @@ Direct installation of project-required Go CLIs with `go install ...@latest` as
|
|
|
73
73
|
|
|
74
74
|
**Key layout rules:**
|
|
75
75
|
|
|
76
|
-
- Internal source code is organized following [agentme-edr-
|
|
76
|
+
- Internal source code is organized following [agentme-edr-026](026-pragmatic-hexagonal-architecture.md): `adapters/` (inbound and outbound I/O boundaries), `app/` (business logic), and `shared/` (infrastructure-agnostic utilities).
|
|
77
77
|
- One Go module per project (`go.mod` at the project root). In a monorepo, each Go project has its own `go.mod` in its subdirectory. No nested modules within a single project unless explicitly justified.
|
|
78
78
|
- In a multi-module repository, each Go module MUST live in its own folder root with its own `Makefile`, `README.md`, `dist/`, and `.cache/`.
|
|
79
79
|
- `main.go` is solely an argument dispatcher — it reads `os.Args[1]` and delegates to an `adapters/cli/<feature>/Run*()` function. No domain logic lives in `main.go`.
|
|
@@ -178,5 +178,5 @@ Use the standard library `flag` package for CLI flags. Each `adapters/cli/<featu
|
|
|
178
178
|
|
|
179
179
|
## References
|
|
180
180
|
|
|
181
|
-
- [agentme-edr-
|
|
181
|
+
- [agentme-edr-026](026-pragmatic-hexagonal-architecture.md) — Defines the adapter/application separation that this layout follows
|
|
182
182
|
- [003-create-golang-project](skills/003-create-golang-project/SKILL.md) — scaffolds a new Go project following this structure
|
|
@@ -71,7 +71,7 @@ No tool MUST write cache or state files to the project root, `src/`, `tests/`, o
|
|
|
71
71
|
│ ├── src/
|
|
72
72
|
│ │ └── <package_name>/
|
|
73
73
|
│ │ ├── __init__.py
|
|
74
|
-
│ │ ├── adapters/ # I/O boundary layer (following agentme-edr-
|
|
74
|
+
│ │ ├── adapters/ # I/O boundary layer (following agentme-edr-026)
|
|
75
75
|
│ │ │ ├── cli/ # inbound: CLI bootstrap and entry point
|
|
76
76
|
│ │ │ ├── http/ # inbound: HTTP server bootstrap
|
|
77
77
|
│ │ │ └── connectors/ # outbound: one folder per external resource
|
|
@@ -96,7 +96,7 @@ Keep the repository root clean: source code, tests, distribution artifacts, and
|
|
|
96
96
|
|
|
97
97
|
Use the `lib/src/` layout for import safety and packaging clarity. Keep tests under `lib/tests/` and shared test setup in `lib/tests/conftest.py`. Do not introduce `requirements.txt`, `setup.py`, `setup.cfg`, `tox.ini`, `ruff.toml`, or `ty.toml` by default; keep project metadata and tool configuration in `lib/pyproject.toml`.
|
|
98
98
|
|
|
99
|
-
Internal source code MUST be organized following [agentme-edr-
|
|
99
|
+
Internal source code MUST be organized following [agentme-edr-026](026-pragmatic-hexagonal-architecture.md): `adapters/` (inbound and outbound I/O boundaries), `app/` (business logic), and `shared/` (infrastructure-agnostic utilities).
|
|
100
100
|
|
|
101
101
|
Libraries and shared utilities must include an `examples/` folder and wire example execution into the root `test` flow, following [agentme-edr-007](../principles/007-project-quality-standards.md). Each example directory is its own Python project with its own `pyproject.toml`, and examples must import the library as a consumer would rather than reaching back into `lib/src/` with relative imports. Local example verification must install the wheel built into `lib/dist/`; do not use editable or path-based dependencies back to `lib/`.
|
|
102
102
|
|
|
@@ -34,7 +34,7 @@ This keeps the user-facing command predictable while preserving a clean library
|
|
|
34
34
|
|
|
35
35
|
#### CLI to application separation
|
|
36
36
|
|
|
37
|
-
- Structure the software as `cli -> app` — the CLI adapter delegates to the application layer, following [agentme-edr-
|
|
37
|
+
- Structure the software as `cli -> app` — the CLI adapter delegates to the application layer, following [agentme-edr-026](026-pragmatic-hexagonal-architecture.md).
|
|
38
38
|
- The CLI layer must only parse arguments, load config, call the application layer, and format output.
|
|
39
39
|
- Domain logic must live in the application layer and be usable without CLI globals such as `argv`, `stdout`, or process exit handlers.
|
|
40
40
|
- Every feature available through the CLI must also be available through the application API.
|
|
@@ -99,7 +99,7 @@ This keeps the user-facing command predictable while preserving a clean library
|
|
|
99
99
|
|
|
100
100
|
## References
|
|
101
101
|
|
|
102
|
-
- [agentme-edr-
|
|
102
|
+
- [agentme-edr-026](026-pragmatic-hexagonal-architecture.md) - Defines the adapter/application separation that the CLI layer follows
|
|
103
103
|
- [agentme-edr-003](003-javascript-project-tooling.md) - JavaScript project packaging and structure
|
|
104
104
|
- [agentme-edr-007](../principles/007-project-quality-standards.md) - README and examples baseline
|
|
105
105
|
- [agentme-edr-008](../devops/008-common-targets.md) - Standard command names for project entry points
|
|
@@ -0,0 +1,180 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: agentme-edr-policy-018-ai-llm-development-standards
|
|
3
|
+
description: Defines the standard framework, provider configuration, observability approach, and LLM mocking patterns for simple LLM calls in Python. Use when building, reviewing, or scaffolding any code that makes direct LLM calls using LangChain, manages prompt context, or handles conversation history. For agentic patterns see agentme-edr-019, for workflow patterns see agentme-edr-020.
|
|
4
|
+
apply-to: Python projects that make direct LLM calls, manage prompt context, or handle conversation threads
|
|
5
|
+
valid-from: 2026-06-05
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
# agentme-edr-policy-018: AI LLM development standards
|
|
9
|
+
|
|
10
|
+
## Context and Problem Statement
|
|
11
|
+
|
|
12
|
+
LLM-based applications can be built at different levels of abstraction — from a single prompt call to a full autonomous agent or a complex multi-step workflow. Without a shared vocabulary and a prescribed framework, projects mix incompatible patterns for invoking models, managing context, and tracing requests.
|
|
13
|
+
|
|
14
|
+
Which framework should be used for LLM calls, how should providers be configured, and what is the canonical meaning of "LLM", "agent", and "workflow" in this codebase?
|
|
15
|
+
|
|
16
|
+
## Decision Outcome
|
|
17
|
+
|
|
18
|
+
**Use LangChain as the standard framework for all direct LLM interactions. Adopt a strict three-tier conceptual model — LLM / Agent / Workflow — that maps each tier to a specific library.**
|
|
19
|
+
|
|
20
|
+
### Conceptual model
|
|
21
|
+
|
|
22
|
+
Three distinct tiers of LLM-based computation are recognized in this policy. Every component MUST be classified into exactly one tier:
|
|
23
|
+
|
|
24
|
+
| Tier | What it is | Library |
|
|
25
|
+
|---|---|---|
|
|
26
|
+
| **LLM** | A request → response prompt exchange with a model. May include a conversation history or thread. No autonomous decision-making. | `langchain` / `langchain-openai` |
|
|
27
|
+
| **Agent** | An LLM-based flow driven by a tool-invocation loop that the LLM itself plans and executes. The LLM decides which tools to call and when to stop. | `deepagents` |
|
|
28
|
+
| **Workflow** | A directed graph of nodes that mixes LLM-based nodes (simple LLM calls or agentic loops) with deterministic algorithmic nodes. The graph topology is defined in code, not chosen by the LLM at runtime. | `langgraph` |
|
|
29
|
+
|
|
30
|
+
These tiers nest: in general, a Workflow may contain Agent nodes; an Agent uses LLM calls internally. The tier of a component is determined by its outermost controlling structure.
|
|
31
|
+
|
|
32
|
+
See [agentme-edr-019](019-ai-agents-development-standards.md) for Agent implementation standards and [agentme-edr-020](020-ai-workflow-development-standards.md) for Workflow implementation standards.
|
|
33
|
+
|
|
34
|
+
### Details
|
|
35
|
+
|
|
36
|
+
#### 01-conceptual-model
|
|
37
|
+
|
|
38
|
+
Every component that interacts with an LLM MUST be classified as exactly one of the three tiers defined in the conceptual model table above: **LLM**, **Agent**, or **Workflow**.
|
|
39
|
+
|
|
40
|
+
- Do NOT use the word "agent" to describe a component that only makes a single LLM call without a tool-invocation loop.
|
|
41
|
+
- Do NOT use the word "workflow" to describe a component that is purely an LLM call with no graph structure.
|
|
42
|
+
- When designing a new feature, identify the correct tier first. The tier determines which library and patterns apply (LangChain, deepagents, or LangGraph).
|
|
43
|
+
|
|
44
|
+
**Function calling boundary:**
|
|
45
|
+
|
|
46
|
+
- A **single** function call decided by the LLM (e.g., "call get_weather(location)") is still an LLM-tier interaction if the function is called once and the result is returned to the user.
|
|
47
|
+
- An **iterative** function-calling loop where the LLM observes results and decides next actions autonomously is an Agent (see [agentme-edr-019](019-ai-agents-development-standards.md)).
|
|
48
|
+
|
|
49
|
+
#### 02-llm-framework
|
|
50
|
+
|
|
51
|
+
All direct LLM calls MUST use **LangChain** via the `langchain` packages.
|
|
52
|
+
|
|
53
|
+
- Use `langchain-openai` as the provider integration layer. It supports both OpenAI and Azure OpenAI.
|
|
54
|
+
- **Always configure LLM providers using explicit library attributes** such as `api_key`, `base_url`, `model`, `api_version`, etc. Never rely on environment variables for LLM configuration.
|
|
55
|
+
- Configuration MUST be passed via constructor parameters or configuration objects, making dependencies explicit and testable.
|
|
56
|
+
|
|
57
|
+
**Example of explicit configuration:**
|
|
58
|
+
|
|
59
|
+
```python
from langchain_openai import AzureChatOpenAI

# Azure OpenAI configuration (explicit).
# `config` stands for the application's own settings object (not defined here).
llm = AzureChatOpenAI(
    api_key=config.azure_api_key,
    azure_endpoint=config.azure_endpoint,
    api_version="2024-02-15-preview",
    azure_deployment=config.azure_deployment,
)
```
|
|
68
|
+
|
|
69
|
+
#### 03-llm-observability
|
|
70
|
+
|
|
71
|
+
Enable LangChain auto-tracing at every application entry point by calling `mlflow.langchain.autolog()` during startup, before any LLM call is made.
|
|
72
|
+
|
|
73
|
+
- This captures inputs, outputs, token counts, and latency for every LangChain chain or runnable automatically.
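The startup ordering described above can be sketched as follows. This is a minimal illustration, not a prescribed implementation: `run_entry_point` and `enable_mlflow_tracing` are hypothetical helper names, and only `mlflow.langchain.autolog()` comes from this rule. The lazy import and the injectable `enable_tracing` parameter are assumptions made so the ordering can be exercised without MLflow installed.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def enable_mlflow_tracing() -> None:
    """Turn on LangChain auto-tracing (requires the mlflow package)."""
    import mlflow  # lazy import: this module still loads where mlflow is absent

    mlflow.langchain.autolog()


def run_entry_point(
    main: Callable[[], T],
    enable_tracing: Callable[[], None] = enable_mlflow_tracing,
) -> T:
    """Hypothetical wrapper: enable tracing first, then run the application."""
    enable_tracing()  # must happen before any LLM call made inside main()
    return main()
```

A CLI or HTTP bootstrap would call `run_entry_point(main)` once; tests can pass a no-op `enable_tracing` to stay offline.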
|
|
74
|
+
|
|
75
|
+
#### 04-unit-test-mocking
|
|
76
|
+
|
|
77
|
+
LLM provider calls are external API calls and MUST be mocked in unit tests. Mocking LLM providers enables offline test execution while testing the logic, routing, and state management of LLM calls, agents, and workflows.
|
|
78
|
+
|
|
79
|
+
Use LangChain's built-in fake models from `langchain_core.language_models.fake_chat_models`. Choose the utility based on what the code under test expects from the model:
|
|
80
|
+
|
|
81
|
+
| Utility | When to use |
|
|
82
|
+
|---|---|
|
|
83
|
+
| `FakeListChatModel` | The code only reads the text content of the response (`AIMessage.content`). Returns plain-text `AIMessage` objects from a pre-defined list, in order. |
|
|
84
|
+
| `GenericFakeChatModel` | The code expects tool calls, structured outputs, or needs to inspect the message type beyond plain text. Accepts a list of pre-built `AIMessage` (or `AIMessageChunk`) objects, giving full control over the response structure. |
|
|
85
|
+
|
|
86
|
+
**`FakeListChatModel` — plain text responses:**
|
|
87
|
+
|
|
88
|
+
```python
from langchain_core.language_models.fake_chat_models import FakeListChatModel

def test_document_approval_routing():
    # Responses are returned in order, one per model call
    fake_model = FakeListChatModel(responses=[
        "APPROVE",
        "The document meets all quality criteria."
    ])

    # DocumentWorkflow and input_doc come from the project under test
    workflow = DocumentWorkflow(model=fake_model)
    result = workflow.run(input_doc)

    assert result.status == "approved"
    assert "quality criteria" in result.reasoning
```
|
|
103
|
+
|
|
104
|
+
**`GenericFakeChatModel` — tool-call or structured responses:**
|
|
105
|
+
|
|
106
|
+
```python
from langchain_core.language_models.fake_chat_models import GenericFakeChatModel
from langchain_core.messages import AIMessage

def test_agent_tool_invocation():
    # Simulate the LLM requesting a tool call, then producing a final answer
    tool_call_msg = AIMessage(
        content="",
        tool_calls=[{
            "name": "search_files",
            "args": {"pattern": "*.py"},
            "id": "call_1"
        }]
    )
    final_msg = AIMessage(content="Found 3 Python files.")

    fake_model = GenericFakeChatModel(messages=iter([tool_call_msg, final_msg]))

    # FileAnalyzerAgent is the project class under test
    agent = FileAnalyzerAgent(model=fake_model)
    result = agent.run()

    assert result.summary == "Found 3 Python files."
```
|
|
130
|
+
|
|
131
|
+
**Injectable LLM pattern (required for testability):**
|
|
132
|
+
|
|
133
|
+
Whenever a workflow, agent, or node makes LLM calls, it MUST accept the LLM instance as a constructor parameter or configuration field so that unit tests can inject a fake:
|
|
134
|
+
|
|
135
|
+
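```python
from typing import Optional

from langchain_core.language_models.chat_models import BaseChatModel
from langchain_openai import ChatOpenAI

class DocumentWorkflow:
    def __init__(self, model: Optional[BaseChatModel] = None):
        # Tests pass a fake model; production falls back to the real provider.
        # `config` stands for the application's own settings object.
        self.model = model or ChatOpenAI(
            api_key=config.openai_api_key,
            model="gpt-4"
        )
```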
|
|
143
|
+
|
|
144
|
+
This allows unit tests to inject `FakeListChatModel` or `GenericFakeChatModel` while production code uses the real provider.
|
|
145
|
+
|
|
146
|
+
#### 05-prompt-management
|
|
147
|
+
|
|
148
|
+
Prompt templates MUST be managed explicitly and versioned:
|
|
149
|
+
|
|
150
|
+
- Store prompt templates as separate files in a `prompts/` directory when they exceed 10 lines or are reused across multiple components.
|
|
151
|
+
- Use LangChain `PromptTemplate` or `ChatPromptTemplate` for parameterized prompts.
|
|
152
|
+
|
|
153
|
+
**Example prompt file structure:**
|
|
154
|
+
|
|
155
|
+
```text
|
|
156
|
+
lib/src/<package_name>/
|
|
157
|
+
prompts/
|
|
158
|
+
summarize.txt
|
|
159
|
+
extract_entities.txt
|
|
160
|
+
```
|
|
161
|
+
|
|
162
|
+
**Example usage:**
|
|
163
|
+
|
|
164
|
+
```python
from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate.from_file(
    "prompts/summarize_v1.0.0.txt",
    input_variables=["document"]
)
```
|
|
172
|
+
|
|
173
|
+
## References
|
|
174
|
+
|
|
175
|
+
- [agentme-edr-019](019-ai-agents-development-standards.md) — Agent implementation standards (deepagents, tool-invocation loops)
|
|
176
|
+
- [agentme-edr-020](020-ai-workflow-development-standards.md) — Workflow implementation standards (LangGraph, MLflow run-level tracking)
|
|
177
|
+
- [agentme-edr-004](../principles/004-unit-test-requirements.md) — Unit test requirements including external API mocking guidance
|
|
178
|
+
- [agentme-edr-014](014-python-project-tooling.md) — Python project tooling and structure
|
|
179
|
+
- [agentme-edr-007](../principles/007-project-quality-standards.md) — Project quality standards including AI-tier testing requirements (rule `09-ai-project-testing-requirements`)
|
|
180
|
+
- [agentme-edr-021](021-ai-eval-standards.md) — AI eval standards: folder structure, script requirements, and MLflow tracking
|
|
@@ -0,0 +1,284 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: agentme-edr-policy-019-ai-agents-development-standards
|
|
3
|
+
description: Defines the standard framework and patterns for building AI agents with tool-invocation loops using the deepagents framework. Use when building agents where the LLM autonomously decides which tools to call and when to stop. For simple LLM calls see agentme-edr-018, for workflow orchestration see agentme-edr-020.
|
|
4
|
+
apply-to: AI agent projects that use tool-invocation loops where the LLM decides which tools to call and when to stop
|
|
5
|
+
valid-from: 2026-06-05
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
# agentme-edr-policy-019: AI agents development standards
|
|
9
|
+
|
|
10
|
+
## Context and Problem Statement
|
|
11
|
+
|
|
12
|
+
AI applications often need to give LLMs the ability to autonomously choose and invoke tools to accomplish tasks. Without standardized patterns for agent implementation, projects end up with incompatible approaches to tool definition, state management, and runtime environments.
|
|
13
|
+
|
|
14
|
+
Which framework should be used for building agents with tool-invocation loops, and what are the essential patterns for agent state, tools, and execution environments?
|
|
15
|
+
|
|
16
|
+
## Decision Outcome
|
|
17
|
+
|
|
18
|
+
**Use the deepagents framework for all agent implementations where an LLM autonomously decides which tools to call and when to stop.**
|
|
19
|
+
|
|
20
|
+
### Conceptual model
|
|
21
|
+
|
|
22
|
+
An **Agent** is an LLM-based flow driven by a tool-invocation loop that the LLM itself plans and executes. The LLM decides which tools to call and when to stop. The agent follows a perceive → plan → act → observe cycle autonomously until it reaches a terminal state.
|
|
23
|
+
|
|
24
|
+
### Details
|
|
25
|
+
|
|
26
|
+
#### 01-agent-framework
|
|
27
|
+
|
|
28
|
+
All agent implementations MUST use the **deepagents** framework.
|
|
29
|
+
|
|
30
|
+
- Use deepagents whenever the LLM needs to autonomously select and invoke tools to accomplish a task.
|
|
31
|
+
- The agent MUST follow the perceive → plan → act → observe cycle where the LLM observes tool outputs and decides the next action.
|
|
32
|
+
- All LLM calls within agents MUST follow [agentme-edr-018](018-ai-llm-development-standards.md) for LangChain configuration and observability.
|
|
33
|
+
|
|
34
|
+
**When to use agents vs workflows:**
|
|
35
|
+
|
|
36
|
+
- Use an **agent** when the LLM should autonomously decide the sequence of tool calls based on runtime observations.
|
|
37
|
+
- Use a **workflow** when the execution path is predefined in code, even if individual nodes involve LLM calls or agent subgraphs.
|
|
38
|
+
- When in doubt, prefer workflows (explicit control flow) over agents (autonomous control flow) for maintainability and predictability.
|
|
39
|
+
|
|
40
|
+
#### 02-local-sandbox
|
|
41
|
+
|
|
42
|
+
When an agent requires a **local sandbox** — an isolated environment where the agent can read files, glob-search directories, and execute shell commands — use the **[deepagents](https://github.com/deepagents/deepagents) framework** to provide that sandbox.
|
|
43
|
+
|
|
44
|
+
**When to apply this rule:**
|
|
45
|
+
|
|
46
|
+
Use deepagents sandbox whenever ANY of the following is true:
|
|
47
|
+
- The agent needs to execute shell commands or scripts in a controlled environment.
|
|
48
|
+
- The agent needs to list, read, or search files across multiple directories at runtime.
|
|
49
|
+
- The agent operates on user-supplied or generated file trees that must not escape a sandboxed boundary.
|
|
50
|
+
|
|
51
|
+
**Integration requirements:**
|
|
52
|
+
|
|
53
|
+
- Initialize the sandbox at the start of the agent run and shut it down in the same `try/finally` block.
|
|
54
|
+
- Pass the sandbox handle into the agent's state so all tool calls share the same sandbox instance.
|
|
55
|
+
- If the host-side code needs to pass files into the sandbox (e.g. generated config or input data), create a temporary directory with `tempfile.mkdtemp()`, write the files there, and mount it into the sandbox. Clean it up in the `finally` block.
|
|
56
|
+
- Replace hand-rolled `read_file`, `search_files`, and `grep_file` tool implementations with the equivalent tools provided by deepagents.
|
|
57
|
+
|
|
58
|
+
**Example:**
|
|
59
|
+
|
|
60
|
+
```python
import shutil
import tempfile
from pathlib import Path
from typing import List

from deepagents import Sandbox

# AnalysisResult and FileAnalysisAgent are project-specific types defined elsewhere
def run_file_analysis_agent(input_files: List[Path]) -> AnalysisResult:
    tmp_dir = tempfile.mkdtemp()
    sandbox = None  # assigned once the sandbox starts, so cleanup never hits an unbound name
    try:
        # Copy input files to temp directory
        for f in input_files:
            shutil.copy(f, tmp_dir)

        # Initialize sandbox with mounted directory
        sandbox = Sandbox(mount_paths={tmp_dir: "/workspace"})

        # Run agent with sandbox
        agent = FileAnalysisAgent(sandbox=sandbox)
        result = agent.run()

        return result
    finally:
        # Shut down only if the sandbox was actually created
        if sandbox is not None:
            sandbox.shutdown()
        shutil.rmtree(tmp_dir)
```
|
|
83
|
+
|
|
84
|
+
#### 03-agent-state-management
|
|
85
|
+
|
|
86
|
+
**State type naming:**
|
|
87
|
+
|
|
88
|
+
- Agent state types MUST end with the `_agent_state` suffix (e.g., `file_analyzer_agent_state`)
|
|
89
|
+
- Follow [agentme-edr-020](020-ai-workflow-development-standards.md) rule `11-state-type-conventions` when agents are used as workflow nodes
|
|
90
|
+
|
|
91
|
+
#### 04-tool-definition-patterns
|
|
92
|
+
|
|
93
|
+
Tools provided to agents MUST follow these patterns:
|
|
94
|
+
|
|
95
|
+
**Tool signature:**
|
|
96
|
+
|
|
97
|
+
```python
from typing import Any, Dict

def tool_name(arg1: str, arg2: int) -> Dict[str, Any]:
    """
    Brief description of what the tool does.

    Args:
        arg1: Description of arg1
        arg2: Description of arg2

    Returns:
        Dictionary with tool execution results
    """
    # Tool implementation
    return {"status": "success", "result": ...}
```
|
|
114
|
+
|
|
115
|
+
**Tool requirements:**
|
|
116
|
+
|
|
117
|
+
- Tool names MUST be descriptive action verbs (e.g., `search_files`, `execute_command`, `read_document`)
|
|
118
|
+
- Tool docstrings MUST clearly describe the tool's purpose, arguments, and return value (the LLM reads these)
|
|
119
|
+
- Tools MUST return structured data (dictionaries or dataclasses), not bare strings or untyped values
|
|
120
|
+
- Tools MUST handle errors gracefully and return error information in the result structure, not raise exceptions
|
|
121
|
+
- Tools that interact with external systems MUST be placed in `adapters/connectors/` per [agentme-edr-026](026-pragmatic-hexagonal-architecture.md)
|
|
122
|
+
|
|
123
|
+
**Error handling in tools:**
|
|
124
|
+
|
|
125
|
+
```python
from pathlib import Path
from typing import Any, Dict

def search_files(pattern: str, directory: str = ".") -> Dict[str, Any]:
    """Search for files matching a glob pattern."""
    try:
        matches = list(Path(directory).glob(pattern))
        return {
            "status": "success",
            "matches": [str(m) for m in matches],
            "count": len(matches)
        }
    except Exception as e:
        # Report the failure in the result so the agent can observe it
        return {
            "status": "error",
            "error_message": str(e),
            "error_type": type(e).__name__
        }
```
|
|
142
|
+
|
|
143
|
+
#### 05-agent-error-handling-and-recovery
|
|
144
|
+
|
|
145
|
+
Agents MUST implement robust error handling:
|
|
146
|
+
|
|
147
|
+
**Maximum iteration limits:**
|
|
148
|
+
|
|
149
|
+
- Every agent MUST have a maximum iteration limit to prevent infinite loops
|
|
150
|
+
- The maximum SHOULD be configurable, and reaching it SHOULD be logged
|
|
151
|
+
- When the maximum is reached, the agent MUST return a structured failure result, not raise an exception
|
|
152
|
+
|
|
153
|
+
**Tool failure handling:**
|
|
154
|
+
|
|
155
|
+
- When a tool returns an error, the agent MUST be able to observe the error and decide on recovery actions
|
|
156
|
+
- Tools MUST NOT raise exceptions for expected failures (network errors, file not found, etc.)
|
|
157
|
+
- Agents MAY implement retry logic with exponential backoff for transient failures
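The retry option above can be sketched as follows, under stated assumptions: `call_tool_with_retry` and `TRANSIENT_ERROR_TYPES` are hypothetical names, the tool is wrapped as a zero-argument callable, and results use the `status` / `error_type` fields shown in the tool error-handling example. Which error types count as transient is a project decision; the set below is only illustrative.

```python
import time
from typing import Any, Callable, Dict, FrozenSet

# Assumption: these error_type values mark failures worth retrying.
TRANSIENT_ERROR_TYPES: FrozenSet[str] = frozenset({"TimeoutError", "ConnectionError"})


def call_tool_with_retry(
    tool: Callable[[], Dict[str, Any]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call a zero-argument tool, retrying transient error results with backoff."""
    result: Dict[str, Any] = {
        "status": "error",
        "error_message": "tool was not called",
        "error_type": "NotCalled",
    }
    for attempt in range(max_attempts):
        result = tool()
        if result.get("status") != "error":
            return result
        if result.get("error_type") not in TRANSIENT_ERROR_TYPES:
            return result  # permanent failure: let the agent observe it immediately
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * (2 ** attempt))
    return result  # still failing after max_attempts: return the last error result
```

The final error result is returned rather than raised, so the agent can still observe it and choose a recovery action.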
|
|
158
|
+
|
|
159
|
+
**Terminal states:**
|
|
160
|
+
|
|
161
|
+
Agents MUST recognize and handle three terminal states:
|
|
162
|
+
- **Success**: Goal achieved, task complete
|
|
163
|
+
- **Failure**: Goal cannot be achieved, give up gracefully
|
|
164
|
+
- **Timeout**: Maximum iterations reached, return partial results if possible
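The iteration limit and the three terminal states can be illustrated with a framework-free sketch. Everything here is hypothetical: `run_agent_loop`, `AgentRunResult`, and the `decide` callback (a stand-in for the LLM planning step) are not part of deepagents, and the action dictionary shape is an assumption made for the example.

```python
import logging
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

logger = logging.getLogger(__name__)

Observation = Dict[str, Any]
Action = Dict[str, Any]


@dataclass
class AgentRunResult:
    status: str  # "success", "failure", or "timeout"
    iterations: int
    observations: List[Observation] = field(default_factory=list)


def run_agent_loop(
    decide: Callable[[List[Observation]], Action],
    tools: Dict[str, Callable[..., Observation]],
    max_iterations: int = 10,
) -> AgentRunResult:
    """Run decide/act/observe until a terminal state or the iteration limit."""
    observations: List[Observation] = []
    for iteration in range(1, max_iterations + 1):
        action = decide(observations)  # stand-in for the LLM planning step
        if action["type"] == "finish":
            return AgentRunResult("success", iteration, observations)
        if action["type"] == "give_up":
            return AgentRunResult("failure", iteration, observations)
        tool = tools.get(action["tool"])
        if tool is None:
            # Unknown tool: record an error the next decide() call can observe.
            observations.append({"status": "error", "error_type": "UnknownTool",
                                 "error_message": action["tool"]})
            continue
        observations.append(tool(**action.get("args", {})))
    logger.warning("max_iterations=%d reached", max_iterations)
    # Timeout: return the partial observations instead of raising.
    return AgentRunResult("timeout", max_iterations, observations)
```

Returning a result object for all three outcomes keeps the caller's handling uniform; only programming errors surface as exceptions.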
|
|
165
|
+
|
|
166
|
+
#### 06-agent-naming-conventions
|
|
167
|
+
|
|
168
|
+
Agent class names MUST follow the pattern `<Purpose>Agent` where `<Purpose>` describes what the agent does:
|
|
169
|
+
|
|
170
|
+
**Good names:**
|
|
171
|
+
- `FileAnalyzerAgent` — analyzes files
|
|
172
|
+
- `CodeReviewerAgent` — reviews code
|
|
173
|
+
- `DataExtractorAgent` — extracts data from documents
|
|
174
|
+
|
|
175
|
+
**Bad names (FORBIDDEN):**
|
|
176
|
+
- `Agent` (too generic)
|
|
177
|
+
- `MainAgent` (not descriptive)
|
|
178
|
+
- `MyAgent` (not descriptive)
|
|
179
|
+
- `Agent1` (numbered, not semantic)
|
|
180
|
+
|
|
181
|
+
When agents are used as nodes in workflows, the node name MUST use the `_agent` suffix per [agentme-edr-020](020-ai-workflow-development-standards.md) rule `09-node-naming-conventions`.
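The class-naming rule above could be checked mechanically, for example in a lint step or a unit test. The sketch below is one possible check, not part of the policy: `is_valid_agent_class_name` is a hypothetical helper, and the regular expression is an assumed reading of `<Purpose>Agent` (an uppercase-initial purpose followed by the literal suffix `Agent`).

```python
import re

# Names the policy lists as forbidden even though they end in "Agent".
FORBIDDEN_AGENT_NAMES = frozenset({"Agent", "MainAgent", "MyAgent"})

# An uppercase-initial purpose (at least one character) followed by "Agent".
AGENT_NAME_PATTERN = re.compile(r"^[A-Z][A-Za-z0-9]*Agent$")


def is_valid_agent_class_name(name: str) -> bool:
    """Return True when `name` looks like `<Purpose>Agent` and is not forbidden."""
    if name in FORBIDDEN_AGENT_NAMES:
        return False
    return AGENT_NAME_PATTERN.match(name) is not None
```

A pattern check cannot judge whether the purpose is actually descriptive, so it complements code review rather than replacing it.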
|
|
182
|
+
|
|
183
|
+
#### 07-agent-observability
|
|
184
|
+
|
|
185
|
+
Agent execution MUST be observable through logging and tracing:
|
|
186
|
+
|
|
187
|
+
- Log each iteration of the perceive → plan → act → observe cycle with iteration number and tool selection.
|
|
188
|
+
- Use structured logging (JSON) with fields: `iteration`, `tool_selected`, `tool_result_status`, `decision`.
|
|
189
|
+
- For LLM calls within agents, follow [agentme-edr-018](018-ai-llm-development-standards.md) rule `03-llm-observability`.
|
|
190
|
+
- When agents run as workflow nodes, MLflow tracking from the parent workflow automatically captures agent-level traces.
|
|
191
|
+
|
|
192
|
+
**Example structured log entry:**
|
|
193
|
+
|
|
194
|
+
```json
|
|
195
|
+
{
|
|
196
|
+
"timestamp": "2026-06-05T10:30:45Z",
|
|
197
|
+
"agent": "FileAnalyzerAgent",
|
|
198
|
+
"iteration": 3,
|
|
199
|
+
"tool_selected": "search_files",
|
|
200
|
+
"tool_args": {"pattern": "*.py"},
|
|
201
|
+
"tool_result_status": "success",
|
|
202
|
+
"decision": "continue"
|
|
203
|
+
}
|
|
204
|
+
```
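An entry like the one above could be produced with the standard library alone. This is a minimal sketch: `format_iteration_log` is a hypothetical helper, and the optional `now` parameter (expected to be a timezone-aware `datetime`) is an assumption added so the output is reproducible in tests.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def format_iteration_log(
    agent: str,
    iteration: int,
    tool_selected: str,
    tool_args: Dict[str, Any],
    tool_result_status: str,
    decision: str,
    now: Optional[datetime] = None,
) -> str:
    """Serialize one agent iteration as a single JSON log line."""
    # Normalize to UTC so the trailing "Z" is always accurate.
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return json.dumps({
        "timestamp": moment.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "agent": agent,
        "iteration": iteration,
        "tool_selected": tool_selected,
        "tool_args": tool_args,
        "tool_result_status": tool_result_status,
        "decision": decision,
    })
```

The returned string can be passed to any logger, for example `logging.getLogger("agent").info(format_iteration_log(...))`.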
|
|
205
|
+
|
|
206
|
+
#### 08-agent-unit-testing
|
|
207
|
+
|
|
208
|
+
Agent LLM calls are external API calls and MUST be mocked in unit tests per [agentme-edr-018](018-ai-llm-development-standards.md) rule `04-unit-test-mocking`.
|
|
209
|
+
|
|
210
|
+
Because agents drive a tool-invocation loop — where the LLM decides which tools to call — the fake model must return **tool-call messages** followed by a final answer. Use **`GenericFakeChatModel`** for this:
|
|
211
|
+
|
|
212
|
+
```python
from langchain_core.language_models.fake_chat_models import GenericFakeChatModel
from langchain_core.messages import AIMessage

def test_file_analyzer_agent_calls_search_then_stops():
    # Iteration 1: LLM requests a tool call
    tool_call_msg = AIMessage(
        content="",
        tool_calls=[{
            "name": "search_files",
            "args": {"pattern": "*.py", "directory": "/workspace"},
            "id": "call_1"
        }]
    )
    # Iteration 2: LLM produces a final answer after observing the tool result
    final_msg = AIMessage(content="Found 3 Python files matching the pattern.")

    fake_model = GenericFakeChatModel(messages=iter([tool_call_msg, final_msg]))

    agent = FileAnalyzerAgent(model=fake_model)
    result = agent.run(directory="/workspace")

    assert result.status == "success"
    assert "3 Python files" in result.summary
```
|
|
237
|
+
|
|
238
|
+
Agents MUST be designed so that the LLM instance is injectable (constructor parameter) to allow test doubles. See [agentme-edr-018](018-ai-llm-development-standards.md) rule `04-unit-test-mocking` for the injectable LLM pattern.
|
|
239
|
+
|
|
240
|
+
**`mock_deep_agent`**
|
|
241
|
+
|
|
242
|
+
Place `mock_deep_agent` in a shared test utilities module (e.g., `tests/helpers.py`) so all test files that need it can import it from one location and mock deep_agent instances when needed.
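This policy names `mock_deep_agent` but does not define it, so the following is only one possible shape for the helper. It assumes that `mocker` is the pytest-mock fixture (any object with a compatible `patch(target, return_value=...)` method works), that `target` is the import path of the factory that builds the agent, and that the code under test runs the agent through an `invoke(...)` method. None of these assumptions is confirmed by the policy.

```python
from typing import Any
from unittest.mock import MagicMock


def mock_deep_agent(mocker: Any, target: str, output: Any) -> MagicMock:
    """Patch the agent factory at `target` so it returns a canned fake agent.

    Hypothetical helper: the factory is replaced with a mock whose return
    value is a fake agent, and the fake agent's invoke() returns `output`.
    """
    fake_agent = MagicMock(name="fake_deep_agent")
    fake_agent.invoke.return_value = output
    # Every call to the patched factory now yields the same fake agent.
    mocker.patch(target, return_value=fake_agent)
    return fake_agent
```

Returning the fake agent lets a test also assert on how it was called, for example `fake_agent.invoke.assert_called_once()`.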
|
|
243
|
+
|
|
244
|
+
**Example usage:**
|
|
245
|
+
|
|
246
|
+
```python
from tests.helpers import mock_deep_agent

def test_workflow_calls_subagent(mocker):
    mock_deep_agent(
        mocker,
        "mypackage.nodes.analysis_node.create_workflow_agent",
        output={"status": "success", "findings": ["issue A"]}
    )

    result = run_analysis_workflow(input_data)

    assert result.findings == ["issue A"]
```
|
|
260
|
+
|
|
261
|
+
#### 09-agent-composition
|
|
262
|
+
|
|
263
|
+
When multiple agents are needed:
|
|
264
|
+
|
|
265
|
+
- **Single agent with multiple tools:** Use when tools share a common goal and context (e.g., a code analysis agent with `read_file`, `search_code`, and `analyze_pattern` tools).
|
|
266
|
+
- **Multiple agents as workflow nodes:** Use when agents have distinct responsibilities and outputs that feed into each other. Orchestrate them using LangGraph per [agentme-edr-020](020-ai-workflow-development-standards.md).
|
|
267
|
+
- Do NOT create nested agent loops (agent calling agent autonomously). Use workflows for multi-agent orchestration.
|
|
268
|
+
|
|
269
|
+
**Decision criteria:**
|
|
270
|
+
|
|
271
|
+
| Pattern | When to use |
|
|
272
|
+
|---|---|
|
|
273
|
+
| Single agent + tools | All tools serve the same goal; agent completes in one session |
|
|
274
|
+
| Multiple workflow-orchestrated agents | Each agent has a distinct goal; outputs flow between agents; deterministic sequencing needed |
|
|
275
|
+
| Nested agents (FORBIDDEN) | Never — always use workflow orchestration instead |
|
|
276
|
+
|
|
277
|
+
## References
|
|
278
|
+
|
|
279
|
+
- [agentme-edr-018](018-ai-llm-development-standards.md) — LLM development standards (LangChain configuration, mocking patterns)
|
|
280
|
+
- [agentme-edr-020](020-ai-workflow-development-standards.md) — Workflow development standards (using agents as workflow nodes)
|
|
281
|
+
- [agentme-edr-026](026-pragmatic-hexagonal-architecture.md) — Hexagonal architecture (tool placement in adapters/connectors)
|
|
282
|
+
- [agentme-edr-014](014-python-project-tooling.md) — Python project tooling and structure
|
|
283
|
+
- [agentme-edr-007](../principles/007-project-quality-standards.md) — Project quality standards including AI-tier testing requirements (rule `09-ai-project-testing-requirements`)
|
|
284
|
+
- [agentme-edr-021](021-ai-eval-standards.md) — AI eval standards: folder structure, script requirements, and MLflow tracking
|