agentme 0.15.0 → 0.17.0
- package/.xdrs/agentme/edrs/application/010-golang-project-tooling.md +1 -1
- package/.xdrs/agentme/edrs/application/015-cli-tool-standards.md +4 -5
- package/.xdrs/agentme/edrs/application/018-ai-llm-development-standards.md +5 -4
- package/.xdrs/agentme/edrs/application/019-ai-agents-development-standards.md +78 -167
- package/.xdrs/agentme/edrs/application/020-ai-agents-quality-standards.md +182 -0
- package/.xdrs/agentme/edrs/application/{020-ai-workflow-development-standards.md → 021-ai-workflow-development-standards.md} +6 -5
- package/.xdrs/agentme/edrs/application/025-ai-agent-xdrs-knowledge-layer.md +12 -7
- package/.xdrs/agentme/edrs/application/{021-ai-eval-standards.md → 028-ai-eval-standards.md} +4 -3
- package/.xdrs/agentme/edrs/devops/008-common-targets.md +28 -3
- package/.xdrs/agentme/edrs/devops/027-environment-variable-configuration.md +158 -0
- package/.xdrs/agentme/edrs/index.md +5 -3
- package/.xdrs/agentme/edrs/principles/007-project-quality-standards.md +5 -2
- package/.xdrs/agentme/edrs/principles/022-secrets-management.md +20 -0
- package/package.json +2 -2
|
@@ -82,7 +82,7 @@ Direct installation of project-required Go CLIs with `go install ...@latest` as
|
|
|
82
82
|
- Outbound adapters live under `adapters/connectors/` with one subfolder per external resource, named descriptively (e.g., `postgres/`, `stripe-api/`, `redis-cache/`).
|
|
83
83
|
- `shared/` must contain only infrastructure-agnostic utilities — not business rules or domain logic.
|
|
84
84
|
- Packages are flat by default; sub-packages are only introduced when a feature package itself exceeds ~400 lines or has clearly separable sub-concerns.
|
|
85
|
-
- Application MAY import from Adapters when it simplifies the design (pragmatic coupling per edr-
|
|
85
|
+
- Application MAY import from Adapters when it simplifies the design (pragmatic coupling per edr-022 rule 05).
|
|
86
86
|
- Consumer examples for reusable libraries belong in a sibling `examples/` folder and MUST import the public module path rather than reaching into internal source paths. Because Go libraries are not typically consumed from a local packaged artifact, local example validation may use a temporary module replacement for resolution, but the import path MUST remain the public module path.
|
|
87
87
|
|
|
88
88
|
#### go.mod
|
|
@@ -53,11 +53,10 @@ This keeps the user-facing command predictable while preserving a clean library
|
|
|
53
53
|
#### Configuration
|
|
54
54
|
|
|
55
55
|
- Prefer flags and positional arguments for simple inputs.
|
|
56
|
-
- When configuration becomes long, nested, or repetitive,
|
|
56
|
+
- When configuration becomes long, nested, or repetitive, use a YAML config file instead of pushing all values into flags. See [agentme-edr-027](../devops/027-environment-variable-configuration.md) for when `.env` values should be referenced from within that file.
|
|
57
57
|
- By default, config-file discovery and loading must happen in the CLI layer, not in the application layer.
|
|
58
|
-
- When a config file is supported, the CLI
|
|
59
|
-
- The CLI
|
|
60
|
-
- For JavaScript tools, `cosmiconfig` is an acceptable implementation. Equivalent discovery libraries are acceptable in other ecosystems.
|
|
58
|
+
- When a config file is supported, the CLI must try to load a YAML file from `[cwd]/[tool-name].yml` by default.
|
|
59
|
+
- The CLI must also support an explicit config path flag such as `--config`.
|
|
61
60
|
- The application layer must not depend on the presence of the config file; it should receive parsed configuration values from the CLI layer.
|
|
62
61
|
- The application layer may load or parse config files only when that behavior is an explicit requirement of the application contract for non-CLI consumers as well.
|
|
63
62
|
|
|
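The config-file rules above (default lookup at `[cwd]/[tool-name].yml`, an explicit `--config` flag, discovery kept in the CLI layer) can be sketched roughly as follows. This is a minimal illustration, not the package's implementation: `resolve_config_path` and `build_parser` are hypothetical names, and YAML parsing is left out so the sketch stays within the standard library.

```python
import argparse
from pathlib import Path
from typing import Optional


def build_parser(tool_name: str) -> argparse.ArgumentParser:
    """Build a CLI parser that accepts an explicit config path (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog=tool_name)
    parser.add_argument("--config", default=None, help="explicit path to a YAML config file")
    return parser


def resolve_config_path(tool_name: str, cwd: Path, explicit: Optional[str] = None) -> Optional[Path]:
    """Return the config file the CLI layer should load, or None when there is none."""
    if explicit is not None:
        # An explicit --config value must point at an existing file.
        path = Path(explicit)
        if not path.is_file():
            raise FileNotFoundError(f"config file not found: {path}")
        return path
    # Default discovery: [cwd]/[tool-name].yml
    default = cwd / f"{tool_name}.yml"
    return default if default.is_file() else None
```

The CLI layer would parse the resolved file and hand plain values to the application layer, which never needs to know that a file existed.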
@@ -106,4 +105,4 @@ This keeps the user-facing command predictable while preserving a clean library
|
|
|
106
105
|
- [agentme-edr-009](../principles/009-error-handling.md) - Process error signaling and error handling expectations
|
|
107
106
|
- [agentme-edr-010](010-golang-project-tooling.md) - Go CLI structure and verbose logging baseline
|
|
108
107
|
- [agentme-edr-014](014-python-project-tooling.md) - Python packaging and CLI entry-point guidance
|
|
109
|
-
- [
|
|
108
|
+
- [agentme-edr-027](../devops/027-environment-variable-configuration.md) - Environment variable configuration files; defines how `.env` values are referenced from YAML config files
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: agentme-edr-policy-018-ai-llm-development-standards
|
|
3
|
-
description: Defines the standard framework, provider configuration, observability approach, and LLM mocking patterns for simple LLM calls in Python. Use when building, reviewing, or scaffolding any code that makes direct LLM calls using LangChain, manages prompt context, or handles conversation history. For agentic patterns see agentme-edr-019, for workflow patterns see agentme-edr-
|
|
3
|
+
description: Defines the standard framework, provider configuration, observability approach, and LLM mocking patterns for simple LLM calls in Python. Use when building, reviewing, or scaffolding any code that makes direct LLM calls using LangChain, manages prompt context, or handles conversation history. For agentic patterns see agentme-edr-019, for workflow patterns see agentme-edr-021.
|
|
4
4
|
apply-to: Python projects that make direct LLM calls, manage prompt context, or handle conversation threads
|
|
5
5
|
valid-from: 2026-06-05
|
|
6
6
|
---
|
|
@@ -29,7 +29,7 @@ Three distinct tiers of LLM-based computation are recognized in this policy. Eve
|
|
|
29
29
|
|
|
30
30
|
These tiers nest: in general, a Workflow may contain Agent nodes; an Agent uses LLM calls internally. The tier of a component is determined by its outermost controlling structure.
|
|
31
31
|
|
|
32
|
-
See [agentme-edr-019](019-ai-agents-development-standards.md) for Agent implementation standards and [agentme-edr-
|
|
32
|
+
See [agentme-edr-019](019-ai-agents-development-standards.md) for Agent implementation standards and [agentme-edr-021](021-ai-workflow-development-standards.md) for Workflow implementation standards.
|
|
33
33
|
|
|
34
34
|
### Details
|
|
35
35
|
|
|
@@ -71,6 +71,7 @@ llm = ChatOpenAI(
|
|
|
71
71
|
Enable LangChain auto-tracing at every application entry point by calling `mlflow.langchain.autolog()` during startup, before any LLM call is made.
|
|
72
72
|
|
|
73
73
|
- This captures inputs, outputs, token counts, and latency for every LangChain chain or runnable automatically.
|
|
74
|
+
- The project Makefile MUST expose a `dev-mlflow` target to start a local MLflow tracking server for development inspection, per [agentme-edr-008](../devops/008-common-targets.md) rule `09-ai-project-dev-targets`.
|
|
74
75
|
|
|
75
76
|
#### 04-unit-test-mocking
|
|
76
77
|
|
|
@@ -173,8 +174,8 @@ prompt = PromptTemplate.from_file(
|
|
|
173
174
|
## References
|
|
174
175
|
|
|
175
176
|
- [agentme-edr-019](019-ai-agents-development-standards.md) — Agent implementation standards (deepagents, tool-invocation loops)
|
|
176
|
-
- [agentme-edr-
|
|
177
|
+
- [agentme-edr-021](021-ai-workflow-development-standards.md) — Workflow implementation standards (LangGraph, MLflow run-level tracking)
|
|
177
178
|
- [agentme-edr-004](../principles/004-unit-test-requirements.md) — Unit test requirements including external API mocking guidance
|
|
178
179
|
- [agentme-edr-014](014-python-project-tooling.md) — Python project tooling and structure
|
|
179
180
|
- [agentme-edr-007](../principles/007-project-quality-standards.md) — Project quality standards including AI-tier testing requirements (rule `09-ai-project-testing-requirements`)
|
|
180
|
-
- [agentme-edr-
|
|
181
|
+
- [agentme-edr-028](028-ai-eval-standards.md) — AI eval standards: folder structure, script requirements, and MLflow tracking
|
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: agentme-edr-policy-019-ai-agents-development-standards
|
|
3
|
-
description: Defines the
|
|
4
|
-
apply-to: AI agent projects
|
|
3
|
+
description: "Defines the structural patterns and design decisions for building AI agents with tool-invocation loops using the deepagents framework: framework selection, sandbox setup, state naming, agent naming, composition patterns, and system prompt structure. Use when designing or scaffolding a new agent. For tool definitions, error handling, observability, and testing see agentme-edr-020. For simple LLM calls see agentme-edr-018, for workflow orchestration see agentme-edr-021."
|
|
4
|
+
apply-to: AI agent projects — consult when designing agent structure, choosing sandbox approach, defining naming conventions, and composing multi-agent systems
|
|
5
5
|
valid-from: 2026-06-05
|
|
6
6
|
---
|
|
7
7
|
|
|
@@ -9,9 +9,9 @@ valid-from: 2026-06-05
|
|
|
9
9
|
|
|
10
10
|
## Context and Problem Statement
|
|
11
11
|
|
|
12
|
-
AI applications often need to give LLMs the ability to autonomously choose and invoke tools to accomplish tasks. Without standardized patterns for agent
|
|
12
|
+
AI applications often need to give LLMs the ability to autonomously choose and invoke tools to accomplish tasks. Without standardized structural patterns for agent design, projects end up with incompatible approaches to framework selection, sandbox setup, state management, naming, composition, and system prompts.
|
|
13
13
|
|
|
14
|
-
Which framework should be used for building agents
|
|
14
|
+
Which framework should be used for building agents, how should agents be sandboxed, named, composed, and how should their system prompts be structured?
|
|
15
15
|
|
|
16
16
|
## Decision Outcome
|
|
17
17
|
|
|
@@ -50,6 +50,7 @@ Use deepagents sandbox whenever ANY of the following is true:
|
|
|
50
50
|
|
|
51
51
|
**Integration requirements:**
|
|
52
52
|
|
|
53
|
+
- The sandbox MUST always be initialized with `virtual_mode=True` to prevent the agent from reading or writing files outside the mounted workspace. Omitting this flag allows the agent unrestricted host filesystem access, which is a security violation.
|
|
53
54
|
- Initialize the sandbox at the start of the agent run and shut it down in the same `try/finally` block.
|
|
54
55
|
- Pass the sandbox handle into the agent's state so all tool calls share the same sandbox instance.
|
|
55
56
|
- If the host-side code needs to pass files into the sandbox (e.g. generated config or input data), create a temporary directory with `tempfile.mkdtemp()`, write the files there, and mount it into the sandbox. Clean it up in the `finally` block.
|
|
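The integration requirements above (always `virtual_mode=True`, one `try/finally` for setup and teardown, a temporary directory for host-side files) might be sketched like this. `StubSandbox` is a hypothetical stand-in used only to make the lifecycle runnable; the real sandbox class and its shutdown method are not shown in this diff and may differ.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Dict, List, Optional


class StubSandbox:
    """Hypothetical stand-in for the real sandbox; it only records its lifecycle."""

    def __init__(self, mount_paths: Dict[str, str], virtual_mode: bool) -> None:
        if not virtual_mode:
            # Mirrors the policy: running without virtual mode is not allowed.
            raise ValueError("virtual_mode must be True")
        self.mount_paths = mount_paths
        self.closed = False

    def shutdown(self) -> None:
        self.closed = True


def run_with_sandbox(input_files: List[Path]) -> List[str]:
    """Stage files in a temp dir, mount it, and always clean up afterwards."""
    tmp_dir = tempfile.mkdtemp()
    sandbox: Optional[StubSandbox] = None
    try:
        for f in input_files:
            shutil.copy(f, tmp_dir)
        sandbox = StubSandbox(mount_paths={tmp_dir: "/workspace"}, virtual_mode=True)
        # A real agent run would happen here; the sketch just lists staged files.
        return sorted(p.name for p in Path(tmp_dir).iterdir())
    finally:
        if sandbox is not None:
            sandbox.shutdown()
        shutil.rmtree(tmp_dir, ignore_errors=True)
```

Creating the temporary directory before the `try` and removing it in `finally` keeps cleanup in one place even when staging or sandbox start-up fails.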
@@ -69,7 +70,7 @@ def run_file_analysis_agent(input_files: List[Path]) -> AnalysisResult:
|
|
|
69
70
|
shutil.copy(f, tmp_dir)
|
|
70
71
|
|
|
71
72
|
# Initialize sandbox with mounted directory
|
|
72
|
-
sandbox = Sandbox(mount_paths={tmp_dir: "/workspace"})
|
|
73
|
+
sandbox = Sandbox(mount_paths={tmp_dir: "/workspace"}, virtual_mode=True)
|
|
73
74
|
|
|
74
75
|
# Run agent with sandbox
|
|
75
76
|
agent = FileAnalysisAgent(sandbox=sandbox)
|
|
@@ -86,84 +87,9 @@ def run_file_analysis_agent(input_files: List[Path]) -> AnalysisResult:
|
|
|
86
87
|
**State type naming:**
|
|
87
88
|
|
|
88
89
|
- Agent state types MUST end with `_agent_state` suffix (e.g., `file_analyzer_agent_state`)
|
|
89
|
-
- Follow [agentme-edr-
|
|
90
|
+
- Follow [agentme-edr-021](021-ai-workflow-development-standards.md) rule `11-state-type-conventions` when agents are used as workflow nodes
|
|
90
91
|
|
|
91
|
-
#### 04-
|
|
92
|
-
|
|
93
|
-
Tools provided to agents MUST follow these patterns:
|
|
94
|
-
|
|
95
|
-
**Tool signature:**
|
|
96
|
-
|
|
97
|
-
```python
|
|
98
|
-
from typing import Any, Dict
|
|
99
|
-
|
|
100
|
-
def tool_name(arg1: str, arg2: int) -> Dict[str, Any]:
|
|
101
|
-
"""
|
|
102
|
-
Brief description of what the tool does.
|
|
103
|
-
|
|
104
|
-
Args:
|
|
105
|
-
arg1: Description of arg1
|
|
106
|
-
arg2: Description of arg2
|
|
107
|
-
|
|
108
|
-
Returns:
|
|
109
|
-
Dictionary with tool execution results
|
|
110
|
-
"""
|
|
111
|
-
# Tool implementation
|
|
112
|
-
return {"status": "success", "result": ...}
|
|
113
|
-
```
|
|
114
|
-
|
|
115
|
-
**Tool requirements:**
|
|
116
|
-
|
|
117
|
-
- Tool names MUST be descriptive action verbs (e.g., `search_files`, `execute_command`, `read_document`)
|
|
118
|
-
- Tool docstrings MUST clearly describe the tool's purpose, arguments, and return value (the LLM reads these)
|
|
119
|
-
- Tools MUST return structured data (dictionaries or dataclasses), not bare strings or untyped values
|
|
120
|
-
- Tools MUST handle errors gracefully and return error information in the result structure, not raise exceptions
|
|
121
|
-
- Tools that interact with external systems MUST be placed in `adapters/connectors/` per [agentme-edr-026](026-pragmatic-hexagonal-architecture.md)
|
|
122
|
-
|
|
123
|
-
**Error handling in tools:**
|
|
124
|
-
|
|
125
|
-
```python
|
|
126
|
-
def search_files(pattern: str, directory: str = ".") -> Dict[str, Any]:
|
|
127
|
-
"""Search for files matching a glob pattern."""
|
|
128
|
-
try:
|
|
129
|
-
matches = list(Path(directory).glob(pattern))
|
|
130
|
-
return {
|
|
131
|
-
"status": "success",
|
|
132
|
-
"matches": [str(m) for m in matches],
|
|
133
|
-
"count": len(matches)
|
|
134
|
-
}
|
|
135
|
-
except Exception as e:
|
|
136
|
-
return {
|
|
137
|
-
"status": "error",
|
|
138
|
-
"error_message": str(e),
|
|
139
|
-
"error_type": type(e).__name__
|
|
140
|
-
}
|
|
141
|
-
```
|
|
142
|
-
|
|
143
|
-
#### 05-agent-error-handling-and-recovery
|
|
144
|
-
|
|
145
|
-
Agents MUST implement robust error handling:
|
|
146
|
-
|
|
147
|
-
**Maximum iteration limits:**
|
|
148
|
-
|
|
149
|
-
- Every agent MUST have a maximum iteration limit to prevent infinite loops
|
|
150
|
-
- The default maximum SHOULD be configurable and logged when reached
|
|
151
|
-
- When the maximum is reached, the agent MUST return a structured failure result, not raise an exception
|
|
152
|
-
|
|
153
|
-
**Tool failure handling:**
|
|
154
|
-
|
|
155
|
-
- When a tool returns an error, the agent MUST be able to observe the error and decide on recovery actions
|
|
156
|
-
- Tools MUST NOT raise exceptions for expected failures (network errors, file not found, etc.)
|
|
157
|
-
- Agents MAY implement retry logic with exponential backoff for transient failures
|
|
158
|
-
|
|
159
|
-
**Terminal states:**
|
|
160
|
-
|
|
161
|
-
Agents MUST recognize and handle three terminal states:
|
|
162
|
-
- **Success**: Goal achieved, task complete
|
|
163
|
-
- **Failure**: Goal cannot be achieved, give up gracefully
|
|
164
|
-
- **Timeout**: Maximum iterations reached, return partial results if possible
|
|
165
|
-
|
|
166
|
-
#### 06-agent-naming-conventions
|
|
92
|
+
#### 04-agent-naming-conventions
|
|
167
93
|
|
|
168
94
|
Agent class names MUST follow the pattern `<Purpose>Agent` where `<Purpose>` describes what the agent does:
|
|
169
95
|
|
|
@@ -178,92 +104,14 @@ Agent class names MUST follow the pattern `<Purpose>Agent` where `<Purpose>` des
|
|
|
178
104
|
- `MyAgent` (not descriptive)
|
|
179
105
|
- `Agent1` (numbered, not semantic)
|
|
180
106
|
|
|
181
|
-
When agents are used as nodes in workflows, the node name MUST use the `_agent` suffix per [agentme-edr-
|
|
182
|
-
|
|
183
|
-
#### 07-agent-observability
|
|
184
|
-
|
|
185
|
-
Agent execution MUST be observable through logging and tracing:
|
|
186
|
-
|
|
187
|
-
- Log each iteration of the perceive → plan → act → observe cycle with iteration number and tool selection.
|
|
188
|
-
- Use structured logging (JSON) with fields: `iteration`, `tool_selected`, `tool_result_status`, `decision`.
|
|
189
|
-
- For LLM calls within agents, follow [agentme-edr-018](018-ai-llm-development-standards.md) rule `03-llm-observability`.
|
|
190
|
-
- When agents run as workflow nodes, MLflow tracking from the parent workflow automatically captures agent-level traces.
|
|
107
|
+
When agents are used as nodes in workflows, the node name MUST use the `_agent` suffix per [agentme-edr-021](021-ai-workflow-development-standards.md) rule `09-node-naming-conventions`.
|
|
191
108
|
|
|
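The mechanical part of these naming rules (`<Purpose>Agent` class names, the `_agent_state` suffix for state types, the `_agent` suffix for workflow node names) could be checked with a few regular expressions. The helper names below are hypothetical, and a regex cannot judge whether a purpose is actually descriptive, so a name like `MyAgent` would still need human review.

```python
import re

# PascalCase name with at least one character before the "Agent" suffix.
_AGENT_CLASS = re.compile(r"^[A-Z][A-Za-z0-9]*Agent$")
# snake_case name ending in "_agent_state".
_STATE_TYPE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*_agent_state$")
# snake_case name ending in "_agent".
_NODE_NAME = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*_agent$")


def is_valid_agent_class_name(name: str) -> bool:
    """True when the class name has the <Purpose>Agent shape."""
    return _AGENT_CLASS.match(name) is not None


def is_valid_state_type_name(name: str) -> bool:
    """True when the state type name ends with the _agent_state suffix."""
    return _STATE_TYPE.match(name) is not None


def is_valid_node_name(name: str) -> bool:
    """True when the workflow node name ends with the _agent suffix."""
    return _NODE_NAME.match(name) is not None
```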
192
|
-
|
|
193
|
-
|
|
194
|
-
```json
|
|
195
|
-
{
|
|
196
|
-
"timestamp": "2026-06-05T10:30:45Z",
|
|
197
|
-
"agent": "FileAnalyzerAgent",
|
|
198
|
-
"iteration": 3,
|
|
199
|
-
"tool_selected": "search_files",
|
|
200
|
-
"tool_args": {"pattern": "*.py"},
|
|
201
|
-
"tool_result_status": "success",
|
|
202
|
-
"decision": "continue"
|
|
203
|
-
}
|
|
204
|
-
```
|
|
205
|
-
|
|
206
|
-
#### 08-agent-unit-testing
|
|
207
|
-
|
|
208
|
-
Agent LLM calls are external API calls and MUST be mocked in unit tests per [agentme-edr-018](018-ai-llm-development-standards.md) rule `04-unit-test-mocking`.
|
|
209
|
-
|
|
210
|
-
Because agents drive a tool-invocation loop — where the LLM decides which tools to call — the fake model must return **tool-call messages** followed by a final answer. Use **`GenericFakeChatModel`** for this:
|
|
211
|
-
|
|
212
|
-
```python
|
|
213
|
-
from langchain_core.language_models.fake_chat_models import GenericFakeChatModel
|
|
214
|
-
from langchain_core.messages import AIMessage
|
|
215
|
-
|
|
216
|
-
def test_file_analyzer_agent_calls_search_then_stops():
|
|
217
|
-
# Iteration 1: LLM requests a tool call
|
|
218
|
-
tool_call_msg = AIMessage(
|
|
219
|
-
content="",
|
|
220
|
-
tool_calls=[{
|
|
221
|
-
"name": "search_files",
|
|
222
|
-
"args": {"pattern": "*.py", "directory": "/workspace"},
|
|
223
|
-
"id": "call_1"
|
|
224
|
-
}]
|
|
225
|
-
)
|
|
226
|
-
# Iteration 2: LLM produces a final answer after observing the tool result
|
|
227
|
-
final_msg = AIMessage(content="Found 3 Python files matching the pattern.")
|
|
228
|
-
|
|
229
|
-
fake_model = GenericFakeChatModel(messages=iter([tool_call_msg, final_msg]))
|
|
230
|
-
|
|
231
|
-
agent = FileAnalyzerAgent(model=fake_model)
|
|
232
|
-
result = agent.run(directory="/workspace")
|
|
233
|
-
|
|
234
|
-
assert result.status == "success"
|
|
235
|
-
assert "3 Python files" in result.summary
|
|
236
|
-
```
|
|
237
|
-
|
|
238
|
-
Agents MUST be designed so that the LLM instance is injectable (constructor parameter) to allow test doubles. See [agentme-edr-018](018-ai-llm-development-standards.md) rule `04-unit-test-mocking` for the injectable LLM pattern.
|
|
239
|
-
|
|
240
|
-
**`mock_deep_agent`**
|
|
241
|
-
|
|
242
|
-
Place `mock_deep_agent` in a shared test utilities module (e.g., `tests/helpers.py`) so all test files that need it can import it from one location and mock deep_agent instances when needed.
|
|
243
|
-
|
|
244
|
-
**Example usage:**
|
|
245
|
-
|
|
246
|
-
```python
|
|
247
|
-
from tests.helpers import mock_deep_agent
|
|
248
|
-
|
|
249
|
-
def test_workflow_calls_subagent(mocker):
|
|
250
|
-
mock_deep_agent(
|
|
251
|
-
mocker,
|
|
252
|
-
"mypackage.nodes.analysis_node.create_workflow_agent",
|
|
253
|
-
output={"status": "success", "findings": ["issue A"]}
|
|
254
|
-
)
|
|
255
|
-
|
|
256
|
-
result = run_analysis_workflow(input_data)
|
|
257
|
-
|
|
258
|
-
assert result.findings == ["issue A"]
|
|
259
|
-
```
|
|
260
|
-
|
|
261
|
-
#### 09-agent-composition
|
|
109
|
+
#### 05-agent-composition
|
|
262
110
|
|
|
263
111
|
When multiple agents are needed:
|
|
264
112
|
|
|
265
113
|
- **Single agent with multiple tools:** Use when tools share a common goal and context (e.g., a code analysis agent with `read_file`, `search_code`, and `analyze_pattern` tools).
|
|
266
|
-
- **Multiple agents as workflow nodes:** Use when agents have distinct responsibilities and outputs that feed into each other. Orchestrate them using LangGraph per [agentme-edr-
|
|
114
|
+
- **Multiple agents as workflow nodes:** Use when agents have distinct responsibilities and outputs that feed into each other. Orchestrate them using LangGraph per [agentme-edr-021](021-ai-workflow-development-standards.md).
|
|
267
115
|
- Do NOT create nested agent loops (agent calling agent autonomously). Use workflows for multi-agent orchestration.
|
|
268
116
|
|
|
269
117
|
**Decision criteria:**
|
|
@@ -274,11 +122,74 @@ When multiple agents are needed:
|
|
|
274
122
|
| Multiple workflow-orchestrated agents | Each agent has a distinct goal; outputs flow between agents; deterministic sequencing needed |
|
|
275
123
|
| Nested agents (FORBIDDEN) | Never — always use workflow orchestration instead |
|
|
276
124
|
|
|
125
|
+
#### 06-agent-system-prompt-structure
|
|
126
|
+
|
|
127
|
+
Every agent system prompt MUST follow this XML-section template. Sections must appear in this order. Required sections must always be present; optional sections may be omitted when they genuinely do not apply; never reorder them.
|
|
128
|
+
|
|
129
|
+
```xml
<OBJECTIVE>
[A one or two-sentence summary of the agent's main deliverable.
e.g.: Split the incoming file list into logical batches for parallel processing.]
</OBJECTIVE>

<ROLE>
[Defines who the agent is, its area of expertise, and its core persona.
If running inside a workflow, define exactly which node in WORKFLOW_CONTEXT this agent corresponds to.
e.g.: You are the batch_planning_agent (see WORKFLOW_CONTEXT). You are an expert at partitioning large file sets into balanced, directory-aware batches.]
</ROLE>

<INPUT>
[All inputs for this agent invocation. For standalone agents: list only the agent-specific inputs. For workflow agents: list workflow-level inputs shared across all agents first, then agent-specific inputs such as judge outcomes, counters, or intermediate results from upstream nodes.]
</INPUT>

<!-- Optional: include when the agent follows a non-trivial sequence of steps -->
<STEPS>
[Numbered list or chronological steps detailing how the agent should process an incoming request.
e.g.:
1. Analyse the file list and group files by directory.
2. Assign files to batches respecting the size constraints.
3. Emit the JSON output described in OUTPUT_FORMAT.]
</STEPS>

<!-- Optional: include when tool use needs explicit guidance -->
<TOOL_GUIDANCE>
[Explicit instructions on when and how to use external tools, preventing the agent from guessing or using the wrong sequence.
e.g.: Do not call any tools. All reasoning is done in-context using the INPUT fields only.]
</TOOL_GUIDANCE>

<!-- Optional: include when hard constraints need to be stated explicitly -->
<GUARDRAILS>
[Hard, non-negotiable constraints the agent must never violate.
e.g.: NEVER create batches with fewer than 5 or more than 20 files. NEVER split files from the same directory across different batches unless unavoidable.]
</GUARDRAILS>

<OUTPUT_FORMAT>
[A templated example or JSON schema specifying exactly how the final response should look.
e.g.: Respond with a JSON object matching this schema: ...]
</OUTPUT_FORMAT>

<!-- Workflow-only: omit this section for standalone (non-workflow) agents -->
<WORKFLOW_CONTEXT>
[A detailed prose or diagram description of the overall workflow graph so the agent understands its role within the larger flow. Reference the specific node name that maps to this agent. Include enough detail about upstream and downstream nodes so the agent can reason about its context.]
</WORKFLOW_CONTEXT>
```
|
|
176
|
+
|
|
177
|
+
**Rules:**
|
|
178
|
+
|
|
179
|
+
| Section | Required? | Notes |
|---|---|---|
| `<OBJECTIVE>` | Required | One or two sentences summarising the agent's main deliverable. |
| `<ROLE>` | Required | Agent persona and expertise. When inside a workflow, MUST reference its node name from `<WORKFLOW_CONTEXT>`. |
| `<INPUT>` | Required | List ALL inputs. For workflow agents: workflow-level inputs first, then agent-specific inputs. |
| `<STEPS>` | Optional | Include when the agent follows a non-trivial numbered sequence of steps. |
| `<TOOL_GUIDANCE>` | Optional | Include when tool use order or conditions need explicit direction. |
| `<GUARDRAILS>` | Optional | Hard constraints that must never be violated. |
| `<OUTPUT_FORMAT>` | Required | MUST include a concrete schema or templated example; do not leave it vague. |
| `<WORKFLOW_CONTEXT>` | Conditional | MUST be omitted for standalone agents. MUST be present when the agent runs as a node inside a LangGraph workflow. |
|
|
189
|
+
|
|
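The section rules in `06-agent-system-prompt-structure` (fixed order, required sections, `<WORKFLOW_CONTEXT>` only inside workflows) lend themselves to an automated check. The following is a rough sketch under the assumption that each opening tag starts its own line; `check_prompt_sections` is a hypothetical helper, not part of the package.

```python
import re
from typing import List

# Section order and required set, as listed in the policy's rules table.
SECTION_ORDER = [
    "OBJECTIVE", "ROLE", "INPUT", "STEPS",
    "TOOL_GUIDANCE", "GUARDRAILS", "OUTPUT_FORMAT", "WORKFLOW_CONTEXT",
]
REQUIRED = {"OBJECTIVE", "ROLE", "INPUT", "OUTPUT_FORMAT"}


def check_prompt_sections(prompt: str, in_workflow: bool) -> List[str]:
    """Return a list of problems found in a system prompt; empty means it passes."""
    # Opening tags only: closing tags contain "/" and do not match.
    found = re.findall(r"^<([A-Z_]+)>", prompt, flags=re.MULTILINE)
    known = [name for name in found if name in SECTION_ORDER]
    problems: List[str] = []
    for name in SECTION_ORDER:
        if name in REQUIRED and name not in known:
            problems.append(f"missing section: {name}")
    if in_workflow and "WORKFLOW_CONTEXT" not in known:
        problems.append("missing section: WORKFLOW_CONTEXT")
    if not in_workflow and "WORKFLOW_CONTEXT" in known:
        problems.append("WORKFLOW_CONTEXT must be omitted for standalone agents")
    positions = [SECTION_ORDER.index(name) for name in known]
    if positions != sorted(positions):
        problems.append("sections out of order")
    return problems
```

A check like this could run in unit tests over every prompt file so that template drift is caught before review.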
277
190
|
## References
|
|
278
191
|
|
|
279
192
|
- [agentme-edr-018](018-ai-llm-development-standards.md) — LLM development standards (LangChain configuration, mocking patterns)
|
|
280
|
-
- [agentme-edr-
|
|
281
|
-
- [agentme-edr-
|
|
193
|
+
- [agentme-edr-021](021-ai-workflow-development-standards.md) — Workflow development standards (using agents as workflow nodes)
|
|
194
|
+
- [agentme-edr-020](020-ai-agents-quality-standards.md) — Agent implementation quality standards (tool definitions, error handling, observability, unit testing)
|
|
282
195
|
- [agentme-edr-014](014-python-project-tooling.md) — Python project tooling and structure
|
|
283
|
-
- [agentme-edr-007](../principles/007-project-quality-standards.md) — Project quality standards including AI-tier testing requirements (rule `09-ai-project-testing-requirements`)
|
|
284
|
-
- [agentme-edr-021](021-ai-eval-standards.md) — AI eval standards: folder structure, script requirements, and MLflow tracking
|
|
@@ -0,0 +1,182 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: agentme-edr-policy-020-ai-agents-quality-standards
|
|
3
|
+
description: "Defines implementation quality standards for AI agents: tool definition patterns, error handling and recovery, observability, and unit testing. Apply alongside agentme-edr-019 when implementing or reviewing agent code. For agent architecture and structural decisions (framework, sandbox, naming, composition, system prompts) see agentme-edr-019."
|
|
4
|
+
apply-to: AI agent implementation code — apply when writing tools, error handlers, logging, and unit tests for agents
|
|
5
|
+
valid-from: 2026-06-09
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
# agentme-edr-policy-020: AI agents quality standards
|
|
9
|
+
|
|
10
|
+
## Context and Problem Statement
|
|
11
|
+
|
|
12
|
+
Beyond selecting the right framework and structural patterns, agent implementations require consistent standards for how tools are defined, how errors are handled, how execution is observed, and how agents are tested. Without these standards, agents become hard to debug, unreliable, and untestable.
|
|
13
|
+
|
|
14
|
+
How should agent tools be defined, what error handling must agents implement, how should agent execution be observed, and how should agents be unit tested?
|
|
15
|
+
|
|
16
|
+
## Decision Outcome
|
|
17
|
+
|
|
18
|
+
**Agent implementations MUST follow the tool definition, error handling, observability, and unit testing standards defined here, alongside the structural decisions in [agentme-edr-019](019-ai-agents-development-standards.md).**
|
|
19
|
+
|
|
20
|
+
### Details
|
|
21
|
+
|
|
22
|
+
#### 01-tool-definition-patterns
|
|
23
|
+
|
|
24
|
+
Tools provided to agents MUST follow these patterns:
|
|
25
|
+
|
|
26
|
+
**Tool signature:**
|
|
27
|
+
|
|
28
|
+
```python
from typing import Any, Dict

def tool_name(arg1: str, arg2: int) -> Dict[str, Any]:
    """
    Brief description of what the tool does.

    Args:
        arg1: Description of arg1
        arg2: Description of arg2

    Returns:
        Dictionary with tool execution results
    """
    # Tool implementation
    return {"status": "success", "result": ...}
```
|
|
45
|
+
|
|
46
|
+
**Tool requirements:**
|
|
47
|
+
|
|
48
|
+
- Tool names MUST be descriptive action verbs (e.g., `search_files`, `execute_command`, `read_document`)
|
|
49
|
+
- Tool docstrings MUST clearly describe the tool's purpose, arguments, and return value (the LLM reads these)
|
|
50
|
+
- Tools MUST return structured data (dictionaries or dataclasses), not bare strings or untyped values
|
|
51
|
+
- Tools MUST handle errors gracefully and return error information in the result structure, not raise exceptions
|
|
52
|
+
- Tools that interact with external systems MUST be placed in `adapters/connectors/` per [agentme-edr-026](026-pragmatic-hexagonal-architecture.md)
|
|
53
|
+
|
|
54
|
+
**Error handling in tools:**
|
|
55
|
+
|
|
56
|
+
```python
from pathlib import Path
from typing import Any, Dict

def search_files(pattern: str, directory: str = ".") -> Dict[str, Any]:
    """Search for files matching a glob pattern."""
    try:
        matches = list(Path(directory).glob(pattern))
        return {
            "status": "success",
            "matches": [str(m) for m in matches],
            "count": len(matches)
        }
    except Exception as e:
        # Report the failure in the result structure instead of raising.
        return {
            "status": "error",
            "error_message": str(e),
            "error_type": type(e).__name__
        }
```
|
|
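When many tools share the same "return an error structure, never raise" rule, the `try/except` wrapper can be factored into a decorator. The sketch below is one possible approach, not something the policy prescribes: `safe_tool` and `read_document` are hypothetical names, and the result keys follow the `search_files` example.

```python
import functools
from typing import Any, Callable, Dict


def safe_tool(func: Callable[..., Any]) -> Callable[..., Dict[str, Any]]:
    """Wrap a tool so failures come back as structured results (hypothetical helper)."""

    @functools.wraps(func)  # keeps the name and docstring the LLM reads
    def wrapper(*args: Any, **kwargs: Any) -> Dict[str, Any]:
        try:
            return {"status": "success", "result": func(*args, **kwargs)}
        except Exception as exc:
            return {
                "status": "error",
                "error_message": str(exc),
                "error_type": type(exc).__name__,
            }

    return wrapper


@safe_tool
def read_document(path: str) -> str:
    """Read a UTF-8 text document and return its contents."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()
```

`functools.wraps` matters here because the docstring is part of the tool contract: it is what the model sees when choosing a tool.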
73
|
+
|
|
74
|
+
#### 02-agent-error-handling-and-recovery
|
|
75
|
+
|
|
76
|
+
Agents MUST implement robust error handling:
|
|
77
|
+
|
|
78
|
+
**Maximum iteration limits:**
|
|
79
|
+
|
|
80
|
+
- Every agent MUST have a maximum iteration limit to prevent infinite loops
|
|
81
|
+
- The default maximum SHOULD be configurable and logged when reached
|
|
82
|
+
- When the maximum is reached, the agent MUST return a structured failure result, not raise an exception
|
|
83
|
+
|
|
84
|
+
**Tool failure handling:**
|
|
85
|
+
|
|
86
|
+
- When a tool returns an error, the agent MUST be able to observe the error and decide on recovery actions
|
|
87
|
+
- Tools MUST NOT raise exceptions for expected failures (network errors, file not found, etc.)
|
|
88
|
+
- Agents MAY implement retry logic with exponential backoff for transient failures
|
|
89
|
+
|
|
90
|
+
**Terminal states:**
|
|
91
|
+
|
|
92
|
+
Agents MUST recognize and handle three terminal states:
|
|
93
|
+
- **Success**: Goal achieved, task complete
|
|
94
|
+
- **Failure**: Goal cannot be achieved, give up gracefully
|
|
95
|
+
- **Timeout**: Maximum iterations reached, return partial results if possible
|
|
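The iteration limit and the three terminal states can be illustrated with a small loop. This is a simplified sketch under stated assumptions: `AgentResult`, `run_agent_loop`, and the `step` callable with its `decision` values (`continue`, `done`, `give_up`) are hypothetical and stand in for whatever the agent framework actually provides.

```python
import logging
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

logger = logging.getLogger(__name__)


@dataclass
class AgentResult:
    status: str  # "success", "failure", or "timeout"
    iterations: int
    partial_results: List[Dict[str, Any]] = field(default_factory=list)


def run_agent_loop(
    step: Callable[[int], Dict[str, Any]],
    max_iterations: int = 10,
) -> AgentResult:
    """Run one step per iteration until a terminal state or the limit is reached."""
    observations: List[Dict[str, Any]] = []
    for iteration in range(1, max_iterations + 1):
        outcome = step(iteration)
        observations.append(outcome)
        if outcome.get("decision") == "done":
            return AgentResult("success", iteration, observations)
        if outcome.get("decision") == "give_up":
            return AgentResult("failure", iteration, observations)
    # Limit reached: log it and return a structured result instead of raising.
    logger.warning("max iterations reached: %d", max_iterations)
    return AgentResult("timeout", max_iterations, observations)
```

Returning the collected observations on timeout is what lets a caller use partial results rather than losing the whole run.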
96
|
+
|
|
97
|
+
#### 03-agent-observability
|
|
98
|
+
|
|
99
|
+
Agent execution MUST be observable through logging and tracing:
|
|
100
|
+
|
|
101
|
+
- Log each iteration of the perceive → plan → act → observe cycle with iteration number and tool selection.
|
|
102
|
+
- Use structured logging (JSON) with fields: `iteration`, `tool_selected`, `tool_result_status`, `decision`.
|
|
103
|
+
- For LLM calls within agents, follow [agentme-edr-018](018-ai-llm-development-standards.md) rule `03-llm-observability`.
|
|
104
|
+
- When agents run as workflow nodes, MLflow tracking from the parent workflow automatically captures agent-level traces.
|
|
105
|
+
- The project Makefile MUST expose a `dev-mlflow` target to start a local MLflow tracking server for development inspection, per [agentme-edr-008](../devops/008-common-targets.md) rule `09-ai-project-dev-targets`.
|
|
106
|
+
|
|
107
|
+
**Example structured log entry:**
|
|
108
|
+
|
|
109
|
+
```json
{
  "timestamp": "2026-06-05T10:30:45Z",
  "agent": "FileAnalyzerAgent",
  "iteration": 3,
  "tool_selected": "search_files",
  "tool_args": {"pattern": "*.py"},
  "tool_result_status": "success",
  "decision": "continue"
}
```
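A log entry of this shape could be produced with the standard `logging` module, as in the sketch below. `JsonFormatter` and `log_iteration` are hypothetical helpers, not part of the policy; the field names follow the example entry above.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object built from fields passed via `extra`."""

    FIELDS = (
        "agent",
        "iteration",
        "tool_selected",
        "tool_args",
        "tool_result_status",
        "decision",
    )

    def format(self, record: logging.LogRecord) -> str:
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {"timestamp": created.strftime("%Y-%m-%dT%H:%M:%SZ")}
        for name in self.FIELDS:
            if hasattr(record, name):
                entry[name] = getattr(record, name)
        return json.dumps(entry)


def log_iteration(logger, *, agent, iteration, tool_selected, tool_args,
                  tool_result_status, decision):
    """Emit one structured entry per loop iteration."""
    logger.info(
        "agent iteration",
        extra={
            "agent": agent,
            "iteration": iteration,
            "tool_selected": tool_selected,
            "tool_args": tool_args,
            "tool_result_status": tool_result_status,
            "decision": decision,
        },
    )
```

Attaching `JsonFormatter` to a handler keeps call sites free of serialization code.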
|
|
120
|
+
|
|
121
|
+
#### 04-agent-unit-testing
|
|
122
|
+
|
|
123
|
+
Agent LLM calls are external API calls and MUST be mocked in unit tests per [agentme-edr-018](018-ai-llm-development-standards.md) rule `04-unit-test-mocking`.
|
|
124
|
+
|
|
125
|
+
Because agents drive a tool-invocation loop — where the LLM decides which tools to call — the fake model must return **tool-call messages** followed by a final answer. Use **`GenericFakeChatModel`** for this:
|
|
126
|
+
|
|
127
|
+
```python
from langchain_core.language_models.fake_chat_models import GenericFakeChatModel
from langchain_core.messages import AIMessage

def test_file_analyzer_agent_calls_search_then_stops():
    # Iteration 1: LLM requests a tool call
    tool_call_msg = AIMessage(
        content="",
        tool_calls=[{
            "name": "search_files",
            "args": {"pattern": "*.py", "directory": "/workspace"},
            "id": "call_1"
        }]
    )
    # Iteration 2: LLM produces a final answer after observing the tool result
    final_msg = AIMessage(content="Found 3 Python files matching the pattern.")

    fake_model = GenericFakeChatModel(messages=iter([tool_call_msg, final_msg]))

    agent = FileAnalyzerAgent(model=fake_model)
    result = agent.run(directory="/workspace")

    assert result.status == "success"
    assert "3 Python files" in result.summary
```
|
|
152
|
+
|
|
153
|
+
Agents MUST be designed so that the LLM instance is injectable (constructor parameter) to allow test doubles. See [agentme-edr-018](018-ai-llm-development-standards.md) rule `04-unit-test-mocking` for the injectable LLM pattern.
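Constructor injection can be illustrated without any framework. The sketch below uses a hypothetical `ChatModel` protocol and a `SummarizerAgent` class in place of LangChain types; the real pattern is the one defined in agentme-edr-018.

```python
from typing import Iterable, Protocol


class ChatModel(Protocol):
    """Hypothetical minimal model interface used only for this sketch."""

    def invoke(self, prompt: str) -> str: ...


class SummarizerAgent:
    """The model is a constructor parameter, so tests can pass a double."""

    def __init__(self, model: ChatModel) -> None:
        self._model = model

    def run(self, text: str) -> str:
        return self._model.invoke(f"Summarize: {text}")


class FakeModel:
    """Test double that replays canned replies and records the prompts it saw."""

    def __init__(self, replies: Iterable[str]) -> None:
        self._replies = iter(replies)
        self.prompts = []

    def invoke(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return next(self._replies)
```

Production code passes a real chat model to the same constructor, so no patching is needed in tests.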
|
|
154
|
+
|
|
155
|
+
**`mock_deep_agent`**
|
|
156
|
+
|
|
157
|
+
Place `mock_deep_agent` in a shared test utilities module (e.g., `tests/helpers.py`) so that every test file can import it from one location. Use it to replace deep agent instances with test doubles.
|
|
158
|
+
|
|
159
|
+
**Example usage:**
|
|
160
|
+
|
|
161
|
+
```python
from tests.helpers import mock_deep_agent

def test_workflow_calls_subagent(mocker):
    mock_deep_agent(
        mocker,
        "mypackage.nodes.analysis_node.create_workflow_agent",
        output={"status": "success", "findings": ["issue A"]}
    )

    result = run_analysis_workflow(input_data)

    assert result.findings == ["issue A"]
```
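The policy does not show how `mock_deep_agent` is implemented. One possible shape is sketched below, purely as an assumption: it presumes that `mocker` is the pytest-mock fixture, that the patched target is a factory function returning the agent, and that the agent is called through an `invoke` method.

```python
from unittest.mock import MagicMock


def mock_deep_agent(mocker, target, output):
    """Patch the agent factory at `target` so it returns a stub agent.

    Hypothetical sketch: the stub's `invoke` always returns `output`.
    """
    fake_agent = MagicMock(name="fake_deep_agent")
    fake_agent.invoke.return_value = output
    # mocker.patch replaces the factory for the duration of the test.
    mocker.patch(target, return_value=fake_agent)
    return fake_agent
```

Returning the stub lets a test also assert on how the agent was called, for example with `fake_agent.invoke.assert_called_once()`.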
|
|
175
|
+
|
|
176
|
+
## References
|
|
177
|
+
|
|
178
|
+
- [agentme-edr-019](019-ai-agents-development-standards.md) — Agent development standards (framework, sandbox, naming, composition, system prompts)
|
|
179
|
+
- [agentme-edr-018](018-ai-llm-development-standards.md) — LLM development standards (LangChain configuration, mocking patterns)
|
|
180
|
+
- [agentme-edr-026](026-pragmatic-hexagonal-architecture.md) — Hexagonal architecture (tool placement in adapters/connectors)
|
|
181
|
+
- [agentme-edr-007](../principles/007-project-quality-standards.md) — Project quality standards including AI-tier testing requirements (rule `09-ai-project-testing-requirements`)
|
|
182
|
+
- [agentme-edr-028](028-ai-eval-standards.md) — AI eval standards: folder structure, script requirements, and MLflow tracking
|
|
@@ -1,11 +1,11 @@
|
|
|
1
1
|
---
|
|
2
|
-
name: agentme-edr-policy-
|
|
2
|
+
name: agentme-edr-policy-021-ai-workflow-development-standards
|
|
3
3
|
description: Defines the standard toolchain, framework, observability, and workflow patterns for building LangGraph workflows in Python. Use when scaffolding, reviewing, or extending AI workflow projects that orchestrate LLM calls, agents, and algorithmic nodes. For simple LLM calls see agentme-edr-018, for agentic patterns see agentme-edr-019.
|
|
4
4
|
apply-to: AI workflow projects using LangGraph StateGraph built with Python
|
|
5
5
|
valid-from: 2026-06-05
|
|
6
6
|
---
|
|
7
7
|
|
|
8
|
-
# agentme-edr-policy-
|
|
8
|
+
# agentme-edr-policy-021: AI workflow development standards
|
|
9
9
|
|
|
10
10
|
## Context and Problem Statement
|
|
11
11
|
|
|
@@ -33,10 +33,11 @@ Use **MLflow** for all workflow observability and evaluation:
|
|
|
33
33
|
- **LLM-level auto-tracing:** Enable LangChain auto-tracing per [agentme-edr-018](018-ai-llm-development-standards.md) rule `03-llm-observability` by calling `mlflow.langchain.autolog()` during application startup. This captures inputs, outputs, token counts, and latency for every LangChain call within workflow nodes.
|
|
34
34
|
- Log run parameters (model name, temperature, prompt version) and output metrics (accuracy, latency, token counts) using `mlflow.log_param` / `mlflow.log_metric`.
|
|
35
35
|
- Run a local MLflow tracking server with `mlflow ui` to inspect runs during development. Do not require a remote MLflow server for local development.
|
|
36
|
+
- The project Makefile MUST expose a `dev-mlflow` target to start the local MLflow tracking server, per [agentme-edr-008](../devops/008-common-targets.md) rule `09-ai-project-dev-targets`.
|
|
36
37
|
|
|
37
38
|
#### 04-dataset-driven-accuracy-measurement
|
|
38
39
|
|
|
39
|
-
Eval dataset and implementation requirements are defined in [agentme-edr-
|
|
40
|
+
Eval dataset and implementation requirements are defined in [agentme-edr-028](028-ai-eval-standards.md). Testing requirements (when evals are required, release gates) are defined in [agentme-edr-007](../principles/007-project-quality-standards.md) rule `09-ai-project-testing-requirements`.
|
|
40
41
|
|
|
41
42
|
#### 05-flow-documentation
|
|
42
43
|
|
|
@@ -100,7 +101,7 @@ lib/src/<package_name>/
|
|
|
100
101
|
|
|
101
102
|
#### 08-workflow-evals
|
|
102
103
|
|
|
103
|
-
Eval folder structure and script requirements are defined in [agentme-edr-
|
|
104
|
+
Eval folder structure and script requirements are defined in [agentme-edr-028](028-ai-eval-standards.md).
|
|
104
105
|
|
|
105
106
|
#### 09-node-naming-conventions
|
|
106
107
|
|
|
@@ -257,5 +258,5 @@ result = graph.invoke(input_state, config={"thread_id": "session-123"})
|
|
|
257
258
|
- [agentme-edr-026](026-pragmatic-hexagonal-architecture.md) — Adapter/application layer separation that defines the project layout
|
|
258
259
|
- [agentme-edr-014](014-python-project-tooling.md) — Python project tooling and structure
|
|
259
260
|
- [agentme-edr-024](024-ml-dataset-structure.md) — ML dataset structure for eval datasets
|
|
260
|
-
- [agentme-edr-
|
|
261
|
+
- [agentme-edr-028](028-ai-eval-standards.md) — AI eval standards: folder structure, script requirements, and MLflow tracking
|
|
261
262
|
- [agentme-edr-007](../principles/007-project-quality-standards.md) — Project quality standards including AI-tier testing requirements (rule `09-ai-project-testing-requirements`)
|
|
@@ -17,7 +17,7 @@ How should an AI agent project integrate XDRS as its runtime source of truth for
|
|
|
17
17
|
|
|
18
18
|
**Embed XDRS documents in `lib/data/.xdrs/`, instruct the agent to consult them via `AGENTS.md`, equip the agent with sandboxed file tools, and use the deepagents framework when a local sandbox is required.**
|
|
19
19
|
|
|
20
|
-
This policy MUST only be applied when the project explicitly chooses XDRS as its knowledge governance layer. It is not required by [agentme-edr-019](019-ai-agents-development-standards.md) or [agentme-edr-
|
|
20
|
+
This policy MUST only be applied when the project explicitly chooses XDRS as its knowledge governance layer. It is not required by [agentme-edr-019](019-ai-agents-development-standards.md) or [agentme-edr-021](021-ai-workflow-development-standards.md) in general.
|
|
21
21
|
|
|
22
22
|
### Details
|
|
23
23
|
|
|
@@ -59,7 +59,7 @@ Read /AGENTS.md and follow all instructions in it before proceeding.
|
|
|
59
59
|
|
|
60
60
|
#### 02-agent-file-tools
|
|
61
61
|
|
|
62
|
-
Every agent that uses the XDRS knowledge layer MUST use the file tools provided by the deepagents framework. Do not implement hand-rolled alternatives — see [agentme-edr-
|
|
62
|
+
Every agent that uses the XDRS knowledge layer MUST use the file tools provided by the deepagents framework. Do not implement hand-rolled alternatives — see [agentme-edr-019 rule 02-local-sandbox](019-ai-agents-development-standards.md) for the full sandbox and tool requirements.
|
|
63
63
|
|
|
64
64
|
These tools operate over two sandboxed roots (configured in rule `03-local-sandbox`):
|
|
65
65
|
|
|
@@ -72,7 +72,7 @@ These tools operate over two sandboxed roots (configured in rule `03-local-sandb
|
|
|
72
72
|
|
|
73
73
|
#### 03-local-sandbox
|
|
74
74
|
|
|
75
|
-
Follow [agentme-edr-
|
|
75
|
+
Follow [agentme-edr-019 rule 02-local-sandbox](019-ai-agents-development-standards.md) for the general deepagents sandbox setup. When XDRS is in use, add the following mounts to the sandbox configuration:
|
|
76
76
|
|
|
77
77
|
| Source | Content | Deepagents sandbox path |
|
|
78
78
|
|---|---|---|
|
|
@@ -92,8 +92,13 @@ agents_md = Path(temp_root) / "AGENTS.md"
|
|
|
92
92
|
agents_md.write_text(_AGENTS_MD) # content from xdrs-core AGENTS.md template; see rule 01-xdrs-knowledge-layer
|
|
93
93
|
|
|
94
94
|
# Add these mounts alongside the base mounts from agentme-edr-019 rule 02-local-sandbox:
|
|
95
|
-
|
|
96
|
-
|
|
97
|
-
{
|
|
98
|
-
|
|
95
|
+
# (mount_paths uses {src: dst} dict format per agentme-edr-019)
|
|
96
|
+
sandbox = Sandbox(
|
|
97
|
+
mount_paths={
|
|
98
|
+
tmp_dir: "/workspace",
|
|
99
|
+
f"{data_root}/.xdrs": "/.xdrs",
|
|
100
|
+
str(agents_md): "/AGENTS.md",
|
|
101
|
+
},
|
|
102
|
+
virtual_mode=True,
|
|
103
|
+
)
|
|
99
104
|
```
|
package/.xdrs/agentme/edrs/application/{021-ai-eval-standards.md → 028-ai-eval-standards.md}
RENAMED
|
@@ -1,11 +1,11 @@
|
|
|
1
1
|
---
|
|
2
|
-
name: agentme-edr-policy-
|
|
2
|
+
name: agentme-edr-policy-028-ai-eval-standards
|
|
3
3
|
description: Defines how to structure, write, and run eval tests for AI projects — folder layout, script requirements, and MLflow tracking. Use when implementing evals for LLM, Agent, or Workflow projects. For when evals are required see agentme-edr-007 rule 09-ai-project-testing-requirements.
|
|
4
4
|
apply-to: Python AI projects (LLM, Agent, or Workflow tier) that implement eval testing
|
|
5
5
|
valid-from: 2026-06-05
|
|
6
6
|
---
|
|
7
7
|
|
|
8
|
-
# agentme-edr-policy-
|
|
8
|
+
# agentme-edr-policy-028: AI eval standards
|
|
9
9
|
|
|
10
10
|
## Context and Problem Statement
|
|
11
11
|
|
|
@@ -86,5 +86,6 @@ with mlflow.start_run():
|
|
|
86
86
|
- [agentme-edr-007](../principles/007-project-quality-standards.md) — Project quality standards: when evals are required per AI tier (rule `09-ai-project-testing-requirements`) and statistical model eval targets (rule `07-statistical-models-must-have-eval-targets`)
|
|
87
87
|
- [agentme-edr-018](018-ai-llm-development-standards.md) — LLM development standards: LangChain framework and observability
|
|
88
88
|
- [agentme-edr-019](019-ai-agents-development-standards.md) — Agent development standards
|
|
89
|
-
- [agentme-edr-
|
|
89
|
+
- [agentme-edr-021](021-ai-workflow-development-standards.md) — Workflow development standards
|
|
90
|
+
|
|
90
91
|
- [agentme-edr-024](024-ml-dataset-structure.md) — ML dataset structure for eval datasets
|
|
@@ -73,7 +73,7 @@ Targets are organized into five lifecycle groups. Projects must use these names
|
|
|
73
73
|
| Target | Purpose |
|
|
74
74
|
|--------|---------|
|
|
75
75
|
| `setup` | Run `mise install` and any small project bootstrap needed before normal targets work. This is the first command after checkout. |
|
|
76
|
-
| `all` | Alias that runs `build`, `lint`, and `test` in sequence. Must be the default target (i.e., running `make` or the runner with no arguments invokes `all`). Used by developers as a fast pre-push check to verify the software meets minimum quality standards in one command. |
|
|
76
|
+
| `all` | Alias that runs `build`, `lint`, and `test` in sequence. Must be the default target (i.e., running `make` or the runner with no arguments invokes `all`). Used by developers as a fast pre-push check to verify the software meets minimum quality standards in one command. Must only invoke targets that run **offline** — no external credentials, running servers, paid APIs, or environment-specific configuration outside the repository. |
|
|
77
77
|
| `clean` | Remove all temporary or generated files created during build, lint, or test (e.g., `node_modules`, virtual environments, compiled binaries, generated files). Used both locally and in CI for a clean slate. |
|
|
78
78
|
| `dev` | Run the software locally for development (e.g., start a Node.js API server, open a Jupyter notebook, launch a React dev server). May have debugging tools, verbose logging, or hot reloading features enabled. |
|
|
79
79
|
| `run` | Run the software in production mode (e.g., start a compiled binary, launch a production server). No debugging or development-only features should be enabled. |
|
|
@@ -93,13 +93,13 @@ Targets are organized into five lifecycle groups. Projects must use these names
|
|
|
93
93
|
|
|
94
94
|
| Target | Purpose |
|
|
95
95
|
|--------|---------|
|
|
96
|
-
| `lint` | Run **all static quality checks** outside of tests. This MUST include: code formatting validation, code style enforcement, code smell detection, static analysis, dependency audits for known CVEs, security vulnerability scans (e.g., SAST), and project/configuration structure checks. All checks must be non-destructive (read-only); fixes are handled by `lint-fix`. |
|
|
96
|
+
| `lint` | Run **all static quality checks** outside of tests. This MUST include: code formatting validation, code style enforcement, code smell detection, static analysis, dependency audits for known CVEs, security vulnerability scans (e.g., SAST), and project/configuration structure checks. All checks must be non-destructive (read-only); fixes are handled by `lint-fix`. Must only invoke subtargets that run **offline** (no external credentials or services). |
|
|
97
97
|
| `lint-fix` | Automatically fix linting and formatting issues where possible. |
| `lint-format` | *(Optional)* Check code formatting only (e.g., Prettier, gofmt, Black). |
|
|
98
98
|
##### Test group
|
|
99
99
|
|
|
100
100
|
| Target | Purpose |
|
|
101
101
|
|--------|---------|
|
|
102
|
-
| `test` | Run **all tests** required for the project. This MUST include unit tests (with coverage enforcement — the build MUST fail if coverage thresholds are not met) and integration
|
|
102
|
+
| `test` | Run **all offline tests** required for the project. This MUST include unit tests (with coverage enforcement — the build MUST fail if coverage thresholds are not met) and any integration or end-to-end tests that run **offline** (no external servers, credentials, or paid APIs). Normally delegates to `test-unit` and, when offline, `test-integration` in sequence. Suffixed targets that require external dependencies must not be invoked automatically — see rule 08. |
|
|
103
103
|
| `test-unit` | Run unit tests only, including coverage report generation and coverage threshold enforcement. |
|
|
104
104
|
| `test-integration` | *(Optional)* Run integration and end-to-end tests only. Projects without integration tests may omit this target. |
|
|
105
105
|
| `test-smoke` | *(Optional)* Run a fast, minimal subset of tests to verify the software is basically functional. Useful as a post-deploy health check. |
|
|
@@ -150,6 +150,28 @@ The prefix convention ensures developers can infer the purpose of any target wit
|
|
|
150
150
|
|
|
151
151
|
---
|
|
152
152
|
|
|
153
|
+
#### 09-ai-project-dev-targets
|
|
154
|
+
|
|
155
|
+
AI-based projects (LLM, Agent, and Workflow tiers as defined in [agentme-edr-018](../application/018-ai-llm-development-standards.md)) MUST expose a `dev-mlflow` target that starts a local MLflow tracking server for development inspection.
|
|
156
|
+
|
|
157
|
+
**Example implementation:**
|
|
158
|
+
|
|
159
|
+
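```makefile
# `mlflow ui` runs in the foreground, so the browser is scheduled to open first.
# The 2-second delay is an arbitrary guess; `open` is the macOS launcher.
dev-mlflow:
	(sleep 2 && open http://localhost:5000/) &
	mise exec -- mlflow ui --host 0.0.0.0 --port 5000
```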
|
|
164
|
+
|
|
165
|
+
---
|
|
166
|
+
|
|
167
|
+
#### 08-default-targets-must-only-include-offline-subtargets
|
|
168
|
+
|
|
169
|
+
`make all`, `make test`, and `make lint` must include every subtarget that runs **offline**, and only those. A subtarget is offline when it requires no external credentials, no running servers, no paid APIs, and no environment-specific configuration outside the repository.
|
|
170
|
+
|
|
171
|
+
Subtargets that require external dependencies (e.g., `test-integration` against a live database, `test-e2e` against a staging environment, `lint-api` against a remote schema registry) **must** exist as named targets so developers can invoke them explicitly, but **must not** be invoked from `all`, `test`, or `lint`.
|
|
172
|
+
|
|
173
|
+
---
|
|
174
|
+
|
|
153
175
|
#### 06-monorepo-usage
|
|
154
176
|
|
|
155
177
|
In a monorepo, each module has its own `Makefile` with its own `build`, `lint`, `test`, and `deploy` targets scoped to that module. Parent-level Makefiles (at the application or repo root) delegate to child Makefiles in sequence. The parent Makefile should call `$(MAKE) -C <child> <target>` directly, while each child `Makefile` runs its actual tool commands through `mise exec --`.
|
|
@@ -194,6 +216,9 @@ make lint-fix
|
|
|
194
216
|
# run the software in dev mode (may have hot reload, debug tools enabled, verbose logging etc)
|
|
195
217
|
make dev
|
|
196
218
|
|
|
219
|
+
# [AI projects only] start a local MLflow tracking server for development inspection
|
|
220
|
+
make dev-mlflow
|
|
221
|
+
|
|
197
222
|
# run the software in production mode
|
|
198
223
|
make run
|
|
199
224
|
|
|
@@ -0,0 +1,158 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: agentme-edr-policy-027-environment-variable-configuration-files
|
|
3
|
+
description: Defines when to use YAML config files versus .env files for configuration, how to combine them, and how .env is loaded for spawned processes. Use when setting up project configuration for any application, CLI, or library.
|
|
4
|
+
apply-to: All projects that use environment variables for configuration
|
|
5
|
+
valid-from: 2026-06-09
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
# agentme-edr-policy-027: Environment variable configuration files
|
|
9
|
+
|
|
10
|
+
## Context and Problem Statement
|
|
11
|
+
|
|
12
|
+
Projects need a consistent way to define non-secret configuration — service URLs, feature flags, port numbers, runtime modes — that varies across environments. Ad-hoc approaches (hardcoded defaults, scattered exports, application-level dotenv loaders, and flat env-var-only configs) lead to inconsistent behavior and unclear ownership of configuration.
|
|
13
|
+
|
|
14
|
+
CLI tools additionally need to handle multi-attribute invocation configuration without forcing users to provide every value as a flag. At the same time, some of those values may be environment-specific and must not be committed to the repository.
|
|
15
|
+
|
|
16
|
+
How should projects manage environment variable configuration and CLI invocation configuration across local development, deployment stages, and Makefiles?
|
|
17
|
+
|
|
18
|
+
## Decision Outcome
|
|
19
|
+
|
|
20
|
+
**Use YAML config files for CLI invocation configuration with multiple attributes; use `.env` files to supply environment variables to spawned processes and to hold uncommitted values referenced by config files. Load `.env` exclusively at process launch time — never inside application code.**
|
|
21
|
+
|
|
22
|
+
Secrets (API keys, passwords, tokens) must never be placed in `.env` files. Those are handled by [agentme-edr-022](../principles/022-secrets-management.md).
|
|
23
|
+
|
|
24
|
+
### Details
|
|
25
|
+
|
|
26
|
+
#### 01-when-to-use-dotenv
|
|
27
|
+
|
|
28
|
+
Use a `.env` file when either of the following is true:
|
|
29
|
+
|
|
30
|
+
1. **Spawned process needs env vars** — the project launches a process (a deployable service, background worker, or shell script) that reads configuration from OS environment variables such as port numbers or API endpoint URLs.
|
|
31
|
+
2. **Value must not be committed** — a configuration value used in a YAML config file (see rule 07) is environment-specific or sensitive enough to exclude from version control. In that case, store the value in `.env` and reference it from the YAML file using env var substitution (see rule 08).
|
|
32
|
+
|
|
33
|
+
Do not use `.env` as a general-purpose configuration store when a YAML config file is the right tool (see rule 07).
|
|
34
|
+
|
|
35
|
+
Example `.env` for a service with process-level env vars:
|
|
36
|
+
```
|
|
37
|
+
SERVER_URL=http://localhost:8080
|
|
38
|
+
LOG_LEVEL=debug
|
|
39
|
+
FEATURE_FLAG_NEW_UI=false
|
|
40
|
+
```
|
|
41
|
+
|
|
42
|
+
---
|
|
43
|
+
|
|
44
|
+
#### 02-dotenv-not-committed
|
|
45
|
+
|
|
46
|
+
`.env` must be listed in `.gitignore` and must never be committed to the repository. It is intended for local use in standalone projects and libraries that do not have a formal deployment pipeline.
|
|
47
|
+
|
|
48
|
+
---
|
|
49
|
+
|
|
50
|
+
#### 03-dotenv-example-committed
|
|
51
|
+
|
|
52
|
+
A `.env.example` file must be committed in the directory where `.env` is expected. It lists every variable name used in `.env` with placeholder or illustrative values only: no real URLs, credentials, or server names. This file documents the expected configuration without exposing real values.
|
|
53
|
+
|
|
54
|
+
Example `.env.example`:
|
|
55
|
+
```
|
|
56
|
+
SERVER_URL=http://localhost:8080
|
|
57
|
+
LOG_LEVEL=debug
|
|
58
|
+
FEATURE_FLAG_NEW_UI=false
|
|
59
|
+
```
|
|
60
|
+
|
|
61
|
+
---
|
|
62
|
+
|
|
63
|
+
#### 04-stage-specific-dotenv-committed
|
|
64
|
+
|
|
65
|
+
Stage-specific overrides must use the naming convention `.env.[stage]` (e.g., `.env.production`, `.env.staging`, `.env.test`). These files may be committed to the repository because they carry deployment-stage configuration rather than local developer configuration. They are used during deployment pipelines where the stage is known and explicit.
|
|
66
|
+
|
|
67
|
+
The generic `.env` must still not be committed. The distinction is: `.env` is for local, ad-hoc, standalone use; `.env.[stage]` is for deployment pipelines with a defined environment identity.
|
|
68
|
+
|
|
69
|
+
---
|
|
70
|
+
|
|
71
|
+
#### 05-load-in-makefile-before-processes
|
|
72
|
+
|
|
73
|
+
When `.env` defines variables consumed by shell scripts or spawned processes, the Makefile must load and export them before invoking those processes. Use the following pattern at the top of the relevant Makefile or in a shared include:
|
|
74
|
+
|
|
75
|
+
```makefile
|
|
76
|
+
ifneq (,$(wildcard .env))
|
|
77
|
+
include .env
|
|
78
|
+
export
|
|
79
|
+
endif
|
|
80
|
+
```
|
|
81
|
+
|
|
82
|
+
This ensures all variables in `.env` are available as environment variables to every child process spawned by `make`. The `ifneq` guard prevents errors when `.env` does not exist (e.g., in CI or fresh checkouts).
|
|
83
|
+
|
|
84
|
+
---
|
|
85
|
+
|
|
86
|
+
#### 06-no-application-level-dotenv-loading
|
|
87
|
+
|
|
88
|
+
Applications must not load `.env` files directly inside their own code using dotenv libraries or equivalent mechanisms. Configuration must enter the process exclusively as OS-level environment variables, set before the process is launched (by the Makefile, a shell script, CI, or a container runtime).
|
|
89
|
+
|
|
90
|
+
Prohibited patterns:
|
|
91
|
+
|
|
92
|
+
```python
|
|
93
|
+
# Python — disallowed
|
|
94
|
+
from dotenv import load_dotenv
|
|
95
|
+
load_dotenv()
|
|
96
|
+
```
|
|
97
|
+
|
|
98
|
+
```typescript
|
|
99
|
+
// TypeScript — disallowed
|
|
100
|
+
import dotenv from "dotenv";
|
|
101
|
+
dotenv.config();
|
|
102
|
+
```
|
|
103
|
+
|
|
104
|
+
```go
|
|
105
|
+
// Go — disallowed
|
|
106
|
+
godotenv.Load()
|
|
107
|
+
```
|
|
108
|
+
|
|
109
|
+
Permitted pattern: set env vars in the Makefile (see rule 05), then launch the application normally. Inside application code, read configuration only from `os.environ`, `process.env`, or the standard OS environment API for the language.
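In Python, the permitted pattern might look like the sketch below. The variable names come from the `.env` example earlier in this policy; the `Settings` class, the `load_settings` function, and the fallback values are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    server_url: str
    log_level: str
    feature_flag_new_ui: bool


def load_settings(environ: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from OS environment variables only; no .env file is read here."""
    return Settings(
        # Required: raises KeyError if the launcher did not set it.
        server_url=environ["SERVER_URL"],
        # Optional, with assumed defaults.
        log_level=environ.get("LOG_LEVEL", "info"),
        feature_flag_new_ui=environ.get("FEATURE_FLAG_NEW_UI", "false").lower() == "true",
    )
```

Accepting the mapping as a parameter keeps the function testable without touching the real process environment.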
|
|
110
|
+
|
|
111
|
+
This rule prevents two parallel loading paths — OS env and file-based env — from coexisting invisibly in the same process.
|
|
112
|
+
|
|
113
|
+
---
|
|
114
|
+
|
|
115
|
+
#### 07-cli-adapters-use-yaml-config
|
|
116
|
+
|
|
117
|
+
CLI adapters with multiple configuration attributes must use a YAML config file rather than env vars or flags for those attributes. This applies whenever configuration is nested, repetitive, or too verbose for flags alone.
|
|
118
|
+
|
|
119
|
+
The CLI layer is responsible for loading and parsing the YAML file and passing the resolved values to the application layer. The application layer must not read the config file directly.
|
|
120
|
+
|
|
121
|
+
Default config file discovery should follow the pattern defined in [agentme-edr-015](../application/015-cli-tool-standards.md): load `[cwd]/[tool-name].yml` by default, or an explicit path provided via `--config`.
|
|
122
|
+
|
|
123
|
+
Example `myconfig.yml`:
|
|
124
|
+
```yaml
|
|
125
|
+
openapi_endpoint: https://example.com/openapi
|
|
126
|
+
log_level: debug
|
|
127
|
+
max_retries: 3
|
|
128
|
+
```
|
|
129
|
+
|
|
130
|
+
---
|
|
131
|
+
|
|
132
|
+
#### 08-env-var-substitution-in-config-files
|
|
133
|
+
|
|
134
|
+
When a YAML config file contains a value that must not be committed (such as a real endpoint URL, a username, or any other environment-specific value), that value must be expressed as an environment variable reference using `${VAR_NAME}` syntax, and the actual value must be defined in `.env`.
|
|
135
|
+
|
|
136
|
+
This keeps the YAML file committable while keeping the environment-specific value out of the repository.
|
|
137
|
+
|
|
138
|
+
Example:
|
|
139
|
+
|
|
140
|
+
`.env` (not committed):
|
|
141
|
+
```
|
|
142
|
+
OPENAPI_ENDPOINT=https://real-server.example.com/openapi
|
|
143
|
+
```
|
|
144
|
+
|
|
145
|
+
`myconfig.yml` (committed):
|
|
146
|
+
```yaml
|
|
147
|
+
openapi_endpoint: ${OPENAPI_ENDPOINT}
|
|
148
|
+
log_level: debug
|
|
149
|
+
```
|
|
150
|
+
|
|
151
|
+
The `.env` file must be loaded in the Makefile before launching the process (see rule 05) so the variable is available when the CLI or process reads the config file.
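The policy does not say which component resolves `${VAR_NAME}` references. A minimal sketch, assuming the CLI layer substitutes them in the raw file text before handing it to a YAML parser, could look like this; `substitute_env_vars` is a hypothetical helper.

```python
import os
import re
from typing import Mapping

# Matches ${VAR_NAME} where VAR_NAME is a conventional env var identifier.
_VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env_vars(text: str, environ: Mapping[str, str] = os.environ) -> str:
    """Replace every ${VAR_NAME} in `text`; fail loudly when a variable is unset."""

    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in environ:
            raise KeyError(f"config references ${{{name}}} but it is not set")
        return environ[name]

    return _VAR_PATTERN.sub(replace, text)
```

Failing on an unset variable, rather than leaving the placeholder in place, surfaces a missing `.env` entry at startup instead of at the first request.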
|
|
152
|
+
|
|
153
|
+
## References
|
|
154
|
+
|
|
155
|
+
- [agentme-edr-022](../principles/022-secrets-management.md) - Secrets must use OS keychains or cloud secret managers, not `.env` files
|
|
156
|
+
- [agentme-edr-017](017-tool-execution-and-scripting.md) - Makefiles are the authoritative command entry point; rule 05 above integrates with that standard
|
|
157
|
+
- [agentme-edr-008](008-common-targets.md) - Standard Makefile target names
|
|
158
|
+
- [agentme-edr-015](../application/015-cli-tool-standards.md) - CLI config file discovery and CLI-to-application separation
|
|
@@ -32,9 +32,10 @@ Language and framework-specific tooling and project structure.
|
|
|
32
32
|
- [agentme-edr-014](application/014-python-project-tooling.md) - **Python project tooling and structure** - Scaffold Python packages and CLIs with the standard layout *(includes skill: [005-create-python-project](application/skills/005-create-python-project/SKILL.md))*
|
|
33
33
|
- [agentme-edr-015](application/015-cli-tool-standards.md) - **CLI tool standards** - Define command UX and behavior for CLI tools
|
|
34
34
|
- [agentme-edr-018](application/018-ai-llm-development-standards.md) - **AI LLM development standards** - Standard framework (LangChain) and patterns for simple LLM calls with explicit configuration (no environment variables)
|
|
35
|
-
- [agentme-edr-019](application/019-ai-agents-development-standards.md) - **AI agents development standards** -
|
|
36
|
-
- [agentme-edr-020](application/020-ai-
|
|
37
|
-
- [agentme-edr-021](application/021-ai-
|
|
35
|
+
- [agentme-edr-019](application/019-ai-agents-development-standards.md) - **AI agents development standards** - Structural patterns for agents: framework selection, sandbox setup, naming conventions, composition, and system prompt structure
|
|
36
|
+
- [agentme-edr-020](application/020-ai-agents-quality-standards.md) - **AI agents implementation quality standards** - Tool definition patterns, error handling, observability, and unit testing for agents
|
|
37
|
+
- [agentme-edr-021](application/021-ai-workflow-development-standards.md) - **AI workflow development standards** - Standard toolchain (LangGraph), evaluation, and testing patterns for workflow projects
|
|
38
|
+
- [agentme-edr-028](application/028-ai-eval-standards.md) - **AI eval standards** - Folder structure, script requirements, and MLflow tracking for eval tests across LLM, Agent, and Workflow tiers
|
|
38
39
|
- [agentme-edr-024](application/024-ml-dataset-structure.md) - **ML dataset structure** - Standard folder layout and file conventions for ML datasets
|
|
39
40
|
- [agentme-edr-025](application/025-ai-agent-xdrs-knowledge-layer.md) - **AI agent XDRS knowledge layer** - How to integrate XDRS as the runtime source of truth for policies and skills in AI agents (apply only when the project explicitly uses XDRS)
|
|
40
41
|
- [agentme-edr-026](application/026-pragmatic-hexagonal-architecture.md) - **Pragmatic hexagonal architecture** - Organize application layers as External/Adapters/Application with practical coupling rules
|
|
@@ -48,6 +49,7 @@ Repository structure, build conventions, and CI/CD pipelines.
|
|
|
48
49
|
- [agentme-edr-006](devops/006-github-pipelines.md) - **GitHub CI/CD pipelines** - Define required CI stages and workflow structure
|
|
49
50
|
- [agentme-edr-008](devops/008-common-targets.md) - **Common development script names** - Reuse standard build, lint, and test target names
|
|
50
51
|
- [agentme-edr-017](devops/017-tool-execution-and-scripting.md) - **Tool execution and scripting** - Run tools consistently across shells, Makefiles, and CI
|
|
52
|
+
- [agentme-edr-027](devops/027-environment-variable-configuration.md) - **Environment variable configuration files** - Manage non-secret configuration with `.env` files, `.gitignore` rules, stage variants, and Makefile loading
|
|
51
53
|
|
|
52
54
|
## Governance
|
|
53
55
|
|
|
@@ -145,6 +145,7 @@ Projects that are libraries or shared utilities must include an `examples/` dire
|
|
|
145
145
|
**Root Makefile:**
|
|
146
146
|
|
|
147
147
|
```makefile
|
|
148
|
+
# test-examples runs the examples offline (no external services) → include in test
|
|
148
149
|
test: test-unit test-examples
|
|
149
150
|
|
|
150
151
|
test-unit:
|
|
@@ -154,6 +155,8 @@ test-examples:
|
|
|
154
155
|
$(MAKE) -C examples
|
|
155
156
|
```
|
|
156
157
|
|
|
158
|
+
If examples require live services or credentials, remove `test-examples` from the `test` dependency list and keep it as a standalone named target only. See [agentme-edr-008](../devops/008-common-targets.md) rule 08 for the full offline/online decision table.
|
|
159
|
+
|
|
157
160
|
**Examples Makefile:**
|
|
158
161
|
|
|
159
162
|
```makefile
|
|
@@ -241,7 +244,7 @@ AI projects are classified into three tiers — LLM, Agent, and Workflow — def
 |---|---|---|---|
 | **LLM** ([agentme-edr-018](../application/018-ai-llm-development-standards.md)) | Not required | Not required; SHOULD be used when critical prompts are in use to measure accuracy and detect model drift | Not required |
 | **Agent** ([agentme-edr-019](../application/019-ai-agents-development-standards.md)) | Not required | Not required; MAY be used | Not required |
-| **Workflow** ([agentme-edr-
+| **Workflow** ([agentme-edr-021](../application/021-ai-workflow-development-standards.md)) | **Required** — see below | **Required** before every release; failed evals block release | Advised |
 
 **Workflow unit test requirements:**
 
@@ -257,4 +260,4 @@ AI projects are classified into three tiers — LLM, Agent, and Workflow — def
 - Evals MUST be executed before every release.
 - Accuracy below project-defined thresholds MUST block the release. Thresholds MUST be documented in the eval Makefile or README.
 - Evals MUST run against real LLM providers (not mocks) to capture model drift.
-- For eval folder structure and script requirements, see [agentme-edr-
+- For eval folder structure and script requirements, see [agentme-edr-028](../application/028-ai-eval-standards.md).
@@ -96,6 +96,26 @@ $ make run
 # Application starts successfully
 ```
 
+#### 05a-makefile-uses-security-utility
+
+Makefile targets (e.g., `setup-secrets`) must use the macOS native `security` CLI to store and retrieve secrets from the keychain. This restricts Makefile-based secret management to macOS developer machines, which is acceptable since all contributors are expected to use macOS.
+
+Do **not** use `keyring` or other cross-platform libraries in Makefiles — `security` is simpler to invoke from shell and requires no additional dependencies.
+
+Storing a secret:
+```makefile
+security add-generic-password -a "$(USER)" -s "mymodule/api-key" -w "$(SECRET_VALUE)" -U
+```
+
+Retrieving a secret (e.g., to pass to a command):
+```makefile
+SECRET_VALUE := $(shell security find-generic-password -a "$(USER)" -s "mymodule/api-key" -w 2>/dev/null)
+```
+
+The `-U` flag updates the entry if it already exists. Use the format `<group>/<secret-id>` as the service name (`-s`) to mirror the module name and cloud secret manager ID convention defined in rule 02 and 05.
+
+In library code (Python, JS/TS, Go), continue using the cross-platform libraries defined in rule 02 (`keyring`, `cross-keychain`, `go-keyring`). The `security` utility is only for Makefile scripts.
+
 ---
 
 #### 06-never-log-or-leak-secrets
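The two `security` invocations introduced by rule 05a in the hunk above could be combined into Makefile targets roughly as follows. This is a sketch under assumptions, not the package's own Makefile: the `SECRET_SERVICE` variable, the interactive prompt, the `run` target body, and `./bin/app` are all hypothetical, and it only works on macOS, where the `security` CLI is available.

```makefile
# Assumption: bash is used so that `read -s` (silent input) is available.
SHELL := /bin/bash

# Hypothetical service name following the <group>/<secret-id> convention.
SECRET_SERVICE := mymodule/api-key

.PHONY: setup-secrets run

# Prompt for the value so it is not stored in the Makefile or shell history.
# -U updates the keychain entry if it already exists.
setup-secrets:
	@read -r -s -p "API key: " value && echo && \
	security add-generic-password -a "$(USER)" -s "$(SECRET_SERVICE)" -w "$$value" -U

# Read the secret at recipe time and expose it only to the child process.
# ./bin/app is a placeholder for the real command.
run:
	@key="$$(security find-generic-password -a "$(USER)" -s "$(SECRET_SERVICE)" -w 2>/dev/null)" \
	  || { echo "secret missing; run 'make setup-secrets' first" >&2; exit 1; }; \
	API_KEY="$$key" ./bin/app
```

Reading the secret inside the recipe, rather than with a top-level `$(shell ...)` assignment, means the keychain is only queried when `run` is actually invoked, and the leading `@` keeps make from echoing the command line.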