agentme 0.14.0 → 0.16.0

Files changed (23)
  1. package/.filedist-package.yml +1 -1
  2. package/.xdrs/agentme/edrs/application/003-javascript-project-tooling.md +3 -3
  3. package/.xdrs/agentme/edrs/application/010-golang-project-tooling.md +3 -3
  4. package/.xdrs/agentme/edrs/application/014-python-project-tooling.md +2 -2
  5. package/.xdrs/agentme/edrs/application/015-cli-tool-standards.md +6 -7
  6. package/.xdrs/agentme/edrs/application/018-ai-llm-development-standards.md +181 -0
  7. package/.xdrs/agentme/edrs/application/019-ai-agents-development-standards.md +286 -0
  8. package/.xdrs/agentme/edrs/application/020-ai-workflow-development-standards.md +262 -0
  9. package/.xdrs/agentme/edrs/application/021-ai-eval-standards.md +90 -0
  10. package/.xdrs/agentme/edrs/application/{019-ml-dataset-structure.md → 024-ml-dataset-structure.md} +2 -2
  11. package/.xdrs/agentme/edrs/application/{020-ai-agent-xdrs-knowledge-layer.md → 025-ai-agent-xdrs-knowledge-layer.md} +4 -4
  12. package/.xdrs/agentme/edrs/application/{021-pragmatic-hexagonal-architecture.md → 026-pragmatic-hexagonal-architecture.md} +2 -2
  13. package/.xdrs/agentme/edrs/application/skills/001-create-javascript-project/SKILL.md +2 -2
  14. package/.xdrs/agentme/edrs/application/skills/003-create-golang-project/SKILL.md +2 -2
  15. package/.xdrs/agentme/edrs/application/skills/005-create-python-project/SKILL.md +3 -3
  16. package/.xdrs/agentme/edrs/devops/008-common-targets.md +28 -3
  17. package/.xdrs/agentme/edrs/devops/027-environment-variable-configuration.md +158 -0
  18. package/.xdrs/agentme/edrs/index.md +8 -5
  19. package/.xdrs/agentme/edrs/principles/007-project-quality-standards.md +32 -1
  20. package/.xdrs/agentme/edrs/principles/022-secrets-management.md +20 -0
  21. package/package.json +3 -3
  22. package/.xdrs/agentme/edrs/application/018-ai-agent-development-standards.md +0 -309
  23. package/.xdrs/agentme/edrs/application/024-llm-development-standards.md +0 -116
@@ -0,0 +1,262 @@
1
+ ---
2
+ name: agentme-edr-policy-020-ai-workflow-development-standards
3
+ description: Defines the standard toolchain, framework, observability, and workflow patterns for building LangGraph workflows in Python. Use when scaffolding, reviewing, or extending AI workflow projects that orchestrate LLM calls, agents, and algorithmic nodes. For simple LLM calls see agentme-edr-018, for agentic patterns see agentme-edr-019.
4
+ apply-to: AI workflow projects using LangGraph StateGraph built with Python
5
+ valid-from: 2026-06-05
6
+ ---
7
+
8
+ # agentme-edr-policy-020: AI workflow development standards
9
+
10
+ ## Context and Problem Statement
11
+
12
+ AI workflow projects vary widely in how they structure directed graphs, manage state, evaluate outputs, and test execution paths. Without a shared baseline, projects accumulate incompatible patterns for flow design, state management, and dataset-driven testing.
13
+
14
+ Which tools, frameworks, and design patterns should AI workflow projects follow to ensure reproducibility, testability, and maintainability?
15
+
16
+ ## Decision Outcome
17
+
18
+ **Use Python with LangGraph for flow orchestration and MLflow for experiment tracking and local evaluation.**
19
+
20
+ ### Details
21
+
22
+ #### 01-language-and-framework
23
+
24
+ Workflows MUST be built with **LangGraph**. Use LangGraph `StateGraph` to model each distinct workflow as an explicit directed graph with typed state.
25
+
26
+ For all direct LLM calls within workflow nodes, use LangChain per [agentme-edr-018](018-ai-llm-development-standards.md). For agent nodes with tool-invocation loops, use deepagents per [agentme-edr-019](019-ai-agents-development-standards.md).
27
+
28
+ #### 03-observability-and-experiment-tracking
29
+
30
+ Use **MLflow** for all workflow observability and evaluation:
31
+
32
+ - **Workflow-level tracking:** Wrap each workflow run with `mlflow.start_run()` to capture traces, parameters, and metrics locally.
33
+ - **LLM-level auto-tracing:** Enable LangChain auto-tracing per [agentme-edr-018](018-ai-llm-development-standards.md) rule `03-llm-observability` by calling `mlflow.langchain.autolog()` during application startup. This captures inputs, outputs, token counts, and latency for every LangChain call within workflow nodes.
34
+ - Log run parameters (model name, temperature, prompt version) and output metrics (accuracy, latency, token counts) using `mlflow.log_param` / `mlflow.log_metric`.
35
+ - Run a local MLflow tracking server with `mlflow ui` to inspect runs during development. Do not require a remote MLflow server for local development.
36
+ - The project Makefile MUST expose a `dev-mlflow` target to start the local MLflow tracking server, per [agentme-edr-008](../devops/008-common-targets.md) rule `09-ai-project-dev-targets`.
37
+
38
+ #### 04-dataset-driven-accuracy-measurement
39
+
40
+ Eval dataset and implementation requirements are defined in [agentme-edr-021](021-ai-eval-standards.md). Testing requirements (when evals are required, release gates) are defined in [agentme-edr-007](../principles/007-project-quality-standards.md) rule `09-ai-project-testing-requirements`.
41
+
42
+ #### 05-flow-documentation
43
+
44
+ Each workflow MUST be documented as a **Mermaid graph** in a `README.md`. The diagram must match the LangGraph `StateGraph` definition:
45
+
46
+ - Use `graph TD` or `graph LR` direction.
47
+ - Label each node with its Python function name.
48
+ - Label conditional edges with the condition expression.
49
+ - Update the diagram whenever the graph topology changes.
50
+
51
+ Example minimal diagram block:
52
+
53
+ ```mermaid
54
+ graph TD
55
+ A[fetch_context] --> B[draft_response]
56
+ B --> C{verify}
57
+ C -->|pass| D[output]
58
+ C -->|fail| B
59
+ ```
60
+
61
+ #### 06-verification-steps
62
+
63
+ Workflows MUST include at least one explicit verification node before producing final output:
64
+
65
+ - Model the verification step as a dedicated LangGraph node (e.g. `verify_output`).
66
+ - The node checks the draft output against defined acceptance criteria (schema validation, factual consistency check, rubric scoring, or LLM-as-judge call).
67
+ - On failure, the verification node MUST route back to the relevant generation node, not silently pass through.
68
+ - Log verification results (pass/fail, score, reason) as MLflow metrics on the current run.
69
+
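As a rough illustration of the routing rule above, the sketch below shows a verification node and the conditional-edge function that sends failed drafts back to generation. It uses plain Python only; the state fields, the non-empty-draft check, and the node names `draft_response_llm` and `output_step` are hypothetical stand-ins, not names defined by the policy.

```python
from typing import TypedDict


class document_workflow_state(TypedDict):
    """Hypothetical workflow state used only for this sketch."""
    draft: str
    verification_passed: bool
    verification_reason: str


def verify_output_step(state: document_workflow_state) -> dict:
    """Check the draft against a placeholder acceptance criterion.

    A real node would run schema validation, rubric scoring, or an
    LLM-as-judge call here; a non-empty check stands in for that.
    """
    passed = bool(state["draft"].strip())
    reason = "ok" if passed else "draft is empty"
    # Return only the fields this node updates.
    return {"verification_passed": passed, "verification_reason": reason}


def route_after_verify(state: document_workflow_state) -> str:
    """Conditional edge: return the name of the next node."""
    if state["verification_passed"]:
        return "output_step"
    # On failure, go back to the generation node instead of passing through.
    return "draft_response_llm"
```

In a LangGraph graph, a function like `route_after_verify` would typically be registered with `add_conditional_edges` on the verification node, and the pass/fail result logged to the current MLflow run.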
70
+ #### 07-workflow-structure
71
+
72
+ Workflow logic MUST be organized as named workflows following [agentme-edr-026](026-pragmatic-hexagonal-architecture.md). Each workflow is an independent LangGraph `StateGraph` with a defined start node and end node, connecting LLM nodes, agent nodes, algorithmic nodes, states, routes, and decision nodes.
73
+
74
+ Workflows live inside `app/workflows/` (the application layer), while external integrations such as LLM providers, vector stores, and third-party APIs live under `adapters/connectors/` (the outbound adapter layer). Inbound interfaces (HTTP API, CLI) live under `adapters/` as inbound adapters.
75
+
76
+ For each workflow named `<workflow>`, the full project layout is:
77
+
78
+ ```text
79
+ lib/src/<package_name>/
80
+   adapters/
81
+     http/                 # inbound: API server that triggers workflows
82
+     cli/                  # inbound: CLI entry point (if applicable)
83
+     connectors/           # outbound: external resource integrations
84
+       openai/             # LLM provider connector
85
+       azure-openai/       # alternative LLM provider connector
86
+       postgres/           # database connector (if applicable)
87
+       vector-store/       # vector DB connector (if applicable)
88
+   app/
89
+     workflows/
90
+       <workflow>/
91
+         graph.py          # StateGraph definition; entry point for the workflow
92
+         agents.py         # deepagents agent definitions used by this workflow
93
+         states.py         # Typed state dataclasses / TypedDicts
94
+         routes.py         # Conditional edge functions
95
+   shared/                 # infrastructure-agnostic utilities
96
+ ```
97
+
98
+ - `app/workflows/<workflow>/graph.py` MUST define and compile the `StateGraph` and expose a `graph` object that callers invoke.
99
+ - Tool calls within workflow nodes that interact with external systems MUST use connectors from `adapters/connectors/`, not inline API calls.
100
+ - Additional modules (prompts, schemas) MAY be added inside `app/workflows/<workflow>/` when they are specific to that workflow. Shared utilities belong in `shared/`.
101
+
102
+ #### 08-workflow-evals
103
+
104
+ Eval folder structure and script requirements are defined in [agentme-edr-021](021-ai-eval-standards.md).
105
+
106
+ #### 09-node-naming-conventions
107
+
108
+ LangGraph node names MUST follow a suffix convention that communicates the node's role at a glance. Names MUST be action-oriented and descriptive.
109
+
110
+ | Suffix | Node type | When to use |
111
+ |---|---|---|
112
+ | `_llm` | LLM call | Any node whose primary action is a direct LLM inference call (see [agentme-edr-018](018-ai-llm-development-standards.md)) |
113
+ | `_step` | Algorithmic step | Deterministic logic with no LLM involvement (transformation, validation, routing) |
114
+ | `_tool` | Tool/API call | A node that wraps a single external tool or API (e.g. a REST endpoint, DB query) |
115
+ | `_agent` | Subgraph agent | A node that invokes a nested subgraph containing its own tool-invocation cycle and LLM calls; use the **deepagents** library for these nodes (see [agentme-edr-019](019-ai-agents-development-standards.md)) |
116
+
117
+ The Python function implementing the node SHOULD share the same name as the node alias passed to `add_node`, so that graph definitions and stack traces remain unambiguous:
118
+
119
+ ```python
120
+ def draft_doc_llm(state): ...
121
+ graph.add_node("draft_doc_llm", draft_doc_llm)
122
+
123
+ # Tool node — calls the Stripe API
124
+ def stripe_api_tool(state): ...
125
+ graph.add_node("stripe_api_tool", stripe_api_tool)
126
+
127
+ # Agent node — uses deepagents for tool-invocation loop
128
+ def code_reviewer_agent(state): ...
129
+ graph.add_node("code_reviewer_agent", code_reviewer_agent)
130
+ ```
131
+
132
+ Names MUST NOT use generic labels such as `node1`, `process`, or `run`. Each name must clearly express what action the node performs.
133
+
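The suffix convention above lends itself to an automated check. The following is a minimal sketch of such a check, assuming a hypothetical helper named `is_valid_node_name`; the policy itself does not prescribe any tooling for this.

```python
# Role suffixes from the node naming table.
ROLE_SUFFIXES = ("_llm", "_step", "_tool", "_agent")

# Generic labels the policy explicitly rejects.
GENERIC_NAMES = {"node1", "process", "run"}


def is_valid_node_name(name: str) -> bool:
    """Return True when the name carries a role suffix and a descriptive stem."""
    if name in GENERIC_NAMES:
        return False
    for suffix in ROLE_SUFFIXES:
        # Require at least one character before the suffix.
        if name.endswith(suffix) and len(name) > len(suffix):
            return True
    return False
```

A check like this could run in a unit test that iterates over the node names registered on each compiled graph.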
134
+ #### 10-workflow-unit-testing
135
+
136
+ All LLM calls within workflow nodes are external API calls and MUST be mocked in unit tests per [agentme-edr-018](018-ai-llm-development-standards.md) rule `04-unit-test-mocking`. Workflow unit tests must run fully offline with no real LLM provider calls.
137
+
138
+ Choose the mock utility based on what the node under test expects from the model:
139
+
140
+ - Use **`FakeListChatModel`** when nodes only read `AIMessage.content` (e.g. a routing node that checks a text label).
141
+ - Use **`GenericFakeChatModel`** when any node in the workflow expects tool calls, structured outputs, or when the workflow contains `_agent` nodes that drive a tool-invocation loop.
142
+
143
+ **Example — workflow with plain-text LLM nodes:**
144
+
145
+ ```python
146
+ from langchain_core.language_models.fake_chat_models import FakeListChatModel
147
+
148
+ def test_document_workflow_approve_path():
149
+     # Responses consumed in node execution order
150
+     fake_model = FakeListChatModel(responses=["APPROVE", "Meets all criteria."])
151
+
152
+     workflow = DocumentWorkflow(model=fake_model)
153
+     result = workflow.run(input_doc)
154
+
155
+     assert result.status == "approved"
156
+ ```
157
+
158
+ **Example — workflow containing an agent node (`_agent` suffix):**
159
+
160
+ ```python
161
+ from langchain_core.language_models.fake_chat_models import GenericFakeChatModel
162
+ from langchain_core.messages import AIMessage
163
+
164
+ def test_document_workflow_with_agent_node():
165
+     tool_call_msg = AIMessage(
166
+         content="",
167
+         tool_calls=[{"name": "fetch_context", "args": {"doc_id": "42"}, "id": "c1"}]
168
+     )
169
+     agent_final_msg = AIMessage(content="Context retrieved successfully.")
170
+     routing_msg = AIMessage(content="APPROVE")
171
+
172
+     fake_model = GenericFakeChatModel(
173
+         messages=iter([tool_call_msg, agent_final_msg, routing_msg])
174
+     )
175
+
176
+     workflow = DocumentWorkflow(model=fake_model)
177
+     result = workflow.run(input_doc)
178
+
179
+     assert result.status == "approved"
180
+ ```
181
+
182
+ Workflows MUST accept the LLM instance as a constructor parameter so that unit tests can inject a fake. See the injectable LLM pattern in [agentme-edr-018](018-ai-llm-development-standards.md) rule `04-unit-test-mocking`.
183
+
184
+ #### 11-state-type-conventions
185
+
186
+ All TypedDict and dataclass types that represent LangGraph node or workflow state MUST end with `_state` in their name. This suffix signals at a glance that the type is a state boundary, not a plain data model.
187
+
188
+ **Naming reference:**
189
+
190
+ | Owner | Naming pattern | Example |
191
+ |---|---|---|
192
+ | Single agent / agent subgraph | `<agent_name>_agent_state` | `reviewer_agent_state` |
193
+ | Full workflow (`StateGraph`) | `<workflow_name>_workflow_state` | `document_workflow_state` |
194
+ | Named group of nodes sharing state | `<group_responsibility>_state` | `retrieval_pipeline_state` |
195
+
196
+ **Boundary rules:**
197
+
198
+ - Each agent or agent subgraph MUST define its own dedicated state type. Do not reuse or extend a generic state across unrelated agents.
199
+ - Each workflow (`StateGraph`) MUST define its own top-level state type. The workflow state is the authoritative boundary for that graph's inputs and outputs.
200
+ - When a group of nodes (not a full workflow and not a single agent) shares a state type, the type name MUST clearly reflect the shared responsibility. Generic names such as `shared_state`, `common_state`, or `global_state` are FORBIDDEN.
201
+ - Large workflows MUST NOT use a single monolithic state that all nodes read and write. Split the state into per-phase or per-agent state types scoped to the subgraph or set of nodes that produce or consume each field.
202
+
203
+ State type names SHOULD align with the agent or node names defined in rule `09-node-naming-conventions` (e.g., an agent node named `draft_doc_agent` has a state type named `draft_doc_agent_state`).
204
+
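A short sketch of the naming patterns from the table above, using `TypedDict`. The type names come from the table's examples; every field shown is a hypothetical placeholder chosen for illustration.

```python
from typing import TypedDict


class reviewer_agent_state(TypedDict):
    """State owned by a single agent or agent subgraph."""
    draft: str
    review_comments: list[str]


class retrieval_pipeline_state(TypedDict):
    """State shared by a named group of nodes; the name reflects their responsibility."""
    query: str
    retrieved_chunks: list[str]


class document_workflow_state(TypedDict):
    """Top-level state for one StateGraph: the boundary for its inputs and outputs."""
    document: str
    status: str
```

Keeping each type limited to the fields its owner reads or writes is what prevents the monolithic state the boundary rules warn against.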
205
+ #### 12-workflow-naming-conventions
206
+
207
+ LangGraph `StateGraph` instances and their enclosing classes MUST be given a meaningful name that conveys the workflow's input, output, and/or behavior. The name MUST end with `Workflow` (PascalCase class) or `_workflow` (snake_case variable or directory).
208
+
209
+ Choose a name that summarises what the workflow consumes, processes, and produces — avoid generic labels such as `Pipeline`, `Flow`, `Graph`, or `Process`.
210
+
211
+ | Context | Pattern | Example |
212
+ |---|---|---|
213
+ | Python class | `<DescriptiveName>Workflow` | `FileMapJudgeReduceWorkflow` |
214
+ | Python variable / instance | `<descriptive_name>_workflow` | `file_map_judge_reduce_workflow` |
215
+ | Directory under `app/workflows/` | `<descriptive_name>_workflow` | `financial_report_analysis_workflow/` |
216
+
217
+ **Good names** communicate purpose at a glance:
218
+
219
+ - `FileMapJudgeReduceWorkflow` — maps files, judges each, then reduces results
220
+ - `FinancialReportAnalysisWorkflow` — analyses financial report inputs
221
+ - `MarketingCampaignExecutorWorkflow` — executes a marketing campaign end-to-end
222
+
223
+ **Bad names** (FORBIDDEN): `MainWorkflow`, `AgentGraph`, `ProcessFlow`, `Workflow1`, `RunGraph`.
224
+
225
+ #### 15-workflow-state-persistence
226
+
227
+ For long-running workflows that may need to be paused and resumed:
228
+
229
+ - Use LangGraph's built-in checkpointing with `MemorySaver` for development and testing.
230
+ - Use persistent checkpointers (e.g., `PostgresSaver`, or Redis-based checkpointers) for production workflows that need durability.
231
+ - Checkpoint state MUST be serializable (use TypedDict or dataclasses with JSON-compatible fields).
232
+ - Document the checkpoint strategy in the workflow's README.md.
233
+
234
+ **Example with MemorySaver (development):**
235
+
236
+ ```python
237
+ from langgraph.checkpoint.memory import MemorySaver
238
+
239
+ checkpointer = MemorySaver()
240
+ graph = workflow.compile(checkpointer=checkpointer)
241
+
242
+ # Resume from checkpoint
243
+ result = graph.invoke(input_state, config={"configurable": {"thread_id": "session-123"}})
244
+ ```
245
+
246
+ **When to use checkpointing:**
247
+
248
+ - Workflows that take > 30 seconds to complete
249
+ - Workflows that require human-in-the-loop approval or input
250
+ - Workflows that are non-idempotent
251
+ - Workflows that may fail mid-execution and need to be retried from the last successful node
252
+ - Multi-session workflows where state persists across user interactions
253
+
254
+ ## References
255
+
256
+ - [agentme-edr-018](018-ai-llm-development-standards.md) — LLM development standards: LangChain framework, provider configuration, LLM observability, and unit test mocking
257
+ - [agentme-edr-019](019-ai-agents-development-standards.md) — Agent development standards: deepagents framework, tool-invocation loops, and agent patterns
258
+ - [agentme-edr-026](026-pragmatic-hexagonal-architecture.md) — Adapter/application layer separation that defines the project layout
259
+ - [agentme-edr-014](014-python-project-tooling.md) — Python project tooling and structure
260
+ - [agentme-edr-024](024-ml-dataset-structure.md) — ML dataset structure for eval datasets
261
+ - [agentme-edr-021](021-ai-eval-standards.md) — AI eval standards: folder structure, script requirements, and MLflow tracking
262
+ - [agentme-edr-007](../principles/007-project-quality-standards.md) — Project quality standards including AI-tier testing requirements (rule `09-ai-project-testing-requirements`)
@@ -0,0 +1,90 @@
1
+ ---
2
+ name: agentme-edr-policy-021-ai-eval-standards
3
+ description: Defines how to structure, write, and run eval tests for AI projects — folder layout, script requirements, and MLflow tracking. Use when implementing evals for LLM, Agent, or Workflow projects. For when evals are required see agentme-edr-007 rule 09-ai-project-testing-requirements.
4
+ apply-to: Python AI projects (LLM, Agent, or Workflow tier) that implement eval testing
5
+ valid-from: 2026-06-05
6
+ ---
7
+
8
+ # agentme-edr-policy-021: AI eval standards
9
+
10
+ ## Context and Problem Statement
11
+
12
+ Eval tests measure AI component accuracy against expected outputs using real LLM providers. Without a shared folder layout and script convention, eval setups diverge across LLM, Agent, and Workflow projects, making them hard to run, compare, and integrate into CI/CD pipelines.
13
+
14
+ How should eval tests be structured and run across all AI tiers?
15
+
16
+ ## Decision Outcome
17
+
18
+ **Use a per-component folder structure under `evals/` with a standardized Makefile interface and MLflow-backed scripts, applicable to LLM, Agent, and Workflow components.**
19
+
20
+ For when evals are required per AI tier, see [agentme-edr-007](../principles/007-project-quality-standards.md) rule `09-ai-project-testing-requirements`.
21
+
22
+ ### Details
23
+
24
+ #### 01-eval-folder-structure
25
+
26
+ For each AI component being evaluated (an LLM chain, agent, or workflow), create a corresponding directory under `evals/` at the same level as `lib/` and `examples/`:
27
+
28
+ ```text
29
+ evals/
30
+   <component>/
31
+     Makefile              # eval targets for this component
32
+     dataset_<group>/      # one folder per eval group (see agentme-edr-024)
33
+     eval_<group>.py       # evaluation script for each group
34
+ ```
35
+
36
+ Where `<component>` is the name of the LLM chain, agent, or workflow being evaluated (e.g., `summarizer`, `file_analyzer_agent`, `document_review_workflow`).
37
+
38
+ The per-component `evals/<component>/Makefile` MUST define:
39
+
40
+ | Target | Behaviour |
41
+ |---|---|
42
+ | `eval` | Runs all eval groups for the component |
43
+ | `eval-<group>` | Runs one named group (e.g. `eval-simple`, `eval-complex`) |
44
+
45
+ The module root Makefile MUST expose a `make eval` target that delegates to `eval` in every `evals/<component>/Makefile`:
46
+
47
+ ```makefile
48
+ eval:
49
+ 	$(MAKE) -C evals/summarizer eval
50
+ 	$(MAKE) -C evals/document_review_workflow eval
51
+ ```
52
+
53
+ #### 02-eval-script-requirements
54
+
55
+ Each `eval_<group>.py` script MUST:
56
+
57
+ - Load the dataset from `evals/<component>/dataset_<group>/` following [agentme-edr-024](024-ml-dataset-structure.md). For input/output pairs, use the JSONL format per `agentme-edr-024.04-complex-structured-datasets-must-use-jsonl`.
58
+ - Run every input through the live component against **real LLM providers** (not mocked responses), to capture model drift.
59
+ - Log per-sample and aggregate metrics to an MLflow experiment that runs **locally** — a remote MLflow server MUST NOT be required.
60
+ - Compare outputs to expected values using project-defined quality thresholds. Thresholds MUST be declared explicitly (e.g., in a Makefile variable or README).
61
+ - Exit with a non-zero status when any metric falls below its defined threshold, consistent with [agentme-edr-007](../principles/007-project-quality-standards.md) rule `07-statistical-models-must-have-eval-targets`.
62
+
63
+ **Example:**
64
+
65
+ ```python
66
+ import mlflow
67
+ from my_package.app.workflows.document_review_workflow.graph import graph
68
+
69
+ EVAL_MIN_ACCURACY = 0.85
70
+
71
+ with mlflow.start_run():
72
+     results = []
73
+     for sample in load_dataset("evals/document_review_workflow/dataset_basic/"):
74
+         output = graph.invoke({"document": sample["input"]})
75
+         results.append(output["label"] == sample["expected_label"])
76
+
77
+     accuracy = sum(results) / len(results)
78
+     mlflow.log_metric("accuracy", accuracy)
79
+
80
+     if accuracy < EVAL_MIN_ACCURACY:
81
+         raise SystemExit(f"Eval failed: accuracy {accuracy:.2f} < {EVAL_MIN_ACCURACY}")
82
+ ```
83
+
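The example above calls `load_dataset` without defining it. One possible shape for that helper is sketched below, assuming the dataset folder holds one or more `*.jsonl` files with one JSON object per line; the actual file layout is governed by agentme-edr-024 and is not shown in this diff, so treat the glob pattern and record keys as assumptions.

```python
import json
from pathlib import Path
from typing import Iterator


def load_dataset(dataset_dir: str) -> Iterator[dict]:
    """Yield one record per non-empty line of every *.jsonl file in the folder.

    Hypothetical helper: the folder layout and record keys are assumptions.
    """
    for path in sorted(Path(dataset_dir).glob("*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:  # skip blank lines between records
                    yield json.loads(line)
```

Because the helper yields nothing for an empty folder, a caller computing `sum(results) / len(results)` may want to guard against an empty result list before dividing.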
84
+ ## References
85
+
86
+ - [agentme-edr-007](../principles/007-project-quality-standards.md) — Project quality standards: when evals are required per AI tier (rule `09-ai-project-testing-requirements`) and statistical model eval targets (rule `07-statistical-models-must-have-eval-targets`)
87
+ - [agentme-edr-018](018-ai-llm-development-standards.md) — LLM development standards: LangChain framework and observability
88
+ - [agentme-edr-019](019-ai-agents-development-standards.md) — Agent development standards
89
+ - [agentme-edr-020](020-ai-workflow-development-standards.md) — Workflow development standards
90
+ - [agentme-edr-024](024-ml-dataset-structure.md) — ML dataset structure for eval datasets
@@ -1,11 +1,11 @@
1
1
  ---
2
- name: agentme-edr-policy-019-ml-dataset-structure
2
+ name: agentme-edr-policy-024-ml-dataset-structure
3
3
  description: Defines the standard folder layout and file conventions for ML datasets used in AI/ML projects. Use when creating, organizing, or consuming datasets for machine learning tasks such as image labeling, document extraction, tabular data, LLM evaluation, and Q&A sets.
4
4
  apply-to: ML and AI projects that produce or consume datasets
5
5
  valid-from: 2026-05-27
6
6
  ---
7
7
 
8
- # agentme-edr-policy-019: ML dataset structure
8
+ # agentme-edr-policy-024: ML dataset structure
9
9
 
10
10
  ## Context and Problem Statement
11
11
 
@@ -1,11 +1,11 @@
1
1
  ---
2
- name: agentme-edr-policy-020-ai-agent-xdrs-knowledge-layer
2
+ name: agentme-edr-policy-025-ai-agent-xdrs-knowledge-layer
3
3
  description: Defines how to integrate XDRS as the runtime knowledge source of truth for AI agents — covering document placement, AGENTS.md setup, file tools, and local sandbox configuration. Apply only when the project explicitly uses XDRS to govern agent behavior.
4
4
  apply-to: AI agent projects that use XDRS as the source of truth for policies and skills
5
5
  valid-from: 2026-05-27
6
6
  ---
7
7
 
8
- # agentme-edr-policy-020: AI agent XDRS knowledge layer
8
+ # agentme-edr-policy-025: AI agent XDRS knowledge layer
9
9
 
10
10
  ## Context and Problem Statement
11
11
 
@@ -17,7 +17,7 @@ How should an AI agent project integrate XDRS as its runtime source of truth for
17
17
 
18
18
  **Embed XDRS documents in `lib/data/.xdrs/`, instruct the agent to consult them via `AGENTS.md`, equip the agent with sandboxed file tools, and use the deepagents framework when a local sandbox is required.**
19
19
 
20
- This policy MUST only be applied when the project explicitly chooses XDRS as its knowledge governance layer. It is not required by [agentme-edr-018](018-ai-agent-development-standards.md) in general.
20
+ This policy MUST only be applied when the project explicitly chooses XDRS as its knowledge governance layer. It is not required by [agentme-edr-019](019-ai-agents-development-standards.md) or [agentme-edr-020](020-ai-workflow-development-standards.md) in general.
21
21
 
22
22
  ### Details
23
23
 
@@ -91,7 +91,7 @@ data_root = str(files("myagent").joinpath("data"))
91
91
  agents_md = Path(temp_root) / "AGENTS.md"
92
92
  agents_md.write_text(_AGENTS_MD) # content from xdrs-core AGENTS.md template; see rule 01-xdrs-knowledge-layer
93
93
 
94
- # Add these mounts alongside the base mounts from agentme-edr-018 rule 09-local-sandbox:
94
+ # Add these mounts alongside the base mounts from agentme-edr-019 rule 02-local-sandbox:
95
95
  xdrs_mounts = [
96
96
      {"src": f"{data_root}/.xdrs", "dst": "/.xdrs", "readonly": True},
97
97
      {"src": str(agents_md), "dst": "/AGENTS.md", "readonly": True},
@@ -1,11 +1,11 @@
1
1
  ---
2
- name: agentme-edr-policy-021-pragmatic-hexagonal-architecture
2
+ name: agentme-edr-policy-026-pragmatic-hexagonal-architecture
3
3
  description: Defines a pragmatic variant of Hexagonal Architecture for organizing application source code into Adapters (inbound/outbound I/O boundaries) and Application (business logic) layers, with explicit naming conventions and folder structure. Use when designing or reviewing the internal layout of application modules.
4
4
  apply-to: All application projects
5
5
  valid-from: 2026-05-28
6
6
  ---
7
7
 
8
- # agentme-edr-policy-021: Pragmatic hexagonal architecture
8
+ # agentme-edr-policy-026: Pragmatic hexagonal architecture
9
9
 
10
10
  ## Context and Problem Statement
11
11
 
@@ -15,12 +15,12 @@ compatibility: JavaScript/TypeScript, Node.js 18+
15
15
 
16
16
  Creates a complete JavaScript/TypeScript project from scratch. The layout keeps the
17
17
  package self-contained in its module root (`lib/`), organizes internal code following
18
- [agentme-edr-021](../../021-pragmatic-hexagonal-architecture.md) (`adapters/`, `app/`, `shared/`),
18
+ [agentme-edr-026](../../026-pragmatic-hexagonal-architecture.md) (`adapters/`, `app/`, `shared/`),
19
19
  places runnable consumer examples in the sibling `examples/` folder, redirects persistent caches
20
20
  into `.cache/`, and uses Makefiles as the only entry points. Boilerplate is derived from the
21
21
  [filedist](https://github.com/flaviostutz/filedist) project.
22
22
 
23
- Related EDRs: [agentme-edr-003](../../003-javascript-project-tooling.md), [agentme-edr-016](../../../principles/016-cross-language-module-structure.md), [agentme-edr-021](../../021-pragmatic-hexagonal-architecture.md)
23
+ Related EDRs: [agentme-edr-003](../../003-javascript-project-tooling.md), [agentme-edr-016](../../../principles/016-cross-language-module-structure.md), [agentme-edr-026](../../026-pragmatic-hexagonal-architecture.md)
24
24
 
25
25
  ## Instructions
26
26
 
@@ -12,9 +12,9 @@ compatibility: Go 1.21+
12
12
 
13
13
  ## Overview
14
14
 
15
- Creates a complete Go project from scratch, following the layout from [agentme-edr-010](../../010-golang-project-tooling.md) and [agentme-edr-021](../../021-pragmatic-hexagonal-architecture.md). Business logic lives in `app/<feature>/` packages; CLI wiring lives in `adapters/cli/`; outbound integrations live in `adapters/connectors/`; `main.go` is a thin dispatcher. The module root owns its `Makefile`, `README.md`, `dist/`, and `.cache/` folders.
15
+ Creates a complete Go project from scratch, following the layout from [agentme-edr-010](../../010-golang-project-tooling.md) and [agentme-edr-026](../../026-pragmatic-hexagonal-architecture.md). Business logic lives in `app/<feature>/` packages; CLI wiring lives in `adapters/cli/`; outbound integrations live in `adapters/connectors/`; `main.go` is a thin dispatcher. The module root owns its `Makefile`, `README.md`, `dist/`, and `.cache/` folders.
16
16
 
17
- Related EDRs: [agentme-edr-010](../../010-golang-project-tooling.md), [agentme-edr-016](../../../principles/016-cross-language-module-structure.md), [agentme-edr-021](../../021-pragmatic-hexagonal-architecture.md)
17
+ Related EDRs: [agentme-edr-010](../../010-golang-project-tooling.md), [agentme-edr-016](../../../principles/016-cross-language-module-structure.md), [agentme-edr-026](../../026-pragmatic-hexagonal-architecture.md)
18
18
 
19
19
  ## Instructions
20
20
 
@@ -14,11 +14,11 @@ compatibility: Python 3.12+
14
14
 
15
15
  Creates a complete Python project from scratch using Mise, `uv`, `pyproject.toml`, Ruff,
16
16
  ty, Pytest, and Makefiles. The layout keeps the package self-contained under `lib/`,
17
- organizes internal code following [agentme-edr-021](../../021-pragmatic-hexagonal-architecture.md)
17
+ organizes internal code following [agentme-edr-026](../../026-pragmatic-hexagonal-architecture.md)
18
18
  (`adapters/`, `app/`, `shared/`), uses a shared root `.venv/`, redirects persistent caches into
19
19
  `.cache/`, and places runnable consumer projects under the sibling `examples/` folder.
20
20
 
21
- Related EDRs: [agentme-edr-014](../../014-python-project-tooling.md), [agentme-edr-016](../../../principles/016-cross-language-module-structure.md), [agentme-edr-021](../../021-pragmatic-hexagonal-architecture.md)
21
+ Related EDRs: [agentme-edr-014](../../014-python-project-tooling.md), [agentme-edr-016](../../../principles/016-cross-language-module-structure.md), [agentme-edr-026](../../026-pragmatic-hexagonal-architecture.md)
22
22
 
23
23
  ## Instructions
24
24
 
@@ -282,7 +282,7 @@ make test
282
282
 
283
283
  ### Phase 4: Create the package and tests inside `lib/`
284
284
 
285
- Create this baseline structure following [agentme-edr-021](../../021-pragmatic-hexagonal-architecture.md).
285
+ Create this baseline structure following [agentme-edr-026](../../026-pragmatic-hexagonal-architecture.md).
286
286
 
287
287
  **`lib/src/[package_name]/__init__.py`**
288
288
 
@@ -73,7 +73,7 @@ Targets are organized into five lifecycle groups. Projects must use these names
73
73
  | Target | Purpose |
74
74
  |--------|---------|
75
75
  | `setup` | Run `mise install` and any small project bootstrap needed before normal targets work. This is the first command after checkout. |
76
- | `all` | Alias that runs `build`, `lint`, and `test` in sequence. Must be the default target (i.e., running `make` or the runner with no arguments invokes `all`). Used by developers as a fast pre-push check to verify the software meets minimum quality standards in one command. |
76
+ | `all` | Alias that runs `build`, `lint`, and `test` in sequence. Must be the default target (i.e., running `make` or the runner with no arguments invokes `all`). Used by developers as a fast pre-push check to verify the software meets minimum quality standards in one command. Must only invoke targets that run **offline** — no external credentials, running servers, paid APIs, or environment-specific configuration outside the repository. |
77
77
  | `clean` | Remove all temporary or generated files created during build, lint, or test (e.g., `node_modules`, virtual environments, compiled binaries, generated files). Used both locally and in CI for a clean slate. |
78
78
  | `dev` | Run the software locally for development (e.g., start a Node.js API server, open a Jupyter notebook, launch a React dev server). May have debugging tools, verbose logging, or hot reloading features enabled. |
79
79
  | `run` | Run the software in production mode (e.g., start a compiled binary, launch a production server). No debugging or development-only features should be enabled. |
@@ -93,13 +93,13 @@ Targets are organized into five lifecycle groups. Projects must use these names
93
93
 
94
94
  | Target | Purpose |
95
95
  |--------|---------|
96
- | `lint` | Run **all static quality checks** outside of tests. This MUST include: code formatting validation, code style enforcement, code smell detection, static analysis, dependency audits for known CVEs, security vulnerability scans (e.g., SAST), and project/configuration structure checks. All checks must be non-destructive (read-only); fixes are handled by `lint-fix`. |
96
+ | `lint` | Run **all static quality checks** outside of tests. This MUST include: code formatting validation, code style enforcement, code smell detection, static analysis, dependency audits for known CVEs, security vulnerability scans (e.g., SAST), and project/configuration structure checks. All checks must be non-destructive (read-only); fixes are handled by `lint-fix`. Must only invoke subtargets that run **offline** (no external credentials or services). |
97
97
  | `lint-fix` | Automatically fix linting and formatting issues where possible. |
  | `lint-format` | *(Optional)* Check code formatting only (e.g., Prettier, gofmt, Black). |
98
98
  ##### Test group
99
99
 
100
100
  | Target | Purpose |
101
101
  |--------|---------|
102
- | `test` | Run **all tests** required for the project. This MUST include unit tests (with coverage enforcement — the build MUST fail if coverage thresholds are not met) and integration/end-to-end tests. Normally delegates to `test-unit` and `test-integration` in sequence. |
102
+ | `test` | Run **all offline tests** required for the project. This MUST include unit tests (with coverage enforcement — the build MUST fail if coverage thresholds are not met) and any integration or end-to-end tests that run **offline** (no external servers, credentials, or paid APIs). Normally delegates to `test-unit` and, when offline, `test-integration` in sequence. Suffixed targets that require external dependencies must not be invoked automatically — see rule 08. |
103
103
  | `test-unit` | Run unit tests only, including coverage report generation and coverage threshold enforcement. |
104
104
  | `test-integration` | *(Optional)* Run integration and end-to-end tests only. Projects without integration tests may omit this target. |
105
105
  | `test-smoke` | *(Optional)* Run a fast, minimal subset of tests to verify the software is basically functional. Useful as a post-deploy health check. |
@@ -150,6 +150,28 @@ The prefix convention ensures developers can infer the purpose of any target wit
150
150
 
151
151
  ---
152
152
 
153
+ #### 09-ai-project-dev-targets
154
+
155
+ AI-based projects (LLM, Agent, and Workflow tiers as defined in [agentme-edr-018](../application/018-ai-llm-development-standards.md)) MUST expose a `dev-mlflow` target that starts a local MLflow tracking server for development inspection.
156
+
157
+ **Example implementation:**
158
+
159
+ ```makefile
160
+ dev-mlflow:
161
+ 	mise exec -- mlflow ui --host 0.0.0.0 --port 5000
162
+ 	open http://localhost:5000/
163
+ ```
164
+
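One caveat with the example above: `mlflow ui` runs in the foreground, so an `open` line placed after it only runs once the server exits. A minimal sketch of one way around this schedules the browser launch in the background before the server starts. The macOS-style `open` command and the two-second startup delay are assumptions, not part of the EDR.

```makefile
# Sketch only: the sleep duration and the use of `open` (macOS) are assumptions.
# On Linux, `xdg-open` would typically take the place of `open`.
.PHONY: dev-mlflow
dev-mlflow:
	(sleep 2 && open http://localhost:5000/) &
	mise exec -- mlflow ui --host 0.0.0.0 --port 5000
```

The first recipe line returns immediately because the subshell is backgrounded. The second line then keeps the server attached to the terminal until it is interrupted.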
165
+ ---
166
+
167
+ #### 08-default-targets-must-only-include-offline-subtargets
168
+
169
+ `make all`, `make test`, and `make lint` must include every subtarget that runs **offline** — meaning it requires no external credentials, no running servers, no paid APIs, and no environment-specific configuration outside the repository.
170
+
171
+ Subtargets that require external dependencies (e.g., `test-integration` against a live database, `test-e2e` against a staging environment, `lint-api` against a remote schema registry) **must** exist as named targets so developers can invoke them explicitly, but **must not** be invoked from `all`, `test`, or `lint`.
172
+
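The rule above can be sketched as a Makefile fragment. Only the target names come from this EDR. The `./scripts/*.sh` recipe commands are hypothetical placeholders. `all`, `lint`, and `test` depend only on offline targets, while `test-e2e` exists as a named target but is never pulled in automatically.

```makefile
# Sketch only: the recipe commands below are hypothetical placeholders.
.PHONY: all build lint test test-unit test-e2e
.DEFAULT_GOAL := all

# Default target: offline subtargets only
all: build lint test

build:
	mise exec -- ./scripts/build.sh

lint:
	mise exec -- ./scripts/lint.sh

# `test` aggregates offline test targets only
test: test-unit

test-unit:
	mise exec -- ./scripts/test-unit.sh

# Needs a staging environment: invoked explicitly with `make test-e2e`,
# never from `all`, `test`, or `lint`
test-e2e:
	mise exec -- ./scripts/test-e2e.sh
```

With this layout, a fresh checkout can run `make` without credentials or network access. A developer with access to the staging environment runs `make test-e2e` explicitly.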
173
+ ---
174
+
153
175
  #### 06-monorepo-usage
154
176
 
155
177
  In a monorepo, each module has its own `Makefile` with its own `build`, `lint`, `test`, and `deploy` targets scoped to that module. Parent-level Makefiles (at the application or repo root) delegate to child Makefiles in sequence. The parent Makefile should call `$(MAKE) -C <child> <target>` directly, while each child `Makefile` runs its actual tool commands through `mise exec --`.
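A minimal sketch of a parent Makefile that follows this delegation pattern, assuming two hypothetical child modules named `api` and `web`, each with its own `Makefile`:

```makefile
# Sketch only: the module names `api` and `web` are hypothetical.
MODULES := api web

.PHONY: build lint test
# One shared recipe for all three targets; $@ expands to the target being run
build lint test:
	@for m in $(MODULES); do \
		$(MAKE) -C $$m $@ || exit 1; \
	done
```

The parent never calls tools directly. Each child `Makefile` is expected to wrap its own commands in `mise exec --`, and the `|| exit 1` stops the loop at the first failing module.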
@@ -194,6 +216,9 @@ make lint-fix
194
216
  # run the software in dev mode (may have hot reload, debug tools enabled, verbose logging etc)
195
217
  make dev
196
218
 
219
+ # [AI projects only] start a local MLflow tracking server for development inspection
220
+ make dev-mlflow
221
+
197
222
  # run the software in production mode
198
223
  make run
199
224