agentme 0.14.0 → 0.15.0

Files changed (20)
  1. package/.filedist-package.yml +1 -1
  2. package/.xdrs/agentme/edrs/application/003-javascript-project-tooling.md +3 -3
  3. package/.xdrs/agentme/edrs/application/010-golang-project-tooling.md +3 -3
  4. package/.xdrs/agentme/edrs/application/014-python-project-tooling.md +2 -2
  5. package/.xdrs/agentme/edrs/application/015-cli-tool-standards.md +2 -2
  6. package/.xdrs/agentme/edrs/application/018-ai-llm-development-standards.md +180 -0
  7. package/.xdrs/agentme/edrs/application/019-ai-agents-development-standards.md +284 -0
  8. package/.xdrs/agentme/edrs/application/020-ai-workflow-development-standards.md +261 -0
  9. package/.xdrs/agentme/edrs/application/021-ai-eval-standards.md +90 -0
  10. package/.xdrs/agentme/edrs/application/{019-ml-dataset-structure.md → 024-ml-dataset-structure.md} +2 -2
  11. package/.xdrs/agentme/edrs/application/{020-ai-agent-xdrs-knowledge-layer.md → 025-ai-agent-xdrs-knowledge-layer.md} +4 -4
  12. package/.xdrs/agentme/edrs/application/{021-pragmatic-hexagonal-architecture.md → 026-pragmatic-hexagonal-architecture.md} +2 -2
  13. package/.xdrs/agentme/edrs/application/skills/001-create-javascript-project/SKILL.md +2 -2
  14. package/.xdrs/agentme/edrs/application/skills/003-create-golang-project/SKILL.md +2 -2
  15. package/.xdrs/agentme/edrs/application/skills/005-create-python-project/SKILL.md +3 -3
  16. package/.xdrs/agentme/edrs/index.md +7 -5
  17. package/.xdrs/agentme/edrs/principles/007-project-quality-standards.md +29 -1
  18. package/package.json +3 -3
  19. package/.xdrs/agentme/edrs/application/018-ai-agent-development-standards.md +0 -309
  20. package/.xdrs/agentme/edrs/application/024-llm-development-standards.md +0 -116
@@ -0,0 +1,261 @@
1
+ ---
2
+ name: agentme-edr-policy-020-ai-workflow-development-standards
3
+ description: Defines the standard toolchain, framework, observability, and workflow patterns for building LangGraph workflows in Python. Use when scaffolding, reviewing, or extending AI workflow projects that orchestrate LLM calls, agents, and algorithmic nodes. For simple LLM calls see agentme-edr-018, for agentic patterns see agentme-edr-019.
4
+ apply-to: AI workflow projects using LangGraph StateGraph built with Python
5
+ valid-from: 2026-06-05
6
+ ---
7
+
8
+ # agentme-edr-policy-020: AI workflow development standards
9
+
10
+ ## Context and Problem Statement
11
+
12
+ AI workflow projects vary widely in how they structure directed graphs, manage state, evaluate outputs, and test execution paths. Without a shared baseline, projects accumulate incompatible patterns for flow design, state management, and dataset-driven testing.
13
+
14
+ Which tools, frameworks, and design patterns should AI workflow projects follow to ensure reproducibility, testability, and maintainability?
15
+
16
+ ## Decision Outcome
17
+
18
+ **Use Python with LangGraph for flow orchestration and MLflow for experiment tracking and local evaluation.**
19
+
20
+ ### Details
21
+
22
+ #### 01-language-and-framework
23
+
24
+ Workflows MUST be built with **LangGraph**. Use LangGraph `StateGraph` to model each distinct workflow as an explicit directed graph with typed state.
25
+
26
+ For all direct LLM calls within workflow nodes, use LangChain per [agentme-edr-018](018-ai-llm-development-standards.md). For agent nodes with tool-invocation loops, use deepagents per [agentme-edr-019](019-ai-agents-development-standards.md).
27
+
28
+ #### 03-observability-and-experiment-tracking
29
+
30
+ Use **MLflow** for all workflow observability and evaluation:
31
+
32
+ - **Workflow-level tracking:** Wrap each workflow run with `mlflow.start_run()` to capture traces, parameters, and metrics locally.
33
+ - **LLM-level auto-tracing:** Enable LangChain auto-tracing per [agentme-edr-018](018-ai-llm-development-standards.md) rule `03-llm-observability` by calling `mlflow.langchain.autolog()` during application startup. This captures inputs, outputs, token counts, and latency for every LangChain call within workflow nodes.
34
+ - Log run parameters (model name, temperature, prompt version) and output metrics (accuracy, latency, token counts) using `mlflow.log_param` / `mlflow.log_metric`.
35
+ - Run a local MLflow tracking server with `mlflow ui` to inspect runs during development. Do not require a remote MLflow server for local development.
36
+
37
+ #### 04-dataset-driven-accuracy-measurement
38
+
39
+ Eval dataset and implementation requirements are defined in [agentme-edr-021](021-ai-eval-standards.md). Testing requirements (when evals are required, release gates) are defined in [agentme-edr-007](../principles/007-project-quality-standards.md) rule `09-ai-project-testing-requirements`.
40
+
41
+ #### 05-flow-documentation
42
+
43
+ Each workflow MUST be documented as a **Mermaid graph** in a `README.md`. The diagram must match the LangGraph `StateGraph` definition:
44
+
45
+ - Use `graph TD` or `graph LR` direction.
46
+ - Label each node with its Python function name.
47
+ - Label conditional edges with the condition expression.
48
+ - Update the diagram whenever the graph topology changes.
49
+
50
+ Example minimal diagram block:
51
+
52
+ ```mermaid
53
+ graph TD
54
+ A[fetch_context] --> B[draft_response]
55
+ B --> C{verify}
56
+ C -->|pass| D[output]
57
+ C -->|fail| B
58
+ ```
59
+
60
+ #### 06-verification-steps
61
+
62
+ Workflows MUST include at least one explicit verification node before producing final output:
63
+
64
+ - Model the verification step as a dedicated LangGraph node (e.g. `verify_output`).
65
+ - The node checks the draft output against defined acceptance criteria (schema validation, factual consistency check, rubric scoring, or LLM-as-judge call).
66
+ - On failure, the verification node MUST route back to the relevant generation node, not silently pass through.
67
+ - Log verification results (pass/fail, score, reason) as MLflow metrics on the current run.
68
+
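The verification rule above can be sketched in plain Python as a node function plus a conditional-edge function. This is a minimal illustration, not the package's implementation: the state type `draft_review_workflow_state`, the node names `draft_response_llm` and `output_step`, and the toy acceptance criterion are all assumptions made for the example.

```python
from typing import TypedDict


class draft_review_workflow_state(TypedDict):
    """Hypothetical workflow state, named per rule 11-state-type-conventions."""
    draft: str
    verification_passed: bool
    verification_reason: str


def verify_output_step(state: draft_review_workflow_state) -> draft_review_workflow_state:
    """Check the draft against a toy acceptance criterion (non-empty, at most 200 chars)."""
    draft = state["draft"]
    if not draft.strip():
        return {**state, "verification_passed": False, "verification_reason": "empty draft"}
    if len(draft) > 200:
        return {**state, "verification_passed": False, "verification_reason": "draft too long"}
    return {**state, "verification_passed": True, "verification_reason": "ok"}


def route_after_verify(state: draft_review_workflow_state) -> str:
    """Conditional edge: return the name of the next node.

    A failed check routes back to the generation node instead of passing through.
    """
    return "output_step" if state["verification_passed"] else "draft_response_llm"
```

In a LangGraph graph, a function like `route_after_verify` would typically be registered as the conditional edge leaving the verification node, and the pass/fail result and reason would be logged to the current MLflow run.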
69
+ #### 07-workflow-structure
70
+
71
+ Workflow logic MUST be organized as named workflows following [agentme-edr-026](026-pragmatic-hexagonal-architecture.md). Each workflow is an independent LangGraph `StateGraph` with a defined start node and end node, connecting LLM nodes, agent nodes, algorithmic nodes, states, routes, and decision nodes.
72
+
73
+ Workflows live inside `app/workflows/` (the application layer), while external integrations such as LLM providers, vector stores, and third-party APIs live under `adapters/connectors/` (the outbound adapter layer). Inbound interfaces (HTTP API, CLI) live under `adapters/` as inbound adapters.
74
+
75
+ For each workflow named `<workflow>`, the full project layout is:
76
+
77
+ ```text
78
+ lib/src/<package_name>/
79
+   adapters/
79
+     http/                 # inbound: API server that triggers workflows
80
+     cli/                  # inbound: CLI entry point (if applicable)
81
+     connectors/           # outbound: external resource integrations
82
+       openai/             # LLM provider connector
83
+       azure-openai/       # alternative LLM provider connector
84
+       postgres/           # database connector (if applicable)
85
+       vector-store/       # vector DB connector (if applicable)
86
+   app/
87
+     workflows/
88
+       <workflow>/
89
+         graph.py          # StateGraph definition; entry point for the workflow
90
+         agents.py         # deepagents agent definitions used by this workflow
91
+         states.py         # Typed state dataclasses / TypedDicts
92
+         routes.py         # Conditional edge functions
93
+   shared/                 # infrastructure-agnostic utilities
95
+ ```
96
+
97
+ - `app/workflows/<workflow>/graph.py` MUST define and compile the `StateGraph` and expose a `graph` object that callers invoke.
98
+ - Tool calls within workflow nodes that interact with external systems MUST use connectors from `adapters/connectors/`, not inline API calls.
99
+ - Additional modules (prompts, schemas) MAY be added inside `app/workflows/<workflow>/` when they are specific to that workflow. Shared utilities belong in `shared/`.
100
+
101
+ #### 08-workflow-evals
102
+
103
+ Eval folder structure and script requirements are defined in [agentme-edr-021](021-ai-eval-standards.md).
104
+
105
+ #### 09-node-naming-conventions
106
+
107
+ LangGraph node names MUST follow a suffix convention that communicates the node's role at a glance. Names MUST be action-oriented and descriptive.
108
+
109
+ | Suffix | Node type | When to use |
110
+ |---|---|---|
111
+ | `_llm` | LLM call | Any node whose primary action is a direct LLM inference call (see [agentme-edr-018](018-ai-llm-development-standards.md)) |
112
+ | `_step` | Algorithmic step | Deterministic logic with no LLM involvement (transformation, validation, routing) |
113
+ | `_tool` | Tool/API call | A node that wraps a single external tool or API (e.g. a REST endpoint, DB query) |
114
+ | `_agent` | Subgraph agent | A node that invokes a nested subgraph containing its own tool-invocation cycle and LLM calls; use the **deepagents** library for these nodes (see [agentme-edr-019](019-ai-agents-development-standards.md)) |
115
+
116
+ The Python function implementing the node SHOULD share the same name as the node alias passed to `add_node`, so that graph definitions and stack traces remain unambiguous:
117
+
118
+ ```python
119
+ def draft_doc_llm(state): ...
120
+ graph.add_node("draft_doc_llm", draft_doc_llm)
121
+
122
+ # Tool node — calls the Stripe API
123
+ def stripe_api_tool(state): ...
124
+ graph.add_node("stripe_api_tool", stripe_api_tool)
125
+
126
+ # Agent node — uses deepagents for tool-invocation loop
127
+ def code_reviewer_agent(state): ...
128
+ graph.add_node("code_reviewer_agent", code_reviewer_agent)
129
+ ```
130
+
131
+ Names MUST NOT use generic labels such as `node1`, `process`, or `run`. Each name must clearly express what action the node performs.
132
+
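The naming rule above lends itself to an automated check, for example in a unit test that iterates over the node names registered on a graph. The sketch below is an assumption about how such a check could look; the helper name `node_name_problems` is hypothetical, and the forbidden set contains only the three generic labels listed in the rule.

```python
ALLOWED_SUFFIXES = ("_llm", "_step", "_tool", "_agent")
FORBIDDEN_NAMES = {"node1", "process", "run"}  # generic labels named in the rule


def node_name_problems(name: str) -> list[str]:
    """Return a list of convention violations for a node name (empty if compliant)."""
    problems = []
    if name in FORBIDDEN_NAMES:
        problems.append("generic name")
    # The suffix must follow a non-empty, descriptive prefix.
    has_role_suffix = any(
        name.endswith(suffix) and len(name) > len(suffix) for suffix in ALLOWED_SUFFIXES
    )
    if not has_role_suffix:
        problems.append("missing role suffix")
    return problems
```

For example, `node_name_problems("draft_doc_llm")` returns an empty list, while `node_name_problems("process")` reports both a generic name and a missing suffix.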
133
+ #### 10-workflow-unit-testing
134
+
135
+ All LLM calls within workflow nodes are external API calls and MUST be mocked in unit tests per [agentme-edr-018](018-ai-llm-development-standards.md) rule `04-unit-test-mocking`. Workflow unit tests must run fully offline with no real LLM provider calls.
136
+
137
+ Choose the mock utility based on what the node under test expects from the model:
138
+
139
+ - Use **`FakeListChatModel`** when nodes only read `AIMessage.content` (e.g. a routing node that checks a text label).
140
+ - Use **`GenericFakeChatModel`** when any node in the workflow expects tool calls, structured outputs, or when the workflow contains `_agent` nodes that drive a tool-invocation loop.
141
+
142
+ **Example — workflow with plain-text LLM nodes:**
143
+
144
+ ```python
145
+ from langchain_core.language_models.fake_chat_models import FakeListChatModel
146
+
147
+ def test_document_workflow_approve_path():
148
+     # Responses consumed in node execution order
149
+     fake_model = FakeListChatModel(responses=["APPROVE", "Meets all criteria."])
150
+
151
+     workflow = DocumentWorkflow(model=fake_model)
152
+     result = workflow.run(input_doc)
153
+
154
+     assert result.status == "approved"
155
+ ```
156
+
157
+ **Example — workflow containing an agent node (`_agent` suffix):**
158
+
159
+ ```python
160
+ from langchain_core.language_models.fake_chat_models import GenericFakeChatModel
161
+ from langchain_core.messages import AIMessage
162
+
163
+ def test_document_workflow_with_agent_node():
164
+     tool_call_msg = AIMessage(
165
+         content="",
166
+         tool_calls=[{"name": "fetch_context", "args": {"doc_id": "42"}, "id": "c1"}]
167
+     )
168
+     agent_final_msg = AIMessage(content="Context retrieved successfully.")
169
+     routing_msg = AIMessage(content="APPROVE")
170
+
171
+     fake_model = GenericFakeChatModel(
172
+         messages=iter([tool_call_msg, agent_final_msg, routing_msg])
173
+     )
174
+
175
+     workflow = DocumentWorkflow(model=fake_model)
176
+     result = workflow.run(input_doc)
177
+
178
+     assert result.status == "approved"
179
+ ```
180
+
181
+ Workflows MUST accept the LLM instance as a constructor parameter so that unit tests can inject a fake. See the injectable LLM pattern in [agentme-edr-018](018-ai-llm-development-standards.md) rule `04-unit-test-mocking`.
182
+
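The constructor-injection requirement can be illustrated without any LLM library. The following is a minimal sketch under stated assumptions: `ScriptedModel` is a hypothetical stand-in that plays the role a fake chat model would play in a test, and the `DocumentWorkflow` shown here (its `run` method, prompts, and `WorkflowResult` type) is invented for illustration rather than taken from the package.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Reply:
    content: str


class ChatModelLike(Protocol):
    """The only model behavior this sketch relies on: invoke() returns an object with .content."""
    def invoke(self, prompt: str) -> Reply: ...


class ScriptedModel:
    """Hypothetical test double: returns canned replies in call order."""
    def __init__(self, responses: list[str]) -> None:
        self._responses = iter(responses)

    def invoke(self, prompt: str) -> Reply:
        return Reply(content=next(self._responses))


@dataclass
class WorkflowResult:
    status: str
    reason: str


class DocumentWorkflow:
    """Receives the model through the constructor, so tests can inject a fake."""
    def __init__(self, model: ChatModelLike) -> None:
        self._model = model

    def run(self, document: str) -> WorkflowResult:
        decision = self._model.invoke(f"Approve or reject:\n{document}").content
        reason = self._model.invoke(f"Explain the decision for:\n{document}").content
        status = "approved" if decision.strip().upper() == "APPROVE" else "rejected"
        return WorkflowResult(status=status, reason=reason)
```

Because the workflow never constructs its own model, a test can pass `ScriptedModel(["APPROVE", "Meets all criteria."])` and run fully offline, while production code passes a real provider-backed model.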
183
+ #### 11-state-type-conventions
184
+
185
+ All TypedDict and dataclass types that represent LangGraph node or workflow state MUST end with `_state` in their name. This suffix signals at a glance that the type is a state boundary, not a plain data model.
186
+
187
+ **Naming reference:**
188
+
189
+ | Owner | Naming pattern | Example |
190
+ |---|---|---|
191
+ | Single agent / agent subgraph | `<agent_name>_agent_state` | `reviewer_agent_state` |
192
+ | Full workflow (`StateGraph`) | `<workflow_name>_workflow_state` | `document_workflow_state` |
193
+ | Named group of nodes sharing state | `<group_responsibility>_state` | `retrieval_pipeline_state` |
194
+
195
+ **Boundary rules:**
196
+
197
+ - Each agent or agent subgraph MUST define its own dedicated state type. Do not reuse or extend a generic state across unrelated agents.
198
+ - Each workflow (`StateGraph`) MUST define its own top-level state type. The workflow state is the authoritative boundary for that graph's inputs and outputs.
199
+ - When a group of nodes (not a full workflow and not a single agent) shares a state type, the type name MUST clearly reflect the shared responsibility. Generic names such as `shared_state`, `common_state`, or `global_state` are FORBIDDEN.
200
+ - Large workflows MUST NOT use a single monolithic state that all nodes read and write. Split the state into per-phase or per-agent state types scoped to the subgraph or set of nodes that produce or consume each field.
201
+
202
+ State type names SHOULD align with the agent or node names defined in rule `09-node-naming-conventions` (e.g., an agent node named `draft_doc_agent` has a state type named `draft_doc_agent_state`).
203
+
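A short sketch of the state naming rules, assuming `TypedDict` state types. The field names are invented for illustration, and `is_valid_state_type_name` and `state_type_name_for` are hypothetical helpers, not part of the package.

```python
from typing import TypedDict

FORBIDDEN_STATE_NAMES = {"shared_state", "common_state", "global_state"}


class reviewer_agent_state(TypedDict):
    """State owned by a single agent: <agent_name>_agent_state."""
    document: str
    review_notes: list[str]


class document_workflow_state(TypedDict):
    """Top-level state for a StateGraph: <workflow_name>_workflow_state."""
    document: str
    status: str


def is_valid_state_type_name(name: str) -> bool:
    """True if the name ends with `_state`, has a prefix, and is not a forbidden generic name."""
    return (
        name.endswith("_state")
        and len(name) > len("_state")
        and name not in FORBIDDEN_STATE_NAMES
    )


def state_type_name_for(node_name: str) -> str:
    """Derive the aligned state type name for a node, e.g. an `_agent` node."""
    return f"{node_name}_state"
```

Keeping one state type per agent and one per workflow means each class documents exactly which fields cross that boundary.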
204
+ #### 12-workflow-naming-conventions
205
+
206
+ LangGraph `StateGraph` instances and their enclosing classes MUST be given a meaningful name that conveys the workflow's input, output, and/or behavior. The name MUST end with `Workflow` (PascalCase class) or `_workflow` (snake_case variable or directory).
207
+
208
+ Choose a name that summarises what the workflow consumes, processes, and produces — avoid generic labels such as `Pipeline`, `Flow`, `Graph`, or `Process`.
209
+
210
+ | Context | Pattern | Example |
211
+ |---|---|---|
212
+ | Python class | `<DescriptiveName>Workflow` | `FileMapJudgeReduceWorkflow` |
213
+ | Python variable / instance | `<descriptive_name>_workflow` | `file_map_judge_reduce_workflow` |
214
+ | Directory under `app/workflows/` | `<descriptive_name>_workflow` | `financial_report_analysis_workflow/` |
215
+
216
+ **Good names** communicate purpose at a glance:
217
+
218
+ - `FileMapJudgeReduceWorkflow` — maps files, judges each, then reduces results
219
+ - `FinancialReportAnalysisWorkflow` — analyses financial report inputs
220
+ - `MarketingCampaignExecutorWorkflow` — executes a marketing campaign end-to-end
221
+
222
+ **Bad names** (FORBIDDEN): `MainWorkflow`, `AgentGraph`, `ProcessFlow`, `Workflow1`, `RunGraph`.
223
+
224
+ #### 15-workflow-state-persistence
225
+
226
+ For long-running workflows that may need to be paused and resumed:
227
+
228
+ - Use LangGraph's built-in checkpointing with `MemorySaver` for development and testing.
229
+ - Use persistent checkpointers (e.g., `PostgresSaver`, or Redis-based checkpointers) for production workflows that need durability.
230
+ - Checkpoint state MUST be serializable (use TypedDict or dataclasses with JSON-compatible fields).
231
+ - Document the checkpoint strategy in the workflow's README.md.
232
+
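The serializability requirement can be guarded in a unit test. The helper below is a sketch that checks only the policy's "JSON-compatible fields" wording using the standard library; it makes no claim about the serializer a particular LangGraph checkpointer uses internally, and the name `assert_json_serializable` is hypothetical.

```python
import json
from typing import Any, Mapping


def assert_json_serializable(state: Mapping[str, Any]) -> str:
    """Return the JSON encoding of a state, or raise ValueError naming the problem."""
    try:
        return json.dumps(state)
    except (TypeError, ValueError) as exc:
        raise ValueError(f"state is not JSON-serializable: {exc}") from exc
```

A state such as `{"document": "d", "attempts": 1}` passes, whereas a state holding a `set`, an open file handle, or a client object fails fast in tests rather than at checkpoint time.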
233
+ **Example with MemorySaver (development):**
234
+
235
+ ```python
236
+ from langgraph.checkpoint.memory import MemorySaver
237
+
238
+ checkpointer = MemorySaver()
239
+ graph = workflow.compile(checkpointer=checkpointer)
240
+
241
+ # Resume from checkpoint
242
+ result = graph.invoke(input_state, config={"configurable": {"thread_id": "session-123"}})
243
+ ```
244
+
245
+ **When to use checkpointing:**
246
+
247
+ - Workflows that take > 30 seconds to complete
248
+ - Workflows that require human-in-the-loop approval or input
249
+ - Workflows that are non-idempotent
250
+ - Workflows that may fail mid-execution and need to be retried from the last successful node
251
+ - Multi-session workflows where state persists across user interactions
252
+
253
+ ## References
254
+
255
+ - [agentme-edr-018](018-ai-llm-development-standards.md) — LLM development standards: LangChain framework, provider configuration, LLM observability, and unit test mocking
256
+ - [agentme-edr-019](019-ai-agents-development-standards.md) — Agent development standards: deepagents framework, tool-invocation loops, and agent patterns
257
+ - [agentme-edr-026](026-pragmatic-hexagonal-architecture.md) — Adapter/application layer separation that defines the project layout
258
+ - [agentme-edr-014](014-python-project-tooling.md) — Python project tooling and structure
259
+ - [agentme-edr-024](024-ml-dataset-structure.md) — ML dataset structure for eval datasets
260
+ - [agentme-edr-021](021-ai-eval-standards.md) — AI eval standards: folder structure, script requirements, and MLflow tracking
261
+ - [agentme-edr-007](../principles/007-project-quality-standards.md) — Project quality standards including AI-tier testing requirements (rule `09-ai-project-testing-requirements`)
@@ -0,0 +1,90 @@
1
+ ---
2
+ name: agentme-edr-policy-021-ai-eval-standards
3
+ description: Defines how to structure, write, and run eval tests for AI projects — folder layout, script requirements, and MLflow tracking. Use when implementing evals for LLM, Agent, or Workflow projects. For when evals are required see agentme-edr-007 rule 09-ai-project-testing-requirements.
4
+ apply-to: Python AI projects (LLM, Agent, or Workflow tier) that implement eval testing
5
+ valid-from: 2026-06-05
6
+ ---
7
+
8
+ # agentme-edr-policy-021: AI eval standards
9
+
10
+ ## Context and Problem Statement
11
+
12
+ Eval tests measure AI component accuracy against expected outputs using real LLM providers. Without a shared folder layout and script convention, eval setups diverge across LLM, Agent, and Workflow projects, making them hard to run, compare, and integrate into CI/CD pipelines.
13
+
14
+ How should eval tests be structured and run across all AI tiers?
15
+
16
+ ## Decision Outcome
17
+
18
+ **Use a per-component folder structure under `evals/` with a standardized Makefile interface and MLflow-backed scripts, applicable to LLM, Agent, and Workflow components.**
19
+
20
+ For when evals are required per AI tier, see [agentme-edr-007](../principles/007-project-quality-standards.md) rule `09-ai-project-testing-requirements`.
21
+
22
+ ### Details
23
+
24
+ #### 01-eval-folder-structure
25
+
26
+ For each AI component being evaluated (an LLM chain, agent, or workflow), create a corresponding directory under `evals/` at the same level as `lib/` and `examples/`:
27
+
28
+ ```text
29
+ evals/
30
+   <component>/
31
+     Makefile              # eval targets for this component
32
+     dataset_<group>/      # one folder per eval group (see agentme-edr-024)
33
+     eval_<group>.py       # evaluation script for each group
34
+ ```
35
+
36
+ Where `<component>` is the name of the LLM chain, agent, or workflow being evaluated (e.g., `summarizer`, `file_analyzer_agent`, `document_review_workflow`).
37
+
38
+ The per-component `evals/<component>/Makefile` MUST define:
39
+
40
+ | Target | Behaviour |
41
+ |---|---|
42
+ | `eval` | Runs all eval groups for the component |
43
+ | `eval-<group>` | Runs one named group (e.g. `eval-simple`, `eval-complex`) |
44
+
45
+ The module root Makefile MUST expose a `make eval` target that delegates to `eval` in every `evals/<component>/Makefile`:
46
+
47
+ ```makefile
48
+ eval:
49
+ 	$(MAKE) -C evals/summarizer eval
50
+ 	$(MAKE) -C evals/document_review_workflow eval
51
+ ```
52
+
53
+ #### 02-eval-script-requirements
54
+
55
+ Each `eval_<group>.py` script MUST:
56
+
57
+ - Load the dataset from `evals/<component>/dataset_<group>/` following [agentme-edr-024](024-ml-dataset-structure.md). For input/output pairs, use the JSONL format per `agentme-edr-024.04-complex-structured-datasets-must-use-jsonl`.
58
+ - Run every input through the live component against **real LLM providers** (not mocked responses), to capture model drift.
59
+ - Log per-sample and aggregate metrics to an MLflow experiment that runs **locally** — a remote MLflow server MUST NOT be required.
60
+ - Compare outputs to expected values using project-defined quality thresholds. Thresholds MUST be declared explicitly (e.g., in a Makefile variable or README).
61
+ - Exit with a non-zero status when any metric falls below its defined threshold, consistent with [agentme-edr-007](../principles/007-project-quality-standards.md) rule `07-statistical-models-must-have-eval-targets`.
62
+
63
+ **Example:**
64
+
65
+ ```python
66
+ import mlflow
67
+ from my_package.app.workflows.document_review_workflow.graph import graph
68
+
69
+ EVAL_MIN_ACCURACY = 0.85
70
+
71
+ with mlflow.start_run():
72
+     results = []
73
+     for sample in load_dataset("evals/document_review_workflow/dataset_basic/"):
74
+         output = graph.invoke({"document": sample["input"]})
75
+         results.append(output["label"] == sample["expected_label"])
76
+
77
+     accuracy = sum(results) / len(results)
78
+     mlflow.log_metric("accuracy", accuracy)
79
+
80
+ if accuracy < EVAL_MIN_ACCURACY:
81
+     raise SystemExit(f"Eval failed: accuracy {accuracy:.2f} < {EVAL_MIN_ACCURACY}")
82
+ ```
83
+
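The eval example calls a `load_dataset` helper that the policy text does not define. One possible shape for it is sketched below, assuming each group folder holds one or more `*.jsonl` files with one JSON object per line (for instance with `input` and `expected_label` keys). The file layout is an assumption for illustration; the authoritative layout is whatever agentme-edr-024 specifies.

```python
import json
from pathlib import Path
from typing import Iterator


def load_dataset(dataset_dir: str) -> Iterator[dict]:
    """Yield one sample dict per non-empty line of every *.jsonl file in the folder."""
    for path in sorted(Path(dataset_dir).glob("*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                if line.strip():  # skip blank lines between records
                    yield json.loads(line)
```

Sorting the file list keeps the sample order stable between runs, which makes per-sample metrics easier to compare across MLflow runs.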
84
+ ## References
85
+
86
+ - [agentme-edr-007](../principles/007-project-quality-standards.md) — Project quality standards: when evals are required per AI tier (rule `09-ai-project-testing-requirements`) and statistical model eval targets (rule `07-statistical-models-must-have-eval-targets`)
87
+ - [agentme-edr-018](018-ai-llm-development-standards.md) — LLM development standards: LangChain framework and observability
88
+ - [agentme-edr-019](019-ai-agents-development-standards.md) — Agent development standards
89
+ - [agentme-edr-020](020-ai-workflow-development-standards.md) — Workflow development standards
90
+ - [agentme-edr-024](024-ml-dataset-structure.md) — ML dataset structure for eval datasets
@@ -1,11 +1,11 @@
1
1
  ---
2
- name: agentme-edr-policy-019-ml-dataset-structure
2
+ name: agentme-edr-policy-024-ml-dataset-structure
3
3
  description: Defines the standard folder layout and file conventions for ML datasets used in AI/ML projects. Use when creating, organizing, or consuming datasets for machine learning tasks such as image labeling, document extraction, tabular data, LLM evaluation, and Q&A sets.
4
4
  apply-to: ML and AI projects that produce or consume datasets
5
5
  valid-from: 2026-05-27
6
6
  ---
7
7
 
8
- # agentme-edr-policy-019: ML dataset structure
8
+ # agentme-edr-policy-024: ML dataset structure
9
9
 
10
10
  ## Context and Problem Statement
11
11
 
@@ -1,11 +1,11 @@
1
1
  ---
2
- name: agentme-edr-policy-020-ai-agent-xdrs-knowledge-layer
2
+ name: agentme-edr-policy-025-ai-agent-xdrs-knowledge-layer
3
3
  description: Defines how to integrate XDRS as the runtime knowledge source of truth for AI agents — covering document placement, AGENTS.md setup, file tools, and local sandbox configuration. Apply only when the project explicitly uses XDRS to govern agent behavior.
4
4
  apply-to: AI agent projects that use XDRS as the source of truth for policies and skills
5
5
  valid-from: 2026-05-27
6
6
  ---
7
7
 
8
- # agentme-edr-policy-020: AI agent XDRS knowledge layer
8
+ # agentme-edr-policy-025: AI agent XDRS knowledge layer
9
9
 
10
10
  ## Context and Problem Statement
11
11
 
@@ -17,7 +17,7 @@ How should an AI agent project integrate XDRS as its runtime source of truth for
17
17
 
18
18
  **Embed XDRS documents in `lib/data/.xdrs/`, instruct the agent to consult them via `AGENTS.md`, equip the agent with sandboxed file tools, and use the deepagents framework when a local sandbox is required.**
19
19
 
20
- This policy MUST only be applied when the project explicitly chooses XDRS as its knowledge governance layer. It is not required by [agentme-edr-018](018-ai-agent-development-standards.md) in general.
20
+ This policy MUST only be applied when the project explicitly chooses XDRS as its knowledge governance layer. It is not required by [agentme-edr-019](019-ai-agents-development-standards.md) or [agentme-edr-020](020-ai-workflow-development-standards.md) in general.
21
21
 
22
22
  ### Details
23
23
 
@@ -91,7 +91,7 @@ data_root = str(files("myagent").joinpath("data"))
91
91
  agents_md = Path(temp_root) / "AGENTS.md"
92
92
  agents_md.write_text(_AGENTS_MD) # content from xdrs-core AGENTS.md template; see rule 01-xdrs-knowledge-layer
93
93
 
94
- # Add these mounts alongside the base mounts from agentme-edr-018 rule 09-local-sandbox:
94
+ # Add these mounts alongside the base mounts from agentme-edr-019 rule 02-local-sandbox:
95
95
  xdrs_mounts = [
96
96
  {"src": f"{data_root}/.xdrs", "dst": "/.xdrs", "readonly": True},
97
97
  {"src": str(agents_md), "dst": "/AGENTS.md", "readonly": True},
@@ -1,11 +1,11 @@
1
1
  ---
2
- name: agentme-edr-policy-021-pragmatic-hexagonal-architecture
2
+ name: agentme-edr-policy-026-pragmatic-hexagonal-architecture
3
3
  description: Defines a pragmatic variant of Hexagonal Architecture for organizing application source code into Adapters (inbound/outbound I/O boundaries) and Application (business logic) layers, with explicit naming conventions and folder structure. Use when designing or reviewing the internal layout of application modules.
4
4
  apply-to: All application projects
5
5
  valid-from: 2026-05-28
6
6
  ---
7
7
 
8
- # agentme-edr-policy-021: Pragmatic hexagonal architecture
8
+ # agentme-edr-policy-026: Pragmatic hexagonal architecture
9
9
 
10
10
  ## Context and Problem Statement
11
11
 
@@ -15,12 +15,12 @@ compatibility: JavaScript/TypeScript, Node.js 18+
15
15
 
16
16
  Creates a complete JavaScript/TypeScript project from scratch. The layout keeps the
17
17
  package self-contained in its module root (`lib/`), organizes internal code following
18
- [agentme-edr-021](../../021-pragmatic-hexagonal-architecture.md) (`adapters/`, `app/`, `shared/`),
18
+ [agentme-edr-026](../../026-pragmatic-hexagonal-architecture.md) (`adapters/`, `app/`, `shared/`),
19
19
  places runnable consumer examples in the sibling `examples/` folder, redirects persistent caches
20
20
  into `.cache/`, and uses Makefiles as the only entry points. Boilerplate is derived from the
21
21
  [filedist](https://github.com/flaviostutz/filedist) project.
22
22
 
23
- Related EDRs: [agentme-edr-003](../../003-javascript-project-tooling.md), [agentme-edr-016](../../../principles/016-cross-language-module-structure.md), [agentme-edr-021](../../021-pragmatic-hexagonal-architecture.md)
23
+ Related EDRs: [agentme-edr-003](../../003-javascript-project-tooling.md), [agentme-edr-016](../../../principles/016-cross-language-module-structure.md), [agentme-edr-026](../../026-pragmatic-hexagonal-architecture.md)
24
24
 
25
25
  ## Instructions
26
26
 
@@ -12,9 +12,9 @@ compatibility: Go 1.21+
12
12
 
13
13
  ## Overview
14
14
 
15
- Creates a complete Go project from scratch, following the layout from [agentme-edr-010](../../010-golang-project-tooling.md) and [agentme-edr-021](../../021-pragmatic-hexagonal-architecture.md). Business logic lives in `app/<feature>/` packages; CLI wiring lives in `adapters/cli/`; outbound integrations live in `adapters/connectors/`; `main.go` is a thin dispatcher. The module root owns its `Makefile`, `README.md`, `dist/`, and `.cache/` folders.
15
+ Creates a complete Go project from scratch, following the layout from [agentme-edr-010](../../010-golang-project-tooling.md) and [agentme-edr-026](../../026-pragmatic-hexagonal-architecture.md). Business logic lives in `app/<feature>/` packages; CLI wiring lives in `adapters/cli/`; outbound integrations live in `adapters/connectors/`; `main.go` is a thin dispatcher. The module root owns its `Makefile`, `README.md`, `dist/`, and `.cache/` folders.
16
16
 
17
- Related EDRs: [agentme-edr-010](../../010-golang-project-tooling.md), [agentme-edr-016](../../../principles/016-cross-language-module-structure.md), [agentme-edr-021](../../021-pragmatic-hexagonal-architecture.md)
17
+ Related EDRs: [agentme-edr-010](../../010-golang-project-tooling.md), [agentme-edr-016](../../../principles/016-cross-language-module-structure.md), [agentme-edr-026](../../026-pragmatic-hexagonal-architecture.md)
18
18
 
19
19
  ## Instructions
20
20
 
@@ -14,11 +14,11 @@ compatibility: Python 3.12+
14
14
 
15
15
  Creates a complete Python project from scratch using Mise, `uv`, `pyproject.toml`, Ruff,
16
16
  ty, Pytest, and Makefiles. The layout keeps the package self-contained under `lib/`,
17
- organizes internal code following [agentme-edr-021](../../021-pragmatic-hexagonal-architecture.md)
17
+ organizes internal code following [agentme-edr-026](../../026-pragmatic-hexagonal-architecture.md)
18
18
  (`adapters/`, `app/`, `shared/`), uses a shared root `.venv/`, redirects persistent caches into
19
19
  `.cache/`, and places runnable consumer projects under the sibling `examples/` folder.
20
20
 
21
- Related EDRs: [agentme-edr-014](../../014-python-project-tooling.md), [agentme-edr-016](../../../principles/016-cross-language-module-structure.md), [agentme-edr-021](../../021-pragmatic-hexagonal-architecture.md)
21
+ Related EDRs: [agentme-edr-014](../../014-python-project-tooling.md), [agentme-edr-016](../../../principles/016-cross-language-module-structure.md), [agentme-edr-026](../../026-pragmatic-hexagonal-architecture.md)
22
22
 
23
23
  ## Instructions
24
24
 
@@ -282,7 +282,7 @@ make test
282
282
 
283
283
  ### Phase 4: Create the package and tests inside `lib/`
284
284
 
285
- Create this baseline structure following [agentme-edr-021](../../021-pragmatic-hexagonal-architecture.md).
285
+ Create this baseline structure following [agentme-edr-026](../../026-pragmatic-hexagonal-architecture.md).
286
286
 
287
287
  **`lib/src/[package_name]/__init__.py`**
288
288
 
@@ -31,11 +31,13 @@ Language and framework-specific tooling and project structure.
  - [agentme-edr-010](application/010-golang-project-tooling.md) - **Go project tooling and structure** - Scaffold Go CLIs and libraries with the standard layout *(includes skill: [003-create-golang-project](application/skills/003-create-golang-project/SKILL.md))*
  - [agentme-edr-014](application/014-python-project-tooling.md) - **Python project tooling and structure** - Scaffold Python packages and CLIs with the standard layout *(includes skill: [005-create-python-project](application/skills/005-create-python-project/SKILL.md))*
  - [agentme-edr-015](application/015-cli-tool-standards.md) - **CLI tool standards** - Define command UX and behavior for CLI tools
- - [agentme-edr-018](application/018-ai-agent-development-standards.md) - **AI agent development standards** - Standard toolchain, framework, evaluation, and workflow patterns for AI agent and LangGraph workflow projects built with Python
- - [agentme-edr-019](application/019-ml-dataset-structure.md) - **ML dataset structure** - Standard folder layout and file conventions for ML datasets
- - [agentme-edr-024](application/024-llm-development-standards.md) - **LLM development standards** - Default framework (LangChain), provider compatibility, observability, and conceptual model (LLM / Agent / Workflow) for LLM-based applications
- - [agentme-edr-020](application/020-ai-agent-xdrs-knowledge-layer.md) - **AI agent XDRS knowledge layer** - How to integrate XDRS as the runtime source of truth for policies and skills in AI agents (apply only when the project explicitly uses XDRS)
- - [agentme-edr-021](application/021-pragmatic-hexagonal-architecture.md) - **Pragmatic hexagonal architecture** - Organize application layers as External/Adapters/Application with practical coupling rules
+ - [agentme-edr-018](application/018-ai-llm-development-standards.md) - **AI LLM development standards** - Standard framework (LangChain) and patterns for simple LLM calls with explicit configuration (no environment variables)
+ - [agentme-edr-019](application/019-ai-agents-development-standards.md) - **AI agents development standards** - Standard framework (deepagents) and patterns for agentic tool-invocation loops
+ - [agentme-edr-020](application/020-ai-workflow-development-standards.md) - **AI workflow development standards** - Standard toolchain (LangGraph), evaluation, and testing patterns for workflow projects
+ - [agentme-edr-021](application/021-ai-eval-standards.md) - **AI eval standards** - Folder structure, script requirements, and MLflow tracking for eval tests across LLM, Agent, and Workflow tiers
+ - [agentme-edr-024](application/024-ml-dataset-structure.md) - **ML dataset structure** - Standard folder layout and file conventions for ML datasets
+ - [agentme-edr-025](application/025-ai-agent-xdrs-knowledge-layer.md) - **AI agent XDRS knowledge layer** - How to integrate XDRS as the runtime source of truth for policies and skills in AI agents (apply only when the project explicitly uses XDRS)
+ - [agentme-edr-026](application/026-pragmatic-hexagonal-architecture.md) - **Pragmatic hexagonal architecture** - Organize application layers as External/Adapters/Application with practical coupling rules
  - [004-select-relevant-xdrs](application/skills/004-select-relevant-xdrs/SKILL.md) - **Select relevant XDRs**
  
  ## Devops
package/.xdrs/agentme/edrs/principles/007-project-quality-standards.md CHANGED
@@ -1,6 +1,6 @@
  ---
  name: agentme-edr-policy-007-project-quality-standards
- description: Defines minimum project quality standards for README onboarding, testing (unit and integration), linting, XDR compliance, and runnable examples. Use when scaffolding or reviewing projects.
+ description: Defines minimum project quality standards for README onboarding, testing (unit, integration, and AI-tier evals), linting, XDR compliance, and runnable examples. Use when scaffolding or reviewing projects.
  apply-to: All projects
  valid-from: 2026-05-25
  ---
@@ -230,3 +230,31 @@ test-integration:
  ```
  
  Projects are not required to implement integration tests, but when present, they SHOULD follow these conventions for consistency across the codebase.
+
+ ---
+
+ #### 09-ai-project-testing-requirements
+
+ AI projects are classified into three tiers — LLM, Agent, and Workflow — defined in [agentme-edr-018](../application/018-ai-llm-development-standards.md). Testing requirements differ per tier:
+
+ | Tier | Unit tests | Evals | Integration tests |
+ |---|---|---|---|
+ | **LLM** ([agentme-edr-018](../application/018-ai-llm-development-standards.md)) | Not required | Not required; SHOULD be used when critical prompts are in use to measure accuracy and detect model drift | Not required |
+ | **Agent** ([agentme-edr-019](../application/019-ai-agents-development-standards.md)) | Not required | Not required; MAY be used | Not required |
+ | **Workflow** ([agentme-edr-020](../application/020-ai-workflow-development-standards.md)) | **Required** — see below | **Required** before every release; failed evals block release | Advised |
+
+ **Workflow unit test requirements:**
+
+ - MUST use mocked LLM providers. See [agentme-edr-018](../application/018-ai-llm-development-standards.md) rule `04-unit-test-mocking` for the mocking pattern.
+ - MUST run offline with no external dependencies per [agentme-edr-004](004-unit-test-requirements.md) rule `02-must-run-offline`.
+ - MUST achieve 80% code coverage per [agentme-edr-004](004-unit-test-requirements.md) rule `03-must-maintain-80-percent-coverage`.
+ - MUST test workflow routing logic, conditional edges, state transformations, and error handling.
+ - MUST achieve **80% coverage of LangGraph graph edges and branches**: every conditional edge MUST have test cases covering each possible branch, and every node→node transition MUST be exercised by at least one test.
+ - Files MUST be named `<name>_test.py` and placed alongside the source file per [agentme-edr-004](004-unit-test-requirements.md) rule `04-must-place-test-files-alongside-source`.
+
+ **Workflow eval requirements:**
+
+ - Evals MUST be executed before every release.
+ - Accuracy below project-defined thresholds MUST block the release. Thresholds MUST be documented in the eval Makefile or README.
+ - Evals MUST run against real LLM providers (not mocks) to capture model drift.
+ - For eval folder structure and script requirements, see [agentme-edr-021](../application/021-ai-eval-standards.md).
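
The workflow unit test rules added in this hunk (mocked LLM provider, offline execution, one test case per branch of every conditional edge) could look roughly like the sketch below. It is illustrative only and not part of the package: it uses plain Python rather than LangGraph, and `FakeLLM`, `route_after_classify`, the branch names, and the file name `router_test.py` are all hypothetical. The actual mocking pattern is defined by agentme-edr-018 rule `04-unit-test-mocking`, which this diff does not show.

```python
"""router_test.py (hypothetical): offline test covering each branch of one conditional edge."""
from dataclasses import dataclass


@dataclass
class FakeLLM:
    """Stand-in for a real provider: returns a canned reply and never touches the network."""
    reply: str

    def invoke(self, prompt: str) -> str:
        return self.reply


def route_after_classify(state: dict, llm) -> str:
    """Hypothetical conditional edge: choose the next node from the LLM's label."""
    label = llm.invoke(f"Classify: {state['question']}").strip().lower()
    if label == "billing":
        return "billing_node"
    if label == "technical":
        return "technical_node"
    return "fallback_node"  # unknown labels fall through instead of raising


# Every branch the conditional edge can take; the test must reach all of them.
BRANCHES = {"billing_node", "technical_node", "fallback_node"}


def exercise_branches() -> set:
    """Run one canned reply per branch and return the set of branches reached."""
    cases = {
        "billing": "billing_node",
        " Technical\n": "technical_node",  # casing and whitespace are normalised
        "gibberish": "fallback_node",
    }
    seen = set()
    for reply, expected in cases.items():
        got = route_after_classify({"question": "q"}, FakeLLM(reply))
        assert got == expected, (reply, got)
        seen.add(got)
    return seen


def test_every_branch_is_exercised():
    assert exercise_branches() == BRANCHES
```

Comparing the set of reached branches against `BRANCHES` makes a missing case fail the test rather than pass silently, which is one way to approximate the per-branch requirement without a dedicated coverage tool.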
package/package.json CHANGED
@@ -1,9 +1,9 @@
  {
  "name": "agentme",
- "version": "0.14.0",
+ "version": "0.15.0",
  "description": "",
  "dependencies": {
- "filedist": "^0.34.1"
+ "filedist": "^0.34.2"
  },
  "bin": "bin/filedist.js",
  "files": [
@@ -22,6 +22,6 @@
  "url": "https://github.com/flaviostutz/agentme.git"
  },
  "devDependencies": {
- "xdrs-core": "^0.28.1"
+ "xdrs-core": "^0.28.3"
  }
  }