agentme 0.14.0 → 0.16.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (23)
  1. package/.filedist-package.yml +1 -1
  2. package/.xdrs/agentme/edrs/application/003-javascript-project-tooling.md +3 -3
  3. package/.xdrs/agentme/edrs/application/010-golang-project-tooling.md +3 -3
  4. package/.xdrs/agentme/edrs/application/014-python-project-tooling.md +2 -2
  5. package/.xdrs/agentme/edrs/application/015-cli-tool-standards.md +6 -7
  6. package/.xdrs/agentme/edrs/application/018-ai-llm-development-standards.md +181 -0
  7. package/.xdrs/agentme/edrs/application/019-ai-agents-development-standards.md +286 -0
  8. package/.xdrs/agentme/edrs/application/020-ai-workflow-development-standards.md +262 -0
  9. package/.xdrs/agentme/edrs/application/021-ai-eval-standards.md +90 -0
  10. package/.xdrs/agentme/edrs/application/{019-ml-dataset-structure.md → 024-ml-dataset-structure.md} +2 -2
  11. package/.xdrs/agentme/edrs/application/{020-ai-agent-xdrs-knowledge-layer.md → 025-ai-agent-xdrs-knowledge-layer.md} +4 -4
  12. package/.xdrs/agentme/edrs/application/{021-pragmatic-hexagonal-architecture.md → 026-pragmatic-hexagonal-architecture.md} +2 -2
  13. package/.xdrs/agentme/edrs/application/skills/001-create-javascript-project/SKILL.md +2 -2
  14. package/.xdrs/agentme/edrs/application/skills/003-create-golang-project/SKILL.md +2 -2
  15. package/.xdrs/agentme/edrs/application/skills/005-create-python-project/SKILL.md +3 -3
  16. package/.xdrs/agentme/edrs/devops/008-common-targets.md +28 -3
  17. package/.xdrs/agentme/edrs/devops/027-environment-variable-configuration.md +158 -0
  18. package/.xdrs/agentme/edrs/index.md +8 -5
  19. package/.xdrs/agentme/edrs/principles/007-project-quality-standards.md +32 -1
  20. package/.xdrs/agentme/edrs/principles/022-secrets-management.md +20 -0
  21. package/package.json +3 -3
  22. package/.xdrs/agentme/edrs/application/018-ai-agent-development-standards.md +0 -309
  23. package/.xdrs/agentme/edrs/application/024-llm-development-standards.md +0 -116
package/.xdrs/agentme/edrs/devops/027-environment-variable-configuration.md ADDED
@@ -0,0 +1,158 @@
1
+ ---
2
+ name: agentme-edr-policy-027-environment-variable-configuration-files
3
+ description: Defines when to use YAML config files versus .env files for configuration, how to combine them, and how .env is loaded for spawned processes. Use when setting up project configuration for any application, CLI, or library.
4
+ apply-to: All projects that use environment variables for configuration
5
+ valid-from: 2026-06-09
6
+ ---
7
+
8
+ # agentme-edr-policy-027: Environment variable configuration files
9
+
10
+ ## Context and Problem Statement
11
+
12
+ Projects need a consistent way to define non-secret configuration — service URLs, feature flags, port numbers, runtime modes — that varies across environments. Ad-hoc approaches (hardcoded defaults, scattered exports, application-level dotenv loaders, and flat env-var-only configs) lead to inconsistent behavior and unclear ownership of configuration.
13
+
14
+ CLI tools additionally need to handle multi-attribute invocation configuration without forcing users to provide every value as a flag. At the same time, some of those values may be environment-specific and must not be committed to the repository.
15
+
16
+ How should projects manage environment variable configuration and CLI invocation configuration across local development, deployment stages, and Makefiles?
17
+
18
+ ## Decision Outcome
19
+
20
+ **Use YAML config files for CLI invocation configuration with multiple attributes; use `.env` files to supply environment variables to spawned processes and to hold uncommitted values referenced by config files. Load `.env` exclusively at process launch time — never inside application code.**
21
+
22
+ Secrets (API keys, passwords, tokens) must never be placed in `.env` files. Those are handled by [agentme-edr-022](../principles/022-secrets-management.md).
23
+
24
+ ### Details
25
+
26
+ #### 01-when-to-use-dotenv
27
+
28
+ Use a `.env` file when either of the following is true:
29
+
30
+ 1. **Spawned process needs env vars** — the project launches a process (a deployable service, background worker, or shell script) that reads configuration from OS environment variables such as port numbers or API endpoint URLs.
31
+ 2. **Value must not be committed** — a configuration value used in a YAML config file (see rule 07) is environment-specific or otherwise unsuitable for version control (such as a real endpoint URL or a username), but is not a secret. In that case, store the value in `.env` and reference it from the YAML file using env var substitution (see rule 08).
32
+
33
+ Do not use `.env` as a general-purpose configuration store when a YAML config file is the right tool (see rule 07).
34
+
35
+ Example `.env` for a service with process-level env vars:
36
+ ```
37
+ SERVER_URL=http://localhost:8080
38
+ LOG_LEVEL=debug
39
+ FEATURE_FLAG_NEW_UI=false
40
+ ```
41
+
42
+ ---
43
+
44
+ #### 02-dotenv-not-committed
45
+
46
+ `.env` must be listed in `.gitignore` and must never be committed to the repository. It is intended for local use in standalone projects and libraries that do not have a formal deployment pipeline.
47
+
48
+ ---
49
+
50
+ #### 03-dotenv-example-committed
51
+
52
+ A `.env.example` file must be committed in the same directory as `.env`. It must list every variable name that `.env` uses, with placeholder or illustrative values: no real URLs, credentials, or server names. This file documents the expected configuration without exposing real values.
53
+
54
+ Example `.env.example`:
55
+ ```
56
+ SERVER_URL=http://localhost:8080
57
+ LOG_LEVEL=debug
58
+ FEATURE_FLAG_NEW_UI=false
59
+ ```
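Rule 03 only works if `.env.example` stays in sync with `.env`. Below is a minimal sketch of a drift check that a lint step could run. It assumes a simplified `KEY=VALUE` dotenv grammar (no `export` prefix, no multi-line values). `dotenv_keys` and `compare_env_files` are hypothetical helpers, not part of any dotenv library. The check only reads variable names, so it never loads values into a process and stays compatible with rule 06.

```python
import re

# Simplified dotenv grammar (assumption): one KEY=VALUE per line. Comment
# lines never match because a key must start with a letter or underscore.
_ASSIGNMENT = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=")


def dotenv_keys(text: str) -> set[str]:
    """Return the variable names declared in a dotenv-style file body."""
    keys = set()
    for line in text.splitlines():
        match = _ASSIGNMENT.match(line)
        if match:
            keys.add(match.group(1))
    return keys


def compare_env_files(env_text: str, example_text: str) -> tuple[set[str], set[str]]:
    """Return (names missing from .env.example, names missing from .env)."""
    env_keys = dotenv_keys(env_text)
    example_keys = dotenv_keys(example_text)
    return env_keys - example_keys, example_keys - env_keys


if __name__ == "__main__":
    env = "SERVER_URL=http://real-host:8080\nLOG_LEVEL=debug\nNEW_FLAG=true\n"
    example = "SERVER_URL=http://localhost:8080\nLOG_LEVEL=debug\n"
    print(compare_env_files(env, example))  # ({'NEW_FLAG'}, set())
```

A non-empty first set means a variable was added to `.env` without being documented in `.env.example`.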
60
+
61
+ ---
62
+
63
+ #### 04-stage-specific-dotenv-committed
64
+
65
+ Stage-specific overrides must use the naming convention `.env.[stage]` (e.g., `.env.production`, `.env.staging`, `.env.test`). These files may be committed to the repository because they carry deployment-stage configuration rather than local developer configuration. They are used during deployment pipelines where the stage is known and explicit.
66
+
67
+ The generic `.env` must still not be committed. The distinction is: `.env` is for local, ad-hoc, standalone use; `.env.[stage]` is for deployment pipelines with a defined environment identity.
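Rules 02 to 04 together mean `.env` must be ignored while `.env.example` and every `.env.[stage]` file must stay tracked, so a broad `.env*` pattern in `.gitignore` silently breaks rules 03 and 04. The sketch below is a hypothetical checker for that case. It approximates gitignore matching with `fnmatch.fnmatchcase` and keeps the gitignore behaviour that the last matching pattern wins and `!` negates; as a stated simplification, it ignores directory anchors, `**`, and trailing slashes.

```python
from fnmatch import fnmatchcase


def is_ignored(filename: str, gitignore_lines: list[str]) -> bool:
    """Approximate whether a top-level file name is ignored by .gitignore lines."""
    ignored = False
    for raw in gitignore_lines:
        pattern = raw.strip()
        if not pattern or pattern.startswith("#"):
            continue  # blank lines and comments
        negate = pattern.startswith("!")
        if negate:
            pattern = pattern[1:]
        if fnmatchcase(filename, pattern):
            ignored = not negate  # last matching pattern wins
    return ignored


def check_env_ignore_rules(gitignore_lines: list[str], stages: list[str]) -> list[str]:
    """Return violations of rules 02 to 04 for the given deployment stages."""
    problems = []
    if not is_ignored(".env", gitignore_lines):
        problems.append(".env is not ignored")
    for name in [".env.example"] + [f".env.{stage}" for stage in stages]:
        if is_ignored(name, gitignore_lines):
            problems.append(f"{name} is ignored but must be committed")
    return problems
```

For example, `check_env_ignore_rules([".env*"], ["production"])` reports both `.env.example` and `.env.production`, while a plain `.env` line reports nothing.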
68
+
69
+ ---
70
+
71
+ #### 05-load-in-makefile-before-processes
72
+
73
+ When `.env` defines variables consumed by shell scripts or spawned processes, the Makefile must load and export them before invoking those processes. Use the following pattern at the top of the relevant Makefile or in a shared include:
74
+
75
+ ```makefile
76
+ ifneq (,$(wildcard .env))
77
+ include .env
78
+ export
79
+ endif
80
+ ```
81
+
82
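+ This ensures all variables in `.env` are available as environment variables to every child process spawned by `make` (a bare `export` exports every make variable, including those read from `.env`). The `ifneq` guard prevents errors when `.env` does not exist (e.g., in CI or fresh checkouts). Because `include` parses `.env` as Makefile syntax, keep values unquoted (quotes become part of the value) and write a literal `$` as `$$`.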
83
+
84
+ ---
85
+
86
+ #### 06-no-application-level-dotenv-loading
87
+
88
+ Applications must not load `.env` files directly inside their own code using dotenv libraries or equivalent mechanisms. Configuration must enter the process exclusively as OS-level environment variables, set before the process is launched (by the Makefile, a shell script, CI, or a container runtime).
89
+
90
+ Prohibited patterns:
91
+
92
+ ```python
93
+ # Python — disallowed
94
+ from dotenv import load_dotenv
95
+ load_dotenv()
96
+ ```
97
+
98
+ ```typescript
99
+ // TypeScript — disallowed
100
+ import dotenv from "dotenv";
101
+ dotenv.config();
102
+ ```
103
+
104
+ ```go
105
+ // Go — disallowed
106
+ godotenv.Load()
107
+ ```
108
+
109
+ Permitted pattern: set env vars in the Makefile (see rule 05), then launch the application normally. Inside application code, read configuration only from `os.environ`, `process.env`, or the standard OS environment API for the language.
110
+
111
+ This rule prevents two parallel loading paths — OS env and file-based env — from coexisting invisibly in the same process.
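On the application side, rule 06 means configuration is read from the process environment only. Below is a minimal sketch, assuming the variable names from the rule 01 example. `ServiceConfig`, `load_config`, and the `info` / `false` defaults are hypothetical choices for illustration. Accepting the environment as a parameter lets tests pass a plain dict instead of mutating `os.environ`.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceConfig:
    server_url: str
    log_level: str
    feature_flag_new_ui: bool


def _parse_bool(raw: str) -> bool:
    value = raw.strip().lower()
    if value in ("true", "1", "yes"):
        return True
    if value in ("false", "0", "no"):
        return False
    raise ValueError(f"not a boolean: {raw!r}")


def load_config(environ: Mapping[str, str] = os.environ) -> ServiceConfig:
    """Build config from OS environment variables only; no .env file is read here."""
    server_url = environ.get("SERVER_URL")
    if not server_url:
        # Fail fast: the launcher (Makefile, CI, container runtime) must set it.
        raise RuntimeError("SERVER_URL must be set before the process starts")
    return ServiceConfig(
        server_url=server_url,
        log_level=environ.get("LOG_LEVEL", "info"),  # hypothetical default
        feature_flag_new_ui=_parse_bool(environ.get("FEATURE_FLAG_NEW_UI", "false")),
    )
```

With rule 05 in place, a Makefile target exports the `.env` values and the process simply calls `load_config()`.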
112
+
113
+ ---
114
+
115
+ #### 07-cli-adapters-use-yaml-config
116
+
117
+ CLI adapters with multiple configuration attributes must use a YAML config file rather than env vars or flags for those attributes. This applies whenever configuration is nested, repetitive, or too verbose for flags alone.
118
+
119
+ The CLI layer is responsible for loading and parsing the YAML file and passing the resolved values to the application layer. The application layer must not read the config file directly.
120
+
121
+ Default config file discovery should follow the pattern defined in [agentme-edr-015](../application/015-cli-tool-standards.md): load `[cwd]/[tool-name].yml` by default, or an explicit path provided via `--config`.
122
+
123
+ Example `myconfig.yml`:
124
+ ```yaml
125
+ openapi_endpoint: https://example.com/openapi
126
+ log_level: debug
127
+ max_retries: 3
128
+ ```
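Below is a minimal sketch of the discovery step in rule 07, assuming a Python CLI built on `argparse`. `resolve_config_path` is a hypothetical helper. It only resolves the path. The CLI layer would then parse the file (for example with PyYAML's `yaml.safe_load`) and hand the resulting values to the application layer, which never opens the file itself.

```python
import argparse
from pathlib import Path
from typing import Optional


def resolve_config_path(argv: list[str], tool_name: str, cwd: Optional[Path] = None) -> Path:
    """Return the --config path if given, else [cwd]/[tool-name].yml."""
    parser = argparse.ArgumentParser(prog=tool_name, add_help=False)
    parser.add_argument("--config", type=Path, help="path to the YAML config file")
    # parse_known_args leaves the tool's other flags for the real parser.
    args, _rest = parser.parse_known_args(argv)
    if args.config is not None:
        return args.config
    return (cwd or Path.cwd()) / f"{tool_name}.yml"
```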
129
+
130
+ ---
131
+
132
+ #### 08-env-var-substitution-in-config-files
133
+
134
+ When a YAML config file contains a value that must not be committed (such as a real endpoint URL, a username, or any other environment-specific value), that value must be expressed as an environment variable reference using `${VAR_NAME}` syntax, and the actual value must be defined in `.env`.
135
+
136
+ This keeps the YAML file committable while keeping the environment-specific value out of the repository.
137
+
138
+ Example:
139
+
140
+ `.env` (not committed):
141
+ ```
142
+ OPENAPI_ENDPOINT=https://real-server.example.com/openapi
143
+ ```
144
+
145
+ `myconfig.yml` (committed):
146
+ ```yaml
147
+ openapi_endpoint: ${OPENAPI_ENDPOINT}
148
+ log_level: debug
149
+ ```
150
+
151
+ The `.env` file must be loaded in the Makefile before launching the process (see rule 05) so the variable is available when the CLI or process reads the config file.
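Common YAML parsers do not expand `${VAR_NAME}` references on their own, so the CLI layer has to substitute them when it loads the file. Below is a minimal sketch, assuming substitution on the raw text before parsing. `expand_env_refs` is hypothetical. Unlike `os.path.expandvars`, which leaves references to unset variables unchanged, it fails when a referenced variable is missing, so a forgotten `.env` entry surfaces at startup.

```python
import os
import re
from collections.abc import Mapping

# ${VAR_NAME} references as written in the committed YAML file.
_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env_refs(text: str, environ: Mapping[str, str] = os.environ) -> str:
    """Replace every ${VAR} in text; raise KeyError if any referenced variable is unset."""
    missing = sorted({name for name in _REF.findall(text) if name not in environ})
    if missing:
        raise KeyError(f"unset variables referenced in config: {', '.join(missing)}")
    return _REF.sub(lambda match: environ[match.group(1)], text)
```

Substituting before parsing is the simplest option. If values may contain YAML-significant characters such as `: `, substituting into the parsed string values instead avoids breaking the document.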
152
+
153
+ ## References
154
+
155
+ - [agentme-edr-022](../principles/022-secrets-management.md) - Secrets must use OS keychains or cloud secret managers, not `.env` files
156
+ - [agentme-edr-017](017-tool-execution-and-scripting.md) - Makefiles are the authoritative command entry point; rule 05 above integrates with that standard
157
+ - [agentme-edr-008](008-common-targets.md) - Standard Makefile target names
158
+ - [agentme-edr-015](../application/015-cli-tool-standards.md) - CLI config file discovery and CLI-to-application separation
package/.xdrs/agentme/edrs/index.md CHANGED
@@ -31,11 +31,13 @@ Language and framework-specific tooling and project structure.
31
31
  - [agentme-edr-010](application/010-golang-project-tooling.md) - **Go project tooling and structure** - Scaffold Go CLIs and libraries with the standard layout *(includes skill: [003-create-golang-project](application/skills/003-create-golang-project/SKILL.md))*
32
32
  - [agentme-edr-014](application/014-python-project-tooling.md) - **Python project tooling and structure** - Scaffold Python packages and CLIs with the standard layout *(includes skill: [005-create-python-project](application/skills/005-create-python-project/SKILL.md))*
33
33
  - [agentme-edr-015](application/015-cli-tool-standards.md) - **CLI tool standards** - Define command UX and behavior for CLI tools
34
- - [agentme-edr-018](application/018-ai-agent-development-standards.md) - **AI agent development standards** - Standard toolchain, framework, evaluation, and workflow patterns for AI agent and LangGraph workflow projects built with Python
35
- - [agentme-edr-019](application/019-ml-dataset-structure.md) - **ML dataset structure** - Standard folder layout and file conventions for ML datasets
36
- - [agentme-edr-024](application/024-llm-development-standards.md) - **LLM development standards** - Default framework (LangChain), provider compatibility, observability, and conceptual model (LLM / Agent / Workflow) for LLM-based applications
37
- - [agentme-edr-020](application/020-ai-agent-xdrs-knowledge-layer.md) - **AI agent XDRS knowledge layer** - How to integrate XDRS as the runtime source of truth for policies and skills in AI agents (apply only when the project explicitly uses XDRS)
38
- - [agentme-edr-021](application/021-pragmatic-hexagonal-architecture.md) - **Pragmatic hexagonal architecture** - Organize application layers as External/Adapters/Application with practical coupling rules
34
+ - [agentme-edr-018](application/018-ai-llm-development-standards.md) - **AI LLM development standards** - Standard framework (LangChain) and patterns for simple LLM calls with explicit configuration (no environment variables)
35
+ - [agentme-edr-019](application/019-ai-agents-development-standards.md) - **AI agents development standards** - Standard framework (deepagents) and patterns for agentic tool-invocation loops
36
+ - [agentme-edr-020](application/020-ai-workflow-development-standards.md) - **AI workflow development standards** - Standard toolchain (LangGraph), evaluation, and testing patterns for workflow projects
37
+ - [agentme-edr-021](application/021-ai-eval-standards.md) - **AI eval standards** - Folder structure, script requirements, and MLflow tracking for eval tests across LLM, Agent, and Workflow tiers
38
+ - [agentme-edr-024](application/024-ml-dataset-structure.md) - **ML dataset structure** - Standard folder layout and file conventions for ML datasets
39
+ - [agentme-edr-025](application/025-ai-agent-xdrs-knowledge-layer.md) - **AI agent XDRS knowledge layer** - How to integrate XDRS as the runtime source of truth for policies and skills in AI agents (apply only when the project explicitly uses XDRS)
40
+ - [agentme-edr-026](application/026-pragmatic-hexagonal-architecture.md) - **Pragmatic hexagonal architecture** - Organize application layers as External/Adapters/Application with practical coupling rules
39
41
  - [004-select-relevant-xdrs](application/skills/004-select-relevant-xdrs/SKILL.md) - **Select relevant XDRs**
40
42
 
41
43
  ## Devops
@@ -46,6 +48,7 @@ Repository structure, build conventions, and CI/CD pipelines.
46
48
  - [agentme-edr-006](devops/006-github-pipelines.md) - **GitHub CI/CD pipelines** - Define required CI stages and workflow structure
47
49
  - [agentme-edr-008](devops/008-common-targets.md) - **Common development script names** - Reuse standard build, lint, and test target names
48
50
  - [agentme-edr-017](devops/017-tool-execution-and-scripting.md) - **Tool execution and scripting** - Run tools consistently across shells, Makefiles, and CI
51
+ - [agentme-edr-027](devops/027-environment-variable-configuration.md) - **Environment variable configuration files** - Manage non-secret configuration with `.env` files, `.gitignore` rules, stage variants, and Makefile loading
49
52
 
50
53
  ## Governance
51
54
 
package/.xdrs/agentme/edrs/principles/007-project-quality-standards.md CHANGED
@@ -1,6 +1,6 @@
1
1
  ---
2
2
  name: agentme-edr-policy-007-project-quality-standards
3
- description: Defines minimum project quality standards for README onboarding, testing (unit and integration), linting, XDR compliance, and runnable examples. Use when scaffolding or reviewing projects.
3
+ description: Defines minimum project quality standards for README onboarding, testing (unit, integration, and AI-tier evals), linting, XDR compliance, and runnable examples. Use when scaffolding or reviewing projects.
4
4
  apply-to: All projects
5
5
  valid-from: 2026-05-25
6
6
  ---
@@ -145,6 +145,7 @@ Projects that are libraries or shared utilities must include an `examples/` dire
145
145
  **Root Makefile:**
146
146
 
147
147
  ```makefile
148
+ # test-examples runs the examples offline (no external services) → include in test
148
149
  test: test-unit test-examples
149
150
 
150
151
  test-unit:
@@ -154,6 +155,8 @@ test-examples:
154
155
  $(MAKE) -C examples
155
156
  ```
156
157
 
158
+ If examples require live services or credentials, remove `test-examples` from the `test` dependency list and keep it as a standalone named target only. See [agentme-edr-008](../devops/008-common-targets.md) rule 08 for the full offline/online decision table.
159
+
157
160
  **Examples Makefile:**
158
161
 
159
162
  ```makefile
@@ -230,3 +233,31 @@ test-integration:
230
233
  ```
231
234
 
232
235
  Projects are not required to implement integration tests, but when present, they SHOULD follow these conventions for consistency across the codebase.
236
+
237
+ ---
238
+
239
+ #### 09-ai-project-testing-requirements
240
+
241
+ AI projects are classified into three tiers — LLM, Agent, and Workflow — defined in [agentme-edr-018](../application/018-ai-llm-development-standards.md). Testing requirements differ per tier:
242
+
243
+ | Tier | Unit tests | Evals | Integration tests |
244
+ |---|---|---|---|
245
+ | **LLM** ([agentme-edr-018](../application/018-ai-llm-development-standards.md)) | Not required | Not required; SHOULD be used when critical prompts are in use to measure accuracy and detect model drift | Not required |
246
+ | **Agent** ([agentme-edr-019](../application/019-ai-agents-development-standards.md)) | Not required | Not required; MAY be used | Not required |
247
+ | **Workflow** ([agentme-edr-020](../application/020-ai-workflow-development-standards.md)) | **Required** — see below | **Required** before every release; failed evals block release | Advised |
248
+
249
+ **Workflow unit test requirements:**
250
+
251
+ - MUST use mocked LLM providers. See [agentme-edr-018](../application/018-ai-llm-development-standards.md) rule `04-unit-test-mocking` for the mocking pattern.
252
+ - MUST run offline with no external dependencies per [agentme-edr-004](004-unit-test-requirements.md) rule `02-must-run-offline`.
253
+ - MUST achieve 80% code coverage per [agentme-edr-004](004-unit-test-requirements.md) rule `03-must-maintain-80-percent-coverage`.
254
+ - MUST test workflow routing logic, conditional edges, state transformations, and error handling.
255
+ - MUST achieve **80% coverage of LangGraph graph edges and branches**: every conditional edge MUST have test cases covering each possible branch, and every node→node transition MUST be exercised by at least one test.
256
+ - Files MUST be named `<name>_test.py` and placed alongside the source file per [agentme-edr-004](004-unit-test-requirements.md) rule `04-must-place-test-files-alongside-source`.
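To illustrate the branch-coverage requirement, here is a hypothetical conditional-edge function with one test per branch. `route_after_verify`, the node names, and `MAX_ATTEMPTS` are invented for the example. In a LangGraph workflow, a function like this would be registered with `add_conditional_edges`. Because it is plain Python over the state, these tests run offline without any LLM provider or mock. In a project they would live in a `<name>_test.py` file next to the source.

```python
from collections.abc import Mapping
from typing import Any

MAX_ATTEMPTS = 3  # hypothetical retry budget for the draft/verify loop


def route_after_verify(state: Mapping[str, Any]) -> str:
    """Conditional edge: choose the node that runs after verification."""
    if state["verified"]:
        return "output_step"
    if state["attempts"] >= MAX_ATTEMPTS:
        return "escalate_step"
    return "draft_response_llm"


# One test per branch, so every outcome of the conditional edge is exercised.
def test_routes_to_output_when_verified():
    assert route_after_verify({"verified": True, "attempts": 1}) == "output_step"


def test_retries_draft_when_verification_fails():
    assert route_after_verify({"verified": False, "attempts": 1}) == "draft_response_llm"


def test_escalates_after_max_attempts():
    assert route_after_verify({"verified": False, "attempts": MAX_ATTEMPTS}) == "escalate_step"
```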
257
+
258
+ **Workflow eval requirements:**
259
+
260
+ - Evals MUST be executed before every release.
261
+ - Accuracy below project-defined thresholds MUST block the release. Thresholds MUST be documented in the eval Makefile or README.
262
+ - Evals MUST run against real LLM providers (not mocks) to capture model drift.
263
+ - For eval folder structure and script requirements, see [agentme-edr-021](../application/021-ai-eval-standards.md).
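Below is a minimal sketch of the release gate described above. It assumes exact-match accuracy as the metric (real projects may use rubric or LLM-as-judge scores), and the helper names are hypothetical. Returning a non-zero exit code makes the `eval` Makefile target fail, which is what blocks the release in a pipeline that depends on it.

```python
import sys
from collections.abc import Sequence


def accuracy(predictions: Sequence[str], expected: Sequence[str]) -> float:
    """Fraction of samples whose prediction exactly matches the expected output."""
    if len(predictions) != len(expected):
        raise ValueError("predictions and expected must have the same length")
    if not expected:
        raise ValueError("empty eval dataset")
    hits = sum(p == e for p, e in zip(predictions, expected))
    return hits / len(expected)


def gate_release(score: float, threshold: float) -> int:
    """Return a process exit code: 0 passes the gate, 1 blocks the release."""
    if score < threshold:
        print(f"eval accuracy {score:.2%} is below threshold {threshold:.2%}", file=sys.stderr)
        return 1
    print(f"eval accuracy {score:.2%} meets threshold {threshold:.2%}")
    return 0
```

An eval script would compute `score` from live LLM outputs, log it with `mlflow.log_metric("accuracy", score)`, and end with `sys.exit(gate_release(score, threshold))`, taking the threshold from the value documented in the eval Makefile or README.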
package/.xdrs/agentme/edrs/principles/022-secrets-management.md CHANGED
@@ -96,6 +96,26 @@ $ make run
96
96
  # Application starts successfully
97
97
  ```
98
98
 
99
+ #### 05a-makefile-uses-security-utility
100
+
101
+ Makefile targets (e.g., `setup-secrets`) must use the macOS native `security` CLI to store and retrieve secrets from the keychain. This restricts Makefile-based secret management to macOS developer machines, which is acceptable since all contributors are expected to use macOS.
102
+
103
+ Do **not** use `keyring` or other cross-platform libraries in Makefiles — `security` is simpler to invoke from shell and requires no additional dependencies.
104
+
105
+ Storing a secret:
106
+ ```makefile
107
+ security add-generic-password -a "$(USER)" -s "mymodule/api-key" -w "$(SECRET_VALUE)" -U
108
+ ```
109
+
110
+ Retrieving a secret (e.g., to pass to a command):
111
+ ```makefile
112
+ SECRET_VALUE := $(shell security find-generic-password -a "$(USER)" -s "mymodule/api-key" -w 2>/dev/null)
113
+ ```
114
+
115
+ The `-U` flag updates the entry if it already exists. Use the format `<group>/<secret-id>` as the service name (`-s`) to mirror the module name and cloud secret manager ID conventions defined in rules 02 and 05.
116
+
117
+ In library code (Python, JS/TS, Go), continue using the cross-platform libraries defined in rule 02 (`keyring`, `cross-keychain`, `go-keyring`). The `security` utility is only for Makefile scripts.
118
+
99
119
  ---
100
120
 
101
121
  #### 06-never-log-or-leak-secrets
package/package.json CHANGED
@@ -1,9 +1,9 @@
1
1
  {
2
2
  "name": "agentme",
3
- "version": "0.14.0",
3
+ "version": "0.16.0",
4
4
  "description": "",
5
5
  "dependencies": {
6
- "filedist": "^0.34.1"
6
+ "filedist": "^0.35.0"
7
7
  },
8
8
  "bin": "bin/filedist.js",
9
9
  "files": [
@@ -22,6 +22,6 @@
22
22
  "url": "https://github.com/flaviostutz/agentme.git"
23
23
  },
24
24
  "devDependencies": {
25
- "xdrs-core": "^0.28.1"
25
+ "xdrs-core": "^0.28.3"
26
26
  }
27
27
  }
package/.xdrs/agentme/edrs/application/018-ai-agent-development-standards.md DELETED
@@ -1,309 +0,0 @@
1
- ---
2
- name: agentme-edr-policy-018-ai-agent-development-standards
3
- description: Defines the standard toolchain, framework, evaluation approach, testing strategy, and workflow patterns for building AI agents (tool-invocation loops) and LangGraph workflows in Python. Use when scaffolding, reviewing, or extending AI agent or workflow projects. For raw LLM calls and provider configuration, see agentme-edr-024.
4
- apply-to: AI agent and LangGraph workflow projects built with Python
5
- valid-from: 2026-05-26
6
- ---
7
-
8
- # agentme-edr-policy-018: AI agent development standards
9
-
10
- ## Context and Problem Statement
11
-
12
- AI agent projects vary widely in how they choose frameworks, manage context, evaluate outputs, and structure workflows. Without a shared baseline, projects accumulate incompatible patterns for LLM provider abstraction, flow design, and dataset-driven testing.
13
-
14
- Which tools, frameworks, and design patterns should AI agent projects follow to ensure reproducibility, testability, and maintainability?
15
-
16
- ## Decision Outcome
17
-
18
- **Use Python with LangGraph for flow orchestration and MLflow for experiment tracking and local evaluation.**
19
-
20
- This policy covers the **Agent** and **Workflow** tiers of the three-tier conceptual model defined in [agentme-edr-024](024-llm-development-standards.md). For the definition of LLM, Agent, and Workflow, and for the LangChain framework rules that govern direct LLM calls, see [agentme-edr-024](024-llm-development-standards.md).
21
-
22
- ### Details
23
-
24
- #### 01-language-and-framework
25
-
26
- All agent and workflow projects MUST be implemented in Python, following [agentme-edr-014](014-python-project-tooling.md) for project structure, tooling, and Makefile conventions.
27
-
28
- Agent flows MUST be built with **LangGraph**. Use LangGraph `StateGraph` to model each distinct workflow as an explicit directed graph with typed state.
29
-
30
- For all direct LLM calls within agent and workflow nodes, use LangChain per [agentme-edr-024](024-llm-development-standards.md).
31
-
32
- #### 03-observability-and-experiment-tracking
33
-
34
- Use **MLflow** for all agent and workflow observability and evaluation:
35
-
36
- - Wrap each agent or workflow run with `mlflow.start_run()` to capture traces, parameters, and metrics locally.
37
- - Log run parameters (model name, temperature, prompt version) and output metrics (accuracy, latency, token counts) using `mlflow.log_param` / `mlflow.log_metric`.
38
- - Run a local MLflow tracking server with `mlflow ui` to inspect runs during development. Do not require a remote MLflow server for local development.
39
- - For LangChain-level auto-tracing of individual LLM calls, see [agentme-edr-024](024-llm-development-standards.md) rule `03-llm-observability`.
40
-
41
- #### 04-dataset-driven-accuracy-measurement
42
-
43
- Every agent pipeline MUST have a companion evaluation dataset and an MLflow experiment that measures accuracy against it. Datasets and evals are organized per-workflow following rule `07-workflow-structure` and rule `08-workflow-evals`.
44
-
45
- - **Evals** measure model accuracy and performance against expected outputs. They are REQUIRED before every release to verify the workflow meets specified accuracy thresholds. They run against real LLM providers to capture model drift. They log metrics to MLflow and MUST have project-defined quality thresholds that block releases when not met.
46
- - **Integration tests** verify that workflows execute end-to-end with real connectors and real models, using pass/fail assertions. They are ADVISED but not required. They validate wiring, error handling, and system integration, not model accuracy. See rule `13-three-tier-testing-strategy` for integration test guidelines.
47
-
48
- **Eval requirements:**
49
-
50
- - Store evaluation datasets under `evals/<workflow>/` (sibling of `lib/` and `examples/`), following [agentme-edr-019](019-ml-dataset-structure.md) for structure and format. For MLflow input/output pairs, use the JSONL format described in `agentme-edr-019.04-complex-structured-datasets-must-use-jsonl`.
51
- - Write evaluation scripts under `evals/<workflow>/` that load the dataset, run each input through the live agent (against real LLMs, not mocks), compare outputs to expected values, and log per-sample and aggregate metrics to an MLflow experiment.
52
- - Add a `make eval` Makefile target in the module root Makefile (the same Makefile that sits alongside `lib/` and `examples/`) that delegates to all per-workflow eval targets.
53
- - Evaluation MUST run against real LLM providers, not recorded responses, to capture model drift. MLflow tracking MUST work locally without a remote server.
54
- - Evals MUST be executed before every release. Failed eval runs with accuracy below project-defined thresholds MUST block the release.
55
-
56
- #### 05-flow-documentation
57
-
58
- Each agent flow MUST be documented as a **Mermaid graph** in the project `README.md`. The diagram must match the LangGraph `StateGraph` definition:
59
-
60
- - Use `graph TD` or `graph LR` direction.
61
- - Label each node with its Python function name.
62
- - Label conditional edges with the condition expression.
63
- - Update the diagram whenever the graph topology changes.
64
-
65
- Example minimal diagram block:
66
-
67
- ```mermaid
68
- graph TD
69
- A[fetch_context] --> B[draft_response]
70
- B --> C{verify}
71
- C -->|pass| D[output]
72
- C -->|fail| B
73
- ```
74
-
75
- #### 06-verification-steps
76
-
77
- Agent flows MUST include at least one explicit verification node before producing final output:
78
-
79
- - Model the verification step as a dedicated LangGraph node (e.g. `verify_output`).
80
- - The node checks the draft output against defined acceptance criteria (schema validation, factual consistency check, rubric scoring, or LLM-as-judge call).
81
- - On failure, the verification node MUST route back to the relevant generation node, not silently pass through.
82
- - Log verification results (pass/fail, score, reason) as MLflow metrics on the current run.
83
-
84
- #### 07-workflow-structure
85
-
86
- Agent logic MUST be organized as named workflows following [agentme-edr-021](021-pragmatic-hexagonal-architecture.md). Each workflow is an independent LangGraph `StateGraph` with a defined start node and end node, connecting agents, states, routes, and decision nodes.
87
-
88
- Workflows live inside `app/workflows/` (the application layer), while external integrations such as LLM providers, vector stores, and third-party APIs live under `adapters/connectors/` (the outbound adapter layer). Inbound interfaces (HTTP API, CLI) live under `adapters/` as inbound adapters.
89
-
90
- For each workflow named `<workflow>`, the full project layout is:
91
-
92
- ```text
93
- lib/src/<package_name>/
94
- adapters/
95
- http/ # inbound: API server that triggers workflows
96
- cli/ # inbound: CLI entry point (if applicable)
97
- connectors/ # outbound: external resource integrations
98
- openai/ # LLM provider connector
99
- azure-openai/ # alternative LLM provider connector
100
- postgres/ # database connector (if applicable)
101
- vector-store/ # vector DB connector (if applicable)
102
- app/
103
- workflows/
104
- <workflow>/
105
- graph.py # StateGraph definition; entry point for the workflow
106
- agents.py # LangChain agent definitions used by this workflow
107
- states.py # Typed state dataclasses / TypedDicts
108
- routes.py # Conditional edge functions
109
- shared/ # infrastructure-agnostic utilities
110
- ```
111
-
112
- - `app/workflows/<workflow>/graph.py` MUST define and compile the `StateGraph` and expose a `graph` object that callers invoke.
113
- - Tool calls within workflow nodes that interact with external systems MUST use connectors from `adapters/connectors/`, not inline API calls.
114
- - Additional modules (prompts, schemas) MAY be added inside `app/workflows/<workflow>/` when they are specific to that workflow. Shared utilities belong in `shared/`.
115
- - Each workflow MUST be documented with a Mermaid diagram in the project `README.md` following rule `05-flow-documentation`.
116
-
117
- #### 08-workflow-evals
118
-
119
- For each workflow `<workflow>` there MUST be a corresponding eval directory:
120
-
121
- ```text
122
- evals/
123
- <workflow>/
124
- Makefile # eval targets for this workflow
125
- dataset_<slice>/ # one folder per eval slice (see agentme-edr-019)
126
- eval_<slice>.py # evaluation script for each slice
127
- ```
128
-
129
- The `evals/<workflow>/Makefile` MUST define:
130
-
131
- | Target | Behaviour |
132
- |---|---|
133
- | `eval` | Runs all eval slices for the workflow |
134
- | `eval-<slice>` | Runs one named slice (e.g. `eval-simple`, `eval-complex`) |
135
-
136
- Each `eval_<slice>.py` script MUST:
137
-
138
- - Load the dataset from `evals/<workflow>/dataset_<slice>/` following [agentme-edr-019](019-ml-dataset-structure.md).
139
- - Run every input through the live workflow against real LLMs.
140
- - Log per-sample and aggregate metrics to an MLflow experiment that runs locally.
141
-
142
- The module root Makefile `make eval` target MUST delegate to `eval` in every `evals/<workflow>/Makefile`.
143
-
144
- #### 09-node-naming-conventions
145
-
146
- LangGraph node names MUST follow a suffix convention that communicates the node's role at a glance. Names MUST be action-oriented and descriptive.
147
-
148
- | Suffix | Node type | When to use |
149
- |---|---|---|
150
- | `_llm` | LLM call | Any node whose primary action is a direct LLM inference call |
151
- | `_step` | Algorithmic step | Deterministic logic with no LLM involvement (transformation, validation, routing) |
152
- | `_tool` | Tool/API call | A node that wraps a single external tool or API (e.g. a REST endpoint, DB query) |
153
- | `_agent` | Subgraph agent | A node that invokes a nested subgraph containing its own tool-invocation cycle and LLM calls; prefer the **deepagents** library for these nodes |
154
-
155
- The Python function implementing the node SHOULD share the same name as the node alias passed to `add_node`, so that graph definitions and stack traces remain unambiguous:
156
-
157
- ```python
158
- def draft_doc_llm(state): ...
159
- graph.add_node("draft_doc_llm", draft_doc_llm)
160
-
161
- # Tool node — calls the Stripe API
162
- def stripe_api_tool(state): ...
163
- graph.add_node("stripe_api_tool", stripe_api_tool)
164
- ```
165
-
166
- Names MUST NOT use generic labels such as `node1`, `process`, or `run`. Each name must clearly express what action the node performs.
167
-
#### 10-local-sandbox

When a workflow node or tool requires a **local sandbox** (an isolated environment where the agent can read files, glob-search directories, and execute shell commands), use the **[deepagents](https://github.com/deepagents/deepagents) framework** to provide that sandbox.

**When to apply this rule**

Use deepagents whenever ANY of the following is true for a workflow or tool:

- The agent needs to execute shell commands or scripts in a controlled environment.
- The agent needs to list, read, or search files across multiple directories at runtime.
- The agent operates on user-supplied or generated file trees that must not escape the sandbox boundary.

**Integration requirements**

- Initialize the sandbox at the start of the workflow run inside a `try` block, and shut it down in the matching `finally` clause so it is released even when a node raises.
- Pass the sandbox handle into the LangGraph workflow state so that all nodes share the same sandbox instance.
- If host-side code needs to pass files into the sandbox (e.g. generated config or input data), create a temporary directory with `tempfile.mkdtemp()`, write the files there, and mount it into the sandbox. Remove the directory in the same `finally` clause.
- Replace hand-rolled `read_file`, `search_files`, and `grep_file` tool implementations with the equivalent tools provided by deepagents.

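The integration requirements can be sketched as a small runner that owns both the sandbox and the staging directory. This is a minimal illustration under stated assumptions:

- `SandboxHandle`, its `mount`/`shutdown` methods, `run_with_sandbox`, and `RecordingSandbox` are hypothetical names, not the deepagents API.
- `run_workflow` stands in for invoking the compiled LangGraph workflow.

Consult the deepagents documentation for the real sandbox setup calls.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Optional, Protocol


class SandboxHandle(Protocol):
    """Hypothetical stand-in for a deepagents sandbox handle (not its real API)."""

    def mount(self, host_dir: str) -> None: ...
    def shutdown(self) -> None: ...


def run_with_sandbox(
    make_sandbox: Callable[[], SandboxHandle],
    input_files: dict[str, str],
    run_workflow: Callable[[dict], dict],
) -> dict:
    """Run one workflow with a single sandbox; sandbox and temp dir are cleaned up in finally."""
    sandbox: Optional[SandboxHandle] = None
    staging_dir = tempfile.mkdtemp()
    try:
        sandbox = make_sandbox()
        # Stage host-side inputs in the temp dir, then expose them to the sandbox.
        for name, content in input_files.items():
            Path(staging_dir, name).write_text(content)
        sandbox.mount(staging_dir)
        # All nodes read the same sandbox instance from the workflow state.
        return run_workflow({"sandbox": sandbox})
    finally:
        if sandbox is not None:
            sandbox.shutdown()
        shutil.rmtree(staging_dir, ignore_errors=True)


class RecordingSandbox:
    """Test double that only records calls; it provides no isolation."""

    def __init__(self) -> None:
        self.mounted: list[str] = []
        self.closed = False

    def mount(self, host_dir: str) -> None:
        self.mounted.append(host_dir)

    def shutdown(self) -> None:
        self.closed = True
```

Both resources are released in one `finally` clause, so a failing node cannot leak either of them.
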
#### 11-state-type-conventions

All TypedDict and dataclass types that represent LangGraph node or workflow state MUST end with `_state` in their name. This suffix signals at a glance that the type is a state boundary, not a plain data model.

**Naming reference:**

| Owner | Naming pattern | Example |
|---|---|---|
| Single agent / agent subgraph | `<agent_name>_agent_state` | `reviewer_agent_state` |
| Full workflow (`StateGraph`) | `<workflow_name>_workflow_state` | `document_workflow_state` |
| Named group of nodes sharing state | `<group_responsibility>_state` | `retrieval_pipeline_state` |

**Boundary rules:**

- Each agent or agent subgraph MUST define its own dedicated state type. Do not reuse or extend a generic state across unrelated agents.
- Each workflow (`StateGraph`) MUST define its own top-level state type. The workflow state is the authoritative boundary for that graph's inputs and outputs.
- When a group of nodes (not a full workflow and not a single agent) shares a state type, the type name MUST clearly reflect the shared responsibility. Generic names such as `shared_state`, `common_state`, or `global_state` are FORBIDDEN.
- Large workflows MUST NOT use a single monolithic state that all nodes read and write. Split the state into per-phase or per-agent state types scoped to the subgraph or set of nodes that produce or consume each field.

State type names SHOULD align with the agent or node names defined in rule `09-node-naming-conventions` (e.g., an agent node named `draft_doc_agent` has a state type named `draft_doc_agent_state`).

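As an illustration of the boundary rules, here is how a workflow might split its state into per-agent and per-group types. All type, field, and node names below are hypothetical. The `review_doc_agent` node shows one way a parent node can hand data to an agent that has its own state type, rather than sharing the workflow state.

```python
from typing import TypedDict


class document_workflow_state(TypedDict):
    """Top-level boundary of a hypothetical document workflow: inputs and final outputs only."""
    draft: str
    final_document: str


class retrieval_pipeline_state(TypedDict):
    """Shared only by the retrieval nodes of the workflow."""
    query: str
    documents: list[str]


class review_doc_agent_state(TypedDict):
    """Private to the review_doc_agent subgraph."""
    draft: str
    review_notes: list[str]


def review_doc_agent(state: document_workflow_state) -> dict:
    # Translate workflow state into the agent's own state type instead of sharing one dict.
    agent_state: review_doc_agent_state = {"draft": state["draft"], "review_notes": []}
    # A real node would run the agent subgraph on agent_state; this sketch passes the draft through.
    return {"final_document": agent_state["draft"]}
```

LangGraph also allows a subgraph to share keys with its parent state directly. The explicit mapping above is simply the most literal reading of the "dedicated state type" rule.
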
#### 12-workflow-naming-conventions

LangGraph `StateGraph` instances and their enclosing classes MUST be given a meaningful name that conveys the workflow's input, output, and/or behavior. The name MUST end with `Workflow` (PascalCase class) or `_workflow` (snake_case variable or directory).

Choose a name that summarises what the workflow consumes, processes, and produces. Avoid generic labels such as `Pipeline`, `Flow`, `Graph`, or `Process`.

| Context | Pattern | Example |
|---|---|---|
| Python class | `<DescriptiveName>Workflow` | `FileMapJudgeReduceWorkflow` |
| Python variable / instance | `<descriptive_name>_workflow` | `file_map_judge_reduce_workflow` |
| Directory under `app/workflows/` | `<descriptive_name>_workflow` | `financial_report_analysis_workflow/` |

**Good names** communicate purpose at a glance:

- `FileMapJudgeReduceWorkflow`: maps files, judges each, then reduces the results
- `FinancialReportAnalysisWorkflow`: analyses financial report inputs
- `MarketingCampaignExecutorWorkflow`: executes a marketing campaign end-to-end

**Bad names** (FORBIDDEN): `MainWorkflow`, `AgentGraph`, `ProcessFlow`, `Workflow1`, `RunGraph`.

#### 13-three-tier-testing-strategy

AI agent and workflow projects recognize three distinct testing tiers, each with its own purpose, tooling, and execution model:

| Tier | Purpose | Model source | External APIs | File naming | When to run | Required |
|---|---|---|---|---|---|---|
| **Unit tests** | Test workflow logic, routing, and state management in isolation | Mocked (`FakeListChatModel`) | Mocked or faked | `<name>_test.py` | On every commit | **Required** |
| **Integration tests** | Verify end-to-end wiring with real models and real external connectors | Real LLM providers | Real connectors | `<name>_integration_test.py` | Before releases or on a schedule | Advised |
| **Evals** | Measure model accuracy and performance against expected outputs | Real LLM providers | Real connectors | `eval_<slice>.py` (see rule 08) | Before every release | **Required** |

**Unit test requirements (REQUIRED):**

- MUST use mocked LLM providers (see [agentme-edr-024](024-llm-development-standards.md) rule `04-unit-test-mocking`).
- MUST run offline with no external dependencies, per [agentme-edr-004](../principles/004-unit-test-requirements.md) rule `02-must-run-offline`.
- MUST achieve 80% code coverage, per [agentme-edr-004](../principles/004-unit-test-requirements.md) rule `03-must-maintain-80-percent-coverage`.
- MUST test workflow routing logic, conditional edges, state transformations, and error handling.
- Files MUST be named `<name>_test.py` and placed alongside the source file, per [agentme-edr-004](../principles/004-unit-test-requirements.md) rule `04-must-place-test-files-alongside-source`.
- MUST achieve 80% coverage of workflow graph edges and branches, per rule `14-workflow-graph-coverage`.

**Integration test guidelines (ADVISED):**

Integration tests are advised but not required. See [agentme-edr-007](../principles/007-project-quality-standards.md) rule `08-integration-tests-are-advised` for general integration testing guidelines.

For AI agent and workflow projects specifically, integration tests, when implemented:

- SHOULD use real LLM providers configured via environment variables.
- MAY use smaller, faster models (e.g., `gpt-3.5-turbo`) instead of production models to reduce cost and latency.
- SHOULD verify workflow execution with pass/fail assertions rather than accuracy metrics (accuracy is measured by evals).

**Eval requirements (REQUIRED):**

Evals are REQUIRED for all agent and workflow projects. See rule `04-dataset-driven-accuracy-measurement` and rule `08-workflow-evals` for the full requirements.

- Evals MUST run against real LLM providers so that model drift is captured. They MUST log their metrics to MLflow so model performance can be tracked over time.
- Evals MUST be executed before every release to verify that the workflow meets the specified accuracy thresholds.
- An eval run whose accuracy falls below the project-defined thresholds MUST block the release.

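The release-blocking requirement reduces to a threshold check that a CI job runs after the eval. The sketch below makes several assumptions:

- Scoring is exact-match, and the threshold values are illustrative.
- `EvalCase`, `run_eval`, and `release_gate` are hypothetical names.
- The MLflow logging required above is omitted to keep the sketch dependency-free. A real eval would also log the accuracy, for example with `mlflow.log_metric`.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class EvalCase:
    """One row of a hypothetical eval dataset."""
    prompt: str
    expected: str


def run_eval(predict: Callable[[str], str], cases: Sequence[EvalCase]) -> float:
    """Exact-match accuracy of `predict` over the dataset (scoring rule is an assumption)."""
    if not cases:
        raise ValueError("eval dataset is empty")
    hits = sum(predict(case.prompt) == case.expected for case in cases)
    return hits / len(cases)


def release_gate(accuracy: float, threshold: float) -> None:
    """Raise, failing the CI job, when accuracy is below the project-defined threshold."""
    if accuracy < threshold:
        raise RuntimeError(f"eval accuracy {accuracy:.1%} is below the {threshold:.1%} threshold")


cases = [EvalCase("a", "A"), EvalCase("b", "B"), EvalCase("c", "X"), EvalCase("d", "D")]
accuracy = run_eval(str.upper, cases)  # str.upper stands in for the real workflow call
print(accuracy)  # → 0.75
release_gate(accuracy, threshold=0.7)  # passes; threshold=0.8 would raise
```

Raising an exception rather than returning a flag means any CI runner that treats an uncaught error as failure will block the release automatically.
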
#### 14-workflow-graph-coverage

LangGraph workflows MUST achieve **80% coverage of workflow graph edges and branches** in unit tests.

**Graph edge coverage** measures whether each transition (edge) in the `StateGraph` is exercised by at least one test. This ensures that routing logic, conditional edges, and error paths are tested.

**Requirements:**

- Every conditional edge (e.g., `add_conditional_edges`) MUST have test cases covering each possible branch.
- Every node→node transition MUST be exercised by at least one test.
- Error handling paths (e.g., nodes that route to error states or retry nodes) MUST be tested with inputs that trigger those paths.
- Use mocked LLM responses in unit tests to control which branches are taken (see [agentme-edr-024](024-llm-development-standards.md) rule `04-unit-test-mocking`).

**Example:**

```python
from langchain_core.language_models.fake_chat_models import FakeListChatModel

# DocumentWorkflow is the project's workflow under test; it is assumed to append
# each executed node name to a `_visited_nodes` field in its state.
sample_doc = "Quarterly report draft"  # illustrative fixture


def test_workflow_approval_path():
    """Test the workflow takes the approval path when the LLM returns APPROVE."""
    fake_llm = FakeListChatModel(responses=["APPROVE"])
    workflow = DocumentWorkflow(llm=fake_llm)

    result = workflow.invoke({"document": sample_doc})

    assert result["status"] == "approved"
    assert "verify_approved" in result["_visited_nodes"]


def test_workflow_rejection_path():
    """Test the workflow takes the rejection path when the LLM returns REJECT."""
    fake_llm = FakeListChatModel(responses=["REJECT"])
    workflow = DocumentWorkflow(llm=fake_llm)

    result = workflow.invoke({"document": sample_doc})

    assert result["status"] == "rejected"
    assert "handle_rejection" in result["_visited_nodes"]
```

When measuring graph coverage, apply the same 80% threshold as for code coverage: at least 80% of the graph's edges MUST be exercised by unit tests.

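One way to compute the edge-coverage figure is to record the ordered node path of each unit-test run. You then compare the transitions seen against the edges declared in the graph. In the sketch below, the declared edge set is written by hand and the node names are hypothetical; `__start__` and `__end__` match the string values of LangGraph's `START` and `END` constants. How the paths are recorded (for example, through a `_visited_nodes`-style state field) is left to the project.

```python
from typing import Iterable, Sequence

Edge = tuple[str, str]


def covered_edges(paths: Iterable[Sequence[str]]) -> set[Edge]:
    """Collect every consecutive (source, target) pair seen in the recorded node paths."""
    seen: set[Edge] = set()
    for path in paths:
        seen.update(zip(path, path[1:]))
    return seen


def edge_coverage(declared: set[Edge], paths: Iterable[Sequence[str]]) -> float:
    """Fraction of the declared graph edges exercised by at least one test path."""
    if not declared:
        return 1.0
    return len(declared & covered_edges(paths)) / len(declared)


# Hypothetical graph: a classifier LLM node routes to an approve or a reject step.
DECLARED = {
    ("__start__", "classify_doc_llm"),
    ("classify_doc_llm", "approve_doc_step"),
    ("classify_doc_llm", "reject_doc_step"),
    ("approve_doc_step", "__end__"),
    ("reject_doc_step", "__end__"),
}
approval_path = ["__start__", "classify_doc_llm", "approve_doc_step", "__end__"]
rejection_path = ["__start__", "classify_doc_llm", "reject_doc_step", "__end__"]

print(edge_coverage(DECLARED, [approval_path]))  # → 0.6
print(edge_coverage(DECLARED, [approval_path, rejection_path]))  # → 1.0
```

With only the approval test, coverage is 60% and fails the 80% threshold. Adding the rejection test covers all five edges.
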
## References

- [agentme-edr-024](024-llm-development-standards.md): LLM development standards covering the LangChain framework, provider compatibility, LLM observability, unit test mocking, and the LLM / Agent / Workflow conceptual model
- [agentme-edr-004](../principles/004-unit-test-requirements.md): Unit test requirements, including coverage, offline execution, and test file placement
- [agentme-edr-021](021-pragmatic-hexagonal-architecture.md): Adapter/application layer separation that defines the project layout
- [agentme-edr-014](014-python-project-tooling.md): Python project tooling and structure
- [agentme-edr-019](019-ml-dataset-structure.md): ML dataset structure for eval datasets