agentme 0.9.0 → 0.10.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,5 +1,6 @@
  sets:
- - package: xdrs-core@0.27.1
+ - package: xdrs-core@0.28.0
+ # - package: git:https://github.com/flaviostutz/xdrs-core.git@main
  selector:
  files:
  - .xdrs/_core/**
@@ -1,6 +1,6 @@
  ---
  name: agentme-edr-policy-018-ai-agent-development-standards
- description: Defines the standard toolchain, framework, evaluation approach, and context management patterns for building AI agents. Use when scaffolding, reviewing, or extending AI agent projects.
+ description: Defines the standard toolchain, framework, evaluation approach, and workflow patterns for building AI agents with Python and LangGraph. Use when scaffolding, reviewing, or extending AI agent projects.
  apply-to: AI agent projects built with Python
  valid-from: 2026-05-26
  ---
@@ -9,13 +9,13 @@ valid-from: 2026-05-26
 
  ## Context and Problem Statement
 
- AI agent projects vary widely in how they choose frameworks, manage context, evaluate outputs, and expose policies to the agent at runtime. Without a shared baseline, projects accumulate incompatible patterns for LLM provider abstraction, flow design, dataset-driven testing, and knowledge delivery.
+ AI agent projects vary widely in how they choose frameworks, manage context, evaluate outputs, and structure workflows. Without a shared baseline, projects accumulate incompatible patterns for LLM provider abstraction, flow design, and dataset-driven testing.
 
  Which tools, frameworks, and design patterns should AI agent projects follow to ensure reproducibility, testability, and maintainability?
 
  ## Decision Outcome
 
- **Use Python with LangGraph for flow orchestration, MLflow for experiment tracking and local evaluation, and a file-system-based XDRS knowledge layer that the agent queries at runtime via explicit file tools.**
+ **Use Python with LangGraph for flow orchestration and MLflow for experiment tracking and local evaluation.**
 
  ### Details
 
@@ -54,7 +54,7 @@ Use **MLflow** for all agent observability and evaluation:
 
  #### 04-dataset-driven-accuracy-measurement
 
- Every agent pipeline MUST have a companion evaluation dataset and an MLflow experiment that measures accuracy against it. Datasets and evals are organized per-workflow following rule `09-workflow-structure` and rule `10-workflow-evals`.
+ Every agent pipeline MUST have a companion evaluation dataset and an MLflow experiment that measures accuracy against it. Datasets and evals are organized per-workflow following rule `07-workflow-structure` and rule `08-workflow-evals`.
 
  - Store evaluation datasets under `evals/<workflow>/` (sibling of `lib/` and `examples/`), following [agentme-edr-019](019-ml-dataset-structure.md) for structure and format. For MLflow input/output pairs, use the JSONL format described in `agentme-edr-019.04-complex-structured-datasets-must-use-jsonl`.
  - Write evaluation scripts under `evals/<workflow>/` that load the dataset, run each input through the live agent (against real LLMs, not mocks), compare outputs to expected values, and log per-sample and aggregate metrics to an MLflow experiment.
@@ -80,50 +80,7 @@ graph TD
  C -->|fail| B
  ```
 
- #### 06-xdrs-knowledge-layer
-
- When an agent must follow elaborate procedures, decision frameworks, or domain rules:
-
- **Static files distributed with the library**
-
- - All static files accessed by agents at runtime (XDRS documents, reference tables, domain dictionaries, lookup files) MUST live under a `data/` folder inside the library source tree (`lib/data/`) and be embedded in the package data manifest (e.g. `pyproject.toml` `[tool.hatch.build] include` or equivalent).
- - XDRS Policy and Skill documents MUST be placed at `lib/data/.xdrs/`, using the standard XDRS scope/type/subject folder structure (following `_core-adr-policy-001`).
- - Other static context data (reference tables, domain dictionaries, structured lookup files) MUST be placed under `lib/data/` in an appropriate sub-folder (e.g. `lib/data/context/`).
- - The agent system prompt MUST NOT inline procedure text. It MUST instruct the agent to read specific paths and follow the instructions found there. Example:
-
- ```
- Before answering, read and follow the instructions in data/.xdrs/_local/edrs/procedures/triage.md.
- ```
-
- **Dynamic context generated per workflow instantiation**
-
- - Context files that are generated at runtime per workflow run (unpacked archives, fetched documents, intermediate outputs) MUST be written to a temporary directory created via the OS temp API (`tempfile.mkdtemp()` in Python).
- - The temporary directory MUST be created at the start of the workflow run and passed into the workflow state so all nodes share the same path.
- - The temporary directory MUST be deleted (including all contents) when the workflow run finishes, whether it succeeds or fails, using a `try/finally` block or a context manager.
- - The agent file tools MUST be configured with the temporary directory path at workflow startup so the agent can read from it during the run.
-
- - The agent file tools MUST expose `data/` (for static files) and the temporary directory (for dynamic files) as sandboxed readable roots (see rule `07-agent-file-tools`).
-
- #### 07-agent-file-tools
-
- Every agent that uses the XDRS knowledge layer or file-based context MUST be equipped with at least the following tools:
-
- | Tool | Purpose |
- |---|---|
- | `read_file(path)` | Read the full content of a file by path |
- | `search_files(directory, pattern)` | Glob-search for files matching a pattern under a directory |
- | `grep_file(path, query)` | Search for lines matching a string or regex within a file |
-
- Implement these tools as LangChain `@tool`-decorated functions with explicit path sandboxing. Two sandboxed roots MUST be configured:
-
- | Root | Content | Source |
- |---|---|---|
- | `DATA_ROOT` | Static files shipped with the library (`lib/data/`) | Package data; resolved via `importlib.resources` or a path relative to the installed package |
- | `TEMP_ROOT` | Dynamic files generated for the current workflow run | Temporary directory created by `tempfile.mkdtemp()` at workflow startup |
-
- Resolve all paths against the appropriate root. Reject any path that would escape its root (no `../` traversal). `TEMP_ROOT` MUST be passed into the tool factory at workflow startup, not read from a global variable.
-
- #### 08-verification-steps
+ #### 06-verification-steps
 
  Agent flows MUST include at least one explicit verification node before producing final output:
 
@@ -132,7 +89,7 @@ Agent flows MUST include at least one explicit verification node before producin
  - On failure, the verification node MUST route back to the relevant generation node, not silently pass through.
  - Log verification results (pass/fail, score, reason) as MLflow metrics on the current run.
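The routing rule above can be sketched in plain Python, so it runs without LangGraph installed. The state keys (`draft`, `verified`, `reason`), the node names, and the non-empty check are hypothetical placeholders; a real verification node would apply a schema check, a rule check, or an LLM judge.

```python
from typing import Literal, TypedDict


class FlowState(TypedDict, total=False):
    # Hypothetical state keys, for illustration only.
    draft: str
    verified: bool
    reason: str


def verify(state: FlowState) -> FlowState:
    """Verification node: returns a partial state update with pass/fail and a reason."""
    draft = state.get("draft", "")
    passed = bool(draft.strip())  # placeholder criterion; replace with the real check
    # In the real node, also record the outcome on the active MLflow run, e.g.
    # mlflow.log_metric("verification_passed", float(passed))
    return {"verified": passed, "reason": "ok" if passed else "empty draft"}


def route_after_verify(state: FlowState) -> Literal["generate", "finalize"]:
    """Router: a failed check goes back to generation instead of passing through."""
    return "finalize" if state.get("verified") else "generate"


# Wiring sketch (requires langgraph; node names are hypothetical):
# builder.add_conditional_edges(
#     "verify", route_after_verify, {"generate": "generate", "finalize": "finalize"}
# )
```

Because a failing check always routes back to generation, a draft that never passes ends the run through LangGraph's recursion limit (set via `recursion_limit` in the run config) rather than slipping through to the final output.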
 
- #### 09-workflow-structure
+ #### 07-workflow-structure
 
  Agent logic MUST be organized as named workflows. Each workflow is an independent LangGraph `StateGraph` with a defined start node and end node, connecting agents, states, routes, and decision nodes.
 
@@ -152,7 +109,7 @@ lib/
  - Additional modules (tools, prompts, schemas) MAY be added inside `lib/workflows/<workflow>/` when they are specific to that workflow. Shared utilities belong in `lib/<module>/`.
  - Each workflow MUST be documented with a Mermaid diagram in the project `README.md` following rule `05-flow-documentation`.
 
- #### 10-workflow-evals
+ #### 08-workflow-evals
 
  For each workflow `<workflow>` there MUST be a corresponding eval directory:
 
@@ -168,8 +125,8 @@ The `evals/<workflow>/Makefile` MUST define:
 
  | Target | Behaviour |
  |---|---|
- | `test-eval` | Runs all eval slices for the workflow |
- | `test-eval-<slice>` | Runs one named slice (e.g. `test-eval-simple`, `test-eval-complex`) |
+ | `eval` | Runs all eval slices for the workflow |
+ | `eval-<slice>` | Runs one named slice (e.g. `eval-simple`, `eval-complex`) |
 
  Each `eval_<slice>.py` script MUST:
 
@@ -177,5 +134,23 @@ Each `eval_<slice>.py` script MUST:
  - Run every input through the live workflow against real LLMs.
  - Log per-sample and aggregate metrics to an MLflow experiment that runs locally.
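A minimal sketch of the core loop of an `eval_<slice>.py` script, assuming JSONL records with hypothetical `input` and `expected_output` fields (see agentme-edr-019 rule 04) and exact-match scoring. The metric logger is injected so the loop can be exercised without MLflow; its call shape matches `mlflow.log_metric(key, value, step=...)`.

```python
from typing import Any, Callable, Iterable, Mapping

# Same call shape as mlflow.log_metric(key, value, step=None).
MetricLogger = Callable[..., None]


def run_eval_slice(
    records: Iterable[Mapping[str, Any]],
    workflow: Callable[[Any], Any],
    log_metric: MetricLogger,
) -> dict[str, float]:
    """Run every record through the workflow and log per-sample and aggregate metrics."""
    total = 0
    correct = 0
    for step, record in enumerate(records):
        output = workflow(record["input"])  # live workflow call (real LLMs in practice)
        hit = output == record["expected_output"]  # exact match; swap in a task-specific scorer
        log_metric("sample_correct", float(hit), step=step)
        total += 1
        correct += int(hit)
    accuracy = correct / total if total else 0.0
    log_metric("accuracy", accuracy)
    log_metric("num_samples", float(total))
    return {"accuracy": accuracy, "num_samples": float(total)}
```

In the script itself, one plausible wiring is `mlflow.set_experiment(...)` followed by `with mlflow.start_run(): run_eval_slice(records, invoke_workflow, mlflow.log_metric)`, where `invoke_workflow` is a hypothetical wrapper around the compiled workflow graph.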
 
- The module root Makefile `make eval` target MUST delegate to `test-eval` in every `evals/<workflow>/Makefile`.
+ The module root Makefile `make eval` target MUST delegate to `eval` in every `evals/<workflow>/Makefile`.
+
+ #### 09-local-sandbox
+
+ When a workflow node or tool requires a **local sandbox** — an isolated environment where the agent can read files, glob-search directories, and execute shell commands — use the **[deepagents](https://github.com/deepagents/deepagents) framework** to provide that sandbox.
+
+ **When to apply this rule**
+
+ Use deepagents whenever ANY of the following is true for a workflow or tool:
+ - The agent needs to execute shell commands or scripts in a controlled environment.
+ - The agent needs to list, read, or search files across multiple directories at runtime.
+ - The agent operates on user-supplied or generated file trees that must not escape a sandboxed boundary.
+
+ **Integration requirements**
+
+ - Initialize the sandbox at the start of the workflow run and shut it down in the same `try/finally` block.
+ - Pass the sandbox handle into the LangGraph workflow state so all nodes share the same sandbox instance.
+ - If the host-side code needs to pass files into the sandbox (e.g. generated config or input data), create a temporary directory with `tempfile.mkdtemp()`, write the files there, and mount it into the sandbox. Clean it up in the `finally` block.
+ - Replace hand-rolled `read_file`, `search_files`, and `grep_file` tool implementations with the equivalent tools provided by deepagents.
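The lifecycle in the integration requirements can be sketched as below. The deepagents calls are deliberately not shown: `open_sandbox` and the `close()` method are hypothetical stand-ins for however the project starts and stops its deepagents sandbox, so only the ordering (temporary directory, mount, shared handle in state, cleanup in one `finally`) is meant literally.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path
from typing import Any, Callable, Protocol


class SandboxHandle(Protocol):
    """Hypothetical handle; stands in for the project's deepagents sandbox object."""

    def close(self) -> None: ...


def run_with_sandbox(
    inputs: dict[str, str],
    open_sandbox: Callable[[str], SandboxHandle],
    workflow: Callable[[dict[str, Any]], Any],
) -> Any:
    """Write host files, start the sandbox, run the workflow, and always clean up."""
    host_dir = tempfile.mkdtemp(prefix="agent-run-")
    sandbox: SandboxHandle | None = None
    try:
        for name, content in inputs.items():
            (Path(host_dir) / name).write_text(content, encoding="utf-8")
        sandbox = open_sandbox(host_dir)  # expected to mount host_dir into the sandbox
        # Every node sees the same sandbox instance through the workflow state.
        return workflow({"sandbox": sandbox})
    finally:
        if sandbox is not None:
            sandbox.close()
        shutil.rmtree(host_dir, ignore_errors=True)
```

Keeping sandbox shutdown and directory removal in the same `finally` block means a failing node cannot leak either resource.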
 
@@ -17,13 +17,13 @@ How should ML datasets be organized on disk so they are self-describing, easy to
 
  **A standard root layout with mandatory README.md and dataset.schema.json, plus type-specific conventions for data files**
 
- Every dataset must live in its own named folder and include a README and a JSON Schema file. Data files are organized according to three dataset types, each with its own placement rule.
+ Every dataset MUST live in its own named folder and include a README and a JSON Schema file. Data files are organized according to three dataset types, each with its own placement rule.
 
  ### Details
 
  #### 01-root-structure-is-mandatory
 
- Every dataset must follow this root layout:
+ Every dataset MUST follow this root layout:
 
  ```
  /[name-of-dataset]/
@@ -33,13 +33,13 @@ Every dataset must follow this root layout:
  ... (additional files depending on dataset type)
  ```
 
- - `README.md` must explain what the dataset is about, the procedures used to create it, remarks on data quality, and instructions on how to consume it with examples.
- - `dataset.schema.json` must be a valid [JSON Schema](https://json-schema.org/) document describing the structure of the dataset's primary data.
- - The dataset folder name must be lowercase, using hyphens as separators.
+ - `README.md` MUST explain what the dataset is about, the procedures used to create it, remarks on data quality, and instructions on how to consume it with examples.
+ - `dataset.schema.json` MUST be a valid [JSON Schema](https://json-schema.org/) document describing the structure of the dataset's primary data.
+ - The dataset folder name MUST be lowercase, using underscores as separators (e.g. `my_dataset`).
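The root-layout rules lend themselves to a small automated check. This sketch assumes digits are allowed in folder names (the rule only mandates lowercase and underscores) and only checks that `dataset.schema.json` parses as JSON; confirming it is a valid JSON Schema would need a validator library such as the third-party `jsonschema` package.

```python
import json
import re
from pathlib import Path

# Lowercase words joined by underscores, e.g. "my_dataset". Allowing digits is an assumption.
_DATASET_NAME = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*")


def check_dataset_root(folder: Path) -> list[str]:
    """Return the root-layout violations found in one dataset folder (empty means OK)."""
    problems = []
    if not _DATASET_NAME.fullmatch(folder.name):
        problems.append(f"folder name {folder.name!r} is not lowercase_with_underscores")
    for required in ("README.md", "dataset.schema.json"):
        if not (folder / required).is_file():
            problems.append(f"missing {required}")
    schema_path = folder / "dataset.schema.json"
    if schema_path.is_file():
        try:
            json.loads(schema_path.read_text(encoding="utf-8"))  # JSON syntax check only
        except json.JSONDecodeError as exc:
            problems.append(f"dataset.schema.json is not valid JSON: {exc}")
    return problems
```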
 
  #### 02-file-annotation-pairs-must-use-data-folder
 
- Datasets where each item is a file paired with structured JSON output (e.g. image labeling, document data extraction, medical records with known features) must store all files inside the `data/` subfolder. Each data file must have a sibling JSON annotation file named with the same filename suffixed with `.json`.
+ Datasets where each item is a file paired with structured JSON output (e.g. image labeling, document data extraction, medical records with known features) MUST store all files inside the `data/` subfolder. Each data file MUST have a sibling JSON annotation file named with the same filename suffixed with `.json`.
 
  ```
  /[name-of-dataset]/
@@ -56,11 +56,11 @@ Datasets where each item is a file paired with structured JSON output (e.g. imag
 
  Placing the annotation file next to its source file (same name + `.json`) keeps them adjacent even in large directories, making it easy to iterate pairs programmatically.
 
- Subdirectories inside `data/` are allowed when the number of files warrants grouping, but the `.json` sibling convention must be preserved at each level.
+ Subdirectories inside `data/` are allowed when the number of files warrants grouping, but the `.json` sibling convention MUST be preserved at each level.
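Iterating the pairs programmatically might look like the following sketch. It assumes the data files themselves never carry a `.json` extension, so every `.json` file under `data/` is treated as an annotation.

```python
from pathlib import Path
from typing import Iterator


def iter_annotation_pairs(dataset_root: Path) -> Iterator[tuple[Path, Path]]:
    """Yield (data_file, annotation_file) pairs from dataset_root/data, including subfolders."""
    for item in sorted((dataset_root / "data").rglob("*")):
        # Assumption: data files are never .json, so every .json file is an annotation.
        if not item.is_file() or item.suffix == ".json":
            continue
        annotation = item.with_name(item.name + ".json")  # image1.png -> image1.png.json
        if not annotation.is_file():
            raise FileNotFoundError(f"no annotation next to {item}")
        yield item, annotation
```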
 
  #### 03-tabular-datasets-must-use-csv-files-at-root
 
- Datasets composed of column-oriented tabular data must place CSV files at the root of the dataset folder. All tabular files must conform to the schema defined in `dataset.schema.json`, which must describe columns as named attributes with their types.
+ Datasets composed of column-oriented tabular data MUST place CSV files at the root of the dataset folder. All tabular files MUST conform to the schema defined in `dataset.schema.json`, which MUST describe columns as named attributes with their types.
 
  ```
  /[name-of-dataset]/
@@ -70,11 +70,11 @@ Datasets composed of column-oriented tabular data must place CSV files at the ro
  README.md
  ```
 
- Multiple CSV files are allowed when they represent different slices or splits of the same schema (e.g. train/test splits, subsets by source). All files in the same dataset must share the same column schema.
+ Multiple CSV files are allowed when they represent different slices or splits of the same schema (e.g. train/test splits, subsets by source). All files in the same dataset MUST share the same column schema.
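A sketch of a header check for tabular datasets. It assumes `dataset.schema.json` describes one row as a JSON object whose `properties` are the column names, and it compares column sets, so column order is not enforced.

```python
import csv
import json
from pathlib import Path


def check_tabular_columns(dataset_root: Path) -> list[str]:
    """Compare each root-level CSV header with the columns declared in dataset.schema.json."""
    schema = json.loads((dataset_root / "dataset.schema.json").read_text(encoding="utf-8"))
    # Assumption: the schema describes one row as an object whose properties are the columns.
    expected = set(schema.get("properties", {}))
    problems = []
    for path in sorted(dataset_root.glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as fh:
            header = next(csv.reader(fh), [])
        if set(header) != expected:
            problems.append(f"{path.name}: columns {sorted(header)} do not match schema {sorted(expected)}")
    return problems
```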
 
  #### 04-complex-structured-datasets-must-use-jsonl
 
- Datasets with complex or heterogeneous per-record structures (e.g. LLM workflow evaluation sets, Q&A pairs, input → expected_output pairs) must use JSONL files (one JSON object per line) placed at the root of the dataset folder. Each line must conform to the schema defined in `dataset.schema.json`.
+ Datasets with complex or heterogeneous per-record structures (e.g. LLM workflow evaluation sets, Q&A pairs, input → expected_output pairs) MUST use JSONL files (one JSON object per line) placed at the root of the dataset folder. Each line MUST conform to the schema defined in `dataset.schema.json`.
 
  ```
  /[name-of-dataset]/
@@ -84,11 +84,11 @@ Datasets with complex or heterogeneous per-record structures (e.g. LLM workflow
  README.md
  ```
 
- Multiple JSONL files are allowed when they represent different splits or categories (e.g. easy vs. edge cases). All files in the same dataset must conform to the same line schema.
+ Multiple JSONL files are allowed when they represent different splits or categories (e.g. easy vs. edge cases). All files in the same dataset MUST conform to the same line schema.
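A sketch of a JSONL reader that walks every root-level `.jsonl` file and fails fast on malformed lines. Checking only the schema's `required` keys is a deliberate simplification; full per-line validation could call `jsonschema.validate(record, schema)` from the third-party `jsonschema` package.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def iter_jsonl_records(dataset_root: Path) -> Iterator[tuple[str, int, dict[str, Any]]]:
    """Yield (file name, line number, record) for every line of every root-level .jsonl file."""
    schema = json.loads((dataset_root / "dataset.schema.json").read_text(encoding="utf-8"))
    required = set(schema.get("required", []))  # lightweight stand-in for full schema validation
    for path in sorted(dataset_root.glob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            for lineno, line in enumerate(fh, start=1):
                if not line.strip():
                    continue  # tolerate blank lines such as a trailing newline
                record = json.loads(line)
                if not isinstance(record, dict):
                    raise ValueError(f"{path.name}:{lineno}: expected a JSON object")
                missing = required - record.keys()
                if missing:
                    raise ValueError(f"{path.name}:{lineno}: missing {sorted(missing)}")
                yield path.name, lineno, record
```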
 
  #### 05-referenced-files-must-live-in-data-folder
 
- When any dataset type (tabular, JSONL, or annotation-pair) contains references to external files as part of the data (e.g. a JSONL record that includes a file path), those referenced files must be stored inside the `data/` subfolder of the dataset. Paths inside data records must be relative to the dataset root.
+ When any dataset type (tabular, JSONL, or annotation-pair) contains references to external files as part of the data (e.g. a JSONL record that includes a file path), those referenced files MUST be stored inside the `data/` subfolder of the dataset. Paths inside data records MUST be relative to the dataset root.
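A sketch of how a consumer might resolve a file path stored in a record while enforcing this rule. It assumes paths are written with forward slashes; resolving before the containment check also rejects `../` escapes.

```python
from pathlib import Path, PurePosixPath


def resolve_referenced_file(dataset_root: Path, ref: str) -> Path:
    """Resolve a file path stored in a data record, requiring it to live under data/."""
    # Assumption: records store forward-slash paths such as "data/images/001.png".
    if PurePosixPath(ref).is_absolute():
        raise ValueError(f"{ref!r} must be relative to the dataset root")
    data_dir = (dataset_root / "data").resolve()
    target = (dataset_root / ref).resolve()
    if not target.is_relative_to(data_dir):  # also catches ../ traversal
        raise ValueError(f"{ref!r} must point inside data/")
    if not target.is_file():
        raise FileNotFoundError(f"{ref!r} does not exist")
    return target
```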
 
  ## References
 
@@ -0,0 +1,99 @@
+ ---
+ name: agentme-edr-policy-020-ai-agent-xdrs-knowledge-layer
+ description: Defines how to integrate XDRS as the runtime knowledge source of truth for AI agents — covering document placement, AGENTS.md setup, file tools, and local sandbox configuration. Apply only when the project explicitly uses XDRS to govern agent behavior.
+ apply-to: AI agent projects that use XDRS as the source of truth for policies and skills
+ valid-from: 2026-05-27
+ ---
+
+ # agentme-edr-policy-020: AI agent XDRS knowledge layer
+
+ ## Context and Problem Statement
+
+ AI agents need access to project-specific policies and skills at runtime to produce consistent, governed outputs. XDRS provides a file-system-based structure for capturing these decisions, but there is no standard pattern for embedding XDRS documents in agent libraries, wiring the agent to consult them, or sandboxing file access securely.
+
+ How should an AI agent project integrate XDRS as its runtime source of truth for policies and skills?
+
+ ## Decision Outcome
+
+ **Embed XDRS documents in `lib/data/.xdrs/`, instruct the agent to consult them via `AGENTS.md`, equip the agent with sandboxed file tools, and use the deepagents framework when a local sandbox is required.**
+
+ This policy MUST only be applied when the project explicitly chooses XDRS as its knowledge governance layer. It is not required by [agentme-edr-018](018-ai-agent-development-standards.md) in general.
+
+ ### Details
+
+ #### 01-xdrs-knowledge-layer
+
+ XDRS documents are the source of truth for all policies and skills that the agent must follow during its tasks. The agent MUST consult XDRS before acting, not rely on general knowledge alone.
+
+ **Placing XDRS documents in the library**
+
+ - XDRS Policy and Skill documents MUST be placed at `lib/data/.xdrs/`, using the standard XDRS scope/type/subject folder structure (following `_core-adr-policy-001`).
+ - They MUST be embedded in the package data manifest (e.g. `pyproject.toml` `[tool.hatch.build] include` or equivalent) so they are available at runtime.
+ - When exposed through a deepagents sandbox, they MUST be mounted at `/.xdrs/` inside the sandbox (see rule `03-local-sandbox`).
+
+ **AGENTS.md — mandatory XDRS consultation**
+
+ Place an `AGENTS.md` file at the root of the deepagents sandbox (i.e. alongside `/.xdrs/`). This file instructs the agent to always consult XDRS before acting. Its content MUST follow the xdrs-core AGENTS.md template:
+
+ ```markdown
+ # AGENTS.md
+
+ **Purpose:** This file is intentionally brief. All decisions and working instructions are captured as Policies or Skills in the XDRS structure.
+
+ ## Policy Consultation in XDRS Is Mandatory For Every Request
+
+ Before answering **any** request you MUST:
+
+ 1. Read the XDRS root index at `/.xdrs/index.md` to identify relevant Policies and Skills.
+ 2. Read the relevant Policy and Skill files.
+ 3. Base your actions on those Policies and Skills.
+
+ This rule has NO exceptions. Do not answer from general knowledge alone when a Policy may exist on the topic.
+ ```
+
+ The agent system prompt MUST reference `AGENTS.md` so the agent loads it at startup. Example:
+
+ ```
+ Read /AGENTS.md and follow all instructions in it before proceeding.
+ ```
+
+ #### 02-agent-file-tools
+
+ Every agent that uses the XDRS knowledge layer MUST use the file tools provided by the deepagents framework. Do not implement hand-rolled alternatives — see [agentme-edr-policy-018-ai-agent-development-standards.[09-local-sandbox]](018-ai-agent-development-standards.md) for the full sandbox and tool requirements.
+
+ These tools operate over two sandboxed roots (configured in rule `03-local-sandbox`):
+
+ | Root | Content | Source |
+ |---|---|---|
+ | `data_root` | Static files shipped with the library (`lib/data/`) | Resolved via `importlib.resources` at workflow startup |
+ | `temp_root` | Dynamic files generated for the current workflow run | Temporary directory created by `tempfile.mkdtemp()` at workflow startup |
+
+ `temp_root` MUST be created at workflow startup and cleaned up in the same `try/finally` block. Pass it explicitly into the workflow; do not read it from a global variable.
+
+ #### 03-local-sandbox
+
+ Follow [agentme-edr-policy-018-ai-agent-development-standards.[09-local-sandbox]](018-ai-agent-development-standards.md) for the general deepagents sandbox setup. When XDRS is in use, add the following mounts to the sandbox configuration:
+
+ | Source | Content | Deepagents sandbox path |
+ |---|---|---|
+ | `lib/data/.xdrs/` | XDRS Policy and Skill documents | `/.xdrs/` (read-only) |
+ | Generated at startup | `AGENTS.md` instructing the agent to consult XDRS | `/AGENTS.md` (read-only) |
+
+ XDRS documents MUST always be mounted at `/.xdrs/`. `AGENTS.md` MUST always be placed at the sandbox root (`/AGENTS.md`).
+
+ Example XDRS mount additions:
+
+ ```python
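+ from importlib.resources import files
+ from pathlib import Path
+
+ # temp_root: the per-run directory created with tempfile.mkdtemp() at workflow startup (rule 02-agent-file-tools).
+ # _AGENTS_MD: a module-level string holding the xdrs-core AGENTS.md template text (rule 01-xdrs-knowledge-layer).
+ # "myagent" stands for the agent's own package; str() on the resource only yields a usable
+ # filesystem path when the package is installed as regular files (not zipped).
+ data_root = str(files("myagent").joinpath("data"))
+ agents_md = Path(temp_root) / "AGENTS.md"
+ agents_md.write_text(_AGENTS_MD, encoding="utf-8")
+
+ # Add these mounts alongside the base mounts from agentme-edr-018 rule 09-local-sandbox:
+ xdrs_mounts = [
+     {"src": f"{data_root}/.xdrs", "dst": "/.xdrs", "readonly": True},
+     {"src": str(agents_md), "dst": "/AGENTS.md", "readonly": True},
+ ]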
+ ```
@@ -103,6 +103,7 @@ Targets are organized into five lifecycle groups. Projects must use these names
  | `test-unit` | Run unit tests only, including coverage report generation and coverage threshold enforcement. |
  | `test-integration` | *(Optional)* Run integration and end-to-end tests only. Projects without integration tests may omit this target. |
  | `test-smoke` | *(Optional)* Run a fast, minimal subset of tests to verify the software is basically functional. Useful as a post-deploy health check. |
+ | `eval` | *(Optional)* Run **all evaluations** for the module. Used alongside `test` to measure the accuracy and performance of statistical systems such as ML models, AI agents, or noisy systems. Typically runs against a live or near-live system (similar to an integration test) and produces a performance analysis report (e.g., F1 score, Accuracy, Precision, Recall). Must not be included in `test` or `all` — evals are opt-in because they require live dependencies and may be slow or costly to run. Individual evaluations must follow the prefix convention: `eval-<qualifier>` (e.g., `eval-simple`, `eval-complex`). |
 
  ##### Release group
 
@@ -29,8 +29,9 @@ Language and framework-specific tooling and project structure.
  - [agentme-edr-010](application/010-golang-project-tooling.md) - **Go project tooling and structure** - Scaffold Go CLIs and libraries with the standard layout *(includes skill: [003-create-golang-project](application/skills/003-create-golang-project/SKILL.md))*
  - [agentme-edr-014](application/014-python-project-tooling.md) - **Python project tooling and structure** - Scaffold Python packages and CLIs with the standard layout *(includes skill: [005-create-python-project](application/skills/005-create-python-project/SKILL.md))*
  - [agentme-edr-015](application/015-cli-tool-standards.md) - **CLI tool standards** - Define command UX and behavior for CLI tools
- - [agentme-edr-018](application/018-ai-agent-development-standards.md) - **AI agent development standards** - Standard toolchain, framework, evaluation, and context patterns for AI agent projects
+ - [agentme-edr-018](application/018-ai-agent-development-standards.md) - **AI agent development standards** - Standard toolchain, framework, evaluation, and workflow patterns for AI agent projects built with Python and LangGraph
  - [agentme-edr-019](application/019-ml-dataset-structure.md) - **ML dataset structure** - Standard folder layout and file conventions for ML datasets
+ - [agentme-edr-020](application/020-ai-agent-xdrs-knowledge-layer.md) - **AI agent XDRS knowledge layer** - How to integrate XDRS as the runtime source of truth for policies and skills in AI agents (apply only when the project explicitly uses XDRS)
  - [004-select-relevant-xdrs](application/skills/004-select-relevant-xdrs/SKILL.md) - **Select relevant XDRs**
 
  ## Devops
@@ -68,7 +68,27 @@ Builds that miss the threshold must not be merged.
 
  ---
 
- #### 04-should-extract-shared-setup
+ #### 04-must-place-test-files-alongside-source
+
+ Test files must live next to the source file they test, in the same directory, following the convention of the language/framework (e.g. `file.test.ts`, `file_test.go`, `file.spec.js`).
+
+ ```
+ src/mymodule/group1/file1.ts ← source
+ src/mymodule/group1/file1.test.ts ← test (same directory)
+ ```
+
+ **Exception — separate test folder:** When the framework makes co-location impractical (e.g. Python's common `tests/` convention), or when the community strongly favors a separate folder, a dedicated test root (e.g. `tests/`) is allowed. In that case the test folder **must mirror** the source folder structure exactly:
+
+ ```
+ src/mymodule/group1/file1.py ← source
+ tests/mymodule/group1/file1_test.py ← test (mirrored path)
+ ```
+
+ Do not flatten or reorganize paths when using a separate test folder.
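When the separate-folder exception is used, a small check can catch flattened or misplaced test files. This sketch assumes the `src/` and `tests/` roots and the `*_test.py` naming from the example above; projects using `test_*.py` would adapt the pattern.

```python
from pathlib import Path


def find_unmirrored_tests(repo_root: Path, src: str = "src", tests: str = "tests") -> list[Path]:
    """List test files under tests/ whose mirrored source file under src/ is missing."""
    orphans = []
    for test_file in sorted((repo_root / tests).rglob("*_test.py")):
        rel = test_file.relative_to(repo_root / tests)  # e.g. mymodule/group1/file1_test.py
        source = repo_root / src / rel.with_name(rel.name.removesuffix("_test.py") + ".py")
        if not source.is_file():
            orphans.append(rel)
    return orphans
```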
+
+ ---
+
+ #### 05-should-extract-shared-setup
 
  When setup logic is repeated across two or more test files, centralize it (`src/test-utils/`, `internal/testutil/`, `tests/conftest.py`).
 
@@ -81,7 +101,7 @@ export function makeOrder(overrides: Partial<Order> = {}): Order {
 
  ---
 
- #### 05-should-avoid-mocks
+ #### 06-should-avoid-mocks
 
  Use the lowest-cost alternative that exercises real behavior:
 
@@ -161,3 +161,31 @@ all:
  $(MAKE) -C basic-usage run
  $(MAKE) -C advanced-usage run
  ```
+
+ ---
+
+ #### 07-statistical-models-must-have-eval-targets
+
+ Projects that contain statistical models (e.g., ML models, LLM-based evaluators, classifiers, ranking systems, or any component whose output quality is measured probabilistically) must define measurable performance thresholds and verify them automatically.
+
+ **Requirements:**
+ - A `make eval` target must exist and execute all performance evaluations
+ - Each evaluation must have a **documented minimum performance threshold** (e.g., accuracy ≥ 0.85, F1 ≥ 0.80, BLEU ≥ 0.70)
+ - Thresholds must be declared explicitly in the project (e.g., in a config file, `Makefile` variable, or documented in `README.md`)
+ - `make eval` must **exit with a non-zero status** (fail) if:
+ - The evaluation cannot be executed (missing data, environment errors, model load failures)
+ - Any metric falls below its defined minimum threshold
+ - CI/CD must invoke `make eval` before releasing any version that changes model weights, prompts, or evaluation logic
+
+ **Threshold declaration example (Makefile):**
+
+ ```makefile
+ EVAL_MIN_ACCURACY := 0.85
+ EVAL_MIN_F1 := 0.80
+
+ eval:
+ 	python eval.py \
+ 		--min-accuracy $(EVAL_MIN_ACCURACY) \
+ 		--min-f1 $(EVAL_MIN_F1) \
+ 		|| (echo "Evaluation failed: metrics below threshold"; exit 1)
+ ```
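One possible shape for the `eval.py` that the Makefile example invokes: it computes accuracy and binary F1, prints them, and returns a non-zero exit code both when a metric is below its minimum and when the evaluation cannot run at all. `load_labels` is a hypothetical hook and the binary-classification metrics are an assumption; the flag names match the Makefile example.

```python
from __future__ import annotations

import argparse
import sys
from typing import Callable, Sequence

Labels = tuple[list[int], list[int]]  # (expected labels, predicted labels)


def compute_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy and binary F1, treating label 1 as the positive class."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t != 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p != 1 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "f1": f1}


def load_labels() -> Labels:
    # Hypothetical hook: load the eval dataset and run the model to obtain predictions.
    raise NotImplementedError("connect this to the project's dataset and model")


def main(argv: Sequence[str] | None = None, loader: Callable[[], Labels] = load_labels) -> int:
    parser = argparse.ArgumentParser(description="Run evaluation and enforce minimum thresholds")
    parser.add_argument("--min-accuracy", type=float, required=True)
    parser.add_argument("--min-f1", type=float, required=True)
    args = parser.parse_args(argv)
    try:
        metrics = compute_metrics(*loader())
    except Exception as exc:  # evaluation could not run: fail instead of passing silently
        print(f"evaluation could not be executed: {exc}", file=sys.stderr)
        return 2
    minimums = {"accuracy": args.min_accuracy, "f1": args.min_f1}
    for name, value in metrics.items():
        print(f"{name}={value:.3f} (minimum {minimums[name]})")
    failed = [name for name, floor in minimums.items() if metrics[name] < floor]
    if failed:
        print("below threshold: " + ", ".join(failed), file=sys.stderr)
        return 1
    return 0
```

Saved as `eval.py`, the file would end with `if __name__ == "__main__": sys.exit(main())`, so the `|| (echo ...; exit 1)` branch in the Makefile fires for both failure modes.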
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "agentme",
- "version": "0.9.0",
+ "version": "0.10.0",
  "description": "",
  "dependencies": {
  "filedist": "^0.34.1"
@@ -22,6 +22,6 @@
  "url": "https://github.com/flaviostutz/agentme.git"
  },
  "devDependencies": {
- "xdrs-core": "^0.27.1"
+ "xdrs-core": "^0.28.0"
  }
  }