npm - agentme - Versions diffs - 0.9.0 → 0.10.0 - Mend

agentme 0.9.0 → 0.10.0

Files changed (9)

package/.filedist-package.yml CHANGED

@@ -1,5 +1,6 @@
 sets:
-  - package: xdrs-core@0.27.1
+  - package: xdrs-core@0.28.0
+  # - package: git:https://github.com/flaviostutz/xdrs-core.git@main
     selector:
       files:
         - .xdrs/_core/**

package/.xdrs/agentme/edrs/application/018-ai-agent-development-standards.md CHANGED

@@ -1,6 +1,6 @@
 ---
 name: agentme-edr-policy-018-ai-agent-development-standards
-description: Defines the standard toolchain, framework, evaluation approach, and context management patterns for building AI agents. Use when scaffolding, reviewing, or extending AI agent projects.
+description: Defines the standard toolchain, framework, evaluation approach, and workflow patterns for building AI agents with Python and LangGraph. Use when scaffolding, reviewing, or extending AI agent projects.
 apply-to: AI agent projects built with Python
 valid-from: 2026-05-26
 ---
@@ -9,13 +9,13 @@ valid-from: 2026-05-26
 ## Context and Problem Statement
-AI agent projects vary widely in how they choose frameworks, manage context, evaluate outputs, and expose policies to the agent at runtime. Without a shared baseline, projects accumulate incompatible patterns for LLM provider abstraction, flow design, dataset-driven testing, and knowledge delivery.
+AI agent projects vary widely in how they choose frameworks, manage context, evaluate outputs, and structure workflows. Without a shared baseline, projects accumulate incompatible patterns for LLM provider abstraction, flow design, and dataset-driven testing.
 Which tools, frameworks, and design patterns should AI agent projects follow to ensure reproducibility, testability, and maintainability?
 ## Decision Outcome
-**Use Python with LangGraph for flow orchestration, MLflow for experiment tracking and local evaluation, and a file-system-based XDRS knowledge layer that the agent queries at runtime via explicit file tools.**
+**Use Python with LangGraph for flow orchestration and MLflow for experiment tracking and local evaluation.**
 ### Details
@@ -54,7 +54,7 @@ Use **MLflow** for all agent observability and evaluation:
 #### 04-dataset-driven-accuracy-measurement
-Every agent pipeline MUST have a companion evaluation dataset and an MLflow experiment that measures accuracy against it. Datasets and evals are organized per-workflow following rule `09-workflow-structure` and rule `10-workflow-evals`.
+Every agent pipeline MUST have a companion evaluation dataset and an MLflow experiment that measures accuracy against it. Datasets and evals are organized per-workflow following rule `07-workflow-structure` and rule `08-workflow-evals`.
 - Store evaluation datasets under `evals/<workflow>/` (sibling of `lib/` and `examples/`), following [agentme-edr-019](019-ml-dataset-structure.md) for structure and format. For MLflow input/output pairs, use the JSONL format described in `agentme-edr-019.04-complex-structured-datasets-must-use-jsonl`.
 - Write evaluation scripts under `evals/<workflow>/` that load the dataset, run each input through the live agent (against real LLMs, not mocks), compare outputs to expected values, and log per-sample and aggregate metrics to an MLflow experiment.
@@ -80,50 +80,7 @@ graph TD
     C -->|fail| B
 ```
-#### 06-xdrs-knowledge-layer
-When an agent must follow elaborate procedures, decision frameworks, or domain rules:
-**Static files distributed with the library**
-- All static files accessed by agents at runtime (XDRS documents, reference tables, domain dictionaries, lookup files) MUST live under a `data/` folder inside the library source tree (`lib/data/`) and be embedded in the package data manifest (e.g. `pyproject.toml` `[tool.hatch.build] include` or equivalent).
-- XDRS Policy and Skill documents MUST be placed at `lib/data/.xdrs/`, using the standard XDRS scope/type/subject folder structure (following `_core-adr-policy-001`).
-- Other static context data (reference tables, domain dictionaries, structured lookup files) MUST be placed under `lib/data/` in an appropriate sub-folder (e.g. `lib/data/context/`).
-- The agent system prompt MUST NOT inline procedure text. It MUST instruct the agent to read specific paths and follow the instructions found there. Example:
-  ```
-  Before answering, read and follow the instructions in data/.xdrs/_local/edrs/procedures/triage.md.
-  ```
-**Dynamic context generated per workflow instantiation**
-- Context files that are generated at runtime per workflow run (unpacked archives, fetched documents, intermediate outputs) MUST be written to a temporary directory created via the OS temp API (`tempfile.mkdtemp()` in Python).
-- The temporary directory MUST be created at the start of the workflow run and passed into the workflow state so all nodes share the same path.
-- The temporary directory MUST be deleted (including all contents) when the workflow run finishes, whether it succeeds or fails, using a `try/finally` block or a context manager.
-- The agent file tools MUST be configured with the temporary directory path at workflow startup so the agent can read from it during the run.
-- The agent file tools MUST expose `data/` (for static files) and the temporary directory (for dynamic files) as sandboxed readable roots (see rule `07-agent-file-tools`).
-#### 07-agent-file-tools
-Every agent that uses the XDRS knowledge layer or file-based context MUST be equipped with at least the following tools:
-| Tool | Purpose |
-|---|---|
-| `read_file(path)` | Read the full content of a file by path |
-| `search_files(directory, pattern)` | Glob-search for files matching a pattern under a directory |
-| `grep_file(path, query)` | Search for lines matching a string or regex within a file |
-Implement these tools as LangChain `@tool`-decorated functions with explicit path sandboxing. Two sandboxed roots MUST be configured:
-| Root | Content | Source |
-|---|---|---|
-| `DATA_ROOT` | Static files shipped with the library (`lib/data/`) | Package data; resolved via `importlib.resources` or a path relative to the installed package |
-| `TEMP_ROOT` | Dynamic files generated for the current workflow run | Temporary directory created by `tempfile.mkdtemp()` at workflow startup |
-Resolve all paths against the appropriate root. Reject any path that would escape its root (no `../` traversal). `TEMP_ROOT` MUST be passed into the tool factory at workflow startup, not read from a global variable.
-#### 08-verification-steps
+#### 06-verification-steps
 Agent flows MUST include at least one explicit verification node before producing final output:
@@ -132,7 +89,7 @@ Agent flows MUST include at least one explicit verification node before producin
 - On failure, the verification node MUST route back to the relevant generation node, not silently pass through.
 - Log verification results (pass/fail, score, reason) as MLflow metrics on the current run.
-#### 09-workflow-structure
+#### 07-workflow-structure
 Agent logic MUST be organized as named workflows. Each workflow is an independent LangGraph `StateGraph` with a defined start node and end node, connecting agents, states, routes, and decision nodes.
@@ -152,7 +109,7 @@ lib/
 - Additional modules (tools, prompts, schemas) MAY be added inside `lib/workflows/<workflow>/` when they are specific to that workflow. Shared utilities belong in `lib/<module>/`.
 - Each workflow MUST be documented with a Mermaid diagram in the project `README.md` following rule `05-flow-documentation`.
-#### 10-workflow-evals
+#### 08-workflow-evals
 For each workflow `<workflow>` there MUST be a corresponding eval directory:
@@ -168,8 +125,8 @@ The `evals/<workflow>/Makefile` MUST define:
 | Target | Behaviour |
 |---|---|
-| `test-eval` | Runs all eval slices for the workflow |
-| `test-eval-<slice>` | Runs one named slice (e.g. `test-eval-simple`, `test-eval-complex`) |
+| `eval` | Runs all eval slices for the workflow |
+| `eval-<slice>` | Runs one named slice (e.g. `eval-simple`, `eval-complex`) |
 Each `eval_<slice>.py` script MUST:
@@ -177,5 +134,23 @@ Each `eval_<slice>.py` script MUST:
 - Run every input through the live workflow against real LLMs.
 - Log per-sample and aggregate metrics to an MLflow experiment that runs locally.
-The module root Makefile `make eval` target MUST delegate to `test-eval` in every `evals/<workflow>/Makefile`.
+The module root Makefile `make eval` target MUST delegate to `eval` in every `evals/<workflow>/Makefile`.
+#### 09-local-sandbox
+When a workflow node or tool requires a **local sandbox** — an isolated environment where the agent can read files, glob-search directories, and execute shell commands — use the **[deepagents](https://github.com/deepagents/deepagents) framework** to provide that sandbox.
+**When to apply this rule**
+Use deepagents whenever ANY of the following is true for a workflow or tool:
+- The agent needs to execute shell commands or scripts in a controlled environment.
+- The agent needs to list, read, or search files across multiple directories at runtime.
+- The agent operates on user-supplied or generated file trees that must not escape a sandboxed boundary.
+**Integration requirements**
+- Initialize the sandbox at the start of the workflow run and shut it down in the same `try/finally` block.
+- Pass the sandbox handle into the LangGraph workflow state so all nodes share the same sandbox instance.
+- If the host-side code needs to pass files into the sandbox (e.g. generated config or input data), create a temporary directory with `tempfile.mkdtemp()`, write the files there, and mount it into the sandbox. Clean it up in the `finally` block.
+- Replace hand-rolled `read_file`, `search_files`, and `grep_file` tool implementations with the equivalent tools provided by deepagents.
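
Note on rule `09-local-sandbox` above: the integration requirements describe a lifecycle (create the temporary directory, start the sandbox, share the handle through the workflow state, and tear everything down in `finally`). The sketch below illustrates only that lifecycle. `FakeSandbox` and `run_workflow` are hypothetical names invented for this note; they are not part of deepagents or LangGraph, and the real sandbox start-up and mount calls would replace the stand-in.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict


class FakeSandbox:
    """Hypothetical stand-in for a sandbox handle; NOT the deepagents API."""

    def __init__(self, mount_dir: str) -> None:
        self.mount_dir = mount_dir
        self.closed = False

    def close(self) -> None:
        self.closed = True


def run_workflow(inputs: Dict[str, str], workflow: Callable[[Dict[str, Any]], Any]) -> Any:
    """Create the temp dir and sandbox, run the workflow, and always clean up."""
    temp_root = tempfile.mkdtemp(prefix="agent-run-")
    sandbox = None
    try:
        # Host-side files that should be visible inside the sandbox.
        for name, content in inputs.items():
            (Path(temp_root) / name).write_text(content, encoding="utf-8")
        sandbox = FakeSandbox(mount_dir=temp_root)
        # The handle travels in the workflow state so every node shares it.
        state = {"sandbox": sandbox, "temp_root": temp_root}
        return workflow(state)
    finally:
        # Runs on success and on failure alike.
        if sandbox is not None:
            sandbox.close()
        shutil.rmtree(temp_root, ignore_errors=True)
```

Keeping creation and cleanup in one function means a failing node cannot leave the temporary directory behind, which is the property the rule asks for.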

package/.xdrs/agentme/edrs/application/019-ml-dataset-structure.md CHANGED

@@ -17,13 +17,13 @@ How should ML datasets be organized on disk so they are self-describing, easy to
 **A standard root layout with mandatory README.md and dataset.schema.json, plus type-specific conventions for data files**
-Every dataset must live in its own named folder and include a README and a JSON Schema file. Data files are organized according to three dataset types, each with its own placement rule.
+Every dataset MUST live in its own named folder and include a README and a JSON Schema file. Data files are organized according to three dataset types, each with its own placement rule.
 ### Details
 #### 01-root-structure-is-mandatory
-Every dataset must follow this root layout:
+Every dataset MUST follow this root layout:
 ```
 /[name-of-dataset]/
@@ -33,13 +33,13 @@ Every dataset must follow this root layout:
     ...                (additional files depending on dataset type)
 ```
-- `README.md` must explain what the dataset is about, the procedures used to create it, remarks on data quality, and instructions on how to consume it with examples.
-- `dataset.schema.json` must be a valid [JSON Schema](https://json-schema.org/) document describing the structure of the dataset's primary data.
-- The dataset folder name must be lowercase, using hyphens as separators.
+- `README.md` MUST explain what the dataset is about, the procedures used to create it, remarks on data quality, and instructions on how to consume it with examples.
+- `dataset.schema.json` MUST be a valid [JSON Schema](https://json-schema.org/) document describing the structure of the dataset's primary data.
+- The dataset folder name MUST be lowercase, using underscores as separators (e.g. `my_dataset`).
 #### 02-file-annotation-pairs-must-use-data-folder
-Datasets where each item is a file paired with structured JSON output (e.g. image labeling, document data extraction, medical records with known features) must store all files inside the `data/` subfolder. Each data file must have a sibling JSON annotation file named with the same filename suffixed with `.json`.
+Datasets where each item is a file paired with structured JSON output (e.g. image labeling, document data extraction, medical records with known features) MUST store all files inside the `data/` subfolder. Each data file MUST have a sibling JSON annotation file named with the same filename suffixed with `.json`.
 ```
 /[name-of-dataset]/
@@ -56,11 +56,11 @@ Datasets where each item is a file paired with structured JSON output (e.g. imag
 Placing the annotation file next to its source file (same name + `.json`) keeps them adjacent even in large directories, making it easy to iterate pairs programmatically.
-Subdirectories inside `data/` are allowed when the number of files warrants grouping, but the `.json` sibling convention must be preserved at each level.
+Subdirectories inside `data/` are allowed when the number of files warrants grouping, but the `.json` sibling convention MUST be preserved at each level.
 #### 03-tabular-datasets-must-use-csv-files-at-root
-Datasets composed of column-oriented tabular data must place CSV files at the root of the dataset folder. All tabular files must conform to the schema defined in `dataset.schema.json`, which must describe columns as named attributes with their types.
+Datasets composed of column-oriented tabular data MUST place CSV files at the root of the dataset folder. All tabular files MUST conform to the schema defined in `dataset.schema.json`, which MUST describe columns as named attributes with their types.
 ```
 /[name-of-dataset]/
@@ -70,11 +70,11 @@ Datasets composed of column-oriented tabular data must place CSV files at the ro
     README.md
 ```
-Multiple CSV files are allowed when they represent different slices or splits of the same schema (e.g. train/test splits, subsets by source). All files in the same dataset must share the same column schema.
+Multiple CSV files are allowed when they represent different slices or splits of the same schema (e.g. train/test splits, subsets by source). All files in the same dataset MUST share the same column schema.
 #### 04-complex-structured-datasets-must-use-jsonl
-Datasets with complex or heterogeneous per-record structures (e.g. LLM workflow evaluation sets, Q&A pairs, input → expected_output pairs) must use JSONL files (one JSON object per line) placed at the root of the dataset folder. Each line must conform to the schema defined in `dataset.schema.json`.
+Datasets with complex or heterogeneous per-record structures (e.g. LLM workflow evaluation sets, Q&A pairs, input → expected_output pairs) MUST use JSONL files (one JSON object per line) placed at the root of the dataset folder. Each line MUST conform to the schema defined in `dataset.schema.json`.
 ```
 /[name-of-dataset]/
@@ -84,11 +84,11 @@ Datasets with complex or heterogeneous per-record structures (e.g. LLM workflow
     README.md
 ```
-Multiple JSONL files are allowed when they represent different splits or categories (e.g. easy vs. edge cases). All files in the same dataset must conform to the same line schema.
+Multiple JSONL files are allowed when they represent different splits or categories (e.g. easy vs. edge cases). All files in the same dataset MUST conform to the same line schema.
 #### 05-referenced-files-must-live-in-data-folder
-When any dataset type (tabular, JSONL, or annotation-pair) contains references to external files as part of the data (e.g. a JSONL record that includes a file path), those referenced files must be stored inside the `data/` subfolder of the dataset. Paths inside data records must be relative to the dataset root.
+When any dataset type (tabular, JSONL, or annotation-pair) contains references to external files as part of the data (e.g. a JSONL record that includes a file path), those referenced files MUST be stored inside the `data/` subfolder of the dataset. Paths inside data records MUST be relative to the dataset root.
 ## References
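
Note on rule `02-file-annotation-pairs-must-use-data-folder` above: the rule says the same-name-plus-`.json` convention makes pairs easy to iterate programmatically. A minimal sketch of such an iteration follows, assuming only the layout described in the rule. The function name `iter_annotation_pairs` is hypothetical and does not come from the package.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_annotation_pairs(dataset_root: Path) -> Iterator[Tuple[Path, Path]]:
    """Yield (data_file, annotation_file) pairs found under <dataset_root>/data/."""
    data_dir = dataset_root / "data"
    for path in sorted(data_dir.rglob("*")):
        if not path.is_file():
            continue
        # The annotation is the sibling named "<filename>.json" in the same folder.
        annotation = path.with_name(path.name + ".json")
        if annotation.is_file():
            yield path, annotation
```

Files without a `.json` sibling are skipped silently here, and annotation files are not yielded as data files because they have no sibling of their own. A stricter checker might instead report unpaired files as violations of the rule.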

package/.xdrs/agentme/edrs/application/020-ai-agent-xdrs-knowledge-layer.md ADDED

@@ -0,0 +1,99 @@
+---
+name: agentme-edr-policy-020-ai-agent-xdrs-knowledge-layer
+description: Defines how to integrate XDRS as the runtime knowledge source of truth for AI agents — covering document placement, AGENTS.md setup, file tools, and local sandbox configuration. Apply only when the project explicitly uses XDRS to govern agent behavior.
+apply-to: AI agent projects that use XDRS as the source of truth for policies and skills
+valid-from: 2026-05-27
+---
+# agentme-edr-policy-020: AI agent XDRS knowledge layer
+## Context and Problem Statement
+AI agents need access to project-specific policies and skills at runtime to produce consistent, governed outputs. XDRS provides a file-system-based structure for capturing these decisions, but there is no standard pattern for embedding XDRS documents in agent libraries, wiring the agent to consult them, or sandboxing file access securely.
+How should an AI agent project integrate XDRS as its runtime source of truth for policies and skills?
+## Decision Outcome
+**Embed XDRS documents in `lib/data/.xdrs/`, instruct the agent to consult them via `AGENTS.md`, equip the agent with sandboxed file tools, and use the deepagents framework when a local sandbox is required.**
+This policy MUST only be applied when the project explicitly chooses XDRS as its knowledge governance layer. It is not required by [agentme-edr-018](018-ai-agent-development-standards.md) in general.
+### Details
+#### 01-xdrs-knowledge-layer
+XDRS documents are the source of truth for all policies and skills that the agent must follow during its tasks. The agent MUST consult XDRS before acting, not rely on general knowledge alone.
+**Placing XDRS documents in the library**
+- XDRS Policy and Skill documents MUST be placed at `lib/data/.xdrs/`, using the standard XDRS scope/type/subject folder structure (following `_core-adr-policy-001`).
+- They MUST be embedded in the package data manifest (e.g. `pyproject.toml` `[tool.hatch.build] include` or equivalent) so they are available at runtime.
+- When exposed through a deepagents sandbox, they MUST be mounted at `/.xdrs/` inside the sandbox (see rule `03-local-sandbox`).
+**AGENTS.md — mandatory XDRS consultation**
+Place an `AGENTS.md` file at the root of the deepagents sandbox (i.e. alongside `/.xdrs/`). This file instructs the agent to always consult XDRS before acting. Its content MUST follow the xdrs-core AGENTS.md template:
+```markdown
+# AGENTS.md
+**Purpose:** This file is intentionally brief. All decisions and working instructions are captured as Policies or Skills in the XDRS structure.
+## Policy Consultation in XDRS Is Mandatory For Every Request
+Before answering **any** request you MUST:
+1. Read the XDRS root index at `/.xdrs/index.md` to identify relevant Policies and Skills.
+2. Read the relevant Policy and Skill files.
+3. Base your actions on those Policies and Skills.
+This rule has NO exceptions. Do not answer from general knowledge alone when a Policy may exist on the topic.
+```
+The agent system prompt MUST reference `AGENTS.md` so the agent loads it at startup. Example:
+```
+Read /AGENTS.md and follow all instructions in it before proceeding.
+```
+#### 02-agent-file-tools
+Every agent that uses the XDRS knowledge layer MUST use the file tools provided by the deepagents framework. Do not implement hand-rolled alternatives — see [agentme-edr-policy-018-ai-agent-development-standards.[09-local-sandbox]](018-ai-agent-development-standards.md) for the full sandbox and tool requirements.
+These tools operate over two sandboxed roots (configured in rule `03-local-sandbox`):
+| Root | Content | Source |
+|---|---|---|
+| `data_root` | Static files shipped with the library (`lib/data/`) | Resolved via `importlib.resources` at workflow startup |
+| `temp_root` | Dynamic files generated for the current workflow run | Temporary directory created by `tempfile.mkdtemp()` at workflow startup |
+`temp_root` MUST be created at workflow startup and cleaned up in the same `try/finally` block. Pass it explicitly into the workflow; do not read it from a global variable.
+#### 03-local-sandbox
+Follow [agentme-edr-policy-018-ai-agent-development-standards.[09-local-sandbox]](018-ai-agent-development-standards.md) for the general deepagents sandbox setup. When XDRS is in use, add the following mounts to the sandbox configuration:
+| Source | Content | Deepagents sandbox path |
+|---|---|---|
+| `lib/data/.xdrs/` | XDRS Policy and Skill documents | `/.xdrs/` (read-only) |
+| Generated at startup | `AGENTS.md` instructing the agent to consult XDRS | `/AGENTS.md` (read-only) |
+XDRS documents MUST always be mounted at `/.xdrs/`. `AGENTS.md` MUST always be placed at the sandbox root (`/AGENTS.md`).
+Example XDRS mount additions:
+```python
+from importlib.resources import files
+from pathlib import Path
+data_root = str(files("myagent").joinpath("data"))
+agents_md = Path(temp_root) / "AGENTS.md"
+agents_md.write_text(_AGENTS_MD)  # content from xdrs-core AGENTS.md template; see rule 01-xdrs-knowledge-layer
+# Add these mounts alongside the base mounts from agentme-edr-018 rule 09-local-sandbox:
+xdrs_mounts = [
+    {"src": f"{data_root}/.xdrs", "dst": "/.xdrs",    "readonly": True},
+    {"src": str(agents_md),       "dst": "/AGENTS.md", "readonly": True},
+]
+```
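
Note on the mount example above: the snippet uses `temp_root` and `_AGENTS_MD` without defining them. A self-contained variant might look like the sketch below. It assumes the mount entries keep the `src`/`dst`/`readonly` shape shown in the policy; whether deepagents accepts that shape is not confirmed by this diff. `build_xdrs_mounts` and `AGENTS_MD_TEXT` are hypothetical names, and the placeholder text stands in for the xdrs-core AGENTS.md template.

```python
from pathlib import Path
from typing import Dict, List, Union

# Placeholder only; the real content comes from the xdrs-core AGENTS.md template.
AGENTS_MD_TEXT = "# AGENTS.md\nRead /.xdrs/index.md before answering any request.\n"


def build_xdrs_mounts(
    data_root: Union[str, Path], temp_root: Union[str, Path]
) -> List[Dict[str, object]]:
    """Write AGENTS.md into temp_root and return the two read-only XDRS mounts."""
    agents_md = Path(temp_root) / "AGENTS.md"
    agents_md.write_text(AGENTS_MD_TEXT, encoding="utf-8")
    return [
        {"src": str(Path(data_root) / ".xdrs"), "dst": "/.xdrs", "readonly": True},
        {"src": str(agents_md), "dst": "/AGENTS.md", "readonly": True},
    ]
```

Taking both roots as parameters keeps the function free of global state, in line with the policy's requirement that `temp_root` be passed in explicitly. The caller still owns creating `temp_root` at workflow startup and removing it in the `finally` block.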

package/.xdrs/agentme/edrs/devops/008-common-targets.md CHANGED

@@ -103,6 +103,7 @@ Targets are organized into five lifecycle groups. Projects must use these names
 | `test-unit` | Run unit tests only, including coverage report generation and coverage threshold enforcement. |
 | `test-integration` | *(Optional)* Run integration and end-to-end tests only. Projects without integration tests may omit this target. |
 | `test-smoke` | *(Optional)* Run a fast, minimal subset of tests to verify the software is basically functional. Useful as a post-deploy health check. |
+| `eval` | *(Optional)* Run **all evaluations** for the module. Used alongside `test` to measure the accuracy and performance of statistical systems such as ML models, AI agents, or noisy systems. Typically runs against a live or near-live system (similar to an integration test) and produces a performance analysis report (e.g., F1 score, Accuracy, Precision, Recall). Must not be included in `test` or `all` — evals are opt-in because they require live dependencies and may be slow or costly to run. Individual evaluations must follow the prefix convention: `eval-<qualifier>` (e.g., `eval-simple`, `eval-complex`). |
 ##### Release group

package/.xdrs/agentme/edrs/index.md CHANGED

@@ -29,8 +29,9 @@ Language and framework-specific tooling and project structure.
 - [agentme-edr-010](application/010-golang-project-tooling.md) - **Go project tooling and structure** - Scaffold Go CLIs and libraries with the standard layout *(includes skill: [003-create-golang-project](application/skills/003-create-golang-project/SKILL.md))*
 - [agentme-edr-014](application/014-python-project-tooling.md) - **Python project tooling and structure** - Scaffold Python packages and CLIs with the standard layout *(includes skill: [005-create-python-project](application/skills/005-create-python-project/SKILL.md))*
 - [agentme-edr-015](application/015-cli-tool-standards.md) - **CLI tool standards** - Define command UX and behavior for CLI tools
-- [agentme-edr-018](application/018-ai-agent-development-standards.md) - **AI agent development standards** - Standard toolchain, framework, evaluation, and context patterns for AI agent projects
+- [agentme-edr-018](application/018-ai-agent-development-standards.md) - **AI agent development standards** - Standard toolchain, framework, evaluation, and workflow patterns for AI agent projects built with Python and LangGraph
 - [agentme-edr-019](application/019-ml-dataset-structure.md) - **ML dataset structure** - Standard folder layout and file conventions for ML datasets
+- [agentme-edr-020](application/020-ai-agent-xdrs-knowledge-layer.md) - **AI agent XDRS knowledge layer** - How to integrate XDRS as the runtime source of truth for policies and skills in AI agents (apply only when the project explicitly uses XDRS)
 - [004-select-relevant-xdrs](application/skills/004-select-relevant-xdrs/SKILL.md) - **Select relevant XDRs**
 ## Devops

package/.xdrs/agentme/edrs/principles/004-unit-test-requirements.md CHANGED

@@ -68,7 +68,27 @@ Builds that miss the threshold must not be merged.
 ---
-#### 04-should-extract-shared-setup
+#### 04-must-place-test-files-alongside-source
+Test files must live next to the source file they test, in the same directory, following the convention of the language/framework (e.g. `file.test.ts`, `file_test.go`, `file.spec.js`).
+```
+src/mymodule/group1/file1.ts        ← source
+src/mymodule/group1/file1.test.ts   ← test (same directory)
+```
+**Exception — separate test folder:** When the framework makes co-location impractical (e.g. Python's common `tests/` convention), or when the community strongly favors a separate folder, a dedicated test root (e.g. `tests/`) is allowed. In that case the test folder **must mirror** the source folder structure exactly:
+```
+src/mymodule/group1/file1.py          ← source
+tests/mymodule/group1/file1_test.py   ← test (mirrored path)
+```
+Do not flatten or reorganize paths when using a separate test folder.
+---
+#### 05-should-extract-shared-setup
 When setup logic is repeated across two or more test files, centralize it (`src/test-utils/`, `internal/testutil/`, `tests/conftest.py`).
@@ -81,7 +101,7 @@ export function makeOrder(overrides: Partial<Order> = {}): Order {
 ---
-#### 05-should-avoid-mocks
+#### 06-should-avoid-mocks
 Use the lowest-cost alternative that exercises real behavior:

package/.xdrs/agentme/edrs/principles/007-project-quality-standards.md CHANGED

@@ -161,3 +161,31 @@ all:
 	$(MAKE) -C basic-usage run
 	$(MAKE) -C advanced-usage run
 ```
+---
+#### 07-statistical-models-must-have-eval-targets
+Projects that contain statistical models (e.g., ML models, LLM-based evaluators, classifiers, ranking systems, or any component whose output quality is measured probabilistically) must define measurable performance thresholds and verify them automatically.
+**Requirements:**
+- A `make eval` target must exist and execute all performance evaluations
+- Each evaluation must have a **documented minimum performance threshold** (e.g., accuracy ≥ 0.85, F1 ≥ 0.80, BLEU ≥ 0.70)
+- Thresholds must be declared explicitly in the project (e.g., in a config file, `Makefile` variable, or documented in `README.md`)
+- `make eval` must **exit with a non-zero status** (fail) if:
+  - The evaluation cannot be executed (missing data, environment errors, model load failures)
+  - Any metric falls below its defined minimum threshold
+- CI/CD must invoke `make eval` before releasing any version that changes model weights, prompts, or evaluation logic
+**Threshold declaration example (Makefile):**
+```makefile
+EVAL_MIN_ACCURACY := 0.85
+EVAL_MIN_F1       := 0.80
+eval:
+	python eval.py \
+	  --min-accuracy $(EVAL_MIN_ACCURACY) \
+	  --min-f1 $(EVAL_MIN_F1) \
+	  || (echo "Evaluation failed: metrics below threshold"; exit 1)
+```
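
Note on the Makefile example above: it calls an `eval.py` that the diff does not show. A minimal sketch of what such a script could look like is given below, assuming binary labels and the two flags used in the Makefile. The sample data and all function names are hypothetical placeholders; a real script would load the evaluation dataset and run the model or agent instead.

```python
import argparse
from typing import Dict, List, Optional, Sequence

# Placeholder results; a real eval would produce these from a dataset run.
SAMPLE_EXPECTED = [True, True, False, False]
SAMPLE_PREDICTED = [True, False, False, True]


def accuracy(expected: Sequence[bool], predicted: Sequence[bool]) -> float:
    if not expected:
        raise ValueError("empty evaluation set")
    return sum(e == p for e, p in zip(expected, predicted)) / len(expected)


def f1(expected: Sequence[bool], predicted: Sequence[bool]) -> float:
    tp = sum(e and p for e, p in zip(expected, predicted))
    fp = sum((not e) and p for e, p in zip(expected, predicted))
    fn = sum(e and (not p) for e, p in zip(expected, predicted))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def below_threshold(metrics: Dict[str, float], minimums: Dict[str, float]) -> List[str]:
    """Return the names of metrics that fall below their minimum."""
    return [name for name, floor in minimums.items() if metrics[name] < floor]


def main(argv: Optional[List[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="Threshold-checked evaluation")
    parser.add_argument("--min-accuracy", type=float, required=True)
    parser.add_argument("--min-f1", type=float, required=True)
    args = parser.parse_args(argv)

    metrics = {
        "accuracy": accuracy(SAMPLE_EXPECTED, SAMPLE_PREDICTED),
        "f1": f1(SAMPLE_EXPECTED, SAMPLE_PREDICTED),
    }
    minimums = {"accuracy": args.min_accuracy, "f1": args.min_f1}
    failed = below_threshold(metrics, minimums)
    for name in failed:
        print(f"{name}={metrics[name]:.3f} is below minimum {minimums[name]:.3f}")
    # Non-zero return signals failure to make when used as the exit status.
    return 1 if failed else 0
```

To use this from the Makefile, the script would end with `raise SystemExit(main())` under an `if __name__ == "__main__":` guard. An uncaught exception (missing data, model load failure) already makes Python exit with a non-zero status, which covers the "evaluation cannot be executed" case in the rule.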

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "agentme",
-  "version": "0.9.0",
+  "version": "0.10.0",
   "description": "",
   "dependencies": {
     "filedist": "^0.34.1"
@@ -22,6 +22,6 @@
     "url": "https://github.com/flaviostutz/agentme.git"
   },
   "devDependencies": {
-    "xdrs-core": "^0.27.1"
+    "xdrs-core": "^0.28.0"
   }
 }