agentme 0.8.2 → 0.9.0

@@ -1,5 +1,5 @@
1
1
  sets:
2
- - package: xdrs-core@0.27.0
2
+ - package: xdrs-core@0.27.1
3
3
  selector:
4
4
  files:
5
5
  - .xdrs/_core/**
@@ -0,0 +1,181 @@
1
+ ---
2
+ name: agentme-edr-policy-018-ai-agent-development-standards
3
+ description: Defines the standard toolchain, framework, evaluation approach, and context management patterns for building AI agents. Use when scaffolding, reviewing, or extending AI agent projects.
4
+ apply-to: AI agent projects built with Python
5
+ valid-from: 2026-05-26
6
+ ---
7
+
8
+ # agentme-edr-policy-018: AI agent development standards
9
+
10
+ ## Context and Problem Statement
11
+
12
+ AI agent projects vary widely in how they choose frameworks, manage context, evaluate outputs, and expose policies to the agent at runtime. Without a shared baseline, projects accumulate incompatible patterns for LLM provider abstraction, flow design, dataset-driven testing, and knowledge delivery.
13
+
14
+ Which tools, frameworks, and design patterns should AI agent projects follow to ensure reproducibility, testability, and maintainability?
15
+
16
+ ## Decision Outcome
17
+
18
+ **Use Python with LangGraph for flow orchestration, MLflow for experiment tracking and local evaluation, and a file-system-based XDRS knowledge layer that the agent queries at runtime via explicit file tools.**
19
+
20
+ ### Details
21
+
22
+ #### 01-language-and-framework
23
+
24
+ All agent projects MUST be implemented in Python, following [agentme-edr-014](014-python-project-tooling.md) for project structure, tooling, and Makefile conventions.
25
+
26
+ Agent flows MUST be built with **LangGraph**. Use LangGraph `StateGraph` to model each distinct workflow as an explicit directed graph with typed state.
27
+
28
+ #### 02-llm-provider-compatibility
29
+
30
+ Agent code MUST be compatible with both **OpenAI** and **Azure OpenAI** providers without code changes. Achieve this by:
31
+
32
+ - Using the `langchain-openai` package, which supports both providers through environment variables.
33
+ - Selecting the provider by setting `OPENAI_API_TYPE=azure` (Azure OpenAI) or omitting it (OpenAI).
34
+ - Never hardcoding provider-specific URLs, deployment names, or API versions in code; inject them through environment variables or a configuration object.
35
+
36
+ Minimum required environment variable surface:
37
+
38
+ | Variable | Purpose |
39
+ |---|---|
40
+ | `OPENAI_API_KEY` | API key (both providers) |
41
+ | `OPENAI_API_BASE` / `AZURE_OPENAI_ENDPOINT` | Endpoint (Azure only) |
42
+ | `OPENAI_API_VERSION` | API version (Azure only) |
43
+ | `AZURE_OPENAI_DEPLOYMENT` | Deployment/model name (Azure only) |
44
+ | `OPENAI_MODEL` | Model name (OpenAI only) |
45
+
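As an illustration of rule 02, the following minimal sketch reads the variables from the table into one configuration object, so no provider-specific value is hardcoded. `LLMSettings` and `load_llm_settings` are hypothetical names, not part of `langchain-openai`. Preferring `AZURE_OPENAI_ENDPOINT` over `OPENAI_API_BASE` is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LLMSettings:  # hypothetical configuration object, not a library type
    provider: str                      # "openai" or "azure"
    api_key: str
    model: str                         # model name (OpenAI) or deployment name (Azure)
    endpoint: Optional[str] = None     # Azure only
    api_version: Optional[str] = None  # Azure only


def load_llm_settings(env: Mapping[str, str] = os.environ) -> LLMSettings:
    """Build provider settings from the environment variable surface above."""
    api_key = env["OPENAI_API_KEY"]
    if env.get("OPENAI_API_TYPE", "").lower() == "azure":
        # Assumption: AZURE_OPENAI_ENDPOINT wins when both endpoint variables are set.
        endpoint = env.get("AZURE_OPENAI_ENDPOINT") or env.get("OPENAI_API_BASE")
        if not endpoint:
            raise KeyError("AZURE_OPENAI_ENDPOINT or OPENAI_API_BASE is required for Azure")
        return LLMSettings(
            provider="azure",
            api_key=api_key,
            model=env["AZURE_OPENAI_DEPLOYMENT"],
            endpoint=endpoint,
            api_version=env["OPENAI_API_VERSION"],
        )
    # OPENAI_API_TYPE omitted (or not "azure"): plain OpenAI.
    return LLMSettings(provider="openai", api_key=api_key, model=env["OPENAI_MODEL"])
```

Passing a plain dict instead of `os.environ` keeps the function easy to exercise in unit tests without touching the real environment.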
46
+ #### 03-observability-and-experiment-tracking
47
+
48
+ Use **MLflow** for all agent observability and evaluation:
49
+
50
+ - Wrap each agent run with `mlflow.start_run()` to capture traces, parameters, and metrics locally.
51
+ - Enable LangChain auto-tracing via `mlflow.langchain.autolog()` at entry point startup.
52
+ - Log run parameters (model name, temperature, prompt version) and output metrics (accuracy, latency, token counts) using `mlflow.log_param` / `mlflow.log_metric`.
53
+ - Run a local MLflow tracking server with `mlflow ui` to inspect runs during development. Do not require a remote MLflow server for local development.
54
+
55
+ #### 04-dataset-driven-accuracy-measurement
56
+
57
+ Every agent pipeline MUST have a companion evaluation dataset and an MLflow experiment that measures accuracy against it. Datasets and evals are organized per workflow, following rule `09-workflow-structure` and rule `10-workflow-evals`.
58
+
59
+ - Store evaluation datasets under `evals/<workflow>/` (sibling of `lib/` and `examples/`), following [agentme-edr-019](019-ml-dataset-structure.md) for structure and format. For MLflow input/output pairs, use the JSONL format described in `agentme-edr-019.04-complex-structured-datasets-must-use-jsonl`.
60
+ - Write evaluation scripts under `evals/<workflow>/` that load the dataset, run each input through the live agent (against real LLMs, not mocks), compare outputs to expected values, and log per-sample and aggregate metrics to an MLflow experiment.
61
+ - Add a `make eval` Makefile target in the module root Makefile (the same Makefile that sits alongside `lib/` and `examples/`) that delegates to all per-workflow eval targets.
62
+ - Evaluation MUST run against real LLM providers, not recorded responses, to capture model drift. MLflow tracking MUST work locally without a remote server.
63
+
64
+ #### 05-flow-documentation
65
+
66
+ Each agent flow MUST be documented as a **Mermaid graph** in the project `README.md`. The diagram must match the LangGraph `StateGraph` definition:
67
+
68
+ - Use `graph TD` or `graph LR` direction.
69
+ - Label each node with its Python function name.
70
+ - Label conditional edges with the condition expression.
71
+ - Update the diagram whenever the graph topology changes.
72
+
73
+ Example minimal diagram block:
74
+
75
+ ```mermaid
76
+ graph TD
77
+ A[fetch_context] --> B[draft_response]
78
+ B --> C{verify}
79
+ C -->|pass| D[output]
80
+ C -->|fail| B
81
+ ```
82
+
83
+ #### 06-xdrs-knowledge-layer
84
+
85
+ When an agent must follow elaborate procedures, decision frameworks, or domain rules:
86
+
87
+ **Static files distributed with the library**
88
+
89
+ - All static files accessed by agents at runtime (XDRS documents, reference tables, domain dictionaries, lookup files) MUST live under a `data/` folder inside the library source tree (`lib/data/`) and be embedded in the package data manifest (e.g. `pyproject.toml` `[tool.hatch.build] include` or equivalent).
90
+ - XDRS Policy and Skill documents MUST be placed at `lib/data/.xdrs/`, using the standard XDRS scope/type/subject folder structure (following `_core-adr-policy-001`).
91
+ - Other static context data (reference tables, domain dictionaries, structured lookup files) MUST be placed under `lib/data/` in an appropriate sub-folder (e.g. `lib/data/context/`).
92
+ - The agent system prompt MUST NOT inline procedure text. It MUST instruct the agent to read specific paths and follow the instructions found there. Example:
93
+
94
+ ```
95
+ Before answering, read and follow the instructions in data/.xdrs/_local/edrs/procedures/triage.md.
96
+ ```
97
+
98
+ **Dynamic context generated per workflow instantiation**
99
+
100
+ - Context files that are generated at runtime per workflow run (unpacked archives, fetched documents, intermediate outputs) MUST be written to a temporary directory created via the OS temp API (`tempfile.mkdtemp()` in Python).
101
+ - The temporary directory MUST be created at the start of the workflow run and passed into the workflow state so all nodes share the same path.
102
+ - The temporary directory MUST be deleted (including all contents) when the workflow run finishes, whether it succeeds or fails, using a `try/finally` block or a context manager.
103
+ - The agent file tools MUST be configured with the temporary directory path at workflow startup so the agent can read from it during the run.
104
+
105
+ - The agent file tools MUST expose `data/` (for static files) and the temporary directory (for dynamic files) as sandboxed readable roots (see rule `07-agent-file-tools`).
106
+
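The temporary-directory lifecycle described in rule 06 could be sketched as follows. `run_with_temp_root` and `example_workflow` are hypothetical names. In a real project the callable would presumably be the compiled graph's invoke method, and the `temp_root` state key is an assumption.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict


def run_with_temp_root(
    workflow: Callable[[Dict[str, Any]], Any], inputs: Dict[str, Any]
) -> Any:
    """Create a per-run temp dir, pass it through state, and always delete it."""
    temp_root = tempfile.mkdtemp(prefix="agent-run-")
    try:
        # Every node reads the same path from the shared state.
        state = {**inputs, "temp_root": temp_root}
        return workflow(state)
    finally:
        # Runs on success and on failure; removes the directory and its contents.
        shutil.rmtree(temp_root, ignore_errors=True)


def example_workflow(state: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for a workflow that writes dynamic context."""
    scratch = Path(state["temp_root"]) / "notes.txt"
    scratch.write_text(state["question"], encoding="utf-8")
    return {"answer": scratch.read_text(encoding="utf-8"), "temp_root": state["temp_root"]}
```

Calling `run_with_temp_root(example_workflow, {"question": "hi"})` returns the workflow result, and the directory no longer exists once the call returns.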
107
+ #### 07-agent-file-tools
108
+
109
+ Every agent that uses the XDRS knowledge layer or file-based context MUST be equipped with at least the following tools:
110
+
111
+ | Tool | Purpose |
112
+ |---|---|
113
+ | `read_file(path)` | Read the full content of a file by path |
114
+ | `search_files(directory, pattern)` | Glob-search for files matching a pattern under a directory |
115
+ | `grep_file(path, query)` | Search for lines matching a string or regex within a file |
116
+
117
+ Implement these tools as LangChain `@tool`-decorated functions with explicit path sandboxing. Two sandboxed roots MUST be configured:
118
+
119
+ | Root | Content | Source |
120
+ |---|---|---|
121
+ | `DATA_ROOT` | Static files shipped with the library (`lib/data/`) | Package data; resolved via `importlib.resources` or a path relative to the installed package |
122
+ | `TEMP_ROOT` | Dynamic files generated for the current workflow run | Temporary directory created by `tempfile.mkdtemp()` at workflow startup |
123
+
124
+ Resolve all paths against the appropriate root. Reject any path that would escape its root (no `../` traversal). `TEMP_ROOT` MUST be passed into the tool factory at workflow startup, not read from a global variable.
125
+
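A minimal, standard-library-only sketch of the sandboxing in rule 07 follows. It returns plain functions. A real project would wrap them with the LangChain `@tool` decorator, which is left out here so the sketch has no third-party dependency. `make_file_tools` and the `data/` and `tmp/` path prefixes are assumptions, not names defined by the policy.

```python
import re
from pathlib import Path
from typing import Callable, Dict, List


def make_file_tools(data_root: str, temp_root: str) -> Dict[str, Callable]:
    """Tool factory: both roots are passed in explicitly, never read from globals."""
    # Assumption: tool paths start with "data/" (DATA_ROOT) or "tmp/" (TEMP_ROOT).
    roots = {"data": Path(data_root).resolve(), "tmp": Path(temp_root).resolve()}

    def _resolve(path: str) -> Path:
        prefix, _, rest = path.partition("/")
        if prefix not in roots:
            raise ValueError(f"path must start with data/ or tmp/: {path!r}")
        root = roots[prefix]
        candidate = (root / rest).resolve()  # collapses ../ and follows symlinks
        if candidate != root and root not in candidate.parents:
            raise ValueError(f"path escapes its sandbox root: {path!r}")
        return candidate

    def read_file(path: str) -> str:
        return _resolve(path).read_text(encoding="utf-8")

    def search_files(directory: str, pattern: str) -> List[str]:
        prefix = directory.partition("/")[0]
        base = _resolve(directory)
        root = roots[prefix]
        matches = []
        for found in base.glob(pattern):
            resolved = found.resolve()
            # Drop anything a pattern or symlink might have reached outside the root.
            if resolved.is_file() and root in resolved.parents:
                matches.append(f"{prefix}/{resolved.relative_to(root).as_posix()}")
        return sorted(matches)

    def grep_file(path: str, query: str) -> List[str]:
        regex = re.compile(query)
        return [line for line in read_file(path).splitlines() if regex.search(line)]

    return {"read_file": read_file, "search_files": search_files, "grep_file": grep_file}
```

Resolving the candidate path before comparing it with the root is what rejects `../` traversal: the check operates on the final location, not on the string the agent supplied.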
126
+ #### 08-verification-steps
127
+
128
+ Agent flows MUST include at least one explicit verification node before producing final output:
129
+
130
+ - Model the verification step as a dedicated LangGraph node (e.g. `verify_output`).
131
+ - The node checks the draft output against defined acceptance criteria (schema validation, factual consistency check, rubric scoring, or LLM-as-judge call).
132
+ - On failure, the verification node MUST route back to the relevant generation node, not silently pass through.
133
+ - Log verification results (pass/fail, score, reason) as MLflow metrics on the current run.
134
+
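The verification node and its failure route in rule 08 might look like the sketch below. The acceptance criterion (the draft is a JSON object with an `answer` field) and the state keys are hypothetical. The returned node names reuse those from the Mermaid example in rule 05. Wiring into LangGraph and logging to MLflow are omitted.

```python
import json
from typing import Any, Dict


def verify_output(state: Dict[str, Any]) -> Dict[str, Any]:
    """Verification node: returns a partial state update holding the verdict."""
    try:
        parsed = json.loads(state["draft"])
        # Hypothetical acceptance criterion: a JSON object with an "answer" field.
        passed = isinstance(parsed, dict) and "answer" in parsed
        reason = "ok" if passed else "missing 'answer' field"
    except json.JSONDecodeError as exc:
        passed, reason = False, f"invalid JSON: {exc.msg}"
    return {"verified": passed, "verify_reason": reason}


def route_after_verify(state: Dict[str, Any]) -> str:
    """Conditional edge function: a failed check goes back to the generation node."""
    return "output" if state["verified"] else "draft_response"
```

The `verified` and `verify_reason` values are the kind of result the rule asks to log as MLflow metrics on the current run.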
135
+ #### 09-workflow-structure
136
+
137
+ Agent logic MUST be organized as named workflows. Each workflow is an independent LangGraph `StateGraph` with a defined start node and end node, connecting agents, states, routes, and decision nodes.
138
+
139
+ For each workflow named `<workflow>`, create:
140
+
141
+ ```text
142
+ lib/
143
+   workflows/
144
+     <workflow>/
145
+       graph.py    # StateGraph definition; entry point for the workflow
146
+       agents.py   # LangChain agent definitions used by this workflow
147
+       states.py   # Typed state dataclasses / TypedDicts
148
+       routes.py   # Conditional edge functions
149
+ ```
150
+
151
+ - `graph.py` MUST define and compile the `StateGraph` and expose a `graph` object that callers invoke.
152
+ - Additional modules (tools, prompts, schemas) MAY be added inside `lib/workflows/<workflow>/` when they are specific to that workflow. Shared utilities belong in `lib/<module>/`.
153
+ - Each workflow MUST be documented with a Mermaid diagram in the project `README.md` following rule `05-flow-documentation`.
154
+
155
+ #### 10-workflow-evals
156
+
157
+ For each workflow `<workflow>` there MUST be a corresponding eval directory:
158
+
159
+ ```text
160
+ evals/
161
+   <workflow>/
162
+     Makefile            # eval targets for this workflow
163
+     dataset_<slice>/    # one folder per eval slice (see agentme-edr-019)
164
+     eval_<slice>.py     # evaluation script for each slice
165
+ ```
166
+
167
+ The `evals/<workflow>/Makefile` MUST define:
168
+
169
+ | Target | Behaviour |
170
+ |---|---|
171
+ | `test-eval` | Runs all eval slices for the workflow |
172
+ | `test-eval-<slice>` | Runs one named slice (e.g. `test-eval-simple`, `test-eval-complex`) |
173
+
174
+ Each `eval_<slice>.py` script MUST:
175
+
176
+ - Load the dataset from `evals/<workflow>/dataset_<slice>/` following [agentme-edr-019](019-ml-dataset-structure.md).
177
+ - Run every input through the live workflow against real LLMs.
178
+ - Log per-sample and aggregate metrics to an MLflow experiment that runs locally.
179
+
180
+ The module root Makefile `make eval` target MUST delegate to `test-eval` in every `evals/<workflow>/Makefile`.
181
+
@@ -0,0 +1,96 @@
1
+ ---
2
+ name: agentme-edr-policy-019-ml-dataset-structure
3
+ description: Defines the standard folder layout and file conventions for ML datasets used in AI/ML projects. Use when creating, organizing, or consuming datasets for machine learning tasks such as image labeling, document extraction, tabular data, LLM evaluation, and Q&A sets.
4
+ apply-to: ML and AI projects that produce or consume datasets
5
+ valid-from: 2026-05-27
6
+ ---
7
+
8
+ # agentme-edr-policy-019: ML dataset structure
9
+
10
+ ## Context and Problem Statement
11
+
12
+ ML projects accumulate datasets of different shapes: file-paired annotations, tabular CSVs, and structured JSONL records. Without a shared layout convention, tooling and agents cannot reliably discover schema files, consume data programmatically, or understand what a dataset contains.
13
+
14
+ How should ML datasets be organized on disk so they are self-describing, easy to consume, and consistent across dataset types?
15
+
16
+ ## Decision Outcome
17
+
18
+ **A standard root layout with mandatory README.md and dataset.schema.json, plus type-specific conventions for data files**
19
+
20
+ Every dataset must live in its own named folder and include a README and a JSON Schema file. Data files are organized according to three dataset types, each with its own placement rule.
21
+
22
+ ### Details
23
+
24
+ #### 01-root-structure-is-mandatory
25
+
26
+ Every dataset must follow this root layout:
27
+
28
+ ```
29
+ /[name-of-dataset]/
30
+   README.md
31
+   dataset.schema.json
32
+   data/ (present when dataset files are referenced by other data, or for file+annotation pairs)
33
+   ... (additional files depending on dataset type)
34
+ ```
35
+
36
+ - `README.md` must explain what the dataset is about, the procedures used to create it, remarks on data quality, and instructions on how to consume it with examples.
37
+ - `dataset.schema.json` must be a valid [JSON Schema](https://json-schema.org/) document describing the structure of the dataset's primary data.
38
+ - The dataset folder name must be lowercase, using hyphens as separators.
39
+
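A small checker for the root layout in rule 01 could be sketched like this. `check_dataset_root` is a hypothetical helper. The folder-name pattern assumes digits are allowed alongside lowercase letters and hyphens, which the rule does not state. The schema file is only checked for being parseable JSON, not for being a valid JSON Schema.

```python
import json
import re
from pathlib import Path
from typing import List

# Assumption: lowercase letters and digits, separated by single hyphens.
NAME_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def check_dataset_root(dataset_dir: str) -> List[str]:
    """Return a list of layout problems; an empty list means the root looks fine."""
    root = Path(dataset_dir)
    problems = []
    if not NAME_RE.match(root.name):
        problems.append(f"folder name {root.name!r} is not lowercase-hyphenated")
    for required in ("README.md", "dataset.schema.json"):
        if not (root / required).is_file():
            problems.append(f"missing {required}")
    schema_path = root / "dataset.schema.json"
    if schema_path.is_file():
        try:
            json.loads(schema_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            problems.append("dataset.schema.json is not valid JSON")
    return problems
```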
40
+ #### 02-file-annotation-pairs-must-use-data-folder
41
+
42
+ Datasets where each item is a file paired with structured JSON output (e.g. image labeling, document data extraction, medical records with known features) must store all files inside the `data/` subfolder. Each data file must have a sibling JSON annotation file named with the same filename suffixed with `.json`.
43
+
44
+ ```
45
+ /[name-of-dataset]/
46
+   data/
47
+     image1.jpg
48
+     image1.jpg.json
49
+     docu.pdf
50
+     docu.pdf.json
51
+     case-123.json
52
+     case-123.json.json
53
+   dataset.schema.json (defines the schema for the .json annotation files)
54
+   README.md
55
+ ```
56
+
57
+ Placing the annotation file next to its source file (same name + `.json`) keeps them adjacent even in large directories, making it easy to iterate pairs programmatically.
58
+
59
+ Subdirectories inside `data/` are allowed when the number of files warrants grouping, but the `.json` sibling convention must be preserved at each level.
60
+
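Iterating the pairs described in rule 02 can be sketched as below. `iter_annotation_pairs` is a hypothetical helper. It treats a `.json` file as an annotation only when the file named by stripping that suffix also exists, which is how `case-123.json` (data) is told apart from `case-123.json.json` (annotation).

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_annotation_pairs(dataset_dir: str) -> Iterator[Tuple[Path, Path]]:
    """Yield (data_file, annotation_file) pairs found under data/, recursively."""
    data_dir = Path(dataset_dir) / "data"
    for path in sorted(data_dir.rglob("*")):
        if not path.is_file():
            continue
        name = path.name
        # Skip annotation files: "<x>.json" where "<x>" is itself a file.
        if name.endswith(".json") and len(name) > 5 and path.with_name(name[:-5]).is_file():
            continue
        annotation = path.with_name(name + ".json")
        if not annotation.is_file():
            raise FileNotFoundError(f"no annotation file for {path}")
        yield path, annotation
```

Because `rglob` descends into subdirectories, the sibling convention is checked at every level, matching the rule on grouping files inside `data/`.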
61
+ #### 03-tabular-datasets-must-use-csv-files-at-root
62
+
63
+ Datasets composed of column-oriented tabular data must place CSV files at the root of the dataset folder. All tabular files must conform to the schema defined in `dataset.schema.json`, which must describe columns as named attributes with their types.
64
+
65
+ ```
66
+ /[name-of-dataset]/
67
+   samples-special.csv
68
+   samples-simple.csv
69
+   dataset.schema.json (column definitions with types for all tabular files)
70
+   README.md
71
+ ```
72
+
73
+ Multiple CSV files are allowed when they represent different slices or splits of the same schema (e.g. train/test splits, subsets by source). All files in the same dataset must share the same column schema.
74
+
75
+ #### 04-complex-structured-datasets-must-use-jsonl
76
+
77
+ Datasets with complex or heterogeneous per-record structures (e.g. LLM workflow evaluation sets, Q&A pairs, input → expected_output pairs) must use JSONL files (one JSON object per line) placed at the root of the dataset folder. Each line must conform to the schema defined in `dataset.schema.json`.
78
+
79
+ ```
80
+ /[name-of-dataset]/
81
+   simple-cases-test.jsonl
82
+   edge-cases-test.jsonl
83
+   dataset.schema.json (schema defining the structure of each line in the JSONL files)
84
+   README.md
85
+ ```
86
+
87
+ Multiple JSONL files are allowed when they represent different splits or categories (e.g. easy vs. edge cases). All files in the same dataset must conform to the same line schema.
88
+
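Consuming a JSONL dataset as described in rule 04 might look like the following sketch. `iter_jsonl_records` is a hypothetical helper. It only checks the top-level `required` keys listed in `dataset.schema.json`, which is a deliberately partial stand-in for full JSON Schema validation (a dedicated validator library would be needed for that).

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator


def iter_jsonl_records(dataset_dir: str, filename: str) -> Iterator[Dict[str, Any]]:
    """Yield one dict per non-empty line, checking the schema's required keys."""
    root = Path(dataset_dir)
    schema = json.loads((root / "dataset.schema.json").read_text(encoding="utf-8"))
    required = schema.get("required", [])
    with (root / filename).open(encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines, e.g. a trailing newline
            record = json.loads(line)
            if not isinstance(record, dict):
                raise ValueError(f"{filename}:{line_no} is not a JSON object")
            missing = [key for key in required if key not in record]
            if missing:
                raise ValueError(f"{filename}:{line_no} is missing keys {missing}")
            yield record
```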
89
+ #### 05-referenced-files-must-live-in-data-folder
90
+
91
+ When any dataset type (tabular, JSONL, or annotation-pair) contains references to external files as part of the data (e.g. a JSONL record that includes a file path), those referenced files must be stored inside the `data/` subfolder of the dataset. Paths inside data records must be relative to the dataset root.
92
+
93
+ ## References
94
+
95
+ - [JSON Schema specification](https://json-schema.org/)
96
+ - [JSONL format](https://jsonlines.org/)
@@ -29,6 +29,8 @@ Language and framework-specific tooling and project structure.
29
29
  - [agentme-edr-010](application/010-golang-project-tooling.md) - **Go project tooling and structure** - Scaffold Go CLIs and libraries with the standard layout *(includes skill: [003-create-golang-project](application/skills/003-create-golang-project/SKILL.md))*
30
30
  - [agentme-edr-014](application/014-python-project-tooling.md) - **Python project tooling and structure** - Scaffold Python packages and CLIs with the standard layout *(includes skill: [005-create-python-project](application/skills/005-create-python-project/SKILL.md))*
31
31
  - [agentme-edr-015](application/015-cli-tool-standards.md) - **CLI tool standards** - Define command UX and behavior for CLI tools
32
+ - [agentme-edr-018](application/018-ai-agent-development-standards.md) - **AI agent development standards** - Standard toolchain, framework, evaluation, and context patterns for AI agent projects
33
+ - [agentme-edr-019](application/019-ml-dataset-structure.md) - **ML dataset structure** - Standard folder layout and file conventions for ML datasets
32
34
  - [004-select-relevant-xdrs](application/skills/004-select-relevant-xdrs/SKILL.md) - **Select relevant XDRs**
33
35
 
34
36
  ## Devops
package/package.json CHANGED
@@ -1,9 +1,9 @@
1
1
  {
2
2
  "name": "agentme",
3
- "version": "0.8.2",
3
+ "version": "0.9.0",
4
4
  "description": "",
5
5
  "dependencies": {
6
- "filedist": "^0.33.0"
6
+ "filedist": "^0.34.1"
7
7
  },
8
8
  "bin": "bin/filedist.js",
9
9
  "files": [
@@ -22,6 +22,6 @@
22
22
  "url": "https://github.com/flaviostutz/agentme.git"
23
23
  },
24
24
  "devDependencies": {
25
- "xdrs-core": "^0.27.0"
25
+ "xdrs-core": "^0.27.1"
26
26
  }
27
27
  }