agentme 0.9.0 → 0.11.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (27)
  1. package/.filedist-package.yml +2 -1
  2. package/.xdrs/agentme/edrs/application/003-javascript-project-tooling.md +41 -5
  3. package/.xdrs/agentme/edrs/application/010-golang-project-tooling.md +39 -15
  4. package/.xdrs/agentme/edrs/application/014-python-project-tooling.md +63 -5
  5. package/.xdrs/agentme/edrs/application/015-cli-tool-standards.md +25 -24
  6. package/.xdrs/agentme/edrs/application/018-ai-agent-development-standards.md +57 -64
  7. package/.xdrs/agentme/edrs/application/019-ml-dataset-structure.md +12 -12
  8. package/.xdrs/agentme/edrs/application/020-ai-agent-xdrs-knowledge-layer.md +99 -0
  9. package/.xdrs/agentme/edrs/application/021-pragmatic-hexagonal-architecture.md +112 -0
  10. package/.xdrs/agentme/edrs/application/skills/001-create-javascript-project/SKILL.md +26 -11
  11. package/.xdrs/agentme/edrs/application/skills/003-create-golang-project/SKILL.md +31 -14
  12. package/.xdrs/agentme/edrs/application/skills/005-create-python-project/SKILL.md +56 -23
  13. package/.xdrs/agentme/edrs/devops/005-monorepo-structure.md +1 -1
  14. package/.xdrs/agentme/edrs/devops/006-github-pipelines.md +1 -1
  15. package/.xdrs/agentme/edrs/devops/008-common-targets.md +2 -1
  16. package/.xdrs/agentme/edrs/devops/017-tool-execution-and-scripting.md +1 -1
  17. package/.xdrs/agentme/edrs/governance/013-contributing-guide-requirements.md +1 -1
  18. package/.xdrs/agentme/edrs/index.md +3 -1
  19. package/.xdrs/agentme/edrs/observability/011-service-health-check-endpoint.md +1 -1
  20. package/.xdrs/agentme/edrs/principles/002-coding-best-practices.md +1 -1
  21. package/.xdrs/agentme/edrs/principles/004-unit-test-requirements.md +23 -3
  22. package/.xdrs/agentme/edrs/principles/007-project-quality-standards.md +31 -3
  23. package/.xdrs/agentme/edrs/principles/009-error-handling.md +1 -1
  24. package/.xdrs/agentme/edrs/principles/012-continuous-xdr-enrichment.md +2 -2
  25. package/.xdrs/agentme/edrs/principles/016-cross-language-module-structure.md +1 -1
  26. package/.xdrs/agentme/edrs/principles/articles/001-continuous-xdr-improvement.md +5 -5
  27. package/package.json +2 -2
@@ -19,7 +19,7 @@ What GitHub Actions workflows should every project follow to ensure a safe, pred

  Separating these concerns eliminates accidental publishes from CI runs, ensures monotag has access to the full git history, and makes each workflow independently auditable and re-runnable.

- ### Implementation Details
+ ### Details

  #### Workflow overview

@@ -19,7 +19,7 @@ What standard set of Makefile target names and execution rules should projects a

  Standardizing both the target names and the execution chain removes per-project guesswork, makes CI pipelines reusable, and keeps tooling behavior visible in one place.

- ### Implementation Details
+ ### Details

  #### 01-every-project-must-have-root-makefile

@@ -103,6 +103,7 @@ Targets are organized into five lifecycle groups. Projects must use these names
  | `test-unit` | Run unit tests only, including coverage report generation and coverage threshold enforcement. |
  | `test-integration` | *(Optional)* Run integration and end-to-end tests only. Projects without integration tests may omit this target. |
  | `test-smoke` | *(Optional)* Run a fast, minimal subset of tests to verify the software is basically functional. Useful as a post-deploy health check. |
+ | `eval` | *(Optional)* Run **all evaluations** for the module. Used alongside `test` to measure the accuracy and performance of statistical systems such as ML models, AI agents, or noisy systems. Typically runs against a live or near-live system (similar to an integration test) and produces a performance analysis report (e.g., F1 score, Accuracy, Precision, Recall). Must not be included in `test` or `all` — evals are opt-in because they require live dependencies and may be slow or costly to run. Individual evaluations must follow the prefix convention: `eval-<qualifier>` (e.g., `eval-simple`, `eval-complex`). |
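Editor's illustration (not part of the package diff): the `eval` row above mentions a performance analysis report with F1, Accuracy, Precision, and Recall. A minimal sketch of how such a report could be computed for a binary classifier is shown here. The function name `binary_report` and the label encoding (1 = positive, 0 = negative) are assumptions made for this example; the package does not prescribe them.

```python
from typing import Dict, Sequence


def binary_report(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0

    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

An `eval-<qualifier>` target could print this dictionary as its report; how the labels are collected from the live system is left to the project.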

  ##### Release group

@@ -19,7 +19,7 @@ How should projects execute development commands so the command surface stays pr

  This keeps local development and CI aligned, reduces indirection, and lets contributors understand project behavior by reading one command surface.

- ### Implementation Details
+ ### Details

  - Every project MUST use a root `Makefile` as the authoritative entry point for developer and pipeline commands.
  - The target names in that `Makefile` MUST follow [agentme-edr-008](008-common-targets.md).
@@ -19,7 +19,7 @@ What contributor workflow guidance must every project publish so contributors kn

  Projects must keep a `CONTRIBUTING.md` file at the repository root. The file must explain where bugs, feature discussions, and code changes belong so contributors follow a predictable workflow before opening pull requests.

- ### Implementation Details
+ ### Details

  - Every project **MUST** have a root `CONTRIBUTING.md`.
  - The guide **MUST** direct bug reports to issues.
@@ -29,8 +29,10 @@ Language and framework-specific tooling and project structure.
  - [agentme-edr-010](application/010-golang-project-tooling.md) - **Go project tooling and structure** - Scaffold Go CLIs and libraries with the standard layout *(includes skill: [003-create-golang-project](application/skills/003-create-golang-project/SKILL.md))*
  - [agentme-edr-014](application/014-python-project-tooling.md) - **Python project tooling and structure** - Scaffold Python packages and CLIs with the standard layout *(includes skill: [005-create-python-project](application/skills/005-create-python-project/SKILL.md))*
  - [agentme-edr-015](application/015-cli-tool-standards.md) - **CLI tool standards** - Define command UX and behavior for CLI tools
- - [agentme-edr-018](application/018-ai-agent-development-standards.md) - **AI agent development standards** - Standard toolchain, framework, evaluation, and context patterns for AI agent projects
+ - [agentme-edr-018](application/018-ai-agent-development-standards.md) - **AI agent development standards** - Standard toolchain, framework, evaluation, and workflow patterns for AI agent projects built with Python and LangGraph
  - [agentme-edr-019](application/019-ml-dataset-structure.md) - **ML dataset structure** - Standard folder layout and file conventions for ML datasets
+ - [agentme-edr-020](application/020-ai-agent-xdrs-knowledge-layer.md) - **AI agent XDRS knowledge layer** - How to integrate XDRS as the runtime source of truth for policies and skills in AI agents (apply only when the project explicitly uses XDRS)
+ - [agentme-edr-021](application/021-pragmatic-hexagonal-architecture.md) - **Pragmatic hexagonal architecture** - Organize application layers as External/Adapters/Application with practical coupling rules
  - [004-select-relevant-xdrs](application/skills/004-select-relevant-xdrs/SKILL.md) - **Select relevant XDRs**

  ## Devops
@@ -19,7 +19,7 @@ How should services expose their health status and validate operational readines

  All services must expose a `GET /health` endpoint that validates external dependencies using read-only operations and returns structured status with appropriate HTTP codes.

- ### Implementation Details
+ ### Details

  **Endpoint contract:**

@@ -17,7 +17,7 @@ What coding practices should be followed across all languages and projects to ke

  **Apply a set of language-agnostic structural and organizational practices that keep files small, logic decomposed, types co-located, tests co-located, and documentation always in sync.**

- ### Implementation Details
+ ### Details

  #### 01-keep-files-short

@@ -17,7 +17,7 @@ What unit testing practices should be followed to ensure tests are meaningful, r

  **Every test must assert behavior, run offline without external dependencies, enforce 80% coverage, centralize shared setup, and prefer real code over mocks.**

- ### Implementation Details
+ ### Details

  #### 01-must-have-at-least-one-assertion-per-test

@@ -68,7 +68,27 @@ Builds that miss the threshold must not be merged.

  ---

- #### 04-should-extract-shared-setup
+ #### 04-must-place-test-files-alongside-source
+
+ Test files must live next to the source file they test, in the same directory, following the convention of the language/framework (e.g. `file.test.ts`, `file_test.go`, `file.spec.js`).
+
+ ```
+ src/mymodule/group1/file1.ts ← source
+ src/mymodule/group1/file1.test.ts ← test (same directory)
+ ```
+
+ **Exception — separate test folder:** When the framework makes co-location impractical (e.g. Python's common `tests/` convention), or when the community strongly favors a separate folder, a dedicated test root (e.g. `tests/`) is allowed. In that case the test folder **must mirror** the source folder structure exactly:
+
+ ```
+ src/mymodule/group1/file1.py ← source
+ tests/mymodule/group1/file1_test.py ← test (mirrored path)
+ ```
+
+ Do not flatten or reorganize paths when using a separate test folder.
+
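Editor's illustration (not part of the package diff): the mirrored-path rule in the added section can be expressed as a small path mapping. The sketch below assumes the `src/` and `tests/` roots and the `_test` filename suffix used in the example above. The helper name `mirrored_test_path` is hypothetical.

```python
from pathlib import PurePosixPath


def mirrored_test_path(source: str, src_root: str = "src", tests_root: str = "tests") -> str:
    """Map a source file to its test file under a separate, mirrored test root."""
    # relative_to raises ValueError when the file is not under src_root.
    relative = PurePosixPath(source).relative_to(src_root)
    # Keep every intermediate folder; only the root and the file name change.
    test_name = f"{relative.stem}_test{relative.suffix}"
    return str(PurePosixPath(tests_root) / relative.parent / test_name)
```

A lint step could call this for every source file and flag tests whose location differs from the computed path, which is one way to catch flattened or reorganized test folders.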
+ ---
+
+ #### 05-should-extract-shared-setup

  When setup logic is repeated across two or more test files, centralize it (`src/test-utils/`, `internal/testutil/`, `tests/conftest.py`).

@@ -81,7 +101,7 @@ export function makeOrder(overrides: Partial<Order> = {}): Order {

  ---

- #### 05-should-avoid-mocks
+ #### 06-should-avoid-mocks

  Use the lowest-cost alternative that exercises real behavior:

@@ -19,7 +19,7 @@ Every project must meet six minimum quality standards: a Getting Started section

  These standards form a non-negotiable baseline. Individual projects may raise the bar but must never fall below it.

- ### Implementation Details
+ ### Details

  #### 01-readme-must-have-getting-started

@@ -31,7 +31,7 @@ These standards form a non-negotiable baseline. Individual projects may raise th

  **Required README structure:**

- ```markdown
+ ````markdown
  # Project Name

  One-line description.
@@ -46,7 +46,7 @@ npm install my-package
  import { myFunction } from "my-package";
  myFunction({ input: "value" });
  ```
- ```
+ ````

  ---

@@ -161,3 +161,31 @@ all:
  	$(MAKE) -C basic-usage run
  	$(MAKE) -C advanced-usage run
  ```
+
+ ---
+
+ #### 07-statistical-models-must-have-eval-targets
+
+ Projects that contain statistical models (e.g., ML models, LLM-based evaluators, classifiers, ranking systems, or any component whose output quality is measured probabilistically) must define measurable performance thresholds and verify them automatically.
+
+ **Requirements:**
+ - A `make eval` target must exist and execute all performance evaluations
+ - Each evaluation must have a **documented minimum performance threshold** (e.g., accuracy ≥ 0.85, F1 ≥ 0.80, BLEU ≥ 0.70)
+ - Thresholds must be declared explicitly in the project (e.g., in a config file, `Makefile` variable, or documented in `README.md`)
+ - `make eval` must **exit with a non-zero status** (fail) if:
+   - The evaluation cannot be executed (missing data, environment errors, model load failures)
+   - Any metric falls below its defined minimum threshold
+ - CI/CD must invoke `make eval` before releasing any version that changes model weights, prompts, or evaluation logic
+
+ **Threshold declaration example (Makefile):**
+
+ ```makefile
+ EVAL_MIN_ACCURACY := 0.85
+ EVAL_MIN_F1 := 0.80
+
+ eval:
+ 	python eval.py \
+ 		--min-accuracy $(EVAL_MIN_ACCURACY) \
+ 		--min-f1 $(EVAL_MIN_F1) \
+ 		|| (echo "Evaluation failed: metrics below threshold"; exit 1)
+ ```
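Editor's illustration (not part of the package diff): the Makefile example above calls an `eval.py` that the diff does not show. A minimal sketch of what such a script's gating logic might look like follows. Only the `--min-accuracy` and `--min-f1` flags come from the Makefile; `run_evaluation`, `check_thresholds`, the `evaluate` parameter, and the specific exit codes (1 for a missed threshold, 2 for an evaluation that could not run) are assumptions made for this example.

```python
import argparse
import sys
from typing import Callable, Dict, List, Optional, Sequence


def check_thresholds(metrics: Dict[str, float], minimums: Dict[str, float]) -> List[str]:
    """Return one message per metric that is missing or below its minimum."""
    failures = []
    for name, minimum in minimums.items():
        if name not in metrics:
            failures.append(f"{name}: metric missing from evaluation output")
        elif metrics[name] < minimum:
            failures.append(f"{name}: {metrics[name]:.3f} is below minimum {minimum:.3f}")
    return failures


def run_evaluation() -> Dict[str, float]:
    """Hypothetical placeholder: a real project loads its model and dataset here."""
    raise NotImplementedError("replace with the project's evaluation logic")


def main(
    argv: Optional[Sequence[str]] = None,
    evaluate: Callable[[], Dict[str, float]] = run_evaluation,
) -> int:
    parser = argparse.ArgumentParser(description="Run evaluations and enforce thresholds")
    parser.add_argument("--min-accuracy", type=float, required=True)
    parser.add_argument("--min-f1", type=float, required=True)
    args = parser.parse_args(argv)

    try:
        metrics = evaluate()
    except Exception as exc:  # missing data, environment errors, model load failures
        print(f"Evaluation could not be executed: {exc}", file=sys.stderr)
        return 2  # assumed code for "could not run"

    failures = check_thresholds(metrics, {"accuracy": args.min_accuracy, "f1": args.min_f1})
    for failure in failures:
        print(f"Threshold not met: {failure}", file=sys.stderr)
    return 1 if failures else 0  # assumed code 1 for "below threshold"
```

A script built this way would end with `sys.exit(main())` so the process exit status reaches `make`. Note that the Makefile's `|| (...; exit 1)` wrapper turns any non-zero status into 1 and always prints the "metrics below threshold" message, so the script's own stderr output is what distinguishes a missed threshold from an evaluation that could not run.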
@@ -17,7 +17,7 @@ What error handling practices should be followed across all languages and projec

  **Follow a set of consistent error handling practices: catch only where you can handle, return errors as values at interfaces, centralize repetitive catch logic, communicate failure clearly at process and service boundaries, and exercise error paths with dedicated tests.**

- ### Implementation Details
+ ### Details

  #### 01-catch-only-where-handled

@@ -19,7 +19,7 @@ Question: What policy should developers follow to continuously enrich XDRs so re

  Developers must treat reusable missing guidance discovered during implementation as an XDR gap to be proposed and reviewed, not as permanent prompt-only context or repeated vibe coding.

- ### Implementation Details
+ ### Details

  - The main objective is sharing, discussing, and converging practices across teams. Controlled divergence during exploration is acceptable, but recurring successful decisions must be converged into shared XDRs.
  - The non _local scope exists to share practices across projects, company areas, and functionally organized teams. Decisions placed in `_local` should be truly specific to the needs of a single application or repository.
@@ -43,4 +43,4 @@ Developers must treat reusable missing guidance discovered during implementation
  - [_core-adr-001](../../../_core/adrs/principles/001-xdrs-core.md)
  - [_core-article-001](../../../_core/adrs/principles/articles/001-xdrs-overview.md)
  - [agentme-article-001](articles/001-continuous-xdr-improvement.md)
- - [002-write-policy skill](../../../../.github/skills/002-write-policy/SKILL.md)
+ - [002-write-policy skill](../../../_core/adrs/principles/skills/002-write-policy/SKILL.md)
@@ -19,7 +19,7 @@ What baseline structure rules must every buildable module follow regardless of l

  Language-specific EDRs may add ecosystem details, but they must not redefine these baseline folder responsibilities.

- ### Implementation Details
+ ### Details

  #### 01-module-must-own-folder-root

@@ -2,13 +2,13 @@

  ## Overview

- This article explains how architects, engineers and business professionals should recognize, organize, and promote reusable delivery decisions into XDRs as a continuous improvement activity. It is aimed at people working with coding agents, vibe-coding loops, or SDD-oriented delivery who need a practical path from task friction to shared documentation.
+ A practical guide for recognizing recurring delivery decisions and promoting them into shared XDRs. Intended for engineers, architects, and business professionals working with coding agents or SDD-oriented delivery.

- Continuous improvement matters because delivery decisions do not stay correct forever. Team structures change, platforms evolve, tools mature, and the trade-offs behind earlier choices shift over time. If XDRs are not revisited and improved continuously, previously useful decisions become stale guidance and eventually turn into a form of legacy documentation that misleads delivery instead of guiding it.
+ ## Content

- Continuous improvement also keeps the target state explicit. As XDRs evolve across projects and tracks, teams need a clear shared view of where they are trying to converge, what remains intentionally different, and what should be treated as technical debt on the path toward that target. Keeping XDRs current reduces confusion about the desired future state and helps each project evolve toward it deliberately instead of drifting through ad hoc local decisions.
+ Delivery decisions do not stay correct forever. Team structures change, platforms evolve, tools mature, and the trade-offs behind earlier choices shift over time. If XDRs are not revisited and improved continuously, previously useful decisions become stale guidance that misleads delivery instead of guiding it.

- ## Content
+ Keeping XDRs current also makes the target state explicit. Teams need a clear shared view of where they are converging, what remains intentionally different, and what is technical debt on the path to that target.

  ### Start from delivery friction

@@ -90,4 +90,4 @@ If the same clarification would likely be needed in another feature, by another
  - [_core-adr-001](../../../../_core/adrs/principles/001-xdrs-core.md) - XDR structure, numbering, and mandatory template
  - [_core-article-001](../../../../_core/adrs/principles/articles/001-xdrs-overview.md) - XDR introduction and general adoption guidance
  - [agentme-edr-012](../012-continuous-xdr-enrichment.md) - Shared-first XDR enrichment policy and 80% coverage target
- - [002-write-policy skill](../../../../../.github/skills/002-write-policy/SKILL.md) - Step-by-step procedure for drafting new XDRs
+ - [002-write-policy skill](../../../../_core/adrs/principles/skills/002-write-policy/SKILL.md) - Step-by-step procedure for drafting new XDRs
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "agentme",
-   "version": "0.9.0",
+   "version": "0.11.0",
    "description": "",
    "dependencies": {
      "filedist": "^0.34.1"
@@ -22,6 +22,6 @@
      "url": "https://github.com/flaviostutz/agentme.git"
    },
    "devDependencies": {
-     "xdrs-core": "^0.27.1"
+     "xdrs-core": "^0.28.0"
    }
  }