npm - agentme - Versions diffs - 0.9.0 → 0.11.0 - Mend

agentme 0.9.0 → 0.11.0


Files changed (27)

package/.xdrs/agentme/edrs/devops/006-github-pipelines.md CHANGED

@@ -19,7 +19,7 @@ What GitHub Actions workflows should every project follow to ensure a safe, pred
 Separating these concerns eliminates accidental publishes from CI runs, ensures monotag has access to the full git history, and makes each workflow independently auditable and re-runnable.
-### Implementation Details
+### Details
 #### Workflow overview

package/.xdrs/agentme/edrs/devops/008-common-targets.md CHANGED

@@ -19,7 +19,7 @@ What standard set of Makefile target names and execution rules should projects a
 Standardizing both the target names and the execution chain removes per-project guesswork, makes CI pipelines reusable, and keeps tooling behavior visible in one place.
-### Implementation Details
+### Details
 #### 01-every-project-must-have-root-makefile
@@ -103,6 +103,7 @@ Targets are organized into five lifecycle groups. Projects must use these names
 | `test-unit` | Run unit tests only, including coverage report generation and coverage threshold enforcement. |
 | `test-integration` | *(Optional)* Run integration and end-to-end tests only. Projects without integration tests may omit this target. |
 | `test-smoke` | *(Optional)* Run a fast, minimal subset of tests to verify the software is basically functional. Useful as a post-deploy health check. |
+| `eval` | *(Optional)* Run **all evaluations** for the module. Used alongside `test` to measure the accuracy and performance of statistical systems such as ML models, AI agents, or noisy systems. Typically runs against a live or near-live system (similar to an integration test) and produces a performance analysis report (e.g., F1 score, Accuracy, Precision, Recall). Must not be included in `test` or `all` — evals are opt-in because they require live dependencies and may be slow or costly to run. Individual evaluations must follow the prefix convention: `eval-<qualifier>` (e.g., `eval-simple`, `eval-complex`). |
 ##### Release group

package/.xdrs/agentme/edrs/devops/017-tool-execution-and-scripting.md CHANGED

@@ -19,7 +19,7 @@ How should projects execute development commands so the command surface stays pr
 This keeps local development and CI aligned, reduces indirection, and lets contributors understand project behavior by reading one command surface.
-### Implementation Details
+### Details
 - Every project MUST use a root `Makefile` as the authoritative entry point for developer and pipeline commands.
 - The target names in that `Makefile` MUST follow [agentme-edr-008](008-common-targets.md).

package/.xdrs/agentme/edrs/governance/013-contributing-guide-requirements.md CHANGED

@@ -19,7 +19,7 @@ What contributor workflow guidance must every project publish so contributors kn
 Projects must keep a `CONTRIBUTING.md` file at the repository root. The file must explain where bugs, feature discussions, and code changes belong so contributors follow a predictable workflow before opening pull requests.
-### Implementation Details
+### Details
 - Every project **MUST** have a root `CONTRIBUTING.md`.
 - The guide **MUST** direct bug reports to issues.

package/.xdrs/agentme/edrs/index.md CHANGED

@@ -29,8 +29,10 @@ Language and framework-specific tooling and project structure.
 - [agentme-edr-010](application/010-golang-project-tooling.md) - **Go project tooling and structure** - Scaffold Go CLIs and libraries with the standard layout *(includes skill: [003-create-golang-project](application/skills/003-create-golang-project/SKILL.md))*
 - [agentme-edr-014](application/014-python-project-tooling.md) - **Python project tooling and structure** - Scaffold Python packages and CLIs with the standard layout *(includes skill: [005-create-python-project](application/skills/005-create-python-project/SKILL.md))*
 - [agentme-edr-015](application/015-cli-tool-standards.md) - **CLI tool standards** - Define command UX and behavior for CLI tools
-- [agentme-edr-018](application/018-ai-agent-development-standards.md) - **AI agent development standards** - Standard toolchain, framework, evaluation, and context patterns for AI agent projects
+- [agentme-edr-018](application/018-ai-agent-development-standards.md) - **AI agent development standards** - Standard toolchain, framework, evaluation, and workflow patterns for AI agent projects built with Python and LangGraph
 - [agentme-edr-019](application/019-ml-dataset-structure.md) - **ML dataset structure** - Standard folder layout and file conventions for ML datasets
+- [agentme-edr-020](application/020-ai-agent-xdrs-knowledge-layer.md) - **AI agent XDRS knowledge layer** - How to integrate XDRS as the runtime source of truth for policies and skills in AI agents (apply only when the project explicitly uses XDRS)
+- [agentme-edr-021](application/021-pragmatic-hexagonal-architecture.md) - **Pragmatic hexagonal architecture** - Organize application layers as External/Adapters/Application with practical coupling rules
 - [004-select-relevant-xdrs](application/skills/004-select-relevant-xdrs/SKILL.md) - **Select relevant XDRs**
 ## Devops

package/.xdrs/agentme/edrs/observability/011-service-health-check-endpoint.md CHANGED

@@ -19,7 +19,7 @@ How should services expose their health status and validate operational readines
 All services must expose a `GET /health` endpoint that validates external dependencies using read-only operations and returns structured status with appropriate HTTP codes.
-### Implementation Details
+### Details
 **Endpoint contract:**

package/.xdrs/agentme/edrs/principles/002-coding-best-practices.md CHANGED

@@ -17,7 +17,7 @@ What coding practices should be followed across all languages and projects to ke
 **Apply a set of language-agnostic structural and organizational practices that keep files small, logic decomposed, types co-located, tests co-located, and documentation always in sync.**
-### Implementation Details
+### Details
 #### 01-keep-files-short

package/.xdrs/agentme/edrs/principles/004-unit-test-requirements.md CHANGED

@@ -17,7 +17,7 @@ What unit testing practices should be followed to ensure tests are meaningful, r
 **Every test must assert behavior, run offline without external dependencies, enforce 80% coverage, centralize shared setup, and prefer real code over mocks.**
-### Implementation Details
+### Details
 #### 01-must-have-at-least-one-assertion-per-test
@@ -68,7 +68,27 @@ Builds that miss the threshold must not be merged.
 ---
-#### 04-should-extract-shared-setup
+#### 04-must-place-test-files-alongside-source
+Test files must live next to the source file they test, in the same directory, following the convention of the language/framework (e.g. `file.test.ts`, `file_test.go`, `file.spec.js`).
+```
+src/mymodule/group1/file1.ts        ← source
+src/mymodule/group1/file1.test.ts   ← test (same directory)
+```
+**Exception — separate test folder:** When the framework makes co-location impractical (e.g. Python's common `tests/` convention), or when the community strongly favors a separate folder, a dedicated test root (e.g. `tests/`) is allowed. In that case the test folder **must mirror** the source folder structure exactly:
+```
+src/mymodule/group1/file1.py          ← source
+tests/mymodule/group1/file1_test.py   ← test (mirrored path)
+```
+Do not flatten or reorganize paths when using a separate test folder.
+---
+#### 05-should-extract-shared-setup
 When setup logic is repeated across two or more test files, centralize it (`src/test-utils/`, `internal/testutil/`, `tests/conftest.py`).
@@ -81,7 +101,7 @@ export function makeOrder(overrides: Partial<Order> = {}): Order {
 ---
-#### 05-should-avoid-mocks
+#### 06-should-avoid-mocks
 Use the lowest-cost alternative that exercises real behavior:
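
The mirrored-path exception added in section 04 of this file can be checked mechanically. The following is a minimal sketch, not part of the package: the helper name `mirrored_test_path` and its default arguments (`src`, `tests`, `_test`) are hypothetical and chosen only to match the Python example in the hunk above.

```python
from pathlib import PurePosixPath


def mirrored_test_path(
    source: str,
    src_root: str = "src",      # assumed source root, as in the diff's example
    test_root: str = "tests",   # assumed dedicated test root
    suffix: str = "_test",      # assumed Python-style test file suffix
) -> PurePosixPath:
    """Return the test path that mirrors `source` under `test_root`."""
    # relative_to raises ValueError when `source` is not under `src_root`,
    # which catches files that were placed outside the expected tree.
    relative = PurePosixPath(source).relative_to(src_root)
    # Keep every intermediate directory; only the file name gains the suffix.
    test_name = f"{relative.stem}{suffix}{relative.suffix}"
    return PurePosixPath(test_root) / relative.with_name(test_name)


print(mirrored_test_path("src/mymodule/group1/file1.py"))
```

A lint step could compare each existing test file against this computed path to detect flattened or reorganized test folders.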

package/.xdrs/agentme/edrs/principles/007-project-quality-standards.md CHANGED

@@ -19,7 +19,7 @@ Every project must meet six minimum quality standards: a Getting Started section
 These standards form a non-negotiable baseline. Individual projects may raise the bar but must never fall below it.
-### Implementation Details
+### Details
 #### 01-readme-must-have-getting-started
@@ -31,7 +31,7 @@ These standards form a non-negotiable baseline. Individual projects may raise th
 **Required README structure:**
-```markdown
+````markdown
 # Project Name
 One-line description.
@@ -46,7 +46,7 @@ npm install my-package
 import { myFunction } from "my-package";
 myFunction({ input: "value" });
 ```
-```
+````
 ---
@@ -161,3 +161,31 @@ all:
 	$(MAKE) -C basic-usage run
 	$(MAKE) -C advanced-usage run
 ```
+---
+#### 07-statistical-models-must-have-eval-targets
+Projects that contain statistical models (e.g., ML models, LLM-based evaluators, classifiers, ranking systems, or any component whose output quality is measured probabilistically) must define measurable performance thresholds and verify them automatically.
+**Requirements:**
+- A `make eval` target must exist and execute all performance evaluations
+- Each evaluation must have a **documented minimum performance threshold** (e.g., accuracy ≥ 0.85, F1 ≥ 0.80, BLEU ≥ 0.70)
+- Thresholds must be declared explicitly in the project (e.g., in a config file, `Makefile` variable, or documented in `README.md`)
+- `make eval` must **exit with a non-zero status** (fail) if:
+  - The evaluation cannot be executed (missing data, environment errors, model load failures)
+  - Any metric falls below its defined minimum threshold
+- CI/CD must invoke `make eval` before releasing any version that changes model weights, prompts, or evaluation logic
+**Threshold declaration example (Makefile):**
+```makefile
+EVAL_MIN_ACCURACY := 0.85
+EVAL_MIN_F1       := 0.80
+eval:
+	python eval.py \
+	  --min-accuracy $(EVAL_MIN_ACCURACY) \
+	  --min-f1 $(EVAL_MIN_F1) \
+	  || (echo "Evaluation failed: metrics below threshold"; exit 1)
+```
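
The Makefile example added above calls an `eval.py` script that the diff does not include. As an illustration only, such a script might look like the sketch below. The metric helpers, the hard-coded placeholder labels, and the `check_thresholds` function are assumptions; only the `--min-accuracy` and `--min-f1` flags come from the Makefile snippet.

```python
from __future__ import annotations

import argparse
import sys
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    if not y_true or len(y_true) != len(y_pred):
        # Missing or inconsistent data means the evaluation cannot run.
        raise ValueError("evaluation data is empty or misaligned")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1 score for the `positive` class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def check_thresholds(metrics: dict[str, float], minimums: dict[str, float]) -> list[str]:
    """Return one message per metric that falls below its minimum."""
    return [
        f"{name}={metrics[name]:.3f} is below minimum {minimum:.3f}"
        for name, minimum in minimums.items()
        if metrics[name] < minimum
    ]


def main(argv: Sequence[str] | None = None) -> int:
    parser = argparse.ArgumentParser(description="Hypothetical eval runner")
    parser.add_argument("--min-accuracy", type=float, required=True)
    parser.add_argument("--min-f1", type=float, required=True)
    args = parser.parse_args(argv)

    # Placeholder data; a real eval would load labels and model predictions.
    y_true = [1, 0, 1, 1, 0, 1, 0, 0]
    y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

    metrics = {"accuracy": accuracy(y_true, y_pred), "f1": f1(y_true, y_pred)}
    failures = check_thresholds(
        metrics, {"accuracy": args.min_accuracy, "f1": args.min_f1}
    )
    for message in failures:
        print(message, file=sys.stderr)
    # Non-zero status signals failure to `make eval`.
    return 1 if failures else 0
```

Ending the script with `sys.exit(main())` under the usual `if __name__ == "__main__":` guard would propagate the status to `make`. An uncaught exception (for example the `ValueError` raised on empty data) also produces a non-zero exit, which covers the "evaluation cannot be executed" requirement.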

package/.xdrs/agentme/edrs/principles/009-error-handling.md CHANGED

@@ -17,7 +17,7 @@ What error handling practices should be followed across all languages and projec
 **Follow a set of consistent error handling practices: catch only where you can handle, return errors as values at interfaces, centralize repetitive catch logic, communicate failure clearly at process and service boundaries, and exercise error paths with dedicated tests.**
-### Implementation Details
+### Details
 #### 01-catch-only-where-handled

package/.xdrs/agentme/edrs/principles/012-continuous-xdr-enrichment.md CHANGED

@@ -19,7 +19,7 @@ Question: What policy should developers follow to continuously enrich XDRs so re
 Developers must treat reusable missing guidance discovered during implementation as an XDR gap to be proposed and reviewed, not as permanent prompt-only context or repeated vibe coding.
-### Implementation Details
+### Details
 - The main objective is sharing, discussing, and converging practices across teams. Controlled divergence during exploration is acceptable, but recurring successful decisions must be converged into shared XDRs.
 - The non _local scope exists to share practices across projects, company areas, and functionally organized teams. Decisions placed in `_local` should be truly specific to the needs of a single application or repository.
@@ -43,4 +43,4 @@ Developers must treat reusable missing guidance discovered during implementation
 - [_core-adr-001](../../../_core/adrs/principles/001-xdrs-core.md)
 - [_core-article-001](../../../_core/adrs/principles/articles/001-xdrs-overview.md)
 - [agentme-article-001](articles/001-continuous-xdr-improvement.md)
-- [002-write-policy skill](../../../../.github/skills/002-write-policy/SKILL.md)
+- [002-write-policy skill](../../../_core/adrs/principles/skills/002-write-policy/SKILL.md)

package/.xdrs/agentme/edrs/principles/016-cross-language-module-structure.md CHANGED

@@ -19,7 +19,7 @@ What baseline structure rules must every buildable module follow regardless of l
 Language-specific EDRs may add ecosystem details, but they must not redefine these baseline folder responsibilities.
-### Implementation Details
+### Details
 #### 01-module-must-own-folder-root

package/.xdrs/agentme/edrs/principles/articles/001-continuous-xdr-improvement.md CHANGED

@@ -2,13 +2,13 @@
 ## Overview
-This article explains how architects, engineers and business professionals should recognize, organize, and promote reusable delivery decisions into XDRs as a continuous improvement activity. It is aimed at people working with coding agents, vibe-coding loops, or SDD-oriented delivery who need a practical path from task friction to shared documentation.
+A practical guide for recognizing recurring delivery decisions and promoting them into shared XDRs. Intended for engineers, architects, and business professionals working with coding agents or SDD-oriented delivery.
-Continuous improvement matters because delivery decisions do not stay correct forever. Team structures change, platforms evolve, tools mature, and the trade-offs behind earlier choices shift over time. If XDRs are not revisited and improved continuously, previously useful decisions become stale guidance and eventually turn into a form of legacy documentation that misleads delivery instead of guiding it.
+## Content
-Continuous improvement also keeps the target state explicit. As XDRs evolve across projects and tracks, teams need a clear shared view of where they are trying to converge, what remains intentionally different, and what should be treated as technical debt on the path toward that target. Keeping XDRs current reduces confusion about the desired future state and helps each project evolve toward it deliberately instead of drifting through ad hoc local decisions.
+Delivery decisions do not stay correct forever. Team structures change, platforms evolve, tools mature, and the trade-offs behind earlier choices shift over time. If XDRs are not revisited and improved continuously, previously useful decisions become stale guidance that misleads delivery instead of guiding it.
-## Content
+Keeping XDRs current also makes the target state explicit. Teams need a clear shared view of where they are converging, what remains intentionally different, and what is technical debt on the path to that target.
 ### Start from delivery friction
@@ -90,4 +90,4 @@ If the same clarification would likely be needed in another feature, by another
 - [_core-adr-001](../../../../_core/adrs/principles/001-xdrs-core.md) - XDR structure, numbering, and mandatory template
 - [_core-article-001](../../../../_core/adrs/principles/articles/001-xdrs-overview.md) - XDR introduction and general adoption guidance
 - [agentme-edr-012](../012-continuous-xdr-enrichment.md) - Shared-first XDR enrichment policy and 80% coverage target
-- [002-write-policy skill](../../../../../.github/skills/002-write-policy/SKILL.md) - Step-by-step procedure for drafting new XDRs
+- [002-write-policy skill](../../../../_core/adrs/principles/skills/002-write-policy/SKILL.md) - Step-by-step procedure for drafting new XDRs

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "agentme",
-  "version": "0.9.0",
+  "version": "0.11.0",
   "description": "",
   "dependencies": {
     "filedist": "^0.34.1"
@@ -22,6 +22,6 @@
     "url": "https://github.com/flaviostutz/agentme.git"
   },
   "devDependencies": {
-    "xdrs-core": "^0.27.1"
+    "xdrs-core": "^0.28.0"
   }
 }