agentme 0.17.0 → 0.19.0
- package/.xdrs/agentme/edrs/application/019-ai-agents-development-standards.md +15 -5
- package/.xdrs/agentme/edrs/application/021-ai-workflow-development-standards.md +74 -5
- package/.xdrs/agentme/edrs/application/028-ai-eval-standards.md +112 -16
- package/.xdrs/agentme/edrs/principles/002-coding-best-practices.md +24 -3
- package/.xdrs/index.md +3 -1
- package/package.json +1 -1
|
@@ -127,16 +127,18 @@ When multiple agents are needed:
|
|
|
127
127
|
Every agent system prompt MUST follow this XML-section template. Sections must appear in this order. Required sections must always be present; optional sections may be omitted when they genuinely do not apply; never reorder them.
|
|
128
128
|
|
|
129
129
|
```xml
|
|
130
|
+
[Specific task description for the agent. If not defined, use the default prompt "Execute your objective taking into consideration the inputs provided and all the sections described below".]
|
|
131
|
+
|
|
130
132
|
<OBJECTIVE>
|
|
131
133
|
[A one or two-sentence summary of the agent's main deliverable.
|
|
132
134
|
e.g.: Split the incoming file list into logical batches for parallel processing.]
|
|
133
135
|
</OBJECTIVE>
|
|
134
136
|
|
|
135
|
-
<
|
|
137
|
+
<AGENT_ROLE>
|
|
136
138
|
[Defines who the agent is, its area of expertise, and its core persona.
|
|
137
139
|
If running inside a workflow, define exactly which node in WORKFLOW_CONTEXT this agent corresponds to.
|
|
138
140
|
e.g.: You are the batch_planning_agent (see WORKFLOW_CONTEXT). You are an expert at partitioning large file sets into balanced, directory-aware batches.]
|
|
139
|
-
</
|
|
141
|
+
</AGENT_ROLE>
|
|
140
142
|
|
|
141
143
|
<INPUT>
|
|
142
144
|
[All inputs for this agent invocation. For standalone agents: list only the agent-specific inputs. For workflow agents: list workflow-level inputs shared across all agents first, then agent-specific inputs such as judge outcomes, counters, or intermediate results from upstream nodes.]
|
|
@@ -160,31 +162,39 @@ e.g.: Do not call any tools. All reasoning is done in-context using the INPUT fi
|
|
|
160
162
|
<!-- Optional: include when hard constraints need to be stated explicitly -->
|
|
161
163
|
<GUARDRAILS>
|
|
162
164
|
[Hard, non-negotiable constraints the agent must never violate.
|
|
165
|
+
All constraints MUST use mandatory language: MUST, MUST NOT, NEVER, ALWAYS, SHALL, SHALL NOT.
|
|
163
166
|
e.g.: NEVER create batches with fewer than 5 or more than 20 files. NEVER split files from the same directory across different batches unless unavoidable.]
|
|
164
167
|
</GUARDRAILS>
|
|
165
168
|
|
|
166
169
|
<OUTPUT_FORMAT>
|
|
167
170
|
[A templated example or JSON schema specifying exactly how the final response should look.
|
|
168
|
-
|
|
171
|
+
When multiple output formats are possible, use mandatory language (MUST, ALWAYS, NEVER) to specify exactly which format to use and exclude the others.
|
|
172
|
+
e.g.: Respond with a JSON object matching this schema: ... ALWAYS return valid JSON. NEVER include prose outside the JSON block.]
|
|
169
173
|
</OUTPUT_FORMAT>
|
|
170
174
|
|
|
171
175
|
<!-- Workflow-only: omit this section for standalone (non-workflow) agents -->
|
|
172
176
|
<WORKFLOW_CONTEXT>
|
|
173
177
|
[A detailed prose or diagram description of the overall workflow graph so the agent understands its role within the larger flow. Reference the specific node name that maps to this agent. Include enough detail about upstream and downstream nodes so the agent can reason about its context.]
|
|
174
178
|
</WORKFLOW_CONTEXT>
|
|
179
|
+
|
|
180
|
+
<SYSTEM_CONTEXT>
|
|
181
|
+
The current date/time is [now in ISO 8601 format].
|
|
182
|
+
The current OS is: [operating system name].
|
|
183
|
+
</SYSTEM_CONTEXT>
|
|
175
184
|
```
|
|
176
185
|
|
|
177
186
|
**Rules:**
|
|
178
187
|
|
|
179
188
|
| Section | Required? | Notes |
|
|
180
189
|
|---|---|---|
|
|
190
|
+
| `<SYSTEM_CONTEXT>` | Optional | Runtime environment context injected at invocation time (e.g., current date/time in ISO 8601, OS). Include whenever the agent may need temporal or environment awareness. |
|
|
181
191
|
| `<OBJECTIVE>` | Required | One or two sentences summarising the agent's main deliverable. |
|
|
182
192
|
| `<ROLE>` | Required | Agent persona and expertise. When inside a workflow, MUST reference its node name from `<WORKFLOW_CONTEXT>`. |
|
|
183
193
|
| `<INPUT>` | Required | List ALL inputs. For workflow agents: workflow-level inputs first, then agent-specific inputs. |
|
|
184
194
|
| `<STEPS>` | Optional | Include when the agent follows a non-trivial numbered sequence of steps. |
|
|
185
195
|
| `<TOOL_GUIDANCE>` | Optional | Include when tool use order or conditions need explicit direction. |
|
|
186
|
-
| `<GUARDRAILS>` | Optional | Hard constraints that must never be violated. |
|
|
187
|
-
| `<OUTPUT_FORMAT>` | Required | MUST include a concrete schema or templated example; do not leave it vague. |
|
|
196
|
+
| `<GUARDRAILS>` | Optional | Hard constraints that must never be violated. All constraint statements MUST use mandatory language: MUST, MUST NOT, NEVER, ALWAYS, SHALL, SHALL NOT. |
|
|
197
|
+
| `<OUTPUT_FORMAT>` | Required | MUST include a concrete schema or templated example; do not leave it vague. When multiple output formats are possible, MUST use mandatory language to specify exactly which one to use and explicitly exclude the others. |
|
|
188
198
|
| `<WORKFLOW_CONTEXT>` | Conditional | MUST be omitted for standalone agents. MUST be present when the agent runs as a node inside a LangGraph workflow. |
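The `<SYSTEM_CONTEXT>` section is meant to be injected at invocation time. A minimal sketch of how it could be rendered is shown here; the helper name `build_system_context` and its parameters are hypothetical and do not come from the package, and using `platform.system()` as the OS name is an assumption.

```python
import platform
from datetime import datetime, timezone
from typing import Optional


def build_system_context(now: Optional[datetime] = None,
                         os_name: Optional[str] = None) -> str:
    """Render the <SYSTEM_CONTEXT> section (hypothetical helper, not part of agentme)."""
    # Default to the current UTC time; isoformat() yields an ISO 8601 string.
    now = now or datetime.now(timezone.utc)
    # Assumption: the generic platform name (e.g. "Linux", "Darwin") is enough.
    os_name = os_name or platform.system()
    return (
        "<SYSTEM_CONTEXT>\n"
        f"The current date/time is {now.isoformat()}.\n"
        f"The current OS is: {os_name}.\n"
        "</SYSTEM_CONTEXT>"
    )
```

Passing `now` and `os_name` explicitly keeps the helper deterministic in unit tests; the defaults are only used at real invocation time.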
|
|
189
199
|
|
|
190
200
|
## References
|
|
@@ -107,12 +107,13 @@ Eval folder structure and script requirements are defined in [agentme-edr-028](0
|
|
|
107
107
|
|
|
108
108
|
LangGraph node names MUST follow a suffix convention that communicates the node's role at a glance. Names MUST be action-oriented and descriptive.
|
|
109
109
|
|
|
110
|
-
|
|
|
110
|
+
| Convention | Node type | When to use |
|
|
111
111
|
|---|---|---|
|
|
112
|
-
| `_llm` | LLM call | Any node whose primary action is a direct LLM inference call (see [agentme-edr-018](018-ai-llm-development-standards.md)) |
|
|
113
|
-
| `_step` | Algorithmic step | Deterministic logic with no LLM involvement (transformation, validation, routing) |
|
|
114
|
-
| `_tool` | Tool/API call | A node that wraps a single external tool or API (e.g. a REST endpoint, DB query) |
|
|
115
|
-
| `_agent` | Subgraph agent | A node that invokes a nested subgraph containing its own tool-invocation cycle and LLM calls; use the **deepagents** library for these nodes (see [agentme-edr-019](019-ai-agents-development-standards.md)) |
|
|
112
|
+
| suffix `_llm` | LLM call | Any node whose primary action is a direct LLM inference call (see [agentme-edr-018](018-ai-llm-development-standards.md)) |
|
|
113
|
+
| suffix `_step` | Algorithmic step | Deterministic logic with no LLM involvement (transformation, validation, routing) |
|
|
114
|
+
| suffix `_tool` | Tool/API call | A node that wraps a single external tool or API (e.g. a REST endpoint, DB query) |
|
|
115
|
+
| suffix `_agent` | Subgraph agent | A node that invokes a nested subgraph containing its own tool-invocation cycle and LLM calls; use the **deepagents** library for these nodes (see [agentme-edr-019](019-ai-agents-development-standards.md)) |
|
|
116
|
+
| prefix `evaluate_` | Judge node | A node that evaluates the quality, correctness, completeness, or progress of prior outputs and returns a structured verdict; MUST follow rule `13-judge-node-output-format` |
|
|
116
117
|
|
|
117
118
|
The Python function implementing the node SHOULD share the same name as the node alias passed to `add_node`, so that graph definitions and stack traces remain unambiguous:
|
|
118
119
|
|
|
@@ -131,6 +132,8 @@ graph.add_node("code_reviewer_agent", code_reviewer_agent)
|
|
|
131
132
|
|
|
132
133
|
Names MUST NOT use generic labels such as `node1`, `process`, or `run`. Each name must clearly express what action the node performs.
|
|
133
134
|
|
|
135
|
+
Judge nodes use a **prefix** convention instead of a suffix: the name MUST start with `evaluate_` followed by the subject being judged (e.g. `evaluate_progress`, `evaluate_quality`, `evaluate_completeness`, `evaluate_relevance`). This makes judge nodes immediately distinguishable from all other node types at a glance.
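The naming convention above can be checked mechanically. The following is a rough sketch only: `classify_node_name` is a hypothetical helper, and giving the `evaluate_` prefix priority over a suffix (for a name such as `evaluate_quality_llm`) is an assumption the standard does not state.

```python
from typing import Optional

# Suffix conventions taken from the table above.
SUFFIX_ROLES = {
    "_llm": "LLM call",
    "_step": "Algorithmic step",
    "_tool": "Tool/API call",
    "_agent": "Subgraph agent",
}
JUDGE_PREFIX = "evaluate_"
# Generic labels the standard explicitly forbids.
FORBIDDEN_NAMES = {"node1", "process", "run"}


def classify_node_name(name: str) -> Optional[str]:
    """Return the node type implied by the name, or None if it breaks the convention."""
    if name in FORBIDDEN_NAMES:
        return None
    # Assumption: the judge prefix wins over any suffix.
    if name.startswith(JUDGE_PREFIX) and len(name) > len(JUDGE_PREFIX):
        return "Judge node"
    for suffix, role in SUFFIX_ROLES.items():
        # Require a non-empty action part before the suffix.
        if name.endswith(suffix) and len(name) > len(suffix):
            return role
    return None
```

A check like this could run in a unit test over every alias passed to `add_node`, failing when `None` is returned.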
|
|
136
|
+
|
|
134
137
|
#### 10-workflow-unit-testing
|
|
135
138
|
|
|
136
139
|
All LLM calls within workflow nodes are external API calls and MUST be mocked in unit tests per [agentme-edr-018](018-ai-llm-development-standards.md) rule `04-unit-test-mocking`. Workflow unit tests must run fully offline with no real LLM provider calls.
|
|
@@ -222,6 +225,72 @@ Choose a name that summarises what the workflow consumes, processes, and produce
|
|
|
222
225
|
|
|
223
226
|
**Bad names** (FORBIDDEN): `MainWorkflow`, `AgentGraph`, `ProcessFlow`, `Workflow1`, `RunGraph`.
|
|
224
227
|
|
|
228
|
+
#### 13-judge-node-output-format
|
|
229
|
+
|
|
230
|
+
Every node whose name starts with `evaluate_` (a judge node) MUST return a structured verdict object as its output. This ensures all judge nodes are interchangeable and their results can be uniformly consumed by downstream routing logic, logged, and compared across runs.
|
|
231
|
+
|
|
232
|
+
**Required output schema:**
|
|
233
|
+
|
|
234
|
+
```python
|
|
235
|
+
from typing import Literal, Optional
|
|
236
|
+
from dataclasses import dataclass, field
|
|
237
|
+
|
|
238
|
+
FindingLevel = Literal["OK", "INFO", "WARNING", "ERROR"]
|
|
239
|
+
|
|
240
|
+
@dataclass
|
|
241
|
+
class JudgeFinding:
|
|
242
|
+
    level: FindingLevel
|
|
243
|
+
    # MUST: short action-oriented label; < 10 words
|
|
244
|
+
    title: str
|
|
245
|
+
    # MUST when level != "OK": why this is an issue; < 30 words
|
|
246
|
+
    reason: Optional[str] = None
|
|
247
|
+
    # MUST when level != "OK": notes/findings using mandatory (MUST) or advisory (SHOULD) language; < 400 words
|
|
248
|
+
    details: Optional[str] = None
|
|
249
|
+
    # OPTIONAL: possible fixes, only when directly inferrable from the finding without further analysis; < 200 words
|
|
250
|
+
    fix: Optional[str] = None
|
|
251
|
+
|
|
252
|
+
@dataclass
|
|
253
|
+
class JudgeVerdict:
|
|
254
|
+
    # MUST: highest severity level across all findings; "OK" only when every finding is "OK"
|
|
255
|
+
    verdict: FindingLevel
|
|
256
|
+
    # MUST: at least one finding present
|
|
257
|
+
    findings: list[JudgeFinding] = field(default_factory=list)
|
|
258
|
+
```
|
|
259
|
+
|
|
260
|
+
Example (for logging, state storage, and inter-node communication):
|
|
261
|
+
|
|
262
|
+
```json
|
|
263
|
+
{
|
|
264
|
+
"verdict": "WARNING",
|
|
265
|
+
"findings": [
|
|
266
|
+
{
|
|
267
|
+
"level": "OK",
|
|
268
|
+
"title": "All required sections present"
|
|
269
|
+
},
|
|
270
|
+
{
|
|
271
|
+
"level": "WARNING",
|
|
272
|
+
"title": "Code coverage below threshold",
|
|
273
|
+
"reason": "Current coverage is 62%, minimum required is 80%.",
|
|
274
|
+
"details": "The following modules have no test coverage: auth.py, payments.py. SHOULD add unit tests for all public methods in these modules.",
|
|
275
|
+
"fix": "Add unit tests for auth.py and payments.py. Run `make test-coverage` to verify the threshold is met."
|
|
276
|
+
}
|
|
277
|
+
]
|
|
278
|
+
}
|
|
279
|
+
```
|
|
280
|
+
|
|
281
|
+
**Routing from judge nodes:**
|
|
282
|
+
|
|
283
|
+
Downstream conditional edges MUST route on `verdict` only:
|
|
284
|
+
|
|
285
|
+
```python
|
|
286
|
+
def route_after_evaluate_quality(state) -> str:
|
|
287
|
+
    if state["evaluate_quality_result"].verdict in ("ERROR", "WARNING"):
|
|
288
|
+
        return "revise_draft_llm"
|
|
289
|
+
    return "publish_step"
|
|
290
|
+
```
|
|
291
|
+
|
|
292
|
+
**Logging:** Log `verdict` and the count of each level as MLflow metrics on the current run per rule `03-observability-and-experiment-tracking`.
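As an illustration of the two MUST rules on `JudgeVerdict` and of the per-level counts mentioned under Logging, here is a small sketch. The helpers `build_verdict` and `level_counts` are hypothetical, and the severity ordering OK < INFO < WARNING < ERROR is an assumption inferred from the order of the `FindingLevel` literal. The dataclasses are repeated only so the block is self-contained.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Literal, Optional

FindingLevel = Literal["OK", "INFO", "WARNING", "ERROR"]
# Assumed ordering, lowest to highest severity.
SEVERITY_ORDER = ("OK", "INFO", "WARNING", "ERROR")


@dataclass
class JudgeFinding:
    level: FindingLevel
    title: str
    reason: Optional[str] = None
    details: Optional[str] = None
    fix: Optional[str] = None


@dataclass
class JudgeVerdict:
    verdict: FindingLevel
    findings: list[JudgeFinding] = field(default_factory=list)


def build_verdict(findings: list[JudgeFinding]) -> JudgeVerdict:
    """Derive the verdict as the highest severity among the findings."""
    if not findings:
        raise ValueError("a judge verdict needs at least one finding")
    worst = max(findings, key=lambda f: SEVERITY_ORDER.index(f.level)).level
    return JudgeVerdict(verdict=worst, findings=list(findings))


def level_counts(verdict: JudgeVerdict) -> dict[str, int]:
    """Count findings per level, including levels with zero findings."""
    counts = Counter(f.level for f in verdict.findings)
    return {level: counts.get(level, 0) for level in SEVERITY_ORDER}
```

Each entry returned by `level_counts` could then be logged with `mlflow.log_metric`, for example under a key such as `findings_warning`; that key naming is an assumption, not something the rule prescribes.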
|
|
293
|
+
|
|
225
294
|
#### 15-workflow-state-persistence
|
|
226
295
|
|
|
227
296
|
For long-running workflows that may need to be paused and resumed:
|
|
@@ -23,41 +23,47 @@ For when evals are required per AI tier, see [agentme-edr-007](../principles/007
|
|
|
23
23
|
|
|
24
24
|
#### 01-eval-folder-structure
|
|
25
25
|
|
|
26
|
-
|
|
26
|
+
Each named eval is a self-contained unit. Create one directory per eval under `evals/` at the same level as `lib/` and `examples/`:
|
|
27
27
|
|
|
28
28
|
```text
|
|
29
29
|
evals/
|
|
30
|
-
|
|
31
|
-
|
|
32
|
-
|
|
33
|
-
|
|
30
|
+
  eval-<name>/
|
|
31
|
+
    dataset/ # EDR-024 compliant dataset (README.md, dataset.schema.json, data/)
|
|
32
|
+
    eval-<name>.py # evaluation script
|
|
33
|
+
    eval-report.md # generated report (overwritten on each run; see rule 03)
|
|
34
|
+
    Makefile # eval and run targets
|
|
35
|
+
  eval-<name2>/
|
|
36
|
+
    ...
|
|
34
37
|
```
|
|
35
38
|
|
|
36
|
-
Where `<
|
|
39
|
+
Where `<name>` identifies the specific evaluation scenario (e.g., `eval-basic`, `eval-complex`, `eval-edge-cases`).
|
|
37
40
|
|
|
38
|
-
The
|
|
41
|
+
The `dataset/` subfolder MUST be a valid [agentme-edr-024](024-ml-dataset-structure.md) dataset — it MUST include `README.md` and `dataset.schema.json` at its root. For input/output pairs, use JSONL files per `agentme-edr-024.04-complex-structured-datasets-must-use-jsonl`.
|
|
42
|
+
|
|
43
|
+
Each `evals/eval-<name>/Makefile` MUST define:
|
|
39
44
|
|
|
40
45
|
| Target | Behaviour |
|
|
41
46
|
|---|---|
|
|
42
|
-
| `eval` | Runs
|
|
43
|
-
| `
|
|
47
|
+
| `eval` | Runs the eval with threshold enforcement; exits non-zero on failure (CI-safe) |
|
|
48
|
+
| `run` | Runs the eval without threshold enforcement (exploration / debugging) |
|
|
44
49
|
|
|
45
|
-
The module root Makefile MUST expose a `make eval` target that delegates to `eval` in every `evals
|
|
50
|
+
The module root Makefile MUST expose a `make eval` target that delegates to `eval` in every `evals/eval-<name>/Makefile`:
|
|
46
51
|
|
|
47
52
|
```makefile
|
|
48
53
|
eval:
|
|
49
|
-
	$(MAKE) -C evals/
|
|
50
|
-
	$(MAKE) -C evals/
|
|
54
|
+
	$(MAKE) -C evals/eval-basic eval
|
|
55
|
+
	$(MAKE) -C evals/eval-complex eval
|
|
51
56
|
```
|
|
52
57
|
|
|
53
58
|
#### 02-eval-script-requirements
|
|
54
59
|
|
|
55
|
-
Each `
|
|
60
|
+
Each `eval-<name>.py` script MUST:
|
|
56
61
|
|
|
57
|
-
- Load the dataset from `
|
|
62
|
+
- Load the dataset from `dataset/` in the same eval folder, following [agentme-edr-024](024-ml-dataset-structure.md). For input/output pairs, use the JSONL format per `agentme-edr-024.04-complex-structured-datasets-must-use-jsonl`.
|
|
58
63
|
- Run every input through the live component against **real LLM providers** (not mocked responses), to capture model drift.
|
|
59
64
|
- Log per-sample and aggregate metrics to an MLflow experiment that runs **locally** — a remote MLflow server MUST NOT be required.
|
|
60
65
|
- Compare outputs to expected values using project-defined quality thresholds. Thresholds MUST be declared explicitly (e.g., in a Makefile variable or README).
|
|
66
|
+
- Write `eval-report.md` in the same folder per rule `03-eval-report-file`.
|
|
61
67
|
- Exit with a non-zero status when any metric falls below its defined threshold, consistent with [agentme-edr-007](../principles/007-project-quality-standards.md) rule `07-statistical-models-must-have-eval-targets`.
|
|
62
68
|
|
|
63
69
|
**Example:**
|
|
@@ -68,19 +74,109 @@ from my_package.app.workflows.document_review_workflow.graph import graph
|
|
|
68
74
|
|
|
69
75
|
EVAL_MIN_ACCURACY = 0.85
|
|
70
76
|
|
|
71
|
-
with mlflow.start_run():
|
|
77
|
+
with mlflow.start_run() as run:
|
|
72
78
|
    results = []
|
|
73
|
-
    for sample in load_dataset("
|
|
79
|
+
    for sample in load_dataset("dataset/"):
|
|
74
80
|
        output = graph.invoke({"document": sample["input"]})
|
|
75
81
|
        results.append(output["label"] == sample["expected_label"])
|
|
76
82
|
|
|
77
83
|
    accuracy = sum(results) / len(results)
|
|
78
84
|
    mlflow.log_metric("accuracy", accuracy)
|
|
79
85
|
|
|
86
|
+
    write_eval_report(run, results, thresholds={"accuracy": EVAL_MIN_ACCURACY})
|
|
87
|
+
|
|
80
88
|
if accuracy < EVAL_MIN_ACCURACY:
|
|
81
89
|
    raise SystemExit(f"Eval failed: accuracy {accuracy:.2f} < {EVAL_MIN_ACCURACY}")
|
|
82
90
|
```
|
|
83
91
|
|
|
92
|
+
#### 03-eval-report-file
|
|
93
|
+
|
|
94
|
+
Each eval script MUST produce `eval-report.md` in the same `evals/eval-<name>/` folder and overwrite it on every run.
|
|
95
|
+
|
|
96
|
+
**Generation constraint:** The report MUST be produced programmatically, reading raw metric values directly from MLflow. No LLM or generative model may write, summarize, or paraphrase any section of the report, to prevent hallucinated metric values.
|
|
97
|
+
|
|
98
|
+
The report MUST follow this template:
|
|
99
|
+
|
|
100
|
+
```markdown
|
|
101
|
+
# Eval Report: <name>
|
|
102
|
+
|
|
103
|
+
**Date:** <ISO date>
|
|
104
|
+
**Dataset:** dataset/
|
|
105
|
+
**Script:** eval-<name>.py
|
|
106
|
+
**Thresholds:** accuracy ≥ <value>, F1 ≥ <value>
|
|
107
|
+
|
|
108
|
+
## Overall Results
|
|
109
|
+
|
|
110
|
+
| Metric | Value | 95% CI | Threshold | Status |
|
|
111
|
+
|-----------|--------|----------------|-----------|---------|
|
|
112
|
+
| Accuracy | <val> | [<low>, <high>]| ≥ <thr> | ✓/✗ PASS/FAIL |
|
|
113
|
+
| F1 Score | <val> | — | ≥ <thr> | ✓/✗ PASS/FAIL |
|
|
114
|
+
| Precision | <val> | — | — | — |
|
|
115
|
+
| Recall | <val> | — | — | — |
|
|
116
|
+
| Samples | <n> | — | — | — |
|
|
117
|
+
|
|
118
|
+
**Overall: PASS / FAIL**
|
|
119
|
+
|
|
120
|
+
## Per-item Results
|
|
121
|
+
|
|
122
|
+
| ID | Input Summary | Expected | Actual | Correct |
|
|
123
|
+
|-----|---------------|----------|--------|---------|
|
|
124
|
+
| 001 | <summary> | <label> | <label>| ✓ |
|
|
125
|
+
| 002 | <summary> | <label> | <label>| ✗ |
|
|
126
|
+
|
|
127
|
+
## Notes
|
|
128
|
+
|
|
129
|
+
- <observations, failure patterns, MLflow run link>
|
|
130
|
+
```
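To make the "produced programmatically" constraint concrete, the sketch below renders one row of the Overall Results table from plain numbers. The function `render_metric_row` is hypothetical; it assumes the caller has already read the metric value from the MLflow run and computed any confidence interval, and it formats every value with two decimals, which the template does not mandate.

```python
from typing import Optional

# Placeholder the template uses for cells that do not apply (a long dash).
EMPTY_CELL = "\u2014"


def render_metric_row(name: str,
                      value: float,
                      threshold: Optional[float] = None,
                      ci: Optional[tuple[float, float]] = None) -> str:
    """Format one Overall Results row without any generative step."""
    ci_cell = f"[{ci[0]:.2f}, {ci[1]:.2f}]" if ci else EMPTY_CELL
    if threshold is None:
        # Metrics without a threshold are reported but not judged.
        threshold_cell, status_cell = EMPTY_CELL, EMPTY_CELL
    else:
        threshold_cell = f"≥ {threshold:.2f}"
        status_cell = "✓ PASS" if value >= threshold else "✗ FAIL"
    return f"| {name} | {value:.2f} | {ci_cell} | {threshold_cell} | {status_cell} |"
```

Because the status is derived only from `value >= threshold`, the report cannot disagree with the exit code of the eval script as long as both use the same threshold values.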
|
|
131
|
+
|
|
132
|
+
**Confidence interval:** The 95% CI for accuracy MUST be computed using the **Wilson score interval** (preferred over the normal approximation for small $n$). A wide interval signals that the dataset is too small to support confident conclusions and the sample count should be increased.
|
|
133
|
+
|
|
134
|
+
The Wilson score bounds at 95% confidence ($z = 1.96$) are:
|
|
135
|
+
|
|
136
|
+
$$\frac{\hat{p} + \frac{z^2}{2n} \pm z\sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z^2}{4n^2}}}{1 + \frac{z^2}{n}}$$
|
|
137
|
+
|
|
138
|
+
Where $\hat{p}$ is observed accuracy and $n$ is sample count. Accuracy and F1 are required; precision and recall are recommended.
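The bounds above translate directly into code. This is a minimal sketch, assuming accuracy is computed as `successes / n`; the function name `wilson_interval` is illustrative and not part of the package.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    p_hat = successes / n
    denominator = 1 + z**2 / n
    centre = p_hat + z**2 / (2 * n)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return (centre - margin) / denominator, (centre + margin) / denominator
```

Unlike the normal approximation, these bounds always stay inside [0, 1] up to floating-point rounding, even when the observed accuracy is exactly 0 or 1.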
|
|
139
|
+
|
|
140
|
+
**Filled-in example** (`evals/eval-basic/eval-report.md` for a document review workflow):
|
|
141
|
+
|
|
142
|
+
```markdown
|
|
143
|
+
# Eval Report: eval-basic
|
|
144
|
+
|
|
145
|
+
**Date:** 2026-06-12
|
|
146
|
+
**Dataset:** dataset/
|
|
147
|
+
**Script:** eval-basic.py
|
|
148
|
+
**Thresholds:** accuracy ≥ 0.85, F1 ≥ 0.80
|
|
149
|
+
|
|
150
|
+
## Overall Results
|
|
151
|
+
|
|
152
|
+
| Metric | Value | 95% CI | Threshold | Status |
|
|
153
|
+
|-----------|-------|--------------|-----------|-------------|
|
|
154
|
+
| Accuracy | 0.88 | [0.69, 0.97] | ≥ 0.85 | ✓ PASS |
|
|
155
|
+
| F1 Score | 0.86 | — | ≥ 0.80 | ✓ PASS |
|
|
156
|
+
| Precision | 0.89 | — | — | — |
|
|
157
|
+
| Recall | 0.84 | — | — | — |
|
|
158
|
+
| Samples | 25 | — | — | — |
|
|
159
|
+
|
|
160
|
+
**Overall: PASS**
|
|
161
|
+
|
|
162
|
+
> Note: CI [0.69, 0.97] is wide — 25 samples may be insufficient for high confidence. Consider expanding the dataset.
|
|
163
|
+
|
|
164
|
+
## Per-item Results
|
|
165
|
+
|
|
166
|
+
| ID | Input Summary | Expected | Actual | Correct |
|
|
167
|
+
|-----|-------------------------------------|----------|----------|---------|
|
|
168
|
+
| 001 | Contract renewal, 3 pages, standard | approve | approve | ✓ |
|
|
169
|
+
| 002 | NDA with unusual liability clause | escalate | escalate | ✓ |
|
|
170
|
+
| 003 | Vendor invoice, missing PO number | reject | reject | ✓ |
|
|
171
|
+
| 004 | Employment agreement, standard terms| approve | approve | ✓ |
|
|
172
|
+
| 005 | Amendment with redlined IP clause | escalate | approve | ✗ |
|
|
173
|
+
|
|
174
|
+
## Notes
|
|
175
|
+
|
|
176
|
+
- Sample 005 misclassified: redlined IP clause not flagged as escalation trigger. Possible model drift.
|
|
177
|
+
- MLflow run: experiment `eval_basic` — view with `mlflow ui`
|
|
178
|
+
```
|
|
179
|
+
|
|
84
180
|
## References
|
|
85
181
|
|
|
86
182
|
- [agentme-edr-007](../principles/007-project-quality-standards.md) — Project quality standards: when evals are required per AI tier (rule `09-ai-project-testing-requirements`) and statistical model eval targets (rule `07-statistical-models-must-have-eval-targets`)
|
|
@@ -83,7 +83,28 @@ def _persist_order(order, total): ...
|
|
|
83
83
|
|
|
84
84
|
---
|
|
85
85
|
|
|
86
|
-
#### 03-
|
|
86
|
+
#### 03-put-entry-point-function-first
|
|
87
|
+
|
|
88
|
+
Place the **entry-point function** (the outermost caller) at the **top** of the file. All helper or sub-functions it calls internally must appear **below** it.
|
|
89
|
+
|
|
90
|
+
*Why:* Readers can follow the overall logic top-down without jumping around the file. The most important function is immediately visible when the file is opened.
|
|
91
|
+
|
|
92
|
+
**Example (Python):**
|
|
93
|
+
|
|
94
|
+
```python
|
|
95
|
+
def process_order(order): # entry point at the top
|
|
96
|
+
    _validate_order(order)
|
|
97
|
+
    total = _calculate_price(order)
|
|
98
|
+
    _persist_order(order, total)
|
|
99
|
+
|
|
100
|
+
def _validate_order(order): ...
|
|
101
|
+
def _calculate_price(order) -> Decimal: ...
|
|
102
|
+
def _persist_order(order, total): ...
|
|
103
|
+
```
|
|
104
|
+
|
|
105
|
+
---
|
|
106
|
+
|
|
107
|
+
#### 04-keep-readme-tests-and-examples-in-sync
|
|
87
108
|
|
|
88
109
|
Every change to a public interface, behavior, or configuration option must be reflected in:
|
|
89
110
|
|
|
@@ -95,7 +116,7 @@ Every change to a public interface, behavior, or configuration option must be re
|
|
|
95
116
|
|
|
96
117
|
---
|
|
97
118
|
|
|
98
|
-
####
|
|
119
|
+
#### 05-declare-types-in-file-where-used
|
|
99
120
|
|
|
100
121
|
If a type (struct, interface, class, typedef, etc.) is used in only **one** file, declare it in that same file. Move a type to a shared module only when it is referenced in two or more files.
|
|
101
122
|
|
|
@@ -103,7 +124,7 @@ If a type (struct, interface, class, typedef, etc.) is used in only **one** file
|
|
|
103
124
|
|
|
104
125
|
---
|
|
105
126
|
|
|
106
|
-
####
|
|
127
|
+
#### 06-keep-test-files-next-to-source
|
|
107
128
|
|
|
108
129
|
Where the language ecosystem supports it (e.g. JavaScript/TypeScript, Go, Rust), place test files **beside** the source file they cover and use a consistent naming convention rather than mirroring the source tree in a separate `tests/` folder.
|
|
109
130
|
|
package/.xdrs/index.md
CHANGED
|
@@ -25,4 +25,6 @@ Opinionated set of decisions and skills for common development tasks
|
|
|
25
25
|
|
|
26
26
|
### _local (reserved)
|
|
27
27
|
|
|
28
|
-
|
|
28
|
+
The `_local` scope is the default scope for new XDRs and may override decisions from other scopes. These decisions are local and are not meant to be shared in other contexts.
|
|
29
|
+
|
|
30
|
+
Read _local scope index at `_local/index.md` when it exists.
|