agentme 0.22.0 → 0.23.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -134,6 +134,23 @@ Names MUST NOT use generic labels such as `node1`, `process`, or `run`. Each nam
 
 Judge nodes use a **prefix** convention instead of a suffix: the name MUST start with `evaluate_` followed by the subject being judged (e.g. `evaluate_progress`, `evaluate_quality`, `evaluate_completeness`, `evaluate_relevance`). This makes judge nodes immediately distinguishable from all other node types at a glance.
 
+**Grouping prefix for related nodes:** When multiple nodes deal with the same subject, entity, or workflow region, their names SHOULD start with a shared grouping word followed by a verb and the role suffix. The pattern is `<group>_<verb>_<role_suffix>`. This makes the graph topology scannable and clusters related nodes together alphabetically in logs, traces, and code.
+
+```python
+# Nodes grouped under the "invoice" subject
+def invoice_fetch_tool(state): ... # fetches invoice data from an API
+def invoice_validate_step(state): ... # validates invoice fields deterministically
+def invoice_summarize_llm(state): ... # summarizes invoice content with an LLM
+def invoice_review_agent(state): ... # runs an agent loop to review the invoice
+
+graph.add_node("invoice_fetch_tool", invoice_fetch_tool)
+graph.add_node("invoice_validate_step", invoice_validate_step)
+graph.add_node("invoice_summarize_llm", invoice_summarize_llm)
+graph.add_node("invoice_review_agent", invoice_review_agent)
+```
+
+The grouping prefix is optional for workflows where all nodes clearly belong to a single domain. It MUST be used when a workflow spans multiple subjects or regions (e.g. `invoice_*`, `payment_*`, `notification_*`) to prevent name collisions and to make the graph structure self-documenting.
+
 #### 10-workflow-unit-testing
 
 All LLM calls within workflow nodes are external API calls and MUST be mocked in unit tests per [agentme-edr-018](018-ai-llm-development-standards.md) rule `04-unit-test-mocking`. Workflow unit tests must run fully offline with no real LLM provider calls.
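The grouping rule lends itself to a mechanical check, for example in a unit test over a graph's node names. The Python sketch below is hypothetical and not part of agentme: the helper names are invented for illustration, the role suffixes `tool`, `step`, `llm` and `agent` are taken only from the invoice example (the EDR's full suffix list may differ), and the group and verb are assumed to be single words.

```python
from __future__ import annotations

import re

# Assumption: role suffixes taken from the invoice example; the EDR's full list may differ.
ROLE_SUFFIXES = ("tool", "step", "llm", "agent")

# <group>_<verb>_<role_suffix>, assuming group and verb are single lowercase words
GROUPED_NAME = re.compile(
    r"(?P<group>[a-z][a-z0-9]*)_(?P<verb>[a-z][a-z0-9]*)_(?P<role>"
    + "|".join(ROLE_SUFFIXES)
    + r")"
)


def parse_grouped_name(name: str) -> dict[str, str] | None:
    """Split a node name into group, verb and role, or return None if it does not match."""
    match = GROUPED_NAME.fullmatch(name)
    return match.groupdict() if match else None


def ungrouped_nodes(node_names: list[str]) -> list[str]:
    """Names that do not follow <group>_<verb>_<role_suffix>.

    Judge nodes are skipped because they use the separate evaluate_<subject> prefix.
    """
    return [
        name
        for name in node_names
        if not name.startswith("evaluate_") and parse_grouped_name(name) is None
    ]


def groups_in_use(node_names: list[str]) -> set[str]:
    """Grouping prefixes found among well-formed node names."""
    parsed = (parse_grouped_name(name) for name in node_names)
    return {p["group"] for p in parsed if p is not None}


def grouping_violations(node_names: list[str]) -> list[str]:
    """Report ungrouped names only when the workflow spans several groups (the MUST case).

    Heuristic: "spans multiple subjects" is approximated as "more than one prefix in use".
    """
    if len(groups_in_use(node_names)) > 1:
        return ungrouped_nodes(node_names)
    return []
```

For a hypothetical node list `["invoice_fetch_tool", "invoice_validate_step", "payment_charge_tool", "process", "evaluate_quality"]`, `grouping_violations` returns `["process"]`: two groups are in use, so the generic name is flagged, while the judge node is exempt. Approximating "multiple subjects" by counting prefixes is a design shortcut; a project could instead declare its subjects explicitly.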
@@ -182,6 +182,12 @@ Where $\hat{p}$ is observed accuracy and $n$ is sample count. Accuracy and F1 ar
 - MLflow run: experiment `workflow-document-review/eval-basic` — view with `mlflow ui`
 ```
 
+#### 04-eval-mlflow-unique-port
+
+Each `evals/<component>/eval-<name>/Makefile` MUST start its MLflow tracking server on a **unique port** to prevent conflicts when multiple eval Makefiles are run concurrently or in parallel (e.g., in CI or across multiple terminal sessions).
+
+Ports MUST be statically assigned per eval scenario and MUST NOT reuse the default `5000` port (reserved for `dev-mlflow` per [agentme-edr-008](../devops/008-common-targets.md) rule `09-ai-project-dev-targets`). Assign ports starting at `5100` and incrementing by 1 for each additional eval scenario across the entire project.
+
 ## References
 
 - [agentme-edr-007](../principles/007-project-quality-standards.md) — Project quality standards: when evals are required per AI tier (rule `09-ai-project-testing-requirements`) and statistical model eval targets (rule `07-statistical-models-must-have-eval-targets`)
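The unique-port rule can likewise be checked before eval Makefiles are run concurrently. This Python sketch assumes, hypothetically, that each `evals/<component>/eval-<name>/Makefile` declares its port in a variable such as `MLFLOW_PORT := 5101` and starts the server with `mlflow server --port $(MLFLOW_PORT)`. The variable name and the helper functions are illustrative, not an agentme convention. The check flags missing ports, the reserved `5000`, ports below `5100`, and duplicates, and it proposes the next port for a new scenario.

```python
from __future__ import annotations

import re
from pathlib import Path

FIRST_EVAL_PORT = 5100  # first port for eval scenarios, per the rule above
DEV_MLFLOW_PORT = 5000  # reserved for the dev-mlflow target

# Hypothetical convention: `MLFLOW_PORT := 5101` (also accepts `=` and `?=`)
PORT_LINE = re.compile(r"^MLFLOW_PORT\s*[:?]?=\s*(\d+)\s*$", re.MULTILINE)


def load_eval_makefiles(root: Path = Path("evals")) -> dict[str, str]:
    """Read every <component>/eval-<name>/Makefile below root, keyed by path."""
    return {str(p): p.read_text() for p in sorted(root.glob("*/eval-*/Makefile"))}


def find_port_problems(makefiles: dict[str, str]) -> list[str]:
    """Return one message per missing, reserved, out-of-range or duplicated port."""
    problems: list[str] = []
    seen: dict[int, str] = {}
    for path, text in sorted(makefiles.items()):
        match = PORT_LINE.search(text)
        if match is None:
            problems.append(f"{path}: no MLFLOW_PORT declared")
            continue
        port = int(match.group(1))
        if port == DEV_MLFLOW_PORT:
            problems.append(f"{path}: port {port} is reserved for dev-mlflow")
        elif port < FIRST_EVAL_PORT:
            problems.append(f"{path}: port {port} is below {FIRST_EVAL_PORT}")
        if port in seen:
            problems.append(f"{path}: port {port} already used by {seen[port]}")
        else:
            seen[port] = path
    return problems


def next_free_port(makefiles: dict[str, str]) -> int:
    """Port for a newly added scenario: one above the highest port assigned so far."""
    ports = [int(m.group(1)) for text in makefiles.values() if (m := PORT_LINE.search(text))]
    return max([FIRST_EVAL_PORT - 1, *ports]) + 1
```

Because the port lives in one Makefile variable, the check only reads text and never starts MLflow. `next_free_port` follows the "increment by 1" rule by taking one above the highest port already assigned, so existing scenarios keep their ports when a new one is added.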