npm - agentme - Versions diffs - 0.14.0 → 0.16.0 - Mend

agentme 0.14.0 → 0.16.0


Files changed (23)

package/.xdrs/agentme/edrs/application/020-ai-workflow-development-standards.md ADDED

@@ -0,0 +1,262 @@
+---
+name: agentme-edr-policy-020-ai-workflow-development-standards
+description: Defines the standard toolchain, framework, observability, and workflow patterns for building LangGraph workflows in Python. Use when scaffolding, reviewing, or extending AI workflow projects that orchestrate LLM calls, agents, and algorithmic nodes. For simple LLM calls see agentme-edr-018, for agentic patterns see agentme-edr-019.
+apply-to: AI workflow projects using LangGraph StateGraph built with Python
+valid-from: 2026-06-05
+---
+# agentme-edr-policy-020: AI workflow development standards
+## Context and Problem Statement
+AI workflow projects vary widely in how they structure directed graphs, manage state, evaluate outputs, and test execution paths. Without a shared baseline, projects accumulate incompatible patterns for flow design, state management, and dataset-driven testing.
+Which tools, frameworks, and design patterns should AI workflow projects follow to ensure reproducibility, testability, and maintainability?
+## Decision Outcome
+**Use Python with LangGraph for flow orchestration and MLflow for experiment tracking and local evaluation.**
+### Details
+#### 01-language-and-framework
+Workflows MUST be built with **LangGraph**. Use LangGraph `StateGraph` to model each distinct workflow as an explicit directed graph with typed state.
+For all direct LLM calls within workflow nodes, use LangChain per [agentme-edr-018](018-ai-llm-development-standards.md). For agent nodes with tool-invocation loops, use deepagents per [agentme-edr-019](019-ai-agents-development-standards.md).
+#### 03-observability-and-experiment-tracking
+Use **MLflow** for all workflow observability and evaluation:
+- **Workflow-level tracking:** Wrap each workflow run with `mlflow.start_run()` to capture traces, parameters, and metrics locally.
+- **LLM-level auto-tracing:** Enable LangChain auto-tracing per [agentme-edr-018](018-ai-llm-development-standards.md) rule `03-llm-observability` by calling `mlflow.langchain.autolog()` during application startup. This captures inputs, outputs, token counts, and latency for every LangChain call within workflow nodes.
+- Log run parameters (model name, temperature, prompt version) and output metrics (accuracy, latency, token counts) using `mlflow.log_param` / `mlflow.log_metric`.
+- Run a local MLflow tracking server with `mlflow ui` to inspect runs during development. Do not require a remote MLflow server for local development.
+- The project Makefile MUST expose a `dev-mlflow` target to start the local MLflow tracking server, per [agentme-edr-008](../devops/008-common-targets.md) rule `09-ai-project-dev-targets`.
+#### 04-dataset-driven-accuracy-measurement
+Eval dataset and implementation requirements are defined in [agentme-edr-021](021-ai-eval-standards.md). Testing requirements (when evals are required, release gates) are defined in [agentme-edr-007](../principles/007-project-quality-standards.md) rule `09-ai-project-testing-requirements`.
+#### 05-flow-documentation
+Each workflow MUST be documented as a **Mermaid graph** in a `README.md`. The diagram must match the LangGraph `StateGraph` definition:
+- Use `graph TD` or `graph LR` direction.
+- Label each node with its Python function name.
+- Label conditional edges with the condition expression.
+- Update the diagram whenever the graph topology changes.
+Example minimal diagram block:
+```mermaid
+graph TD
+    A[fetch_context] --> B[draft_response]
+    B --> C{verify}
+    C -->|pass| D[output]
+    C -->|fail| B
+```
+#### 06-verification-steps
+Workflows MUST include at least one explicit verification node before producing final output:
+- Model the verification step as a dedicated LangGraph node (e.g. `verify_output`).
+- The node checks the draft output against defined acceptance criteria (schema validation, factual consistency check, rubric scoring, or LLM-as-judge call).
+- On failure, the verification node MUST route back to the relevant generation node, not silently pass through.
+- Log verification results (pass/fail, score, reason) as MLflow metrics on the current run.
+#### 07-workflow-structure
+Workflow logic MUST be organized as named workflows following [agentme-edr-026](026-pragmatic-hexagonal-architecture.md). Each workflow is an independent LangGraph `StateGraph` with a defined start node and end node, connecting LLM nodes, agent nodes, algorithmic nodes, states, routes, and decision nodes.
+Workflows live inside `app/workflows/` (the application layer), while external integrations such as LLM providers, vector stores, and third-party APIs live under `adapters/connectors/` (the outbound adapter layer). Inbound interfaces (HTTP API, CLI) live under `adapters/` as inbound adapters.
+For each workflow named `<workflow>`, the full project layout is:
+```text
+lib/src/<package_name>/
+  adapters/
+    http/                      # inbound: API server that triggers workflows
+    cli/                       # inbound: CLI entry point (if applicable)
+    connectors/                # outbound: external resource integrations
+      openai/                  # LLM provider connector
+      azure-openai/            # alternative LLM provider connector
+      postgres/                # database connector (if applicable)
+      vector-store/            # vector DB connector (if applicable)
+  app/
+    workflows/
+      <workflow>/
+        graph.py               # StateGraph definition; entry point for the workflow
+        agents.py              # deepagents agent definitions used by this workflow
+        states.py              # Typed state dataclasses / TypedDicts
+        routes.py              # Conditional edge functions
+  shared/                      # infrastructure-agnostic utilities
+```
+- `app/workflows/<workflow>/graph.py` MUST define and compile the `StateGraph` and expose a `graph` object that callers invoke.
+- Tool calls within workflow nodes that interact with external systems MUST use connectors from `adapters/connectors/`, not inline API calls.
+- Additional modules (prompts, schemas) MAY be added inside `app/workflows/<workflow>/` when they are specific to that workflow. Shared utilities belong in `shared/`.
+#### 08-workflow-evals
+Eval folder structure and script requirements are defined in [agentme-edr-021](021-ai-eval-standards.md).
+#### 09-node-naming-conventions
+LangGraph node names MUST follow a suffix convention that communicates the node's role at a glance. Names MUST be action-oriented and descriptive.
+| Suffix | Node type | When to use |
+|---|---|---|
+| `_llm` | LLM call | Any node whose primary action is a direct LLM inference call (see [agentme-edr-018](018-ai-llm-development-standards.md)) |
+| `_step` | Algorithmic step | Deterministic logic with no LLM involvement (transformation, validation, routing) |
+| `_tool` | Tool/API call | A node that wraps a single external tool or API (e.g. a REST endpoint, DB query) |
+| `_agent` | Subgraph agent | A node that invokes a nested subgraph containing its own tool-invocation cycle and LLM calls; use the **deepagents** library for these nodes (see [agentme-edr-019](019-ai-agents-development-standards.md)) |
+The Python function implementing the node SHOULD share the same name as the node alias passed to `add_node`, so that graph definitions and stack traces remain unambiguous:
+```python
+def draft_doc_llm(state): ...
+graph.add_node("draft_doc_llm", draft_doc_llm)
+# Tool node — calls the Stripe API
+def stripe_api_tool(state): ...
+graph.add_node("stripe_api_tool", stripe_api_tool)
+# Agent node — uses deepagents for tool-invocation loop
+def code_reviewer_agent(state): ...
+graph.add_node("code_reviewer_agent", code_reviewer_agent)
+```
+Names MUST NOT use generic labels such as `node1`, `process`, or `run`. Each name must clearly express what action the node performs.
+#### 10-workflow-unit-testing
+All LLM calls within workflow nodes are external API calls and MUST be mocked in unit tests per [agentme-edr-018](018-ai-llm-development-standards.md) rule `04-unit-test-mocking`. Workflow unit tests must run fully offline with no real LLM provider calls.
+Choose the mock utility based on what the node under test expects from the model:
+- Use **`FakeListChatModel`** when nodes only read `AIMessage.content` (e.g. a routing node that checks a text label).
+- Use **`GenericFakeChatModel`** when any node in the workflow expects tool calls, structured outputs, or when the workflow contains `_agent` nodes that drive a tool-invocation loop.
+**Example — workflow with plain-text LLM nodes:**
+```python
+from langchain_core.language_models.fake_chat_models import FakeListChatModel
+# DocumentWorkflow and input_doc are project-specific and not shown here
+def test_document_workflow_approve_path():
+    # Responses consumed in node execution order
+    fake_model = FakeListChatModel(responses=["APPROVE", "Meets all criteria."])
+    workflow = DocumentWorkflow(model=fake_model)
+    result = workflow.run(input_doc)
+    assert result.status == "approved"
+```
+**Example — workflow containing an agent node (`_agent` suffix):**
+```python
+from langchain_core.language_models.fake_chat_models import GenericFakeChatModel
+from langchain_core.messages import AIMessage
+# DocumentWorkflow and input_doc are project-specific and not shown here
+def test_document_workflow_with_agent_node():
+    tool_call_msg = AIMessage(
+        content="",
+        tool_calls=[{"name": "fetch_context", "args": {"doc_id": "42"}, "id": "c1"}]
+    )
+    agent_final_msg = AIMessage(content="Context retrieved successfully.")
+    routing_msg = AIMessage(content="APPROVE")
+    fake_model = GenericFakeChatModel(
+        messages=iter([tool_call_msg, agent_final_msg, routing_msg])
+    )
+    workflow = DocumentWorkflow(model=fake_model)
+    result = workflow.run(input_doc)
+    assert result.status == "approved"
+```
+Workflows MUST accept the LLM instance as a constructor parameter so that unit tests can inject a fake. See the injectable LLM pattern in [agentme-edr-018](018-ai-llm-development-standards.md) rule `04-unit-test-mocking`.
+#### 11-state-type-conventions
+All TypedDict and dataclass types that represent LangGraph node or workflow state MUST end with `_state` in their name. This suffix signals at a glance that the type is a state boundary, not a plain data model.
+**Naming reference:**
+| Owner | Naming pattern | Example |
+|---|---|---|
+| Single agent / agent subgraph | `<agent_name>_agent_state` | `reviewer_agent_state` |
+| Full workflow (`StateGraph`) | `<workflow_name>_workflow_state` | `document_workflow_state` |
+| Named group of nodes sharing state | `<group_responsibility>_state` | `retrieval_pipeline_state` |
+**Boundary rules:**
+- Each agent or agent subgraph MUST define its own dedicated state type. Do not reuse or extend a generic state across unrelated agents.
+- Each workflow (`StateGraph`) MUST define its own top-level state type. The workflow state is the authoritative boundary for that graph's inputs and outputs.
+- When a group of nodes (not a full workflow and not a single agent) shares a state type, the type name MUST clearly reflect the shared responsibility. Generic names such as `shared_state`, `common_state`, or `global_state` are FORBIDDEN.
+- Large workflows MUST NOT use a single monolithic state that all nodes read and write. Split the state into per-phase or per-agent state types scoped to the subgraph or set of nodes that produce or consume each field.
+State type names SHOULD align with the agent or node names defined in rule `09-node-naming-conventions` (e.g., an agent node named `draft_doc_agent` has a state type named `draft_doc_agent_state`).
+#### 12-workflow-naming-conventions
+LangGraph `StateGraph` instances and their enclosing classes MUST be given a meaningful name that conveys the workflow's input, output, and/or behavior. The name MUST end with `Workflow` (PascalCase class) or `_workflow` (snake_case variable or directory).
+Choose a name that summarises what the workflow consumes, processes, and produces — avoid generic labels such as `Pipeline`, `Flow`, `Graph`, or `Process`.
+| Context | Pattern | Example |
+|---|---|---|
+| Python class | `<DescriptiveName>Workflow` | `FileMapJudgeReduceWorkflow` |
+| Python variable / instance | `<descriptive_name>_workflow` | `file_map_judge_reduce_workflow` |
+| Directory under `app/workflows/` | `<descriptive_name>_workflow` | `financial_report_analysis_workflow/` |
+**Good names** communicate purpose at a glance:
+- `FileMapJudgeReduceWorkflow` — maps files, judges each, then reduces results
+- `FinancialReportAnalysisWorkflow` — analyses financial report inputs
+- `MarketingCampaignExecutorWorkflow` — executes a marketing campaign end-to-end
+**Bad names** (FORBIDDEN): `MainWorkflow`, `AgentGraph`, `ProcessFlow`, `Workflow1`, `RunGraph`.
+#### 15-workflow-state-persistence
+For long-running workflows that may need to be paused and resumed:
+- Use LangGraph's built-in checkpointing with `MemorySaver` for development and testing.
+- Use persistent checkpointers (e.g., `PostgresSaver`, or Redis-based checkpointers) for production workflows that need durability.
+- Checkpoint state MUST be serializable (use TypedDict or dataclasses with JSON-compatible fields).
+- Document the checkpoint strategy in the workflow's README.md.
+**Example with MemorySaver (development):**
+```python
+from langgraph.checkpoint.memory import MemorySaver
+checkpointer = MemorySaver()
+graph = workflow.compile(checkpointer=checkpointer)
+# Invoke with a thread ID; a later call with the same thread ID resumes from the saved checkpoint
+result = graph.invoke(input_state, config={"configurable": {"thread_id": "session-123"}})
+```
+**When to use checkpointing:**
+- Workflows that take > 30 seconds to complete
+- Workflows that require human-in-the-loop approval or input
+- Workflows that are non-idempotent
+- Workflows that may fail mid-execution and need to be retried from the last successful node
+- Multi-session workflows where state persists across user interactions
+## References
+- [agentme-edr-018](018-ai-llm-development-standards.md) — LLM development standards: LangChain framework, provider configuration, LLM observability, and unit test mocking
+- [agentme-edr-019](019-ai-agents-development-standards.md) — Agent development standards: deepagents framework, tool-invocation loops, and agent patterns
+- [agentme-edr-026](026-pragmatic-hexagonal-architecture.md) — Adapter/application layer separation that defines the project layout
+- [agentme-edr-014](014-python-project-tooling.md) — Python project tooling and structure
+- [agentme-edr-024](024-ml-dataset-structure.md) — ML dataset structure for eval datasets
+- [agentme-edr-021](021-ai-eval-standards.md) — AI eval standards: folder structure, script requirements, and MLflow tracking
+- [agentme-edr-007](../principles/007-project-quality-standards.md) — Project quality standards including AI-tier testing requirements (rule `09-ai-project-testing-requirements`)
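Rule `06-verification-steps` in the policy above requires a dedicated verification node that routes back to generation on failure. The sketch below shows one way the node and its conditional edge function (the kind of function that would live in `routes.py`) might look in plain Python. The state fields, the 280-character acceptance criterion, the retry cap `MAX_ATTEMPTS`, and the node names `output_step` and `give_up_step` are assumptions made for illustration; none of them are defined by the policy.

```python
from typing import TypedDict


class document_workflow_state(TypedDict):
    """Workflow state; snake_case with a `_state` suffix per rule 11."""
    draft: str
    verification_passed: bool
    attempts: int


MAX_ATTEMPTS = 3  # assumption: the policy does not specify a retry cap


def verify_output_step(state: document_workflow_state) -> dict:
    """Deterministic check (hypothetical criterion): non-empty, at most 280 chars."""
    draft = state["draft"].strip()
    passed = 0 < len(draft) <= 280
    return {"verification_passed": passed, "attempts": state["attempts"] + 1}


def route_after_verify(state: document_workflow_state) -> str:
    """Conditional edge function: return the name of the next node."""
    if state["verification_passed"]:
        return "output_step"
    if state["attempts"] >= MAX_ATTEMPTS:
        # Hypothetical terminal node that surfaces the failure explicitly.
        return "give_up_step"
    # Route back to the generation node instead of passing through silently.
    return "draft_response_llm"


# Simulate one failed verification of an empty draft.
state: document_workflow_state = {"draft": "", "verification_passed": False, "attempts": 0}
state.update(verify_output_step(state))
next_node = route_after_verify(state)
```

In a LangGraph project, a function like `route_after_verify` would typically be registered with `add_conditional_edges` on the verification node. The retry cap is a design choice to avoid an unbounded generate/verify loop, not a requirement stated in the policy.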

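The naming rules in `09-node-naming-conventions`, `11-state-type-conventions`, and `12-workflow-naming-conventions` are mechanical enough to check automatically. The following is a minimal sketch of such a check, using only the suffixes and forbidden names listed in the policy. The function names and regular expressions are hypothetical, and rejecting a generic stem such as `process_step` is an assumed stricter reading of the "no generic labels" rule.

```python
import re

# Values taken from the policy text above.
NODE_SUFFIXES = ("_llm", "_step", "_tool", "_agent")
GENERIC_NODE_STEMS = {"node1", "process", "run"}
FORBIDDEN_STATE_NAMES = {"shared_state", "common_state", "global_state"}
FORBIDDEN_WORKFLOW_NAMES = {"MainWorkflow", "AgentGraph", "ProcessFlow", "Workflow1", "RunGraph"}

_SNAKE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")
_PASCAL = re.compile(r"([A-Z][a-z0-9]+)+")


def is_valid_node_name(name: str) -> bool:
    """Node names are snake_case, end with a role suffix, and have a non-generic stem."""
    if not _SNAKE.fullmatch(name):
        return False
    for suffix in NODE_SUFFIXES:
        if name.endswith(suffix):
            stem = name[: -len(suffix)]
            return bool(stem) and stem not in GENERIC_NODE_STEMS
    return False


def is_valid_state_name(name: str) -> bool:
    """State type names end with `_state` and are not one of the forbidden generic names."""
    return (
        bool(_SNAKE.fullmatch(name))
        and name.endswith("_state")
        and name not in FORBIDDEN_STATE_NAMES
    )


def is_valid_workflow_class(name: str) -> bool:
    """Workflow classes are PascalCase, end with `Workflow`, and are not forbidden."""
    return (
        bool(_PASCAL.fullmatch(name))
        and name.endswith("Workflow")
        and name != "Workflow"
        and name not in FORBIDDEN_WORKFLOW_NAMES
    )
```

A check like this could run as part of an offline `lint` subtarget, since it needs no credentials or external services.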
package/.xdrs/agentme/edrs/application/021-ai-eval-standards.md ADDED

@@ -0,0 +1,90 @@
+---
+name: agentme-edr-policy-021-ai-eval-standards
+description: Defines how to structure, write, and run eval tests for AI projects — folder layout, script requirements, and MLflow tracking. Use when implementing evals for LLM, Agent, or Workflow projects. For when evals are required see agentme-edr-007 rule 09-ai-project-testing-requirements.
+apply-to: Python AI projects (LLM, Agent, or Workflow tier) that implement eval testing
+valid-from: 2026-06-05
+---
+# agentme-edr-policy-021: AI eval standards
+## Context and Problem Statement
+Eval tests measure AI component accuracy against expected outputs using real LLM providers. Without a shared folder layout and script convention, eval setups diverge across LLM, Agent, and Workflow projects, making them hard to run, compare, and integrate into CI/CD pipelines.
+How should eval tests be structured and run across all AI tiers?
+## Decision Outcome
+**Use a per-component folder structure under `evals/` with a standardized Makefile interface and MLflow-backed scripts, applicable to LLM, Agent, and Workflow components.**
+For when evals are required per AI tier, see [agentme-edr-007](../principles/007-project-quality-standards.md) rule `09-ai-project-testing-requirements`.
+### Details
+#### 01-eval-folder-structure
+For each AI component being evaluated (an LLM chain, agent, or workflow), create a corresponding directory under `evals/` at the same level as `lib/` and `examples/`:
+```text
+evals/
+  <component>/
+    Makefile                  # eval targets for this component
+    dataset_<group>/          # one folder per eval group (see agentme-edr-024)
+    eval_<group>.py           # evaluation script for each group
+```
+Where `<component>` is the name of the LLM chain, agent, or workflow being evaluated (e.g., `summarizer`, `file_analyzer_agent`, `document_review_workflow`).
+The per-component `evals/<component>/Makefile` MUST define:
+| Target | Behaviour |
+|---|---|
+| `eval` | Runs all eval groups for the component |
+| `eval-<group>` | Runs one named group (e.g. `eval-simple`, `eval-complex`) |
+The module root Makefile MUST expose a `make eval` target that delegates to `eval` in every `evals/<component>/Makefile`:
+```makefile
+eval:
+	$(MAKE) -C evals/summarizer eval
+	$(MAKE) -C evals/document_review_workflow eval
+```
+#### 02-eval-script-requirements
+Each `eval_<group>.py` script MUST:
+- Load the dataset from `evals/<component>/dataset_<group>/` following [agentme-edr-024](024-ml-dataset-structure.md). For input/output pairs, use the JSONL format per `agentme-edr-024.04-complex-structured-datasets-must-use-jsonl`.
+- Run every input through the live component against **real LLM providers** (not mocked responses), to capture model drift.
+- Log per-sample and aggregate metrics to an MLflow experiment that runs **locally** — a remote MLflow server MUST NOT be required.
+- Compare outputs to expected values using project-defined quality thresholds. Thresholds MUST be declared explicitly (e.g., in a Makefile variable or README).
+- Exit with a non-zero status when any metric falls below its defined threshold, consistent with [agentme-edr-007](../principles/007-project-quality-standards.md) rule `07-statistical-models-must-have-eval-targets`.
+**Example:**
+```python
+import mlflow
+from my_package.app.workflows.document_review_workflow.graph import graph
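+EVAL_MIN_ACCURACY = 0.85  # quality threshold, declared explicitly
+# load_dataset is a project-defined helper (not shown here) that yields one dict per dataset sample
+with mlflow.start_run():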
+    results = []
+    for sample in load_dataset("evals/document_review_workflow/dataset_basic/"):
+        output = graph.invoke({"document": sample["input"]})
+        results.append(output["label"] == sample["expected_label"])
+    accuracy = sum(results) / len(results)
+    mlflow.log_metric("accuracy", accuracy)
+    if accuracy < EVAL_MIN_ACCURACY:
+        raise SystemExit(f"Eval failed: accuracy {accuracy:.2f} < {EVAL_MIN_ACCURACY}")
+```
+## References
+- [agentme-edr-007](../principles/007-project-quality-standards.md) — Project quality standards: when evals are required per AI tier (rule `09-ai-project-testing-requirements`) and statistical model eval targets (rule `07-statistical-models-must-have-eval-targets`)
+- [agentme-edr-018](018-ai-llm-development-standards.md) — LLM development standards: LangChain framework and observability
+- [agentme-edr-019](019-ai-agents-development-standards.md) — Agent development standards
+- [agentme-edr-020](020-ai-workflow-development-standards.md) — Workflow development standards
+- [agentme-edr-024](024-ml-dataset-structure.md) — ML dataset structure for eval datasets
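The eval example in rule `02-eval-script-requirements` calls a `load_dataset` helper that the policy does not define. The sketch below shows one possible shape for that helper together with the threshold gate, assuming each dataset folder holds one or more `*.jsonl` files with `input` and `expected_label` fields. The file name `samples.jsonl`, the `run_eval` function, and the stub predictor are hypothetical; a real eval would call the live component and log metrics to MLflow, which is omitted here to keep the sketch self-contained.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Iterable, Iterator


def load_dataset(folder: str) -> Iterator[dict]:
    """Yield one dict per non-empty line of every *.jsonl file in `folder`."""
    for path in sorted(Path(folder).glob("*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                if line.strip():
                    yield json.loads(line)


def run_eval(
    samples: Iterable[dict], predict: Callable[[str], str], min_accuracy: float
) -> tuple[float, bool]:
    """Return (accuracy, passed); an empty dataset counts as a failure."""
    results = [predict(s["input"]) == s["expected_label"] for s in samples]
    if not results:
        return 0.0, False
    accuracy = sum(results) / len(results)
    return accuracy, accuracy >= min_accuracy


# Demo with a throwaway dataset and a stand-in predictor.
with tempfile.TemporaryDirectory() as tmp:
    rows = [
        {"input": "invoice is complete", "expected_label": "APPROVE"},
        {"input": "", "expected_label": "REJECT"},
        {"input": "missing signature", "expected_label": "REJECT"},
        {"input": "all fields present", "expected_label": "APPROVE"},
    ]
    Path(tmp, "samples.jsonl").write_text(
        "\n".join(json.dumps(r) for r in rows), encoding="utf-8"
    )

    def stub_predict(text: str) -> str:
        # Naive stand-in: approves any non-empty input.
        return "APPROVE" if text else "REJECT"

    accuracy, passed = run_eval(load_dataset(tmp), stub_predict, min_accuracy=0.85)
```

In an actual `eval_<group>.py` script, a `passed` value of `False` would lead to a non-zero exit (for example via `raise SystemExit(...)`), matching the requirement above.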

package/.xdrs/agentme/edrs/application/{019-ml-dataset-structure.md → 024-ml-dataset-structure.md} RENAMED

@@ -1,11 +1,11 @@
 ---
-name: agentme-edr-policy-019-ml-dataset-structure
+name: agentme-edr-policy-024-ml-dataset-structure
 description: Defines the standard folder layout and file conventions for ML datasets used in AI/ML projects. Use when creating, organizing, or consuming datasets for machine learning tasks such as image labeling, document extraction, tabular data, LLM evaluation, and Q&A sets.
 apply-to: ML and AI projects that produce or consume datasets
 valid-from: 2026-05-27
 ---
-# agentme-edr-policy-019: ML dataset structure
+# agentme-edr-policy-024: ML dataset structure
 ## Context and Problem Statement

package/.xdrs/agentme/edrs/application/{020-ai-agent-xdrs-knowledge-layer.md → 025-ai-agent-xdrs-knowledge-layer.md} RENAMED

@@ -1,11 +1,11 @@
 ---
-name: agentme-edr-policy-020-ai-agent-xdrs-knowledge-layer
+name: agentme-edr-policy-025-ai-agent-xdrs-knowledge-layer
 description: Defines how to integrate XDRS as the runtime knowledge source of truth for AI agents — covering document placement, AGENTS.md setup, file tools, and local sandbox configuration. Apply only when the project explicitly uses XDRS to govern agent behavior.
 apply-to: AI agent projects that use XDRS as the source of truth for policies and skills
 valid-from: 2026-05-27
 ---
-# agentme-edr-policy-020: AI agent XDRS knowledge layer
+# agentme-edr-policy-025: AI agent XDRS knowledge layer
 ## Context and Problem Statement
@@ -17,7 +17,7 @@ How should an AI agent project integrate XDRS as its runtime source of truth for
 **Embed XDRS documents in `lib/data/.xdrs/`, instruct the agent to consult them via `AGENTS.md`, equip the agent with sandboxed file tools, and use the deepagents framework when a local sandbox is required.**
-This policy MUST only be applied when the project explicitly chooses XDRS as its knowledge governance layer. It is not required by [agentme-edr-018](018-ai-agent-development-standards.md) in general.
+This policy MUST only be applied when the project explicitly chooses XDRS as its knowledge governance layer. It is not required by [agentme-edr-019](019-ai-agents-development-standards.md) or [agentme-edr-020](020-ai-workflow-development-standards.md) in general.
 ### Details
@@ -91,7 +91,7 @@ data_root = str(files("myagent").joinpath("data"))
 agents_md = Path(temp_root) / "AGENTS.md"
 agents_md.write_text(_AGENTS_MD)  # content from xdrs-core AGENTS.md template; see rule 01-xdrs-knowledge-layer
-# Add these mounts alongside the base mounts from agentme-edr-018 rule 09-local-sandbox:
+# Add these mounts alongside the base mounts from agentme-edr-019 rule 02-local-sandbox:
 xdrs_mounts = [
     {"src": f"{data_root}/.xdrs", "dst": "/.xdrs",    "readonly": True},
     {"src": str(agents_md),       "dst": "/AGENTS.md", "readonly": True},

package/.xdrs/agentme/edrs/application/{021-pragmatic-hexagonal-architecture.md → 026-pragmatic-hexagonal-architecture.md} RENAMED

@@ -1,11 +1,11 @@
 ---
-name: agentme-edr-policy-021-pragmatic-hexagonal-architecture
+name: agentme-edr-policy-026-pragmatic-hexagonal-architecture
 description: Defines a pragmatic variant of Hexagonal Architecture for organizing application source code into Adapters (inbound/outbound I/O boundaries) and Application (business logic) layers, with explicit naming conventions and folder structure. Use when designing or reviewing the internal layout of application modules.
 apply-to: All application projects
 valid-from: 2026-05-28
 ---
-# agentme-edr-policy-021: Pragmatic hexagonal architecture
+# agentme-edr-policy-026: Pragmatic hexagonal architecture
 ## Context and Problem Statement

package/.xdrs/agentme/edrs/application/skills/001-create-javascript-project/SKILL.md CHANGED

@@ -15,12 +15,12 @@ compatibility: JavaScript/TypeScript, Node.js 18+
 Creates a complete JavaScript/TypeScript project from scratch. The layout keeps the
 package self-contained in its module root (`lib/`), organizes internal code following
-[agentme-edr-021](../../021-pragmatic-hexagonal-architecture.md) (`adapters/`, `app/`, `shared/`),
+[agentme-edr-026](../../026-pragmatic-hexagonal-architecture.md) (`adapters/`, `app/`, `shared/`),
 places runnable consumer examples in the sibling `examples/` folder, redirects persistent caches
 into `.cache/`, and uses Makefiles as the only entry points. Boilerplate is derived from the
 [filedist](https://github.com/flaviostutz/filedist) project.
-Related EDRs: [agentme-edr-003](../../003-javascript-project-tooling.md), [agentme-edr-016](../../../principles/016-cross-language-module-structure.md), [agentme-edr-021](../../021-pragmatic-hexagonal-architecture.md)
+Related EDRs: [agentme-edr-003](../../003-javascript-project-tooling.md), [agentme-edr-016](../../../principles/016-cross-language-module-structure.md), [agentme-edr-026](../../026-pragmatic-hexagonal-architecture.md)
 ## Instructions

package/.xdrs/agentme/edrs/application/skills/003-create-golang-project/SKILL.md CHANGED

@@ -12,9 +12,9 @@ compatibility: Go 1.21+
 ## Overview
-Creates a complete Go project from scratch, following the layout from [agentme-edr-010](../../010-golang-project-tooling.md) and [agentme-edr-021](../../021-pragmatic-hexagonal-architecture.md). Business logic lives in `app/<feature>/` packages; CLI wiring lives in `adapters/cli/`; outbound integrations live in `adapters/connectors/`; `main.go` is a thin dispatcher. The module root owns its `Makefile`, `README.md`, `dist/`, and `.cache/` folders.
+Creates a complete Go project from scratch, following the layout from [agentme-edr-010](../../010-golang-project-tooling.md) and [agentme-edr-026](../../026-pragmatic-hexagonal-architecture.md). Business logic lives in `app/<feature>/` packages; CLI wiring lives in `adapters/cli/`; outbound integrations live in `adapters/connectors/`; `main.go` is a thin dispatcher. The module root owns its `Makefile`, `README.md`, `dist/`, and `.cache/` folders.
-Related EDRs: [agentme-edr-010](../../010-golang-project-tooling.md), [agentme-edr-016](../../../principles/016-cross-language-module-structure.md), [agentme-edr-021](../../021-pragmatic-hexagonal-architecture.md)
+Related EDRs: [agentme-edr-010](../../010-golang-project-tooling.md), [agentme-edr-016](../../../principles/016-cross-language-module-structure.md), [agentme-edr-026](../../026-pragmatic-hexagonal-architecture.md)
 ## Instructions

package/.xdrs/agentme/edrs/application/skills/005-create-python-project/SKILL.md CHANGED

@@ -14,11 +14,11 @@ compatibility: Python 3.12+
 Creates a complete Python project from scratch using Mise, `uv`, `pyproject.toml`, Ruff,
 ty, Pytest, and Makefiles. The layout keeps the package self-contained under `lib/`,
-organizes internal code following [agentme-edr-021](../../021-pragmatic-hexagonal-architecture.md)
+organizes internal code following [agentme-edr-026](../../026-pragmatic-hexagonal-architecture.md)
 (`adapters/`, `app/`, `shared/`), uses a shared root `.venv/`, redirects persistent caches into
 `.cache/`, and places runnable consumer projects under the sibling `examples/` folder.
-Related EDRs: [agentme-edr-014](../../014-python-project-tooling.md), [agentme-edr-016](../../../principles/016-cross-language-module-structure.md), [agentme-edr-021](../../021-pragmatic-hexagonal-architecture.md)
+Related EDRs: [agentme-edr-014](../../014-python-project-tooling.md), [agentme-edr-016](../../../principles/016-cross-language-module-structure.md), [agentme-edr-026](../../026-pragmatic-hexagonal-architecture.md)
 ## Instructions
@@ -282,7 +282,7 @@ make test
 ### Phase 4: Create the package and tests inside `lib/`
-Create this baseline structure following [agentme-edr-021](../../021-pragmatic-hexagonal-architecture.md).
+Create this baseline structure following [agentme-edr-026](../../026-pragmatic-hexagonal-architecture.md).
 **`lib/src/[package_name]/__init__.py`**

package/.xdrs/agentme/edrs/devops/008-common-targets.md CHANGED

@@ -73,7 +73,7 @@ Targets are organized into five lifecycle groups. Projects must use these names
 | Target | Purpose |
 |--------|---------|
 | `setup` | Run `mise install` and any small project bootstrap needed before normal targets work. This is the first command after checkout. |
-| `all` | Alias that runs `build`, `lint`, and `test` in sequence. Must be the default target (i.e., running `make` or the runner with no arguments invokes `all`). Used by developers as a fast pre-push check to verify the software meets minimum quality standards in one command. |
+| `all` | Alias that runs `build`, `lint`, and `test` in sequence. Must be the default target (i.e., running `make` or the runner with no arguments invokes `all`). Used by developers as a fast pre-push check to verify the software meets minimum quality standards in one command. Must only invoke targets that run **offline** — no external credentials, running servers, paid APIs, or environment-specific configuration outside the repository. |
 | `clean` | Remove all temporary or generated files created during build, lint, or test (e.g., `node_modules`, virtual environments, compiled binaries, generated files). Used both locally and in CI for a clean slate. |
 | `dev` | Run the software locally for development (e.g., start a Node.js API server, open a Jupyter notebook, launch a React dev server). May have debugging tools, verbose logging, or hot reloading features enabled. |
 | `run` | Run the software in production mode (e.g., start a compiled binary, launch a production server). No debugging or development-only features should be enabled. |
@@ -93,13 +93,13 @@ Targets are organized into five lifecycle groups. Projects must use these names
 | Target | Purpose |
 |--------|---------|
-| `lint` | Run **all static quality checks** outside of tests. This MUST include: code formatting validation, code style enforcement, code smell detection, static analysis, dependency audits for known CVEs, security vulnerability scans (e.g., SAST), and project/configuration structure checks. All checks must be non-destructive (read-only); fixes are handled by `lint-fix`. |
+| `lint` | Run **all static quality checks** outside of tests. This MUST include: code formatting validation, code style enforcement, code smell detection, static analysis, dependency audits for known CVEs, security vulnerability scans (e.g., SAST), and project/configuration structure checks. All checks must be non-destructive (read-only); fixes are handled by `lint-fix`. Must only invoke subtargets that run **offline** (no external credentials or services). |
 | `lint-fix` | Automatically fix linting and formatting issues where possible. |
 | `lint-format` | *(Optional)* Check code formatting only (e.g., Prettier, gofmt, Black). |
 ##### Test group
 | Target | Purpose |
 |--------|---------|
-| `test` | Run **all tests** required for the project. This MUST include unit tests (with coverage enforcement — the build MUST fail if coverage thresholds are not met) and integration/end-to-end tests. Normally delegates to `test-unit` and `test-integration` in sequence. |
+| `test` | Run **all offline tests** required for the project. This MUST include unit tests (with coverage enforcement — the build MUST fail if coverage thresholds are not met) and any integration or end-to-end tests that run **offline** (no external servers, credentials, or paid APIs). Normally delegates to `test-unit` and, when offline, `test-integration` in sequence. Suffixed targets that require external dependencies must not be invoked automatically — see rule 08. |
 | `test-unit` | Run unit tests only, including coverage report generation and coverage threshold enforcement. |
 | `test-integration` | *(Optional)* Run integration and end-to-end tests only. Projects without integration tests may omit this target. |
 | `test-smoke` | *(Optional)* Run a fast, minimal subset of tests to verify the software is basically functional. Useful as a post-deploy health check. |
@@ -150,6 +150,28 @@ The prefix convention ensures developers can infer the purpose of any target wit
 ---
+#### 09-ai-project-dev-targets
+AI-based projects (LLM, Agent, and Workflow tiers as defined in [agentme-edr-018](../application/018-ai-llm-development-standards.md)) MUST expose a `dev-mlflow` target that starts a local MLflow tracking server for development inspection.
+**Example implementation:**
+```makefile
+dev-mlflow:
+	@echo "MLflow UI: http://localhost:5000/"
+	mise exec -- mlflow ui --host 0.0.0.0 --port 5000
+```
+---
+#### 08-default-targets-must-only-include-offline-subtargets
+`make all`, `make test`, and `make lint` must include every subtarget that runs **offline** — meaning it requires no external credentials, no running servers, no paid APIs, and no environment-specific configuration outside the repository.
+Subtargets that require external dependencies (e.g., `test-integration` against a live database, `test-e2e` against a staging environment, `lint-api` against a remote schema registry) **must** exist as named targets so developers can invoke them explicitly, but **must not** be invoked from `all`, `test`, or `lint`.
+---
 #### 06-monorepo-usage
 In a monorepo, each module has its own `Makefile` with its own `build`, `lint`, `test`, and `deploy` targets scoped to that module. Parent-level Makefiles (at the application or repo root) delegate to child Makefiles in sequence. The parent Makefile should call `$(MAKE) -C <child> <target>` directly, while each child `Makefile` runs its actual tool commands through `mise exec --`.
@@ -194,6 +216,9 @@ make lint-fix
 # run the software in dev mode (may have hot reload, debug tools enabled, verbose logging etc)
 make dev
+# [AI projects only] start a local MLflow tracking server for development inspection
+make dev-mlflow
 # run the software in production mode
 make run