PyPI - agentmark-sdk - Versions diffs - 0.2.1__tar.gz → 0.3.0__tar.gz - Mend

agentmark-sdk 0.2.1__tar.gz → 0.3.0__tar.gz

Files changed (27)

{agentmark_sdk-0.2.1 → agentmark_sdk-0.3.0}/.gitignore RENAMED

@@ -9,7 +9,9 @@ yalc.lock
 .env
 *storybook.log
 storybook-static
-tmp-dev*/
+# CLI test fixtures (test/dev.test.ts et al. create tmp-* dirs in packages/cli;
+# crashed runs can leave them behind)
+tmp-*/
 .claude
 # Nx
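
The widened ignore pattern can be illustrated with a small sketch. It uses Python's `fnmatch` as an approximation of gitignore globbing (gitignore has extra rules, such as the trailing `/` restricting the match to directories), and the directory names are hypothetical examples, not names taken from the repository.

```python
from fnmatch import fnmatch

# Hypothetical leftover directory names, for illustration only.
leftover_dirs = ["tmp-dev-1a2b", "tmp-init-9f", "tmpfile"]

# Old pattern: only directories whose names start with "tmp-dev".
old_matches = [d for d in leftover_dirs if fnmatch(d, "tmp-dev*")]

# New pattern: any directory whose name starts with "tmp-".
new_matches = [d for d in leftover_dirs if fnmatch(d, "tmp-*")]

print(old_matches)
print(new_matches)
```

Under this approximation the new pattern is a superset of the old one, while names without the hyphen are still not matched.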

{agentmark_sdk-0.2.1 → agentmark_sdk-0.3.0}/CHANGELOG.md RENAMED

@@ -1,3 +1,46 @@
+## 0.3.0 (2026-06-09)
+### 🚀 Features
+- feat(webhook): the runner owns dispatch; evals reach the cloud on every path ([#717](https://github.com/agentmark-ai/agentmark/pull/717))
+  The New Experiment dialog showed *"No evals available"* for deployed apps even
+  when they registered evals. Root cause: no single object owned "what the
+  deployed app exposes," so the eval registry had to travel a hand-assembled chain
+  (client → executor → runner → handler → dispatch → transport) that every entry
+  path re-wired — and any path could drop it. The Python managed handler hand-rolled
+  dispatch and 400'd on `get-evals`; the TS managed server forwarded the dispatch
+  envelope raw; the BYO `createWebhookRunner` built a client with no `evals` input
+  at all. This makes the chain non-assemblable.
+  - **Dispatch lives on the runner.** `WebhookRunner.dispatch(event)` (TS + Python)
+    routes prompt-run / dataset-run / get-evals, sourcing evals from its OWN
+    client — no passable, omittable client argument. The canonical managed handler
+    is `handler = runner.dispatch` (or `adapterHandler.dispatch`). `runner.client`
+    / `getEvalNames()` are public so a runner satisfies the control-plane contract.
+  - **`evals` is threaded through every builder.** TS `createWebhookRunner({ evals })`
+    and the new Python `create_webhook_runner(executor, evals=…)` register evals
+    once → they both run in experiments and list in the dialog. Adapter factories
+    already threaded evals; now the BYO path does too.
+  - **Adapters delegate, don't reimplement.** Pydantic / claude / ai-sdk-v4 / v5
+    webhook handlers expose `.dispatch` + `.client` by delegating to the shared
+    runner (both span hooks bundled at construction); no per-adapter dispatch code.
+  - **Anti-drift.** `conformance-vectors/protocol-catalog.json` gains a normative
+    `webhookJobs` section; both languages assert their REAL dispatch's job-type set
+    (`WEBHOOK_JOB_TYPES` / `WebhookRequest['type']`) is exhaustive over it, and the
+    get-evals payload stays pinned to `control-plane.json` on the dev AND managed
+    surfaces. Adding a job to one language without the other fails the other's CI.
+  New public API (minor) across prompt-core (TS + Python), the SDK
+  (`createWebhookRunner` `evals` option), and the adapters (`dispatch`/`client`).
+  Back-compat: `handleWebhookRequest(event, handler, client?)` still works; the
+  managed servers still accept legacy flat results. The managed Node server now
+  unwraps the dispatch envelope (the TS half of the empty dialog) — see
+  `apps/builder` machine-execute-contract test (monorepo, not released here).
 ## 0.2.1 (2026-04-16)
 ### 🩹 Fixes
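
The "dispatch lives on the runner" idea from the 0.3.0 entry can be sketched roughly as follows. This is a minimal illustration, not the AgentMark implementation: `SketchClient`, `SketchRunner`, `CATALOG_WEBHOOK_JOBS`, and the `{"status", "body"}` response shape are all hypothetical names and shapes assumed for the example. Only the three job types and the `handler = runner.dispatch` pattern come from the changelog.

```python
from __future__ import annotations

import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical stand-in for the catalog's normative job list.
CATALOG_WEBHOOK_JOBS = frozenset({"prompt-run", "dataset-run", "get-evals"})

Route = Callable[[dict[str, Any]], Awaitable[dict[str, Any]]]


class SketchClient:
    """Hypothetical client that owns the eval registry."""

    def __init__(self, evals: dict[str, Any] | None = None) -> None:
        self._evals = dict(evals or {})

    def get_eval_names(self) -> list[str]:
        return sorted(self._evals)


class SketchRunner:
    """Hypothetical runner: dispatch reads evals from its own client."""

    def __init__(self, client: SketchClient) -> None:
        self.client = client
        self._routes: dict[str, Route] = {
            "prompt-run": self._prompt_run,
            "dataset-run": self._dataset_run,
            "get-evals": self._get_evals,
        }

    async def dispatch(self, event: dict[str, Any]) -> dict[str, Any]:
        job_type = event.get("type")
        route = self._routes.get(job_type) if isinstance(job_type, str) else None
        if route is None:
            return {"status": 400, "error": f"unknown job type: {job_type!r}"}
        return {"status": 200, "body": await route(event)}

    async def _prompt_run(self, event: dict[str, Any]) -> dict[str, Any]:
        return {"ran": event.get("prompt")}  # placeholder for a real prompt run

    async def _dataset_run(self, event: dict[str, Any]) -> dict[str, Any]:
        return {"dataset": event.get("dataset")}  # placeholder for a real experiment

    async def _get_evals(self, event: dict[str, Any]) -> dict[str, Any]:
        # No client argument to forget: the registry comes from self.client.
        return {"evals": self.client.get_eval_names()}


runner = SketchRunner(SketchClient(evals={"accuracy": object(), "tone": object()}))
handler = runner.dispatch  # the managed handler is just the bound method

# Anti-drift check in the spirit of the changelog: routes must cover the catalog.
assert set(runner._routes) == set(CATALOG_WEBHOOK_JOBS)

print(asyncio.run(handler({"type": "get-evals"})))
```

Because the handler is a bound method, there is no separate client parameter that an entry path could omit, which is the failure mode the changelog describes.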

{agentmark_sdk-0.2.1 → agentmark_sdk-0.3.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: agentmark-sdk
-Version: 0.2.1
+Version: 0.3.0
 Summary: AgentMark SDK for Python - Tracing and Observability
 Author-email: AgentMark <support@agentmark.co>
 License: MIT

{agentmark_sdk-0.2.1 → agentmark_sdk-0.3.0}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [project]
 name = "agentmark-sdk"
-version = "0.2.1"
+version = "0.3.0"
 description = "AgentMark SDK for Python - Tracing and Observability"
 requires-python = ">=3.10"
 keywords = [

{agentmark_sdk-0.2.1 → agentmark_sdk-0.3.0}/src/agentmark_sdk/__init__.py RENAMED

@@ -46,6 +46,7 @@ from .pii_masker import CustomPattern, PiiMaskerConfig, create_pii_masker
 from .sampler import AgentmarkSampler
 from .sdk import AgentMarkSDK
 from .serialize import serialize_value
+from .span_hooks import create_agentmark_span_hooks
 from .trace import SpanContext, SpanOptions, SpanResult, span, span_context, span_context_sync
 __all__ = [
@@ -57,6 +58,7 @@ __all__ = [
     "span",
     "span_context",
     "span_context_sync",
+    "create_agentmark_span_hooks",
     "observe",
     "SpanOptions",
     "SpanContext",

agentmark_sdk-0.3.0/src/agentmark_sdk/span_hooks.py ADDED

@@ -0,0 +1,80 @@
+"""AgentMark span hooks for the shared ``WebhookRunner``.
+
+``create_agentmark_span_hooks()`` is the Python counterpart of the TypeScript
+``createAgentmarkSpanHooks()`` (``@agentmark-ai/sdk``): the one call that wires a
+runner so every prompt run and every experiment item is traced to AgentMark. A
+bring-your-own-SDK app passes it to ``create_webhook_runner`` (which also
+defaults to it when this SDK is installed), so Python BYO tracing is as
+turn-key as TypeScript.
+
+The hooks map the runner's per-call params (``ExperimentItemParams`` /
+``PromptSpanParams`` from ``agentmark.prompt_core``, duck-typed here so this SDK
+stays prompt-core-free) onto ``span_context``. They are intentionally identical
+to the per-adapter hooks the pydantic / claude adapters define inline today —
+those should adopt this single source in a follow-up so the mapping lives once.
+"""
+
+from __future__ import annotations
+
+import json
+from contextlib import asynccontextmanager, suppress
+from dataclasses import dataclass
+from typing import Any
+
+from .trace import SpanOptions, span_context
+
+
+@dataclass
+class _AgentMarkSpanCtx:
+    """Adapts a span context to the hook Protocol the shared runner expects:
+    ``trace_id`` + ``set_attribute(key, value)``."""
+
+    _inner: Any
+    trace_id: str = ""
+
+    def set_attribute(self, key: str, value: str) -> None:
+        with suppress(Exception):
+            self._inner.set_attribute(key, value)
+
+
+@asynccontextmanager
+async def _item_span(params: Any) -> Any:
+    """Per-item experiment span: maps the runner's params → SpanOptions."""
+    dataset_expected = (
+        json.dumps(params.dataset_expected_output)
+        if params.dataset_expected_output is not None
+        else None
+    )
+    dataset_input_json = (
+        json.dumps(params.dataset_input, default=str)
+        if params.dataset_input is not None
+        else None
+    )
+    options = SpanOptions(
+        name=f"experiment-{params.dataset_run_name}-{params.index}",
+        prompt_name=params.prompt_name,
+        dataset_run_id=params.experiment_run_id,
+        dataset_run_name=params.dataset_run_name,
+        dataset_item_name=params.dataset_item_name,
+        dataset_expected_output=dataset_expected,
+        dataset_input=dataset_input_json,
+        dataset_path=params.dataset_path,
+        metadata={"commit_sha": params.commit_sha} if params.commit_sha else None,
+    )
+    async with span_context(options) as ctx:
+        yield _AgentMarkSpanCtx(_inner=ctx, trace_id=ctx.trace_id)
+
+
+@asynccontextmanager
+async def _prompt_span(params: Any) -> Any:
+    """Prompt-level span for a single run."""
+    options = SpanOptions(name=params.name, prompt_name=params.prompt_name)
+    async with span_context(options) as ctx:
+        yield _AgentMarkSpanCtx(_inner=ctx, trace_id=ctx.trace_id)
+
+
+def create_agentmark_span_hooks() -> dict[str, Any]:
+    """Return ``{"prompt_span_hook", "item_span_hook"}`` for a ``WebhookRunner`` —
+    every run and experiment item traced to AgentMark. Mirrors the TS
+    ``createAgentmarkSpanHooks()``."""
+    return {"prompt_span_hook": _prompt_span, "item_span_hook": _item_span}
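
How a runner might consume hooks of this shape can be sketched as below. The sketch is self-contained and does not import `agentmark_sdk`. `stand_in_span` and `run_prompt_with_span` are hypothetical helpers written for this illustration, and the "run" is a placeholder string. In real use the `hooks` dict would presumably come from `create_agentmark_span_hooks()`.

```python
from __future__ import annotations

import asyncio
from contextlib import asynccontextmanager
from types import SimpleNamespace
from typing import Any, AsyncIterator


@asynccontextmanager
async def stand_in_span(params: Any) -> AsyncIterator[Any]:
    """Stand-in hook with the same duck-typed shape: trace_id + set_attribute."""
    attributes: dict[str, str] = {}
    yield SimpleNamespace(
        trace_id=f"trace-{params.prompt_name}",
        attributes=attributes,
        set_attribute=attributes.__setitem__,
    )


async def run_prompt_with_span(hooks: dict[str, Any], params: Any) -> dict[str, Any]:
    """Hypothetical runner step: wrap one prompt run in the prompt span hook."""
    async with hooks["prompt_span_hook"](params) as ctx:
        output = f"hello from {params.prompt_name}"  # placeholder for the real run
        ctx.set_attribute("output_length", str(len(output)))
        return {"trace_id": ctx.trace_id, "output": output}


# Same two keys as the dict returned by create_agentmark_span_hooks().
hooks = {"prompt_span_hook": stand_in_span, "item_span_hook": stand_in_span}
params = SimpleNamespace(name="run", prompt_name="greeting")

result = asyncio.run(run_prompt_with_span(hooks, params))
print(result)
```

Because the hooks only read attributes off `params`, a plain namespace is enough to drive them, which is the same duck-typing the module docstring relies on to stay free of a prompt-core dependency.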

agentmark_sdk-0.3.0/tests/test_span_hooks.py ADDED

@@ -0,0 +1,46 @@
+"""create_agentmark_span_hooks — the shared WebhookRunner span hooks.
+
+Python counterpart of the TS ``createAgentmarkSpanHooks()``: the one call that
+makes ``create_webhook_runner`` (and the adapters) trace every run and
+experiment item. The hooks take the runner's per-call params duck-typed, so a
+plain namespace stands in for ``PromptSpanParams`` / ``ExperimentItemParams``.
+"""
+
+from __future__ import annotations
+
+from types import SimpleNamespace
+
+from agentmark_sdk import create_agentmark_span_hooks
+
+
+def test_returns_both_runner_hooks() -> None:
+    hooks = create_agentmark_span_hooks()
+    assert set(hooks) == {"prompt_span_hook", "item_span_hook"}
+
+
+async def test_prompt_span_yields_an_annotatable_ctx() -> None:
+    hooks = create_agentmark_span_hooks()
+    params = SimpleNamespace(name="run", prompt_name="greeting")
+    async with hooks["prompt_span_hook"](params) as ctx:
+        assert hasattr(ctx, "trace_id")
+        ctx.set_attribute("k", "v")  # must not raise
+
+
+async def test_item_span_maps_dataset_params_without_error() -> None:
+    # Exercises the dataset-field mapping (json.dumps of expected/input + the
+    # commit_sha metadata branch), the part most likely to drift.
+    hooks = create_agentmark_span_hooks()
+    params = SimpleNamespace(
+        dataset_run_name="exp",
+        index=0,
+        prompt_name="greeting",
+        experiment_run_id="run-1",
+        dataset_item_name="item-0",
+        dataset_expected_output={"ok": True},
+        dataset_input={"q": "hi"},
+        dataset_path="data/x.jsonl",
+        commit_sha="abc",
+    )
+    async with hooks["item_span_hook"](params) as ctx:
+        assert hasattr(ctx, "trace_id")
+        ctx.set_attribute("k", "v")
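
The dataset-field mapping that this test exercises serializes the two fields differently: `_item_span` calls `json.dumps(..., default=str)` for `dataset_input` but plain `json.dumps(...)` for `dataset_expected_output`. The sketch below shows what that difference means for a value that is not JSON-native. The payload is a hypothetical example, and the source does not say whether the asymmetry is intentional.

```python
import json
from datetime import date

# Hypothetical dataset payload containing a value json cannot encode natively.
payload = {"q": "hi", "asked_on": date(2026, 6, 9)}

# With default=str (as used for dataset_input), the date is stringified.
encoded_input = json.dumps(payload, default=str)

# Without default (as used for dataset_expected_output), the same value raises.
try:
    json.dumps(payload)
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

print(encoded_input)
print(error_name)
```

The test above passes only JSON-native values (`{"ok": True}`, `{"q": "hi"}`), so it would not surface this difference.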

{agentmark_sdk-0.2.1 → agentmark_sdk-0.3.0}/package.json RENAMED

File without changes

{agentmark_sdk-0.2.1 → agentmark_sdk-0.3.0}/src/agentmark_sdk/config.py RENAMED

File without changes

{agentmark_sdk-0.2.1 → agentmark_sdk-0.3.0}/src/agentmark_sdk/decorator.py RENAMED

File without changes

{agentmark_sdk-0.2.1 → agentmark_sdk-0.3.0}/src/agentmark_sdk/masking_processor.py RENAMED

File without changes

{agentmark_sdk-0.2.1 → agentmark_sdk-0.3.0}/src/agentmark_sdk/otlp_json_exporter.py RENAMED

File without changes

{agentmark_sdk-0.2.1 → agentmark_sdk-0.3.0}/src/agentmark_sdk/pii_masker.py RENAMED

File without changes

{agentmark_sdk-0.2.1 → agentmark_sdk-0.3.0}/src/agentmark_sdk/py.typed RENAMED

File without changes

{agentmark_sdk-0.2.1 → agentmark_sdk-0.3.0}/src/agentmark_sdk/sampler.py RENAMED

File without changes

{agentmark_sdk-0.2.1 → agentmark_sdk-0.3.0}/src/agentmark_sdk/sdk.py RENAMED

File without changes

{agentmark_sdk-0.2.1 → agentmark_sdk-0.3.0}/src/agentmark_sdk/serialize.py RENAMED

File without changes

{agentmark_sdk-0.2.1 → agentmark_sdk-0.3.0}/src/agentmark_sdk/trace.py RENAMED

File without changes

{agentmark_sdk-0.2.1 → agentmark_sdk-0.3.0}/tests/__init__.py RENAMED

File without changes

{agentmark_sdk-0.2.1 → agentmark_sdk-0.3.0}/tests/test_decorator.py RENAMED

File without changes

{agentmark_sdk-0.2.1 → agentmark_sdk-0.3.0}/tests/test_masking_processor.py RENAMED

File without changes

{agentmark_sdk-0.2.1 → agentmark_sdk-0.3.0}/tests/test_otlp_json_exporter.py RENAMED

File without changes

{agentmark_sdk-0.2.1 → agentmark_sdk-0.3.0}/tests/test_pii_masker.py RENAMED

File without changes

{agentmark_sdk-0.2.1 → agentmark_sdk-0.3.0}/tests/test_sampler.py RENAMED

File without changes

{agentmark_sdk-0.2.1 → agentmark_sdk-0.3.0}/tests/test_sdk.py RENAMED

File without changes

{agentmark_sdk-0.2.1 → agentmark_sdk-0.3.0}/tests/test_serialize.py RENAMED

File without changes

{agentmark_sdk-0.2.1 → agentmark_sdk-0.3.0}/tests/test_trace.py RENAMED
