PyPI - agentops-accelerator - Versions diffs - 0.4.4__tar.gz → 0.5.0__tar.gz - Mend

agentops-accelerator 0.4.4__tar.gz → 0.5.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (324)

{agentops_accelerator-0.4.4 → agentops_accelerator-0.5.0}/CHANGELOG.md RENAMED

@@ -5,6 +5,44 @@ This format follows [Keep a Changelog](https://keepachangelog.com/) and adheres
 ## [Unreleased]
+### Added
+- **Grey-box retrieval capture for HTTP JSON targets.** An HTTP target can now
+  capture extra named fields from a JSON response via a `response_fields` map
+  (`name -> dot-path`). Captured values are exposed to evaluator `input_mapping`
+  as `$response.<name>` (for example `$response.context`,
+  `$response.retrieved_documents`), and dataset columns can be referenced with
+  `$row.<name>` (for example `$row.qrels`). This lets RAG evaluators such as
+  Groundedness, Retrieval, and Document Retrieval score the retrieval actually
+  used at eval time instead of static dataset context. The primary prediction
+  (`response_field`) and single-field behavior are unchanged when
+  `response_fields` is not set.
+## [0.4.5] - 2026-06-19
+### Added
+- **Governance gates for HTTP agents (ASSERT and Red Team).** `agentops assert
+  run` and `agentops redteam run` now work against a live HTTP orchestrator
+  endpoint, not only model/deployment targets. Red Team wraps the HTTP endpoint
+  as an SDK-compatible target and reuses the AgentOps HTTP mapping
+  (`request_field`, `response_mode`, `stream`, custom headers). ASSERT resolves
+  `assert-ai` inside the active virtual environment, accepts non-secret values
+  from `assert.env`, can request an AAD token from the Azure CLI for local
+  auth-disabled Azure AI resources, injects the GPT-5 `max_completion_tokens`
+  shim only when configured, and materializes a runtime ASSERT config so
+  committed configs no longer need absolute artifact paths.
+- **Generated workflows run the ASSERT and Red Team gates.** `agentops workflow
+  generate` now installs the optional ASSERT/Red Team dependencies, runs those
+  gates when `assert:` or `redteam:` is present in `agentops.yaml`, uploads
+  their artifacts, and emits the corrected Red Team command quoting.
+### Fixed
+- **Reasoning-model judges no longer fail the eval gate in CI.** The generated
+  GitHub Actions and Azure DevOps eval and Red Team steps now forward
+  `AZURE_OPENAI_MODEL_NAME`, so AgentOps detects reasoning models (such as
+  `gpt-5-nano`) and uses `max_completion_tokens` instead of `max_tokens`. This
+  removes the judge `400` error that could break the eval gate when the judge
+  deployment is a reasoning model.
 ## [0.4.4] - 2026-06-18
 ### Added
@@ -56,6 +94,22 @@ This format follows [Keep a Changelog](https://keepachangelog.com/) and adheres
   `eval.yaml`, so users can see why those evaluators were chosen.
   ([#323](https://github.com/Azure/agentops/issues/323))
+## [0.4.2] - 2026-06-17
+### Fixed
+- **`agentops eval init` now works with both old and new `azure.ai.agents` azd
+  extensions.** Version 0.1.40 of the extension renamed the eval subcommand from
+  `azd ai agent eval init` to `azd ai agent eval generate`, which made
+  `agentops eval init` hard-fail with `Command "init" is deprecated, use 'azd ai
+  agent eval generate' instead`. AgentOps now invokes `generate` first and
+  transparently falls back to the legacy `init` subcommand when an older
+  extension does not recognise `generate`. The fallback only triggers on
+  subcommand-name/deprecation errors; genuine failures (authentication, project
+  endpoint, timeouts) are still surfaced immediately and unchanged. All
+  previously passed flags (`--project-endpoint`, `--agent`,
+  `--gen-instruction-file`, `--eval-model`, `--dataset`, `--evaluator`) and the
+  recipe discovery/persistence behaviour are preserved.
 ## [0.4.1] - 2026-06-15
 ### Changed
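
The 0.4.2 entry above describes a "try `generate`, fall back to `init` only on subcommand-name errors" policy. A minimal sketch of that policy follows. The `Runner` callable, the function name, and the error markers are hypothetical; the matching rules AgentOps actually uses are not shown in this diff.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical runner: takes an argv list, returns (exit_code, stderr).
Runner = Callable[[Sequence[str]], Tuple[int, str]]

# Assumed markers for "this subcommand name is not accepted".
_NAME_ERROR_MARKERS = ("unknown command", "is deprecated")


def run_eval_subcommand(run: Runner, flags: Sequence[str]) -> str:
    """Try `generate` first; fall back to `init` only on name errors."""
    base = ["azd", "ai", "agent", "eval"]
    code, stderr = run([*base, "generate", *flags])
    if code == 0:
        return "generate"
    if any(marker in stderr.lower() for marker in _NAME_ERROR_MARKERS):
        # Older extension: retry with the legacy subcommand, same flags.
        code, stderr = run([*base, "init", *flags])
        if code == 0:
            return "init"
    # Genuine failures (auth, endpoint, timeouts) surface unchanged.
    raise RuntimeError(stderr.strip() or f"azd exited with code {code}")
```

Injecting the runner keeps the policy testable without `azd` installed; a fake runner that rejects `generate` is enough to exercise the fallback path.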

{agentops_accelerator-0.4.4/src/agentops_accelerator.egg-info → agentops_accelerator-0.5.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: agentops-accelerator
-Version: 0.4.4
+Version: 0.5.0
 Summary: Release readiness gates and evidence for Microsoft Foundry agents
 License: MIT License

{agentops_accelerator-0.4.4 → agentops_accelerator-0.5.0}/docs/bundles.md RENAMED

@@ -96,6 +96,8 @@ metadata:
 | `$prediction` | Model or agent response |
 | `$expected` | Ground truth / expected answer from the dataset row |
 | `$context` | Retrieved context documents from the dataset row |
+| `$response.<name>` | A field captured from the live HTTP JSON response via the target's `response_fields` map (e.g. `$response.context`, `$response.retrieved_documents`). Missing captures are skipped. |
+| `$row.<name>` | An arbitrary column from the dataset row (e.g. `$row.qrels` for Document Retrieval ground truth). Missing columns are skipped. |
 | `$tool_calls` | Tool calls returned by the agent |
 | `$tool_definitions` | Tool definitions from the dataset row |
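
To make the two new token families concrete, here is a small sketch of how an `input_mapping` could be resolved into evaluator keyword arguments. It follows the table's stated semantics (missing captures and missing columns are skipped). The `FIXED_TOKENS` subset and the function name are assumptions for illustration, not the package's own API.

```python
from typing import Any, Dict, Optional

# Illustrative subset of the fixed tokens; the real table in AgentOps may differ.
FIXED_TOKENS = {"$prediction": "response", "$expected": "expected", "$context": "context"}


def resolve_mapping(
    mapping: Dict[str, Any],
    row: Dict[str, Any],
    response: str,
    response_fields: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Turn an evaluator input_mapping into keyword arguments."""
    captured = response_fields or {}
    merged = {**row, "response": response}
    resolved: Dict[str, Any] = {}
    for kwarg, token in mapping.items():
        if not isinstance(token, str) or not token.startswith("$"):
            resolved[kwarg] = token  # literal value, passed through unchanged
            continue
        if token.startswith("$response."):
            value = captured.get(token[len("$response."):])
        elif token.startswith("$row."):
            value = row.get(token[len("$row."):])
        elif token in FIXED_TOKENS:
            value = merged.get(FIXED_TOKENS[token])
        else:
            raise ValueError(f"unknown evaluator placeholder {token!r}")
        if value is not None:  # missing captures and columns are skipped
            resolved[kwarg] = value
    return resolved
```

Under this reading, a `$response.<name>` token resolves only when `<name>` is a key of the target's `response_fields` map; a name that was never captured silently drops the keyword argument rather than raising.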

{agentops_accelerator-0.4.4 → agentops_accelerator-0.5.0}/docs/how-it-works.md RENAMED

@@ -325,6 +325,7 @@ That's a complete config. AgentOps:
 | `thresholds` | no | Metric gates such as `">=3"` or `"<=10"`. |
 | `protocol` | no | URL protocol: `responses`, `invocations`, or `http-json`. |
 | `request_field` / `response_field` / `tool_calls_field` | no | Request/response JSON keys or dot-paths. |
+| `response_fields` | no | Map of `name -> dot-path` capturing extra fields from a JSON response. Each captured value is exposed to evaluator `input_mapping` as `$response.<name>`. Only used when `response_mode` is `json`. |
 | `headers` | no | Static HTTP headers (dict). |
 | `auth_header_env` | no | Env var name holding a Bearer token. |
 | `evaluators` | no | Escape-hatch list of evaluator names that overrides auto-selection. |
@@ -379,6 +380,35 @@ response_field: text              # dot-path; default is "text"
 auth_header_env: APP_API_TOKEN    # value used as Bearer token
 ```
+**HTTP-deployed agent with grey-box retrieval capture (RAG evaluators):**
+When the endpoint can return its retrieval alongside the answer (for example a
+JSON body `{"answer": ..., "context": ..., "retrieved_documents": [...]}`),
+capture the extra fields with `response_fields` and reference them in evaluator
+`input_mapping` via `$response.<name>`. This scores the retrieval actually used
+at eval time instead of static dataset context.
+```yaml
+version: 1
+agent: https://my-aca-app.eastus2.azurecontainerapps.io/orchestrator
+dataset: .agentops/data/qa.jsonl
+response_mode: json
+request_field: ask
+response_field: answer             # primary prediction (dot-path)
+response_fields:                   # extra fields captured per row
+  context: context
+  retrieved_documents: retrieved_documents
+bundle:
+  evaluators:
+    - name: groundedness
+      config:
+        kind: builtin
+        class_name: GroundednessEvaluator
+        input_mapping:
+          response: $response.answer
+          context: $response.context
+```
 **Raw model deployment:**
 ```yaml

{agentops_accelerator-0.4.4 → agentops_accelerator-0.5.0}/plugins/agentops/skills/agentops-governance/SKILL.md RENAMED

@@ -206,6 +206,115 @@ assert-ai init
 It walks them through behavior description, target callable / model /
 endpoint, dimensions, and judge presets, and writes a validated YAML.
+### HTTP orchestrator ASSERT
+If `agentops.yaml` uses `protocol: http-json` or the user says the target is an
+HTTP orchestrator, do not use ASSERT native endpoint mode. `assert-ai 0.1.0`
+posts `message/history` and expects `response`; AgentOps HTTP targets may use
+custom fields like `ask` and streamed text. Scaffold a callable adapter instead.
+Create `.agentops/assert_http_adapter.py`:
+```python
+from __future__ import annotations
+import json
+from pathlib import Path
+from typing import Any
+from agentops.core.config_loader import load_agentops_config
+from agentops.pipeline.invocations import (
+    _aggregate_stream,
+    _dot_path,
+    _http_request_json,
+    _http_request_stream,
+)
+def target(message: str, history: list[dict[str, Any]] | None = None) -> str:
+    del history
+    config = load_agentops_config(Path("agentops.yaml"))
+    if not config.agent:
+        raise RuntimeError("agentops.yaml must define a top-level HTTP agent endpoint")
+    request_field = config.request_field or "message"
+    headers = dict(config.headers)
+    headers.setdefault("Content-Type", "application/json")
+    body = {request_field: message}
+    if config.response_mode in ("sse", "text"):
+        raw_body = _http_request_stream(
+            method="POST",
+            url=config.agent,
+            headers=headers,
+            body=body,
+            timeout=120,
+        )
+        return _aggregate_stream(config.response_mode, raw_body, config.stream).strip()
+    payload = _http_request_json(
+        method="POST",
+        url=config.agent,
+        headers=headers,
+        body=body,
+        timeout=120,
+    )
+    response_path = config.response_field or "text"
+    response_text = _dot_path(payload, response_path)
+    if response_text is None and isinstance(payload, dict):
+        for fallback in ("response", "output", "content", "message", "text"):
+            response_text = payload.get(fallback)
+            if response_text:
+                break
+    return (
+        response_text
+        if isinstance(response_text, str)
+        else json.dumps(response_text or "", ensure_ascii=False)
+    )
+```
+Create an ASSERT smoke from a known-good eval dataset row, not a random general
+question. For the HTTP tutorial, use:
+```yaml
+suite: gpt-rag-http-smoke
+run: local-http-contract-smoke
+default_model:
+  name: azure/chat
+pipeline:
+  systematize:
+    enabled: false
+  test_set:
+    enabled: false
+  inference:
+    test_set_path: test_set.jsonl
+    target:
+      callable: assert_http_adapter:target
+    max_turns: 1
+  judge:
+    taxonomy_path: taxonomy.json
+    preset:
+      - grounding
+```
+Append this `assert:` block to `agentops.yaml`. Discover `AZURE_API_BASE` from
+the Azure AI/OpenAI resource and set `AZURE_API_VERSION` to the version used by
+the deployment. These are not secrets. If local auth is disabled, AgentOps will
+use the signed-in Azure CLI token for the ASSERT subprocess.
+```yaml
+assert:
+  config: ./assert/eval_config.yaml
+  fail_on_violations: true
+  env:
+    AZURE_API_BASE: https://<azure-ai-resource>.cognitiveservices.azure.com/
+    AZURE_API_VERSION: 2024-12-01-preview
+    AGENTOPS_ASSERT_AZURE_MAX_COMPLETION_TOKENS: "true"
+    PYTHONPATH: .agentops
+```
 **3. Append the `assert:` block to `agentops.yaml`** (preserve every existing
 key — read the file, append the block if missing, write back):

{agentops_accelerator-0.4.4 → agentops_accelerator-0.5.0}/src/agentops/cli/app.py RENAMED

@@ -14,13 +14,16 @@ from html import escape as html_escape
 from pathlib import Path
 from textwrap import wrap
 from collections.abc import Sequence
-from typing import Annotated, Any, Optional
+from typing import Annotated, Any, Optional, TYPE_CHECKING
 import typer
 from agentops.utils.colors import style
 from agentops.utils.logging import get_logger, setup_logging
+if TYPE_CHECKING:
+    from agentops.core.agentops_config import AgentOpsConfig
 app = typer.Typer(
     name="agentops",
     help="AgentOps - standardized evaluation workflows for AI projects.",
@@ -1574,11 +1577,13 @@ def cmd_init(
     from agentops.services.setup_wizard import (
         AGENT_TITLE,
         DATASET_TITLE,
+        ENDPOINT_SOURCE_AZD_RESOURCE_DISCOVERY,
         PROJECT_ENDPOINT_TITLE,
         REQUIRED_CONFIGURATION_MESSAGE,
         WizardAnswers,
         apply_answers,
         discover_defaults,
+        is_placeholder_agent,
         run_wizard,
         validate_agent,
         validate_dataset,
@@ -1763,12 +1768,14 @@ def cmd_init(
         force_prompt_fields = {"agent", "dataset"} if config_seeded_this_run else set()
         prompt_values = [
             defaults.project_endpoint,
-            defaults.agent,
+            None if is_placeholder_agent(defaults.agent) else defaults.agent,
             defaults.dataset,
         ]
-        will_prompt = reconfigure or bool(force_prompt_fields) or any(
-            v is None or not str(v).strip()
-            for v in prompt_values
+        will_prompt = (
+            reconfigure
+            or bool(force_prompt_fields)
+            or any(v is None or not str(v).strip() for v in prompt_values)
+            or defaults.project_endpoint_source == ENDPOINT_SOURCE_AZD_RESOURCE_DISCOVERY
         )
         if will_prompt:
             typer.echo(style("Press Enter to accept the value in brackets.", "dim"))
@@ -1817,6 +1824,7 @@ def cmd_init(
             workspace,
             prompt=_prompt,
             echo=_wizard_echo,
+            defaults=defaults,
             on_answer=_on_answer,
             reconfigure=reconfigure,
             force_prompt_fields=force_prompt_fields,
@@ -2121,8 +2129,13 @@ def cmd_eval_init(
     if _maybe_explain_leaf(("eval", "init"), explain):
         return
+    from agentops.core.config_loader import load_agentops_config
     from agentops.pipeline.azd_runner import AzdBackendError
-    from agentops.services.azd_eval_init import run_azd_eval_init
+    from agentops.services.azd_eval_init import (
+        ensure_local_evaluator_model_env,
+        recommend_evaluators_for_config,
+        run_azd_eval_init,
+    )
     workspace = directory.resolve()
     config_path = _resolve_eval_config_path(config)
@@ -2130,8 +2143,48 @@ def cmd_eval_init(
         config_path = workspace / config_path
     try:
+        loaded_config = load_agentops_config(config_path)
+        target = loaded_config.resolved_target()
+        if target.kind not in {"foundry_prompt", "foundry_hosted"}:
+            selection = recommend_evaluators_for_config(
+                config_path=config_path,
+                dataset=dataset,
+            )
+            typer.echo(
+                f"{_cli_label('AgentOps eval init')}: local HTTP/model target detected; "
+                "azd eval assets are not required."
+            )
+            typer.echo(f"{_cli_label('Evaluator recommendation')}: {selection.source}")
+            for signal in selection.signals:
+                typer.echo(f" {style('-', 'dim')} {signal}")
+            if selection.names:
+                typer.echo(f"{_cli_label('Evaluators')}: {', '.join(selection.names)}")
+            model_env = ensure_local_evaluator_model_env(
+                workspace=workspace,
+                selection=selection,
+            )
+            if model_env.configured:
+                action = "configured" if model_env.changed_keys else "using"
+                typer.echo(
+                    f"{_cli_label('Evaluator model')}: {action} "
+                    f"{model_env.deployment} ({model_env.model})"
+                )
+                if model_env.changed_keys and model_env.env_path is not None:
+                    typer.echo(
+                        f" {style('-', 'dim')} saved "
+                        f"{', '.join(model_env.changed_keys)} to "
+                        f"{_cli_path(model_env.env_path)}"
+                    )
+            elif selection.names and model_env.source != "not needed":
+                typer.echo(
+                    f"{_cli_warn('Warning')}: could not auto-discover an evaluator "
+                    "model deployment. Set AZURE_OPENAI_DEPLOYMENT and "
+                    "AZURE_OPENAI_MODEL_NAME before `agentops eval run`."
+                )
+            typer.echo(f"{_cli_label('Next')}: {_cli_command('agentops eval run')}")
+            return
         typer.echo(
-            f"{_cli_label('azd eval init')}: checking/generating eval.yaml "
+            f"{_cli_label('azd eval generate')}: checking/generating eval.yaml "
             "(this can take a few minutes on the first run)"
         )
         result = run_azd_eval_init(
@@ -2148,9 +2201,9 @@ def cmd_eval_init(
         raise typer.Exit(code=1) from exc
     if result.command_ran:
-        typer.echo(f"{_cli_label('azd eval init')}: completed")
+        typer.echo(f"{_cli_label('azd eval generate')}: completed")
     else:
-        typer.echo(f"{_cli_label('azd eval init')}: existing recipe reused")
+        typer.echo(f"{_cli_label('azd eval generate')}: existing recipe reused")
     if result.evaluators:
         typer.echo(f"{_cli_label('Evaluator recommendation')}: {result.evaluator_source}")
         for signal in result.evaluator_signals:
@@ -2346,6 +2399,17 @@ def cmd_assert_run(
             ),
         ),
     ] = False,
+    cached: Annotated[
+        bool,
+        typer.Option(
+            "--cached",
+            help=(
+                "Reuse cached inference/judge rows from a previous run with the "
+                "same run id. By default ASSERT re-runs inference against the live "
+                "target each time so the gate always exercises the current agent."
+            ),
+        ),
+    ] = False,
     explain: Annotated[str | None, typer.Argument(hidden=True)] = None,
 ) -> None:
     """Invoke the ASSERT (assert-ai) CLI and normalize its results."""
@@ -2403,6 +2467,7 @@ def cmd_assert_run(
     resolved_suite: str | None = suite
     resolved_run_id: str | None = run_id
     fail_on_violations = True
+    subprocess_env: dict[str, str] | None = None
     if cfg.assert_run is not None:
         if eval_config_path is None:
@@ -2414,6 +2479,7 @@ def cmd_assert_run(
         if resolved_run_id is None:
             resolved_run_id = cfg.assert_run.run_id
         fail_on_violations = cfg.assert_run.fail_on_violations
+        subprocess_env = dict(cfg.assert_run.env)
     if no_gate:
         fail_on_violations = False
@@ -2428,6 +2494,12 @@ def cmd_assert_run(
         typer.echo(
             f"  suite={resolved_suite or '<auto>'} run_id={resolved_run_id or '<auto>'}"
         )
+    if cached:
+        typer.echo("  cache: reusing prior inference/judge rows when available")
+    else:
+        typer.echo("  cache: forcing fresh inference against the live target")
+    assert_extra_args = None if cached else ["--force-stage", "inference"]
     try:
         result = run_assert(
@@ -2436,6 +2508,8 @@ def cmd_assert_run(
             results_dir=resolved_results_dir,
             suite=resolved_suite,
             run_id=resolved_run_id,
+            env=subprocess_env,
+            extra_args=assert_extra_args,
         )
     except AssertRunnerError as exc:
         typer.echo(f"{_cli_error('Error')}: {exc}", err=True)
@@ -2471,9 +2545,15 @@ def cmd_assert_run(
             violations = bucket.get("violations", 0)
             total = bucket.get("total", 0)
             skipped = bucket.get("skipped", 0)
-            marker = _cli_ok("OK") if violations == 0 else _cli_error("VIOLATIONS")
             suffix = f" (skipped={skipped})" if skipped else ""
-            typer.echo(f"  {name}: {violations}/{total}{suffix} {marker}")
+            if violations == 0:
+                clean = max(total - skipped, 0)
+                typer.echo(f"  {name}: {clean}/{total} clean{suffix} {_cli_ok('OK')}")
+            else:
+                typer.echo(
+                    f"  {name}: {violations}/{total} violating{suffix} "
+                    f"{_cli_error('VIOLATIONS')}"
+                )
     typer.echo("")
     typer.echo(_cli_heading("Inspect details"))
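
The hunk above changes what the per-dimension summary line counts. The old line printed `violations/total` even for a clean bucket, so a passing dimension read as `0/10`. A standalone sketch of the new formatting, with the colour helpers replaced by plain text (the function name is hypothetical):

```python
from typing import Any, Dict


def format_bucket(name: str, bucket: Dict[str, Any]) -> str:
    """Render one ASSERT summary line in the 0.5.0 style."""
    violations = bucket.get("violations", 0)
    total = bucket.get("total", 0)
    skipped = bucket.get("skipped", 0)
    suffix = f" (skipped={skipped})" if skipped else ""
    if violations == 0:
        # Clean buckets report how many rows passed, excluding skipped rows.
        clean = max(total - skipped, 0)
        return f"  {name}: {clean}/{total} clean{suffix} OK"
    return f"  {name}: {violations}/{total} violating{suffix} VIOLATIONS"


print(format_bucket("grounding", {"violations": 0, "total": 10, "skipped": 2}))
print(format_bucket("safety", {"violations": 3, "total": 10}))
```

The two calls print `  grounding: 8/10 clean (skipped=2) OK` and `  safety: 3/10 violating VIOLATIONS`.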
@@ -2666,6 +2746,7 @@ def cmd_redteam_run(
                 err=True,
             )
             raise typer.Exit(code=1)
+    _apply_http_redteam_defaults(resolved_target, cfg)
     if output_path is not None and not output_path.is_absolute():
         output_path = (workspace / output_path).resolve()
@@ -2786,6 +2867,21 @@ def _derive_redteam_target_from_agent(agent: str | None) -> dict[str, Any]:
     return {"agent": agent}
+def _apply_http_redteam_defaults(target: dict[str, Any], cfg: AgentOpsConfig) -> None:
+    if "endpoint" not in target:
+        return
+    if cfg.request_field:
+        target.setdefault("request_field", cfg.request_field)
+    if cfg.response_field:
+        target.setdefault("response_field", cfg.response_field)
+    if cfg.response_mode:
+        target.setdefault("response_mode", cfg.response_mode)
+    if cfg.headers:
+        target.setdefault("headers", cfg.headers)
+    if cfg.stream:
+        target.setdefault("stream", cfg.stream.model_dump(exclude_none=True))
 def _run_flat_schema_eval(
     *,
     config_path: Path,
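
`_apply_http_redteam_defaults` relies on `dict.setdefault`, so values set explicitly on the Red Team target win and the top-level HTTP mapping only fills gaps. A reduced sketch of that precedence, using a plain dict in place of `AgentOpsConfig` (the function name and the example URL are hypothetical):

```python
from typing import Any, Dict

_HTTP_KEYS = ("request_field", "response_field", "response_mode", "headers", "stream")


def apply_http_defaults(target: Dict[str, Any], cfg: Dict[str, Any]) -> None:
    """Fill missing HTTP mapping keys on an endpoint target, in place."""
    if "endpoint" not in target:
        return  # model/deployment targets are left untouched
    for key in _HTTP_KEYS:
        value = cfg.get(key)
        if value:
            # setdefault keeps any value the target already defines.
            target.setdefault(key, value)


target = {"endpoint": "https://example.invalid/orchestrator", "request_field": "prompt"}
apply_http_defaults(target, {"request_field": "ask", "response_mode": "sse"})
print(target["request_field"], target["response_mode"])
```

This prints `prompt sse`: the target's own `request_field` survives, while `response_mode` is inherited from the top-level config.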

{agentops_accelerator-0.4.4 → agentops_accelerator-0.5.0}/src/agentops/core/agentops_config.py RENAMED

@@ -462,6 +462,14 @@ class AssertRunConfig(BaseModel):
             "results without gating the pipeline."
         ),
     )
+    env: Dict[str, str] = Field(
+        default_factory=dict,
+        description=(
+            "Optional non-secret environment variables passed only to the "
+            "assert-ai subprocess, for example AZURE_API_BASE or "
+            "AZURE_API_VERSION."
+        ),
+    )
     model_config = ConfigDict(extra="forbid")
@@ -750,6 +758,18 @@ class AgentOpsConfig(BaseModel):
     request_field: Optional[str] = None
     response_field: Optional[str] = None
     tool_calls_field: Optional[str] = None
+    response_fields: Dict[str, str] = Field(
+        default_factory=dict,
+        description=(
+            "Extra named fields to capture from an http-json response, mapping "
+            "a name to a dot-path into the JSON body (e.g. {context: context, "
+            "retrieved_documents: retrieved_documents}). Each captured value is "
+            "exposed to evaluator input_mapping via the '$response.<name>' "
+            "token, so RAG evaluators can score the live retrieved context "
+            "returned by the same call. Only used when response_mode is "
+            "'json'. The primary answer still comes from response_field."
+        ),
+    )
     headers: Dict[str, str] = Field(default_factory=dict)
     auth_header_env: Optional[str] = None
     response_mode: ResponseMode = Field(
@@ -935,6 +955,7 @@ class AgentOpsConfig(BaseModel):
             self.request_field
             or self.response_field
             or self.tool_calls_field
+            or self.response_fields
             or self.headers
             or self.auth_header_env
             or self.response_mode != "json"

{agentops_accelerator-0.4.4 → agentops_accelerator-0.5.0}/src/agentops/pipeline/invocations.py RENAMED

@@ -658,10 +658,17 @@ def _invoke_http_json(
         if isinstance(extracted, list):
             tool_calls = extracted
+    captured: Dict[str, Any] = {}
+    for name, path in (config.response_fields or {}).items():
+        value = _dot_path(payload, path)
+        if value is not None:
+            captured[name] = value
     return InvocationResult(
         response=response_text.strip(),
         latency_seconds=elapsed,
         tool_calls=tool_calls,
+        metadata={"response_fields": captured} if captured else {},
     )
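
The capture loop above depends on `_dot_path`, whose body is not part of this diff. The sketch below pairs the same loop with a hypothetical stand-in, `dot_path`, that walks nested dicts only; the real helper may also handle list indices or other cases.

```python
from typing import Any, Dict, Optional


def dot_path(payload: Any, path: str) -> Optional[Any]:
    """Hypothetical stand-in for `_dot_path`: walk nested dicts by key."""
    current = payload
    for part in path.split("."):
        if isinstance(current, dict) and part in current:
            current = current[part]
        else:
            return None
    return current


def capture_fields(payload: Any, response_fields: Optional[Dict[str, str]]) -> Dict[str, Any]:
    """Collect named extra fields; paths that resolve to None are dropped."""
    captured: Dict[str, Any] = {}
    for name, path in (response_fields or {}).items():
        value = dot_path(payload, path)
        if value is not None:
            captured[name] = value
    return captured


payload = {
    "answer": "Paris",
    "context": "Paris is the capital of France.",
    "retrieval": {"documents": [{"id": "d1"}]},
}
fields = {
    "context": "context",
    "retrieved_documents": "retrieval.documents",
    "scores": "retrieval.scores",  # absent in the payload, so not captured
}
print(capture_fields(payload, fields))
```

Because unresolved paths are dropped instead of stored as `None`, downstream code can treat "key present" as "value was actually returned by the endpoint".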

{agentops_accelerator-0.4.4 → agentops_accelerator-0.5.0}/src/agentops/pipeline/orchestrator.py RENAMED

@@ -760,6 +760,7 @@ def _evaluate_row(
         )
         metrics: List[RowMetric] = []
+        captured_fields = invocation.metadata.get("response_fields") or {}
         for evaluator in evaluators:
             metric = runtime.run_evaluator(
                 evaluator,
@@ -767,6 +768,7 @@ def _evaluate_row(
                 response=invocation.response,
                 latency_seconds=invocation.latency_seconds,
                 actual_tool_calls=invocation.tool_calls,
+                response_fields=captured_fields,
             )
             metrics.append(metric)
@@ -819,7 +821,7 @@ def _evaluate_row(
         input=str(row.get("input", "")),
         expected=row.get("expected"),
         response=invocation.response,
-        context=row.get("context"),
+        context=captured_fields.get("context", row.get("context")),
         latency_seconds=invocation.latency_seconds,
         tool_calls=invocation.tool_calls,
         metrics=metrics,
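
The last hunk changes which context is recorded on the row result: a capture named exactly `context` takes precedence, and the dataset column is used only when no such capture exists. A tiny sketch of that rule (the helper name is hypothetical):

```python
from typing import Any, Dict, Optional


def effective_context(row: Dict[str, Any], captured: Dict[str, Any]) -> Optional[Any]:
    """Prefer the live captured context; fall back to the dataset row."""
    return captured.get("context", row.get("context"))


row = {"input": "q", "context": "static dataset passage"}
print(effective_context(row, {"context": "live passage"}))  # live passage
print(effective_context(row, {"retrieved_documents": []}))  # static dataset passage
print(effective_context({"input": "q"}, {}))                # None
```

Captures under any other name, such as `retrieved_documents`, do not affect the recorded context.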

{agentops_accelerator-0.4.4 → agentops_accelerator-0.5.0}/src/agentops/pipeline/publisher.py RENAMED

@@ -110,12 +110,18 @@ def _build_instance_rows(result: RunResult) -> List[Dict[str, Any]]:
     for row in result.rows:
         payload: Dict[str, Any] = {
             "line_number": row.row_index,
-            "input": row.input,
-            "response": row.response,
-            "ground_truth": row.expected or "",
+            "inputs.input": row.input,
+            "inputs.response": row.response,
+            "inputs.ground_truth": row.expected or "",
         }
         for metric in row.metrics:
             if metric.value is not None:
-                payload[metric.name] = metric.value
+                payload[f"outputs.{metric.name}.score"] = metric.value
+                if not metric.name.endswith("_latency_seconds"):
+                    payload[f"metric.{metric.name}"] = metric.value
+            if metric.reason:
+                payload[f"outputs.{metric.name}.reason"] = metric.reason
+            if metric.error:
+                payload[f"outputs.{metric.name}.error"] = metric.error
         rows.append(payload)
     return rows
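
The publisher now emits namespaced keys (`inputs.*`, `outputs.<metric>.*`, `metric.<metric>`) instead of bare ones. The sketch below reproduces the key layout for a single row so the resulting shape is visible. `Metric` is a hypothetical stand-in for `RowMetric`, with only the attributes this hunk reads.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Metric:
    """Hypothetical stand-in for RowMetric."""
    name: str
    value: Optional[float] = None
    reason: Optional[str] = None
    error: Optional[str] = None


def build_payload(
    row_index: int,
    input_text: str,
    response: str,
    expected: Optional[str],
    metrics: List[Metric],
) -> Dict[str, Any]:
    payload: Dict[str, Any] = {
        "line_number": row_index,
        "inputs.input": input_text,
        "inputs.response": response,
        "inputs.ground_truth": expected or "",
    }
    for metric in metrics:
        if metric.value is not None:
            payload[f"outputs.{metric.name}.score"] = metric.value
            # Latency metrics get a score but no `metric.` entry.
            if not metric.name.endswith("_latency_seconds"):
                payload[f"metric.{metric.name}"] = metric.value
        if metric.reason:
            payload[f"outputs.{metric.name}.reason"] = metric.reason
        if metric.error:
            payload[f"outputs.{metric.name}.error"] = metric.error
    return payload
```

Note that `reason` and `error` are written even when `value` is `None`, so a failed evaluator still leaves a trace in the published row.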

{agentops_accelerator-0.4.4 → agentops_accelerator-0.5.0}/src/agentops/pipeline/runtime.py RENAMED

@@ -67,6 +67,10 @@ def _credential() -> Any:
 _REASONING_MODEL_PREFIXES = ("gpt-5", "o1", "o3", "o4")
+def _evaluator_model_name() -> Optional[str]:
+    return os.getenv("AZURE_OPENAI_MODEL_NAME") or os.getenv("AZURE_AI_MODEL_NAME")
 def _model_config() -> Dict[str, Any]:
     from agentops.utils.azure_endpoints import (
         derive_openai_endpoint_from_project,
@@ -166,7 +170,9 @@ def load_evaluator(preset: EvaluatorPreset) -> EvaluatorRuntime:
     if preset.class_name in _AI_ASSISTED:
         model_config = _model_config()
         init_kwargs["model_config"] = model_config
-        if _is_reasoning_model_deployment(model_config.get("azure_deployment")):
+        if _is_reasoning_model_deployment(
+            _evaluator_model_name() or model_config.get("azure_deployment")
+        ):
             init_kwargs["is_reasoning_model"] = True
     if preset.class_name in _SAFETY:
         init_kwargs["azure_ai_project"] = _project_endpoint()
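
Deployment names are chosen by the user, so a deployment called `judge-prod` says nothing about the model behind it. That is why the check above now prefers an explicit model name from the environment. The sketch below illustrates the lookup order. The body of `_is_reasoning_model_deployment` is not shown in this diff, so the case-insensitive prefix match here is an assumption, and `token_limit_kwarg` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

REASONING_MODEL_PREFIXES = ("gpt-5", "o1", "o3", "o4")


def evaluator_model_name(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Explicit model name, if one of the two env vars is set and non-empty."""
    env = os.environ if env is None else env
    return env.get("AZURE_OPENAI_MODEL_NAME") or env.get("AZURE_AI_MODEL_NAME")


def is_reasoning_model(name: Optional[str]) -> bool:
    # Assumption: a case-insensitive prefix match on the model name.
    return bool(name) and name.lower().startswith(REASONING_MODEL_PREFIXES)


def token_limit_kwarg(deployment: Optional[str], env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the token-limit parameter name for the judge call."""
    name = evaluator_model_name(env) or deployment
    return "max_completion_tokens" if is_reasoning_model(name) else "max_tokens"


print(token_limit_kwarg("judge-prod", {"AZURE_OPENAI_MODEL_NAME": "gpt-5-nano"}))
print(token_limit_kwarg("judge-prod", {}))
```

The first call prints `max_completion_tokens` and the second prints `max_tokens`, which matches the CI scenario in the changelog: without the forwarded variable, an opaque deployment name hides the reasoning model.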
@@ -292,13 +298,33 @@ def _resolve_kwargs(
     *,
     row: Dict[str, Any],
     response: str,
+    response_fields: Optional[Dict[str, Any]] = None,
 ) -> Dict[str, Any]:
     resolved: Dict[str, Any] = {}
     merged = {**row, "response": response, "input": row.get("input")}
+    captured = response_fields or {}
     for kwarg, placeholder in mapping.items():
         if not isinstance(placeholder, str) or not placeholder.startswith("$"):
             resolved[kwarg] = placeholder
             continue
+        if placeholder.startswith("$response."):
+            # Live multi-field capture from an http-json target, e.g.
+            # '$response.context' resolves to the context the endpoint
+            # returned alongside the answer on the same call.
+            name = placeholder[len("$response."):]
+            value = captured.get(name)
+            if value is not None:
+                resolved[kwarg] = value
+            continue
+        if placeholder.startswith("$row."):
+            # Arbitrary dataset column, e.g. '$row.qrels' for Document
+            # Retrieval ground-truth labels that the fixed token set does
+            # not name explicitly.
+            name = placeholder[len("$row."):]
+            value = row.get(name)
+            if value is not None:
+                resolved[kwarg] = value
+            continue
         source_key = _PLACEHOLDERS.get(placeholder)
         if source_key is None:
             raise ValueError(f"unknown evaluator placeholder {placeholder!r}")
@@ -353,6 +379,7 @@ def run_evaluator(
     response: str,
     latency_seconds: float,
     actual_tool_calls: Optional[List[Any]] = None,
+    response_fields: Optional[Dict[str, Any]] = None,
 ) -> RowMetric:
     """Execute one evaluator on one row. Captures errors so the run continues."""
     preset = runtime.preset
@@ -383,7 +410,12 @@ def run_evaluator(
             )
     try:
-        kwargs = _resolve_kwargs(preset.input_mapping, row=row, response=response)
+        kwargs = _resolve_kwargs(
+            preset.input_mapping,
+            row=row,
+            response=response,
+            response_fields=response_fields,
+        )
         if preset.needs_conversation:
             # Prefer the actual calls made by the agent during invocation;
             # fall back to the dataset's expected calls if the runner did
