PyPI - agentops-accelerator - Versions diffs - 0.3.4__tar.gz → 0.3.6__tar.gz - Mend

agentops-accelerator 0.3.4__tar.gz → 0.3.6__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (299)

{agentops_accelerator-0.3.4 → agentops_accelerator-0.3.6}/.claude-plugin/marketplace.json RENAMED

@@ -13,7 +13,7 @@
       "name": "agentops-accelerator",
       "source": "../../plugins/agentops",
       "description": "Copilot agent skills for running standardized evaluation workflows with AgentOps Toolkit and Microsoft Foundry agents.",
-      "version": "0.3.4",
+      "version": "0.3.6",
       "keywords": [
         "agentops",
         "evaluation",

{agentops_accelerator-0.3.4 → agentops_accelerator-0.3.6}/.github/plugin/marketplace.json RENAMED

@@ -13,7 +13,7 @@
       "name": "agentops-accelerator",
       "source": "../../plugins/agentops",
       "description": "Copilot agent skills for running standardized evaluation workflows with AgentOps Toolkit and Microsoft Foundry agents.",
-      "version": "0.3.4",
+      "version": "0.3.6",
       "keywords": [
         "agentops",
         "evaluation",

{agentops_accelerator-0.3.4 → agentops_accelerator-0.3.6}/CHANGELOG.md RENAMED

@@ -5,6 +5,56 @@ This format follows [Keep a Changelog](https://keepachangelog.com/) and adheres
 ## [Unreleased]
+## [0.3.6] - 2026-06-01
+### Changed
+- **`agentops eval run` now distinguishes a grader *execution* failure from a
+  quality-gate failure.** When evaluator workers error out (auth/RBAC/timeout)
+  on enough grader calls that no row has every grader return a score, so
+  `items_passed_all` is `0` and the run reports `Threshold status: FAILED` even
+  though every threshold that *could* be computed passed. The CLI now detects
+  this case (errored graders combined with all thresholds passing) and prints a
+  `Warning` explaining that this is an execution error, not a quality
+  regression, names the most common cause (data-plane RBAC granted moments
+  earlier that is still propagating to the evaluator workers), surfaces the
+  first underlying grader error, and advises waiting a few minutes before
+  re-running. The exit-code contract is unchanged. Added the
+  `_grader_error_summary` helper plus focused unit tests.
+- **Corrected the RBAC propagation guidance in the tutorials and the
+  `agentops-eval` skill.** Data-plane role assignments on Cognitive Services
+  accounts can take several minutes (not 30-120 seconds) to reach the
+  independent, per-row evaluator workers, which can produce an *intermittent*
+  `FAILED` with otherwise-green thresholds on the first run after granting
+  access. The prompt-agent, hosted-agent, and end-to-end tutorials and the
+  skill now describe this symptom and tell readers to wait and re-run rather
+  than lower thresholds.
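The 0.3.6 entry rests on one counting step: a row only contributes to `items_passed_all` when every grader on it returned a score, while each threshold is checked against whatever scores were computed. A minimal Python sketch of that arithmetic, under stated assumptions: `Metric`, `items_passed_all`, and `thresholds_passed` are hypothetical names, and aggregating by mean and requiring each row score to meet its threshold are assumptions, not AgentOps' documented implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Metric:
    """Hypothetical per-row grader result (a stand-in, not AgentOps' RowMetric)."""

    name: str
    value: Optional[float]
    error: Optional[str] = None


def items_passed_all(rows: list[list[Metric]], minimum: dict[str, float]) -> int:
    """Count rows where every grader scored and met its minimum (assumed rule)."""
    return sum(
        1
        for metrics in rows
        if all(m.value is not None and m.value >= minimum[m.name] for m in metrics)
    )


def thresholds_passed(rows: list[list[Metric]], minimum: dict[str, float]) -> dict[str, bool]:
    """Check each threshold against the mean of the scores that were computed (assumed)."""
    results: dict[str, bool] = {}
    for name, floor in minimum.items():
        scores = [
            m.value
            for metrics in rows
            for m in metrics
            if m.name == name and m.value is not None
        ]
        if scores:  # a metric with no scores at all cannot be evaluated
            results[name] = sum(scores) / len(scores) >= floor
    return results


# Each row loses a different grader to an auth error, so no row is complete.
rows = [
    [Metric("coherence", 5.0), Metric("similarity", None, "AuthenticationError")],
    [Metric("coherence", None, "AuthenticationError"), Metric("similarity", 4.0)],
]
minimum = {"coherence": 3.0, "similarity": 3.0}
print(items_passed_all(rows, minimum))   # 0
print(thresholds_passed(rows, minimum))  # {'coherence': True, 'similarity': True}
```

Every threshold passes while no row passes, which is the pattern the new CLI warning is meant to explain.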
+## [0.3.5] - 2026-06-01
+### Changed
+- **`agentops-eval` coding-agent skill now preflights the data-plane RBAC
+  assignment that the Foundry portal does not grant by default.** Creating a
+  Foundry project through the portal only grants the user `Foundry User`
+  at the *project* scope, which does not cover
+  `Microsoft.CognitiveServices/accounts/OpenAI/deployments/chat/completions/action`
+  on the parent AI Services account where chat completions actually live.
+  Subscription `Owner` is also insufficient because the built-in `Owner`
+  role definition has `actions: ["*"]` but `dataActions: []`. The first
+  `agentops eval run` against a fresh workspace therefore failed with
+  `PermissionDenied` on every AI-assisted evaluator and every cloud-eval
+  grader. The skill's new **Step 0.5 - Ensure data-plane RBAC on the AI
+  Services account** resolves the Foundry project endpoint from
+  `.azure/<env>/.env` or `.agentops/.env`, looks up the backing AI
+  Services account + resource group with
+  `az cognitiveservices account list`, fetches the signed-in object ID
+  with `az ad signed-in-user show`, and runs an idempotent
+  `az role assignment create` for `Cognitive Services OpenAI User` at
+  the resource-group scope before handing off to `agentops eval analyze`.
+  This keeps the skill experience consistent with the new manual
+  instructions added to the prompt-agent, hosted-agent, and end-to-end
+  tutorials, so users running the skill against a fresh Foundry project
+  no longer hit the same 401 the manual tutorials previously hid.
 ## [0.3.4] - 2026-06-01
 ### Fixed

{agentops_accelerator-0.3.4/src/agentops_accelerator.egg-info → agentops_accelerator-0.3.6}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: agentops-accelerator
-Version: 0.3.4
+Version: 0.3.6
 Summary: Release readiness gates and evidence for Microsoft Foundry agents
 License: MIT License

{agentops_accelerator-0.3.4 → agentops_accelerator-0.3.6}/docs/tutorial-end-to-end.md RENAMED

@@ -286,6 +286,42 @@ for creating agents, tools, tracing, evaluation, and red-team scans:
 https://github.com/Azure-Samples/microsoft-foundry-e2e-agent-observability-workshop/tree/2026-04-aie-europe
 ```
+### Grant your identity data-plane access to the AI Services account
+Both options above (prompt agent and hosted HTTP agent) eventually drive
+an `agentops eval run` that calls chat-completions on the AI Services
+account behind your Foundry project — either through Foundry's cloud
+graders or through the local AI-assisted evaluators. Creating a project
+through the portal assigns you `Foundry User` **only at the project
+scope**, which does not cover OpenAI data-plane actions on the parent
+account. Subscription `Owner` is also insufficient: its built-in role
+definition has `actions: ["*"]` but `dataActions: []`. Skipping this is
+what causes the eval to fail later with `PermissionDenied` on
+`Microsoft.CognitiveServices/accounts/OpenAI/deployments/chat/completions/action`.
+Run the assignment once per resource group that hosts a Foundry account
+you will evaluate against. Replace `<your-objectId>`,
+`<subscription-id>`, and `<resource-group>` with your own values (use
+`az ad signed-in-user show --query id -o tsv` to get the object ID):
+```powershell
+az role assignment create `
+  --assignee <your-objectId> `
+  --role "Cognitive Services OpenAI User" `
+  --scope /subscriptions/<subscription-id>/resourceGroups/<resource-group>
+```
+> **Give the assignment a few minutes to propagate.** Data-plane role
+> assignments on the AI Services account do **not** take effect
+> instantly — propagation to the evaluator workers can take several
+> minutes (occasionally up to ~15). Evaluators authenticate per call, so
+> the **first eval right after granting the role may show intermittent
+> `AuthenticationError` on a subset of graders and report
+> `Threshold status: FAILED` even when every threshold is green**. This
+> is a grader execution failure, not a quality regression — wait a few
+> minutes and re-run the eval.
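The same assignment can be scripted without juggling PowerShell backticks or bash backslashes by passing an argument list to `subprocess`. A minimal Python sketch, assuming the Azure CLI (`az`) is on `PATH` and you have already run `az login`; `role_assignment_argv` and `signed_in_object_id` are hypothetical helpers. The builder only assembles the command, so nothing touches Azure until you pass the list to `subprocess.run`.

```python
import shlex
import subprocess


def role_assignment_argv(
    object_id: str,
    subscription_id: str,
    resource_group: str,
    role: str = "Cognitive Services OpenAI User",
) -> list[str]:
    """Build the tutorial's `az role assignment create` command as an argv list."""
    scope = f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
    return [
        "az", "role", "assignment", "create",
        "--assignee", object_id,
        "--role", role,
        "--scope", scope,
    ]


def signed_in_object_id() -> str:
    """Same lookup as `az ad signed-in-user show --query id -o tsv`."""
    out = subprocess.run(
        ["az", "ad", "signed-in-user", "show", "--query", "id", "-o", "tsv"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()


if __name__ == "__main__":
    argv = role_assignment_argv("<your-objectId>", "<subscription-id>", "<resource-group>")
    # Preview only; execute with subprocess.run(argv, check=True).
    print(shlex.join(argv))
```

Because each argument is a separate list element, the role name with spaces needs no shell quoting, and the same code works unchanged on Windows and Linux.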
 ## 2. Create the travel eval dataset
 ```powershell

{agentops_accelerator-0.3.4 → agentops_accelerator-0.3.6}/docs/tutorial-hosted-agent-quickstart.md RENAMED

@@ -310,6 +310,40 @@ If the deployed endpoint needs a bearer token:
 $env:HOSTED_AGENT_TOKEN = "<token>"
 ```
+### Grant your identity data-plane access to the AI Services account
+The local AI-assisted evaluators that AgentOps runs in step 8 call
+chat-completions on the AI Services account that backs your Foundry
+project. Creating a project through the portal only assigns you
+`Foundry User` **at the project scope**, which does not cover the
+OpenAI data-plane action on the parent account. Even subscription
+`Owner` is insufficient: the built-in `Owner` role has `actions: ["*"]`
+but `dataActions: []`. Skipping this step causes the eval to fail with
+`PermissionDenied` on
+`Microsoft.CognitiveServices/accounts/OpenAI/deployments/chat/completions/action`.
+Run the assignment once per resource group hosting a Foundry account
+you will evaluate against (replace `<your-objectId>`,
+`<subscription-id>`, and `<resource-group>` with your values; get the
+object ID with `az ad signed-in-user show --query id -o tsv`):
+```powershell
+az role assignment create `
+  --assignee <your-objectId> `
+  --role "Cognitive Services OpenAI User" `
+  --scope /subscriptions/<subscription-id>/resourceGroups/<resource-group>
+```
+> **Give the assignment a few minutes to propagate.** Data-plane role
+> assignments on the AI Services account do **not** take effect
+> instantly — propagation to the local/Foundry evaluator workers can
+> take several minutes (occasionally up to ~15). Evaluators authenticate
+> per call, so the **first eval right after granting the role may show
+> intermittent `AuthenticationError` on a subset of graders and report
+> `Threshold status: FAILED` even when every threshold is green**. This
+> is a grader execution failure, not a quality regression — wait a few
+> minutes and re-run the eval.
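If you script the eval (for example in CI right after granting the role), the "wait and re-run" advice can be automated. The sketch below is hypothetical: `run_agentops_eval` and `run_with_propagation_retry` are illustrative helpers, and they key on two details taken from the 0.3.6 changes in this diff, the gate-failed exit code `2` asserted in the unit tests and the `grader execution(s) errored` warning the CLI prints to stderr. A run is retried only while that warning is present, so a genuine quality failure is returned immediately.

```python
import subprocess
import time
from typing import Callable

# Both values come from the 0.3.6 changes: the CLI prints this warning to
# stderr, and the unit tests expect exit code 2 when the gate fails.
EXECUTION_ERROR_MARKER = "grader execution(s) errored"
GATE_FAILED_EXIT = 2


def run_agentops_eval() -> tuple[int, str]:
    """Run `agentops eval run` once and return (exit code, stderr)."""
    proc = subprocess.run(["agentops", "eval", "run"], capture_output=True, text=True)
    return proc.returncode, proc.stderr


def run_with_propagation_retry(
    run_eval: Callable[[], tuple[int, str]] = run_agentops_eval,
    attempts: int = 3,
    delay_seconds: float = 180.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Re-run the eval only while the failure is a grader *execution* error."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        exit_code, stderr = run_eval()
        retryable = exit_code == GATE_FAILED_EXIT and EXECUTION_ERROR_MARKER in stderr
        if not retryable or attempt == attempts:
            return exit_code
        sleep(delay_seconds)  # give the role assignment time to propagate
    raise AssertionError("unreachable: the loop always returns")
```

In a CI step this could be called as `sys.exit(run_with_propagation_retry())`. The three-minute default delay is an arbitrary choice inside the "several minutes" window described above; the injectable `sleep` keeps the helper testable without waiting.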
 ## 5. Initialize AgentOps interactively
 ```powershell

{agentops_accelerator-0.3.4 → agentops_accelerator-0.3.6}/docs/tutorial-prompt-agent-quickstart.md RENAMED

@@ -241,6 +241,53 @@ Show me the planned changes and the resulting endpoints before applying.
 If the skill is not available, use Path A.
+### Grant your identity data-plane access to the AI Services account
+Creating a project through the portal only assigns you `Foundry User` **at
+the project scope**. That role does not cover the OpenAI data-plane actions
+that live on the parent AI Services *account* — the chat-completions call
+that backs every AI-assisted evaluator and every cloud-eval grader. Even
+`Owner` on the subscription is not enough: the built-in `Owner` role
+definition has `actions: ["*"]` but `dataActions: []`, so it grants full
+control plane and zero data plane on Cognitive Services accounts.
+Skipping this step is what causes the eval grader to fail later with:
+    PermissionDenied: The principal `<your-objectId>` lacks the required
+    data action `Microsoft.CognitiveServices/accounts/OpenAI/deployments/
+    chat/completions/action` to perform `POST /openai/deployments/...`
+Run the assignment once per resource group that hosts a Foundry account
+you will evaluate against. Replace `<your-objectId>`, `<subscription-id>`,
+and `<resource-group>` with your own values (you can get the object ID
+with `az ad signed-in-user show --query id -o tsv`):
+```powershell
+az role assignment create `
+  --assignee <your-objectId> `
+  --role "Cognitive Services OpenAI User" `
+  --scope /subscriptions/<subscription-id>/resourceGroups/<resource-group>
+```
+Repeat the command with the `travel-agent-dev` resource group if the dev
+project lives in a different RG.
+> **Give the assignment a few minutes to propagate.** Data-plane role
+> assignments on the AI Services account do **not** take effect
+> instantly — propagation to the Foundry evaluator workers can take
+> several minutes (occasionally up to ~15). The cloud eval runs each
+> grader as an independent worker that authenticates separately, so the
+> **first run right after granting the role may show intermittent
+> `AuthenticationError` on a subset of graders and report
+> `Threshold status: FAILED` even when every threshold is green** (no
+> single row had all graders succeed). This is a grader execution
+> failure, not a quality regression. Wait a few minutes and re-run
+> `agentops eval run` — once propagation finishes, every grader scores
+> and the gate passes.
+AgentOps Doctor will detect the missing assignment in a future release,
+but until then this is a manual one-time setup step per new environment.
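When triaging logs, it can help to pull the principal and the missing data action out of this error programmatically. A minimal Python sketch, assuming the message has the shape quoted above (the exact wording Azure returns may vary); `missing_data_action` is a hypothetical helper. It collapses whitespace and re-joins paths wrapped after a `/`, and it returns `None` for messages that omit the principal.

```python
import re
from typing import Optional

_Q = r"[`'\"]?"         # optional backtick, single quote, or double quote
_TOKEN = r"[^`'\"\s]+"  # a run of characters up to the next quote or space

_PERMISSION_DENIED = re.compile(
    rf"principal\s+{_Q}(?P<principal>{_TOKEN}){_Q}\s+lacks the required\s+"
    rf"data action\s+{_Q}(?P<action>{_TOKEN}){_Q}",
    re.IGNORECASE,
)


def missing_data_action(message: str) -> Optional[tuple[str, str]]:
    """Return (principal, data_action) from a PermissionDenied message, else None."""
    flat = " ".join(message.split())   # collapse line wrapping and indentation
    flat = re.sub(r"/\s+", "/", flat)  # re-join paths that were wrapped after a '/'
    match = _PERMISSION_DENIED.search(flat)
    if match is None:
        return None
    return match.group("principal"), match.group("action")
```

A match tells you which identity to pass to `--assignee` and confirms the fix is a data-plane role, not a quality issue.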
 ## 4. Seed `travel-agent` in the sandbox project
 You only author the agent in **one place**: your sandbox Foundry

{agentops_accelerator-0.3.4 → agentops_accelerator-0.3.6}/plugins/agentops/package.json RENAMED

@@ -2,7 +2,7 @@
   "name": "agentops-accelerator",
   "displayName": "AgentOps Accelerator — Skills for GitHub Copilot",
   "description": "Copilot agent skills for running standardized evaluation workflows with AgentOps Accelerator and Microsoft Foundry agents.",
-  "version": "0.3.4",
+  "version": "0.3.6",
   "publisher": "AgentOpsAccelerator",
   "icon": "icon.png",
   "license": "MIT",

{agentops_accelerator-0.3.4 → agentops_accelerator-0.3.6}/plugins/agentops/plugin.json RENAMED

@@ -1,7 +1,7 @@
 {
   "name": "agentops-accelerator",
   "description": "Copilot agent skills for running standardized evaluation workflows with AgentOps Accelerator and Microsoft Foundry agents.",
-  "version": "0.3.4",
+  "version": "0.3.6",
   "author": {
     "name": "AgentOps Accelerator",
     "url": "https://github.com/Azure/agentops"

{agentops_accelerator-0.3.4 → agentops_accelerator-0.3.6}/plugins/agentops/skills/agentops-eval/SKILL.md RENAMED

@@ -25,6 +25,65 @@ with a `name:version` or URL.
    (`--project-endpoint`, `--agent`, `--dataset`, …) for non-interactive
    runs. Run `agentops init show` later to inspect the resolved config.
+## Step 0.5 - Ensure data-plane RBAC on the AI Services account
+AgentOps eval (cloud graders **and** local AI-assisted evaluators) calls
+`/openai/deployments/.../chat/completions` on the AI Services account
+that backs the Foundry project. Creating a project through the Foundry
+portal only assigns the user `Foundry User` at the *project* scope,
+which does **not** cover OpenAI data-plane actions on the parent
+account. Subscription `Owner` is also insufficient because the built-in
+`Owner` role has `actions: ["*"]` but `dataActions: []`. The first
+`agentops eval run` against a fresh workspace will otherwise fail with:
+```
+PermissionDenied … lacks the required data action
+'Microsoft.CognitiveServices/accounts/OpenAI/deployments/chat/completions/action'
+```
+Run this preflight before Step 1 - it is idempotent (Azure returns
+`RoleAssignmentExists` if already granted) and takes ~5 seconds:
+```bash
+# 1. Resolve the AI Services account from .azure/<env>/.env or .agentops/.env
+PROJECT_ENDPOINT=$(grep -h '^AZURE_AI_FOUNDRY_PROJECT_ENDPOINT=' .azure/*/.env .agentops/.env 2>/dev/null | tail -1 | cut -d= -f2- | tr -d '"')
+ACCOUNT_HOST=$(echo "$PROJECT_ENDPOINT" | awk -F'[/:]' '{print $4}')
+ACCOUNT_NAME=$(echo "$ACCOUNT_HOST" | cut -d. -f1)
+# 2. Resolve subscription, resource group, and signed-in object ID
+SUB_ID=$(az account show --query id -o tsv)
+RG=$(az cognitiveservices account list --subscription "$SUB_ID" --query "[?name=='$ACCOUNT_NAME'].resourceGroup | [0]" -o tsv)
+OBJ_ID=$(az ad signed-in-user show --query id -o tsv)
+# 3. Grant data-plane access at the RG scope (covers sandbox + future evals)
+az role assignment create \
+  --assignee "$OBJ_ID" \
+  --role "Cognitive Services OpenAI User" \
+  --scope "/subscriptions/$SUB_ID/resourceGroups/$RG"
+```
+PowerShell equivalent: replace `$(...)` with the PowerShell variable
+assignments shown in `docs/tutorial-prompt-agent-quickstart.md`.
+If the user has not run `az login` yet, do that first. If
+`az cognitiveservices account list` returns an empty RG, the AI Services
+account lives in a different subscription - ask the user which one.
+Skip this step only if the user explicitly says the role is already
+assigned, or if a previous `agentops eval run` succeeded against the
+same Foundry account.
+**Propagation:** data-plane role assignments do not take effect
+instantly — allow several minutes (occasionally up to ~15) before the
+first eval. The cloud/local graders authenticate per call, so if the
+user runs an eval immediately after this preflight and sees intermittent
+`AuthenticationError` on a subset of graders plus
+`Threshold status: FAILED` while the visible thresholds are green, that
+is propagation lag (a grader **execution** failure), not a quality
+regression. Tell the user to wait a few minutes and re-run
+`agentops eval run`; do not treat it as a failing gate or start changing
+thresholds.
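As a cross-platform alternative to the `grep`/`awk`/`cut` pipeline in step 1, the endpoint lookup can be approximated in Python. This is a sketch under assumptions: it reads the same `.env` locations, keeps the last `AZURE_AI_FOUNDRY_PROJECT_ENDPOINT` value it sees (like `tail -1`), and, like the bash version, assumes the first DNS label of the endpoint host is the AI Services account name. `endpoint_from_lines`, `project_endpoint`, and `account_name` are hypothetical helpers.

```python
from pathlib import Path
from typing import Iterable, Optional
from urllib.parse import urlparse

ENDPOINT_KEY = "AZURE_AI_FOUNDRY_PROJECT_ENDPOINT"


def endpoint_from_lines(lines: Iterable[str]) -> Optional[str]:
    """Return the value of the last ENDPOINT_KEY=... line, with surrounding quotes removed."""
    found: Optional[str] = None
    for line in lines:
        key, sep, value = line.partition("=")
        if sep and key.strip() == ENDPOINT_KEY:
            found = value.strip().strip('"')
    return found


def project_endpoint(root: Path = Path(".")) -> Optional[str]:
    """Scan .azure/*/.env (sorted, like a shell glob) and then .agentops/.env."""
    candidates = sorted(root.glob(".azure/*/.env")) + [root / ".agentops" / ".env"]
    lines: list[str] = []
    for path in candidates:
        if path.is_file():  # missing files are skipped, like `2>/dev/null`
            lines.extend(path.read_text(encoding="utf-8").splitlines())
    return endpoint_from_lines(lines)


def account_name(endpoint: str) -> str:
    """First DNS label of the endpoint host, assumed to be the AI Services account name."""
    host = urlparse(endpoint).hostname or ""
    return host.split(".", 1)[0]
```

The resulting name plugs into the same `az cognitiveservices account list --query "[?name=='...'].resourceGroup | [0]"` lookup shown in step 2.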
 ## Step 1 - Analyze evaluation setup
 Run the deterministic local triage first:

{agentops_accelerator-0.3.4 → agentops_accelerator-0.3.6}/src/agentops/cli/app.py RENAMED

@@ -2055,10 +2055,57 @@ def _run_flat_schema_eval(
     if result.summary.overall_passed:
         typer.echo(f"{_cli_label('Threshold status')}: {style('PASSED', 'bold', 'green')}")
         return
+    # Distinguish a genuine quality-gate failure from grader *execution*
+    # errors. When evaluator workers error (auth/RBAC/timeout) on enough calls
+    # that no row has every grader succeed, `items_passed_all` is 0 and the
+    # gate reports FAILED even though every threshold that *could* be computed
+    # passed. Surfacing this prevents users from chasing a phantom quality
+    # regression - the most common cause is data-plane RBAC granted moments
+    # earlier that is still propagating to the evaluator workers.
+    errored, total, first_error = _grader_error_summary(result)
+    all_thresholds_passed = (
+        result.summary.thresholds_total > 0
+        and result.summary.thresholds_passed == result.summary.thresholds_total
+    )
+    if errored and all_thresholds_passed:
+        typer.echo(
+            f"{_cli_warn('Warning')}: {errored} of {total} grader execution(s) "
+            "errored, so no dataset row had every grader return a score. This is "
+            "a grader execution failure, not a quality regression - every "
+            "threshold that could be computed passed. The most common cause is "
+            "data-plane RBAC granted recently that is still propagating to the "
+            "evaluator workers; wait a few minutes and re-run `agentops eval run`.",
+            err=True,
+        )
+        if first_error:
+            typer.echo(f"{_cli_warn('Warning')}: first grader error: {first_error}", err=True)
     typer.echo(f"{_cli_label('Threshold status')}: {style('FAILED', 'bold', 'red')}")
     raise typer.Exit(code=exit_code_from(result))
+def _grader_error_summary(result) -> tuple[int, int, Optional[str]]:
+    """Return ``(errored_metric_count, total_metric_count, first_error)``.
+    Walks every per-row metric in the run so the CLI can tell a grader
+    *execution* failure (auth/RBAC/timeout) apart from a quality-gate failure.
+    The first non-empty error string is lifted out as the actionable cause.
+    """
+    errored = 0
+    total = 0
+    first_error: Optional[str] = None
+    for row in result.rows:
+        for metric in row.metrics:
+            total += 1
+            err = getattr(metric, "error", None)
+            if isinstance(err, str) and err.strip():
+                errored += 1
+                if first_error is None:
+                    first_error = err.strip()
+    return errored, total, first_error
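The new branch fires only when both conditions hold, and the `thresholds_total > 0` guard matters: with zero thresholds, "every threshold passed" would be vacuously true (`all([])` is `True` in Python), so a run with no thresholds could be misreported as an execution failure. The decision can also be isolated as a pure function and unit-tested without `CliRunner`. The sketch below is a hypothetical refactoring (`GateFailure` and `classify_failure` are not part of AgentOps) and applies only once `overall_passed` is already `False`.

```python
from enum import Enum


class GateFailure(Enum):
    """Hypothetical labels for why a run reported `Threshold status: FAILED`."""

    EXECUTION = "grader execution failure"
    QUALITY = "quality-gate failure"


def classify_failure(errored: int, thresholds_passed: int, thresholds_total: int) -> GateFailure:
    """Mirror the CLI condition; call only when overall_passed is already False."""
    # Guard against vacuous truth: zero thresholds must not count as "all passed".
    all_thresholds_passed = thresholds_total > 0 and thresholds_passed == thresholds_total
    if errored and all_thresholds_passed:
        return GateFailure.EXECUTION
    # Anything else is reported as a plain FAILED, as in the CLI.
    return GateFailure.QUALITY


assert all([]) is True  # why the thresholds_total > 0 guard exists
print(classify_failure(errored=1, thresholds_passed=1, thresholds_total=1).value)
# grader execution failure
```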
 def _default_flat_output_dir(config_path: Path) -> Path:
     base = config_path.parent / ".agentops" / "results"
     timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H-%M-%SZ")

{agentops_accelerator-0.3.4 → agentops_accelerator-0.3.6}/src/agentops/templates/skills/agentops-eval/SKILL.md RENAMED

@@ -25,6 +25,65 @@ with a `name:version` or URL.
    (`--project-endpoint`, `--agent`, `--dataset`, …) for non-interactive
    runs. Run `agentops init show` later to inspect the resolved config.
+## Step 0.5 - Ensure data-plane RBAC on the AI Services account
+AgentOps eval (cloud graders **and** local AI-assisted evaluators) calls
+`/openai/deployments/.../chat/completions` on the AI Services account
+that backs the Foundry project. Creating a project through the Foundry
+portal only assigns the user `Foundry User` at the *project* scope,
+which does **not** cover OpenAI data-plane actions on the parent
+account. Subscription `Owner` is also insufficient because the built-in
+`Owner` role has `actions: ["*"]` but `dataActions: []`. The first
+`agentops eval run` against a fresh workspace will otherwise fail with:
+```
+PermissionDenied … lacks the required data action
+'Microsoft.CognitiveServices/accounts/OpenAI/deployments/chat/completions/action'
+```
+Run this preflight before Step 1 - it is idempotent (Azure returns
+`RoleAssignmentExists` if already granted) and takes ~5 seconds:
+```bash
+# 1. Resolve the AI Services account from .azure/<env>/.env or .agentops/.env
+PROJECT_ENDPOINT=$(grep -h '^AZURE_AI_FOUNDRY_PROJECT_ENDPOINT=' .azure/*/.env .agentops/.env 2>/dev/null | tail -1 | cut -d= -f2- | tr -d '"')
+ACCOUNT_HOST=$(echo "$PROJECT_ENDPOINT" | awk -F'[/:]' '{print $4}')
+ACCOUNT_NAME=$(echo "$ACCOUNT_HOST" | cut -d. -f1)
+# 2. Resolve subscription, resource group, and signed-in object ID
+SUB_ID=$(az account show --query id -o tsv)
+RG=$(az cognitiveservices account list --subscription "$SUB_ID" --query "[?name=='$ACCOUNT_NAME'].resourceGroup | [0]" -o tsv)
+OBJ_ID=$(az ad signed-in-user show --query id -o tsv)
+# 3. Grant data-plane access at the RG scope (covers sandbox + future evals)
+az role assignment create \
+  --assignee "$OBJ_ID" \
+  --role "Cognitive Services OpenAI User" \
+  --scope "/subscriptions/$SUB_ID/resourceGroups/$RG"
+```
+PowerShell equivalent: replace `$(...)` with the PowerShell variable
+assignments shown in `docs/tutorial-prompt-agent-quickstart.md`.
+If the user has not run `az login` yet, do that first. If
+`az cognitiveservices account list` returns an empty RG, the AI Services
+account lives in a different subscription - ask the user which one.
+Skip this step only if the user explicitly says the role is already
+assigned, or if a previous `agentops eval run` succeeded against the
+same Foundry account.
+**Propagation:** data-plane role assignments do not take effect
+instantly — allow several minutes (occasionally up to ~15) before the
+first eval. The cloud/local graders authenticate per call, so if the
+user runs an eval immediately after this preflight and sees intermittent
+`AuthenticationError` on a subset of graders plus
+`Threshold status: FAILED` while the visible thresholds are green, that
+is propagation lag (a grader **execution** failure), not a quality
+regression. Tell the user to wait a few minutes and re-run
+`agentops eval run`; do not treat it as a failing gate or start changing
+thresholds.
 ## Step 1 - Analyze evaluation setup
 Run the deterministic local triage first:

{agentops_accelerator-0.3.4 → agentops_accelerator-0.3.6/src/agentops_accelerator.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: agentops-accelerator
-Version: 0.3.4
+Version: 0.3.6
 Summary: Release readiness gates and evidence for Microsoft Foundry agents
 License: MIT License

{agentops_accelerator-0.3.4 → agentops_accelerator-0.3.6}/src/agentops_accelerator.egg-info/SOURCES.txt RENAMED

@@ -268,6 +268,7 @@ tests/unit/test_doctor_cli_explain.py
 tests/unit/test_dotenv_loader.py
 tests/unit/test_e2e_render.py
 tests/unit/test_eval_analysis.py
+tests/unit/test_eval_run_grader_errors.py
 tests/unit/test_evaluators.py
 tests/unit/test_foundry_discovery.py
 tests/unit/test_init_command.py

agentops_accelerator-0.3.6/tests/unit/test_eval_run_grader_errors.py ADDED

@@ -0,0 +1,150 @@
+"""CLI behaviour when graders *execute* but a subset errors out.
+A grader execution error (auth/RBAC/timeout) is not a quality regression, but
+because ``items_passed_all`` requires every grader on a row to succeed, a single
+errored grader flips ``overall_passed`` to ``False`` and the run reports
+``Threshold status: FAILED`` even though every computable threshold passed.
+The CLI must surface that distinction loudly so users do not chase a phantom
+quality failure or start lowering thresholds. The most common trigger is
+data-plane RBAC that is still propagating.
+"""
+from __future__ import annotations
+import json
+from pathlib import Path
+from typer.testing import CliRunner
+from agentops.cli.app import _grader_error_summary, app
+from agentops.core.results import (
+    RowMetric,
+    RowResult,
+    RunResult,
+    RunSummary,
+    TargetInfo,
+    ThresholdEvaluation,
+)
+runner = CliRunner()
+_AUTH_ERROR = (
+    "FAILED_EXECUTION: (UserError) OpenAI API hits AuthenticationError: "
+    "Principal does not have access to API/Operation."
+)
+def _result_with_partial_grader_errors() -> RunResult:
+    """One row where coherence scored but similarity errored on auth."""
+    row = RowResult(
+        row_index=0,
+        input="plan a trip",
+        expected="an itinerary",
+        response="here is an itinerary",
+        metrics=[
+            RowMetric(name="coherence", value=5.0),
+            RowMetric(name="similarity", value=None, error=_AUTH_ERROR),
+        ],
+    )
+    summary = RunSummary(
+        items_total=1,
+        items_passed_all=0,  # the errored grader means no row passed all
+        items_pass_rate=0.0,
+        thresholds_total=1,
+        thresholds_passed=1,  # every computable threshold passed
+        threshold_pass_rate=1.0,
+        overall_passed=False,
+    )
+    return RunResult(
+        started_at="2026-06-01T00:00:00+00:00",
+        finished_at="2026-06-01T00:01:00+00:00",
+        duration_seconds=60.0,
+        target=TargetInfo(kind="foundry_prompt", raw="travel-agent:2"),
+        dataset_path="dataset.jsonl",
+        evaluators=["CoherenceEvaluator", "SimilarityEvaluator"],
+        rows=[row],
+        aggregate_metrics={"coherence": 5.0},
+        thresholds=[
+            ThresholdEvaluation(
+                metric="coherence",
+                criteria=">=",
+                expected=">=3",
+                actual="5",
+                passed=True,
+            )
+        ],
+        summary=summary,
+    )
+def test_grader_error_summary_counts_and_lifts_first_error() -> None:
+    errored, total, first_error = _grader_error_summary(
+        _result_with_partial_grader_errors()
+    )
+    assert (errored, total) == (1, 2)
+    assert first_error is not None
+    assert "AuthenticationError" in first_error
+def _write_minimal_config(tmp_path: Path) -> Path:
+    dataset = tmp_path / "dataset.jsonl"
+    dataset.write_text(json.dumps({"input": "hi", "expected": "hi"}), encoding="utf-8")
+    config = tmp_path / "agentops.yaml"
+    config.write_text(
+        json.dumps(
+            {"version": 1, "agent": "model:gpt-4o", "dataset": str(dataset)}
+        ),
+        encoding="utf-8",
+    )
+    return config
+def test_eval_run_warns_on_partial_grader_errors(tmp_path, monkeypatch) -> None:
+    config = _write_minimal_config(tmp_path)
+    output = tmp_path / "out"
+    output.mkdir()
+    crafted = _result_with_partial_grader_errors()
+    import agentops.pipeline.orchestrator as orch
+    monkeypatch.setattr(orch, "run_evaluation", lambda *a, **k: crafted)
+    result = runner.invoke(
+        app,
+        ["eval", "run", "--config", str(config), "--output", str(output)],
+    )
+    # A grader-execution failure keeps the gate-failed exit code...
+    assert result.exit_code == 2, result.output
+    # ...but the user is told it is an execution error, not a quality failure.
+    assert "grader execution(s) errored" in result.output
+    assert "propagating" in result.output
+    assert "AuthenticationError" in result.output
+    assert "FAILED" in result.output
+def test_eval_run_no_warning_when_no_grader_errors(tmp_path, monkeypatch) -> None:
+    config = _write_minimal_config(tmp_path)
+    output = tmp_path / "out"
+    output.mkdir()
+    clean = _result_with_partial_grader_errors()
+    # Drop the errored grader so the row is clean and the gate genuinely passes.
+    clean.rows[0].metrics = [RowMetric(name="coherence", value=5.0)]
+    clean.summary.items_passed_all = 1
+    clean.summary.items_pass_rate = 1.0
+    clean.summary.overall_passed = True
+    import agentops.pipeline.orchestrator as orch
+    monkeypatch.setattr(orch, "run_evaluation", lambda *a, **k: clean)
+    result = runner.invoke(
+        app,
+        ["eval", "run", "--config", str(config), "--output", str(output)],
+    )
+    assert result.exit_code == 0, result.output
+    assert "PASSED" in result.output
+    assert "grader execution(s) errored" not in result.output