PyPI - eval-protocol - Versions diffs - 0.2.86__tar.gz → 0.2.87__tar.gz - Mend

eval-protocol 0.2.86.tar.gz → 0.2.87.tar.gz


Files changed (450)

{eval_protocol-0.2.86/eval_protocol.egg-info → eval_protocol-0.2.87}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: eval-protocol
-Version: 0.2.86
+Version: 0.2.87
 Summary: The official Python SDK for Eval Protocol (EP.) EP is an open protocol that standardizes how developers author evals for large language model (LLM) applications.
 Author-email: Fireworks AI <info@fireworks.ai>
 License-Expression: MIT
@@ -113,111 +113,37 @@ Requires-Dist: langfuse>=2.0.0; extra == "proxy"
 Requires-Dist: uuid6>=2025.0.0; extra == "proxy"
 Dynamic: license-file
-# Eval Protocol (EP)
+# Eval Protocol
 [![PyPI - Version](https://img.shields.io/pypi/v/eval-protocol)](https://pypi.org/project/eval-protocol/)
 [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/eval-protocol/python-sdk)
-**The open-source framework to help you write evals for RL.**
+**Eval Protocol (EP) is an open solution for doing reinforcement learning fine-tuning on existing agents — across any language, container, or framework.**
-## 🚀 Features
+![Eval Protocol overview](./docs/intro.png)
-- **Pytest authoring**: `@evaluation_test` decorator to configure evaluations
-- **Robust rollouts**: Handles flaky LLM APIs and parallel execution
-- **Integrations**: Works with Langfuse, LangSmith, Braintrust, Responses API
-- **Agent support**: LangGraph and Pydantic AI
-- **MCP RL envs**: Build reinforcement learning environments with MCP
-- **Built-in benchmarks**: AIME, tau-bench
-- **LLM judge**: Stack-rank models using pairwise Arena-Hard-Auto
-- **Local UI**: Pivot/table views for real-time analysis
+Most teams already have complex agents running in production — often across remote services with heavy dependencies, Docker containers, or TypeScript backends deployed on Vercel. When they try to train or fine-tune these agents with reinforcement learning, connecting them to a trainer quickly becomes painful.
-## ⚡ Quickstart (no labels needed)
+Eval Protocol makes this possible in two ways:
-Install with your tracing platform extras and set API keys:
+1. **Expose your agent through a simple API**
+   Wrap your existing agent (Python, TypeScript, Docker, etc.) in a simple HTTP service using EP’s rollout interface. EP handles the rollout orchestration, metadata passing, and trace storage automatically.
+2. **Connect with any trainer**
+   Once your agent speaks the EP standard, it can be fine-tuned or evaluated with any supported trainer — Fireworks RFT, TRL, Unsloth, or your own — with no environment rewrites.
-```bash
-pip install 'eval-protocol[langfuse]'
+The result: RL that works out-of-the-box for existing production agents.
-# Model API keys (set what you need)
-export OPENAI_API_KEY=...
-export FIREWORKS_API_KEY=...
-export GEMINI_API_KEY=...
+## Who This Is For
-# Platform keys
-export LANGFUSE_PUBLIC_KEY=...
-export LANGFUSE_SECRET_KEY=...
-export LANGFUSE_HOST=https://your-deployment.com  # optional
-```
+- **Applied AI teams** adding RL to existing production agents.
+- **Research engineers** experimenting with fine-tuning complex, multi-turn or tool-using agents.
+- **MLOps teams** building reproducible, language-agnostic rollout pipelines.
-Minimal evaluation using the built-in AHA judge:
+## Quickstart
-```python
-from datetime import datetime
-import pytest
+- See the Quickstart repository: [eval-protocol/quickstart](https://github.com/eval-protocol/quickstart/tree/main)
-from eval_protocol import (
-    evaluation_test,
-    aha_judge,
-    EvaluationRow,
-    SingleTurnRolloutProcessor,
-    DynamicDataLoader,
-    create_langfuse_adapter,
-)
-def langfuse_data_generator() -> list[EvaluationRow]:
-    adapter = create_langfuse_adapter()
-    return adapter.get_evaluation_rows(
-        to_timestamp=datetime.utcnow(),
-        limit=20,
-        sample_size=5,
-    )
-@pytest.mark.parametrize(
-    "completion_params",
-    [
-        {"model": "openai/gpt-4.1"},
-        {"model": "fireworks_ai/accounts/fireworks/models/gpt-oss-120b"},
-    ],
-)
-@evaluation_test(
-    data_loaders=DynamicDataLoader(generators=[langfuse_data_generator]),
-    rollout_processor=SingleTurnRolloutProcessor(),
-)
-async def test_llm_judge(row: EvaluationRow) -> EvaluationRow:
-    return await aha_judge(row)
-```
-Run it:
-```bash
-pytest -q -s
-```
-The pytest output includes local links for a leaderboard and row-level traces (pivot/table) at `http://localhost:8000`.
-## Installation
-This library requires Python >= 3.10.
-### pip
-```bash
-pip install eval-protocol
-```
-### uv (recommended)
-```bash
-# Install uv (if needed)
-curl -LsSf https://astral.sh/uv/install.sh | sh
-# Add to your project
-uv add eval-protocol
-```
-## 📚 Resources
+## Resources
 - **[Documentation](https://evalprotocol.io)** – Guides and API reference
 - **[Discord](https://discord.com/channels/1137072072808472616/1400975572405850155)** – Community

eval_protocol-0.2.87/README.md ADDED

@@ -0,0 +1,39 @@
+# Eval Protocol
+[![PyPI - Version](https://img.shields.io/pypi/v/eval-protocol)](https://pypi.org/project/eval-protocol/)
+[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/eval-protocol/python-sdk)
+**Eval Protocol (EP) is an open solution for doing reinforcement learning fine-tuning on existing agents — across any language, container, or framework.**
+![Eval Protocol overview](./docs/intro.png)
+Most teams already have complex agents running in production — often across remote services with heavy dependencies, Docker containers, or TypeScript backends deployed on Vercel. When they try to train or fine-tune these agents with reinforcement learning, connecting them to a trainer quickly becomes painful.
+Eval Protocol makes this possible in two ways:
+1. **Expose your agent through a simple API**
+   Wrap your existing agent (Python, TypeScript, Docker, etc.) in a simple HTTP service using EP’s rollout interface. EP handles the rollout orchestration, metadata passing, and trace storage automatically.
+2. **Connect with any trainer**
+   Once your agent speaks the EP standard, it can be fine-tuned or evaluated with any supported trainer — Fireworks RFT, TRL, Unsloth, or your own — with no environment rewrites.
+The result: RL that works out-of-the-box for existing production agents.
+## Who This Is For
+- **Applied AI teams** adding RL to existing production agents.
+- **Research engineers** experimenting with fine-tuning complex, multi-turn or tool-using agents.
+- **MLOps teams** building reproducible, language-agnostic rollout pipelines.
+## Quickstart
+- See the Quickstart repository: [eval-protocol/quickstart](https://github.com/eval-protocol/quickstart/tree/main)
+## Resources
+- **[Documentation](https://evalprotocol.io)** – Guides and API reference
+- **[Discord](https://discord.com/channels/1137072072808472616/1400975572405850155)** – Community
+- **[GitHub](https://github.com/eval-protocol/python-sdk)** – Source and examples
+## License
+[MIT](LICENSE)

{eval_protocol-0.2.86 → eval_protocol-0.2.87}/eval_protocol/_version.py RENAMED

@@ -8,11 +8,11 @@ import json
 version_json = '''
 {
- "date": "2025-11-11T17:37:10-0800",
+ "date": "2025-11-12T15:43:06-0800",
  "dirty": false,
  "error": null,
- "full-revisionid": "3b2340644bd8894822a9e34445bc60552e3843cf",
- "version": "0.2.86"
+ "full-revisionid": "8ab1c920bb77880deb87f2320c6cf6ea8780458e",
+ "version": "0.2.87"
 }
 '''  # END VERSION_JSON

{eval_protocol-0.2.86 → eval_protocol-0.2.87}/eval_protocol/cli.py RENAMED

@@ -371,13 +371,13 @@ def parse_args(args=None):
         help="Create a Reinforcement Fine-tuning Job on Fireworks",
     )
     rft_parser.add_argument(
-        "--evaluator-id",
-        help="Evaluator ID used during upload; if omitted, derive from local traces or a single discovered test",
+        "--evaluator",
+        help="Evaluator ID or fully-qualified resource (accounts/{acct}/evaluators/{id}); if omitted, derive from local tests",
     )
     # Dataset options
     rft_parser.add_argument(
-        "--dataset-id",
-        help="Use existing Fireworks dataset id (skip local materialization)",
+        "--dataset",
+        help="Use existing dataset (ID or resource 'accounts/{acct}/datasets/{id}') to skip local materialization",
     )
     rft_parser.add_argument(
         "--dataset-jsonl",
@@ -400,6 +400,8 @@ def parse_args(args=None):
     rft_parser.add_argument("--learning-rate", type=float, default=3e-5)
     rft_parser.add_argument("--max-context-length", type=int, default=65536)
     rft_parser.add_argument("--lora-rank", type=int, default=16)
+    rft_parser.add_argument("--gradient-accumulation-steps", type=int, help="Number of gradient accumulation steps")
+    rft_parser.add_argument("--learning-rate-warmup-steps", type=int, help="Number of LR warmup steps")
     rft_parser.add_argument("--accelerator-count", type=int, default=1)
     rft_parser.add_argument("--region", help="Fireworks region enum value")
     rft_parser.add_argument("--display-name", help="RFT job display name")
@@ -412,9 +414,14 @@ def parse_args(args=None):
     rft_parser.add_argument("--temperature", type=float)
     rft_parser.add_argument("--top-p", type=float)
     rft_parser.add_argument("--top-k", type=int)
-    rft_parser.add_argument("--max-tokens", type=int, default=32768)
-    rft_parser.add_argument("--n", type=int, default=8)
-    rft_parser.add_argument("--inference-extra-body", help="JSON string for extra inference params")
+    rft_parser.add_argument("--max-output-tokens", type=int, default=32768)
+    rft_parser.add_argument("--response-candidates-count", type=int, default=8)
+    rft_parser.add_argument("--extra-body", help="JSON string for extra inference params")
+    # MCP server (optional)
+    rft_parser.add_argument(
+        "--mcp-server",
+        help="The MCP server resource name to use for the reinforcement fine-tuning job.",
+    )
     # Wandb
     rft_parser.add_argument("--wandb-enabled", action="store_true")
     rft_parser.add_argument("--wandb-project")
@@ -422,7 +429,7 @@ def parse_args(args=None):
     rft_parser.add_argument("--wandb-run-id")
     rft_parser.add_argument("--wandb-api-key")
     # Misc
-    rft_parser.add_argument("--rft-job-id", help="Specify an explicit RFT job id")
+    rft_parser.add_argument("--job-id", help="Specify an explicit RFT job id")
     rft_parser.add_argument("--yes", "-y", action="store_true", help="Non-interactive mode")
     rft_parser.add_argument("--dry-run", action="store_true", help="Print planned REST calls without sending")
     rft_parser.add_argument("--force", action="store_true", help="Overwrite existing evaluator with the same ID")
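
The renamed and added options in the hunks above can be tried out on a small stand-alone parser. The sketch below re-declares only a subset of the flags visible in this diff on a plain `argparse.ArgumentParser`; the parser name `rft_parser_sketch` and all sample values are illustrative assumptions, and the package's own `parse_args` registers many more options than are shown here.

```python
import argparse

# Stand-alone sketch, not the package's parser: only the option names,
# types and defaults below are taken from the diff hunks.
rft_parser_sketch = argparse.ArgumentParser(prog="create-rft-sketch")
rft_parser_sketch.add_argument("--evaluator")  # was --evaluator-id
rft_parser_sketch.add_argument("--dataset")  # was --dataset-id
rft_parser_sketch.add_argument("--gradient-accumulation-steps", type=int)  # new in 0.2.87
rft_parser_sketch.add_argument("--learning-rate-warmup-steps", type=int)  # new in 0.2.87
rft_parser_sketch.add_argument("--max-output-tokens", type=int, default=32768)  # was --max-tokens
rft_parser_sketch.add_argument("--response-candidates-count", type=int, default=8)  # was --n
rft_parser_sketch.add_argument("--extra-body")  # was --inference-extra-body
rft_parser_sketch.add_argument("--mcp-server")  # new in 0.2.87
rft_parser_sketch.add_argument("--job-id")  # was --rft-job-id

args = rft_parser_sketch.parse_args(
    [
        "--evaluator", "accounts/my-acct/evaluators/my-eval",  # hypothetical resource name
        "--dataset", "my-dataset",  # hypothetical short id
        "--response-candidates-count", "4",
        "--job-id", "my-rft-job",  # hypothetical job id
    ]
)

# argparse turns dashes into underscores for attribute names.
print(args.evaluator, args.dataset, args.response_candidates_count, args.job_id)
```

The attribute names produced here (`evaluator`, `dataset`, `max_output_tokens`, `response_candidates_count`, `extra_body`, `job_id`) are the ones `create_rft.py` reads via `getattr` in the next file of this diff.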

{eval_protocol-0.2.86 → eval_protocol-0.2.87}/eval_protocol/cli_commands/create_rft.py RENAMED

@@ -344,7 +344,7 @@ def _poll_evaluator_status(
 def create_rft_command(args) -> int:
-    evaluator_id: Optional[str] = getattr(args, "evaluator_id", None)
+    evaluator_id: Optional[str] = getattr(args, "evaluator", None)
     non_interactive: bool = bool(getattr(args, "yes", False))
     dry_run: bool = bool(getattr(args, "dry_run", False))
     force: bool = bool(getattr(args, "force", False))
@@ -373,11 +373,11 @@ def create_rft_command(args) -> int:
             print("No evaluation tests found.")
             print("\nHint: Make sure your tests use the @evaluation_test decorator.")
             return 1
-        # Always interactive selection here (no implicit quiet unless --evaluator-id was provided)
+        # Always interactive selection here
         try:
             selected_tests = _prompt_select(tests, non_interactive=non_interactive)
         except Exception:
-            print("Error: Failed to open selector UI. Please pass --evaluator-id or --entry explicitly.")
+            print("Error: Failed to open selector UI. Please pass --evaluator or --entry explicitly.")
             return 1
         if not selected_tests:
             print("No tests selected.")
@@ -385,7 +385,7 @@ def create_rft_command(args) -> int:
         if len(selected_tests) != 1:
             if non_interactive and len(selected_tests) > 1:
                 print("Error: Multiple evaluation tests found in --yes (non-interactive) mode.")
-                print("       Please pass --evaluator-id or --entry to disambiguate.")
+                print("       Please pass --evaluator or --entry to disambiguate.")
                 try:
                     # Offer candidate evaluator ids for convenience
                     tests = _discover_tests(project_root)
@@ -410,8 +410,13 @@ def create_rft_command(args) -> int:
         selected_test_file_path, selected_test_func_name = _resolve_selected_test(
             project_root, evaluator_id, selected_tests=selected_tests
         )
-    # Resolve evaluator resource name to fully-qualified format required by API
-    evaluator_resource_name = f"accounts/{account_id}/evaluators/{evaluator_id}"
+    # Resolve evaluator resource name to fully-qualified format required by API.
+    # Allow users to pass either short id or fully-qualified resource.
+    if evaluator_id and evaluator_id.startswith("accounts/"):
+        evaluator_resource_name = evaluator_id
+        evaluator_id = _extract_terminal_segment(evaluator_id)
+    else:
+        evaluator_resource_name = f"accounts/{account_id}/evaluators/{evaluator_id}"
     # Optional short-circuit: if evaluator already exists and not forcing, skip upload path
     skip_upload = False
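
The hunk above lets `--evaluator` carry either a short ID or a fully-qualified resource name. The sketch below isolates that branch as a pure function. The body of `_extract_terminal_segment` is not part of this diff, so `extract_terminal_segment` here is an assumed stand-in that returns the last `/`-separated segment, and `resolve_evaluator` and the sample IDs are hypothetical names used only for illustration.

```python
from typing import Tuple


def extract_terminal_segment(resource: str) -> str:
    # Assumed behavior (the real helper is not shown in the diff):
    # return the last "/"-separated segment of a resource name.
    return resource.rstrip("/").split("/")[-1]


def resolve_evaluator(evaluator: str, account_id: str) -> Tuple[str, str]:
    """Return (short_id, fully_qualified_resource) for either input form."""
    if evaluator.startswith("accounts/"):
        # Already fully qualified: keep it as-is and derive the short id.
        return extract_terminal_segment(evaluator), evaluator
    # Short id: qualify it with the caller's account.
    return evaluator, f"accounts/{account_id}/evaluators/{evaluator}"


print(resolve_evaluator("my-eval", "my-acct"))
print(resolve_evaluator("accounts/other-acct/evaluators/my-eval", "my-acct"))
```

As in the hunk, a fully-qualified input is passed through unchanged, so the account embedded in it is used rather than the locally resolved `account_id`. The same pattern is applied to `--dataset` later in this file.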
@@ -470,10 +475,10 @@ def create_rft_command(args) -> int:
             # If still unresolved and multiple tests exist, fail fast to avoid uploading unintended evaluators
             if selected_entry is None and len(tests) > 1:
                 print(
-                    f"Error: Multiple evaluation tests found, and the selected evaluator_id {evaluator_id} does not match any discovered test.\n"
-                    "       Please re-run specifying the evaluator id.\n"
+                    f"Error: Multiple evaluation tests found, and the selected evaluator {evaluator_id} does not match any discovered test.\n"
+                    "       Please re-run specifying the evaluator.\n"
                     "       Hints:\n"
-                    "         - eval-protocol create rft --evaluator-id <existing-evaluator-id>\n"
+                    "         - eval-protocol create rft --evaluator <existing-evaluator-id>\n"
                 )
                 return 1
@@ -523,10 +528,15 @@ def create_rft_command(args) -> int:
             print(f"Warning: Failed to upload evaluator automatically: {e}")
     # Determine dataset id and materialization path
-    dataset_id = getattr(args, "dataset_id", None)
+    dataset_id = getattr(args, "dataset", None)
     dataset_jsonl = getattr(args, "dataset_jsonl", None)
     dataset_display_name = getattr(args, "dataset_display_name", None)
     dataset_builder = getattr(args, "dataset_builder", None)  # accepted but unused in simplified flow
+    dataset_resource_override: Optional[str] = None
+    if isinstance(dataset_id, str) and dataset_id.startswith("accounts/"):
+        # Caller passed a fully-qualified dataset; capture it for body and keep only terminal id for printing
+        dataset_resource_override = dataset_id
+        dataset_id = _extract_terminal_segment(dataset_id)
     if not dataset_id:
         # Prefer explicit --dataset-jsonl, else attempt to extract from the selected test's data loader or input_dataset.
@@ -573,7 +583,7 @@ def create_rft_command(args) -> int:
                             print(f"Warning: dataset builder failed: {e}")
         if not dataset_jsonl:
             print(
-                "Error: Could not determine dataset. Provide --dataset-id or --dataset-jsonl, or ensure a JSONL-based data loader or input_dataset is used in your single discovered test."
+                "Error: Could not determine dataset. Provide --dataset or --dataset-jsonl, or ensure a JSONL-based data loader or input_dataset is used in your single discovered test."
             )
             return 1
@@ -628,6 +638,8 @@ def create_rft_command(args) -> int:
         ("learningRate", "learning_rate"),
         ("maxContextLength", "max_context_length"),
         ("loraRank", "lora_rank"),
+        ("gradientAccumulationSteps", "gradient_accumulation_steps"),
+        ("learningRateWarmupSteps", "learning_rate_warmup_steps"),
         ("acceleratorCount", "accelerator_count"),
         ("region", "region"),
     ]:
@@ -640,14 +652,25 @@ def create_rft_command(args) -> int:
         ("temperature", "temperature"),
         ("topP", "top_p"),
         ("topK", "top_k"),
-        ("maxTokens", "max_tokens"),
-        ("n", "n"),
+        ("maxTokens", "max_output_tokens"),
+        ("n", "response_candidates_count"),
     ]:
         val = getattr(args, arg_name, None)
         if val is not None:
             inference_params[key] = val
-    if getattr(args, "inference_extra_body", None):
-        inference_params["extraBody"] = args.inference_extra_body
+    if getattr(args, "extra_body", None):
+        extra = getattr(args, "extra_body")
+        if isinstance(extra, (dict, list)):
+            try:
+                inference_params["extraBody"] = json.dumps(extra, ensure_ascii=False)
+            except (TypeError, ValueError) as e:
+                print(f"Error: --extra-body dict/list must be JSON-serializable: {e}")
+                return 1
+        elif isinstance(extra, str):
+            inference_params["extraBody"] = extra
+        else:
+            print("Error: --extra-body must be a JSON string or a JSON-serializable dict/list.")
+            return 1
     wandb_config: Optional[Dict[str, Any]] = None
     if getattr(args, "wandb_enabled", False):
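
The `--extra-body` handling added in the hunk above accepts a string, a dict, or a list and always stores a string under `extraBody`. The sketch below restates that logic as a stand-alone function. It is an illustration rather than the package's code: `normalize_extra_body` is a hypothetical name, and it raises `ValueError` where the original prints an error and returns exit code 1.

```python
import json
from typing import Any


def normalize_extra_body(extra: Any) -> str:
    """Return the string form that would be stored under 'extraBody'."""
    if isinstance(extra, (dict, list)):
        try:
            # ensure_ascii=False keeps non-ASCII characters unescaped, as in the diff.
            return json.dumps(extra, ensure_ascii=False)
        except (TypeError, ValueError) as e:
            raise ValueError(f"--extra-body dict/list must be JSON-serializable: {e}") from e
    if isinstance(extra, str):
        # Strings are passed through untouched; they are not validated as JSON.
        return extra
    raise ValueError("--extra-body must be a JSON string or a JSON-serializable dict/list.")


print(normalize_extra_body({"reasoning_effort": "low"}))  # hypothetical parameter name
print(normalize_extra_body('{"reasoning_effort": "low"}'))
```

On the command line argparse delivers `--extra-body` as a string, so the dict/list branch presumably serves callers that build the argument namespace programmatically.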
@@ -659,9 +682,12 @@ def create_rft_command(args) -> int:
             "runId": getattr(args, "wandb_run_id", None),
         }
+    # Build dataset resource (prefer override when provided)
+    dataset_resource = dataset_resource_override or f"accounts/{account_id}/datasets/{dataset_id}"
     body: Dict[str, Any] = {
-        # "displayName": getattr(args, "display_name", None) or f"{evaluator_id}-rft",
-        "dataset": f"accounts/{account_id}/datasets/{dataset_id}",
+        "displayName": getattr(args, "display_name", None),
+        "dataset": dataset_resource,
         "evaluator": evaluator_resource_name,
         "evalAutoCarveout": bool(getattr(args, "eval_auto_carveout", True)),
         "trainingConfig": training_config,
@@ -670,7 +696,8 @@ def create_rft_command(args) -> int:
         "chunkSize": getattr(args, "chunk_size", None),
         "outputStats": None,
         "outputMetrics": None,
-        "mcpServer": None,
+        "mcpServer": getattr(args, "mcp_server", None),
+        "jobId": getattr(args, "job_id", None),
     }
     # Debug: print minimal summary
     print(f"Prepared RFT job for evaluator '{evaluator_id}' using dataset '{dataset_id}'")

{eval_protocol-0.2.86 → eval_protocol-0.2.87}/eval_protocol/fireworks_rft.py RENAMED

@@ -8,6 +8,7 @@ import time
 import uuid
 from pathlib import Path
 from typing import Any, Callable, Dict, Iterable, Optional, Tuple
+from urllib.parse import urlencode
 import requests
@@ -186,6 +187,14 @@ def create_reinforcement_fine_tuning_job(
     body: Dict[str, Any],
 ) -> Dict[str, Any]:
     url = f"{api_base.rstrip('/')}/v1/accounts/{account_id}/reinforcementFineTuningJobs"
+    # Move optional jobId from body to query parameter if provided
+    job_id = body.get("jobId")
+    if isinstance(job_id, str):
+        job_id = job_id.strip()
+    if job_id:
+        # Remove from body and append as query param
+        body.pop("jobId", None)
+        url = f"{url}?{urlencode({'reinforcementFineTuningJobId': job_id})}"
     headers = {
         "Authorization": f"Bearer {api_key}",
         "Content-Type": "application/json",
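
The hunk above moves an optional `jobId` out of the request body and into a `reinforcementFineTuningJobId` query parameter. The sketch below reproduces just that URL construction so it can be run in isolation. `build_rft_job_url`, the `https://api.example.com` base, and the sample IDs are hypothetical; the real function goes on to send the request.

```python
from typing import Any, Dict
from urllib.parse import urlencode


def build_rft_job_url(api_base: str, account_id: str, body: Dict[str, Any]) -> str:
    """Hypothetical helper: returns the POST URL and, like the hunk, mutates `body`."""
    url = f"{api_base.rstrip('/')}/v1/accounts/{account_id}/reinforcementFineTuningJobs"
    job_id = body.get("jobId")
    if isinstance(job_id, str):
        job_id = job_id.strip()
    if job_id:
        # Remove from the body and append as a URL-encoded query parameter.
        body.pop("jobId", None)
        url = f"{url}?{urlencode({'reinforcementFineTuningJobId': job_id})}"
    return url


body = {"jobId": "  my-rft-job  ", "dataset": "accounts/my-acct/datasets/my-dataset"}
url = build_rft_job_url("https://api.example.com/", "my-acct", body)
print(url)
print(body)
```

Note that the key is only popped when the job ID is non-empty after stripping; a `None` or blank `jobId` stays in the body as written in the hunk.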

{eval_protocol-0.2.86 → eval_protocol-0.2.87/eval_protocol.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: eval-protocol
-Version: 0.2.86
+Version: 0.2.87
 Summary: The official Python SDK for Eval Protocol (EP.) EP is an open protocol that standardizes how developers author evals for large language model (LLM) applications.
 Author-email: Fireworks AI <info@fireworks.ai>
 License-Expression: MIT
@@ -113,111 +113,37 @@ Requires-Dist: langfuse>=2.0.0; extra == "proxy"
 Requires-Dist: uuid6>=2025.0.0; extra == "proxy"
 Dynamic: license-file
-# Eval Protocol (EP)
+# Eval Protocol
 [![PyPI - Version](https://img.shields.io/pypi/v/eval-protocol)](https://pypi.org/project/eval-protocol/)
 [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/eval-protocol/python-sdk)
-**The open-source framework to help you write evals for RL.**
+**Eval Protocol (EP) is an open solution for doing reinforcement learning fine-tuning on existing agents — across any language, container, or framework.**
-## 🚀 Features
+![Eval Protocol overview](./docs/intro.png)
-- **Pytest authoring**: `@evaluation_test` decorator to configure evaluations
-- **Robust rollouts**: Handles flaky LLM APIs and parallel execution
-- **Integrations**: Works with Langfuse, LangSmith, Braintrust, Responses API
-- **Agent support**: LangGraph and Pydantic AI
-- **MCP RL envs**: Build reinforcement learning environments with MCP
-- **Built-in benchmarks**: AIME, tau-bench
-- **LLM judge**: Stack-rank models using pairwise Arena-Hard-Auto
-- **Local UI**: Pivot/table views for real-time analysis
+Most teams already have complex agents running in production — often across remote services with heavy dependencies, Docker containers, or TypeScript backends deployed on Vercel. When they try to train or fine-tune these agents with reinforcement learning, connecting them to a trainer quickly becomes painful.
-## ⚡ Quickstart (no labels needed)
+Eval Protocol makes this possible in two ways:
-Install with your tracing platform extras and set API keys:
+1. **Expose your agent through a simple API**
+   Wrap your existing agent (Python, TypeScript, Docker, etc.) in a simple HTTP service using EP’s rollout interface. EP handles the rollout orchestration, metadata passing, and trace storage automatically.
+2. **Connect with any trainer**
+   Once your agent speaks the EP standard, it can be fine-tuned or evaluated with any supported trainer — Fireworks RFT, TRL, Unsloth, or your own — with no environment rewrites.
-```bash
-pip install 'eval-protocol[langfuse]'
+The result: RL that works out-of-the-box for existing production agents.
-# Model API keys (set what you need)
-export OPENAI_API_KEY=...
-export FIREWORKS_API_KEY=...
-export GEMINI_API_KEY=...
+## Who This Is For
-# Platform keys
-export LANGFUSE_PUBLIC_KEY=...
-export LANGFUSE_SECRET_KEY=...
-export LANGFUSE_HOST=https://your-deployment.com  # optional
-```
+- **Applied AI teams** adding RL to existing production agents.
+- **Research engineers** experimenting with fine-tuning complex, multi-turn or tool-using agents.
+- **MLOps teams** building reproducible, language-agnostic rollout pipelines.
-Minimal evaluation using the built-in AHA judge:
+## Quickstart
-```python
-from datetime import datetime
-import pytest
+- See the Quickstart repository: [eval-protocol/quickstart](https://github.com/eval-protocol/quickstart/tree/main)
-from eval_protocol import (
-    evaluation_test,
-    aha_judge,
-    EvaluationRow,
-    SingleTurnRolloutProcessor,
-    DynamicDataLoader,
-    create_langfuse_adapter,
-)
-def langfuse_data_generator() -> list[EvaluationRow]:
-    adapter = create_langfuse_adapter()
-    return adapter.get_evaluation_rows(
-        to_timestamp=datetime.utcnow(),
-        limit=20,
-        sample_size=5,
-    )
-@pytest.mark.parametrize(
-    "completion_params",
-    [
-        {"model": "openai/gpt-4.1"},
-        {"model": "fireworks_ai/accounts/fireworks/models/gpt-oss-120b"},
-    ],
-)
-@evaluation_test(
-    data_loaders=DynamicDataLoader(generators=[langfuse_data_generator]),
-    rollout_processor=SingleTurnRolloutProcessor(),
-)
-async def test_llm_judge(row: EvaluationRow) -> EvaluationRow:
-    return await aha_judge(row)
-```
-Run it:
-```bash
-pytest -q -s
-```
-The pytest output includes local links for a leaderboard and row-level traces (pivot/table) at `http://localhost:8000`.
-## Installation
-This library requires Python >= 3.10.
-### pip
-```bash
-pip install eval-protocol
-```
-### uv (recommended)
-```bash
-# Install uv (if needed)
-curl -LsSf https://astral.sh/uv/install.sh | sh
-# Add to your project
-uv add eval-protocol
-```
-## 📚 Resources
+## Resources
 - **[Documentation](https://evalprotocol.io)** – Guides and API reference
 - **[Discord](https://discord.com/channels/1137072072808472616/1400975572405850155)** – Community
