PyPI - eval-protocol - Versions diffs - 0.2.85__tar.gz → 0.2.87__tar.gz - Mend

eval-protocol 0.2.85tar.gz → 0.2.87tar.gz

Files changed (450)

{eval_protocol-0.2.85/eval_protocol.egg-info → eval_protocol-0.2.87}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: eval-protocol
-Version: 0.2.85
+Version: 0.2.87
 Summary: The official Python SDK for Eval Protocol (EP.) EP is an open protocol that standardizes how developers author evals for large language model (LLM) applications.
 Author-email: Fireworks AI <info@fireworks.ai>
 License-Expression: MIT
@@ -113,113 +113,37 @@ Requires-Dist: langfuse>=2.0.0; extra == "proxy"
 Requires-Dist: uuid6>=2025.0.0; extra == "proxy"
 Dynamic: license-file
-# Eval Protocol (EP)
+# Eval Protocol
 [![PyPI - Version](https://img.shields.io/pypi/v/eval-protocol)](https://pypi.org/project/eval-protocol/)
 [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/eval-protocol/python-sdk)
-**Stop guessing which AI model to use. Build a data-driven model leaderboard.**
+**Eval Protocol (EP) is an open solution for doing reinforcement learning fine-tuning on existing agents — across any language, container, or framework.**
-With hundreds of models and configs, you need objective data to choose the right one for your use case. EP helps you evaluate real traces, compare models, and visualize results locally.
+![Eval Protocol overview](./docs/intro.png)
-## 🚀 Features
+Most teams already have complex agents running in production — often across remote services with heavy dependencies, Docker containers, or TypeScript backends deployed on Vercel. When they try to train or fine-tune these agents with reinforcement learning, connecting them to a trainer quickly becomes painful.
-- **Pytest authoring**: `@evaluation_test` decorator to configure evaluations
-- **Robust rollouts**: Handles flaky LLM APIs and parallel execution
-- **Integrations**: Works with Langfuse, LangSmith, Braintrust, Responses API
-- **Agent support**: LangGraph and Pydantic AI
-- **MCP RL envs**: Build reinforcement learning environments with MCP
-- **Built-in benchmarks**: AIME, tau-bench
-- **LLM judge**: Stack-rank models using pairwise Arena-Hard-Auto
-- **Local UI**: Pivot/table views for real-time analysis
+Eval Protocol makes this possible in two ways:
-## ⚡ Quickstart (no labels needed)
+1. **Expose your agent through a simple API**
+   Wrap your existing agent (Python, TypeScript, Docker, etc.) in a simple HTTP service using EP’s rollout interface. EP handles the rollout orchestration, metadata passing, and trace storage automatically.
+2. **Connect with any trainer**
+   Once your agent speaks the EP standard, it can be fine-tuned or evaluated with any supported trainer — Fireworks RFT, TRL, Unsloth, or your own — with no environment rewrites.
-Install with your tracing platform extras and set API keys:
+The result: RL that works out-of-the-box for existing production agents.
-```bash
-pip install 'eval-protocol[langfuse]'
+## Who This Is For
-# Model API keys (set what you need)
-export OPENAI_API_KEY=...
-export FIREWORKS_API_KEY=...
-export GEMINI_API_KEY=...
+- **Applied AI teams** adding RL to existing production agents.
+- **Research engineers** experimenting with fine-tuning complex, multi-turn or tool-using agents.
+- **MLOps teams** building reproducible, language-agnostic rollout pipelines.
-# Platform keys
-export LANGFUSE_PUBLIC_KEY=...
-export LANGFUSE_SECRET_KEY=...
-export LANGFUSE_HOST=https://your-deployment.com  # optional
-```
+## Quickstart
-Minimal evaluation using the built-in AHA judge:
+- See the Quickstart repository: [eval-protocol/quickstart](https://github.com/eval-protocol/quickstart/tree/main)
-```python
-from datetime import datetime
-import pytest
-from eval_protocol import (
-    evaluation_test,
-    aha_judge,
-    EvaluationRow,
-    SingleTurnRolloutProcessor,
-    DynamicDataLoader,
-    create_langfuse_adapter,
-)
-def langfuse_data_generator() -> list[EvaluationRow]:
-    adapter = create_langfuse_adapter()
-    return adapter.get_evaluation_rows(
-        to_timestamp=datetime.utcnow(),
-        limit=20,
-        sample_size=5,
-    )
-@pytest.mark.parametrize(
-    "completion_params",
-    [
-        {"model": "openai/gpt-4.1"},
-        {"model": "fireworks_ai/accounts/fireworks/models/gpt-oss-120b"},
-    ],
-)
-@evaluation_test(
-    data_loaders=DynamicDataLoader(generators=[langfuse_data_generator]),
-    rollout_processor=SingleTurnRolloutProcessor(),
-)
-async def test_llm_judge(row: EvaluationRow) -> EvaluationRow:
-    return await aha_judge(row)
-```
-Run it:
-```bash
-pytest -q -s
-```
-The pytest output includes local links for a leaderboard and row-level traces (pivot/table) at `http://localhost:8000`.
-## Installation
-This library requires Python >= 3.10.
-### pip
-```bash
-pip install eval-protocol
-```
-### uv (recommended)
-```bash
-# Install uv (if needed)
-curl -LsSf https://astral.sh/uv/install.sh | sh
-# Add to your project
-uv add eval-protocol
-```
-## 📚 Resources
+## Resources
 - **[Documentation](https://evalprotocol.io)** – Guides and API reference
 - **[Discord](https://discord.com/channels/1137072072808472616/1400975572405850155)** – Community

eval_protocol-0.2.87/README.md ADDED

@@ -0,0 +1,39 @@
+# Eval Protocol
+[![PyPI - Version](https://img.shields.io/pypi/v/eval-protocol)](https://pypi.org/project/eval-protocol/)
+[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/eval-protocol/python-sdk)
+**Eval Protocol (EP) is an open solution for doing reinforcement learning fine-tuning on existing agents — across any language, container, or framework.**
+![Eval Protocol overview](./docs/intro.png)
+Most teams already have complex agents running in production — often across remote services with heavy dependencies, Docker containers, or TypeScript backends deployed on Vercel. When they try to train or fine-tune these agents with reinforcement learning, connecting them to a trainer quickly becomes painful.
+Eval Protocol makes this possible in two ways:
+1. **Expose your agent through a simple API**
+   Wrap your existing agent (Python, TypeScript, Docker, etc.) in a simple HTTP service using EP’s rollout interface. EP handles the rollout orchestration, metadata passing, and trace storage automatically.
+2. **Connect with any trainer**
+   Once your agent speaks the EP standard, it can be fine-tuned or evaluated with any supported trainer — Fireworks RFT, TRL, Unsloth, or your own — with no environment rewrites.
+The result: RL that works out-of-the-box for existing production agents.
+## Who This Is For
+- **Applied AI teams** adding RL to existing production agents.
+- **Research engineers** experimenting with fine-tuning complex, multi-turn or tool-using agents.
+- **MLOps teams** building reproducible, language-agnostic rollout pipelines.
+## Quickstart
+- See the Quickstart repository: [eval-protocol/quickstart](https://github.com/eval-protocol/quickstart/tree/main)
+## Resources
+- **[Documentation](https://evalprotocol.io)** – Guides and API reference
+- **[Discord](https://discord.com/channels/1137072072808472616/1400975572405850155)** – Community
+- **[GitHub](https://github.com/eval-protocol/python-sdk)** – Source and examples
+## License
+[MIT](LICENSE)

{eval_protocol-0.2.85 → eval_protocol-0.2.87}/eval_protocol/_version.py RENAMED

@@ -8,11 +8,11 @@ import json
 version_json = '''
 {
- "date": "2025-11-11T15:13:03-0800",
+ "date": "2025-11-12T15:43:06-0800",
  "dirty": false,
  "error": null,
- "full-revisionid": "41b79daeafe6bcb53a8a3183738314596874696d",
- "version": "0.2.85"
+ "full-revisionid": "8ab1c920bb77880deb87f2320c6cf6ea8780458e",
+ "version": "0.2.87"
 }
 '''  # END VERSION_JSON

{eval_protocol-0.2.85 → eval_protocol-0.2.87}/eval_protocol/cli.py RENAMED

@@ -371,13 +371,13 @@ def parse_args(args=None):
         help="Create a Reinforcement Fine-tuning Job on Fireworks",
     )
     rft_parser.add_argument(
-        "--evaluator-id",
-        help="Evaluator ID used during upload; if omitted, derive from local traces or a single discovered test",
+        "--evaluator",
+        help="Evaluator ID or fully-qualified resource (accounts/{acct}/evaluators/{id}); if omitted, derive from local tests",
     )
     # Dataset options
     rft_parser.add_argument(
-        "--dataset-id",
-        help="Use existing Fireworks dataset id (skip local materialization)",
+        "--dataset",
+        help="Use existing dataset (ID or resource 'accounts/{acct}/datasets/{id}') to skip local materialization",
     )
     rft_parser.add_argument(
         "--dataset-jsonl",
@@ -400,6 +400,8 @@ def parse_args(args=None):
     rft_parser.add_argument("--learning-rate", type=float, default=3e-5)
     rft_parser.add_argument("--max-context-length", type=int, default=65536)
     rft_parser.add_argument("--lora-rank", type=int, default=16)
+    rft_parser.add_argument("--gradient-accumulation-steps", type=int, help="Number of gradient accumulation steps")
+    rft_parser.add_argument("--learning-rate-warmup-steps", type=int, help="Number of LR warmup steps")
     rft_parser.add_argument("--accelerator-count", type=int, default=1)
     rft_parser.add_argument("--region", help="Fireworks region enum value")
     rft_parser.add_argument("--display-name", help="RFT job display name")
@@ -407,14 +409,19 @@ def parse_args(args=None):
     rft_parser.add_argument("--eval-auto-carveout", dest="eval_auto_carveout", action="store_true", default=True)
     rft_parser.add_argument("--no-eval-auto-carveout", dest="eval_auto_carveout", action="store_false")
     # Rollout chunking
-    rft_parser.add_argument("--chunk-size", type=int, default=10, help="Data chunk size for rollout batching")
+    rft_parser.add_argument("--chunk-size", type=int, default=100, help="Data chunk size for rollout batching")
     # Inference params
     rft_parser.add_argument("--temperature", type=float)
     rft_parser.add_argument("--top-p", type=float)
     rft_parser.add_argument("--top-k", type=int)
-    rft_parser.add_argument("--max-tokens", type=int, default=32768)
-    rft_parser.add_argument("--n", type=int, default=8)
-    rft_parser.add_argument("--inference-extra-body", help="JSON string for extra inference params")
+    rft_parser.add_argument("--max-output-tokens", type=int, default=32768)
+    rft_parser.add_argument("--response-candidates-count", type=int, default=8)
+    rft_parser.add_argument("--extra-body", help="JSON string for extra inference params")
+    # MCP server (optional)
+    rft_parser.add_argument(
+        "--mcp-server",
+        help="The MCP server resource name to use for the reinforcement fine-tuning job.",
+    )
     # Wandb
     rft_parser.add_argument("--wandb-enabled", action="store_true")
     rft_parser.add_argument("--wandb-project")
@@ -422,7 +429,7 @@ def parse_args(args=None):
     rft_parser.add_argument("--wandb-run-id")
     rft_parser.add_argument("--wandb-api-key")
     # Misc
-    rft_parser.add_argument("--rft-job-id", help="Specify an explicit RFT job id")
+    rft_parser.add_argument("--job-id", help="Specify an explicit RFT job id")
     rft_parser.add_argument("--yes", "-y", action="store_true", help="Non-interactive mode")
     rft_parser.add_argument("--dry-run", action="store_true", help="Print planned REST calls without sending")
     rft_parser.add_argument("--force", action="store_true", help="Overwrite existing evaluator with the same ID")
@@ -447,6 +454,16 @@ def parse_args(args=None):
         action="store_true",
         help="Non-interactive: if multiple tests exist and no --entry, fails with guidance",
     )
+    local_test_parser.add_argument(
+        "--docker-build-extra",
+        default="",
+        help="Extra flags to pass to 'docker build' (quoted string, e.g. \"--no-cache --pull --progress=plain\")",
+    )
+    local_test_parser.add_argument(
+        "--docker-run-extra",
+        default="",
+        help="Extra flags to pass to 'docker run' (quoted string, e.g. \"--env-file .env --memory=8g\")",
+    )
     # Run command (for Hydra-based evaluations)
     # This subparser intentionally defines no arguments itself.
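
The flag renames in this file also change the attribute names that argparse exposes on the parsed namespace, which is why `create_rft.py` switches its `getattr` lookups in the same release. The following is a minimal sketch of that mapping, assuming a standalone parser that mirrors only a handful of the 0.2.87 options shown above. The parser name `rft_sketch` and the sample values are made up for illustration; this is not the package's own `parse_args`.

```python
import argparse

# Standalone parser mirroring a subset of the 0.2.87 `create rft` options.
# Illustration only: the real parser defines many more arguments.
parser = argparse.ArgumentParser(prog="rft_sketch")
parser.add_argument("--evaluator")
parser.add_argument("--dataset")
parser.add_argument("--chunk-size", type=int, default=100)
parser.add_argument("--max-output-tokens", type=int, default=32768)
parser.add_argument("--response-candidates-count", type=int, default=8)
parser.add_argument("--extra-body")
parser.add_argument("--job-id")

# Hypothetical invocation; the account and ids are placeholders.
args = parser.parse_args(
    ["--evaluator", "accounts/acme/evaluators/my-eval", "--dataset", "my-dataset", "--job-id", "run-001"]
)

# argparse turns dashes into underscores when naming namespace attributes.
print(args.evaluator, args.dataset, args.job_id)
print(args.max_output_tokens, args.response_candidates_count, args.chunk_size)
```

Callers that read `args.evaluator_id`, `args.dataset_id`, `args.max_tokens`, `args.n`, `args.inference_extra_body`, or `args.rft_job_id` would need the new attribute names, since the parser definitions in this diff no longer declare the old spellings.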

{eval_protocol-0.2.85 → eval_protocol-0.2.87}/eval_protocol/cli_commands/create_rft.py RENAMED

@@ -344,7 +344,7 @@ def _poll_evaluator_status(
 def create_rft_command(args) -> int:
-    evaluator_id: Optional[str] = getattr(args, "evaluator_id", None)
+    evaluator_id: Optional[str] = getattr(args, "evaluator", None)
     non_interactive: bool = bool(getattr(args, "yes", False))
     dry_run: bool = bool(getattr(args, "dry_run", False))
     force: bool = bool(getattr(args, "force", False))
@@ -373,11 +373,11 @@ def create_rft_command(args) -> int:
             print("No evaluation tests found.")
             print("\nHint: Make sure your tests use the @evaluation_test decorator.")
             return 1
-        # Always interactive selection here (no implicit quiet unless --evaluator-id was provided)
+        # Always interactive selection here
         try:
             selected_tests = _prompt_select(tests, non_interactive=non_interactive)
         except Exception:
-            print("Error: Failed to open selector UI. Please pass --evaluator-id or --entry explicitly.")
+            print("Error: Failed to open selector UI. Please pass --evaluator or --entry explicitly.")
             return 1
         if not selected_tests:
             print("No tests selected.")
@@ -385,7 +385,7 @@ def create_rft_command(args) -> int:
         if len(selected_tests) != 1:
             if non_interactive and len(selected_tests) > 1:
                 print("Error: Multiple evaluation tests found in --yes (non-interactive) mode.")
-                print("       Please pass --evaluator-id or --entry to disambiguate.")
+                print("       Please pass --evaluator or --entry to disambiguate.")
                 try:
                     # Offer candidate evaluator ids for convenience
                     tests = _discover_tests(project_root)
@@ -410,8 +410,13 @@ def create_rft_command(args) -> int:
         selected_test_file_path, selected_test_func_name = _resolve_selected_test(
             project_root, evaluator_id, selected_tests=selected_tests
         )
-    # Resolve evaluator resource name to fully-qualified format required by API
-    evaluator_resource_name = f"accounts/{account_id}/evaluators/{evaluator_id}"
+    # Resolve evaluator resource name to fully-qualified format required by API.
+    # Allow users to pass either short id or fully-qualified resource.
+    if evaluator_id and evaluator_id.startswith("accounts/"):
+        evaluator_resource_name = evaluator_id
+        evaluator_id = _extract_terminal_segment(evaluator_id)
+    else:
+        evaluator_resource_name = f"accounts/{account_id}/evaluators/{evaluator_id}"
     # Optional short-circuit: if evaluator already exists and not forcing, skip upload path
     skip_upload = False
@@ -470,10 +475,10 @@ def create_rft_command(args) -> int:
             # If still unresolved and multiple tests exist, fail fast to avoid uploading unintended evaluators
             if selected_entry is None and len(tests) > 1:
                 print(
-                    f"Error: Multiple evaluation tests found, and the selected evaluator_id {evaluator_id} does not match any discovered test.\n"
-                    "       Please re-run specifying the evaluator id.\n"
+                    f"Error: Multiple evaluation tests found, and the selected evaluator {evaluator_id} does not match any discovered test.\n"
+                    "       Please re-run specifying the evaluator.\n"
                     "       Hints:\n"
-                    "         - eval-protocol create rft --evaluator-id <existing-evaluator-id>\n"
+                    "         - eval-protocol create rft --evaluator <existing-evaluator-id>\n"
                 )
                 return 1
@@ -523,10 +528,15 @@ def create_rft_command(args) -> int:
             print(f"Warning: Failed to upload evaluator automatically: {e}")
     # Determine dataset id and materialization path
-    dataset_id = getattr(args, "dataset_id", None)
+    dataset_id = getattr(args, "dataset", None)
     dataset_jsonl = getattr(args, "dataset_jsonl", None)
     dataset_display_name = getattr(args, "dataset_display_name", None)
     dataset_builder = getattr(args, "dataset_builder", None)  # accepted but unused in simplified flow
+    dataset_resource_override: Optional[str] = None
+    if isinstance(dataset_id, str) and dataset_id.startswith("accounts/"):
+        # Caller passed a fully-qualified dataset; capture it for body and keep only terminal id for printing
+        dataset_resource_override = dataset_id
+        dataset_id = _extract_terminal_segment(dataset_id)
     if not dataset_id:
         # Prefer explicit --dataset-jsonl, else attempt to extract from the selected test's data loader or input_dataset.
@@ -573,7 +583,7 @@ def create_rft_command(args) -> int:
                             print(f"Warning: dataset builder failed: {e}")
         if not dataset_jsonl:
             print(
-                "Error: Could not determine dataset. Provide --dataset-id or --dataset-jsonl, or ensure a JSONL-based data loader or input_dataset is used in your single discovered test."
+                "Error: Could not determine dataset. Provide --dataset or --dataset-jsonl, or ensure a JSONL-based data loader or input_dataset is used in your single discovered test."
             )
             return 1
@@ -628,6 +638,8 @@ def create_rft_command(args) -> int:
         ("learningRate", "learning_rate"),
         ("maxContextLength", "max_context_length"),
         ("loraRank", "lora_rank"),
+        ("gradientAccumulationSteps", "gradient_accumulation_steps"),
+        ("learningRateWarmupSteps", "learning_rate_warmup_steps"),
         ("acceleratorCount", "accelerator_count"),
         ("region", "region"),
     ]:
@@ -640,14 +652,25 @@ def create_rft_command(args) -> int:
         ("temperature", "temperature"),
         ("topP", "top_p"),
         ("topK", "top_k"),
-        ("maxTokens", "max_tokens"),
-        ("n", "n"),
+        ("maxTokens", "max_output_tokens"),
+        ("n", "response_candidates_count"),
     ]:
         val = getattr(args, arg_name, None)
         if val is not None:
             inference_params[key] = val
-    if getattr(args, "inference_extra_body", None):
-        inference_params["extraBody"] = args.inference_extra_body
+    if getattr(args, "extra_body", None):
+        extra = getattr(args, "extra_body")
+        if isinstance(extra, (dict, list)):
+            try:
+                inference_params["extraBody"] = json.dumps(extra, ensure_ascii=False)
+            except (TypeError, ValueError) as e:
+                print(f"Error: --extra-body dict/list must be JSON-serializable: {e}")
+                return 1
+        elif isinstance(extra, str):
+            inference_params["extraBody"] = extra
+        else:
+            print("Error: --extra-body must be a JSON string or a JSON-serializable dict/list.")
+            return 1
     wandb_config: Optional[Dict[str, Any]] = None
     if getattr(args, "wandb_enabled", False):
@@ -659,9 +682,12 @@ def create_rft_command(args) -> int:
             "runId": getattr(args, "wandb_run_id", None),
         }
+    # Build dataset resource (prefer override when provided)
+    dataset_resource = dataset_resource_override or f"accounts/{account_id}/datasets/{dataset_id}"
     body: Dict[str, Any] = {
-        # "displayName": getattr(args, "display_name", None) or f"{evaluator_id}-rft",
-        "dataset": f"accounts/{account_id}/datasets/{dataset_id}",
+        "displayName": getattr(args, "display_name", None),
+        "dataset": dataset_resource,
         "evaluator": evaluator_resource_name,
         "evalAutoCarveout": bool(getattr(args, "eval_auto_carveout", True)),
         "trainingConfig": training_config,
@@ -670,7 +696,8 @@ def create_rft_command(args) -> int:
         "chunkSize": getattr(args, "chunk_size", None),
         "outputStats": None,
         "outputMetrics": None,
-        "mcpServer": None,
+        "mcpServer": getattr(args, "mcp_server", None),
+        "jobId": getattr(args, "job_id", None),
     }
     # Debug: print minimal summary
     print(f"Prepared RFT job for evaluator '{evaluator_id}' using dataset '{dataset_id}'")
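
Both `--evaluator` and `--dataset` now accept either a short id or a fully-qualified resource name, and the hunks above resolve each into a short id (for printing) plus a full resource (for the request body). The sketch below shows that resolution in isolation. The body of `_extract_terminal_segment` is not part of this diff, so `extract_terminal_segment` here is a hypothetical stand-in that assumes it returns the last path segment. `resolve_resource` is likewise an illustrative name, not a function in the package.

```python
from typing import Tuple


def extract_terminal_segment(resource: str) -> str:
    """Return the last path segment (assumed behavior of the package helper)."""
    return resource.rstrip("/").split("/")[-1]


def resolve_resource(value: str, account_id: str, collection: str) -> Tuple[str, str]:
    """Return (short_id, fully_qualified_resource) for either input form."""
    if value.startswith("accounts/"):
        # Already fully qualified: keep it untouched and derive the short id.
        return extract_terminal_segment(value), value
    # Short id: qualify it under the caller's own account.
    return value, f"accounts/{account_id}/{collection}/{value}"


# Placeholder account and ids for illustration.
print(resolve_resource("my-eval", "acme", "evaluators"))
print(resolve_resource("accounts/other/datasets/ds-1", "acme", "datasets"))
```

A fully-qualified input keeps its own account segment, so a resource under a different account than the caller's passes through unchanged in this sketch.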

{eval_protocol-0.2.85 → eval_protocol-0.2.87}/eval_protocol/cli_commands/local_test.py RENAMED

@@ -2,6 +2,7 @@ import argparse
 import os
 import subprocess
 import sys
+import shlex
 from typing import List
 from .upload import _discover_tests, _prompt_select
@@ -24,16 +25,15 @@ def _run_pytest_host(pytest_target: str) -> int:
     return proc.returncode
-def _build_docker_image(dockerfile_path: str, image_tag: str) -> bool:
+def _build_docker_image(dockerfile_path: str, image_tag: str, build_extras: List[str] | None = None) -> bool:
     context_dir = os.path.dirname(dockerfile_path)
     print(f"Building Docker image '{image_tag}' from {dockerfile_path} ...")
     try:
-        proc = subprocess.run(
-            ["docker", "build", "-t", image_tag, "-f", dockerfile_path, context_dir],
-            stdout=subprocess.PIPE,
-            stderr=subprocess.STDOUT,
-            text=True,
-        )
+        base_cmd = ["docker", "build"]
+        if build_extras:
+            base_cmd += build_extras
+        base_cmd += ["-t", image_tag, "-f", dockerfile_path, context_dir]
+        proc = subprocess.run(base_cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
         print(proc.stdout)
         return proc.returncode == 0
     except FileNotFoundError:
@@ -41,7 +41,9 @@ def _build_docker_image(dockerfile_path: str, image_tag: str) -> bool:
         return False
-def _run_pytest_in_docker(project_root: str, image_tag: str, pytest_target: str) -> int:
+def _run_pytest_in_docker(
+    project_root: str, image_tag: str, pytest_target: str, run_extras: List[str] | None = None
+) -> int:
     workdir = "/workspace"
     # Host HOME logs directory to map into container
     host_home = os.path.expanduser("~")
@@ -73,6 +75,8 @@ def _run_pytest_in_docker(project_root: str, image_tag: str, pytest_target: str)
         cmd += ["--user", f"{uid}:{gid}"]
     except Exception:
         pass
+    if run_extras:
+        cmd += run_extras
     cmd += [image_tag, "pytest", pytest_target, "-vs"]
     print("Running in Docker:", " ".join(cmd))
     try:
@@ -91,11 +95,16 @@ def local_test_command(args: argparse.Namespace) -> int:
     entry = getattr(args, "entry", None)
     if entry:
         if "::" in entry:
-            file_part = entry.split("::", 1)[0]
+            file_part, func_part = entry.split("::", 1)
             file_path = (
                 file_part if os.path.isabs(file_part) else os.path.abspath(os.path.join(project_root, file_part))
             )
-            pytest_target = entry
+            # Convert to project-relative like the non-:: path
+            try:
+                rel = os.path.relpath(file_path, project_root)
+            except Exception:
+                rel = file_path
+            pytest_target = f"{rel}::{func_part}"
         else:
             file_path = entry if os.path.isabs(entry) else os.path.abspath(os.path.join(project_root, entry))
             # Use path relative to project_root when possible
@@ -126,6 +135,10 @@ def local_test_command(args: argparse.Namespace) -> int:
         pytest_target = rel
     ignore_docker = bool(getattr(args, "ignore_docker", False))
+    build_extras_str = getattr(args, "docker_build_extra", "") or ""
+    run_extras_str = getattr(args, "docker_run_extra", "") or ""
+    build_extras = shlex.split(build_extras_str) if build_extras_str else []
+    run_extras = shlex.split(run_extras_str) if run_extras_str else []
     if ignore_docker:
         if not pytest_target:
             print("Error: Failed to resolve a pytest target to run.")
@@ -146,14 +159,14 @@ def local_test_command(args: argparse.Namespace) -> int:
         except Exception:
             pass
         image_tag = "ep-evaluator:local"
-        ok = _build_docker_image(dockerfiles[0], image_tag)
+        ok = _build_docker_image(dockerfiles[0], image_tag, build_extras=build_extras)
         if not ok:
             print("Docker build failed. See logs above.")
             return 1
         if not pytest_target:
             print("Error: Failed to resolve a pytest target to run.")
             return 1
-        return _run_pytest_in_docker(project_root, image_tag, pytest_target)
+        return _run_pytest_in_docker(project_root, image_tag, pytest_target, run_extras=run_extras)
     # No Dockerfile: run on host
     if not pytest_target:
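
The new `--docker-build-extra` and `--docker-run-extra` options take one quoted string each, which the code above tokenizes with `shlex.split` and splices into the Docker argv. Here is a minimal sketch of the build side, assuming the same argument order as the diffed `_build_docker_image`. The function name `build_docker_build_cmd` and the paths are illustrative, and nothing is executed.

```python
import shlex
from typing import List, Optional


def build_docker_build_cmd(
    dockerfile_path: str,
    image_tag: str,
    context_dir: str,
    build_extras: Optional[List[str]] = None,
) -> List[str]:
    """Assemble the `docker build` argv with extra flags right after `docker build`."""
    cmd = ["docker", "build"]
    if build_extras:
        cmd += build_extras
    cmd += ["-t", image_tag, "-f", dockerfile_path, context_dir]
    return cmd


# shlex.split honors shell-style quoting, so a quoted value stays one token.
extras = shlex.split("--no-cache --pull --progress=plain")
cmd = build_docker_build_cmd("./Dockerfile", "ep-evaluator:local", ".", extras)
print(cmd)
print(shlex.split('--env-file "my env.txt" --memory=8g'))
```

Because the result is passed to `subprocess.run` as a list rather than through a shell, the extra flags are not subject to shell expansion; only the quoting rules of `shlex.split` apply.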

{eval_protocol-0.2.85 → eval_protocol-0.2.87}/eval_protocol/fireworks_rft.py RENAMED

@@ -8,6 +8,7 @@ import time
 import uuid
 from pathlib import Path
 from typing import Any, Callable, Dict, Iterable, Optional, Tuple
+from urllib.parse import urlencode
 import requests
@@ -186,6 +187,14 @@ def create_reinforcement_fine_tuning_job(
     body: Dict[str, Any],
 ) -> Dict[str, Any]:
     url = f"{api_base.rstrip('/')}/v1/accounts/{account_id}/reinforcementFineTuningJobs"
+    # Move optional jobId from body to query parameter if provided
+    job_id = body.get("jobId")
+    if isinstance(job_id, str):
+        job_id = job_id.strip()
+    if job_id:
+        # Remove from body and append as query param
+        body.pop("jobId", None)
+        url = f"{url}?{urlencode({'reinforcementFineTuningJobId': job_id})}"
     headers = {
         "Authorization": f"Bearer {api_key}",
         "Content-Type": "application/json",
