PyPI - benchmax - Versions diffs - 0.1.2.dev0__tar.gz → 0.1.2.dev2__tar.gz - Mend

benchmax 0.1.2.dev0__tar.gz → 0.1.2.dev2__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (47)

{benchmax-0.1.2.dev0 → benchmax-0.1.2.dev2}/PKG-INFO RENAMED

@@ -1,14 +1,16 @@
 Metadata-Version: 2.4
 Name: benchmax
-Version: 0.1.2.dev0
+Version: 0.1.2.dev2
 Summary: Framework-Agnostic RL Environments for LLM Fine-Tuning
 Author: cgft.io
-Requires-Python: <3.14,>=3.12
+Classifier: Programming Language :: Python :: 3
+Classifier: Operating System :: OS Independent
+Requires-Python: ==3.12.*
 Description-Content-Type: text/markdown
 License-File: LICENSE
 Requires-Dist: aiohttp>=3.13.1
 Requires-Dist: asyncio>=4.0.0
-Requires-Dist: datasets>=4.3.0
+Requires-Dist: datasets>=4.0.0
 Requires-Dist: fastmcp~=2.12.0
 Requires-Dist: pyjwt>=2.10.1
 Requires-Dist: skypilot~=0.8.1
@@ -36,24 +38,19 @@ Dynamic: license-file
 ## 📌 News
-- **[26 Oct 2025]** 🎉 Added support for easy multi-node parallelization across all major cloud providers using [SkyPilot](https://github.com/skypilot-org/skypilot)
-- **[26 Oct 2025]** 🎉 Integration with [SkyRL](https://github.com/NovaSky-AI/SkyRL) for distributed RL training across clusters
+- **[29 Oct 2025]** 🎉 Added support for easy multi-node parallelization across all major cloud providers using [SkyPilot](https://github.com/skypilot-org/skypilot)
+- **[29 Oct 2025]** 🎉 Integration with [SkyRL](https://github.com/NovaSky-AI/SkyRL) for distributed RL training across clusters
 - **[Upcoming]** 🛠️ Integration with Tinker API.
 ## 📘 Quickstart
 **Example: Multi-node parallelization of Excel Env with SkyRL and SkyPilot**
-RL environments can be computationally expensive to run (e.g. codegen). To handle this workload efficiently, we distribute rollouts across multiple nodes using **SkyPilot**, horizontally scaling `benchmax` across cloud providers like GCP, AWS, Azure, etc.
+RL environments can be computationally expensive to run (e.g. running tests). To handle these workloads efficiently, we distribute rollouts across multiple nodes using **SkyPilot**, horizontally scaling `benchmax` across cloud providers like GCP, AWS, Azure, etc.
-**SkyRL** is a training framework `benchmax` is currently integrated with. Use our ***SkyRL*** integration to RL finetune Qwen-2.5 to do spreadsheet manipulation using a excel MCP parallelized across multiple nodes. The environment is defined at `benchmax.envs.excel.excel_env.ExcelEnvSkypilot`
+**SkyRL** is a training framework `benchmax` is currently integrated with. Use our ***SkyRL*** integration to RL finetune Qwen-2.5 to do spreadsheet manipulation using a excel MCP parallelized across multiple nodes. The environment is defined in [`benchmax.envs.excel.excel_env.ExcelEnvSkypilot`](/src/benchmax/envs/excel/excel_env.py)
-1. **Installation**
-    `pip install benchmax[excel,skyrl]`
-2. **Prepare the dataset**
+1. **Prepare the dataset**
     ```bash
     uv run src/benchmax/adapters/skyrl/benchmax_data_process.py \
@@ -64,13 +61,13 @@ RL environments can be computationally expensive to run (e.g. codegen). To handl
     Note: We are using `ExcelEnvLocal` instead of `ExcelEnvSkypilot` because the MCP is only used for listing tools to prepare the system prompt.
-3. **Run training and parallelize Excel environment**
+2. **Run training and parallelize Excel environment**
     ```bash
-    sh examples/skyrl/run_benchmax_excel.sh
+    bash examples/skyrl/run_benchmax_excel.sh
     ```
-This excel env example will spin up 5 nodes with 4 servers per node (total 20 MCP server in parallel). For more details, check out [multi-node parallelization](/src/benchmax/envs/mcp/README.md) and [SkyRL integration](/examples/skyrl/README.md).
+This excel env example will spin up 5 nodes with 20 servers per node (total 100 MCP server in parallel). For more details, check out [multi-node parallelization](/src/benchmax/envs/mcp/README.md) and [SkyRL integration](/examples/skyrl/README.md).
 ## ℹ️ Overview
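The PKG-INFO change narrows `Requires-Python` from `<3.14,>=3.12` to `==3.12.*`, which drops Python 3.13. The following is a minimal stdlib sketch of what each specifier admits. It assumes plain `major.minor.micro` final releases and ignores PEP 440 pre-release and post-release handling. The helper names `allowed_dev0` and `allowed_dev2` are hypothetical.

```python
from typing import Tuple

Version = Tuple[int, int, int]  # (major, minor, micro), final releases only


def allowed_dev0(v: Version) -> bool:
    """0.1.2.dev0: Requires-Python <3.14,>=3.12"""
    return (3, 12, 0) <= v < (3, 14, 0)


def allowed_dev2(v: Version) -> bool:
    """0.1.2.dev2: Requires-Python ==3.12.* (prefix match on major.minor)"""
    return v[:2] == (3, 12)


if __name__ == "__main__":
    for v in [(3, 11, 9), (3, 12, 0), (3, 12, 7), (3, 13, 1), (3, 14, 0)]:
        print(".".join(map(str, v)), allowed_dev0(v), allowed_dev2(v))
```

For real specifier evaluation, `packaging.specifiers.SpecifierSet("==3.12.*").contains("3.13.1")` from the `packaging` library is the more reliable check.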

{benchmax-0.1.2.dev0 → benchmax-0.1.2.dev2}/README.md RENAMED

@@ -20,24 +20,19 @@
 ## 📌 News
-- **[26 Oct 2025]** 🎉 Added support for easy multi-node parallelization across all major cloud providers using [SkyPilot](https://github.com/skypilot-org/skypilot)
-- **[26 Oct 2025]** 🎉 Integration with [SkyRL](https://github.com/NovaSky-AI/SkyRL) for distributed RL training across clusters
+- **[29 Oct 2025]** 🎉 Added support for easy multi-node parallelization across all major cloud providers using [SkyPilot](https://github.com/skypilot-org/skypilot)
+- **[29 Oct 2025]** 🎉 Integration with [SkyRL](https://github.com/NovaSky-AI/SkyRL) for distributed RL training across clusters
 - **[Upcoming]** 🛠️ Integration with Tinker API.
 ## 📘 Quickstart
 **Example: Multi-node parallelization of Excel Env with SkyRL and SkyPilot**
-RL environments can be computationally expensive to run (e.g. codegen). To handle this workload efficiently, we distribute rollouts across multiple nodes using **SkyPilot**, horizontally scaling `benchmax` across cloud providers like GCP, AWS, Azure, etc.
+RL environments can be computationally expensive to run (e.g. running tests). To handle these workloads efficiently, we distribute rollouts across multiple nodes using **SkyPilot**, horizontally scaling `benchmax` across cloud providers like GCP, AWS, Azure, etc.
-**SkyRL** is a training framework `benchmax` is currently integrated with. Use our ***SkyRL*** integration to RL finetune Qwen-2.5 to do spreadsheet manipulation using a excel MCP parallelized across multiple nodes. The environment is defined at `benchmax.envs.excel.excel_env.ExcelEnvSkypilot`
+**SkyRL** is a training framework `benchmax` is currently integrated with. Use our ***SkyRL*** integration to RL finetune Qwen-2.5 to do spreadsheet manipulation using a excel MCP parallelized across multiple nodes. The environment is defined in [`benchmax.envs.excel.excel_env.ExcelEnvSkypilot`](/src/benchmax/envs/excel/excel_env.py)
-1. **Installation**
-    `pip install benchmax[excel,skyrl]`
-2. **Prepare the dataset**
+1. **Prepare the dataset**
     ```bash
     uv run src/benchmax/adapters/skyrl/benchmax_data_process.py \
@@ -48,13 +43,13 @@ RL environments can be computationally expensive to run (e.g. codegen). To handl
     Note: We are using `ExcelEnvLocal` instead of `ExcelEnvSkypilot` because the MCP is only used for listing tools to prepare the system prompt.
-3. **Run training and parallelize Excel environment**
+2. **Run training and parallelize Excel environment**
     ```bash
-    sh examples/skyrl/run_benchmax_excel.sh
+    bash examples/skyrl/run_benchmax_excel.sh
     ```
-This excel env example will spin up 5 nodes with 4 servers per node (total 20 MCP server in parallel). For more details, check out [multi-node parallelization](/src/benchmax/envs/mcp/README.md) and [SkyRL integration](/examples/skyrl/README.md).
+This excel env example will spin up 5 nodes with 20 servers per node (total 100 MCP server in parallel). For more details, check out [multi-node parallelization](/src/benchmax/envs/mcp/README.md) and [SkyRL integration](/examples/skyrl/README.md).
 ## ℹ️ Overview
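The dev2 example raises the per-node server count from 4 to 20, so the MCP pool grows from 5 × 4 = 20 servers to 5 × 20 = 100 servers. The sketch below is purely illustrative. It assumes rollouts are spread over the pool round-robin, and benchmax's actual routing is not shown in this diff. The hypothetical helpers enumerate `(node, server)` slots and assign rollout IDs to them.

```python
from typing import Dict, List, Tuple

Slot = Tuple[int, int]  # (node index, server index on that node)


def server_slots(num_nodes: int, servers_per_node: int) -> List[Slot]:
    """Enumerate every server slot, interleaving nodes so that
    consecutive rollouts land on different machines."""
    return [
        (node, server)
        for server in range(servers_per_node)
        for node in range(num_nodes)
    ]


def assign_round_robin(rollout_ids: List[str], slots: List[Slot]) -> Dict[str, Slot]:
    """Hypothetical policy: rollout i goes to slot i mod len(slots)."""
    return {rid: slots[i % len(slots)] for i, rid in enumerate(rollout_ids)}


if __name__ == "__main__":
    dev0_pool = server_slots(5, 4)
    dev2_pool = server_slots(5, 20)
    print(len(dev0_pool), len(dev2_pool))  # 20 100
    ids = [f"rollout-{i}" for i in range(101)]
    placement = assign_round_robin(ids, dev2_pool)
    print(placement["rollout-1"], placement["rollout-5"])  # (1, 0) (0, 1)
```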

benchmax-0.1.2.dev2/pyproject.toml ADDED

@@ -0,0 +1,62 @@
+[project]
+name = "benchmax"
+version = "0.1.2.dev2"
+description = "Framework-Agnostic RL Environments for LLM Fine-Tuning"
+readme = "README.md"
+authors = [{ name = "cgft.io" }]
+requires-python = "==3.12.*"
+dependencies = [
+    "aiohttp>=3.13.1",
+    "asyncio>=4.0.0",
+    "datasets>=4.0.0",
+    "fastmcp~=2.12.0",
+    "pyjwt>=2.10.1",
+    "skypilot~=0.8.1",
+]
+classifiers = [
+    "Programming Language :: Python :: 3",
+    "Operating System :: OS Independent",
+]
+[build-system]
+requires = ["setuptools>=61.0", "wheel"]
+build-backend = "setuptools.build_meta"
+[tool.setuptools.packages.find]
+where = ["src"]
+[dependency-groups]
+dev = [
+    "pytest>=8.4.2",
+    "pytest-asyncio>=1.2.0",
+    "python-dotenv>=1.2.1",
+    "ruff>=0.14.2",
+]
+skypilot = [
+    "skypilot[aws,gcp,azure]~=0.8.1", # Change this to your cloud provider
+    "pip>=25.3",                      # Added as needed for skypilot launch
+    "msrestazure>=0.6.4.post1",
+]
+skyrl = [
+    "grpcio>=1.60.0",
+    "hydra-core>=1.3.2",
+    "omegaconf>=2.3.0",
+    "ray>=2.48.0",
+    "skyrl-gym>=0.1.1",
+    "skyrl-train[vllm]>=0.2.0",
+]
+excel = ["openpyxl>=3.1.5"]
+excel-mac-windows = ["openpyxl>=3.1.5", "xlwings>=0.33.16"]
+crm = ["python-dateutil>=2.9.0.post0", "simple-salesforce>=1.12.9"]
+[tool.uv]
+conflicts = [[{ group = "skypilot" }, { group = "skyrl" }]]
+[tool.uv.pip]
+extra = ["dev", "skypilot", "skyrl", "excel", "excel-mac-windows", "crm"]
+[tool.uv.extra-build-dependencies]
+flash-attn = [{ requirement = "torch", match-runtime = true }]
+[tool.uv.extra-build-variables]
+flash-attn = { FLASH_ATTENTION_SKIP_CUDA_BUILD = "TRUE" }

{benchmax-0.1.2.dev0 → benchmax-0.1.2.dev2}/src/benchmax/adapters/benchmax_wrapper.py RENAMED

@@ -6,6 +6,9 @@ from typing import Dict, List, Any, Optional, Type, Union
 from benchmax.envs.base_env import BaseEnv
+# 5 minutes timeout in seconds
+RAY_GET_TIMEOUT = 300
 class BenchmaxEnv:
     """
@@ -34,6 +37,10 @@ class BenchmaxEnv:
     async def init_rollout(self, rollout_id: str, **rollout_args: Any) -> None:
         return await self._env.init_rollout(rollout_id=rollout_id, **rollout_args)
+    @ray.method
+    async def release_rollout(self, rollout_id: str) -> None:
+        return await self._env.release_rollout(rollout_id)
     @ray.method
     async def copy_to_workspace(
         self, rollout_id: str, src_path: Path, dst_filename: Optional[str] = None
@@ -95,7 +102,7 @@ class BenchmaxEnvWrapper:
         obj_ref: ray.ObjectRef[str] = self._actor.get_system_prompt.remote(
             add_tool_defs=add_tool_defs  # type: ignore
         )
-        return ray.get(obj_ref)
+        return ray.get(obj_ref, timeout=RAY_GET_TIMEOUT)
     async def get_system_prompt(self, add_tool_defs: bool = True) -> str:
         """Async method to get system prompt."""
@@ -113,7 +120,7 @@ class BenchmaxEnvWrapper:
     def list_tools_sync(self) -> List[Any]:
         """Sync method to list available tools."""
         obj_ref: ray.ObjectRef[List[Any]] = self._actor.list_tools.remote()  # type: ignore
-        return ray.get(obj_ref)
+        return ray.get(obj_ref, timeout=RAY_GET_TIMEOUT)
     # === Shutdown ===
     async def shutdown(self) -> None:
@@ -124,9 +131,9 @@ class BenchmaxEnvWrapper:
     def shutdown_sync(self) -> None:
         """Sync method to shutdown the environment."""
         obj_ref: ray.ObjectRef[Any] = self._actor.shutdown.remote()
-        ray.get(obj_ref)
+        ray.get(obj_ref, timeout=RAY_GET_TIMEOUT)
-    # === Init Rollout ===
+    # === Rollout Lifecycle ===
     async def init_rollout(self, rollout_id: str, **rollout_args: Any) -> None:
         """Async method to initialize a rollout."""
         obj_ref: ray.ObjectRef[Any] = self._actor.init_rollout.remote(
@@ -139,7 +146,15 @@ class BenchmaxEnvWrapper:
         obj_ref: ray.ObjectRef[Any] = self._actor.init_rollout.remote(
             rollout_id, **rollout_args
         )
-        ray.get(obj_ref)
+        ray.get(obj_ref, timeout=RAY_GET_TIMEOUT)
+    async def release_rollout(self, rollout_id: str) -> None:
+        obj_ref: ray.ObjectRef[Any] = self._actor.release_rollout.remote(rollout_id)
+        await obj_ref
+    def release_rollout_sync(self, rollout_id: str) -> None:
+        obj_ref: ray.ObjectRef[Any] = self._actor.release_rollout.remote(rollout_id)
+        ray.get(obj_ref, timeout=RAY_GET_TIMEOUT)
     # === Run Tool ===
     async def run_tool(self, rollout_id: str, tool_name: str, **tool_args: Any) -> Any:
@@ -154,7 +169,7 @@ class BenchmaxEnvWrapper:
         obj_ref: ray.ObjectRef[Any] = self._actor.run_tool.remote(
             rollout_id, tool_name, **tool_args
         )
-        return ray.get(obj_ref)
+        return ray.get(obj_ref, timeout=RAY_GET_TIMEOUT)
     # === Copy to Workspace ===
     async def copy_to_workspace(
@@ -183,7 +198,7 @@ class BenchmaxEnvWrapper:
             Path(src_path),
             dst_filename=dst_filename,  # type: ignore
         )
-        ray.get(obj_ref)
+        ray.get(obj_ref, timeout=RAY_GET_TIMEOUT)
     # === Copy Content to Workspace ===
     async def copy_content_to_workspace(
@@ -212,7 +227,7 @@ class BenchmaxEnvWrapper:
             src_content,
             dst_filename=dst_filename,  # type: ignore
         )
-        ray.get(obj_ref)
+        ray.get(obj_ref, timeout=RAY_GET_TIMEOUT)
     # === Copy from Workspace ===
     async def copy_from_workspace(
@@ -237,7 +252,7 @@ class BenchmaxEnvWrapper:
         obj_ref: ray.ObjectRef[Any] = self._actor.copy_from_workspace.remote(
             rollout_id, src_filename, Path(dst_path)
         )
-        ray.get(obj_ref)
+        ray.get(obj_ref, timeout=RAY_GET_TIMEOUT)
     # === Compute Reward ===
     async def compute_reward(
@@ -264,4 +279,4 @@ class BenchmaxEnvWrapper:
         obj_ref: ray.ObjectRef[Dict[str, float]] = self._actor.compute_reward.remote(
             rollout_id, completion, ground_truth, **kwargs
         )  # type: ignore
-        return ray.get(obj_ref)
+        return ray.get(obj_ref, timeout=RAY_GET_TIMEOUT)
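Every blocking `ray.get` in `BenchmaxEnvWrapper` now passes `timeout=RAY_GET_TIMEOUT` (300 s). A hung actor call therefore surfaces as an exception instead of blocking the trainer indefinitely. In Ray, the exception raised on expiry is `ray.exceptions.GetTimeoutError`.

To keep the sketch runnable without a Ray cluster, it substitutes `concurrent.futures` for Ray. `call_with_timeout`, `EnvCallTimeout` and `slow_tool` are hypothetical names.

```python
import concurrent.futures
import time
from typing import Any, Callable

RAY_GET_TIMEOUT = 300  # seconds, same value as in benchmax_wrapper.py


class EnvCallTimeout(RuntimeError):
    """Hypothetical error raised when an env call exceeds its time budget."""


def call_with_timeout(
    pool: concurrent.futures.Executor,
    fn: Callable[..., Any],
    *args: Any,
    timeout: float = RAY_GET_TIMEOUT,
    **kwargs: Any,
) -> Any:
    """Run fn in the pool and wait at most `timeout` seconds for its result."""
    future = pool.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError as exc:
        # Waiting stops here, but the submitted work keeps running, much as
        # a timed-out ray.get does not stop the remote actor method.
        raise EnvCallTimeout(f"{fn.__name__} exceeded {timeout}s") from exc


def slow_tool(seconds: float) -> str:
    """Stand-in for a slow remote tool call."""
    time.sleep(seconds)
    return "done"


if __name__ == "__main__":
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        print(call_with_timeout(pool, slow_tool, 0.01, timeout=1.0))  # done
        try:
            call_with_timeout(pool, slow_tool, 0.5, timeout=0.05)
        except EnvCallTimeout as err:
            print("timed out:", err)
```

The new async `release_rollout` awaits its ObjectRef directly, so this timeout does not apply there. In async code, `asyncio.wait_for` would be the analogous guard.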

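The dev2 wrapper adds `release_rollout` and `release_rollout_sync` next to `init_rollout`, under the renamed "Rollout Lifecycle" section. The sketch below pairs the two with `try`/`finally`. It assumes the caller should release a rollout once its reward is computed or a tool call fails. `FakeEnv` is a hypothetical in-memory stand-in that copies only the method names and argument order visible in the diff, and the tool name `read_sheet` is made up.

```python
import asyncio
from typing import Any, Dict, Set


class FakeEnv:
    """Hypothetical stand-in mirroring BenchmaxEnvWrapper's async method names."""

    def __init__(self) -> None:
        self.active: Set[str] = set()

    async def init_rollout(self, rollout_id: str, **rollout_args: Any) -> None:
        self.active.add(rollout_id)

    async def run_tool(self, rollout_id: str, tool_name: str, **tool_args: Any) -> Any:
        if tool_name == "fail":
            raise RuntimeError("tool failed")
        return f"{tool_name} ok"

    async def compute_reward(
        self, rollout_id: str, completion: str, ground_truth: Any, **kwargs: Any
    ) -> Dict[str, float]:
        return {"exact_match": float(completion == ground_truth)}

    async def release_rollout(self, rollout_id: str) -> None:
        self.active.discard(rollout_id)


async def run_episode(
    env: FakeEnv, rollout_id: str, tool_name: str, completion: str, ground_truth: str
) -> Dict[str, float]:
    await env.init_rollout(rollout_id)
    try:
        await env.run_tool(rollout_id, tool_name)
        return await env.compute_reward(rollout_id, completion, ground_truth)
    finally:
        # Release per-rollout resources even if a tool call or the reward raised.
        await env.release_rollout(rollout_id)


if __name__ == "__main__":
    env = FakeEnv()
    print(asyncio.run(run_episode(env, "r1", "read_sheet", "42", "42")))  # {'exact_match': 1.0}
    print(env.active)  # set()
```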
{benchmax-0.1.2.dev0 → benchmax-0.1.2.dev2}/src/benchmax/adapters/skyrl/benchmax_data_process.py RENAMED

@@ -18,6 +18,7 @@ from benchmax.envs.base_env import BaseEnv
 # Set logging level to WARNING and above
 logging.basicConfig(level=logging.WARNING)
 def load_class(dotted_path: str) -> Type[BaseEnv]:
     """
     Load and return the class specified by `dotted_path`.
@@ -40,18 +41,58 @@ def load_class(dotted_path: str) -> Type[BaseEnv]:
     return cls
+def get_canonical_class_name(cls: Type[BaseEnv]) -> str:
+    """
+    Get the canonical class name, removing local/skypilot prefix/suffix if the parent class
+    has the same name without that prefix/suffix.
+    """
+    class_name = cls.__name__
+    # Check for prefixes/suffixes to strip
+    prefixes = ["local", "skypilot"]
+    suffixes = ["local", "skypilot"]
+    # Try to find a matching parent class without the prefix/suffix
+    for base_cls in cls.__bases__:
+        base_name = base_cls.__name__
+        # Check if current class has prefix that base doesn't
+        for prefix in prefixes:
+            if class_name.lower().startswith(
+                prefix
+            ) and not base_name.lower().startswith(prefix):
+                # Check if removing prefix gives us the base name
+                stripped = class_name[len(prefix) :]
+                if stripped == base_name:
+                    return base_name
+        # Check if current class has suffix that base doesn't
+        for suffix in suffixes:
+            if class_name.lower().endswith(suffix) and not base_name.lower().endswith(
+                suffix
+            ):
+                # Check if removing suffix gives us the base name
+                stripped = class_name[: -len(suffix)]
+                if stripped == base_name:
+                    return base_name
+    # No matching parent found, return original name
+    return class_name
 async def get_system_prompt(cls: Type[BaseEnv]) -> str:
     """Setup env and get system prompt in async context."""
     # Initialize env with num_local_servers=1 if supported
     init_signature = inspect.signature(cls.__init__)
-    if 'num_local_servers' in init_signature.parameters:
-        env = cls(num_local_servers=1) # type: ignore
+    if "num_local_servers" in init_signature.parameters:
+        env = cls(num_local_servers=1)  # type: ignore
     else:
         env = cls()
     # Get system prompt (async function)
     prompt = await env.get_system_prompt(add_tool_defs=True)
     await env.shutdown()
     return prompt
@@ -96,11 +137,16 @@ if __name__ == "__main__":
     print("Getting system prompt...", flush=True)
     system_prompt = asyncio.run(get_system_prompt(benchmax_cls))
+    # Get canonical class name (strips local/skypilot if parent matches)
+    canonical_name = get_canonical_class_name(benchmax_cls)
     def process_example(example):
         """Single mapping function that does all processing."""
         # First apply dataset-specific preprocessing
-        standardized = benchmax_cls.dataset_preprocess(example, dataset_path=dataset_path)
+        standardized = benchmax_cls.dataset_preprocess(
+            example, dataset_path=dataset_path
+        )
         # Then format as multiturn prompt
         prompt = [
             {
@@ -112,13 +158,13 @@ if __name__ == "__main__":
         result = {
             **standardized,
             "prompt": prompt,
-            "env_class": benchmax_cls.__name__,
-            "data_source": benchmax_cls.__name__,
+            "env_class": canonical_name,
+            "data_source": canonical_name,
         }
         # Remove keys with None values
         result = {k: v for k, v in result.items() if v is not None}
         return result
     print("Processing examples...", flush=True)
@@ -145,4 +191,4 @@ if __name__ == "__main__":
     print(f"Saving to {args.local_dir}...", flush=True)
     local_dir = Path(args.local_dir)
     train_dataset.to_parquet(local_dir / "train.parquet")
-    test_dataset.to_parquet(local_dir / "test.parquet")
+    test_dataset.to_parquet(local_dir / "test.parquet")
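In dev0, the processed dataset recorded `benchmax_cls.__name__`, so data prepared with `ExcelEnvLocal` carried that exact name in `env_class` and `data_source`. The new `get_canonical_class_name` strips a `local` or `skypilot` prefix or suffix when the result equals a direct parent's name. This is presumably so that data prepared locally can be consumed by the SkyPilot variant during training.

The sketch restates the helper's logic without the `BaseEnv` type and runs it on a hypothetical class hierarchy. Whether benchmax's real `ExcelEnvLocal` and `ExcelEnvSkypilot` subclass an `ExcelEnv` base is an assumption.

```python
from typing import List

PREFIXES: List[str] = ["local", "skypilot"]
SUFFIXES: List[str] = ["local", "skypilot"]


def get_canonical_class_name(cls: type) -> str:
    """Same logic as the dev2 helper, typed against `type` instead of BaseEnv."""
    class_name = cls.__name__
    for base_cls in cls.__bases__:
        base_name = base_cls.__name__
        for prefix in PREFIXES:
            # e.g. LocalExcelEnv -> ExcelEnv when the parent is ExcelEnv
            if class_name.lower().startswith(prefix) and not base_name.lower().startswith(prefix):
                if class_name[len(prefix):] == base_name:
                    return base_name
        for suffix in SUFFIXES:
            # e.g. ExcelEnvSkypilot -> ExcelEnv when the parent is ExcelEnv
            if class_name.lower().endswith(suffix) and not base_name.lower().endswith(suffix):
                if class_name[: -len(suffix)] == base_name:
                    return base_name
    return class_name


# Hypothetical hierarchy; benchmax's real class layout may differ.
class ExcelEnv: ...
class ExcelEnvLocal(ExcelEnv): ...
class ExcelEnvSkypilot(ExcelEnv): ...
class LocalExcelEnv(ExcelEnv): ...
class StandaloneEnvLocal: ...  # only parent is `object`, so nothing is stripped


if __name__ == "__main__":
    for cls in (ExcelEnv, ExcelEnvLocal, ExcelEnvSkypilot, LocalExcelEnv, StandaloneEnvLocal):
        print(cls.__name__, "->", get_canonical_class_name(cls))
```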

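`get_system_prompt` builds a throwaway env with a single local server when the constructor accepts `num_local_servers`, and with no arguments otherwise. The following is a self-contained sketch of that `inspect.signature` check, using hypothetical `SingleServerEnv` and `PooledEnv` classes.

```python
import inspect


class SingleServerEnv:
    """Hypothetical env whose constructor takes no server-count argument."""

    def __init__(self) -> None:
        self.servers = 1


class PooledEnv:
    """Hypothetical env that normally starts several local MCP servers."""

    def __init__(self, num_local_servers: int = 4) -> None:
        self.servers = num_local_servers


def make_env(cls: type):
    # Pass num_local_servers=1 only when the constructor declares it.
    params = inspect.signature(cls.__init__).parameters
    if "num_local_servers" in params:
        return cls(num_local_servers=1)
    return cls()


if __name__ == "__main__":
    print(make_env(SingleServerEnv).servers, make_env(PooledEnv).servers)  # 1 1
```

One consequence of this design is that a constructor accepting only `**kwargs` does not list `num_local_servers` among its parameters, so it is called with no arguments.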