benchmax 0.1.1.dev1__tar.gz → 0.1.1.dev2__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- benchmax-0.1.1.dev2/PKG-INFO +401 -0
- benchmax-0.1.1.dev2/README.md +377 -0
- {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/pyproject.toml +2 -2
- benchmax-0.1.1.dev1/PKG-INFO +0 -43
- benchmax-0.1.1.dev1/README.md +0 -19
- {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/LICENSE +0 -0
- {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/adapters/__init__.py +0 -0
- {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/adapters/verifiers/verifiers_adapters.py +0 -0
- {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/adapters/verl/benchmax_data_process.py +0 -0
- {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/envs/__init__.py +0 -0
- {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/envs/base_env.py +0 -0
- {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/envs/bounded_dict.py +0 -0
- {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/envs/crm/README.md +0 -0
- {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/envs/crm/crm_env.py +0 -0
- {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/envs/crm/salesforce_mcp.py +0 -0
- {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/envs/crm/salesforce_requirements.txt +0 -0
- {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/envs/excel/README.md +0 -0
- {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/envs/excel/data_utils.py +0 -0
- {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/envs/excel/excel_code_runner_mcp.py +0 -0
- {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/envs/excel/excel_env.py +0 -0
- {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/envs/excel/excel_utils.py +0 -0
- {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/envs/local_mcp_env.py +0 -0
- {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/envs/math/README.md +0 -0
- {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/envs/math/math_env.py +0 -0
- {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/envs/types.py +0 -0
- {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/envs/wikipedia/README.md +0 -0
- {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/envs/wikipedia/utils.py +0 -0
- {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/envs/wikipedia/wiki_env.py +0 -0
- {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/prompts/tools.py +0 -0
benchmax-0.1.1.dev2/PKG-INFO ADDED

@@ -0,0 +1,401 @@

Metadata-Version: 2.3
Name: benchmax
Version: 0.1.1.dev2
Summary: Framework-Agnostic RL Environments for LLM Fine-Tuning
Author: cgft.io
Requires-Python: >=3.11,<3.13
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Provides-Extra: crm
Provides-Extra: excel
Provides-Extra: excel-linux
Provides-Extra: verifiers
Provides-Extra: verl
Requires-Dist: fastmcp (>=2.10.0,<2.11.0)
Requires-Dist: openpyxl (==3.1.5) ; extra == "excel-linux" or extra == "excel"
Requires-Dist: python-dateutil (>=2.9.0,<2.10.0) ; extra == "crm"
Requires-Dist: sglang[all] (==0.4.9) ; extra == "verl"
Requires-Dist: simple-salesforce (>=1.12.3) ; extra == "crm"
Requires-Dist: verifiers[train] (>=0.1.1,<0.2.0) ; extra == "verifiers"
Requires-Dist: verl-cgft-fork (==0.4.1.dev1) ; extra == "verl"
Requires-Dist: xlwings (==0.33.15) ; extra == "excel"
Description-Content-Type: text/markdown

<picture>
  <img alt="Benchmax" src="./static/benchmax.png" width="full">
</picture>
## benchmax: Framework-Agnostic RL Environments for LLM Fine-Tuning

*A lightweight, training-framework-agnostic library for defining, running, and parallelizing environments to fine-tune OSS LLMs with reinforcement learning.*

<div align="center">
</div>
<div id="badges" align="center">
  <a href="https://cgft.io">
    <img src="https://img.shields.io/badge/cgft.io-blue?style=for-the-badge" alt="Website"/>
  </a>
  <a href="https://x.com/cgftlabs">
    <img src="https://img.shields.io/badge/Follow @cgftlabs-black?style=for-the-badge&logo=X&logoColor=white" alt="@cgftlabs"/>
  </a>
</div>
<div align="center" style="line-height: 1;">
  <a href="https://github.com/girishbarca/benchmax/blob/main/LICENSE"><img alt="License" src="https://img.shields.io/badge/License-Apache_2.0-blue.svg"/></a>
</div>
## Overview

`benchmax` comes with:

- A collection of ready-to-use reinforcement learning (RL) environments for LLM fine-tuning, ranging from multi-hop search to spreadsheet manipulation to CRM agents
- An easy way to define, compose, and parallelize your own environments, including leveraging the existing ecosystem of MCP servers
- Built-in integrations with popular RL training libraries (verl, verifiers, etc.). `benchmax` is trainer-agnostic by design

Define your environment as:

1. A **toolset** (LLM calls, external APIs, calculators, MCPs, etc.).
2. **Output parsing** logic to extract structured observations.
3. **Reward functions** to score model outputs.

Rollout management, parallel execution, etc. come out of the box.

⭐ Star our repository to show your support!
## 💡 Core Features

**Built-in examples & templates**

Get started with ready-to-use recipes, from Wikipedia search to spreadsheet manipulation. Easy to copy, customize, and extend. And yes, more are on the way.

**Trainer Integrations**

Use your own trainer or training framework - no lock-in. `benchmax` is already integrated into verl and verifiers, with more integrations (SkyRL, etc.) coming soon!

**MCP Support**

Tap into the growing MCP ecosystem and integrate existing MCP servers as tools within your environments.

**Parallel execution & state management**

- Local multi-process pool
- State is isolated across rollouts (e.g. editing files on the local filesystem); see the sketch below
- Multi-node parallelization (coming soon!)
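To make the isolation model concrete, here is a rough, illustrative sketch of fanning rollouts out over a local process pool. It is not `benchmax`'s actual pool implementation: it assumes the environment interface documented further down (`dataset_preprocess`, `init_rollout`, `get_rollout_workspace`), treats the standardized example as a plain dict, and leaves the agent loop as a placeholder.

```python
# Illustrative only: when you use the verl/verifiers integrations, benchmax's own
# rollout management handles this for you.
from concurrent.futures import ProcessPoolExecutor

from benchmax.envs.math.math_env import MathEnv  # pre-built env used in the Quickstart

def run_one_rollout(index_and_example):
    i, example = index_and_example
    env = MathEnv()
    # Standardized example treated as a dict here; adjust the access if your
    # version returns an object instead.
    std = env.dataset_preprocess(example)
    rollout_id = f"rollout-{i}"
    # Each rollout gets its own id and its own workspace, so file edits don't collide.
    env.init_rollout(rollout_id, **std["init_rollout_args"])
    workspace = env.get_rollout_workspace(rollout_id)
    # ... run the LLM/tool-calling loop here and score it with the env's reward functions ...
    return rollout_id, workspace

examples = [{"task": "What is 2 + 3?", "answer": "5"}, {"task": "What is 7 * 6?", "answer": "42"}]
with ProcessPoolExecutor(max_workers=4) as pool:
    for rollout_id, workspace in pool.map(run_one_rollout, enumerate(examples)):
        print(rollout_id, workspace)
```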
## 📘 Quickstart

**Example: Math Question Answering with a Calculator MCP**

**verl** is one of the training frameworks `benchmax` is currently integrated with. Use our ***verl*** integration to RL-finetune a Qwen model (Qwen2.5-3B in the example script below) to do math using a calculator MCP (https://github.com/githejie/mcp-server-calculator). The environment is defined at `benchmax.envs.math.math_env.MathEnv`.

1. **Installation**

`pip install benchmax[verl]`

2. **Prepare the dataset**

```bash
python benchmax/adapters/verl/benchmax_data_process.py \
    --local_dir ~/data/math \
    --dataset_name dawidmt/arithmetic50 \
    --env_path benchmax.envs.math.math_env.MathEnv
```

3. **Run training**

```bash
sh examples/verl/run_qwen2.5-3b_benchmax_math.sh
```

This math environment is just a quick example. Explore some of the more complex environments, like `excel` and `crm`, in `benchmax/envs`.
## 🌐 Creating & Training with Environments

### What is an environment?

An environment consists of:

- A list of tools that an LLM can call
- A list of reward functions that evaluate the quality & correctness of the model's final output

We also support MCP servers natively, allowing you to easily leverage the many servers built by the community.
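For orientation, here is a minimal, illustrative sketch of how a trainer-side loop could drive an environment. It is not `benchmax`'s actual rollout manager: it assumes the method names shown later in this README (`dataset_preprocess`, `init_rollout`, `get_rollout_workspace`, `list_tools`, `run_tool`), treats the standardized example as a plain dict, and leaves the LLM/agent step as a placeholder.

```python
# Illustrative only: the verl/verifiers integrations handle rollout management for you.
from benchmax.envs.math.math_env import MathEnv  # pre-built environment used in the Quickstart

env = MathEnv()
example = {"task": "What is 2 + 3 * 4?", "answer": "14"}

# Standardize the raw example (treated as a dict here; adjust if your version returns an object).
std = env.dataset_preprocess(example)

rollout_id = "rollout-0"
env.init_rollout(rollout_id, **std["init_rollout_args"])  # per-rollout setup, assumed provided by LocalMCPEnv

tools = env.list_tools()  # tool schemas to expose to the LLM
# ... run your agent loop here: feed env.system_prompt + std["prompt"] to the model,
# call env.run_tool(rollout_id, tool_name, **tool_args) whenever it requests a tool,
# and collect its final completion string ...
completion = "<answer>14</answer>"  # placeholder for the model's final output

workspace = env.get_rollout_workspace(rollout_id)
scores = [fn(std["prompt"], completion, std["ground_truth"], workspace) for fn in env.reward_funcs]
print(scores)  # e.g. [1.0]
```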
### Pre-built environments

Ready-to-use environments with pre-configured tools and reward functions:

- [CRM](benchmax/envs/crm/README.md)
- [Excel](benchmax/envs/excel/README.md)
- [Math](benchmax/envs/math/README.md)
- [Wikipedia](benchmax/envs/wikipedia/README.md)

### How do I create a custom environment?
<details>
<summary>With existing MCP Servers</summary>

To create a custom environment using an MCP server (like a calculator, browser, or spreadsheet), you can extend `LocalMCPEnv`. Here's a quick step-by-step guide using `benchmax.envs.math.math_env.MathEnv` as an example.

### 1. **Define a System Prompt**

This prompt guides the LLM's behavior. It can include any instruction, such as how to format the answer or when to use tools.

```python
SYSTEM_PROMPT = """Please use the tools provided to do any computation.
Write your complete answer on the final line only, within the xml tags <answer></answer>.
"""
```
### 2. **Configure MCP Server(s)**

Define the MCP servers to be launched. You can configure one or more:

```python
MCP_CONFIG = """
{
    "mcpServers": {
        "server-name": {
            "command": "uvx",
            "args": ["mcp_server_calculator"]
        }
    }
}
"""
```
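For instance, a config with two servers might look like the following sketch. Only the calculator entry comes from this guide; the second entry (a filesystem MCP server and its path argument) is an illustrative assumption, so swap in whatever servers you actually use.

```python
# Hypothetical two-server config: the "filesystem" name, command, and args are placeholders.
MCP_CONFIG_MULTI = """
{
    "mcpServers": {
        "calculator": {
            "command": "uvx",
            "args": ["mcp_server_calculator"]
        },
        "filesystem": {
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp/workspace"]
        }
    }
}
"""
```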
### 3. **Write a Reward Function**

The reward function evaluates how "correct" the model's output is, based on its structured output. Here's a simple XML-based example.

Note that `**kwargs` contains all the other fields in your dataset, so feel free to use them in `reward_func` calculations.

```python
import re
from html import unescape

def reward_func(prompt, completion, ground_truth, workspace, **kwargs):
    # Pull the final answer out of the <answer></answer> tags and compare it to the ground truth.
    m = re.search(r'<answer>(.*?)</answer>', completion, flags=re.IGNORECASE | re.DOTALL)
    if not m:
        return 0.0
    answer_text = unescape(m.group(1)).strip().lower()
    return float(ground_truth.lower() == answer_text)
```
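As an illustration of using those extra dataset fields, here is a hypothetical variant that reads a `tolerance` column via `**kwargs` and accepts numerically close answers. The `tolerance` field and the numeric comparison are assumptions made for the example, not part of the math environment.

```python
import re
from html import unescape

def numeric_reward_func(prompt, completion, ground_truth, workspace, **kwargs):
    # "tolerance" is a hypothetical extra column in the dataset, surfaced here via **kwargs.
    tolerance = float(kwargs.get("tolerance", 0.0))
    m = re.search(r'<answer>(.*?)</answer>', completion, flags=re.IGNORECASE | re.DOTALL)
    if not m:
        return 0.0
    try:
        predicted = float(unescape(m.group(1)).strip())
        target = float(ground_truth)
    except ValueError:
        return 0.0
    return float(abs(predicted - target) <= tolerance)
```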
### 4. Define **`dataset_preprocess`**

If your dataset is not already standardized, implement this method to convert a raw example into a standardized one with:

- `"prompt"`: A fully constructed string prompt.
- `"ground_truth"`: A known correct output (optional depending on reward).
- `"init_rollout_args"`: Arguments needed to initialize a rollout.

Example for our math task:

```python
def dataset_preprocess(self, example: dict) -> StandardizedExample:
    return StandardizedExample(
        prompt=example.get("task", ""),
        ground_truth=example.get("answer", ""),
        init_rollout_args={}
    )
```
<details>
<summary>Notes on init_rollout_args</summary>

The `init_rollout_args` dictionary is passed from `dataset_preprocess()` to your environment's `init_rollout()` method. It is used to initialize any **per-example files, resources, or execution context** needed before a rollout begins.

Common use cases include:

- **Input files**: For environments that manipulate files like spreadsheets, images, or databases, pass the necessary file paths.
- **Version control**: For code-related tasks, you might pass a `commit_id` to check out the correct code state.
- **Task-specific settings**: Pass metadata like cell ranges, task IDs, or execution flags.

Example:

```python
# Inside dataset_preprocess
return {
    "prompt": "...",
    "ground_truth": "...",
    "init_rollout_args": {
        "spreadsheet_path": "/path/to/1_001_input.xlsx"
    }
}
```

Then in your `init_rollout()` method:

```python
import shutil
from pathlib import Path

def init_rollout(self, rollout_id: str, **rollout_args):
    spreadsheet_path = rollout_args["spreadsheet_path"]
    workspace = self.get_rollout_workspace(rollout_id)

    # Copy the input file into the rollout's workspace
    shutil.copy(spreadsheet_path, workspace / Path(spreadsheet_path).name)
```

This pattern ensures each rollout starts with the correct inputs and configuration.
</details>
### 5. **Extend `LocalMCPEnv`**

Now bring everything together into a custom environment class:

```python
from typing import Any, List

from benchmax.envs.local_mcp_env import LocalMCPEnv
# RewardFunction and StandardizedExample are assumed here to live in benchmax's env type
# definitions (benchmax/envs/types.py); adjust the import to match your install.
from benchmax.envs.types import RewardFunction, StandardizedExample

class MathEnv(LocalMCPEnv):
    """Environment for math problems, using local MCP tools."""

    system_prompt: str = SYSTEM_PROMPT
    reward_funcs: List[RewardFunction] = [reward_func]

    def __init__(self, **kwargs):
        super().__init__(MCP_CONFIG)

    def dataset_preprocess(self, example: Any) -> StandardizedExample:
        return StandardizedExample(
            prompt=example.get("task", ""),
            ground_truth=example.get("answer", ""),
            init_rollout_args={}
        )
```

You're done! This environment is now compatible with `benchmax` and can be plugged into any compatible RL trainer.
</details>
<details>
<summary>Extend BaseEnv</summary>

If you don't need MCP servers, you can build an environment from scratch by extending `BaseEnv` directly. Here's how to make a minimal math environment with a single tool: an arithmetic evaluator.

### 1. **Define the system prompt**

This helps instruct the model on how to interact with the tool and format output.

```python
SYSTEM_PROMPT = """Use the `evaluate` tool to perform any computation.
Write your final answer on the last line inside <answer>...</answer>.
"""
```
### 2. **Create a reward function**

We'll score the model 1.0 if it places the correct answer inside `<answer>...</answer>` tags:

```python
import re
from html import unescape
from pathlib import Path

def reward_func(prompt: str, completion: str, ground_truth: str, workspace: Path, **kwargs) -> float:
    m = re.search(r'<answer>(.*?)</answer>', completion, flags=re.IGNORECASE | re.DOTALL)
    if not m:
        return 0.0
    answer_text = unescape(m.group(1)).strip().lower()
    return float(answer_text == ground_truth.lower())
```
### 3. **Define your math tool**

A simple `eval`-based evaluator for math expressions (fine for a demo; see the note below):

```python
def evaluate_expression(expr: str) -> str:
    try:
        # Evaluate with builtins stripped. This limits, but does not fully sandbox,
        # what an expression can do, so treat it as demo code.
        result = eval(expr, {"__builtins__": {}})
        return str(result)
    except Exception as e:
        return f"Error: {str(e)}"
```
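If you want something stricter than bare `eval`, one possible hardening (not part of `benchmax`, just a sketch) is to parse the expression with Python's `ast` module and only allow arithmetic nodes:

```python
import ast
import operator

# Whitelisted arithmetic operators; anything else is rejected.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.Mod: operator.mod,
    ast.USub: operator.neg, ast.UAdd: operator.pos,
}

def safe_evaluate_expression(expr: str) -> str:
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError("Unsupported expression")

    try:
        return str(_eval(ast.parse(expr, mode="eval").body))
    except Exception as e:
        return f"Error: {e}"
```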
### 4. **Create the environment class**

Bring it all together in a subclass of `BaseEnv`:

```python
from typing import Any, Callable, Dict, List, Tuple

from benchmax.envs.base_env import BaseEnv
# ToolDefinition, RewardFunction, and StandardizedExample are assumed here to live in
# benchmax's env type definitions (benchmax/envs/types.py); adjust to match your install.
from benchmax.envs.types import RewardFunction, StandardizedExample, ToolDefinition

class SimpleMathEnv(BaseEnv):
    system_prompt: str = SYSTEM_PROMPT
    _reward_funcs: List[RewardFunction] = [reward_func]

    def __init__(self):
        eval_tool = ToolDefinition(
            name="evaluate",
            description="Safely evaluate a math expression like '2 + 3 * 4'.",
            input_schema={
                "type": "object",
                "properties": {
                    "expr": {
                        "type": "string",
                        "description": "Math expression to evaluate.",
                    },
                },
                "required": ["expr"],
            }
        )
        self.tools: Dict[str, Tuple[ToolDefinition, Callable]] = {
            "evaluate": (eval_tool, evaluate_expression)
        }

    def dataset_preprocess(self, example: dict) -> StandardizedExample:
        return {
            "prompt": f"Question: {example['question']}\n\nWrite your answer below.",
            "ground_truth": example.get("answer", ""),
            "init_rollout_args": {}
        }

    def list_tools(self) -> List[ToolDefinition]:
        return [tool_def for tool_def, _ in self.tools.values()]

    def run_tool(self, rollout_id: str, tool_name: str, **tool_args) -> Any:
        _, tool_fn = self.tools[tool_name]
        return tool_fn(**tool_args)
```
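A quick local check of the class above (assuming it is defined in your session and that `ToolDefinition` exposes its fields as attributes; the rollout id here is just an arbitrary string):

```python
env = SimpleMathEnv()

print([tool.name for tool in env.list_tools()])                  # -> ['evaluate']
print(env.run_tool("rollout-0", "evaluate", expr="2 + 3 * 4"))   # -> '14'
```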
</details>
### How about more complex environments?

- Check out our Excel spreadsheet RL environment: `benchmax.envs.excel.excel_env.ExcelEnv`

### How do I use an environment with my preferred RL Trainer?

We currently have integrations with both verifiers and verl. More incoming!

[`benchmax` environments with verl](/examples/verl/README.md)

[`benchmax` environments with verifiers](/examples/verifiers/README.md)

### I want a specific environment

Open an issue and tag us, and we will look into building one for you!

---
## 🎯 Motivation

- **Modularity and Simplicity**:

  We set out to build a lightweight, modular system for defining RL environments, breaking them down into simple, composable parts: tools, tool output parsing, and reward functions.

  The goal is to make it easy for software engineers to build and experiment with RL environments without needing deep RL expertise.

- **Trainer Integrations**:

  There have been lots of new RL training frameworks popping up (e.g., numerous forks of verl) & we expect this to continue. They are often tightly coupled with specific environments, leading to fragmentation and limited compatibility.

  We are building `benchmax` as a standalone library with integrations to these different training frameworks & as an easy way for new frameworks to tap into an existing pool of environments. We're already integrated with verl and verifiers. More integrations (e.g. SkyRL) coming soon!

- **Task Recipes and Ideas**:

  We want `benchmax` to be a living library of reusable, RL-compatible task recipes, ready to inspire and extend beyond the usual suspects like math and coding. We aim to support more real-world workflows, including open-ended and long-horizon tasks.

- **Parallelization and Cloud Compatibility**:
  - Enable efficient parallelization with maintained statefulness between rollouts.
  - Facilitate easy deployment and scalability in cloud environments.

- **MCP as a first-class citizen**:

  There has been an explosion of MCP servers/tools built for use cases ranging from browser use to Excel to game creation. `benchmax` allows folks to leverage and compose these existing MCP servers to build environments integrated with real-world systems (e.g. Excel).
## 🤝 Contributing

We welcome new environment recipes, bug reports, and trainer integrations!

⭐ Star our repository to show your support!

## 📜 License

Apache 2.0 © 2025 CGFT Inc.
benchmax-0.1.1.dev2/README.md ADDED

@@ -0,0 +1,377 @@

(The new README.md is identical to the markdown body of the PKG-INFO shown above, i.e. everything from the `<picture>` banner through the License section.)
{benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/pyproject.toml CHANGED

@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "benchmax"
-version = "0.1.1.dev1"
+version = "0.1.1.dev2"
 description = "Framework-Agnostic RL Environments for LLM Fine-Tuning"
 authors = ["cgft.io"]
 readme = "README.md"
@@ -13,7 +13,7 @@ python = ">=3.11,<3.13"
 fastmcp = "~2.10.0"

 verl-cgft-fork = { version = "0.4.1.dev1", optional = true }
-sglang = { version = "0.4.9", optional = true }
+sglang = { version = "0.4.9", optional = true, extras = ["all"] }
 verifiers = { version = "^0.1.1", optional = true, extras = ["train"] }

 openpyxl = { version = "3.1.5", optional = true }
benchmax-0.1.1.dev1/PKG-INFO DELETED

@@ -1,43 +0,0 @@

Metadata-Version: 2.3
Name: benchmax
Version: 0.1.1.dev1
Summary: Framework-Agnostic RL Environments for LLM Fine-Tuning
Author: cgft.io
Requires-Python: >=3.11,<3.13
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Provides-Extra: crm
Provides-Extra: excel
Provides-Extra: excel-linux
Provides-Extra: verifiers
Provides-Extra: verl
Requires-Dist: fastmcp (>=2.10.0,<2.11.0)
Requires-Dist: openpyxl (==3.1.5) ; extra == "excel-linux" or extra == "excel"
Requires-Dist: python-dateutil (>=2.9.0,<2.10.0) ; extra == "crm"
Requires-Dist: sglang (==0.4.9) ; extra == "verl"
Requires-Dist: simple-salesforce (>=1.12.3) ; extra == "crm"
Requires-Dist: verifiers[train] (>=0.1.1,<0.2.0) ; extra == "verifiers"
Requires-Dist: verl-cgft-fork (==0.4.1.dev1) ; extra == "verl"
Requires-Dist: xlwings (==0.33.15) ; extra == "excel"
Description-Content-Type: text/markdown

<picture>
  <img alt="Benchmax" src="./static/benchmax.png" width="full">
</picture>

## benchmax: Framework-Agnostic Reinforcement Learning Environments for LLM Fine-Tuning

<div align="center">
</div>
<div id="badges" align="center">
  <a href="https://cgft.io">
    <img src="https://img.shields.io/badge/cgft.io-blue?style=for-the-badge" alt="Website"/>
  </a>
  <a href="https://x.com/cgftlabs">
    <img src="https://img.shields.io/badge/Follow @cgftlabs-black?style=for-the-badge&logo=X&logoColor=white" alt="@cgftlabs"/>
  </a>
</div>
<div align="center" style="line-height: 1;">
  <a href="https://github.com/girishbarca/benchmax/blob/main/LICENSE"><img alt="License" src="https://img.shields.io/badge/License-Apache_2.0-blue.svg"/></a>
</div>
benchmax-0.1.1.dev1/README.md DELETED

@@ -1,19 +0,0 @@

(The old README.md was identical to the markdown body of the deleted PKG-INFO shown above, i.e. the `<picture>` banner, the heading, and the badge blocks.)
All other files listed above are unchanged between benchmax-0.1.1.dev1 and benchmax-0.1.1.dev2.