benchmax 0.1.1.dev1__tar.gz → 0.1.1.dev2__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (29)
  1. benchmax-0.1.1.dev2/PKG-INFO +401 -0
  2. benchmax-0.1.1.dev2/README.md +377 -0
  3. {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/pyproject.toml +2 -2
  4. benchmax-0.1.1.dev1/PKG-INFO +0 -43
  5. benchmax-0.1.1.dev1/README.md +0 -19
  6. {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/LICENSE +0 -0
  7. {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/adapters/__init__.py +0 -0
  8. {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/adapters/verifiers/verifiers_adapters.py +0 -0
  9. {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/adapters/verl/benchmax_data_process.py +0 -0
  10. {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/envs/__init__.py +0 -0
  11. {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/envs/base_env.py +0 -0
  12. {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/envs/bounded_dict.py +0 -0
  13. {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/envs/crm/README.md +0 -0
  14. {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/envs/crm/crm_env.py +0 -0
  15. {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/envs/crm/salesforce_mcp.py +0 -0
  16. {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/envs/crm/salesforce_requirements.txt +0 -0
  17. {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/envs/excel/README.md +0 -0
  18. {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/envs/excel/data_utils.py +0 -0
  19. {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/envs/excel/excel_code_runner_mcp.py +0 -0
  20. {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/envs/excel/excel_env.py +0 -0
  21. {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/envs/excel/excel_utils.py +0 -0
  22. {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/envs/local_mcp_env.py +0 -0
  23. {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/envs/math/README.md +0 -0
  24. {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/envs/math/math_env.py +0 -0
  25. {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/envs/types.py +0 -0
  26. {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/envs/wikipedia/README.md +0 -0
  27. {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/envs/wikipedia/utils.py +0 -0
  28. {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/envs/wikipedia/wiki_env.py +0 -0
  29. {benchmax-0.1.1.dev1 → benchmax-0.1.1.dev2}/benchmax/prompts/tools.py +0 -0
@@ -0,0 +1,401 @@
1
+ Metadata-Version: 2.3
2
+ Name: benchmax
3
+ Version: 0.1.1.dev2
4
+ Summary: Framework-Agnostic RL Environments for LLM Fine-Tuning
5
+ Author: cgft.io
6
+ Requires-Python: >=3.11,<3.13
7
+ Classifier: Programming Language :: Python :: 3
8
+ Classifier: Programming Language :: Python :: 3.11
9
+ Classifier: Programming Language :: Python :: 3.12
10
+ Provides-Extra: crm
11
+ Provides-Extra: excel
12
+ Provides-Extra: excel-linux
13
+ Provides-Extra: verifiers
14
+ Provides-Extra: verl
15
+ Requires-Dist: fastmcp (>=2.10.0,<2.11.0)
16
+ Requires-Dist: openpyxl (==3.1.5) ; extra == "excel-linux" or extra == "excel"
17
+ Requires-Dist: python-dateutil (>=2.9.0,<2.10.0) ; extra == "crm"
18
+ Requires-Dist: sglang[all] (==0.4.9) ; extra == "verl"
19
+ Requires-Dist: simple-salesforce (>=1.12.3) ; extra == "crm"
20
+ Requires-Dist: verifiers[train] (>=0.1.1,<0.2.0) ; extra == "verifiers"
21
+ Requires-Dist: verl-cgft-fork (==0.4.1.dev1) ; extra == "verl"
22
+ Requires-Dist: xlwings (==0.33.15) ; extra == "excel"
23
+ Description-Content-Type: text/markdown
24
+
25
+ <picture>
26
+ <img alt="Benchmax" src="./static/benchmax.png" width="full">
27
+ </picture>
28
+
29
+ ## benchmax: Framework-Agnostic RL Environments for LLM Fine-Tuning
30
+ *A lightweight, training-framework agnostic library for defining, running, and parallelizing environments to fine-tune OSS LLMs with reinforcement learning.*
31
+ <div align="center">
32
+ </div>
33
+ <div id="badges" align="center">
34
+ <a href="https://cgft.io">
35
+ <img src="https://img.shields.io/badge/cgft.io-blue?style=for-the-badge" alt="Website"/>
36
+ </a>
37
+ <a href="https://x.com/cgftlabs">
38
+ <img src="https://img.shields.io/badge/Follow @cgftlabs-black?style=for-the-badge&logo=X&logoColor=white" alt="@cgftlabs"/>
39
+ </a>
40
+ </div>
41
+ <div align="center" style="line-height: 1;">
42
+ <a href="https://github.com/girishbarca/benchmax/blob/main/LICENSE"><img alt="License" src="https://img.shields.io/badge/License-Apache_2.0-blue.svg"/></a>
43
+ </div>
44
+
45
+ ## Overview
46
+
47
+ `benchmax` comes with:
48
+
49
+ - A collection of ready-to-use reinforcement learning (RL) environments for LLM fine-tuning ranging from multi-hop search to spreadsheet manipulation to CRM agents
50
+ - An easy way to define, compose, and parallelize your own environments, including by leveraging the existing ecosystem of MCP servers
51
+ - Built-in integrations with popular RL training libraries (verl, verifiers, etc.). `benchmax` is trainer-agnostic by design
52
+
53
+ Define your environment as:
54
+
55
+ 1. A **toolset** (LLM calls, external APIs, calculators, MCPs, etc.).
56
+ 2. **Output parsing** logic to extract structured observations.
57
+ 3. **Reward functions** to score model outputs.
58
+
59
+ Rollout management, parallel execution, and more come out of the box.
60
+
61
+ ⭐ Star our repository to show your support!
62
+
63
+ ## 💡 Core Features
64
+
65
+ **Built-in examples & templates**
66
+
67
+ Get started with ready-to-use recipes, from Wikipedia search to spreadsheet manipulation. Easy to copy, customize, and extend. And yes, more are on the way.
68
+
69
+ **Trainer Integrations**
70
+
71
+ Use your own trainer or training framework - no lock-in. `benchmax` is already integrated with verl and verifiers, with more integrations (SkyRL, etc.) coming soon!
72
+
73
+ **MCP Support**
74
+ Tap into the growing MCP ecosystem and integrate existing MCP servers as tools within your environments.
75
+
76
+ **Parallel execution & state management**
77
+
78
+ - Local multi-process pool
79
+ - State is isolated across rollouts (e.g. files edited on the local filesystem)
80
+ - Multi-Node Parallelization (Coming soon!)
81
+
82
+ ## 📘 Quickstart
83
+
84
+ **Example: Math Question Answering with a Calculator MCP**
85
+
86
+ **verl** is a training framework `benchmax` is currently integrated with. Use our ***verl*** integration to RL fine-tune Qwen2.5-3B to do math using a calculator MCP (https://github.com/githejie/mcp-server-calculator). The environment is defined at `benchmax.envs.math.math_env.MathEnv`.
87
+
88
+ 1. **Installation**
89
+
90
+ `pip install benchmax[verl]`
91
+
92
+ 2. **Prepare the dataset**
93
+
94
+ ```bash
95
+ python benchmax/adapters/verl/benchmax_data_process.py \
96
+ --local_dir ~/data/math \
97
+ --dataset_name dawidmt/arithmetic50 \
98
+ --env_path benchmax.envs.math.math_env.MathEnv
99
+ ```
100
+
101
+ 3. **Run training**
102
+
103
+ ```bash
104
+ sh examples/verl/run_qwen2.5-3b_benchmax_math.sh
105
+ ```
106
+
107
+ This math environment is just a quick example. Explore some of the more complex environments, like `excel` and `crm`, in `benchmax/envs`.
108
+
109
+ ## 🌐 Creating & Training with Environments
110
+
111
+ ### What is an environment?
112
+
113
+ An environment consists of:
114
+
115
+ - A list of tools that an LLM can call
116
+ - A list of reward functions that evaluate the quality & correctness of the model's final output.
117
+
118
+ We also support MCP servers natively, allowing you to easily leverage the many servers built by the community.
119
+
120
+ ### Pre-built environments
121
+
122
+ Ready-to-use environments with pre-configured tools and reward functions.
123
+
124
+ - [CRM](benchmax/envs/crm/README.md)
125
+ - [Excel](benchmax/envs/excel/README.md)
126
+ - [Math](benchmax/envs/math/README.md)
127
+ - [Wikipedia](benchmax/envs/wikipedia/README.md)
128
+
129
+ ### How do I create a custom environment?
130
+
131
+ <details>
132
+ <summary>With existing MCP Servers</summary>
133
+
134
+ To create a custom environment using an MCP server (like a calculator, browser, or spreadsheet), you can extend `LocalMCPEnv`. Here's a quick step-by-step guide using `benchmax.envs.math.math_env.MathEnv` as an example.
135
+
136
+ ### 1. **Define a System Prompt**
137
+
138
+ This prompt guides the LLM’s behavior. It can include any instruction, such as how to format the answer or when to use tools.
139
+
140
+ ```python
141
+ SYSTEM_PROMPT = """Please use the tools provided to do any computation.
142
+ Write your complete answer on the final line only, within the xml tags <answer></answer>.
143
+ """
144
+ ```
145
+
146
+ ### 2. **Configure MCP Server(s)**
147
+
148
+ Define the MCP servers to be launched. You can configure one or more:
149
+
150
+ ```python
151
+ MCP_CONFIG = """
152
+ {
153
+ "mcpServers": {
154
+ "server-name": {
155
+ "command": "uvx",
156
+ "args": ["mcp_server_calculator"]
157
+ }
158
+ }
159
+ }
160
+ """
161
+ ```
162
+
163
+ ### 3. **Write a Reward Function**
164
+
165
+ The reward function evaluates how "correct" the model's output is, based on structured output. Here’s a simple XML-based example:
166
+
167
+ Note that `**kwargs` contains all the other fields in your dataset, so feel free to use them in `reward_func` calculations.
168
+
169
+ ```python
170
+ import re
+ from html import unescape
+
+ def reward_func(prompt, completion, ground_truth, workspace, **kwargs):
+     m = re.search(r'<answer>(.*?)</answer>', completion, flags=re.IGNORECASE | re.DOTALL)
+     if not m:
+         return 0.0
+     answer_text = unescape(m.group(1)).strip().lower()
+     return float(ground_truth.lower() == answer_text)
176
+ ```
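+
+ As a quick sanity check (hypothetical completions, not part of the environment):
+
+ ```python
+ # Reward is 1.0 only when the text inside <answer></answer> matches the ground truth.
+ assert reward_func("2+2?", "The sum is <answer>4</answer>", "4", None) == 1.0
+ assert reward_func("2+2?", "no tags here", "4", None) == 0.0
+ ```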
177
+
178
+ ### 4. Define **`dataset_preprocess`**
179
+
180
+ If your dataset is not already standardized, implement this method to convert a raw example into a standardized one with:
181
+
182
+ - `"prompt"`: A fully constructed string prompt.
183
+ - `"ground_truth"`: A known correct output (optional depending on reward).
184
+ - `"init_rollout_args"`: Arguments needed to initialize a rollout.
185
+
186
+ Example for our math task:
187
+
188
+ ```python
189
+ def dataset_preprocess(self, example: dict) -> StandardizedExample:
+     return StandardizedExample(
+         prompt=example.get("task", ""),
+         ground_truth=example.get("answer", ""),
+         init_rollout_args={}
+     )
195
+ ```
196
+
197
+ <details>
198
+ <summary>Notes on init_rollout_args</summary>
199
+ The `init_rollout_args` dictionary is passed from `dataset_preprocess()` to your environment's `init_rollout()` method. It is used to initialize any **per-example files, resources, or execution context** needed before a rollout begins.
200
+
201
+ Common use cases include:
202
+
203
+ - **Input files**: For environments that manipulate files like spreadsheets, images, or databases, pass the necessary file paths.
204
+ - **Version control**: For code-related tasks, you might pass a `commit_id` to check out the correct code state.
205
+ - **Task-specific settings**: Pass metadata like cell ranges, task IDs, or execution flags.
206
+
207
+ Example:
208
+
209
+ ```python
210
+ # Inside dataset_preprocess
+ return {
+     "prompt": "...",
+     "ground_truth": "...",
+     "init_rollout_args": {
+         "spreadsheet_path": "/path/to/1_001_input.xlsx"
+     }
+ }
218
+ ```
219
+
220
+ Then in your `init_rollout()` method:
221
+
222
+ ```python
223
+ # assumes `import shutil` and `from pathlib import Path` at module level
+ def init_rollout(self, rollout_id: str, **rollout_args):
+     spreadsheet_path = rollout_args["spreadsheet_path"]
+     workspace = self.get_rollout_workspace(rollout_id)
+
+     # Copy the input file into the rollout's workspace
+     shutil.copy(spreadsheet_path, workspace / Path(spreadsheet_path).name)
229
+ ```
230
+
231
+ This pattern ensures each rollout starts with the correct inputs and configuration.
232
+ </details>
233
+
234
+
235
+ ### 5. **Extend `LocalMCPEnv`**
236
+
237
+ Now bring everything together into a custom environment class:
238
+
239
+ ```python
240
+ from typing import Any, List
+
+ # import paths below assume the installed `benchmax` package layout
+ from benchmax.envs.local_mcp_env import LocalMCPEnv
+ from benchmax.envs.types import RewardFunction, StandardizedExample
+
+ class MathEnv(LocalMCPEnv):
+     """Environment for math problems, using local MCP tools."""
+
+     system_prompt: str = SYSTEM_PROMPT
+     reward_funcs: List[RewardFunction] = [reward_func]
+
+     def __init__(self, **kwargs):
+         super().__init__(MCP_CONFIG)
+
+     def dataset_preprocess(self, example: Any) -> StandardizedExample:
+         return StandardizedExample(
+             prompt=example.get("task", ""),
+             ground_truth=example.get("answer", ""),
+             init_rollout_args={}
+         )
258
+ ```
259
+
260
+ You're done! This environment is now compatible with `benchmax` and can be plugged into any compatible RL trainer.
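+
+ A minimal usage sketch (hypothetical driver code; assumes `StandardizedExample` behaves like a mapping, as in step 4):
+
+ ```python
+ env = MathEnv()  # launches the calculator MCP server from MCP_CONFIG
+ example = {"task": "What is 17 * 23?", "answer": "391"}
+ standardized = env.dataset_preprocess(example)
+ print(standardized)  # prompt / ground_truth / init_rollout_args, ready for a trainer
+ ```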
261
+ </details>
262
+ <details>
263
+ <summary>Extend BaseEnv</summary>
264
+ If you don’t need MCP servers, you can build an environment from scratch by extending `BaseEnv` directly. Here's how to make a minimal math environment with a single tool: an arithmetic evaluator.
265
+
266
+ ### 1. **Define the system prompt**
267
+
268
+ This helps instruct the model on how to interact with the tool and format output.
269
+
270
+ ```python
271
+ SYSTEM_PROMPT = """Use the `evaluate` tool to perform any computation.
272
+ Write your final answer on the last line inside <answer>...</answer>.
273
+ """
274
+ ```
275
+
276
+ ### 2. **Create a reward function**
277
+
278
+ We'll score the model 1.0 if it places the correct answer inside `<answer>...</answer>` tags:
279
+
280
+ ```python
281
+ import re
+ from html import unescape
+ from pathlib import Path
+
+ def reward_func(prompt: str, completion: str, ground_truth: str, workspace: Path, **kwargs) -> float:
+     m = re.search(r'<answer>(.*?)</answer>', completion, flags=re.IGNORECASE | re.DOTALL)
+     if not m:
+         return 0.0
+     answer_text = unescape(m.group(1)).strip().lower()
+     return float(answer_text == ground_truth.lower())
291
+ ```
292
+
293
+ ### 3. **Define your math tool**
294
+
295
+ A simple, restricted `eval` for math expressions (clearing `__builtins__` is a guard, not a true sandbox, but it is fine for a demo):
296
+
297
+ ```python
298
+ def evaluate_expression(expr: str) -> str:
+     try:
+         # Evaluate with builtins removed to limit what the expression can reach
+         result = eval(expr, {"__builtins__": {}})
+         return str(result)
+     except Exception as e:
+         return f"Error: {str(e)}"
304
+ ```
305
+
306
+ ### 4. **Create the environment class**
307
+
308
+ Bring it all together in a subclass of `BaseEnv`:
309
+
310
+ ```python
311
+ from typing import Any, Callable, Dict, List, Tuple
+
+ # BaseEnv, ToolDefinition, RewardFunction, and StandardizedExample are assumed
+ # to come from the benchmax package (e.g. benchmax.envs.base_env / benchmax.envs.types)
+ from benchmax.envs.base_env import BaseEnv
+ from benchmax.envs.types import RewardFunction, StandardizedExample, ToolDefinition
+
+ class SimpleMathEnv(BaseEnv):
+     system_prompt: str = SYSTEM_PROMPT
+     _reward_funcs: List[RewardFunction] = [reward_func]
+
+     def __init__(self):
+         eval_tool = ToolDefinition(
+             name="evaluate",
+             description="Safely evaluate a math expression like '2 + 3 * 4'.",
+             input_schema={
+                 "type": "object",
+                 "properties": {
+                     "expr": {
+                         "type": "string",
+                         "description": "Math expression to evaluate.",
+                     },
+                 },
+                 "required": ["expr"],
+             },
+         )
+         self.tools: Dict[str, Tuple[ToolDefinition, Callable]] = {
+             "evaluate": (eval_tool, evaluate_expression)
+         }
+
+     def dataset_preprocess(self, example: dict) -> StandardizedExample:
+         return {
+             "prompt": f"Question: {example['question']}\n\nWrite your answer below.",
+             "ground_truth": example.get("answer", ""),
+             "init_rollout_args": {}
+         }
+
+     def list_tools(self) -> List[ToolDefinition]:
+         return [tool_def for tool_def, _ in self.tools.values()]
+
+     def run_tool(self, rollout_id: str, tool_name: str, **tool_args) -> Any:
+         _, tool_fn = self.tools[tool_name]
+         return tool_fn(**tool_args)
346
+ ```
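+
+ A quick manual check of the tool plumbing (hypothetical values; assumes `ToolDefinition` exposes a `name` attribute):
+
+ ```python
+ env = SimpleMathEnv()
+ print([tool.name for tool in env.list_tools()])  # ['evaluate']
+ print(env.run_tool("rollout-0", "evaluate", expr="2 + 3 * 4"))  # '14'
+ ```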
347
+ </details>
348
+
349
+ ### How about more complex environments?
350
+
351
+ - Check out our excel spreadsheet RL environment: `benchmax.envs.excel.excel_env.ExcelEnv`
352
+
353
+ ### How do I use an environment with my preferred RL Trainer?
354
+
355
+ We currently have integrations with both verifiers and verl. More incoming!
356
+
357
+ [`benchmax` environments with verl](/examples/verl/README.md)
358
+
359
+ [`benchmax` environments with verifiers](/examples/verifiers/README.md)
360
+
361
+ ### I want a specific environment
362
+
363
+ Open an issue and tag us & we will look into building you one!
364
+
365
+ ---
366
+
367
+ ## 🎯 Motivation
368
+
369
+ - **Modularity and Simplicity**:
370
+
371
+ We set out to build a lightweight, modular system for defining RL environments—breaking them down into simple, composable parts: tools, tool output parsing, and reward functions.
372
+
373
+ The goal is to make it easy for software engineers to build and experiment with RL environments without needing deep RL expertise.
374
+
375
+ - **Trainer Integrations**:
376
+
377
+ There has been a wave of new RL training frameworks popping up (e.g., numerous forks of verl), and we expect this to continue. They are often tightly coupled with specific environments, leading to fragmentation and limited compatibility.
378
+
379
+ We are building `benchmax` as a standalone library with integrations into these different training frameworks, and as an easy way for new frameworks to tap into an existing pool of environments. We're already integrated with verl and verifiers. More integrations (e.g. SkyRL) coming soon!
380
+
381
+ - **Task Recipes and Ideas**:
382
+
383
+ We want `benchmax` to be a living library of reusable, RL-compatible task recipes, ready to inspire and extend beyond the usual suspects like math and coding. We aim to support more real-world workflows, including open-ended and long-horizon tasks.
384
+
385
+ - **Parallelization and Cloud Compatibility**:
386
+ - Enable efficient parallelization while maintaining state across rollouts.
387
+ - Facilitate easy deployment and scalability in cloud environments.
388
+ - **MCP as a first class citizen**:
389
+
390
+ There has been an explosion of MCP servers/tools built for use cases ranging from browser use to Excel to game creation. `benchmax` lets folks leverage and compose these existing MCP servers to build environments integrated with real-world systems, e.g. Excel.
391
+
392
+
393
+ ## 🤝 Contributing
394
+
395
+ We welcome new environment recipes, bug reports, and trainer integrations!
396
+
397
+ ⭐ Star our repository to show your support!
398
+
399
+ ## 📜 License
400
+
401
+ Apache 2.0 © 2025 CGFT Inc.
@@ -0,0 +1,377 @@
1
+ <picture>
2
+ <img alt="Benchmax" src="./static/benchmax.png" width="full">
3
+ </picture>
4
+
5
+ ## benchmax: Framework-Agnostic RL Environments for LLM Fine-Tuning
6
+ *A lightweight, training-framework agnostic library for defining, running, and parallelizing environments to fine-tune OSS LLMs with reinforcement learning.*
7
+ <div align="center">
8
+ </div>
9
+ <div id="badges" align="center">
10
+ <a href="https://cgft.io">
11
+ <img src="https://img.shields.io/badge/cgft.io-blue?style=for-the-badge" alt="Website"/>
12
+ </a>
13
+ <a href="https://x.com/cgftlabs">
14
+ <img src="https://img.shields.io/badge/Follow @cgftlabs-black?style=for-the-badge&logo=X&logoColor=white" alt="@cgftlabs"/>
15
+ </a>
16
+ </div>
17
+ <div align="center" style="line-height: 1;">
18
+ <a href="https://github.com/girishbarca/benchmax/blob/main/LICENSE"><img alt="License" src="https://img.shields.io/badge/License-Apache_2.0-blue.svg"/></a>
19
+ </div>
20
+
21
+ ## Overview
22
+
23
+ `benchmax` comes with:
24
+
25
+ - A collection of ready-to-use reinforcement learning (RL) environments for LLM fine-tuning ranging from multi-hop search to spreadsheet manipulation to CRM agents
26
+ - An easy way to define, compose, and parallelize your own environments, including by leveraging the existing ecosystem of MCP servers
27
+ - Built-in integrations with popular RL training libraries (verl, verifiers, etc.). `benchmax` is trainer-agnostic by design
28
+
29
+ Define your environment as:
30
+
31
+ 1. A **toolset** (LLM calls, external APIs, calculators, MCPs, etc.).
32
+ 2. **Output parsing** logic to extract structured observations.
33
+ 3. **Reward functions** to score model outputs.
34
+
35
+ Rollout management, parallel execution, and more come out of the box.
36
+
37
+ ⭐ Star our repository to show your support!
38
+
39
+ ## 💡 Core Features
40
+
41
+ **Built-in examples & templates**
42
+
43
+ Get started with ready-to-use recipes, from Wikipedia search to spreadsheet manipulation. Easy to copy, customize, and extend. And yes, more are on the way.
44
+
45
+ **Trainer Integrations**
46
+
47
+ Use your own trainer or training framework - no lock-in. `benchmax` is already integrated with verl and verifiers, with more integrations (SkyRL, etc.) coming soon!
48
+
49
+ **MCP Support**
50
+ Tap into the growing MCP ecosystem and integrate existing MCP servers as tools within your environments.
51
+
52
+ **Parallel execution & state management**
53
+
54
+ - Local multi-process pool
55
+ - State is isolated across rollouts (e.g. files edited on the local filesystem)
56
+ - Multi-Node Parallelization (Coming soon!)
57
+
58
+ ## 📘 Quickstart
59
+
60
+ **Example: Math Question Answering with a Calculator MCP**
61
+
62
+ **verl** is a training framework `benchmax` is currently integrated with. Use our ***verl*** integration to RL fine-tune Qwen2.5-3B to do math using a calculator MCP (https://github.com/githejie/mcp-server-calculator). The environment is defined at `benchmax.envs.math.math_env.MathEnv`.
63
+
64
+ 1. **Installation**
65
+
66
+ `pip install benchmax[verl]`
67
+
68
+ 2. **Prepare the dataset**
69
+
70
+ ```bash
71
+ python benchmax/adapters/verl/benchmax_data_process.py \
72
+ --local_dir ~/data/math \
73
+ --dataset_name dawidmt/arithmetic50 \
74
+ --env_path benchmax.envs.math.math_env.MathEnv
75
+ ```
76
+
77
+ 3. **Run training**
78
+
79
+ ```bash
80
+ sh examples/verl/run_qwen2.5-3b_benchmax_math.sh
81
+ ```
82
+
83
+ This math environment is just a quick example. Explore some of the more complex environments, like `excel` and `crm`, in `benchmax/envs`.
84
+
85
+ ## 🌐 Creating & Training with Environments
86
+
87
+ ### What is an environment?
88
+
89
+ An environment consists of:
90
+
91
+ - A list of tools that an LLM can call
92
+ - A list of reward functions that evaluate the quality & correctness of the model's final output.
93
+
94
+ We also support MCP servers natively, allowing you to easily leverage the many servers built by the community.
95
+
96
+ ### Pre-built environments
97
+
98
+ Ready-to-use environments with pre-configured tools and reward functions.
99
+
100
+ - [CRM](benchmax/envs/crm/README.md)
101
+ - [Excel](benchmax/envs/excel/README.md)
102
+ - [Math](benchmax/envs/math/README.md)
103
+ - [Wikipedia](benchmax/envs/wikipedia/README.md)
104
+
105
+ ### How do I create a custom environment?
106
+
107
+ <details>
108
+ <summary>With existing MCP Servers</summary>
109
+
110
+ To create a custom environment using an MCP server (like a calculator, browser, or spreadsheet), you can extend `LocalMCPEnv`. Here's a quick step-by-step guide using `benchmax.envs.math.math_env.MathEnv` as an example.
111
+
112
+ ### 1. **Define a System Prompt**
113
+
114
+ This prompt guides the LLM’s behavior. It can include any instruction, such as how to format the answer or when to use tools.
115
+
116
+ ```python
117
+ SYSTEM_PROMPT = """Please use the tools provided to do any computation.
118
+ Write your complete answer on the final line only, within the xml tags <answer></answer>.
119
+ """
120
+ ```
121
+
122
+ ### 2. **Configure MCP Server(s)**
123
+
124
+ Define the MCP servers to be launched. You can configure one or more:
125
+
126
+ ```python
127
+ MCP_CONFIG = """
128
+ {
129
+ "mcpServers": {
130
+ "server-name": {
131
+ "command": "uvx",
132
+ "args": ["mcp_server_calculator"]
133
+ }
134
+ }
135
+ }
136
+ """
137
+ ```
138
+
139
+ ### 3. **Write a Reward Function**
140
+
141
+ The reward function evaluates how "correct" the model's output is, based on structured output. Here’s a simple XML-based example:
142
+
143
+ Note that `**kwargs` contains all the other fields in your dataset, so feel free to use them in `reward_func` calculations.
144
+
145
+ ```python
146
+ import re
+ from html import unescape
+
+ def reward_func(prompt, completion, ground_truth, workspace, **kwargs):
+     m = re.search(r'<answer>(.*?)</answer>', completion, flags=re.IGNORECASE | re.DOTALL)
+     if not m:
+         return 0.0
+     answer_text = unescape(m.group(1)).strip().lower()
+     return float(ground_truth.lower() == answer_text)
152
+ ```
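+
+ As a quick sanity check (hypothetical completions, not part of the environment):
+
+ ```python
+ # Reward is 1.0 only when the text inside <answer></answer> matches the ground truth.
+ assert reward_func("2+2?", "The sum is <answer>4</answer>", "4", None) == 1.0
+ assert reward_func("2+2?", "no tags here", "4", None) == 0.0
+ ```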
153
+
154
+ ### 4. Define **`dataset_preprocess`**
155
+
156
+ If your dataset is not already standardized, implement this method to convert a raw example into a standardized one with:
157
+
158
+ - `"prompt"`: A fully constructed string prompt.
159
+ - `"ground_truth"`: A known correct output (optional depending on reward).
160
+ - `"init_rollout_args"`: Arguments needed to initialize a rollout.
161
+
162
+ Example for our math task:
163
+
164
+ ```python
165
+ def dataset_preprocess(self, example: dict) -> StandardizedExample:
+     return StandardizedExample(
+         prompt=example.get("task", ""),
+         ground_truth=example.get("answer", ""),
+         init_rollout_args={}
+     )
171
+ ```
172
+
173
+ <details>
174
+ <summary>Notes on init_rollout_args</summary>
175
+ The `init_rollout_args` dictionary is passed from `dataset_preprocess()` to your environment's `init_rollout()` method. It is used to initialize any **per-example files, resources, or execution context** needed before a rollout begins.
176
+
177
+ Common use cases include:
178
+
179
+ - **Input files**: For environments that manipulate files like spreadsheets, images, or databases, pass the necessary file paths.
180
+ - **Version control**: For code-related tasks, you might pass a `commit_id` to check out the correct code state.
181
+ - **Task-specific settings**: Pass metadata like cell ranges, task IDs, or execution flags.
182
+
183
+ Example:
184
+
185
+ ```python
186
+ # Inside dataset_preprocess
+ return {
+     "prompt": "...",
+     "ground_truth": "...",
+     "init_rollout_args": {
+         "spreadsheet_path": "/path/to/1_001_input.xlsx"
+     }
+ }
194
+ ```
195
+
196
+ Then in your `init_rollout()` method:
197
+
198
+ ```python
199
+ # assumes `import shutil` and `from pathlib import Path` at module level
+ def init_rollout(self, rollout_id: str, **rollout_args):
+     spreadsheet_path = rollout_args["spreadsheet_path"]
+     workspace = self.get_rollout_workspace(rollout_id)
+
+     # Copy the input file into the rollout's workspace
+     shutil.copy(spreadsheet_path, workspace / Path(spreadsheet_path).name)
205
+ ```
206
+
207
+ This pattern ensures each rollout starts with the correct inputs and configuration.
208
+ </details>
209
+
210
+
211
+ ### 5. **Extend `LocalMCPEnv`**
212
+
213
+ Now bring everything together into a custom environment class:
214
+
215
+ ```python
216
+ from typing import Any, List
+
+ # import paths below assume the installed `benchmax` package layout
+ from benchmax.envs.local_mcp_env import LocalMCPEnv
+ from benchmax.envs.types import RewardFunction, StandardizedExample
+
+ class MathEnv(LocalMCPEnv):
+     """Environment for math problems, using local MCP tools."""
+
+     system_prompt: str = SYSTEM_PROMPT
+     reward_funcs: List[RewardFunction] = [reward_func]
+
+     def __init__(self, **kwargs):
+         super().__init__(MCP_CONFIG)
+
+     def dataset_preprocess(self, example: Any) -> StandardizedExample:
+         return StandardizedExample(
+             prompt=example.get("task", ""),
+             ground_truth=example.get("answer", ""),
+             init_rollout_args={}
+         )
234
+ ```
235
+
236
+ You're done! This environment is now compatible with `benchmax` and can be plugged into any compatible RL trainer.
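+
+ A minimal usage sketch (hypothetical driver code; assumes `StandardizedExample` behaves like a mapping, as in step 4):
+
+ ```python
+ env = MathEnv()  # launches the calculator MCP server from MCP_CONFIG
+ example = {"task": "What is 17 * 23?", "answer": "391"}
+ standardized = env.dataset_preprocess(example)
+ print(standardized)  # prompt / ground_truth / init_rollout_args, ready for a trainer
+ ```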
237
+ </details>
238
+ <details>
239
+ <summary>Extend BaseEnv</summary>
240
+ If you don’t need MCP servers, you can build an environment from scratch by extending `BaseEnv` directly. Here's how to make a minimal math environment with a single tool: an arithmetic evaluator.
241
+
242
+ ### 1. **Define the system prompt**
243
+
244
+ This helps instruct the model on how to interact with the tool and format output.
245
+
246
+ ```python
247
+ SYSTEM_PROMPT = """Use the `evaluate` tool to perform any computation.
248
+ Write your final answer on the last line inside <answer>...</answer>.
249
+ """
250
+ ```
251
+
252
+ ### 2. **Create a reward function**
253
+
254
+ We'll score the model 1.0 if it places the correct answer inside `<answer>...</answer>` tags:
255
+
256
+ ```python
257
+ import re
+ from html import unescape
+ from pathlib import Path
+
+ def reward_func(prompt: str, completion: str, ground_truth: str, workspace: Path, **kwargs) -> float:
+     m = re.search(r'<answer>(.*?)</answer>', completion, flags=re.IGNORECASE | re.DOTALL)
+     if not m:
+         return 0.0
+     answer_text = unescape(m.group(1)).strip().lower()
+     return float(answer_text == ground_truth.lower())
267
+ ```
268
+
269
+ ### 3. **Define your math tool**
270
+
271
+ A simple, restricted `eval` for math expressions (clearing `__builtins__` is a guard, not a true sandbox, but it is fine for a demo):
272
+
273
+ ```python
274
+ def evaluate_expression(expr: str) -> str:
+     try:
+         # Evaluate with builtins removed to limit what the expression can reach
+         result = eval(expr, {"__builtins__": {}})
+         return str(result)
+     except Exception as e:
+         return f"Error: {str(e)}"
280
+ ```
281
+
282
+ ### 4. **Create the environment class**
283
+
284
+ Bring it all together in a subclass of `BaseEnv`:
285
+
286
+ ```python
287
+ from typing import Any, Callable, Dict, List, Tuple
+
+ # BaseEnv, ToolDefinition, RewardFunction, and StandardizedExample are assumed
+ # to come from the benchmax package (e.g. benchmax.envs.base_env / benchmax.envs.types)
+ from benchmax.envs.base_env import BaseEnv
+ from benchmax.envs.types import RewardFunction, StandardizedExample, ToolDefinition
+
+ class SimpleMathEnv(BaseEnv):
+     system_prompt: str = SYSTEM_PROMPT
+     _reward_funcs: List[RewardFunction] = [reward_func]
+
+     def __init__(self):
+         eval_tool = ToolDefinition(
+             name="evaluate",
+             description="Safely evaluate a math expression like '2 + 3 * 4'.",
+             input_schema={
+                 "type": "object",
+                 "properties": {
+                     "expr": {
+                         "type": "string",
+                         "description": "Math expression to evaluate.",
+                     },
+                 },
+                 "required": ["expr"],
+             },
+         )
+         self.tools: Dict[str, Tuple[ToolDefinition, Callable]] = {
+             "evaluate": (eval_tool, evaluate_expression)
+         }
+
+     def dataset_preprocess(self, example: dict) -> StandardizedExample:
+         return {
+             "prompt": f"Question: {example['question']}\n\nWrite your answer below.",
+             "ground_truth": example.get("answer", ""),
+             "init_rollout_args": {}
+         }
+
+     def list_tools(self) -> List[ToolDefinition]:
+         return [tool_def for tool_def, _ in self.tools.values()]
+
+     def run_tool(self, rollout_id: str, tool_name: str, **tool_args) -> Any:
+         _, tool_fn = self.tools[tool_name]
+         return tool_fn(**tool_args)
322
+ ```
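+
+ A quick manual check of the tool plumbing (hypothetical values; assumes `ToolDefinition` exposes a `name` attribute):
+
+ ```python
+ env = SimpleMathEnv()
+ print([tool.name for tool in env.list_tools()])  # ['evaluate']
+ print(env.run_tool("rollout-0", "evaluate", expr="2 + 3 * 4"))  # '14'
+ ```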
323
+ </details>
324
+
325
+ ### How about more complex environments?
326
+
327
+ - Check out our excel spreadsheet RL environment: `benchmax.envs.excel.excel_env.ExcelEnv`
328
+
329
+ ### How do I use an environment with my preferred RL Trainer?
330
+
331
+ We currently have integrations with both verifiers and verl. More incoming!
332
+
333
+ [`benchmax` environments with verl](/examples/verl/README.md)
334
+
335
+ [`benchmax` environments with verifiers](/examples/verifiers/README.md)
336
+
337
+ ### I want a specific environment
338
+
339
+ Open an issue and tag us & we will look into building you one!
340
+
341
+ ---
342
+
343
+ ## 🎯 Motivation
344
+
345
+ - **Modularity and Simplicity**:
346
+
347
+ We set out to build a lightweight, modular system for defining RL environments—breaking them down into simple, composable parts: tools, tool output parsing, and reward functions.
348
+
349
+ The goal is to make it easy for software engineers to build and experiment with RL environments without needing deep RL expertise.
350
+
351
+ - **Trainer Integrations**:
352
+
353
+ There has been a wave of new RL training frameworks popping up (e.g., numerous forks of verl), and we expect this to continue. They are often tightly coupled with specific environments, leading to fragmentation and limited compatibility.
354
+
355
+ We are building `benchmax` as a standalone library with integrations into these different training frameworks, and as an easy way for new frameworks to tap into an existing pool of environments. We're already integrated with verl and verifiers. More integrations (e.g. SkyRL) coming soon!
356
+
357
+ - **Task Recipes and Ideas**:
358
+
359
+ We want `benchmax` to be a living library of reusable, RL-compatible task recipes, ready to inspire and extend beyond the usual suspects like math and coding. We aim to support more real-world workflows, including open-ended and long-horizon tasks.
360
+
361
+ - **Parallelization and Cloud Compatibility**:
362
+ - Enable efficient parallelization while maintaining state across rollouts.
363
+ - Facilitate easy deployment and scalability in cloud environments.
364
+ - **MCP as a first class citizen**:
365
+
366
+ There has been an explosion of MCP servers/tools built for use cases ranging from browser use to Excel to game creation. `benchmax` lets folks leverage and compose these existing MCP servers to build environments integrated with real-world systems, e.g. Excel.
367
+
368
+
369
+ ## 🤝 Contributing
370
+
371
+ We welcome new environment recipes, bug reports, and trainer integrations!
372
+
373
+ ⭐ Star our repository to show your support!
374
+
375
+ ## 📜 License
376
+
377
+ Apache 2.0 © 2025 CGFT Inc.
@@ -1,6 +1,6 @@
1
1
  [tool.poetry]
2
2
  name = "benchmax"
3
- version = "0.1.1.dev1"
3
+ version = "0.1.1.dev2"
4
4
  description = "Framework-Agnostic RL Environments for LLM Fine-Tuning"
5
5
  authors = ["cgft.io"]
6
6
  readme = "README.md"
@@ -13,7 +13,7 @@ python = ">=3.11,<3.13"
13
13
  fastmcp = "~2.10.0"
14
14
 
15
15
  verl-cgft-fork = { version = "0.4.1.dev1", optional = true }
16
- sglang = { version = "0.4.9", optional = true }
16
+ sglang = { version = "0.4.9", optional = true, extras = ["all"] }
17
17
  verifiers = { version = "^0.1.1", optional = true, extras = ["train"] }
18
18
 
19
19
  openpyxl = { version = "3.1.5", optional = true }
@@ -1,43 +0,0 @@
1
- Metadata-Version: 2.3
2
- Name: benchmax
3
- Version: 0.1.1.dev1
4
- Summary: Framework-Agnostic RL Environments for LLM Fine-Tuning
5
- Author: cgft.io
6
- Requires-Python: >=3.11,<3.13
7
- Classifier: Programming Language :: Python :: 3
8
- Classifier: Programming Language :: Python :: 3.11
9
- Classifier: Programming Language :: Python :: 3.12
10
- Provides-Extra: crm
11
- Provides-Extra: excel
12
- Provides-Extra: excel-linux
13
- Provides-Extra: verifiers
14
- Provides-Extra: verl
15
- Requires-Dist: fastmcp (>=2.10.0,<2.11.0)
16
- Requires-Dist: openpyxl (==3.1.5) ; extra == "excel-linux" or extra == "excel"
17
- Requires-Dist: python-dateutil (>=2.9.0,<2.10.0) ; extra == "crm"
18
- Requires-Dist: sglang (==0.4.9) ; extra == "verl"
19
- Requires-Dist: simple-salesforce (>=1.12.3) ; extra == "crm"
20
- Requires-Dist: verifiers[train] (>=0.1.1,<0.2.0) ; extra == "verifiers"
21
- Requires-Dist: verl-cgft-fork (==0.4.1.dev1) ; extra == "verl"
22
- Requires-Dist: xlwings (==0.33.15) ; extra == "excel"
23
- Description-Content-Type: text/markdown
24
-
25
- <picture>
26
- <img alt="Benchmax" src="./static/benchmax.png" width="full">
27
- </picture>
28
-
29
- ## benchmax: Framework-Agnostic Reinforcement Learning Environments for LLM Fine-Tuning
30
-
31
- <div align="center">
32
- </div>
33
- <div id="badges" align="center">
34
- <a href="https://cgft.io">
35
- <img src="https://img.shields.io/badge/cgft.io-blue?style=for-the-badge" alt="Website"/>
36
- </a>
37
- <a href="https://x.com/cgftlabs">
38
- <img src="https://img.shields.io/badge/Follow @cgftlabs-black?style=for-the-badge&logo=X&logoColor=white" alt="@cgftlabs"/>
39
- </a>
40
- </div>
41
- <div align="center" style="line-height: 1;">
42
- <a href="https://github.com/girishbarca/benchmax/blob/main/LICENSE"><img alt="License" src="https://img.shields.io/badge/License-Apache_2.0-blue.svg"/></a>
43
- </div>
@@ -1,19 +0,0 @@
1
- <picture>
2
- <img alt="Benchmax" src="./static/benchmax.png" width="full">
3
- </picture>
4
-
5
- ## benchmax: Framework-Agnostic Reinforcement Learning Environments for LLM Fine-Tuning
6
-
7
- <div align="center">
8
- </div>
9
- <div id="badges" align="center">
10
- <a href="https://cgft.io">
11
- <img src="https://img.shields.io/badge/cgft.io-blue?style=for-the-badge" alt="Website"/>
12
- </a>
13
- <a href="https://x.com/cgftlabs">
14
- <img src="https://img.shields.io/badge/Follow @cgftlabs-black?style=for-the-badge&logo=X&logoColor=white" alt="@cgftlabs"/>
15
- </a>
16
- </div>
17
- <div align="center" style="line-height: 1;">
18
- <a href="https://github.com/girishbarca/benchmax/blob/main/LICENSE"><img alt="License" src="https://img.shields.io/badge/License-Apache_2.0-blue.svg"/></a>
19
- </div>