batchbench-0.1.0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,2 @@
+ include README.md
+ recursive-include src/batchbench/bin *
@@ -0,0 +1,80 @@
+ Metadata-Version: 2.4
+ Name: batchbench
+ Version: 0.1.0
+ Summary: Offline and online benchmarking utilities for large language model workloads
+ Author: BatchBench Contributors
+ License: Apache-2.0
+ Requires-Python: >=3.9
+ Description-Content-Type: text/markdown
+ Provides-Extra: generate
+ Requires-Dist: transformers>=4.39.0; extra == "generate"
+ Provides-Extra: offline
+ Requires-Dist: vllm>=0.4.0; extra == "offline"
+
+ # BatchBench
+
+ BatchBench bundles three benchmarking utilities behind installable Python entrypoints:
+
+ - `batchbench.generate` produces JSONL request corpora with controllable prefix overlap and approximate token counts.
+ - `batchbench.offline` drives an offline vLLM workload to record prompt and generation throughput.
+ - `batchbench.online` launches the packaged Rust binary that fans requests out to OpenAI-compatible endpoints in parallel.
+
+ ## Installation
+
+ ```bash
+ pip install batchbench
+ ```
+
+ Optional extras install tool-specific dependencies:
+
+ ```bash
+ pip install "batchbench[generate]" # adds transformers for prompt sizing
+ pip install "batchbench[offline]" # adds vllm for the offline benchmark
+ ```
+
+ ## Generating Requests
+
+ ```bash
+ batchbench.generate \
+   --count 100 \
+   --prefix-overlap 0.3 \
+   --approx-input-tokens 512 \
+   --tokenizer-model gpt-3.5-turbo \
+   --output data
+ ```
+
+ Each row in the resulting JSONL file has a `text` field. The filename embeds run metadata (count, tokens, prefix, tokenizer) to keep runs distinct.
+
+ ## Offline Benchmarking
+
+ The offline harness requires vLLM and a compatible model checkpoint.
+
+ ```bash
+ batchbench.offline \
+   --model facebook/opt-125m \
+   --num_reqs 2048 \
+   --icl 1024 \
+   --ocl 1
+ ```
+
+ The command prints prompt/generation throughput statistics and writes the sampled history to `vllm_throughput_history.csv` (configurable via `--throughput_csv`).
+
+ ## Online Benchmarking
+
+ `batchbench.online` wraps the Rust executable that used to live under `rust-bench/`. The binary ships inside the wheel, so Cargo is not required on the host.
+
+ ```bash
+ batchbench.online \
+   --jsonl data/requests.jsonl \
+   --model gpt-4o-mini \
+   --host https://api.openai.com \
+   --endpoint /v1/chat/completions \
+   --users 8 \
+   --requests-per-user 1
+ ```
+
+ Provide an API key via `--api-key` or the environment variable named by `--api-key-env` (defaults to `OPENAI_API_KEY`).
+
+ ## Development Notes
+
+ The project now follows a `src/` layout. Run `pip install -e .[generate,offline]` during development to work against the editable package. The Rust binary can be rebuilt with `cargo build --release` inside `rust-bench/`; copy the resulting executable to `src/batchbench/bin/` if you need to refresh it.
@@ -0,0 +1,67 @@
+ # BatchBench
+
+ BatchBench bundles three benchmarking utilities behind installable Python entrypoints:
+
+ - `batchbench.generate` produces JSONL request corpora with controllable prefix overlap and approximate token counts.
+ - `batchbench.offline` drives an offline vLLM workload to record prompt and generation throughput.
+ - `batchbench.online` launches the packaged Rust binary that fans requests out to OpenAI-compatible endpoints in parallel.
+
+ ## Installation
+
+ ```bash
+ pip install batchbench
+ ```
+
+ Optional extras install tool-specific dependencies:
+
+ ```bash
+ pip install "batchbench[generate]" # adds transformers for prompt sizing
+ pip install "batchbench[offline]" # adds vllm for the offline benchmark
+ ```
+
+ ## Generating Requests
+
+ ```bash
+ batchbench.generate \
+   --count 100 \
+   --prefix-overlap 0.3 \
+   --approx-input-tokens 512 \
+   --tokenizer-model gpt-3.5-turbo \
+   --output data
+ ```
+
+ Each row in the resulting JSONL file has a `text` field. The filename embeds run metadata (count, tokens, prefix, tokenizer) to keep runs distinct.
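+
+ For example, the invocation above would write a file named roughly as sketched below (illustrative; the decoded random text varies per run):
+
+ ```bash
+ head -n 1 data/requests_count-100_tokens-512_prefix-0p30_tokenizer-gpt-3.5-turbo.jsonl
+ # {"text": "..."}
+ ```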
+
+ ## Offline Benchmarking
+
+ The offline harness requires vLLM and a compatible model checkpoint.
+
+ ```bash
+ batchbench.offline \
+   --model facebook/opt-125m \
+   --num_reqs 2048 \
+   --icl 1024 \
+   --ocl 1
+ ```
+
+ The command prints prompt/generation throughput statistics and writes the sampled history to `vllm_throughput_history.csv` (configurable via `--throughput_csv`).
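+
+ The printed summary and the CSV follow the shape sketched below (values are placeholders, not measurements):
+
+ ```bash
+ # Prompt TPS mean=... +/- ...
+ # Gen TPS mean=... +/- ...
+ # History CSV written to: vllm_throughput_history.csv
+ head -n 2 vllm_throughput_history.csv
+ # prompt_tps,gen_tps
+ # ...,...
+ ```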
+
+ ## Online Benchmarking
+
+ `batchbench.online` wraps the Rust executable that used to live under `rust-bench/`. The binary ships inside the wheel, so Cargo is not required on the host.
+
+ ```bash
+ batchbench.online \
+   --jsonl data/requests.jsonl \
+   --model gpt-4o-mini \
+   --host https://api.openai.com \
+   --endpoint /v1/chat/completions \
+   --users 8 \
+   --requests-per-user 1
+ ```
+
+ Provide an API key via `--api-key` or the environment variable named by `--api-key-env` (defaults to `OPENAI_API_KEY`).
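+
+ For example (key value is a placeholder):
+
+ ```bash
+ # Either export the key so the default --api-key-env lookup finds it...
+ export OPENAI_API_KEY="sk-..."
+ # ...or pass it explicitly with --api-key on the command shown above.
+ ```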
+
+ ## Development Notes
+
+ The project now follows a `src/` layout. Run `pip install -e .[generate,offline]` during development to work against the editable package. The Rust binary can be rebuilt with `cargo build --release` inside `rust-bench/`; copy the resulting executable to `src/batchbench/bin/` if you need to refresh it.
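+
+ A minimal refresh sequence might look like the sketch below (it assumes the Cargo target binary is named `batchbench`, matching the bundled file):
+
+ ```bash
+ (cd rust-bench && cargo build --release)
+ cp rust-bench/target/release/batchbench src/batchbench/bin/batchbench
+ ```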
@@ -0,0 +1,33 @@
+ [build-system]
+ requires = ["setuptools>=69", "wheel"]
+ build-backend = "setuptools.build_meta"
+
+ [project]
+ name = "batchbench"
+ version = "0.1.0"
+ description = "Offline and online benchmarking utilities for large language model workloads"
+ readme = "README.md"
+ authors = [
+     {name = "BatchBench Contributors"}
+ ]
+ license = {text = "Apache-2.0"}
+ requires-python = ">=3.9"
+ dependencies = []
+
+ [project.optional-dependencies]
+ generate = ["transformers>=4.39.0"]
+ offline = ["vllm>=0.4.0"]
+
+ [project.scripts]
+ 'batchbench.generate' = "batchbench.generate:main"
+ 'batchbench.offline' = "batchbench.offline:main"
+ 'batchbench.online' = "batchbench.online:main"
+
+ [tool.setuptools]
+ package-dir = {"" = "src"}
+
+ [tool.setuptools.packages.find]
+ where = ["src"]
+
+ [tool.setuptools.package-data]
+ "batchbench" = ["bin/batchbench"]
@@ -0,0 +1,4 @@
+ [egg_info]
+ tag_build =
+ tag_date = 0
+
@@ -0,0 +1,22 @@
+ """batchbench brings offline and online benchmarking utilities under one roof."""
+
+ from __future__ import annotations
+
+ from importlib import resources
+
+ try: # pragma: no cover
+     from importlib.metadata import PackageNotFoundError, version
+ except ImportError: # pragma: no cover
+     from importlib_metadata import PackageNotFoundError, version # type: ignore
+
+
+ def package_version() -> str:
+     """Return the installed package version or a placeholder when run from source."""
+     try:
+         return version("batchbench")
+     except PackageNotFoundError:
+         return "0.0.0"
+
+
+ __all__ = ["package_version", "resources"]
+ __version__ = package_version()
@@ -0,0 +1,256 @@
+ """Utilities and CLI entrypoint for generating batchbench request payloads."""
+
+ from __future__ import annotations
+
+ import argparse
+ import json
+ import math
+ import os
+ import random
+ import re
+ from pathlib import Path
+ from typing import Any, List
+
+ DEFAULT_PREFIX_TEXT = (
+     "In this experiment we explore the capability of large language models "
+     "to adapt their narrative based on subtle contextual variations. The "
+     "following prompt requests creative output across a range of scenarios."
+ )
+
+
+ def load_tokenizer(model_name: str, token: str | None = None):
+     """Load a Hugging Face tokenizer and configure pad token if needed."""
+     try:
+         from transformers import AutoTokenizer
+     except ImportError as exc: # pragma: no cover
+         raise SystemExit(
+             "Approximate token sizing requires the 'transformers' package. "
+             "Install it with `pip install transformers`."
+         ) from exc
+
+     load_kwargs = {"use_fast": True}
+     if token:
+         load_kwargs["token"] = token
+     try:
+         tokenizer = AutoTokenizer.from_pretrained(model_name, **load_kwargs)
+     except Exception as exc: # pragma: no cover
+         raise SystemExit(f"Failed to load tokenizer '{model_name}': {exc}") from exc
+
+     if tokenizer.pad_token is None and tokenizer.eos_token is not None:
+         tokenizer.pad_token = tokenizer.eos_token
+
+     return tokenizer
+
+
+ def assemble_prompts(
+     count: int,
+     prefix_overlap: float,
+     *,
+     target_tokens: int | None = None,
+     tokenizer: Any | None = None,
+     tolerance: int = 5,
+ ) -> List[str]:
+     """Create prompt strings whose prefixes overlap by the requested fraction."""
+     if tokenizer is None:
+         raise ValueError("A tokenizer is required to detokenize random token ids.")
+
+     prompts: List[str] = []
+     rng = random.Random()
+     sequence_lengths: List[int] = []
+     for _ in range(count):
+         sequence_length = 1
+         if target_tokens and target_tokens > 0:
+             lower = max(1, target_tokens - (tolerance if tolerance else 0))
+             upper = target_tokens + (tolerance if tolerance else 0)
+             if lower > upper:
+                 lower = upper
+             sequence_length = rng.randint(lower, upper) if lower != upper else lower
+         sequence_lengths.append(sequence_length)
+
+     prefix_ratio = max(0.0, min(prefix_overlap, 1.0))
+     min_length = min(sequence_lengths) if sequence_lengths else 0
+     prefix_length = int(math.floor(min_length * prefix_ratio)) if min_length else 0
+     if prefix_ratio > 0.0 and prefix_length == 0 and min_length > 0:
+         prefix_length = 1
+
+     prefix_ids = (
+         [rng.randint(1, 10000) for _ in range(prefix_length)] if prefix_length else []
+     )
+
+     for seq_length in sequence_lengths:
+         token_ids = [rng.randint(1, 10000) for _ in range(seq_length)]
+         unique_ids = token_ids[prefix_length:]
+         final_ids = prefix_ids + unique_ids
+
+         prompt_text = tokenizer.decode(
+             final_ids,
+             skip_special_tokens=True,
+             clean_up_tokenization_spaces=True,
+         )
+
+         if not prompt_text.strip():
+             try:
+                 tokens = tokenizer.convert_ids_to_tokens(final_ids)
+                 prompt_text = " ".join(tokens).strip()
+             except Exception:
+                 prompt_text = " ".join(str(tid) for tid in final_ids)
+
+         prompts.append(prompt_text)
+
+     return prompts
+
+
+ def parse_args(argv: List[str] | None = None) -> argparse.Namespace:
+     parser = argparse.ArgumentParser(description=__doc__)
+     parser.add_argument(
+         "--count",
+         "-n",
+         type=int,
+         default=10,
+         help="Number of requests to generate (default: 10)",
+     )
+     parser.add_argument(
+         "--prefix-overlap",
+         type=float,
+         default=0.0,
+         help="Fraction of tokens shared as a prefix across requests (0.0-1.0, default: 0.0)",
+     )
+     parser.add_argument(
+         "--output",
+         "-o",
+         type=Path,
+         default=Path("outputs"),
+         help=(
+             "Destination directory or base filename. Metadata (count, prefix, token "
+             "target, tokenizer) is appended automatically."
+         ),
+     )
+     parser.add_argument(
+         "--approx-input-tokens",
+         type=int,
+         default=0,
+         help="Approximate number of tokens each prompt should contain (default: 0 = no adjustment)",
+     )
+     parser.add_argument(
+         "--tokenizer-model",
+         default=None,
+         help=(
+             "Hugging Face tokenizer identifier to use when approximating token counts. "
+             "Defaults to 'gpt2' when not specified."
+         ),
+     )
+     parser.add_argument(
+         "--token-tolerance",
+         type=int,
+         default=None,
+         help="Acceptable +/- token tolerance when approximating lengths (default: max(5, 5%% of target))",
+     )
+     parser.add_argument(
+         "--huggingface-token",
+         default=None,
+         help=(
+             "Personal access token for Hugging Face Hub (optional). If omitted, the "
+             "generator checks HUGGINGFACE_TOKEN and HUGGING_FACE_HUB_TOKEN env vars."
+         ),
+     )
+     return parser.parse_args(argv)
+
+
+ def resolve_tolerance(target_tokens: int, explicit: int | None) -> int:
+     if target_tokens <= 0:
+         return 0
+     if explicit is not None and explicit >= 0:
+         return explicit
+     return max(5, int(target_tokens * 0.05))
+
+
+ def sanitize_component(value: str | None) -> str:
+     if not value:
+         return "none"
+     cleaned = re.sub(r"[^0-9A-Za-z._-]+", "-", value).strip("-._")
+     return cleaned or "none"
+
+
+ def format_prefix(prefix_overlap: float) -> str:
+     return f"{prefix_overlap:.2f}".replace(".", "p")
+
+
+ def build_output_path(
+     base_path: Path,
+     *,
+     count: int,
+     prefix_overlap: float,
+     target_tokens: int | None,
+     tokenizer_label: str,
+ ) -> Path:
+     tokens_label = str(target_tokens) if target_tokens and target_tokens > 0 else "none"
+     prefix_label = format_prefix(prefix_overlap)
+     tokenizer_component = sanitize_component(tokenizer_label)
+     metadata_suffix = (
+         f"count-{count}_tokens-{tokens_label}_prefix-{prefix_label}_tokenizer-{tokenizer_component}"
+     )
+
+     if base_path.suffix == ".jsonl" and not base_path.is_dir():
+         directory = base_path.parent if base_path.parent else Path(".")
+         stem = sanitize_component(base_path.stem) or "requests"
+         filename = f"{stem}_{metadata_suffix}.jsonl"
+     else:
+         directory = base_path
+         filename = f"requests_{metadata_suffix}.jsonl"
+
+     directory.mkdir(parents=True, exist_ok=True)
+     output_path = directory / filename
+     if output_path.exists():
+         raise SystemExit(
+             f"Refusing to overwrite existing file {output_path}. Delete it or choose a different output directory."
+         )
+     return output_path
+
+
+ def main(argv: List[str] | None = None) -> int:
+     args = parse_args(argv)
+
+     hf_token = (
+         args.huggingface_token
+         or os.getenv("HUGGINGFACE_TOKEN")
+         or os.getenv("HUGGING_FACE_HUB_TOKEN")
+     )
+
+     target_tokens = args.approx_input_tokens if args.approx_input_tokens > 0 else None
+     tokenizer_label = args.tokenizer_model or "gpt2"
+     tokenizer = load_tokenizer(tokenizer_label, hf_token)
+     tolerance = (
+         resolve_tolerance(target_tokens, args.token_tolerance) if target_tokens else 0
+     )
+
+     prompts = assemble_prompts(
+         count=args.count,
+         prefix_overlap=args.prefix_overlap,
+         target_tokens=target_tokens,
+         tokenizer=tokenizer,
+         tolerance=tolerance,
+     )
+
+     output_path = build_output_path(
+         args.output,
+         count=args.count,
+         prefix_overlap=args.prefix_overlap,
+         target_tokens=target_tokens,
+         tokenizer_label=tokenizer_label,
+     )
+
+     with output_path.open("w", encoding="utf-8") as handle:
+         for prompt in prompts:
+             json.dump({"text": prompt}, handle)
+             handle.write("\n")
+
+     print(
+         f"Wrote {len(prompts)} request prompts to {output_path} "
+         f"with prefix overlap {args.prefix_overlap:.2f}."
+     )
+
+     return 0
+
+
+ if __name__ == "__main__": # pragma: no cover
+     raise SystemExit(main())
@@ -0,0 +1,210 @@
+ """Offline vLLM benchmarking CLI entrypoint."""
+
+ from __future__ import annotations
+
+ import argparse
+ import csv
+ import logging
+ import os
+ import re
+ from contextlib import contextmanager
+ from random import randint
+ from statistics import mean, pstdev
+ from typing import Dict, Iterable, List, Optional
+
+ def _load_vllm():
+     try:
+         from vllm import LLM, SamplingParams # type: ignore
+     except ImportError as exc: # pragma: no cover
+         raise SystemExit(
+             "vLLM is required for batchbench.offline. Install with `pip install batchbench[offline]`."
+         ) from exc
+
+     # Ensure vLLM keeps emitting throughput stats while we run.
+     os.environ.setdefault("VLLM_LOG_STATS_INTERVAL", "1")
+     return LLM, SamplingParams
+
+
+ class VLLMThroughputCollector(logging.Handler):
+     """Logging handler that captures vLLM throughput stats from INFO logs."""
+
+     def __init__(self):
+         super().__init__(level=logging.INFO)
+         self.prompt_tps: List[float] = []
+         self.gen_tps: List[float] = []
+         self.TP_LINE = re.compile(
+             r"Avg prompt throughput:\s*([0-9.]+)\s*tokens/s,\s*"
+             r"Avg generation throughput:\s*([0-9.]+)\s*tokens/s"
+         )
+
+     def emit(self, record: logging.LogRecord) -> None: # type: ignore[override]
+         try:
+             msg = record.getMessage()
+         except Exception:
+             return
+         match = self.TP_LINE.search(msg)
+         if match:
+             self.prompt_tps.append(float(match.group(1)))
+             self.gen_tps.append(float(match.group(2)))
+
+     def summary(self) -> Optional[Dict[str, float]]:
+         if not self.prompt_tps or not self.gen_tps:
+             return None
+         return {
+             "prompt_avg": mean(self.prompt_tps),
+             "prompt_std": pstdev(self.prompt_tps),
+             "gen_avg": mean(self.gen_tps),
+             "gen_std": pstdev(self.gen_tps),
+         }
+
+     def save_csv(self, path: str = "vllm_throughput_history.csv") -> str:
+         """Persist the captured throughput history for later inspection."""
+         with open(path, "w", newline="") as handle:
+             writer = csv.writer(handle)
+             writer.writerow(["prompt_tps", "gen_tps"])
+             for prompt_tps, gen_tps in zip(self.prompt_tps, self.gen_tps):
+                 writer.writerow([prompt_tps, gen_tps])
+         return path
+
+
+ @contextmanager
+ def capture_vllm_throughput() -> Iterable[VLLMThroughputCollector]:
+     """Attach the collector to the vLLM logger while the workload runs."""
+     logger = logging.getLogger("vllm")
+     collector = VLLMThroughputCollector()
+     logger.addHandler(collector)
+     if logger.level > logging.INFO:
+         logger.setLevel(logging.INFO)
+     try:
+         yield collector
+     finally:
+         logger.removeHandler(collector)
+
+
+ def build_parser() -> argparse.ArgumentParser:
+     parser = argparse.ArgumentParser(
+         description=(
+             "Run the metrics workload while allowing the model and vLLM options "
+             "to be configured from the command line."
+         )
+     )
+     parser.add_argument(
+         "--model",
+         default="facebook/opt-125m",
+         help="Model identifier or path to load with vLLM."
+     )
+     parser.add_argument(
+         "--num_reqs",
+         type=int,
+         default=2048,
+         help="Number of synthetic prompts to generate."
+     )
+     parser.add_argument(
+         "--icl",
+         type=int,
+         default=1024,
+         help="Input context length (tokens per prompt)."
+     )
+     parser.add_argument(
+         "--ocl",
+         type=int,
+         default=1,
+         help="Output context length (max tokens generated per request)."
+     )
+     parser.add_argument(
+         "--throughput_csv",
+         default="vllm_throughput_history.csv",
+         help="Where to persist the throughput history CSV."
+     )
+     parser.add_argument(
+         "--tensor_parallel_size",
+         type=int,
+         default=1,
+         help="Tensor parallel world size for vLLM initialisation."
+     )
+     parser.add_argument(
+         "--pipeline_parallel_size",
+         type=int,
+         default=1,
+         help="Pipeline parallel world size for vLLM initialisation."
+     )
+     parser.add_argument(
+         "--max_num_batched_tokens",
+         type=int,
+         default=512,
+         help="Maximum tokens per batch when pre-filling prompts."
+     )
+     return parser
+
+
+ def validate_args(args: argparse.Namespace) -> None:
+     if args.num_reqs < 1:
+         raise ValueError("num_reqs must be >= 1")
+     if args.icl < 1:
+         raise ValueError("icl must be >= 1")
+     if args.ocl < 0:
+         raise ValueError("ocl must be >= 0")
+     if args.tensor_parallel_size < 1:
+         raise ValueError("tensor_parallel_size must be >= 1")
+     if args.pipeline_parallel_size < 1:
+         raise ValueError("pipeline_parallel_size must be >= 1")
+     if args.max_num_batched_tokens < 1:
+         raise ValueError("max_num_batched_tokens must be >= 1")
+
+
+ def main(argv: List[str] | None = None) -> int:
+     parser = build_parser()
+     args = parser.parse_args(argv)
+
+     try:
+         validate_args(args)
+     except ValueError as exc:
+         parser.error(str(exc))
+
+     LLM, SamplingParams = _load_vllm()
+
+     llm_kwargs = {
+         "model": args.model,
+         "disable_log_stats": False,
+         "enable_chunked_prefill": True,
+         "tensor_parallel_size": args.tensor_parallel_size,
+         "pipeline_parallel_size": args.pipeline_parallel_size,
+         "max_num_batched_tokens": args.max_num_batched_tokens,
+     }
+
+     sampling_params = SamplingParams(
+         temperature=0.8,
+         top_k=20,
+         max_tokens=args.ocl,
+         min_tokens=args.ocl,
+     )
+
+     llm = LLM(**llm_kwargs)
+
+     tokenizer = llm.get_tokenizer()
+
+     tokenized_prompts = [
+         [randint(1, 10000) for _ in range(args.icl)]
+         for _ in range(args.num_reqs)
+     ]
+     prompts = tokenizer.batch_decode(tokenized_prompts)
+
+     with capture_vllm_throughput() as collector:
+         llm.generate(prompts, sampling_params, use_tqdm=False)
+         stats = collector.summary()
+         csv_path = collector.save_csv(args.throughput_csv)
+
+     if stats is None:
+         print("No throughput lines were captured. Make sure vLLM is emitting stats logs.")
+     else:
+         print(
+             f"Prompt TPS mean={stats['prompt_avg']:.2f} +/- {stats['prompt_std']:.2f}\n"
+             f"Gen TPS mean={stats['gen_avg']:.2f} +/- {stats['gen_std']:.2f}\n"
+             f"History CSV written to: {csv_path}"
+         )
+
+     return 0
+
+
+ if __name__ == "__main__": # pragma: no cover
+     raise SystemExit(main())
@@ -0,0 +1,38 @@
+ """Wrapper that launches the bundled Rust-powered online benchmark."""
+
+ from __future__ import annotations
+
+ import os
+ import subprocess
+ import sys
+
+ from importlib import resources
+
+
+ BINARY_NAME = "batchbench"
+
+
+ def main(argv: list[str] | None = None) -> int:
+     args = argv if argv is not None else sys.argv[1:]
+
+     resource = resources.files(__package__).joinpath("bin", BINARY_NAME)
+     if not resource.is_file(): # pragma: no cover
+         raise SystemExit(
+             "Bundled online benchmarking binary not found. "
+             "Please reinstall batchbench or report an issue."
+         )
+
+     env = os.environ.copy()
+
+     with resources.as_file(resource) as binary_path:
+         command = [str(binary_path), *args]
+         try:
+             completed = subprocess.run(command, check=False, env=env)
+         except FileNotFoundError as exc: # pragma: no cover
+             raise SystemExit(f"Failed to execute bundled binary: {exc}") from exc
+
+     return int(completed.returncode)
+
+
+ if __name__ == "__main__": # pragma: no cover
+     raise SystemExit(main())
@@ -0,0 +1,80 @@
+ Metadata-Version: 2.4
+ Name: batchbench
+ Version: 0.1.0
+ Summary: Offline and online benchmarking utilities for large language model workloads
+ Author: BatchBench Contributors
+ License: Apache-2.0
+ Requires-Python: >=3.9
+ Description-Content-Type: text/markdown
+ Provides-Extra: generate
+ Requires-Dist: transformers>=4.39.0; extra == "generate"
+ Provides-Extra: offline
+ Requires-Dist: vllm>=0.4.0; extra == "offline"
+
+ # BatchBench
+
+ BatchBench bundles three benchmarking utilities behind installable Python entrypoints:
+
+ - `batchbench.generate` produces JSONL request corpora with controllable prefix overlap and approximate token counts.
+ - `batchbench.offline` drives an offline vLLM workload to record prompt and generation throughput.
+ - `batchbench.online` launches the packaged Rust binary that fans requests out to OpenAI-compatible endpoints in parallel.
+
+ ## Installation
+
+ ```bash
+ pip install batchbench
+ ```
+
+ Optional extras install tool-specific dependencies:
+
+ ```bash
+ pip install "batchbench[generate]" # adds transformers for prompt sizing
+ pip install "batchbench[offline]" # adds vllm for the offline benchmark
+ ```
+
+ ## Generating Requests
+
+ ```bash
+ batchbench.generate \
+   --count 100 \
+   --prefix-overlap 0.3 \
+   --approx-input-tokens 512 \
+   --tokenizer-model gpt-3.5-turbo \
+   --output data
+ ```
+
+ Each row in the resulting JSONL file has a `text` field. The filename embeds run metadata (count, tokens, prefix, tokenizer) to keep runs distinct.
+
+ ## Offline Benchmarking
+
+ The offline harness requires vLLM and a compatible model checkpoint.
+
+ ```bash
+ batchbench.offline \
+   --model facebook/opt-125m \
+   --num_reqs 2048 \
+   --icl 1024 \
+   --ocl 1
+ ```
+
+ The command prints prompt/generation throughput statistics and writes the sampled history to `vllm_throughput_history.csv` (configurable via `--throughput_csv`).
+
+ ## Online Benchmarking
+
+ `batchbench.online` wraps the Rust executable that used to live under `rust-bench/`. The binary ships inside the wheel, so Cargo is not required on the host.
+
+ ```bash
+ batchbench.online \
+   --jsonl data/requests.jsonl \
+   --model gpt-4o-mini \
+   --host https://api.openai.com \
+   --endpoint /v1/chat/completions \
+   --users 8 \
+   --requests-per-user 1
+ ```
+
+ Provide an API key via `--api-key` or the environment variable named by `--api-key-env` (defaults to `OPENAI_API_KEY`).
+
+ ## Development Notes
+
+ The project now follows a `src/` layout. Run `pip install -e .[generate,offline]` during development to work against the editable package. The Rust binary can be rebuilt with `cargo build --release` inside `rust-bench/`; copy the resulting executable to `src/batchbench/bin/` if you need to refresh it.
@@ -0,0 +1,14 @@
+ MANIFEST.in
+ README.md
+ pyproject.toml
+ src/batchbench/__init__.py
+ src/batchbench/generate.py
+ src/batchbench/offline.py
+ src/batchbench/online.py
+ src/batchbench.egg-info/PKG-INFO
+ src/batchbench.egg-info/SOURCES.txt
+ src/batchbench.egg-info/dependency_links.txt
+ src/batchbench.egg-info/entry_points.txt
+ src/batchbench.egg-info/requires.txt
+ src/batchbench.egg-info/top_level.txt
+ src/batchbench/bin/batchbench
@@ -0,0 +1,4 @@
+ [console_scripts]
+ batchbench.generate = batchbench.generate:main
+ batchbench.offline = batchbench.offline:main
+ batchbench.online = batchbench.online:main
@@ -0,0 +1,6 @@
+
+ [generate]
+ transformers>=4.39.0
+
+ [offline]
+ vllm>=0.4.0
@@ -0,0 +1 @@
+ batchbench