batchbench 0.1.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- batchbench-0.1.0/MANIFEST.in +2 -0
- batchbench-0.1.0/PKG-INFO +80 -0
- batchbench-0.1.0/README.md +67 -0
- batchbench-0.1.0/pyproject.toml +33 -0
- batchbench-0.1.0/setup.cfg +4 -0
- batchbench-0.1.0/src/batchbench/__init__.py +22 -0
- batchbench-0.1.0/src/batchbench/bin/batchbench +0 -0
- batchbench-0.1.0/src/batchbench/generate.py +256 -0
- batchbench-0.1.0/src/batchbench/offline.py +210 -0
- batchbench-0.1.0/src/batchbench/online.py +38 -0
- batchbench-0.1.0/src/batchbench.egg-info/PKG-INFO +80 -0
- batchbench-0.1.0/src/batchbench.egg-info/SOURCES.txt +14 -0
- batchbench-0.1.0/src/batchbench.egg-info/dependency_links.txt +1 -0
- batchbench-0.1.0/src/batchbench.egg-info/entry_points.txt +4 -0
- batchbench-0.1.0/src/batchbench.egg-info/requires.txt +6 -0
- batchbench-0.1.0/src/batchbench.egg-info/top_level.txt +1 -0

@@ -0,0 +1,80 @@ batchbench-0.1.0/PKG-INFO
Metadata-Version: 2.4
Name: batchbench
Version: 0.1.0
Summary: Offline and online benchmarking utilities for large language model workloads
Author: BatchBench Contributors
License: Apache-2.0
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Provides-Extra: generate
Requires-Dist: transformers>=4.39.0; extra == "generate"
Provides-Extra: offline
Requires-Dist: vllm>=0.4.0; extra == "offline"

# BatchBench

BatchBench bundles three benchmarking utilities behind installable Python entrypoints:

- `batchbench.generate` produces JSONL request corpora with controllable prefix overlap and approximate token counts.
- `batchbench.offline` drives an offline vLLM workload to record prompt and generation throughput.
- `batchbench.online` launches the packaged Rust binary that fans requests out to OpenAI-compatible endpoints in parallel.

## Installation

```bash
pip install batchbench
```

Optional extras install tool-specific dependencies:

```bash
pip install "batchbench[generate]"  # adds transformers for prompt sizing
pip install "batchbench[offline]"   # adds vllm for the offline benchmark
```

## Generating Requests

```bash
batchbench.generate \
  --count 100 \
  --prefix-overlap 0.3 \
  --approx-input-tokens 512 \
  --tokenizer-model gpt-3.5-turbo \
  --output data
```

Each row in the resulting JSONL file has a `text` field. The filename embeds run metadata (count, tokens, prefix, tokenizer) to keep runs distinct.

## Offline Benchmarking

The offline harness requires vLLM and a compatible model checkpoint.

```bash
batchbench.offline \
  --model facebook/opt-125m \
  --num_reqs 2048 \
  --icl 1024 \
  --ocl 1
```

The command prints prompt/generation throughput statistics and writes the sampled history to `vllm_throughput_history.csv` (configurable via `--throughput_csv`).

## Online Benchmarking

`batchbench.online` wraps the Rust executable that used to live under `rust-bench/`. The binary ships inside the wheel, so Cargo is not required on the host.

```bash
batchbench.online \
  --jsonl data/requests.jsonl \
  --model gpt-4o-mini \
  --host https://api.openai.com \
  --endpoint /v1/chat/completions \
  --users 8 \
  --requests-per-user 1
```

Provide an API key via `--api-key` or the environment variable named by `--api-key-env` (defaults to `OPENAI_API_KEY`).

## Development Notes

The project now follows a `src/` layout. Run `pip install -e .[generate,offline]` during development to work against the editable package. The Rust binary can be rebuilt with `cargo build --release` inside `rust-bench/`; copy the resulting executable to `src/batchbench/bin/` if you need to refresh it.

@@ -0,0 +1,67 @@ batchbench-0.1.0/README.md
# BatchBench

BatchBench bundles three benchmarking utilities behind installable Python entrypoints:

- `batchbench.generate` produces JSONL request corpora with controllable prefix overlap and approximate token counts.
- `batchbench.offline` drives an offline vLLM workload to record prompt and generation throughput.
- `batchbench.online` launches the packaged Rust binary that fans requests out to OpenAI-compatible endpoints in parallel.

## Installation

```bash
pip install batchbench
```

Optional extras install tool-specific dependencies:

```bash
pip install "batchbench[generate]"  # adds transformers for prompt sizing
pip install "batchbench[offline]"   # adds vllm for the offline benchmark
```
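
Once installed, the console scripts declared in `pyproject.toml` land on your PATH; a quick smoke test (just asking two of the tools for their usage text) might look like this:

```bash
batchbench.generate --help
batchbench.offline --help
```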

## Generating Requests

```bash
batchbench.generate \
  --count 100 \
  --prefix-overlap 0.3 \
  --approx-input-tokens 512 \
  --tokenizer-model gpt-3.5-turbo \
  --output data
```

Each row in the resulting JSONL file has a `text` field. The filename embeds run metadata (count, tokens, prefix, tokenizer) to keep runs distinct.
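
For the invocation above, the output would land in a file named something like `data/requests_count-100_tokens-512_prefix-0p30_tokenizer-gpt-3.5-turbo.jsonl`, with one JSON object per line. The prompt text is detokenized from random token ids, so the value below is purely illustrative:

```json
{"text": "... roughly 512 tokens of detokenized random ids ..."}
```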

## Offline Benchmarking

The offline harness requires vLLM and a compatible model checkpoint.

```bash
batchbench.offline \
  --model facebook/opt-125m \
  --num_reqs 2048 \
  --icl 1024 \
  --ocl 1
```

The command prints prompt/generation throughput statistics and writes the sampled history to `vllm_throughput_history.csv` (configurable via `--throughput_csv`).
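
The history CSV has two columns, `prompt_tps` and `gen_tps`, one row per sampled throughput log line. A minimal post-processing sketch that assumes only that layout:

```python
import csv
from statistics import mean

# Summarise a throughput history written by batchbench.offline.
with open("vllm_throughput_history.csv", newline="") as handle:
    rows = list(csv.DictReader(handle))

if rows:
    print("samples:", len(rows))
    print("prompt tokens/s mean:", round(mean(float(r["prompt_tps"]) for r in rows), 2))
    print("generation tokens/s mean:", round(mean(float(r["gen_tps"]) for r in rows), 2))
```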

## Online Benchmarking

`batchbench.online` wraps the Rust executable that used to live under `rust-bench/`. The binary ships inside the wheel, so Cargo is not required on the host.

```bash
batchbench.online \
  --jsonl data/requests.jsonl \
  --model gpt-4o-mini \
  --host https://api.openai.com \
  --endpoint /v1/chat/completions \
  --users 8 \
  --requests-per-user 1
```

Provide an API key via `--api-key` or the environment variable named by `--api-key-env` (defaults to `OPENAI_API_KEY`).
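
For example:

```bash
export OPENAI_API_KEY="sk-..."  # read automatically unless --api-key or --api-key-env overrides it
```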

## Development Notes

The project now follows a `src/` layout. Run `pip install -e .[generate,offline]` during development to work against the editable package. The Rust binary can be rebuilt with `cargo build --release` inside `rust-bench/`; copy the resulting executable to `src/batchbench/bin/` if you need to refresh it.
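
A possible refresh sequence, assuming the Cargo build also names its release binary `batchbench` (check `rust-bench/Cargo.toml` if yours differs):

```bash
cd rust-bench
cargo build --release
cp target/release/batchbench ../src/batchbench/bin/batchbench
```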

@@ -0,0 +1,33 @@ batchbench-0.1.0/pyproject.toml
[build-system]
requires = ["setuptools>=69", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "batchbench"
version = "0.1.0"
description = "Offline and online benchmarking utilities for large language model workloads"
readme = "README.md"
authors = [
    {name = "BatchBench Contributors"}
]
license = {text = "Apache-2.0"}
requires-python = ">=3.9"
dependencies = []

[project.optional-dependencies]
generate = ["transformers>=4.39.0"]
offline = ["vllm>=0.4.0"]

[project.scripts]
'batchbench.generate' = "batchbench.generate:main"
'batchbench.offline' = "batchbench.offline:main"
'batchbench.online' = "batchbench.online:main"

[tool.setuptools]
package-dir = {"" = "src"}

[tool.setuptools.packages.find]
where = ["src"]

[tool.setuptools.package-data]
"batchbench" = ["bin/batchbench"]

@@ -0,0 +1,22 @@ batchbench-0.1.0/src/batchbench/__init__.py
"""batchbench brings offline and online benchmarking utilities under one roof."""

from __future__ import annotations

from importlib import resources

try:  # pragma: no cover
    from importlib.metadata import PackageNotFoundError, version
except ImportError:  # pragma: no cover
    from importlib_metadata import PackageNotFoundError, version  # type: ignore


def package_version() -> str:
    """Return the installed package version or a placeholder when run from source."""
    try:
        return version("batchbench")
    except PackageNotFoundError:
        return "0.0.0"


__all__ = ["package_version", "resources"]
__version__ = package_version()

Binary file (batchbench-0.1.0/src/batchbench/bin/batchbench)

@@ -0,0 +1,256 @@ batchbench-0.1.0/src/batchbench/generate.py
"""Utilities and CLI entrypoint for generating batchbench request payloads."""

from __future__ import annotations

import argparse
import json
import math
import os
import random
import re
from pathlib import Path
from typing import Any, List

DEFAULT_PREFIX_TEXT = (
    "In this experiment we explore the capability of large language models "
    "to adapt their narrative based on subtle contextual variations. The "
    "following prompt requests creative output across a range of scenarios."
)


def load_tokenizer(model_name: str, token: str | None = None):
    """Load a Hugging Face tokenizer and configure pad token if needed."""
    try:
        from transformers import AutoTokenizer
    except ImportError as exc:  # pragma: no cover
        raise SystemExit(
            "Approximate token sizing requires the 'transformers' package. "
            "Install it with `pip install transformers`."
        ) from exc

    load_kwargs = {"use_fast": True}
    if token:
        load_kwargs["token"] = token
    try:
        tokenizer = AutoTokenizer.from_pretrained(model_name, **load_kwargs)
    except Exception as exc:  # pragma: no cover
        raise SystemExit(f"Failed to load tokenizer '{model_name}': {exc}") from exc

    if tokenizer.pad_token is None and tokenizer.eos_token is not None:
        tokenizer.pad_token = tokenizer.eos_token

    return tokenizer


def assemble_prompts(
    count: int,
    prefix_overlap: float,
    *,
    target_tokens: int | None = None,
    tokenizer: Any | None = None,
    tolerance: int = 5,
) -> List[str]:
    """Create prompt strings whose prefixes overlap by the requested fraction."""
    if tokenizer is None:
        raise ValueError("A tokenizer is required to detokenize random token ids.")

    prompts: List[str] = []
    rng = random.Random()
    sequence_lengths: List[int] = []
    for _ in range(count):
        sequence_length = 1
        if target_tokens and target_tokens > 0:
            lower = max(1, target_tokens - (tolerance if tolerance else 0))
            upper = target_tokens + (tolerance if tolerance else 0)
            if lower > upper:
                lower = upper
            sequence_length = rng.randint(lower, upper) if lower != upper else lower
        sequence_lengths.append(sequence_length)

    prefix_ratio = max(0.0, min(prefix_overlap, 1.0))
    min_length = min(sequence_lengths) if sequence_lengths else 0
    prefix_length = int(math.floor(min_length * prefix_ratio)) if min_length else 0
    if prefix_ratio > 0.0 and prefix_length == 0 and min_length > 0:
        prefix_length = 1

    prefix_ids = (
        [rng.randint(1, 10000) for _ in range(prefix_length)] if prefix_length else []
    )

    for seq_length in sequence_lengths:
        token_ids = [rng.randint(1, 10000) for _ in range(seq_length)]
        unique_ids = token_ids[prefix_length:]
        final_ids = prefix_ids + unique_ids

        prompt_text = tokenizer.decode(
            final_ids,
            skip_special_tokens=True,
            clean_up_tokenization_spaces=True,
        )

        if not prompt_text.strip():
            try:
                tokens = tokenizer.convert_ids_to_tokens(final_ids)
                prompt_text = " ".join(tokens).strip()
            except Exception:
                prompt_text = " ".join(str(tid) for tid in final_ids)

        prompts.append(prompt_text)

    return prompts


def parse_args(argv: List[str] | None = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument(
        "--count",
        "-n",
        type=int,
        default=10,
        help="Number of requests to generate (default: 10)",
    )
    parser.add_argument(
        "--prefix-overlap",
        type=float,
        default=0.0,
        help="Fraction of tokens shared as a prefix across requests (0.0-1.0, default: 0.0)",
    )
    parser.add_argument(
        "--output",
        "-o",
        type=Path,
        default=Path("outputs"),
        help=(
            "Destination directory or base filename. Metadata (count, prefix, token "
            "target, tokenizer) is appended automatically."
        ),
    )
    parser.add_argument(
        "--approx-input-tokens",
        type=int,
        default=0,
        help="Approximate number of tokens each prompt should contain (default: 0 = no adjustment)",
    )
    parser.add_argument(
        "--tokenizer-model",
        default=None,
        help=(
            "Hugging Face tokenizer identifier to use when approximating token counts. "
            "Defaults to 'gpt2' when not specified."
        ),
    )
    parser.add_argument(
        "--token-tolerance",
        type=int,
        default=None,
        help="Acceptable +/- token tolerance when approximating lengths (default: max(5, 5%% of target))",
    )
    parser.add_argument(
        "--huggingface-token",
        default=None,
        help=(
            "Personal access token for Hugging Face Hub (optional). If omitted, the "
            "generator checks HUGGINGFACE_TOKEN and HUGGING_FACE_HUB_TOKEN env vars."
        ),
    )
    return parser.parse_args(argv)


def resolve_tolerance(target_tokens: int, explicit: int | None) -> int:
    if target_tokens <= 0:
        return 0
    if explicit is not None and explicit >= 0:
        return explicit
    return max(5, int(target_tokens * 0.05))


def sanitize_component(value: str | None) -> str:
    if not value:
        return "none"
    cleaned = re.sub(r"[^0-9A-Za-z._-]+", "-", value).strip("-._")
    return cleaned or "none"


def format_prefix(prefix_overlap: float) -> str:
    return f"{prefix_overlap:.2f}".replace(".", "p")


def build_output_path(
    base_path: Path,
    *,
    count: int,
    prefix_overlap: float,
    target_tokens: int | None,
    tokenizer_label: str,
) -> Path:
    tokens_label = str(target_tokens) if target_tokens and target_tokens > 0 else "none"
    prefix_label = format_prefix(prefix_overlap)
    tokenizer_component = sanitize_component(tokenizer_label)
    metadata_suffix = (
        f"count-{count}_tokens-{tokens_label}_prefix-{prefix_label}_tokenizer-{tokenizer_component}"
    )

    if base_path.suffix == ".jsonl" and not base_path.is_dir():
        directory = base_path.parent if base_path.parent else Path(".")
        stem = sanitize_component(base_path.stem) or "requests"
        filename = f"{stem}_{metadata_suffix}.jsonl"
    else:
        directory = base_path
        filename = f"requests_{metadata_suffix}.jsonl"

    directory.mkdir(parents=True, exist_ok=True)
    output_path = directory / filename
    if output_path.exists():
        raise SystemExit(
            f"Refusing to overwrite existing file {output_path}. Delete it or choose a different output directory."
        )
    return output_path


def main(argv: List[str] | None = None) -> int:
    args = parse_args(argv)

    hf_token = (
        args.huggingface_token
        or os.getenv("HUGGINGFACE_TOKEN")
        or os.getenv("HUGGING_FACE_HUB_TOKEN")
    )

    target_tokens = args.approx_input_tokens if args.approx_input_tokens > 0 else None
    tokenizer_label = args.tokenizer_model or "gpt2"
    tokenizer = load_tokenizer(tokenizer_label, hf_token)
    tolerance = (
        resolve_tolerance(target_tokens, args.token_tolerance) if target_tokens else 0
    )

    prompts = assemble_prompts(
        count=args.count,
        prefix_overlap=args.prefix_overlap,
        target_tokens=target_tokens,
        tokenizer=tokenizer,
        tolerance=tolerance,
    )

    output_path = build_output_path(
        args.output,
        count=args.count,
        prefix_overlap=args.prefix_overlap,
        target_tokens=target_tokens,
        tokenizer_label=tokenizer_label,
    )

    with output_path.open("w", encoding="utf-8") as handle:
        for prompt in prompts:
            json.dump({"text": prompt}, handle)
            handle.write("\n")

    print(
        f"Wrote {len(prompts)} request prompts to {output_path} "
        f"with prefix overlap {args.prefix_overlap:.2f}."
    )

    return 0


if __name__ == "__main__":  # pragma: no cover
    raise SystemExit(main())

@@ -0,0 +1,210 @@ batchbench-0.1.0/src/batchbench/offline.py
"""Offline vLLM benchmarking CLI entrypoint."""

from __future__ import annotations

import argparse
import csv
import logging
import os
import re
from contextlib import contextmanager
from random import randint
from statistics import mean, pstdev
from typing import Dict, Iterable, List, Optional


def _load_vllm():
    try:
        from vllm import LLM, SamplingParams  # type: ignore
    except ImportError as exc:  # pragma: no cover
        raise SystemExit(
            "vLLM is required for batchbench.offline. Install with `pip install batchbench[offline]`."
        ) from exc

    # Ensure vLLM keeps emitting throughput stats while we run.
    os.environ.setdefault("VLLM_LOG_STATS_INTERVAL", "1")
    return LLM, SamplingParams


class VLLMThroughputCollector(logging.Handler):
    """Logging handler that captures vLLM throughput stats from INFO logs."""

    def __init__(self):
        super().__init__(level=logging.INFO)
        self.prompt_tps: List[float] = []
        self.gen_tps: List[float] = []
        self.TP_LINE = re.compile(
            r"Avg prompt throughput:\s*([0-9.]+)\s*tokens/s,\s*"
            r"Avg generation throughput:\s*([0-9.]+)\s*tokens/s"
        )

    def emit(self, record: logging.LogRecord) -> None:  # type: ignore[override]
        try:
            msg = record.getMessage()
        except Exception:
            return
        match = self.TP_LINE.search(msg)
        if match:
            self.prompt_tps.append(float(match.group(1)))
            self.gen_tps.append(float(match.group(2)))

    def summary(self) -> Optional[Dict[str, float]]:
        if not self.prompt_tps or not self.gen_tps:
            return None
        return {
            "prompt_avg": mean(self.prompt_tps),
            "prompt_std": pstdev(self.prompt_tps),
            "gen_avg": mean(self.gen_tps),
            "gen_std": pstdev(self.gen_tps),
        }

    def save_csv(self, path: str = "vllm_throughput_history.csv") -> str:
        """Persist the captured throughput history for later inspection."""
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["prompt_tps", "gen_tps"])
            for prompt_tps, gen_tps in zip(self.prompt_tps, self.gen_tps):
                writer.writerow([prompt_tps, gen_tps])
        return path


@contextmanager
def capture_vllm_throughput() -> Iterable[VLLMThroughputCollector]:
    """Attach the collector to the vLLM logger while the workload runs."""
    logger = logging.getLogger("vllm")
    collector = VLLMThroughputCollector()
    logger.addHandler(collector)
    if logger.level > logging.INFO:
        logger.setLevel(logging.INFO)
    try:
        yield collector
    finally:
        logger.removeHandler(collector)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description=(
            "Run the metrics workload while allowing the model and vLLM options "
            "to be configured from the command line."
        )
    )
    parser.add_argument(
        "--model",
        default="facebook/opt-125m",
        help="Model identifier or path to load with vLLM."
    )
    parser.add_argument(
        "--num_reqs",
        type=int,
        default=2048,
        help="Number of synthetic prompts to generate."
    )
    parser.add_argument(
        "--icl",
        type=int,
        default=1024,
        help="Input context length (tokens per prompt)."
    )
    parser.add_argument(
        "--ocl",
        type=int,
        default=1,
        help="Output context length (max tokens generated per request)."
    )
    parser.add_argument(
        "--throughput_csv",
        default="vllm_throughput_history.csv",
        help="Where to persist the throughput history CSV."
    )
    parser.add_argument(
        "--tensor_parallel_size",
        type=int,
        default=1,
        help="Tensor parallel world size for vLLM initialisation."
    )
    parser.add_argument(
        "--pipeline_parallel_size",
        type=int,
        default=1,
        help="Pipeline parallel world size for vLLM initialisation."
    )
    parser.add_argument(
        "--max_num_batched_tokens",
        type=int,
        default=512,
        help="Maximum tokens per batch when pre-filling prompts."
    )
    return parser


def validate_args(args: argparse.Namespace) -> None:
    if args.num_reqs < 1:
        raise ValueError("num_reqs must be >= 1")
    if args.icl < 1:
        raise ValueError("icl must be >= 1")
    if args.ocl < 0:
        raise ValueError("ocl must be >= 0")
    if args.tensor_parallel_size < 1:
        raise ValueError("tensor_parallel_size must be >= 1")
    if args.pipeline_parallel_size < 1:
        raise ValueError("pipeline_parallel_size must be >= 1")
    if args.max_num_batched_tokens < 1:
        raise ValueError("max_num_batched_tokens must be >= 1")


def main(argv: List[str] | None = None) -> int:
    parser = build_parser()
    args = parser.parse_args(argv)

    try:
        validate_args(args)
    except ValueError as exc:
        parser.error(str(exc))

    LLM, SamplingParams = _load_vllm()

    llm_kwargs = {
        "model": args.model,
        "disable_log_stats": False,
        "enable_chunked_prefill": True,
        "tensor_parallel_size": args.tensor_parallel_size,
        "pipeline_parallel_size": args.pipeline_parallel_size,
        "max_num_batched_tokens": args.max_num_batched_tokens,
    }

    sampling_params = SamplingParams(
        temperature=0.8,
        top_k=20,
        max_tokens=args.ocl,
        min_tokens=args.ocl,
    )

    llm = LLM(**llm_kwargs)

    tokenizer = llm.get_tokenizer()

    tokenized_prompts = [
        [randint(1, 10000) for _ in range(args.icl)]
        for _ in range(args.num_reqs)
    ]
    prompts = tokenizer.batch_decode(tokenized_prompts)

    with capture_vllm_throughput() as collector:
        llm.generate(prompts, sampling_params, use_tqdm=False)
    stats = collector.summary()
    csv_path = collector.save_csv(args.throughput_csv)

    if stats is None:
        print("No throughput lines were captured. Make sure vLLM is emitting stats logs.")
    else:
        print(
            f"Prompt TPS mean={stats['prompt_avg']:.2f} +/- {stats['prompt_std']:.2f}\n"
            f"Gen TPS mean={stats['gen_avg']:.2f} +/- {stats['gen_std']:.2f}\n"
            f"History CSV written to: {csv_path}"
        )

    return 0


if __name__ == "__main__":  # pragma: no cover
    raise SystemExit(main())

@@ -0,0 +1,38 @@ batchbench-0.1.0/src/batchbench/online.py
"""Wrapper that launches the bundled Rust-powered online benchmark."""

from __future__ import annotations

import os
import subprocess
import sys

from importlib import resources


BINARY_NAME = "batchbench"


def main(argv: list[str] | None = None) -> int:
    args = argv if argv is not None else sys.argv[1:]

    resource = resources.files(__package__).joinpath("bin", BINARY_NAME)
    if not resource.is_file():  # pragma: no cover
        raise SystemExit(
            "Bundled online benchmarking binary not found. "
            "Please reinstall batchbench or report an issue."
        )

    env = os.environ.copy()

    with resources.as_file(resource) as binary_path:
        command = [str(binary_path), *args]
        try:
            completed = subprocess.run(command, check=False, env=env)
        except FileNotFoundError as exc:  # pragma: no cover
            raise SystemExit(f"Failed to execute bundled binary: {exc}") from exc

    return int(completed.returncode)


if __name__ == "__main__":  # pragma: no cover
    raise SystemExit(main())

@@ -0,0 +1,80 @@ batchbench-0.1.0/src/batchbench.egg-info/PKG-INFO
(contents identical to batchbench-0.1.0/PKG-INFO above)

@@ -0,0 +1,14 @@ batchbench-0.1.0/src/batchbench.egg-info/SOURCES.txt
MANIFEST.in
README.md
pyproject.toml
src/batchbench/__init__.py
src/batchbench/generate.py
src/batchbench/offline.py
src/batchbench/online.py
src/batchbench.egg-info/PKG-INFO
src/batchbench.egg-info/SOURCES.txt
src/batchbench.egg-info/dependency_links.txt
src/batchbench.egg-info/entry_points.txt
src/batchbench.egg-info/requires.txt
src/batchbench.egg-info/top_level.txt
src/batchbench/bin/batchbench

@@ -0,0 +1 @@ batchbench-0.1.0/src/batchbench.egg-info/dependency_links.txt

@@ -0,0 +1 @@ batchbench-0.1.0/src/batchbench.egg-info/top_level.txt
batchbench