metrillm 0.2.0 → 0.2.2
- package/README.md +42 -20
- package/dist/index.mjs +1120 -325
- package/package.json +2 -2
package/README.md
CHANGED
````diff
@@ -13,7 +13,8 @@
 > Think Geekbench, but for local LLMs on your actual hardware.
 
 ```bash
-
+npm install -g metrillm
+metrillm bench
 ```
 
 <p align="center">
````
````diff
@@ -56,26 +57,19 @@ npx metrillm@latest bench
 > [Ollama](https://ollama.com/) or [LM Studio](https://lmstudio.ai/).
 
 ```bash
-#
-
-
-# Or install globally
-npm i -g metrillm
+# Install globally
+npm install -g metrillm
 metrillm bench
 
-#
-
-
+# Alternative package managers
+pnpm add -g metrillm
+bun add -g metrillm
 
-#
-brew
-# Then:
-brew install metrillm
-metrillm bench
+# Homebrew
+brew install MetriLLM/metrillm/metrillm
 
-#
-
-bunx metrillm@latest bench
+# Or run without installing
+npx metrillm@latest bench
 ```
 
 ## Usage
````
````diff
@@ -120,19 +114,47 @@ By default, production builds upload shared results to the official MetriLLM lea
 
 If these variables are set to placeholder values (from templates), MetriLLM falls back to official defaults.
 
+## Windows Users
+
+PowerShell's default execution policy blocks npm global scripts. If you see `PSSecurityException` or `UnauthorizedAccess` when running `metrillm`, run this once:
+
+```powershell
+Set-ExecutionPolicy -Scope CurrentUser -ExecutionPolicy RemoteSigned
+```
+
+Alternatively, use `npx metrillm` which bypasses the issue entirely.
+
 ## Runtime Backends
 
 | Backend | Flag | Default URL | Required env |
 |---|---|---|---|
 | Ollama | `--backend ollama` | `http://127.0.0.1:11434` | `OLLAMA_HOST` (optional) |
-| LM Studio | `--backend lm-studio` | `http://127.0.0.1:1234` | `LM_STUDIO_BASE_URL` (optional), `LM_STUDIO_API_KEY` (optional)
+| LM Studio | `--backend lm-studio` | `http://127.0.0.1:1234` | `LM_STUDIO_BASE_URL` (optional), `LM_STUDIO_API_KEY` (optional) |
+
+Shared runtime env:
+- `METRILLM_STREAM_STALL_TIMEOUT_MS` (optional): stream watchdog for all backends, default `30000`, `0` disables it
+
+LM Studio benchmark runs now use the native REST inference endpoint (`/api/v1/chat`) for both streaming and non-streaming generation.
+The previous OpenAI-compatible inference path (`/v1/chat/completions`) has been retired from MetriLLM so tok/s and TTFT can rely on native LM Studio stats when available.
+If a LM Studio response omits native token stats, MetriLLM still computes a score and shows the throughput as `estimated`.
 
 For very large models, tune timeout flags:
 - `--perf-warmup-timeout-ms` (default `300000`)
 - `--perf-prompt-timeout-ms` (default `120000`)
 - `--quality-timeout-ms` (default `120000`)
 - `--coding-timeout-ms` (default `240000`)
-- `--
+- `--stream-stall-timeout-ms` (default `30000`, `0` disables stall timeout for any backend)
+
+Benchmark Profile v1 (applied to all benchmark prompts):
+- `temperature=0`
+- `top_p=1`
+- `seed=42`
+- `thinking` follows your benchmark mode (`--thinking` / `--no-thinking`)
+- Context window stays runtime default (`context=runtime-default`) and is recorded as such in metadata.
+
+LM Studio non-thinking guard:
+- When benchmark mode requests non-thinking (`--no-thinking` or default), MetriLLM now aborts if the model still emits reasoning traces (for result comparability).
+- To disable it in LM Studio for affected models, put this at the top of the model chat template: `{%- set enable_thinking = false %}` then eject/reload the model.
 
 ## How Scoring Works
 
````
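The Runtime Backends table in this README hunk pairs each backend with a default URL and an optional override variable. The following is a minimal TypeScript sketch of that lookup, not MetriLLM's code: it assumes the override simply replaces the default when set and non-empty. The function name `resolveBaseUrl`, the `http://` prefix for a bare `host:port` value, and the trailing-slash trimming are all hypothetical choices made for illustration.

```typescript
type Backend = "ollama" | "lm-studio";

// Default URLs copied from the README's Runtime Backends table.
const DEFAULT_URLS: Record<Backend, string> = {
  "ollama": "http://127.0.0.1:11434",
  "lm-studio": "http://127.0.0.1:1234",
};

// Optional override variable per backend, also from the table.
const OVERRIDE_VARS: Record<Backend, string> = {
  "ollama": "OLLAMA_HOST",
  "lm-studio": "LM_STUDIO_BASE_URL",
};

// Hypothetical helper: returns the override if it is set, else the default.
function resolveBaseUrl(
  backend: Backend,
  env: Record<string, string | undefined>,
): string {
  const raw = env[OVERRIDE_VARS[backend]]?.trim();
  if (!raw) return DEFAULT_URLS[backend];
  // Assumption: a bare "host:port" value is treated as plain HTTP.
  const withScheme = /^https?:\/\//i.test(raw) ? raw : `http://${raw}`;
  // Drop trailing slashes so endpoint paths such as /api/v1/chat join cleanly.
  return withScheme.replace(/\/+$/, "");
}
```

In a Node CLI this would typically be called as `resolveBaseUrl("lm-studio", process.env)`; passing the environment in as a plain object keeps the helper easy to test.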
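`METRILLM_STREAM_STALL_TIMEOUT_MS` and `--stream-stall-timeout-ms` describe a stream watchdog with a default of `30000` ms, where `0` disables it. One common way to build such a watchdog is a per-chunk timer that restarts whenever a chunk arrives, so a slow but steady stream is never cut off and only a silent one fails. The TypeScript sketch below shows that technique under the assumption that MetriLLM's watchdog behaves this way; `withStallTimeout` and `StreamStallError` are hypothetical names.

```typescript
// Hypothetical error type for a stream that went silent too long.
class StreamStallError extends Error {
  constructor(stallMs: number) {
    super(`no stream chunk received for ${stallMs} ms`);
    this.name = "StreamStallError";
  }
}

// Re-yields chunks from `source`, failing if the gap between chunks exceeds stallMs.
async function* withStallTimeout<T>(
  source: AsyncIterable<T>,
  stallMs: number,
): AsyncGenerator<T> {
  // Mirrors the README: 0 disables the watchdog entirely.
  if (stallMs <= 0) {
    yield* source;
    return;
  }
  const it = source[Symbol.asyncIterator]();
  try {
    while (true) {
      let timer: ReturnType<typeof setTimeout> | undefined;
      const stall = new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new StreamStallError(stallMs)), stallMs);
      });
      // Whichever settles first wins; the timer is cleared before the chunk
      // is handed to the consumer, so slow consumers never trip it.
      const next = await Promise.race([it.next(), stall]).finally(() =>
        clearTimeout(timer),
      );
      if (next.done) return;
      yield next.value;
    }
  } finally {
    // Ask the source to stop, without awaiting: a stalled source may never settle.
    it.return?.().catch(() => {});
  }
}
```

Restarting the timer per chunk, rather than using one deadline for the whole response, keeps the stall timeout independent of the per-phase timeouts such as `--perf-prompt-timeout-ms`.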
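The LM Studio notes say tok/s comes from native stats when the `/api/v1/chat` response includes them, and is otherwise reported as `estimated`. The sketch below illustrates that fallback pattern only. The `NativeStats` field name is hypothetical (LM Studio's real response fields may differ), and the 4-characters-per-token estimate is a swapped-in heuristic, not MetriLLM's documented method.

```typescript
// Hypothetical shape of the native stats; real field names may differ.
interface NativeStats {
  tokensPerSecond?: number;
}

interface Throughput {
  tokensPerSecond: number;
  source: "native" | "estimated";
}

// Assumption: rough heuristic of ~4 characters per token when no count is reported.
const CHARS_PER_TOKEN = 4;

function throughput(
  stats: NativeStats | undefined,
  outputText: string,
  firstTokenAtMs: number,
  endAtMs: number,
): Throughput {
  // Prefer the runtime's own measurement when it is present and positive.
  if (stats?.tokensPerSecond !== undefined && stats.tokensPerSecond > 0) {
    return { tokensPerSecond: stats.tokensPerSecond, source: "native" };
  }
  // Otherwise estimate from output length over the time after the first token;
  // the 1 ms floor avoids dividing by zero on instant responses.
  const seconds = Math.max(endAtMs - firstTokenAtMs, 1) / 1000;
  const tokens = outputText.length / CHARS_PER_TOKEN;
  return { tokensPerSecond: tokens / seconds, source: "estimated" };
}
```

Carrying the `source` tag alongside the number lets a report label estimated rows explicitly instead of silently mixing them with native measurements.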
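The non-thinking guard aborts a run when a model still emits reasoning traces in `--no-thinking` mode. How MetriLLM detects a trace is not shown in this diff, so the sketch below assumes two signals: a separate reasoning field on the response, or inline `<think>...</think>` tags in the content, as some reasoning models emit. `ChatOutput` and `assertNoReasoning` are hypothetical names, and other model families may use different markers.

```typescript
// Hypothetical response shape: some runtimes return reasoning separately,
// others leave <think>...</think> inline in the content.
interface ChatOutput {
  content: string;
  reasoning?: string;
}

// Assumption: <think> / </think> tags mark a reasoning trace.
const THINK_TAG = /<\/?think>/i;

function assertNoReasoning(out: ChatOutput, thinkingRequested: boolean): void {
  // The guard only applies when non-thinking mode was requested.
  if (thinkingRequested) return;
  const hasTrace =
    (out.reasoning?.trim() ?? "") !== "" || THINK_TAG.test(out.content);
  if (hasTrace) {
    // Abort instead of scoring, so results stay comparable across models.
    throw new Error(
      "Model emitted reasoning traces in non-thinking mode; add " +
        "{%- set enable_thinking = false %} to the chat template and reload the model.",
    );
  }
}
```

Failing loudly rather than stripping the trace matches the README's stated goal of result comparability: a silently stripped trace would still have consumed tokens and time.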
````diff
@@ -258,7 +280,7 @@ The tap formula lives in `Formula/metrillm.rb`.
 ./scripts/update-homebrew-formula.sh
 
 # Or pin a specific version
-./scripts/update-homebrew-formula.sh 0.2.
+./scripts/update-homebrew-formula.sh 0.2.1
 ```
 
 After updating the formula, commit and push so users can install/update with:
````