metrillm 0.2.0 → 0.2.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (3)
  1. package/README.md +42 -20
  2. package/dist/index.mjs +1120 -325
  3. package/package.json +2 -2
package/README.md CHANGED
@@ -13,7 +13,8 @@
 > Think Geekbench, but for local LLMs on your actual hardware.
 
 ```bash
-npx metrillm@latest bench
+npm install -g metrillm
+metrillm bench
 ```
 
 <p align="center">
@@ -56,26 +57,19 @@ npx metrillm@latest bench
 > [Ollama](https://ollama.com/) or [LM Studio](https://lmstudio.ai/).
 
 ```bash
-# Run directly (no install)
-npx metrillm@latest bench
-
-# Or install globally
-npm i -g metrillm
+# Install globally
+npm install -g metrillm
 metrillm bench
 
-# Homebrew (no global npm install)
-# One-liner install (without pre-tapping):
-brew install MetriLLM/metrillm/metrillm
+# Alternative package managers
+pnpm add -g metrillm
+bun add -g metrillm
 
-# Or one-time tap for short install command:
-brew tap MetriLLM/metrillm
-# Then:
-brew install metrillm
-metrillm bench
+# Homebrew
+brew install MetriLLM/metrillm/metrillm
 
-# Alternative package managers
-pnpm dlx metrillm@latest bench
-bunx metrillm@latest bench
+# Or run without installing
+npx metrillm@latest bench
 ```
 
 ## Usage
@@ -120,19 +114,47 @@ By default, production builds upload shared results to the official MetriLLM lea
 
 If these variables are set to placeholder values (from templates), MetriLLM falls back to official defaults.
 
+## Windows Users
+
+PowerShell's default execution policy blocks npm global scripts. If you see `PSSecurityException` or `UnauthorizedAccess` when running `metrillm`, run this once:
+
+```powershell
+Set-ExecutionPolicy -Scope CurrentUser -ExecutionPolicy RemoteSigned
+```
+
+Alternatively, use `npx metrillm`, which bypasses the issue entirely.
+
 ## Runtime Backends
 
 | Backend | Flag | Default URL | Required env |
 |---|---|---|---|
 | Ollama | `--backend ollama` | `http://127.0.0.1:11434` | `OLLAMA_HOST` (optional) |
-| LM Studio | `--backend lm-studio` | `http://127.0.0.1:1234` | `LM_STUDIO_BASE_URL` (optional), `LM_STUDIO_API_KEY` (optional), `LM_STUDIO_STREAM_STALL_TIMEOUT_MS` (optional) |
+| LM Studio | `--backend lm-studio` | `http://127.0.0.1:1234` | `LM_STUDIO_BASE_URL` (optional), `LM_STUDIO_API_KEY` (optional) |
+
+Shared runtime env:
+- `METRILLM_STREAM_STALL_TIMEOUT_MS` (optional): stream watchdog for all backends; default `30000`, `0` disables it
+
+LM Studio benchmark runs now use the native REST inference endpoint (`/api/v1/chat`) for both streaming and non-streaming generation.
+The previous OpenAI-compatible inference path (`/v1/chat/completions`) has been retired from MetriLLM so tok/s and TTFT can rely on native LM Studio stats when available.
+If an LM Studio response omits native token stats, MetriLLM still computes a score and shows the throughput as `estimated`.
 
 For very large models, tune timeout flags:
 - `--perf-warmup-timeout-ms` (default `300000`)
 - `--perf-prompt-timeout-ms` (default `120000`)
 - `--quality-timeout-ms` (default `120000`)
 - `--coding-timeout-ms` (default `240000`)
-- `--lm-studio-stream-stall-timeout-ms` (default `180000`, `0` disables stall timeout)
+- `--stream-stall-timeout-ms` (default `30000`; `0` disables the stall timeout for any backend)
+
+Benchmark Profile v1 (applied to all benchmark prompts):
+- `temperature=0`
+- `top_p=1`
+- `seed=42`
+- `thinking` follows your benchmark mode (`--thinking` / `--no-thinking`)
+- Context window stays at the runtime default (`context=runtime-default`) and is recorded as such in metadata.
+
+LM Studio non-thinking guard:
+- When benchmark mode requests non-thinking (`--no-thinking`, the default), MetriLLM now aborts if the model still emits reasoning traces (for result comparability).
+- To disable thinking in LM Studio for affected models, put this at the top of the model's chat template: `{%- set enable_thinking = false %}`, then eject and reload the model.
 
 ## How Scoring Works
 
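The per-chunk stall watchdog that `METRILLM_STREAM_STALL_TIMEOUT_MS` configures can be sketched roughly as follows. This is a minimal illustration, not MetriLLM's actual implementation: the function and variable names are hypothetical, and it assumes the WHATWG `ReadableStream` API (global in Node 18+).

```javascript
// Hypothetical sketch of a stream stall watchdog: reject if no chunk arrives
// within `stallTimeoutMs`. The timer is re-armed before every read, so a slow
// but steady stream never trips it -- only a true stall does. 0 disables it.
async function readWithStallWatchdog(stream, stallTimeoutMs, onChunk) {
  const reader = stream.getReader();
  let timer = null;
  try {
    while (true) {
      const result = await new Promise((resolve, reject) => {
        if (stallTimeoutMs > 0) {
          timer = setTimeout(
            () => reject(new Error(`stream stalled for ${stallTimeoutMs} ms`)),
            stallTimeoutMs,
          );
        }
        reader.read().then(resolve, reject);
      });
      clearTimeout(timer);
      if (result.done) break;
      onChunk(result.value);
    }
  } finally {
    clearTimeout(timer);
    // Cancel rather than releaseLock() so a still-pending read settles cleanly.
    reader.cancel().catch(() => {});
  }
}
```

Re-arming per chunk matches the "stall" semantics described above, as opposed to a total-duration timeout like `--perf-prompt-timeout-ms`.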
@@ -258,7 +280,7 @@ The tap formula lives in `Formula/metrillm.rb`.
 ./scripts/update-homebrew-formula.sh
 
 # Or pin a specific version
-./scripts/update-homebrew-formula.sh 0.2.0
+./scripts/update-homebrew-formula.sh 0.2.1
 ```
 
 After updating the formula, commit and push so users can install/update with: