metrillm 0.1.1 → 0.2.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +79 -13
- package/dist/index.mjs +48490 -46389
- package/package.json +6 -4
package/README.md CHANGED

@@ -13,7 +13,8 @@
 > Think Geekbench, but for local LLMs on your actual hardware.
 
 ```bash
-
+npm install -g metrillm
+metrillm bench
 ```
 
 <p align="center">
@@ -27,7 +28,7 @@ npx metrillm@latest bench
 
 - **Performance metrics**: tokens/sec, time to first token, memory usage, load time
 - **Quality evaluation**: reasoning, coding, math, instruction following, structured output, multilingual (14 prompts, 6 categories)
-- **Global score** (0-100):
+- **Global score** (0-100): 30% hardware fit + 70% quality
 - **Verdict**: EXCELLENT / GOOD / MARGINAL / NOT RECOMMENDED
 - **One-click share**: `--share` uploads your result and gives you a public URL + leaderboard rank
 
@@ -52,19 +53,23 @@ npx metrillm@latest bench
 
 ## Install
 
-> Requires [Node 20+](https://nodejs.org/) and
+> Requires [Node 20+](https://nodejs.org/) and a local runtime:
+> [Ollama](https://ollama.com/) or [LM Studio](https://lmstudio.ai/).
 
 ```bash
-#
-
-
-# Or install globally
-npm i -g metrillm
+# Install globally
+npm install -g metrillm
 metrillm bench
 
 # Alternative package managers
-pnpm
-
+pnpm add -g metrillm
+bun add -g metrillm
+
+# Homebrew
+brew install MetriLLM/metrillm/metrillm
+
+# Or run without installing
+npx metrillm@latest bench
 ```
 
 ## Usage
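The install hunk above requires Node 20+. A minimal preflight sketch for checking that requirement (the hardcoded `ver` string stands in for real `node --version` output):

```shell
# Preflight sketch: extract the major version from `node --version` output
# and require 20 or newer. `ver` is hardcoded here for illustration;
# in a real script use: ver="$(node --version)"
ver="v20.11.1"
major="${ver#v}"          # strip leading "v"  -> 20.11.1
major="${major%%.*}"      # keep major component -> 20
if [ "$major" -ge 20 ]; then
  echo "ok: Node $ver meets the 20+ requirement"
else
  echo "too old: $ver (need Node 20+)"
fi
```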
@@ -76,6 +81,9 @@ metrillm bench
 # Benchmark a specific model
 metrillm bench --model gemma3:4b
 
+# Benchmark with LM Studio backend
+metrillm bench --backend lm-studio --model qwen3-8b
+
 # Benchmark all installed models
 metrillm bench --all
 
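The `--backend` flag shown above selects the runtime; the Runtime Backends table in this diff gives each backend's default URL and its optional env override. A hypothetical wrapper sketch of that resolution logic (the wrapper itself is not part of metrillm):

```shell
# Sketch: resolve a backend name to its base URL, honoring the optional
# env overrides (OLLAMA_HOST / LM_STUDIO_BASE_URL) documented in the README.
backend="lm-studio"
case "$backend" in
  ollama)    url="${OLLAMA_HOST:-http://127.0.0.1:11434}" ;;
  lm-studio) url="${LM_STUDIO_BASE_URL:-http://127.0.0.1:1234}" ;;
  *)         echo "unknown backend: $backend" >&2; exit 1 ;;
esac
echo "$url"
```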
@@ -93,18 +101,56 @@ metrillm bench --export json
 metrillm bench --export csv
 ```
 
+## Upload Configuration (CLI + MCP)
+
+By default, production builds upload shared results to the official MetriLLM leaderboard (`https://metrillm.dev`).
+
+- No CI secret injection is required for standard releases.
+- Local/dev runs use the same default behavior.
+- Self-hosted or staging deployments can override endpoints with:
+  - `METRILLM_SUPABASE_URL`
+  - `METRILLM_SUPABASE_ANON_KEY`
+  - `METRILLM_PUBLIC_RESULT_BASE_URL`
+
+If these variables are set to placeholder values (from templates), MetriLLM falls back to the official defaults.
+
+## Runtime Backends
+
+| Backend | Flag | Default URL | Required env |
+|---|---|---|---|
+| Ollama | `--backend ollama` | `http://127.0.0.1:11434` | `OLLAMA_HOST` (optional) |
+| LM Studio | `--backend lm-studio` | `http://127.0.0.1:1234` | `LM_STUDIO_BASE_URL` (optional), `LM_STUDIO_API_KEY` (optional), `LM_STUDIO_STREAM_STALL_TIMEOUT_MS` (optional) |
+
+For very large models, tune the timeout flags:
+- `--perf-warmup-timeout-ms` (default `300000`)
+- `--perf-prompt-timeout-ms` (default `120000`)
+- `--quality-timeout-ms` (default `120000`)
+- `--coding-timeout-ms` (default `240000`)
+- `--lm-studio-stream-stall-timeout-ms` (default `180000`; `0` disables the stall timeout)
+
+Benchmark Profile v1 (applied to all benchmark prompts):
+- `temperature=0`
+- `top_p=1`
+- `seed=42`
+- `thinking` follows your benchmark mode (`--thinking` / `--no-thinking`)
+- The context window stays at the runtime default (`context=runtime-default`) and is recorded as such in metadata.
+
+LM Studio non-thinking guard:
+- When the benchmark mode requests non-thinking (`--no-thinking` or the default), MetriLLM now aborts if the model still emits reasoning traces, to keep results comparable.
+- To disable thinking in LM Studio for affected models, put `{%- set enable_thinking = false %}` at the top of the model's chat template, then eject and reload the model.
+
 ## How Scoring Works
 
 **Hardware Fit Score** (0-100) — how well the model runs on your machine:
-- Speed:
-- TTFT:
+- Speed: 50% (tokens/sec relative to your hardware tier)
+- TTFT: 20% (time to first token)
 - Memory: 30% (RAM efficiency)
 
 **Quality Score** (0-100) — how well the model answers:
 - Reasoning: 20pts | Coding: 20pts | Instruction Following: 20pts
 - Structured Output: 15pts | Math: 15pts | Multilingual: 10pts
 
-**Global Score** =
+**Global Score** = 30% Hardware Fit + 70% Quality
 
 Hardware is auto-detected and scoring adapts to your tier (Entry/Balanced/High-End). A model hitting 10 tok/s on an 8GB machine scores differently than on a 64GB rig.
 
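The weights in the scoring hunk above compose as follows; the component scores here are made up purely for illustration, not real benchmark output:

```shell
# Illustrative sub-scores on a 0-100 scale (not real benchmark output).
speed=90; ttft=70; memory=80; quality=74

# Hardware Fit = 50% speed + 20% TTFT + 30% memory
hw=$(awk -v s="$speed" -v t="$ttft" -v m="$memory" \
  'BEGIN { printf "%.1f", 0.5*s + 0.2*t + 0.3*m }')

# Global Score = 30% Hardware Fit + 70% Quality
awk -v h="$hw" -v q="$quality" \
  'BEGIN { printf "Hardware Fit: %.1f, Global: %.1f\n", h, 0.3*h + 0.7*q }'
```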
@@ -208,6 +254,26 @@ npm run dev # run from source
 npm run test:watch # vitest watch mode
 ```
 
+## Homebrew Formula Maintenance
+
+The tap formula lives in `Formula/metrillm.rb`.
+
+```bash
+# Refresh Formula/metrillm.rb with the latest npm tarball + sha256
+./scripts/update-homebrew-formula.sh
+
+# Or pin a specific version
+./scripts/update-homebrew-formula.sh 0.2.1
+```
+
+After updating the formula, commit and push so users can install or update with:
+
+```bash
+brew tap MetriLLM/metrillm
+brew install metrillm
+brew upgrade metrillm
+```
+
 ## Contributing
 
 Contributions are welcome! Please read the [Contributing Guide](CONTRIBUTING.md) before submitting a pull request. All commits must include a DCO sign-off.