metrillm 0.1.1 → 0.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (3)
  1. package/README.md +79 -13
  2. package/dist/index.mjs +48490 -46389
  3. package/package.json +6 -4
package/README.md CHANGED
@@ -13,7 +13,8 @@
  > Think Geekbench, but for local LLMs on your actual hardware.
 
  ```bash
- npx metrillm@latest bench
+ npm install -g metrillm
+ metrillm bench
  ```
 
  <p align="center">
@@ -27,7 +28,7 @@ npx metrillm@latest bench
 
  - **Performance metrics**: tokens/sec, time to first token, memory usage, load time
  - **Quality evaluation**: reasoning, coding, math, instruction following, structured output, multilingual (14 prompts, 6 categories)
- - **Global score** (0-100): 40% hardware fit + 60% quality
+ - **Global score** (0-100): 30% hardware fit + 70% quality
  - **Verdict**: EXCELLENT / GOOD / MARGINAL / NOT RECOMMENDED
  - **One-click share**: `--share` uploads your result and gives you a public URL + leaderboard rank
 
@@ -52,19 +53,23 @@ npx metrillm@latest bench
 
  ## Install
 
- > Requires [Node 20+](https://nodejs.org/) and [Ollama](https://ollama.com/) running.
+ > Requires [Node 20+](https://nodejs.org/) and a local runtime:
+ > [Ollama](https://ollama.com/) or [LM Studio](https://lmstudio.ai/).
 
  ```bash
- # Run directly (no install)
- npx metrillm@latest bench
-
- # Or install globally
- npm i -g metrillm
+ # Install globally
+ npm install -g metrillm
  metrillm bench
 
  # Alternative package managers
- pnpm dlx metrillm@latest bench
- bunx metrillm@latest bench
+ pnpm add -g metrillm
+ bun add -g metrillm
+
+ # Homebrew
+ brew install MetriLLM/metrillm/metrillm
+
+ # Or run without installing
+ npx metrillm@latest bench
  ```
 
  ## Usage
@@ -76,6 +81,9 @@ metrillm bench
  # Benchmark a specific model
  metrillm bench --model gemma3:4b
 
+ # Benchmark with LM Studio backend
+ metrillm bench --backend lm-studio --model qwen3-8b
+
  # Benchmark all installed models
  metrillm bench --all
 
@@ -93,18 +101,56 @@ metrillm bench --export json
  metrillm bench --export csv
  ```
 
+ ## Upload Configuration (CLI + MCP)
+
+ By default, production builds upload shared results to the official MetriLLM leaderboard (`https://metrillm.dev`).
+
+ - No CI secret injection is required for standard releases.
+ - Local/dev runs use the same default behavior.
+ - Self-hosted or staging deployments can override endpoints with:
+   - `METRILLM_SUPABASE_URL`
+   - `METRILLM_SUPABASE_ANON_KEY`
+   - `METRILLM_PUBLIC_RESULT_BASE_URL`
+
+ If these variables are set to placeholder values (from templates), MetriLLM falls back to the official defaults.
+
+ ## Runtime Backends
+
+ | Backend | Flag | Default URL | Required env |
+ |---|---|---|---|
+ | Ollama | `--backend ollama` | `http://127.0.0.1:11434` | `OLLAMA_HOST` (optional) |
+ | LM Studio | `--backend lm-studio` | `http://127.0.0.1:1234` | `LM_STUDIO_BASE_URL` (optional), `LM_STUDIO_API_KEY` (optional), `LM_STUDIO_STREAM_STALL_TIMEOUT_MS` (optional) |
+
+ For very large models, tune the timeout flags:
+ - `--perf-warmup-timeout-ms` (default `300000`)
+ - `--perf-prompt-timeout-ms` (default `120000`)
+ - `--quality-timeout-ms` (default `120000`)
+ - `--coding-timeout-ms` (default `240000`)
+ - `--lm-studio-stream-stall-timeout-ms` (default `180000`, `0` disables the stall timeout)
+
+ Benchmark Profile v1 (applied to all benchmark prompts):
+ - `temperature=0`
+ - `top_p=1`
+ - `seed=42`
+ - `thinking` follows your benchmark mode (`--thinking` / `--no-thinking`)
+ - Context window stays at the runtime default (`context=runtime-default`) and is recorded as such in metadata.
+
+ LM Studio non-thinking guard:
+ - When benchmark mode requests non-thinking (`--no-thinking` or default), MetriLLM aborts if the model still emits reasoning traces (to keep results comparable).
+ - To disable thinking in LM Studio for affected models, put this at the top of the model chat template: `{%- set enable_thinking = false %}`, then eject/reload the model.
+
  ## How Scoring Works
 
  **Hardware Fit Score** (0-100) — how well the model runs on your machine:
- - Speed: 40% (tokens/sec relative to your hardware tier)
- - TTFT: 30% (time to first token)
+ - Speed: 50% (tokens/sec relative to your hardware tier)
+ - TTFT: 20% (time to first token)
  - Memory: 30% (RAM efficiency)
 
  **Quality Score** (0-100) — how well the model answers:
  - Reasoning: 20pts | Coding: 20pts | Instruction Following: 20pts
  - Structured Output: 15pts | Math: 15pts | Multilingual: 10pts
 
- **Global Score** = 40% Hardware Fit + 60% Quality
+ **Global Score** = 30% Hardware Fit + 70% Quality
 
  Hardware is auto-detected and scoring adapts to your tier (Entry/Balanced/High-End). A model hitting 10 tok/s on a 8GB machine scores differently than on a 64GB rig.
 
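The placeholder fallback described in the new Upload Configuration section can be sketched in shell. The official default URL comes from the README above; the specific placeholder patterns matched here are illustrative assumptions, not metrillm's actual detection logic:

```bash
# Resolve the public result base URL the way the section above describes:
# an env override wins, but an empty or template-placeholder value falls
# back to the official default. The placeholder patterns are hypothetical.
resolve_result_base_url() {
  local override="${METRILLM_PUBLIC_RESULT_BASE_URL:-}"
  case "$override" in
    "" | *"your-project"* | *"placeholder"*)
      echo "https://metrillm.dev"   # official default
      ;;
    *)
      echo "$override"              # explicit self-hosted/staging override
      ;;
  esac
}

resolve_result_base_url
```

The same resolution would apply analogously to the two Supabase variables.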
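The reweighted formulas in this hunk are plain linear combinations, so they are easy to sanity-check. In the sketch below only the weights come from the diff; the sample subscores (speed 80, TTFT 60, memory 70, quality 75) are made up:

```bash
# Hardware Fit = 50% speed + 20% TTFT + 30% memory   (was 40/30/30 in 0.1.1)
hw=$(awk 'BEGIN { print 0.5*80 + 0.2*60 + 0.3*70 }')
echo "hardware fit: $hw"   # prints: hardware fit: 73

# Global Score = 30% hardware fit + 70% quality      (was 40/60 in 0.1.1)
awk -v hw="$hw" -v quality=75 'BEGIN { print 0.3*hw + 0.7*quality }'   # prints 74.4
```

Note the quality subscores listed above (20 + 20 + 20 + 15 + 15 + 10) still sum to 100, so the quality axis needed no renormalization.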
@@ -208,6 +254,26 @@ npm run dev # run from source
  npm run test:watch # vitest watch mode
  ```
 
+ ## Homebrew Formula Maintenance
+
+ The tap formula lives in `Formula/metrillm.rb`.
+
+ ```bash
+ # Refresh Formula/metrillm.rb with latest npm tarball + sha256
+ ./scripts/update-homebrew-formula.sh
+
+ # Or pin a specific version
+ ./scripts/update-homebrew-formula.sh 0.2.1
+ ```
+
+ After updating the formula, commit and push so users can install/update with:
+
+ ```bash
+ brew tap MetriLLM/metrillm
+ brew install metrillm
+ brew upgrade metrillm
+ ```
+
 
  ## Contributing
 
  Contributions are welcome! Please read the [Contributing Guide](CONTRIBUTING.md) before submitting a pull request. All commits must include a DCO sign-off.