metrillm 0.1.1 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (3)
  1. package/README.md +66 -5
  2. package/dist/index.mjs +30672 -28795
  3. package/package.json +5 -3
package/README.md CHANGED
@@ -27,7 +27,7 @@ npx metrillm@latest bench
 
 - **Performance metrics**: tokens/sec, time to first token, memory usage, load time
 - **Quality evaluation**: reasoning, coding, math, instruction following, structured output, multilingual (14 prompts, 6 categories)
-- **Global score** (0-100): 40% hardware fit + 60% quality
+- **Global score** (0-100): 30% hardware fit + 70% quality
 - **Verdict**: EXCELLENT / GOOD / MARGINAL / NOT RECOMMENDED
 - **One-click share**: `--share` uploads your result and gives you a public URL + leaderboard rank
 
@@ -52,7 +52,8 @@ npx metrillm@latest bench
 
 ## Install
 
-> Requires [Node 20+](https://nodejs.org/) and [Ollama](https://ollama.com/) running.
+> Requires [Node 20+](https://nodejs.org/) and a local runtime:
+> [Ollama](https://ollama.com/) or [LM Studio](https://lmstudio.ai/).
 
 ```bash
 # Run directly (no install)
@@ -62,6 +63,16 @@ npx metrillm@latest bench
 npm i -g metrillm
 metrillm bench
 
+# Homebrew (no global npm install)
+# One-liner install (without pre-tapping):
+brew install MetriLLM/metrillm/metrillm
+
+# Or tap once for a shorter install command:
+brew tap MetriLLM/metrillm
+# Then:
+brew install metrillm
+metrillm bench
+
 # Alternative package managers
 pnpm dlx metrillm@latest bench
 bunx metrillm@latest bench
@@ -76,6 +87,9 @@ metrillm bench
 # Benchmark a specific model
 metrillm bench --model gemma3:4b
 
+# Benchmark with LM Studio backend
+metrillm bench --backend lm-studio --model qwen3-8b
+
 # Benchmark all installed models
 metrillm bench --all
 
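Before pointing `--backend` at LM Studio, it can help to confirm a server is actually listening on the default port listed in the Runtime Backends table. The sketch below is an assumption, not part of MetriLLM: the `/v1/models` path relies on LM Studio's OpenAI-compatible API, and `curl` availability is assumed.

```shell
# Probe LM Studio's default endpoint (port from the Runtime Backends table);
# the /v1/models path assumes LM Studio's OpenAI-compatible API.
if curl -fsS "http://127.0.0.1:1234/v1/models" >/dev/null 2>&1; then
  echo "lm-studio reachable"
else
  echo "lm-studio not reachable"
fi
```

The same check works for Ollama by swapping in its default port, `11434`.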
@@ -93,18 +107,45 @@ metrillm bench --export json
 metrillm bench --export csv
 ```
 
+## Upload Configuration (CLI + MCP)
+
+By default, production builds upload shared results to the official MetriLLM leaderboard (`https://metrillm.dev`).
+
+- No CI secret injection is required for standard releases.
+- Local/dev runs use the same default behavior.
+- Self-hosted or staging deployments can override endpoints with:
+  - `METRILLM_SUPABASE_URL`
+  - `METRILLM_SUPABASE_ANON_KEY`
+  - `METRILLM_PUBLIC_RESULT_BASE_URL`
+
+If these variables are set to placeholder values (from templates), MetriLLM falls back to the official defaults.
+
+## Runtime Backends
+
+| Backend | Flag | Default URL | Env vars |
+|---|---|---|---|
+| Ollama | `--backend ollama` | `http://127.0.0.1:11434` | `OLLAMA_HOST` (optional) |
+| LM Studio | `--backend lm-studio` | `http://127.0.0.1:1234` | `LM_STUDIO_BASE_URL` (optional), `LM_STUDIO_API_KEY` (optional), `LM_STUDIO_STREAM_STALL_TIMEOUT_MS` (optional) |
+
+For very large models, tune the timeout flags:
+- `--perf-warmup-timeout-ms` (default `300000`)
+- `--perf-prompt-timeout-ms` (default `120000`)
+- `--quality-timeout-ms` (default `120000`)
+- `--coding-timeout-ms` (default `240000`)
+- `--lm-studio-stream-stall-timeout-ms` (default `180000`; `0` disables the stall timeout)
+
 ## How Scoring Works
 
 **Hardware Fit Score** (0-100) — how well the model runs on your machine:
-- Speed: 40% (tokens/sec relative to your hardware tier)
-- TTFT: 30% (time to first token)
+- Speed: 50% (tokens/sec relative to your hardware tier)
+- TTFT: 20% (time to first token)
 - Memory: 30% (RAM efficiency)
 
 **Quality Score** (0-100) — how well the model answers:
 - Reasoning: 20pts | Coding: 20pts | Instruction Following: 20pts
 - Structured Output: 15pts | Math: 15pts | Multilingual: 10pts
 
-**Global Score** = 40% Hardware Fit + 60% Quality
+**Global Score** = 30% Hardware Fit + 70% Quality
 
 Hardware is auto-detected and scoring adapts to your tier (Entry/Balanced/High-End). A model hitting 10 tok/s on an 8GB machine scores differently than on a 64GB rig.
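The self-hosted override described in the Upload Configuration section can be set from the shell before sharing. The URLs and key below are placeholders invented for illustration, not real endpoints:

```shell
# Placeholder values for a self-hosted deployment; replace with your own.
export METRILLM_SUPABASE_URL="https://supabase.example.internal"
export METRILLM_SUPABASE_ANON_KEY="example-anon-key"
export METRILLM_PUBLIC_RESULT_BASE_URL="https://results.example.internal"
# A subsequent `metrillm bench --share` would upload against these endpoints.
echo "$METRILLM_PUBLIC_RESULT_BASE_URL"
```

Leaving the variables unset (or set to template placeholders) keeps the official defaults.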
 
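The Global Score formula in the scoring section is a plain weighted sum, easy to check by hand. A quick sketch with made-up component scores (80 hardware fit and 90 quality are illustrative, not real results):

```shell
# Global score = 30% hardware fit + 70% quality (weights from the README).
# The component scores here are made up for illustration.
hardware_fit=80
quality=90
awk -v h="$hardware_fit" -v q="$quality" 'BEGIN { printf "%.0f\n", 0.3*h + 0.7*q }'
# prints 87  (0.3*80 + 0.7*90 = 24 + 63)
```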
@@ -208,6 +249,26 @@ npm run dev # run from source
 npm run test:watch # vitest watch mode
 ```
 
+## Homebrew Formula Maintenance
+
+The tap formula lives in `Formula/metrillm.rb`.
+
+```bash
+# Refresh Formula/metrillm.rb with latest npm tarball + sha256
+./scripts/update-homebrew-formula.sh
+
+# Or pin a specific version
+./scripts/update-homebrew-formula.sh 0.2.0
+```
+
+After updating the formula, commit and push so users can install/update with:
+
+```bash
+brew tap MetriLLM/metrillm
+brew install metrillm
+brew upgrade metrillm
+```
+
 ## Contributing
 
 Contributions are welcome! Please read the [Contributing Guide](CONTRIBUTING.md) before submitting a pull request. All commits must include a DCO sign-off.