metrillm 0.1.0 → 0.2.0
This diff shows the changes between publicly released versions of this package as they appear in a supported public registry. It is provided for informational purposes only.
- package/README.md +155 -14
- package/dist/index.mjs +30683 -28797
- package/package.json +5 -3
package/README.md
CHANGED
@@ -16,11 +16,18 @@
 npx metrillm@latest bench
 ```
 
+<p align="center">
+<img src="docs/images/cli1.png" width="48%" alt="MetriLLM CLI — interactive menu" />
+<img src="docs/images/cli2.png" width="48%" alt="MetriLLM CLI — hardware detection" />
+</p>
+
+[](https://metrillm.dev)
+
 ## What You Get
 
 - **Performance metrics**: tokens/sec, time to first token, memory usage, load time
 - **Quality evaluation**: reasoning, coding, math, instruction following, structured output, multilingual (14 prompts, 6 categories)
-- **Global score** (0-100):
+- **Global score** (0-100): 30% hardware fit + 70% quality
 - **Verdict**: EXCELLENT / GOOD / MARGINAL / NOT RECOMMENDED
 - **One-click share**: `--share` uploads your result and gives you a public URL + leaderboard rank
 
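The 30/70 weighting introduced in this hunk (together with the hardware-fit sub-weights shown further down in the diff: Speed 50%, TTFT 20%, Memory 30%) can be sanity-checked with a short sketch. The function names are illustrative only, not part of MetriLLM's API:

```python
def hardware_fit(speed: float, ttft: float, memory: float) -> float:
    """Combine the 0-100 hardware sub-scores: Speed 50%, TTFT 20%, Memory 30%."""
    return 0.5 * speed + 0.2 * ttft + 0.3 * memory

def global_score(hw_fit: float, quality: float) -> float:
    """Global score per the README: 30% hardware fit + 70% quality."""
    return 0.3 * hw_fit + 0.7 * quality

# Example: decent speed, snappy first token, moderate memory use, quality 85
fit = hardware_fit(speed=80, ttft=90, memory=70)  # 79.0
print(round(global_score(fit, quality=85), 1))    # 83.2
```

Note that a strong quality score dominates: the 70% quality weight means hardware fit can only move the global score by at most 30 points.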
@@ -45,7 +52,8 @@ npx metrillm@latest bench
 
 ## Install
 
-> Requires [Node 20+](https://nodejs.org/) and
+> Requires [Node 20+](https://nodejs.org/) and a local runtime:
+> [Ollama](https://ollama.com/) or [LM Studio](https://lmstudio.ai/).
 
 ```bash
 # Run directly (no install)
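The Node 20+ requirement in this hunk can be checked up front. A minimal preflight sketch, assuming `node -v` prints the usual `vMAJOR.MINOR.PATCH` form:

```shell
# Preflight check for the Node 20+ requirement.
NODE_MAJOR=0
if command -v node >/dev/null 2>&1; then
  # node -v prints e.g. "v20.11.1"; keep only the major component
  NODE_MAJOR="$(node -v | sed -e 's/^v//' -e 's/\..*//')"
fi
if [ "$NODE_MAJOR" -ge 20 ]; then
  echo "Node ${NODE_MAJOR}: OK for metrillm"
else
  echo "Node 20+ required (found major version: ${NODE_MAJOR})"
fi
```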
@@ -55,6 +63,16 @@ npx metrillm@latest bench
 npm i -g metrillm
 metrillm bench
 
+# Homebrew (no global npm install)
+# One-liner install (without pre-tapping):
+brew install MetriLLM/metrillm/metrillm
+
+# Or tap once for the short install command:
+brew tap MetriLLM/metrillm
+# Then:
+brew install metrillm
+metrillm bench
+
 # Alternative package managers
 pnpm dlx metrillm@latest bench
 bunx metrillm@latest bench
@@ -69,6 +87,9 @@ metrillm bench
 # Benchmark a specific model
 metrillm bench --model gemma3:4b
 
+# Benchmark with the LM Studio backend
+metrillm bench --backend lm-studio --model qwen3-8b
+
 # Benchmark all installed models
 metrillm bench --all
 
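Before picking a `--backend` flag, you can probe which local runtime is actually listening. A sketch using the default endpoints documented in the Runtime Backends table later in this diff (Ollama on port 11434, LM Studio on 1234); both probes tolerate the service being down:

```shell
# Probe the default backend endpoints from the Runtime Backends table.
PROBED=0
REACHABLE=""
for url in "http://127.0.0.1:11434" "http://127.0.0.1:1234"; do
  PROBED=$((PROBED + 1))
  if curl -sf --max-time 2 "$url" >/dev/null 2>&1; then
    REACHABLE="$REACHABLE $url"
    echo "reachable: $url"
  else
    echo "not running: $url"
  fi
done
echo "reachable backends:${REACHABLE:- none}"
```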
@@ -86,38 +107,138 @@ metrillm bench --export json
 metrillm bench --export csv
 ```
 
+## Upload Configuration (CLI + MCP)
+
+By default, production builds upload shared results to the official MetriLLM leaderboard (`https://metrillm.dev`).
+
+- No CI secret injection is required for standard releases.
+- Local/dev runs use the same default behavior.
+- Self-hosted or staging deployments can override endpoints with:
+  - `METRILLM_SUPABASE_URL`
+  - `METRILLM_SUPABASE_ANON_KEY`
+  - `METRILLM_PUBLIC_RESULT_BASE_URL`
+
+If these variables are set to placeholder values (from templates), MetriLLM falls back to the official defaults.
+
+## Runtime Backends
+
+| Backend | Flag | Default URL | Required env |
+|---|---|---|---|
+| Ollama | `--backend ollama` | `http://127.0.0.1:11434` | `OLLAMA_HOST` (optional) |
+| LM Studio | `--backend lm-studio` | `http://127.0.0.1:1234` | `LM_STUDIO_BASE_URL` (optional), `LM_STUDIO_API_KEY` (optional), `LM_STUDIO_STREAM_STALL_TIMEOUT_MS` (optional) |
+
+For very large models, tune the timeout flags:
+- `--perf-warmup-timeout-ms` (default `300000`)
+- `--perf-prompt-timeout-ms` (default `120000`)
+- `--quality-timeout-ms` (default `120000`)
+- `--coding-timeout-ms` (default `240000`)
+- `--lm-studio-stream-stall-timeout-ms` (default `180000`; `0` disables the stall timeout)
+
 ## How Scoring Works
 
 **Hardware Fit Score** (0-100) — how well the model runs on your machine:
-- Speed:
-- TTFT:
+- Speed: 50% (tokens/sec relative to your hardware tier)
+- TTFT: 20% (time to first token)
 - Memory: 30% (RAM efficiency)
 
 **Quality Score** (0-100) — how well the model answers:
 - Reasoning: 20pts | Coding: 20pts | Instruction Following: 20pts
 - Structured Output: 15pts | Math: 15pts | Multilingual: 10pts
 
-**Global Score** =
+**Global Score** = 30% Hardware Fit + 70% Quality
 
 Hardware is auto-detected and scoring adapts to your tier (Entry/Balanced/High-End). A model hitting 10 tok/s on an 8GB machine scores differently than on a 64GB rig.
 
 [Full methodology →](https://metrillm.dev/methodology)
 
-##
+## Share Your Results
 
-Every benchmark you share enriches the public leaderboard. No account needed
+Every benchmark you share enriches the [public leaderboard](https://metrillm.dev). No account needed — pick the method that fits your workflow:
+
+| Method | Command / Action | Best for |
+|--------|-----------------|----------|
+| **CLI** | `metrillm bench --share` | Terminal users |
+| **MCP** | Call the `share_result` tool | AI coding assistants |
+| **Plugin** | `/benchmark` skill with the share option | Claude Code / Cursor |
+
+All methods produce the same result:
+- A **public URL** for your benchmark
+- Your **rank**: "Top X% globally, Top Y% on [your CPU]"
+- A **share card** for social media
+- A **challenge link** to send to friends
+
+[Compare your results on the leaderboard →](https://metrillm.dev)
+
+## MCP Server
+
+Use MetriLLM from Claude Code, Cursor, Windsurf, or any MCP client — no CLI needed.
 
 ```bash
-
+# Claude Code
+claude mcp add metrillm -- npx metrillm-mcp@latest
+
+# Claude Desktop / Cursor / Windsurf — add to MCP config:
+# { "command": "npx", "args": ["metrillm-mcp@latest"] }
 ```
 
-
-
-
-
-
+| Tool | Description |
+|------|-------------|
+| `list_models` | List locally available LLM models |
+| `run_benchmark` | Run a full benchmark (performance + quality) on a model |
+| `get_results` | Retrieve previous benchmark results |
+| `share_result` | Upload a result to the public leaderboard |
 
-[
+[Full MCP documentation →](mcp/README.md)
+
+## Skills
+
+Slash commands that work inside AI coding assistants — no server needed, just a Markdown file.
+
+| Skill | Trigger | Description |
+|-------|---------|-------------|
+| `/benchmark` | User-invoked | Run a full benchmark interactively |
+| `metrillm-guide` | Auto-invoked | Contextual guidance on model selection and results |
+
+Skills are included in the [plugins](#plugins) below, or can be installed standalone:
+
+```bash
+# Claude Code
+cp -r plugins/claude-code/skills/* ~/.claude/skills/
+
+# Cursor
+cp -r plugins/cursor/skills/* ~/.cursor/skills/
+```
+
+## Plugins
+
+Pre-built bundles (MCP + skills + agents) for deeper IDE integration.
+
+| Component | Description |
+|-----------|-------------|
+| MCP config | Auto-connects to the `metrillm-mcp` server |
+| Skills | `/benchmark` + `metrillm-guide` |
+| Agent | `benchmark-advisor` — analyzes your hardware and recommends models |
+
+**Install:**
+```bash
+# Claude Code
+cp -r plugins/claude-code/.claude/* ~/.claude/
+
+# Cursor
+cp -r plugins/cursor/.cursor/* ~/.cursor/
+```
+
+See the [Claude Code plugin](plugins/claude-code/README.md) and [Cursor plugin](plugins/cursor/README.md) for details.
+
+## Integrations
+
+| Integration | Package | Status | Docs |
+|-------------|---------|--------|------|
+| CLI | [`metrillm`](https://www.npmjs.com/package/metrillm) | Stable | [Usage](#usage) |
+| MCP Server | [`metrillm-mcp`](https://www.npmjs.com/package/metrillm-mcp) | Stable | [MCP docs](mcp/README.md) |
+| Skills | — | Stable | [Skills](#skills) |
+| Claude Code plugin | — | Stable | [Plugin docs](plugins/claude-code/README.md) |
+| Cursor plugin | — | Stable | [Plugin docs](plugins/cursor/README.md) |
 
 ## Development
 
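For self-hosted deployments, the override variables listed in the Upload Configuration section of this hunk could be wired up as follows. The hostnames and key are placeholders, not real MetriLLM endpoints:

```shell
# Point shared-result uploads at a self-hosted deployment.
# All three values below are placeholders for your own infrastructure.
export METRILLM_SUPABASE_URL="https://supabase.example.internal"
export METRILLM_SUPABASE_ANON_KEY="example-anon-key"
export METRILLM_PUBLIC_RESULT_BASE_URL="https://results.example.internal"

# Then benchmark and share as usual:
# npx metrillm@latest bench --share
echo "upload endpoint: $METRILLM_SUPABASE_URL"
```

Per the hunk above, leaving these unset (or set to template placeholder values) falls back to the official leaderboard.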
@@ -128,6 +249,26 @@ npm run dev # run from source
 npm run test:watch # vitest watch mode
 ```
 
+## Homebrew Formula Maintenance
+
+The tap formula lives in `Formula/metrillm.rb`.
+
+```bash
+# Refresh Formula/metrillm.rb with the latest npm tarball + sha256
+./scripts/update-homebrew-formula.sh
+
+# Or pin a specific version
+./scripts/update-homebrew-formula.sh 0.2.0
+```
+
+After updating the formula, commit and push so users can install/update with:
+
+```bash
+brew tap MetriLLM/metrillm
+brew install metrillm
+brew upgrade metrillm
+```
+
 ## Contributing
 
 Contributions are welcome! Please read the [Contributing Guide](CONTRIBUTING.md) before submitting a pull request. All commits must include a DCO sign-off.