metrillm 0.1.0 → 0.2.0

Files changed (3)
  1. package/README.md +155 -14
  2. package/dist/index.mjs +30683 -28797
  3. package/package.json +5 -3
package/README.md CHANGED
@@ -16,11 +16,18 @@
 npx metrillm@latest bench
 ```
 
+<p align="center">
+<img src="docs/images/cli1.png" width="48%" alt="MetriLLM CLI — interactive menu" />
+<img src="docs/images/cli2.png" width="48%" alt="MetriLLM CLI — hardware detection" />
+</p>
+
+[![MetriLLM Leaderboard](docs/images/leaderboard.png)](https://metrillm.dev)
+
 ## What You Get
 
 - **Performance metrics**: tokens/sec, time to first token, memory usage, load time
 - **Quality evaluation**: reasoning, coding, math, instruction following, structured output, multilingual (14 prompts, 6 categories)
-- **Global score** (0-100): 40% hardware fit + 60% quality
+- **Global score** (0-100): 30% hardware fit + 70% quality
 - **Verdict**: EXCELLENT / GOOD / MARGINAL / NOT RECOMMENDED
 - **One-click share**: `--share` uploads your result and gives you a public URL + leaderboard rank
 
@@ -45,7 +52,8 @@ npx metrillm@latest bench
 
 ## Install
 
-> Requires [Node 20+](https://nodejs.org/) and [Ollama](https://ollama.com/) running.
+> Requires [Node 20+](https://nodejs.org/) and a local runtime:
+> [Ollama](https://ollama.com/) or [LM Studio](https://lmstudio.ai/).
 
 ```bash
 # Run directly (no install)
@@ -55,6 +63,16 @@ npx metrillm@latest bench
 npm i -g metrillm
 metrillm bench
 
+# Homebrew (no global npm install)
+# One-liner install (without pre-tapping):
+brew install MetriLLM/metrillm/metrillm
+
+# Or tap once for the shorter install command:
+brew tap MetriLLM/metrillm
+# Then:
+brew install metrillm
+metrillm bench
+
 # Alternative package managers
 pnpm dlx metrillm@latest bench
 bunx metrillm@latest bench
@@ -69,6 +87,9 @@ metrillm bench
 # Benchmark a specific model
 metrillm bench --model gemma3:4b
 
+# Benchmark with the LM Studio backend
+metrillm bench --backend lm-studio --model qwen3-8b
+
 # Benchmark all installed models
 metrillm bench --all
 
@@ -86,38 +107,138 @@ metrillm bench --export json
 metrillm bench --export csv
 ```
 
+## Upload Configuration (CLI + MCP)
+
+By default, production builds upload shared results to the official MetriLLM leaderboard (`https://metrillm.dev`).
+
+- No CI secret injection is required for standard releases.
+- Local/dev runs use the same default behavior.
+- Self-hosted or staging deployments can override endpoints with:
+  - `METRILLM_SUPABASE_URL`
+  - `METRILLM_SUPABASE_ANON_KEY`
+  - `METRILLM_PUBLIC_RESULT_BASE_URL`
+
+If these variables are set to placeholder values (from templates), MetriLLM falls back to official defaults.
+
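The placeholder-fallback behavior described above can be sketched as follows. This is an illustrative sketch only: the function name, default mapping, and placeholder markers are assumptions, not MetriLLM's actual implementation.

```python
# Illustrative sketch: endpoint env vars that still hold template placeholder
# values are treated as unset, so the official defaults apply.
# OFFICIAL_DEFAULTS and PLACEHOLDER_MARKERS are hypothetical.
OFFICIAL_DEFAULTS = {
    "METRILLM_PUBLIC_RESULT_BASE_URL": "https://metrillm.dev",
}

PLACEHOLDER_MARKERS = ("your-", "changeme", "xxxx")

def resolve(name: str, env: dict) -> str:
    """Return the override from `env`, or the official default for empty/placeholder values."""
    value = env.get(name, "").strip()
    if not value or any(m in value.lower() for m in PLACEHOLDER_MARKERS):
        return OFFICIAL_DEFAULTS.get(name, "")
    return value
```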
+## Runtime Backends
+
+| Backend | Flag | Default URL | Env vars |
+|---|---|---|---|
+| Ollama | `--backend ollama` | `http://127.0.0.1:11434` | `OLLAMA_HOST` (optional) |
+| LM Studio | `--backend lm-studio` | `http://127.0.0.1:1234` | `LM_STUDIO_BASE_URL` (optional), `LM_STUDIO_API_KEY` (optional), `LM_STUDIO_STREAM_STALL_TIMEOUT_MS` (optional) |
+
+For very large models, tune timeout flags:
+- `--perf-warmup-timeout-ms` (default `300000`)
+- `--perf-prompt-timeout-ms` (default `120000`)
+- `--quality-timeout-ms` (default `120000`)
+- `--coding-timeout-ms` (default `240000`)
+- `--lm-studio-stream-stall-timeout-ms` (default `180000`; `0` disables the stall timeout)
+
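The stream-stall timeout above can be pictured as a simple watchdog: the run is considered stalled when no token has arrived within the window, and `0` disables the check. A minimal sketch, assuming the stall semantics documented for `--lm-studio-stream-stall-timeout-ms`; the function itself is hypothetical, not MetriLLM's code:

```python
# Illustrative stall check for a streaming response.
# timeout_ms=0 disables the check, matching the documented flag semantics.
def is_stalled(last_token_at_ms: int, now_ms: int, timeout_ms: int) -> bool:
    if timeout_ms == 0:  # 0 disables the stall timeout
        return False
    return now_ms - last_token_at_ms > timeout_ms
```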
 ## How Scoring Works
 
 **Hardware Fit Score** (0-100) — how well the model runs on your machine:
-- Speed: 40% (tokens/sec relative to your hardware tier)
-- TTFT: 30% (time to first token)
+- Speed: 50% (tokens/sec relative to your hardware tier)
+- TTFT: 20% (time to first token)
 - Memory: 30% (RAM efficiency)
 
 **Quality Score** (0-100) — how well the model answers:
 - Reasoning: 20pts | Coding: 20pts | Instruction Following: 20pts
 - Structured Output: 15pts | Math: 15pts | Multilingual: 10pts
 
-**Global Score** = 40% Hardware Fit + 60% Quality
+**Global Score** = 30% Hardware Fit + 70% Quality
 
 Hardware is auto-detected and scoring adapts to your tier (Entry/Balanced/High-End). A model hitting 10 tok/s on an 8GB machine scores differently than on a 64GB rig.
 
 [Full methodology &rarr;](https://metrillm.dev/methodology)
 
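The weights above combine as a plain weighted sum. A worked sketch, assuming component scores are already normalized to a 0-100 scale (MetriLLM's real scorer additionally adapts each component to your hardware tier):

```python
# Weighted combination per the documented 0.2.0 weights.
def hardware_fit(speed: float, ttft: float, memory: float) -> float:
    # Speed 50%, TTFT 20%, Memory 30% -- each component on a 0-100 scale.
    return 0.50 * speed + 0.20 * ttft + 0.30 * memory

def global_score(hw_fit: float, quality: float) -> float:
    # Global Score = 30% Hardware Fit + 70% Quality.
    return 0.30 * hw_fit + 0.70 * quality

# Example: strong quality, middling hardware fit.
hw = hardware_fit(speed=80, ttft=60, memory=70)   # 40 + 12 + 21 = 73
print(round(global_score(hw, quality=85), 1))     # 0.3*73 + 0.7*85 = 81.4
```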
-## Submit Your Result
+## Share Your Results
 
-Every benchmark you share enriches the public leaderboard. No account needed.
+Every benchmark you share enriches the [public leaderboard](https://metrillm.dev). No account needed — pick the method that fits your workflow:
+
+| Method | Command / Action | Best for |
+|--------|-----------------|----------|
+| **CLI** | `metrillm bench --share` | Terminal users |
+| **MCP** | Call the `share_result` tool | AI coding assistants |
+| **Plugin** | `/benchmark` skill with share option | Claude Code / Cursor |
+
+All methods produce the same result:
+- A **public URL** for your benchmark
+- Your **rank**: "Top X% globally, Top Y% on [your CPU]"
+- A **share card** for social media
+- A **challenge link** to send to friends
+
+[Compare your results on the leaderboard &rarr;](https://metrillm.dev)
+
+## MCP Server
+
+Use MetriLLM from Claude Code, Cursor, Windsurf, or any MCP client — no CLI needed.
 
 ```bash
-metrillm bench --share
+# Claude Code
+claude mcp add metrillm -- npx metrillm-mcp@latest
+
+# Claude Desktop / Cursor / Windsurf — add to your MCP config:
+# { "command": "npx", "args": ["metrillm-mcp@latest"] }
 ```
 
-You'll get:
-- A public URL for your result
-- Your rank: "Top X% globally, Top Y% on [your CPU]"
-- A share card for social media
-- A challenge link to send to friends
+| Tool | Description |
+|------|-------------|
+| `list_models` | List locally available LLM models |
+| `run_benchmark` | Run a full benchmark (performance + quality) on a model |
+| `get_results` | Retrieve previous benchmark results |
+| `share_result` | Upload a result to the public leaderboard |
 
-[Compare your results on the leaderboard &rarr;](https://metrillm.dev)
+[Full MCP documentation &rarr;](mcp/README.md)
+
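For clients configured via a JSON file, the one-line snippet above typically sits under an `mcpServers` key. A sketch, assuming the common MCP client configuration shape (the exact surrounding structure may differ per client):

```json
{
  "mcpServers": {
    "metrillm": {
      "command": "npx",
      "args": ["metrillm-mcp@latest"]
    }
  }
}
```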
+## Skills
+
+Slash commands that work inside AI coding assistants — no server needed, just a Markdown file.
+
+| Skill | Trigger | Description |
+|-------|---------|-------------|
+| `/benchmark` | User-invoked | Run a full benchmark interactively |
+| `metrillm-guide` | Auto-invoked | Contextual guidance on model selection and results |
+
+Skills are included in the [plugins](#plugins) below, or can be installed standalone:
+
+```bash
+# Claude Code
+cp -r plugins/claude-code/skills/* ~/.claude/skills/
+
+# Cursor
+cp -r plugins/cursor/skills/* ~/.cursor/skills/
+```
+
+## Plugins
+
+Pre-built bundles (MCP + skills + agents) for deeper IDE integration.
+
+| Component | Description |
+|-----------|-------------|
+| MCP config | Auto-connects to the `metrillm-mcp` server |
+| Skills | `/benchmark` + `metrillm-guide` |
+| Agent | `benchmark-advisor` — analyzes your hardware and recommends models |
+
+**Install:**
+```bash
+# Claude Code
+cp -r plugins/claude-code/.claude/* ~/.claude/
+
+# Cursor
+cp -r plugins/cursor/.cursor/* ~/.cursor/
+```
+
+See the [Claude Code plugin](plugins/claude-code/README.md) and [Cursor plugin](plugins/cursor/README.md) for details.
+
+## Integrations
+
+| Integration | Package | Status | Docs |
+|-------------|---------|--------|------|
+| CLI | [`metrillm`](https://www.npmjs.com/package/metrillm) | Stable | [Usage](#usage) |
+| MCP Server | [`metrillm-mcp`](https://www.npmjs.com/package/metrillm-mcp) | Stable | [MCP docs](mcp/README.md) |
+| Skills | — | Stable | [Skills](#skills) |
+| Claude Code plugin | — | Stable | [Plugin docs](plugins/claude-code/README.md) |
+| Cursor plugin | — | Stable | [Plugin docs](plugins/cursor/README.md) |
 
 ## Development
 
@@ -128,6 +249,26 @@ npm run dev # run from source
 npm run test:watch # vitest watch mode
 ```
 
+## Homebrew Formula Maintenance
+
+The tap formula lives in `Formula/metrillm.rb`.
+
+```bash
+# Refresh Formula/metrillm.rb with the latest npm tarball + sha256
+./scripts/update-homebrew-formula.sh
+
+# Or pin a specific version
+./scripts/update-homebrew-formula.sh 0.2.0
+```
+
+After updating the formula, commit and push so users can install or update with:
+
+```bash
+brew tap MetriLLM/metrillm
+brew install metrillm
+brew upgrade metrillm
+```
+
 ## Contributing
 
 Contributions are welcome! Please read the [Contributing Guide](CONTRIBUTING.md) before submitting a pull request. All commits must include a DCO sign-off.