metrillm-mcp 0.1.0

package/README.md ADDED
# MetriLLM MCP Server

[![npm version](https://img.shields.io/npm/v/metrillm-mcp)](https://www.npmjs.com/package/metrillm-mcp)

[MCP](https://modelcontextprotocol.io) (Model Context Protocol) server for [MetriLLM](https://github.com/MetriLLM/metrillm) — benchmark local LLMs directly from Claude Code, Cursor, Windsurf, Continue.dev, or any MCP-compatible client.

## Quick Start

### Claude Code

```bash
claude mcp add metrillm -- npx metrillm-mcp@latest
```

### Claude Desktop

Add to `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "metrillm": {
      "command": "npx",
      "args": ["metrillm-mcp@latest"]
    }
  }
}
```

### Cursor / Windsurf / Continue.dev

Add to your editor's MCP configuration:

```json
{
  "mcpServers": {
    "metrillm": {
      "command": "npx",
      "args": ["metrillm-mcp@latest"]
    }
  }
}
```

## Prerequisites

- Node.js >= 20
- [Ollama](https://ollama.com) installed and running (`ollama serve`)
- At least one model available (`ollama pull llama3.2:3b`)

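A quick sanity check from the shell can confirm these prerequisites (this assumes `node` and `ollama` are on your `PATH` and Ollama is listening on its default port, 11434):

```bash
# Check that the Node.js major version is >= 20 (parses output like "v20.11.1")
ver=$(node --version)
major=${ver#v}
major=${major%%.*}
if [ "$major" -ge 20 ]; then echo "Node $ver OK"; else echo "Node $ver is too old (need >= 20)"; fi

# Confirm Ollama is running and has at least one model pulled
ollama list
curl -s http://localhost:11434/api/tags
```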
## Available Tools

### `list_models`

List all locally available LLM models.

| Param | Type | Default | Description |
|---|---|---|---|
| `runtime` | `"ollama"` | `"ollama"` | Inference runtime |

**Example response:**

```json
{
  "models": [
    { "name": "llama3.2:3b", "size": 2019393189, "parameterSize": "3.2B", "quantization": "Q4_K_M", "family": "llama" }
  ],
  "count": 1
}
```
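The `size` field is a byte count. A small helper (illustrative only, not part of the package) can render it human-readable:

```typescript
// Convert a byte count from a `list_models` response into a GiB string.
function formatSize(bytes: number): string {
  const gib = bytes / 1024 ** 3;
  return `${gib.toFixed(2)} GiB`;
}

console.log(formatSize(2019393189)); // the llama3.2:3b size above → "1.88 GiB"
```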
### `run_benchmark`

Run a full benchmark (performance + quality) on a local model.

| Param | Type | Default | Description |
|---|---|---|---|
| `model` | `string` | *(required)* | Model name (e.g. `"llama3.2:3b"`) |
| `runtime` | `"ollama"` | `"ollama"` | Inference runtime |
| `perfOnly` | `boolean` | `false` | If `true`, measure performance only (skip quality) |

**Example response:**

```json
{
  "success": true,
  "model": "llama3.2:3b",
  "verdict": "GOOD",
  "globalScore": 65,
  "performance": {
    "tokensPerSecond": 42.5,
    "ttftMs": 120,
    "memoryUsedGB": 2.1,
    "memoryPercent": 13
  },
  "interpretation": "This model runs well on your hardware."
}
```
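A client might condense a response like this into a one-line summary. A sketch (field names are taken from the example above; the `BenchmarkResult` type itself is an assumption, not an exported type of the package):

```typescript
// Shape of the fields used below, mirroring the example response.
interface BenchmarkResult {
  model: string;
  verdict: string;
  globalScore: number;
  performance: { tokensPerSecond: number; ttftMs: number };
}

function summarize(r: BenchmarkResult): string {
  return `${r.model}: ${r.verdict} (score ${r.globalScore}, ` +
    `${r.performance.tokensPerSecond} tok/s, TTFT ${r.performance.ttftMs} ms)`;
}

const example: BenchmarkResult = {
  model: "llama3.2:3b",
  verdict: "GOOD",
  globalScore: 65,
  performance: { tokensPerSecond: 42.5, ttftMs: 120 },
};
console.log(summarize(example));
// → llama3.2:3b: GOOD (score 65, 42.5 tok/s, TTFT 120 ms)
```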
### `get_results`

Retrieve previous benchmark results stored locally.

| Param | Type | Default | Description |
|---|---|---|---|
| `model` | `string` | *(optional)* | Filter by model name (substring match) |
| `runtime` | `"ollama"` | `"ollama"` | Inference runtime |

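The substring match behaves like a simple filter over stored results. An illustrative sketch (the stored-result shape here is an assumption):

```typescript
interface StoredResult { model: string; globalScore: number; }

// Keep results whose model name contains the query substring;
// with no query, return everything (the optional-parameter case).
function filterByModel(results: StoredResult[], query?: string): StoredResult[] {
  if (!query) return results;
  return results.filter((r) => r.model.includes(query));
}

const stored: StoredResult[] = [
  { model: "llama3.2:3b", globalScore: 65 },
  { model: "qwen2.5:7b", globalScore: 58 },
];
console.log(filterByModel(stored, "llama").map((r) => r.model));
// → [ 'llama3.2:3b' ]
```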
### `share_result`

Upload a result to the public MetriLLM leaderboard.

| Param | Type | Description |
|---|---|---|
| `resultFile` | `string` | Absolute path to the result JSON file |

**Required environment variables:**

- `METRILLM_SUPABASE_URL`
- `METRILLM_SUPABASE_ANON_KEY`
- `METRILLM_PUBLIC_RESULT_BASE_URL`

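With Claude Desktop, these can be supplied through the server's `env` block in `claude_desktop_config.json` (all values below are placeholders to substitute with your own Supabase project settings):

```json
{
  "mcpServers": {
    "metrillm": {
      "command": "npx",
      "args": ["metrillm-mcp@latest"],
      "env": {
        "METRILLM_SUPABASE_URL": "https://your-project.supabase.co",
        "METRILLM_SUPABASE_ANON_KEY": "your-anon-key",
        "METRILLM_PUBLIC_RESULT_BASE_URL": "https://example.com/results"
      }
    }
  }
}
```

For CLI clients, exporting the same variables in the shell that launches the server works as well.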
## Architecture

The MCP server is a thin wrapper around the existing MetriLLM CLI logic:

```
mcp/src/index.ts   → MCP entry point (stdio transport)
mcp/src/tools.ts   → Tool definitions + calls to CLI modules

../src/core/       → CLI logic reused directly
../src/commands/   → CLI commands (bench, list)
```

No code duplication — the MCP server imports the CLI modules directly.
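The wrapper pattern can be sketched with the official `@modelcontextprotocol/sdk`; note that the imported CLI module path and `listModels` function below are assumptions based on the layout above, not the package's actual exports:

```typescript
// Sketch of mcp/src/index.ts: a stdio MCP server delegating to CLI logic.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
// Hypothetical re-use of a CLI module, per the directory layout above:
import { listModels } from "../../src/core/ollama.js";

const server = new McpServer({ name: "metrillm", version: "0.1.0" });

server.tool(
  "list_models",
  { runtime: z.enum(["ollama"]).default("ollama") },
  async ({ runtime }) => {
    const models = await listModels(runtime); // CLI logic, no duplication
    return {
      content: [
        { type: "text" as const, text: JSON.stringify({ models, count: models.length }) },
      ],
    };
  },
);

await server.connect(new StdioServerTransport());
```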
## Supported Runtimes

| Runtime | Status |
|---|---|
| Ollama | Supported |
| LM Studio | Planned |
| MLX | Planned |
| llama.cpp | Planned |
| vLLM | Planned |

The `runtime` parameter is present on every tool to prepare for multi-runtime support. Unimplemented runtimes return a clear error.

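The "clear error" behavior amounts to a simple guard; a sketch (illustrative only, the package's actual function name and message wording may differ):

```typescript
const SUPPORTED_RUNTIMES = new Set(["ollama"]);

// Throw a descriptive error for runtimes that are planned but not yet implemented.
function assertRuntimeSupported(runtime: string): void {
  if (!SUPPORTED_RUNTIMES.has(runtime)) {
    throw new Error(
      `Runtime "${runtime}" is not implemented yet. Supported: ${[...SUPPORTED_RUNTIMES].join(", ")}`,
    );
  }
}

assertRuntimeSupported("ollama"); // passes silently
try {
  assertRuntimeSupported("vllm"); // throws: not implemented yet
} catch (e) {
  console.error((e as Error).message);
}
```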
## License

[Apache License 2.0](../LICENSE)