@pipelinescore/mcp 0.1.0
- package/LICENSE +201 -0
- package/README.md +121 -0
- package/dist/index.js +279 -0
- package/package.json +62 -0
package/LICENSE
ADDED
@@ -0,0 +1,201 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

1. Definitions.

"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.

"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.

"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.

"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.

"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.

"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.

"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).

"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.

"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."

"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.

2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.

3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.

4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:

(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and

(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and

(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and

(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.

You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.

5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.

6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for describing the origin of the Work and
reproducing the content of the NOTICE file.

7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.

8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.

9. Accepting Warranty or Support. While redistributing the Work or
Derivative Works thereof, You may choose to offer, and charge a
fee for, acceptance of support, warranty, indemnity, or other
liability obligations and/or rights consistent with this License.
However, in accepting such obligations, You may act only on Your
own behalf and on Your sole responsibility, not on behalf of any
other Contributor, and only if You agree to indemnify, defend,
and hold each Contributor harmless for any liability incurred by,
or claims asserted against, such Contributor by reason of your
accepting any such warranty or support.

END OF TERMS AND CONDITIONS

APPENDIX: How to apply the Apache License to your work.

To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.

Copyright 2026 Drew Mattie

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied. See the License for the specific language governing
permissions and limitations under the License.
package/README.md
ADDED
@@ -0,0 +1,121 @@
# @pipelinescore/mcp

**MCP server for the PipelineScore LLM benchmark.** Lets any MCP-compatible AI client (Claude Code, Codex, Cursor, Continue, Cline) run benchmarks on your local hardware and read the public leaderboard.

---

## Tools

| Tool | Description |
|---|---|
| `run_benchmark` | Runs `@pipelinescore/cli` against an LLM (local or frontier API) and publishes the result to the public leaderboard |
| `get_user_leaderboard` | Reads the sortable, filterable user leaderboard (filter by provider, tier, nickname, hardware, or lab-verified status) |
| `get_user_profile` | Reads a specific user's dashboard (submission count, best and average score, provider mix, best score per model tried) |
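The `get_user_leaderboard` filters above become query parameters on `GET /v1/leaderboard/users`. The sketch below reproduces the mapping used in `dist/index.js` (`lab_verified` sent as `1`, a default `limit` of 50). The helper name `buildLeaderboardQuery` is hypothetical, and clamping `limit` to 1..500 is an assumption based on the "max 500" note in the tool schema; the server code forwards the value unchanged.

```javascript
// Sketch (hypothetical helper): build the query string that
// get_user_leaderboard sends to `${BACKEND}/v1/leaderboard/users`.
// Parameter names follow dist/index.js.
const FILTER_KEYS = ['provider', 'tier', 'user', 'hardware'];

function buildLeaderboardQuery(args = {}) {
  const params = new URLSearchParams();
  for (const key of FILTER_KEYS) {
    if (args[key]) params.set(key, args[key]);
  }
  if (args.lab_verified) params.set('lab_verified', '1');
  if (args.sort) params.set('sort', args.sort);
  if (args.dir) params.set('dir', args.dir);
  // Assumption: clamp to 1..500 (upper bound from the tool schema);
  // non-numeric input falls back to the default of 50.
  const requested = Number(args.limit ?? 50);
  const limit = Number.isFinite(requested)
    ? Math.min(500, Math.max(1, Math.trunc(requested)))
    : 50;
  params.set('limit', String(limit));
  return params.toString();
}

console.log(buildLeaderboardQuery({ hardware: 'rtx-4090-24gb', lab_verified: true }));
// → hardware=rtx-4090-24gb&lab_verified=1&limit=50
```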
## Install

```bash
npm install -g @pipelinescore/mcp
# or run on demand without installing
npx @pipelinescore/mcp
```

The server uses the MCP stdio transport: your AI client spawns it as a subprocess, so you don't run it directly.
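Over stdio, the client writes JSON-RPC 2.0 messages to the server's stdin and reads responses from its stdout, one message per line with no embedded newlines. A minimal sketch of how a client could frame a `tools/call` request for this server is shown below. The helper name `frameToolCall` is hypothetical; a real client must first complete the `initialize` handshake, and in practice an MCP SDK client handles this framing for you.

```javascript
// Sketch (hypothetical helper): serialize one MCP `tools/call` request as a
// newline-delimited JSON-RPC 2.0 message, the framing used by stdio transport.
function frameToolCall(id, name, args) {
  const message = {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name, arguments: args },
  };
  // JSON.stringify escapes newlines inside string values, so the only raw
  // newline in the output is the delimiter appended here.
  return JSON.stringify(message) + '\n';
}

process.stdout.write(frameToolCall(1, 'get_user_profile', { nickname: 'example-user' }));
```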
## Wire into Claude Code

Add to `~/.claude/settings.json`:

```json
{
  "mcpServers": {
    "pipelinescore": {
      "command": "npx",
      "args": ["@pipelinescore/mcp"]
    }
  }
}
```

Restart Claude Code. The three tools become available to your model.

## Wire into Cursor

Add to `.cursor/mcp.json` in your workspace:

```json
{
  "mcpServers": {
    "pipelinescore": {
      "command": "npx",
      "args": ["@pipelinescore/mcp"]
    }
  }
}
```
## Wire into Codex CLI

Add to `~/.codex/config.toml`:

```toml
[mcp_servers.pipelinescore]
command = "npx"
args = ["@pipelinescore/mcp"]
```
## Wire into other clients

Any MCP client that supports stdio servers can use it with the same command: `npx @pipelinescore/mcp`.

## Environment variables

| Var | Default | Purpose |
|---|---|---|
| `PIPELINESCORE_BACKEND` | `https://api.pipelinescore.ai` | API endpoint. Override for self-hosted instances or local development (e.g. `http://localhost:4601`). |
| `ANTHROPIC_API_KEY` | (none) | Forwarded to the CLI for `provider=anthropic` runs. Never sent to the PipelineScore backend. |
| `OPENAI_API_KEY` | (none) | Forwarded to the CLI for `provider=openai` runs. Never sent to the PipelineScore backend. |
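`dist/index.js` reads `PIPELINESCORE_BACKEND` once at startup (falling back to `https://api.pipelinescore.ai` via `??`) and appends fixed paths to it: `/v1/leaderboard/users` and `/v1/users/<nickname>`. The hypothetical helpers below sketch a slightly more defensive variant that is not what the server does today: an empty value is treated as unset, and trailing slashes are stripped so `http://localhost:4601/` does not produce a double slash.

```javascript
// Sketch (hypothetical helpers): resolve the backend base URL and build the
// two endpoints the MCP server calls. The empty-value and trailing-slash
// handling are additions, not behavior of dist/index.js.
const DEFAULT_BACKEND = 'https://api.pipelinescore.ai';

function resolveBackend(env = {}) {
  const raw = (env.PIPELINESCORE_BACKEND ?? '').trim();
  // Empty string counts as "not set"; drop any trailing slashes.
  return (raw || DEFAULT_BACKEND).replace(/\/+$/, '');
}

function leaderboardUrl(backend) {
  return `${backend}/v1/leaderboard/users`;
}

function profileUrl(backend, nickname) {
  // The nickname is URL-encoded, as get_user_profile does.
  return `${backend}/v1/users/${encodeURIComponent(nickname)}`;
}

const backend = resolveBackend({ PIPELINESCORE_BACKEND: 'http://localhost:4601/' });
console.log(profileUrl(backend, 'example-user'));
// → http://localhost:4601/v1/users/example-user
```

In the server itself you would pass `process.env` to `resolveBackend`.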
## How `run_benchmark` works

Internally, the MCP server spawns `npx @pipelinescore/cli run ...` with the tool arguments forwarded as CLI flags. The CLI handles the provider calls, scoring, and submission; the MCP server is only a thin protocol adapter.

This means:

- Local runs need **no API key**: the CLI talks directly to your local model server
- Frontier runs use your `ANTHROPIC_API_KEY` / `OPENAI_API_KEY` environment variables (or the `api_key` tool argument)
- Your key never reaches the PipelineScore backend, because the CLI calls the provider directly
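The adapter step can be sketched as a pure function that turns `run_benchmark` arguments into the argv and environment for the `npx @pipelinescore/cli run` child process. The flag names (`--provider`, `--model`, `--backend`, `--user`, `--config-tag`, `--hardware-tag`, `--endpoint`) and the validation messages are the ones `dist/index.js` uses; `buildCliInvocation` itself is a hypothetical helper. It passes an explicit `api_key` only through the environment variable matching the provider, assuming (as this README states) that the CLI reads `ANTHROPIC_API_KEY` / `OPENAI_API_KEY`. Keeping secrets out of argv avoids exposing them in process listings.

```javascript
// Sketch (hypothetical helper): map run_benchmark tool arguments to the argv
// and env for `npx @pipelinescore/cli run`.
const KEY_ENV_BY_PROVIDER = {
  anthropic: 'ANTHROPIC_API_KEY',
  openai: 'OPENAI_API_KEY',
};

// [tool argument, CLI flag], in the order dist/index.js appends them.
const OPTIONAL_FLAGS = [
  ['user', '--user'],
  ['config_tag', '--config-tag'],
  ['hardware_tag', '--hardware-tag'],
  ['endpoint', '--endpoint'],
];

function buildCliInvocation(args, backend, baseEnv = {}) {
  if (!args.provider || !args.model) {
    throw new Error('provider and model are required');
  }
  if (args.provider === 'local' && !args.endpoint) {
    throw new Error('provider=local requires an endpoint URL');
  }
  const argv = ['@pipelinescore/cli', 'run',
    '--provider', args.provider, '--model', args.model, '--backend', backend];
  for (const [field, flag] of OPTIONAL_FLAGS) {
    if (args[field]) argv.push(flag, args[field]);
  }
  // Copy the parent environment; an explicit key goes only into the variable
  // for the selected provider, never onto the command line.
  const env = { ...baseEnv };
  const keyVar = KEY_ENV_BY_PROVIDER[args.provider];
  if (keyVar && args.api_key) env[keyVar] = args.api_key;
  return { argv, env };
}

const { argv } = buildCliInvocation(
  { provider: 'local', model: 'llama-3.3-70b', endpoint: 'http://localhost:11434', hardware_tag: 'm3-max-128gb' },
  'https://api.pipelinescore.ai'
);
console.log(['npx', ...argv].join(' '));
// → npx @pipelinescore/cli run --provider local --model llama-3.3-70b --backend https://api.pipelinescore.ai --hardware-tag m3-max-128gb --endpoint http://localhost:11434
```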
## Example AI prompt

Once installed, try saying to your AI:

> "Benchmark Llama 3.3 70B on my M3 Max against PipelineScore"

Your AI should:

1. Call `run_benchmark` with `provider=local`, `endpoint=http://localhost:11434`, `model=llama-3.3-70b`, `hardware_tag=m3-max-128gb`
2. Wait for the CLI to finish
3. Show you the score card and the public URL for your run
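Some `run_benchmark` rules are stated only in prose in the tool schema: the nickname format ("alphanum + . _ -, 2-40 chars"), the expectation that local runs always set `hardware_tag`, and `hardware_tag="cloud-api"` for frontier runs. A client could lint the call before sending it. The sketch below uses a hypothetical `lintRunArgs` helper and interprets "alphanum" as ASCII letters and digits, which is an assumption.

```javascript
// Sketch (hypothetical helper): return warnings for run_benchmark arguments
// that break rules the JSON schema describes but does not enforce.
const NICKNAME_RE = /^[A-Za-z0-9._-]{2,40}$/; // assumes ASCII "alphanum"

function lintRunArgs(args) {
  const warnings = [];
  if (args.user !== undefined && !NICKNAME_RE.test(args.user)) {
    warnings.push(`nickname "${args.user}" must be 2-40 letters, digits, ".", "_" or "-"`);
  }
  if (args.provider === 'local' && !args.hardware_tag) {
    warnings.push('local runs should set hardware_tag (e.g. "m3-max-128gb")');
  }
  if (args.provider !== 'local' && args.hardware_tag && args.hardware_tag !== 'cloud-api') {
    warnings.push('frontier API runs are expected to use hardware_tag "cloud-api"');
  }
  return warnings;
}

console.log(lintRunArgs({
  provider: 'local',
  endpoint: 'http://localhost:11434',
  model: 'llama-3.3-70b',
  hardware_tag: 'm3-max-128gb',
}));
// → []
```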
## Why local-first?

PipelineScore's core pitch is **hardware-aware** ranking: the same model on an M3 Max, an RTX 4090, and an A100 shows up as three different rows. The `run_benchmark` tool description tells your AI to prefer `provider=local` whenever a local model server is available; see [SKILL.md](https://github.com/drewmattie-code/pipelinescore/blob/main/dist/skills/pipelinescore/SKILL.md) for the default flow.
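Because each (model, hardware) pair is its own row, reducing a list of submissions to one best score per pair is a simple group-by. The sketch below assumes the entry shape that `get_user_leaderboard` formats in `dist/index.js` (`model.display_name`, `hardware_tag`, `pipeline_score`); how the backend itself builds and ranks rows is not documented here, so treat this as an illustration of the grouping idea only.

```javascript
// Sketch: keep the best-scoring submission for each (model, hardware) pair,
// then sort the surviving rows by score, highest first.
// Entry shape assumed from the fields dist/index.js prints.
function bestPerModelAndHardware(entries) {
  const best = new Map();
  for (const e of entries) {
    const hw = e.hardware_tag ?? 'unknown';
    // JSON-encode the pair so names containing separators cannot collide.
    const key = JSON.stringify([e.model.display_name, hw]);
    const current = best.get(key);
    if (!current || e.pipeline_score > current.pipeline_score) {
      best.set(key, e);
    }
  }
  return [...best.values()].sort((a, b) => b.pipeline_score - a.pipeline_score);
}
```

Two submissions of one model on `m3-max-128gb` collapse into a single row, while the same model on `rtx-4090-24gb` stays separate.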
## License

[Apache 2.0](LICENSE). Drew Mattie, 2026.

## Links

- 🌐 [pipelinescore.ai](https://pipelinescore.ai) — public leaderboard
- 📦 [GitHub](https://github.com/drewmattie-code/pipelinescore) — source
- 🖥️ [CLI](https://www.npmjs.com/package/@pipelinescore/cli) — direct CLI usage
- 🛡️ [SECURITY.md](https://github.com/drewmattie-code/pipelinescore/blob/main/SECURITY.md) — BYOK posture + retention
package/dist/index.js
ADDED
@@ -0,0 +1,279 @@
#!/usr/bin/env node
/**
 * PipelineScore MCP Server
 *
 * Exposes three tools to any MCP-compatible client (Claude Code, Codex, Cursor, etc.):
 *   - run_benchmark        → kicks off a benchmark via the CLI
 *   - get_user_leaderboard → reads the public user leaderboard
 *   - get_user_profile     → reads a single user's dashboard
 *
 * Communicates with the PipelineScore HTTP API. Default backend is the public
 * production endpoint; override with PIPELINESCORE_BACKEND env var.
 *
 * Stdio transport: meant to be spawned by an MCP host. See the README for setup.
 */
import { Server } from '@modelcontextprotocol/sdk/server/index.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { CallToolRequestSchema, ListToolsRequestSchema } from '@modelcontextprotocol/sdk/types.js';
import { spawn } from 'node:child_process';
const BACKEND = process.env.PIPELINESCORE_BACKEND ?? 'https://api.pipelinescore.ai';
const server = new Server({
    name: 'pipelinescore-mcp',
    version: '0.1.0',
}, {
    capabilities: {
        tools: {},
    },
});
// ---- Tool registry ----------------------------------------------------------
server.setRequestHandler(ListToolsRequestSchema, async () => ({
    tools: [
        {
            name: 'run_benchmark',
            description: 'Run the PipelineScore benchmark against an LLM on the user\'s own hardware (preferred) or via a frontier API. Publishes the result to the hardware-aware public leaderboard at pipelinescore.ai. ' +
                'DEFAULT TO LOCAL: most users running this are pointing at an Ollama/LM Studio/MLX/llama.cpp server on their own machine — no API key, no inference cost. Use provider=local with the appropriate endpoint (Ollama 11434, LM Studio 1234, llama.cpp 8080, MLX-Omni 10240, LiteLLM 4000). ' +
                'For cloud benchmarks, use provider=anthropic or openai with hardware_tag="cloud-api" — the user\'s key never reaches the PipelineScore backend, it goes directly to the provider. ' +
                'ALWAYS set hardware_tag for local runs (e.g. "m3-max-128gb", "rtx-4090-24gb", "ryzen-7950x-cpu-only") — the leaderboard groups by (model, hardware), so same model on different rigs gets ranked separately. ' +
                'Set config_tag when testing a customized version (system prompt, LoRA, persona, RAG setup). ' +
                'After the run, ALWAYS post https://pipelinescore.ai/users/<nickname> so the user can see their rank. The CLI auto-opens this URL by default, but post the link in your reply too.',
            inputSchema: {
                type: 'object',
                required: ['provider', 'model'],
                properties: {
                    provider: {
                        type: 'string',
                        enum: ['local', 'anthropic', 'openai'],
                        description: 'The LLM provider. PREFER "local" for any OpenAI-compatible local endpoint (Ollama, LM Studio, MLX, llama.cpp, LiteLLM proxy) — no API key, free inference, hardware-aware ranking. Use "anthropic" or "openai" only when the user explicitly wants to benchmark a frontier API (costs provider $ — recommend a spending cap on a scoped key first).',
                    },
                    model: {
                        type: 'string',
                        description: 'The model id (e.g. "claude-opus-4-7", "gpt-5.5-2026-04", or a local model name).',
                    },
                    user: {
                        type: 'string',
                        description: 'Public leaderboard nickname (alphanum + . _ -, 2-40 chars). Persisted to ~/.config/pipelinescore/config.json after first use.',
                    },
                    config_tag: {
                        type: 'string',
                        description: 'Optional. Differentiator for this configuration vs the base model — examples: "system-prompt-coder", "lora-domain-finance", "temp-zero", "tools-enabled". Persists alongside the nickname.',
                    },
                    hardware_tag: {
                        type: 'string',
                        description: 'REQUIRED for local runs, "cloud-api" for cloud. The leaderboard groups by (model, hardware) so this is what makes hardware-vs-hardware comparison possible. Examples: "m3-max-128gb", "m2-ultra-192gb", "m4-pro-48gb", "rtx-4090-24gb", "rtx-3090-24gb", "a100-80gb", "h100-80gb", "ryzen-7950x-cpu-only", "cloud-api". Encourage the user to be specific — "m3-max" alone is OK but "m3-max-128gb" is better because RAM bands matter for inference speed.',
                    },
                    endpoint: {
                        type: 'string',
                        description: 'Required when provider=local. The OpenAI-compatible base URL.',
                    },
                    api_key: {
                        type: 'string',
                        description: 'Optional — defaults to ANTHROPIC_API_KEY / OPENAI_API_KEY env vars.',
                    },
                },
            },
        },
        {
            name: 'get_user_leaderboard',
            description: 'Read the public PipelineScore user leaderboard — every individual benchmark run, sortable and filterable. ' +
                'Use when the user wants to see the current standings, find a specific submission, or check what others have scored.',
            inputSchema: {
                type: 'object',
                properties: {
                    provider: {
                        type: 'string',
                        description: 'Filter by provider (anthropic, openai, google, alibaba, etc.).',
                    },
                    tier: {
                        type: 'string',
                        enum: ['trunk', 'mainline', 'feeder', 'tap', 'drip'],
                        description: 'Filter by tier.',
                    },
                    user: {
                        type: 'string',
                        description: 'Filter to a specific nickname.',
                    },
                    hardware: {
                        type: 'string',
                        description: 'Filter to a specific hardware tag (e.g. "m3-max-128gb", "rtx-4090-24gb").',
                    },
                    lab_verified: {
                        type: 'boolean',
                        description: 'Show only lab-verified canonical runs.',
                    },
                    sort: {
                        type: 'string',
                        enum: ['score', 'date', 'user', 'model', 'provider', 'tier'],
                        description: 'Column to sort by (default: score).',
                    },
                    dir: {
                        type: 'string',
                        enum: ['asc', 'desc'],
                        description: 'Sort direction (default: desc).',
                    },
                    limit: {
                        type: 'number',
                        description: 'Max entries to return (default 50, max 500).',
                    },
                },
            },
        },
        {
            name: 'get_user_profile',
            description: "Read a single PipelineScore user's full dashboard: best score, all submissions, models tried, provider mix, category strengths. " +
                'Use when looking up "what has X user benchmarked" or "what is X user\'s best score".',
            inputSchema: {
                type: 'object',
                required: ['nickname'],
                properties: {
                    nickname: {
                        type: 'string',
                        description: 'The user\'s leaderboard nickname.',
                    },
                },
            },
        },
    ],
}));
// ---- Tool implementations ---------------------------------------------------
server.setRequestHandler(CallToolRequestSchema, async (request) => {
    const { name, arguments: args } = request.params;
    try {
        switch (name) {
            case 'run_benchmark':
                return await runBenchmark(args ?? {});
            case 'get_user_leaderboard':
                return await getUserLeaderboard(args ?? {});
            case 'get_user_profile':
                return await getUserProfile(args ?? {});
            default:
                throw new Error(`Unknown tool: ${name}`);
        }
    }
    catch (err) {
        // Report failures as tool errors so the model can see and relay them.
        // Guard against non-Error throwables, which have no .message.
        const message = err instanceof Error ? err.message : String(err);
        return {
            isError: true,
            content: [{ type: 'text', text: `Error: ${message}` }],
        };
    }
});
async function runBenchmark(args) {
    if (!args.provider || !args.model) {
        throw new Error('provider and model are required');
    }
    if (args.provider === 'local' && !args.endpoint) {
        throw new Error('provider=local requires an endpoint URL');
    }
    const cliArgs = [
        '@pipelinescore/cli',
        'run',
        '--provider',
        args.provider,
        '--model',
        args.model,
        '--backend',
        BACKEND,
    ];
    if (args.user)
        cliArgs.push('--user', args.user);
    if (args.config_tag)
        cliArgs.push('--config-tag', args.config_tag);
    if (args.hardware_tag)
        cliArgs.push('--hardware-tag', args.hardware_tag);
    if (args.endpoint)
        cliArgs.push('--endpoint', args.endpoint);
    // Forward an explicit api_key only through the env var for the selected
    // provider. Command-line arguments are visible in process listings, and the
    // other provider's variable must not receive this key.
    const env = { ...process.env };
    if (args.api_key && args.provider === 'anthropic')
        env.ANTHROPIC_API_KEY = args.api_key;
    if (args.api_key && args.provider === 'openai')
        env.OPENAI_API_KEY = args.api_key;
    return new Promise((resolve, reject) => {
        const proc = spawn('npx', cliArgs, {
            stdio: ['ignore', 'pipe', 'pipe'],
            env,
        });
        let stdout = '';
        let stderr = '';
        proc.stdout.on('data', (d) => (stdout += d.toString()));
        proc.stderr.on('data', (d) => (stderr += d.toString()));
        proc.on('close', (code, signal) => {
            if (code === 0) {
                resolve({
                    content: [
                        {
                            type: 'text',
                            text: stdout +
                                (stderr ? `\n---\nPer-task details:\n${stderr}` : ''),
                        },
                    ],
                });
            }
            else {
                // code is null when the CLI was terminated by a signal.
                const status = code === null ? `signal ${signal}` : `code ${code}`;
                reject(new Error(`CLI exited with ${status}:\n${stderr || stdout}`));
            }
        });
        proc.on('error', (err) => reject(err));
    });
}
async function getUserLeaderboard(args) {
    const params = new URLSearchParams();
    if (args.provider)
        params.set('provider', args.provider);
    if (args.tier)
        params.set('tier', args.tier);
    if (args.user)
        params.set('user', args.user);
    if (args.hardware)
        params.set('hardware', args.hardware);
    if (args.lab_verified)
        params.set('lab_verified', '1');
    if (args.sort)
        params.set('sort', args.sort);
    if (args.dir)
        params.set('dir', args.dir);
    params.set('limit', String(args.limit ?? 50));
    const res = await fetch(`${BACKEND}/v1/leaderboard/users?${params.toString()}`);
    if (!res.ok)
        throw new Error(`Backend ${res.status}: ${res.statusText}`);
    const data = await res.json();
    const lines = [
        `Showing ${data.entries.length} of ${data.total} submissions:`,
        '',
        ...data.entries.map((e, i) => {
            const cfg = e.config_tag ? ` cfg=${e.config_tag}` : '';
            const hw = e.hardware_tag ? ` hw=${e.hardware_tag}` : '';
            return ` ${String(i + 1).padStart(3)}. ${e.user_nickname.padEnd(20)} ${e.model.display_name.padEnd(28)} ${e.pipeline_score.toFixed(1).padStart(6)} ${e.tier.toUpperCase().padEnd(10)} ${e.model.provider}${hw}${cfg}`;
        }),
    ];
    return { content: [{ type: 'text', text: lines.join('\n') }] };
}
async function getUserProfile(args) {
    if (!args.nickname)
        throw new Error('nickname is required');
    const res = await fetch(`${BACKEND}/v1/users/${encodeURIComponent(args.nickname)}`);
    if (res.status === 404) {
        return { content: [{ type: 'text', text: `User "${args.nickname}" not found.` }] };
    }
    if (!res.ok)
        throw new Error(`Backend ${res.status}: ${res.statusText}`);
    const data = await res.json();
    const lines = [
        `═══ ${data.nickname} ═══`,
        `Submissions: ${data.submission_count}`,
        `Best: ${data.best_score.toFixed(1)} (${data.best_tier.toUpperCase()}) on ${data.best_model.display_name}`,
        `Average: ${data.avg_score.toFixed(1)}`,
        `First seen: ${data.first_seen.slice(0, 10)}`,
        '',
        'Provider mix:',
        ...Object.entries(data.provider_counts).map(([p, c]) => ` ${p.padEnd(12)} ${c} run(s)`),
        '',
        'Models tried (best per):',
        ...data.models_tried.map((m, i) => ` ${String(i + 1).padStart(2)}. ${m.model.display_name.padEnd(28)} ${m.pipeline_score.toFixed(1).padStart(6)} ${m.tier.toUpperCase()}`),
    ];
    return { content: [{ type: 'text', text: lines.join('\n') }] };
}
// ---- Start ------------------------------------------------------------------
const transport = new StdioServerTransport();
await server.connect(transport);
process.stderr.write(`[pipelinescore-mcp] connected (backend: ${BACKEND})\n`);
package/package.json
ADDED
@@ -0,0 +1,62 @@
{
  "name": "@pipelinescore/mcp",
  "version": "0.1.0",
  "description": "MCP server for the PipelineScore LLM benchmark — exposes run_benchmark / get_user_leaderboard / get_user_profile tools to any MCP-compatible client (Claude Code, Codex, Cursor, Continue, Cline). Lets your AI drive local-first LLM benchmarking on your hardware.",
  "type": "module",
  "main": "./dist/index.js",
  "bin": {
    "pipelinescore-mcp": "./dist/index.js"
  },
  "files": [
    "dist",
    "README.md",
    "LICENSE"
  ],
  "scripts": {
    "build": "tsc",
    "start": "tsx src/index.ts",
    "dev": "tsx watch src/index.ts",
    "typecheck": "tsc --noEmit",
    "prepublishOnly": "npm run build"
  },
  "engines": {
    "node": ">=22"
  },
  "keywords": [
    "mcp",
    "model-context-protocol",
    "llm",
    "benchmark",
    "leaderboard",
    "local-llm",
    "ollama",
    "lm-studio",
    "ai-evaluation",
    "claude-code",
    "cursor",
    "codex"
  ],
  "author": "Drew Mattie <drew.mattie@gmail.com> (https://github.com/drewmattie-code)",
  "license": "Apache-2.0",
  "homepage": "https://pipelinescore.ai",
  "repository": {
    "type": "git",
    "url": "git+https://github.com/drewmattie-code/pipelinescore.git",
    "directory": "mcp"
  },
  "bugs": {
    "url": "https://github.com/drewmattie-code/pipelinescore/issues"
  },
  "publishConfig": {
    "access": "public"
  },
  "dependencies": {
    "@modelcontextprotocol/sdk": "^1.0.0",
    "zod": "^3.24.0"
  },
  "devDependencies": {
    "@types/node": "^25.9.1",
    "tsx": "^4.19.0",
    "typescript": "^5.6.0"
  }
}