mcpbr-cli 0.3.28 → 0.4.1
This diff reflects the publicly released content of these package versions as published to their public registries, and is provided for informational purposes only.
- package/README.md +194 -48
- package/package.json +1 -1
package/README.md (CHANGED)

@@ -1,11 +1,11 @@
 # mcpbr
 
 ```bash
-#
-
+# One-liner install (installs + runs quick test)
+curl -sSL https://raw.githubusercontent.com/greynewell/mcpbr/main/install.sh | bash
 
-# Or
-
+# Or install and run manually
+pip install mcpbr && mcpbr run -n 1
 ```
 
 Benchmark your MCP server against real GitHub issues. One command, hard numbers.
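If you prefer not to pipe a remote script straight into bash, the same installer can be downloaded and inspected first; a minimal sketch using only the install URL shown above:

```bash
# Fetch the installer, review it, then run it (same script as the one-liner above)
curl -sSL https://raw.githubusercontent.com/greynewell/mcpbr/main/install.sh -o install.sh
less install.sh   # optional: read the script before executing it
bash install.sh
```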
@@ -19,6 +19,7 @@ Benchmark your MCP server against real GitHub issues. One command, hard numbers.
 **Model Context Protocol Benchmark Runner**
 
 [](https://pypi.org/project/mcpbr/)
+[](https://www.npmjs.com/package/mcpbr-cli)
 [](https://www.python.org/downloads/)
 [](https://github.com/greynewell/mcpbr/actions/workflows/ci.yml)
 [](https://opensource.org/licenses/MIT)
@@ -60,11 +61,15 @@ mcpbr supports multiple software engineering benchmarks through a flexible abstr
 ### SWE-bench (Default)
 Real GitHub issues requiring bug fixes and patches. The agent generates unified diffs evaluated by running pytest test suites.
 
-- **Dataset**: [SWE-bench/SWE-bench_Lite](https://huggingface.co/datasets/SWE-bench/SWE-bench_Lite)
 - **Task**: Generate patches to fix bugs
 - **Evaluation**: Test suite pass/fail
 - **Pre-built images**: Available for most tasks
 
+**Variants:**
+- **swe-bench-verified** (default) - Manually validated test cases for higher quality evaluation ([SWE-bench/SWE-bench_Verified](https://huggingface.co/datasets/SWE-bench/SWE-bench_Verified))
+- **swe-bench-lite** - 300 tasks, quick testing ([SWE-bench/SWE-bench_Lite](https://huggingface.co/datasets/SWE-bench/SWE-bench_Lite))
+- **swe-bench-full** - 2,294 tasks, complete benchmark ([SWE-bench/SWE-bench](https://huggingface.co/datasets/SWE-bench/SWE-bench))
+
 ### CyberGym
 Security vulnerabilities requiring Proof-of-Concept (PoC) exploits. The agent generates exploits that trigger crashes in vulnerable code.
 
@@ -84,16 +89,16 @@ Large-scale MCP tool use evaluation across 45+ categories. Tests agent capabilit
 - **Learn more**: [MCPToolBench++ Paper](https://arxiv.org/pdf/2508.07575) | [GitHub](https://github.com/mcp-tool-bench/MCPToolBenchPP)
 
 ```bash
-# Run SWE-bench (default)
+# Run SWE-bench Verified (default - manually validated tests)
 mcpbr run -c config.yaml
 
-# Run
-mcpbr run -c config.yaml
+# Run SWE-bench Lite (300 tasks, quick testing)
+mcpbr run -c config.yaml -b swe-bench-lite
 
-# Run
-mcpbr run -c config.yaml
+# Run SWE-bench Full (2,294 tasks, complete benchmark)
+mcpbr run -c config.yaml -b swe-bench-full
 
-# List available benchmarks
+# List all available benchmarks
 mcpbr benchmarks
 ```
 
@@ -211,6 +216,8 @@ Run `mcpbr models` to see the full list.
 
 ### via npm
 
+[](https://www.npmjs.com/package/mcpbr-cli)
+
 ```bash
 # Run with npx (no installation)
 npx mcpbr-cli run -c config.yaml
@@ -220,6 +227,8 @@ npm install -g mcpbr-cli
 mcpbr run -c config.yaml
 ```
 
+> **Package**: [`mcpbr-cli`](https://www.npmjs.com/package/mcpbr-cli) on npm
+>
 > **Note**: The npm package requires Python 3.11+ and the mcpbr Python package (`pip install mcpbr`)
 
 ### via pip
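Putting the npm route together end to end, a minimal sketch (it assumes Python 3.11+ and Node.js are already on your PATH, and uses only commands shown in this section):

```bash
# The npm package wraps the Python backend, so install both
python3 --version           # should report 3.11 or newer
pip install mcpbr           # Python backend required by the npm wrapper
npm install -g mcpbr-cli    # or skip the global install and use: npx mcpbr-cli
mcpbr run -c config.yaml
```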
@@ -271,9 +280,13 @@ See the **[Examples README](examples/README.md)** for the complete guide.
 export ANTHROPIC_API_KEY="your-api-key"
 ```
 
-2. **
+2. **Run mcpbr (config auto-created if missing):**
 
 ```bash
+# Config is auto-created on first run
+mcpbr run -n 1
+
+# Or explicitly generate a config file first
 mcpbr init
 ```
 
@@ -311,55 +324,135 @@ mcpbr run --config config.yaml
 
 [](https://claude.ai/download)
 
-mcpbr includes a built-in Claude Code plugin that makes Claude an expert at running benchmarks correctly.
+mcpbr includes a built-in Claude Code plugin that makes Claude an expert at running benchmarks correctly. The plugin provides specialized skills and knowledge about mcpbr configuration, execution, and troubleshooting.
 
-###
+### Installation Options
 
-
+You have three ways to enable the mcpbr plugin in Claude Code:
 
-
-- "Generate a config for my MCP server"
-- "Run a quick test with 1 task"
+#### Option 1: Clone Repository (Automatic Detection)
 
-Claude
-- Verify Docker is running before starting
-- Check for required API keys
-- Generate valid configurations with proper `{workdir}` placeholders
-- Use correct CLI flags and options
-- Provide helpful troubleshooting when issues occur
+When you clone this repository, Claude Code automatically detects and loads the plugin:
 
-
+```bash
+git clone https://github.com/greynewell/mcpbr.git
+cd mcpbr
 
-
+# Plugin is now active - try asking Claude:
+# "Run the SWE-bench Lite eval with 5 tasks"
+```
 
-
-- Checks prerequisites (Docker, API keys, config files)
-- Constructs valid `mcpbr run` commands
-- Handles errors gracefully with actionable feedback
+**Best for**: Contributors, developers testing changes, or users who want the latest unreleased features.
 
-2
-- Ensures `{workdir}` placeholder is included
-- Validates MCP server commands
-- Provides benchmark-specific templates
+#### Option 2: npm Global Install (Planned for v0.4.0)
 
-
-- Pre-configured for 5-task evaluation
-- Includes sensible defaults for output files
-- Perfect for testing and demonstrations
+Install the plugin globally via npm for use across any project:
 
-
+```bash
+# Planned for v0.4.0 (not yet released)
+npm install -g @mcpbr/claude-code-plugin
+```
 
-
+> **Note**: The npm package is not yet published. This installation method will be available in a future release. Track progress in [issue #265](https://github.com/greynewell/mcpbr/issues/265).
 
-
-git clone https://github.com/greynewell/mcpbr.git
-cd mcpbr
+**Best for**: Users who want plugin features available in any directory.
 
-
-
-
+#### Option 3: Claude Code Plugin Manager (Planned for v0.4.0)
+
+Install via Claude Code's built-in plugin manager:
+
+1. Open Claude Code settings
+2. Navigate to Plugins > Browse
+3. Search for "mcpbr"
+4. Click Install
+
+> **Note**: Plugin manager installation is not yet available. This installation method will be available after plugin marketplace submission. Track progress in [issue #267](https://github.com/greynewell/mcpbr/issues/267).
+
+**Best for**: Users who prefer a GUI and want automatic updates.
+
+### Installation Comparison
+
+| Method | Availability | Auto-updates | Works Anywhere | Latest Features |
+|--------|-------------|--------------|----------------|-----------------|
+| Clone Repository | Available now | Manual (git pull) | No (repo only) | Yes (unreleased) |
+| npm Global Install | Planned (not yet released) | Via npm | Yes | Yes (published) |
+| Plugin Manager | Planned (not yet released) | Automatic | Yes | Yes (published) |
+
+### What You Get
+
+The plugin includes three specialized skills that enhance Claude's ability to work with mcpbr:
+
+#### 1. run-benchmark
+Expert at running evaluations with proper validation and error handling.
+
+**Capabilities**:
+- Validates prerequisites (Docker running, API keys set, config files exist)
+- Constructs correct `mcpbr run` commands with appropriate flags
+- Handles errors gracefully with actionable troubleshooting steps
+- Monitors progress and provides meaningful status updates
+
+**Example interactions**:
+- "Run the SWE-bench Lite benchmark with 10 tasks"
+- "Evaluate my MCP server using CyberGym level 2"
+- "Test my config with a single task"
+
+#### 2. generate-config
+Generates valid mcpbr configuration files with benchmark-specific templates.
+
+**Capabilities**:
+- Ensures required `{workdir}` placeholder is included in MCP server args
+- Validates MCP server command syntax
+- Provides templates for different benchmarks (SWE-bench, CyberGym, MCPToolBench++)
+- Suggests appropriate timeouts and concurrency settings
+
+**Example interactions**:
+- "Generate a config for the filesystem MCP server"
+- "Create a config for testing my custom MCP server"
+- "Set up a CyberGym evaluation config"
+
+#### 3. swe-bench-lite
+Quick-start command for running SWE-bench Lite evaluations.
+
+**Capabilities**:
+- Pre-configured for 5-task evaluation (fast testing)
+- Includes sensible defaults for output files and logging
+- Perfect for demonstrations and initial testing
+- Automatically sets up verbose output for debugging
 
-
+**Example interactions**:
+- "Run a quick SWE-bench Lite test"
+- "Show me how mcpbr works"
+- "Test the filesystem server"
+
+### Benefits
+
+When using Claude Code with the mcpbr plugin active, Claude will automatically:
+
+- Verify Docker is running before starting evaluations
+- Check for required API keys (`ANTHROPIC_API_KEY`)
+- Generate valid configurations with proper `{workdir}` placeholders
+- Use correct CLI flags and avoid deprecated options
+- Provide contextual troubleshooting when issues occur
+- Follow mcpbr best practices for optimal results
+
+### Troubleshooting
+
+**Plugin not detected in cloned repository**:
+- Ensure you're in the repository root directory
+- Verify the `claude-code.json` file exists in the repo
+- Try restarting Claude Code
+
+**Skills not appearing**:
+- Check Claude Code version (requires v2.0+)
+- Verify plugin is listed in Settings > Plugins
+- Try running `/reload-plugins` in Claude Code
+
+**Commands failing**:
+- Ensure mcpbr is installed: `pip install mcpbr`
+- Verify Docker is running: `docker info`
+- Check API key is set: `echo $ANTHROPIC_API_KEY`
+
+For more help, see the [troubleshooting guide](https://greynewell.github.io/mcpbr/troubleshooting/) or [open an issue](https://github.com/greynewell/mcpbr/issues).
 
 ## Configuration
 
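The prerequisite checks the plugin automates (Docker running, API key set, mcpbr installed) can also be run by hand before an evaluation; a small preflight sketch built only from the commands listed under Troubleshooting above:

```bash
# Manual preflight: mirrors the checks the Claude Code plugin performs
docker info > /dev/null 2>&1        || echo "Docker is not running"
[ -n "$ANTHROPIC_API_KEY" ]         || echo "ANTHROPIC_API_KEY is not set"
command -v mcpbr > /dev/null 2>&1   || echo "mcpbr is not installed (pip install mcpbr)"
```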
@@ -509,7 +602,7 @@ Run SWE-bench evaluation with the configured MCP server.
 
 | Option | Short | Description |
 |--------|-------|-------------|
-| `--config PATH` | `-c` | Path to YAML configuration file (
+| `--config PATH` | `-c` | Path to YAML configuration file (default: `mcpbr.yaml`, auto-created if missing) |
 | `--model TEXT` | `-m` | Override model from config |
 | `--benchmark TEXT` | `-b` | Override benchmark from config (`swe-bench`, `cybergym`, or `mcptoolbench`) |
 | `--level INTEGER` | | Override CyberGym difficulty level (0-3) |
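The options above combine on a single command line; an illustrative invocation using only flags from this table (the model name is a placeholder, not a value taken from the README):

```bash
# Override the benchmark, CyberGym difficulty level, and model at run time
mcpbr run -c config.yaml -b cybergym --level 2 -m "<model-name>"
```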
@@ -536,6 +629,7 @@ Run SWE-bench evaluation with the configured MCP server.
 | `--smtp-port PORT` | | SMTP server port (default: 587) |
 | `--smtp-user USER` | | SMTP username for authentication |
 | `--smtp-password PASS` | | SMTP password for authentication |
+| `--profile` | | Enable comprehensive performance profiling (tool latency, memory, overhead) |
 | `--help` | `-h` | Show help message |
 
 </details>
@@ -650,6 +744,58 @@ mcpbr cleanup -f
 
 </details>
 
+## Performance Profiling
+
+mcpbr includes comprehensive performance profiling to understand MCP server overhead and identify optimization opportunities.
+
+### Enable Profiling
+
+```bash
+# Via CLI flag
+mcpbr run -c config.yaml --profile
+
+# Or in config.yaml
+enable_profiling: true
+```
+
+### What Gets Measured
+
+- **Tool call latencies** with percentiles (p50, p95, p99)
+- **Memory usage** (peak and average RSS/VMS)
+- **Infrastructure overhead** (Docker and MCP server startup times)
+- **Tool discovery speed** (time to first tool use)
+- **Tool switching overhead** (time between tool calls)
+- **Automated insights** from profiling data
+
+### Example Profiling Output
+
+```json
+{
+  "profiling": {
+    "task_duration_seconds": 140.5,
+    "tool_call_latencies": {
+      "Read": {"count": 15, "avg_seconds": 0.8, "p95_seconds": 1.5},
+      "Bash": {"avg_seconds": 2.3, "p95_seconds": 5.1}
+    },
+    "memory_profile": {"peak_rss_mb": 512.3, "avg_rss_mb": 387.5},
+    "docker_startup_seconds": 2.1,
+    "mcp_server_startup_seconds": 1.8
+  }
+}
+```
+
+### Automated Insights
+
+The profiler automatically identifies performance issues:
+
+```text
+- Bash is the slowest tool (avg: 2.3s, p95: 5.1s)
+- Docker startup adds 2.1s overhead per task
+- Fast tool discovery: first tool use in 8.3s
+```
+
+See [docs/profiling.md](docs/profiling.md) for complete profiling documentation.
+
 ## Example Run
 
 Here's what a typical evaluation looks like:
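Because the profiling block is plain JSON, it can be post-processed with standard tools; a hedged sketch with jq (the output file name `results.json` is an assumption, the README does not name it) that finds the slowest tool by average latency:

```bash
# Pick the tool with the highest average latency from the profiling output (assumed file name)
jq -r '.profiling.tool_call_latencies | to_entries | max_by(.value.avg_seconds)
       | "\(.key): \(.value.avg_seconds)s avg"' results.json
```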