npm - nodebench-mcp - Versions diffs - 2.0.0 → 2.1.0 - Mend

nodebench-mcp 2.0.0 → 2.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7) hide show

package/README.md +81 -1
package/dist/__tests__/toolsetGatingEval.test.d.ts +1 -0
package/dist/__tests__/toolsetGatingEval.test.js +1031 -0
package/dist/__tests__/toolsetGatingEval.test.js.map +1 -0
package/dist/index.js +94 -19
package/dist/index.js.map +1 -1
package/package.json +9 -3

package/README.md CHANGED Viewed

@@ -33,6 +33,16 @@ Every additional tool call produces a concrete artifact — an issue found, a ri
 ---
+## Who's Using It
+**Vision engineer** — Built agentic vision analysis using GPT 5.2 with Set-of-Mark (SoM) for boundary boxing, similar to Google Gemini 3 Flash's agentic code execution approach. Uses NodeBench's verification pipeline to validate detection accuracy across screenshot variants before shipping model changes.
+**QA engineer** — Transitioned a manual QA workflow website into an AI agent-driven app for a pet care messaging platform. Uses NodeBench's quality gates, verification cycles, and eval runs to ensure the AI agent handles edge cases that manual QA caught but bare AI agents miss.
+Both found different subsets of the 75 tools useful — which is why v2.1 ships with `--preset` gating to load only what you need.
+---
 ## How It Works — 3 Real Examples
 ### Example 1: Bug fix
@@ -67,8 +77,11 @@ Tasks 1-3 start with zero prior knowledge. By task 9, the agent finds 2+ relevan
 ### Install (30 seconds)
 ```bash
-# Claude Code CLI (recommended)
+# Claude Code CLI — all 75 tools
 claude mcp add nodebench -- npx -y nodebench-mcp
+# Or start lean — 30 tools, ~60% less token overhead
+claude mcp add nodebench -- npx -y nodebench-mcp --preset lite
 ```
 Or add to `~/.claude/settings.json` or `.claude.json`:
@@ -187,6 +200,73 @@ Based on Anthropic's ["Building a C Compiler with Parallel Claudes"](https://www
 ---
+## Toolset Gating (v2.1)
+75 tools means ~19K tokens of schema per API call. If you only need core methodology, gate the toolset:
+### Presets
+```bash
+# Lite — 30 tools (verification, eval, gates, learning, recon)
+claude mcp add nodebench -- npx -y nodebench-mcp --preset lite
+# Core — 50 tools (adds flywheel, bootstrap, self-eval)
+claude mcp add nodebench -- npx -y nodebench-mcp --preset core
+# Full — all 75 tools (default)
+claude mcp add nodebench -- npx -y nodebench-mcp
+```
+Or in config:
+```json
+{
+  "mcpServers": {
+    "nodebench": {
+      "command": "npx",
+      "args": ["-y", "nodebench-mcp", "--preset", "core"]
+    }
+  }
+}
+```
+### Fine-grained control
+```bash
+# Include only specific toolsets
+npx nodebench-mcp --toolsets verification,eval,recon
+# Exclude heavy optional-dep toolsets
+npx nodebench-mcp --exclude vision,ui_capture,parallel
+# See all toolsets and presets
+npx nodebench-mcp --help
+```
+### Available toolsets
+| Toolset | Tools | What it covers |
+|---|---|---|
+| verification | 8 | Cycles, gaps, triple-verify, status |
+| eval | 5 | Eval runs, results, comparison |
+| quality_gate | 4 | Gates, presets, history |
+| learning | 4 | Knowledge, search, record |
+| recon | 5 | Research, findings, framework checks |
+| flywheel | 4 | Mandatory flywheel, promote, investigate |
+| bootstrap | 4 | Project setup, agents.md, self-implement |
+| self_eval | 6 | Trajectory analysis, health reports |
+| parallel | 10 | Task locks, roles, context budget, oracle |
+| vision | 3 | Screenshot analysis, UI capture |
+| ui_capture | 3 | Playwright-based capture |
+| web | 2 | Web search, URL fetch |
+| github | 3 | Repo search, analysis |
+| docs | 3 | Documentation generation |
+| local_file | 3 | CSV, XLSX, PDF parsing |
+`findTools` and `getMethodology` are always available regardless of gating — agents can discover tools on demand.
+---
 ## Build from Source
 ```bash

package/dist/__tests__/toolsetGatingEval.test.d.ts ADDED Viewed

	@@ -0,0 +1 @@
1	+ export {};