nodebench-mcp 2.0.0 → 2.1.0

package/README.md CHANGED
@@ -33,6 +33,16 @@ Every additional tool call produces a concrete artifact — an issue found, a ri
 
  ---
 
+ ## Who's Using It
+
+ **Vision engineer** — Built agentic vision analysis using GPT 5.2 with Set-of-Mark (SoM) for bounding boxes, similar to Google Gemini 3 Flash's agentic code execution approach. Uses NodeBench's verification pipeline to validate detection accuracy across screenshot variants before shipping model changes.
+
+ **QA engineer** — Transitioned a manual QA workflow website into an AI agent-driven app for a pet care messaging platform. Uses NodeBench's quality gates, verification cycles, and eval runs to ensure the AI agent handles edge cases that manual QA caught but bare AI agents miss.
+
+ Both found different subsets of the 75 tools useful — which is why v2.1 ships with `--preset` gating to load only what you need.
+
+ ---
+
  ## How It Works — 3 Real Examples
 
  ### Example 1: Bug fix
@@ -67,8 +77,11 @@ Tasks 1-3 start with zero prior knowledge. By task 9, the agent finds 2+ relevan
  ### Install (30 seconds)
 
  ```bash
- # Claude Code CLI (recommended)
+ # Claude Code CLI — all 75 tools
  claude mcp add nodebench -- npx -y nodebench-mcp
+
+ # Or start lean — 30 tools, ~60% less token overhead
+ claude mcp add nodebench -- npx -y nodebench-mcp --preset lite
  ```
 
  Or add to `~/.claude/settings.json` or `.claude.json`:
@@ -187,6 +200,73 @@ Based on Anthropic's ["Building a C Compiler with Parallel Claudes"](https://www
 
  ---
 
+ ## Toolset Gating (v2.1)
+
+ 75 tools means ~19K tokens of schema per API call. If you only need core methodology, gate the toolset:
+
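To make the overhead figure concrete, here is a rough back-of-the-envelope estimate in shell. It assumes every tool contributes about the same schema size, which the README does not state, and it reuses the tool counts the README quotes for each preset (30, 50, 75). The variable and function names are illustrative only.

```bash
# Rough estimate of schema token overhead per preset.
# Assumption (not from the README): schema size is spread evenly across tools.

full_tools=75       # tool count for the full toolset, per the README
full_tokens=19000   # "~19K tokens" for all 75 tools, per the README

# Average schema cost per tool, using integer arithmetic.
per_tool=$(( full_tokens / full_tools ))

# estimate_tokens N: approximate schema tokens when N tools are loaded.
estimate_tokens() {
  echo $(( per_tool * $1 ))
}

# saving_pct N: percentage reduction in tools versus the full toolset.
saving_pct() {
  echo $(( 100 * (full_tools - $1) / full_tools ))
}

lite_tokens=$(estimate_tokens 30)
lite_saving=$(saving_pct 30)
core_tokens=$(estimate_tokens 50)
core_saving=$(saving_pct 50)

echo "per tool: ~${per_tool} tokens"
echo "lite (30 tools): ~${lite_tokens} tokens, ~${lite_saving}% less"
echo "core (50 tools): ~${core_tokens} tokens, ~${core_saving}% less"
```

Under the even-split assumption, the lite preset works out to roughly 60% less schema overhead, in line with the figure quoted in the install section. Real savings depend on which tools a preset drops, since schemas are unlikely to be the same size.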
+ ### Presets
+
+ ```bash
+ # Lite — 30 tools (verification, eval, gates, learning, recon)
+ claude mcp add nodebench -- npx -y nodebench-mcp --preset lite
+
+ # Core — 50 tools (adds flywheel, bootstrap, self-eval)
+ claude mcp add nodebench -- npx -y nodebench-mcp --preset core
+
+ # Full — all 75 tools (default)
+ claude mcp add nodebench -- npx -y nodebench-mcp
+ ```
+
+ Or in config:
+
+ ```json
+ {
+   "mcpServers": {
+     "nodebench": {
+       "command": "npx",
+       "args": ["-y", "nodebench-mcp", "--preset", "core"]
+     }
+   }
+ }
+ ```
+
+ ### Fine-grained control
+
+ ```bash
+ # Include only specific toolsets
+ npx nodebench-mcp --toolsets verification,eval,recon
+
+ # Exclude heavy optional-dep toolsets
+ npx nodebench-mcp --exclude vision,ui_capture,parallel
+
+ # See all toolsets and presets
+ npx nodebench-mcp --help
+ ```
+
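The fine-grained flags can presumably be combined with the Claude Code registration shown in the presets, since everything after `--` is handed to the server command. The sketch below rests on that assumption and is not a documented invocation. It only prints the command so it can be reviewed before running.

```bash
# Build a registration command that skips the heavy optional-dependency toolsets.
# Assumption: --exclude passes through `claude mcp add` the same way --preset does.
exclude="vision,ui_capture,parallel"
cmd="claude mcp add nodebench -- npx -y nodebench-mcp --exclude ${exclude}"

# Dry run: print the command for review instead of executing it.
echo "$cmd"
```

Once the printed line looks right, paste it into a terminal to register the server.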
+ ### Available toolsets
+
+ | Toolset | Tools | What it covers |
+ |---|---|---|
+ | verification | 8 | Cycles, gaps, triple-verify, status |
+ | eval | 5 | Eval runs, results, comparison |
+ | quality_gate | 4 | Gates, presets, history |
+ | learning | 4 | Knowledge, search, record |
+ | recon | 5 | Research, findings, framework checks |
+ | flywheel | 4 | Mandatory flywheel, promote, investigate |
+ | bootstrap | 4 | Project setup, agents.md, self-implement |
+ | self_eval | 6 | Trajectory analysis, health reports |
+ | parallel | 10 | Task locks, roles, context budget, oracle |
+ | vision | 3 | Screenshot analysis, UI capture |
+ | ui_capture | 3 | Playwright-based capture |
+ | web | 2 | Web search, URL fetch |
+ | github | 3 | Repo search, analysis |
+ | docs | 3 | Documentation generation |
+ | local_file | 3 | CSV, XLSX, PDF parsing |
+
+ `findTools` and `getMethodology` are always available regardless of gating — agents can discover tools on demand.
+
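To gauge how many tools a given `--toolsets` or `--exclude` list covers, the per-toolset counts in the table can be summed with a small helper. This is only a sketch: the counts are copied from the table, the function names are hypothetical, and the result leaves out the always-available `findTools` and `getMethodology`, so the number the server actually loads may differ.

```bash
# tool_count NAME: number of tools in a toolset, copied from the table.
tool_count() {
  case "$1" in
    verification) echo 8 ;;
    eval) echo 5 ;;
    quality_gate) echo 4 ;;
    learning) echo 4 ;;
    recon) echo 5 ;;
    flywheel) echo 4 ;;
    bootstrap) echo 4 ;;
    self_eval) echo 6 ;;
    parallel) echo 10 ;;
    vision|ui_capture|github|docs|local_file) echo 3 ;;
    web) echo 2 ;;
    *) echo "unknown toolset: $1" >&2; echo 0 ;;
  esac
}

# count_selection LIST: sum the tool counts for a comma-separated toolset list.
count_selection() {
  total=0
  old_ifs=$IFS
  IFS=,
  for name in $1; do
    total=$(( total + $(tool_count "$name") ))
  done
  IFS=$old_ifs
  echo "$total"
}

selected="verification,eval,recon"
selected_total=$(count_selection "$selected")
echo "--toolsets ${selected}: ${selected_total} tools"
```

Passing the same helper the list given to `--exclude` shows how many tools that exclusion removes.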
+ ---
+
  ## Build from Source
 
  ```bash
@@ -0,0 +1 @@
+ export {};