nodebench-mcp 2.14.1 → 2.14.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -21,7 +21,7 @@ Add to `~/.claude/settings.json`:
 }
 ```
 
-Restart Claude Code. 89+ tools available immediately.
+Restart Claude Code. 162 tools available immediately.
 
 ### Preset Selection
 
@@ -264,7 +264,7 @@ Use `getMethodology("overview")` to see all available workflows.
 | Category | Tools | When to Use |
 |----------|-------|-------------|
 | **Web** | `web_search`, `fetch_url` | Research, reading docs, market validation |
-| **Local Files** | `read_pdf_text`, `pdf_search_text`, `read_xlsx_file`, `xlsx_select_rows`, `xlsx_aggregate`, `read_csv_file`, `csv_select_rows`, `csv_aggregate`, `read_text_file`, `read_json_file`, `json_select`, `read_jsonl_file`, `zip_list_files`, `zip_read_text_file`, `zip_extract_file`, `read_docx_text`, `read_pptx_text` | Deterministic parsing and aggregation of local attachments (GAIA file-backed lane) |
+| **Local Files** | `read_pdf_text`, `pdf_search_text`, `read_xlsx_file`, `xlsx_select_rows`, `xlsx_aggregate`, `read_csv_file`, `csv_select_rows`, `csv_aggregate`, `read_text_file`, `read_json_file`, `json_select`, `read_jsonl_file`, `zip_list_files`, `zip_read_text_file`, `zip_extract_file`, `read_docx_text`, `read_pptx_text`, `read_image_ocr_text`, `transcribe_audio_file` | Deterministic parsing and aggregation of local attachments (GAIA file-backed lane) |
 | **GitHub** | `search_github`, `analyze_repo` | Finding libraries, studying implementations |
 | **Verification** | `start_cycle`, `log_phase`, `complete_cycle` | Tracking the flywheel process |
 | **Eval** | `start_eval_run`, `log_test_result` | Test case management |
@@ -273,11 +273,18 @@ Use `getMethodology("overview")` to see all available workflows.
 | **Vision** | `analyze_screenshot`, `capture_ui_screenshot` | UI/UX verification |
 | **Bootstrap** | `discover_infrastructure`, `triple_verify`, `self_implement` | Self-setup, triple verification |
 | **Autonomous** | `assess_risk`, `decide_re_update`, `run_self_maintenance` | Risk-aware execution, self-maintenance |
-| **Parallel Agents** | `claim_agent_task`, `release_agent_task`, `list_agent_tasks`, `assign_agent_role`, `get_agent_role`, `log_context_budget`, `run_oracle_comparison`, `get_parallel_status` | Multi-agent coordination, task locking, role specialization, oracle testing |
+| **Parallel Agents** | `claim_agent_task`, `release_agent_task`, `list_agent_tasks`, `assign_agent_role`, `get_agent_role`, `log_context_budget`, `run_oracle_comparison`, `get_parallel_status`, `bootstrap_parallel_agents`, `generate_parallel_agents_md`, `send_agent_message`, `check_agent_inbox`, `broadcast_agent_update` | Multi-agent coordination, task locking, role specialization, oracle testing, agent mailbox |
 | **LLM** | `call_llm`, `extract_structured_data`, `benchmark_models` | LLM calling, structured extraction, model comparison |
 | **Security** | `scan_dependencies`, `run_code_analysis` | Dependency auditing, static code analysis |
 | **Platform** | `query_daily_brief`, `query_funding_entities`, `query_research_queue`, `publish_to_queue` | Convex platform bridge: intelligence, funding, research, publishing |
 | **Meta** | `findTools`, `getMethodology` | Discover tools, get workflow guides |
+| **TOON** | `toon_encode`, `toon_decode` | Token-Oriented Object Notation — ~40% token savings vs JSON |
+| **Pattern** | `mine_session_patterns`, `predict_risks_from_patterns` | Session sequence analysis, risk prediction from history |
+| **Git Workflow** | `check_git_compliance`, `review_pr_checklist`, `enforce_merge_gate` | Branch validation, PR checklist, merge gates |
+| **SEO** | `seo_audit_url`, `check_page_performance`, `analyze_seo_content`, `check_wordpress_site`, `scan_wordpress_updates` | Technical SEO audit, performance, WordPress |
+| **Voice Bridge** | `design_voice_pipeline`, `analyze_voice_config`, `generate_voice_scaffold`, `benchmark_voice_latency` | Voice pipeline design, config, scaffolding, latency |
+| **GAIA Solvers** | `solve_red_green_deviation_average_from_image`, `solve_green_polygon_area_from_image`, `grade_fraction_quiz_from_image`, `extract_fractions_and_simplify_from_image`, `solve_bass_clef_age_from_image`, `solve_storage_upgrade_cost_per_file_from_image` | GAIA media image solvers |
+| **Session Memory** | `save_session_note`, `load_session_notes`, `refresh_task_context` | Compaction-resilient notes, attention refresh |
 | **Discovery** | `discover_tools`, `get_tool_quick_ref`, `get_workflow_chain` | Hybrid search, quick refs, workflow chains |
 
 Meta + Discovery tools (5 total) are **always included** regardless of preset. See [Toolset Gating & Presets](#toolset-gating--presets).
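The expanded **Parallel Agents** row above centers on task locking (`claim_agent_task`, `release_agent_task`, `list_agent_tasks`). A minimal sketch of the claim/release pattern those tool names imply, purely illustrative and not the package's actual implementation:

```javascript
// Hypothetical sketch of a claim/release task registry (illustration only,
// not nodebench-mcp's implementation). One agent holds a lock per task;
// only the owner may release it.
class TaskRegistry {
  constructor() {
    this.locks = new Map(); // taskId -> agentId
  }
  claim(taskId, agentId) {
    if (this.locks.has(taskId)) return false; // already claimed by another agent
    this.locks.set(taskId, agentId);
    return true;
  }
  release(taskId, agentId) {
    if (this.locks.get(taskId) !== agentId) return false; // only the owner releases
    this.locks.delete(taskId);
    return true;
  }
  list() {
    return [...this.locks].map(([taskId, agentId]) => ({ taskId, agentId }));
  }
}

const reg = new TaskRegistry();
reg.claim("refactor-auth", "agent-1");   // true: lock acquired
reg.claim("refactor-auth", "agent-2");   // false: agent-2 must pick another task
reg.release("refactor-auth", "agent-1"); // true: owner releases the lock
```

The real tools presumably persist locks across processes; this in-memory version only shows the ownership rules.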
@@ -295,9 +302,9 @@ NodeBench MCP supports 4 presets that control which domain toolsets are loaded a
 | Preset | Domain Toolsets | Domain Tools | Total (with meta+discovery) | Use Case |
 |--------|----------------|-------------|----------------------------|----------|
 | **meta** | 0 | 0 | 5 | Discovery-only front door. Agents start here and self-escalate. |
-| **lite** | 7 | ~35 | ~40 | Lightweight verification-focused workflows. CI bots, quick checks. |
-| **core** | 16 | ~75 | ~80 | Full development workflow. Most agent sessions. |
-| **full** | all | 89+ | 94+ | Everything enabled. Benchmarking, exploration, advanced use. |
+| **lite** | 8 | 38 | 43 | Lightweight verification-focused workflows. CI bots, quick checks. |
+| **core** | 22 | 109 | 114 | Full development workflow. Most agent sessions. |
+| **full** | 30 | 157 | 162 | Everything enabled. Benchmarking, exploration, advanced use. |
 
 ### Usage
 
@@ -338,11 +345,11 @@ This is the recommended starting point for autonomous agents. The self-escalatio
 
 **meta** (0 domains): No domain tools. Meta + Discovery only.
 
-**lite** (7 domains): `verification`, `eval`, `quality_gate`, `learning`, `recon`, `security`, `boilerplate`
+**lite** (8 domains): `verification`, `eval`, `quality_gate`, `learning`, `flywheel`, `recon`, `security`, `boilerplate`
 
-**core** (16 domains): Everything in lite plus `flywheel`, `bootstrap`, `self_eval`, `llm`, `platform`, `research_writing`, `flicker_detection`, `figma_flow`, `benchmark`
+**core** (22 domains): Everything in lite plus `bootstrap`, `self_eval`, `llm`, `platform`, `research_writing`, `flicker_detection`, `figma_flow`, `benchmark`, `session_memory`, `toon`, `pattern`, `git_workflow`, `seo`, `voice_bridge`
 
-**full** (all domains): All toolsets in TOOLSET_MAP including `ui_capture`, `vision`, `local_file`, `web`, `github`, `docs`, `parallel`, and everything in core.
+**full** (30 domains): All toolsets in TOOLSET_MAP including `ui_capture`, `vision`, `local_file`, `web`, `github`, `docs`, `parallel`, `gaia_solvers`, and everything in core.
 
 **→ Quick Refs:** Check current toolset: `findTools({ query: "*" })` | Self-escalate: restart with `--preset core` | See [MCP Tool Categories](#mcp-tool-categories) | CLI help: `npx nodebench-mcp --help`
 
@@ -699,6 +706,9 @@ Available via `getMethodology({ topic: "..." })`:
 | `parallel_agent_teams` | Multi-agent coordination, task locking, oracle testing | [Parallel Agent Teams](#parallel-agent-teams) |
 | `self_reinforced_learning` | Trajectory analysis, self-eval, improvement recs | [Self-Reinforced Learning](#self-reinforced-learning-loop) |
 | `toolset_gating` | 4 presets (meta, lite, core, full) and self-escalation | [Toolset Gating & Presets](#toolset-gating--presets) |
+| `toon_format` | TOON encoding — ~40% token savings vs JSON | TOON is on by default since v2.14.1 |
+| `seo_audit` | Full SEO audit workflow (technical + performance + content) | `seo_audit_url`, `check_page_performance`, `analyze_seo_content` |
+| `voice_bridge` | Voice pipeline design, config analysis, scaffolding | `design_voice_pipeline`, `analyze_voice_config` |
 
 **→ Quick Refs:** Find tools: `findTools({ query: "..." })` | Get any methodology: `getMethodology({ topic: "..." })` | See [MCP Tool Categories](#mcp-tool-categories)
 
package/README.md CHANGED
@@ -39,7 +39,7 @@ Every additional tool call produces a concrete artifact — an issue found, a ri
 
 **QA engineer** — Transitioned a manual QA workflow website into an AI agent-driven app for a pet care messaging platform. Uses NodeBench's quality gates, verification cycles, and eval runs to ensure the AI agent handles edge cases that manual QA caught but bare AI agents miss.
 
-Both found different subsets of the 143 tools useful — which is why NodeBench ships with 4 `--preset` levels to load only what you need.
+Both found different subsets of the 162 tools useful — which is why NodeBench ships with 4 `--preset` levels to load only what you need.
 
 ---
 
@@ -77,7 +77,7 @@ Tasks 1-3 start with zero prior knowledge. By task 9, the agent finds 2+ relevan
 ### Install (30 seconds)
 
 ```bash
-# Claude Code CLI — all 143 tools
+# Claude Code CLI — all 162 tools (TOON encoding on by default for ~40% token savings)
 claude mcp add nodebench -- npx -y nodebench-mcp
 
 # Or start with discovery only — 5 tools, agents self-escalate to what they need
@@ -189,7 +189,7 @@ Notes:
 
 ## Progressive Discovery
 
-143 tools is a lot. The progressive disclosure system helps agents find exactly what they need:
+162 tools is a lot. The progressive disclosure system helps agents find exactly what they need:
 
 ### Multi-modal search engine
 
@@ -197,7 +197,7 @@ Notes:
 > discover_tools("verify my implementation")
 ```
 
-The `discover_tools` search engine scores tools using **9 parallel strategies**:
+The `discover_tools` search engine scores tools using **10 parallel strategies**:
 
 | Strategy | What it does | Example |
 |---|---|---|
@@ -210,6 +210,7 @@ The `discover_tools` search engine scores tools using **9 parallel strategies**:
 | Regex | Pattern matching | `"^run_.*loop$"` → `run_closed_loop` |
 | Bigram | Phrase matching | "quality gate" matched as unit |
 | Domain boost | Related categories boosted together | verification + quality_gate cluster |
+| Dense | TF-IDF cosine similarity for vector-like ranking | "audit compliance" surfaces related tools |
 
 **7 search modes**: `hybrid` (default, all strategies), `fuzzy`, `regex`, `prefix`, `semantic`, `exact`, `dense`
 
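The new **Dense** strategy is described as TF-IDF cosine similarity. A minimal sketch of that ranking idea, with a hypothetical corpus and scoring (not the package's actual `discover_tools` code):

```javascript
// TF-IDF cosine ranking sketch (hypothetical, not nodebench-mcp's code).
// Each tool is a "document" built from its name and description tokens.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Build one sparse TF-IDF vector (a Map of term -> weight) per document.
function tfidfVectors(docs) {
  const tokenized = docs.map(tokenize);
  const df = new Map(); // document frequency per term
  for (const tokens of tokenized) {
    for (const t of new Set(tokens)) df.set(t, (df.get(t) || 0) + 1);
  }
  const n = docs.length;
  return tokenized.map((tokens) => {
    const vec = new Map();
    for (const t of tokens) vec.set(t, (vec.get(t) || 0) + 1); // raw term frequency
    for (const [t, tf] of vec) vec.set(t, tf * Math.log(1 + n / df.get(t)));
    return vec;
  });
}

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (const [t, v] of a) { na += v * v; if (b.has(t)) dot += v * b.get(t); }
  for (const [, v] of b) nb += v * v;
  return dot === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank tool descriptions against a query (toy corpus for illustration).
const tools = [
  "seo_audit_url technical seo audit for a page",
  "check_git_compliance branch validation and merge gates",
  "toon_encode encode json as token oriented object notation",
];
const [qv, ...tv] = tfidfVectors(["audit a page for seo", ...tools]);
const ranked = tv
  .map((v, i) => ({ tool: tools[i].split(" ")[0], score: cosine(qv, v) }))
  .sort((a, b) => b.score - a.score);
console.log(ranked[0].tool); // "seo_audit_url" ranks first for this query
```

This is why a query like "audit compliance" can surface related tools even without an exact substring match: shared weighted terms drive the cosine score.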
@@ -227,7 +228,7 @@ Call `get_tool_quick_ref("tool_name")` for any tool's guidance.
 
 ### Workflow chains — step-by-step recipes
 
-21 pre-built chains for common workflows:
+24 pre-built chains for common workflows:
 
 | Chain | Steps | Use case |
 |---|---|---|
@@ -242,16 +243,19 @@ Call `get_tool_quick_ref("tool_name")` for any tool's guidance.
 | `code_review` | 8 | Structured code review |
 | `deployment` | 8 | Ship with full verification |
 | `migration` | 10 | SDK/framework upgrade |
-| `coordinator_spawn` | 6 | Parallel coordinator setup |
-| `self_setup` | 5 | Agent self-onboarding |
-| `flicker_detection` | 5 | Android flicker analysis |
+| `coordinator_spawn` | 10 | Parallel coordinator setup |
+| `self_setup` | 8 | Agent self-onboarding |
+| `flicker_detection` | 7 | Android flicker analysis |
 | `figma_flow_analysis` | 5 | Figma prototype flow audit |
-| `agent_eval` | 6 | Evaluate agent performance |
-| `contract_compliance` | 4 | Check agent contract adherence |
-| `ablation_eval` | 6 | Ablation experiment design |
+| `agent_eval` | 9 | Evaluate agent performance |
+| `contract_compliance` | 5 | Check agent contract adherence |
+| `ablation_eval` | 10 | Ablation experiment design |
 | `session_recovery` | 6 | Recover context after compaction |
 | `attention_refresh` | 4 | Reload bearings mid-session |
-| `task_bank_setup` | 5 | Create evaluation task banks |
+| `task_bank_setup` | 9 | Create evaluation task banks |
+| `pr_review` | 5 | Pull request review |
+| `seo_audit` | 6 | Full SEO audit |
+| `voice_pipeline` | 6 | Voice pipeline implementation |
 
 Call `get_workflow_chain("new_feature")` to get the step-by-step sequence.
 
@@ -282,7 +286,7 @@ Research → Risk → Implement → Test (3 layers) → Eval → Gate → Learn
 **Outer loop** (over time): Eval-driven development ensures improvement.
 **Together**: The AI Flywheel — every verification produces eval artifacts, every regression triggers verification.
 
-Ask the agent: `Use getMethodology("overview")` to see all 19 methodology topics.
+Ask the agent: `Use getMethodology("overview")` to see all 20 methodology topics.
 
 ---
 
@@ -313,7 +317,7 @@ Based on Anthropic's ["Building a C Compiler with Parallel Claudes"](https://www
 
 ## Toolset Gating
 
-143 tools means tens of thousands of tokens of schema per API call. If you only need core methodology, gate the toolset:
+162 tools means tens of thousands of tokens of schema per API call. If you only need core methodology, gate the toolset:
 
 ### Presets
 
@@ -321,8 +325,8 @@ Based on Anthropic's ["Building a C Compiler with Parallel Claudes"](https://www
 |---|---|---|---|
 | `meta` | 5 | 0 | Discovery-only front door — agents start here and self-escalate via `discover_tools` |
 | `lite` | 43 | 8 | Core methodology — verification, eval, flywheel, learning, recon, security, boilerplate |
-| `core` | 93 | 17 | Full workflow — adds bootstrap, self-eval, llm, platform, research_writing, flicker_detection, figma_flow, benchmark, session_memory |
-| `full` | 143 | 25 | Everything — adds vision, UI capture, web, GitHub, docs, parallel, local files, GAIA solvers |
+| `core` | 114 | 22 | Full workflow — adds bootstrap, self-eval, llm, platform, research_writing, flicker_detection, figma_flow, benchmark, session_memory, toon, pattern, git_workflow, seo, voice_bridge |
+| `full` | 162 | 30 | Everything — adds vision, UI capture, web, GitHub, docs, parallel, local files, GAIA solvers |
 
 ```bash
 # Meta — 5 tools (discovery-only: findTools, getMethodology, discover_tools, get_tool_quick_ref, get_workflow_chain)
@@ -332,10 +336,10 @@ claude mcp add nodebench -- npx -y nodebench-mcp --preset meta
 # Lite — 43 tools (verification, eval, flywheel, learning, recon, security, boilerplate + meta + discovery)
 claude mcp add nodebench -- npx -y nodebench-mcp --preset lite
 
-# Core — 93 tools (adds bootstrap, self-eval, llm, platform, research_writing, flicker_detection, figma_flow, benchmark, session_memory + meta + discovery)
+# Core — 114 tools (adds bootstrap, self-eval, llm, platform, research_writing, flicker_detection, figma_flow, benchmark, session_memory, toon, pattern, git_workflow, seo, voice_bridge + meta + discovery)
 claude mcp add nodebench -- npx -y nodebench-mcp --preset core
 
-# Full — all 143 tools (default)
+# Full — all 162 tools (default, TOON encoding on by default)
 claude mcp add nodebench -- npx -y nodebench-mcp
 ```
 
@@ -377,7 +381,7 @@ npx nodebench-mcp --help
 | flywheel | 4 | Mandatory flywheel, promote, investigate |
 | bootstrap | 11 | Project setup, agents.md, self-implement, autonomous, test runner |
 | self_eval | 9 | Trajectory analysis, health reports, task banks, grading, contract compliance |
-| parallel | 10 | Task locks, roles, context budget, oracle |
+| parallel | 13 | Task locks, roles, context budget, oracle, agent mailbox (point-to-point + broadcast) |
 | vision | 4 | Screenshot analysis, UI capture, diff |
 | ui_capture | 2 | Playwright-based capture |
 | web | 2 | Web search, URL fetch |
@@ -394,6 +398,11 @@ npx nodebench-mcp --help
 | benchmark | 3 | Autonomous benchmark lifecycle (C-compiler pattern) |
 | session_memory | 3 | Compaction-resilient notes, attention refresh, context reload |
 | gaia_solvers | 6 | GAIA media image solvers (red/green deviation, polygon area, fraction quiz, bass clef, storage cost) |
+| toon | 2 | TOON encode/decode — Token-Oriented Object Notation (~40% token savings) |
+| pattern | 2 | Session pattern mining + risk prediction from historical sequences |
+| git_workflow | 3 | Branch compliance, PR checklist review, merge gate enforcement |
+| seo | 5 | Technical SEO audit, page performance, content analysis, WordPress detection + updates |
+| voice_bridge | 4 | Voice pipeline design, config analysis, scaffold generation, latency benchmarking |
 
 Always included (regardless of gating) — these 5 tools form the `meta` preset:
 - Meta: `findTools`, `getMethodology`
@@ -401,6 +410,20 @@ Always included (regardless of gating) — these 5 tools form the `meta` preset:
 
 The `meta` preset loads **only** these 5 tools (0 domain tools). Agents use `discover_tools` to find what they need and self-escalate.
 
+### TOON Format — Token Savings
+
+TOON (Token-Oriented Object Notation) is **on by default** since v2.14.1. Every tool response is TOON-encoded for ~40% fewer tokens vs JSON. Disable with `--no-toon` if your client can't handle non-JSON responses.
+
+```bash
+# TOON on (default)
+claude mcp add nodebench -- npx -y nodebench-mcp
+
+# TOON off
+claude mcp add nodebench -- npx -y nodebench-mcp --no-toon
+```
+
+Use the `toon_encode` and `toon_decode` tools to convert between TOON and JSON in your own workflows.
+
 ---
 
 ## Build from Source
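The TOON section added in this hunk leans on tabular encoding of uniform arrays. A simplified sketch of that idea, as an assumption-laden illustration (not the TOON spec and not the package's `toon_encode`):

```javascript
// Simplified sketch of TOON-style tabular encoding (hypothetical; the real
// TOON spec and nodebench-mcp's toon_encode may differ). A uniform array of
// objects is flattened to one header line plus CSV-like rows, so repeated
// keys are emitted once instead of once per element.
function encodeUniformArray(name, rows) {
  const keys = Object.keys(rows[0]);
  const header = `${name}[${rows.length}]{${keys.join(",")}}:`;
  const body = rows
    .map((r) => "  " + keys.map((k) => String(r[k])).join(","))
    .join("\n");
  return header + "\n" + body;
}

const json = [
  { id: 1, tool: "toon_encode", domain: "toon" },
  { id: 2, tool: "toon_decode", domain: "toon" },
];
const toon = encodeUniformArray("tools", json);
console.log(toon);
// tools[2]{id,tool,domain}:
//   1,toon_encode,toon
//   2,toon_decode,toon
```

Even on this tiny example the encoded form is shorter than `JSON.stringify(json)`; the savings grow with row count since key names amortize to a single header.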
package/dist/index.js CHANGED
@@ -98,7 +98,7 @@ const PRESETS = {
 function parseToolsets() {
   if (cliArgs.includes("--help")) {
     const lines = [
-      "nodebench-mcp v2.14.1 — Development Methodology MCP Server",
+      "nodebench-mcp v2.14.2 — Development Methodology MCP Server",
       "",
       "Usage: nodebench-mcp [options]",
       "",
package/package.json CHANGED
@@ -1,7 +1,7 @@
 {
   "name": "nodebench-mcp",
-  "version": "2.14.1",
-  "description": "Make AI agents catch the bugs they normally ship. 162 MCP tools across 30 domains: TOON format integration (~40% token savings), pattern mining (session sequence analysis + risk prediction), git workflow compliance (branch validation + PR checklist + merge gates), SEO audit (5 tools: technical SEO, performance, content analysis, WordPress detection), voice bridge (pipeline design + config analysis + scaffold generation + latency benchmarking), agent mailbox (point-to-point + broadcast messaging), progressive discovery with 7-mode hybrid search, model-tier complexity routing, agent contract (front-door + anti-rationalization + 3-strike error), compaction-resilient session memory, lightweight hooks, 6 GAIA media image solvers, project boilerplate, autonomous capability benchmarks, structured research, 3-layer testing, quality gates, persistent knowledge, LLM calling, security analysis, platform bridge, visual regression, report generation, academic paper writing, deterministic local file parsing, Android flicker detection, Figma flow analysis, and contract compliance scoring. --preset meta (5), lite (43), core (114), or full (162). --toon flag for token-efficient responses.",
+  "version": "2.14.2",
+  "description": "Make AI agents catch the bugs they normally ship. 162 MCP tools across 30 domains: TOON encoding on by default (~40% token savings, opt out with --no-toon), pattern mining (session sequence analysis + risk prediction), git workflow compliance (branch validation + PR checklist + merge gates), SEO audit (5 tools: technical SEO, performance, content analysis, WordPress detection), voice bridge (pipeline design + config analysis + scaffold generation + latency benchmarking), agent mailbox (point-to-point + broadcast messaging), progressive discovery with 7-mode hybrid search, model-tier complexity routing, agent contract (front-door + anti-rationalization + 3-strike error), compaction-resilient session memory, lightweight hooks, 6 GAIA media image solvers, project boilerplate, autonomous capability benchmarks, structured research, 3-layer testing, quality gates, persistent knowledge, LLM calling, security analysis, platform bridge, visual regression, report generation, academic paper writing, deterministic local file parsing, Android flicker detection, Figma flow analysis, and contract compliance scoring. --preset meta (5), lite (43), core (114), or full (162).",
   "type": "module",
   "bin": {
     "nodebench-mcp": "./dist/index.js"
@@ -57,7 +57,13 @@
     "qa-automation",
     "agentic",
     "academic-writing",
-    "research-paper"
+    "research-paper",
+    "toon",
+    "seo",
+    "voice-pipeline",
+    "git-workflow",
+    "pattern-mining",
+    "session-memory"
   ],
   "repository": {
     "type": "git",