nodebench-mcp 2.14.1 → 2.14.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -21,7 +21,7 @@ Add to `~/.claude/settings.json`:
 }
 ```
 
-Restart Claude Code. 89+ tools available immediately.
+Restart Claude Code. 162 tools available immediately.
 
 ### Preset Selection
 
@@ -264,7 +264,7 @@ Use `getMethodology("overview")` to see all available workflows.
 | Category | Tools | When to Use |
 |----------|-------|-------------|
 | **Web** | `web_search`, `fetch_url` | Research, reading docs, market validation |
-| **Local Files** | `read_pdf_text`, `pdf_search_text`, `read_xlsx_file`, `xlsx_select_rows`, `xlsx_aggregate`, `read_csv_file`, `csv_select_rows`, `csv_aggregate`, `read_text_file`, `read_json_file`, `json_select`, `read_jsonl_file`, `zip_list_files`, `zip_read_text_file`, `zip_extract_file`, `read_docx_text`, `read_pptx_text` | Deterministic parsing and aggregation of local attachments (GAIA file-backed lane) |
+| **Local Files** | `read_pdf_text`, `pdf_search_text`, `read_xlsx_file`, `xlsx_select_rows`, `xlsx_aggregate`, `read_csv_file`, `csv_select_rows`, `csv_aggregate`, `read_text_file`, `read_json_file`, `json_select`, `read_jsonl_file`, `zip_list_files`, `zip_read_text_file`, `zip_extract_file`, `read_docx_text`, `read_pptx_text`, `read_image_ocr_text`, `transcribe_audio_file` | Deterministic parsing and aggregation of local attachments (GAIA file-backed lane) |
 | **GitHub** | `search_github`, `analyze_repo` | Finding libraries, studying implementations |
 | **Verification** | `start_cycle`, `log_phase`, `complete_cycle` | Tracking the flywheel process |
 | **Eval** | `start_eval_run`, `log_test_result` | Test case management |
@@ -273,11 +273,18 @@ Use `getMethodology("overview")` to see all available workflows.
 | **Vision** | `analyze_screenshot`, `capture_ui_screenshot` | UI/UX verification |
 | **Bootstrap** | `discover_infrastructure`, `triple_verify`, `self_implement` | Self-setup, triple verification |
 | **Autonomous** | `assess_risk`, `decide_re_update`, `run_self_maintenance` | Risk-aware execution, self-maintenance |
-| **Parallel Agents** | `claim_agent_task`, `release_agent_task`, `list_agent_tasks`, `assign_agent_role`, `get_agent_role`, `log_context_budget`, `run_oracle_comparison`, `get_parallel_status` | Multi-agent coordination, task locking, role specialization, oracle testing |
+| **Parallel Agents** | `claim_agent_task`, `release_agent_task`, `list_agent_tasks`, `assign_agent_role`, `get_agent_role`, `log_context_budget`, `run_oracle_comparison`, `get_parallel_status`, `bootstrap_parallel_agents`, `generate_parallel_agents_md`, `send_agent_message`, `check_agent_inbox`, `broadcast_agent_update` | Multi-agent coordination, task locking, role specialization, oracle testing, agent mailbox |
 | **LLM** | `call_llm`, `extract_structured_data`, `benchmark_models` | LLM calling, structured extraction, model comparison |
 | **Security** | `scan_dependencies`, `run_code_analysis` | Dependency auditing, static code analysis |
 | **Platform** | `query_daily_brief`, `query_funding_entities`, `query_research_queue`, `publish_to_queue` | Convex platform bridge: intelligence, funding, research, publishing |
 | **Meta** | `findTools`, `getMethodology` | Discover tools, get workflow guides |
+| **TOON** | `toon_encode`, `toon_decode` | Token-Oriented Object Notation — ~40% token savings vs JSON |
+| **Pattern** | `mine_session_patterns`, `predict_risks_from_patterns` | Session sequence analysis, risk prediction from history |
+| **Git Workflow** | `check_git_compliance`, `review_pr_checklist`, `enforce_merge_gate` | Branch validation, PR checklist, merge gates |
+| **SEO** | `seo_audit_url`, `check_page_performance`, `analyze_seo_content`, `check_wordpress_site`, `scan_wordpress_updates` | Technical SEO audit, performance, WordPress |
+| **Voice Bridge** | `design_voice_pipeline`, `analyze_voice_config`, `generate_voice_scaffold`, `benchmark_voice_latency` | Voice pipeline design, config, scaffolding, latency |
+| **GAIA Solvers** | `solve_red_green_deviation_average_from_image`, `solve_green_polygon_area_from_image`, `grade_fraction_quiz_from_image`, `extract_fractions_and_simplify_from_image`, `solve_bass_clef_age_from_image`, `solve_storage_upgrade_cost_per_file_from_image` | GAIA media image solvers |
+| **Session Memory** | `save_session_note`, `load_session_notes`, `refresh_task_context` | Compaction-resilient notes, attention refresh |
 | **Discovery** | `discover_tools`, `get_tool_quick_ref`, `get_workflow_chain` | Hybrid search, quick refs, workflow chains |
 
 Meta + Discovery tools (5 total) are **always included** regardless of preset. See [Toolset Gating & Presets](#toolset-gating--presets).
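The expanded **Parallel Agents** row above centers on task locking (`claim_agent_task`, `release_agent_task`, `list_agent_tasks`). A minimal sketch of the claim/release pattern those tool names imply, purely illustrative and not the package's actual implementation:

```javascript
// Hypothetical sketch of a claim/release task registry (illustration only,
// not nodebench-mcp's implementation). One agent holds a lock per task;
// only the owner may release it.
class TaskRegistry {
  constructor() {
    this.locks = new Map(); // taskId -> agentId
  }
  claim(taskId, agentId) {
    if (this.locks.has(taskId)) return false; // already claimed by another agent
    this.locks.set(taskId, agentId);
    return true;
  }
  release(taskId, agentId) {
    if (this.locks.get(taskId) !== agentId) return false; // only the owner releases
    this.locks.delete(taskId);
    return true;
  }
  list() {
    return [...this.locks].map(([taskId, agentId]) => ({ taskId, agentId }));
  }
}

const reg = new TaskRegistry();
reg.claim("refactor-auth", "agent-1");   // true: lock acquired
reg.claim("refactor-auth", "agent-2");   // false: agent-2 must pick another task
reg.release("refactor-auth", "agent-1"); // true: owner releases the lock
```

The real tools presumably persist locks across processes; this in-memory version only shows the ownership rules.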
@@ -295,9 +302,9 @@ NodeBench MCP supports 4 presets that control which domain toolsets are loaded a
 | Preset | Domain Toolsets | Domain Tools | Total (with meta+discovery) | Use Case |
 |--------|----------------|-------------|----------------------------|----------|
 | **meta** | 0 | 0 | 5 | Discovery-only front door. Agents start here and self-escalate. |
-| **lite** | 7 | ~35 | ~40 | Lightweight verification-focused workflows. CI bots, quick checks. |
-| **core** | 16 | ~75 | ~80 | Full development workflow. Most agent sessions. |
-| **full** | all | 89+ | 94+ | Everything enabled. Benchmarking, exploration, advanced use. |
+| **lite** | 8 | 38 | 43 | Lightweight verification-focused workflows. CI bots, quick checks. |
+| **core** | 22 | 109 | 114 | Full development workflow. Most agent sessions. |
+| **full** | 30 | 157 | 162 | Everything enabled. Benchmarking, exploration, advanced use. |
 
 ### Usage
 
@@ -338,11 +345,11 @@ This is the recommended starting point for autonomous agents. The self-escalatio
 
 **meta** (0 domains): No domain tools. Meta + Discovery only.
 
-**lite** (7 domains): `verification`, `eval`, `quality_gate`, `learning`, `recon`, `security`, `boilerplate`
+**lite** (8 domains): `verification`, `eval`, `quality_gate`, `learning`, `flywheel`, `recon`, `security`, `boilerplate`
 
-**core** (16 domains): Everything in lite plus `flywheel`, `bootstrap`, `self_eval`, `llm`, `platform`, `research_writing`, `flicker_detection`, `figma_flow`, `benchmark`
+**core** (22 domains): Everything in lite plus `bootstrap`, `self_eval`, `llm`, `platform`, `research_writing`, `flicker_detection`, `figma_flow`, `benchmark`, `session_memory`, `toon`, `pattern`, `git_workflow`, `seo`, `voice_bridge`
 
-**full** (all domains): All toolsets in TOOLSET_MAP including `ui_capture`, `vision`, `local_file`, `web`, `github`, `docs`, `parallel`, and everything in core.
+**full** (30 domains): All toolsets in TOOLSET_MAP including `ui_capture`, `vision`, `local_file`, `web`, `github`, `docs`, `parallel`, `gaia_solvers`, and everything in core.
 
 **→ Quick Refs:** Check current toolset: `findTools({ query: "*" })` | Self-escalate: restart with `--preset core` | See [MCP Tool Categories](#mcp-tool-categories) | CLI help: `npx nodebench-mcp --help`
 
@@ -699,6 +706,9 @@ Available via `getMethodology({ topic: "..." })`:
 | `parallel_agent_teams` | Multi-agent coordination, task locking, oracle testing | [Parallel Agent Teams](#parallel-agent-teams) |
 | `self_reinforced_learning` | Trajectory analysis, self-eval, improvement recs | [Self-Reinforced Learning](#self-reinforced-learning-loop) |
 | `toolset_gating` | 4 presets (meta, lite, core, full) and self-escalation | [Toolset Gating & Presets](#toolset-gating--presets) |
+| `toon_format` | TOON encoding — ~40% token savings vs JSON | TOON is on by default since v2.14.1 |
+| `seo_audit` | Full SEO audit workflow (technical + performance + content) | `seo_audit_url`, `check_page_performance`, `analyze_seo_content` |
+| `voice_bridge` | Voice pipeline design, config analysis, scaffolding | `design_voice_pipeline`, `analyze_voice_config` |
 
 **→ Quick Refs:** Find tools: `findTools({ query: "..." })` | Get any methodology: `getMethodology({ topic: "..." })` | See [MCP Tool Categories](#mcp-tool-categories)
 
package/README.md CHANGED
@@ -39,7 +39,7 @@ Every additional tool call produces a concrete artifact — an issue found, a ri
 
 **QA engineer** — Transitioned a manual QA workflow website into an AI agent-driven app for a pet care messaging platform. Uses NodeBench's quality gates, verification cycles, and eval runs to ensure the AI agent handles edge cases that manual QA caught but bare AI agents miss.
 
-Both found different subsets of the 143 tools useful — which is why NodeBench ships with 4 `--preset` levels to load only what you need.
+Both found different subsets of the 162 tools useful — which is why NodeBench ships with 4 `--preset` levels to load only what you need.
 
 ---
 
@@ -77,7 +77,7 @@ Tasks 1-3 start with zero prior knowledge. By task 9, the agent finds 2+ relevan
 ### Install (30 seconds)
 
 ```bash
-# Claude Code CLI — all 143 tools
+# Claude Code CLI — all 162 tools (TOON encoding on by default for ~40% token savings)
 claude mcp add nodebench -- npx -y nodebench-mcp
 
 # Or start with discovery only — 5 tools, agents self-escalate to what they need
@@ -189,7 +189,7 @@ Notes:
 
 ## Progressive Discovery
 
-143 tools is a lot. The progressive disclosure system helps agents find exactly what they need:
+162 tools is a lot. The progressive disclosure system helps agents find exactly what they need:
 
 ### Multi-modal search engine
 
@@ -197,7 +197,7 @@ Notes:
 > discover_tools("verify my implementation")
 ```
 
-The `discover_tools` search engine scores tools using **9 parallel strategies**:
+The `discover_tools` search engine scores tools using **10 parallel strategies**:
 
 | Strategy | What it does | Example |
 |---|---|---|
@@ -210,6 +210,7 @@ The `discover_tools` search engine scores tools using **9 parallel strategies**:
 | Regex | Pattern matching | `"^run_.*loop$"` → `run_closed_loop` |
 | Bigram | Phrase matching | "quality gate" matched as unit |
 | Domain boost | Related categories boosted together | verification + quality_gate cluster |
+| Dense | TF-IDF cosine similarity for vector-like ranking | "audit compliance" surfaces related tools |
 
 **7 search modes**: `hybrid` (default, all strategies), `fuzzy`, `regex`, `prefix`, `semantic`, `exact`, `dense`
 
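The new **Dense** strategy is described as TF-IDF cosine similarity. A minimal sketch of that ranking idea, with a hypothetical corpus and scoring (not the package's actual `discover_tools` code):

```javascript
// TF-IDF cosine ranking sketch (hypothetical, not nodebench-mcp's code).
// Each tool is a "document" built from its name and description tokens.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Build one sparse TF-IDF vector (a Map of term -> weight) per document.
function tfidfVectors(docs) {
  const tokenized = docs.map(tokenize);
  const df = new Map(); // document frequency per term
  for (const tokens of tokenized) {
    for (const t of new Set(tokens)) df.set(t, (df.get(t) || 0) + 1);
  }
  const n = docs.length;
  return tokenized.map((tokens) => {
    const vec = new Map();
    for (const t of tokens) vec.set(t, (vec.get(t) || 0) + 1); // raw term frequency
    for (const [t, tf] of vec) vec.set(t, tf * Math.log(1 + n / df.get(t)));
    return vec;
  });
}

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (const [t, v] of a) { na += v * v; if (b.has(t)) dot += v * b.get(t); }
  for (const [, v] of b) nb += v * v;
  return dot === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank tool descriptions against a query (toy corpus for illustration).
const tools = [
  "seo_audit_url technical seo audit for a page",
  "check_git_compliance branch validation and merge gates",
  "toon_encode encode json as token oriented object notation",
];
const [qv, ...tv] = tfidfVectors(["audit a page for seo", ...tools]);
const ranked = tv
  .map((v, i) => ({ tool: tools[i].split(" ")[0], score: cosine(qv, v) }))
  .sort((a, b) => b.score - a.score);
console.log(ranked[0].tool); // "seo_audit_url" ranks first for this query
```

This is why a query like "audit compliance" can surface related tools even without an exact substring match: shared weighted terms drive the cosine score.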
@@ -227,7 +228,7 @@ Call `get_tool_quick_ref("tool_name")` for any tool's guidance.
 
 ### Workflow chains — step-by-step recipes
 
-21 pre-built chains for common workflows:
+24 pre-built chains for common workflows:
 
 | Chain | Steps | Use case |
 |---|---|---|
@@ -242,16 +243,19 @@ Call `get_tool_quick_ref("tool_name")` for any tool's guidance.
 | `code_review` | 8 | Structured code review |
 | `deployment` | 8 | Ship with full verification |
 | `migration` | 10 | SDK/framework upgrade |
-| `coordinator_spawn` | 6 | Parallel coordinator setup |
-| `self_setup` | 5 | Agent self-onboarding |
-| `flicker_detection` | 5 | Android flicker analysis |
+| `coordinator_spawn` | 10 | Parallel coordinator setup |
+| `self_setup` | 8 | Agent self-onboarding |
+| `flicker_detection` | 7 | Android flicker analysis |
 | `figma_flow_analysis` | 5 | Figma prototype flow audit |
-| `agent_eval` | 6 | Evaluate agent performance |
-| `contract_compliance` | 4 | Check agent contract adherence |
-| `ablation_eval` | 6 | Ablation experiment design |
+| `agent_eval` | 9 | Evaluate agent performance |
+| `contract_compliance` | 5 | Check agent contract adherence |
+| `ablation_eval` | 10 | Ablation experiment design |
 | `session_recovery` | 6 | Recover context after compaction |
 | `attention_refresh` | 4 | Reload bearings mid-session |
-| `task_bank_setup` | 5 | Create evaluation task banks |
+| `task_bank_setup` | 9 | Create evaluation task banks |
+| `pr_review` | 5 | Pull request review |
+| `seo_audit` | 6 | Full SEO audit |
+| `voice_pipeline` | 6 | Voice pipeline implementation |
 
 Call `get_workflow_chain("new_feature")` to get the step-by-step sequence.
 
@@ -282,7 +286,7 @@ Research → Risk → Implement → Test (3 layers) → Eval → Gate → Learn
 **Outer loop** (over time): Eval-driven development ensures improvement.
 **Together**: The AI Flywheel — every verification produces eval artifacts, every regression triggers verification.
 
-Ask the agent: `Use getMethodology("overview")` to see all 19 methodology topics.
+Ask the agent: `Use getMethodology("overview")` to see all 20 methodology topics.
 
 ---
 
@@ -313,7 +317,7 @@ Based on Anthropic's ["Building a C Compiler with Parallel Claudes"](https://www
 
 ## Toolset Gating
 
-143 tools means tens of thousands of tokens of schema per API call. If you only need core methodology, gate the toolset:
+162 tools means tens of thousands of tokens of schema per API call. If you only need core methodology, gate the toolset:
 
 ### Presets
 
@@ -321,8 +325,8 @@ Based on Anthropic's ["Building a C Compiler with Parallel Claudes"](https://www
 |---|---|---|---|
 | `meta` | 5 | 0 | Discovery-only front door — agents start here and self-escalate via `discover_tools` |
 | `lite` | 43 | 8 | Core methodology — verification, eval, flywheel, learning, recon, security, boilerplate |
-| `core` | 93 | 17 | Full workflow — adds bootstrap, self-eval, llm, platform, research_writing, flicker_detection, figma_flow, benchmark, session_memory |
-| `full` | 143 | 25 | Everything — adds vision, UI capture, web, GitHub, docs, parallel, local files, GAIA solvers |
+| `core` | 114 | 22 | Full workflow — adds bootstrap, self-eval, llm, platform, research_writing, flicker_detection, figma_flow, benchmark, session_memory, toon, pattern, git_workflow, seo, voice_bridge |
+| `full` | 162 | 30 | Everything — adds vision, UI capture, web, GitHub, docs, parallel, local files, GAIA solvers |
 
 ```bash
 # Meta — 5 tools (discovery-only: findTools, getMethodology, discover_tools, get_tool_quick_ref, get_workflow_chain)
@@ -332,10 +336,10 @@ claude mcp add nodebench -- npx -y nodebench-mcp --preset meta
 # Lite — 43 tools (verification, eval, flywheel, learning, recon, security, boilerplate + meta + discovery)
 claude mcp add nodebench -- npx -y nodebench-mcp --preset lite
 
-# Core — 93 tools (adds bootstrap, self-eval, llm, platform, research_writing, flicker_detection, figma_flow, benchmark, session_memory + meta + discovery)
+# Core — 114 tools (adds bootstrap, self-eval, llm, platform, research_writing, flicker_detection, figma_flow, benchmark, session_memory, toon, pattern, git_workflow, seo, voice_bridge + meta + discovery)
 claude mcp add nodebench -- npx -y nodebench-mcp --preset core
 
-# Full — all 143 tools (default)
+# Full — all 162 tools (default, TOON encoding on by default)
 claude mcp add nodebench -- npx -y nodebench-mcp
 ```
 
@@ -377,7 +381,7 @@ npx nodebench-mcp --help
 | flywheel | 4 | Mandatory flywheel, promote, investigate |
 | bootstrap | 11 | Project setup, agents.md, self-implement, autonomous, test runner |
 | self_eval | 9 | Trajectory analysis, health reports, task banks, grading, contract compliance |
-| parallel | 10 | Task locks, roles, context budget, oracle |
+| parallel | 13 | Task locks, roles, context budget, oracle, agent mailbox (point-to-point + broadcast) |
 | vision | 4 | Screenshot analysis, UI capture, diff |
 | ui_capture | 2 | Playwright-based capture |
 | web | 2 | Web search, URL fetch |
@@ -394,6 +398,11 @@ npx nodebench-mcp --help
 | benchmark | 3 | Autonomous benchmark lifecycle (C-compiler pattern) |
 | session_memory | 3 | Compaction-resilient notes, attention refresh, context reload |
 | gaia_solvers | 6 | GAIA media image solvers (red/green deviation, polygon area, fraction quiz, bass clef, storage cost) |
+| toon | 2 | TOON encode/decode — Token-Oriented Object Notation (~40% token savings) |
+| pattern | 2 | Session pattern mining + risk prediction from historical sequences |
+| git_workflow | 3 | Branch compliance, PR checklist review, merge gate enforcement |
+| seo | 5 | Technical SEO audit, page performance, content analysis, WordPress detection + updates |
+| voice_bridge | 4 | Voice pipeline design, config analysis, scaffold generation, latency benchmarking |
 
 Always included (regardless of gating) — these 5 tools form the `meta` preset:
 - Meta: `findTools`, `getMethodology`
@@ -401,6 +410,20 @@ Always included (regardless of gating) — these 5 tools form the `meta` preset:
 
 The `meta` preset loads **only** these 5 tools (0 domain tools). Agents use `discover_tools` to find what they need and self-escalate.
 
+### TOON Format — Token Savings
+
+TOON (Token-Oriented Object Notation) is **on by default** since v2.14.1. Every tool response is TOON-encoded for ~40% fewer tokens vs JSON. Disable with `--no-toon` if your client can't handle non-JSON responses.
+
+```bash
+# TOON on (default)
+claude mcp add nodebench -- npx -y nodebench-mcp
+
+# TOON off
+claude mcp add nodebench -- npx -y nodebench-mcp --no-toon
+```
+
+Use the `toon_encode` and `toon_decode` tools to convert between TOON and JSON in your own workflows.
+
 ---
 
 ## Build from Source
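The TOON section added in this hunk leans on tabular encoding of uniform arrays. A simplified sketch of that idea, as an assumption-laden illustration (not the TOON spec and not the package's `toon_encode`):

```javascript
// Simplified sketch of TOON-style tabular encoding (hypothetical; the real
// TOON spec and nodebench-mcp's toon_encode may differ). A uniform array of
// objects is flattened to one header line plus CSV-like rows, so repeated
// keys are emitted once instead of once per element.
function encodeUniformArray(name, rows) {
  const keys = Object.keys(rows[0]);
  const header = `${name}[${rows.length}]{${keys.join(",")}}:`;
  const body = rows
    .map((r) => "  " + keys.map((k) => String(r[k])).join(","))
    .join("\n");
  return header + "\n" + body;
}

const json = [
  { id: 1, tool: "toon_encode", domain: "toon" },
  { id: 2, tool: "toon_decode", domain: "toon" },
];
const toon = encodeUniformArray("tools", json);
console.log(toon);
// tools[2]{id,tool,domain}:
//   1,toon_encode,toon
//   2,toon_decode,toon
```

Even on this tiny example the encoded form is shorter than `JSON.stringify(json)`; the savings grow with row count since key names amortize to a single header.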
package/dist/index.js CHANGED
@@ -98,7 +98,7 @@ const PRESETS = {
 function parseToolsets() {
   if (cliArgs.includes("--help")) {
     const lines = [
-      "nodebench-mcp v2.14.1 — Development Methodology MCP Server",
+      "nodebench-mcp v2.14.2 — Development Methodology MCP Server",
       "",
       "Usage: nodebench-mcp [options]",
       "",
package/package.json CHANGED
@@ -1,7 +1,7 @@
 {
   "name": "nodebench-mcp",
-  "version": "2.14.1",
-  "description": "Make AI agents catch the bugs they normally ship. 162 MCP tools across 30 domains: TOON format integration (~40% token savings), pattern mining (session sequence analysis + risk prediction), git workflow compliance (branch validation + PR checklist + merge gates), SEO audit (5 tools: technical SEO, performance, content analysis, WordPress detection), voice bridge (pipeline design + config analysis + scaffold generation + latency benchmarking), agent mailbox (point-to-point + broadcast messaging), progressive discovery with 7-mode hybrid search, model-tier complexity routing, agent contract (front-door + anti-rationalization + 3-strike error), compaction-resilient session memory, lightweight hooks, 6 GAIA media image solvers, project boilerplate, autonomous capability benchmarks, structured research, 3-layer testing, quality gates, persistent knowledge, LLM calling, security analysis, platform bridge, visual regression, report generation, academic paper writing, deterministic local file parsing, Android flicker detection, Figma flow analysis, and contract compliance scoring. --preset meta (5), lite (43), core (114), or full (162). --toon flag for token-efficient responses.",
+  "version": "2.14.2",
+  "description": "Make AI agents catch the bugs they normally ship. 162 MCP tools across 30 domains: TOON encoding on by default (~40% token savings, opt out with --no-toon), pattern mining (session sequence analysis + risk prediction), git workflow compliance (branch validation + PR checklist + merge gates), SEO audit (5 tools: technical SEO, performance, content analysis, WordPress detection), voice bridge (pipeline design + config analysis + scaffold generation + latency benchmarking), agent mailbox (point-to-point + broadcast messaging), progressive discovery with 7-mode hybrid search, model-tier complexity routing, agent contract (front-door + anti-rationalization + 3-strike error), compaction-resilient session memory, lightweight hooks, 6 GAIA media image solvers, project boilerplate, autonomous capability benchmarks, structured research, 3-layer testing, quality gates, persistent knowledge, LLM calling, security analysis, platform bridge, visual regression, report generation, academic paper writing, deterministic local file parsing, Android flicker detection, Figma flow analysis, and contract compliance scoring. --preset meta (5), lite (43), core (114), or full (162).",
   "type": "module",
   "bin": {
     "nodebench-mcp": "./dist/index.js"
@@ -57,7 +57,13 @@
     "qa-automation",
     "agentic",
     "academic-writing",
-    "research-paper"
+    "research-paper",
+    "toon",
+    "seo",
+    "voice-pipeline",
+    "git-workflow",
+    "pattern-mining",
+    "session-memory"
   ],
   "repository": {
     "type": "git",