nodebench-mcp 1.4.0 → 2.0.0
- package/NODEBENCH_AGENTS.md +154 -2
- package/README.md +152 -192
- package/dist/__tests__/comparativeBench.test.d.ts +1 -0
- package/dist/__tests__/comparativeBench.test.js +722 -0
- package/dist/__tests__/comparativeBench.test.js.map +1 -0
- package/dist/__tests__/evalHarness.test.js +24 -2
- package/dist/__tests__/evalHarness.test.js.map +1 -1
- package/dist/__tests__/gaiaCapabilityEval.test.d.ts +14 -0
- package/dist/__tests__/gaiaCapabilityEval.test.js +420 -0
- package/dist/__tests__/gaiaCapabilityEval.test.js.map +1 -0
- package/dist/__tests__/gaiaCapabilityFilesEval.test.d.ts +15 -0
- package/dist/__tests__/gaiaCapabilityFilesEval.test.js +303 -0
- package/dist/__tests__/gaiaCapabilityFilesEval.test.js.map +1 -0
- package/dist/__tests__/openDatasetParallelEvalGaia.test.d.ts +7 -0
- package/dist/__tests__/openDatasetParallelEvalGaia.test.js +279 -0
- package/dist/__tests__/openDatasetParallelEvalGaia.test.js.map +1 -0
- package/dist/__tests__/openDatasetPerfComparison.test.d.ts +10 -0
- package/dist/__tests__/openDatasetPerfComparison.test.js +318 -0
- package/dist/__tests__/openDatasetPerfComparison.test.js.map +1 -0
- package/dist/__tests__/tools.test.js +155 -7
- package/dist/__tests__/tools.test.js.map +1 -1
- package/dist/db.js +56 -0
- package/dist/db.js.map +1 -1
- package/dist/index.js +370 -11
- package/dist/index.js.map +1 -1
- package/dist/tools/localFileTools.d.ts +15 -0
- package/dist/tools/localFileTools.js +386 -0
- package/dist/tools/localFileTools.js.map +1 -0
- package/dist/tools/metaTools.js +170 -3
- package/dist/tools/metaTools.js.map +1 -1
- package/dist/tools/parallelAgentTools.d.ts +18 -0
- package/dist/tools/parallelAgentTools.js +1272 -0
- package/dist/tools/parallelAgentTools.js.map +1 -0
- package/dist/tools/selfEvalTools.js +240 -10
- package/dist/tools/selfEvalTools.js.map +1 -1
- package/dist/tools/webTools.js +171 -37
- package/dist/tools/webTools.js.map +1 -1
- package/package.json +19 -7
package/NODEBENCH_AGENTS.md
CHANGED

@@ -117,7 +117,7 @@ Run ToolBench parallel subagent benchmark:
 NODEBENCH_TOOLBENCH_TASK_LIMIT=6 NODEBENCH_TOOLBENCH_CONCURRENCY=3 npm run mcp:dataset:toolbench:test
 ```
 
-Run all lanes:
+Run all public lanes:
 ```bash
 npm run mcp:dataset:bench:all
 ```
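
Each lane above is tuned by a `*_TASK_LIMIT` / `*_CONCURRENCY` pair: how many fixture tasks to run and how many to run at once. As a sketch of what that pairing typically means in a Node test harness (the `runWithConcurrency` helper below is an illustration, not the package's actual scheduler):

```typescript
// Hypothetical bounded-parallelism runner matching the TASK_LIMIT /
// CONCURRENCY semantics of the benchmark lanes above.
async function runWithConcurrency<T, R>(
  tasks: readonly T[],
  limit: number,
  worker: (task: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(tasks.length);
  let next = 0;
  // Each lane repeatedly pulls the next unclaimed index; JS is
  // single-threaded, so `next++` cannot race between lanes.
  const lane = async (): Promise<void> => {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await worker(tasks[i]);
    }
  };
  await Promise.all(Array.from({ length: Math.min(limit, tasks.length) }, lane));
  return results;
}

// e.g. run the first TASK_LIMIT tasks across CONCURRENCY lanes:
// await runWithConcurrency(fixture.slice(0, 6), 3, runTask);
```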

@@ -137,11 +137,71 @@ Run SWE-bench parallel subagent benchmark:
 NODEBENCH_SWEBENCH_TASK_LIMIT=8 NODEBENCH_SWEBENCH_CONCURRENCY=4 npm run mcp:dataset:swebench:test
 ```
 
-
+Fourth lane (GAIA gated long-horizon tool-augmented tasks):
+- Dataset: `gaia-benchmark/GAIA` (gated)
+- Default config: `2023_level3`
+- Default split: `validation`
+- Source: `https://huggingface.co/datasets/gaia-benchmark/GAIA`
+
+Notes:
+- Fixture is written to `.cache/gaia` (gitignored). Do not commit GAIA question/answer content.
+- Refresh requires `HF_TOKEN` or `HUGGINGFACE_HUB_TOKEN` in your shell.
+- Python deps: `pandas`, `huggingface_hub`, `pyarrow` (or equivalent parquet engine).
+
+Refresh GAIA fixture:
+```bash
+npm run mcp:dataset:gaia:refresh
+```
+
+Run GAIA parallel subagent benchmark:
+```bash
+NODEBENCH_GAIA_TASK_LIMIT=8 NODEBENCH_GAIA_CONCURRENCY=4 npm run mcp:dataset:gaia:test
+```
+
+GAIA capability benchmark (accuracy: LLM-only vs LLM+tools):
+- This runs real model calls and web search. It is disabled by default and only intended for regression checks.
+- Uses Gemini by default. Ensure `GEMINI_API_KEY` is available (repo `.env.local` is loaded by the test).
+- Scoring fixture includes ground-truth answers and MUST remain under `.cache/gaia` (gitignored).
+
+Generate scoring fixture (local only, gated):
+```bash
+npm run mcp:dataset:gaia:capability:refresh
+```
+
+Run capability benchmark:
+```bash
+NODEBENCH_GAIA_CAPABILITY_TASK_LIMIT=6 NODEBENCH_GAIA_CAPABILITY_CONCURRENCY=1 npm run mcp:dataset:gaia:capability:test
+```
+
+GAIA capability benchmark (file-backed lane: PDF / XLSX / CSV):
+- This lane measures the impact of deterministic local parsing tools on GAIA tasks with attachments.
+- Fixture includes ground-truth answers and MUST remain under `.cache/gaia` (gitignored).
+- Attachments are copied into `.cache/gaia/data/<file_path>` for offline deterministic runs after the first download.
+
+Generate file-backed scoring fixture + download attachments (local only, gated):
+```bash
+npm run mcp:dataset:gaia:capability:files:refresh
+```
+
+Run file-backed capability benchmark:
+```bash
+NODEBENCH_GAIA_CAPABILITY_TASK_LIMIT=6 NODEBENCH_GAIA_CAPABILITY_CONCURRENCY=1 npm run mcp:dataset:gaia:capability:files:test
+```
+
+Modes:
+- Recommended (more stable): `NODEBENCH_GAIA_CAPABILITY_TOOLS_MODE=rag`
+- More realistic (higher variance): `NODEBENCH_GAIA_CAPABILITY_TOOLS_MODE=agent` (optional `NODEBENCH_GAIA_CAPABILITY_FORCE_WEB_SEARCH=1`)
+
+Run all public lanes:
 ```bash
 npm run mcp:dataset:bench:all
 ```
 
+Run full lane suite (includes GAIA):
+```bash
+npm run mcp:dataset:bench:full
+```
+
 Implementation files:
 - `packages/mcp-local/src/__tests__/fixtures/generateBfclLongContextFixture.ts`
 - `packages/mcp-local/src/__tests__/fixtures/bfcl_v3_long_context.sample.json`
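
The capability lanes added in this hunk are configured entirely through environment variables. A minimal sketch of how a test might fold them into a typed config; the `parseLaneConfig` helper and its fallback values are assumptions (taken from the example commands above), not the package's actual code:

```typescript
// Hypothetical reader for the GAIA capability lane env vars shown above.
interface GaiaLaneConfig {
  taskLimit: number;
  concurrency: number;
  toolsMode: "rag" | "agent"; // "rag" is the recommended mode per the notes above
  forceWebSearch: boolean;
}

function parseLaneConfig(
  env: Record<string, string | undefined> = process.env,
): GaiaLaneConfig {
  const int = (value: string | undefined, fallback: number): number => {
    const n = Number.parseInt(value ?? "", 10);
    return Number.isFinite(n) && n > 0 ? n : fallback;
  };
  return {
    taskLimit: int(env.NODEBENCH_GAIA_CAPABILITY_TASK_LIMIT, 6),    // fallback assumed
    concurrency: int(env.NODEBENCH_GAIA_CAPABILITY_CONCURRENCY, 1), // fallback assumed
    toolsMode: env.NODEBENCH_GAIA_CAPABILITY_TOOLS_MODE === "agent" ? "agent" : "rag",
    forceWebSearch: env.NODEBENCH_GAIA_CAPABILITY_FORCE_WEB_SEARCH === "1",
  };
}
```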

@@ -152,6 +212,16 @@ Implementation files:
 - `packages/mcp-local/src/__tests__/fixtures/generateSwebenchVerifiedFixture.ts`
 - `packages/mcp-local/src/__tests__/fixtures/swebench_verified.sample.json`
 - `packages/mcp-local/src/__tests__/openDatasetParallelEvalSwebench.test.ts`
+- `packages/mcp-local/src/__tests__/fixtures/generateGaiaLevel3Fixture.py`
+- `.cache/gaia/gaia_2023_level3_validation.sample.json`
+- `packages/mcp-local/src/__tests__/openDatasetParallelEvalGaia.test.ts`
+- `packages/mcp-local/src/__tests__/fixtures/generateGaiaCapabilityFixture.py`
+- `.cache/gaia/gaia_capability_2023_all_validation.sample.json`
+- `packages/mcp-local/src/__tests__/gaiaCapabilityEval.test.ts`
+- `packages/mcp-local/src/__tests__/fixtures/generateGaiaCapabilityFilesFixture.py`
+- `.cache/gaia/gaia_capability_files_2023_all_validation.sample.json`
+- `.cache/gaia/data/...` (local GAIA attachments; do not commit)
+- `packages/mcp-local/src/__tests__/gaiaCapabilityFilesEval.test.ts`
 
 Required tool chain per dataset task:
 - `run_recon`

@@ -176,6 +246,7 @@ Use `getMethodology("overview")` to see all available workflows.
 | Category | Tools | When to Use |
 |----------|-------|-------------|
 | **Web** | `web_search`, `fetch_url` | Research, reading docs, market validation |
+| **Local Files** | `read_pdf_text`, `read_xlsx_file`, `read_csv_file` | Deterministic parsing of local attachments (GAIA file-backed lane) |
 | **GitHub** | `search_github`, `analyze_repo` | Finding libraries, studying implementations |
 | **Verification** | `start_cycle`, `log_phase`, `complete_cycle` | Tracking the flywheel process |
 | **Eval** | `start_eval_run`, `log_test_result` | Test case management |
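
The `read_pdf_text` / `read_xlsx_file` / `read_csv_file` tools added in this row are ordinary MCP tools, so any MCP client can call them. A sketch using the TypeScript MCP SDK; the `filePath` argument name is an assumption, so check the tool's declared input schema for the real parameter names:

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

async function main(): Promise<void> {
  // Spawn the published server over stdio, the same way Claude Code does.
  const transport = new StdioClientTransport({
    command: "npx",
    args: ["-y", "nodebench-mcp"],
  });
  const client = new Client({ name: "local-files-demo", version: "0.0.0" });
  await client.connect(transport);

  // Deterministic local parse of a GAIA attachment (argument name assumed).
  const result = await client.callTool({
    name: "read_csv_file",
    arguments: { filePath: ".cache/gaia/data/example.csv" },
  });
  console.log(result.content);

  await client.close();
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```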

@@ -184,6 +255,7 @@ Use `getMethodology("overview")` to see all available workflows.
 | **Vision** | `analyze_screenshot`, `capture_ui_screenshot` | UI/UX verification |
 | **Bootstrap** | `discover_infrastructure`, `triple_verify`, `self_implement` | Self-setup, triple verification |
 | **Autonomous** | `assess_risk`, `decide_re_update`, `run_self_maintenance` | Risk-aware execution, self-maintenance |
+| **Parallel Agents** | `claim_agent_task`, `release_agent_task`, `list_agent_tasks`, `assign_agent_role`, `get_agent_role`, `log_context_budget`, `run_oracle_comparison`, `get_parallel_status` | Multi-agent coordination, task locking, role specialization, oracle testing |
 | **Meta** | `findTools`, `getMethodology` | Discover tools, get workflow guides |
 
 **→ Quick Refs:** Find tools by keyword: `findTools({ query: "verification" })` | Get workflow guide: `getMethodology({ topic: "..." })` | See [Methodology Topics](#methodology-topics) for all topics
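
Of the tools added in this row, `log_context_budget` has the least obvious mechanics: it is bookkeeping for the context-window budget pattern described in the Parallel Agent Teams section below. A rough sketch of the idea; the event shape and the warning threshold are assumptions:

```typescript
// Hypothetical ledger behind a tool like log_context_budget.
interface BudgetEvent {
  eventType: string; // e.g. "test_output", "file_read"
  tokensUsed: number;
}

class ContextBudget {
  private spent = 0;
  private readonly events: BudgetEvent[] = [];

  constructor(private readonly capTokens: number) {}

  log(event: BudgetEvent): number {
    this.events.push(event);
    this.spent += event.tokensUsed;
    if (this.spent > this.capTokens * 0.8) {
      // ERROR prefix on one line so a coordinating agent can grep for it.
      console.error(`ERROR context budget at ${this.spent}/${this.capTokens} tokens`);
    }
    return this.capTokens - this.spent; // remaining budget
  }
}
```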

@@ -538,11 +610,91 @@ Available via `getMethodology({ topic: "..." })`:
 | `agents_md_maintenance` | Keep docs in sync | [Auto-Update](#auto-update-this-file) |
 | `agent_bootstrap` | Self-discover, triple verify | [Self-Bootstrap](#agent-self-bootstrap-system) |
 | `autonomous_maintenance` | Risk-tiered execution | [Autonomous Maintenance](#autonomous-self-maintenance-system) |
+| `parallel_agent_teams` | Multi-agent coordination, task locking, oracle testing | [Parallel Agent Teams](#parallel-agent-teams) |
+| `self_reinforced_learning` | Trajectory analysis, self-eval, improvement recs | [Self-Reinforced Learning](#self-reinforced-learning-loop) |
 
 **→ Quick Refs:** Find tools: `findTools({ query: "..." })` | Get any methodology: `getMethodology({ topic: "..." })` | See [MCP Tool Categories](#mcp-tool-categories)
 
 ---
 
+## Parallel Agent Teams
+
+Based on Anthropic's ["Building a C Compiler with Parallel Claudes"](https://www.anthropic.com/engineering/building-c-compiler) (Feb 2026).
+
+Run multiple AI agents in parallel on a shared codebase with coordination via task locking, role specialization, context budget management, and oracle-based testing.
+
+### Quick Start — Parallel Agents
+
+```
+1. get_parallel_status({ includeHistory: true })   // Orient: what's happening?
+2. assign_agent_role({ role: "implementer" })      // Specialize
+3. claim_agent_task({ taskKey: "fix_auth" })       // Lock task
+4. ... do work ...
+5. log_context_budget({ eventType: "test_output", tokensUsed: 5000 })  // Track budget
+6. run_oracle_comparison({ testLabel: "auth_output", actualOutput: "...", expectedOutput: "...", oracleSource: "prod_v2" })
+7. release_agent_task({ taskKey: "fix_auth", status: "completed", progressNote: "Fixed JWT, added tests" })
+```
+
+### Predefined Agent Roles
+
+| Role | Focus |
+|------|-------|
+| `implementer` | Primary feature work. Picks failing tests, implements fixes. |
+| `dedup_reviewer` | Finds and coalesces duplicate implementations. |
+| `performance_optimizer` | Profiles bottlenecks, optimizes hot paths. |
+| `documentation_maintainer` | Keeps READMEs and progress files in sync. |
+| `code_quality_critic` | Structural improvements, pattern enforcement. |
+| `test_writer` | Writes targeted tests for edge cases and failure modes. |
+| `security_auditor` | Audits for vulnerabilities, logs CRITICAL gaps. |
+
+### Key Patterns (from Anthropic blog)
+
+- **Task Locking**: Claim before working. If two agents try the same task, the second picks a different one.
+- **Context Window Budget**: Do NOT print thousands of useless bytes. Pre-compute summaries. Use `--fast` mode (1-10% random sample) for large test suites. Log errors with ERROR prefix on same line for grep.
+- **Oracle Testing**: Compare output against known-good reference. Each failing comparison is an independent work item for a parallel agent.
+- **Time Blindness**: Agents can't tell time. Print progress infrequently. Use deterministic random sampling per-agent but randomized across VMs.
+- **Progress Files**: Maintain running docs of status, failed approaches, and remaining tasks. Fresh agent sessions read these to orient.
+- **Delta Debugging**: When tests pass individually but fail together, split the set in half to narrow down the minimal failing combination.
+
+### Bootstrap for External Repos
+
+When nodebench-mcp is connected to a project that lacks parallel agent infrastructure, it can auto-detect gaps and scaffold everything needed:
+
+```
+1. bootstrap_parallel_agents({ projectRoot: "/path/to/their/repo", dryRun: true })
+   // Scans 7 categories: task coordination, roles, oracle, context budget,
+   // progress files, AGENTS.md parallel section, git worktrees
+
+2. bootstrap_parallel_agents({ projectRoot: "...", dryRun: false, techStack: "TypeScript/React" })
+   // Creates .parallel-agents/ dir, progress.md, roles.json, lock dirs, oracle dirs
+
+3. generate_parallel_agents_md({ techStack: "TypeScript/React", projectName: "their-project", maxAgents: 4 })
+   // Generates portable AGENTS.md section — paste into their repo
+
+4. Run the 6-step flywheel plan returned by the bootstrap tool to verify
+5. Fix any issues, re-verify
+6. record_learning({ key: "bootstrap_their_project", content: "...", category: "pattern" })
+```
+
+The generated AGENTS.md section is framework-agnostic and works with any AI agent (Claude, GPT, etc.). It includes:
+- Task locking protocol (file-based, no dependencies)
+- Role definitions and assignment guide
+- Oracle testing workflow with idiomatic examples
+- Context budget rules
+- Progress file protocol
+- Anti-patterns to avoid
+- Optional nodebench-mcp tool mapping table
+
+### MCP Prompts for Parallel Agent Teams
+
+- `parallel-agent-team` — Full team setup with role assignment and task breakdown
+- `oracle-test-harness` — Oracle-based testing setup for a component
+- `bootstrap-parallel-agents` — Detect and scaffold parallel agent infra for any external repo
+
+**→ Quick Refs:** Full methodology: `getMethodology({ topic: "parallel_agent_teams" })` | Find parallel tools: `findTools({ category: "parallel_agents" })` | Bootstrap external repo: `bootstrap_parallel_agents({ projectRoot: "..." })` | See [AI Flywheel](#the-ai-flywheel-mandatory)
+
+---
+
 ## Auto-Update This File
 
 Agents can self-update this file:
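
The task-locking protocol added above is described as file-based with no dependencies. One way such a lock can work, sketched under the assumption that claims map to directories under `.parallel-agents/locks` (the on-disk layout here is illustrative, not the scaffold's documented format):

```typescript
import { mkdirSync, rmSync, writeFileSync } from "node:fs";
import { join } from "node:path";

const LOCK_ROOT = ".parallel-agents/locks"; // assumed location
mkdirSync(LOCK_ROOT, { recursive: true });  // ensure the lock root exists

function claimTask(taskKey: string, agentId: string): boolean {
  try {
    // mkdir without `recursive` is atomic: it throws EEXIST when another
    // agent already holds the lock, so at most one claim can succeed.
    mkdirSync(join(LOCK_ROOT, taskKey));
    writeFileSync(
      join(LOCK_ROOT, taskKey, "owner.json"),
      JSON.stringify({ agentId, claimedAt: Date.now() }),
    );
    return true;
  } catch {
    return false; // already claimed: pick a different task
  }
}

function releaseTask(taskKey: string, progressNote: string): void {
  // Leave a handoff note so a fresh session can orient, then drop the lock.
  writeFileSync(join(LOCK_ROOT, `${taskKey}.note.md`), progressNote);
  rmSync(join(LOCK_ROOT, taskKey), { recursive: true, force: true });
}
```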
package/README.md
CHANGED

@@ -1,264 +1,224 @@
-# NodeBench MCP
+# NodeBench MCP
 
-
+**Make AI agents catch the bugs they normally ship.**
 
-
-- Web search (Gemini/OpenAI/Perplexity)
-- GitHub repository discovery and analysis
-- Job market research
-- AGENTS.md self-maintenance
-- AI vision for screenshot analysis
-- 6-phase verification flywheel
-- SQLite-backed learning database
+One command gives your agent structured research, risk assessment, 3-layer testing, quality gates, and a persistent knowledge base — so every fix is thorough and every insight compounds into future work.
 
-
-
-### 1. Add to Claude Code settings
-
-Add to `~/.claude/settings.json`:
-
-```json
-{
-  "mcpServers": {
-    "nodebench": {
-      "command": "npx",
-      "args": ["-y", "nodebench-mcp"]
-    }
-  }
-}
+```bash
+claude mcp add nodebench -- npx -y nodebench-mcp
 ```
 
-That's it. Restart Claude Code and you have 46 tools.
-
 ---
 
-##
+## Why — What Bare Agents Miss
 
-
-git clone https://github.com/nodebench/nodebench-ai.git
-cd nodebench-ai/packages/mcp-local
-npm install && npm run build
-```
+We benchmarked 9 real production prompts — things like *"The LinkedIn posting pipeline is creating duplicate posts"* and *"The agent loop hits budget but still gets new events"* — comparing a bare agent vs one with NodeBench MCP.
 
-
+| What gets measured | Bare Agent | With NodeBench MCP |
+|---|---|---|
+| Issues detected before deploy | 0 | **13** (4 high, 8 medium, 1 low) |
+| Research findings before coding | 0 | **21** |
+| Risk assessments | 0 | **9** |
+| Test coverage layers | 1 | **3** (static + unit + integration) |
+| Integration failures caught early | 0 | **4** |
+| Regression eval cases created | 0 | **22** |
+| Quality gate rules enforced | 0 | **52** |
+| Deploys blocked by gate violations | 0 | **4** |
+| Knowledge entries banked | 0 | **9** |
+| Blind spots shipped to production | **26** | **0** |
 
-
-{
-  "mcpServers": {
-    "nodebench": {
-      "command": "node",
-      "args": ["/path/to/packages/mcp-local/dist/index.js"]
-    }
-  }
-}
-```
+The bare agent reads the code, implements a fix, runs tests once, and ships. The MCP agent researches first, assesses risk, tracks issues to resolution, runs 3-layer tests, creates regression guards, enforces quality gates, and banks everything as knowledge for next time.
 
-
+Every additional tool call produces a concrete artifact — an issue found, a risk assessed, a regression guarded — that compounds across future tasks.
 
-
+---
 
-
-# Required for web search (pick one)
-export GEMINI_API_KEY="your-key"     # Best: Google Search grounding
-export OPENAI_API_KEY="your-key"     # Alternative: GPT-4o web search
-export PERPLEXITY_API_KEY="your-key" # Alternative: Perplexity
-
-# Required for GitHub (higher rate limits)
-export GITHUB_TOKEN="your-token"     # github.com/settings/tokens
-
-# Required for vision analysis (pick one)
-export GEMINI_API_KEY="your-key"     # Best: Gemini 2.5 Flash
-export OPENAI_API_KEY="your-key"     # Alternative: GPT-4o
-export ANTHROPIC_API_KEY="your-key"  # Alternative: Claude
-```
+## How It Works — 3 Real Examples
 
-###
+### Example 1: Bug fix
 
-
-# Quit and reopen Claude Code, or run:
-claude --mcp-debug
-```
+You type: *"The content queue has 40 items stuck in 'judging' status for 6 hours"*
 
-
+**Bare agent:** Reads the queue code, finds a potential fix, runs tests, ships.
 
-
+**With NodeBench MCP:** The agent runs structured recon and discovers 3 blind spots the bare agent misses:
+- No retry backoff on OpenRouter rate limits (HIGH)
+- JSON regex `match(/\{[\s\S]*\}/)` grabs last `}` — breaks on multi-object responses (MEDIUM)
+- No timeout on LLM call — hung request blocks entire cron for 15+ min (not detected by unit tests)
 
-
-# Check your environment
-> Use setup_local_env to check my development environment
+All 3 are logged as gaps, resolved, regression-tested, and the patterns banked so the next similar bug is fixed faster.
 
-
-> Use search_github to find TypeScript MCP servers with at least 100 stars
+### Example 2: Parallel agents overwriting each other
 
-
-> Use fetch_url to read https://modelcontextprotocol.io/introduction
+You type: *"I launched 3 Claude Code subagents but they keep overwriting each other's changes"*
 
-
-> Use getMethodology("overview") to see all available workflows
-```
-
----
+**Without NodeBench:** Both agents see the same bug and both implement a fix. The third agent re-investigates what agent 1 already solved. Agent 2 hits context limit mid-fix and loses work.
 
-
-
-| Category | Tools | Description |
-|----------|-------|-------------|
-| **Web** | `web_search`, `fetch_url` | Search the web, fetch URLs as markdown |
-| **GitHub** | `search_github`, `analyze_repo` | Find repos, analyze tech stacks |
-| **Documentation** | `update_agents_md`, `research_job_market`, `setup_local_env` | Self-maintaining docs, job research |
-| **Vision** | `discover_vision_env`, `analyze_screenshot`, `manipulate_screenshot` | AI-powered image analysis |
-| **UI Capture** | `capture_ui_screenshot`, `capture_responsive_suite` | Browser screenshots (requires Playwright) |
-| **Verification** | `start_cycle`, `log_phase`, `complete_cycle` | 6-phase dev workflow |
-| **Eval** | `start_eval_run`, `log_test_result`, `list_eval_runs` | Test case tracking |
-| **Quality Gates** | `run_quality_gate`, `get_gate_history` | Pass/fail checkpoints |
-| **Learning** | `record_learning`, `search_learnings`, `search_all_knowledge` | Persistent knowledge base |
-| **Flywheel** | `run_closed_loop`, `check_framework_updates` | Automated workflows |
-| **Recon** | `run_recon`, `log_recon_finding`, `log_gap` | Discovery and gap tracking |
-| **Meta** | `findTools`, `getMethodology` | Tool discovery, methodology guides |
+**With NodeBench MCP:** Each subagent calls `claim_agent_task` to lock its work. Roles are assigned so they don't overlap. Context budget is tracked. Progress notes ensure handoff without starting from scratch.
 
-
+### Example 3: Knowledge compounding
 
-
-
-Ask Claude: `Use getMethodology("topic_name")`
-
-- `overview` — See all methodologies
-- `verification` — 6-phase development cycle
-- `eval` — Test case management
-- `flywheel` — Continuous improvement loop
-- `mandatory_flywheel` — Required verification for changes
-- `reconnaissance` — Codebase discovery
-- `quality_gates` — Pass/fail checkpoints
-- `ui_ux_qa` — Frontend verification
-- `agentic_vision` — AI-powered visual QA
-- `closed_loop` — Build/test before presenting
-- `learnings` — Knowledge persistence
-- `project_ideation` — Validate ideas before building
-- `tech_stack_2026` — Dependency management
-- `telemetry_setup` — Observability setup
-- `agents_md_maintenance` — Keep docs in sync
+Tasks 1-3 start with zero prior knowledge. By task 9, the agent finds 2+ relevant prior findings before writing a single line of code. Bare agents start from zero every time.
 
 ---
 
-##
+## Quick Start
 
-
+### Install (30 seconds)
 
-
-
-
+```bash
+# Claude Code CLI (recommended)
+claude mcp add nodebench -- npx -y nodebench-mcp
+```
+
+Or add to `~/.claude/settings.json` or `.claude.json`:
 
 ```json
 {
-  "
+  "mcpServers": {
     "nodebench": {
-      "command": "
-      "args": ["
+      "command": "npx",
+      "args": ["-y", "nodebench-mcp"]
     }
   }
 }
 ```
 
-
+### First prompts to try
 
-
+```
+# See what's available
+> Use getMethodology("overview") to see all workflows
 
-
+# Before your next task — search for prior knowledge
+> Use search_all_knowledge("what I'm about to work on")
 
-
-
-
-npx playwright install chromium
-
-# Image manipulation
-npm install sharp
+# Run the full verification pipeline on a change
+> Use getMethodology("mandatory_flywheel") and follow the 6 steps
+```
 
-
-npm install cheerio
+### Optional: API keys for web search and vision
 
-
-
-
-npm install @anthropic-ai/sdk  # Anthropic
+```bash
+export GEMINI_API_KEY="your-key"   # Web search + vision (recommended)
+export GITHUB_TOKEN="your-token"   # GitHub (higher rate limits)
 ```
 
 ---
 
-##
-
-
-
-
-
-
-
-
-
-
-
-
-
-
--
+## What You Get
+
+### Core workflow (use these every session)
+
+| When you... | Use this | Impact |
+|---|---|---|
+| Start any task | `search_all_knowledge` | Find prior findings — avoid repeating past mistakes |
+| Research before coding | `run_recon` + `log_recon_finding` | Structured research with surfaced findings |
+| Assess risk before acting | `assess_risk` | Risk tier determines if action needs confirmation |
+| Track implementation | `start_verification_cycle` + `log_gap` | Issues logged with severity, tracked to resolution |
+| Test thoroughly | `log_test_result` (3 layers) | Static + unit + integration vs running tests once |
+| Guard against regression | `start_eval_run` + `record_eval_result` | Eval cases that protect this fix in the future |
+| Gate before deploy | `run_quality_gate` | Boolean rules enforced — violations block deploy |
+| Bank knowledge | `record_learning` | Persisted findings compound across future sessions |
+| Verify completeness | `run_mandatory_flywheel` | 6-step minimum — catches dead code and intent mismatches |
+
+### When running parallel agents (Claude Code subagents, worktrees)
+
+| When you... | Use this | Impact |
+|---|---|---|
+| Prevent duplicate work | `claim_agent_task` / `release_agent_task` | Task locks — each task owned by exactly one agent |
+| Specialize agents | `assign_agent_role` | 7 roles: implementer, test_writer, critic, etc. |
+| Track context usage | `log_context_budget` | Prevents context exhaustion mid-fix |
+| Validate against reference | `run_oracle_comparison` | Compare output against known-good oracle |
+| Orient new sessions | `get_parallel_status` | See what all agents are doing and what's blocked |
+| Bootstrap any repo | `bootstrap_parallel_agents` | Auto-detect gaps, scaffold coordination infra |
+
+### Research and discovery
+
+| When you... | Use this | Impact |
+|---|---|---|
+| Search the web | `web_search` | Gemini/OpenAI/Perplexity — latest docs and updates |
+| Fetch a URL | `fetch_url` | Read any page as clean markdown |
+| Find GitHub repos | `search_github` + `analyze_repo` | Discover and evaluate libraries and patterns |
+| Analyze screenshots | `analyze_screenshot` | AI vision (Gemini/GPT-4o/Claude) for UI QA |
 
 ---
 
-##
+## The Methodology Pipeline
 
-
+NodeBench MCP isn't just a bag of tools — it's a pipeline. Each step feeds the next:
 
 ```
-
-
-
-4. Use analyze_repo to study competitor implementations
-5. Use research_job_market to understand skill demand
+Research → Risk → Implement → Test (3 layers) → Eval → Gate → Learn → Ship
+    ↑                                                              │
+    └──────────── knowledge compounds ─────────────────────────────┘
 ```
 
-
+**Inner loop** (per change): 6-phase verification ensures correctness.
+**Outer loop** (over time): Eval-driven development ensures improvement.
+**Together**: The AI Flywheel — every verification produces eval artifacts, every regression triggers verification.
 
-
-1. Use search_github({ query: "mcp server", language: "typescript", minStars: 100 })
-2. Use analyze_repo({ repoUrl: "owner/repo" }) to see tech stack and patterns
-3. Use fetch_url to read their documentation
-```
+Ask the agent: `Use getMethodology("overview")` to see all 18 methodology topics.
 
-
+---
 
-
-1. Use setup_local_env to scan current environment
-2. Follow the recommendations to install missing SDKs
-3. Use getMethodology("tech_stack_2026") for ongoing maintenance
-```
+## Parallel Agents with Claude Code
 
-
+Based on Anthropic's ["Building a C Compiler with Parallel Claudes"](https://www.anthropic.com/engineering/building-c-compiler) (Feb 2026).
 
-
+**When to use:** Only when running 2+ agent sessions. Single-agent workflows use the standard pipeline above.
 
-
+**How it works with Claude Code's Task tool:**
 
-**
-
--
--
--
--
+1. **COORDINATOR** (your main session) breaks work into independent tasks
+2. Each **Task tool** call spawns a subagent with instructions to:
+   - `claim_agent_task` — lock the task
+   - `assign_agent_role` — specialize (implementer, test_writer, critic, etc.)
+   - Do the work
+   - `release_agent_task` — handoff with progress note
+3. Coordinator calls `get_parallel_status` to monitor all subagents
+4. Coordinator runs `run_quality_gate` on the aggregate result
 
-**
+**MCP Prompts available:**
+- `claude-code-parallel` — Step-by-step Claude Code subagent coordination
+- `parallel-agent-team` — Full team setup with role assignment
+- `oracle-test-harness` — Validate outputs against known-good reference
+- `bootstrap-parallel-agents` — Scaffold parallel infra for any repo
 
-
-2. Agents will auto-discover and follow the protocol
-3. Use `update_agents_md` tool to keep it in sync
+---
 
-
+## Build from Source
 
 ```bash
-
+git clone https://github.com/nodebench/nodebench-ai.git
+cd nodebench-ai/packages/mcp-local
+npm install && npm run build
 ```
 
-
-
-
-
+Then use absolute path:
+
+```json
+{
+  "mcpServers": {
+    "nodebench": {
+      "command": "node",
+      "args": ["/path/to/packages/mcp-local/dist/index.js"]
+    }
+  }
+}
+```
+
+---
+
+## Troubleshooting
+
+**"No search provider available"** — Set `GEMINI_API_KEY`, `OPENAI_API_KEY`, or `PERPLEXITY_API_KEY`
+
+**"GitHub API error 403"** — Set `GITHUB_TOKEN` for higher rate limits
+
+**"Cannot find module"** — Run `npm run build` in the mcp-local directory
+
+**MCP not connecting** — Check path is absolute, run `claude --mcp-debug`, ensure Node.js >= 18
 
 ---
 
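The oracle-testing workflow referenced throughout these docs reduces to diffing actual output against a known-good reference and turning each mismatch into an independent work item. A minimal sketch; the result shape below is illustrative, not `run_oracle_comparison`'s actual schema:

```typescript
// Hypothetical line-by-line oracle comparison.
interface OracleResult {
  testLabel: string;
  oracleSource: string; // where the known-good output came from, e.g. "prod_v2"
  passed: boolean;
  firstDiffLine?: number; // 1-based line of the first mismatch, if any
}

function compareToOracle(
  testLabel: string,
  actualOutput: string,
  expectedOutput: string,
  oracleSource: string,
): OracleResult {
  const actual = actualOutput.split("\n");
  const expected = expectedOutput.split("\n");
  const n = Math.max(actual.length, expected.length);
  for (let i = 0; i < n; i++) {
    if (actual[i] !== expected[i]) {
      // Each failing comparison is a self-contained task a parallel agent can claim.
      return { testLabel, oracleSource, passed: false, firstDiffLine: i + 1 };
    }
  }
  return { testLabel, oracleSource, passed: true };
}
```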

package/dist/__tests__/comparativeBench.test.d.ts
ADDED

@@ -0,0 +1 @@
+export {};