nodebench-mcp 1.2.0 → 1.4.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/NODEBENCH_AGENTS.md +253 -20
- package/README.md +47 -6
- package/STYLE_GUIDE.md +477 -0
- package/dist/__tests__/evalDatasetBench.test.d.ts +1 -0
- package/dist/__tests__/evalDatasetBench.test.js +738 -0
- package/dist/__tests__/evalDatasetBench.test.js.map +1 -0
- package/dist/__tests__/evalHarness.test.d.ts +1 -0
- package/dist/__tests__/evalHarness.test.js +830 -0
- package/dist/__tests__/evalHarness.test.js.map +1 -0
- package/dist/__tests__/fixtures/bfcl_v3_long_context.sample.json +264 -0
- package/dist/__tests__/fixtures/generateBfclLongContextFixture.d.ts +10 -0
- package/dist/__tests__/fixtures/generateBfclLongContextFixture.js +135 -0
- package/dist/__tests__/fixtures/generateBfclLongContextFixture.js.map +1 -0
- package/dist/__tests__/fixtures/generateSwebenchVerifiedFixture.d.ts +14 -0
- package/dist/__tests__/fixtures/generateSwebenchVerifiedFixture.js +189 -0
- package/dist/__tests__/fixtures/generateSwebenchVerifiedFixture.js.map +1 -0
- package/dist/__tests__/fixtures/generateToolbenchInstructionFixture.d.ts +16 -0
- package/dist/__tests__/fixtures/generateToolbenchInstructionFixture.js +154 -0
- package/dist/__tests__/fixtures/generateToolbenchInstructionFixture.js.map +1 -0
- package/dist/__tests__/fixtures/swebench_verified.sample.json +162 -0
- package/dist/__tests__/fixtures/toolbench_instruction.sample.json +109 -0
- package/dist/__tests__/openDatasetParallelEval.test.d.ts +7 -0
- package/dist/__tests__/openDatasetParallelEval.test.js +209 -0
- package/dist/__tests__/openDatasetParallelEval.test.js.map +1 -0
- package/dist/__tests__/openDatasetParallelEvalSwebench.test.d.ts +7 -0
- package/dist/__tests__/openDatasetParallelEvalSwebench.test.js +220 -0
- package/dist/__tests__/openDatasetParallelEvalSwebench.test.js.map +1 -0
- package/dist/__tests__/openDatasetParallelEvalToolbench.test.d.ts +7 -0
- package/dist/__tests__/openDatasetParallelEvalToolbench.test.js +218 -0
- package/dist/__tests__/openDatasetParallelEvalToolbench.test.js.map +1 -0
- package/dist/__tests__/tools.test.js +252 -3
- package/dist/__tests__/tools.test.js.map +1 -1
- package/dist/db.js +20 -0
- package/dist/db.js.map +1 -1
- package/dist/index.js +2 -0
- package/dist/index.js.map +1 -1
- package/dist/tools/agentBootstrapTools.d.ts +5 -1
- package/dist/tools/agentBootstrapTools.js +566 -1
- package/dist/tools/agentBootstrapTools.js.map +1 -1
- package/dist/tools/documentationTools.js +102 -8
- package/dist/tools/documentationTools.js.map +1 -1
- package/dist/tools/learningTools.js +6 -2
- package/dist/tools/learningTools.js.map +1 -1
- package/dist/tools/metaTools.js +112 -1
- package/dist/tools/metaTools.js.map +1 -1
- package/dist/tools/selfEvalTools.d.ts +12 -0
- package/dist/tools/selfEvalTools.js +568 -0
- package/dist/tools/selfEvalTools.js.map +1 -0
- package/package.json +11 -3
package/NODEBENCH_AGENTS.md
CHANGED
@@ -21,7 +21,9 @@ Add to `~/.claude/settings.json`:
 }
 ```
 
-Restart Claude Code.
+Restart Claude Code. 56 tools available immediately.
+
+**→ Quick Refs:** After setup, run `getMethodology("overview")` | First task? See [Verification Cycle](#verification-cycle-workflow) | New to codebase? See [Environment Setup](#environment-setup)
 
 ---
 
@@ -51,10 +53,119 @@ Review the code for:
 ### Step 5: Fix and Re-Verify
 If any gap found: fix it, then restart from Step 1.
 
-### Step 6:
+### Step 6: Live E2E Test (MANDATORY)
+**Before declaring done or publishing:**
+```bash
+echo '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"YOUR_TOOL","arguments":{...}}}' | node dist/index.js
+```
+Every new/modified tool MUST pass stdio E2E test. No exceptions.
+
+For workflow-level changes (verification, eval, recon, quality gates, flywheel, or knowledge tools), also run the long-running open-source benchmark:
+```bash
+npm --prefix packages/mcp-local run dataset:bfcl:refresh
+NODEBENCH_OPEN_DATASET_TASK_LIMIT=12 NODEBENCH_OPEN_DATASET_CONCURRENCY=6 npm --prefix packages/mcp-local run test:open-dataset
+npm --prefix packages/mcp-local run dataset:toolbench:refresh
+NODEBENCH_TOOLBENCH_TASK_LIMIT=6 NODEBENCH_TOOLBENCH_CONCURRENCY=3 npm --prefix packages/mcp-local run test:open-dataset:toolbench
+npm --prefix packages/mcp-local run dataset:swebench:refresh
+NODEBENCH_SWEBENCH_TASK_LIMIT=8 NODEBENCH_SWEBENCH_CONCURRENCY=4 npm --prefix packages/mcp-local run test:open-dataset:swebench
+```
+
+### Step 7: Document Learnings
 Record edge cases discovered. Update this file if needed.
 
-**Rule: No change ships without passing all
+**Rule: No change ships without passing all 7 steps.**
+
+**→ Quick Refs:** Track progress with `start_verification_cycle` | Record findings with `record_learning` | Run gate with `run_quality_gate` | See [Post-Implementation Checklist](#post-implementation-checklist)
+
+---
+
+## Open-Source Long-Running MCP Benchmark
+
+Use open-source long-context tasks to validate real orchestration behavior under parallel load.
+
+- Dataset: `gorilla-llm/Berkeley-Function-Calling-Leaderboard`
+- Split: `BFCL_v3_multi_turn_long_context`
+- Source: `https://huggingface.co/datasets/gorilla-llm/Berkeley-Function-Calling-Leaderboard`
+
+Refresh local fixture:
+```bash
+npm run mcp:dataset:refresh
+```
+
+Run parallel subagent benchmark:
+```bash
+NODEBENCH_OPEN_DATASET_TASK_LIMIT=12 NODEBENCH_OPEN_DATASET_CONCURRENCY=6 npm run mcp:dataset:test
+```
+
+Run refresh + benchmark in one shot:
+```bash
+npm run mcp:dataset:bench
+```
+
+Second lane (ToolBench multi-tool instructions):
+- Dataset: `OpenBMB/ToolBench`
+- Split: `data_example/instruction (G1,G2,G3)`
+- Source: `https://github.com/OpenBMB/ToolBench`
+
+Refresh ToolBench fixture:
+```bash
+npm run mcp:dataset:toolbench:refresh
+```
+
+Run ToolBench parallel subagent benchmark:
+```bash
+NODEBENCH_TOOLBENCH_TASK_LIMIT=6 NODEBENCH_TOOLBENCH_CONCURRENCY=3 npm run mcp:dataset:toolbench:test
+```
+
+Run all lanes:
+```bash
+npm run mcp:dataset:bench:all
+```
+
+Third lane (SWE-bench Verified long-horizon software tasks):
+- Dataset: `princeton-nlp/SWE-bench_Verified`
+- Split: `test`
+- Source: `https://huggingface.co/datasets/princeton-nlp/SWE-bench_Verified`
+
+Refresh SWE-bench fixture:
+```bash
+npm run mcp:dataset:swebench:refresh
+```
+
+Run SWE-bench parallel subagent benchmark:
+```bash
+NODEBENCH_SWEBENCH_TASK_LIMIT=8 NODEBENCH_SWEBENCH_CONCURRENCY=4 npm run mcp:dataset:swebench:test
+```
+
+Run all lanes:
+```bash
+npm run mcp:dataset:bench:all
+```
+
+Implementation files:
+- `packages/mcp-local/src/__tests__/fixtures/generateBfclLongContextFixture.ts`
+- `packages/mcp-local/src/__tests__/fixtures/bfcl_v3_long_context.sample.json`
+- `packages/mcp-local/src/__tests__/openDatasetParallelEval.test.ts`
+- `packages/mcp-local/src/__tests__/fixtures/generateToolbenchInstructionFixture.ts`
+- `packages/mcp-local/src/__tests__/fixtures/toolbench_instruction.sample.json`
+- `packages/mcp-local/src/__tests__/openDatasetParallelEvalToolbench.test.ts`
+- `packages/mcp-local/src/__tests__/fixtures/generateSwebenchVerifiedFixture.ts`
+- `packages/mcp-local/src/__tests__/fixtures/swebench_verified.sample.json`
+- `packages/mcp-local/src/__tests__/openDatasetParallelEvalSwebench.test.ts`
+
+Required tool chain per dataset task:
+- `run_recon`
+- `log_recon_finding`
+- `findTools`
+- `getMethodology`
+- `start_eval_run`
+- `record_eval_result`
+- `complete_eval_run`
+- `run_closed_loop`
+- `run_mandatory_flywheel`
+- `search_all_knowledge`
+
+**→ Quick Refs:** Core process in [AI Flywheel](#the-ai-flywheel-mandatory) | Verification flow in [Verification Cycle](#verification-cycle-workflow) | Loop discipline in [Closed Loop Principle](#closed-loop-principle)
 
 ---
 
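For context, a minimal TypeScript sketch of what the Step 6 stdio check above amounts to, using Node's `child_process` instead of the `echo` pipe. The tool name and arguments are illustrative placeholders (not required values), and it assumes the server answers each JSON-RPC line on stdout:

```typescript
// Sketch of the stdio E2E check from Step 6. Assumes dist/index.js speaks
// JSON-RPC over stdio, as the echo one-liner in the diff does. The tool name
// and arguments below are illustrative placeholders.
import { spawn } from "node:child_process";

const server = spawn("node", ["dist/index.js"], { stdio: ["pipe", "pipe", "inherit"] });

const request = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: { name: "getMethodology", arguments: { topic: "overview" } },
};

let output = "";
server.stdout?.on("data", (chunk) => {
  output += chunk.toString();
  // Treat any well-formed response carrying our request id as a pass.
  if (output.includes('"id":1')) {
    console.log("E2E response received:", output.trim());
    server.kill();
  }
});

// Send the request followed by a newline, mirroring the echo | node pipe.
server.stdin?.write(JSON.stringify(request) + "\n");

// Fail the check if nothing comes back within 10 seconds.
setTimeout(() => {
  console.error("E2E check timed out");
  server.kill();
  process.exitCode = 1;
}, 10_000).unref();
```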
@@ -72,8 +183,11 @@ Use `getMethodology("overview")` to see all available workflows.
 | **Learning** | `record_learning`, `search_all_knowledge` | Persistent knowledge base |
 | **Vision** | `analyze_screenshot`, `capture_ui_screenshot` | UI/UX verification |
 | **Bootstrap** | `discover_infrastructure`, `triple_verify`, `self_implement` | Self-setup, triple verification |
+| **Autonomous** | `assess_risk`, `decide_re_update`, `run_self_maintenance` | Risk-aware execution, self-maintenance |
 | **Meta** | `findTools`, `getMethodology` | Discover tools, get workflow guides |
 
+**→ Quick Refs:** Find tools by keyword: `findTools({ query: "verification" })` | Get workflow guide: `getMethodology({ topic: "..." })` | See [Methodology Topics](#methodology-topics) for all topics
+
 ---
 
 ## Verification Cycle Workflow
@@ -93,6 +207,8 @@ If blocked or failed:
 abandon_cycle({ reason: "Blocked by external dependency" })
 ```
 
+**→ Quick Refs:** Before starting: `search_all_knowledge({ query: "your task" })` | After completing: `record_learning({ ... })` | Run flywheel: See [AI Flywheel](#the-ai-flywheel-mandatory) | Track quality: See [Quality Gates](#quality-gates)
+
 ---
 
 ## Recording Learnings
@@ -113,6 +229,8 @@ Search later with:
 search_all_knowledge({ query: "convex index" })
 ```
 
+**→ Quick Refs:** Search before implementing: `search_all_knowledge` | `search_learnings` and `list_learnings` are DEPRECATED | Part of flywheel Step 7 | See [Verification Cycle](#verification-cycle-workflow)
+
 ---
 
 ## Quality Gates
@@ -132,6 +250,8 @@ run_quality_gate({
 
 Gate history tracks pass/fail over time.
 
+**→ Quick Refs:** Get preset rules: `get_gate_preset({ preset: "ui_ux_qa" })` | View history: `get_gate_history({ gateName: "..." })` | UI/UX gates: See [Vision](#vision-analysis) | Part of flywheel Step 5 re-verify
+
 ---
 
 ## Web Research Workflow
@@ -146,6 +266,8 @@ For market research or tech evaluation:
 5. record_learning({ ... }) // save key findings
 ```
 
+**→ Quick Refs:** Analyze repo structure: `analyze_repo` | Save findings: `record_learning` | Part of: `getMethodology({ topic: "project_ideation" })` | See [Recording Learnings](#recording-learnings)
+
 ---
 
 ## Project Ideation Workflow
@@ -164,6 +286,8 @@ This returns a 6-step process:
 5. Plan Metrics
 6. Gate Approval
 
+**→ Quick Refs:** Research tools: `web_search`, `search_github`, `analyze_repo` | Record requirements: `log_recon_finding` | Create baseline: `start_eval_run` | See [Web Research](#web-research-workflow)
+
 ---
 
 ## Closed Loop Principle
@@ -178,6 +302,8 @@ The loop:
 
 Only when all green: present to user.
 
+**→ Quick Refs:** Track loop: `run_closed_loop({ ... })` | Part of flywheel Steps 1-5 | See [AI Flywheel](#the-ai-flywheel-mandatory) | After loop: See [Post-Implementation Checklist](#post-implementation-checklist)
+
 ---
 
 ## Environment Setup
@@ -193,6 +319,8 @@ Returns:
 - Recommended SDK installations
 - Actionable next steps
 
+**→ Quick Refs:** After setup: `getMethodology("overview")` | Check vision: `discover_vision_env()` | See [API Keys](#api-keys-optional) | Then: See [Verification Cycle](#verification-cycle-workflow)
+
 ---
 
 ## API Keys (Optional)
@@ -206,6 +334,22 @@ Set these for enhanced functionality:
 | `GITHUB_TOKEN` | Higher rate limits (5000/hr vs 60/hr) |
 | `ANTHROPIC_API_KEY` | Alternative vision provider |
 
+**→ Quick Refs:** Check what's available: `setup_local_env({ checkSdks: true })` | Vision capabilities: `discover_vision_env()` | See [Environment Setup](#environment-setup)
+
+---
+
+## Vision Analysis
+
+For UI/UX verification:
+
+```
+1. capture_ui_screenshot({ url: "http://localhost:3000", viewport: "desktop" })
+2. analyze_screenshot({ imageBase64: "...", prompt: "Check accessibility" })
+3. capture_responsive_suite({ url: "...", label: "homepage" })
+```
+
+**→ Quick Refs:** Check capabilities: `discover_vision_env()` | UI QA methodology: `getMethodology({ topic: "ui_ux_qa" })` | Agentic vision: `getMethodology({ topic: "agentic_vision" })` | See [Quality Gates](#quality-gates)
+
 ---
 
 ## Post-Implementation Checklist
@@ -214,13 +358,15 @@ After every implementation, answer these 3 questions:
 
 1. **MCP gaps?** — Were all relevant tools called? Any unexpected results?
 2. **Implementation gaps?** — Dead code? Missing integrations? Hardcoded values?
-3. **Flywheel complete?** — All
+3. **Flywheel complete?** — All 7 steps passed including E2E test?
 
 If any answer reveals a gap: fix it before proceeding.
 
+**→ Quick Refs:** Run self-check: `run_self_maintenance({ scope: "quick" })` | Record learnings: `record_learning` | Update docs: `update_agents_md` | See [AI Flywheel](#the-ai-flywheel-mandatory)
+
 ---
 
-## Agent Self-Bootstrap System
+## Agent Self-Bootstrap System
 
 For agents to self-configure and validate against authoritative sources.
 
@@ -288,27 +434,112 @@ Aggregates findings from multiple sources.
 - https://www.langchain.com/langgraph
 - https://modelcontextprotocol.io/specification/2025-11-25
 
+**→ Quick Refs:** Full methodology: `getMethodology({ topic: "agent_bootstrap" })` | After bootstrap: See [Autonomous Maintenance](#autonomous-self-maintenance-system) | Before implementing: `assess_risk` | See [Triple Verification](#2-triple-verification-with-source-citations)
+
+---
+
+## Autonomous Self-Maintenance System
+
+Aggressive autonomous self-management with risk-aware execution. Based on OpenClaw patterns and Ralph Wiggum stop-hooks.
+
+### 1. Risk-Tiered Execution
+
+Before any action, assess its risk tier:
+
+```
+assess_risk({ action: "push to remote" })
+```
+
+Risk tiers:
+- **Low**: Reading, analyzing, searching — auto-approve
+- **Medium**: Writing local files, running tests — log and proceed
+- **High**: Pushing to remote, posting externally — require confirmation
+
+### 2. Re-Update Before Create
+
+**CRITICAL:** Before creating new files, check if updating existing is better:
+
+```
+decide_re_update({
+  targetContent: "New agent instructions",
+  contentType: "instructions",
+  existingFiles: ["AGENTS.md", "README.md"]
+})
+```
+
+This prevents file sprawl and maintains single source of truth.
+
+### 3. Self-Maintenance Cycles
+
+Run periodic self-checks:
+
+```
+run_self_maintenance({
+  scope: "standard", // quick | standard | thorough
+  autoFix: false,
+  dryRun: true
+})
+```
+
+Checks: TypeScript compilation, documentation sync, tool counts, test coverage.
+
+### 4. Directory Scaffolding (OpenClaw Style)
+
+When adding infrastructure, use standardized scaffolding:
+
+```
+scaffold_directory({
+  component: "agent_loop", // or: telemetry, evaluation, multi_channel, etc.
+  includeTests: true,
+  dryRun: true
+})
+```
+
+Creates organized subdirectories with proper test structure.
+
+### 5. Autonomous Loops with Guardrails
+
+For multi-step autonomous tasks, use controlled loops:
+
+```
+run_autonomous_loop({
+  goal: "Verify all tools pass static analysis",
+  maxIterations: 5,
+  maxDurationMs: 60000,
+  stopOnFirstFailure: true
+})
+```
+
+Implements Ralph Wiggum pattern with checkpoints and stop conditions.
+
+**→ Quick Refs:** Full methodology: `getMethodology({ topic: "autonomous_maintenance" })` | Before actions: `assess_risk` | Before new files: `decide_re_update` | Scaffold structure: `scaffold_directory` | See [Self-Bootstrap](#agent-self-bootstrap-system)
+
 ---
 
 ## Methodology Topics
 
 Available via `getMethodology({ topic: "..." })`:
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+| Topic | Description | Quick Ref |
+|-------|-------------|-----------|
+| `overview` | See all methodologies | Start here |
+| `verification` | 6-phase development cycle | [AI Flywheel](#the-ai-flywheel-mandatory) |
+| `eval` | Test case management | [Quality Gates](#quality-gates) |
+| `flywheel` | Continuous improvement loop | [AI Flywheel](#the-ai-flywheel-mandatory) |
+| `mandatory_flywheel` | Required verification for changes | [AI Flywheel](#the-ai-flywheel-mandatory) |
+| `reconnaissance` | Codebase discovery | [Self-Bootstrap](#agent-self-bootstrap-system) |
+| `quality_gates` | Pass/fail checkpoints | [Quality Gates](#quality-gates) |
+| `ui_ux_qa` | Frontend verification | [Vision Analysis](#vision-analysis) |
+| `agentic_vision` | AI-powered visual QA | [Vision Analysis](#vision-analysis) |
+| `closed_loop` | Build/test before presenting | [Closed Loop](#closed-loop-principle) |
+| `learnings` | Knowledge persistence | [Recording Learnings](#recording-learnings) |
+| `project_ideation` | Validate ideas before building | [Project Ideation](#project-ideation-workflow) |
+| `tech_stack_2026` | Dependency management | [Environment Setup](#environment-setup) |
+| `agents_md_maintenance` | Keep docs in sync | [Auto-Update](#auto-update-this-file) |
+| `agent_bootstrap` | Self-discover, triple verify | [Self-Bootstrap](#agent-self-bootstrap-system) |
+| `autonomous_maintenance` | Risk-tiered execution | [Autonomous Maintenance](#autonomous-self-maintenance-system) |
+
+**→ Quick Refs:** Find tools: `findTools({ query: "..." })` | Get any methodology: `getMethodology({ topic: "..." })` | See [MCP Tool Categories](#mcp-tool-categories)
 
 ---
 
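The `run_autonomous_loop` hunk above names three guardrails: an iteration cap, a wall-clock budget, and stop-on-first-failure. A rough TypeScript sketch of that pattern under those assumptions; the `Step` type and `runGuardedLoop` helper are hypothetical and not exported by nodebench-mcp:

```typescript
// Hypothetical sketch of the guardrail pattern described for run_autonomous_loop:
// bounded iterations, a wall-clock budget, and optional stop-on-first-failure.
// These names only illustrate the idea; they are not part of the package API.
type Step = () => Promise<{ ok: boolean; note: string }>;

interface LoopOptions {
  maxIterations: number;      // e.g. 5
  maxDurationMs: number;      // e.g. 60_000
  stopOnFirstFailure: boolean;
}

async function runGuardedLoop(goal: string, steps: Step[], opts: LoopOptions) {
  const startedAt = Date.now();
  const checkpoints: Array<{ iteration: number; ok: boolean; note: string }> = [];

  for (let iteration = 1; iteration <= opts.maxIterations; iteration++) {
    // Stop condition 1: wall-clock budget exhausted.
    if (Date.now() - startedAt > opts.maxDurationMs) break;

    for (const step of steps) {
      const result = await step();
      checkpoints.push({ iteration, ...result });
      // Stop condition 2: first failure aborts the loop when requested.
      if (!result.ok && opts.stopOnFirstFailure) {
        return { goal, completed: false, checkpoints };
      }
    }

    // Stop condition 3: a fully green iteration means the goal is met.
    const thisIteration = checkpoints.filter((c) => c.iteration === iteration);
    if (thisIteration.every((c) => c.ok)) {
      return { goal, completed: true, checkpoints };
    }
  }
  return { goal, completed: false, checkpoints };
}
```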
@@ -330,6 +561,8 @@ Or read current structure:
 update_agents_md({ operation: "read", projectRoot: "/path/to/project" })
 ```
 
+**→ Quick Refs:** Before updating: `decide_re_update({ contentType: "instructions", ... })` | After updating: Run flywheel Steps 1-7 | See [Re-Update Before Create](#2-re-update-before-create)
+
 ---
 
 ## License
package/README.md
CHANGED
@@ -1,6 +1,6 @@
 # NodeBench MCP Server
 
-A fully local, zero-config MCP server with
+A fully local, zero-config MCP server with **60 tools** for AI-powered development workflows.
 
 **Features:**
 - Web search (Gemini/OpenAI/Perplexity)
@@ -9,13 +9,23 @@ A fully local, zero-config MCP server with 46 tools for AI-powered development w
 - AGENTS.md self-maintenance
 - AI vision for screenshot analysis
 - 6-phase verification flywheel
+- Self-reinforced learning (trajectory analysis, health reports, improvement recommendations)
+- Autonomous agent bootstrap and self-maintenance
 - SQLite-backed learning database
 
-## Quick Start (
+## Quick Start (30 seconds)
 
-###
+### Option A: Claude Code CLI (recommended)
 
-
+```bash
+claude mcp add nodebench -- npx -y nodebench-mcp
+```
+
+That's it. One command, 60 tools. No restart needed.
+
+### Option B: Manual config
+
+Add to `~/.claude/settings.json` (global) or `.claude.json` (per-project):
 
 ```json
 {
@@ -28,7 +38,7 @@ Add to `~/.claude/settings.json`:
 }
 ```
 
-
+Then restart Claude Code.
 
 ---
 
@@ -114,11 +124,14 @@ In Claude Code, try these prompts:
 | **Learning** | `record_learning`, `search_learnings`, `search_all_knowledge` | Persistent knowledge base |
 | **Flywheel** | `run_closed_loop`, `check_framework_updates` | Automated workflows |
 | **Recon** | `run_recon`, `log_recon_finding`, `log_gap` | Discovery and gap tracking |
+| **Agent Bootstrap** | `bootstrap_project`, `setup_local_env`, `triple_verify`, `self_implement` | Self-discover infrastructure, auto-configure |
+| **Autonomous** | `assess_risk`, `decide_re_update`, `run_self_maintenance`, `run_autonomous_loop` | Risk-tiered autonomous execution |
+| **Self-Eval** | `log_tool_call`, `get_trajectory_analysis`, `get_self_eval_report`, `get_improvement_recommendations` | Self-reinforced learning loop |
 | **Meta** | `findTools`, `getMethodology` | Tool discovery, methodology guides |
 
 ---
 
-## Methodology Topics (
+## Methodology Topics (17 total)
 
 Ask Claude: `Use getMethodology("topic_name")`
 
@@ -137,6 +150,34 @@ Ask Claude: `Use getMethodology("topic_name")`
 - `tech_stack_2026` — Dependency management
 - `telemetry_setup` — Observability setup
 - `agents_md_maintenance` — Keep docs in sync
+- `agent_bootstrap` — Self-discover and auto-configure infrastructure
+- `autonomous_maintenance` — Risk-tiered autonomous execution
+- `self_reinforced_learning` — Trajectory analysis and improvement loop
+
+---
+
+## Self-Reinforced Learning (v1.4.0)
+
+The MCP learns from its own usage. As you develop with the tools, the system accumulates trajectory data and surfaces recommendations.
+
+```
+Use → Log → Analyze → Recommend → Apply → Re-analyze
+```
+
+**Try it:**
+```
+> Use getMethodology("self_reinforced_learning") for the 5-step guide
+> Use get_self_eval_report to see your project's health score
+> Use get_improvement_recommendations to find actionable improvements
+> Use get_trajectory_analysis to see your tool usage patterns
+```
+
+The health score is a weighted composite:
+- Cycle completion (25%) — Are verification cycles being completed?
+- Eval pass rate (25%) — Are eval runs succeeding?
+- Gap resolution (20%) — Are logged gaps getting resolved?
+- Gate pass rate (15%) — Are quality gates passing?
+- Tool error rate (15%) — Are tools running without errors?
 
 ---
 
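The health-score bullets in the README hunk above read as a plain weighted sum. A small sketch under that assumption, with inputs normalized to 0-1 and the error rate contributing inversely; the exact normalization used by `get_self_eval_report` is not shown in this diff:

```typescript
// Illustrative health-score composite using the weights listed above.
// How get_self_eval_report actually normalizes each input is not part of this
// diff, so the 0-1 inputs and the inverted error rate are assumptions.
interface HealthInputs {
  cycleCompletion: number; // completed / started verification cycles, 0-1
  evalPassRate: number;    // passing eval results / total, 0-1
  gapResolution: number;   // resolved gaps / logged gaps, 0-1
  gatePassRate: number;    // passing quality gates / total runs, 0-1
  toolErrorRate: number;   // erroring tool calls / total calls, 0-1
}

function healthScore(m: HealthInputs): number {
  const score =
    0.25 * m.cycleCompletion +
    0.25 * m.evalPassRate +
    0.20 * m.gapResolution +
    0.15 * m.gatePassRate +
    0.15 * (1 - m.toolErrorRate); // fewer errors raise the score
  return Math.round(score * 100);  // report as 0-100
}

// Example: strong cycles and evals, some unresolved gaps, 5% tool errors.
console.log(healthScore({
  cycleCompletion: 0.9,
  evalPassRate: 0.85,
  gapResolution: 0.6,
  gatePassRate: 0.8,
  toolErrorRate: 0.05,
})); // → 82
```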