nodebench-mcp 1.1.0 → 1.4.0
- package/NODEBENCH_AGENTS.md +323 -18
- package/STYLE_GUIDE.md +477 -0
- package/dist/__tests__/evalDatasetBench.test.d.ts +1 -0
- package/dist/__tests__/evalDatasetBench.test.js +738 -0
- package/dist/__tests__/evalDatasetBench.test.js.map +1 -0
- package/dist/__tests__/evalHarness.test.d.ts +1 -0
- package/dist/__tests__/evalHarness.test.js +830 -0
- package/dist/__tests__/evalHarness.test.js.map +1 -0
- package/dist/__tests__/fixtures/bfcl_v3_long_context.sample.json +264 -0
- package/dist/__tests__/fixtures/generateBfclLongContextFixture.d.ts +10 -0
- package/dist/__tests__/fixtures/generateBfclLongContextFixture.js +135 -0
- package/dist/__tests__/fixtures/generateBfclLongContextFixture.js.map +1 -0
- package/dist/__tests__/fixtures/generateSwebenchVerifiedFixture.d.ts +14 -0
- package/dist/__tests__/fixtures/generateSwebenchVerifiedFixture.js +189 -0
- package/dist/__tests__/fixtures/generateSwebenchVerifiedFixture.js.map +1 -0
- package/dist/__tests__/fixtures/generateToolbenchInstructionFixture.d.ts +16 -0
- package/dist/__tests__/fixtures/generateToolbenchInstructionFixture.js +154 -0
- package/dist/__tests__/fixtures/generateToolbenchInstructionFixture.js.map +1 -0
- package/dist/__tests__/fixtures/swebench_verified.sample.json +162 -0
- package/dist/__tests__/fixtures/toolbench_instruction.sample.json +109 -0
- package/dist/__tests__/openDatasetParallelEval.test.d.ts +7 -0
- package/dist/__tests__/openDatasetParallelEval.test.js +209 -0
- package/dist/__tests__/openDatasetParallelEval.test.js.map +1 -0
- package/dist/__tests__/openDatasetParallelEvalSwebench.test.d.ts +7 -0
- package/dist/__tests__/openDatasetParallelEvalSwebench.test.js +220 -0
- package/dist/__tests__/openDatasetParallelEvalSwebench.test.js.map +1 -0
- package/dist/__tests__/openDatasetParallelEvalToolbench.test.d.ts +7 -0
- package/dist/__tests__/openDatasetParallelEvalToolbench.test.js +218 -0
- package/dist/__tests__/openDatasetParallelEvalToolbench.test.js.map +1 -0
- package/dist/__tests__/tools.test.js +256 -4
- package/dist/__tests__/tools.test.js.map +1 -1
- package/dist/db.js +20 -0
- package/dist/db.js.map +1 -1
- package/dist/index.js +4 -0
- package/dist/index.js.map +1 -1
- package/dist/tools/agentBootstrapTools.d.ts +23 -0
- package/dist/tools/agentBootstrapTools.js +1541 -0
- package/dist/tools/agentBootstrapTools.js.map +1 -0
- package/dist/tools/documentationTools.js +102 -8
- package/dist/tools/documentationTools.js.map +1 -1
- package/dist/tools/learningTools.js +6 -2
- package/dist/tools/learningTools.js.map +1 -1
- package/dist/tools/metaTools.js +176 -1
- package/dist/tools/metaTools.js.map +1 -1
- package/dist/tools/selfEvalTools.d.ts +12 -0
- package/dist/tools/selfEvalTools.js +568 -0
- package/dist/tools/selfEvalTools.js.map +1 -0
- package/package.json +11 -3
package/NODEBENCH_AGENTS.md
CHANGED
@@ -21,7 +21,9 @@ Add to `~/.claude/settings.json`:
 }
 ```
 
-Restart Claude Code.
+Restart Claude Code. 56 tools available immediately.
+
+**→ Quick Refs:** After setup, run `getMethodology("overview")` | First task? See [Verification Cycle](#verification-cycle-workflow) | New to codebase? See [Environment Setup](#environment-setup)
 
 ---
 
@@ -51,10 +53,119 @@ Review the code for:
 ### Step 5: Fix and Re-Verify
 If any gap found: fix it, then restart from Step 1.
 
-### Step 6:
+### Step 6: Live E2E Test (MANDATORY)
+**Before declaring done or publishing:**
+```bash
+echo '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"YOUR_TOOL","arguments":{...}}}' | node dist/index.js
+```
+Every new/modified tool MUST pass stdio E2E test. No exceptions.
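The one-line request in the command above can be generated instead of typed by hand. The following is a minimal TypeScript sketch, not code from the package: `ToolCallRequest` and `buildToolCallLine` are hypothetical names, and the `findTools` arguments are only an illustration borrowed from the Quick Refs in this file.

```typescript
// Shape of a JSON-RPC 2.0 request that invokes a tool over stdio.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: { [key: string]: unknown } };
}

// Build one newline-terminated request line for a tool call.
// (Hypothetical helper; the package does not ship this function.)
function buildToolCallLine(
  toolName: string,
  toolArgs: { [key: string]: unknown },
  id: number = 1,
): string {
  const request: ToolCallRequest = {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: toolArgs },
  };
  return JSON.stringify(request) + "\n";
}

// Illustrative arguments only; substitute the tool under test and its inputs.
const requestLine = buildToolCallLine("findTools", { query: "verification" });
```

The resulting line can be piped to `node dist/index.js` in place of the `echo` payload. Check each tool's own input schema for its real arguments.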
+
+For workflow-level changes (verification, eval, recon, quality gates, flywheel, or knowledge tools), also run the long-running open-source benchmark:
+```bash
+npm --prefix packages/mcp-local run dataset:bfcl:refresh
+NODEBENCH_OPEN_DATASET_TASK_LIMIT=12 NODEBENCH_OPEN_DATASET_CONCURRENCY=6 npm --prefix packages/mcp-local run test:open-dataset
+npm --prefix packages/mcp-local run dataset:toolbench:refresh
+NODEBENCH_TOOLBENCH_TASK_LIMIT=6 NODEBENCH_TOOLBENCH_CONCURRENCY=3 npm --prefix packages/mcp-local run test:open-dataset:toolbench
+npm --prefix packages/mcp-local run dataset:swebench:refresh
+NODEBENCH_SWEBENCH_TASK_LIMIT=8 NODEBENCH_SWEBENCH_CONCURRENCY=4 npm --prefix packages/mcp-local run test:open-dataset:swebench
+```
+
+### Step 7: Document Learnings
 Record edge cases discovered. Update this file if needed.
 
-**Rule: No change ships without passing all
+**Rule: No change ships without passing all 7 steps.**
+
+**→ Quick Refs:** Track progress with `start_verification_cycle` | Record findings with `record_learning` | Run gate with `run_quality_gate` | See [Post-Implementation Checklist](#post-implementation-checklist)
+
+---
+
+## Open-Source Long-Running MCP Benchmark
+
+Use open-source long-context tasks to validate real orchestration behavior under parallel load.
+
+- Dataset: `gorilla-llm/Berkeley-Function-Calling-Leaderboard`
+- Split: `BFCL_v3_multi_turn_long_context`
+- Source: `https://huggingface.co/datasets/gorilla-llm/Berkeley-Function-Calling-Leaderboard`
+
+Refresh local fixture:
+```bash
+npm run mcp:dataset:refresh
+```
+
+Run parallel subagent benchmark:
+```bash
+NODEBENCH_OPEN_DATASET_TASK_LIMIT=12 NODEBENCH_OPEN_DATASET_CONCURRENCY=6 npm run mcp:dataset:test
+```
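Judging by their names, the two variables in the command above cap how many dataset tasks are sampled and how many run at the same time. The sketch below illustrates that reading in TypeScript. It is an assumption, not the package's scheduler: `readPositiveInt`, `planWaves`, the fallback value `1`, and the fixed-size "wave" grouping are all hypothetical.

```typescript
// Parse a positive integer setting, falling back when it is missing or invalid.
function readPositiveInt(value: string | undefined, fallback: number): number {
  const parsed = parseInt(value === undefined ? "" : value, 10);
  return isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

// Keep the first `taskLimit` tasks and group them into waves of `concurrency`.
function planWaves<T>(tasks: T[], taskLimit: number, concurrency: number): T[][] {
  const selected = tasks.slice(0, taskLimit);
  const step = Math.max(1, concurrency); // guard against a zero step
  const waves: T[][] = [];
  for (let start = 0; start < selected.length; start += step) {
    waves.push(selected.slice(start, start + step));
  }
  return waves;
}

// Values mirror the command above; the fallback of 1 is a hypothetical default.
const benchEnv: { [key: string]: string | undefined } = {
  NODEBENCH_OPEN_DATASET_TASK_LIMIT: "12",
  NODEBENCH_OPEN_DATASET_CONCURRENCY: "6",
};
const taskLimit = readPositiveInt(benchEnv["NODEBENCH_OPEN_DATASET_TASK_LIMIT"], 1);
const concurrency = readPositiveInt(benchEnv["NODEBENCH_OPEN_DATASET_CONCURRENCY"], 1);

// Twenty placeholder task ids stand in for dataset rows.
const taskIds: string[] = [];
for (let i = 0; i < 20; i++) taskIds.push("task-" + i);
const waves = planWaves(taskIds, taskLimit, concurrency);
```

Fixed waves are simpler to reason about than a worker pool, though a pool keeps every slot busy when task durations vary.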
+
+Run refresh + benchmark in one shot:
+```bash
+npm run mcp:dataset:bench
+```
+
+Second lane (ToolBench multi-tool instructions):
+- Dataset: `OpenBMB/ToolBench`
+- Split: `data_example/instruction (G1,G2,G3)`
+- Source: `https://github.com/OpenBMB/ToolBench`
+
+Refresh ToolBench fixture:
+```bash
+npm run mcp:dataset:toolbench:refresh
+```
+
+Run ToolBench parallel subagent benchmark:
+```bash
+NODEBENCH_TOOLBENCH_TASK_LIMIT=6 NODEBENCH_TOOLBENCH_CONCURRENCY=3 npm run mcp:dataset:toolbench:test
+```
+
+Run all lanes:
+```bash
+npm run mcp:dataset:bench:all
+```
+
+Third lane (SWE-bench Verified long-horizon software tasks):
+- Dataset: `princeton-nlp/SWE-bench_Verified`
+- Split: `test`
+- Source: `https://huggingface.co/datasets/princeton-nlp/SWE-bench_Verified`
+
+Refresh SWE-bench fixture:
+```bash
+npm run mcp:dataset:swebench:refresh
+```
+
+Run SWE-bench parallel subagent benchmark:
+```bash
+NODEBENCH_SWEBENCH_TASK_LIMIT=8 NODEBENCH_SWEBENCH_CONCURRENCY=4 npm run mcp:dataset:swebench:test
+```
+
+Run all lanes:
+```bash
+npm run mcp:dataset:bench:all
+```
+
+Implementation files:
+- `packages/mcp-local/src/__tests__/fixtures/generateBfclLongContextFixture.ts`
+- `packages/mcp-local/src/__tests__/fixtures/bfcl_v3_long_context.sample.json`
+- `packages/mcp-local/src/__tests__/openDatasetParallelEval.test.ts`
+- `packages/mcp-local/src/__tests__/fixtures/generateToolbenchInstructionFixture.ts`
+- `packages/mcp-local/src/__tests__/fixtures/toolbench_instruction.sample.json`
+- `packages/mcp-local/src/__tests__/openDatasetParallelEvalToolbench.test.ts`
+- `packages/mcp-local/src/__tests__/fixtures/generateSwebenchVerifiedFixture.ts`
+- `packages/mcp-local/src/__tests__/fixtures/swebench_verified.sample.json`
+- `packages/mcp-local/src/__tests__/openDatasetParallelEvalSwebench.test.ts`
+
+Required tool chain per dataset task:
+- `run_recon`
+- `log_recon_finding`
+- `findTools`
+- `getMethodology`
+- `start_eval_run`
+- `record_eval_result`
+- `complete_eval_run`
+- `run_closed_loop`
+- `run_mandatory_flywheel`
+- `search_all_knowledge`
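The tool chain listed above lends itself to a simple coverage check over a task's recorded calls. The TypeScript sketch below shows one way to do that. The tool names come from the list; `REQUIRED_TOOL_CHAIN`, `missingTools`, and the sample call log are hypothetical and do not reflect how the shipped tests assert coverage.

```typescript
// Tool names listed above as the required chain for each dataset task.
const REQUIRED_TOOL_CHAIN: readonly string[] = [
  "run_recon",
  "log_recon_finding",
  "findTools",
  "getMethodology",
  "start_eval_run",
  "record_eval_result",
  "complete_eval_run",
  "run_closed_loop",
  "run_mandatory_flywheel",
  "search_all_knowledge",
];

// Return the required tools that never appear in a task's recorded calls,
// preserving the order of the required chain.
function missingTools(calledTools: readonly string[]): string[] {
  return REQUIRED_TOOL_CHAIN.filter(
    (tool) => calledTools.indexOf(tool) === -1,
  );
}

// Hypothetical call log for one task: three of the ten tools were invoked.
const gaps = missingTools(["run_recon", "findTools", "start_eval_run"]);
```

An empty result means the task exercised the whole chain. A non-empty result names exactly which calls to add.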
+
+**→ Quick Refs:** Core process in [AI Flywheel](#the-ai-flywheel-mandatory) | Verification flow in [Verification Cycle](#verification-cycle-workflow) | Loop discipline in [Closed Loop Principle](#closed-loop-principle)
 
 ---
 
@@ -71,8 +182,12 @@ Use `getMethodology("overview")` to see all available workflows.
 | **Quality Gates** | `run_quality_gate`, `get_gate_history` | Pass/fail checkpoints |
 | **Learning** | `record_learning`, `search_all_knowledge` | Persistent knowledge base |
 | **Vision** | `analyze_screenshot`, `capture_ui_screenshot` | UI/UX verification |
+| **Bootstrap** | `discover_infrastructure`, `triple_verify`, `self_implement` | Self-setup, triple verification |
+| **Autonomous** | `assess_risk`, `decide_re_update`, `run_self_maintenance` | Risk-aware execution, self-maintenance |
 | **Meta** | `findTools`, `getMethodology` | Discover tools, get workflow guides |
 
+**→ Quick Refs:** Find tools by keyword: `findTools({ query: "verification" })` | Get workflow guide: `getMethodology({ topic: "..." })` | See [Methodology Topics](#methodology-topics) for all topics
+
 ---
 
 ## Verification Cycle Workflow
@@ -92,6 +207,8 @@ If blocked or failed:
 abandon_cycle({ reason: "Blocked by external dependency" })
 ```
 
+**→ Quick Refs:** Before starting: `search_all_knowledge({ query: "your task" })` | After completing: `record_learning({ ... })` | Run flywheel: See [AI Flywheel](#the-ai-flywheel-mandatory) | Track quality: See [Quality Gates](#quality-gates)
+
 ---
 
 ## Recording Learnings
@@ -112,6 +229,8 @@ Search later with:
 search_all_knowledge({ query: "convex index" })
 ```
 
+**→ Quick Refs:** Search before implementing: `search_all_knowledge` | `search_learnings` and `list_learnings` are DEPRECATED | Part of flywheel Step 7 | See [Verification Cycle](#verification-cycle-workflow)
+
 ---
 
 ## Quality Gates
@@ -131,6 +250,8 @@ run_quality_gate({
 
 Gate history tracks pass/fail over time.
 
+**→ Quick Refs:** Get preset rules: `get_gate_preset({ preset: "ui_ux_qa" })` | View history: `get_gate_history({ gateName: "..." })` | UI/UX gates: See [Vision](#vision-analysis) | Part of flywheel Step 5 re-verify
+
 ---
 
 ## Web Research Workflow
@@ -145,6 +266,8 @@ For market research or tech evaluation:
 5. record_learning({ ... }) // save key findings
 ```
 
+**→ Quick Refs:** Analyze repo structure: `analyze_repo` | Save findings: `record_learning` | Part of: `getMethodology({ topic: "project_ideation" })` | See [Recording Learnings](#recording-learnings)
+
 ---
 
 ## Project Ideation Workflow
@@ -163,6 +286,8 @@ This returns a 6-step process:
 5. Plan Metrics
 6. Gate Approval
 
+**→ Quick Refs:** Research tools: `web_search`, `search_github`, `analyze_repo` | Record requirements: `log_recon_finding` | Create baseline: `start_eval_run` | See [Web Research](#web-research-workflow)
+
 ---
 
 ## Closed Loop Principle
@@ -177,6 +302,8 @@ The loop:
 
 Only when all green: present to user.
 
+**→ Quick Refs:** Track loop: `run_closed_loop({ ... })` | Part of flywheel Steps 1-5 | See [AI Flywheel](#the-ai-flywheel-mandatory) | After loop: See [Post-Implementation Checklist](#post-implementation-checklist)
+
 ---
 
 ## Environment Setup
@@ -192,6 +319,8 @@ Returns:
 - Recommended SDK installations
 - Actionable next steps
 
+**→ Quick Refs:** After setup: `getMethodology("overview")` | Check vision: `discover_vision_env()` | See [API Keys](#api-keys-optional) | Then: See [Verification Cycle](#verification-cycle-workflow)
+
 ---
 
 ## API Keys (Optional)
@@ -205,6 +334,22 @@ Set these for enhanced functionality:
 | `GITHUB_TOKEN` | Higher rate limits (5000/hr vs 60/hr) |
 | `ANTHROPIC_API_KEY` | Alternative vision provider |
 
+**→ Quick Refs:** Check what's available: `setup_local_env({ checkSdks: true })` | Vision capabilities: `discover_vision_env()` | See [Environment Setup](#environment-setup)
+
+---
+
+## Vision Analysis
+
+For UI/UX verification:
+
+```
+1. capture_ui_screenshot({ url: "http://localhost:3000", viewport: "desktop" })
+2. analyze_screenshot({ imageBase64: "...", prompt: "Check accessibility" })
+3. capture_responsive_suite({ url: "...", label: "homepage" })
+```
+
+**→ Quick Refs:** Check capabilities: `discover_vision_env()` | UI QA methodology: `getMethodology({ topic: "ui_ux_qa" })` | Agentic vision: `getMethodology({ topic: "agentic_vision" })` | See [Quality Gates](#quality-gates)
+
 ---
 
 ## Post-Implementation Checklist
@@ -213,30 +358,188 @@ After every implementation, answer these 3 questions:
 
 1. **MCP gaps?** — Were all relevant tools called? Any unexpected results?
 2. **Implementation gaps?** — Dead code? Missing integrations? Hardcoded values?
-3. **Flywheel complete?** — All
+3. **Flywheel complete?** — All 7 steps passed including E2E test?
 
 If any answer reveals a gap: fix it before proceeding.
 
+**→ Quick Refs:** Run self-check: `run_self_maintenance({ scope: "quick" })` | Record learnings: `record_learning` | Update docs: `update_agents_md` | See [AI Flywheel](#the-ai-flywheel-mandatory)
+
+---
+
+## Agent Self-Bootstrap System
+
+For agents to self-configure and validate against authoritative sources.
+
+### 1. Discover Existing Infrastructure
+```
+discover_infrastructure({
+categories: ["agent_loop", "telemetry", "evaluation", "verification"],
+depth: "thorough"
+})
+```
+
+Returns: discovered patterns, missing components, bootstrap plan.
+
+### 2. Triple Verification (with Source Citations)
+
+Run 3-layer verification with authoritative sources:
+
+```
+triple_verify({
+target: "my-feature",
+scope: "full",
+includeWebSearch: true,
+generateInstructions: true
+})
+```
+
+**V1: Internal Analysis** — Checks codebase patterns
+**V2: External Validation** — Cross-references Anthropic, OpenAI, LangChain, MCP spec
+**V3: Synthesis** — Generates recommendations with source citations
+
+### 3. Self-Implement Missing Components
+```
+self_implement({
+component: "telemetry", // or: agent_loop, evaluation, verification, multi_channel
+dryRun: true
+})
+```
+
+Generates production-ready templates based on industry patterns.
+
+### 4. Generate Self-Instructions
+```
+generate_self_instructions({
+format: "skills_md", // or: rules_md, guidelines, claude_md
+includeExternalSources: true
+})
+```
+
+Creates persistent instructions with authoritative source citations.
+
+### 5. Multi-Channel Information Gathering
+```
+connect_channels({
+channels: ["web", "github", "slack", "docs"],
+query: "agent verification patterns",
+aggressive: true
+})
+```
+
+Aggregates findings from multiple sources.
+
+### Authoritative Sources (Tier 1)
+- https://www.anthropic.com/research/building-effective-agents
+- https://openai.github.io/openai-agents-python/
+- https://www.langchain.com/langgraph
+- https://modelcontextprotocol.io/specification/2025-11-25
+
+**→ Quick Refs:** Full methodology: `getMethodology({ topic: "agent_bootstrap" })` | After bootstrap: See [Autonomous Maintenance](#autonomous-self-maintenance-system) | Before implementing: `assess_risk` | See [Triple Verification](#2-triple-verification-with-source-citations)
+
+---
+
+## Autonomous Self-Maintenance System
+
+Aggressive autonomous self-management with risk-aware execution. Based on OpenClaw patterns and Ralph Wiggum stop-hooks.
+
+### 1. Risk-Tiered Execution
+
+Before any action, assess its risk tier:
+
+```
+assess_risk({ action: "push to remote" })
+```
+
+Risk tiers:
+- **Low**: Reading, analyzing, searching — auto-approve
+- **Medium**: Writing local files, running tests — log and proceed
+- **High**: Pushing to remote, posting externally — require confirmation
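The three tiers above map each action to a handling policy. A minimal TypeScript sketch of that mapping follows. It is not the implementation of `assess_risk`: the keyword lists, `classifyAction`, and `POLICY_BY_TIER` are hypothetical, and real risk assessment would need more than substring matching.

```typescript
type RiskTier = "low" | "medium" | "high";
type Policy = "auto-approve" | "log and proceed" | "require confirmation";

// Policy per tier, taken from the tier list above.
const POLICY_BY_TIER: { [tier in RiskTier]: Policy } = {
  low: "auto-approve",
  medium: "log and proceed",
  high: "require confirmation",
};

// Hypothetical keyword stems, checked from highest risk to lowest.
const HIGH_RISK_STEMS = ["push", "post"];
const MEDIUM_RISK_STEMS = ["writ", "test"];

function containsAny(text: string, stems: string[]): boolean {
  return stems.some((stem) => text.indexOf(stem) !== -1);
}

// Classify a free-text action description into a risk tier.
function classifyAction(action: string): RiskTier {
  const text = action.toLowerCase();
  if (containsAny(text, HIGH_RISK_STEMS)) return "high";
  if (containsAny(text, MEDIUM_RISK_STEMS)) return "medium";
  return "low";
}

const pushPolicy = POLICY_BY_TIER[classifyAction("push to remote")];
```

This sketch treats unrecognized actions as low risk to mirror the tier list. A stricter gate could default unknown actions to a higher tier instead.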
+
+### 2. Re-Update Before Create
+
+**CRITICAL:** Before creating new files, check if updating existing is better:
+
+```
+decide_re_update({
+targetContent: "New agent instructions",
+contentType: "instructions",
+existingFiles: ["AGENTS.md", "README.md"]
+})
+```
+
+This prevents file sprawl and maintains single source of truth.
+
+### 3. Self-Maintenance Cycles
+
+Run periodic self-checks:
+
+```
+run_self_maintenance({
+scope: "standard", // quick | standard | thorough
+autoFix: false,
+dryRun: true
+})
+```
+
+Checks: TypeScript compilation, documentation sync, tool counts, test coverage.
+
+### 4. Directory Scaffolding (OpenClaw Style)
+
+When adding infrastructure, use standardized scaffolding:
+
+```
+scaffold_directory({
+component: "agent_loop", // or: telemetry, evaluation, multi_channel, etc.
+includeTests: true,
+dryRun: true
+})
+```
+
+Creates organized subdirectories with proper test structure.
+
+### 5. Autonomous Loops with Guardrails
+
+For multi-step autonomous tasks, use controlled loops:
+
+```
+run_autonomous_loop({
+goal: "Verify all tools pass static analysis",
+maxIterations: 5,
+maxDurationMs: 60000,
+stopOnFirstFailure: true
+})
+```
+
+Implements Ralph Wiggum pattern with checkpoints and stop conditions.
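The three guardrail parameters in the call above (`maxIterations`, `maxDurationMs`, `stopOnFirstFailure`) can be pictured as stop conditions on an ordinary loop. The TypeScript sketch below shows that idea under stated assumptions. `runBoundedLoop`, `LoopResult`, and the `Step` callback are hypothetical, and the sketch is synchronous for brevity. It does not describe how `run_autonomous_loop` is implemented.

```typescript
interface LoopOptions {
  maxIterations: number;
  maxDurationMs: number;
  stopOnFirstFailure: boolean;
}

interface LoopResult {
  iterations: number;
  failures: number;
  stoppedBecause: "goal_met" | "max_iterations" | "timeout" | "failure";
}

// One step reports whether it passed and whether the goal is now met.
type Step = (iteration: number) => { ok: boolean; done: boolean };

// Run `step` until the goal is met or one of the guardrails trips.
// `now` is injectable so the time budget can be tested deterministically.
function runBoundedLoop(
  step: Step,
  options: LoopOptions,
  now: () => number = Date.now,
): LoopResult {
  const startedAt = now();
  let failures = 0;
  for (let i = 0; i < options.maxIterations; i++) {
    // Check the time budget before starting each iteration.
    if (now() - startedAt >= options.maxDurationMs) {
      return { iterations: i, failures, stoppedBecause: "timeout" };
    }
    const outcome = step(i);
    if (!outcome.ok) {
      failures++;
      if (options.stopOnFirstFailure) {
        return { iterations: i + 1, failures, stoppedBecause: "failure" };
      }
    }
    if (outcome.done) {
      return { iterations: i + 1, failures, stoppedBecause: "goal_met" };
    }
  }
  return { iterations: options.maxIterations, failures, stoppedBecause: "max_iterations" };
}

// Same limits as the call above; the step reaches its goal on the third pass.
const loopResult = runBoundedLoop(
  (i) => ({ ok: true, done: i === 2 }),
  { maxIterations: 5, maxDurationMs: 60000, stopOnFirstFailure: true },
);
```

Returning the stop reason alongside the counters lets a caller distinguish a finished task from one that merely ran out of budget.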
+
+**→ Quick Refs:** Full methodology: `getMethodology({ topic: "autonomous_maintenance" })` | Before actions: `assess_risk` | Before new files: `decide_re_update` | Scaffold structure: `scaffold_directory` | See [Self-Bootstrap](#agent-self-bootstrap-system)
+
 ---
 
 ## Methodology Topics
 
 Available via `getMethodology({ topic: "..." })`:
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+| Topic | Description | Quick Ref |
+|-------|-------------|-----------|
+| `overview` | See all methodologies | Start here |
+| `verification` | 6-phase development cycle | [AI Flywheel](#the-ai-flywheel-mandatory) |
+| `eval` | Test case management | [Quality Gates](#quality-gates) |
+| `flywheel` | Continuous improvement loop | [AI Flywheel](#the-ai-flywheel-mandatory) |
+| `mandatory_flywheel` | Required verification for changes | [AI Flywheel](#the-ai-flywheel-mandatory) |
+| `reconnaissance` | Codebase discovery | [Self-Bootstrap](#agent-self-bootstrap-system) |
+| `quality_gates` | Pass/fail checkpoints | [Quality Gates](#quality-gates) |
+| `ui_ux_qa` | Frontend verification | [Vision Analysis](#vision-analysis) |
+| `agentic_vision` | AI-powered visual QA | [Vision Analysis](#vision-analysis) |
+| `closed_loop` | Build/test before presenting | [Closed Loop](#closed-loop-principle) |
+| `learnings` | Knowledge persistence | [Recording Learnings](#recording-learnings) |
+| `project_ideation` | Validate ideas before building | [Project Ideation](#project-ideation-workflow) |
+| `tech_stack_2026` | Dependency management | [Environment Setup](#environment-setup) |
+| `agents_md_maintenance` | Keep docs in sync | [Auto-Update](#auto-update-this-file) |
+| `agent_bootstrap` | Self-discover, triple verify | [Self-Bootstrap](#agent-self-bootstrap-system) |
+| `autonomous_maintenance` | Risk-tiered execution | [Autonomous Maintenance](#autonomous-self-maintenance-system) |
+
+**→ Quick Refs:** Find tools: `findTools({ query: "..." })` | Get any methodology: `getMethodology({ topic: "..." })` | See [MCP Tool Categories](#mcp-tool-categories)
 
 ---
 
@@ -258,6 +561,8 @@ Or read current structure:
 update_agents_md({ operation: "read", projectRoot: "/path/to/project" })
 ```
 
+**→ Quick Refs:** Before updating: `decide_re_update({ contentType: "instructions", ... })` | After updating: Run flywheel Steps 1-7 | See [Re-Update Before Create](#2-re-update-before-create)
+
 ---
 
 ## License