nodebench-mcp 2.10.0 → 2.11.0
- package/NODEBENCH_AGENTS.md +86 -3
- package/README.md +19 -3
- package/dist/__tests__/toolsetGatingEval.test.js +67 -33
- package/dist/__tests__/toolsetGatingEval.test.js.map +1 -1
- package/dist/index.js +5 -4
- package/dist/index.js.map +1 -1
- package/dist/tools/localFileTools.js +207 -0
- package/dist/tools/localFileTools.js.map +1 -1
- package/package.json +2 -2
package/NODEBENCH_AGENTS.md
CHANGED
@@ -21,9 +21,26 @@ Add to `~/.claude/settings.json`:
 }
 ```
 
-Restart Claude Code. 89 tools available immediately.
+Restart Claude Code. 89+ tools available immediately.
 
-
+### Preset Selection
+
+By default all toolsets are enabled. Use `--preset` to start with a scoped subset:
+
+```json
+{
+  "mcpServers": {
+    "nodebench": {
+      "command": "npx",
+      "args": ["-y", "nodebench-mcp", "--preset", "meta"]
+    }
+  }
+}
+```
+
+The **meta** preset is the recommended front door for new agents: start with just 5 discovery tools, use `discover_tools` to find what you need, then self-escalate to a larger preset. See [Toolset Gating & Presets](#toolset-gating--presets) for the full breakdown.
+
+**→ Quick Refs:** After setup, run `getMethodology("overview")` | First task? See [Verification Cycle](#verification-cycle-workflow) | New to codebase? See [Environment Setup](#environment-setup) | Preset options: See [Toolset Gating & Presets](#toolset-gating--presets)
 
 ---
 
@@ -261,8 +278,73 @@ Use `getMethodology("overview")` to see all available workflows.
 | **Security** | `scan_dependencies`, `run_code_analysis` | Dependency auditing, static code analysis |
 | **Platform** | `query_daily_brief`, `query_funding_entities`, `query_research_queue`, `publish_to_queue` | Convex platform bridge: intelligence, funding, research, publishing |
 | **Meta** | `findTools`, `getMethodology` | Discover tools, get workflow guides |
+| **Discovery** | `discover_tools`, `get_tool_quick_ref`, `get_workflow_chain` | Hybrid search, quick refs, workflow chains |
+
+Meta + Discovery tools (5 total) are **always included** regardless of preset. See [Toolset Gating & Presets](#toolset-gating--presets).
+
+**→ Quick Refs:** Find tools by keyword: `findTools({ query: "verification" })` | Hybrid search: `discover_tools({ query: "security" })` | Get workflow guide: `getMethodology({ topic: "..." })` | See [Methodology Topics](#methodology-topics) for all topics
+
+---
+
+## Toolset Gating & Presets
+
+NodeBench MCP supports 4 presets that control which domain toolsets are loaded at startup. Meta + Discovery tools (5 total) are **always included** on top of any preset.
+
+### Preset Table
+
+| Preset | Domain Toolsets | Domain Tools | Total (with meta+discovery) | Use Case |
+|--------|----------------|-------------|----------------------------|----------|
+| **meta** | 0 | 0 | 5 | Discovery-only front door. Agents start here and self-escalate. |
+| **lite** | 7 | ~35 | ~40 | Lightweight verification-focused workflows. CI bots, quick checks. |
+| **core** | 16 | ~75 | ~80 | Full development workflow. Most agent sessions. |
+| **full** | all | 89+ | 94+ | Everything enabled. Benchmarking, exploration, advanced use. |
+
+### Usage
+
+```bash
+npx nodebench-mcp --preset meta   # Discovery-only (5 tools)
+npx nodebench-mcp --preset lite   # Verification + eval + recon + security
+npx nodebench-mcp --preset core   # Full dev workflow without vision/parallel
+npx nodebench-mcp --preset full   # All toolsets (default)
+npx nodebench-mcp --toolsets verification,eval,recon   # Custom selection
+npx nodebench-mcp --exclude vision,ui_capture          # Exclude specific toolsets
+```
+
+### The Meta Preset — Discovery-Only Front Door
+
+The **meta** preset loads zero domain tools. Agents start with only 5 tools:
+
+| Tool | Purpose |
+|------|---------|
+| `findTools` | Keyword search across all registered tools |
+| `getMethodology` | Get workflow guides by topic |
+| `discover_tools` | Hybrid search with relevance scoring (richer than findTools) |
+| `get_tool_quick_ref` | Quick reference card for any specific tool |
+| `get_workflow_chain` | Recommended tool sequence for common workflows |
+
+This is the recommended starting point for autonomous agents. The self-escalation pattern:
+
+```
+1. Start with --preset meta (5 tools)
+2. discover_tools({ query: "what I need to do" })    // Find relevant tools
+3. get_workflow_chain({ workflow: "verification" })  // Get the tool sequence
+4. If needed tools are not loaded:
+   → Restart with --preset core or --preset full
+   → Or use --toolsets to add specific domains
+5. Proceed with full workflow
+```
+
+### Preset Domain Breakdown
+
+**meta** (0 domains): No domain tools. Meta + Discovery only.
+
+**lite** (7 domains): `verification`, `eval`, `quality_gate`, `learning`, `recon`, `security`, `boilerplate`
+
+**core** (16 domains): Everything in lite plus `flywheel`, `bootstrap`, `self_eval`, `llm`, `platform`, `research_writing`, `flicker_detection`, `figma_flow`, `benchmark`
+
+**full** (all domains): All toolsets in TOOLSET_MAP including `ui_capture`, `vision`, `local_file`, `web`, `github`, `docs`, `parallel`, and everything in core.
 
-**→ Quick Refs:**
+**→ Quick Refs:** Check current toolset: `findTools({ query: "*" })` | Self-escalate: restart with `--preset core` | See [MCP Tool Categories](#mcp-tool-categories) | CLI help: `npx nodebench-mcp --help`
 
 ---
 

@@ -616,6 +698,7 @@ Available via `getMethodology({ topic: "..." })`:
 | `autonomous_maintenance` | Risk-tiered execution | [Autonomous Maintenance](#autonomous-self-maintenance-system) |
 | `parallel_agent_teams` | Multi-agent coordination, task locking, oracle testing | [Parallel Agent Teams](#parallel-agent-teams) |
 | `self_reinforced_learning` | Trajectory analysis, self-eval, improvement recs | [Self-Reinforced Learning](#self-reinforced-learning-loop) |
+| `toolset_gating` | 4 presets (meta, lite, core, full) and self-escalation | [Toolset Gating & Presets](#toolset-gating--presets) |
 
 **→ Quick Refs:** Find tools: `findTools({ query: "..." })` | Get any methodology: `getMethodology({ topic: "..." })` | See [MCP Tool Categories](#mcp-tool-categories)
 
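The self-escalation pattern documented in the NODEBENCH_AGENTS.md additions above can be sketched as code. This is a hedged illustration, not part of the package: `callTool` and the response fields (`loaded`, `tools`) are hypothetical placeholders for an MCP client helper — only the tool names `discover_tools` and `get_workflow_chain` come from the diff.

```javascript
// Sketch of the meta-preset self-escalation loop. `callTool` is a
// hypothetical MCP client helper; the response shapes (`loaded`, `tools`)
// are illustrative assumptions, not the real nodebench-mcp schemas.
async function selfEscalate(callTool, taskDescription) {
  // Under --preset meta only the 5 discovery tools are loaded.
  const found = await callTool("discover_tools", { query: taskDescription });
  const chain = await callTool("get_workflow_chain", { workflow: "verification" });
  // Tools the recommended chain needs but the current preset did not load.
  const missing = chain.tools.filter((name) => !found.loaded.includes(name));
  if (missing.length > 0) {
    // Self-escalate: restart the server with a larger preset (or --toolsets).
    return { action: "restart", args: ["--preset", "core"], missing };
  }
  return { action: "proceed", tools: chain.tools };
}
```

The agent itself cannot add tools to a running session; escalation means restarting the server process with broader gating flags, as the README notes.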
package/README.md
CHANGED
@@ -39,7 +39,7 @@ Every additional tool call produces a concrete artifact — an issue found, a ri
 
 **QA engineer** — Transitioned a manual QA workflow website into an AI agent-driven app for a pet care messaging platform. Uses NodeBench's quality gates, verification cycles, and eval runs to ensure the AI agent handles edge cases that manual QA caught but bare AI agents miss.
 
-Both found different subsets of the 129 tools useful — which is why v2.8 ships with `--preset`
+Both found different subsets of the 129 tools useful — which is why v2.8 ships with 4 `--preset` levels to load only what you need.
 
 ---
 

@@ -80,6 +80,9 @@ Tasks 1-3 start with zero prior knowledge. By task 9, the agent finds 2+ relevan
 # Claude Code CLI — all 129 tools
 claude mcp add nodebench -- npx -y nodebench-mcp
 
+# Or start with discovery only — 5 tools, agents self-escalate to what they need
+claude mcp add nodebench -- npx -y nodebench-mcp --preset meta
+
 # Or start lean — 39 tools, ~70% less token overhead
 claude mcp add nodebench -- npx -y nodebench-mcp --preset lite
 ```

@@ -304,7 +307,18 @@ Based on Anthropic's ["Building a C Compiler with Parallel Claudes"](https://www
 
 ### Presets
 
+| Preset | Tools | Use case |
+|---|---|---|
+| `meta` | 5 | Discovery-only front door — agents start here and self-escalate via `discover_tools` |
+| `lite` | 39 | Core methodology — verification, eval, gates, learning, recon, security, boilerplate |
+| `core` | 87 | Full workflow — adds flywheel, bootstrap, self-eval, llm, platform, research_writing, flicker_detection, figma_flow, benchmark |
+| `full` | 129 | Everything (default) |
+
 ```bash
+# Meta — 5 tools (discovery-only: findTools, getMethodology, discover_tools, get_tool_quick_ref, get_workflow_chain)
+# Agents start here and self-escalate to the tools they need
+claude mcp add nodebench -- npx -y nodebench-mcp --preset meta
+
 # Lite — 39 tools (verification, eval, gates, learning, recon, security, boilerplate + meta + discovery)
 claude mcp add nodebench -- npx -y nodebench-mcp --preset lite
 

@@ -322,7 +336,7 @@ Or in config:
   "mcpServers": {
     "nodebench": {
       "command": "npx",
-      "args": ["-y", "nodebench-mcp", "--preset", "
+      "args": ["-y", "nodebench-mcp", "--preset", "meta"]
     }
   }
 }

@@ -369,10 +383,12 @@ npx nodebench-mcp --help
 | boilerplate | 2 | Scaffold NodeBench projects + status |
 | benchmark | 3 | Autonomous benchmark lifecycle (C-compiler pattern) |
 
-Always included (regardless of gating):
+Always included (regardless of gating) — these 5 tools form the `meta` preset:
 - Meta: `findTools`, `getMethodology`
 - Discovery: `discover_tools`, `get_tool_quick_ref`, `get_workflow_chain`
 
+The `meta` preset loads **only** these 5 tools (0 domain tools). Agents use `discover_tools` to find what they need and self-escalate.
+
 ---
 
 ## Build from Source
@@ -75,6 +75,7 @@ const TOOLSET_MAP = {
     benchmark: cCompilerBenchmarkTools,
 };
 const PRESETS = {
+    meta: [],
     lite: ["verification", "eval", "quality_gate", "learning", "recon", "security", "boilerplate"],
     core: ["verification", "eval", "quality_gate", "learning", "flywheel", "recon", "bootstrap", "self_eval", "llm", "security", "platform", "research_writing", "flicker_detection", "figma_flow", "boilerplate", "benchmark"],
     full: Object.keys(TOOLSET_MAP),
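The eval hunks below call `buildToolset(preset)`; a minimal sketch of how a `PRESETS` map like the one in this hunk could resolve to a flat tool list. The toolset contents and the `ALWAYS_ON` helper here are illustrative assumptions — only the preset shape, the empty `meta` preset, and the two always-on meta tools come from the diff.

```javascript
// Minimal sketch: resolve a preset name to a flat tool list from a
// TOOLSET_MAP-style registry. The tool entries are hypothetical.
const TOOLSET_MAP = {
  verification: [{ name: "run_verification" }], // illustrative entries
  eval: [{ name: "run_eval" }],
};
const PRESETS = {
  meta: [],                        // discovery-only: no domain toolsets
  lite: ["verification", "eval"],  // truncated stand-in for the real list
  full: Object.keys(TOOLSET_MAP),
};
// Meta + discovery tools are always included on top of any preset.
const ALWAYS_ON = [{ name: "findTools" }, { name: "getMethodology" }];

function buildToolset(preset) {
  const domains = PRESETS[preset] ?? PRESETS.full;
  return [...ALWAYS_ON, ...domains.flatMap((d) => TOOLSET_MAP[d] ?? [])];
}

console.log(buildToolset("meta").length); // 2: the always-on tools only
```

Because `meta` maps to an empty domain list, the always-on tools fall out of the same code path as every other preset — no special case is needed.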
@@ -721,41 +722,51 @@ async function cleanupAll() {
 const allTrajectories = [];
 describe("Toolset Gating Eval", () => {
     afterAll(async () => { await cleanupAll(); });
-    for (const preset of ["lite", "core", "full"]) {
+    for (const preset of ["meta", "lite", "core", "full"]) {
         describe(`Preset: ${preset}`, () => {
             for (const scenario of SCENARIOS) {
                 it(`${preset}/${scenario.id}: runs 8-phase pipeline`, async () => {
                     const t = await runTrajectory(preset, scenario);
                     allTrajectories.push(t);
-                    //
+                    // Meta phase always succeeds (findTools + getMethodology always present)
                     const metaPhase = t.phases.find((p) => p.phase === "meta");
                     expect(metaPhase?.success).toBe(true);
-
-
-
-
-
-
-
-
-
-
-
+                    if (preset === "meta") {
+                        // meta preset: only meta tools available — all other phases skipped
+                        expect(t.phasesCompleted).toBe(1); // only meta phase
+                        expect(t.toolCount).toBe(2); // findTools + getMethodology
+                    }
+                    else {
+                        // lite, core, full: domain tools available
+                        const reconPhase = t.phases.find((p) => p.phase === "recon");
+                        expect(reconPhase?.success).toBe(true);
+                        const verifyPhase = t.phases.find((p) => p.phase === "verification");
+                        expect(verifyPhase?.success).toBe(true);
+                        const evalPhase = t.phases.find((p) => p.phase === "eval");
+                        expect(evalPhase?.success).toBe(true);
+                        const gatePhase = t.phases.find((p) => p.phase === "quality-gate");
+                        expect(gatePhase?.success).toBe(true);
+                        // Knowledge phase depends on preset (learning tools in lite + core + full)
+                        const knowledgePhase = t.phases.find((p) => p.phase === "knowledge");
+                        expect(knowledgePhase?.success).toBe(true);
+                    }
                 }, 30_000);
             }
         });
     }
     describe("Flywheel availability", () => {
-        it("lite
-        const
-        for (const t of
+        it("meta and lite presets do NOT have flywheel tools", () => {
+            const noFlywheel = allTrajectories.filter((t) => t.preset === "meta" || t.preset === "lite");
+            for (const t of noFlywheel) {
                 const fw = t.phases.find((p) => p.phase === "flywheel");
                 expect(fw?.success).toBe(false);
-
+                if (t.preset === "lite") {
+                    expect(fw?.toolsMissing).toContain("run_mandatory_flywheel");
+                }
             }
         });
         it("core and full presets HAVE flywheel tools", () => {
-            const coreFullTrajectories = allTrajectories.filter((t) => t.preset
+            const coreFullTrajectories = allTrajectories.filter((t) => t.preset === "core" || t.preset === "full");
             for (const t of coreFullTrajectories) {
                 expect(t.flywheelComplete).toBe(true);
             }
@@ -784,16 +795,16 @@ describe("Toolset Gating Eval", () => {
         });
     });
     describe("Self-eval availability", () => {
-        it("lite
-        const
-        for (const t of
+        it("meta and lite do NOT have self-eval tools", () => {
+            const noSelfEval = allTrajectories.filter((t) => t.preset === "meta" || t.preset === "lite");
+            for (const t of noSelfEval) {
                 const se = t.phases.find((p) => p.phase === "self-eval");
                 if (se)
                     expect(se.success).toBe(false);
             }
         });
         it("core and full HAVE self-eval tools", () => {
-            const coreFullTrajectories = allTrajectories.filter((t) => t.preset
+            const coreFullTrajectories = allTrajectories.filter((t) => t.preset === "core" || t.preset === "full");
             for (const t of coreFullTrajectories) {
                 const se = t.phases.find((p) => p.phase === "self-eval");
                 expect(se?.success).toBe(true);
@@ -801,6 +812,14 @@ describe("Toolset Gating Eval", () => {
         });
     });
     describe("Token surface area reduction", () => {
+        it("meta has the fewest tools (only meta tools)", () => {
+            const metaT = allTrajectories.find((t) => t.preset === "meta");
+            const liteT = allTrajectories.find((t) => t.preset === "lite");
+            expect(metaT.toolCount).toBe(2); // findTools + getMethodology only
+            expect(metaT.toolCount).toBeLessThan(liteT.toolCount);
+            const reduction = 1 - metaT.toolCount / liteT.toolCount;
+            expect(reduction).toBeGreaterThan(0.9); // meta is 90%+ fewer tools than lite
+        });
         it("lite reduces tool count and estimated token overhead vs full", () => {
             const liteT = allTrajectories.find((t) => t.preset === "lite");
             const fullT = allTrajectories.find((t) => t.preset === "full");
@@ -809,11 +828,13 @@ describe("Toolset Gating Eval", () => {
             const reduction = 1 - liteT.toolCount / fullT.toolCount;
             expect(reduction).toBeGreaterThan(0.5); // lite is at least 50% fewer tools
         });
-        it("
+        it("presets are ordered: meta < lite < core < full", () => {
+            const metaT = allTrajectories.find((t) => t.preset === "meta");
             const liteT = allTrajectories.find((t) => t.preset === "lite");
             const coreT = allTrajectories.find((t) => t.preset === "core");
             const fullT = allTrajectories.find((t) => t.preset === "full");
-            expect(
+            expect(metaT.toolCount).toBeLessThan(liteT.toolCount);
+            expect(liteT.toolCount).toBeLessThan(coreT.toolCount);
             expect(coreT.toolCount).toBeLessThan(fullT.toolCount);
         });
     });
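The reduction thresholds asserted in the hunk above are comfortably met by the counts quoted elsewhere in this diff (the README's 39-tool lite and 129-tool full, and the test's own comment that meta's exercised `toolCount` is 2). A quick arithmetic check, under those assumed counts:

```javascript
// Verify the reduction math behind the >0.9 and >0.5 assertions, using
// counts quoted in the diff (meta toolCount of 2 per the test comment;
// lite/full totals from the README preset table).
const counts = { meta: 2, lite: 39, core: 87, full: 129 };
const metaVsLite = 1 - counts.meta / counts.lite; // ≈ 0.949, clears > 0.9
const liteVsFull = 1 - counts.lite / counts.full; // ≈ 0.698, clears > 0.5
console.log(metaVsLite.toFixed(3), liteVsFull.toFixed(3)); // 0.949 0.698
```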
@@ -823,7 +844,7 @@ describe("Toolset Gating Eval", () => {
 // ═══════════════════════════════════════════════════════════════════════════
 describe("Toolset Gating Report", () => {
     it("generates trajectory comparison across presets", () => {
-        expect(allTrajectories.length).toBe(
+        expect(allTrajectories.length).toBe(36); // 4 presets × 9 scenarios
         console.log("\n");
         console.log("╔══════════════════════════════════════════════════════════════════════════════╗");
         console.log("║ TOOLSET GATING EVAL — Trajectory Comparison ║");

@@ -834,7 +855,7 @@ describe("Toolset Gating Report", () => {
         console.log("┌──────────────────────────────────────────────────────────────────────────────┐");
         console.log("│ 1. TOOL COUNT & ESTIMATED TOKEN OVERHEAD │");
         console.log("├──────────────────────────────────────────────────────────────────────────────┤");
-        for (const preset of ["lite", "core", "full"]) {
+        for (const preset of ["meta", "lite", "core", "full"]) {
             const t = allTrajectories.find((tr) => tr.preset === preset);
             const bar = "█".repeat(Math.round(t.toolCount / 3));
             console.log(`│ ${preset.padEnd(6)} ${String(t.toolCount).padStart(3)} tools ~${String(t.estimatedSchemaTokens).padStart(5)} tokens ${bar}`.padEnd(79) + "│");

@@ -855,7 +876,7 @@ describe("Toolset Gating Report", () => {
         const allPhaseNames = ["meta", "recon", "risk", "verification", "eval", "quality-gate", "knowledge", "flywheel", "parallel", "self-eval"];
         for (const phase of allPhaseNames) {
             const cols = [];
-            for (const preset of ["lite", "core", "full"]) {
+            for (const preset of ["meta", "lite", "core", "full"]) {
                 const trajectories = allTrajectories.filter((t) => t.preset === preset);
                 const phaseResults = trajectories.map((t) => t.phases.find((p) => p.phase === phase));
                 const present = phaseResults.some((p) => p);

@@ -886,7 +907,7 @@ describe("Toolset Gating Report", () => {
             { label: "Total tool calls", key: "totalToolCalls" },
         ]) {
             const cols = [];
-            for (const preset of ["lite", "core", "full"]) {
+            for (const preset of ["meta", "lite", "core", "full"]) {
                 const sum = allTrajectories
                     .filter((t) => t.preset === preset)
                     .reduce((s, t) => s + t[metric.key], 0);

@@ -901,7 +922,7 @@ describe("Toolset Gating Report", () => {
             { label: "Flywheel complete", fn: (t) => t.flywheelComplete },
         ]) {
             const cols = [];
-            for (const preset of ["lite", "core", "full"]) {
+            for (const preset of ["meta", "lite", "core", "full"]) {
                 const count = allTrajectories
                     .filter((t) => t.preset === preset)
                     .filter(metric.fn).length;

@@ -915,7 +936,7 @@ describe("Toolset Gating Report", () => {
         console.log("┌──────────────────────────────────────────────────────────────────────────────┐");
         console.log("│ 4. TOOLS MISSING BY PRESET (what you lose with gating) │");
         console.log("├──────────────────────────────────────────────────────────────────────────────┤");
-        for (const preset of ["lite", "core"]) {
+        for (const preset of ["meta", "lite", "core"]) {
             const missingCalls = callLog.filter((c) => c.preset === preset && c.status === "missing");
             const uniqueMissing = [...new Set(missingCalls.map((c) => c.tool))];
             if (uniqueMissing.length > 0) {
@@ -984,7 +1005,7 @@ describe("Toolset Gating Report", () => {
         console.log("┌──────────────────────────────────────────────────────────────────────────────┐");
         console.log("│ 7. UNIQUE TOOLS EXERCISED PER PRESET │");
         console.log("├──────────────────────────────────────────────────────────────────────────────┤");
-        for (const preset of ["lite", "core", "full"]) {
+        for (const preset of ["meta", "lite", "core", "full"]) {
             const successCalls = callLog.filter((c) => c.preset === preset && c.status === "success");
             const uniqueTools = [...new Set(successCalls.map((c) => c.tool))];
             const availableTools = buildToolset(preset).length;

@@ -1002,24 +1023,37 @@ describe("Toolset Gating Report", () => {
         console.log("║ VERDICT ║");
         console.log("╠══════════════════════════════════════════════════════════════════════════════╣");
         console.log("║ ║");
+        const metaCompleted = allTrajectories.filter((t) => t.preset === "meta").reduce((s, t) => s + t.phasesCompleted, 0);
+        const metaTotal = allTrajectories.filter((t) => t.preset === "meta").reduce((s, t) => s + t.phasesCompleted + t.phasesSkipped, 0);
         const liteCompleted = allTrajectories.filter((t) => t.preset === "lite").reduce((s, t) => s + t.phasesCompleted, 0);
         const liteTotal = allTrajectories.filter((t) => t.preset === "lite").reduce((s, t) => s + t.phasesCompleted + t.phasesSkipped, 0);
         const coreCompleted = allTrajectories.filter((t) => t.preset === "core").reduce((s, t) => s + t.phasesCompleted, 0);
         const coreTotal = allTrajectories.filter((t) => t.preset === "core").reduce((s, t) => s + t.phasesCompleted + t.phasesSkipped, 0);
         const fullCompleted = allTrajectories.filter((t) => t.preset === "full").reduce((s, t) => s + t.phasesCompleted, 0);
         const fullTotal = allTrajectories.filter((t) => t.preset === "full").reduce((s, t) => s + t.phasesCompleted + t.phasesSkipped, 0);
+        console.log(`║ meta: ${metaCompleted}/${metaTotal} phases (${Math.round(metaCompleted / metaTotal * 100)}%) — discovery only, 5 tools, minimal context`.padEnd(79) + "║");
         console.log(`║ lite: ${liteCompleted}/${liteTotal} phases (${Math.round(liteCompleted / liteTotal * 100)}%) — ${savings}% fewer tokens, loses flywheel + parallel`.padEnd(79) + "║");
         console.log(`║ core: ${coreCompleted}/${coreTotal} phases (${Math.round(coreCompleted / coreTotal * 100)}%) — full methodology loop, no parallel/vision/web`.padEnd(79) + "║");
         console.log(`║ full: ${fullCompleted}/${fullTotal} phases (${Math.round(fullCompleted / fullTotal * 100)}%) — everything`.padEnd(79) + "║");
         console.log("║ ║");
         console.log("║ Recommendation: ║");
+        console.log("║ Discovery-first / front door → --preset meta (5 tools, self-escalate) ║");
         console.log("║ Solo dev, standard tasks → --preset lite (fast, low token overhead) ║");
         console.log("║ Team with methodology needs → --preset core (full flywheel loop) ║");
         console.log("║ Multi-agent / full pipeline → --preset full (parallel + self-eval) ║");
         console.log("║ ║");
         console.log("╚══════════════════════════════════════════════════════════════════════════════╝");
         // ─── ASSERTIONS ───
-        //
+        // meta preset: only meta phase succeeds (discovery-only gate)
+        {
+            const metaTrajectories = allTrajectories.filter((t) => t.preset === "meta");
+            for (const t of metaTrajectories) {
+                expect(t.phases.find((p) => p.phase === "meta")?.success).toBe(true);
+                expect(t.phasesCompleted).toBe(1);
+                expect(t.toolCount).toBe(2);
+            }
+        }
+        // lite, core, full: complete the core 6 phases (meta, recon, risk, verification, eval, quality-gate)
         for (const preset of ["lite", "core", "full"]) {
             const trajectories = allTrajectories.filter((t) => t.preset === preset);
             for (const t of trajectories) {

@@ -1031,7 +1065,7 @@ describe("Toolset Gating Report", () => {
                 expect(t.phases.find((p) => p.phase === "knowledge")?.success).toBe(true);
             }
         }
-        // lite
+        // lite, core, full detect issues (core methodology is intact)
         for (const preset of ["lite", "core", "full"]) {
             const totalIssues = allTrajectories
                 .filter((t) => t.preset === preset)