nodebench-mcp 2.10.0 → 2.11.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -21,9 +21,26 @@ Add to `~/.claude/settings.json`:
  }
  ```

- Restart Claude Code. 89 tools available immediately.
+ Restart Claude Code. 89+ tools available immediately.

- **→ Quick Refs:** After setup, run `getMethodology("overview")` | First task? See [Verification Cycle](#verification-cycle-workflow) | New to codebase? See [Environment Setup](#environment-setup)
+ ### Preset Selection
+
+ By default all toolsets are enabled. Use `--preset` to start with a scoped subset:
+
+ ```json
+ {
+   "mcpServers": {
+     "nodebench": {
+       "command": "npx",
+       "args": ["-y", "nodebench-mcp", "--preset", "meta"]
+     }
+   }
+ }
+ ```
+
+ The **meta** preset is the recommended front door for new agents: start with just the 5 meta + discovery tools, use `discover_tools` to find what you need, then self-escalate to a larger preset. See [Toolset Gating & Presets](#toolset-gating--presets) for the full breakdown.
+
+ **→ Quick Refs:** After setup, run `getMethodology("overview")` | First task? See [Verification Cycle](#verification-cycle-workflow) | New to codebase? See [Environment Setup](#environment-setup) | Preset options: See [Toolset Gating & Presets](#toolset-gating--presets)

  ---

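Escalating later is a one-line config change. A sketch, assuming the same `settings.json` shape as the snippet in the hunk above, switched from `meta` to the `core` preset:

```json
{
  "mcpServers": {
    "nodebench": {
      "command": "npx",
      "args": ["-y", "nodebench-mcp", "--preset", "core"]
    }
  }
}
```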
@@ -261,8 +278,73 @@ Use `getMethodology("overview")` to see all available workflows.
  | **Security** | `scan_dependencies`, `run_code_analysis` | Dependency auditing, static code analysis |
  | **Platform** | `query_daily_brief`, `query_funding_entities`, `query_research_queue`, `publish_to_queue` | Convex platform bridge: intelligence, funding, research, publishing |
  | **Meta** | `findTools`, `getMethodology` | Discover tools, get workflow guides |
+ | **Discovery** | `discover_tools`, `get_tool_quick_ref`, `get_workflow_chain` | Hybrid search, quick refs, workflow chains |
+
+ Meta + Discovery tools (5 total) are **always included** regardless of preset. See [Toolset Gating & Presets](#toolset-gating--presets).
+
+ **→ Quick Refs:** Find tools by keyword: `findTools({ query: "verification" })` | Hybrid search: `discover_tools({ query: "security" })` | Get workflow guide: `getMethodology({ topic: "..." })` | See [Methodology Topics](#methodology-topics) for all topics
+
+ ---
+
+ ## Toolset Gating & Presets
+
+ NodeBench MCP supports 4 presets that control which domain toolsets are loaded at startup. Meta + Discovery tools (5 total) are **always included** on top of any preset.
+
+ ### Preset Table
+
+ | Preset | Domain Toolsets | Domain Tools | Total (with meta+discovery) | Use Case |
+ |--------|-----------------|--------------|-----------------------------|----------|
+ | **meta** | 0 | 0 | 5 | Discovery-only front door. Agents start here and self-escalate. |
+ | **lite** | 7 | ~35 | ~40 | Lightweight verification-focused workflows. CI bots, quick checks. |
+ | **core** | 16 | ~75 | ~80 | Full development workflow. Most agent sessions. |
+ | **full** | all | 89+ | 94+ | Everything enabled. Benchmarking, exploration, advanced use. |
+
+ ### Usage
+
+ ```bash
+ npx nodebench-mcp --preset meta                        # Discovery-only (5 tools)
+ npx nodebench-mcp --preset lite                        # Verification + eval + recon + security
+ npx nodebench-mcp --preset core                        # Full dev workflow without vision/parallel
+ npx nodebench-mcp --preset full                        # All toolsets (default)
+ npx nodebench-mcp --toolsets verification,eval,recon   # Custom selection
+ npx nodebench-mcp --exclude vision,ui_capture          # Exclude specific toolsets
+ ```
+
+ ### The Meta Preset — Discovery-Only Front Door
+
+ The **meta** preset loads zero domain tools. Agents start with only 5 tools:
+
+ | Tool | Purpose |
+ |------|---------|
+ | `findTools` | Keyword search across all registered tools |
+ | `getMethodology` | Get workflow guides by topic |
+ | `discover_tools` | Hybrid search with relevance scoring (richer than `findTools`) |
+ | `get_tool_quick_ref` | Quick reference card for any specific tool |
+ | `get_workflow_chain` | Recommended tool sequence for common workflows |
+
+ This is the recommended starting point for autonomous agents. The self-escalation pattern:
+
+ ```
+ 1. Start with --preset meta (5 tools)
+ 2. discover_tools({ query: "what I need to do" })    // Find relevant tools
+ 3. get_workflow_chain({ workflow: "verification" })  // Get the tool sequence
+ 4. If needed tools are not loaded:
+    → Restart with --preset core or --preset full
+    → Or use --toolsets to add specific domains
+ 5. Proceed with full workflow
+ ```
+
+ ### Preset Domain Breakdown
+
+ **meta** (0 domains): No domain tools. Meta + Discovery only.
+
+ **lite** (7 domains): `verification`, `eval`, `quality_gate`, `learning`, `recon`, `security`, `boilerplate`
+
+ **core** (16 domains): Everything in lite plus `flywheel`, `bootstrap`, `self_eval`, `llm`, `platform`, `research_writing`, `flicker_detection`, `figma_flow`, `benchmark`
+
+ **full** (all domains): All toolsets in TOOLSET_MAP including `ui_capture`, `vision`, `local_file`, `web`, `github`, `docs`, `parallel`, and everything in core.

- **→ Quick Refs:** Find tools by keyword: `findTools({ query: "verification" })` | Get workflow guide: `getMethodology({ topic: "..." })` | See [Methodology Topics](#methodology-topics) for all topics
+ **→ Quick Refs:** Check current toolset: `findTools({ query: "*" })` | Self-escalate: restart with `--preset core` | See [MCP Tool Categories](#mcp-tool-categories) | CLI help: `npx nodebench-mcp --help`

  ---

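The escalation decision in step 4 of the pattern above can be sketched as a small check. This is illustrative only: `needsEscalation` is not a package API, and `run_verification_cycle` is assumed here as an example domain tool name.

```javascript
// Sketch of the step-4 self-escalation check (illustrative helper, not a package API).
// An agent on --preset meta holds only the 5 meta + discovery tools; any required
// domain tool triggers a restart with a larger preset or a --toolsets addition.
function needsEscalation(loadedTools, requiredTools) {
  const missing = requiredTools.filter((t) => !loadedTools.includes(t));
  return { escalate: missing.length > 0, missing };
}

const metaTools = [
  "findTools", "getMethodology",
  "discover_tools", "get_tool_quick_ref", "get_workflow_chain",
];

const plan = needsEscalation(metaTools, ["run_verification_cycle", "discover_tools"]);
// plan.escalate === true; plan.missing === ["run_verification_cycle"]
// → restart with --preset core, or npx nodebench-mcp --toolsets verification
```

If nothing is missing, the agent proceeds without a restart, which is the whole point of the discovery-only front door.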
@@ -616,6 +698,7 @@ Available via `getMethodology({ topic: "..." })`:
  | `autonomous_maintenance` | Risk-tiered execution | [Autonomous Maintenance](#autonomous-self-maintenance-system) |
  | `parallel_agent_teams` | Multi-agent coordination, task locking, oracle testing | [Parallel Agent Teams](#parallel-agent-teams) |
  | `self_reinforced_learning` | Trajectory analysis, self-eval, improvement recs | [Self-Reinforced Learning](#self-reinforced-learning-loop) |
+ | `toolset_gating` | 4 presets (meta, lite, core, full) and self-escalation | [Toolset Gating & Presets](#toolset-gating--presets) |

  **→ Quick Refs:** Find tools: `findTools({ query: "..." })` | Get any methodology: `getMethodology({ topic: "..." })` | See [MCP Tool Categories](#mcp-tool-categories)

package/README.md CHANGED
@@ -39,7 +39,7 @@ Every additional tool call produces a concrete artifact — an issue found, a ri

  **QA engineer** — Transitioned a manual QA workflow website into an AI agent-driven app for a pet care messaging platform. Uses NodeBench's quality gates, verification cycles, and eval runs to ensure the AI agent handles edge cases that manual QA caught but bare AI agents miss.

- Both found different subsets of the 129 tools useful — which is why v2.8 ships with `--preset` gating to load only what you need.
+ Both found different subsets of the 129 tools useful — which is why v2.8 ships with 4 `--preset` levels to load only what you need.

  ---

@@ -80,6 +80,9 @@ Tasks 1-3 start with zero prior knowledge. By task 9, the agent finds 2+ relevan
  # Claude Code CLI — all 129 tools
  claude mcp add nodebench -- npx -y nodebench-mcp

+ # Or start with discovery only — 5 tools, agents self-escalate to what they need
+ claude mcp add nodebench -- npx -y nodebench-mcp --preset meta
+
  # Or start lean — 39 tools, ~70% less token overhead
  claude mcp add nodebench -- npx -y nodebench-mcp --preset lite
  ```
@@ -304,7 +307,18 @@ Based on Anthropic's ["Building a C Compiler with Parallel Claudes"](https://www

  ### Presets

+ | Preset | Tools | Use case |
+ |---|---|---|
+ | `meta` | 5 | Discovery-only front door — agents start here and self-escalate via `discover_tools` |
+ | `lite` | 39 | Core methodology — verification, eval, gates, learning, recon, security, boilerplate |
+ | `core` | 87 | Full workflow — adds flywheel, bootstrap, self-eval, llm, platform, research_writing, flicker_detection, figma_flow, benchmark |
+ | `full` | 129 | Everything (default) |
+
  ```bash
+ # Meta — 5 tools (discovery-only: findTools, getMethodology, discover_tools, get_tool_quick_ref, get_workflow_chain)
+ # Agents start here and self-escalate to the tools they need
+ claude mcp add nodebench -- npx -y nodebench-mcp --preset meta
+
  # Lite — 39 tools (verification, eval, gates, learning, recon, security, boilerplate + meta + discovery)
  claude mcp add nodebench -- npx -y nodebench-mcp --preset lite

@@ -322,7 +336,7 @@ Or in config:
    "mcpServers": {
      "nodebench": {
        "command": "npx",
-       "args": ["-y", "nodebench-mcp", "--preset", "core"]
+       "args": ["-y", "nodebench-mcp", "--preset", "meta"]
      }
    }
  }
@@ -369,10 +383,12 @@ npx nodebench-mcp --help
  | boilerplate | 2 | Scaffold NodeBench projects + status |
  | benchmark | 3 | Autonomous benchmark lifecycle (C-compiler pattern) |

- Always included (regardless of gating):
+ Always included (regardless of gating) — these 5 tools form the `meta` preset:
  - Meta: `findTools`, `getMethodology`
  - Discovery: `discover_tools`, `get_tool_quick_ref`, `get_workflow_chain`

+ The `meta` preset loads **only** these 5 tools (0 domain tools). Agents use `discover_tools` to find what they need and self-escalate.
+
  ---

  ## Build from Source
@@ -75,6 +75,7 @@ const TOOLSET_MAP = {
    benchmark: cCompilerBenchmarkTools,
  };
  const PRESETS = {
+   meta: [],
    lite: ["verification", "eval", "quality_gate", "learning", "recon", "security", "boilerplate"],
    core: ["verification", "eval", "quality_gate", "learning", "flywheel", "recon", "bootstrap", "self_eval", "llm", "security", "platform", "research_writing", "flicker_detection", "figma_flow", "boilerplate", "benchmark"],
    full: Object.keys(TOOLSET_MAP),
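The PRESETS map above pairs with a resolver that flattens a preset into a concrete tool list (the eval report later calls it `buildToolset`). A minimal sketch of that resolution, assuming the always-included meta + discovery tools are concatenated on top of the selected domains; the `TOOLSET_MAP` entries and tool names here are illustrative stand-ins, not the package's actual internals:

```javascript
// Illustrative sketch of preset → tool-list resolution (not the package source).
// meta resolves to no domain toolsets; full resolves to every TOOLSET_MAP key.
const TOOLSET_MAP = {
  verification: ["run_verification_cycle", "check_claims"], // illustrative tool names
  eval: ["run_eval"],
  flywheel: ["run_mandatory_flywheel"],
};

const ALWAYS_INCLUDED = [
  "findTools", "getMethodology",                                // meta
  "discover_tools", "get_tool_quick_ref", "get_workflow_chain", // discovery
];

const PRESETS = {
  meta: [],
  lite: ["verification", "eval"],
  full: Object.keys(TOOLSET_MAP),
};

function buildToolset(preset) {
  const domains = PRESETS[preset] ?? PRESETS.full;
  return [...ALWAYS_INCLUDED, ...domains.flatMap((d) => TOOLSET_MAP[d])];
}

console.log(buildToolset("meta").length); // 5 — the always-included tools only
```

Under this model `meta: []` is all that is needed to make the discovery-only preset fall out of the same code path as the others.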
@@ -721,41 +722,51 @@ async function cleanupAll() {
  const allTrajectories = [];
  describe("Toolset Gating Eval", () => {
    afterAll(async () => { await cleanupAll(); });
-   for (const preset of ["lite", "core", "full"]) {
+   for (const preset of ["meta", "lite", "core", "full"]) {
      describe(`Preset: ${preset}`, () => {
        for (const scenario of SCENARIOS) {
          it(`${preset}/${scenario.id}: runs 8-phase pipeline`, async () => {
            const t = await runTrajectory(preset, scenario);
            allTrajectories.push(t);
-           // Core methodology phases should always work (lite, core, full all have these)
+           // Meta phase always succeeds (findTools + getMethodology always present)
            const metaPhase = t.phases.find((p) => p.phase === "meta");
            expect(metaPhase?.success).toBe(true);
-           const reconPhase = t.phases.find((p) => p.phase === "recon");
-           expect(reconPhase?.success).toBe(true);
-           const verifyPhase = t.phases.find((p) => p.phase === "verification");
-           expect(verifyPhase?.success).toBe(true);
-           const evalPhase = t.phases.find((p) => p.phase === "eval");
-           expect(evalPhase?.success).toBe(true);
-           const gatePhase = t.phases.find((p) => p.phase === "quality-gate");
-           expect(gatePhase?.success).toBe(true);
-           // Knowledge phase depends on preset (learning tools in lite + core + full)
-           const knowledgePhase = t.phases.find((p) => p.phase === "knowledge");
-           expect(knowledgePhase?.success).toBe(true);
+           if (preset === "meta") {
+             // meta preset: only meta tools available — all other phases skipped
+             expect(t.phasesCompleted).toBe(1); // only meta phase
+             expect(t.toolCount).toBe(2); // findTools + getMethodology
+           }
+           else {
+             // lite, core, full: domain tools available
+             const reconPhase = t.phases.find((p) => p.phase === "recon");
+             expect(reconPhase?.success).toBe(true);
+             const verifyPhase = t.phases.find((p) => p.phase === "verification");
+             expect(verifyPhase?.success).toBe(true);
+             const evalPhase = t.phases.find((p) => p.phase === "eval");
+             expect(evalPhase?.success).toBe(true);
+             const gatePhase = t.phases.find((p) => p.phase === "quality-gate");
+             expect(gatePhase?.success).toBe(true);
+             // Knowledge phase depends on preset (learning tools in lite + core + full)
+             const knowledgePhase = t.phases.find((p) => p.phase === "knowledge");
+             expect(knowledgePhase?.success).toBe(true);
+           }
          }, 30_000);
        }
      });
    }
  describe("Flywheel availability", () => {
-   it("lite preset does NOT have flywheel tools", () => {
-     const liteTrajectories = allTrajectories.filter((t) => t.preset === "lite");
-     for (const t of liteTrajectories) {
+   it("meta and lite presets do NOT have flywheel tools", () => {
+     const noFlywheel = allTrajectories.filter((t) => t.preset === "meta" || t.preset === "lite");
+     for (const t of noFlywheel) {
        const fw = t.phases.find((p) => p.phase === "flywheel");
        expect(fw?.success).toBe(false);
-       expect(fw?.toolsMissing).toContain("run_mandatory_flywheel");
+       if (t.preset === "lite") {
+         expect(fw?.toolsMissing).toContain("run_mandatory_flywheel");
+       }
      }
    });
    it("core and full presets HAVE flywheel tools", () => {
-     const coreFullTrajectories = allTrajectories.filter((t) => t.preset !== "lite");
+     const coreFullTrajectories = allTrajectories.filter((t) => t.preset === "core" || t.preset === "full");
      for (const t of coreFullTrajectories) {
        expect(t.flywheelComplete).toBe(true);
      }
@@ -784,16 +795,16 @@ describe("Toolset Gating Eval", () => {
    });
  });
  describe("Self-eval availability", () => {
-   it("lite does NOT have self-eval tools", () => {
-     const liteTrajectories = allTrajectories.filter((t) => t.preset === "lite");
-     for (const t of liteTrajectories) {
+   it("meta and lite do NOT have self-eval tools", () => {
+     const noSelfEval = allTrajectories.filter((t) => t.preset === "meta" || t.preset === "lite");
+     for (const t of noSelfEval) {
        const se = t.phases.find((p) => p.phase === "self-eval");
        if (se)
          expect(se.success).toBe(false);
      }
    });
    it("core and full HAVE self-eval tools", () => {
-     const coreFullTrajectories = allTrajectories.filter((t) => t.preset !== "lite");
+     const coreFullTrajectories = allTrajectories.filter((t) => t.preset === "core" || t.preset === "full");
      for (const t of coreFullTrajectories) {
        const se = t.phases.find((p) => p.phase === "self-eval");
        expect(se?.success).toBe(true);
@@ -801,6 +812,14 @@ describe("Toolset Gating Eval", () => {
    });
  });
  describe("Token surface area reduction", () => {
+   it("meta has the fewest tools (only meta tools)", () => {
+     const metaT = allTrajectories.find((t) => t.preset === "meta");
+     const liteT = allTrajectories.find((t) => t.preset === "lite");
+     expect(metaT.toolCount).toBe(2); // findTools + getMethodology only
+     expect(metaT.toolCount).toBeLessThan(liteT.toolCount);
+     const reduction = 1 - metaT.toolCount / liteT.toolCount;
+     expect(reduction).toBeGreaterThan(0.9); // meta is 90%+ fewer tools than lite
+   });
    it("lite reduces tool count and estimated token overhead vs full", () => {
      const liteT = allTrajectories.find((t) => t.preset === "lite");
      const fullT = allTrajectories.find((t) => t.preset === "full");
@@ -809,11 +828,13 @@ describe("Toolset Gating Eval", () => {
      const reduction = 1 - liteT.toolCount / fullT.toolCount;
      expect(reduction).toBeGreaterThan(0.5); // lite is at least 50% fewer tools
    });
-   it("core is between lite and full", () => {
+   it("presets are ordered: meta < lite < core < full", () => {
+     const metaT = allTrajectories.find((t) => t.preset === "meta");
      const liteT = allTrajectories.find((t) => t.preset === "lite");
      const coreT = allTrajectories.find((t) => t.preset === "core");
      const fullT = allTrajectories.find((t) => t.preset === "full");
-     expect(coreT.toolCount).toBeGreaterThan(liteT.toolCount);
+     expect(metaT.toolCount).toBeLessThan(liteT.toolCount);
+     expect(liteT.toolCount).toBeLessThan(coreT.toolCount);
      expect(coreT.toolCount).toBeLessThan(fullT.toolCount);
    });
  });
@@ -823,7 +844,7 @@ describe("Toolset Gating Eval", () => {
  // ═══════════════════════════════════════════════════════════════════════════
  describe("Toolset Gating Report", () => {
    it("generates trajectory comparison across presets", () => {
-     expect(allTrajectories.length).toBe(27); // 3 presets × 9 scenarios
+     expect(allTrajectories.length).toBe(36); // 4 presets × 9 scenarios
      console.log("\n");
      console.log("╔══════════════════════════════════════════════════════════════════════════════╗");
      console.log("║ TOOLSET GATING EVAL — Trajectory Comparison ║");
@@ -834,7 +855,7 @@ describe("Toolset Gating Report", () => {
      console.log("┌──────────────────────────────────────────────────────────────────────────────┐");
      console.log("│ 1. TOOL COUNT & ESTIMATED TOKEN OVERHEAD │");
      console.log("├──────────────────────────────────────────────────────────────────────────────┤");
-     for (const preset of ["lite", "core", "full"]) {
+     for (const preset of ["meta", "lite", "core", "full"]) {
        const t = allTrajectories.find((tr) => tr.preset === preset);
        const bar = "█".repeat(Math.round(t.toolCount / 3));
        console.log(`│ ${preset.padEnd(6)} ${String(t.toolCount).padStart(3)} tools ~${String(t.estimatedSchemaTokens).padStart(5)} tokens ${bar}`.padEnd(79) + "│");
@@ -855,7 +876,7 @@ describe("Toolset Gating Report", () => {
      const allPhaseNames = ["meta", "recon", "risk", "verification", "eval", "quality-gate", "knowledge", "flywheel", "parallel", "self-eval"];
      for (const phase of allPhaseNames) {
        const cols = [];
-       for (const preset of ["lite", "core", "full"]) {
+       for (const preset of ["meta", "lite", "core", "full"]) {
          const trajectories = allTrajectories.filter((t) => t.preset === preset);
          const phaseResults = trajectories.map((t) => t.phases.find((p) => p.phase === phase));
          const present = phaseResults.some((p) => p);
@@ -886,7 +907,7 @@ describe("Toolset Gating Report", () => {
        { label: "Total tool calls", key: "totalToolCalls" },
      ]) {
        const cols = [];
-       for (const preset of ["lite", "core", "full"]) {
+       for (const preset of ["meta", "lite", "core", "full"]) {
          const sum = allTrajectories
            .filter((t) => t.preset === preset)
            .reduce((s, t) => s + t[metric.key], 0);
@@ -901,7 +922,7 @@ describe("Toolset Gating Report", () => {
        { label: "Flywheel complete", fn: (t) => t.flywheelComplete },
      ]) {
        const cols = [];
-       for (const preset of ["lite", "core", "full"]) {
+       for (const preset of ["meta", "lite", "core", "full"]) {
          const count = allTrajectories
            .filter((t) => t.preset === preset)
            .filter(metric.fn).length;
@@ -915,7 +936,7 @@ describe("Toolset Gating Report", () => {
      console.log("┌──────────────────────────────────────────────────────────────────────────────┐");
      console.log("│ 4. TOOLS MISSING BY PRESET (what you lose with gating) │");
      console.log("├──────────────────────────────────────────────────────────────────────────────┤");
-     for (const preset of ["lite", "core"]) {
+     for (const preset of ["meta", "lite", "core"]) {
        const missingCalls = callLog.filter((c) => c.preset === preset && c.status === "missing");
        const uniqueMissing = [...new Set(missingCalls.map((c) => c.tool))];
        if (uniqueMissing.length > 0) {
@@ -984,7 +1005,7 @@ describe("Toolset Gating Report", () => {
      console.log("┌──────────────────────────────────────────────────────────────────────────────┐");
      console.log("│ 7. UNIQUE TOOLS EXERCISED PER PRESET │");
      console.log("├──────────────────────────────────────────────────────────────────────────────┤");
-     for (const preset of ["lite", "core", "full"]) {
+     for (const preset of ["meta", "lite", "core", "full"]) {
        const successCalls = callLog.filter((c) => c.preset === preset && c.status === "success");
        const uniqueTools = [...new Set(successCalls.map((c) => c.tool))];
        const availableTools = buildToolset(preset).length;
@@ -1002,24 +1023,37 @@ describe("Toolset Gating Report", () => {
      console.log("║ VERDICT ║");
      console.log("╠══════════════════════════════════════════════════════════════════════════════╣");
      console.log("║ ║");
+     const metaCompleted = allTrajectories.filter((t) => t.preset === "meta").reduce((s, t) => s + t.phasesCompleted, 0);
+     const metaTotal = allTrajectories.filter((t) => t.preset === "meta").reduce((s, t) => s + t.phasesCompleted + t.phasesSkipped, 0);
      const liteCompleted = allTrajectories.filter((t) => t.preset === "lite").reduce((s, t) => s + t.phasesCompleted, 0);
      const liteTotal = allTrajectories.filter((t) => t.preset === "lite").reduce((s, t) => s + t.phasesCompleted + t.phasesSkipped, 0);
      const coreCompleted = allTrajectories.filter((t) => t.preset === "core").reduce((s, t) => s + t.phasesCompleted, 0);
      const coreTotal = allTrajectories.filter((t) => t.preset === "core").reduce((s, t) => s + t.phasesCompleted + t.phasesSkipped, 0);
      const fullCompleted = allTrajectories.filter((t) => t.preset === "full").reduce((s, t) => s + t.phasesCompleted, 0);
      const fullTotal = allTrajectories.filter((t) => t.preset === "full").reduce((s, t) => s + t.phasesCompleted + t.phasesSkipped, 0);
+     console.log(`║ meta: ${metaCompleted}/${metaTotal} phases (${Math.round(metaCompleted / metaTotal * 100)}%) — discovery only, 5 tools, minimal context`.padEnd(79) + "║");
      console.log(`║ lite: ${liteCompleted}/${liteTotal} phases (${Math.round(liteCompleted / liteTotal * 100)}%) — ${savings}% fewer tokens, loses flywheel + parallel`.padEnd(79) + "║");
      console.log(`║ core: ${coreCompleted}/${coreTotal} phases (${Math.round(coreCompleted / coreTotal * 100)}%) — full methodology loop, no parallel/vision/web`.padEnd(79) + "║");
      console.log(`║ full: ${fullCompleted}/${fullTotal} phases (${Math.round(fullCompleted / fullTotal * 100)}%) — everything`.padEnd(79) + "║");
      console.log("║ ║");
      console.log("║ Recommendation: ║");
+     console.log("║ Discovery-first / front door → --preset meta (5 tools, self-escalate) ║");
      console.log("║ Solo dev, standard tasks → --preset lite (fast, low token overhead) ║");
      console.log("║ Team with methodology needs → --preset core (full flywheel loop) ║");
      console.log("║ Multi-agent / full pipeline → --preset full (parallel + self-eval) ║");
      console.log("║ ║");
      console.log("╚══════════════════════════════════════════════════════════════════════════════╝");
      // ─── ASSERTIONS ───
-     // All presets complete the core 6 phases (meta, recon, risk, verification, eval, quality-gate)
+     // meta preset: only meta phase succeeds (discovery-only gate)
+     {
+       const metaTrajectories = allTrajectories.filter((t) => t.preset === "meta");
+       for (const t of metaTrajectories) {
+         expect(t.phases.find((p) => p.phase === "meta")?.success).toBe(true);
+         expect(t.phasesCompleted).toBe(1);
+         expect(t.toolCount).toBe(2);
+       }
+     }
+     // lite, core, full: complete the core 6 phases (meta, recon, risk, verification, eval, quality-gate)
      for (const preset of ["lite", "core", "full"]) {
        const trajectories = allTrajectories.filter((t) => t.preset === preset);
        for (const t of trajectories) {
@@ -1031,7 +1065,7 @@ describe("Toolset Gating Report", () => {
          expect(t.phases.find((p) => p.phase === "knowledge")?.success).toBe(true);
        }
      }
-     // lite and core detect the same number of issues as full (core methodology is intact)
+     // lite, core, full detect issues (core methodology is intact)
      for (const preset of ["lite", "core", "full"]) {
        const totalIssues = allTrajectories
          .filter((t) => t.preset === preset)