@roadmapperai/mcp 0.9.4 → 0.9.5

Files changed (4)
  1. package/AGENTS.md +17 -0
  2. package/README.md +42 -16
  3. package/package.json +1 -1
  4. package/server.mjs +755 -93
package/AGENTS.md CHANGED
@@ -7,6 +7,23 @@ The roadmap data model is the source of truth — TypeScript types in
7
7
  [`src/types.ts`](https://github.com/vsgro/roadmapper/blob/main/src/types.ts) are the canonical schema. Everything
8
8
  in this doc is downstream of that file.
9
9
 
10
+ ## How to call operations (dispatch surface)
11
+
12
+ The `roadmapper` MCP server advertises **one** action tool: `roadmap({ op, args })`.
13
+ Every operation named in this document — `get_roadmap_snapshot`, `suggest_capability_for`,
14
+ `propose_task`, `propose_tasks`, `propose_capability`, `get_agents_md`, `link_repo`, etc. — is
15
+ an `op` **value**, not a separately-advertised tool. Invoke it as
16
+ `roadmap({ op: "<name>", args: { ... } })`. Two helpers round out the surface:
17
+
18
+ - `roadmap_search({ intent })` — list/rank the available operations by what you're trying to do.
19
+ - `roadmap_describe({ op })` — return one operation's exact input schema (its arguments).
20
+ - `roadmap({ op, args })` — execute the operation. `args` may also be passed flat (top-level), but the documented shape is nested under `args`.
21
+
22
+ So when a section below says "call `propose_tasks`" or shows `suggest_theme_for({ description })`,
23
+ read it as `roadmap({ op: "propose_tasks", args: {...} })` /
24
+ `roadmap({ op: "suggest_theme_for", args: { description } })`. The server enforces the
25
+ rubric/discovery gates and per-op validation identically however you call.
26
+
10
27
  ## TL;DR — what an agent must do
11
28
 
12
29
  - Reference Themes / Capabilities / Tasks / Sprints by **stable ID**,
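As a rough illustration of the dispatch shape this hunk documents, the sketch below wraps an op in the standard MCP `tools/call` JSON-RPC request. `buildRoadmapCall` is a hypothetical helper, not something the package ships, and the `description` text is made up.

```javascript
// Hypothetical helper, not part of @roadmapperai/mcp: wraps an op name and
// optional arguments in an MCP `tools/call` JSON-RPC request aimed at the
// single `roadmap` dispatch tool.
function buildRoadmapCall(id, op, args) {
  // Leave `args` out entirely for ops that take none (the documented nested shape).
  const toolArguments = args === undefined ? { op } : { op, args };
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "roadmap", arguments: toolArguments },
  };
}

// "suggest_theme_for({ description })" in the doc becomes a roadmap call:
const themeCall = buildRoadmapCall(1, "suggest_theme_for", {
  description: "Faster onboarding for new workspaces", // made-up example text
});
// An op without arguments:
const rubricCall = buildRoadmapCall(2, "get_agents_md");

console.log(JSON.stringify(themeCall.params));
console.log(JSON.stringify(rubricCall.params));
```

The tool name on the wire is always `roadmap`; only the `op` value changes between calls.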
package/README.md CHANGED
@@ -57,7 +57,22 @@ JSON config works, dropped into Cursor's `mcp` field.
57
57
 
58
58
  ## What this server exposes
59
59
 
60
- Read tools (always available):
60
+ **The wire surface is three dispatch tools.** As of 0.9.5 the server advertises
61
+ only:
62
+
63
+ - `roadmap_search({ intent })` — list/rank the available operations by intent
64
+ - `roadmap_describe({ op })` — return one operation's exact input schema
65
+ - `roadmap({ op, args })` — execute an operation
66
+
67
+ Everything below is an **`op` value** passed to `roadmap`, e.g.
68
+ `roadmap({ op: "get_roadmap_snapshot" })` or
69
+ `roadmap({ op: "propose_tasks", args: { capabilityId, tasks: [...] } })`. This
70
+ keeps `tools/list` tiny (one schema instead of ~34) while the per-op schemas
71
+ load on demand via `roadmap_describe`. The full planning contract ships in the
72
+ server's `instructions` (sent at connect) and the `get_agents_md` op /
73
+ `roadmapper://rubric` resource.
74
+
75
+ ### Read operations (always available)
61
76
 
62
77
  - `list_themes` — top-level strategic themes
63
78
  - `list_capabilities` — capabilities (optionally filtered by theme)
@@ -86,7 +101,7 @@ Read tools (always available):
86
101
  > for full rows and `limit` (max 200) to raise the cap. This keeps a
87
102
  > cold-start read from blowing the token budget on a large workspace.
88
103
 
89
- Write tools (require workspace-scoped write auth — set `ROADMAPPER_API_KEY`
104
+ ### Write operations (require workspace-scoped write auth — set `ROADMAPPER_API_KEY`
90
105
  to an `rmpr_…` key from the dashboard; writes then route through the
91
106
  mcp-broker so the service-role key never lives on your machine):
92
107
 
@@ -100,25 +115,26 @@ mcp-broker so the service-role key never lives on your machine):
100
115
 
101
116
  ## How agents are meant to use this
102
117
 
103
- The intended loop:
118
+ The intended loop (every step is `roadmap({ op, args })`):
104
119
 
105
- 1. Agent calls `get_agents_md` to load the rubric.
106
- 2. Agent reads the current roadmap state via `list_themes` /
107
- `list_capabilities` / `list_tasks`.
108
- 3. Before proposing anything new, agent calls `suggest_capability_for`
120
+ 1. `roadmap({ op: "get_agents_md" })` to load the rubric.
121
+ 2. Read the current roadmap state: `roadmap({ op: "get_roadmap_snapshot" })`
122
+ (or `list_themes` / `list_capabilities` / `list_tasks` ops).
123
+ 3. Before proposing anything new, `roadmap({ op: "suggest_capability_for", args: { description } })`
109
124
  (or `suggest_theme_for`) to check if a matching parent already exists.
110
- Skipping this is allowed for `propose_task` but the response will carry
111
- a warn-on-skip nudge; `propose_capability` hard-requires it.
112
- 4. Agent proposes new rows through the `propose_*` tools, including all
113
- the rubric-required fields (RICE, acceptance criteria, etc.).
114
- 5. Once a PR is merged, the agent calls `link_pr` to attach it; the
125
+ Skipping is allowed for `propose_task` but the response carries a
126
+ warn-on-skip nudge; `propose_capability` hard-requires it.
127
+ 4. Propose new rows via `roadmap({ op: "propose_tasks", args })` /
128
+ `roadmap({ op: "propose_capability", args })`, including all the
129
+ rubric-required fields (RICE, acceptance criteria, etc.).
130
+ 5. Once a PR is merged, `roadmap({ op: "link_pr", args })` to attach it; the
115
131
  roadmap delivery stats update automatically.
116
- 6. After delivery, the agent self-grades against the acceptance criteria
117
- with `submit_acceptance_grades`.
132
+ 6. After delivery, self-grade against the acceptance criteria with
133
+ `roadmap({ op: "submit_acceptance_grades", args })`.
118
134
 
119
135
  The server enforces the rubric: proposals filed without first fetching
120
- `get_agents_md` are rejected with a remediation hint pointing the agent
121
- back to the rubric.
136
+ `get_agents_md` are rejected with a structured remediation hint whose `fix`
137
+ field names the exact dispatch call to make next.
122
138
 
123
139
  ## Versioning
124
140
 
@@ -134,6 +150,16 @@ the check with `ROADMAPPER_DISABLE_UPDATE_CHECK=1`.
134
150
 
135
151
  ### Recent changes
136
152
 
153
+ - **0.9.5** — **tool-surface collapse for token efficiency.** `tools/list` now
154
+ advertises three dispatch tools (`roadmap_search` / `roadmap_describe` /
155
+ `roadmap`) instead of ~34, cutting the always-loaded tool definitions ~97%
156
+ (~15k → ~0.5k tokens) and the per-session planning footprint ~95%. The ~34
157
+ operations are unchanged — they're reached as `roadmap({ op, args })`, with
158
+ schemas served on demand via `roadmap_describe`. Per-tool methodology moved
159
+ into the server `instructions` (now correctly top-level) plus the
160
+ `roadmapper://rubric` resource. Already-linked repos should re-run
161
+ `npm run mcp:setup -- --link <path>` to refresh the permission allow-list and
162
+ the CLAUDE.md block for the dispatch surface.
137
163
  - **0.9.3** — new `link_repo` tool: when `get_active_workspace` reports
138
164
  `env_default`/`unresolved` and you're in a git repo, one call persists
139
165
  the repo → workspace mapping so future sessions resolve silently (the
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@roadmapperai/mcp",
3
- "version": "0.9.4",
3
+ "version": "0.9.5",
4
4
  "description": "Roadmapper AI MCP server — exposes a planning surface (themes, capabilities, tasks, sprints, PRs) to coding agents via stdio JSON-RPC. Pairs with the Roadmapper AI workspace at dashboard.roadmapperai.com.",
5
5
  "keywords": [
6
6
  "mcp",
package/server.mjs CHANGED
@@ -469,6 +469,75 @@ function tplDescription(text, labels) {
469
469
  return out;
470
470
  }
471
471
 
472
+ /**
473
+ * The full tool descriptions below carry the planning methodology
474
+ * (USE WHEN / PREREQUISITE / ANTI-PATTERN / EXAMPLE) inline. That prose
475
+ * is relocated to the server `instructions` field — sent once at connect —
476
+ * plus the roadmapper://rubric resource, so the per-tool wire payload is
477
+ * just the one-line summary: the segment before the first blank line.
478
+ *
479
+ * Why: the 34 full descriptions cost ~15k tokens in every tools/list, in
480
+ * every session, used or not. The summaries cost ~3-4k. The methodology
481
+ * isn't lost — it moves to `instructions` (always sent, deduped) and the
482
+ * rubric resource (on demand), and the contract is still enforced server
483
+ * side (rubric/discovery gates + validateOutcome/validateName/etc. return
484
+ * structured `fix` errors). inputSchema, including every per-field
485
+ * description, is untouched — callers keep full argument-level guidance.
486
+ */
487
+ function summaryOf(description) {
488
+ const i = description.indexOf("\n\n");
489
+ return i === -1 ? description : description.slice(0, i);
490
+ }
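To make the "segment before the first blank line" rule concrete, here is `summaryOf` copied from the hunk above and run against a made-up description string (the real tool descriptions are not shown in this diff).

```javascript
// Copied from the hunk above so the example runs on its own.
function summaryOf(description) {
  const i = description.indexOf("\n\n");
  return i === -1 ? description : description.slice(0, i);
}

// Made-up description in the USE WHEN / ANTI-PATTERN style the comment mentions.
const fullDescription =
  "Propose a task under a capability.\n\nUSE WHEN: (methodology prose)\nANTI-PATTERN: (more prose)";

const shortSummary = summaryOf(fullDescription);
const unchanged = summaryOf("One line only.");

console.log(shortSummary); // Propose a task under a capability.
console.log(unchanged);    // One line only.
```

Only a literal `\n\n` counts as the blank line, so a description written with `\r\n` line endings would be returned whole.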
491
+
492
+ /**
493
+ * The minimal planning contract an agent needs to file a VALID proposal,
494
+ * sent once in the initialize `instructions` field. This is the CORE
495
+ * extract of AGENTS.md (~600 tokens vs the full ~12.5k doc): the gate
496
+ * sequence, the task/capability shapes, the server-enforced falsifiable
497
+ * outcome + confidence rules, the enums, IDs, and don'ts. The full doc
498
+ * (tool catalogue, PR/branch conventions, RICE narrative, GitHub wiring)
499
+ * stays on demand via get_agents_md / roadmapper://rubric — reading
500
+ * either also satisfies the rubric gate. Keep this in sync with AGENTS.md
501
+ * sections: TL;DR, The mental model, Required agent task, Required
502
+ * capability fields, Outcome statements, Impact/Confidence, ID
503
+ * conventions, Don'ts.
504
+ */
505
+ const CORE_CONTRACT = `ROADMAPPER PLANNING CONTRACT (essentials — full version: the get_agents_md op, or read the roadmapper://rubric resource)
506
+
507
+ ACCESS — every operation runs through ONE tool
508
+ roadmap({ op, args }) executes an operation. roadmap_search(intent) lists/ranks the operations; roadmap_describe(op) returns an op's exact arguments. The op names used below (get_roadmap_snapshot, suggest_capability_for, propose_task, get_agents_md, ...) are values for op — e.g. roadmap({ op: "get_agents_md" }) or roadmap({ op: "propose_task", args: { capabilityId, title, effort } }).
509
+
510
+ PER-SESSION WORKFLOW
511
+ 1. Orient first: get_roadmap_snapshot (or list_themes / list_capabilities). This also satisfies the discovery gates below.
512
+ 2. Writing requires the rubric: every workspace-mutating tool (propose_*/update_*/archive_*/unarchive_*/move_*/record_outcome_reading/link_pr/submit_acceptance_grades) refuses until you call get_agents_md once this session (reading roadmapper://rubric also counts).
513
+ 3. Reuse before creating: suggest_capability_for({description}) to find an existing home; only propose a new capability if nothing fits. suggest_theme_for / list_themes before proposing a theme.
514
+ 4. Before your first write, call get_active_workspace and proceed only if its status is "resolved"; for any other status follow the \`next\` action it returns (e.g. link_repo) so writes don't land in the wrong workspace.
515
+ 5. dryRun:true validates any write without committing. Reference everything by stable ID, never by name.
516
+
517
+ MODEL (don't conflate the layers)
518
+ Theme (TH-NNNNNN · leadership · years) > Capability (CAP-XXXXXX · PM · quarters · a falsifiable bet) > Task (TK-NNNNNN · IC/agent · days) > PR (closes tasks). Sprints (SP-NNN) are 1-2 week buckets.
519
+
520
+ TASK fields
521
+ Required: capabilityId, title (>=5 chars), effort: XS|S|M|L|XL (XS=2h S=4h M=1d L=3d XL=8d).
522
+ Recommended (not enforced): kind: feature|bug|chore|spike, priority: P0|P1|P2|P3, acceptance: [checkable assertions], dependsOn: [TK-...].
523
+ Give any task an agent will pick up a non-empty acceptance list — an empty list is a stop signal (spike it or ask). Stamp authorKind:agent. Only set dependsOn when one task truly blocks another.
524
+
525
+ CAPABILITY fields
526
+ Required: name (>=8 chars), pillarId: TH-..., outcome (falsifiable — see below).
527
+ Optional (defaulted): reach: number >=0 (default 100), impact: 3|2|1|0.5|0.25 (default 1), confidence: 0-95 (default 70), specRef (spec link; supply before decomposing a capability into tasks so scope is pinned — convention, not enforced).
528
+
529
+ FALSIFIABLE OUTCOME (server-enforced — propose_capability rejects otherwise)
530
+ Template: <metric> moves from <baseline> to <target> by <date>, measured by <source>.
531
+ The outcome MUST contain both a number AND a temporal anchor. Use a 20XX year (e.g. 2026-09-30 or "Sep 2026") or a quarter (Q3, q1 2026) — a bare month name or "by <month>" is NOT accepted, so always include the year. Confidence: 100 is never accepted (server caps at 95); reserve 91-95 for work already shipped or behind a flag.
532
+ Good: "Activation rate moves from 32% to 55% by 2026-09-30, measured by the activated_user event."
533
+ Weak: "Improve builder UX" (no metric/baseline/date — rewrite, or file it as a task).
534
+
535
+ WHEN NOT TO CREATE A CAPABILITY
536
+ A one-off fix, infra under an existing bet, a refactor/rename, or anything that fits in one PR is a TASK under the existing capability — not a new bet. If you can't write a falsifiable outcome, it isn't a capability yet.
537
+
538
+ DON'TS
539
+ No capability-per-PR. No blank outcomes. Don't game RICE inputs. Don't edit theme IDs after creation. Don't self-promote a task to delivered — wait for the merged PR.`;
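The falsifiable-outcome rule in the contract above (a number plus a 20XX year or a quarter) can be approximated with two pattern checks. This is only a sketch under stated assumptions: the real `validateOutcome` patterns are not visible in this diff, so `HAS_NUMBER`, `HAS_TEMPORAL`, and `missingOutcomeParts` are hypothetical stand-ins, and the server may be stricter or looser.

```javascript
// Assumed patterns, NOT taken from the package: illustrative stand-ins for
// "contains a number" and "contains a 20XX year or a quarter".
const HAS_NUMBER = /\d/;
const HAS_TEMPORAL = /\b20\d{2}\b|\bq[1-4]\b/i;

// Hypothetical helper: lists which required parts an outcome string lacks.
function missingOutcomeParts(outcome) {
  const missing = [];
  if (!HAS_NUMBER.test(outcome)) missing.push("number");
  if (!HAS_TEMPORAL.test(outcome)) missing.push("date/quarter");
  return missing;
}

// The "Good" and "Weak" examples from the contract text:
const good = missingOutcomeParts(
  "Activation rate moves from 32% to 55% by 2026-09-30, measured by the activated_user event."
);
const weak = missingOutcomeParts("Improve builder UX");
// A bare month name without a year does not count as a temporal anchor here.
const bareMonth = missingOutcomeParts("Reach 55% activation by September");

console.log(good, weak, bareMonth);
```

One limitation of this simple version: the digits in a year satisfy the number check on their own, so it would not catch an outcome that has a date but no metric.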
540
+
472
541
  /**
473
542
  * Resolve a config value from a primary `ROADMAPPER_*` env var,
474
543
  * falling back to a legacy `SUPABASE_*` alias when the primary
@@ -1340,7 +1409,7 @@ function validateOutcome(outcome) {
1340
1409
  !hasTemporal ? "date/quarter" : null,
1341
1410
  ]
1342
1411
  .filter(Boolean)
1343
- .join(" + ")}. See get_agents_md for examples.`;
1412
+ .join(" + ")}. See ${opCall("get_agents_md")} for examples.`;
1344
1413
  }
1345
1414
  return null;
1346
1415
  }
@@ -1479,6 +1548,20 @@ function resetSession() {
1479
1548
  session.mutatorBlocks = 0;
1480
1549
  }
1481
1550
 
1551
+ /**
1552
+ * Format an agent-facing "next call" in the dispatch shape. After the
1553
+ * tool-surface collapse the ONLY callable tool is `roadmap`; the 34 ops are
1554
+ * `op` values, so a fix field of `get_agents_md()` names something a real MCP
1555
+ * client can't invoke (it isn't in tools/list). opCall renders the reachable
1556
+ * form `roadmap({ op: "<op>"[, args: {...}] })`. argsHint is a preformatted
1557
+ * args literal (string) when the call needs arguments.
1558
+ */
1559
+ function opCall(op, argsHint) {
1560
+ return argsHint
1561
+ ? `roadmap({ op: "${op}", args: ${argsHint} })`
1562
+ : `roadmap({ op: "${op}" })`;
1563
+ }
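For reference, this is what `opCall` (copied from the hunk above) renders for the two call shapes, using argument hints that appear elsewhere in this diff.

```javascript
// Copied from the hunk above so the example runs on its own.
function opCall(op, argsHint) {
  return argsHint
    ? `roadmap({ op: "${op}", args: ${argsHint} })`
    : `roadmap({ op: "${op}" })`;
}

const noArgs = opCall("get_agents_md");
const withArgs = opCall("propose_theme", "{ ...same args, confirm: true }");

console.log(noArgs);   // roadmap({ op: "get_agents_md" })
console.log(withArgs); // roadmap({ op: "propose_theme", args: { ...same args, confirm: true } })
```

The result is a display string for the agent to read, not evaluated code, and `argsHint` is inserted verbatim.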
1564
+
1482
1565
  /**
1483
1566
  * Build the structured "prereq missing" result the mutators return
1484
1567
  * when the agent hasn't fetched the rubric this session. The shape
@@ -1495,10 +1578,10 @@ function rubricMissingResult(toolName) {
1495
1578
  {
1496
1579
  error: "prerequisite_missing",
1497
1580
  message:
1498
- `Call get_agents_md first this session, then retry ${toolName}. ` +
1581
+ `Call ${opCall("get_agents_md")} first this session, then retry your ${toolName} call. ` +
1499
1582
  "The rubric defines acceptance criteria shape and grading dimensions — " +
1500
1583
  "proposals filed without it will not round-trip.",
1501
- fix: "get_agents_md()",
1584
+ fix: opCall("get_agents_md"),
1502
1585
  },
1503
1586
  null,
1504
1587
  2
@@ -1523,7 +1606,7 @@ function discoveryMissingResult(toolName, fixCall, rationale) {
1523
1606
  {
1524
1607
  error: "discovery_missing",
1525
1608
  message:
1526
- `Call ${fixCall} first this session, then retry ${toolName}. ${rationale}`,
1609
+ `Call ${fixCall} first this session, then retry your ${toolName} call. ${rationale}`,
1527
1610
  fix: fixCall,
1528
1611
  },
1529
1612
  null,
@@ -1566,12 +1649,12 @@ function repoUnmappedResult(toolName, slug, envWsId) {
1566
1649
  error: "repo_unmapped",
1567
1650
  message:
1568
1651
  `"${slug}" isn't mapped to a workspace, so ${toolName} would land on the install-default workspace "${envWsId}" — probably not what you want. ` +
1569
- `Map it once with link_repo (this repo → your key's workspace, resolves silently forever after), then retry ${toolName}. ` +
1570
- `Or, if you meant a specific existing workspace, pass workspaceId on the call and it proceeds without mapping the repo.`,
1652
+ `Map it once with ${opCall("link_repo")} (this repo → your key's workspace, resolves silently forever after), then retry your ${toolName} call. ` +
1653
+ `Or, if you meant a specific existing workspace, pass workspaceId in the op's args and it proceeds without mapping the repo.`,
1571
1654
  repo: slug,
1572
1655
  envDefaultWorkspace: envWsId,
1573
- fix: "link_repo()",
1574
- alt: `${toolName}({ workspaceId: "<target>", ... })`,
1656
+ fix: opCall("link_repo"),
1657
+ alt: opCall(toolName, '{ workspaceId: "<target>", ... }'),
1575
1658
  },
1576
1659
  null,
1577
1660
  2
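The gate results in the hunks above are JSON objects carrying `error`, `message`, and `fix`. Assuming a client sees that JSON as the text of the tool result (the result envelope itself is not shown in this diff), a caller could pull out the suggested next call roughly like this. `nextCallFrom` and `GATE_ERRORS` are hypothetical names, and the sample message is abbreviated.

```javascript
// Error codes that appear in the hunks above.
const GATE_ERRORS = new Set([
  "prerequisite_missing",
  "discovery_missing",
  "repo_unmapped",
]);

// Hypothetical client-side helper: returns the `fix` string when the result
// text is one of the structured gate errors, otherwise null.
function nextCallFrom(resultText) {
  let body;
  try {
    body = JSON.parse(resultText);
  } catch {
    return null; // not JSON, so not a structured gate error
  }
  if (!body || typeof body !== "object" || !GATE_ERRORS.has(body.error)) {
    return null;
  }
  return typeof body.fix === "string" ? body.fix : null;
}

// Sample shaped like rubricMissingResult's payload (message shortened).
const sampleResult = JSON.stringify(
  {
    error: "prerequisite_missing",
    message: "Call roadmap({ op: \"get_agents_md\" }) first this session, then retry your propose_task call.",
    fix: 'roadmap({ op: "get_agents_md" })',
  },
  null,
  2
);

console.log(nextCallFrom(sampleResult)); // roadmap({ op: "get_agents_md" })
console.log(nextCallFrom("not json"));   // null
```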
@@ -2009,7 +2092,7 @@ const TOOLS = [
2009
2092
  outcome: { type: "string" },
2010
2093
  reach: { type: "number" },
2011
2094
  impact: { type: "number", enum: [3, 2, 1, 0.5, 0.25] },
2012
- confidence: { type: "number", minimum: 0, maximum: 100 },
2095
+ confidence: { type: "number", minimum: 0, maximum: 95 },
2013
2096
  roi: { type: "number", description: "Estimated annual ROI in raw dollars (e.g. 2500000 = $2.5M)." },
2014
2097
  specRef: { type: "string" },
2015
2098
  idempotencyKey: { type: "string" },
@@ -2040,7 +2123,11 @@ const TOOLS = [
2040
2123
  properties: {
2041
2124
  index: { type: "integer", minimum: 0 },
2042
2125
  status: { type: "string", enum: ["pass", "fail"] },
2043
- note: { type: "string" },
2126
+ note: {
2127
+ type: "string",
2128
+ description:
2129
+ "Required when status=fail — the failure mode the reviewer needs. Call this before opening the PR.",
2130
+ },
2044
2131
  },
2045
2132
  required: ["index", "status"],
2046
2133
  additionalProperties: false,
@@ -2582,7 +2669,206 @@ const MUTATOR_TOOLS = new Set([
2582
2669
  "record_outcome_reading",
2583
2670
  ]);
2584
2671
 
2672
+ // --- Dispatch surface -------------------------------------------------
2673
+ // Token-efficiency collapse: instead of advertising all 34 tools (their
2674
+ // inputSchemas alone are ~5k tokens in every tools/list), the wire surface
2675
+ // is three dispatch tools. The 34 operations are routed by name through
2676
+ // callTool (see the roadmap/roadmap_search/roadmap_describe early returns),
2677
+ // and their schemas are served on demand via roadmap_describe. This keeps
2678
+ // tools/list tiny while every operation, gate, and validator is unchanged.
2679
+ const OP_NAMES = new Set(TOOLS.map((t) => t.name));
2680
+ const DISPATCH_TOOLS = new Set(["roadmap", "roadmap_search", "roadmap_describe"]);
2681
+
2682
+ const META_TOOLS = [
2683
+ {
2684
+ name: "roadmap_search",
2685
+ description:
2686
+ "Find the right roadmap operation for what you want to do. Returns operation names with one-line summaries, ranked by your intent (or all of them if you omit intent). Then call roadmap_describe(op) for an op's arguments and roadmap({op, args}) to run it.",
2687
+ inputSchema: {
2688
+ type: "object",
2689
+ properties: {
2690
+ intent: {
2691
+ type: "string",
2692
+ description:
2693
+ "Free-text description of the task, e.g. 'file a new bet' or 'mark a task done'. Omit to list every operation.",
2694
+ },
2695
+ },
2696
+ additionalProperties: false,
2697
+ },
2698
+ },
2699
+ {
2700
+ name: "roadmap_describe",
2701
+ description:
2702
+ "Return the input schema and summary for one roadmap operation (op, e.g. 'propose_task'). Call before roadmap({op, args}) when you need the exact argument shape; this is the same schema the operation validates against.",
2703
+ inputSchema: {
2704
+ type: "object",
2705
+ properties: {
2706
+ op: { type: "string", description: "Operation name to describe, e.g. propose_task." },
2707
+ },
2708
+ required: ["op"],
2709
+ additionalProperties: false,
2710
+ },
2711
+ },
2712
+ {
2713
+ name: "roadmap",
2714
+ description:
2715
+ "Execute any roadmap operation: roadmap({ op, args }). op is an operation name such as get_roadmap_snapshot, list_capabilities, suggest_capability_for, propose_task, or update_capability — discover them with roadmap_search, get their arguments with roadmap_describe. All reads, planning, and writes go through here; the server enforces the rubric/discovery gates and per-op validation. See the server instructions for the planning contract.",
2716
+ inputSchema: {
2717
+ type: "object",
2718
+ properties: {
2719
+ op: {
2720
+ type: "string",
2721
+ description:
2722
+ "Operation name, e.g. get_roadmap_snapshot, suggest_capability_for, propose_task. Call roadmap_search to discover ops.",
2723
+ },
2724
+ args: {
2725
+ type: "object",
2726
+ description:
2727
+ "Arguments for the op (see roadmap_describe(op)). Omit for ops that take none.",
2728
+ additionalProperties: true,
2729
+ },
2730
+ },
2731
+ required: ["op"],
2732
+ additionalProperties: false,
2733
+ },
2734
+ },
2735
+ ];
2736
+
2737
+ // roadmap_search: rank the 34 ops by token overlap with the intent and
2738
+ // return {op, summary} rows. Summaries are the trimmed first line, run
2739
+ // through the same label substitution tool descriptions get, so custom
2740
+ // workspace labels (theme -> initiative) stay consistent.
2741
+ function roadmapSearchResult(intent) {
2742
+ const labels = currentLabels();
2743
+ const ops = TOOLS.map((t) => ({
2744
+ op: t.name,
2745
+ summary: tplDescription(summaryOf(t.description), labels),
2746
+ }));
2747
+ const q = (intent || "").toLowerCase().trim();
2748
+ let operations = ops;
2749
+ if (q) {
2750
+ const terms = q.split(/[^a-z0-9]+/).filter((w) => w.length > 2);
2751
+ if (terms.length) {
2752
+ const score = (o) => {
2753
+ const hay = (o.op + " " + o.summary).toLowerCase();
2754
+ return terms.reduce((n, w) => n + (hay.includes(w) ? 1 : 0), 0);
2755
+ };
2756
+ operations = ops
2757
+ .map((o) => ({ o, s: score(o) }))
2758
+ .sort((a, b) => b.s - a.s)
2759
+ .map((x) => x.o);
2760
+ }
2761
+ }
2762
+ return textResult(
2763
+ JSON.stringify(
2764
+ {
2765
+ intent: intent || null,
2766
+ note: "Call roadmap_describe({ op }) for an op's arguments, then roadmap({ op, args }) to run it.",
2767
+ total: operations.length,
2768
+ operations,
2769
+ },
2770
+ null,
2771
+ 2
2772
+ )
2773
+ );
2774
+ }
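The ranking in `roadmapSearchResult` can be tried in isolation. The sketch below lifts the scoring logic from the function above into a standalone `rankOps` (a hypothetical name) and runs it on three ops; the `list_themes` summary echoes the README wording, while the other two summaries are invented for the example.

```javascript
// Standalone version of the scoring used in roadmapSearchResult above.
function rankOps(ops, intent) {
  const q = (intent || "").toLowerCase().trim();
  // Split on non-alphanumerics and ignore words of one or two characters.
  const terms = q.split(/[^a-z0-9]+/).filter((w) => w.length > 2);
  if (!terms.length) return ops; // no usable terms: keep catalogue order
  const score = (o) => {
    const hay = (o.op + " " + o.summary).toLowerCase();
    // One point per intent term found as a substring of "op summary".
    return terms.reduce((n, w) => n + (hay.includes(w) ? 1 : 0), 0);
  };
  return ops
    .map((o) => ({ o, s: score(o) }))
    .sort((a, b) => b.s - a.s)
    .map((x) => x.o);
}

const sampleOps = [
  { op: "list_themes", summary: "Top-level strategic themes" },
  { op: "propose_task", summary: "File a new task under a capability" }, // invented summary
  { op: "link_pr", summary: "Attach a merged PR to its task" },          // invented summary
];

const ranked = rankOps(sampleOps, "file a new task").map((o) => o.op);
console.log(ranked); // [ 'propose_task', 'link_pr', 'list_themes' ]
```

Two properties follow from the code: matching is by substring, so `task` also hits `propose_tasks`, and zero-score ops are ranked last rather than dropped, which is why `total` always equals the full operation count.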
2775
+
2776
+ // roadmap_describe: serve one op's inputSchema (the bulk that was evicted
2777
+ // from tools/list) plus its trimmed summary, on demand.
2778
+ function roadmapDescribeResult(op) {
2779
+ if (typeof op !== "string" || !op) {
2780
+ return errorResult(
2781
+ "roadmap_describe requires an 'op' string, e.g. roadmap_describe({ op: 'propose_task' })."
2782
+ );
2783
+ }
2784
+ const t = TOOLS.find((x) => x.name === op);
2785
+ if (!t) {
2786
+ return errorResult(
2787
+ `Unknown op '${op}'. Call roadmap_search to list operations.`
2788
+ );
2789
+ }
2790
+ return textResult(
2791
+ JSON.stringify(
2792
+ {
2793
+ op: t.name,
2794
+ summary: tplDescription(summaryOf(t.description), currentLabels()),
2795
+ inputSchema: t.inputSchema,
2796
+ },
2797
+ null,
2798
+ 2
2799
+ )
2800
+ );
2801
+ }
2802
+
2585
2803
  async function callTool(name, args) {
2804
+ // Dispatch surface. roadmap_search / roadmap_describe answer here without
2805
+ // touching the workspace. roadmap({op, args}) re-enters callTool(op, args)
2806
+ // so the operation runs through the IDENTICAL pipeline below — workspace
2807
+ // resolution, the MUTATOR_TOOLS gates, validators, session-flag side
2808
+ // effects — keyed off the real op name, with nothing duplicated. The ops
2809
+ // also stay directly callable (back-compat + what the selftest drives).
2810
+ if (name === "roadmap_search") {
2811
+ return roadmapSearchResult(typeof args?.intent === "string" ? args.intent : "");
2812
+ }
2813
+ if (name === "roadmap_describe") {
2814
+ return roadmapDescribeResult(args?.op);
2815
+ }
2816
+ if (name === "roadmap") {
2817
+ const op = args?.op;
2818
+ if (typeof op !== "string" || !op) {
2819
+ return errorResult(
2820
+ "roadmap requires an 'op', e.g. roadmap({ op: 'get_roadmap_snapshot' }). Call roadmap_search to discover operations."
2821
+ );
2822
+ }
2823
+ if (DISPATCH_TOOLS.has(op)) {
2824
+ return errorResult(
2825
+ `'${op}' is a dispatch tool, not an operation. Pass a real op such as get_roadmap_snapshot — call roadmap_search to list them.`
2826
+ );
2827
+ }
2828
+ if (!OP_NAMES.has(op)) {
2829
+ return errorResult(
2830
+ `Unknown op '${op}'. Call roadmap_search to list operations, or roadmap_describe({ op }) for one.`
2831
+ );
2832
+ }
2833
+ // Accept BOTH the documented nested shape { op, args: {...} } AND the flat
2834
+ // shape { op, ...fields } that LLM clients routinely emit when they hoist
2835
+ // scalar arguments to the top level. Flat siblings fill in keys the nested
2836
+ // object omits; on conflict the nested (documented) shape wins. This means
2837
+ // a top-level workspaceId/dryRun is never silently dropped — which would
2838
+ // mis-target a write or turn a validate-only call into a real one. No op
2839
+ // uses 'op'/'args' as an argument name, so stripping them here is safe.
2840
+ const { op: _op, args: nested, ...flat } = args ?? {};
2841
+ let inner;
2842
+ if (nested == null) {
2843
+ inner = flat; // flat shape (or no args at all)
2844
+ } else if (typeof nested === "object" && !Array.isArray(nested)) {
2845
+ inner = { ...flat, ...nested };
2846
+ } else if (typeof nested === "string") {
2847
+ // Some clients JSON-encode the args object into a string. Parse and
2848
+ // merge when it's an object; otherwise surface the real cause rather
2849
+ // than silently dropping it (which produced a misleading downstream
2850
+ // "X is required" from the inner op).
2851
+ let parsed;
2852
+ try {
2853
+ parsed = JSON.parse(nested);
2854
+ } catch {
2855
+ parsed = undefined;
2856
+ }
2857
+ if (parsed && typeof parsed === "object" && !Array.isArray(parsed)) {
2858
+ inner = { ...flat, ...parsed };
2859
+ } else {
2860
+ return errorResult(
2861
+ `roadmap 'args' must be an object — got a string that isn't a JSON object. Call as roadmap({ op: "${op}", args: { ... } }) (or hoist the fields to the top level).`
2862
+ );
2863
+ }
2864
+ } else {
2865
+ return errorResult(
2866
+ `roadmap 'args' must be an object — got ${Array.isArray(nested) ? "an array" : typeof nested}. Call as roadmap({ op: "${op}", args: { ... } }).`
2867
+ );
2868
+ }
2869
+ return callTool(op, inner);
2870
+ }
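The nested/flat merge rules in the `roadmap` branch above are easier to see as a pure function. This sketch mirrors that branch's logic without the recursive `callTool` step; `normalizeDispatchArgs` is a hypothetical name, the error string is simplified, and the argument values are made up.

```javascript
// Mirrors the args handling in the roadmap branch above: flat siblings fill
// gaps, the nested `args` object wins on conflict, JSON-encoded strings are
// parsed, and anything else is rejected.
function normalizeDispatchArgs(args) {
  const { op, args: nested, ...flat } = args ?? {};
  if (nested == null) {
    return { op, inner: flat }; // flat shape, or no arguments at all
  }
  if (typeof nested === "object" && !Array.isArray(nested)) {
    return { op, inner: { ...flat, ...nested } };
  }
  if (typeof nested === "string") {
    let parsed;
    try {
      parsed = JSON.parse(nested);
    } catch {
      parsed = undefined;
    }
    if (parsed && typeof parsed === "object" && !Array.isArray(parsed)) {
      return { op, inner: { ...flat, ...parsed } };
    }
  }
  return { op, error: "args must be an object" }; // simplified message
}

const flatCall = normalizeDispatchArgs({ op: "propose_task", dryRun: true, title: "Add retry logic" });
const conflict = normalizeDispatchArgs({ op: "propose_task", dryRun: true, args: { dryRun: false } });
const encoded = normalizeDispatchArgs({ op: "propose_task", args: '{"title":"Add retry logic"}' });
const rejected = normalizeDispatchArgs({ op: "propose_task", args: [1, 2] });

console.log(flatCall.inner, conflict.inner, encoded.inner, rejected.error);
```

Merging flat fields instead of discarding them is what keeps a hoisted `dryRun` or `workspaceId` from being silently lost, which is the failure the source comment calls out.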
2871
+
2586
2872
  // Each tool may override the workspace via args.workspaceId. The
2587
2873
  // projection is workspace-scoped, so we pass that through to the
2588
2874
  // read. Tools that need to know the resolved id later (write paths,
@@ -2881,8 +3167,8 @@ async function callTool(name, args) {
2881
3167
  );
2882
3168
  return discoveryMissingResult(
2883
3169
  name,
2884
- 'suggest_theme_for({ description: "<the work you are about to propose>" })',
2885
- "Rank existing themes by relevance before proposing a new one — themes are years-stable, duplicates are the most common failure mode. Any returned top score >0.4 means an existing theme is a sensible home; re-use it. list_themes() or get_roadmap_snapshot() also satisfy this gate if you want the full catalogue."
3170
+ opCall("suggest_theme_for", '{ description: "<the work you are about to propose>" }'),
3171
+ "Rank existing themes by relevance before proposing a new one — themes are years-stable, duplicates are the most common failure mode. Any returned top score >0.4 means an existing theme is a sensible home; re-use it. The list_themes or get_roadmap_snapshot ops also satisfy this gate if you want the full catalogue."
2886
3172
  );
2887
3173
  }
2888
3174
  if (
@@ -2901,7 +3187,7 @@ async function callTool(name, args) {
2901
3187
  );
2902
3188
  return discoveryMissingResult(
2903
3189
  name,
2904
- 'suggest_capability_for({ description: "<the work you are about to propose>" })',
3190
+ opCall("suggest_capability_for", '{ description: "<the work you are about to propose>" }'),
2905
3191
  "Rank existing capabilities by relevance before proposing a new one. If any score is >0.4, attach tasks there instead."
2906
3192
  );
2907
3193
  }
@@ -3055,7 +3341,7 @@ async function callTool(name, args) {
3055
3341
  _meta: {
3056
3342
  roadmapper: {
3057
3343
  reminder:
3058
- "Rubric loaded. You can now safely call propose_task, propose_capability, propose_theme, submit_acceptance_grades, link_pr.",
3344
+ 'Rubric loaded. You can now safely run the write ops via roadmap({ op, args }) — e.g. roadmap({ op: "propose_task", args: {...} }), propose_capability, propose_theme, submit_acceptance_grades, link_pr.',
3059
3345
  },
3060
3346
  },
3061
3347
  });
@@ -3234,12 +3520,12 @@ async function proposeTask(args, projected, wsId) {
3234
3520
  if (best && best.score > 0.2 && best.score > chosenScore + 0.1) {
3235
3521
  return (
3236
3522
  base +
3237
- `The task text fits ${best.id} (${best.name}) noticeably better (score ${best.score.toFixed(2)}) than the chosen ${cap.id} (${chosenScore.toFixed(2)}). If that's the right home, move_task it there.`
3523
+ `The task text fits ${best.id} (${best.name}) noticeably better (score ${best.score.toFixed(2)}) than the chosen ${cap.id} (${chosenScore.toFixed(2)}). If that's the right home, move it there with ${opCall("move_task")}.`
3238
3524
  );
3239
3525
  }
3240
3526
  return (
3241
3527
  base +
3242
- "If you're confident in the parent, ignore this; otherwise call suggest_capability_for({ taskId }) to confirm."
3528
+ `If you're confident in the parent, ignore this; otherwise call ${opCall("suggest_capability_for", "{ taskId }")} to confirm.`
3243
3529
  );
3244
3530
  }
3245
3531
 
@@ -3523,11 +3809,11 @@ async function proposeTheme(args, projected, wsId) {
3523
3809
  `"${name}" overlaps the existing theme ${nearest.id} (${nearest.name}) ` +
3524
3810
  `at ${nearestScore.toFixed(2)} (block bar ${THEME_SPRAWL_BLOCK}). Themes are the ` +
3525
3811
  "small, years-stable top tier — a near-duplicate fragments the strategic view. " +
3526
- "Reuse it: file your work as a capability under it (propose_capability with " +
3527
- `pillarId: "${nearest.id}"), or broaden its scope with update_theme. If this is ` +
3812
+ "Reuse it: file your work as a capability under it (the propose_capability op with " +
3813
+ `pillarId: "${nearest.id}"), or broaden its scope with the update_theme op. If this is ` +
3528
3814
  "genuinely a distinct strategic pillar, retry with force:true.",
3529
3815
  nearestTheme: { id: nearest.id, name: nearest.name, score: Number(nearestScore.toFixed(3)) },
3530
- fix: `propose_capability({ pillarId: "${nearest.id}", ... })`,
3816
+ fix: opCall("propose_capability", `{ pillarId: "${nearest.id}", ... }`),
3531
3817
  },
3532
3818
  null,
3533
3819
  2
@@ -3555,7 +3841,7 @@ async function proposeTheme(args, projected, wsId) {
3555
3841
  ...(nearest
3556
3842
  ? { closestExisting: { id: nearest.id, name: nearest.name, score: Number(nearestScore.toFixed(3)) } }
3557
3843
  : {}),
3558
- fix: "propose_theme({ ...same args, confirm: true })",
3844
+ fix: opCall("propose_theme", "{ ...same args, confirm: true }"),
3559
3845
  },
3560
3846
  null,
3561
3847
  2
@@ -3631,7 +3917,7 @@ async function proposeCapability(args, projected, wsId) {
3631
3917
  const theme = projected.themes.find((t) => t.id === pillarId);
3632
3918
  if (!theme) {
3633
3919
  return errorResult(
3634
- `pillarId ${pillarId} doesn't match any known theme. Call list_themes first.`
3920
+ `pillarId ${pillarId} doesn't match any known theme. Run ${opCall("list_themes")} first.`
3635
3921
  );
3636
3922
  }
3637
3923
  if (typeof args.impact === "number" && !VALID_IMPACTS.has(args.impact)) {
@@ -3871,7 +4157,7 @@ function suggestCapabilityFor(args, projected) {
3871
4157
  roadmapper: {
3872
4158
  reminder:
3873
4159
  ranked.length === 0
3874
- ? "No existing capability is a sensible parent. Before calling propose_capability, verify with the user that a brand-new capability is warranted — capabilities are quarterly bets, not single tasks."
4160
+ ? `No existing capability is a sensible parent. Before ${opCall("propose_capability")}, verify with the user that a brand-new capability is warranted — capabilities are quarterly bets, not single tasks.`
3875
4161
  : "No strong match (top score < 0.4). If none of the listed capabilities fit, ask the user before calling propose_capability — the top match is often closer than it scores.",
3876
4162
  },
3877
4163
  },
@@ -3947,7 +4233,7 @@ function suggestThemeFor(args, projected) {
3947
4233
  roadmapper: {
3948
4234
  reminder: autonomy
3949
4235
  ? ranked.length === 0
3950
- ? "No existing theme overlaps. Theme-autonomy is ON, so you may call propose_theme directly if this is a genuinely new strategic pillar — the server will refuse it only if it's a near-duplicate of an existing theme."
4236
+ ? `No existing theme overlaps. Theme-autonomy is ON, so you may run ${opCall("propose_theme")} directly if this is a genuinely new strategic pillar — the server will refuse it only if it's a near-duplicate of an existing theme.`
3951
4237
  : "No strong match (top score < 0.4). Prefer the closest existing theme if it fits; otherwise propose_theme is fine (autonomy is ON, sprawl is guarded server-side)."
3952
4238
  : ranked.length === 0
3953
4239
  ? "No existing theme overlaps. Theme-autonomy is OFF for this workspace — verify with the user that this is a genuinely new strategic direction before propose_theme, and pass confirm:true."
@@ -4616,7 +4902,7 @@ function detectCapabilityGaps(args, projected) {
4616
4902
  roadmapper: {
4617
4903
  reminder:
4618
4904
  `${shaped.length} capability gap(s) detected — clusters of uncategorized work no existing bet covers. ` +
4619
- "Each is a CANDIDATE for propose_capability (confirm with the user — capabilities are quarterly bets, not auto-created), then move_tasks the members under it.",
4905
+ `Each is a CANDIDATE for ${opCall("propose_capability")} (confirm with the user — capabilities are quarterly bets, not auto-created), then ${opCall("move_tasks")} the members under it.`,
4620
4906
  },
4621
4907
  },
4622
4908
  }
@@ -4797,7 +5083,7 @@ function buildReminder(toolName, projected) {
4797
5083
  toolName === "list_themes")
4798
5084
  ) {
4799
5085
  reminders.push(
4800
- "Call get_agents_md before any propose_* / submit_acceptance_grades / link_pr call; those tools refuse without it."
5086
+ `Call ${opCall("get_agents_md")} before any write op (propose_* / submit_acceptance_grades / link_pr); they refuse without it.`
4801
5087
  );
4802
5088
  }
4803
5089
  // Tasks with merged PRs but no acceptance grades = ungraded
@@ -4816,7 +5102,7 @@ function buildReminder(toolName, projected) {
4816
5102
  reminders.push(
4817
5103
  `${ungraded.length} delivered task${ungraded.length === 1 ? "" : "s"} ` +
4818
5104
  `have merged PRs without submitted acceptance grades. ` +
4819
- `Call submit_acceptance_grades for: ${ids}${more}.`
5105
+ `Call ${opCall("submit_acceptance_grades")} for: ${ids}${more}.`
4820
5106
  );
4821
5107
  }
4822
5108
  }
@@ -4863,14 +5149,14 @@ const RESOURCES = [
4863
5149
  uri: "roadmapper://capabilities/active",
4864
5150
  name: "Active capabilities (snapshot)",
4865
5151
  description:
4866
- "Live list of non-delivered capabilities for the env-default workspace. Read this before propose_task or propose_capability to find the right parent. Note: MCP resources don't accept arguments, so this always reads SUPABASE_WORKSPACE_ID's workspace — use list_capabilities({ workspaceId }) for cross-workspace reads.",
5152
+ `Live list of non-delivered capabilities for the env-default workspace. Read this before proposing tasks or capabilities to find the right parent. Note: MCP resources don't accept arguments, so this always reads SUPABASE_WORKSPACE_ID's workspace — use roadmap({ op: "list_capabilities", args: { workspaceId } }) for cross-workspace reads.`,
4867
5153
  mimeType: "application/json",
4868
5154
  },
4869
5155
  {
4870
5156
  uri: "roadmapper://tasks/open",
4871
5157
  name: "Open tasks (snapshot)",
4872
5158
  description:
4873
- "Live list of in_progress + planned tasks for the env-default workspace. Same workspaceId caveat as roadmapper://capabilities/active — use list_tasks({ workspaceId }) for cross-workspace reads.",
5159
+ `Live list of in_progress + planned tasks for the env-default workspace. Same workspaceId caveat as roadmapper://capabilities/active — use roadmap({ op: "list_tasks", args: { workspaceId } }) for cross-workspace reads.`,
4874
5160
  mimeType: "application/json",
4875
5161
  },
4876
5162
  ];
@@ -5001,33 +5287,33 @@ function renderPrompt(name, args) {
5001
5287
  case "plan-feature":
5002
5288
  return (
5003
5289
  `Plan a feature: "${args.description ?? "(no description provided)"}"\n\n` +
5004
- "Follow this flow exactly:\n" +
5005
- "1. Call get_agents_md (or read roadmapper://rubric) to load the rubric for this session.\n" +
5006
- "2. Call suggest_capability_for with the description above. Read every returned candidate's outcome before deciding.\n" +
5007
- "3. If a returned candidate scores > 0.4 OR its outcome maps to what we're building, propose tasks under it via propose_task. Each task MUST include acceptance criteria per the rubric.\n" +
5008
- "4. If nothing fits, STOP and ask the user before calling propose_capability — capabilities are quarterly bets, not single tasks.\n" +
5290
+ "Every operation runs through one tool: roadmap({ op, args }). Follow this flow exactly:\n" +
5291
+ '1. roadmap({ op: "get_agents_md" }) (or read the roadmapper://rubric resource) to load the rubric for this session.\n' +
5292
+ '2. roadmap({ op: "suggest_capability_for", args: { description } }) with the description above. Read every returned candidate\'s outcome before deciding.\n' +
5293
+ '3. If a returned candidate scores > 0.4 OR its outcome maps to what we\'re building, propose tasks under it via roadmap({ op: "propose_tasks", args: { capabilityId, tasks: [...] } }). Each task MUST include acceptance criteria per the rubric.\n' +
5294
+ '4. If nothing fits, STOP and ask the user before roadmap({ op: "propose_capability", args }) — capabilities are quarterly bets, not single tasks.\n' +
5009
5295
  "5. After tasks are proposed, summarize: capabilityId chosen, task ids created, anything skipped and why."
5010
5296
  );
5011
5297
  case "close-task":
5012
5298
  return (
5013
5299
  `Close task ${args.task_id ?? "(missing task_id)"}.\n\n` +
5014
- "Follow this flow exactly:\n" +
5015
- "1. Call get_agents_md (or read roadmapper://rubric) to load grading dimensions.\n" +
5016
- `2. Call get_task({ id: "${args.task_id ?? ""}" }) and read every acceptance criterion.\n` +
5300
+ "Every operation runs through one tool: roadmap({ op, args }). Follow this flow exactly:\n" +
5301
+ '1. roadmap({ op: "get_agents_md" }) (or read the roadmapper://rubric resource) to load grading dimensions.\n' +
5302
+ `2. roadmap({ op: "get_task", args: { id: "${args.task_id ?? ""}" } }) and read every acceptance criterion.\n` +
5017
5303
  "3. For each criterion, decide pass/fail. Fabricated passes destroy this signal — only mark pass if you verified.\n" +
5018
- "4. Call submit_acceptance_grades with the per-index results. Include a note on any fail.\n" +
5304
+ '4. roadmap({ op: "submit_acceptance_grades", args: { taskId, grades } }) with the per-index results. Include a note on any fail.\n' +
5019
5305
  (args.pr_url
5020
- ? `5. Call link_pr to attach ${args.pr_url} to the task.\n`
5021
- : "5. If you opened a PR, call link_pr to attach it.\n") +
5306
+ ? `5. roadmap({ op: "link_pr", args: {...} }) to attach ${args.pr_url} to the task.\n`
5307
+ : '5. If you opened a PR, roadmap({ op: "link_pr", args: {...} }) to attach it.\n') +
5022
5308
  "6. Stamp Roadmapper-Task: " +
5023
5309
  (args.task_id ?? "TK-NNNNNN") +
5024
5310
  " in the PR body so the webhook routes future events back here."
5025
5311
  );
5026
5312
  case "weekly-review":
5027
5313
  return (
5028
- "Run a structured roadmap review.\n\n" +
5029
- "1. Call get_agents_md to load the rubric (or confirm rubric is current).\n" +
5030
- "2. Call get_roadmap_snapshot for the canonical model. Note any _meta reminders in the response.\n" +
5314
+ "Run a structured roadmap review. Every operation runs through one tool: roadmap({ op, args }).\n\n" +
5315
+ '1. roadmap({ op: "get_agents_md" }) to load the rubric (or confirm rubric is current).\n' +
5316
+ '2. roadmap({ op: "get_roadmap_snapshot" }) for the canonical model. Note any _meta reminders in the response.\n' +
5031
5317
  "3. For each active capability, scan: are open tasks aging? Are any without acceptance criteria? Are there delivered tasks without acceptance grades?\n" +
5032
5318
  "4. List capabilities whose outcomes are no longer falsifiable or whose tasks all delivered (close them or pivot).\n" +
5033
5319
  "5. Report: ungraded deliveries, stale capabilities, capabilities ready to close, suggested next bets."
@@ -5070,6 +5356,36 @@ async function handle(request) {
5070
5356
  // boundary for "you need to fetch the rubric again."
5071
5357
  resetSession();
5072
5358
  recordTelemetry("session_initialized", { stats });
5359
+ // Build the server instructions once. A dynamic preamble (resolved
5360
+ // workspace + where it came from + live counts, so the agent can
5361
+ // trust where its writes land instead of discovering an empty/wrong
5362
+ // workspace later) followed by the static CORE planning contract.
5363
+ // Surfaced at the TOP LEVEL of the result — the MCP-spec
5364
+ // `instructions` channel that compliant clients (Claude Code,
5365
+ // Cursor) inject into context. It previously lived only inside
5366
+ // serverInfo, where the spec doesn't define it, so spec-reading
5367
+ // clients silently dropped it. The gate/suggest reminders that used
5368
+ // to sit here are now folded into CORE_CONTRACT's workflow section.
5369
+ const instructions = (() => {
5370
+ const { id: ws, source } = resolveWorkspaceWithSource();
5371
+ const wsLine = ws
5372
+ ? `Workspace: ${ws} (resolved from ${source}). `
5373
+ : "No workspace resolved yet. ";
5374
+ const rootsLine = _clientSupportsRoots
5375
+ ? "Detecting the repo you're in to pick its workspace; call get_active_workspace before your first write to confirm. "
5376
+ : ws
5377
+ ? ""
5378
+ : "Set ROADMAPPER_WORKSPACE_ID or open a connected repo. ";
5379
+ const preamble =
5380
+ "Roadmapper online — " +
5381
+ wsLine +
5382
+ `${stats.themes} theme${stats.themes === 1 ? "" : "s"}, ` +
5383
+ `${stats.capabilities} capabilit${stats.capabilities === 1 ? "y" : "ies"}, ` +
5384
+ `${stats.openTasks} open task${stats.openTasks === 1 ? "" : "s"}. ` +
5385
+ rootsLine +
5386
+ "Slash-prompts available: roadmapper:plan-feature, roadmapper:close-task, roadmapper:weekly-review.";
5387
+ return preamble + "\n\n" + CORE_CONTRACT;
5388
+ })();
5073
5389
  return {
5074
5390
  jsonrpc: "2.0",
5075
5391
  id,
@@ -5083,38 +5399,14 @@ async function handle(request) {
5083
5399
  resources: { listChanged: false },
5084
5400
  prompts: { listChanged: false },
5085
5401
  },
5402
+ // Top-level instructions: the spec-defined channel. serverInfo
5403
+ // keeps only name/version/stats (stats is a non-standard extra
5404
+ // some clients surface as "server info").
5405
+ instructions,
5086
5406
  serverInfo: {
5087
5407
  name: SERVER_NAME,
5088
5408
  version: SERVER_VERSION,
5089
5409
  stats,
5090
- instructions: (() => {
5091
- // Name the workspace we resolve to RIGHT NOW + where it came
5092
- // from, so the agent can trust where its writes land instead
5093
- // of discovering an empty/wrong workspace later. Repo-based
5094
- // resolution (roots → repo_workspace_map) finishes just after
5095
- // this handshake, so if the client supports roots we say the
5096
- // target may refine and to confirm via get_active_workspace.
5097
- const { id: ws, source } = resolveWorkspaceWithSource();
5098
- const wsLine = ws
5099
- ? `Workspace: ${ws} (resolved from ${source}). `
5100
- : "No workspace resolved yet. ";
5101
- const rootsLine = _clientSupportsRoots
5102
- ? "Detecting the repo you're in to pick its workspace; call get_active_workspace before your first write to confirm. "
5103
- : ws
5104
- ? ""
5105
- : "Set ROADMAPPER_WORKSPACE_ID or open a connected repo. ";
5106
- return (
5107
- "Roadmapper online — " +
5108
- wsLine +
5109
- `${stats.themes} theme${stats.themes === 1 ? "" : "s"}, ` +
5110
- `${stats.capabilities} capabilit${stats.capabilities === 1 ? "y" : "ies"}, ` +
5111
- `${stats.openTasks} open task${stats.openTasks === 1 ? "" : "s"}. ` +
5112
- rootsLine +
5113
- "Call get_agents_md before planning — the propose_* and submit_acceptance_grades tools refuse without it. " +
5114
- "Use suggest_capability_for before propose_capability. " +
5115
- "Slash-prompts available: roadmapper:plan-feature, roadmapper:close-task, roadmapper:weekly-review."
5116
- );
5117
- })(),
5118
5410
  },
5119
5411
  },
5120
5412
  };
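The change above moves `instructions` from inside `serverInfo` to the top level of the `initialize` result. A minimal sketch of the resulting shape, assuming a placeholder contract string and placeholder server name/version (none of these values come from the package):

```javascript
// Sketch only: CORE_CONTRACT here is placeholder text, not the package's real contract.
const CORE_CONTRACT = "CORE CONTRACT (placeholder text)";

// Pluralized counts, following the template strings shown in the diff.
function countsLine(stats) {
  return (
    `${stats.themes} theme${stats.themes === 1 ? "" : "s"}, ` +
    `${stats.capabilities} capabilit${stats.capabilities === 1 ? "y" : "ies"}, ` +
    `${stats.openTasks} open task${stats.openTasks === 1 ? "" : "s"}. `
  );
}

// Hypothetical helper: builds an initialize response with top-level instructions.
function initializeResult(id, stats, workspaceLine) {
  const instructions = workspaceLine + countsLine(stats) + "\n\n" + CORE_CONTRACT;
  return {
    jsonrpc: "2.0",
    id,
    result: {
      instructions, // top level of result, not nested in serverInfo
      serverInfo: { name: "example-server", version: "0.0.0", stats },
    },
  };
}

const demo = initializeResult(1, { themes: 1, capabilities: 2, openTasks: 1 }, "No workspace resolved yet. ");
console.log(demo.result.instructions);
```

Keeping `serverInfo` down to name, version and stats mirrors the diff's comment that clients reading only the spec-defined fields dropped the nested copy.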
@@ -5128,7 +5420,11 @@ async function handle(request) {
5128
5420
  // so the timing usually works out.
5129
5421
  startLabelLoad();
5130
5422
  const labels = currentLabels();
5131
- const tools = TOOLS.map((t) => ({
5423
+ // Advertise the three dispatch tools, not the 34 operations. The ops
5424
+ // (and their schemas) are reachable via roadmap_search / roadmap_describe
5425
+ // / roadmap — see META_TOOLS and the callTool dispatch. tplDescription
5426
+ // still runs so custom workspace labels apply.
5427
+ const tools = META_TOOLS.map((t) => ({
5132
5428
  ...t,
5133
5429
  description: tplDescription(t.description, labels),
5134
5430
  }));
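With only the three dispatch tools advertised, a client reaches any operation through `tools/call`. A minimal sketch of the request payloads, assuming a hypothetical `toolCall` builder and made-up ids and argument values; the tool names and the `{ op, args }` / `{ intent }` / `{ op }` shapes are the ones this diff documents:

```javascript
// Hypothetical request builder; not part of the package.
function toolCall(id, name, args = {}) {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// 1. Find candidate operations by intent (example intent text is invented).
const discover = toolCall(1, "roadmap_search", { intent: "file a new task" });
// 2. Fetch one operation's input schema.
const describe = toolCall(2, "roadmap_describe", { op: "propose_task" });
// 3. Execute an operation, with its arguments nested under `args`.
const execute = toolCall(3, "roadmap", { op: "get_task", args: { id: "TK-1" } });

console.log(JSON.stringify([discover, describe, execute], null, 2));
```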
@@ -5197,6 +5493,23 @@ async function runSelftest() {
5197
5493
  r?.result?.capabilities?.resources &&
5198
5494
  r?.result?.capabilities?.prompts,
5199
5495
  },
5496
+ {
5497
+ // The CORE planning contract must ride on the TOP-LEVEL `instructions`
5498
+ // field (the spec channel clients read), not buried in serverInfo, and
5499
+ // must carry the server-enforced falsifiable-outcome rule so an agent
5500
+ // can file a valid proposal without first fetching the full AGENTS.md.
5501
+ name: "initialize returns top-level instructions with the core contract",
5502
+ fn: () => handle({ id: 2, method: "initialize", params: {} }),
5503
+ pass: (r) => {
5504
+ const instr = r?.result?.instructions;
5505
+ return (
5506
+ typeof instr === "string" &&
5507
+ instr.length > 0 &&
5508
+ instr.includes("FALSIFIABLE OUTCOME") &&
5509
+ instr.includes("get_agents_md")
5510
+ );
5511
+ },
5512
+ },
5200
5513
  {
5201
5514
  // Hitting a mutator with no rubric fetched must return the
5202
5515
  // structured prerequisite_missing error with a `fix` field,
@@ -5485,10 +5798,317 @@ async function runSelftest() {
5485
5798
  r.result.messages[0].content.text.includes("demo description"),
5486
5799
  },
5487
5800
  {
5488
- name: "tools/list",
5801
+ // The wire surface is the three dispatch tools, NOT the 34 ops.
5802
+ name: "tools/list advertises exactly the three dispatch tools",
5489
5803
  fn: () => handle({ id: 2, method: "tools/list", params: {} }),
5490
- pass: (r) =>
5491
- Array.isArray(r?.result?.tools) && r.result.tools.length === TOOLS.length,
5804
+ pass: (r) => {
5805
+ const names = (r?.result?.tools ?? []).map((t) => t.name).sort();
5806
+ return (
5807
+ names.length === META_TOOLS.length &&
5808
+ ["roadmap", "roadmap_describe", "roadmap_search"].every((n) =>
5809
+ names.includes(n)
5810
+ )
5811
+ );
5812
+ },
5813
+ },
5814
+ {
5815
+ // tools/list must serve TRIMMED descriptions (summary only): every
5816
+ // tool keeps a non-empty one-line summary, and the methodology blocks
5817
+ // (USE WHEN / PREREQUISITE / ANTI-PATTERN / EXAMPLE) must be gone from
5818
+ // the wire payload — they now live in `instructions` + the rubric.
5819
+ // Guards against a regression that re-serves the full descriptions.
5820
+ name: "tools/list serves trimmed one-line descriptions",
5821
+ fn: () => handle({ id: 23, method: "tools/list", params: {} }),
5822
+ pass: (r) => {
5823
+ const tools = r?.result?.tools;
5824
+ if (!Array.isArray(tools) || tools.length === 0) return false;
5825
+ return tools.every(
5826
+ (t) =>
5827
+ typeof t.description === "string" &&
5828
+ t.description.length > 0 &&
5829
+ !t.description.includes("\n\n") &&
5830
+ !t.description.includes("ANTI-PATTERN:") &&
5831
+ !t.description.includes("PREREQUISITE:")
5832
+ );
5833
+ },
5834
+ },
5835
+ {
5836
+ // roadmap_search returns the op catalogue (all 34 when no intent),
5837
+ // each row carrying a trimmed summary (no methodology blocks).
5838
+ name: "roadmap_search lists operations with trimmed summaries",
5839
+ fn: () =>
5840
+ handle({
5841
+ id: 24,
5842
+ method: "tools/call",
5843
+ params: { name: "roadmap_search", arguments: {} },
5844
+ }),
5845
+ pass: (r) => {
5846
+ if (r?.result?.isError) return false;
5847
+ const text = r?.result?.content?.[0]?.text ?? "";
5848
+ let body;
5849
+ try {
5850
+ body = JSON.parse(text);
5851
+ } catch {
5852
+ return false;
5853
+ }
5854
+ const ops = body?.operations ?? [];
5855
+ return (
5856
+ ops.length === TOOLS.length &&
5857
+ ops.some((o) => o.op === "propose_task") &&
5858
+ ops.every(
5859
+ (o) =>
5860
+ typeof o.summary === "string" &&
5861
+ o.summary.length > 0 &&
5862
+ !o.summary.includes("ANTI-PATTERN:")
5863
+ )
5864
+ );
5865
+ },
5866
+ },
5867
+ {
5868
+ // roadmap_describe serves an op's inputSchema on demand (the bulk
5869
+ // evicted from tools/list). move_* / update_* live here now.
5870
+ name: "roadmap_describe returns inputSchema for move/update ops",
5871
+ fn: () =>
5872
+ handle({
5873
+ id: 25,
5874
+ method: "tools/call",
5875
+ params: { name: "roadmap_describe", arguments: { op: "move_task" } },
5876
+ }),
5877
+ pass: (r) => {
5878
+ if (r?.result?.isError) return false;
5879
+ let body;
5880
+ try {
5881
+ body = JSON.parse(r?.result?.content?.[0]?.text ?? "");
5882
+ } catch {
5883
+ return false;
5884
+ }
5885
+ return (
5886
+ body?.op === "move_task" &&
5887
+ body?.inputSchema?.type === "object" &&
5888
+ !!body?.inputSchema?.properties?.newCapabilityId
5889
+ );
5890
+ },
5891
+ },
5892
+ {
5893
+ name: "roadmap_describe rejects an unknown op",
5894
+ fn: () =>
5895
+ handle({
5896
+ id: 26,
5897
+ method: "tools/call",
5898
+ params: { name: "roadmap_describe", arguments: { op: "no_such_op" } },
5899
+ }),
5900
+ pass: (r) => r?.result?.isError === true,
5901
+ },
5902
+ {
5903
+ name: "roadmap rejects a missing op",
5904
+ fn: () =>
5905
+ handle({
5906
+ id: 27,
5907
+ method: "tools/call",
5908
+ params: { name: "roadmap", arguments: {} },
5909
+ }),
5910
+ pass: (r) => r?.result?.isError === true,
5911
+ },
5912
+ {
5913
+ name: "roadmap rejects an unknown op",
5914
+ fn: () =>
5915
+ handle({
5916
+ id: 28,
5917
+ method: "tools/call",
5918
+ params: { name: "roadmap", arguments: { op: "no_such_op" } },
5919
+ }),
5920
+ pass: (r) => r?.result?.isError === true,
5921
+ },
5922
+ {
5923
+ // Dispatch must run the SAME gates as a direct call: hitting a mutator
5924
+ // op through roadmap() before the rubric is fetched returns the
5925
+ // structured prerequisite_missing error, proving gates key off the op.
5926
+ name: "roadmap dispatch enforces the rubric gate on the inner op",
5927
+ fn: () => {
5928
+ resetSession();
5929
+ return handle({
5930
+ id: 29,
5931
+ method: "tools/call",
5932
+ params: {
5933
+ name: "roadmap",
5934
+ arguments: {
5935
+ op: "propose_task",
5936
+ args: { capabilityId: aCap, title: "Should be blocked via dispatch" },
5937
+ },
5938
+ },
5939
+ });
5940
+ },
5941
+ pass: (r) => {
5942
+ if (!r?.result?.isError) return false;
5943
+ try {
5944
+ // Parse and assert the fix FIELD directly (not the whole blob) so a
5945
+ // regression reverting fix to a bare uncallable get_agents_md() —
5946
+ // which still appears in the message — can't pass on substring luck.
5947
+ const out = JSON.parse(r.result.content?.[0]?.text ?? "");
5948
+ return (
5949
+ out.error === "prerequisite_missing" &&
5950
+ out.fix === 'roadmap({ op: "get_agents_md" })'
5951
+ );
5952
+ } catch {
5953
+ return false;
5954
+ }
5955
+ },
5956
+ },
5957
+ {
5958
+ // Dispatch reaches the inner op's argument validation identically to a
5959
+ // direct call. Tightened: assert the message is move_task's OWN
5960
+ // validator (newCapabilityId), so a regression that swallowed args.args
5961
+ // (yielding a generic 'taskId is required') can't pass this.
5962
+ name: "roadmap dispatch reaches inner-op validation",
5963
+ fn: () => {
5964
+ resetSession();
5965
+ return handle({
5966
+ id: 30,
5967
+ method: "tools/call",
5968
+ params: { name: "get_agents_md", arguments: {} },
5969
+ }).then(() =>
5970
+ handle({
5971
+ id: 31,
5972
+ method: "tools/call",
5973
+ params: {
5974
+ name: "roadmap",
5975
+ arguments: { op: "move_task", args: { taskId: "TK-1" } },
5976
+ },
5977
+ })
5978
+ );
5979
+ },
5980
+ pass: (r) => {
5981
+ if (!r?.result?.isError) return false;
5982
+ const text = r.result.content?.[0]?.text ?? "";
5983
+ return text.includes("newCapabilityId") && text.includes("required");
5984
+ },
5985
+ },
5986
+ {
5987
+ // POSITIVE path: a successful read THROUGH roadmap returns the inner
5988
+ // op's real data verbatim (proves the unwrap + passthrough, which the
5989
+ // error-path checks above never exercise).
5990
+ name: "roadmap dispatch returns real data on the happy path",
5991
+ fn: () =>
5992
+ handle({
5993
+ id: 32,
5994
+ method: "tools/call",
5995
+ params: { name: "roadmap", arguments: { op: "get_roadmap_snapshot" } },
5996
+ }),
5997
+ pass: (r) => {
5998
+ if (r?.result?.isError) return false;
5999
+ try {
6000
+ const body = JSON.parse(r.result.content?.[0]?.text ?? "");
6001
+ // Real snapshot shape (workspaceId may be null in seed-only mode).
6002
+ return (
6003
+ Array.isArray(body?.themes) &&
6004
+ Array.isArray(body?.capabilities) &&
6005
+ typeof body?.counts === "object"
6006
+ );
6007
+ } catch {
6008
+ return false;
6009
+ }
6010
+ },
6011
+ },
6012
+ {
6013
+ // Discovery/rubric session flags must be SET through dispatch, not just
6014
+ // blocked when unset — satisfy both gates purely via roadmap({op}) and
6015
+ // confirm the flags flipped (guards a regression where dispatch stopped
6016
+ // re-entering the switch that writes them).
6017
+ name: "roadmap dispatch sets the rubric + discovery session flags",
6018
+ fn: async () => {
6019
+ resetSession();
6020
+ await handle({
6021
+ id: 33,
6022
+ method: "tools/call",
6023
+ params: { name: "roadmap", arguments: { op: "get_agents_md" } },
6024
+ });
6025
+ await handle({
6026
+ id: 34,
6027
+ method: "tools/call",
6028
+ params: { name: "roadmap", arguments: { op: "get_roadmap_snapshot" } },
6029
+ });
6030
+ return {
6031
+ rubric: session.rubricFetchedAt,
6032
+ themes: session.themesListedAt,
6033
+ caps: session.capsDiscoveredAt,
6034
+ };
6035
+ },
6036
+ pass: (r) => r?.rubric !== null && r?.themes !== null && r?.caps !== null,
6037
+ },
6038
+ {
6039
+ // Flat (un-nested) args must NOT be silently dropped — the dangerous
6040
+ // case being a dropped workspaceId/dryRun. Use a gate-free read: a flat
6041
+ // get_task carrying the id at the TOP level (not under args) must reach
6042
+ // the op and resolve the real task. If the flat id were dropped, it
6043
+ // would 404 instead. Proves the flat-merge in the roadmap dispatch.
6044
+ name: "roadmap tolerates flat (un-nested) args",
6045
+ fn: () =>
6046
+ handle({
6047
+ id: 36,
6048
+ method: "tools/call",
6049
+ params: { name: "roadmap", arguments: { op: "get_task", id: aTask } },
6050
+ }),
6051
+ pass: (r) => {
6052
+ if (r?.result?.isError) return false;
6053
+ const text = r.result.content?.[0]?.text ?? "";
6054
+ return typeof aTask === "string" && text.includes(aTask);
6055
+ },
6056
+ },
6057
+ {
6058
+ // When both a flat sibling and a nested args carry the same key, the
6059
+ // nested (documented) value must win. Flat id is a bogus task; nested
6060
+ // id is the real one — resolving the real task proves nested precedence.
6061
+ name: "roadmap merge: nested args win over flat siblings on conflict",
6062
+ fn: () =>
6063
+ handle({
6064
+ id: 37,
6065
+ method: "tools/call",
6066
+ params: {
6067
+ name: "roadmap",
6068
+ arguments: { op: "get_task", id: "TK-DOES-NOT-EXIST", args: { id: aTask } },
6069
+ },
6070
+ }),
6071
+ pass: (r) => {
6072
+ if (r?.result?.isError) return false;
6073
+ const text = r.result.content?.[0]?.text ?? "";
6074
+ return text.includes(aTask) && !text.includes("TK-DOES-NOT-EXIST");
6075
+ },
6076
+ },
6077
+ {
6078
+ // args passed as a JSON STRING (a real LLM failure mode) must be parsed
6079
+ // and honored, not silently dropped: a get_task whose nested args is a
6080
+ // JSON string resolves the real task.
6081
+ name: "roadmap parses JSON-string args",
6082
+ fn: () =>
6083
+ handle({
6084
+ id: 38,
6085
+ method: "tools/call",
6086
+ params: {
6087
+ name: "roadmap",
6088
+ arguments: { op: "get_task", args: JSON.stringify({ id: aTask }) },
6089
+ },
6090
+ }),
6091
+ pass: (r) => {
6092
+ if (r?.result?.isError) return false;
6093
+ const text = r.result.content?.[0]?.text ?? "";
6094
+ return typeof aTask === "string" && text.includes(aTask);
6095
+ },
6096
+ },
6097
+ {
6098
+ // A non-object, non-parseable args must produce a clear boundary error
6099
+ // naming the cause — not a misleading downstream 'X is required'.
6100
+ name: "roadmap rejects non-object args with a clear error",
6101
+ fn: () =>
6102
+ handle({
6103
+ id: 39,
6104
+ method: "tools/call",
6105
+ params: { name: "roadmap", arguments: { op: "get_task", args: "not json at all" } },
6106
+ }),
6107
+ pass: (r) => {
6108
+ if (!r?.result?.isError) return false;
6109
+ const text = r.result.content?.[0]?.text ?? "";
6110
+ return text.includes("args") && text.includes("must be an object");
6111
+ },
5492
6112
  },
5493
6113
  {
5494
6114
  name: "get_active_workspace reports a resolution source",
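The selftests above pin down four argument-handling behaviours for the `roadmap` dispatch: flat arguments are kept, nested `args` win on conflict, a JSON-string `args` is parsed, and an unparseable or non-object `args` is rejected with a message containing "must be an object". The dispatch code itself is not in this part of the diff; the following is a sketch of one way to satisfy those four checks, with a hypothetical function name and return shape.

```javascript
// Hypothetical helper, not the package's code: normalizes roadmap({ op, args }) input.
function normalizeDispatchArgs(callArguments) {
  const { op, args: nested, ...flat } = callArguments ?? {};
  if (typeof op !== "string" || op.length === 0) {
    return { error: "op is required" };
  }
  let parsed = nested ?? {};
  if (typeof parsed === "string") {
    // A JSON-string args is parsed instead of being silently dropped.
    try {
      parsed = JSON.parse(parsed);
    } catch {
      return { error: "args must be an object" };
    }
  }
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    return { error: "args must be an object" };
  }
  // Flat siblings are merged first so nested (documented) values win on conflict.
  return { op, args: { ...flat, ...parsed } };
}

console.log(normalizeDispatchArgs({ op: "get_task", id: "TK-BOGUS", args: { id: "TK-1" } }));
```

Spreading the flat siblings before the nested object is what gives the nested value precedence; reversing the order would let a stray top-level key override the documented `args` shape.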
@@ -6477,15 +7097,37 @@ async function runSelftest() {
6477
7097
  pass: (r) => r?.result?.isError === true,
6478
7098
  },
6479
7099
  {
6480
- // Schema-level: tools/list must advertise the four move tools.
6481
- name: "tools/list advertises four move tools",
6482
- fn: () => handle({ id: 30, method: "tools/list", params: {} }),
6483
- pass: (r) => {
6484
- const names = (r?.result?.tools ?? []).map((t) => t.name);
6485
- return ["move_task", "move_capability", "move_tasks", "move_capabilities"].every((n) =>
6486
- names.includes(n)
6487
- );
7100
+ // The four move ops are no longer advertised by name (the surface is
7101
+ // the three dispatch tools) but must remain reachable + describable.
7102
+ name: "roadmap_describe resolves all four move ops",
7103
+ fn: async () => {
7104
+ const ops = ["move_task", "move_capability", "move_tasks", "move_capabilities"];
7105
+ const out = [];
7106
+ for (const op of ops) {
7107
+ out.push(
7108
+ await handle({
7109
+ id: 30,
7110
+ method: "tools/call",
7111
+ params: { name: "roadmap_describe", arguments: { op } },
7112
+ })
7113
+ );
7114
+ }
7115
+ return out;
6488
7116
  },
7117
+ pass: (results) =>
7118
+ Array.isArray(results) &&
7119
+ results.length === 4 &&
7120
+ results.every((r) => {
7121
+ if (r?.result?.isError) return false;
7122
+ try {
7123
+ return (
7124
+ JSON.parse(r.result.content?.[0]?.text ?? "")?.inputSchema?.type ===
7125
+ "object"
7126
+ );
7127
+ } catch {
7128
+ return false;
7129
+ }
7130
+ }),
6489
7131
  },
6490
7132
  {
6491
7133
  // Update validation: missing patch.
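Every selftest entry in this diff is a `{ name, fn, pass }` record: `fn` produces a response (possibly asynchronously) and `pass` judges it. The loop that consumes these records is not shown here, so the runner below is only a sketch under that assumption, with a hypothetical name and result shape.

```javascript
// Hypothetical runner; the package's own runSelftest loop is not shown in this diff.
async function runChecks(checks) {
  const results = [];
  for (const check of checks) {
    let ok = false;
    try {
      // fn may return a value or a promise; await handles both.
      const value = await check.fn();
      ok = Boolean(check.pass(value));
    } catch {
      // A throwing fn or pass counts as a failed check.
      ok = false;
    }
    results.push({ name: check.name, ok });
  }
  return results;
}

// Usage with two toy checks (invented for illustration).
runChecks([
  { name: "resolves", fn: async () => ({ result: { isError: false } }), pass: (r) => r?.result?.isError === false },
  { name: "throws", fn: () => { throw new Error("boom"); }, pass: () => true },
]).then((results) => console.log(results));
```

Running the checks sequentially matters for entries like the ones above that call `resetSession()` and then depend on session state set by an earlier request.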
@@ -6552,17 +7194,37 @@ async function runSelftest() {
6552
7194
  pass: (r) => r?.result?.isError === true,
6553
7195
  },
6554
7196
  {
6555
- // Schema-level: parent fields are blocked at JSON-schema layer
6556
- // (additionalProperties:false on patch). Without service key
6557
- // we won't reach SQL, but the schema rejects it pre-call.
6558
- name: "tools/list advertises three update tools",
6559
- fn: () => handle({ id: 35, method: "tools/list", params: {} }),
6560
- pass: (r) => {
6561
- const names = (r?.result?.tools ?? []).map((t) => t.name);
6562
- return ["update_task", "update_capability", "update_theme"].every((n) =>
6563
- names.includes(n)
6564
- );
7197
+ // The three update ops are reachable + describable via the dispatch
7198
+ // surface (not advertised by name in tools/list anymore).
7199
+ name: "roadmap_describe resolves all three update ops",
7200
+ fn: async () => {
7201
+ const ops = ["update_task", "update_capability", "update_theme"];
7202
+ const out = [];
7203
+ for (const op of ops) {
7204
+ out.push(
7205
+ await handle({
7206
+ id: 35,
7207
+ method: "tools/call",
7208
+ params: { name: "roadmap_describe", arguments: { op } },
7209
+ })
7210
+ );
7211
+ }
7212
+ return out;
6565
7213
  },
7214
+ pass: (results) =>
7215
+ Array.isArray(results) &&
7216
+ results.length === 3 &&
7217
+ results.every((r) => {
7218
+ if (r?.result?.isError) return false;
7219
+ try {
7220
+ return (
7221
+ JSON.parse(r.result.content?.[0]?.text ?? "")?.inputSchema?.type ===
7222
+ "object"
7223
+ );
7224
+ } catch {
7225
+ return false;
7226
+ }
7227
+ }),
6566
7228
  },
6567
7229
  {
6568
7230
  // Cross-workspace guard fires when snapshot.json names workspace
@@ -7149,7 +7811,7 @@ async function runSelftest() {
7149
7811
  return (
7150
7812
  out.error === "repo_unmapped" &&
7151
7813
  out.repo === "acme/unmapped" &&
7152
- out.fix === "link_repo()" &&
7814
+ out.fix === 'roadmap({ op: "link_repo" })' &&
7153
7815
  out.envDefaultWorkspace === "ws-envdefault"
7154
7816
  );
7155
7817
  } catch {