@roadmapperai/mcp 0.1.0

Files changed (4)
  1. package/AGENTS.md +885 -0
  2. package/README.md +111 -0
  3. package/package.json +35 -0
  4. package/server.mjs +4019 -0
package/AGENTS.md ADDED
@@ -0,0 +1,885 @@
1
+ # AGENTS.md
2
+
3
+ The contract an AI agent must satisfy when planning, coding, or
4
+ opening PRs in this repo. Read this end-to-end before doing any work.
5
+
6
+ The roadmap data model is the source of truth — TypeScript types in
7
+ [`src/types.ts`](https://github.com/vsgro/roadmapper/blob/main/src/types.ts) are the canonical schema. Everything
8
+ in this doc is downstream of that file.
9
+
10
+ ## TL;DR — what an agent must do
11
+
12
+ - Reference Themes / Capabilities / Tasks / Sprints by **stable ID**,
13
+ never by name. Use the `roadmapper` MCP server (see "Discovering
14
+ the current roadmap" below) to pull live IDs rather than guessing.
15
+ - Open PRs from a branch named `agent/TK-NNNNNN-slug` with commits
16
+ prefixed `TK-NNNNNN: …`.
17
+ - Self-grade against the task's `acceptance` list before marking work
18
+ ready for review. Record each grade on `Task.acceptanceGrades[i]`
19
+ as `{ status: "pass" | "fail", note? }` so reviewers can see what
20
+ you actually verified. If `acceptance` is empty, flag the task as
21
+ `kind: "spike"` or stop and ask for one.
22
+ - Respect `Task.dependsOn`. Don't start work on a task whose
23
+ prerequisites haven't been delivered — surface the blocked state
24
+ back to the user instead.
25
+ - Stamp `authorKind: "agent"` on tasks/PRs you create.
26
+ - Don't auto-close work. A PR closes its parent task only after a
27
+ human reviewer approves and merges.
28
+
29
+ ## Discovering the current roadmap
30
+
31
+ Roadmapper ships with `mcp/server.mjs`, a stdio MCP server. Read tools
32
+ let you plan against real IDs; write tools let you act through the
33
+ same surface when the operator has wired in a service-role key.
34
+
35
+ **Read tools** (always available):
36
+ - `get_roadmap_snapshot` — **start here**. One-call orient: returns
37
+ themes, active capabilities, and in-flight tasks for the workspace
38
+ in a single response. Saves you three round trips when you're just
39
+ trying to orient. Always live, never cached. Response includes the
40
+ resolved `workspaceId` you should pass back on any write call.
41
+ - `list_themes` — get the theme catalogue.
42
+ - `list_capabilities` — filter by `themeId` to scope your plan.
43
+ **Excludes delivered capabilities by default** — agents should
44
+ plan against work that's still in flight. Pass
45
+ `includeDelivered: true` only when you genuinely need the full
46
+ historical list (e.g. reviewing whether a closed bet should be
47
+ reopened, which is rare and should be a human call).
48
+ - `list_tasks` — filter by `capabilityId` and/or `status` to see what
49
+ already exists before proposing duplicates.
50
+ - `get_task` — full task detail, including `acceptance`, `dependsOn`,
51
+ and attached PRs.
52
+ - `get_agents_md` — re-read this contract on demand.
53
+
54
+ **Multi-workspace addressing.** A single MCP install can talk to any
55
+ workspace the caller is authorized for. Every read and write tool
56
+ accepts an optional `workspaceId` argument that overrides the env
57
+ default. Two conventions for picking the right one:
58
+
59
+ 1. If you're working inside a repo with a `.roadmapper/snapshot.json`
60
+ file in it, that file names its own `workspaceId` — use that.
61
+ 2. Otherwise, call `get_roadmap_snapshot` (which uses the env-default
62
+ workspace) and use the `workspaceId` it returns on subsequent
63
+ write calls. This is mainly a "did the env get the right
64
+ workspace?" sanity check; under normal use the env-default is
65
+ what you want.
66
+
67
+ **The rubric gate** — the server tracks whether you've called
68
+ `get_agents_md` (or read `roadmapper://rubric`) this session. Until
69
+ you do, every write tool below returns a structured error:
70
+
71
+ ```json
72
+ {
73
+ "error": "prerequisite_missing",
74
+ "message": "Call get_agents_md first this session, then retry ...",
75
+ "fix": "get_agents_md()"
76
+ }
77
+ ```
78
+
79
+ Make the call named in the `fix` field, then retry. The gate is not optional:
80
+ proposals filed without the rubric in scope won't round-trip into
81
+ the product because the server's validators are tied to it.
82
+
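A minimal client-side sketch of honouring the gate. It assumes a generic `callTool(name, args)` function supplied by your MCP client (a hypothetical stand-in, not an API exported by this package) and assumes the `fix` field always names a zero-argument call such as `get_agents_md()`:

```typescript
// Shape of the gate error, copied from the JSON example above.
interface GateError {
  error: "prerequisite_missing";
  message: string;
  fix: string; // e.g. "get_agents_md()"
}

// Any other tool result; only the absence of the gate error matters here.
type ToolResult = GateError | { error?: undefined; [key: string]: unknown };

// Hypothetical stand-in for your MCP client's tool-call function.
type CallTool = (name: string, args: Record<string, unknown>) => Promise<ToolResult>;

function isGateError(result: ToolResult): result is GateError {
  return result.error === "prerequisite_missing";
}

// Call a write tool; if the rubric gate trips, run the suggested fix once, then retry once.
async function callWithRubricGate(
  callTool: CallTool,
  toolName: string,
  args: Record<string, unknown>,
): Promise<ToolResult> {
  const first = await callTool(toolName, args);
  if (!isGateError(first)) return first;
  // Assumes `fix` is a zero-argument call such as "get_agents_md()".
  const fixTool = first.fix.replace(/\(\)$/, "");
  await callTool(fixTool, {});
  return callTool(toolName, args);
}
```

Retrying exactly once keeps a misconfigured server from looping; a second `prerequisite_missing` is worth surfacing to the human rather than retrying again.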
83
+ **Write tools** (require `SUPABASE_SERVICE_ROLE_KEY`; if the operator
84
+ hasn't set it, these return an error result and you should fall back
85
+ to "tell the human what I'd do"):
86
+
87
+ - `propose_task` — create a new task under an **existing** capability.
88
+ The server auto-stamps `authorKind: "agent"`, `status: "planned"`,
89
+ and a 6-digit `TK-NNNNNN` id. Include `acceptance` and `dependsOn`
90
+ when you propose — don't make the human fill them in. **Always pass
91
+ `idempotencyKey`** (e.g. a hash of `capabilityId + title`, or your
92
+ session id + a per-task counter) so a retry after a crash or
93
+ transient error reuses the existing task instead of creating a
94
+ duplicate. Response includes `idempotent: true` when that happens.
95
+ - `propose_capability` — create a new capability under an **existing**
96
+ theme. Required: `name`, `pillarId`. Sensible defaults are applied
97
+ (`reach: 100`, `impact: 1`, `confidence: 70`). Pass `outcome` and
98
+ `specRef` whenever you have them — capabilities without an outcome
99
+ rarely survive review. Pass `idempotencyKey`.
100
+ - `propose_theme` — create a new theme. **Use sparingly.** Themes are
101
+ years-stable strategic categories; almost every plan fits under an
102
+ existing one. Only call this when the human explicitly asks for a
103
+ new theme or when the work genuinely doesn't fit any of the existing
104
+ ones from `list_themes`. Default rule: if you're tempted to create
105
+ a theme, file a capability under the closest existing theme instead
106
+ and let the human reorganize later.
107
+ - `submit_acceptance_grades` — stamp `{ status: pass | fail, note? }`
108
+ per criterion index, after you've actually verified the work. The
109
+ server stamps `gradedAt` and `gradedBy: "mcp:agent"`.
110
+
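The `idempotencyKey` advice above ("a hash of `capabilityId + title`") can be sketched as follows. FNV-1a is used only because it needs no imports; any stable hash works, and normalizing whitespace and case in the title is an extra assumption so that cosmetic differences between retries still collapse to one key. The `task:` prefix is purely illustrative.

```typescript
// FNV-1a (32-bit) over UTF-16 code units; any stable hash (e.g. SHA-256) is equally fine.
function fnv1a32(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

// Stable key for propose_task: the same capability + title always yields the same key.
function taskIdempotencyKey(capabilityId: string, title: string): string {
  const normalizedTitle = title.trim().replace(/\s+/g, " ").toLowerCase();
  // A NUL separator keeps ("CAP-1", "0x") distinct from ("CAP-10", "x").
  return "task:" + fnv1a32(capabilityId + "\u0000" + normalizedTitle);
}
```

Passing `taskIdempotencyKey(capabilityId, title)` on every attempt means a crash-and-retry sends an identical key, so the server can answer with `idempotent: true` instead of creating a second task.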
111
+ **Lifecycle tools** (archive, move, update — every call requires
112
+ `reason`; the audit log records who/why so future readers know what
113
+ happened):
114
+
115
+ - `archive_task` / `archive_capability` / `archive_theme` — soft-delete
116
+ an entity. The row stays in the workspace; list views filter it out,
117
+ by-id lookups still resolve. Archive refuses if the entity has active
118
+ children (forces bottom-up: archive tasks before their capability,
119
+ capabilities before their theme).
120
+ - `unarchive_task` / `unarchive_capability` / `unarchive_theme` —
121
+ reverse archive. The parent must be active. To unarchive while
122
+ reparenting, use `move_*` with a different active target — that path
123
+ unarchives in one step.
124
+ - `move_task` / `move_capability` — re-parent under a different
125
+ capability/theme. IDs are stable across moves. Target parent must be
126
+ active. Moving to the current parent returns `{ idempotent: true }`.
127
+ Tasks move via `move_task`; capabilities move via `move_capability`.
128
+ Themes are roots and don't move.
129
+ - `move_tasks` / `move_capabilities` — bulk variant, up to 100 items
130
+ per call. The server stamps one shared `batchId` into every audit
131
+ row so a reorg shows up as one logical event in history. Per-item
132
+ failures don't roll back earlier successes — the response includes
133
+ per-move ok/error so you can retry the failures.
134
+ - `update_task` / `update_capability` / `update_theme` — patch fields
135
+ on an entity. The patch is a partial object; only the keys you
136
+ include are touched. Parent fields (`capabilityId`, `pillarId`) and
137
+ lifecycle flags (`archived`, `archivedAt`, `id`) are rejected — use
138
+ `move_*` / `archive_*` instead. The server diffs against the
139
+ projected current state; a patch that matches everything returns
140
+ `{ idempotent: true }` with no audit row.
141
+
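For reorgs larger than one `move_tasks` call allows, a sketch of chunking into 100-item calls and collecting per-item failures for retry. The `TaskMove` / `MoveOutcome` shapes and the `moveBatch` callback are hypothetical; the real tool schema may name fields differently.

```typescript
// Hypothetical per-move input and per-move result shapes.
interface TaskMove {
  taskId: string;
  toCapabilityId: string;
}
interface MoveOutcome {
  taskId: string;
  ok: boolean;
  error?: string;
}

// Stand-in for one `move_tasks` call: at most 100 moves sharing a single `reason`.
type MoveBatch = (moves: TaskMove[], reason: string) => Promise<MoveOutcome[]>;

const MAX_MOVES_PER_CALL = 100;

// Earlier successes are not rolled back when a later item fails, so return the failures to retry.
async function moveTasksInBatches(moves: TaskMove[], reason: string, moveBatch: MoveBatch): Promise<TaskMove[]> {
  const failed: TaskMove[] = [];
  for (let i = 0; i < moves.length; i += MAX_MOVES_PER_CALL) {
    const chunk = moves.slice(i, i + MAX_MOVES_PER_CALL);
    const outcomes = await moveBatch(chunk, reason);
    for (const outcome of outcomes) {
      if (outcome.ok) continue;
      const original = chunk.find((m) => m.taskId === outcome.taskId);
      if (original) failed.push(original);
    }
  }
  return failed;
}
```

Since the server stamps one shared `batchId` per call, a reorg split across several calls will presumably appear as several logical events in history rather than one.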
142
+ All lifecycle tools require `reason`. Idempotent retries (same state,
143
+ same input) return `{ idempotent: true }` and emit no audit row, so
144
+ crash-retry is safe.
145
+
146
+ **Scope hints on tasks.** `propose_task` and `update_task` accept
147
+ two optional advisory fields:
148
+
149
+ - `expectedPRs` — max merged PRs you expect for this task.
150
+ - `expectedScope` — cumulative LoC ceiling (additions + deletions)
151
+ across all PRs linked to the task.
152
+
153
+ Both are unset by default. When the webhook links a PR that pushes
154
+ the task past either ceiling, it writes a `scope_overrun` row to
155
+ `audit_log` (action=`scope_overrun`, target_kind=`task`). The link
156
+ itself never gets refused — these are observations, not gates. Use
157
+ them to declare an envelope per task and watch which tasks blow
158
+ through it.
159
+
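The two ceilings amount to a simple comparison. This is a sketch only: the `LinkedPr` shape is an assumption, and treating "past the ceiling" as strictly greater than the declared value is a guess about the webhook's exact boundary.

```typescript
// Minimal shapes holding only the fields the check needs.
interface LinkedPr {
  merged: boolean;
  additions: number;
  deletions: number;
}
interface ScopeHints {
  expectedPRs?: number; // max merged PRs
  expectedScope?: number; // cumulative LoC ceiling across all linked PRs
}

// Lists which ceilings are exceeded. Advisory only: the link still happens, an audit row is written.
function scopeOverruns(hints: ScopeHints, prs: LinkedPr[]): string[] {
  const overruns: string[] = [];
  const mergedCount = prs.filter((pr) => pr.merged).length;
  const loc = prs.reduce((sum, pr) => sum + pr.additions + pr.deletions, 0);
  if (hints.expectedPRs !== undefined && mergedCount > hints.expectedPRs) {
    overruns.push(`expectedPRs: ${mergedCount} merged > ${hints.expectedPRs}`);
  }
  if (hints.expectedScope !== undefined && loc > hints.expectedScope) {
    overruns.push(`expectedScope: ${loc} LoC > ${hints.expectedScope}`);
  }
  return overruns;
}
```

With both hints unset (the default) the function always returns an empty list, matching "unset by default".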
160
+ **Outcome readings.** Capabilities accumulate point-in-time
161
+ measurements against their stated outcome:
162
+
163
+ - `record_outcome_reading({ capabilityId, value, asOf, source,
164
+ note? })` — append a reading. The server takes a row lock so
165
+ concurrent writers (a human pasting Mixpanel + a script pushing
166
+ a warehouse query) both land instead of clobbering.
167
+ - `list_stale_outcomes({ thresholdDays?: 14, includeArchived? })`
168
+ — read tool. Returns capabilities with no reading in the
169
+ threshold window (or never measured), sorted never-measured
170
+ first then by most-stale. Use during quarterly review or to
171
+ spot bets that lost the empirical loop.
172
+
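The ordering `list_stale_outcomes` describes (never-measured first, then most stale) can be sketched like this. The `lastReadingAt` field is a hypothetical summary of each capability's newest reading, not a documented column.

```typescript
// `lastReadingAt` is an ISO timestamp, or null when the capability was never measured.
interface CapabilityReadings {
  capabilityId: string;
  lastReadingAt: string | null;
}

const DAY_MS = 24 * 60 * 60 * 1000;

function staleOutcomes(caps: CapabilityReadings[], now: Date, thresholdDays = 14): CapabilityReadings[] {
  const cutoff = now.getTime() - thresholdDays * DAY_MS;
  return caps
    .filter((c) => c.lastReadingAt === null || Date.parse(c.lastReadingAt) < cutoff)
    .sort((a, b) => {
      if (a.lastReadingAt === null) return b.lastReadingAt === null ? 0 : -1; // never measured first
      if (b.lastReadingAt === null) return 1;
      return Date.parse(a.lastReadingAt) - Date.parse(b.lastReadingAt); // oldest reading first
    });
}
```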
173
+ **Workspace auto-default + cross-workspace guard.** If the cwd
174
+ has a `.roadmapper/snapshot.json` (committed by the snapshot
175
+ cron into every connected repo), the MCP defaults to that
176
+ workspace. Mutator calls with an explicit `workspaceId` that
177
+ disagrees with the cwd snapshot are refused — set
178
+ `ROADMAPPER_ALLOW_CROSS_WORKSPACE=1` in the env to override.
179
+ Reads can target any workspace freely.
180
+
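Putting the addressing rules together (explicit `workspaceId` first, then the cwd snapshot, then the env default, with mutators refused on a snapshot mismatch unless overridden), a sketch of the resolution logic might look like this. `WorkspaceContext` and `resolveWorkspace` are hypothetical names, not the server's internals.

```typescript
interface WorkspaceContext {
  snapshotWorkspaceId?: string; // from .roadmapper/snapshot.json in the cwd, if present
  envWorkspaceId?: string; // SUPABASE_WORKSPACE_ID
  allowCrossWorkspace: boolean; // ROADMAPPER_ALLOW_CROSS_WORKSPACE=1
}

function resolveWorkspace(ctx: WorkspaceContext, explicitId: string | undefined, isWrite: boolean): string {
  if (explicitId !== undefined) {
    const mismatch = ctx.snapshotWorkspaceId !== undefined && explicitId !== ctx.snapshotWorkspaceId;
    // Reads may target any workspace; writes that disagree with the cwd snapshot are refused.
    if (isWrite && mismatch && !ctx.allowCrossWorkspace) {
      throw new Error(`refused: ${explicitId} differs from cwd snapshot ${ctx.snapshotWorkspaceId}`);
    }
    return explicitId;
  }
  const fallback = ctx.snapshotWorkspaceId ?? ctx.envWorkspaceId;
  if (fallback === undefined) throw new Error("no workspace configured");
  return fallback;
}
```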
181
+ Authoring discipline:
182
+ - Read first (`list_themes`, `list_capabilities`, `list_tasks`) before
183
+ proposing anything, so you don't invent a new theme/capability that
184
+ duplicates an existing one.
185
+ - Prefer the deepest existing parent. New task > new capability > new
186
+ theme. Climb the tree only when nothing at the lower level fits.
187
+ - The three `propose_*` tools support `idempotencyKey`. Always pass one
188
+ so a retry collapses to the existing record. (`submit_acceptance_grades`
189
+ is naturally idempotent — re-stamping the same grade is a no-op.)
190
+
191
+ All write tools route through Postgres RPCs (`public.propose_task`,
192
+ `public.propose_capability`, `public.propose_theme`,
193
+ `public.grade_acceptance`, `public.archive_entity`,
194
+ `public.unarchive_entity`, `public.move_entity`, `public.update_entity`
195
+ — defined across migrations 0006–0045). The read-modify-write happens
196
+ inside one transaction with row-level locking, so concurrent agents
197
+ writing to the same workspace queue cleanly — no last-write-wins
198
+ clobber. Safe to call from parallel agent sessions.
199
+
200
+ **MCP resources** (auto-pulled by clients that subscribe at
201
+ connect; bypass the "agent forgot to fetch" failure entirely):
202
+
203
+ - `roadmapper://rubric` — same content as `get_agents_md`. Reading
204
+ this resource counts toward the rubric gate, so a client that
205
+ auto-subscribes satisfies the prerequisite without an explicit
206
+ tool call.
207
+ - `roadmapper://capabilities/active` — live snapshot of non-
208
+ delivered capabilities.
209
+ - `roadmapper://tasks/open` — live snapshot of `in_progress` +
210
+ `planned` tasks.
211
+
212
+ **MCP prompts** (slash-commands the user invokes to force a flow;
213
+ bypass model judgment about when to apply the rubric):
214
+
215
+ - `roadmapper:plan-feature <description>` — expands into "fetch
216
+ rubric → suggest_capability_for → propose tasks under it, or ask
217
+ before creating a new capability". Use when the user says "design
218
+ features for X" or "plan Y".
219
+ - `roadmapper:close-task <task_id> [pr_url]` — expands into the
220
+ acceptance-grade + link-pr flow with the task already named.
221
+ - `roadmapper:weekly-review` — structured walk through open tasks,
222
+ stale capabilities, and ungraded deliveries.
223
+
224
+ **System reminders.** List-tool responses (`list_tasks`,
225
+ `list_capabilities`, `get_roadmap_snapshot`, `suggest_capability_for`)
226
+ attach a `_meta.roadmapper.reminder` field when the server detects
227
+ a flow that needs attention — e.g. "3 delivered tasks have merged
228
+ PRs without submitted acceptance grades; call submit_acceptance_grades
229
+ for TK-X, TK-Y, TK-Z." Surface these to the user; they're the
230
+ roadmap's way of asking for the next action.
231
+
232
+ Three feature tiers, determined by which env vars the operator sets:
233
+
234
+ | Tier | Required env | What you get |
235
+ |-------------------|-----------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------|
236
+ | Seed-only | none | Read tools against the static `roadmap.json`. |
237
+ | Live read | `SUPABASE_URL` + (`SUPABASE_PUBLISHABLE_KEY` or `SUPABASE_ANON_KEY`) + `SUPABASE_WORKSPACE_ID` | Read tools merged with the workspace's edits. |
238
+ | Live read + write | All of the above plus `SUPABASE_SERVICE_ROLE_KEY` (and migrations 0005–0045 applied) | Read tools + propose_* / grade / archive_* / unarchive_* / move_* / update_* tools. |
239
+
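The table above reduces to a small check over the environment. In this sketch the env is passed in as a plain record so it needs no Node typings; `detectTier` is a hypothetical helper, and it cannot verify that the required migrations have been applied.

```typescript
type Tier = "seed-only" | "live-read" | "live-read-write";

function detectTier(env: Record<string, string | undefined>): Tier {
  const has = (key: string) => Boolean(env[key]);
  const liveRead =
    has("SUPABASE_URL") &&
    (has("SUPABASE_PUBLISHABLE_KEY") || has("SUPABASE_ANON_KEY")) &&
    has("SUPABASE_WORKSPACE_ID");
  if (!liveRead) return "seed-only";
  // Write tools additionally need migrations 0005–0045, which env vars alone can't confirm.
  return has("SUPABASE_SERVICE_ROLE_KEY") ? "live-read-write" : "live-read";
}
```

A service-role key on its own does not unlock writes in this sketch, because the third tier is "all of the above plus" the key.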
240
+ If a write tool returns an error result mentioning `SUPABASE_SERVICE_ROLE_KEY`,
241
+ the operator is on the live-read tier; don't keep retrying — fall
242
+ back to telling the human what you'd do and let them apply it.
243
+
244
+ Sanity-check the install with `node mcp/server.mjs --selftest`, which runs
245
+ every tool against the local seed and prints a pass/fail summary.
246
+
247
+ If the operator chose project-scoped install (`.mcp.json` in the
248
+ roadmapper repo root, which is the default `npm run mcp:setup` path),
249
+ the MCP only loads when their client is launched from that repo. If
250
+ you're an agent running in another codebase and the `roadmapper`
251
+ tools aren't visible, ask the operator to either (a) merge the
252
+ `mcpServers.roadmapper` block from `roadmap/.mcp.json` into their
253
+ user-level client config or (b) point you at the roadmapper repo so
254
+ you can run there instead.
255
+
256
+ Wire-up (Claude Code, Claude Desktop, or any MCP client):
257
+
258
+ ```jsonc
259
+ {
260
+ "mcpServers": {
261
+ "roadmapper": {
262
+ "command": "node",
263
+ "args": ["/absolute/path/to/roadmap/mcp/server.mjs"],
264
+ "env": {
265
+ "SUPABASE_URL": "https://<id>.supabase.co",
266
+ "SUPABASE_PUBLISHABLE_KEY": "sb_publishable_...",
267
+ "SUPABASE_WORKSPACE_ID": "<workspace id>",
268
+ "SUPABASE_SERVICE_ROLE_KEY": "<only if write tools wanted>"
269
+ }
270
+ }
271
+ }
272
+ }
273
+ ```
274
+
275
+ The `env` block is optional — drop it to serve the seed only. See
276
+ [README.md](/README.md#mcp-server) for the config-file path per
277
+ client and the full env-var matrix.
278
+
279
+ ## The snapshot file in connected repos
280
+
281
+ Workspaces that have the GitHub App installed get a server-pushed
282
+ snapshot file at `.roadmapper/snapshot.json` on a dedicated branch
283
+ named `roadmapper-snapshot` in every connected repo. The cron
284
+ refreshes it every 6 hours; commits are content-hash-deduped so
285
+ the branch only moves when the workspace actually changed.
286
+
287
+ **Why this matters for agents:** if you're working out of a checkout
288
+ that has this branch, you can read the canonical model from the
289
+ filesystem without an MCP round trip. The file contains:
290
+
291
+ ```jsonc
292
+ {
293
+ "workspaceId": "ws-abc",
294
+ "workspaceName": "Acme Roadmap",
295
+ "generatedAt": "2026-05-12T18:00:00Z",
296
+ "themes": [...], // every theme in the workspace
297
+ "capabilities": [...], // every capability
298
+ "tasks": [...] // every task
299
+ }
300
+ ```
301
+
302
+ The snapshot is the **complete current state** of the workspace's
303
+ entity tables, projected to camelCase. No seed-merge is required
304
+ on the consumer side — the snapshot already contains the bundled
305
+ demo entities plus everything created by the user (post-Stage-3
306
+ the seed is just the projection target for fresh workspaces, not
307
+ a parallel data path).
308
+
309
+ Soft-deleted workspaces stop receiving snapshot updates — the
310
+ cron filters them out. If you're operating from a stale snapshot
311
+ on a deleted workspace, your MCP writes will hit
312
+ `assert_workspace_active` and error; treat that as the signal
313
+ that the workspace is gone and stop pushing.
314
+
315
+ The `workspaceId` field is what you pass into MCP write tools (via
316
+ their optional `workspaceId` arg) so the write lands on the right
317
+ workspace even when the MCP's env points elsewhere. Think of the
318
+ snapshot file as the agent's "where am I?" pointer.
319
+
320
+ ## The mental model
321
+
322
+ Five layers; don't conflate them:
323
+
324
+ | Layer | Owned by | Stability | Example |
325
+ |------------------|------------|---------------|--------------------------------------------|
326
+ | **Theme** | Leadership | Years | "Data & Intelligence" |
327
+ | **Capability** | PM | Quarters | "Real-time segmentation" |
328
+ | **Sprint** | Eng lead | 1–2 weeks | "Sprint 127" |
329
+ | **Task** | IC / agent | Days–weeks | "Implement segment recompute job" |
330
+ | **PR** | IC / agent | Hours–days | `TK-100201: Add NLP segment authoring` |
331
+
332
+ Plans live as Capabilities. Engineering output is Tasks + PRs. Each
333
+ PR closes one or more Tasks. The **Strategic Outlook** view
334
+ (`#outlook`) shows Capabilities on a 3-month to 3-year horizon for
335
+ stakeholder/customer presentation; drag a bar between theme groups
336
+ to rebind its `pillarId`.
337
+
338
+ ---
339
+
340
+ ## Planning a feature with an LLM
341
+
342
+ When a chat session asks you to "plan out feature X" before any
343
+ code is written, your job is to emit a structure that maps 1:1 onto
344
+ the Theme → Capability → Task hierarchy below. Don't return a generic
345
+ PRD or bullet list — the user will copy your output straight into
346
+ Roadmapper, so it has to be in the shape Roadmapper consumes.
347
+
348
+ ### Decision tree
349
+
350
+ Before writing anything:
351
+
352
+ 1. **Find the home Theme.** Call `suggest_theme_for({ description })`
353
+ with a one-line description of the work — the server ranks
354
+ existing themes by token overlap and returns a fit signal.
355
+ - **Top score > 0.4**: existing theme fits, use it. Never call
356
+ `propose_theme` after seeing a strong match.
357
+ - **Top score 0.2–0.4**: weak overlap. The top match is still
358
+ usually the right home; re-using a "close-enough" theme is
359
+ almost always better than creating a duplicate.
360
+ - **Empty matches or top < 0.2**: no existing theme fits. Stop
361
+ and ask the user explicitly whether this represents a new
362
+ strategic direction worth a years-stable theme. Only call
363
+ `propose_theme` after the user confirms — themes are
364
+ years-stable, not per-feature.
365
+ The `propose_theme` tool enforces this discovery: skipping
366
+ `suggest_theme_for` (or `list_themes` / `get_roadmap_snapshot`)
367
+ returns a `discovery_missing` error.
368
+ 2. **Decompose into Capabilities.** A Capability is "a quarterly
369
+ bet that ships value to the customer." Most features are one
370
+ Capability. A complex feature (multiple value streams, multiple
371
+ stakeholders, three+ months of work) splits into 2–3 Capabilities.
372
+ If you're tempted to write 5+, you're describing a Theme — go
373
+ back to step 1.
374
+ 3. **Decompose each Capability into Tasks.** A Task is "days-to-weeks
375
+ of work that one IC can finish and PR." Aim for 3–8 Tasks per
376
+ Capability. Fewer means the Capability is too small (combine
377
+ with a sibling). More means the Tasks are too granular (merge
378
+ them — Roadmapper isn't a checklist app).
379
+ 4. **Stamp every record with the required fields below.** Don't
380
+ skip `outcome`, `acceptance`, `effort`, or `priority` — those
381
+ are what make the plan actionable instead of aspirational.
382
+
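Step 1's score bands can be expressed as a small decision function. The `ThemeMatch` shape is an assumption about what `suggest_theme_for` returns, and the decision labels are illustrative names, not server values.

```typescript
// Hypothetical shape of one ranked match from suggest_theme_for.
interface ThemeMatch {
  themeId: string;
  score: number;
}

type ThemeDecision = "use-top-match" | "use-top-match-weak" | "ask-user-before-propose_theme";

function decideTheme(matches: ThemeMatch[]): { decision: ThemeDecision; themeId?: string } {
  const top = matches.reduce<ThemeMatch | undefined>(
    (best, m) => (best === undefined || m.score > best.score ? m : best),
    undefined,
  );
  if (top === undefined || top.score < 0.2) return { decision: "ask-user-before-propose_theme" };
  if (top.score > 0.4) return { decision: "use-top-match", themeId: top.themeId };
  return { decision: "use-top-match-weak", themeId: top.themeId }; // 0.2–0.4: still usually the right home
}
```

Only the first branch ever leads toward `propose_theme`, and even then only after the user confirms.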
383
+ ### What to emit (template)
384
+
385
+ Return a single JSON block. The user will paste it into Roadmapper's
386
+ import or read it manually; either way the field names must match
387
+ exactly. IDs use the `__NEW__` placeholder prefix when you're
388
+ proposing a new record — Roadmapper assigns the real `TH-NNNNNN` /
389
+ `CAP-NNN` / `TK-NNNNNN` ID at import time.
390
+
391
+ ```jsonc
392
+ {
393
+ "themes": [
394
+ // Only include a theme when proposing a NEW one. Otherwise
395
+ // reference an existing theme by its real id below.
396
+ {
397
+ "id": "__NEW__data-intelligence",
398
+ "name": "Data & Intelligence",
399
+ "description": "Make customer data instantly queryable, cross-product.",
400
+ "color": "#2563eb",
401
+ "targetRoi": 12 // $M annual; optional, exec rollup signal
402
+ }
403
+ ],
404
+ "capabilities": [
405
+ {
406
+ "id": "__NEW__realtime-segmentation",
407
+ "name": "Real-time segmentation",
408
+ "pillarId": "__NEW__data-intelligence", // or an existing TH- id
409
+ "description": "Recompute audience membership within 60s of an event.",
410
+ "outcome": "Marketing operators ship campaigns from fresh data instead of yesterday's snapshot. Closes the #1 ICP request.",
411
+ "specRef": "https://notion.so/.../realtime-seg-rfc", // strongly recommended
412
+ "reach": 120, // distinct customers/qtr
413
+ "impact": 2, // 3 massive | 2 high | 1 medium | 0.5 low | 0.25 minimal
414
+ "confidence": 75 // 0–100
415
+ }
416
+ ],
417
+ "tasks": [
418
+ {
419
+ "id": "__NEW__segment-recompute-job",
420
+ "capabilityId": "__NEW__realtime-segmentation",
421
+ "title": "Implement segment recompute job",
422
+ "summary": "Stream-process events and rewrite audience membership in <60s.",
423
+ "kind": "feature", // feature | bug | chore | spike
424
+ "authorKind": "agent",
425
+ "effort": "L", // XS=3d S=8d M=18d L=35d XL=65d
426
+ "priority": "P1", // P0 critical | P1 high | P2 medium | P3 low
427
+ "start": "2026-06-01",
428
+ "target": "2026-07-15",
429
+ "acceptance": [
430
+ "Latency from event ingest to membership update is <60s p95.",
431
+ "Backfill of last 30 days completes without missed events.",
432
+ "Recompute job emits CloudWatch metrics for lag and throughput."
433
+ ],
434
+ "dependsOn": [] // other __NEW__ task ids you propose
435
+ },
436
+ // ...3–8 tasks per capability
437
+ ]
438
+ }
439
+ ```
440
+
441
+ ### Field quality bar
442
+
443
+ Things you should *not* skip or hand-wave:
444
+
445
+ - **`outcome`** — describe the customer-visible change in plain
446
+ English, ~2 sentences. "Adds X feature" is not an outcome;
447
+ "Marketing operators stop asking us for X manually" is.
448
+ - **`specRef`** — link to a real doc (Notion, Figma, Linear, GitHub
449
+ RFC, etc.). If one doesn't exist yet, propose its title in the
450
+ description and tag the first Task `kind: "spike"` to write it.
451
+ - **`acceptance`** — each entry is a single checkable assertion.
452
+ An agent will grade itself against this list before opening a
453
+ PR. "Works correctly" is not acceptance; "Returns 401 when token
454
+ is missing" is.
455
+ - **`effort`** — pick the size that matches the listed day budget.
456
+ Don't compress 6 weeks of work into M because the user wants
457
+ it sooner.
458
+ - **`priority`** — P0 is reserved for revenue / SLA / security
459
+ fires. Most planned work is P1 or P2.
460
+
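One way to honour "pick the size that matches the listed day budget" is to choose the smallest size whose budget covers the estimate, so work is never compressed into a smaller bucket. The budgets come from the template comment (XS=3d S=8d M=18d L=35d XL=65d); the rounding-up rule and returning `undefined` beyond XL are assumptions, since the contract doesn't say what to do past 65 days.

```typescript
type Effort = "XS" | "S" | "M" | "L" | "XL";

// Day budgets from the template above.
const EFFORT_BUDGET_DAYS: ReadonlyArray<[Effort, number]> = [
  ["XS", 3],
  ["S", 8],
  ["M", 18],
  ["L", 35],
  ["XL", 65],
];

// Smallest size whose budget covers the estimate; undefined means "larger than any listed size".
function effortFor(estimatedDays: number): Effort | undefined {
  for (const [size, budget] of EFFORT_BUDGET_DAYS) {
    if (estimatedDays <= budget) return size;
  }
  return undefined;
}
```

Under this rule a 30-day estimate lands in L, not M.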
461
+ ### Common shape of your reply
462
+
463
+ A typical planning response is:
464
+
465
+ ```
466
+ 1. Three sentences explaining what you understood about the feature.
467
+ 2. The JSON block above, fully filled in.
468
+ 3. Any open questions you want answered before the user imports it.
469
+ ```
470
+
471
+ Don't pad with status updates, project-management theatre, or a
472
+ "let's break this down" preamble. Roadmapper already provides the
473
+ breakdown; you provide the data.
474
+
475
+ ---
476
+
477
+ ## Required of every agent task
478
+
479
+ ```jsonc
480
+ {
481
+ "id": "TK-100201",
482
+ "capabilityId": "CAP-018",
483
+ "title": "Audit log S3 export job",
484
+ "summary": "Stream the audit log to S3 nightly, partitioned by day.",
485
+ "kind": "feature", // feature | bug | chore | spike
486
+ "authorKind": "agent", // human | agent
487
+ "effort": "M", // XS | S | M | L | XL
488
+ "priority": "P1", // P0 | P1 | P2 | P3
489
+ "start": "2026-06-01",
490
+ "target": "2026-07-15",
491
+ "owner": "Marcus Lee",
492
+ "ownerUserId": "...", // auth.users.id when known
493
+ "acceptance": [ // checkable assertions
494
+ "Nightly job runs at 02:00 UTC and writes to s3://audit/YYYY/MM/DD/.",
495
+ "Failures page on-call via PagerDuty.",
496
+ "Last 30 days are queryable via Snowflake external table."
497
+ ],
498
+ "dependsOn": ["TK-100199"] // optional
499
+ }
500
+ ```
501
+
502
+ - `acceptance` is **required for any task an agent will pick up**. An
503
+ empty list is a stop signal: either the task is too vague to start
504
+ (`kind: "spike"` it and produce a spec), or someone needs to fill
505
+ it in.
506
+ - `acceptanceGrades[i]` is the persistent self-grade for criterion
507
+ `i`. Write `{ status: "pass" | "fail", note? }` after you've
508
+ actually verified the assertion. The task surfaces a "Self-graded
509
+ ready" badge once every entry passes; reviewers should not need to
510
+ ask "did you check this?"
511
+ - `dependsOn` is honored by sprint composition, burndown, and the
512
+ "Blocked" indicator in task lists. Don't invent it just because
513
+ tasks happen in order; only set it when one truly blocks another.
514
+ - `kind` decides whether the task counts toward RICE/forecasting.
515
+ Bugs and chores don't pollute capability-level estimates.
516
+
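The "Self-graded ready" rule (every `acceptance[i]` has a passing `acceptanceGrades[i]`) is easy to check before asking for review. This is a sketch, not the product's badge logic; treating an empty acceptance list as not ready is an assumption drawn from the "stop signal" rule above.

```typescript
interface AcceptanceGrade {
  status: "pass" | "fail";
  note?: string;
}

// `grades[i]` grades `acceptance[i]`; a missing entry means "not graded yet".
function isSelfGradedReady(acceptance: string[], grades: Array<AcceptanceGrade | undefined>): boolean {
  if (acceptance.length === 0) return false; // empty acceptance is a stop signal, never "ready"
  return acceptance.every((_, i) => grades[i]?.status === "pass");
}

// Criteria still failing or ungraded, e.g. the unchecked boxes in the PR body's acceptance check.
function outstandingCriteria(acceptance: string[], grades: Array<AcceptanceGrade | undefined>): string[] {
  return acceptance.filter((_, i) => grades[i]?.status !== "pass");
}
```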
517
+ ### Required capability fields
518
+
519
+ ```jsonc
520
+ {
521
+ "id": "CAP-018",
522
+ "name": "Audit log export",
523
+ "pillarId": "TH-PL",
524
+ "description": "Export audit stream to S3/Snowflake.",
525
+ "outcome": "Compliance officers stop asking for ad-hoc dumps. SOC2 evidence in one click.",
526
+ "specRef": "https://notion.so/.../audit-log-rfc", // required before agent decomposition
527
+ "reach": 80, "impact": 2, "confidence": 85,
528
+ "outcomeStatus": "pending" // pending | achieved | missed
529
+ }
530
+ ```
531
+
532
+ - `specRef` is **required before an agent decomposes a Capability
533
+ into Tasks**. Without it, the agent has no contract for what's in
534
+ scope. Stop and ask.
535
+ - `outcomeStatus` is reviewed quarterly. "Delivered" doesn't mean
536
+ "outcome achieved"; the field exists to keep the distinction honest.
537
+
538
+ ---
539
+
540
+ ## Branch & commit conventions
541
+
542
+ Agent-authored work uses idempotent, attributable identifiers. Branch
543
+ naming, commit prefixing, and PR title are all derived from the same
544
+ task ID so re-syncs don't double-apply and review tooling can group
545
+ agent work.
546
+
547
+ - **Branch:** `agent/TK-NNNNNN-slug`
548
+ - `agent/` prefix tells branch protection and reviewers this needs
549
+ extra scrutiny.
550
+ - `TK-NNNNNN` is the task ID (6-digit numeric).
551
+ - `slug` is a short kebab-case description (≤ 6 words).
552
+ - Example: `agent/TK-100201-nlp-segment-authoring`.
553
+
554
+ - **Commit messages:** `TK-NNNNNN: <imperative summary>`
555
+ - Every commit on the branch carries the prefix.
556
+ - Body explains *why* (not what — the diff has that).
557
+
558
+ - **PR title:** `TK-NNNNNN: <task title>` (auto-derived).
559
+ - **PR body:** see "PR body template" below. The body's `Closes
560
+ TK-NNNNNN` line is what links the PR to the task at sync time.
561
+
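A sketch of deriving the branch name and commit subject from the same task ID. The helper names are hypothetical, and keeping the first six words of the description is one assumed way to satisfy the "≤ 6 words" slug rule.

```typescript
const TASK_ID = /^TK-\d{6}$/;

// agent/TK-NNNNNN-slug, with a kebab-case slug of at most 6 words.
function agentBranchName(taskId: string, description: string): string {
  if (!TASK_ID.test(taskId)) throw new Error(`not a TK-NNNNNN id: ${taskId}`);
  const words = description
    .toLowerCase()
    .replace(/[^a-z0-9\s-]/g, " ")
    .split(/[\s-]+/)
    .filter((w) => w.length > 0)
    .slice(0, 6);
  if (words.length === 0) throw new Error("slug needs at least one word");
  return `agent/${taskId}-${words.join("-")}`;
}

// Every commit on the branch: `TK-NNNNNN: <imperative summary>`.
function commitSubject(taskId: string, summary: string): string {
  return `${taskId}: ${summary.trim()}`;
}
```

For example, `agentBranchName("TK-100201", "NLP segment authoring")` yields the branch from the example above, `agent/TK-100201-nlp-segment-authoring`.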
562
+ Multi-task work: prefix the most-load-bearing task ID, list the
563
+ others in the PR body's `Closes …` lines. The Settings → GitHub
564
+ unmatched picker can also attach one PR to multiple tasks
565
+ post-hoc.
566
+
567
+ ---
568
+
569
+ ## PR body template
570
+
571
+ ```markdown
572
+ ## Summary
573
+ What changed in one paragraph.
574
+
575
+ ## Customer impact
576
+ Plain-English statement. If this is the last PR for the capability,
577
+ this becomes the capability outcome (Roadmapper auto-promotes when the
578
+ capability's `outcome` is empty — never overwrites).
579
+
580
+ ## Acceptance check
581
+ - [x] <criterion 1 from the task>
582
+ - [x] <criterion 2 from the task>
583
+ - [ ] <criterion 3 — not done; explain or split>
584
+
585
+ ## Test plan
586
+ - [ ] How a reviewer should validate this locally / in CI.
587
+
588
+ Closes TK-100201
589
+
590
+ Roadmapper-Task: TK-100201
591
+ Roadmapper-Capability: CAP-9F2C7E
592
+ ```
593
+
594
+ ### Roadmapper trailers — the canonical-model loop
595
+
596
+ The two lines at the bottom (`Roadmapper-Task:` and `Roadmapper-Capability:`)
597
+ are **trailers** the webhook reads when GitHub fires `pull_request`
598
+ events. They close a loop that's otherwise impossible to close
599
+ server-side: only the agent doing the work knows which capability the
600
+ PR actually advances.
601
+
602
+ **Before opening a PR**, an agent must:
603
+
604
+ 1. Call `list_capabilities` (and `list_themes` if needed) over MCP
605
+ and read the **entire list** of *active* capabilities — names,
606
+ outcomes, descriptions. A one-shot text match against a single
607
+ candidate is not enough; the right capability is usually the one
608
+ whose outcome most closely maps to what your PR achieves, not
609
+ the one with the most keyword overlap. Use `list_capabilities`
610
+ with a `themeId` filter when you already know the theme to keep
611
+ the list short. Delivered capabilities are excluded by default —
612
+ they're closed bets, not landing pads for new work.
613
+ 2. Pick the **single best-fit** capability id, or call
614
+ `propose_capability` if and only if nothing genuinely fits.
615
+ Default rule still applies: new task > new capability > new
616
+ theme. A capability that already exists is almost always better
617
+ than a new one.
618
+ 3. Stamp both trailers (Task + Capability) into the PR body, exactly
619
+ as shown above — one per line, at the bottom of the body, no
620
+ trailing punctuation.
621
+
622
+ The webhook will:
623
+
624
+ - Honor `Roadmapper-Task: TK-NNNNNN` as authoritative when the id
625
+ exists in the workspace, even if the PR title doesn't say `TK-…:`.
626
+ - Pass `Roadmapper-Capability: CAP-XXXXXX` to `create_task_from_pr`
627
+ when no matching task exists, so brand-new tasks land already
628
+ associated with the right capability instead of dumped in
629
+ Uncategorized.
630
+ - Fall back to a Jaccard match against user-created capabilities when
631
+ no trailer is present, and leave the capability null if nothing matches.
632
+
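A sketch of reading the two trailers from a PR body: one per line, exact key, no trailing punctuation. It is an approximation of what the webhook presumably does, not its actual parser. For simplicity it accepts the trailers on any line, not only at the bottom, and the capability-ID pattern is loosened to cover both `CAP-018` and `CAP-9F2C7E` as they appear in this document.

```typescript
interface RoadmapperTrailers {
  taskId?: string;
  capabilityId?: string;
}

function parseTrailers(prBody: string): RoadmapperTrailers {
  const trailers: RoadmapperTrailers = {};
  for (const rawLine of prBody.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Anchored at both ends, so "Roadmapper-Task: TK-100201." (trailing period) does not match.
    const task = /^Roadmapper-Task:\s*(TK-\d{6})$/.exec(line);
    if (task) trailers.taskId = task[1];
    const cap = /^Roadmapper-Capability:\s*(CAP-[A-Z0-9]+)$/.exec(line);
    if (cap) trailers.capabilityId = cap[1];
  }
  return trailers;
}
```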
633
+ The PR-to-task matcher accepts (in order):
634
+ 1. **Roadmapper-Task trailer** — `Roadmapper-Task: TK-NNNNNN` in the
635
+ PR body. Authoritative when the id is known to the workspace.
636
+ 2. **Already-attached** — if the PR is already on a task's `prs[]`,
637
+ it routes back to that task. Manual mappings stick across re-syncs.
638
+ 3. **ID reference** — title prefix `XX-N: …`, body `Closes/Fixes/Refs/
639
+ Resolves XX-N`, or any bare `XX-N` matching a known task ID.
640
+ 4. **AI map** — Settings → GitHub → "AI map" sends the visible /
641
+ selected unmatched PRs to a Vercel edge function backed by Claude
642
+ Haiku 4.5 for semantic capability routing. PR rows show an "AI"
643
+ chip when the suggestion came from the LLM. Requires
644
+ `ANTHROPIC_API_KEY` set on the Vercel project; without it the
645
+ Jaccard fallback continues to work and the UI surfaces a one-line
646
+ "not configured" notice.
647
+ 5. **Jaccard fallback** — title/body tokens against capability
648
+ tokens, threshold 0.10. Always available; used when nothing
649
+ higher-precedence matched. The webhook also runs this fallback
650
+ server-side against user-created capabilities when picking a
651
+ capability hint for an auto-created task.
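Step 5 can be sketched as a plain Jaccard similarity (size of the intersection over size of the union of the two token sets) with the 0.10 threshold. The tokenizer here (lowercase, split on non-alphanumerics, drop one-character tokens) is an assumption and may not match the shared helpers in `src/lib/textMatch.ts`; so is counting a score exactly at the threshold as a match. `jaccardFallback` and the `CapabilityText` shape are hypothetical.

```typescript
interface CapabilityText {
  id: string;
  text: string; // e.g. name + outcome + description, concatenated
}

// Assumed tokenizer: lowercase, split on anything non-alphanumeric,
// drop one-character tokens.
function tokenize(text: string): Set<string> {
  return new Set(
    text
      .toLowerCase()
      .split(/[^a-z0-9]+/)
      .filter((t) => t.length > 1),
  );
}

// Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 0 when both sets are empty.
function jaccard(a: Set<string>, b: Set<string>): number {
  let shared = 0;
  a.forEach((t) => {
    if (b.has(t)) shared++;
  });
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Best-scoring capability at or above the threshold, else null.
function jaccardFallback(
  prText: string,
  caps: CapabilityText[],
  threshold = 0.1,
): string | null {
  const prTokens = tokenize(prText);
  let bestId: string | null = null;
  let bestScore = -1;
  for (const cap of caps) {
    const score = jaccard(prTokens, tokenize(cap.text));
    if (score >= threshold && score > bestScore) {
      bestId = cap.id;
      bestScore = score;
    }
  }
  return bestId;
}
```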
652
+
653
+ If you want your PR to land on the right task without anyone
654
+ clicking the unmatched picker, do at least one of the following:
655
+ stamp the `Roadmapper-Task:` trailer, open the PR from a branch
656
+ named `agent/TK-NNNNNN-slug`, or include `Closes TK-NNNNNN` in
657
+ the body.
658
+
659
+ ---
660
+
661
+ ## How merged PRs become "delivered"
662
+
663
+ Setting `status="delivered"` auto-stamps `delivered = today` and
664
+ `progress = 100`. With the GitHub sync's `Auto-mark delivered` flag
665
+ on, a merged PR flips its parent task to `delivered` using the merge timestamp.
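What the auto-stamp does to a task can be sketched as below, assuming ISO `YYYY-MM-DD` date strings and a UTC calendar day for "today". `markDelivered` and the trimmed-down `Task` shape are hypothetical; in practice the GitHub sync applies this on merge, and an agent should never call anything like it itself.

```typescript
// Trimmed-down, hypothetical task shape: only the fields this section uses.
interface Task {
  id: string;
  status: string; // e.g. "in-progress" or "delivered" (values assumed)
  progress: number; // 0 to 100
  delivered?: string; // ISO date, YYYY-MM-DD
}

// Returns a delivered copy of the task. The date comes from the merge
// timestamp when given, otherwise from today's UTC date.
function markDelivered(task: Task, mergedAt?: string): Task {
  const when = mergedAt !== undefined ? new Date(mergedAt) : new Date();
  return {
    ...task,
    status: "delivered",
    progress: 100,
    delivered: when.toISOString().slice(0, 10),
  };
}

const before: Task = { id: "TK-100201", status: "in-progress", progress: 40 };
const after = markDelivered(before, "2026-05-08T17:42:00Z");
console.log(after.delivered); // 2026-05-08
```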
666
+
667
+ Caveats:
668
+ - Merged ≠ deployed ≠ outcome achieved. The capability's
669
+ `outcomeStatus` exists so quarterly reviews can record whether the outcome actually landed.
670
+ - An agent **must not** flip a task to `delivered` on its own; that's
671
+ driven by a human-merged PR, not by the agent's self-grade.
672
+ - `originalTarget` is captured when the task is first planned and never
673
+ auto-changes; the on-time rate compares `delivered` to `originalTarget`.
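A sketch of the on-time rate, assuming "on time" means `delivered <= originalTarget` on ISO `YYYY-MM-DD` strings (which sort chronologically as plain strings) and that tasks missing either date are left out. `onTimeRate` is a hypothetical name; the app's own KPI helpers may compute it differently.

```typescript
interface DatedTask {
  delivered?: string; // ISO YYYY-MM-DD, set on delivery
  originalTarget?: string; // ISO YYYY-MM-DD, captured at first plan
}

type Scored = Required<DatedTask>;

// Share of delivered tasks that landed on or before their original target.
// Returns null when no task has both dates.
function onTimeRate(tasks: DatedTask[]): number | null {
  const scored = tasks.filter(
    (t): t is Scored =>
      t.delivered !== undefined && t.originalTarget !== undefined,
  );
  if (scored.length === 0) return null;
  // ISO date strings compare chronologically as plain strings.
  const onTime = scored.filter((t) => t.delivered <= t.originalTarget).length;
  return onTime / scored.length;
}
```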
674
+
675
+ ### Outcome attainment
676
+
677
+ Each capability carries an `outcomeStatus` (`pending` / `achieved` /
678
+ `missed`) that a human stamps from the capability detail page once
679
+ delivery is done, independently of whether the code shipped on time.
680
+ A separate target-hit / target-missed chip is auto-derived from
681
+ `delivered ≤ target` and needs no manual stamp. Agents that propose
682
+ new capabilities should set `outcomeStatus: "pending"` and leave
683
+ `outcomeCheckedAt` unset — the human owns those fields.
684
+
685
+ ---
686
+
687
+ ## RICE scoring
688
+
689
+ ```
690
+ score = (Reach × Impact × Confidence/100) ÷ Effort-days
691
+ ```
692
+
693
+ | Field | Meaning | Range / values |
694
+ |------------|----------------------------------------------------|-----------------------|
695
+ | Reach | Distinct customers/users touched per quarter, post-launch | Integer |
696
+ | Impact | How big a change | 3 · 2 · 1 · 0.5 · 0.25 |
697
+ | Confidence | How sure you are this will work | 0–95 |
698
+ | Effort | Auto-summed from tasks (XS=2h, S=4h, M=1d, L=3d, XL=8d) | Days |
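A minimal sketch of the formula and the effort roll-up, assuming an 8-hour day (so XS = 0.25 days and S = 0.5 days) and the task `kind` values shown in the type. It also applies the rule, stated later in this section, that `bug` and `chore` tasks are left out of capability effort. `riceScore`, `capabilityEffortDays`, and the zero-effort behavior are illustrative choices, not the app's API.

```typescript
type Size = "XS" | "S" | "M" | "L" | "XL";
type Kind = "feature" | "bug" | "chore" | "spike"; // assumed kind values
type Impact = 3 | 2 | 1 | 0.5 | 0.25;

// Effort sizes from the table, in days, assuming an 8-hour working day.
const SIZE_DAYS: Record<Size, number> = { XS: 0.25, S: 0.5, M: 1, L: 3, XL: 8 };

interface SizedTask {
  size: Size;
  kind: Kind;
}

// Capability effort: sum of task sizes, skipping bug and chore tasks.
function capabilityEffortDays(tasks: SizedTask[]): number {
  return tasks
    .filter((t) => t.kind !== "bug" && t.kind !== "chore")
    .reduce((sum, t) => sum + SIZE_DAYS[t.size], 0);
}

// score = (Reach × Impact × Confidence/100) ÷ Effort-days
function riceScore(
  reach: number,
  impact: Impact,
  confidence: number, // 0 to 95
  effortDays: number,
): number {
  // Assumption: with no effort recorded, report 0 instead of dividing by zero.
  if (effortDays <= 0) return 0;
  return (reach * impact * (confidence / 100)) / effortDays;
}

const effort = capabilityEffortDays([
  { size: "M", kind: "feature" },
  { size: "L", kind: "feature" },
  { size: "S", kind: "spike" },
  { size: "XL", kind: "bug" }, // excluded from the roll-up
]);
console.log(effort); // 4.5
console.log(riceScore(1000, 2, 80, effort));
```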
699
+
700
+ ### Impact levels (concrete)
701
+
702
+ - **3 — Massive.** Unlocks a new revenue line, removes the single biggest blocker in your pipeline, or shifts a north-star metric.
703
+ - **2 — High.** Removes a top-5 friction in the existing flow, or moves a secondary metric measurably.
704
+ - **1 — Medium.** Solid improvement that customers will notice but won't move headline numbers.
705
+ - **0.5 — Low.** Quality-of-life polish.
706
+ - **0.25 — Minimal.** Cosmetic or internal-only.
707
+
708
+ ### Confidence levels (concrete)
709
+
710
+ - **>90 only if the work is already shipped or running behind a flag.** Anything earlier is a guess.
711
+ - **70–90** — comparable to things you've shipped before.
712
+ - **40–70** — plausible but not validated.
713
+ - **<40** — propose as `kind: "spike"`, not a feature capability.
714
+ - **100 is never correct.** The server caps `propose_capability` at 95 to make this stick.
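The confidence guidance can be expressed as a small pre-check before calling `propose_capability`. `checkConfidence` is hypothetical; treating the server's 95 cap as a clamp (rather than a rejection) and assigning the overlapping boundaries 70 and 90 to the 70–90 band are assumptions.

```typescript
type Band = "spike" | "plausible" | "comparable" | "shipped";

interface ConfidenceCheck {
  value: number; // confidence after clamping to 0..95
  band: Band;
  warnings: string[];
}

// Hypothetical pre-check mirroring the confidence guidance.
function checkConfidence(raw: number, shippedOrFlagged: boolean): ConfidenceCheck {
  const warnings: string[] = [];
  // Assumption: the 95 cap behaves like a clamp, not a rejection.
  const value = Math.min(Math.max(raw, 0), 95);
  if (raw > 95) warnings.push("Confidence capped at 95; 100 is never correct.");

  let band: Band;
  if (value > 90) {
    band = "shipped";
    if (!shippedOrFlagged) {
      warnings.push(">90 only if the work is already shipped or behind a flag.");
    }
  } else if (value >= 70) {
    band = "comparable";
  } else if (value >= 40) {
    band = "plausible";
  } else {
    band = "spike";
    warnings.push('Below 40: propose as kind "spike", not a feature capability.');
  }
  return { value, band, warnings };
}
```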
715
+
716
+ ### Outcome statements (required, falsifiable)
717
+
718
+ Every capability needs an outcome that can be checked against reality. Template:
719
+
720
+ > `<metric>` moves from `<baseline>` to `<target>` by `<date>`, measured by `<source>`.
721
+
722
+ **Good** — `CAP-XXXX "SDK primitives for landing pages"`
723
+ > Pages-published-per-BU moves from 0 to median 3/quarter by 2026-Q3, measured by `lp_publish_event`.
724
+
725
+ **Good** — `CAP-XXXX "Onboarding completion lift"`
726
+ > Activation rate moves from 32% to 55% by 2026-09-30, measured by the `activated_user` warehouse event.
727
+
728
+ **Weak** — "Improve builder UX." Fails: no metric, no baseline, no date. Rewrite or drop.
729
+
730
+ **Weak** — "Better email deliverability." Fails: same. Rewrite as "Email deliverability moves from 87% to 95% by 2026-Q4, measured by Postmark stats."
731
+
732
+ The MCP server's `propose_capability` tool rejects outcomes that don't contain both a number and a temporal anchor (year, quarter, month name, or "by …"). Use `dryRun: true` to validate before committing.
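A rough local pre-check for the same two conditions, useful before a `dryRun` call. This is a heuristic sketch, not the server's validator: `checkOutcome` is hypothetical, any digit counts as "a number" (so a bare year satisfies both checks), and "may" is matched as a month even when it is the verb. A trailing "measured by …" clause is ignored for the "by …" test so the source name doesn't count as a date.

```typescript
const MONTHS = [
  "january", "february", "march", "april", "may", "june",
  "july", "august", "september", "october", "november", "december",
];

interface OutcomeCheck {
  hasNumber: boolean;
  hasTemporalAnchor: boolean;
}

// Heuristic only; the server's real rules may be stricter or looser.
function checkOutcome(outcome: string): OutcomeCheck {
  const text = outcome.toLowerCase();
  // Drop a trailing "measured by <source>" clause so it can't pass as "by ...".
  const withoutSource = text.replace(/measured by\b.*$/, "");
  const hasYear = /\b(19|20)\d{2}\b/.test(text);
  const hasQuarter = /\bq[1-4]\b/.test(text);
  // Note: "may" also matches the modal verb.
  const hasMonth = MONTHS.some((m) => new RegExp(`\\b${m}\\b`).test(text));
  const hasBy = /\bby\s+\S/.test(withoutSource);
  return {
    hasNumber: /\d/.test(text),
    hasTemporalAnchor: hasYear || hasQuarter || hasMonth || hasBy,
  };
}
```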
733
+
734
+ ### When NOT to create a capability
735
+
736
+ - **One-off bug fix** → task under the existing capability it affects.
737
+ - **Infra work under an existing bet** → task on that bet, not a new bet.
738
+ - **Exploration without a commitment** → don't propose yet; gather evidence, then come back.
739
+ - **Renaming or refactoring** → task, not capability.
740
+ - **Anything that fits in one PR** → task.
741
+
742
+ If you find yourself emitting 5+ tasks under a capability you just proposed, the capability is too big — split it. If you can't write a falsifiable outcome, the capability isn't real yet — go back one step.
743
+
744
+ ### ROI vs theme
745
+
746
+ A capability's `roi` should clear ~70% of the parent theme's `targetRoi`. Below that, the server flags a warning on `propose_capability`; it isn't a rejection, but the gap should be justified in the `outcome` (e.g., "low-ROI but unblocks the $X bet downstream"). Don't pad ROI to clear the bar; strengthen the case in the outcome instead.
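The ~70% guideline as a sketch. `roiWarning` is a hypothetical helper, the exact 0.7 ratio is taken from the approximate figure above, and skipping the check when the theme has no positive `targetRoi` is an assumption.

```typescript
// Approximate threshold from the guideline above.
const ROI_WARNING_RATIO = 0.7;

// Returns a warning string when the capability's ROI falls below ~70% of the
// parent theme's targetRoi, or null when it clears the bar.
function roiWarning(capRoi: number, themeTargetRoi: number): string | null {
  // Assumption: without a positive target there is nothing to compare against.
  if (themeTargetRoi <= 0) return null;
  const floor = ROI_WARNING_RATIO * themeTargetRoi;
  if (capRoi >= floor) return null;
  return (
    `ROI ${capRoi} is below ~70% of the theme target (${floor.toFixed(2)}); ` +
    "justify the gap in the outcome."
  );
}
```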
747
+
748
+ The default sort everywhere is RICE score, descending. Tasks with `kind: "bug"` or `kind: "chore"` are excluded from capability-level effort rollups so they don't crater the score.
749
+
750
+ ---
751
+
752
+ ## Sprints
753
+
754
+ A `Sprint` references tasks by ID: adding a task to a sprint
755
+ doesn't change the task itself; it only adds the ID to `Sprint.taskIds`. The
756
+ same task can carry across consecutive sprints; the burndown treats
757
+ it as remaining effort until delivered.
758
+
759
+ ```jsonc
760
+ {
761
+ "id": "SP-127",
762
+ "name": "Sprint 127",
763
+ "goal": "Push AI segmentation past 60% and lock audit log MVP",
764
+ "startDate": "2026-04-27",
765
+ "endDate": "2026-05-11",
766
+ "taskIds": ["TK-100201", "TK-100204", "TK-100203"],
767
+ "defaultCapacityDays": 8, // optional
768
+ "ownerCapacity": { "Marcus Lee": 5 } // optional per-owner override
769
+ }
770
+ ```
771
+
772
+ ### Capacity
773
+
774
+ Sprints carry an optional per-sprint capacity model:
775
+
776
+ - `defaultCapacityDays` — the day budget every owner gets unless
777
+ overridden.
778
+ - `ownerCapacity[owner]` — an override for one person (PTO, partial
779
+ allocation, etc.).
780
+
781
+ The sprint detail page sums `effortDays(task)` per owner and flags
782
+ anyone whose load exceeds their cap. Agents proposing a sprint plan
783
+ should keep each owner within their capacity rather than just
784
+ maximizing throughput.
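A sketch of the per-owner capacity check, using the `Sprint` fields shown above. It assumes each task carries a single `owner` and a precomputed `effortDays` number; `overloadedOwners` and the `SprintTask` shape are hypothetical, and owners with neither an override nor a default are never flagged.

```typescript
interface Sprint {
  taskIds: string[];
  defaultCapacityDays?: number;
  ownerCapacity?: Record<string, number>;
}

// Hypothetical task shape: one owner and a precomputed effort in days.
interface SprintTask {
  id: string;
  owner: string;
  effortDays: number;
}

interface Overload {
  owner: string;
  load: number;
  cap: number;
}

// Sums effort per owner for tasks in the sprint and reports anyone over cap.
function overloadedOwners(sprint: Sprint, tasks: SprintTask[]): Overload[] {
  const inSprint = new Set(sprint.taskIds);
  const load = new Map<string, number>();
  for (const t of tasks) {
    if (!inSprint.has(t.id)) continue;
    load.set(t.owner, (load.get(t.owner) ?? 0) + t.effortDays);
  }
  const over: Overload[] = [];
  load.forEach((days, owner) => {
    // Per-owner override wins; otherwise fall back to the sprint default.
    const cap = sprint.ownerCapacity?.[owner] ?? sprint.defaultCapacityDays;
    if (cap !== undefined && days > cap) over.push({ owner, load: days, cap });
  });
  return over;
}
```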
785
+
786
+ ---
787
+
788
+ ## ID conventions
789
+
790
+ - `TH-NNNNNN` — theme (the legacy `TH-XX` short form still works)
791
+ - `CAP-NNN` — capability (3-digit, sequential)
792
+ - `SP-NNN` — sprint (3-digit, sequential)
793
+ - `TK-NNNNNN` — task (6-digit, randomized; offline-edit-safe)
794
+
795
+ The PR-to-task matcher accepts any `[A-Z]{2,6}-\d+` pattern but only
796
+ emits matches for IDs that exist in the workspace, so unrelated `JIRA-…`
797
+ mentions won't latch on.
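The "accept broadly, emit only known IDs" rule can be sketched in a few lines. `knownIdMentions` is hypothetical, and the `\b` word boundaries around the pattern are an assumption about how the real matcher avoids partial matches inside longer tokens.

```typescript
// 2 to 6 uppercase letters, a hyphen, then digits. The \b anchors (an
// assumption) stop partial matches inside tokens like "ABCDEFG-1" or "TK-12a".
const ID_PATTERN = /\b[A-Z]{2,6}-\d+\b/g;

// Returns each known id mentioned in the text, once, in order of appearance.
function knownIdMentions(text: string, knownIds: Set<string>): string[] {
  const found: string[] = text.match(ID_PATTERN) ?? [];
  return found.filter((id, i) => knownIds.has(id) && found.indexOf(id) === i);
}
```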
798
+
799
+ ---
800
+
801
+ ## Don'ts
802
+
803
+ - Don't create one capability per PR — capabilities are quarterly bets.
804
+ - Don't leave `outcome` blank, especially on agent-authored work.
805
+ - Don't manipulate RICE inputs to game the score.
806
+ - Don't edit theme IDs after creation. Names are renameable.
807
+ - Don't update `roadmap.json` in unrelated PRs unless the PR is
808
+ explicitly changing the plan.
809
+ - Don't open a PR from a non-`agent/` branch and claim it's
810
+ agent-authored.
811
+ - Don't self-promote a task to `delivered`. Wait for the merged PR.
812
+
813
+ ---
814
+
815
+ ## GitHub integration (overview)
816
+
817
+ There are two integration paths; the agent's responsibility is identical for both:
818
+
819
+ **GitHub App (canonical).** A single App is registered once
820
+ (`/api/github-app-manifest`), shared across all workspaces. Each
821
+ workspace installs it on its own repos. PRs arrive over webhooks
822
+ to `/api/github-webhook` and route to the right workspace by
823
+ `installation_id`. Matched PRs link to existing tasks via `link_pr`.
824
+ Unmatched PRs are either auto-created as new tasks (uncategorized,
825
+ `capabilityId: null`, unless a capability hint applies) or queued in
826
+ `edits.unmatchedPrs` for review; the workspace's `pr_unmatched_behavior` setting decides which.
827
+
828
+ **PAT/OAuth (fallback).** Same matcher rules, but the user pulls PRs
829
+ by clicking Fetch & Preview in Settings → GitHub sync. Results are
830
+ cached in localStorage per workspace.
831
+
832
+ Agent responsibility: author PRs the matcher can find. The matcher
833
+ looks for (in order):
834
+ 1. `TK-XXX:` prefix in the PR title.
835
+ 2. `Closes | Fixes | Resolves | Refs TK-XXX` in the body.
836
+ 3. `TK-XXX` prefix in the branch name (e.g. `vsgro/TK-100201-foo`).
837
+ 4. Any bare `TK-XXX` in title or body that matches a known task id.
838
+
839
+ Stick `TK-XXX:` at the start of the title and you're done.
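The four-step lookup above can be sketched as follows. It covers only this list (title prefix, closing keyword, branch name, bare mention); the trailer, already-attached, AI map, and Jaccard steps described earlier are left out. `matchTask` and `firstKnown` are hypothetical names, the regexes are assumptions about the exact patterns, and only `TK-` ids are considered.

```typescript
interface PullRequest {
  title: string;
  body: string;
  branch: string;
}

// Returns the first capture group of `re` (which must use the g flag) that is
// a known task id, or null.
function firstKnown(re: RegExp, text: string, known: Set<string>): string | null {
  re.lastIndex = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    const id = m[1];
    if (id !== undefined && known.has(id)) return id;
  }
  return null;
}

// Tries the four locations in order; the first known id wins.
function matchTask(pr: PullRequest, known: Set<string>): string | null {
  return (
    // 1. "TK-XXX:" prefix in the title
    firstKnown(/^(TK-\d+):/g, pr.title, known) ??
    // 2. Closes | Fixes | Resolves | Refs TK-XXX in the body
    firstKnown(/\b(?:closes|fixes|resolves|refs)\s+(TK-\d+)\b/gi, pr.body, known) ??
    // 3. TK-XXX at the start of the branch name or of a path segment
    firstKnown(/(?:^|\/)(TK-\d+)/g, pr.branch, known) ??
    // 4. any bare TK-XXX in the title or body
    firstKnown(/\b(TK-\d+)\b/g, `${pr.title}\n${pr.body}`, known)
  );
}
```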
840
+
841
+ For implementation details, see
842
+ [`api/github-webhook.ts`](https://github.com/vsgro/roadmapper/blob/main/api/github-webhook.ts),
843
+ [`src/lib/githubSync.ts`](https://github.com/vsgro/roadmapper/blob/main/src/lib/githubSync.ts), and the GitHub section in
844
+ [`src/components/SettingsPage.tsx`](https://github.com/vsgro/roadmapper/blob/main/src/components/SettingsPage.tsx).
845
+
846
+ ---
847
+
848
+ ## File map
849
+
850
+ - [`src/types.ts`](https://github.com/vsgro/roadmapper/blob/main/src/types.ts) — schema (TypeScript types are the spec)
851
+ - [`src/data/roadmap.json`](https://github.com/vsgro/roadmapper/blob/main/src/data/roadmap.json) — example data
852
+ - [`src/lib/store.ts`](https://github.com/vsgro/roadmapper/blob/main/src/lib/store.ts) — `useRoadmap()` hook with edits / patches / cloning
853
+ - [`src/lib/githubSync.ts`](https://github.com/vsgro/roadmapper/blob/main/src/lib/githubSync.ts) — fetch + match + cache helpers
854
+ - [`src/lib/textMatch.ts`](https://github.com/vsgro/roadmapper/blob/main/src/lib/textMatch.ts) — shared tokenize + Jaccard
855
+ - [`src/lib/profiles.ts`](https://github.com/vsgro/roadmapper/blob/main/src/lib/profiles.ts) — Supabase user_profiles lookup + cache
856
+ - [`src/lib/users.ts`](https://github.com/vsgro/roadmapper/blob/main/src/lib/users.ts) — known-user pool + owner/authorship defaults
857
+ - [`src/lib/util.ts`](https://github.com/vsgro/roadmapper/blob/main/src/lib/util.ts) — RICE / capability / KPI helpers
858
+ - [`src/components/CommandPalette.tsx`](https://github.com/vsgro/roadmapper/blob/main/src/components/CommandPalette.tsx) — global Cmd-K palette
859
+ - [`src/components/OutlookPage.tsx`](https://github.com/vsgro/roadmapper/blob/main/src/components/OutlookPage.tsx) — Strategic Outlook view
860
+ - [`src/components/SprintsPage.tsx`](https://github.com/vsgro/roadmapper/blob/main/src/components/SprintsPage.tsx) — kanban + capacity panel
861
+ - [`src/components/SettingsPage.tsx`](https://github.com/vsgro/roadmapper/blob/main/src/components/SettingsPage.tsx) — GitHub sync flow + Profile + auth
862
+ - [`api/classify-prs.ts`](https://github.com/vsgro/roadmapper/blob/main/api/classify-prs.ts) — Vercel edge function: LLM PR → capability classifier
863
+ - [`mcp/server.mjs`](https://github.com/vsgro/roadmapper/blob/main/mcp/server.mjs) — stdio MCP server (read-only roadmap view)
864
+ - [`src/adapters/`](https://github.com/vsgro/roadmapper/blob/main/src/adapters/) — GitHub / Linear / Productboard / Jira stubs
865
+ - [`src/adapters/README.md`](https://github.com/vsgro/roadmapper/blob/main/src/adapters/README.md) — full mapping table
866
+ - [`docs/llm-pr-mapping-plan.md`](https://github.com/vsgro/roadmapper/blob/main/docs/llm-pr-mapping-plan.md) — planned LLM-based PR↔task matcher
867
+
868
+ ## LocalStorage keys
869
+
870
+ - `roadmap.edits.v3` — local offline buffer of in-app edits
871
+ - `roadmap.activeWorkspace.v1` — the workspace id the user last
872
+ switched to (per-tab; AuthGate re-pins to a valid workspace on
873
+ sign-in if this id isn't in the user's accessible list)
874
+ - `roadmap.kpiPeriod.v2` — dashboard KPI period
875
+ - `roadmap.github.config.<workspaceId>.v1` — PAT, selected repos,
876
+ sync flags, history. **Per-workspace.** Legacy unscoped
877
+ `roadmap.github.config.v1` is auto-migrated into the active
878
+ workspace's slot on first read.
879
+ - `roadmap.github.lastFetch.<workspaceId>.v1` — cached raw PRs +
880
+ contributors from last sync. Per-workspace, same migration.
881
+ - `roadmap.brand.v1` — display-name override + uploaded logo data URL
882
+ - `roadmap.outlook.horizon.v1` — Strategic Outlook horizon selection
883
+ - `roadmap.githubProviderToken.v1` — OAuth provider token captured
884
+ from a GitHub sign-in (used as the GitHub API credential when no
885
+ PAT is set)
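A sketch of the per-workspace key scheme and the one-time legacy migration for `roadmap.github.config`. `readGithubConfig` and `MemoryStore` are hypothetical; `KeyValueStore` is a subset of the Web Storage interface, so `window.localStorage` fits it in a browser. Removing the legacy key after copying it is an assumption, chosen so a second workspace doesn't inherit the same PAT.

```typescript
// Subset of the Web Storage API; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const LEGACY_GITHUB_CONFIG_KEY = "roadmap.github.config.v1";

function githubConfigKey(workspaceId: string): string {
  return `roadmap.github.config.${workspaceId}.v1`;
}

// Reads the workspace's GitHub config. On first read, a legacy unscoped value
// is moved into this workspace's slot (removing the old key is an assumption).
function readGithubConfig(store: KeyValueStore, workspaceId: string): string | null {
  const key = githubConfigKey(workspaceId);
  const scoped = store.getItem(key);
  if (scoped !== null) return scoped;
  const legacy = store.getItem(LEGACY_GITHUB_CONFIG_KEY);
  if (legacy === null) return null;
  store.setItem(key, legacy);
  store.removeItem(LEGACY_GITHUB_CONFIG_KEY);
  return legacy;
}

// In-memory store so the sketch runs outside a browser.
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  getItem(key: string): string | null {
    const value = this.data.get(key);
    return value === undefined ? null : value;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
  removeItem(key: string): void {
    this.data.delete(key);
  }
}
```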