npm - @roadmapperai/mcp - Versions diffs - 0.1.0 - Mend

@roadmapperai/mcp 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (4) hide show

package/AGENTS.md ADDED Viewed

@@ -0,0 +1,885 @@
+# AGENTS.md
+The contract an AI agent must satisfy when planning, coding, or
+opening PRs in this repo. Read this end-to-end before doing any work.
+The roadmap data model is the source of truth — TypeScript types in
+[`src/types.ts`](https://github.com/vsgro/roadmapper/blob/main/src/types.ts) are the canonical schema. Everything
+in this doc is downstream of that file.
+## TL;DR — what an agent must do
+- Reference Themes / Capabilities / Tasks / Sprints by **stable ID**,
+  never by name. Use the `roadmapper` MCP server (see "Discovering
+  the current roadmap" below) to pull live IDs rather than guessing.
+- Open PRs from a branch named `agent/TK-NNNNNN-slug` with commits
+  prefixed `TK-NNNNNN: …`.
+- Self-grade against the task's `acceptance` list before marking work
+  ready for review. Record each grade on `Task.acceptanceGrades[i]`
+  as `{ status: "pass" | "fail", note? }` so reviewers can see what
+  you actually verified. If `acceptance` is empty, flag the task as
+  `kind: "spike"` or stop and ask for one.
+- Respect `Task.dependsOn`. Don't start work on a task whose
+  prerequisites haven't been delivered — surface the blocked state
+  back to the user instead.
+- Stamp `authorKind: "agent"` on tasks/PRs you create.
+- Don't auto-close work. A PR closes its parent task only after a
+  human reviewer approves and merges.
+## Discovering the current roadmap
+Roadmapper ships with `mcp/server.mjs`, a stdio MCP server. Read tools
+let you plan against real IDs; write tools let you act through the
+same surface when the operator has wired in a service-role key.
+**Read tools** (always available):
+- `get_roadmap_snapshot` — **start here**. One-call orient: returns
+  themes, active capabilities, and in-flight tasks for the workspace
+  in a single response. Saves you three round trips when you're just
+  trying to orient. Always live, never cached. Response includes the
+  resolved `workspaceId` you should pass back on any write call.
+- `list_themes` — get the theme catalogue.
+- `list_capabilities` — filter by `themeId` to scope your plan.
+  **Excludes delivered capabilities by default** — agents should
+  plan against work that's still in flight. Pass
+  `includeDelivered: true` only when you genuinely need the full
+  historical list (e.g. reviewing whether a closed bet should be
+  reopened, which is rare and should be a human call).
+- `list_tasks` — filter by `capabilityId` and/or `status` to see what
+  already exists before proposing duplicates.
+- `get_task` — full task detail, including `acceptance`, `dependsOn`,
+  and attached PRs.
+- `get_agents_md` — re-read this contract on demand.
+**Multi-workspace addressing.** A single MCP install can talk to any
+workspace the caller is authorized for. Every read and write tool
+accepts an optional `workspaceId` argument that overrides the env
+default. Two conventions for picking the right one:
+1. If you're working inside a repo with a `.roadmapper/snapshot.json`
+   file in it, that file names its own `workspaceId` — use that.
+2. Otherwise, call `get_roadmap_snapshot` (which uses the env-default
+   workspace) and use the `workspaceId` it returns on subsequent
+   write calls. This is mainly a "did the env get the right
+   workspace?" sanity check; under normal use the env-default is
+   what you want.
+**The rubric gate** — the server tracks whether you've called
+`get_agents_md` (or read `roadmapper://rubric`) this session. Until
+you do, every write tool below returns a structured error:
+```json
+{
+  "error": "prerequisite_missing",
+  "message": "Call get_agents_md first this session, then retry ...",
+  "fix": "get_agents_md()"
+}
+```
+Retry with the `fix` field's call. The gate is not optional —
+proposals filed without the rubric in scope won't round-trip into
+the product because the server's validators are tied to it.
+**Write tools** (require `SUPABASE_SERVICE_ROLE_KEY`; if the operator
+hasn't set it, these return an error result and you should fall back
+to "tell the human what I'd do"):
+- `propose_task` — create a new task under an **existing** capability.
+  The server auto-stamps `authorKind: "agent"`, `status: "planned"`,
+  and a 6-digit `TK-NNNNNN` id. Include `acceptance` and `dependsOn`
+  when you propose — don't make the human fill them in. **Always pass
+  `idempotencyKey`** (e.g. a hash of `capabilityId + title`, or your
+  session id + a per-task counter) so a retry after a crash or
+  transient error reuses the existing task instead of creating a
+  duplicate. Response includes `idempotent: true` when that happens.
+- `propose_capability` — create a new capability under an **existing**
+  theme. Required: `name`, `pillarId`. Sensible defaults are applied
+  (`reach: 100`, `impact: 1`, `confidence: 70`). Pass `outcome` and
+  `specRef` whenever you have them — capabilities without an outcome
+  rarely survive review. Pass `idempotencyKey`.
+- `propose_theme` — create a new theme. **Use sparingly.** Themes are
+  years-stable strategic categories; almost every plan fits under an
+  existing one. Only call this when the human explicitly asks for a
+  new theme or when the work genuinely doesn't fit any of the existing
+  ones from `list_themes`. Default rule: if you're tempted to create
+  a theme, file a capability under the closest existing theme instead
+  and let the human reorganize later.
+- `submit_acceptance_grades` — stamp `{ status: pass | fail, note? }`
+  per criterion index, after you've actually verified the work. The
+  server stamps `gradedAt` and `gradedBy: "mcp:agent"`.
+**Lifecycle tools** (archive, move, update — every call requires
+`reason`; the audit log records who/why so future readers know what
+happened):
+- `archive_task` / `archive_capability` / `archive_theme` — soft-delete
+  an entity. The row stays in the workspace; list views filter it out,
+  by-id lookups still resolve. Archive refuses if the entity has active
+  children (forces bottom-up: archive tasks before their capability,
+  capabilities before their theme).
+- `unarchive_task` / `unarchive_capability` / `unarchive_theme` —
+  reverse archive. The parent must be active. To unarchive while
+  reparenting, use `move_*` with a different active target — that path
+  unarchives in one step.
+- `move_task` / `move_capability` — re-parent under a different
+  capability/theme. IDs are stable across moves. Target parent must be
+  active. Moving to the current parent returns `{ idempotent: true }`.
+  Tasks move via `move_task`; capabilities move via `move_capability`.
+  Themes are roots and don't move.
+- `move_tasks` / `move_capabilities` — bulk variant, up to 100 items
+  per call. The server stamps one shared `batchId` into every audit
+  row so a reorg shows up as one logical event in history. Per-item
+  failures don't roll back earlier successes — the response includes
+  per-move ok/error so you can retry the failures.
+- `update_task` / `update_capability` / `update_theme` — patch fields
+  on an entity. The patch is a partial object; only the keys you
+  include are touched. Parent fields (`capabilityId`, `pillarId`) and
+  lifecycle flags (`archived`, `archivedAt`, `id`) are rejected — use
+  `move_*` / `archive_*` instead. The server diffs against the
+  projected current state; a patch that matches everything returns
+  `{ idempotent: true }` with no audit row.
+All lifecycle tools require `reason`. Idempotent retries (same state,
+same input) return `{ idempotent: true }` and emit no audit row, so
+crash-retry is safe.
+**Scope hints on tasks.** `propose_task` and `update_task` accept
+two optional advisory fields:
+- `expectedPRs` — max merged PRs you expect for this task.
+- `expectedScope` — cumulative LoC ceiling (additions + deletions)
+  across all PRs linked to the task.
+Both are unset by default. When the webhook links a PR that pushes
+the task past either ceiling, it writes a `scope_overrun` row to
+`audit_log` (action=`scope_overrun`, target_kind=`task`). The link
+itself never gets refused — these are observations, not gates. Use
+them to declare an envelope per task and watch which tasks blow
+through it.
+**Outcome readings.** Capabilities accumulate point-in-time
+measurements against their stated outcome:
+- `record_outcome_reading({ capabilityId, value, asOf, source,
+  note? })` — append a reading. The server takes a row lock so
+  concurrent writers (a human pasting Mixpanel + a script pushing
+  a warehouse query) both land instead of clobbering.
+- `list_stale_outcomes({ thresholdDays?: 14, includeArchived? })`
+  — read tool. Returns capabilities with no reading in the
+  threshold window (or never measured), sorted never-measured
+  first then by most-stale. Use during quarterly review or to
+  spot bets that lost the empirical loop.
+**Workspace auto-default + cross-workspace guard.** If the cwd
+has a `.roadmapper/snapshot.json` (committed by the snapshot
+cron into every connected repo), the MCP defaults to that
+workspace. Mutator calls with an explicit `workspaceId` that
+disagrees with the cwd snapshot are refused — set
+`ROADMAPPER_ALLOW_CROSS_WORKSPACE=1` in the env to override.
+Reads can target any workspace freely.
+Authoring discipline:
+- Read first (`list_themes`, `list_capabilities`, `list_tasks`) before
+  proposing anything, so you don't invent a new theme/capability that
+  duplicates an existing one.
+- Prefer the deepest existing parent. New task > new capability > new
+  theme. Climb the tree only when nothing at the lower level fits.
+- The three `propose_*` tools support `idempotencyKey`. Always pass one
+  so a retry collapses to the existing record. (`submit_acceptance_grades`
+  is naturally idempotent — re-stamping the same grade is a no-op.)
+All write tools route through Postgres RPCs (`public.propose_task`,
+`public.propose_capability`, `public.propose_theme`,
+`public.grade_acceptance`, `public.archive_entity`,
+`public.unarchive_entity`, `public.move_entity`, `public.update_entity`
+— defined across migrations 0006–0045). The read-modify-write happens
+inside one transaction with row-level locking, so concurrent agents
+writing to the same workspace queue cleanly — no last-write-wins
+clobber. Safe to call from parallel agent sessions.
+**MCP resources** (auto-pulled by clients that subscribe at
+connect; bypass the "agent forgot to fetch" failure entirely):
+- `roadmapper://rubric` — same content as `get_agents_md`. Reading
+  this resource counts toward the rubric gate, so a client that
+  auto-subscribes satisfies the prerequisite without an explicit
+  tool call.
+- `roadmapper://capabilities/active` — live snapshot of non-
+  delivered capabilities.
+- `roadmapper://tasks/open` — live snapshot of `in_progress` +
+  `planned` tasks.
+**MCP prompts** (slash-commands the user invokes to force a flow;
+bypass model judgment about when to apply the rubric):
+- `roadmapper:plan-feature <description>` — expands into "fetch
+  rubric → suggest_capability_for → propose tasks under it, or ask
+  before creating a new capability". Use when the user says "design
+  features for X" or "plan Y".
+- `roadmapper:close-task <task_id> [pr_url]` — expands into the
+  acceptance-grade + link-pr flow with the task already named.
+- `roadmapper:weekly-review` — structured walk through open tasks,
+  stale capabilities, and ungraded deliveries.
+**System reminders.** List-tool responses (`list_tasks`,
+`list_capabilities`, `get_roadmap_snapshot`, `suggest_capability_for`)
+attach a `_meta.roadmapper.reminder` field when the server detects
+a flow that needs attention — e.g. "3 delivered tasks have merged
+PRs without submitted acceptance grades; call submit_acceptance_grades
+for TK-X, TK-Y, TK-Z." Surface these to the user; they're the
+roadmap's way of asking for the next action.
+Three capability tiers, driven by which env vars the operator set:
+| Tier              | Required env                                                                                              | What you get                                                                  |
+|-------------------|-----------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------|
+| Seed-only         | none                                                                                                      | Read tools against the static `roadmap.json`.                                 |
+| Live read         | `SUPABASE_URL` + (`SUPABASE_PUBLISHABLE_KEY` or `SUPABASE_ANON_KEY`) + `SUPABASE_WORKSPACE_ID`             | Read tools merged with the workspace's edits.                                 |
+| Live read + write | All of the above plus `SUPABASE_SERVICE_ROLE_KEY` (and migrations 0005–0045 applied)                      | Read tools + propose_* / grade / archive_* / unarchive_* / move_* / update_* tools. |
+If a write tool returns an error result mentioning `SUPABASE_SERVICE_ROLE_KEY`,
+the operator is on the live-read tier; don't keep retrying — fall
+back to telling the human what you'd do and let them apply it.
+Sanity check the install with `node mcp/server.mjs --selftest` — runs
+every tool against the local seed and prints a pass/fail summary.
+If the operator chose project-scoped install (`.mcp.json` in the
+roadmapper repo root, which is the default `npm run mcp:setup` path),
+the MCP only loads when their client is launched from that repo. If
+you're an agent running in another codebase and the `roadmapper`
+tools aren't visible, ask the operator to either (a) merge the
+`mcpServers.roadmapper` block from `roadmap/.mcp.json` into their
+user-level client config or (b) point you at the roadmapper repo so
+you can run there instead.
+Wire-up (Claude Code, Claude Desktop, or any MCP client):
+```jsonc
+{
+  "mcpServers": {
+    "roadmapper": {
+      "command": "node",
+      "args": ["/absolute/path/to/roadmap/mcp/server.mjs"],
+      "env": {
+        "SUPABASE_URL": "https://<id>.supabase.co",
+        "SUPABASE_PUBLISHABLE_KEY": "sb_publishable_...",
+        "SUPABASE_WORKSPACE_ID": "<workspace id>",
+        "SUPABASE_SERVICE_ROLE_KEY": "<only if write tools wanted>"
+      }
+    }
+  }
+}
+```
+The `env` block is optional — drop it to serve the seed only. See
+[README.md](/README.md#mcp-server) for the config-file path per
+client and the full env-var matrix.
+## The snapshot file in connected repos
+Workspaces that have the GitHub App installed get a server-pushed
+snapshot file at `.roadmapper/snapshot.json` on a dedicated branch
+named `roadmapper-snapshot` in every connected repo. The cron
+refreshes it every 6 hours; commits are content-hash-deduped so
+the branch only moves when the workspace actually changed.
+**Why this matters for agents:** if you're working out of a checkout
+that has this branch, you can read the canonical model from the
+filesystem without an MCP round trip. The file contains:
+```json
+{
+  "workspaceId": "ws-abc",
+  "workspaceName": "Acme Roadmap",
+  "generatedAt": "2026-05-12T18:00:00Z",
+  "themes": [...],         // every theme in the workspace
+  "capabilities": [...],   // every capability
+  "tasks": [...]           // every task
+}
+```
+The snapshot is the **complete current state** of the workspace's
+entity tables, projected to camelCase. No seed-merge is required
+on the consumer side — the snapshot already contains the bundled
+demo entities plus everything created by the user (post-Stage-3
+the seed is just the projection target for fresh workspaces, not
+a parallel data path).
+Soft-deleted workspaces stop receiving snapshot updates — the
+cron filters them out. If you're operating from a stale snapshot
+on a deleted workspace, your MCP writes will hit
+`assert_workspace_active` and error; treat that as the signal
+that the workspace is gone and stop pushing.
+The `workspaceId` field is what you pass into MCP write tools (via
+their optional `workspaceId` arg) so the write lands on the right
+workspace even when the MCP's env points elsewhere. Think of the
+snapshot file as the agent's "where am I?" pointer.
+## The mental model
+Five layers, don't conflate them:
+| Layer            | Owned by   | Stability     | Example                                    |
+|------------------|------------|---------------|--------------------------------------------|
+| **Theme**        | Leadership | Years         | "Data & Intelligence"                      |
+| **Capability**   | PM         | Quarters      | "Real-time segmentation"                   |
+| **Sprint**       | Eng lead   | 1–2 weeks     | "Sprint 127"                               |
+| **Task**         | IC / agent | Days–weeks    | "Implement segment recompute job"          |
+| **PR**           | IC / agent | Hours–days    | `TK-100201: Add NLP segment authoring`     |
+Plans live as Capabilities. Engineering output is Tasks + PRs. Each
+PR closes one or more Tasks. The **Strategic Outlook** view
+(`#outlook`) shows Capabilities on a 3-month to 3-year horizon for
+stakeholder/customer presentation; drag a bar between theme groups
+to rebind its `pillarId`.
+---
+## Planning a feature with an LLM
+When a chat session asks you to "plan out feature X" before any
+code is written, your job is to emit a structure that maps 1:1 onto
+the Theme → Capability → Task hierarchy below. Don't return a generic
+PRD or bullet list — the user will copy your output straight into
+Roadmapper, so it has to be in the shape Roadmapper consumes.
+### Decision tree
+Before writing anything:
+1. **Find the home Theme.** Call `suggest_theme_for({ description })`
+   with a one-line description of the work — the server ranks
+   existing themes by token overlap and returns a fit signal.
+   - **Top score > 0.4**: existing theme fits, use it. Never call
+     `propose_theme` after seeing a strong match.
+   - **Top score 0.2–0.4**: weak overlap. The top match is still
+     usually the right home; re-using a "close-enough" theme is
+     almost always better than creating a duplicate.
+   - **Empty matches or top < 0.2**: no existing theme fits. Stop
+     and ask the user explicitly whether this represents a new
+     strategic direction worth a years-stable theme. Only call
+     `propose_theme` after the user confirms — themes are
+     years-stable, not per-feature.
+   The propose_theme tool enforces this discovery: skipping
+   `suggest_theme_for` (or `list_themes` / `get_roadmap_snapshot`)
+   returns a `discovery_missing` error.
+2. **Decompose into Capabilities.** A Capability is "a quarterly
+   bet that ships value to the customer." Most features are one
+   Capability. A complex feature (multiple value streams, multiple
+   stakeholders, three+ months of work) splits into 2–3 Capabilities.
+   If you're tempted to write 5+, you're describing a Theme — go
+   back to step 1.
+3. **Decompose each Capability into Tasks.** A Task is "days-to-weeks
+   of work that one IC can finish and PR." Aim for 3–8 Tasks per
+   Capability. Fewer means the Capability is too small (combine
+   with a sibling). More means the Tasks are too granular (merge
+   them — Roadmapper isn't a checklist app).
+4. **Stamp every record with the required fields below.** Don't
+   skip `outcome`, `acceptance`, `effort`, or `priority` — those
+   are what make the plan actionable instead of aspirational.
+### What to emit (template)
+Return a single JSON block. The user will paste it into Roadmapper's
+import or read it manually; either way the field names must match
+exactly. IDs use the `__NEW__` placeholder prefix when you're
+proposing a new record — Roadmapper assigns the real `TH-NNNNNN` /
+`CAP-NNN` / `TK-NNNNNN` ID at import time.
+```jsonc
+{
+  "themes": [
+    // Only include a theme when proposing a NEW one. Otherwise
+    // reference an existing theme by its real id below.
+    {
+      "id": "__NEW__data-intelligence",
+      "name": "Data & Intelligence",
+      "description": "Make customer data instantly queryable, cross-product.",
+      "color": "#2563eb",
+      "targetRoi": 12   // $M annual; optional, exec rollup signal
+    }
+  ],
+  "capabilities": [
+    {
+      "id": "__NEW__realtime-segmentation",
+      "name": "Real-time segmentation",
+      "pillarId": "__NEW__data-intelligence",  // or an existing TH- id
+      "description": "Recompute audience membership within 60s of an event.",
+      "outcome": "Marketing operators ship campaigns from fresh data instead of yesterday's snapshot. Closes the #1 ICP request.",
+      "specRef": "https://notion.so/.../realtime-seg-rfc",  // strongly recommended
+      "reach": 120,    // distinct customers/qtr
+      "impact": 2,     // 3 massive | 2 high | 1 medium | 0.5 low | 0.25 minimal
+      "confidence": 75 // 0–100
+    }
+  ],
+  "tasks": [
+    {
+      "id": "__NEW__segment-recompute-job",
+      "capabilityId": "__NEW__realtime-segmentation",
+      "title": "Implement segment recompute job",
+      "summary": "Stream-process events and rewrite audience membership in <60s.",
+      "kind": "feature",     // feature | bug | chore | spike
+      "authorKind": "agent",
+      "effort": "L",         // XS=3d S=8d M=18d L=35d XL=65d
+      "priority": "P1",      // P0 critical | P1 high | P2 medium | P3 low
+      "start": "2026-06-01",
+      "target": "2026-07-15",
+      "acceptance": [
+        "Latency from event ingest to membership update is <60s p95.",
+        "Backfill of last 30 days completes without missed events.",
+        "Recompute job emits CloudWatch metrics for lag and throughput."
+      ],
+      "dependsOn": []        // other __NEW__ task ids you propose
+    },
+    // ...3–8 tasks per capability
+  ]
+}
+```
+### Field quality bar
+Things you should *not* skip or hand-wave:
+- **`outcome`** — describe the customer-visible change in plain
+  English, ~2 sentences. "Adds X feature" is not an outcome;
+  "Marketing operators stop asking us for X manually" is.
+- **`specRef`** — link to a real doc (Notion, Figma, Linear, GitHub
+  RFC, etc.). If one doesn't exist yet, propose its title in the
+  description and tag the first Task `kind: "spike"` to write it.
+- **`acceptance`** — each entry is a single checkable assertion.
+  An agent will grade itself against this list before opening a
+  PR. "Works correctly" is not acceptance; "Returns 401 when token
+  is missing" is.
+- **`effort`** — pick the size that matches the listed day budget.
+  Don't compress 6 weeks of work into M because the user wants
+  it sooner.
+- **`priority`** — P0 is reserved for revenue / SLA / security
+  fires. Most planned work is P1 or P2.
+### Common shape of your reply
+A typical planning response is:
+```
+1. Three sentences explaining what you understood about the feature.
+2. The JSON block above, fully filled in.
+3. Any open questions you want answered before the user imports it.
+```
+Don't pad with status updates, project-management theatre, or a
+"let's break this down" preamble. Roadmapper already provides the
+breakdown; you provide the data.
+---
+## Required of every agent task
+```jsonc
+{
+  "id": "TK-100201",
+  "capabilityId": "CAP-018",
+  "title": "Audit log S3 export job",
+  "summary": "Stream the audit log to S3 nightly, partitioned by day.",
+  "kind": "feature",                // feature | bug | chore | spike
+  "authorKind": "agent",            // human | agent
+  "effort": "M",                    // XS | S | M | L | XL
+  "priority": "P1",                 // P0 | P1 | P2 | P3
+  "start": "2026-06-01",
+  "target": "2026-07-15",
+  "owner": "Marcus Lee",
+  "ownerUserId": "...",             // auth.users.id when known
+  "acceptance": [                   // checkable assertions
+    "Nightly job runs at 02:00 UTC and writes to s3://audit/YYYY/MM/DD/.",
+    "Failures page on-call via PagerDuty.",
+    "Last 30 days are queryable via Snowflake external table."
+  ],
+  "dependsOn": ["TK-100199"]        // optional
+}
+```
+- `acceptance` is **required for any task an agent will pick up**. An
+  empty list is a stop signal: either the task is too vague to start
+  (`kind: "spike"` it and produce a spec), or someone needs to fill
+  it in.
+- `acceptanceGrades[i]` is the persistent self-grade for criterion
+  `i`. Write `{ status: "pass" | "fail", note? }` after you've
+  actually verified the assertion. The task surfaces a "Self-graded
+  ready" badge once every entry passes; reviewers should not need to
+  ask "did you check this?"
+- `dependsOn` is honored by sprint composition, burndown, and the
+  "Blocked" indicator in task lists. Don't invent it just because
+  tasks happen in order; only set it when one truly blocks another.
+- `kind` decides whether the task counts toward RICE/forecasting.
+  Bugs and chores don't pollute capability-level estimates.
+### Required capability fields
+```jsonc
+{
+  "id": "CAP-018",
+  "name": "Audit log export",
+  "pillarId": "TH-PL",
+  "description": "Export audit stream to S3/Snowflake.",
+  "outcome": "Compliance officers stop asking for ad-hoc dumps. SOC2 evidence in one click.",
+  "specRef": "https://notion.so/.../audit-log-rfc",   // required before agent decomposition
+  "reach": 80, "impact": 2, "confidence": 85,
+  "outcomeStatus": "pending"                          // pending | achieved | missed
+}
+```
+- `specRef` is **required before an agent decomposes a Capability
+  into Tasks**. Without it, the agent has no contract for what's in
+  scope. Stop and ask.
+- `outcomeStatus` is reviewed quarterly. "Delivered" doesn't mean
+  "outcome achieved"; the field exists to keep the distinction honest.
+---
+## Branch & commit conventions
+Agent-authored work uses idempotent, attributable identifiers. Branch
+naming, commit prefixing, and PR title are all derived from the same
+task ID so re-syncs don't double-apply and review tooling can group
+agent work.
+- **Branch:** `agent/TK-NNNNNN-slug`
+  - `agent/` prefix tells branch protection and reviewers this needs
+    extra scrutiny.
+  - `TK-NNNNNN` is the task ID (6-digit numeric).
+  - `slug` is a short kebab-case description (≤ 6 words).
+  - Example: `agent/TK-100201-nlp-segment-authoring`.
+- **Commit messages:** `TK-NNNNNN: <imperative summary>`
+  - Every commit on the branch carries the prefix.
+  - Body explains *why* (not what — the diff has that).
+- **PR title:** `TK-NNNNNN: <task title>` (auto-derived).
+- **PR body:** see "PR body template" below. The body's `Closes
+  TK-NNNNNN` line is what links the PR to the task at sync time.
+Multi-task work: prefix the most-load-bearing task ID, list the
+others in the PR body's `Closes …` lines. The Settings → GitHub
+unmatched picker can also attach one PR to multiple tasks
+post-hoc.
+---
+## PR body template
+```markdown
+## Summary
+What changed in one paragraph.
+## Customer impact
+Plain-English statement. If this is the last PR for the capability,
+this becomes the capability outcome (Roadmapper auto-promotes when the
+capability's `outcome` is empty — never overwrites).
+## Acceptance check
+- [x] <criterion 1 from the task>
+- [x] <criterion 2 from the task>
+- [ ] <criterion 3 — not done; explain or split>
+## Test plan
+- [ ] How a reviewer should validate this locally / in CI.
+Closes TK-100201
+Roadmapper-Task: TK-100201
+Roadmapper-Capability: CAP-9F2C7E
+```
+### Roadmapper trailers — the canonical-model loop
+The two lines at the bottom (`Roadmapper-Task:` and `Roadmapper-Capability:`)
+are **trailers** the webhook reads when GitHub fires `pull_request`
+events. They close a loop that's otherwise impossible to close
+server-side: only the agent doing the work knows which capability the
+PR actually advances.
+**Before opening a PR**, an agent must:
+1. Call `list_capabilities` (and `list_themes` if needed) over MCP
+   and read the **entire list** of *active* capabilities — names,
+   outcomes, descriptions. A one-shot text match against a single
+   candidate is not enough; the right capability is usually the one
+   whose outcome most closely maps to what your PR achieves, not
+   the one with the most keyword overlap. Use `list_capabilities`
+   with a `themeId` filter when you already know the theme to keep
+   the list short. Delivered capabilities are excluded by default —
+   they're closed bets, not landing pads for new work.
+2. Pick the **single best-fit** capability id, or call
+   `propose_capability` if and only if nothing genuinely fits.
+   Default rule still applies: new task > new capability > new
+   theme. A capability that already exists is almost always better
+   than a new one.
+3. Stamp both trailers (Task + Capability) into the PR body, exactly
+   as shown above — one per line, at the bottom of the body, no
+   trailing punctuation.
+The webhook will:
+- Honor `Roadmapper-Task: TK-NNNNNN` as authoritative when the id
+  exists in the workspace, even if the PR title doesn't say `TK-…:`.
+- Pass `Roadmapper-Capability: CAP-XXXXXX` to `create_task_from_pr`
+  when no matching task exists, so brand-new tasks land already
+  associated with the right capability instead of dumped in
+  Uncategorized.
+- Fall back to (Jaccard match against user-created capabilities) →
+  (null capability) when no trailer is present.
+The PR-to-task matcher accepts (in order):
+1. **Roadmapper-Task trailer** — `Roadmapper-Task: TK-NNNNNN` in the
+   PR body. Authoritative when the id is known to the workspace.
+2. **Already-attached** — if the PR is already on a task's `prs[]`,
+   it routes back to that task. Manual mappings stick across re-syncs.
+3. **ID reference** — title prefix `XX-N: …`, body `Closes/Fixes/Refs/
+   Resolves XX-N`, or any bare `XX-N` matching a known task ID.
+4. **AI map** — Settings → GitHub → "AI map" sends the visible /
+   selected unmatched PRs to a Vercel edge function backed by Claude
+   Haiku 4.5 for semantic capability routing. PR rows show an "AI"
+   chip when the suggestion came from the LLM. Requires
+   `ANTHROPIC_API_KEY` set on the Vercel project; without it the
+   Jaccard fallback continues to work and the UI surfaces a one-line
+   "not configured" notice.
+5. **Jaccard fallback** — title/body tokens against capability
+   tokens, threshold 0.10. Always available; used when nothing
+   higher-precedence matched. The webhook also runs this fallback
+   server-side against user-created capabilities when picking a
+   capability hint for an auto-created task.
+If you want your PR to land on the right task without anyone
+clicking the unmatched picker, do (1) by stamping the
+`Roadmapper-Task:` trailer, (2) by branching from
+`agent/TK-NNNNNN-slug`, or (3) by including `Closes TK-NNNNNN` in
+the body.
+---
+## How merged PRs become "delivered"
+`status="delivered"` auto-stamps `delivered = today` and `progress =
+100`. With the GitHub sync's `Auto-mark delivered` flag on, a merged
+PR flips its parent task to `delivered` using the merge timestamp.
+Caveats:
+- Merged ≠ deployed ≠ outcome achieved. The capability's
+  `outcomeStatus` exists so quarterly reviews can say so.
+- An agent **must not** flip a task to `delivered` on its own; that's
+  driven by a human-merged PR, not by the agent's self-grade.
+- `originalTarget` is captured at first plan and never auto-changes;
+  on-time rate compares `delivered` to `originalTarget`.
+### Outcome attainment
+Each capability carries an `outcomeStatus` (`pending` / `achieved` /
+`missed`) that the human stamps from the capability detail page once
+delivery is done — independent of whether the code shipped on time.
+A separate target-hit / target-missed chip is auto-derived from
+`delivered ≤ target` and needs no manual stamp. Agents that propose
+new capabilities should set `outcomeStatus: "pending"` and leave
+`outcomeCheckedAt` unset — the human owns those fields.
+---
+## RICE scoring
+```
+score = (Reach × Impact × Confidence/100) ÷ Effort-days
+```
+| Field      | Meaning                                            | Range / values        |
+|------------|----------------------------------------------------|-----------------------|
+| Reach      | Distinct customers/users touched per quarter, post-launch | Integer        |
+| Impact     | How big a change                                   | 3 · 2 · 1 · 0.5 · 0.25 |
+| Confidence | How sure you are this will work                    | 0–95                  |
+| Effort     | Auto-summed from tasks (XS=2h, S=4h, M=1d, L=3d, XL=8d) | Days               |
+### Impact levels (concrete)
+- **3 — Massive.** Unlocks a new revenue line, removes the single biggest blocker in your pipeline, or shifts a north-star metric.
+- **2 — High.** Removes a top-5 friction in the existing flow, or moves a secondary metric measurably.
+- **1 — Medium.** Solid improvement that customers will notice but won't move headline numbers.
+- **0.5 — Low.** Quality-of-life polish.
+- **0.25 — Minimal.** Cosmetic or internal-only.
+### Confidence levels (concrete)
+- **>90 only if the work is already shipped or running behind a flag.** Anything earlier is a guess.
+- **70–90** — comparable to things you've shipped before.
+- **40–70** — plausible but not validated.
+- **<40** — propose as `kind: "spike"`, not a feature capability.
+- **100 is never correct.** The server caps `propose_capability` at 95 to make this stick.
+### Outcome statements (required, falsifiable)
+Every capability needs an outcome that can be checked against reality. Template:
+> `<metric>` moves from `<baseline>` to `<target>` by `<date>`, measured by `<source>`.
+**Good** — `CAP-XXXX "SDK primitives for landing pages"`
+> Pages-published-per-BU moves from 0 to median 3/quarter by 2026-Q3, measured by `lp_publish_event`.
+**Good** — `CAP-XXXX "Onboarding completion lift"`
+> Activation rate moves from 32% to 55% by 2026-09-30, measured by the `activated_user` warehouse event.
+**Weak** — "Improve builder UX." Fails: no metric, no baseline, no date. Rewrite or drop.
+**Weak** — "Better email deliverability." Fails: same. Rewrite as "Email deliverability moves from 87% to 95% by 2026-Q4, measured by Postmark stats."
+The MCP server's `propose_capability` tool rejects outcomes that don't contain both a number and a temporal anchor (year, quarter, month name, or "by …"). Use `dryRun: true` to validate before committing.
+### When NOT to create a capability
+- **One-off bug fix** → task under the existing capability it affects.
+- **Infra work under an existing bet** → task on that bet, not a new bet.
+- **Exploration without a commitment** → don't propose yet; gather evidence, then come back.
+- **Renaming or refactoring** → task, not capability.
+- **Anything that fits in one PR** → task.
+If you find yourself emitting 5+ tasks under a capability you just proposed, the capability is too big — split it. If you can't write a falsifiable outcome, the capability isn't real yet — go back one step.
+### ROI vs theme
+A capability's `roi` should clear ~70% of the parent theme's `targetRoi`. Below that, the server flags a warning on `propose_capability` — not a rejection, but the gap should be justified in the `outcome` (e.g., "low-ROI but unblocks the $X bet downstream"). Don't pad ROI to clear the bar; pad outcomes if anything.
+Default sort everywhere is RICE desc. Tasks with `kind: "bug"` or `kind: "chore"` are excluded from capability-level effort rollups so they don't crater the score.
+---
+## Sprints
+A `Sprint` references tasks by ID — picking a task into a sprint
+doesn't change the task itself, just adds it to `Sprint.taskIds`. The
+same task can carry across consecutive sprints; the burndown treats
+it as remaining effort until delivered.
+```jsonc
+{
+  "id": "SP-127",
+  "name": "Sprint 127",
+  "goal": "Push AI segmentation past 60% and lock audit log MVP",
+  "startDate": "2026-04-27",
+  "endDate": "2026-05-11",
+  "taskIds": ["TK-100201", "TK-100204", "TK-100203"],
+  "defaultCapacityDays": 8,                       // optional
+  "ownerCapacity": { "Marcus Lee": 5 }            // optional per-owner override
+}
+```
+### Capacity
+Sprints carry an optional per-sprint capacity model:
+- `defaultCapacityDays` — the day budget every owner gets unless
+  overridden.
+- `ownerCapacity[owner]` — an override for one person (PTO, partial
+  allocation, etc.).
+The sprint detail page sums `effortDays(task)` per owner and flags
+anyone whose load exceeds their cap. Agents proposing a sprint plan
+should keep each owner within their capacity rather than just
+maximizing throughput.
+---
+## ID conventions
+- `TH-NNNNNN` — theme (legacy `TH-XX` short-form still works)
+- `CAP-NNN` — capability (3-digit, sequential)
+- `SP-NNN` — sprint (3-digit, sequential)
+- `TK-NNNNNN` — task (6-digit, randomized; offline-edit-safe)
+The PR-to-task matcher accepts any `[A-Z]{2,6}-\d+` pattern but only
+emits matches for IDs that exist in the workspace, so unrelated `JIRA-…`
+mentions won't latch on.
+---
+## Don'ts
+- Don't create one capability per PR — capabilities are quarterly bets.
+- Don't leave `outcome` blank, especially on agent-authored work.
+- Don't manipulate RICE inputs to game the score.
+- Don't edit theme IDs after creation. Names are renameable.
+- Don't update `roadmap.json` in random PRs unless the PR is
+  explicitly changing the plan.
+- Don't open a PR from a non-`agent/` branch and claim it's
+  agent-authored.
+- Don't self-promote a task to `delivered`. Wait for the merged PR.
+---
+## GitHub integration (overview)
+Two paths, agent's responsibility is identical for both:
+**GitHub App (canonical).** A single App is registered once
+(`/api/github-app-manifest`), shared across all workspaces. Each
+workspace installs it on its own repos. PRs arrive over webhooks
+to `/api/github-webhook` and route to the right workspace by
+`installation_id`. Matched PRs link to existing tasks via `link_pr`;
+unmatched PRs are either auto-created as new uncategorized tasks
+(`capabilityId: null`) or queued in `edits.unmatchedPrs` for review
+— gated by the workspace's `pr_unmatched_behavior` setting.
+**PAT/OAuth (fallback).** Same matcher rules, but the user clicks
+Fetch & Preview in Settings → GitHub sync to pull. Cached in
+localStorage per-workspace.
+Agent responsibility: author PRs the matcher can find. The matcher
+looks (in order):
+1. `TK-XXX:` prefix in the PR title.
+2. `Closes | Fixes | Resolves | Refs TK-XXX` in the body.
+3. `TK-XXX` prefix in the branch name (e.g. `vsgro/TK-100201-foo`).
+4. Any bare `TK-XXX` in title or body that matches a known task id.
+Stick `TK-XXX:` at the start of the title and you're done.
+For implementation detail see
+[`api/github-webhook.ts`](https://github.com/vsgro/roadmapper/blob/main/api/github-webhook.ts),
+[`src/lib/githubSync.ts`](https://github.com/vsgro/roadmapper/blob/main/src/lib/githubSync.ts), and the GitHub section in
+[`src/components/SettingsPage.tsx`](https://github.com/vsgro/roadmapper/blob/main/src/components/SettingsPage.tsx).
+---
+## File map
+- [`src/types.ts`](https://github.com/vsgro/roadmapper/blob/main/src/types.ts) — schema (TypeScript types are the spec)
+- [`src/data/roadmap.json`](https://github.com/vsgro/roadmapper/blob/main/src/data/roadmap.json) — example data
+- [`src/lib/store.ts`](https://github.com/vsgro/roadmapper/blob/main/src/lib/store.ts) — `useRoadmap()` hook with edits / patches / cloning
+- [`src/lib/githubSync.ts`](https://github.com/vsgro/roadmapper/blob/main/src/lib/githubSync.ts) — fetch + match + cache helpers
+- [`src/lib/textMatch.ts`](https://github.com/vsgro/roadmapper/blob/main/src/lib/textMatch.ts) — shared tokenize + Jaccard
+- [`src/lib/profiles.ts`](https://github.com/vsgro/roadmapper/blob/main/src/lib/profiles.ts) — Supabase user_profiles lookup + cache
+- [`src/lib/users.ts`](https://github.com/vsgro/roadmapper/blob/main/src/lib/users.ts) — known-user pool + owner/authorship defaults
+- [`src/lib/util.ts`](https://github.com/vsgro/roadmapper/blob/main/src/lib/util.ts) — RICE / capability / KPI helpers
+- [`src/components/CommandPalette.tsx`](https://github.com/vsgro/roadmapper/blob/main/src/components/CommandPalette.tsx) — global Cmd-K palette
+- [`src/components/OutlookPage.tsx`](https://github.com/vsgro/roadmapper/blob/main/src/components/OutlookPage.tsx) — Strategic Outlook view
+- [`src/components/SprintsPage.tsx`](https://github.com/vsgro/roadmapper/blob/main/src/components/SprintsPage.tsx) — kanban + capacity panel
+- [`src/components/SettingsPage.tsx`](https://github.com/vsgro/roadmapper/blob/main/src/components/SettingsPage.tsx) — GitHub sync flow + Profile + auth
+- [`api/classify-prs.ts`](https://github.com/vsgro/roadmapper/blob/main/api/classify-prs.ts) — Vercel edge function: LLM PR → capability classifier
+- [`mcp/server.mjs`](https://github.com/vsgro/roadmapper/blob/main/mcp/server.mjs) — stdio MCP server (read-only roadmap view)
+- [`src/adapters/`](https://github.com/vsgro/roadmapper/blob/main/src/adapters/) — GitHub / Linear / Productboard / Jira stubs
+- [`src/adapters/README.md`](https://github.com/vsgro/roadmapper/blob/main/src/adapters/README.md) — full mapping table
+- [`docs/llm-pr-mapping-plan.md`](https://github.com/vsgro/roadmapper/blob/main/docs/llm-pr-mapping-plan.md) — planned LLM-based PR↔task matcher
+## LocalStorage keys
+- `roadmap.edits.v3` — local offline buffer of in-app edits
+- `roadmap.activeWorkspace.v1` — the workspace id the user last
+  switched to (per-tab; AuthGate re-pins to a valid workspace on
+  sign-in if this id isn't in the user's accessible list)
+- `roadmap.kpiPeriod.v2` — dashboard KPI period
+- `roadmap.github.config.<workspaceId>.v1` — PAT, selected repos,
+  sync flags, history. **Per-workspace.** Legacy unscoped
+  `roadmap.github.config.v1` is auto-migrated into the active
+  workspace's slot on first read.
+- `roadmap.github.lastFetch.<workspaceId>.v1` — cached raw PRs +
+  contributors from last sync. Per-workspace, same migration.
+- `roadmap.brand.v1` — display-name override + uploaded logo data URL
+- `roadmap.outlook.horizon.v1` — Strategic Outlook horizon selection
+- `roadmap.githubProviderToken.v1` — OAuth provider token captured
+  from a GitHub sign-in (used as the GitHub API credential when no
+  PAT is set)