npm - @cyanheads/mcp-ts-core - Versions diffs - 0.9.9 → 0.9.11 - Mend

@cyanheads/mcp-ts-core 0.9.9 → 0.9.11

Files changed (41)

package/CLAUDE.md +3 -2
package/README.md +4 -4
package/biome.json +1 -1
package/changelog/0.9.x/0.9.10.md +37 -0
package/changelog/0.9.x/0.9.11.md +38 -0
package/changelog/template.md +8 -0
package/dist/cli/init.js +16 -3
package/dist/cli/init.js.map +1 -1
package/dist/logs/combined.log +4 -0
package/dist/logs/error.log +2 -0
package/dist/logs/interactions.log +0 -0
package/dist/mcp-server/transports/http/httpErrorHandler.d.ts.map +1 -1
package/dist/mcp-server/transports/http/httpErrorHandler.js +27 -6
package/dist/mcp-server/transports/http/httpErrorHandler.js.map +1 -1
package/dist/mcp-server/transports/http/httpTransport.d.ts.map +1 -1
package/dist/mcp-server/transports/http/httpTransport.js +29 -4
package/dist/mcp-server/transports/http/httpTransport.js.map +1 -1
package/package.json +9 -9
package/skills/code-simplifier/SKILL.md +130 -0
package/skills/design-mcp-server/SKILL.md +72 -2
package/skills/git-wrapup/SKILL.md +2 -3
package/skills/maintenance/SKILL.md +2 -1
package/skills/orchestrations/SKILL.md +217 -0
package/skills/orchestrations/workflows/field-test-fix.md +206 -0
package/skills/orchestrations/workflows/fix-wrapup-release.md +175 -0
package/skills/orchestrations/workflows/greenfield-build.md +143 -0
package/skills/orchestrations/workflows/maintenance-release.md +171 -0
package/skills/polish-docs-meta/SKILL.md +27 -6
package/skills/release-and-publish/SKILL.md +5 -4
package/skills/report-issue-framework/SKILL.md +4 -1
package/templates/.claude-plugin/plugin.json +20 -0
package/templates/.codex-plugin/mcp.json +9 -0
package/templates/.codex-plugin/plugin.json +25 -0
package/templates/AGENTS.md +11 -0
package/templates/CLAUDE.md +12 -1
package/templates/changelog/template.md +8 -0
package/skills/multi-server-orchestration/SKILL.md +0 -137
package/skills/multi-server-orchestration/references/greenfield-buildout.md +0 -246
package/skills/multi-server-orchestration/references/maintenance-pass.md +0 -148
package/skills/multi-server-orchestration/references/release-and-publish-pass.md +0 -184
package/skills/multi-server-orchestration/references/wrapup-pass.md +0 -150

package/skills/code-simplifier/SKILL.md ADDED

@@ -0,0 +1,130 @@
+---
+name: code-simplifier
+description: >
+  Post-session code review and cleanup against a working tree of changes. Analyzes `git diff` to simplify, consolidate, and align changed code with the existing codebase — modernize syntax, remove unnecessary complexity, consolidate duplicated logic, catch efficiency issues. Use after a substantive working session, or when asked to clean up, simplify, reduce slop, consolidate, modernize, tighten up, or de-slop code. For `@cyanheads/mcp-ts-core` projects, includes specific transformations for tool/resource/prompt definitions, the ctx pattern, error factories, and framework idioms.
+metadata:
+  author: cyanheads
+  version: "1.0"
+  audience: external
+  type: workflow
+---
+# Code Simplifier
+Post-session cleanup pass. Reviews what changed, understands how it fits the existing codebase, and makes targeted improvements — modernizing syntax, removing unnecessary complexity, consolidating duplicated logic, catching efficiency issues. Prioritizes codebase cohesion over local perfection.
+## Core philosophy
+**Every change must earn its keep.** A simplification that doesn't meaningfully improve clarity, correctness, or cohesion is noise. Don't refactor for refactoring's sake. Don't create new files, abstractions, or utilities unless they solve a demonstrated problem. If the existing code works and is readable, leave it alone. The goal is a cohesive codebase, not a pristine one.
+## Procedure
+### Phase 1: Identify changes
+Run `git diff` (or `git diff HEAD` if changes are staged) to see what changed. If there are no git changes, review the most recently modified files from the current session.
+### Phase 2: Understand the surrounding codebase
+Don't review changes in isolation. Before any modifications:
+1. **Read the full files** containing changes — not just the diff hunks. Understand imports, surrounding logic, module structure.
+2. **Identify the project language(s)** and select the relevant transformation rules. Discard inapplicable rules.
+3. **Survey adjacent code** — shared utilities, sibling modules, common patterns. You need to know what already exists before deciding something is missing. For mcp-ts-core projects, check `src/utils/` for project utilities, `src/errors/` for error handling, and `node_modules/@cyanheads/mcp-ts-core/` for framework exports.
+### Phase 3: Review
+Evaluate the changes across these dimensions. Not every dimension applies to every diff — skip what's irrelevant.
+#### Codebase cohesion
+- **Reuse** — Search for existing utilities, helpers, and patterns that could replace newly-written code. For mcp-ts-core projects, prefer `import from '@cyanheads/mcp-ts-core/utils'` over hand-rolled equivalents — pagination helpers, schema builders, retry primitives, and OTel attribute constants are framework-provided.
+- **Consolidation** — Flag copy-paste-with-variation: near-duplicate code blocks that should be unified. Only unify if the shared abstraction is genuinely simpler than the duplicated code.
+- **Consistency** — Check that new code follows the same patterns as the rest of the codebase: naming conventions, error handling style, import patterns, type annotation style. Normalize toward the better variant when the project is inconsistent.
+- **Stringly-typed code** — Flag raw strings where constants, string-union types, branded types, or framework attribute constants already exist. For mcp-ts-core projects, the `ATTR_*` constants in `@cyanheads/mcp-ts-core/utils` should replace raw OTel attribute keys.
+#### Code quality
+- **Redundant state** — State that duplicates existing state, cached values that could be derived.
+- **Unnecessary complexity** — Deep nesting that could be guard clauses, premature abstractions, over-engineered solutions to simple problems.
+- **Dead code** — Unreachable branches, unused variables, commented-out code, exports that nothing imports.
+- **Defensive code for impossible states** — Guards for cases the type system or framework already prevents. Drop them.
+- **Outdated patterns** — Verbose or legacy syntax where modern equivalents exist. See the transformation tables below.
+#### Efficiency
+- **Redundant work** — Repeated computations, duplicate file reads, duplicate network/API calls, N+1 query patterns.
+- **Missed concurrency** — Independent async operations run sequentially that could run in parallel with `Promise.all` / `Promise.allSettled`.
+- **No-op updates** — State/store updates inside loops or event handlers that fire unconditionally. Add change-detection so downstream consumers aren't notified when nothing changed.
+- **TOCTOU** — Pre-checking file/resource existence before operating on it. Operate directly and handle the error instead.
+- **Overly broad operations** — Reading entire files when only a portion is needed, loading all items when filtering for one.
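As a minimal sketch of the TOCTOU item above, assuming a Node.js runtime: rather than checking that a file exists and then reading it, the read is attempted directly and a missing file is handled through the `ENOENT` error code. `readTextOrDefault` and `hasErrorCode` are hypothetical names used only for illustration.

```typescript
import { readFileSync } from 'node:fs';

// Check whether an unknown error carries a Node-style `code` field with the given value.
function hasErrorCode(err: unknown, code: string): boolean {
  return typeof err === 'object' && err !== null && (err as { code?: unknown }).code === code;
}

// Hypothetical helper: read a UTF-8 file, or return `fallback` when the file is missing.
// There is no existence pre-check, so there is no window between check and use.
function readTextOrDefault(path: string, fallback: string): string {
  try {
    return readFileSync(path, 'utf8');
  } catch (err: unknown) {
    if (hasErrorCode(err, 'ENOENT')) {
      return fallback;
    }
    // Anything else (permissions, I/O failure) is not "missing" and should surface.
    throw err;
  }
}
```

Only the "file not found" case is mapped to the fallback; other errors are rethrown so they are not silently masked.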
+#### mcp-ts-core-specific
+- **Error throwing patterns** — Prefer framework error factories (`McpError`, `validationError`, `notFound`, `httpErrorFromResponse`) over raw `throw new Error()`. Tool handlers should throw — the framework catches, classifies, and instruments.
+- **Error codes** — `InvalidParams` only for malformed JSON-RPC params shape. `ValidationError` for domain validation. `NotFound` for missing entities. Don't conflate them.
+- **Ctx usage** — Use `ctx.log`, `ctx.state`, `ctx.elicit`, `ctx.sample` — don't reach for global loggers, request-scoped storage, or sampling APIs directly. The `ctx` pattern carries tenant scope and OTel context.
+- **Zod schemas** — Every tool input/output field needs `.describe()`. Zod 4 requires `z.record(z.string(), z.string())` not `z.record(z.string())`. Use `.optional()` rather than `.nullish()` unless null is semantically distinct from absent.
+- **Tool annotations** — `readOnlyHint`, `idempotentHint`, `openWorldHint` should reflect reality. A read-only tool with `readOnlyHint: false` gives clients the wrong picture.
+- **`exactOptionalPropertyTypes` boundaries** — If a downstream type insists on the field being present-or-not-present (not present-as-undefined), use a mapped widening type at the boundary. The pattern is documented in the framework.
+- **`format()` ↔ `structuredContent` parity** — Different MCP clients forward different surfaces. Tests should assert both surfaces carry equivalent data.
+### Phase 4: Apply transformations
+1. **Filter findings ruthlessly.** If a finding is a false positive or not worth the churn, skip it. Don't argue with yourself about borderline cases — move on.
+2. **Transform incrementally** — one category of change at a time (modernize syntax, then reduce nesting, then consolidate).
+3. **Verify equivalence** — all functionality, types, and public interfaces must remain unchanged.
+4. **Keep the diff minimal.** Only touch lines that have a real reason to change. Don't reformat untouched code, add comments to code you didn't modify, or "improve" things that are already fine.
+When done, briefly summarize what was fixed or confirm the code was already clean.
+## Common transformations
+The tables below cover TypeScript and Python. For other languages, apply analogous principles: prefer modern idioms, reduce nesting, eliminate dead code, follow project conventions.
+### TypeScript (modern ESM, TS 5.x+)
+| Before | After | Why |
+| --- | --- | --- |
+| `const x: Foo = { ... } as Foo` | `const x = { ... } satisfies Foo` | Type-checked without assertion |
+| `let resource = acquire(); try { ... } finally { release(resource) }` | `using resource = acquire()` | Explicit resource management (TS 5.2+); the resource must implement `Symbol.dispose` |
+| `if (x !== null && x !== undefined)` | `if (x != null)` | Idiomatic null/undefined check |
+| `arr.filter(x => x !== null) as T[]` | `arr.filter((x): x is T => x != null)` | Type-safe filtering, no cast |
+| `export { foo } from './foo/index.js'` | Direct imports at call sites | Avoid barrel re-exports inside the package; barrel exports are for public APIs only |
+| `async function f() { const a = await x(); const b = await y(); }` | `const [a, b] = await Promise.all([x(), y()])` | Parallel when independent |
+| `obj.x !== undefined ? obj.x : fallback` | `obj.x ?? fallback` | Nullish coalescing |
+| `if (a) { if (b) { if (c) { ... } } }` | Guard clauses with early returns | Reduce nesting |
+| `try { risky() } catch (e: any) { ... }` | `try { risky() } catch (e: unknown) { ... }` | Type-safe error handling |
+| `enum Status { A, B, C }` | `const Status = { A: 'A', B: 'B', C: 'C' } as const` | Prefer const objects over numeric enums; string enums are acceptable |
+| `function f(a: string, b: string, c: string, d?: string)` | `function f(opts: FnOptions)` | Options object when >3 params |
+| `throw new Error('Bad input')` (in a tool handler) | `throw validationError('Bad input', { field: 'x' })` | Use framework error factories so the framework can classify and instrument |
+| `const ATTR_KEY = 'mcp.tool.name'` | `import { ATTR_MCP_TOOL_NAME } from '@cyanheads/mcp-ts-core/utils'` | Use framework attribute constants |
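A small sketch that puts three rows of the table side by side: `satisfies`, a type-predicate filter, and nullish coalescing. `ServerConfig`, `compact`, and `portOrDefault` are hypothetical names, not framework exports. One caveat on the `??` row: `??` falls back on both `null` and `undefined`, while the `!== undefined` ternary lets `null` through, so the rewrite preserves behavior only when `null` is not a meaningful value.

```typescript
// Hypothetical config shape, used only for illustration.
interface ServerConfig {
  name: string;
  port?: number | null;
}

// `satisfies` checks the literal against ServerConfig without a type assertion.
const config = { name: 'demo', port: null } satisfies ServerConfig;

// Type-predicate filter: drops null/undefined and narrows the element type, no cast.
function compact<T>(items: Array<T | null | undefined>): T[] {
  return items.filter((x): x is T => x != null);
}

// `??` replaces null and undefined, but keeps falsy values such as 0.
function portOrDefault(cfg: ServerConfig, fallback = 3000): number {
  return cfg.port ?? fallback;
}

console.log(compact([1, null, 2, undefined]));         // [ 1, 2 ]
console.log(portOrDefault(config));                    // 3000
console.log(portOrDefault({ name: 'demo', port: 0 })); // 0
```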
+### Python (3.12+)
+| Before | After | Why |
+| --- | --- | --- |
+| `Optional[str]` | `str \| None` | Modern union syntax (3.10+) |
+| `List[str]`, `Dict[str, int]` | `list[str]`, `dict[str, int]` | Built-in generics (3.9+) |
+| `if x == 0: ... elif x == 1: ... elif x == 2: ...` | `match x: case 0: ... case 1: ...` | Structural pattern matching (3.10+) |
+| `class Config: def __init__(self, a, b, c): self.a = a ...` | `@dataclass class Config: a: str; b: int; c: float` | Less boilerplate, built-in eq/repr |
+| `results = []; for item in items: results.append(transform(item))` | `results = [transform(item) for item in items]` | Idiomatic comprehension |
+| `f = open('x'); try: ... finally: f.close()` | `with open('x') as f: ...` | Context manager for resources |
+| `line = f.readline(); while line: process(line); line = f.readline()` | `while (line := f.readline()): process(line)` | Walrus operator where it reduces duplication |
+| `"Hello " + name + "!"` | `f"Hello {name}!"` | f-string over concatenation |
+| `except Exception as e: pass` | `except SpecificError as e: log(e)` | Catch specific, never bare except/pass |
+| `from module import *` | `from module import specific_name` | Explicit imports only |
+| `ABC: TypeAlias = Union[A, B, C]` | `type ABC = A \| B \| C` | `type` statement (3.12+) |
+| Sequential `await` for independent I/O | `await asyncio.gather(a(), b())` | Parallel when independent |
+## When NOT to simplify
+Leave code alone when:
+- **It works and is readable.** "I would have written it differently" is not a reason to change it.
+- **The change is cosmetic.** Renaming a variable from `data` to `result` isn't worth the churn.
+- **Intentional verbosity for debugging.** Verbose code may exist to make stack traces or logging clearer.
+- **Performance-critical paths.** A less readable version may exist for measured performance reasons — check before simplifying.
+- **API compatibility.** Don't change public function signatures, export shapes, or return types that callers depend on. For mcp-ts-core projects, the public surface includes tool input/output schemas exposed via MCP — changing them is a breaking change to the server's MCP surface.
+- **Tests.** Don't DRY up test code aggressively — test readability and isolation matter more than deduplication.
+- **Type workarounds.** Sometimes an `as` cast or `# type: ignore` exists because of a genuine type system limitation — verify before removing.
+- **The abstraction isn't proven.** Don't create a shared utility for two similar blocks of code. Wait until there are three, and even then only if the abstraction is genuinely simpler than the duplication.

package/skills/design-mcp-server/SKILL.md CHANGED

@@ -4,7 +4,7 @@ description: >
   Design the tool surface, resources, and service layer for a new MCP server. Use when starting a new server, planning a major feature expansion, or when the user describes a domain/API they want to expose via MCP. Produces a design doc at docs/design.md that drives implementation.
 metadata:
   author: cyanheads
-  version: "2.11"
+  version: "2.12"
   audience: external
   type: workflow
 ---
@@ -29,6 +29,23 @@ Gather before designing. Ask the user if not obvious from context:
 If the domain has a public API, read its docs before designing. For internal-only servers, skip API research and go straight to user goals. Don't design from vibes either way.
+### Server scope and audience
+Before committing to a server boundary, answer: **what workflow does this server serve, and who is the audience?**
+The unit of a server is a *user workflow*, not an API. A single rich API can earn its own server when the audience is large and the API surface supports a full workflow (PubMed for literature research, SEC EDGAR for financial analysis, Shodan for internet-wide device intelligence). Multiple APIs should collapse into one server when they serve the same workflow from different angles — a "threat intelligence" server that aggregates VirusTotal, AbuseIPDB, and GreyNoise is more useful than three separate servers because the user's goal is "assess this indicator," not "query VirusTotal."
+**Don't default to one-API-one-server.** That's the right call when the API is deep enough and the audience is large enough, but it's not the starting point. The starting point is the workflow:
+| Signal | Server boundary |
+|:-------|:----------------|
+| Single API with rich surface, large audience | Standalone server named for the platform (`pubmed-mcp-server`, `secedgar-mcp-server`) |
+| Multiple APIs serving the same workflow | One server named for the workflow (`threat-intel-mcp-server`), APIs are internal sources |
+| Domain with distinct sub-audiences | Consider splitting — a pentester and a SOC analyst have different workflows even in the same domain |
+| Pure computation, no external deps | Standalone server named for the capability (`calculator-mcp-server`, `redteam-mcp-server`) |
+When multiple APIs collapse into one server, the tool surface is organized around what the user is doing, not which API gets called. The agent says "investigate this domain" and the server routes to the best available source internally. Individual APIs become service-layer implementation details, not tool-surface identities.
 ## Server Naming
 The server name (repo name, npm package, public identity) must communicate what it does at a glance. The test: can a human or agent scanning a server list tell what this server does from the name alone?
@@ -141,6 +158,56 @@ const findStudies = tool('clinicaltrials_find_studies', {
 > **Tip — mode consolidation.** When a tool has several related operations on the same noun, you can consolidate them under one tool with a `mode`/`operation` enum. This affects both naming (noun-led, e.g., `github_pull_request`) and handler design (dispatch by mode). Use when it tightens the surface; skip when ops diverge enough to warrant separate tools.
+#### Multi-source tools and fallback chains
+**Applies when:** a server aggregates multiple data sources for the same workflow, and the "best" source varies by input type, availability, or coverage. Skip for single-API servers.
+When a tool's goal can be served by multiple sources, design it as a **multi-source tool** — the agent calls one tool, the handler routes to the best source (or fans out to several) internally. This is the difference between a "PubMed wrapper" and a "literature research server": `pubmed_search_articles` tries PubMed first, falls back to EuropePMC for broader coverage, then Unpaywall for open access. The agent doesn't choose which API to hit — the server makes that decision based on what works.
+Two patterns:
+**Source fallback chains** — try sources in priority order, fall through on failure or empty results. Best when sources cover the same data with different depth or availability. The output should indicate which source provided the data so the agent (and human) can assess provenance.
+```ts
+// Handler pseudocode — not a real implementation
+async handler(input, ctx) {
+  // Primary: PubMed E-utilities (authoritative, best metadata)
+  const result = await pubmedService.search(input.query);
+  if (result.items.length > 0) return { ...result, source: 'pubmed' };
+  // Fallback: EuropePMC (broader coverage, includes preprints)
+  const epmcResult = await epmcService.search(input.query);
+  if (epmcResult.items.length > 0) return { ...epmcResult, source: 'europepmc' };
+  return { items: [], source: 'none', message: 'No results from any source.' };
+}
+```
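The pseudocode above falls through on empty results only, while the description also mentions failure. A runnable sketch that covers both cases might look like the following. `Source`, `FallbackResult`, and `searchWithFallback` are hypothetical names, and the sources stand in for services that would wrap real upstream APIs.

```typescript
// Hypothetical shapes for illustration only.
interface SearchResult {
  items: string[];
}

interface Source {
  name: string;
  search(query: string): Promise<SearchResult>;
}

interface FallbackResult extends SearchResult {
  source: string;                 // which source answered, or 'none'
  errors: Record<string, string>; // failures seen along the way, keyed by source name
}

// Try sources in priority order; fall through on a thrown error or an empty result.
async function searchWithFallback(sources: Source[], query: string): Promise<FallbackResult> {
  const errors: Record<string, string> = {};
  for (const source of sources) {
    try {
      const result = await source.search(query);
      if (result.items.length > 0) {
        return { ...result, source: source.name, errors };
      }
    } catch (err: unknown) {
      // Record the failure and continue with the next source in the chain.
      errors[source.name] = err instanceof Error ? err.message : String(err);
    }
  }
  return { items: [], source: 'none', errors };
}
```

Returning the collected `errors` next to `source` lets the caller distinguish "no source had data" from "the preferred source was down", which bears on the provenance point made above.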
+**Multi-source fan-out** — query multiple sources in parallel, merge results. Best when sources provide complementary data about the same entity. Use `Promise.allSettled` so one failing source doesn't tank the whole call.
+```ts
+// Handler pseudocode — indicator enrichment across threat intel sources
+async handler(input, ctx) {
+  const [vt, abuse, greynoise] = await Promise.allSettled([
+    vtService.lookup(input.indicator),
+    abuseIpService.check(input.indicator),
+    greynoiseService.query(input.indicator),
+  ]);
+  return {
+    indicator: input.indicator,
+    sources: {
+      virustotal: vt.status === 'fulfilled' ? vt.value : { error: vt.reason.message },
+      abuseipdb: abuse.status === 'fulfilled' ? abuse.value : { error: abuse.reason.message },
+      greynoise: greynoise.status === 'fulfilled' ? greynoise.value : { error: greynoise.reason.message },
+    },
+    // Server synthesizes a verdict from available data — the agent gets a conclusion, not raw API dumps
+    assessment: synthesizeVerdict(vt, abuse, greynoise),
+  };
+}
+```
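One detail the pseudocode glosses over: the `reason` of a rejected promise is not guaranteed to be an `Error`, so reading `reason.message` can yield `undefined`. A sketch of the same fan-out with normalized outcomes, using hypothetical names (`SourceOutcome`, `toOutcome`, `fanOut`), could be:

```typescript
// Normalized per-source outcome: either the value or an error message.
type SourceOutcome<T> = { ok: true; value: T } | { ok: false; error: string };

// Convert a settled promise into a SourceOutcome without assuming `reason` is an Error.
function toOutcome<T>(settled: PromiseSettledResult<T>): SourceOutcome<T> {
  if (settled.status === 'fulfilled') {
    return { ok: true, value: settled.value };
  }
  const reason: unknown = settled.reason;
  return { ok: false, error: reason instanceof Error ? reason.message : String(reason) };
}

// Query every source in parallel; a rejection from one source does not fail the call.
async function fanOut<T>(
  lookups: Record<string, () => Promise<T>>,
): Promise<Record<string, SourceOutcome<T>>> {
  const entries = Object.entries(lookups);
  const settled = await Promise.allSettled(entries.map(([, lookup]) => lookup()));
  const outcomes: Record<string, SourceOutcome<T>> = {};
  entries.forEach(([name], i) => {
    outcomes[name] = toOutcome(settled[i]);
  });
  return outcomes;
}
```

A verdict function can then branch on `ok` per source instead of inspecting raw settled results.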
+In both patterns, the tool surface is organized around what the user is doing. Sources are service-layer details — the agent sees `threat_enrich_indicator`, not `virustotal_lookup` + `abuseipdb_check` + `greynoise_query`. Mode-based dispatch by input type (e.g., `indicator_type: 'ip' | 'domain' | 'hash'`) naturally routes to different source chains per mode, since different sources cover different indicator types.
 There is no fixed ceiling on tool count — tools need to earn their keep, but don't artificially limit the surface. If the domain genuinely has 20 distinct workflows, expose 20 tools.
 #### Cut the surface
@@ -416,7 +483,7 @@ Skip for purely data/action-oriented servers.
 ### 7. Plan Services and Config
-**Services** — one per external dependency. Init/accessor pattern. Skip if all tools are thin wrappers with no shared state.
+**Services** — one per external dependency (or per source, for multi-source servers). Init/accessor pattern. Skip if all tools are thin wrappers with no shared state. For multi-source servers, each upstream API gets its own service with its own auth, rate limits, and retry config — tools compose across services internally, agents never see the service boundary.
 **Server-as-service.** When the server IS the source of truth (knowledge graph, in-memory task tracker, local scratchpad, embedded inference wrapper), the resilience table below doesn't apply — there's no upstream to retry. The design questions shift to state management: what's tenant-scoped vs. global, what TTLs apply, what survives a restart, what the storage backend is. Plan persistence via `ctx.state` for tenant-scoped KV (auto-namespaced by `tenantId`), or use a `StorageService` provider directly when data must cross tenants. Service init still happens in `setup()`, accessed via `getMyService()` at request time. Calls within the server are local and synchronous-ish — the API-efficiency table below also doesn't apply.
@@ -539,6 +606,8 @@ Execute the plan using the scaffolding skills:
 Items without an `If …:` prefix apply to every design. Conditional items only apply when the trigger fires — otherwise skip them.
+- [ ] Server scope decided — workflow identified, audience sized, boundary drawn (standalone single-API vs. multi-source aggregation vs. internal-only)
+- [ ] **If multi-source:** tool surface organized around user workflows, not API identity. Sources are service-layer details.
 - [ ] External APIs/dependencies researched and verified (docs fetched, SDKs identified)
 - [ ] **If wrapping an external API:** live API probed (at minimum: one list/search, one single-item GET, one error case)
 - [ ] User goals enumerated first (3–10 outcomes agents will accomplish, scaled to domain size), then domain operations mapped as raw material
@@ -567,5 +636,6 @@ Items without an `If …:` prefix apply to every design. Conditional items only
 - [ ] **If the server is itself the source of truth (no external API):** state lifecycle planned — tenant-scoped vs. global, TTLs, what survives restart, storage backend chosen
 - [ ] **If the server has external deps or shared state:** service layer planned (or explicitly skipped with reasoning)
 - [ ] **If services wrap external APIs:** resilience planned (retry boundary, backoff, parse classification)
+- [ ] **If multi-source server:** each source has its own service with independent auth/retry/rate-limit config. Fallback chains or fan-out strategy documented per tool. Output includes source provenance.
 - [ ] **If exposing a SQL/analytical workspace over tabular data is in scope:** DataCanvas considered (`api-canvas` skill) as one option before designing custom analytical state — register / query / export tools accepting an optional `canvas_id`, with `ctx.core.canvas?` reads
 - [ ] **If the server needs runtime config:** env vars identified in `server-config.ts`

package/skills/git-wrapup/SKILL.md CHANGED

@@ -4,7 +4,7 @@ description: >
   Land working-tree changes as a versioned release commit with an annotated tag — version bump, changelog, regenerate derived artifacts, verify, commit, tag. Stops at "committed and tagged locally" — no push, no publish. The release-and-publish skill picks up from here. Distilled from the git_wrapup_instructions protocol.
 metadata:
   author: cyanheads
-  version: "1.0"
+  version: "1.1"
   audience: external
   type: workflow
 ---
@@ -35,8 +35,6 @@ Every item must be true before starting wrapup. Committing means releasing — a
 If any gate is red, fix it before proceeding. This skill re-verifies build + tests in step 6, but starting wrapup on a broken tree wastes the version number and creates a revert-or-amend situation.
-After all gates pass, spawn dedicated agents to handle the wrapup (this skill) and publish (`release-and-publish`). These are separate agents from the ones that did the editing work.
 ## Steps
 ### 1. Review the diff
@@ -180,6 +178,7 @@ Dependency bumps:
 - Not a CHANGELOG copy — terse, scannable
 - No marketing adjectives
 - Length is earned — two-line tags are fine for small patches
+- **Issue backlinks:** when changes address GitHub issues, include `(#N)` references in the relevant bullets — same as the changelog entry. The backlinks render as clickable links in the GitHub Release body.
 ### 9. Verify end state

package/skills/maintenance/SKILL.md CHANGED

@@ -78,7 +78,8 @@ Scan specifically for:
 | Config changes | New env vars, renamed keys, changed defaults |
 | Linter rules | New definition-lint rules that may now flag existing tools/resources |
 | New or materially-changed skills | Note new skills or workflow changes (renamed steps, new checklist items) worth surfacing at end-of-run. Don't auto-invoke — some skills (e.g. `security-pass`) are user-triggered. The per-version changelog entries (e.g. 0.6.14 calling out `skills/security-pass/ (v1.0)`) name what changed. |
-| New template-scaffolded files | Compare `templates/` in the package against the project root. Files that `init` would create for a new project but don't exist in this project are adoption candidates — create them with project-specific values (version, name, description, env vars from `server.json`). Examples: `manifest.json`, `.mcpbignore`. Skip files the project has intentionally opted out of (documented in CLAUDE.md or a code comment). |
+| New template-scaffolded files | Compare `templates/` in the package against the project root. Files that `init` would create for a new project but don't exist in this project are adoption candidates — create them with project-specific values (version, name, description, env vars from `server.json`). Examples: `manifest.json`, `.mcpbignore`, `.codex-plugin/`, `.claude-plugin/`. Skip files the project has intentionally opted out of (documented in CLAUDE.md or a code comment). |
+| Changelog `agent-notes` | Read `agent-notes` frontmatter from each new per-version changelog file — these carry release-specific adoption instructions for downstream consumers (new files to create, fields to populate, one-time migration steps). Apply them alongside other adoption work in Step 6. |
 Cross-reference each finding against the server's code. Collect adoption opportunities for Step 6.

package/skills/orchestrations/SKILL.md ADDED

@@ -0,0 +1,217 @@
+---
+name: orchestrations
+description: >
+  Pick and run a multi-phase workflow that chains foundational task skills (`git-wrapup`, `release-and-publish`, `maintenance`, `field-test`, `setup`, etc.) end-to-end. Routes user intent to a workflow file under `workflows/` — greenfield builds, maintenance + release, field-test + fix, or known-work + release. Single source for the universal rules (no commits without authorization, no destructive git, no marketing language), the orchestrator posture (own the goal, ground sub-agents in primary sources, verify against the goal), and the sub-agent strategy (orient block, parallel fanout, isolation, normalization) that apply across every workflow. Sub-agents are an optional capability — workflows run linearly when fanout isn't available.
+metadata:
+  author: cyanheads
+  version: "1.0"
+  audience: internal
+  type: workflow
+---
+## When to Use
+Multi-phase work that chains several foundational skills against one or more MCP server projects. Typical triggers:
+- "Build N new servers" / "scaffold and ship X, Y, Z" → `workflows/greenfield-build.md`
+- "Update and release these servers" / "run maintenance and ship" → `workflows/maintenance-release.md`
+- "QA / field-test / find-and-fix bugs in these servers" → `workflows/field-test-fix.md`
+- "Fix these issues and ship" / handoff document with findings to act on → `workflows/fix-wrapup-release.md`
+Single-skill work — running just `maintenance`, just `git-wrapup`, just `release-and-publish` — invokes the foundational skill directly. Use this orchestrations skill when at least two phases need to chain.
+## Mental Model — Three Tiers
+| Tier | Layer | Examples | Who reads it |
+|:---|:---|:---|:---|
+| **1** | Foundational task skills | `git-wrapup`, `release-and-publish`, `maintenance`, `field-test`, `setup`, `design-mcp-server`, `polish-docs-meta`, `code-simplifier`, `add-tool`, `add-resource`, `add-service`, `add-test`, etc. | Orchestrator AND sub-agents (by direct path reference) |
+| **2** | Orchestration workflows | The four files under `workflows/` | Orchestrator only |
+| **3** | Router | This `SKILL.md` | Orchestrator only |
+Workflows in Tier 2 sequence Tier 1 skills with gates and verification. They never duplicate Tier 1 content — they direct to it. A workflow file says "Phase N: agent reads and runs `skills/git-wrapup/SKILL.md`," not "here's how to wrap up a release."
+The orchestrator is the agent driving the workflow — the one reading this SKILL.md. Sub-agents the orchestrator spawns receive prompts pointing at Tier 1 skills directly; they do not receive this skill or the workflow file. That boundary prevents recursive sub-agent spawning.
+## Pick a Workflow
+Identify the workflow from user intent first, then sanity-check against project state if intent is ambiguous.
+| User intent / state signal | Workflow |
+|:---|:---|
+| New scaffold(s) from `bunx @cyanheads/mcp-ts-core init`, no implementation yet (echo definitions still present, no released changelog) | `workflows/greenfield-build.md` |
+| Existing server(s), `bun outdated` shows updates, want to land them and ship | `workflows/maintenance-release.md` |
+| Existing server(s), want to find bugs via live testing and fix them, optionally ship | `workflows/field-test-fix.md` |
+| Existing server(s) with known issues (GH issues, handoff document, observed gap), want to fix and ship | `workflows/fix-wrapup-release.md` |
+If intent is ambiguous (no clear signal), surface the candidate workflows to the user and confirm. Don't pick silently.
+A workflow file is the orchestrator's playbook for one run. Read it end-to-end before kicking off the first phase.
+## Universal Rules
+These apply to every workflow. Workflow files don't restate them; the orchestrator carries them forward and restates them in sub-agent prompts where applicable.
+1. **No commits, pushes, tags, branch creation, or destructive ops without explicit user authorization.** Work phases leave the working tree dirty for orchestrator review. Wrap-up and release phases run only after the user authorizes — though once authorized, the authorization is durable through the workflow's end (no re-asking at each phase boundary).
+2. **No `git stash`, no `git reset --hard`, no `git restore .`, no `git clean -f`, no `git checkout -- .`.** These bypass safety and risk silent data loss. Read-only git (`status`, `diff`, `log`, `show`, `blame`) is always safe.
+3. **No `--no-verify`, no `--no-gpg-sign`, no bypassing commit hooks.** If a hook fails, investigate the underlying issue.
+4. **`bun run devcheck` is the handoff gate between phases.** Work phases must hand back a green devcheck. If a phase can't reach green, halt and report the failing step verbatim rather than carrying broken state forward.
+5. **No marketing adjectives** in commits, tags, READMEs, or changelog entries — no "comprehensive", "robust", "enhanced", "seamless", "improved". State the change, not its quality.
+6. **One workflow per orchestration run.** Don't interleave two workflows in the same session. If a target needs both (e.g., maintenance surfaces a bug fix that needs field-testing first), sequence them as two workflow runs with a clean handoff in between.
+7. **`gh release create --notes-from-tag` is incompatible with `--repo`.** Always `cd` into the target repo directory for `gh release` commands.
+8. **Annotated tags only** (`git tag -a`), never lightweight. Tag annotation subject omits the version number — GitHub prepends `v<VERSION>:` to release titles when using `--notes-from-tag`, so including the version in the subject creates stutter.
+9. **Conventional Commits subjects** (`feat|fix|refactor|chore|docs|test|build(scope): message`). One logical concern per commit. The release commit (version bump + changelog + regenerated artifacts) lands on top of a stack of feature/fix commits, never collapsed alongside them.
+10. **Email on any artifact is the user's domain email**, never a personal address that might appear in git config.
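Rules 2, 3, 5, and 9 are mechanical enough to check before a command runs or a commit subject is used. The following is a minimal sketch, not part of the package: `isForbiddenCommand` and `lintSubject` are hypothetical helper names, the regular expressions only approximate the rule text, and treating the `(scope)` part as optional is an assumption.

```typescript
// Rules 2 and 3: invocations the rules forbid outright.
const FORBIDDEN: RegExp[] = [
  /^git\s+stash\b/,
  /^git\s+reset\s+--hard\b/,
  /^git\s+restore\s+\.(\s|$)/,
  /^git\s+clean\s+-f/,
  /^git\s+checkout\s+--\s+\.(\s|$)/,
  /--no-verify\b/,
  /--no-gpg-sign\b/,
];

function isForbiddenCommand(command: string): boolean {
  const trimmed = command.trim();
  return FORBIDDEN.some((pattern) => pattern.test(trimmed));
}

// Rule 9: Conventional Commits subject (scope assumed optional here).
const SUBJECT = /^(feat|fix|refactor|chore|docs|test|build)(\([^)]+\))?: \S.*$/;
// Rule 5: the adjectives the rule names; matched as substrings.
const MARKETING = ["comprehensive", "robust", "enhanced", "seamless", "improved"];

// Returns a list of problems; an empty list means the subject passes.
function lintSubject(subject: string): string[] {
  const problems: string[] = [];
  if (!SUBJECT.test(subject)) problems.push("not a Conventional Commits subject");
  const lower = subject.toLowerCase();
  for (const word of MARKETING) {
    if (lower.includes(word)) problems.push(`marketing adjective: ${word}`);
  }
  return problems;
}
```

A guard like this is only a backstop. It catches the literal forms listed in the rules, not every way to bypass a hook or discard work.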
+## Orchestrator Posture
+The orchestrator owns the goals. Workflow phases are not "run skill X" — they are "achieve goal Y, using skill X as the path." Sub-agents (when used) are instruments for hitting the goals, not the work itself. The same posture applies in linear mode — the orchestrator runs the phase directly, but the goal is still the contract.
+Before running a phase (or spawning a sub-agent for it), write down four things:
+1. **Goal** — the verifiable end state this phase must produce. Concrete and testable: "v0.5.2 tag exists at HEAD with structured-markdown annotation; `bun run devcheck` green; `npm view <pkg>@0.5.2` resolves." Not fuzzy: "ran the release-and-publish skill."
+2. **Primary sources** — the specific files, GH issues, and reference docs the sub-agent must read directly. Inlining content into the prompt is a paraphrase that loses nuance; agents grounded in the source catch details the orchestrator's summary missed. For GH issues, instruct `gh issue view N --comments` — body alone misses thread clarifications. The orchestrator reads these sources too (to construct the prompt), but that's prompt construction, not a substitute for the sub-agent reading them.
+3. **Path** — the Tier 1 skill(s) and steps that get to the goal. This is what gets handed to the sub-agent.
+4. **Verification** — the read-only checks that confirm the goal was hit. Defined upfront, not as an afterthought.
+Why the framing matters:
+- **Verification follows from goal definition.** If the goal is concrete, the verification is obvious — check that exact state. If the goal is fuzzy, verification degrades to "did the sub-agent say it worked?"
+- **Sub-agent self-reports describe intent, not always reality.** A goal you wrote down beforehand is the falsification target — the sub-agent's report is a hypothesis to verify against it.
+- **Replanning is local.** When verification fails, the goal is unchanged; the orchestrator picks a different path (re-spawn with the failure context, re-slice the work, intervene directly). Phase rework doesn't cascade.
+**Inform without inlining.** A well-formed sub-agent prompt names the specific primary sources and the goal; it does NOT paraphrase them. "Review GH issue #123 (read it via `gh issue view 123 --comments`); the goal is X; verify with Y" is the right shape. Pasting the issue body into the prompt forces the sub-agent to work from a paraphrase. Let the sub-agent read the source and explore for additional context as needed.
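The four items written down before a phase can be held in one record, so a phase cannot start with a missing goal or missing verification. This is a sketch under assumptions: `PhasePlan`, `missingParts`, and `taskSummary` are hypothetical names, and primary sources are allowed to be empty because the orient block permits skipping them when none apply.

```typescript
// Hypothetical record for the pre-phase write-up.
interface PhasePlan {
  goal: string;             // verifiable end state
  primarySources: string[]; // file paths and `gh` commands: named, never pasted
  path: string[];           // Tier 1 skill paths and steps
  verification: string[];   // read-only checks, defined upfront
}

// Lists what is still missing; an empty result means the phase may start.
function missingParts(plan: PhasePlan): string[] {
  const missing: string[] = [];
  if (plan.goal.trim() === "") missing.push("goal");
  if (plan.path.length === 0) missing.push("path");
  if (plan.verification.length === 0) missing.push("verification");
  return missing;
}

// Builds the task portion of a prompt. Sources are referenced by name or
// command only, so the sub-agent reads them directly.
function taskSummary(plan: PhasePlan): string {
  return [
    `Goal: ${plan.goal}`,
    `Read directly: ${plan.primarySources.join("; ")}`,
    `Path: ${plan.path.join("; ")}`,
    `Verify with: ${plan.verification.join("; ")}`,
  ].join("\n");
}
```

Because the verification list is fixed before the phase runs, a later failure changes only `path`. The goal and its checks stay as written.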
+## Sub-Agent Strategy (if available)
+Sub-agents are optional. If the orchestrator has the capability to spawn them, fan out work in parallel where it makes sense. Single-target workflows usually run linearly; multi-target workflows benefit from a parallel sub-agent per target. Choose based on scope, not by default.
+The decision tree:
+| Situation | Strategy |
+|:---|:---|
+| Single target, small change | Linear, orchestrator runs the phases itself |
+| Single target, large change likely to exhaust orchestrator context | Sub-agent per phase; orchestrator gates between phases |
+| N > 1 targets, independent work per target | One sub-agent per target per phase (parallel fanout) |
+| N > 1 targets, work that conflicts across targets (e.g., all editing the same file) | Linear or serial — the parallel model assumes target independence |
+| Sub-agents not available | Linear, regardless of N — same phases, just sequential |
+If sub-agents are not available, the workflow phases still apply — they're just executed sequentially by the orchestrator. The phase structure is the value; parallelism is the optimization.
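The decision table above reduces to a few ordered checks. The sketch below is one possible encoding; the `Situation` fields and the `chooseStrategy` name are hypothetical, and the strategy labels are shortened forms of the table's right-hand column.

```typescript
type Strategy = "linear" | "sub-agent-per-phase" | "parallel-fanout" | "linear-or-serial";

// Hypothetical description of one orchestration run.
interface Situation {
  subAgentsAvailable: boolean;
  targetCount: number;
  largeChange: boolean;     // likely to exhaust orchestrator context
  targetsConflict: boolean; // e.g. all targets editing the same file
}

function chooseStrategy(s: Situation): Strategy {
  // No sub-agents: same phases, run sequentially, regardless of target count.
  if (!s.subAgentsAvailable) return "linear";
  if (s.targetCount > 1) {
    // The parallel model assumes target independence.
    return s.targetsConflict ? "linear-or-serial" : "parallel-fanout";
  }
  return s.largeChange ? "sub-agent-per-phase" : "linear";
}
```

Availability is checked first because it overrides every other row of the table.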
+### Orient block
+Every sub-agent prompt opens with this block. Sub-agents do not inherit the orchestrator's `CLAUDE.md` chain or skill registry — both must be reconstructed in the prompt. Substitute the bracketed values per target.
+```text
+You are working on `[project name]` at `[project absolute path]`.
+Orient first. These steps are required before any task work — do them in
+order. If any file does not exist, note it and continue.
+1. Read the global agent protocol at `~/.claude/CLAUDE.md` (or your agent's equivalent).
+2. Read the workspace-level protocol if one exists at `[workspace CLAUDE.md path]`
+   — skip this step if no workspace-tier protocol applies.
+3. Read the project protocol at `[project absolute path]/CLAUDE.md`.
+4. Run `cd [project absolute path] && bun run list-skills` to see the project's
+   available skills with descriptions and locations.
+5. Read the skill file(s) for this task: `[Tier 1 skill paths]`.
+6. Read the primary sources for this task directly — design docs (`docs/design.md`),
+   GH issues (use `gh issue view <N> --comments` to capture the full thread, not
+   just the body), handoff documents, reference/gold-standard files. List each
+   one explicitly: `[primary source paths and gh commands]`. Skip this step only
+   if no primary source applies (rare).
+Only after that, begin the task below.
+**Goal:** [the verifiable end state this phase must produce — concrete, testable]
+**Path:** [Tier 1 skill(s) and steps the sub-agent should follow]
+**Constraints:** [no-go list — restate git/commit rules and other invariants verbatim]
+**Expected outputs:** [report shape you want back — e.g., "Step 8 numbered summary", "list of files touched with one-line rationale per fix"]
+```
+The sub-agent reads the primary sources directly during orient (step 6) — do not paste their contents into the prompt. The orchestrator names them; the sub-agent reads them.
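Substituting the bracketed values per target can be done mechanically. The helper below is a sketch, not part of the package: `fillTemplate` is a hypothetical name, and it assumes each placeholder is keyed by the exact text between the brackets. It reports placeholders that were left unfilled so that a half-substituted prompt is not sent.

```typescript
interface Filled {
  text: string;       // template with known placeholders replaced
  unfilled: string[]; // placeholder keys that had no value
}

// Replaces each `[key]` with values[key]; unknown keys are left in place
// and collected in `unfilled`.
function fillTemplate(template: string, values: Record<string, string>): Filled {
  const unfilled: string[] = [];
  const text = template.replace(/\[([^\]\n]+)\]/g, (match: string, key: string) => {
    const value = values[key];
    if (value === undefined) {
      unfilled.push(key);
      return match;
    }
    return value;
  });
  return { text, unfilled };
}
```

An orchestrator following the cross-project naming rule would call this once per target with only that target's values, and refuse to launch while `unfilled` is non-empty.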
+### Isolation rules
+1. **Bash `git` only in parallel sub-agents.** Do not let parallel sub-agents call `mcp__git-mcp-server__*` tools — session state (`set_working_dir`) leaks across parallel calls in the same orchestrator session, causing silent no-ops, wrong-directory operations, and false "tag already exists" errors. Bash `git` in the agent's CWD is reliable. The orchestrator may still use `git-mcp-server` itself in serial.
+2. **Sub-agents do not receive this orchestrations skill or workflow files.** Their prompts include Tier 1 skill paths only. This prevents recursive sub-agent spawning — if a sub-agent decides it needs to fan out work, that's a signal the orchestrator sliced the work too wide. Re-slice; don't let the sub-agent recurse.
+3. **Sub-agent prompts must restate the no-git-write and no-`stash` rules verbatim.** The orchestrator's `CLAUDE.md` rules aren't visible to sub-agents at prompt time.
+4. **Narrow scope per fanout.** A sub-agent doing "implement everything, write tests, run devcheck, polish, commit, tag" will exhaust its context window before finishing — the work lands on disk but the agent can't continue. Split phases so each sub-agent finishes well under the context limit. Plan a follow-up "finish" phase as a normal backstop, not a fallback for failure.
+### Parallel fanout pattern
+For N targets in a phase:
+1. Compose N sub-agent prompts (one per target) with the orient block + task body + workflow's phase-specific constraints
+2. Launch them as parallel sub-agents in a single orchestrator action
+3. Collect their reports
+4. Verify with a read-only orchestrator check before advancing to the next phase
+### Editor / wrap-up separation
+Editing phases and wrap-up phases never go in the same sub-agent. Editing sub-agents make file changes and run devcheck — they do not commit, tag, or push. Wrap-up sub-agents read the working tree, commit, tag, and (when releasing) push and publish — they do not edit source. This separation lets the orchestrator review diffs before they become permanent and keeps the commit graph clean.
+### Normalization
+Independent sub-agents diverge on incidental choices — scoped vs. unscoped package names, script invocation form, README hero structure, badge ordering. When choices should be uniform across targets, plan an explicit normalization step after the fanout — don't expect alignment for free.
+For small N or small diffs, the orchestrator normalizes directly. For large N or non-trivial fixes, spawn a narrow-scope fanout with an explicit rule list.
+### Rolling concurrency
+Rate limits on parallel sub-agent spawning are intermittent — sometimes 15 concurrent agents work fine, sometimes 3 get throttled. Don't hard-cap; use rolling concurrency. Launch an initial batch, then as each agent completes, kick off the next in line. If a wave gets rate-limited, shrink the window for the next wave.
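Rolling concurrency is a worker-pool pattern: keep a fixed window of tasks in flight and start the next one as soon as any finishes. The sketch below assumes each sub-agent launch can be modeled as a function that returns a promise. `runRolling` and `nextWindow` are hypothetical names, and halving the window after a rate limit is an assumed policy that the text does not prescribe.

```typescript
// Runs tasks with at most `window` in flight. Results keep task order.
async function runRolling<T>(tasks: Array<() => Promise<T>>, window: number): Promise<T[]> {
  const results: T[] = new Array<T>(tasks.length);
  let next = 0; // index of the next task to start, shared by all workers

  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      const task = tasks[index];
      if (task) results[index] = await task();
    }
  }

  // One worker per window slot; each pulls the next task when it frees up.
  const slots = Math.max(1, Math.min(window, tasks.length));
  await Promise.all(Array.from({ length: slots }, () => worker()));
  return results;
}

// Assumed policy: halve the window after a rate-limited wave, never below 1.
function nextWindow(current: number, wasRateLimited: boolean): number {
  return wasRateLimited ? Math.max(1, Math.floor(current / 2)) : current;
}
```

Unlike fixed batches, a slow agent here does not hold up the agents queued behind it; only its own slot stays occupied.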
+### Cross-project naming hygiene
+When N targets share a phase, never name other targets in a sub-agent's prompt — even as examples. Sub-agents pattern-match on everything in their prompt, and cross-project names leak into commits, messages, and variable names. Each sub-agent's prompt references its own target only.
+## Verification (orchestrator)
+Verification runs against the goal *you* defined for the phase — not against the sub-agent's self-report. A sub-agent that reports "done" without producing the goal state is not done. The artifact checks below are the *means* of confirming the goal; pick the ones that exercise your specific goal definition.
+After every phase that touched the filesystem or remote services, run a read-only check against the goal:
+- **Files** — `ls`, `git status`, `git diff --stat`
+- **Commits** — `git log --oneline -5`
+- **Tags** — `git tag --points-at HEAD`, `git ls-remote --tags origin`
+- **GitHub** — `gh repo view --json visibility`, `gh release view v<VERSION>`, `gh issue list`, `gh issue view <N> --comments` to confirm the fix comment landed
+- **npm / registries** — `npm view <pkg>@<version>`, registry-specific checks
+- **Build state** — re-run `bun run devcheck` if the previous phase was supposed to land green
+- **Quality** — tag annotation reads as structured markdown (not flat string), subject omits the version number, no marketing adjectives, dep arrows present where applicable, issue backlinks where applicable
+If verification disagrees with the sub-agent's report, that's the signal to re-spawn with the actual state and the unmet goal in the prompt — not to trust the report. The goal hasn't changed; only the path needs to.
+## Authorization Flow
+| Phase type | Authorization required |
+|:---|:---|
+| Reads, analysis, file edits (working tree only) | Implicit — initial workflow approval covers these |
+| Local commits, annotated tags | Explicit at workflow start; durable through workflow end |
+| Push to remote, npm / registry publish, GH release create, Docker push | Explicit at workflow start; durable through workflow end |
+| Destructive ops (force push, tag delete, remote branch delete, etc.) | Always re-confirm, never assume |
+Pipeline authorization is durable through to completion. Once the user authorizes a workflow run, don't re-ask at each phase boundary — proceed automatically through gates that pass. Conditions that always require a fresh check-in: destructive ops on shared resources, external actions without sign-off, errors that need human judgment.
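The authorization table can be expressed as a gate that each phase passes through. This is a sketch only: the `PhaseType` labels, the `RunState` fields, and `mayProceed` are hypothetical. It models durable authorization as flags captured once at workflow start, and destructive operations as always needing a fresh confirmation.

```typescript
type PhaseType = "read-or-edit" | "local-commit-or-tag" | "remote-publish" | "destructive";

// Hypothetical record of what the user approved at workflow start.
interface RunState {
  workflowApproved: boolean;  // initial approval: reads, analysis, working-tree edits
  commitAuthorized: boolean;  // local commits and annotated tags
  publishAuthorized: boolean; // push, registry publish, GH release, Docker push
}

// `freshConfirmation` is the user's answer for this specific action, if asked.
function mayProceed(phase: PhaseType, state: RunState, freshConfirmation: boolean): boolean {
  if (phase === "destructive") return freshConfirmation; // never assumed
  if (!state.workflowApproved) return false;
  if (phase === "local-commit-or-tag") return state.commitAuthorized;
  if (phase === "remote-publish") return state.publishAuthorized;
  return true; // read-or-edit is covered by the initial approval
}
```

The flags are read, never reset, between phases. That is how the sketch models not re-asking at each phase boundary.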
+## Workflow File Discipline
+Workflow files are thin by design. Each phase row in a workflow's phases table maps to a Tier 1 skill or a thin orchestration step. **Phase notes are for orchestration overrides only** — sequencing rules, fanout-specific constraints, non-obvious instructions, decisions the foundational skill leaves to the caller. Never paraphrase what a foundational skill already documents. A phase that runs a Tier 1 skill end-to-end with no orchestration override needs no phase note — just the row in the table.
+The same discipline applies to gotchas: workflow-specific gotchas are about the orchestration pattern itself (e.g., parallel sub-agent context exhaustion, normalization gaps). Gotchas about a Tier 1 skill's internals belong in that skill, not the workflow.
+## When the Workflow List Doesn't Fit
+For scenarios that don't map cleanly to one of the four workflow files — security audits across N servers, framework-wide migrations, design-only extensions, ad-hoc multi-step work — the universal rules and sub-agent strategy above still apply. Author a new workflow file at `workflows/<scenario>.md` when the pattern is repeatable enough to codify. Follow the shape of the existing workflow files: when applicable, Tier 1 skills referenced, pre-flight, phases table, phase notes, workflow-specific gotchas, checklist. Apply the workflow file discipline above.
+## Pre-flight Checklist (every workflow)
+Verify before kicking off the first phase. Workflow files add their own pre-flight items on top of these.
+- [ ] Target list captured with absolute paths
+- [ ] Intent and state signals point to a single workflow (or confirmed with user if ambiguous)
+- [ ] Selected workflow file read end-to-end
+- [ ] Phase objectives understood (the Objective column of the phases table is the goal contract — verification runs against these)
+- [ ] Plan surfaced to user: workflow, targets, phase objectives, applicable universal rules
+- [ ] User authorization captured for the workflow's commit/push/publish phases (if any apply)
+- [ ] Sub-agent capability confirmed (or fallback to linear execution noted)