npm - @event4u/agent-config - Versions diffs - 1.28.0 → 1.29.0 - Mend

@event4u/agent-config 1.28.0 → 1.29.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (16)

package/.agent-src/skills/async-python-patterns/SKILL.md +147 -0
package/.agent-src/skills/defense-in-depth/SKILL.md +152 -0
package/.agent-src/skills/error-handling-patterns/SKILL.md +134 -0
package/.agent-src/skills/mcp-builder/SKILL.md +108 -0
package/.agent-src/skills/prompt-engineering-patterns/SKILL.md +145 -0
package/.agent-src/skills/repomix/SKILL.md +135 -0
package/.agent-src/skills/secrets-management/SKILL.md +142 -0
package/.agent-src/skills/testing-anti-patterns/SKILL.md +145 -0
package/.claude-plugin/marketplace.json +9 -1
package/CHANGELOG.md +27 -0
package/README.md +2 -2
package/docs/architecture.md +1 -1
package/docs/catalog.md +10 -2
package/docs/contracts/file-ownership-matrix.json +314 -0
package/docs/contracts/package-self-orientation.md +1 -1
package/package.json +1 -1

package/.agent-src/skills/async-python-patterns/SKILL.md ADDED

@@ -0,0 +1,147 @@
+---
+name: async-python-patterns
+description: "Use when writing Python asyncio code — picking between gather / TaskGroup / wait, structured concurrency, timeouts, cancellation, sync-bridging — decision framework only, cookbook externalized."
+source: package
+status: active
+refresh_trigger: "Python ships a new structured-concurrency primitive (post-TaskGroup), OR ≥30% of cited upstream cookbook examples become deprecated, OR the cited libraries (aiohttp, httpx, anyio, trio) cut a major version with breaking async surface changes."
+sunset_criterion: "When `https://docs.python.org/3/library/asyncio.html` ships an in-tree decision framework AND consumer projects no longer cite this skill in PR reviews for two consecutive review cycles."
+---
+# async-python-patterns
+Decision framework for picking the right Python asyncio primitive. **The pattern cookbook lives upstream** (links in § Provenance) — this skill is the predicate, not the recipe library. Sunset-policy compliant: the 600+ lines of language-specific cookbook stay in authoritative Python docs.
+## When to use
+- Designing a new async I/O-bound service (FastAPI, aiohttp, async DB client).
+- Reviewing a diff that introduces `asyncio.gather`, `asyncio.create_task`, `TaskGroup`, `as_completed`, or `wait_for`.
+- Mixing sync and async code (calling sync libs from async context, or vice versa).
+- Diagnosing event-loop blocking, never-awaited warnings, or cancellation leaks.
+Do NOT use when:
+- The work is CPU-bound: async will not help; route to multiprocessing (a thread pool helps only when the work releases the GIL).
+- The runtime is not Python — read the host runtime's concurrency guide.
+- The fix is a single missing `await` — read the upstream tutorial directly.
+## Decision framework
+### Step 1 — Verify async is the right tool
+```
+Workload is:
+  I/O-bound, many concurrent waits  → async fits (network, disk, IPC).
+  CPU-bound (parsing, math, crypto) → async is wrong; use ProcessPoolExecutor.
+  Mixed                              → async shell + run_in_executor for CPU bursts.
+  Single sequential call             → don't introduce async; sync is simpler.
+```
+### Step 2 — Pick the concurrency primitive
+```
+Run N independent coroutines, ALL must complete:
+  Same trust level, exceptions cancel siblings  → asyncio.TaskGroup (3.11+; preferred).
+  Pre-3.11 OR exceptions must NOT cancel peers  → asyncio.gather(*coros, return_exceptions=True).
+Run N coroutines, react to results as they finish:
+  → asyncio.as_completed (yields awaitables in finish order; await each one for its result).
+Run N coroutines, race to first success / failure:
+  → asyncio.wait(..., return_when=FIRST_COMPLETED) + cancel pending.
+Schedule fire-and-forget background work:
+  → asyncio.create_task + keep a strong reference (else GC eats it).
+  Forgetting the reference is a common silent-failure source.
+Bound the wait time:
+  → asyncio.wait_for(coro, timeout=...)  → raises TimeoutError on expiry.
+  → asyncio.timeout(...) context manager (3.11+; preferred when many awaits share a deadline).
+Bound concurrency (rate-limit, connection pool):
+  → asyncio.Semaphore(n); acquire around the awaitable.
+```
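The Step 2 choices can be sketched as follows. This is a minimal illustration, not part of the package: `fetch` and `run_all` are hypothetical names, and the sketch uses `asyncio.gather(..., return_exceptions=True)` rather than `TaskGroup` so that one failure does not cancel its peers and the code also runs before Python 3.11. Concurrency is bounded with a `Semaphore` and each call gets its own `wait_for` deadline.

```python
import asyncio


async def fetch(item: int, limit: asyncio.Semaphore) -> int:
    """Hypothetical stand-in for an I/O call; item 3 always fails."""
    async with limit:  # at most N coroutines inside this block at once
        await asyncio.sleep(0.01)
        if item == 3:
            raise ValueError(f"bad item {item}")
        return item * 2


async def run_all(items: list[int]) -> tuple[list[int], list[BaseException]]:
    limit = asyncio.Semaphore(2)
    # return_exceptions=True: failures come back as values, peers keep running.
    results = await asyncio.gather(
        *(asyncio.wait_for(fetch(i, limit), timeout=1.0) for i in items),
        return_exceptions=True,
    )
    ok = [r for r in results if not isinstance(r, BaseException)]
    failed = [r for r in results if isinstance(r, BaseException)]
    return ok, failed


ok, failed = asyncio.run(run_all([1, 2, 3, 4]))
```

Results keep the input order, so the caller can pair each outcome with the item that produced it.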
+### Step 3 — Bridge sync ↔ async correctly
+```
+Async code calls sync, blocking, function:
+  Short pure-CPU              → fine, accept the block (microseconds).
+  Long, blocking, or I/O-sync → await loop.run_in_executor(None, fn, *args).
+  Library has async sibling   → switch the library (httpx vs requests, aiosqlite vs sqlite3).
+Sync code calls async function:
+  Top-level entrypoint        → asyncio.run(coro()).
+  Inside running loop         → never asyncio.run; schedule with create_task and await the task from async code.
+  Test suite                  → pytest-asyncio fixture; never raw run() in tests.
+```
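A minimal sketch of the async-calls-sync bridge from Step 3, assuming a hypothetical `blocking_lookup` function that stands in for any sync, blocking library call. The `heartbeat` coroutine is only there to show that the loop keeps running while the blocking call sits in the default executor.

```python
import asyncio
import time


def blocking_lookup(key: str) -> str:
    """Hypothetical sync function that blocks its thread."""
    time.sleep(0.05)
    return key.upper()


async def heartbeat(ticks: list[int]) -> None:
    # Keeps making progress only if the event loop is not blocked.
    for i in range(3):
        await asyncio.sleep(0.01)
        ticks.append(i)


async def main() -> tuple[str, list[int]]:
    loop = asyncio.get_running_loop()
    ticks: list[int] = []
    beat = asyncio.create_task(heartbeat(ticks))
    # None selects the loop's default ThreadPoolExecutor.
    value = await loop.run_in_executor(None, blocking_lookup, "abc")
    await beat
    return value, ticks


value, ticks = asyncio.run(main())  # top-level entrypoint: asyncio.run
```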
+### Step 4 — Cancellation discipline
+Every long-running coroutine MUST be cancellation-safe:
+- Catch `asyncio.CancelledError`, perform cleanup, **re-raise**. Swallowing it silently breaks the propagation chain.
+- Use `try / finally` (or `async with`) around resource acquisition so cancellation cannot leak file handles, DB connections, locks.
+- A detached `create_task` result with no strong reference can be garbage-collected mid-execution, because the event loop holds only weak references to tasks; either store the task or use a TaskGroup.
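The cancel-safety contract in Step 4 might look like the sketch below. The `worker` coroutine and the `events` list are illustrative names only; the point is the order of operations: catch `CancelledError`, clean up, re-raise, and release resources in `finally`.

```python
import asyncio

events: list[str] = []


async def worker() -> None:
    events.append("acquired")  # stands in for opening a connection or lock
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        events.append("cleanup")
        raise  # re-raise so the cancellation keeps propagating
    finally:
        events.append("released")  # runs on success, error, and cancellation


async def main() -> bool:
    task = asyncio.create_task(worker())
    await asyncio.sleep(0.01)  # give the worker time to start
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        # Safe to absorb here: main() requested the cancel and is not
        # itself being cancelled.
        pass
    return task.cancelled()


was_cancelled = asyncio.run(main())
```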
+### Step 5 — Don't block the event loop
+A single blocking call (sync I/O, time.sleep, CPU-heavy parse, large JSON load) freezes every coroutine. Audit every leaf function under `async def`:
+- Sleep → `await asyncio.sleep`, never `time.sleep`.
+- HTTP → `httpx.AsyncClient` / `aiohttp`, never `requests`.
+- DB → `asyncpg` / `aiosqlite` / `motor`, never the sync driver.
+- File → `aiofiles` for hot-paths, or `run_in_executor` for one-shots.
+## Procedure: Apply to a new async feature
+1. Run Step 1; reject if work is CPU-bound.
+2. Sketch the call graph; tag each `await` site with its primitive (Step 2).
+3. Mark every sync↔async boundary; pick the bridge per Step 3.
+4. For each long-running coroutine, write the cancel-safety contract (Step 4).
+5. Grep the leaf calls for blocking sins (Step 5); replace or push to executor.
+6. Hand the sketch to a reviewer **before** coding; cite this skill.
+## Output format
+1. Call-graph table: coroutine · concurrency primitive · timeout · cancel-safety note.
+2. Sync↔async boundary list: site · bridge · justification.
+3. Blocking-call audit: leaf function · status (async / executor / accepted-block + reason).
+4. Cancel-safety contract for each background task.
+## Gotcha
+- "It works in my REPL" — `asyncio.run` inside an already-running loop (Jupyter, FastAPI startup) raises `RuntimeError`. Use `await` directly or `nest_asyncio` (last resort).
+- `asyncio.gather` propagates only the first exception; later ones from sibling awaitables never reach the caller. Use `return_exceptions=True` and inspect the results, or use `TaskGroup` (cancels the remaining tasks on the first error and raises an `ExceptionGroup`).
+- `create_task` results that nobody awaits look fine until the program exits and Python prints `Task was destroyed but it is pending!`. Always `await` or use a TaskGroup.
+- `wait_for` on a non-cancellation-safe coroutine leaks resources; the timeout cancels the task but cleanup never runs.
+- Libraries that "support async" via thread pools (e.g. `requests-async`) often re-block the loop under load; verify with the cited upstream library docs, not the README.
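For the fire-and-forget gotcha, one commonly used shape is to keep tasks in a module-level set and drop each one when it finishes. The sketch below assumes hypothetical `spawn` and `notify` helpers; it is one way to hold the strong reference, not the only one.

```python
import asyncio

background_tasks: set[asyncio.Task] = set()
done_log: list[str] = []


async def notify(name: str) -> None:
    """Hypothetical background job."""
    await asyncio.sleep(0.01)
    done_log.append(name)


def spawn(coro) -> asyncio.Task:
    task = asyncio.create_task(coro)
    background_tasks.add(task)  # strong reference keeps the task alive
    task.add_done_callback(background_tasks.discard)  # drop it when finished
    return task


async def main() -> int:
    spawn(notify("a"))
    spawn(notify("b"))
    pending_at_start = len(background_tasks)
    await asyncio.gather(*background_tasks)  # drain before the loop closes
    await asyncio.sleep(0)  # let any remaining done-callbacks run
    return pending_at_start


pending_at_start = asyncio.run(main())
```

Draining the set before shutdown is what avoids the `Task was destroyed but it is pending!` message at exit.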
+## Do NOT
+- Do NOT call `asyncio.run` from a running loop.
+- Do NOT swallow `CancelledError` without re-raising.
+- Do NOT call sync blocking I/O from async paths without `run_in_executor`.
+- Do NOT spawn `create_task` without storing the reference (or using TaskGroup).
+- Do NOT inline the asyncio cookbook into this skill — externalize per Sunset Policy.
+## Auto-trigger keywords
+- asyncio
+- async / await
+- gather / TaskGroup / wait_for
+- event loop blocking
+- cancellation
+- sync to async bridge
+## Provenance
+- Adopted from: `Microck/ordinary-claude-skills@8f5c83174f7aa683b4ddc7433150471983b93131:skills_all/async-python-patterns/SKILL.md` (MIT, © 2025 Microck) — **Sunset Policy applied**: 694-line cookbook source reduced to a ~140-line decision framework; pattern catalogues externalized to upstream docs below.
+- Externalized cookbook:
+  - asyncio core: https://docs.python.org/3/library/asyncio.html · https://docs.python.org/3/library/asyncio-task.html
+  - TaskGroup (3.11+): https://docs.python.org/3/library/asyncio-task.html#task-groups
+  - Structured concurrency: https://anyio.readthedocs.io · https://trio.readthedocs.io
+  - Async HTTP: https://www.python-httpx.org/async/ · https://docs.aiohttp.org/en/stable/
+  - Async DB: https://magicstack.github.io/asyncpg/ · https://aiosqlite.omnilib.dev/
+- Cross-linked: [`error-handling-patterns`](../error-handling-patterns/SKILL.md), [`mcp-builder`](../mcp-builder/SKILL.md), [`api-design`](../api-design/SKILL.md), [`performance`](../performance/SKILL.md).
+- Provenance registry: `agents/contexts/skills-provenance.yml` (entry: `async-python-patterns`).
+- Iron-Law floor: `verify-before-complete`, `skill-quality`, `non-destructive-by-default`.

package/.agent-src/skills/defense-in-depth/SKILL.md ADDED

@@ -0,0 +1,152 @@
+---
+name: defense-in-depth
+description: "Use when validation needs entry, business-logic, environment, and instrumentation guards so a bad value cannot reach the failure point — turns a local bug fix into a structural one."
+source: package
+---
+# defense-in-depth
+Validate at every layer the value passes through. Fixing the bug at one layer is locally sufficient and globally fragile — the next refactor, code path, mock, or platform edge case will rediscover it. Four-layer validation makes the bug *structurally* impossible.
+## When to use
+- Bug fix where invalid data caused failure several frames deep.
+- New entry point that funnels external input into existing internals.
+- Refactor that adds a second caller to a previously single-caller routine.
+- Test setup that shortcuts production guards (mocks bypassing entry validation).
+Do NOT use when:
+- Pure formatting / style change — no data flow, no layers to defend.
+- Boundary validation alone is correct (e.g. immutable value object with constructor invariant) — route to [`laravel-validation`](../laravel-validation/SKILL.md).
+- The fix belongs at a single architectural seam — adding three more guards is over-engineering. Use the gate function below to stop early.
+## Procedure: Apply the four-layer pattern
+### Step 0: Analyze the data flow before adding guards
+1. Identify where the bad value originates (test fixture, request body, env var, config).
+2. List every function that receives the value before the failure point.
+3. Mark which functions are reachable from production paths and which only from tests.
+### Step 1: Layer 1 — Entry-point validation
+Reject obviously invalid input at the API / route / command boundary. In Laravel this is FormRequest rules; in pure PHP services it is the public method on the service.
+```php
+public function createProject(string $name, string $workingDirectory): Project
+{
+    if (trim($workingDirectory) === '') {
+        throw new InvalidArgumentException('workingDirectory cannot be empty');
+    }
+    if (! is_dir($workingDirectory)) {
+        throw new InvalidArgumentException("workingDirectory does not exist: {$workingDirectory}");
+    }
+    if (! is_writable($workingDirectory)) {
+        throw new InvalidArgumentException("workingDirectory is not writable: {$workingDirectory}");
+    }
+    // ... proceed
+}
+```
+### Step 2: Layer 2 — Business-logic validation
+Verify the value still makes sense for the operation that consumes it. Different code paths can reach the same internal — re-check rather than trust the caller.
+```php
+public function initializeWorkspace(string $projectDir, string $sessionId): Workspace
+{
+    if ($projectDir === '') {
+        throw new RuntimeException('projectDir required for workspace initialization');
+    }
+    // ... proceed
+}
+```
+### Step 3: Layer 3 — Environment guards
+Refuse dangerous operations in the wrong context — most often: running a destructive command outside a test temp dir while the test suite is active.
+```php
+public function gitInit(string $directory): void
+{
+    if (app()->environment('testing')) {
+        $normalized = realpath($directory) ?: $directory;
+        $tmp = realpath(sys_get_temp_dir());
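+        // Compare with a trailing separator so "/tmpfoo" does not pass as "/tmp".
+        if ($tmp === false || ! str_starts_with($normalized . DIRECTORY_SEPARATOR, $tmp . DIRECTORY_SEPARATOR)) {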
+            throw new RuntimeException("refusing git init outside tmp during tests: {$directory}");
+        }
+    }
+    // ... proceed
+}
+```
+### Step 4: Layer 4 — Debug instrumentation
+Capture context for forensics so the next failure surfaces *why*, not just *that*. Log only when the call is about to hit an irreversible side effect.
+```php
+public function gitInit(string $directory): void
+{
+    Log::debug('about to git init', [
+        'directory' => $directory,
+        'cwd' => getcwd(),
+        'trace' => (new Exception)->getTraceAsString(),
+    ]);
+    // ... proceed
+}
+```
+### Step 5: Verify each layer in isolation
+Try to bypass Layer 1 (call the internal directly) and confirm Layer 2 catches it. Mock the production guard and confirm Layer 3 still refuses. The pattern only earns its name when each layer is independently provable.
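The isolation check in Step 5 can be sketched in Python as a transposition of the PHP layers above; the skill itself only shows PHP. All names here (`create_project`, `initialize_workspace`, the two error classes, the `raises` helper) are hypothetical. Each layer raises a distinct error type, so a test that bypasses Layer 1 can prove Layer 2 caught the bad value.

```python
import os
import tempfile


class EntryValidationError(ValueError):
    """Layer 1: rejected at the public boundary."""


class WorkspaceInvariantError(RuntimeError):
    """Layer 2: rejected by the consuming operation."""


def initialize_workspace(project_dir: str) -> str:
    # Layer 2 re-checks instead of trusting the caller.
    if project_dir == "":
        raise WorkspaceInvariantError("project_dir required for workspace initialization")
    return os.path.join(project_dir, ".workspace")


def create_project(name: str, working_directory: str) -> str:
    # Layer 1 validates the shape of external input.
    if not working_directory.strip():
        raise EntryValidationError("working_directory cannot be empty")
    if not os.path.isdir(working_directory):
        raise EntryValidationError(f"working_directory does not exist: {working_directory}")
    return initialize_workspace(working_directory)


def raises(exc_type: type, fn, *args) -> bool:
    """Return True if fn(*args) raises exc_type; other errors propagate."""
    try:
        fn(*args)
    except exc_type:
        return True
    return False


layer1_catches = raises(EntryValidationError, create_project, "demo", "   ")
# Bypass Layer 1 by calling the internal directly; Layer 2 must still refuse.
layer2_catches = raises(WorkspaceInvariantError, initialize_workspace, "")
happy_path = create_project("demo", tempfile.gettempdir())
```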
+## Gate function — when to stop adding layers
+```
+BEFORE adding the 5th guard:
+  STOP — re-check the data flow.
+  IF the value crosses ≤ 1 module boundary:
+    Use a single boundary check + a value-object invariant. Two layers max.
+  IF every layer would re-implement the same predicate:
+    Hoist the predicate into a value object / type and inject. One check is enough.
+  Layers are for distinct concerns: input shape vs operation invariant
+  vs environment risk vs forensic visibility. Same concern repeated is duplication, not depth.
+```
+## Output format
+1. The four guards (or a documented subset, with the gate-function justification).
+2. Tests that bypass each layer to prove the next layer catches the failure.
+3. One-line note on the data flow that motivated the layering.
+## Gotcha
+- Layers 1 and 2 must reject with **distinct** errors — same error string makes the second guard look like a duplicate.
+- Layer 3 environment checks should fail closed: unknown environment treated as production.
+- Layer 4 instrumentation must not change behavior — no early returns, no mutated state.
+- Test bypasses (in-process mocks) often skip Layer 1 — Layer 2 catches them; do not weaken Layer 2 to silence the test.
+## Do NOT
+- Do NOT replicate Layer 1 inside private methods that only Layer 1 can reach.
+- Do NOT log secrets in Layer 4 — sanitize before `Log::debug`.
+- Do NOT use Layer 3 to gate business logic — environments change, business rules do not.
+- Do NOT add a layer without a failing test that proves the layer was needed.
+## Auto-trigger keywords
+- defense in depth
+- multiple validation layers
+- bug deep in execution
+- structurally impossible
+## Provenance
+- Adopted from: `Microck/ordinary-claude-skills@8f5c83174f7aa683b4ddc7433150471983b93131:skills_all/defense-in-depth/SKILL.md` (MIT, © 2025 Microck).
+- Provenance registry: `agents/contexts/skills-provenance.yml` (entry: `defense-in-depth`).
+- Iron-Law floor: `non-destructive-by-default`, `verify-before-complete`, `skill-quality`.

package/.agent-src/skills/error-handling-patterns/SKILL.md ADDED

@@ -0,0 +1,134 @@
+---
+name: error-handling-patterns
+description: "Use when picking a failure-reporting strategy — exceptions vs Result types, recoverable vs not, retry / circuit-breaker / graceful degradation — decision framework only, catalogues externalized."
+source: package
+status: active
+refresh_trigger: "≥30% of cited upstream pattern catalogues become deprecated, OR a new top-2 ecosystem (Python/JS/PHP/Go/Rust) ships a paradigm-shifting standard error model"
+sunset_criterion: "When the upstream framework docs (Laravel, FastAPI, Express, Axum, Effect-TS) all carry an equivalent in-tree decision framework AND consumer projects no longer cite this skill in PR reviews for two consecutive review cycles."
+---
+# error-handling-patterns
+Decision framework for picking an error-handling strategy. **Catalogues of language-specific code live upstream** (links in § Provenance) — this skill is the predicate, not the pattern library. Sunset-policy compliant: large language-specific catalogues stay in authoritative upstream docs.
+## When to use
+- Designing how a new feature, API, or service reports failure.
+- Reviewing a diff that introduces a new exception class, `Result<T, E>`, or sentinel return.
+- Debugging production noise that traces back to inconsistent error semantics.
+- Choosing between retry, circuit-breaker, fallback, and fail-fast for an external dependency.
+Do NOT use when:
+- You only need the syntax for a `try/catch` in language X — read the upstream language guide directly.
+- The failure is a single-call Laravel validation error — route to [`laravel-validation`](../laravel-validation/SKILL.md).
+- The fix is a one-line null check in existing code — route to [`bug-analyzer`](../bug-analyzer/SKILL.md).
+## Decision framework
+### Step 1 — Classify the failure
+```
+Failure is:
+  caller's fault (bad input, missing auth)         → reject at boundary, structured error
+  expected operational (timeout, 404, rate-limit)  → Result-type / typed return; retry-aware
+  unexpected operational (DB down, OOM, deadlock)  → exception; observability + alert
+  programmer bug (null deref, off-by-one)          → crash early; do not catch
+```
+### Step 2 — Pick the reporting mechanism
+```
+IF failure is an EXPECTED, branchable outcome the caller will route on
+  → Result type / tagged union / typed error return.
+  Forces the caller to handle it; the type system is the proof.
+IF failure is UNEXPECTED and most callers cannot do anything useful
+  → exception, propagated to a single boundary handler.
+  One layer (HTTP, queue, CLI) translates exceptions to user-facing errors.
+IF failure is UNRECOVERABLE (invariant violated, data corruption)
+  → fail loud, fail fast. No catch-and-continue.
+  Log structured context, exit / panic / 500.
+IF the language idiom forces one choice (Go: errors are values; Rust: Result;
+   Python/PHP/JS: exceptions)
+  → follow the idiom. Inventing a foreign mechanism is more cost than the
+    correctness it buys.
+```
+### Step 3 — Pick the resilience strategy
+```
+External call?
+  Idempotent + transient failure mode  → retry with exponential backoff + jitter, cap.
+  Non-idempotent                       → no blind retry; require an idempotency key.
+  Repeated failure across instances    → circuit breaker; open → half-open probe → close.
+  Optional functionality               → graceful degradation (cached / default / null result).
+  Required functionality               → propagate; surface to user with a recovery hint.
+```
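A minimal sketch of the first Step 3 branch (idempotent call, transient failure): retry with exponential backoff, full jitter, and a cap. `TransientError`, `retry`, and the default numbers are assumptions made for illustration; the `sleep` parameter is injectable so the example runs without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are safe to retry."""


def retry(call: Callable[[], T], *, attempts: int = 4, base: float = 0.1,
          cap: float = 2.0, sleep: Callable[[float], None] = time.sleep) -> T:
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return call()
        except TransientError:
            if attempt == attempts - 1:
                raise  # budget exhausted: propagate the last failure
            # Full jitter: uniform delay in [0, min(cap, base * 2**attempt)].
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
    raise AssertionError("unreachable")


calls = {"n": 0}


def flaky() -> str:
    """Fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("timeout")
    return "ok"


delays: list[float] = []
result = retry(flaky, sleep=delays.append)  # record delays instead of sleeping
```

Only `TransientError` is retried; anything else propagates immediately, which keeps programmer bugs out of the retry loop.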
+### Step 4 — Shape the error payload
+Every produced error must carry: `code` (stable string), `message` (human-readable), `cause` (chained), `context` (sanitized inputs), `correlation_id` (request / trace).
+Forbidden: secrets, raw SQL, full stack traces in user-facing surfaces, internal class names leaked through API boundaries.
+### Step 5 — Define the boundary
+Exactly **one** layer translates internal errors to the egress format (HTTP status + body, queue requeue policy, CLI exit code). Any other layer that performs this translation duplicates it, and that duplication is the bug.
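Steps 4 and 5 together might be sketched like this, assuming an HTTP egress. `AppError`, `to_response`, and `SENSITIVE_KEYS` are hypothetical names; the payload fields follow the list in Step 4. Unexpected exceptions map to a generic body so internal class names and messages do not cross the boundary.

```python
from typing import Optional

SENSITIVE_KEYS = {"password", "api_key", "token"}  # assumed deny-list


class AppError(Exception):
    """Hypothetical internal error carrying the Step 4 payload fields."""

    def __init__(self, code: str, message: str, *, status: int = 500,
                 context: Optional[dict] = None) -> None:
        super().__init__(message)
        self.code = code
        self.message = message
        self.status = status
        self.context = context or {}


def to_response(exc: Exception, correlation_id: str) -> tuple[int, dict]:
    """The single boundary: internal error -> (HTTP status, body)."""
    if isinstance(exc, AppError):
        cause = exc.__cause__
        return exc.status, {
            "code": exc.code,
            "message": exc.message,
            "cause": cause.code if isinstance(cause, AppError) else None,
            # Sanitize: drop keys that may hold secrets.
            "context": {k: v for k, v in exc.context.items() if k not in SENSITIVE_KEYS},
            "correlation_id": correlation_id,
        }
    # Unexpected error: reveal nothing about internals.
    return 500, {
        "code": "internal_error",
        "message": "Unexpected error",
        "cause": None,
        "context": {},
        "correlation_id": correlation_id,
    }


status, body = to_response(
    AppError("quota_exceeded", "Monthly quota reached", status=429,
             context={"plan": "free", "api_key": "s3cret"}),
    "req-1",
)
status2, body2 = to_response(KeyError("InternalRepo"), "req-2")
```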
+## Procedure: Apply the framework to a new feature
+1. List failure modes (each external call, each invariant, each user input class).
+2. Run Step 1 against each, write the classification next to it.
+3. Pick reporting mechanism per Step 2; reject combinations the language idiom rejects.
+4. For each external call, run Step 3 and write down the chosen resilience strategy.
+5. Sketch the error payload shape (Step 4) and the single boundary (Step 5).
+6. Hand the sketch to a reviewer **before** coding; cite this skill.
+## Output format
+1. The failure-mode table (mode · classification · mechanism · resilience strategy).
+2. The shared error payload definition (code, message, cause, context, correlation_id).
+3. The single boundary handler (file:line) where internal → egress translation happens.
+4. The retry / circuit-breaker config (attempts, base, jitter, breaker thresholds), if any.
+## Gotcha
+- "Catch everything, log it, return null" silently destroys signal — every catch must either rethrow, translate, or recover with a written reason.
+- Retries on non-idempotent calls are a common cause of production incidents; insist on idempotency keys before allowing retry.
+- Circuit breakers without a half-open probe never close — they degrade to permanent failure.
+- Mixing Result types and exceptions in the same module is worse than picking the wrong one — pick one per module and stay in it.
+- Upstream pattern catalogues drift; trust the link, not memory. Refresh per `refresh_trigger` above.
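The half-open gotcha can be illustrated with a small breaker. This is a sketch under stated assumptions: `CircuitBreaker`, `CircuitOpenError`, the threshold, and the reset window are all hypothetical, and the clock is injectable so the state transitions can be exercised without real waiting.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker refuses a call without trying it."""


class CircuitBreaker:
    def __init__(self, threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means closed

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open")
            # Reset window elapsed: half-open, let this one call probe.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed probe
            raise
        self.failures = 0
        self.opened_at = None  # successful call (or probe) closes the circuit
        return result


def failing() -> str:
    raise RuntimeError("dependency down")


now = [0.0]
breaker = CircuitBreaker(threshold=2, reset_after=10.0, clock=lambda: now[0])
outcomes: list[str] = []
for _ in range(3):
    try:
        breaker.call(failing)
    except CircuitOpenError:
        outcomes.append("rejected")
    except RuntimeError:
        outcomes.append("failed")
now[0] = 11.0  # past the reset window: the next call is the probe
probe = breaker.call(lambda: "ok")
```

Without the elapsed-time branch that admits a probe, `opened_at` would never be cleared and the breaker would stay open permanently.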
+## Do NOT
+- Do NOT introduce a custom error mechanism that fights the language idiom.
+- Do NOT swallow exceptions — every catch has a written purpose.
+- Do NOT leak stack traces, secrets, or internal class names across the boundary.
+- Do NOT retry without backoff + jitter + cap.
+- Do NOT inline language-specific code catalogues into this skill — externalize per Sunset Policy.
+## Auto-trigger keywords
+- error handling strategy
+- exceptions vs result
+- retry pattern
+- circuit breaker
+- graceful degradation
+- error payload shape
+## Provenance
+- Adopted from: `Microck/ordinary-claude-skills@8f5c83174f7aa683b4ddc7433150471983b93131:skills_all/error-handling-patterns/SKILL.md` (MIT, © 2025 Microck) — **Sunset Policy applied**: 636-line source reduced to a ~150-line decision framework; language catalogues externalized to the upstream resources below.
+- Externalized catalogues:
+  - Python: https://docs.python.org/3/tutorial/errors.html · https://docs.python.org/3/library/exceptions.html
+  - PHP / Laravel: https://laravel.com/docs/errors · https://www.php.net/manual/en/language.exceptions.php
+  - JS / TS: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Control_flow_and_error_handling · https://www.typescriptlang.org/docs/handbook/2/narrowing.html
+  - Go: https://go.dev/blog/error-handling-and-go · https://pkg.go.dev/errors
+  - Rust: https://doc.rust-lang.org/book/ch09-00-error-handling.html
+  - Resilience patterns: https://martinfowler.com/bliki/CircuitBreaker.html · https://aws.amazon.com/builders-library/timeouts-retries-and-backoff-with-jitter/
+- Cross-linked: [`defense-in-depth`](../defense-in-depth/SKILL.md), [`laravel-validation`](../laravel-validation/SKILL.md), [`bug-analyzer`](../bug-analyzer/SKILL.md), [`api-design`](../api-design/SKILL.md).
+- Provenance registry: `agents/contexts/skills-provenance.yml` (entry: `error-handling-patterns`).
+- Iron-Law floor: `verify-before-complete`, `skill-quality`, `non-destructive-by-default`.

package/.agent-src/skills/mcp-builder/SKILL.md ADDED

@@ -0,0 +1,108 @@
+---
+name: mcp-builder
+description: "Use when building an MCP server in Python (FastMCP) or Node/TypeScript (MCP SDK) — agent-centric tool design, input schemas, error handling, and the 10-question evaluation harness."
+source: package
+---
+# mcp-builder
+Author MCP servers that LLMs can drive end-to-end. The quality bar is *can the agent finish the workflow*, not *does the endpoint return 200*. This skill is the **server-author** counterpart to the existing [`mcp`](../mcp/SKILL.md) consumer skill.
+## When to use
+- Wrapping an external API or service as MCP tools for an LLM client.
+- Adding tools to an existing MCP server (Python FastMCP or TypeScript SDK).
+- Reviewing an MCP server before shipping — Phase 4 evaluation gate below.
+Do NOT use when:
+- You only need to *call* an MCP server — route to [`mcp`](../mcp/SKILL.md).
+- The integration belongs in the host process — write a regular service, not an MCP server.
+- The "server" wraps one endpoint with no workflow — a CLI wrapper is enough.
+## Procedure: Four phases, one tool at a time
+### Phase 1 — Research & plan
+1. **Agent-centric design**. Tools encode *workflows*, not raw endpoints. Consolidate (`schedule_event` checks availability **and** creates the event). Default to human-readable names over IDs. Errors are educational, not just diagnostic ("retry with `filter='active_only'` to reduce results").
+2. **Load the protocol**. Fetch `https://modelcontextprotocol.io/llms-full.txt` once into context — the canonical spec.
+3. **Load the SDK README** for the chosen language:
+   - Python: `https://raw.githubusercontent.com/modelcontextprotocol/python-sdk/main/README.md`
+   - TypeScript: `https://raw.githubusercontent.com/modelcontextprotocol/typescript-sdk/main/README.md`
+4. **Read the target service's API docs in full** — auth, rate limits, pagination, error codes, schemas. Skipping this produces incomplete mocks (see [`testing-anti-patterns`](../testing-anti-patterns/SKILL.md) § Anti-Pattern 4).
+5. **Write the plan**: tool list with priority, shared utilities (request helper, pagination, formatter), input/output schemas, error strategy, response-detail levels (concise vs detailed), character limits (default 25 000 tokens).
+### Phase 2 — Implement
+1. **Project layout**. Python: single `.py` or modular package; Pydantic v2 with `model_config`. TypeScript: standard `package.json` + `tsconfig.json` strict mode; Zod schemas with `.strict()`.
+2. **Shared utilities first**. API request helper with retry/timeout, error formatter, JSON-vs-Markdown response builder, pagination cursor handling, auth/token cache.
+3. **Per tool**:
+   - Input schema (Pydantic / Zod) with constraints, descriptions, and *examples*.
+   - One-line summary + detailed docstring covering purpose, parameters, return shape, when-to-use, when-NOT-to-use, error handling.
+   - Tool annotations: `readOnlyHint`, `destructiveHint`, `idempotentHint`, `openWorldHint`.
+   - Async/await for all I/O. Honor pagination. Truncate to the character limit and signal truncation in the response.
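The "truncate and signal truncation" rule might look like the helper below. It is plain Python and does not use any MCP SDK call; `CHARACTER_LIMIT`, `truncate_response`, and the response keys are assumptions chosen for the sketch, not names from the SDK or the skill.

```python
CHARACTER_LIMIT = 25_000  # hypothetical value; set it from your own plan


def truncate_response(text: str, limit: int = CHARACTER_LIMIT) -> dict:
    """Cap a tool response and tell the agent when content was cut."""
    if len(text) <= limit:
        return {"text": text, "truncated": False}
    return {
        "text": text[:limit],
        "truncated": True,
        # Educational hint: tell the agent how to get the rest.
        "hint": f"Output cut at {limit} of {len(text)} characters; "
                "narrow the query or request the next page.",
    }


short = truncate_response("hello", limit=10)
cut = truncate_response("x" * 50, limit=10)
```

Returning an explicit flag plus a hint keeps the agent from treating a cut payload as the complete answer.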
+### Phase 3 — Review & test
+1. **Code-quality pass**: DRY across tools, shared helpers extracted, consistent response shapes, all external calls have error handling, full type coverage.
+2. **Build & syntax**:
+   - Python: `python -m py_compile server.py`.
+   - TypeScript: `npm run build`; verify `dist/index.js`.
+3. **Run the server safely**. MCP servers block on stdio. Either run inside `tmux` and drive from the harness, or wrap with `timeout 5s python server.py` for a smoke check. Do NOT block your own session by running it in-process.
+### Phase 4 — Evaluations (10-question harness)
+Each evaluation is a question the agent must answer using only the new tools.
+Requirements per question — **independent**, **read-only**, **complex** (multiple tool calls), **realistic**, **verifiable** (string-comparable answer), **stable** (answer does not drift over time).
+```xml
+<evaluation>
+  <qa_pair>
+    <question>...</question>
+    <answer>...</answer>
+  </qa_pair>
+  <!-- 9 more -->
+</evaluation>
+```
+Process: enumerate the tools, explore READ-ONLY data, draft 10 questions, **solve each yourself first** to confirm the answer is reachable and stable.
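Scoring the evaluation file can be sketched with the standard library alone. The XML shape matches the block above; `load_pairs`, `score`, and the `answer_fn` callable (a stand-in for "ask the agent") are hypothetical, and the sample questions are invented for the demonstration.

```python
import xml.etree.ElementTree as ET
from typing import Callable

SAMPLE = """
<evaluation>
  <qa_pair>
    <question>How many open issues carry the label "bug"?</question>
    <answer>7</answer>
  </qa_pair>
  <qa_pair>
    <question>Which user created the oldest project?</question>
    <answer>alice</answer>
  </qa_pair>
</evaluation>
"""


def load_pairs(xml_text: str) -> list[tuple[str, str]]:
    root = ET.fromstring(xml_text)
    return [
        (qa.findtext("question", "").strip(), qa.findtext("answer", "").strip())
        for qa in root.iter("qa_pair")
    ]


def score(pairs: list[tuple[str, str]], answer_fn: Callable[[str], str]) -> float:
    """Fraction of questions whose answer matches by exact string comparison."""
    correct = sum(1 for question, expected in pairs
                  if answer_fn(question).strip() == expected)
    return correct / len(pairs)


pairs = load_pairs(SAMPLE)
# Stand-in agent that only knows the first answer.
accuracy = score(pairs, lambda q: "7" if "bug" in q else "unknown")
```

Exact string comparison is why each answer has to be verifiable and stable, as the requirements above state.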
+## Output format
+1. The server source plus the 10-question evaluation XML.
+2. A README with: install, env vars, transport mode (stdio / sse / http), example tool call.
+3. A line in `agents/contexts/skills-provenance.yml` if the server was forked from an upstream, or a note that it was authored from scratch.
+## Gotcha
+- "Wrap every endpoint" is the failure mode — agents cannot orchestrate 60 thin tools as well as 12 workflow tools.
+- Returning the full upstream payload blows the agent's context. Default to a *concise* shape with an opt-in *detailed* mode.
+- Pydantic / Zod descriptions are the *only* documentation the LLM sees at runtime — write them like usage docs, not comments.
+- A server that hangs your session usually means stdio transport ran in the main process — move it under `tmux` or use a `timeout`.
+- Inflated token claims are not credible without an evaluation harness — Phase 4 is the validation gate, not optional.
+## Do NOT
+- Do NOT mirror REST routes 1:1.
+- Do NOT use `any` (TypeScript) or untyped `dict` (Python) in tool I/O.
+- Do NOT skip the 10-question evaluation — Phase 4 IS the quality bar.
+- Do NOT run the MCP server in your main process during testing — it will block.
+- Do NOT log tokens, API keys, or full request bodies — sanitize before logging.
+## Auto-trigger keywords
+- mcp server
+- model context protocol
+- fastmcp
+- mcp builder
+- agent-centric tools
+## Provenance
+- Upstream protocol: https://modelcontextprotocol.io
+- Upstream SDKs: https://github.com/modelcontextprotocol/python-sdk · https://github.com/modelcontextprotocol/typescript-sdk
+- Adopted from: `Microck/ordinary-claude-skills@8f5c83174f7aa683b4ddc7433150471983b93131:skills_all/mcp-builder/SKILL.md` (MIT, © 2025 Microck) — external `./reference/*.md` file links replaced with inline guidance + upstream URLs.
+- Cross-linked: [`mcp`](../mcp/SKILL.md), [`testing-anti-patterns`](../testing-anti-patterns/SKILL.md), [`api-design`](../api-design/SKILL.md).
+- Provenance registry: `agents/contexts/skills-provenance.yml` (entry: `mcp-builder`).
+- Iron-Law floor: `verify-before-complete`, `tool-safety`, `skill-quality`.