npm - @event4u/agent-config - Versions diffs - 1.28.0 → 1.31.0 - Mend

@event4u/agent-config 1.28.0 → 1.31.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (52)

package/.agent-src/commands/agents/audit.md +101 -197
package/.agent-src/commands/{copilot-agents → agents}/init.md +18 -10
package/.agent-src/commands/agents/optimize.md +181 -0
package/.agent-src/commands/agents.md +19 -12
package/.agent-src/commands/optimize/agents-dir.md +111 -0
package/.agent-src/commands/optimize.md +10 -8
package/.agent-src/contexts/communication/rules-auto/guidelines-mechanics.md +6 -0
package/.agent-src/contexts/communication/rules-auto/slash-command-routing-policy-mechanics.md +2 -3
package/.agent-src/contexts/contracts/agents-md-anatomy.md +132 -0
package/.agent-src/skills/agents-md-thin-root/SKILL.md +8 -1
package/.agent-src/skills/async-python-patterns/SKILL.md +147 -0
package/.agent-src/skills/command-writing/SKILL.md +49 -0
package/.agent-src/skills/copilot-agents-optimization/SKILL.md +3 -3
package/.agent-src/skills/defense-in-depth/SKILL.md +152 -0
package/.agent-src/skills/error-handling-patterns/SKILL.md +134 -0
package/.agent-src/skills/mcp-builder/SKILL.md +108 -0
package/.agent-src/skills/prompt-engineering-patterns/SKILL.md +145 -0
package/.agent-src/skills/repomix-packer/SKILL.md +135 -0
package/.agent-src/skills/roadmap-writing/SKILL.md +9 -0
package/.agent-src/skills/rule-writing/SKILL.md +21 -0
package/.agent-src/skills/secrets-management/SKILL.md +142 -0
package/.agent-src/skills/skill-writing/SKILL.md +19 -0
package/.agent-src/skills/testing-anti-patterns/SKILL.md +152 -0
package/.agent-src/templates/AGENTS.md +9 -10
package/.claude-plugin/marketplace.json +12 -7
package/AGENTS.md +1 -2
package/CHANGELOG.md +113 -0
package/CONTRIBUTING.md +90 -0
package/README.md +3 -3
package/docs/architecture.md +3 -3
package/docs/catalog.md +19 -13
package/docs/contracts/command-clusters.md +20 -3
package/docs/contracts/file-ownership-matrix.json +511 -48
package/docs/contracts/package-self-orientation.md +1 -1
package/docs/getting-started.md +1 -1
package/docs/guidelines/code-clarity.md +95 -0
package/docs/guidelines/php/general.md +8 -0
package/docs/guidelines/php/php-coding-patterns.md +1 -0
package/docs/skills-catalog.md +27 -3
package/llms.txt +26 -2
package/package.json +1 -1
package/scripts/chat_history.py +166 -36
package/scripts/check_command_count_messaging.py +12 -3
package/scripts/check_portability.py +1 -0
package/scripts/lint_agents_md.py +33 -0
package/scripts/release.py +77 -2
package/scripts/skill_linter.py +10 -3
package/.agent-src/commands/agents/cleanup.md +0 -194
package/.agent-src/commands/agents/prepare.md +0 -141
package/.agent-src/commands/copilot-agents/optimize.md +0 -255
package/.agent-src/commands/copilot-agents.md +0 -44
package/.agent-src/commands/optimize/agents.md +0 -144

package/.agent-src/skills/copilot-agents-optimization/SKILL.md CHANGED

@@ -166,7 +166,7 @@ prevent. Fix or remove those BEFORE any dedup/compress work — there's
 no point deduplicating content that is about to be rewritten.
 When the drift is severe (whole sections are wrong), recommend
-`/copilot-agents-init` to scaffold a clean replacement rather than
+`/agents init` to scaffold a clean replacement rather than
 patching forever.
 ## agent-config Path Conventions — Preserve, Don't "Fix"
@@ -189,7 +189,7 @@ it as "redundant" and never trim its bullets. The patterns it covers:
 If the consumer project's `copilot-instructions.md` is missing the
 section, **add it** during optimization using the canonical block
 from `.augment/templates/copilot-instructions.md`. Surfaces include
-`/copilot-agents init` and `/copilot-agents optimize`.
+`/agents init` and `/agents optimize`.
 ## Optimization Checklist
@@ -212,7 +212,7 @@ When optimizing either file, check:
 ## Related
-- **Command:** `/copilot-agents-optimize`
+- **Command:** `/agents optimize`
 - **Skill:** `copilot-config` — Copilot behavior and PR review patterns
 - **Skill:** `agent-docs-writing` — documentation hierarchy
 - **Context:** `augment-infrastructure.md` — full `.augment/` overview

package/.agent-src/skills/defense-in-depth/SKILL.md ADDED

@@ -0,0 +1,152 @@
+---
+name: defense-in-depth
+description: "Use when validation needs entry, business-logic, environment, and instrumentation guards so a bad value cannot reach the failure point — turns a local bug fix into a structural one."
+source: package
+---
+# defense-in-depth
+Validate at every layer the value passes through. Fixing the bug at one layer is locally sufficient and globally fragile — the next refactor, code path, mock, or platform edge case will rediscover it. Four-layer validation makes the bug *structurally* impossible.
+## When to use
+- Bug fix where invalid data caused failure several frames deep.
+- New entry point that funnels external input into existing internals.
+- Refactor that adds a second caller to a previously single-caller routine.
+- Test setup that shortcuts production guards (mocks bypassing entry validation).
+Do NOT use when:
+- Pure formatting / style change — no data flow, no layers to defend.
+- Boundary validation alone is correct (e.g. immutable value object with constructor invariant) — route to [`laravel-validation`](../laravel-validation/SKILL.md).
+- The fix belongs at a single architectural seam — adding three more guards is over-engineering. Use the gate function below to stop early.
+## Procedure: Apply the four-layer pattern
+### Step 0: Analyze the data flow before adding guards
+1. Identify where the bad value originates (test fixture, request body, env var, config).
+2. List every function that receives the value before the failure point.
+3. Mark which functions are reachable from production paths and which only from tests.
+### Step 1: Layer 1 — Entry-point validation
+Reject obviously invalid input at the API / route / command boundary. In Laravel this is FormRequest rules; in pure PHP services it is the public method on the service.
+```php
+public function createProject(string $name, string $workingDirectory): Project
+{
+    if (trim($workingDirectory) === '') {
+        throw new InvalidArgumentException('workingDirectory cannot be empty');
+    }
+    if (! is_dir($workingDirectory)) {
+        throw new InvalidArgumentException("workingDirectory does not exist: {$workingDirectory}");
+    }
+    if (! is_writable($workingDirectory)) {
+        throw new InvalidArgumentException("workingDirectory is not writable: {$workingDirectory}");
+    }
+    // ... proceed
+}
+```
+### Step 2: Layer 2 — Business-logic validation
+Verify the value still makes sense for the operation that consumes it. Different code paths can reach the same internal — re-check rather than trust the caller.
+```php
+public function initializeWorkspace(string $projectDir, string $sessionId): Workspace
+{
+    if ($projectDir === '') {
+        throw new RuntimeException('projectDir required for workspace initialization');
+    }
+    // ... proceed
+}
+```
+### Step 3: Layer 3 — Environment guards
+Refuse dangerous operations in the wrong context — most often: running a destructive command outside a test temp dir while the test suite is active.
+```php
+public function gitInit(string $directory): void
+{
+    if (app()->environment('testing')) {
+        $normalized = realpath($directory) ?: $directory;
+        $tmp = realpath(sys_get_temp_dir());
+        if ($tmp === false || ! str_starts_with($normalized, $tmp)) {
+            throw new RuntimeException("refusing git init outside tmp during tests: {$directory}");
+        }
+    }
+    // ... proceed
+}
+```
+### Step 4: Layer 4 — Debug instrumentation
+Capture context for forensics so the next failure surfaces *why*, not just *that*. Log only when the call is about to hit an irreversible side effect.
+```php
+public function gitInit(string $directory): void
+{
+    Log::debug('about to git init', [
+        'directory' => $directory,
+        'cwd' => getcwd(),
+        'trace' => (new Exception)->getTraceAsString(),
+    ]);
+    // ... proceed
+}
+```
+### Step 5: Verify each layer in isolation
+Try to bypass Layer 1 (call the internal directly) and confirm Layer 2 catches it. Mock the production guard and confirm Layer 3 still refuses. The pattern only earns its name when each layer is independently provable.
+## Gate function — when to stop adding layers
+```
+BEFORE adding the 5th guard:
+  STOP — re-check the data flow.
+  IF the value crosses ≤ 1 module boundary:
+    Use a single boundary check + a value-object invariant. Two layers max.
+  IF every layer would re-implement the same predicate:
+    Hoist the predicate into a value object / type and inject. One check is enough.
+  Layers are for distinct concerns: input shape vs operation invariant
+  vs environment risk vs forensic visibility. Same concern repeated is duplication, not depth.
+```
+## Output format
+1. The four guards (or a documented subset, with the gate-function justification).
+2. Tests that bypass each layer to prove the next layer catches the failure.
+3. One-line note on the data flow that motivated the layering.
+## Gotcha
+- Layers 1 and 2 must reject with **distinct** errors — same error string makes the second guard look like a duplicate.
+- Layer 3 environment checks should fail closed: unknown environment treated as production.
+- Layer 4 instrumentation must not change behavior — no early returns, no mutated state.
+- Test bypasses (in-process mocks) often skip Layer 1 — Layer 2 catches them; do not weaken Layer 2 to silence the test.
+## Do NOT
+- Do NOT replicate Layer 1 inside private methods that only Layer 1 can reach.
+- Do NOT log secrets in Layer 4 — sanitize before `Log::debug`.
+- Do NOT use Layer 3 to gate business logic — environments change, business rules do not.
+- Do NOT add a layer without a failing test that proves the layer was needed.
+## Auto-trigger keywords
+- defense in depth
+- multiple validation layers
+- bug deep in execution
+- structurally impossible
+## Provenance
+- Adopted from: `Microck/ordinary-claude-skills@8f5c83174f7aa683b4ddc7433150471983b93131:skills_all/defense-in-depth/SKILL.md` (MIT, © 2025 Microck).
+- Provenance registry: `agents/contexts/skills-provenance.yml` (entry: `defense-in-depth`).
+- Iron-Law floor: `non-destructive-by-default`, `verify-before-complete`, `skill-quality`.
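
Reviewer sketch (not part of the package diff): the four layers and the Step 5 isolation check from the added `defense-in-depth` skill could look roughly like this in Python. Every name here (`create_project`, `initialize_workspace`, `git_init`, the three exception classes, the `testing` flag) is hypothetical and only mirrors the PHP examples above; the real side effect is replaced by a return value. One deliberate difference: the temp-dir check uses `os.path.commonpath`, which compares whole path components, so a sibling such as `/tmpfoo` is not treated as being inside `/tmp`.

```python
import logging
import os
import tempfile

logger = logging.getLogger("defense_in_depth_sketch")


class EntryValidationError(ValueError):
    """Layer 1 rejection (hypothetical name)."""


class WorkspaceError(RuntimeError):
    """Layer 2 rejection; distinct from Layer 1 so tests can tell them apart."""


class EnvironmentGuardError(RuntimeError):
    """Layer 3 rejection."""


def create_project(working_directory: str, *, testing: bool = True) -> str:
    # Layer 1: entry-point validation at the public boundary.
    if working_directory.strip() == "":
        raise EntryValidationError("working_directory cannot be empty")
    if not os.path.isdir(working_directory):
        raise EntryValidationError(
            f"working_directory does not exist: {working_directory}"
        )
    return initialize_workspace(working_directory, testing=testing)


def initialize_workspace(project_dir: str, *, testing: bool = True) -> str:
    # Layer 2: re-check the invariant this operation needs, because other
    # callers (or tests) can reach this function without passing Layer 1.
    if project_dir == "":
        raise WorkspaceError("project_dir required for workspace initialization")
    return git_init(project_dir, testing=testing)


def git_init(directory: str, *, testing: bool = True) -> str:
    normalized = os.path.realpath(directory)
    # Layer 3: while tests run, refuse to touch anything outside the temp dir.
    if testing:
        tmp = os.path.realpath(tempfile.gettempdir())
        if os.path.commonpath([normalized, tmp]) != tmp:
            raise EnvironmentGuardError(
                f"refusing git init outside tmp during tests: {directory}"
            )
    # Layer 4: record context before the side effect; behavior is unchanged.
    logger.debug("about to git init: directory=%s cwd=%s", normalized, os.getcwd())
    return normalized  # stand-in for the real side effect
```

Calling `initialize_workspace("")` directly skips Layer 1 and should still fail with `WorkspaceError`, which is the kind of bypass test Step 5 asks for.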

package/.agent-src/skills/error-handling-patterns/SKILL.md ADDED

@@ -0,0 +1,134 @@
+---
+name: error-handling-patterns
+description: "Use when picking a failure-reporting strategy — exceptions vs Result types, recoverable vs not, retry / circuit-breaker / graceful degradation — decision framework only, catalogues externalized."
+source: package
+status: active
+refresh_trigger: "≥30% of cited upstream pattern catalogues become deprecated, OR a new top-2 ecosystem (Python/JS/PHP/Go/Rust) ships a paradigm-shifting standard error model"
+sunset_criterion: "When the upstream framework docs (Laravel, FastAPI, Express, Axum, Effect-TS) all carry an equivalent in-tree decision framework AND consumer projects no longer cite this skill in PR reviews for two consecutive review cycles."
+---
+# error-handling-patterns
+Decision framework for picking an error-handling strategy. **Catalogues of language-specific code live upstream** (links in § Provenance) — this skill is the predicate, not the pattern library. Sunset-policy compliant: large language-specific catalogues stay in authoritative upstream docs.
+## When to use
+- Designing how a new feature, API, or service reports failure.
+- Reviewing a diff that introduces a new exception class, `Result<T, E>`, or sentinel return.
+- Debugging production noise that traces back to inconsistent error semantics.
+- Choosing between retry, circuit-breaker, fallback, and fail-fast for an external dependency.
+Do NOT use when:
+- You only need the syntax for a `try/catch` in language X — read the upstream language guide directly.
+- The failure is a single-call Laravel validation error — route to [`laravel-validation`](../laravel-validation/SKILL.md).
+- The fix is a one-line null check in existing code — route to [`bug-analyzer`](../bug-analyzer/SKILL.md).
+## Decision framework
+### Step 1 — Classify the failure
+```
+Failure is:
+  caller's fault (bad input, missing auth)         → reject at boundary, structured error
+  expected operational (timeout, 404, rate-limit)  → Result-type / typed return; retry-aware
+  unexpected operational (DB down, OOM, deadlock)  → exception; observability + alert
+  programmer bug (null deref, off-by-one)          → crash early; do not catch
+```
+### Step 2 — Pick the reporting mechanism
+```
+IF failure is an EXPECTED, branchable outcome the caller will route on
+  → Result type / tagged union / typed error return.
+  Forces the caller to handle it; the type system is the proof.
+IF failure is UNEXPECTED and most callers cannot do anything useful
+  → exception, propagated to a single boundary handler.
+  One layer (HTTP, queue, CLI) translates exceptions to user-facing errors.
+IF failure is UNRECOVERABLE (invariant violated, data corruption)
+  → fail loud, fail fast. No catch-and-continue.
+  Log structured context, exit / panic / 500.
+IF the language idiom forces one choice (Go: errors are values; Rust: Result;
+   Python/PHP/JS: exceptions)
+  → follow the idiom. Inventing a foreign mechanism is more cost than the
+    correctness it buys.
+```
+### Step 3 — Pick the resilience strategy
+```
+External call?
+  Idempotent + transient failure mode  → retry with exponential backoff + jitter, cap.
+  Non-idempotent                       → no blind retry; require an idempotency key.
+  Repeated failure across instances    → circuit breaker; open → half-open probe → close.
+  Optional functionality               → graceful degradation (cached / default / null result).
+  Required functionality               → propagate; surface to user with a recovery hint.
+```
+### Step 4 — Shape the error payload
+Every produced error must carry: `code` (stable string), `message` (human-readable), `cause` (chained), `context` (sanitized inputs), `correlation_id` (request / trace).
+Forbidden: secrets, raw SQL, full stack traces in user-facing surfaces, internal class names leaked through API boundaries.
+### Step 5 — Define the boundary
+Exactly **one** layer translates internal errors to the egress format (HTTP status + body, queue requeue policy, CLI exit code). Anywhere else doing this duplication is the bug.
+## Procedure: Apply the framework to a new feature
+1. **Inspect** the feature surface — identify every failure mode (each external call, each invariant, each user input class) and write it down.
+2. Run Step 1 of the decision framework against each entry; write the classification next to it.
+3. Pick reporting mechanism per Step 2; reject combinations the language idiom rejects.
+4. For each external call, run Step 3 and write down the chosen resilience strategy.
+5. Sketch the error payload shape (Step 4) and the single boundary (Step 5).
+6. Hand the sketch to a reviewer **before** coding; cite this skill.
+## Output format
+1. The failure-mode table (mode · classification · mechanism · resilience strategy).
+2. The shared error payload definition (code, message, cause, context, correlation_id).
+3. The single boundary handler (file:line) where internal → egress translation happens.
+4. The retry / circuit-breaker config (attempts, base, jitter, breaker thresholds), if any.
+## Gotcha
+- "Catch everything, log it, return null" silently destroys signal — every catch must either rethrow, translate, or recover with a written reason.
+- Retries on non-idempotent calls are the second-most-common production incident; insist on idempotency keys before allowing retry.
+- Circuit breakers without a half-open probe never close — they degrade to permanent failure.
+- Mixing Result types and exceptions in the same module is worse than picking the wrong one — pick one per module and stay in it.
+- Upstream pattern catalogues drift; trust the link, not memory. Refresh per `refresh_trigger` above.
+## Do NOT
+- Do NOT introduce a custom error mechanism that fights the language idiom.
+- Do NOT swallow exceptions — every catch has a written purpose.
+- Do NOT leak stack traces, secrets, or internal class names across the boundary.
+- Do NOT retry without backoff + jitter + cap.
+- Do NOT inline language-specific code catalogues into this skill — externalize per Sunset Policy.
+## Auto-trigger keywords
+- error handling strategy
+- exceptions vs result
+- retry pattern
+- circuit breaker
+- graceful degradation
+- error payload shape
+## Provenance
+- Adopted from: `Microck/ordinary-claude-skills@8f5c83174f7aa683b4ddc7433150471983b93131:skills_all/error-handling-patterns/SKILL.md` (MIT, © 2025 Microck) — **Sunset Policy applied**: 636-line source reduced to a ~150-line decision framework; language catalogues externalized to the upstream resources below.
+- Externalized catalogues:
+  - Python: https://docs.python.org/3/tutorial/errors.html · https://docs.python.org/3/library/exceptions.html
+  - PHP / Laravel: https://laravel.com/docs/errors · https://www.php.net/manual/en/language.exceptions.php
+  - JS / TS: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Control_flow_and_error_handling · https://www.typescriptlang.org/docs/handbook/2/narrowing.html
+  - Go: https://go.dev/blog/error-handling-and-go · https://pkg.go.dev/errors
+  - Rust: https://doc.rust-lang.org/book/ch09-00-error-handling.html
+  - Resilience patterns: https://martinfowler.com/bliki/CircuitBreaker.html · https://aws.amazon.com/builders-library/timeouts-retries-and-backoff-with-jitter/
+- Cross-linked: [`defense-in-depth`](../defense-in-depth/SKILL.md), [`laravel-validation`](../laravel-validation/SKILL.md), [`bug-analyzer`](../bug-analyzer/SKILL.md), [`api-design`](../api-design/SKILL.md).
+- Provenance registry: `agents/contexts/skills-provenance.yml` (entry: `error-handling-patterns`).
+- Iron-Law floor: `verify-before-complete`, `skill-quality`, `non-destructive-by-default`.
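
Reviewer sketch (not part of the package diff): Step 3 of the added `error-handling-patterns` skill names two resilience strategies, retry with exponential backoff plus jitter and a cap, and a circuit breaker with a half-open probe. A minimal Python illustration follows. `TransientError`, `retry_with_backoff`, `CircuitBreaker`, and all default thresholds are assumptions made for this sketch, not values taken from the skill; the sleep function, random source, and clock are injectable so the behavior can be checked deterministically.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are safe to retry."""


class CircuitOpenError(Exception):
    """Raised when the breaker fails fast instead of calling the dependency."""


def retry_with_backoff(call, *, attempts=4, base=0.1, cap=2.0,
                       sleep=time.sleep, rng=random.random):
    """Retry an idempotent call on TransientError with capped, jittered backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except TransientError:
            if attempt == attempts - 1:
                raise  # attempts exhausted: propagate to the caller
            delay = min(cap, base * 2 ** attempt)  # exponential growth, capped
            sleep(rng() * delay)  # "full jitter": uniform in [0, delay)


class CircuitBreaker:
    """Closed -> open after repeated failures -> half-open probe -> closed."""

    def __init__(self, *, failure_threshold=3, reset_after=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_after:
            return "half-open"  # cool-down elapsed: allow one probe call
        return "open"

    def call(self, fn):
        state = self.state
        if state == "open":
            raise CircuitOpenError("circuit open; failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed probe, or too many failures, (re)opens the circuit.
            if state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Without the half-open branch in `state`, the breaker would stay open forever, which is the gotcha the skill calls out.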

package/.agent-src/skills/mcp-builder/SKILL.md ADDED

@@ -0,0 +1,108 @@
+---
+name: mcp-builder
+description: "Use when building an MCP server in Python (FastMCP) or Node/TypeScript (MCP SDK) — agent-centric tool design, input schemas, error handling, and the 10-question evaluation harness."
+source: package
+---
+# mcp-builder
+Author MCP servers that LLMs can drive end-to-end. The quality bar is *can the agent finish the workflow*, not *does the endpoint return 200*. This skill is the **server-author** counterpart to the existing [`mcp`](../mcp/SKILL.md) consumer skill.
+## When to use
+- Wrapping an external API or service as MCP tools for an LLM client.
+- Adding tools to an existing MCP server (Python FastMCP or TypeScript SDK).
+- Reviewing an MCP server before shipping — Phase 4 evaluation gate below.
+Do NOT use when:
+- You only need to *call* an MCP server — route to [`mcp`](../mcp/SKILL.md).
+- The integration belongs in the host process — write a regular service, not an MCP server.
+- The "server" wraps one endpoint with no workflow — a CLI wrapper is enough.
+## Procedure: Four phases, one tool at a time
+### Phase 1 — Research & plan
+1. **Agent-centric design**. Tools encode *workflows*, not raw endpoints. Consolidate (`schedule_event` checks availability **and** creates the event). Default to human-readable names over IDs. Errors are educational, not just diagnostic ("retry with `filter='active_only'` to reduce results").
+2. **Load the protocol**. Fetch `https://modelcontextprotocol.io/llms-full.txt` once into context — the canonical spec.
+3. **Load the SDK README** for the chosen language:
+   - Python: `https://raw.githubusercontent.com/modelcontextprotocol/python-sdk/main/README.md`
+   - TypeScript: `https://raw.githubusercontent.com/modelcontextprotocol/typescript-sdk/main/README.md`
+4. **Read the target service's API docs in full** — auth, rate limits, pagination, error codes, schemas. Skipping this produces incomplete mocks (see [`testing-anti-patterns`](../testing-anti-patterns/SKILL.md) § Anti-Pattern 4).
+5. **Write the plan**: tool list with priority, shared utilities (request helper, pagination, formatter), input/output schemas, error strategy, response-detail levels (concise vs detailed), character limits (default 25 000 tokens).
+### Phase 2 — Implement
+1. **Project layout**. Python: single `.py` or modular package; Pydantic v2 with `model_config`. TypeScript: standard `package.json` + `tsconfig.json` strict mode; Zod schemas with `.strict()`.
+2. **Shared utilities first**. API request helper with retry/timeout, error formatter, JSON-vs-Markdown response builder, pagination cursor handling, auth/token cache.
+3. **Per tool**:
+   - Input schema (Pydantic / Zod) with constraints, descriptions, and *examples*.
+   - One-line summary + detailed docstring covering purpose, parameters, return shape, when-to-use, when-NOT-to-use, error handling.
+   - Tool annotations: `readOnlyHint`, `destructiveHint`, `idempotentHint`, `openWorldHint`.
+   - Async/await for all I/O. Honor pagination. Truncate to the character limit and signal truncation in the response.
+### Phase 3 — Review & test
+1. **Code-quality pass**: DRY across tools, shared helpers extracted, consistent response shapes, all external calls have error handling, full type coverage.
+2. **Build & syntax**:
+   - Python: `python -m py_compile server.py`.
+   - TypeScript: `npm run build`; verify `dist/index.js`.
+3. **Run the server safely**. MCP servers block on stdio. Either run inside `tmux` and drive from the harness, or wrap with `timeout 5s python server.py` for a smoke check. Do NOT block your own session by running it in-process.
+### Phase 4 — Evaluations (10-question harness)
+Each evaluation is a question the agent must answer using only the new tools.
+Requirements per question — **independent**, **read-only**, **complex** (multiple tool calls), **realistic**, **verifiable** (string-comparable answer), **stable** (answer does not drift over time).
+```xml
+<evaluation>
+  <qa_pair>
+    <question>...</question>
+    <answer>...</answer>
+  </qa_pair>
+  <!-- 9 more -->
+</evaluation>
+```
+Process: enumerate the tools, explore READ-ONLY data, draft 10 questions, **solve each yourself first** to confirm the answer is reachable and stable.
+## Output format
+1. The server source plus the 10-question evaluation XML.
+2. A README with: install, env vars, transport mode (stdio / sse / http), example tool call.
+3. A line in `agents/contexts/skills-provenance.yml` if the server was forked from an upstream, or a note that it was authored from scratch.
+## Gotcha
+- "Wrap every endpoint" is the failure mode — agents cannot orchestrate 60 thin tools as well as 12 workflow tools.
+- Returning the full upstream payload blows the agent's context. Default to a *concise* shape with an opt-in *detailed* mode.
+- Pydantic / Zod descriptions are the *only* documentation the LLM sees at runtime — write them like usage docs, not comments.
+- A server that hangs your session usually means stdio transport ran in the main process — move it under `tmux` or use a `timeout`.
+- Inflated token claims are not credible without an evaluation harness — Phase 4 is the validation gate, not optional.
+## Do NOT
+- Do NOT mirror REST routes 1:1.
+- Do NOT use `any` (TypeScript) or untyped `dict` (Python) in tool I/O.
+- Do NOT skip the 10-question evaluation — Phase 4 IS the quality bar.
+- Do NOT run the MCP server in your main process during testing — it will block.
+- Do NOT log tokens, API keys, or full request bodies — sanitize before logging.
+## Auto-trigger keywords
+- mcp server
+- model context protocol
+- fastmcp
+- mcp builder
+- agent-centric tools
+## Provenance
+- Upstream protocol: https://modelcontextprotocol.io
+- Upstream SDKs: https://github.com/modelcontextprotocol/python-sdk · https://github.com/modelcontextprotocol/typescript-sdk
+- Adopted from: `Microck/ordinary-claude-skills@8f5c83174f7aa683b4ddc7433150471983b93131:skills_all/mcp-builder/SKILL.md` (MIT, © 2025 Microck) — external `./reference/*.md` file links replaced with inline guidance + upstream URLs.
+- Cross-linked: [`mcp`](../mcp/SKILL.md), [`testing-anti-patterns`](../testing-anti-patterns/SKILL.md), [`api-design`](../api-design/SKILL.md).
+- Provenance registry: `agents/contexts/skills-provenance.yml` (entry: `mcp-builder`).
+- Iron-Law floor: `verify-before-complete`, `tool-safety`, `skill-quality`.
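
Reviewer sketch (not part of the package diff): Phase 4 of the added `mcp-builder` skill requires answers that are "string-comparable". One way to score the evaluation XML shown above is sketched here with the Python standard library only. `load_qa_pairs`, `score`, and the `ask_agent` callable are hypothetical; in practice `ask_agent` would drive an LLM client connected to the MCP server, which this sketch does not attempt.

```python
import xml.etree.ElementTree as ET


def load_qa_pairs(xml_text):
    """Parse <evaluation><qa_pair>...</qa_pair></evaluation> into (question, answer) tuples."""
    root = ET.fromstring(xml_text)
    pairs = []
    for qa in root.findall("qa_pair"):
        question = (qa.findtext("question") or "").strip()
        answer = (qa.findtext("answer") or "").strip()
        pairs.append((question, answer))
    return pairs


def score(pairs, ask_agent):
    """Return (number passed, per-question results) using exact string comparison."""
    results = []
    for question, expected in pairs:
        got = ask_agent(question).strip()
        results.append((question, got == expected))
    passed = sum(1 for _, ok in results if ok)
    return passed, results
```

Exact comparison is intentionally strict: it only works if each question was written so that its answer is stable and unambiguous, as the requirements list demands.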

package/.agent-src/skills/prompt-engineering-patterns/SKILL.md ADDED

@@ -0,0 +1,145 @@
+---
+name: prompt-engineering-patterns
+description: "Use when designing production-LLM prompts — few-shot, chain-of-thought, system prompts, templates, self-verification — distinct from prompt-optimizer and refine-prompt."
+source: package
+status: active
+---
+# prompt-engineering-patterns
+Production patterns for LLM prompts: few-shot, chain-of-thought, system-prompt design, templating, self-verification. **Distinct surface** from sibling skills:
+- [`prompt-optimizer`](../prompt-optimizer/SKILL.md) — polishes a single end-user prompt for ChatGPT / Claude / Gemini.
+- [`refine-prompt`](../refine-prompt/SKILL.md) — refines a free-form work prompt into engine-ready acceptance criteria.
+- **This skill** — designs prompts that ship inside an application that calls an LLM at runtime.
+## When to use
+- Designing the system prompt for a new LLM-powered feature.
+- Building a few-shot template with dynamic example selection.
+- Adding chain-of-thought reasoning to a low-accuracy prompt.
+- Reviewing a prompt diff in production code.
+- Diagnosing inconsistent LLM outputs that look like prompt drift.
+Do NOT use when:
+- Polishing a one-off prompt for a chat session — route to `prompt-optimizer`.
+- Turning a Jira ticket into engine input — route to `refine-prompt`.
+- Tuning a model's weights — this skill is prompt-only, not fine-tuning.
+## Decision framework
+### Step 1 — Pick the prompt level (progressive disclosure)
+```
+Start at Level 1; only escalate when measurement says you must.
+Level 1  Direct instruction                    "Summarize this article."
+Level 2  + constraints (length, format, focus) "...in 3 bullets, key findings only."
+Level 3  + reasoning scaffold                  "Read first, identify findings, then summarize."
+Level 4  + few-shot examples                   "Like these examples: ..."
+Level 5  + self-verification step              "...then check answer against criteria; revise if fails."
+```
+Escalating without evidence is over-engineering. Each level adds tokens, latency, and a maintenance surface.
+### Step 2 — Structure the prompt
+Fixed instruction hierarchy — every production prompt fills these slots in order:
+```
+[System context]   role, expertise, constraints, safety
+[Task instruction] what to do, in one sentence
+[Examples]         few-shot demonstrations (optional)
+[Input data]       the user-supplied content
+[Output format]    schema, length, citation rules
+```
+Stable slots (system, task, format) belong in cached prompt prefixes; volatile slots (examples, input) belong in the per-call portion.
+### Step 3 — Pick the few-shot strategy
+```
+Examples are uniform and small (< 20)         → embed all of them; deterministic.
+Examples are large or diverse                 → semantic-similarity retrieval per call.
+Edge cases dominate                           → diversity-sampled examples (cluster + pick one per cluster).
+Token budget tight                            → fewer, higher-quality examples beats many mediocre.
+Examples drift with the data                  → regenerate from a labeled corpus on a schedule, not hand-edited.
+```
+Bad examples are worse than no examples — the model imitates structure.
+### Step 4 — Add chain-of-thought ONLY when measured
+CoT improves accuracy on multi-step reasoning, hurts on classification and lookup. Decision rule:
+```
+Task is multi-step / arithmetic / multi-hop   → add CoT (zero-shot "let's think step by step", or few-shot CoT).
+Task is single-step extraction / classify     → CoT adds tokens without lift; skip.
+You haven't measured                          → measure first, decide second.
+Self-consistency needed (high-stakes answers) → sample N reasoning paths, majority vote.
+```
+### Step 5 — Build error recovery into the prompt
+Production prompts handle their own failure cases:
+- Specify the explicit "I don't know" output (don't let the model invent).
+- Require a confidence indicator when downstream code needs to gate.
+- Define the format for "missing information" so callers can branch.
+- For self-verification: specify the criteria, then the revision rule.
+### Step 6 — Treat prompts as code
+- Version every prompt (file + git, not a wiki page).
+- Test on a frozen evaluation set before shipping changes.
+- Track P50 / P95 latency, token usage, accuracy, success rate per version.
+- A/B test prompt variants behind a flag; never edit a live prompt without a rollback path.
+## Procedure: Apply to a new LLM feature
+1. **Inspect** the existing prompt (if any) and the eval set; verify a success metric exists (accuracy / consistency / latency / token cost) — refuse to design without it.
+2. Draft Level-1 prompt (Step 1) and measure on the eval set.
+3. Escalate one level at a time (Step 1) until metric is met or budget runs out.
+4. Lock the structure (Step 2), choose few-shot strategy (Step 3), decide CoT (Step 4).
+5. Add error-recovery clauses (Step 5).
+6. Commit prompt + eval results + chosen version (Step 6); cite this skill.
+## Output format
+1. Prompt-spec table: slot · content · stable-vs-volatile · cached-vs-per-call.
+2. Eval results table: prompt-version · metric · delta-vs-previous.
+3. Failure-mode list: trigger · prompt clause that handles it.
+## Gotcha
+- Few-shot examples leak the model's style — examples that include hedging produce hedging.
+- "Let's think step by step" works zero-shot on capable models, fails on smaller models without exemplar reasoning traces.
+- Self-consistency (N samples + vote) multiplies cost by N — only on high-stakes paths.
+- Cached prompt prefixes only cache when byte-identical — a single reformat busts the cache.
+- Prompts that drift across model versions silently regress accuracy when the provider rolls a model update; pin model version OR re-run eval per release.
+## Do NOT
+- Do NOT escalate to Level 4 / 5 before measuring at lower levels.
+- Do NOT mix few-shot examples from different tasks; the model averages them.
+- Do NOT add CoT to single-step classification — it hurts.
+- Do NOT hand-edit production prompts without versioning + eval.
+- Do NOT echo secrets or PII into the prompt — they end up in provider logs.
+## Auto-trigger keywords
+- prompt engineering
+- few-shot learning
+- chain-of-thought
+- system prompt design
+- prompt template
+- LLM prompt versioning
+- prompt evaluation
+## Provenance
+- Adopted from: `Microck/ordinary-claude-skills@8f5c83174f7aa683b4ddc7433150471983b93131:skills_all/prompt-engineering-patterns/SKILL.md` (MIT, © 2025 Microck) — restructured into a decision-framework shape; vendor `prompt_optimizer` Python snippets dropped (project-specific to Microck).
+- Cross-linked: [`prompt-optimizer`](../prompt-optimizer/SKILL.md), [`refine-prompt`](../refine-prompt/SKILL.md), [`mcp-builder`](../mcp-builder/SKILL.md), [`async-python-patterns`](../async-python-patterns/SKILL.md).
+- Provenance registry: `agents/contexts/skills-provenance.yml` (entry: `prompt-engineering-patterns`).
+- Iron-Law floor: `verify-before-complete`, `skill-quality`, `non-destructive-by-default`.
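
Reviewer sketch (not part of the package diff): Step 4 of the added `prompt-engineering-patterns` skill mentions self-consistency, meaning N sampled reasoning paths followed by a majority vote. A minimal Python illustration is below. `self_consistent_answer` and the `sample` callable are hypothetical; `sample` is assumed to call the model once with sampling enabled and return only the final answer string.

```python
from collections import Counter


def self_consistent_answer(sample, n=5):
    """Call `sample` n times and return (majority answer, share of votes)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    answers = [sample().strip() for _ in range(n)]
    # most_common(1) yields the answer with the highest count; on a tie,
    # the answer that was seen first wins.
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n
```

Each call to `self_consistent_answer` costs `n` model calls, which is why the skill restricts it to high-stakes paths. The returned vote share can double as the confidence indicator that Step 5 asks for when downstream code needs to gate on it.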