@event4u/agent-config 1.28.0 → 1.31.0

Files changed (52)
  1. package/.agent-src/commands/agents/audit.md +101 -197
  2. package/.agent-src/commands/{copilot-agents → agents}/init.md +18 -10
  3. package/.agent-src/commands/agents/optimize.md +181 -0
  4. package/.agent-src/commands/agents.md +19 -12
  5. package/.agent-src/commands/optimize/agents-dir.md +111 -0
  6. package/.agent-src/commands/optimize.md +10 -8
  7. package/.agent-src/contexts/communication/rules-auto/guidelines-mechanics.md +6 -0
  8. package/.agent-src/contexts/communication/rules-auto/slash-command-routing-policy-mechanics.md +2 -3
  9. package/.agent-src/contexts/contracts/agents-md-anatomy.md +132 -0
  10. package/.agent-src/skills/agents-md-thin-root/SKILL.md +8 -1
  11. package/.agent-src/skills/async-python-patterns/SKILL.md +147 -0
  12. package/.agent-src/skills/command-writing/SKILL.md +49 -0
  13. package/.agent-src/skills/copilot-agents-optimization/SKILL.md +3 -3
  14. package/.agent-src/skills/defense-in-depth/SKILL.md +152 -0
  15. package/.agent-src/skills/error-handling-patterns/SKILL.md +134 -0
  16. package/.agent-src/skills/mcp-builder/SKILL.md +108 -0
  17. package/.agent-src/skills/prompt-engineering-patterns/SKILL.md +145 -0
  18. package/.agent-src/skills/repomix-packer/SKILL.md +135 -0
  19. package/.agent-src/skills/roadmap-writing/SKILL.md +9 -0
  20. package/.agent-src/skills/rule-writing/SKILL.md +21 -0
  21. package/.agent-src/skills/secrets-management/SKILL.md +142 -0
  22. package/.agent-src/skills/skill-writing/SKILL.md +19 -0
  23. package/.agent-src/skills/testing-anti-patterns/SKILL.md +152 -0
  24. package/.agent-src/templates/AGENTS.md +9 -10
  25. package/.claude-plugin/marketplace.json +12 -7
  26. package/AGENTS.md +1 -2
  27. package/CHANGELOG.md +113 -0
  28. package/CONTRIBUTING.md +90 -0
  29. package/README.md +3 -3
  30. package/docs/architecture.md +3 -3
  31. package/docs/catalog.md +19 -13
  32. package/docs/contracts/command-clusters.md +20 -3
  33. package/docs/contracts/file-ownership-matrix.json +511 -48
  34. package/docs/contracts/package-self-orientation.md +1 -1
  35. package/docs/getting-started.md +1 -1
  36. package/docs/guidelines/code-clarity.md +95 -0
  37. package/docs/guidelines/php/general.md +8 -0
  38. package/docs/guidelines/php/php-coding-patterns.md +1 -0
  39. package/docs/skills-catalog.md +27 -3
  40. package/llms.txt +26 -2
  41. package/package.json +1 -1
  42. package/scripts/chat_history.py +166 -36
  43. package/scripts/check_command_count_messaging.py +12 -3
  44. package/scripts/check_portability.py +1 -0
  45. package/scripts/lint_agents_md.py +33 -0
  46. package/scripts/release.py +77 -2
  47. package/scripts/skill_linter.py +10 -3
  48. package/.agent-src/commands/agents/cleanup.md +0 -194
  49. package/.agent-src/commands/agents/prepare.md +0 -141
  50. package/.agent-src/commands/copilot-agents/optimize.md +0 -255
  51. package/.agent-src/commands/copilot-agents.md +0 -44
  52. package/.agent-src/commands/optimize/agents.md +0 -144
@@ -166,7 +166,7 @@ prevent. Fix or remove those BEFORE any dedup/compress work — there's
166
166
  no point deduplicating content that is about to be rewritten.
167
167
 
168
168
  When the drift is severe (whole sections are wrong), recommend
169
- `/copilot-agents-init` to scaffold a clean replacement rather than
169
+ `/agents init` to scaffold a clean replacement rather than
170
170
  patching forever.
171
171
 
172
172
  ## agent-config Path Conventions — Preserve, Don't "Fix"
@@ -189,7 +189,7 @@ it as "redundant" and never trim its bullets. The patterns it covers:
189
189
  If the consumer project's `copilot-instructions.md` is missing the
190
190
  section, **add it** during optimization using the canonical block
191
191
  from `.augment/templates/copilot-instructions.md`. Surfaces include
192
- `/copilot-agents init` and `/copilot-agents optimize`.
192
+ `/agents init` and `/agents optimize`.
193
193
 
194
194
  ## Optimization Checklist
195
195
 
@@ -212,7 +212,7 @@ When optimizing either file, check:
212
212
 
213
213
  ## Related
214
214
 
215
- - **Command:** `/copilot-agents-optimize`
215
+ - **Command:** `/agents optimize`
216
216
  - **Skill:** `copilot-config` — Copilot behavior and PR review patterns
217
217
  - **Skill:** `agent-docs-writing` — documentation hierarchy
218
218
  - **Context:** `augment-infrastructure.md` — full `.augment/` overview
@@ -0,0 +1,152 @@
1
+ ---
2
+ name: defense-in-depth
3
+ description: "Use when validation needs entry, business-logic, environment, and instrumentation guards so a bad value cannot reach the failure point — turns a local bug fix into a structural one."
4
+ source: package
5
+ ---
6
+
7
+ # defense-in-depth
8
+
9
+ Validate at every layer the value passes through. Fixing the bug at one layer is locally sufficient and globally fragile — the next refactor, code path, mock, or platform edge case will rediscover it. Four-layer validation makes the bug *structurally* impossible.
10
+
11
+ ## When to use
12
+
13
+ - Bug fix where invalid data caused failure several frames deep.
14
+ - New entry point that funnels external input into existing internals.
15
+ - Refactor that adds a second caller to a previously single-caller routine.
16
+ - Test setup that shortcuts production guards (mocks bypassing entry validation).
17
+
18
+ Do NOT use when:
19
+
20
+ - Pure formatting / style change — no data flow, no layers to defend.
21
+ - Boundary validation alone is correct (e.g. immutable value object with constructor invariant) — route to [`laravel-validation`](../laravel-validation/SKILL.md).
22
+ - The fix belongs at a single architectural seam — adding three more guards is over-engineering. Use the gate function below to stop early.
23
+
24
+ ## Procedure: Apply the four-layer pattern
25
+
26
+ ### Step 0: Analyze the data flow before adding guards
27
+
28
+ 1. Identify where the bad value originates (test fixture, request body, env var, config).
29
+ 2. List every function that receives the value before the failure point.
30
+ 3. Mark which functions are reachable from production paths and which only from tests.
31
+
32
+ ### Step 1: Layer 1 — Entry-point validation
33
+
34
+ Reject obviously invalid input at the API / route / command boundary. In Laravel this is FormRequest rules; in pure PHP services it is the public method on the service.
35
+
36
+ ```php
+ public function createProject(string $name, string $workingDirectory): Project
+ {
+     if (trim($workingDirectory) === '') {
+         throw new InvalidArgumentException('workingDirectory cannot be empty');
+     }
+     if (! is_dir($workingDirectory)) {
+         throw new InvalidArgumentException("workingDirectory does not exist: {$workingDirectory}");
+     }
+     if (! is_writable($workingDirectory)) {
+         throw new InvalidArgumentException("workingDirectory is not writable: {$workingDirectory}");
+     }
+     // ... proceed
+ }
+ ```
51
+
52
+ ### Step 2: Layer 2 — Business-logic validation
53
+
54
+ Verify the value still makes sense for the operation that consumes it. Different code paths can reach the same internal — re-check rather than trust the caller.
55
+
56
+ ```php
+ public function initializeWorkspace(string $projectDir, string $sessionId): Workspace
+ {
+     if ($projectDir === '') {
+         throw new RuntimeException('projectDir required for workspace initialization');
+     }
+     // ... proceed
+ }
+ ```
65
+
66
+ ### Step 3: Layer 3 — Environment guards
67
+
68
+ Refuse dangerous operations in the wrong context — most often: running a destructive command outside a test temp dir while the test suite is active.
69
+
70
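+ ```php
+ public function gitInit(string $directory): void
+ {
+     if (app()->environment('testing')) {
+         $normalized = realpath($directory) ?: $directory;
+         $tmp = realpath(sys_get_temp_dir());
+
+         // Compare with a trailing separator so a sibling such as /tmp-other
+         // does not pass the prefix check; the temp dir itself still passes.
+         $prefix = $tmp === false ? null : rtrim($tmp, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
+         if ($prefix === null || ! str_starts_with($normalized . DIRECTORY_SEPARATOR, $prefix)) {
+             throw new RuntimeException("refusing git init outside tmp during tests: {$directory}");
+         }
+     }
+     // ... proceed
+ }
+ ```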
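The containment test is the part of a guard like this that is easiest to get subtly wrong: a raw string-prefix comparison also accepts a sibling directory whose name merely starts with the temp path. The following is a minimal Python sketch of a component-wise check, not part of the released file; `is_inside` and `guard_git_init` are hypothetical names, and passing the environment as a plain string is an assumption made for illustration.

```python
import tempfile
from pathlib import Path


def is_inside(directory: str, root: str) -> bool:
    """Return True when `directory` resolves to `root` or to a path below it."""
    resolved = Path(directory).resolve()
    resolved_root = Path(root).resolve()
    # Compare whole path components, not raw string prefixes.
    return resolved == resolved_root or resolved_root in resolved.parents


def guard_git_init(directory: str, environment: str) -> None:
    """Hypothetical Layer 3 guard: during tests, refuse paths outside the temp dir."""
    if environment == "testing" and not is_inside(directory, tempfile.gettempdir()):
        raise RuntimeError(f"refusing git init outside tmp during tests: {directory}")
```

Resolving both paths before comparing also covers relative paths and symlinked temp directories.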
84
+
85
+ ### Step 4: Layer 4 — Debug instrumentation
86
+
87
+ Capture context for forensics so the next failure surfaces *why*, not just *that*. Log only when the call is about to hit an irreversible side effect.
88
+
89
+ ```php
+ public function gitInit(string $directory): void
+ {
+     Log::debug('about to git init', [
+         'directory' => $directory,
+         'cwd' => getcwd(),
+         'trace' => (new Exception)->getTraceAsString(),
+     ]);
+     // ... proceed
+ }
+ ```
100
+
101
+ ### Step 5: Verify each layer in isolation
102
+
103
+ Try to bypass Layer 1 (call the internal directly) and confirm Layer 2 catches it. Mock the production guard and confirm Layer 3 still refuses. The pattern only earns its name when each layer is independently provable.
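As an illustration of this verification step, here is a minimal Python sketch (not part of the released file) with two layers that raise distinct error types, plus a helper that calls the internal routine directly to show Layer 2 refusing on its own. All class and function names are hypothetical, and the entry-point check is reduced to the empty-string case.

```python
class EntryValidationError(ValueError):
    """Layer 1: raised at the public boundary."""


class WorkspaceInvariantError(RuntimeError):
    """Layer 2: raised by the internal routine."""


def initialize_workspace(project_dir: str) -> str:
    # Layer 2: re-check the invariant instead of trusting the caller.
    if project_dir == "":
        raise WorkspaceInvariantError("project_dir required for workspace initialization")
    return f"workspace:{project_dir}"


def create_project(name: str, working_directory: str) -> str:
    # Layer 1: reject obviously invalid input at the entry point.
    if working_directory.strip() == "":
        raise EntryValidationError("working_directory cannot be empty")
    return initialize_workspace(working_directory)


def layer_two_catches_bypass() -> bool:
    """Skip Layer 1 by calling the internal directly; report whether Layer 2 refuses."""
    try:
        initialize_workspace("")
    except WorkspaceInvariantError:
        return True
    return False
```

Because the two layers raise different types, a test can tell which guard fired rather than only that something failed.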
104
+
105
+ ## Gate function — when to stop adding layers
106
+
107
+ ```
+ BEFORE adding the 5th guard:
+   STOP and re-check the data flow.
+
+ IF the value crosses ≤ 1 module boundary:
+   Use a single boundary check + a value-object invariant. Two layers max.
+
+ IF every layer would re-implement the same predicate:
+   Hoist the predicate into a value object / type and inject. One check is enough.
+
+ Layers are for distinct concerns: input shape vs operation invariant
+ vs environment risk vs forensic visibility. Same concern repeated is duplication, not depth.
+ ```
120
+
121
+ ## Output format
122
+
123
+ 1. The four guards (or a documented subset, with the gate-function justification).
124
+ 2. Tests that bypass each layer to prove the next layer catches the failure.
125
+ 3. One-line note on the data flow that motivated the layering.
126
+
127
+ ## Gotcha
128
+
129
+ - Layers 1 and 2 must reject with **distinct** errors — same error string makes the second guard look like a duplicate.
130
+ - Layer 3 environment checks should fail closed: unknown environment treated as production.
131
+ - Layer 4 instrumentation must not change behavior — no early returns, no mutated state.
132
+ - Test bypasses (in-process mocks) often skip Layer 1 — Layer 2 catches them; do not weaken Layer 2 to silence the test.
133
+
134
+ ## Do NOT
135
+
136
+ - Do NOT replicate Layer 1 inside private methods that only Layer 1 can reach.
137
+ - Do NOT log secrets in Layer 4 — sanitize before `Log::debug`.
138
+ - Do NOT use Layer 3 to gate business logic — environments change, business rules do not.
139
+ - Do NOT add a layer without a failing test that proves the layer was needed.
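The two Layer 4 rules (no behavior change, no secrets in logs) can be sketched together. This Python example is illustrative only and not part of the released file; `SENSITIVE_KEYS`, `sanitize`, and `log_before_side_effect` are hypothetical, and a real project would maintain its own list of sensitive field names.

```python
import logging

# Hypothetical deny-list; compared case-insensitively.
SENSITIVE_KEYS = {"password", "token", "api_key", "secret"}


def sanitize(context: dict) -> dict:
    """Return a copy with sensitive values masked; the input is left untouched."""
    return {
        key: "[redacted]" if key.lower() in SENSITIVE_KEYS else value
        for key, value in context.items()
    }


def log_before_side_effect(logger: logging.Logger, message: str, context: dict) -> None:
    """Layer 4: record forensic context only. No early return, no mutated state."""
    logger.debug(message, extra={"context": sanitize(context)})
```

Building a new dictionary rather than editing the caller's keeps the instrumentation free of side effects.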
140
+
141
+ ## Auto-trigger keywords
142
+
143
+ - defense in depth
144
+ - multiple validation layers
145
+ - bug deep in execution
146
+ - structurally impossible
147
+
148
+ ## Provenance
149
+
150
+ - Adopted from: `Microck/ordinary-claude-skills@8f5c83174f7aa683b4ddc7433150471983b93131:skills_all/defense-in-depth/SKILL.md` (MIT, © 2025 Microck).
151
+ - Provenance registry: `agents/contexts/skills-provenance.yml` (entry: `defense-in-depth`).
152
+ - Iron-Law floor: `non-destructive-by-default`, `verify-before-complete`, `skill-quality`.
@@ -0,0 +1,134 @@
1
+ ---
2
+ name: error-handling-patterns
3
+ description: "Use when picking a failure-reporting strategy — exceptions vs Result types, recoverable vs not, retry / circuit-breaker / graceful degradation — decision framework only, catalogues externalized."
4
+ source: package
5
+ status: active
6
+ refresh_trigger: "≥30% of cited upstream pattern catalogues become deprecated, OR a new top-2 ecosystem (Python/JS/PHP/Go/Rust) ships a paradigm-shifting standard error model"
7
+ sunset_criterion: "When the upstream framework docs (Laravel, FastAPI, Express, Axum, Effect-TS) all carry an equivalent in-tree decision framework AND consumer projects no longer cite this skill in PR reviews for two consecutive review cycles."
8
+ ---
9
+
10
+ # error-handling-patterns
11
+
12
+ Decision framework for picking an error-handling strategy. **Catalogues of language-specific code live upstream** (links in § Provenance) — this skill is the predicate, not the pattern library. Sunset-policy compliant: large language-specific catalogues stay in authoritative upstream docs.
13
+
14
+ ## When to use
15
+
16
+ - Designing how a new feature, API, or service reports failure.
17
+ - Reviewing a diff that introduces a new exception class, `Result<T, E>`, or sentinel return.
18
+ - Debugging production noise that traces back to inconsistent error semantics.
19
+ - Choosing between retry, circuit-breaker, fallback, and fail-fast for an external dependency.
20
+
21
+ Do NOT use when:
22
+
23
+ - You only need the syntax for a `try/catch` in language X — read the upstream language guide directly.
24
+ - The failure is a single-call Laravel validation error — route to [`laravel-validation`](../laravel-validation/SKILL.md).
25
+ - The fix is a one-line null check in existing code — route to [`bug-analyzer`](../bug-analyzer/SKILL.md).
26
+
27
+ ## Decision framework
28
+
29
+ ### Step 1 — Classify the failure
30
+
31
+ ```
32
+ Failure is:
33
+ caller's fault (bad input, missing auth) → reject at boundary, structured error
34
+ expected operational (timeout, 404, rate-limit) → Result-type / typed return; retry-aware
35
+ unexpected operational (DB down, OOM, deadlock) → exception; observability + alert
36
+ programmer bug (null deref, off-by-one) → crash early; do not catch
37
+ ```
38
+
39
+ ### Step 2 — Pick the reporting mechanism
40
+
41
+ ```
+ IF failure is an EXPECTED, branchable outcome the caller will route on
+   → Result type / tagged union / typed error return.
+     Forces the caller to handle it; the type system is the proof.
+
+ IF failure is UNEXPECTED and most callers cannot do anything useful
+   → exception, propagated to a single boundary handler.
+     One layer (HTTP, queue, CLI) translates exceptions to user-facing errors.
+
+ IF failure is UNRECOVERABLE (invariant violated, data corruption)
+   → fail loud, fail fast. No catch-and-continue.
+     Log structured context, exit / panic / 500.
+
+ IF the language idiom forces one choice (Go: errors are values; Rust: Result;
+    Python/PHP/JS: exceptions)
+   → follow the idiom. Inventing a foreign mechanism costs more than the
+     correctness it buys.
+ ```
59
+
60
+ ### Step 3 — Pick the resilience strategy
61
+
62
+ ```
63
+ External call?
64
+ Idempotent + transient failure mode → retry with exponential backoff + jitter, cap.
65
+ Non-idempotent → no blind retry; require an idempotency key.
66
+ Repeated failure across instances → circuit breaker; open → half-open probe → close.
67
+ Optional functionality → graceful degradation (cached / default / null result).
68
+ Required functionality → propagate; surface to user with a recovery hint.
69
+ ```
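The first branch above (idempotent call, transient failure) can be sketched in Python as follows. This is a minimal illustration, not part of the released file: `TransientError` and `retry` are hypothetical names, the default values are arbitrary, and the jitter scheme shown is the "full jitter" variant (a uniform draw between zero and the capped exponential delay).

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are safe to retry."""


def retry(
    call: Callable[[], T],
    attempts: int = 4,
    base: float = 0.1,
    cap: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Retry an idempotent call with capped exponential backoff and full jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    rng = rng or random.Random()
    for attempt in range(attempts):
        try:
            return call()
        except TransientError:
            if attempt == attempts - 1:
                raise  # budget exhausted: let the boundary handler see it
            # Full jitter: uniform in [0, min(cap, base * 2**attempt)].
            sleep(rng.uniform(0.0, min(cap, base * 2 ** attempt)))
    raise AssertionError("unreachable")
```

Only `TransientError` is retried; any other exception propagates immediately, which keeps programmer bugs from being masked by the retry loop. The `sleep` and `rng` parameters exist so tests can run without real delays.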
70
+
71
+ ### Step 4 — Shape the error payload
72
+
73
+ Every produced error must carry: `code` (stable string), `message` (human-readable), `cause` (chained), `context` (sanitized inputs), `correlation_id` (request / trace).
74
+
75
+ Forbidden: secrets, raw SQL, full stack traces in user-facing surfaces, internal class names leaked through API boundaries.
76
+
77
+ ### Step 5 — Define the boundary
78
+
79
+ Exactly **one** layer translates internal errors to the egress format (HTTP status + body, queue requeue policy, CLI exit code). Any other layer that performs this translation is duplicating it, and that duplication is the bug.
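Steps 4 and 5 together can be sketched as a payload type plus one translation function. This Python example is illustrative and not part of the released file: `ErrorPayload`, `HTTP_STATUS_BY_CODE`, and `to_http` are hypothetical, the status mapping is invented for the example, and keeping `cause` and `context` out of the response body is a design assumption made here to respect the "forbidden" list.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ErrorPayload:
    code: str                     # stable, machine-readable string
    message: str                  # human-readable
    correlation_id: str           # request / trace id
    cause: Optional[str] = None   # chained cause, kept internal
    context: dict = field(default_factory=dict)  # sanitized inputs only


# Hypothetical mapping; a real service would define its own codes.
HTTP_STATUS_BY_CODE = {"invalid_input": 400, "not_found": 404, "rate_limited": 429}


def to_http(error: ErrorPayload) -> tuple:
    """The single boundary: translate an internal error to (status, body)."""
    status = HTTP_STATUS_BY_CODE.get(error.code, 500)
    # Unknown codes become a generic 500 so internal details never leak.
    message = error.message if status != 500 else "Internal error"
    body = {"code": error.code, "message": message, "correlation_id": error.correlation_id}
    return status, body
```

The correlation id survives every path, so a user-reported error can still be matched to the internal log entry that holds the cause and context.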
80
+
81
+ ## Procedure: Apply the framework to a new feature
82
+
83
+ 1. **Inspect** the feature surface — identify every failure mode (each external call, each invariant, each user input class) and write it down.
84
+ 2. Run Step 1 of the decision framework against each entry; write the classification next to it.
85
+ 3. Pick reporting mechanism per Step 2; reject combinations the language idiom rejects.
86
+ 4. For each external call, run Step 3 and write down the chosen resilience strategy.
87
+ 5. Sketch the error payload shape (Step 4) and the single boundary (Step 5).
88
+ 6. Hand the sketch to a reviewer **before** coding; cite this skill.
89
+
90
+ ## Output format
91
+
92
+ 1. The failure-mode table (mode · classification · mechanism · resilience strategy).
93
+ 2. The shared error payload definition (code, message, cause, context, correlation_id).
94
+ 3. The single boundary handler (file:line) where internal → egress translation happens.
95
+ 4. The retry / circuit-breaker config (attempts, base, jitter, breaker thresholds), if any.
96
+
97
+ ## Gotcha
98
+
99
+ - "Catch everything, log it, return null" silently destroys signal — every catch must either rethrow, translate, or recover with a written reason.
100
+ - Retries on non-idempotent calls are a recurring cause of production incidents (duplicate charges, double writes); insist on idempotency keys before allowing retry.
101
+ - Circuit breakers without a half-open probe never close — they degrade to permanent failure.
102
+ - Mixing Result types and exceptions in the same module is worse than picking the wrong one — pick one per module and stay in it.
103
+ - Upstream pattern catalogues drift; trust the link, not memory. Refresh per `refresh_trigger` above.
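The half-open point in the list above is easier to see in code. Below is a minimal single-threaded Python sketch, not part of the released file; `CircuitBreaker` and `CircuitOpenError` are hypothetical names, the thresholds are arbitrary, and a production breaker would also need locking and a limit on concurrent probes.

```python
import time
from typing import Callable


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"  # timeout elapsed: allow a probe call
        return "open"

    def call(self, fn: Callable[[], object]) -> object:
        state = self.state
        if state == "open":
            raise CircuitOpenError("circuit open; call rejected")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # (re)open and restart the timer
            raise
        # Any success, including a half-open probe, closes the breaker.
        self.failures = 0
        self.opened_at = None
        return result
```

Without the `half-open` branch, `opened_at` would never be cleared and the breaker would stay open permanently, which is the failure mode the list warns about.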
104
+
105
+ ## Do NOT
106
+
107
+ - Do NOT introduce a custom error mechanism that fights the language idiom.
108
+ - Do NOT swallow exceptions — every catch has a written purpose.
109
+ - Do NOT leak stack traces, secrets, or internal class names across the boundary.
110
+ - Do NOT retry without backoff + jitter + cap.
111
+ - Do NOT inline language-specific code catalogues into this skill — externalize per Sunset Policy.
112
+
113
+ ## Auto-trigger keywords
114
+
115
+ - error handling strategy
116
+ - exceptions vs result
117
+ - retry pattern
118
+ - circuit breaker
119
+ - graceful degradation
120
+ - error payload shape
121
+
122
+ ## Provenance
123
+
124
+ - Adopted from: `Microck/ordinary-claude-skills@8f5c83174f7aa683b4ddc7433150471983b93131:skills_all/error-handling-patterns/SKILL.md` (MIT, © 2025 Microck) — **Sunset Policy applied**: 636-line source reduced to a ~150-line decision framework; language catalogues externalized to the upstream resources below.
125
+ - Externalized catalogues:
126
+ - Python: https://docs.python.org/3/tutorial/errors.html · https://docs.python.org/3/library/exceptions.html
127
+ - PHP / Laravel: https://laravel.com/docs/errors · https://www.php.net/manual/en/language.exceptions.php
128
+ - JS / TS: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Control_flow_and_error_handling · https://www.typescriptlang.org/docs/handbook/2/narrowing.html
129
+ - Go: https://go.dev/blog/error-handling-and-go · https://pkg.go.dev/errors
130
+ - Rust: https://doc.rust-lang.org/book/ch09-00-error-handling.html
131
+ - Resilience patterns: https://martinfowler.com/bliki/CircuitBreaker.html · https://aws.amazon.com/builders-library/timeouts-retries-and-backoff-with-jitter/
132
+ - Cross-linked: [`defense-in-depth`](../defense-in-depth/SKILL.md), [`laravel-validation`](../laravel-validation/SKILL.md), [`bug-analyzer`](../bug-analyzer/SKILL.md), [`api-design`](../api-design/SKILL.md).
133
+ - Provenance registry: `agents/contexts/skills-provenance.yml` (entry: `error-handling-patterns`).
134
+ - Iron-Law floor: `verify-before-complete`, `skill-quality`, `non-destructive-by-default`.
@@ -0,0 +1,108 @@
1
+ ---
2
+ name: mcp-builder
3
+ description: "Use when building an MCP server in Python (FastMCP) or Node/TypeScript (MCP SDK) — agent-centric tool design, input schemas, error handling, and the 10-question evaluation harness."
4
+ source: package
5
+ ---
6
+
7
+ # mcp-builder
8
+
9
+ Author MCP servers that LLMs can drive end-to-end. The quality bar is *can the agent finish the workflow*, not *does the endpoint return 200*. This skill is the **server-author** counterpart to the existing [`mcp`](../mcp/SKILL.md) consumer skill.
10
+
11
+ ## When to use
12
+
13
+ - Wrapping an external API or service as MCP tools for an LLM client.
14
+ - Adding tools to an existing MCP server (Python FastMCP or TypeScript SDK).
15
+ - Reviewing an MCP server before shipping — Phase 4 evaluation gate below.
16
+
17
+ Do NOT use when:
18
+
19
+ - You only need to *call* an MCP server — route to [`mcp`](../mcp/SKILL.md).
20
+ - The integration belongs in the host process — write a regular service, not an MCP server.
21
+ - The "server" wraps one endpoint with no workflow — a CLI wrapper is enough.
22
+
23
+ ## Procedure: Four phases, one tool at a time
24
+
25
+ ### Phase 1 — Research & plan
26
+
27
+ 1. **Agent-centric design**. Tools encode *workflows*, not raw endpoints. Consolidate (`schedule_event` checks availability **and** creates the event). Default to human-readable names over IDs. Errors are educational, not just diagnostic ("retry with `filter='active_only'` to reduce results").
28
+ 2. **Load the protocol**. Fetch `https://modelcontextprotocol.io/llms-full.txt` once into context — the canonical spec.
29
+ 3. **Load the SDK README** for the chosen language:
30
+ - Python: `https://raw.githubusercontent.com/modelcontextprotocol/python-sdk/main/README.md`
31
+ - TypeScript: `https://raw.githubusercontent.com/modelcontextprotocol/typescript-sdk/main/README.md`
32
+ 4. **Read the target service's API docs in full** — auth, rate limits, pagination, error codes, schemas. Skipping this produces incomplete mocks (see [`testing-anti-patterns`](../testing-anti-patterns/SKILL.md) § Anti-Pattern 4).
33
+ 5. **Write the plan**: tool list with priority, shared utilities (request helper, pagination, formatter), input/output schemas, error strategy, response-detail levels (concise vs detailed), character limits (default 25 000 tokens).
34
+
35
+ ### Phase 2 — Implement
36
+
37
+ 1. **Project layout**. Python: single `.py` or modular package; Pydantic v2 with `model_config`. TypeScript: standard `package.json` + `tsconfig.json` strict mode; Zod schemas with `.strict()`.
38
+ 2. **Shared utilities first**. API request helper with retry/timeout, error formatter, JSON-vs-Markdown response builder, pagination cursor handling, auth/token cache.
39
+ 3. **Per tool**:
40
+ - Input schema (Pydantic / Zod) with constraints, descriptions, and *examples*.
41
+ - One-line summary + detailed docstring covering purpose, parameters, return shape, when-to-use, when-NOT-to-use, error handling.
42
+ - Tool annotations: `readOnlyHint`, `destructiveHint`, `idempotentHint`, `openWorldHint`.
43
+ - Async/await for all I/O. Honor pagination. Truncate to the character limit and signal truncation in the response.
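The last bullet (truncate and signal truncation) can be sketched without any SDK. This Python helper is illustrative only and not part of the released file; `CHARACTER_LIMIT`, `truncate_response`, and the shape of the returned dictionary are assumptions, and the limit here is counted in characters.

```python
CHARACTER_LIMIT = 25_000  # assumed budget, counted in characters for this sketch


def truncate_response(text: str, limit: int = CHARACTER_LIMIT) -> dict:
    """Cut `text` to `limit` characters and say so explicitly in the result."""
    if len(text) <= limit:
        return {"text": text, "truncated": False}
    return {
        "text": text[:limit],
        "truncated": True,
        "hint": f"Showing {limit} of {len(text)} characters; narrow the query or request the next page.",
    }
```

The `hint` field follows the "educational errors" idea from Phase 1: it tells the agent what to do next instead of silently dropping data.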
44
+
45
+ ### Phase 3 — Review & test
46
+
47
+ 1. **Code-quality pass**: DRY across tools, shared helpers extracted, consistent response shapes, all external calls have error handling, full type coverage.
48
+ 2. **Build & syntax**:
49
+ - Python: `python -m py_compile server.py`.
50
+ - TypeScript: `npm run build`; verify `dist/index.js`.
51
+ 3. **Run the server safely**. MCP servers block on stdio. Either run inside `tmux` and drive from the harness, or wrap with `timeout 5s python server.py` for a smoke check. Do NOT block your own session by running it in-process.
52
+
53
+ ### Phase 4 — Evaluations (10-question harness)
54
+
55
+ Each evaluation is a question the agent must answer using only the new tools.
56
+
57
+ Requirements per question — **independent**, **read-only**, **complex** (multiple tool calls), **realistic**, **verifiable** (string-comparable answer), **stable** (answer does not drift over time).
58
+
59
+ ```xml
+ <evaluation>
+   <qa_pair>
+     <question>...</question>
+     <answer>...</answer>
+   </qa_pair>
+   <!-- 9 more -->
+ </evaluation>
+ ```
68
+
69
+ Process: enumerate the tools, explore READ-ONLY data, draft 10 questions, **solve each yourself first** to confirm the answer is reachable and stable.
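A harness for the XML format above can be quite small. The Python sketch below is illustrative and not part of the released file: `load_qa_pairs` and `score` are hypothetical names, `answer_fn` stands in for whatever drives the agent against the server, and exact string comparison after trimming whitespace is an assumption about how "string-comparable" is applied.

```python
import xml.etree.ElementTree as ET
from typing import Callable


def load_qa_pairs(xml_text: str) -> list:
    """Parse <qa_pair> elements into (question, answer) tuples."""
    root = ET.fromstring(xml_text)
    return [
        (pair.findtext("question", "").strip(), pair.findtext("answer", "").strip())
        for pair in root.iter("qa_pair")
    ]


def score(pairs: list, answer_fn: Callable[[str], str]) -> float:
    """Fraction of questions whose answer matches the expected string exactly."""
    if not pairs:
        return 0.0
    correct = sum(1 for question, expected in pairs if answer_fn(question).strip() == expected)
    return correct / len(pairs)
```

Exact matching only works if the answers are stable and unambiguous, which is why the per-question requirements insist on both.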
70
+
71
+ ## Output format
72
+
73
+ 1. The server source plus the 10-question evaluation XML.
74
+ 2. A README with: install, env vars, transport mode (stdio / sse / http), example tool call.
75
+ 3. A line in `agents/contexts/skills-provenance.yml` if the server was forked from an upstream, or a note that it was authored from scratch.
76
+
77
+ ## Gotcha
78
+
79
+ - "Wrap every endpoint" is the failure mode — agents cannot orchestrate 60 thin tools as well as 12 workflow tools.
80
+ - Returning the full upstream payload blows the agent's context. Default to a *concise* shape with an opt-in *detailed* mode.
81
+ - Pydantic / Zod descriptions are the *only* documentation the LLM sees at runtime — write them like usage docs, not comments.
82
+ - A server that hangs your session usually means stdio transport ran in the main process — move it under `tmux` or use a `timeout`.
83
+ - Inflated token claims are not credible without an evaluation harness — Phase 4 is the validation gate, not optional.
84
+
85
+ ## Do NOT
86
+
87
+ - Do NOT mirror REST routes 1:1.
88
+ - Do NOT use `any` (TypeScript) or untyped `dict` (Python) in tool I/O.
89
+ - Do NOT skip the 10-question evaluation — Phase 4 IS the quality bar.
90
+ - Do NOT run the MCP server in your main process during testing — it will block.
91
+ - Do NOT log tokens, API keys, or full request bodies — sanitize before logging.
92
+
93
+ ## Auto-trigger keywords
94
+
95
+ - mcp server
96
+ - model context protocol
97
+ - fastmcp
98
+ - mcp builder
99
+ - agent-centric tools
100
+
101
+ ## Provenance
102
+
103
+ - Upstream protocol: https://modelcontextprotocol.io
104
+ - Upstream SDKs: https://github.com/modelcontextprotocol/python-sdk · https://github.com/modelcontextprotocol/typescript-sdk
105
+ - Adopted from: `Microck/ordinary-claude-skills@8f5c83174f7aa683b4ddc7433150471983b93131:skills_all/mcp-builder/SKILL.md` (MIT, © 2025 Microck) — external `./reference/*.md` file links replaced with inline guidance + upstream URLs.
106
+ - Cross-linked: [`mcp`](../mcp/SKILL.md), [`testing-anti-patterns`](../testing-anti-patterns/SKILL.md), [`api-design`](../api-design/SKILL.md).
107
+ - Provenance registry: `agents/contexts/skills-provenance.yml` (entry: `mcp-builder`).
108
+ - Iron-Law floor: `verify-before-complete`, `tool-safety`, `skill-quality`.
@@ -0,0 +1,145 @@
1
+ ---
2
+ name: prompt-engineering-patterns
3
+ description: "Use when designing production-LLM prompts — few-shot, chain-of-thought, system prompts, templates, self-verification — distinct from prompt-optimizer and refine-prompt."
4
+ source: package
5
+ status: active
6
+ ---
7
+
8
+ # prompt-engineering-patterns
9
+
10
+ Production patterns for LLM prompts: few-shot, chain-of-thought, system-prompt design, templating, self-verification. **Distinct surface** from sibling skills:
11
+
12
+ - [`prompt-optimizer`](../prompt-optimizer/SKILL.md) — polishes a single end-user prompt for ChatGPT / Claude / Gemini.
13
+ - [`refine-prompt`](../refine-prompt/SKILL.md) — refines a free-form work prompt into engine-ready acceptance criteria.
14
+ - **This skill** — designs prompts that ship inside an application that calls an LLM at runtime.
15
+
16
+ ## When to use
17
+
18
+ - Designing the system prompt for a new LLM-powered feature.
19
+ - Building a few-shot template with dynamic example selection.
20
+ - Adding chain-of-thought reasoning to a low-accuracy prompt.
21
+ - Reviewing a prompt diff in production code.
22
+ - Diagnosing inconsistent LLM outputs that look like prompt drift.
23
+
24
+ Do NOT use when:
25
+
26
+ - Polishing a one-off prompt for a chat session — route to `prompt-optimizer`.
27
+ - Turning a Jira ticket into engine input — route to `refine-prompt`.
28
+ - Tuning a model's weights — this skill is prompt-only, not fine-tuning.
29
+
30
+ ## Decision framework
31
+
32
+ ### Step 1 — Pick the prompt level (progressive disclosure)
33
+
34
+ ```
35
+ Start at Level 1; only escalate when measurement says you must.
36
+
37
+ Level 1 Direct instruction "Summarize this article."
38
+ Level 2 + constraints (length, format, focus) "...in 3 bullets, key findings only."
39
+ Level 3 + reasoning scaffold "Read first, identify findings, then summarize."
40
+ Level 4 + few-shot examples "Like these examples: ..."
41
+ Level 5 + self-verification step "...then check answer against criteria; revise if fails."
42
+ ```
43
+
44
+ Escalating without evidence is over-engineering. Each level adds tokens, latency, and a maintenance surface.
45
+
46
+ ### Step 2 — Structure the prompt
47
+
48
+ Fixed instruction hierarchy — every production prompt fills these slots in order:
49
+
50
+ ```
51
+ [System context] role, expertise, constraints, safety
52
+ [Task instruction] what to do, in one sentence
53
+ [Examples] few-shot demonstrations (optional)
54
+ [Input data] the user-supplied content
55
+ [Output format] schema, length, citation rules
56
+ ```
57
+
58
+ Stable slots (system, task, format) belong in cached prompt prefixes; volatile slots (examples, input) belong in the per-call portion.
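One way to honor the stable-versus-volatile split is to build the prompt in two parts. The Python sketch below is illustrative and not part of the released file; `build_prompt` is a hypothetical name, and placing the output-format slot ahead of the examples so that all stable slots are contiguous is an assumption of this sketch, not something the slot list prescribes.

```python
def build_prompt(system: str, task: str, output_format: str,
                 examples: list, input_data: str) -> tuple:
    """Return (stable_prefix, per_call_suffix) for one LLM call."""
    # Stable slots: identical bytes across calls, so a prefix cache can hit.
    prefix = "\n\n".join([
        f"[System context]\n{system}",
        f"[Task instruction]\n{task}",
        f"[Output format]\n{output_format}",
    ])
    # Volatile slots: change per call, so they stay out of the prefix.
    shots = "\n".join(f"Input: {i}\nOutput: {o}" for i, o in examples)
    suffix_parts = []
    if shots:
        suffix_parts.append(f"[Examples]\n{shots}")
    suffix_parts.append(f"[Input data]\n{input_data}")
    return prefix, "\n\n".join(suffix_parts)
```

Two calls with the same system, task, and format strings yield byte-identical prefixes regardless of the examples or input supplied.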
59
+
60
+ ### Step 3 — Pick the few-shot strategy
61
+
62
+ ```
63
+ Examples are uniform and small (< 20) → embed all of them; deterministic.
64
+ Examples are large or diverse → semantic-similarity retrieval per call.
65
+ Edge cases dominate → diversity-sampled examples (cluster + pick one per cluster).
66
+ Token budget tight → fewer, higher-quality examples beat many mediocre ones.
67
+ Examples drift with the data → regenerate from a labeled corpus on a schedule, not hand-edited.
68
+ ```
69
+
70
+ Bad examples are worse than no examples — the model imitates structure.
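The per-call retrieval row can be sketched with a deliberately crude similarity measure. This Python example is illustrative and not part of the released file: `jaccard` and `select_examples` are hypothetical, and word-overlap (Jaccard) similarity is a stand-in for the embedding-based semantic similarity a real system would more likely use.

```python
def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity in [0, 1]; a stand-in for embedding similarity."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def select_examples(query: str, pool: list, k: int = 2) -> list:
    """Pick the k (input, output) examples whose input is most similar to the query."""
    ranked = sorted(pool, key=lambda example: jaccard(query, example[0]), reverse=True)
    return ranked[:k]
```

Swapping `jaccard` for a real embedding distance changes only the `key` function; the selection logic stays the same.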
71
+
72
+ ### Step 4 — Add chain-of-thought ONLY when measured
73
+
74
+ CoT improves accuracy on multi-step reasoning, hurts on classification and lookup. Decision rule:
75
+
76
+ ```
77
+ Task is multi-step / arithmetic / multi-hop → add CoT (zero-shot "let's think step by step", or few-shot CoT).
78
+ Task is single-step extraction / classify → CoT adds tokens without lift; skip.
79
+ You haven't measured → measure first, decide second.
80
+ Self-consistency needed (high-stakes answers) → sample N reasoning paths, majority vote.
81
+ ```
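The self-consistency row reduces to sampling and counting. The Python sketch below is illustrative and not part of the released file; `self_consistent_answer` is a hypothetical name, and `sample` stands in for a function that runs one reasoning path and returns only the final answer.

```python
from collections import Counter
from typing import Callable


def self_consistent_answer(sample: Callable[[], str], n: int = 5) -> tuple:
    """Draw n answers and return (majority_answer, share_of_votes)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    answers = [sample().strip() for _ in range(n)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n
```

The returned vote share can double as a rough confidence signal: a bare majority on a high-stakes path may be worth escalating instead of accepting.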
82
+
83
+ ### Step 5 — Build error recovery into the prompt
84
+
85
+ Production prompts handle their own failure cases:
86
+
87
+ - Specify the explicit "I don't know" output (don't let the model invent).
88
+ - Require a confidence indicator when downstream code needs to gate.
89
+ - Define the format for "missing information" so callers can branch.
90
+ - For self-verification: specify the criteria, then the revision rule.
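The bullets above pay off on the calling side, where code can branch on a known shape instead of guessing. The following Python sketch is illustrative and not part of the released file: the JSON reply format, the `I_DONT_KNOW` sentinel, the `parse_reply` name, and the 0.7 threshold are all assumptions chosen for the example.

```python
import json

UNKNOWN = "I_DONT_KNOW"  # hypothetical sentinel the prompt tells the model to emit


def parse_reply(raw: str, min_confidence: float = 0.7) -> dict:
    """Classify a model reply as ok / unknown / low_confidence / malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"status": "malformed"}
    if not isinstance(data, dict):
        return {"status": "malformed"}
    answer = data.get("answer")
    confidence = data.get("confidence")
    if answer == UNKNOWN:
        return {"status": "unknown"}
    if not isinstance(answer, str) or not isinstance(confidence, (int, float)):
        return {"status": "malformed"}
    if confidence < min_confidence:
        return {"status": "low_confidence", "answer": answer}
    return {"status": "ok", "answer": answer}
```

Each status maps to a distinct downstream action (use, fall back, escalate, retry), which is what makes the prompt's failure clauses testable.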
91
+
92
+ ### Step 6 — Treat prompts as code
93
+
94
+ - Version every prompt (file + git, not a wiki page).
95
+ - Test on a frozen evaluation set before shipping changes.
96
+ - Track P50 / P95 latency, token usage, accuracy, success rate per version.
97
+ - A/B test prompt variants behind a flag; never edit a live prompt without a rollback path.
98
+
99
+ ## Procedure: Apply to a new LLM feature
100
+
101
+ 1. **Inspect** the existing prompt (if any) and the eval set; verify a success metric exists (accuracy / consistency / latency / token cost) — refuse to design without it.
102
+ 2. Draft Level-1 prompt (Step 1) and measure on the eval set.
103
+ 3. Escalate one level at a time (Step 1) until metric is met or budget runs out.
104
+ 4. Lock the structure (Step 2), choose few-shot strategy (Step 3), decide CoT (Step 4).
105
+ 5. Add error-recovery clauses (Step 5).
106
+ 6. Commit prompt + eval results + chosen version (Step 6); cite this skill.
107
+
108
+ ## Output format
109
+
110
+ 1. Prompt-spec table: slot · content · stable-vs-volatile · cached-vs-per-call.
111
+ 2. Eval results table: prompt-version · metric · delta-vs-previous.
112
+ 3. Failure-mode list: trigger · prompt clause that handles it.
113
+
114
+ ## Gotcha
115
+
116
+ - Few-shot examples leak their style into the output: examples that include hedging produce hedging.
117
+ - "Let's think step by step" works zero-shot on capable models, fails on smaller models without exemplar reasoning traces.
118
+ - Self-consistency (N samples + vote) multiplies cost by N — only on high-stakes paths.
119
+ - Cached prompt prefixes only cache when byte-identical — a single reformat busts the cache.
120
+ - Prompt behavior drifts across model versions: accuracy can regress silently when the provider rolls out a model update, so pin the model version OR re-run the eval per release.
121
+
122
+ ## Do NOT
123
+
124
+ - Do NOT escalate to Level 4 / 5 before measuring at lower levels.
125
+ - Do NOT mix few-shot examples from different tasks; the model averages them.
126
+ - Do NOT add CoT to single-step classification — it hurts.
127
+ - Do NOT hand-edit production prompts without versioning + eval.
128
+ - Do NOT echo secrets or PII into the prompt — they end up in provider logs.
129
+
130
+ ## Auto-trigger keywords
131
+
132
+ - prompt engineering
133
+ - few-shot learning
134
+ - chain-of-thought
135
+ - system prompt design
136
+ - prompt template
137
+ - LLM prompt versioning
138
+ - prompt evaluation
139
+
140
+ ## Provenance
141
+
142
+ - Adopted from: `Microck/ordinary-claude-skills@8f5c83174f7aa683b4ddc7433150471983b93131:skills_all/prompt-engineering-patterns/SKILL.md` (MIT, © 2025 Microck) — restructured into a decision-framework shape; vendor `prompt_optimizer` Python snippets dropped (project-specific to Microck).
143
+ - Cross-linked: [`prompt-optimizer`](../prompt-optimizer/SKILL.md), [`refine-prompt`](../refine-prompt/SKILL.md), [`mcp-builder`](../mcp-builder/SKILL.md), [`async-python-patterns`](../async-python-patterns/SKILL.md).
144
+ - Provenance registry: `agents/contexts/skills-provenance.yml` (entry: `prompt-engineering-patterns`).
145
+ - Iron-Law floor: `verify-before-complete`, `skill-quality`, `non-destructive-by-default`.