npm - @ai-agent-lead/skills - Versions diffs - 1.0.0 - Mend

@ai-agent-lead/skills 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (48)

package/README.md +37 -0
package/bin/install.js +272 -0
package/package.json +34 -0
package/skills/LANGUAGE.md +72 -0
package/skills/README.md +156 -0
package/skills/SKILL-TEMPLATE.md +120 -0
package/skills/TRIGGERS.md +64 -0
package/skills/WORKFLOWS.md +369 -0
package/skills/bench/SKILL.md +40 -0
package/skills/bench/templates/benchmark-report.md +26 -0
package/skills/bootstrap/BOOTSTRAP.md +13 -0
package/skills/bootstrap/SKILL.md +47 -0
package/skills/code-hygiene/SKILL.md +92 -0
package/skills/debug/SKILL.md +122 -0
package/skills/design/DEEP-MODULES.md +76 -0
package/skills/design/FUNCTIONAL-CORE.md +121 -0
package/skills/design/ILLEGAL-STATES.md +102 -0
package/skills/design/OBSERVABILITY.md +49 -0
package/skills/design/PERSONAS.md +41 -0
package/skills/design/SKILL.md +139 -0
package/skills/design/TESTABILITY.md +84 -0
package/skills/feature-doc/SKILL.md +113 -0
package/skills/feature-doc/templates/feature-template.md +52 -0
package/skills/formats/ADR-FORMAT.md +51 -0
package/skills/formats/CONTEXT-FORMAT.md +109 -0
package/skills/formats/CONTEXT-MAP-FORMAT.md +6 -0
package/skills/grill-plan/SKILL.md +112 -0
package/skills/improve-codebase-architecture/DEEPENING.md +37 -0
package/skills/improve-codebase-architecture/INTERFACE-DESIGN.md +41 -0
package/skills/improve-codebase-architecture/SKILL.md +115 -0
package/skills/investigate/SKILL.md +97 -0
package/skills/investigate/templates/research-note.md +84 -0
package/skills/pr-review/SKILL.md +197 -0
package/skills/prod-ready/SKILL.md +88 -0
package/skills/security-review/SKILL.md +145 -0
package/skills/simplify/SKILL.md +105 -0
package/skills/sync-check/SKILL.md +69 -0
package/skills/system-design/SKILL.md +160 -0
package/skills/tdd/SKILL.md +121 -0
package/skills/tdd/TESTS.md +93 -0
package/skills/tdd-rounds/COMMITS.md +122 -0
package/skills/tdd-rounds/SKILL.md +96 -0
package/skills/tdd-rounds/templates/builder-brief.md +73 -0
package/skills/tdd-rounds/templates/builder-report.md +21 -0
package/skills/verify-real-deps/MOTIVATION.md +18 -0
package/skills/verify-real-deps/SKILL.md +118 -0
package/skills/verify-real-deps/templates/known-issues.md +45 -0
package/skills/zoom-out/SKILL.md +104 -0

package/skills/bootstrap/SKILL.md ADDED

@@ -0,0 +1,47 @@
+---
+name: bootstrap
+description: Initializes a greenfield repository. Creates the docs/ directory, the initial CONTEXT.md, and the first ADR. Triggered by phrases like "new project", "initialize", "bootstrap".
+complexity: low
+expected_duration: 10 minutes
+---
+# Bootstrap
+This skill makes starting a new project a first-class workflow, establishing the durable artifacts that other skills rely on. It initializes the `docs/` structure and core terminology.
+## Why this skill exists
+Starting from a blank slate often leads to inconsistent documentation structure. This skill enforces a canonical starting point for vocabulary and architectural decisions.
+## When to use
+- Starting a new repository or service.
+- Initializing the skills framework in an existing repository that lacks `docs/CONTEXT.md`.
+## When to skip
+- The repository already has `docs/CONTEXT.md` and an established `docs/` structure.
+## Process
+### 1. Initialize docs/
+Create the standard directory structure:
+- `docs/`
+- `docs/adr/`
+- `docs/features/`
+- `docs/research/`
+### 2. Seed CONTEXT.md
+Ask the user for 3-7 core domain terms. Create `docs/CONTEXT.md` using the canonical format.
+### 3. Record ADR-0000
+If any major architectural decisions are made during initialization, record them in `docs/adr/0000-architectural-overview.md`.
+## Done when
+- `docs/` directory exists with the required subdirectories.
+- `docs/CONTEXT.md` is seeded with core domain terms.
+- (Optional) `docs/adr/0000-architectural-overview.md` exists.
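The process above could be sketched as a pure planning step. This is a minimal sketch, assuming a hypothetical `planBootstrap` helper that is not part of the package: it only computes which directories to create and checks the 3-7 term range, and leaves the actual file creation (and the CONTEXT.md format, which is defined elsewhere) to the caller.

```ts
// Hypothetical helper, not part of @ai-agent-lead/skills: it only plans the work.
type BootstrapPlan = {
  directories: string[]; // directories to create, in order
  contextFile: string;   // where CONTEXT.md will live
  terms: string[];       // core domain terms supplied by the user
};

function planBootstrap(root: string, terms: string[]): BootstrapPlan {
  // The skill asks the user for 3-7 core domain terms.
  if (terms.length < 3 || terms.length > 7) {
    throw new Error(`expected 3-7 domain terms, got ${terms.length}`);
  }
  const docs = `${root}/docs`;
  return {
    directories: [docs, `${docs}/adr`, `${docs}/features`, `${docs}/research`],
    contextFile: `${docs}/CONTEXT.md`,
    terms,
  };
}

const plan = planBootstrap(".", ["Order", "Invoice", "Customer"]);
console.log(plan.directories.join("\n"));
```

Keeping the plan separate from the file writes means the "Done when" checklist can be checked against the plan before anything touches the disk.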

package/skills/code-hygiene/SKILL.md ADDED

@@ -0,0 +1,92 @@
+---
+name: code-hygiene
+description: Day-to-day coding discipline at the line and function level — boring code, naming as primary refactor, YAGNI, rule of 3, locality of behavior. Use when reviewing or writing code, when names feel wrong, when tempted to abstract too early, when a solution looks clever, when the simplify pass after `tdd` runs, or when the user mentions "simpler", "boring", "naming", "YAGNI", "premature abstraction", "over-engineered". Skip for module-level interface design — use `design` instead. Skip for whole-codebase architectural sweeps — use `improve-codebase-architecture`.
+complexity: low
+expected_duration: 5 minutes
+---
+# Code Hygiene
+Day-to-day discipline that keeps a codebase readable, navigable, and easy to change. Smaller in scope than `design` (which shapes module interfaces) — these are line-level and function-level habits.
+Five principles.
+1. **Boring code beats clever code** — prefer the obvious solution over the elegant trick.
+2. **Naming is the primary refactor** — a bad name misleads longer than a bad implementation.
+3. **YAGNI** — don't build for hypothetical futures.
+4. **Rule of 3 before extracting** — duplicate twice; extract on the third occurrence, not the second.
+5. **Locality of behavior** — related code lives together; don't split by category.
+## When to use
+- Writing new code, line by line — keep these in mind as you type.
+- Reviewing a PR — these are five common smell categories.
+- After `tdd` reaches green, during the [`simplify`](../simplify/SKILL.md) sweep — `code-hygiene` is the lens you apply.
+- When you read code and pause to figure out what it's doing — that pause is a smell.
+## When to skip
+- Module-level shape (interface, depth, dependencies) — use `design`.
+- Whole-codebase sweeps for shallow modules — use `improve-codebase-architecture`.
+- The horizontal-vs-vertical TDD failure mode — that's `tdd`'s territory.
+## Principle 1: Boring code beats clever code
+When there's an obvious solution and a clever one, pick the obvious. Cleverness is a tax on every reader who comes after.
+**Smell**: a one-liner using bit manipulation, regex acrobatics, or chained ternaries to do what a four-line `if` would do clearly.
+**Rule of thumb**: if reading the code feels like solving a puzzle, that's a smell — even when the puzzle has a satisfying answer. Save cleverness for places where it earns its cost (a hot loop you've actually profiled, a parser, a constraint solver).
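As an illustration of the smell, here is a small sketch with made-up shipping rules (the prices and the 10 kg threshold are assumptions chosen for the example). Both functions compute the same result; only the second can be read without untangling operator nesting.

```ts
// Clever: chained ternaries pack every rule into one expression.
const shippingClever = (kg: number, express: boolean): number =>
  kg <= 0 ? 0 : express ? (kg > 10 ? 40 : 25) : kg > 10 ? 15 : 5;

// Boring: the same rules, one decision per step.
function shippingBoring(kg: number, express: boolean): number {
  if (kg <= 0) {
    return 0;
  }
  const heavy = kg > 10;
  if (express) {
    return heavy ? 40 : 25;
  }
  return heavy ? 15 : 5;
}

console.log(shippingClever(12, true), shippingBoring(12, true));
```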
+## Principle 2: Naming is the primary refactor
+Bad code with great names is debuggable; great code with bad names misleads forever. Names live longer than implementations.
+**Smells**:
+- A variable named `data`, `result`, `tmp`, `value`, or `item` that survives more than ~5 lines.
+- A function named `process`, `handle`, `run`, or `do` whose body does one specific thing that the name fails to mention.
+- A boolean named `flag`, or a name with `Manager` / `Helper` / `Util` suffix that hides what the thing actually is.
+- A type named after the *shape* of the data (`UserData`, `OrderInfo`) instead of its *meaning* (`UnverifiedUser`, `PendingOrder`).
+- A function name that doesn't match what it does (especially: `getX` that mutates, or `isX` that returns non-boolean).
+**Fix the name first.** Even before fixing the implementation. The name is the documentation everyone reads.
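A minimal sketch of "fix the name first", assuming a hypothetical `Order` type with an optional `paidAt` field. The two functions have identical bodies; only the names differ.

```ts
// Assumed shape for illustration only.
type Order = { id: string; paidAt?: Date };

// Before: generic names force the reader to open the body.
function handle(data: Order[]): Order[] {
  return data.filter((item) => item.paidAt === undefined);
}

// After: same body, but the names carry the meaning.
function selectUnpaidOrders(orders: Order[]): Order[] {
  return orders.filter((order) => order.paidAt === undefined);
}

const orders: Order[] = [
  { id: "o-1", paidAt: new Date("2024-01-01T00:00:00Z") },
  { id: "o-2" },
];
console.log(selectUnpaidOrders(orders).map((order) => order.id));
```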
+## Principle 3: YAGNI — You Aren't Gonna Need It
+Don't build for hypothetical futures. Don't add a parameter "in case we need it later". Don't extract an interface "in case there's a second implementation". Don't write the configurable version of a thing that has one configuration.
+**Why**: hypothetical futures rarely arrive in the shape you predicted. Code written for them ages worse than code added when the need is real.
+**Exception**: when the cost of *not* designing for it later is provably much higher than the cost of designing for it now (e.g. schema migrations under load, public APIs with downstream consumers, security-sensitive surfaces). The bar is *provably* — not "I have a feeling".
+## Principle 4: Rule of 3 before extracting
+Duplicate twice; extract on the third occurrence — not the second.
+The first occurrence is unique. The second might be coincidence. The third is a pattern. Extracting at two reveals only one axis of variation; extracting at three reveals the *real* axis.
+**Why**: premature abstractions calcify. Once a wrong abstraction exists, callers shape themselves to it, and rewriting becomes expensive. Three concrete copies are cheap; one wrong abstraction is not.
+**Smell**: a helper function with one caller, or a base class with one subclass. That's an abstraction in search of a use.
+## Principle 5: Locality of behavior
+Related code lives close together. Don't split a system by *type of code* (`controllers/`, `services/`, `repositories/`) — split by *responsibility* (`orders/`, `billing/`, `auth/`).
+**Why**: a new contributor should be able to read one folder and understand one feature, not bounce across five folders to follow one request.
+**Smell**: changing one feature requires editing 5 files in 5 directories. That's a sign the structure separates *type* of code, not *responsibility*. (This is an `improve-codebase-architecture` issue at scale, but at smaller scale you can fix it inline by colocating files.)
+## Done when
+- Names communicate intent — a stranger reads them and forms the right mental model.
+- The clever shortcut is replaced with the obvious version (or its cleverness is justified by a comment naming the constraint).
+- No "in case we need it" parameters, classes, or interfaces remain.
+- Duplications either survived the 2-occurrence test (left as-is) or proved themselves at the 3rd occurrence (extracted).
+- Related code lives near related code.
+## Pairing with other skills
+- **`design`** sets module shape; `code-hygiene` polishes within the module. Different scopes; both apply.
+- **`tdd`** reaches green; `code-hygiene` is part of the simplify sweep that follows.
+- **`improve-codebase-architecture`** finds shallow modules; if the diagnosis is "shallow" but the fix is line-level (rename, inline, delete dead helper), this skill applies. If the fix is structural (deepen the module), that one does.

package/skills/debug/SKILL.md ADDED

@@ -0,0 +1,122 @@
+---
+name: debug
+description: Disciplined reproduction, isolation, and hypothesis-testing for non-trivial bugs — runs BEFORE `tdd` when the failing assertion isn't yet known. Use when the user reports a bug whose root cause is not obvious from the symptom — triggered by phrases like "it's broken", "this is failing", "intermittent", "flaky", "regression", "not sure why", "production issue", "doesn't work in <env>". Skip for typos, clear stack traces with one-step fixes, or bugs whose fix is obvious from reading the message. Pairs with `tdd` (downstream — the failing test crystallises once the bug is reproduced) and `zoom-out` (upstream, when the area is unfamiliar).
+complexity: high
+expected_duration: 45 minutes
+---
+# Debug
+The discipline of finding a root cause before writing a fix. TDD says "write a failing test"; for a non-trivial bug, you don't yet know what the failing test should assert. This skill is the step between *symptom* and *test*.
+## Why this skill exists
+Jumping to a fix without a clean reproduction risks fixing the wrong thing — or fixing the right thing for the wrong reason. Both leave the bug latent. The discipline of reproducing → isolating → hypothesis-testing produces:
+- A **minimum reproduction** the future failing test can assert on.
+- A **named root cause** distinct from the symptom — the thing that has to change.
+- A **bisected blast radius** — what else this might affect, what else might be affected by the fix.
+Without this, "fixed in production" often means "symptom no longer visible from the angle we looked at."
+## When to use
+- Bug whose root cause is not obvious from the message or stack trace.
+- Intermittent / flaky failure — passes locally, fails in CI; passes most of the time, fails sometimes.
+- Regression — worked yesterday, broken today, unclear what changed.
+- "Doesn't work in production" / "doesn't work in <env>" — environment-specific behaviour.
+- Concurrency, timing, or ordering-dependent symptoms.
+- The user says "I don't know what's wrong" — that's the trigger.
+## When to skip
+- Typo / off-by-one / null-check fixes obvious from the stack trace. Just fix it.
+- A test you wrote that fails with a clear assertion message — fix the code, the test already pinned the behaviour.
+- Bugs covered by an existing failing test — go straight to `tdd`'s green step.
+- "Bug" that's actually a feature request / unclear requirements — that's a `feature-doc` problem, not a `debug` one.
+## Phases
+Each phase is a stop. Don't start the next until the previous is grounded.
+### 1. Reproduce — find the smallest input that triggers it
+Without a reliable reproduction, every "fix" is a guess. The reproduction is the contract you're buying.
+- **Capture the symptom precisely.** What's the exact error / output / observable behaviour? What's the expected? Quote it; don't paraphrase.
+- **Capture the environment.** OS, language version, dependency versions, database version, env vars in play, time of day if relevant. Bugs hide in unstated context.
+- **Find the smallest input that triggers it.** Trim until removing one more piece makes the bug disappear. The minimum repro is the seed of the failing test.
+- **Make it reliable.** If it's intermittent, is it really 50/50, or 5%, or "only when run after test X"? Quantify or you can't tell when you've fixed it.
+If you cannot reproduce, **stop and say so.** "Can't reproduce" is a valid debug outcome that warrants better instrumentation, not a guess at a fix.
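One way to quantify an intermittent failure is to run the suspect check many times and count. The harness below is a sketch under assumptions: `measureFailureRate` and `flakyCheck` are hypothetical names, and the "flaky" check is made deterministic (it fails on every tenth call) so the example is reproducible.

```ts
type FailureRate = { runs: number; failures: number; rate: number };

// Run a check repeatedly and report how often it throws.
function measureFailureRate(check: () => void, runs: number): FailureRate {
  let failures = 0;
  for (let i = 0; i < runs; i++) {
    try {
      check();
    } catch (err) {
      failures++;
    }
  }
  return { runs, failures, rate: failures / runs };
}

// Stand-in for a flaky test: fails on every tenth call.
let calls = 0;
function flakyCheck(): void {
  calls++;
  if (calls % 10 === 0) {
    throw new Error("simulated intermittent failure");
  }
}

const flakiness = measureFailureRate(flakyCheck, 100);
console.log(`${flakiness.failures}/${flakiness.runs} runs failed`);
```

Running the same harness after a candidate fix gives a before/after number to compare, instead of "it seems better".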
+### 2. Isolate — narrow to the failing region
+Don't read the whole codebase. Bisect.
+- **`git bisect`** for regressions. Find the commit that introduced the change.
+- **Logs / tracing** — add structured logs at suspect boundaries; don't read code that hasn't been confirmed to execute.
+- **Diff your assumptions against the code.** If you believe path A executes, prove it. Print, log, breakpoint.
+- **Walk the data, not the code.** Trace one specific input through the system; see where the actual value diverges from the expected. The divergence point is the bug's region.
+The output of this phase is a **named region**: a function, a config key, a boundary between two modules. Not "somewhere in the auth code" — `validateToken` at `auth.go:142`.
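The idea behind bisecting a regression can be sketched as a binary search over an ordered history. This is not how `git bisect` is implemented; it is a minimal illustration that assumes the commits are ordered oldest to newest, the first is known good, the last is known bad, and the bug never disappears once introduced. `findFirstBad` and the commit names are hypothetical.

```ts
function findFirstBad(commits: string[], isBad: (commit: string) => boolean): string {
  let good = 0;                 // index of a commit known to be good
  let bad = commits.length - 1; // index of a commit known to be bad
  while (bad - good > 1) {
    const mid = Math.floor((good + bad) / 2);
    if (isBad(commits[mid])) {
      bad = mid;  // the regression is at mid or earlier
    } else {
      good = mid; // the regression is after mid
    }
  }
  return commits[bad];
}

const commitLog = ["a1", "b2", "c3", "d4", "e5", "f6", "g7", "h8"];
let checks = 0;
const firstBad = findFirstBad(commitLog, (commit) => {
  checks++;
  return commitLog.indexOf(commit) >= 5; // pretend the bug landed in "f6"
});
console.log(firstBad, `found after ${checks} checks`);
```

Each check halves the suspect range, which is why bisecting stays cheap even when the history between "worked" and "broken" is long.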
+### 3. Hypothesis test — one variable at a time
+For each suspected root cause, form a falsifiable hypothesis and test it.
+- **State the hypothesis.** "I think the bug is that X happens when Y." If you can't state it, you don't have one.
+- **Predict.** If the hypothesis is true, what should happen when I change Z? If it's false, what should happen?
+- **Test.** Change *one thing*. Observe.
+- **Update.** Hypothesis confirmed → proceed to fix. Falsified → form another. Don't try to confirm two hypotheses at once; you'll learn nothing from the result.
+Most "I don't know what's wrong" bugs become tractable with this discipline. Most "I tried five things" sessions skipped it.
+### 4. Name the root cause
+State the root cause in one sentence, distinct from the symptom.
+- **Symptom**: "checkout returns 500 on Tuesdays."
+- **Root cause**: "discount-rule cache TTL is 24h but the rule table is rebuilt nightly at 03:00 UTC; Tuesday-morning requests hit the stale cache because Monday's TTL hasn't expired."
+The root cause names *what has to change*. If the sentence is fuzzy, the bug isn't isolated yet — go back to phase 2.
+### 5. Hand off to TDD
+Once the root cause is named:
+- The **minimum reproduction** is the seed of the failing test (step 1 of `tdd`'s red phase).
+- The **named region** is where the fix lands.
+- The **hypothesis** describes what behaviour the fix changes.
+Run `tdd`: write a failing test that captures the reproduction, fix, refactor with the test as a safety net.
+## Optional artifact: bug research note
+For non-trivial bugs whose investigation produced real signal — bisected commits, environment-specific findings, surprising cross-module interactions — capture a research note at `docs/research/<bug-slug>.md` (use the `investigate` template's shape). The note reads as the post-mortem: what was symptom, what was root cause, why was it not caught earlier, what test would have caught it.
+Skip for bugs with one-paragraph stories. Capture for bugs with real lessons.
+## Anti-patterns
+- **Fix-then-verify.** "I think this is the issue, let me change it and see." That's hypothesis-testing without the discipline — every change becomes a confounder. Reproduce reliably first.
+- **Shotgun debugging.** Changing several things at once. If any one fixes it, you don't know which.
+- **Reading without running.** "I read the code and I think the bug is X." Reading tends to confirm a hypothesis; only running can falsify it. Run.
+- **Symptom-as-bug.** "Fixing" the visible error without naming the root cause. The fix might suppress the symptom while leaving the cause to surface elsewhere.
+- **Skipping the minimum repro.** A 1000-line repro is not a repro; it's an environment. Trim.
+- **Calling it intermittent without quantifying.** "Sometimes it fails" is not actionable. "9/100 runs in CI, 0/100 locally" is.
+## Pairing with other skills
+- **`tdd`** runs *after*. The reproduction becomes the failing test. The named region is where the fix lands.
+- **`zoom-out`** runs *before* if the area is unfamiliar. Map first, then debug — easier to bisect when you know the topology.
+- **`investigate`** runs *instead* if the "bug" turns out to be an unclear requirement (no obvious correct behaviour to assert). Don't force a debug session on what's actually a design question.
+- **`prod-ready`** Section 7's doc-map: if the bug surfaced a missed invariant, decision, or domain term, capture it on the way out (ADR / CONTEXT.md update).
+- **`verify-real-deps`** is the upstream prevention layer for wire-shape bugs against third-party APIs. If `debug` finds one of those, log it in `docs/known-issues.md` and tighten the fake.
+## Done when
+- A minimum, reliable reproduction exists.
+- The root cause is named in one sentence, distinct from the symptom.
+- The blast radius is known (what else this affects; what else could be affected by the fix).
+- Handoff to `tdd` is unambiguous: the test to write is clear from the reproduction.

package/skills/design/DEEP-MODULES.md ADDED

@@ -0,0 +1,76 @@
+# Deep Modules
+From John Ousterhout's *A Philosophy of Software Design*.
+> The principle is language-agnostic. Examples below use TypeScript for readability — translate naturally to Python (`requests`-style), Go (struct + methods), Rust (trait + impl), Kotlin / Java (interface + class), etc. The shape "small interface, deep implementation" matters; the syntax doesn't.
+## The idea
+A module's value is **functionality minus interface complexity**. Deep modules give you a lot of functionality behind a simple interface. Shallow modules give you little functionality and force the caller to deal with the complexity anyway.
+## Shallow vs deep
+```
+SHALLOW (avoid)              DEEP (prefer)
+┌───────────────────────┐    ┌──────────────┐
+│  open, read, seek,    │    │   readFile   │
+│  close, lock, unlock, │    └──────┬───────┘
+│  flush, ...           │           │
+├───────────────────────┤    ┌──────┴───────┐
+│  thin pass-through    │    │ open, read,  │
+└───────────────────────┘    │ close, retry,│
+                             │ buffer, ...  │
+                             └──────────────┘
+```
+The shallow version makes the *caller* manage file lifecycles. The deep version handles it internally and exposes one method.
+## Concrete example
+```ts
+// SHALLOW — caller does most of the work
+declare class HttpClient { // signatures only; bodies omitted for brevity
+  buildUrl(base: string, path: string, params: object): string;
+  buildHeaders(auth: string, contentType: string): object;
+  serialize(body: any): string;
+  parse(response: string): any;
+  send(method: string, url: string, headers: object, body: string): Promise<string>;
+}
+// caller code:
+const url = client.buildUrl(BASE, "/users", { id: 123 });
+const headers = client.buildHeaders(token, "application/json");
+const body = client.serialize({ name: "Alice" });
+const raw = await client.send("POST", url, headers, body);
+const result = client.parse(raw);
+```
+```ts
+// DEEP — interface hides the orchestration
+declare class HttpClient { // signature only; body omitted for brevity
+  request<T>(method: string, path: string, options?: RequestOptions): Promise<T>;
+}
+// caller code:
+const result = await client.request("POST", "/users", {
+  query: { id: 123 },
+  body: { name: "Alice" },
+});
+```
+Same functionality. The deep version moved complexity from every caller into one place.
+## Questions to ask when designing
+- Can I merge two methods into one?
+- Can a parameter become an internal detail?
+- Does the caller need this option, or am I exposing it because it was easy?
+- If I removed this method, what would break — and could the remaining methods cover it?
+## Warning signs of shallow modules
+- Methods that are one or two lines of pass-through.
+- Callers who always call methods in the same sequence (that sequence belongs inside the module).
+- Many methods with very similar names (`getUserById`, `getUserByEmail`, `getUserByName`) — consider one `getUser(query)`.
+- Configuration options that no caller actually varies.
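The `getUser(query)` suggestion above might look like the following. This is a sketch, not the package's API: `UserDirectory`, `UserQuery`, and the in-memory array are assumptions made so the example runs on its own.

```ts
type User = { id: number; email: string; name: string };

// One query type instead of three near-identical methods.
type UserQuery =
  | { by: "id"; id: number }
  | { by: "email"; email: string }
  | { by: "name"; name: string };

class UserDirectory {
  private readonly users: User[];

  constructor(users: User[]) {
    this.users = users;
  }

  getUser(query: UserQuery): User | undefined {
    switch (query.by) {
      case "id": {
        const { id } = query;
        return this.users.find((user) => user.id === id);
      }
      case "email": {
        const { email } = query;
        return this.users.find((user) => user.email === email);
      }
      case "name": {
        const { name } = query;
        return this.users.find((user) => user.name === name);
      }
    }
  }
}

const directory = new UserDirectory([
  { id: 1, email: "alice@example.com", name: "Alice" },
  { id: 2, email: "bob@example.com", name: "Bob" },
]);
console.log(directory.getUser({ by: "email", email: "bob@example.com" })?.name);
```

Adding a new lookup key extends the `UserQuery` union rather than the method list, so the interface stays one method wide.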

package/skills/design/FUNCTIONAL-CORE.md ADDED

@@ -0,0 +1,121 @@
+# Functional Core, Imperative Shell
+Coined by Gary Bernhardt ("Boundaries", 2012). Push pure logic to the center; keep side effects at the edges.
+> Language-agnostic. Examples use TypeScript; the same split works in Python (pure functions + dataclasses, side effects in the calling layer), Go (pure funcs + struct returns, side effects in the handler), Rust (`fn` returning `Result<Decision, _>`, effects in the binary's `main`-side), Kotlin (sealed classes for decisions, effects in coroutines / handlers). What matters: pure functions return values; the shell is the only place that touches the world.
+## The shape
+```
+┌─────────────────────────────────────┐
+│  Imperative shell                    │
+│  - HTTP / RPC handlers               │
+│  - DB queries                        │
+│  - File I/O                          │
+│  - Time, randomness, env             │
+│  - Network calls, queues             │
+│                                      │
+│  ┌──────────────────────────────┐   │
+│  │  Functional core              │   │
+│  │  - Pure transformations       │   │
+│  │  - Decisions                  │   │
+│  │  - Validations                │   │
+│  │  - Calculations               │   │
+│  │  - State derivations          │   │
+│  └──────────────────────────────┘   │
+└─────────────────────────────────────┘
+```
+The shell:
+- Reads inputs from the world
+- Calls the core
+- Writes outputs back to the world
+The core never touches the world directly.
+## Refactoring example
+WEAK — mixed
+```ts
+async function processOrder(orderId: string) {
+  const order = await db.orders.get(orderId);
+  const stock = await api.checkStock(order.items);
+  if (stock.allAvailable) {
+    order.status = "confirmed";
+    order.confirmedAt = new Date();
+    await db.orders.save(order);
+    await emailer.send(order.userId, "confirmed");
+    return order;
+  } else {
+    order.status = "backordered";
+    await db.orders.save(order);
+    return order;
+  }
+}
+```
+To test this you must mock `db`, `api`, `emailer`, AND the date. The test mostly verifies the mocks.
+STRONG — split
+```ts
+// Functional core — pure
+type OrderDecision =
+  | { kind: "confirm";     newStatus: "confirmed";   confirmedAt: Date }
+  | { kind: "backorder";   newStatus: "backordered" };
+function decideOrderStatus(
+  order: Order,
+  stock: StockReport,
+  now: Date,
+): OrderDecision {
+  if (stock.allAvailable) {
+    return { kind: "confirm", newStatus: "confirmed", confirmedAt: now };
+  }
+  return { kind: "backorder", newStatus: "backordered" };
+}
+// Imperative shell — thin
+async function processOrder(orderId: string) {
+  const order = await db.orders.get(orderId);
+  const stock = await api.checkStock(order.items);
+  const decision = decideOrderStatus(order, stock, new Date());
+  order.status = decision.newStatus;
+  if (decision.kind === "confirm") {
+    order.confirmedAt = decision.confirmedAt;
+  }
+  await db.orders.save(order);
+  if (decision.kind === "confirm") {
+    await emailer.send(order.userId, "confirmed");
+  }
+  return order;
+}
+```
+Now `decideOrderStatus` is pure: trivially tested with literal inputs and outputs, no mocks. The shell is short enough to verify by reading.
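To make "literal inputs and outputs, no mocks" concrete, here is a self-contained sketch of such a test. The `Order` and `StockReport` shapes are assumptions (the original does not define them), and `decideOrderStatus` is repeated only so the block runs on its own.

```ts
// Assumed minimal shapes for illustration.
type Order = { id: string; status: string };
type StockReport = { allAvailable: boolean };

type OrderDecision =
  | { kind: "confirm"; newStatus: "confirmed"; confirmedAt: Date }
  | { kind: "backorder"; newStatus: "backordered" };

function decideOrderStatus(order: Order, stock: StockReport, now: Date): OrderDecision {
  if (stock.allAvailable) {
    return { kind: "confirm", newStatus: "confirmed", confirmedAt: now };
  }
  return { kind: "backorder", newStatus: "backordered" };
}

// The "test": plain values in, plain decision out. The clock is just an argument.
const order: Order = { id: "o-1", status: "new" };
const now = new Date("2024-01-02T03:04:05Z");

const confirmed = decideOrderStatus(order, { allAvailable: true }, now);
const backordered = decideOrderStatus(order, { allAvailable: false }, now);

console.log(confirmed.kind, backordered.kind); // prints: confirm backorder
```

Because `now` is passed in, the date no longer needs to be mocked; the shell decides what "now" means.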
+## How to find your core
+Ask: *"If I removed all the awaits and the side-effecting calls, what's left?"* That residue is the candidate for the core. Lift it out as a pure function. Then ask: *"What is each side-effect call doing — fetching input, or writing output?"* Group them at the top and bottom of the shell respectively.
+A useful test: a pure core function should be safe to call ten thousand times in a tight loop with no consequences. If that's not safe, it's not pure yet.
+## Pairings with other principles
+- **With Principle 2 (testability):** the core is what your tests target. Mocks shrink from "everywhere" to "at the shell's edge only".
+- **With Principle 3 (illegal states):** the core often returns sum types (Decision, Result, Outcome) that the shell pattern-matches on. Bad transitions become uncompileable.
+- **With Principle 1 (deep modules):** the core is "deep" — many decisions hidden behind a single function call. The shell is "shallow" *by design* — its thinness is the point.
+## Limits — where this gets harder
+- **Streaming / long-running processes.** Purity is easier when inputs are bounded. For streams, lift transformations into pure operators (map, filter, fold) and keep the orchestration impure but small.
+- **Performance-critical paths.** If creating intermediate values is expensive, you may push more state into the shell. Profile first; don't preemptively sacrifice clarity.
+- **Logic that depends on intermediate API calls.** When the decision needs results from external calls partway through, the core/shell split moves *inside* each call boundary. The pattern still helps; the core just gets smaller chunks.
+## The smell that points to this principle
+If your function under test needs more mocks than there are lines of business logic, the logic is buried inside the shell. Lift it out.
+If two test cases have nearly-identical setup but assert on different decisions, the decision is a pure function in disguise. Lift it out.

package/skills/design/ILLEGAL-STATES.md ADDED

@@ -0,0 +1,102 @@
+# Make Illegal States Unrepresentable
+The single highest-leverage application of the type system: prevent bad states at compile time so you don't have to defend against them at runtime.
+The pattern: take a runtime invariant ("a verified user must have a verification timestamp") and encode it in the types. The bad state can no longer be constructed.
+## Example 1 — Optional fields that should move together
+WEAK
+```ts
+type User = {
+  email: string;
+  verifiedAt?: Date;
+  verificationToken?: string;
+};
+```
+This permits four combinations but only three are valid:
+| State | Valid? |
+|---|---|
+| email + neither | ✓ unverified |
+| email + token only | ✓ pending |
+| email + verifiedAt only | ✓ verified |
+| email + token + verifiedAt | ✗ incoherent |
+STRONG
+```ts
+type User =
+  | { kind: "unverified"; email: string }
+  | { kind: "pending";    email: string; token: string }
+  | { kind: "verified";   email: string; verifiedAt: Date };
+```
+Each state names itself. Pattern matching is exhaustive. The fourth (incoherent) state cannot compile.
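In TypeScript, exhaustiveness is usually enforced with a `never` check in the `default` branch. The sketch below assumes a hypothetical `describeUser` function; the `User` union is repeated so the block is self-contained.

```ts
type User =
  | { kind: "unverified"; email: string }
  | { kind: "pending"; email: string; token: string }
  | { kind: "verified"; email: string; verifiedAt: Date };

function describeUser(user: User): string {
  switch (user.kind) {
    case "unverified":
      return `${user.email} has not started verification`;
    case "pending":
      return `${user.email} is waiting on token ${user.token}`;
    case "verified":
      return `${user.email} verified at ${user.verifiedAt.toISOString()}`;
    default: {
      // If a new variant is added to User and not handled above,
      // `user` is no longer `never` here and this assignment fails to compile.
      const unreachable: never = user;
      return unreachable;
    }
  }
}

console.log(describeUser({ kind: "pending", email: "a@example.com", token: "t1" }));
```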
+## Example 2 — Phantom types for pipeline stages
+When the same data flows through stages (e.g. `RawInput` → `Validated` → `Sanitized`), encode the stage in the type:
+```ts
+type Validated<T> = T & { _validated: true };
+type Sanitized<T> = T & { _sanitized: true };
+function validate(input: RawInput): Validated<RawInput> { ... }
+function sanitize(input: Validated<RawInput>): Sanitized<RawInput> { ... }
+function persist(input: Sanitized<RawInput>): void { ... }
+```
+`persist` cannot be called with raw input. The compiler enforces the order — no runtime guard needed.
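A runnable version of the pipeline might look like this. It is a sketch under assumptions: the `RawInput` shape, the validation rule, and the in-memory `stored` array are invented for the example, and the brand is set as a real property for simplicity (a type assertion would keep it purely compile-time).

```ts
// Assumed shape; the original leaves RawInput and the function bodies elided.
type RawInput = { name: string };
type Validated<T> = T & { _validated: true };
type Sanitized<T> = T & { _sanitized: true };

function validate(input: RawInput): Validated<RawInput> {
  if (input.name.trim() === "") {
    throw new Error("validate: name must not be empty");
  }
  return { ...input, _validated: true as const };
}

function sanitize(input: Validated<RawInput>): Sanitized<RawInput> {
  return { name: input.name.trim(), _sanitized: true as const };
}

const stored: string[] = []; // stand-in for a database
function persist(input: Sanitized<RawInput>): void {
  stored.push(input.name);
}

persist(sanitize(validate({ name: "  Alice " })));

// Both of these are rejected by the compiler:
// persist({ name: "Alice" });   // missing _sanitized
// sanitize({ name: "Alice" });  // missing _validated
```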
+## Example 3 — Non-empty collections
+```ts
+// WEAK — runtime check forever, easy to forget
+function firstUser(users: User[]): User {
+  if (users.length === 0) throw new Error("empty");
+  return users[0];
+}
+// STRONG — the type enforces non-emptiness
+type NonEmpty<T> = [T, ...T[]];
+function firstUser(users: NonEmpty<User>): User {
+  return users[0]; // type-safe, no check needed
+}
+```
+Callers can't pass an empty array — it doesn't satisfy the type.
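Data arriving from outside (a query result, a parsed request) is still a plain array, so the emptiness check has to happen once at the boundary. A type guard is one way to do that; `isNonEmpty` and `first` below are hypothetical helpers, not part of the original.

```ts
type NonEmpty<T> = [T, ...T[]];

// Type guard: the single place where the emptiness check lives.
function isNonEmpty<T>(items: T[]): items is NonEmpty<T> {
  return items.length > 0;
}

function first<T>(items: NonEmpty<T>): T {
  return items[0];
}

const fromDb: string[] = ["alice", "bob"];
if (isNonEmpty(fromDb)) {
  // Inside this branch the compiler knows fromDb has at least one element.
  console.log(first(fromDb)); // prints: alice
}
```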
+## When the language can't fully encode
+In languages without sum types (older Java, plain JS):
+- **Use enum + private constructor + factory methods.** Each factory returns a value that's already in a valid state.
+- **Use builders that only expose `build()` once required fields are set.** The build method's signature changes as fields are populated (in TypeScript, this is doable via fluent builders + conditional types).
+- **Document the invariant in one place, validate at construction, never internally.** The closer construction lives to the type, the fewer scattered checks survive.
+The principle still holds: the closer the invariant lives to the type system, the fewer bugs survive.
+## Limits — when not to encode
+- **Invariants that change frequently.** If the rule is in flux, encoding it ossifies it. Use runtime validation until the rule stabilises, then encode.
+- **Invariants that span systems.** If the rule lives across multiple services, types in one service can't enforce it. Validate at the boundary.
+- **Invariants the type system can't express.** *"An order's total equals the sum of its line items"* — most type systems can't encode this. Validate at construction; treat the constructor as a quarantine.
+## The smell that points to this principle
+If you find yourself writing comments like:
+```
+// invariant: if status === "verified", verifiedAt must be set
+```
+That's the type system asking to be used. The comment will rot. The type won't.
+If you write defensive checks like:
+```
+if (!user.verifiedAt) throw new Error("user must be verified");
+```
+…and that check appears in more than one place, the type can carry that obligation instead. Lift the invariant.

package/skills/design/OBSERVABILITY.md ADDED

@@ -0,0 +1,49 @@
+# Observability by Design
+Principles for ensuring modules are transparent, debuggable, and "production-ready" at the architectural level.
+## The Principle: Deep Observability
+A **Deep Module** should hide its implementation complexity but **expose its operational health**. Observability is not a "sidecar" added later; it is a primary concern of the interface design.
+## 1. The Telemetry Port
+Every deep module should have a way to emit telemetry (metrics, logs, traces) without depending on a specific infrastructure provider.
+- **The Port**: The module defines a `Telemetry` interface (or a set of "Probes") within its own package.
+- **The Dependency**: Telemetry is a **required dependency** of the module, injected at instantiation.
+- **The Adapter**: The Imperative Shell (infrastructure layer) implements the interface and injects the concrete provider (e.g., Datadog, Prometheus, or a structured logger).
+- **The Benefit**: The core logic remains pure and testable; the operations team gets the data they need without the core knowing how it's collected.
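The port, dependency, and adapter roles described above could look roughly like this. The `Telemetry` method names, the `Cart` module, and the `RecordingTelemetry` adapter are all assumptions made for illustration; a production adapter would forward to whichever provider the shell chooses.

```typescript
// The port: owned by the module and phrased in the module's own terms (hypothetical shape).
interface Telemetry {
  count(metric: string, value?: number): void;
  log(message: string, fields?: Record<string, unknown>): void;
}

// The deep module: telemetry is a required dependency, injected at instantiation.
class Cart {
  private readonly telemetry: Telemetry;

  constructor(telemetry: Telemetry) {
    this.telemetry = telemetry;
  }

  totalCents(pricesCents: readonly number[]): number {
    const total = pricesCents.reduce((sum, price) => sum + price, 0);
    this.telemetry.count("cart.total.calls");
    this.telemetry.log("cart total computed", {
      items: pricesCents.length,
      totalCents: total,
    });
    return total;
  }
}

// An adapter the shell could inject. This one only records in memory,
// which is also what makes the module easy to assert against in tests.
class RecordingTelemetry implements Telemetry {
  readonly counters = new Map<string, number>();
  readonly logs: Array<{ message: string; fields?: Record<string, unknown> }> = [];

  count(metric: string, value: number = 1): void {
    this.counters.set(metric, (this.counters.get(metric) ?? 0) + value);
  }

  log(message: string, fields?: Record<string, unknown>): void {
    this.logs.push({ message, fields });
  }
}
```

`Cart` never imports a provider SDK; swapping the recording adapter for a real one is a change in the shell only.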
+## 2. No Silent Failures
+If a module cannot satisfy its contract, it must fail loudly and descriptively.
+- **Descriptive Errors**: Errors must name the failing operation and the specific input that caused it.
+- **Contextual Wrapping**: As errors move from the core to the shell, wrap them with context (e.g., `"failed to process order: <reason>"`).
+- **Internal Health Probes**: For long-lived modules, provide a `Health()` check that verifies internal invariants or critical dependencies.
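A small sketch of the first two bullets, under the assumption of a hypothetical `parseQuantity` core function and a `processOrder` shell function. The health probe is left out here because its shape depends heavily on the module.

```typescript
// Core: the error names the failing operation and the specific input.
function parseQuantity(raw: string): number {
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(
      `parseQuantity: expected a positive integer, got ${JSON.stringify(raw)}`,
    );
  }
  return value;
}

// Shell: add context on the way out without discarding the original reason.
function processOrder(orderId: string, rawQuantity: string): number {
  try {
    return parseQuantity(rawQuantity);
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    throw new Error(`failed to process order ${orderId}: ${reason}`);
  }
}
```

The resulting message reads from the outside in: which order failed, which operation failed inside it, and what input caused it.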
+## 3. The Traceable Path
+In asynchronous or distributed flows, ensure the module preserves and propagates **Correlation IDs**.
+- Every entry point should accept a context/correlation carrier.
+- Every internal log line should include the ID.
+- This allows a single user request to be traced through multiple deep modules.
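The three bullets above might be sketched as follows. `RequestContext`, `LogSink`, and the order functions are hypothetical; real systems often take the carrier from a framework or tracing library instead of defining their own.

```typescript
// Hypothetical carrier passed explicitly through every entry point.
interface RequestContext {
  readonly correlationId: string;
}

// Where log lines go; injected so the flow stays easy to test.
type LogSink = (line: string) => void;

// Every log line goes through this helper, so the ID cannot be forgotten.
function logWith(ctx: RequestContext, sink: LogSink, message: string): void {
  sink(`[${ctx.correlationId}] ${message}`);
}

function reserveStock(ctx: RequestContext, sink: LogSink, sku: string): void {
  logWith(ctx, sink, `reserving stock for ${sku}`);
}

function chargeCard(ctx: RequestContext, sink: LogSink, amountCents: number): void {
  logWith(ctx, sink, `charging ${amountCents} cents`);
}

// Entry point: accepts the carrier and hands it to each module it calls.
function placeOrder(
  ctx: RequestContext,
  sink: LogSink,
  sku: string,
  amountCents: number,
): void {
  logWith(ctx, sink, "placing order");
  reserveStock(ctx, sink, sku);
  chargeCard(ctx, sink, amountCents);
}
```

Filtering the collected lines by one correlation ID then yields the full path of a single request across all three functions.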
+## 4. Performance Transparency
+Expose the "Boring" metrics that matter:
+- **Latency**: How long the deep implementation takes.
+- **Throughput**: How many requests are being handled.
+- **Error Rate**: Percentage of calls that return a failure.
+- **Saturation**: How close the module is to its internal limits (e.g., buffer sizes, connection pools).
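As a minimal sketch, the four metrics can be derived from a handful of counters. `ModuleStats` and its method names are hypothetical. Throughput is exposed here as a raw call count, on the assumption that the collector turns it into a rate over time.

```typescript
interface StatsSnapshot {
  readonly calls: number; // raw count; a collector derives throughput from it
  readonly errorRate: number; // failed calls / total calls, in the range 0..1
  readonly meanLatencyMs: number;
  readonly saturation: number; // in-flight calls / configured limit, in the range 0..1
}

class ModuleStats {
  private calls = 0;
  private errors = 0;
  private totalLatencyMs = 0;
  private inFlight = 0;
  private readonly maxInFlight: number;

  constructor(maxInFlight: number) {
    this.maxInFlight = maxInFlight;
  }

  // Call when work begins.
  start(): void {
    this.inFlight += 1;
  }

  // Call when work ends, whether it succeeded or failed.
  finish(latencyMs: number, failed: boolean): void {
    this.inFlight -= 1;
    this.calls += 1;
    this.totalLatencyMs += latencyMs;
    if (failed) {
      this.errors += 1;
    }
  }

  snapshot(): StatsSnapshot {
    const hasCalls = this.calls > 0;
    return {
      calls: this.calls,
      errorRate: hasCalls ? this.errors / this.calls : 0,
      meanLatencyMs: hasCalls ? this.totalLatencyMs / this.calls : 0,
      saturation: this.inFlight / this.maxInFlight,
    };
  }
}
```

A mean is the simplest latency summary; percentiles are usually more informative but need a histogram, which is left out of this sketch.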
+## Integration with Simplify
+During the `simplify` pass, apply the **Telemetry Lens**:
+- [ ] Does every error message name the failing input?
+- [ ] Are there any "catch-all" blocks that swallow errors?
+- [ ] Is there a Correlation ID being propagated if the flow is non-trivial?
+- [ ] Could a stranger debug a failure in this code using *only* the logs it emits?

package/skills/design/PERSONAS.md ADDED

@@ -0,0 +1,41 @@
+# Design Personas
+When designing a new system or deepening a module, use these parallel sub-agent personas to explore the design space. They apply the "Design It Twice" principle: your first idea is rarely your best.
+These personas are used by `system-design`, `improve-codebase-architecture`, and `investigate`.
+## The Personas
+### 1. The Minimalist
+- **Goal**: Minimize the interface surface area.
+- **Strategy**: 1–3 entry points max. Hide everything else. If a feature can be accomplished by combining existing primitives, don't add a new one.
+- **Metric**: High **Leverage** (Functionality / Interface size).
+### 2. The Extensible (The Architect)
+- **Goal**: Support many use cases and future growth.
+- **Strategy**: Use hooks, providers, or plugin-style interfaces. Focus on the "Seam" where behavior can be altered without editing the module.
+- **Metric**: High **Flexibility** (Ease of change without breaking the contract).
+### 3. The Ergonomic (The Developer Advocate)
+- **Goal**: Make the most common caller's life trivial.
+- **Strategy**: Design for the "Happy Path." Provide high-level defaults and "Convention over Configuration."
+- **Metric**: Low **Cognitive Load** (Time-to-first-successful-call).
+### 4. The Hardened (The Security/Robustness Expert)
+- **Goal**: Prevent abuse and ensure reliability.
+- **Strategy**: Focus on "Illegal States Unrepresentable" and strict trust boundaries. Explicit timeouts, retries, and validation at every entry point.
+- **Metric**: High **Resilience** (Failure-resistance and security posture).
+### 5. The Observability-First (The SRE)
+- **Goal**: Ensure the module's internal state is transparent.
+- **Strategy**: Design with built-in telemetry, probes, and structured error propagation. No "Silent Failures."
+- **Metric**: High **Debuggability** (Time-to-root-cause during incidents).
+## Usage Pattern
+When exploring an interface:
+1. **Frame the problem space**: State the constraints and dependency categories.
+2. **Dispatch Personas**: Assign 2–3 of these personas to separate "Parallel Brains" (or sub-agent invocations).
+3. **Compare**: Contrast the results by **Depth**, **Locality**, and **Seam placement**.
+4. **Hybridize**: Pick the strongest elements to form the final recommendation.