npm - @ai-agent-lead/skills - Versions diffs - 1.0.0 - Mend

@ai-agent-lead/skills 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (48) hide show

package/README.md +37 -0
package/bin/install.js +272 -0
package/package.json +34 -0
package/skills/LANGUAGE.md +72 -0
package/skills/README.md +156 -0
package/skills/SKILL-TEMPLATE.md +120 -0
package/skills/TRIGGERS.md +64 -0
package/skills/WORKFLOWS.md +369 -0
package/skills/bench/SKILL.md +40 -0
package/skills/bench/templates/benchmark-report.md +26 -0
package/skills/bootstrap/BOOTSTRAP.md +13 -0
package/skills/bootstrap/SKILL.md +47 -0
package/skills/code-hygiene/SKILL.md +92 -0
package/skills/debug/SKILL.md +122 -0
package/skills/design/DEEP-MODULES.md +76 -0
package/skills/design/FUNCTIONAL-CORE.md +121 -0
package/skills/design/ILLEGAL-STATES.md +102 -0
package/skills/design/OBSERVABILITY.md +49 -0
package/skills/design/PERSONAS.md +41 -0
package/skills/design/SKILL.md +139 -0
package/skills/design/TESTABILITY.md +84 -0
package/skills/feature-doc/SKILL.md +113 -0
package/skills/feature-doc/templates/feature-template.md +52 -0
package/skills/formats/ADR-FORMAT.md +51 -0
package/skills/formats/CONTEXT-FORMAT.md +109 -0
package/skills/formats/CONTEXT-MAP-FORMAT.md +6 -0
package/skills/grill-plan/SKILL.md +112 -0
package/skills/improve-codebase-architecture/DEEPENING.md +37 -0
package/skills/improve-codebase-architecture/INTERFACE-DESIGN.md +41 -0
package/skills/improve-codebase-architecture/SKILL.md +115 -0
package/skills/investigate/SKILL.md +97 -0
package/skills/investigate/templates/research-note.md +84 -0
package/skills/pr-review/SKILL.md +197 -0
package/skills/prod-ready/SKILL.md +88 -0
package/skills/security-review/SKILL.md +145 -0
package/skills/simplify/SKILL.md +105 -0
package/skills/sync-check/SKILL.md +69 -0
package/skills/system-design/SKILL.md +160 -0
package/skills/tdd/SKILL.md +121 -0
package/skills/tdd/TESTS.md +93 -0
package/skills/tdd-rounds/COMMITS.md +122 -0
package/skills/tdd-rounds/SKILL.md +96 -0
package/skills/tdd-rounds/templates/builder-brief.md +73 -0
package/skills/tdd-rounds/templates/builder-report.md +21 -0
package/skills/verify-real-deps/MOTIVATION.md +18 -0
package/skills/verify-real-deps/SKILL.md +118 -0
package/skills/verify-real-deps/templates/known-issues.md +45 -0
package/skills/zoom-out/SKILL.md +104 -0

package/skills/prod-ready/SKILL.md ADDED Viewed

@@ -0,0 +1,88 @@
+---
+name: prod-ready
+description: Pre-merge production-readiness checklist — operational, infrastructure, and consistency checks that tests alone don't surface. Use after `tdd` reaches green; before opening a PR or merging to main; after significant infra changes (new DB, new deployment target, new auth flow); or when the user mentions "shipping", "ready to merge", "before deploy", "production readiness", or "prod-ready". Pairs with the `tdd` skill — tdd proves the feature works; this catches what tests can't see (server timeouts, DB pragmas, error-response consistency, secrets at rest).
+complexity: low
+expected_duration: 10 minutes
+---
+# Prod-Readiness Checklist
+A short pre-merge gate. Tests prove the **feature** works. This skill catches what tests don't see: ops defaults, infra config, security defense-in-depth, and cross-handler consistency.
+> The checklist is **stack-agnostic in concept**, even when an item names a Go/Node-style knob. Translate to your stack: "explicit read/write/idle timeouts" applies to any HTTP framework (Spring's `server.tomcat.connection-timeout`, FastAPI's Uvicorn `--timeout-keep-alive`, Rails' Puma `worker_timeout`, etc.). "Default isolation" applies to every database engine (Postgres `default_transaction_isolation`, MySQL `transaction-isolation`, SQLite `PRAGMA`, etc.). When an item doesn't apply to your stack at all (e.g. no HTTP surface), write `n/a` with one line of why. Don't tick boxes you didn't actually verify.
+## When to use
+- After all acceptance criteria are green.
+- Before opening a PR or merging to main.
+- After significant infrastructure changes (new DB, new deployment target, new auth flow).
+## Checklist
+Walk each section. An item is OK to fail **only if** the feature doc's Notes / Non-Goals explicitly accepts the gap.
+### 1. Server hygiene (HTTP / RPC services)
+- [ ] Server has explicit read / write / idle timeouts — not framework defaults. Default-zero (or framework "demo" defaults) are a common production-killer; equivalent traps exist in every stack.
+- [ ] Termination signal (SIGTERM, container stop, etc.) triggers graceful shutdown; in-flight requests finish before the process exits.
+- [ ] Unhandled errors in a handler don't crash the process — top-level recovery + structured log of the failure.
+- [ ] Request payload size is bounded at the boundary — a client streaming an unbounded body shouldn't OOM the process.
+### 2. Database / storage
+- [ ] Referential integrity enforced — FK constraints on, or an equivalent invariant maintained explicitly with a comment naming where it lives.
+- [ ] Concurrency / isolation mode set deliberately, not left on the engine's default. Default isolation often allows write-write races your tests didn't see.
+- [ ] Indexes match the actual filter + sort shape of hot queries — read the slow-query log or `EXPLAIN` the top endpoints; don't guess.
+- [ ] Migrations are forward-only and idempotent — safe to re-run after a partial deploy.
+### 3. Auth / security defense-in-depth
+- [ ] Tokens (sign-in, session, API keys) are hashed at rest — a DB leak shouldn't grant live sessions.
+- [ ] Inputs are validated *and normalized* at the boundary (email: lowercase + trim, etc.).
+- [ ] Secrets come from env / secret store, not flags, not files in repo.
+- [ ] No PII or tokens in logs or error response bodies.
+### 4. API consistency
+- [ ] One code path for error responses — don't mix raw transport errors with a JSON helper, or two response envelopes within the same surface.
+- [ ] Return shapes are consistent within a module / package — pick one error-propagation pattern and stick to it.
+- [ ] Errors are wrapped with context as they propagate, so production logs name the failing operation, not just the leaf error.
+### 5. Observability
+- [ ] One structured log line per request (method, path, status, latency).
+- [ ] Health endpoint that fails when a critical dependency (DB) is unreachable.
+### 6. Deferred-by-design
+- [ ] Items in the feature doc's "Non-Goals" / "Known production gaps" still appear there — no silent regressions.
+- [ ] New deferrals introduced this feature are linked to a tracking issue or follow-up doc.
+### 7. Documentation (the doc-map)
+Implementation lands → docs drift. The natural moment to catch drift is now, not "next sprint". One question per doc type: *did this work change X? Then update Y.* Files don't need to pre-exist — create `docs/adr/`, `CONTEXT.md`, design notes lazily when the first relevant change appears.
+- [ ] **New decision with viable alternatives** → ADR exists in `docs/adr/`, names what it supersedes (if anything), and is referenced from code where the decision is load-bearing.
+- [ ] **New or changed domain term** → `CONTEXT.md` entry created or updated. Includes `_Avoid_:` aliases if the term is at risk of being confused with an existing one.
+- [ ] **New/removed package, changed public interface, or shifted module boundary** → the feature's design note (e.g., `docs/features/<feature>.design.md`) updated. Specifically: Module map, File layout, public-interface signatures, Test boundaries.
+- [ ] **Changed acceptance criteria** → the feature doc reflects what was actually built. Silently-dropped or silently-added behavior is the most common drift class — fix here, don't kick to a follow-up.
+If a doc type isn't relevant to this work, write "n/a" — explicit beats implicit.
+## When to skip
+- Pure docs / comment / test-only changes.
+- Dependency bumps that don't change runtime behavior.
+- One-line business-logic bug fixes that don't touch infra, auth, or error paths.
+## Pairing with other skills
+- **[`tdd`](../tdd/SKILL.md)** — runs *before*. `prod-ready` runs after all ACs are green.
+- **[`simplify`](../simplify/SKILL.md)** — runs *between*. Sweep before the prod-ready gate.
+- **[`security-review`](../security-review/SKILL.md)** — runs *alongside* when the change is surface-changing. Section 3 here is the always-on minimum; `security-review` is the heavier threat-model pass.
+- **[`pr-review`](../pr-review/SKILL.md)** — the *reviewer's* mirror of this skill. The reviewer verifies your `prod-ready` pass actually landed (timeouts, migrations, structured logs, doc-map). A `prod-ready` checked off but not reflected in the diff is a blocker for them.
+- **[`verify-real-deps`](../verify-real-deps/SKILL.md)** — runs *after* this for releases that touch third-party APIs. Tag only after that's clean.
+## Done when
+- Every section walked; each item is `✓`, `✗ + remediation`, or `n/a + reason`.
+- Section 7 (doc-map) checks have been honestly answered — silent doc-drift is the most common regression class.
+- The `Non-Goals` / `Known production gaps` items in the feature doc are unchanged (no silent regressions).
+## Handoff
+This is the follow-on to `tdd`. When tdd is green, run this; when this passes, open the PR.

package/skills/security-review/SKILL.md ADDED Viewed

@@ -0,0 +1,145 @@
+---
+name: security-review
+description: Threat-model + control-review pass for surface-changing work — runs alongside `tdd` and as a heavier gate than `prod-ready` Section 3 when the change introduces or alters trust boundaries, authentication, authorization, sensitive data flows, or external surfaces. Use when the user mentions "security review", "threat model", "STRIDE", "auth flow", "permissions", "secrets", "PII", "public API", "external surface", "abuse", "hardening", or whenever a feature doc / PR adds a new entry point, identity flow, or sensitive-data path. Skip for pure-internal refactors with no surface change, dependency bumps that don't change runtime behavior, or doc-only changes. Pairs with `feature-doc` (informs the threat model) and `prod-ready` (which has the lighter operational-security checklist for every change).
+complexity: high
+expected_duration: 30 minutes
+---
+# Security Review
+A focused pass that maps the change's **trust boundaries**, enumerates **plausible threats**, and verifies the **controls** that prevent them. Heavier than `prod-ready` Section 3 (which is the always-on operational-security checklist); reserved for changes that actually move the security surface.
+## Why this skill exists
+Most security bugs don't come from missing crypto — they come from **missed trust boundaries**: a "trusted" input that a stranger can shape, an internal RPC reachable from outside, an admin endpoint without an authz check, a token logged at INFO. `prod-ready`'s 4-bullet defense-in-depth section catches generic mistakes; it does not catch "this new feature's data flow has a hole."
+This skill exists to make the threat-modeling step explicit when the surface changes — instead of hoping someone notices in PR review.
+## When to use
+A change is **surface-changing** (and this skill should fire) when *any* of these hold:
+- New external entry point (HTTP route, gRPC method, queue consumer, file/upload handler, public CLI flag).
+- New or changed **identity / session / token** flow (sign-in, OAuth, API key issuance, password reset, MFA).
+- New or changed **authorization** logic (role, permission, ownership check, multi-tenancy scope).
+- New or changed flow handling **sensitive data** (PII, payment, auth credentials, health, location, anything covered by a regulation or policy).
+- New **external dependency** that receives user-influenced data or that the system trusts (third-party callback, webhook, OAuth provider, queue origin).
+- A change to **how secrets are loaded, stored, or rotated**.
+- The user explicitly asks for a security review.
+## When to skip
+- Pure-internal refactor with no surface change (`improve-codebase-architecture` style).
+- Doc / comment / test-only changes.
+- Dependency bumps that don't change runtime behavior (still: check the changelog for security advisories).
+- Bug fix on an existing path that doesn't change the trust boundaries (`prod-ready` Section 3 is enough).
+## Phases
+### 0. Dependency Audit
+Before reviewing application logic, audit the supply chain. New or changed external dependencies are high-risk entry points.
+- **Scan Transitive Deps**: Run the ecosystem's audit tool (e.g., `npm audit`, `cargo audit`, `govulncheck`).
+- **License Compliance**: Ensure the license of any new dependency aligns with project policy.
+- **Vulnerability Check**: Check for known CVEs in the specific version being added.
+### 1. Map the surface
+Before reviewing controls, draw what's actually exposed. Most security holes hide in surfaces nobody listed.
+- **Entry points**: every external caller of the changed code path. HTTP routes, queue consumers, scheduled jobs, public CLI flags, library exports if this is a published package.
+- **Trust zones**: who is on the other side of each entry point? Anonymous internet, authenticated user (which role?), internal service, ops operator, the system itself.
+- **Data flows**: for each entry point, trace the data — what's stored, what's logged, what's forwarded to a third party, what's reflected back to the caller.
+- **Trust boundaries**: where data crosses from a less-trusted zone to a more-trusted one. Every boundary is a place input must be validated, sanitised, or authorised.
+Output is a short artifact (3–10 lines is fine for most changes) — usually appended to the feature doc as a `## Security surface` section, or, for high-stakes changes, a dedicated `docs/security/<feature>.md`.
+### 2. Enumerate threats
+Walk the surface and ask **what could go wrong on purpose**. The lightweight prompt is **STRIDE** (one or two threats per category is plenty for most reviews):
+| Category | Question |
+|---|---|
+| **S**poofing | Can someone pretend to be another user / service / origin? |
+| **T**ampering | Can data be modified in transit, at rest, or via a confused-deputy path? |
+| **R**epudiation | Can an actor deny having done something we need to attribute? (audit log holes) |
+| **I**nformation disclosure | Can data leak to a caller who shouldn't see it? Through responses, logs, error messages, timing? |
+| **D**enial of service | Can a single caller exhaust shared resources (CPU, memory, DB connections, third-party quota)? |
+| **E**levation of privilege | Can a caller cause the system to act with higher privilege than they have? |
+For small reviews, use a shorter prompt: **"Who could abuse this, and how?"** and **"What does the answer cost us?"**
+Output: a short list. Each threat is one line, plus its **likelihood** (plausible / requires-insider / theoretical) and **impact** (account-level / tenant-level / system-level).
+### 3. Verify controls
+For each plausible threat, name the **control** that prevents it (or detects it, or limits its blast radius). Controls fall in five buckets:
+1. **Authentication** — who is calling. The session / token / mTLS.
+2. **Authorization** — what they're allowed to do. The check, performed at the trust boundary, on every request — not just at login.
+3. **Input handling** — validation (shape) AND normalization (canonical form) at the boundary. Reject what you don't understand; don't silently coerce.
+4. **Output handling** — encoding for the destination context (HTML, SQL, shell, log line, error response). Never reflect untrusted input into a privileged context.
+5. **Secrets & data at rest** — secrets from env / vault, never from flags or repo. Tokens hashed at rest. PII minimised in logs and error bodies.
+For each control, **verify**: read the code that enforces it, or the test that pins it, or the config that activates it. "I assume framework X handles this" is not verification — confirm.
+### 4. Defense in depth — at least two layers for high-impact threats
+For threats with system-level or tenant-level impact, name **two independent controls** when feasible. A single control's failure mode shouldn't be a system-wide compromise.
+- Authn at the gateway *and* re-checked at the service.
+- Per-tenant scoping in the query *and* a row-level check after the query returns.
+- Rate limit at the edge *and* a quota in the data layer.
+Don't manufacture layers for low-impact threats — defense in depth is for things that hurt when they fail.
+### 5. Document deferred risks
+Anything not closed is captured. Two acceptable outcomes:
+- **Closed**: control exists and is verified. Note the file:line of the check or the test.
+- **Deferred**: risk is acknowledged, accepted, and tracked. Goes into the feature doc's `Non-Goals` (with the security framing) or into a follow-up tracking issue. Silent deferral is the failure mode.
+## Output
+Most reviews produce a **`## Security review` section appended to the feature doc**:
+```md
+## Security review
+**Surface**: <one-line description — entry points + trust zones touched>
+**Threats considered**:
+- <one-liner> — likelihood / impact — control: <where it lives, file:line>
+- ...
+**Deferred**:
+- <risk> — accepted because <reason>; tracked at <link or follow-up>
+```
+For high-stakes changes (new auth flow, new external surface, regulated data), promote to `docs/security/<feature>.md` with a fuller threat-model section. Same shape, just longer.
+## Anti-patterns
+- **Crypto cargo-cult.** "We added AES-256" doesn't address authz, validation, or trust boundaries. Crypto is one bucket of five; most bugs are elsewhere.
+- **Reviewing the diff, not the surface.** A diff might not touch the authz check that's now load-bearing for the new entry point. Walk the surface, then map back to the diff.
+- **STRIDE-by-rote.** Filling in all six categories with strawmen wastes everyone's time. Better one real threat per relevant category than six theoretical ones.
+- **"Framework handles it."** Maybe. Verify the framework is actually invoked on this code path with the configuration you assume. Defaults change between versions.
+- **Conflating security with prod-ready.** `prod-ready` Section 3 is operational security defaults (timeouts, hashing, secrets-from-env). `security-review` is the threat-model pass for surface-changing work. Run both when the surface changes.
+## Pairing with other skills
+- **`feature-doc`** captures the change being reviewed. The Security review section attaches there.
+- **`prod-ready`** Section 3 (auth/security defense-in-depth) is the always-on operational checklist. Run both — they don't substitute.
+- **`grill-plan`** can interleave for high-stakes auth / authz topology. If a load-bearing decision needs an ADR, route through `grill-plan`.
+- **`debug`** for security incidents post-deploy. The reproduction discipline applies; the root cause framing applies double.
+- **`pr-review`** flags a security-review as required when a PR's diff touches a surface-changing path.
+## Done when
+- The security surface (entry points + trust zones + data flows) is named, not assumed.
+- Each plausible threat has a verified control or an explicit deferral with rationale.
+- High-impact threats have two independent layers of control where feasible.
+- The artifact (feature-doc section or `docs/security/<feature>.md`) is in the repo.
+- The PR description references the review so the reviewer can audit it.

package/skills/simplify/SKILL.md ADDED Viewed

@@ -0,0 +1,105 @@
+---
+name: simplify
+description: Single end-of-round sweep that tightens what `tdd` just left green — review every changed file for reuse, quality, efficiency, and test relevance. Use after `tdd` reaches green and before opening a PR (or before a Builder closes a round in `tdd-rounds`). Triggered by phrases like "simplify pass", "tighten this", "clean up before commit", "end-of-round sweep", or appearing as a step in a Builder brief. Skip for trivial diffs (typo, dep bump, doc-only). Pairs with `tdd` (runs immediately after green), `code-hygiene` (the lens applied during the sweep), and `pr-review` (a self-check after this).
+complexity: low
+expected_duration: 10 minutes
+---
+# Simplify
+A single sweep over every file changed in the current slice (or round) to remove what doesn't earn its keep, before the diff is reviewed by anyone else.
+This is **one pass, not a license to refactor**. If you find structural issues that need their own round, surface them rather than expanding the sweep.
+## Why this skill exists
+`tdd` reaches green on the *minimum code that satisfies the test*. That's the right discipline — but the residue often includes:
+- Helpers extracted at one caller (premature abstraction).
+- Names that read OK in isolation but mislead next to the AC.
+- Tests that pass without actually exercising the AC (false-green).
+- Dead branches left behind from a path you tried and abandoned.
+The simplify pass catches these once, deliberately, before the diff lands. Without it, the residue compounds round-over-round.
+## When to use
+- After `tdd` reaches green, before opening a PR.
+- At the end of every round in `tdd-rounds` (Builder responsibility — a separate commit per [`tdd-rounds/COMMITS.md`](../tdd-rounds/COMMITS.md) rule 4).
+- After a focused refactor when you want a final sweep before merging.
+## When to skip
+- Typo / dep bump / lint-only / doc-only diffs.
+- Slices where the diff is one or two lines and there's nothing to sweep.
+- During red phase — never refactor while red. See [`tdd/SKILL.md`](../tdd/SKILL.md).
+## The four lenses
+Walk every changed file. Apply each lens in order. Fix what you find inline.
+### 1. Reuse — extract duplication that crossed file boundaries
+- Did a helper get inlined in two places this slice? Extract it (Rule of 3 may already apply if the third occurrence existed before this round).
+- Was a constant repeated? Hoist it.
+- **Don't extract speculatively.** A helper with one caller is an abstraction in search of a use.
+### 2. Quality — names, errors, abstractions
+- Names that read clearly out of context — would a stranger guess what `result`, `data`, `value` referred to? If not, rename.
+- Error messages that name the failing input — `"could not parse: <value>"` beats `"parse error"`.
+- Abstractions that haven't earned their keep — a base class with one subclass, an interface with one implementation. Inline.
+- Comments that explain *what* the code does — delete; the code already says it. Keep comments that explain *why* (a constraint, an invariant, a workaround).
+### 3. Efficiency — dead code, redundant work, premature defensive checks
+- Functions, parameters, branches that were used during exploration but the green test never exercises. Delete.
+- Defensive checks for conditions that can't happen given the type / caller contract. Delete.
+- Repeated computation across loop iterations or call sites. Hoist where the cost is real (don't preemptively optimize).
+- "In case we need it" parameters / interfaces. YAGNI — strip.
+### 4. Test relevance — tests that don't pull their weight
+For each test added in the slice:
+- Does it fail without the implementation change? If you revert the prod code and the test still passes, it's not testing the claim. **Strengthen or delete.**
+- Does it assert a behavior that matters to a caller, or shape that doesn't? See [`tdd/TESTS.md`](../tdd/TESTS.md).
+- Does the name describe behavior, not function?
+- Are mocks at boundaries only? Internal-collaborator mocks are a smell — fix the design, not the test.
+### 5. Telemetry — the observability pass
+Apply the lens from [`skills/design/OBSERVABILITY.md`](../design/OBSERVABILITY.md):
+- [ ] Does every error message name the failing input?
+- [ ] Are there any "catch-all" blocks that swallow errors?
+- [ ] Is there a Correlation ID being propagated if the flow is non-trivial?
+- [ ] Could a stranger debug a failure in this code using *only* the logs it emits?
+## What this sweep is not
+- **Not a structural refactor.** If you find a shallow module or a misplaced seam, surface it as an `Open question for parent` (in `tdd-rounds`) or a follow-up note (in single-feature flow). Don't expand simplify into `improve-codebase-architecture`.
+- **Not new behavior.** Simplify does not add features, fix bugs unrelated to this slice, or sneak in a "while I'm here" change.
+- **Not a license to rename across the codebase.** Rename what you wrote this slice; leave the rest.
+If a finding requires more than ten minutes of edits or touches files outside this slice's allowlist, stop simplifying and surface it.
+## Commit discipline
+In `tdd-rounds`, the simplify sweep is **its own commit** ([`tdd-rounds/COMMITS.md` rule 4](../tdd-rounds/COMMITS.md#L36)) — `R<N>: simplify — <one-line summary>`. Bundling the sweep into the AC commits hides "what changed because the AC required it" from "what changed because we cleaned up after."
+In single-feature flow (no rounds), the sweep can land as a separate commit before the PR or as part of the final commit if the sweep is small.
+## Pairing with other skills
+- **`tdd`** runs first. Simplify runs after green. Never simplify while red.
+- **`code-hygiene`** is the lens — its 5 principles (boring code, naming, YAGNI, rule of 3, locality) are what you apply during the sweep. Read it once; apply it many times.
+- **`pr-review`** comes after — a self-check against the diff. Some of the same lenses, applied as a reviewer rather than an author.
+- **`improve-codebase-architecture`** is the escalation — when simplify surfaces structural issues bigger than a sweep can fix.
+## Done when
+- Every changed file walked once with the four lenses.
+- Each finding either fixed inline (small) or surfaced as a follow-up (large).
+- Tests still green after the sweep — re-run them.
+- A separate `simplify` commit exists (in `tdd-rounds`) or the sweep is captured in the final commit message (in single-feature flow).

package/skills/sync-check/SKILL.md ADDED Viewed

@@ -0,0 +1,69 @@
+---
+name: sync-check
+description: Diagnostic "Context Auditor" that scans code changes for terminology drift against `CONTEXT.md` and architectural contradictions against `docs/adr/`. Use before `pr-review`, after significant refactors, or when names feel "off." Prevents term collisions (e.g., "Account" vs "Customer") and silent deviations from established architectural decisions. Pairs with `grill-plan` (to resolve contradictions found) and `pr-review` (downstream gate).
+complexity: low
+expected_duration: 10 minutes
+---
+# Sync Check (Context Auditor)
+The discipline of ensuring that the code's implementation matches the project's **Domain Language** and **Architectural History**.
+## Why this skill exists
+As a codebase grows, it is easy for synonyms ("User", "Account", "Profile") to bleed into the code, diluting the domain model. Similarly, architectural decisions recorded in ADRs are often forgotten by developers (or LLMs) focused on local implementation. `sync-check` is the background auditor that surfaces these drifts before they calcify.
+## When to use
+- Before starting a `pr-review` (as a primary step).
+- After a large refactor round in `tdd-rounds`.
+- When you encounter a term in the code that isn't in `CONTEXT.md`.
+- When the user asks "is this consistent with our model?" or "are we following our ADRs?"
+## When to skip
+- Initial project setup (pre-vocabulary).
+- Pure doc/comment changes.
+- Trivial typo/config fixes.
+## Process
+### 1. Identify the Scope
+Map the files changed in the current branch or round to their corresponding domain contexts.
+- Use `CONTEXT-MAP.md` if it exists.
+- Otherwise, use the root `CONTEXT.md`.
+### 2. Terminology Audit
+Scan the `git diff` for new or changed public identifiers (classes, functions, types, variables).
+- **Term Collision**: Check if a new term is listed as an `_Avoid_:` alias in `CONTEXT.md`.
+- **Fuzzy Mapping**: Check if a term exists in the code but is missing from the glossary.
+- **Synonym Drift**: Check if the code uses two different words for the same domain concept.
+### 3. ADR Consistency Check
+Read the `docs/adr/` entries relevant to the changed packages.
+- **Direct Contradiction**: Does the change perform an action explicitly forbidden by an ADR (e.g., using an ORM when an ADR mandates manual SQL)?
+- **Surprise Deviation**: Does the change introduce a pattern that warrants a new ADR (hard to reverse, surprising, real trade-off)?
+### 4. Report Drifts
+Output a numbered list of findings. For each:
+- **Location**: `file:line` or `ADR-NNNN`.
+- **Severity**: `Blocker` (direct contradiction/collision) or `Suggestion` (fuzzy mapping).
+- **Finding**: "Code uses 'Account', but `CONTEXT.md` specifies 'Customer'."
+- **Recommended Action**: "Rename to 'Customer' or update `CONTEXT.md` if the concept is genuinely new."
+## Done when
+- All changed files have been cross-referenced with the glossary and relevant ADRs.
+- Every "Avoid" alias found in the diff has been flagged.
+- Every architectural deviation from active ADRs has been surfaced.
+## Handoff
+- If contradictions are found → run `grill-plan` to resolve the terminology or architectural mismatch.
+- If clean → proceed to `pr-review`.

package/skills/system-design/SKILL.md ADDED Viewed

@@ -0,0 +1,160 @@
+---
+name: system-design
+description: System-level architecture for greenfield work — name the modules, define responsibilities, set dependency direction, identify seams. Use when starting a new multi-module system or service from scratch, when defining topology before code lands, or when the user mentions "system architecture", "module boundaries", "service boundaries", "how should I structure this system", "draw the architecture", "topology". Skip for single-module work — use `design` instead. Skip for reorganizing an existing codebase — use `improve-codebase-architecture`. Pairs with `investigate` (comes before, surveying options) and `design` (comes after, shaping each module's interface).
+complexity: high
+expected_duration: 30 minutes
+---
+# System Design
+Greenfield system architecture: deciding which modules should exist, what each is responsible for, how they depend on each other, and where the seams between them live.
+Distinct from `design` (which shapes ONE module's interface) and from `improve-codebase-architecture` (which deepens an EXISTING codebase). This skill is for the moment **before** code lands — you have a feature set and constraints, but no folder structure yet.
+## When to use
+- Starting a new project, service, or major subsystem from scratch.
+- After `investigate` chose a direction, before any per-module `design` happens.
+- When the codebase needs a coherent topology and one doesn't exist yet.
+## When to skip
+- Single-module work — use `design`.
+- Existing codebase reorganization — use `improve-codebase-architecture`.
+- Pure feature delivery within an established topology — go straight to `feature-doc`.
+## Vocabulary
+Canonical definitions in [skills/LANGUAGE.md](../LANGUAGE.md). The terms this skill leans on most: **Module**, **Responsibility**, **Seam**, **Dependency direction**, **Acyclic**, **Port**. Use them exactly — every artifact this skill produces (module table, ASCII map, ADRs) should read in the same vocabulary as the rest of the framework.
+## Process
+### 1. Survey
+Before drawing anything, read what already exists:
+- `docs/CONTEXT.md` — the domain language. Module names should come from here.
+- `docs/research/<topic>.md` — any prior `investigate` runs that constrain the design.
+- `docs/adr/` — decisions already taken.
+- The feature set — what the system has to do.
+If `CONTEXT.md` doesn't exist yet, **resolve the core domain terms first** — defer to [`grill-plan` bootstrap mode](../grill-plan/BOOTSTRAP.md) (single source of truth for the lazy-creation rules). Module names without domain terms are placeholders.
+### 2. List the modules
+Enumerate the modules that need to exist. For each, capture:
+- **Name** — from `CONTEXT.md` vocabulary. Not from a framework or layering convention.
+- **Responsibility** — one sentence. *"Manages X."* *"Receives Y events."* *"Provides Z queries."*
+- **Owns** — what state, types, or invariants live behind its interface.
+Aim for **few, deep modules** (per `design`'s Principle 1). Five deep modules beat fifteen shallow ones.
+If you can't say a module's responsibility in one sentence, split it or merge it.
+### 3. Define dependency direction
+For each pair of modules, decide:
+- Does A depend on B? Yes / No.
+- Can the dependency direction be reversed? (Sometimes a domain module shouldn't depend on infrastructure — invert via interface in domain, implementation in infra.)
+The result is a **directed graph**. Verify it's acyclic. If two modules genuinely depend on each other, they're one module.
+Conventions to reach for:
+- Domain modules don't depend on infrastructure — use ports-and-adapters (domain defines the interface, infra implements it).
+- Application / orchestration depends on domain, not the reverse. Domain logic should be runnable in tests with no infra.
+- Cross-cutting modules (logging, metrics) sit at the bottom — everything depends on them, they depend on nothing else.
+### 4. Identify seams
+For each cross-module dependency, name the seam:
+- In-process call (function/import)?
+- Queue / event bus (asynchronous)?
+- Network call (HTTP / gRPC)?
+- Database / shared state?
+Each seam is a potential test boundary AND a potential failure point. Naming them now means adapters live where you intend, not where they accidentally end up.
+### 5. Draw the system map
+Output an ASCII diagram + a module table + a seam list. Save to `docs/architecture.md` (create lazily on first use).
+```
+                 ┌──────────────┐
+                 │   HTTP API   │
+                 └──────┬───────┘
+                        │ commands / queries
+                        ▼
+   ┌──────────┐   ┌──────────────┐   ┌──────────────┐
+   │   Auth   │◄──┤   Ordering   │──►│   Billing    │
+   └──────────┘   └──────┬───────┘   └──────────────┘
+                         │ port
+                         ▼
+                  ┌──────────────┐
+                  │  Persistence │  ← adapter implements port
+                  │  (storage)   │
+                  └──────────────┘
+```
+Module table:
+| Module | Responsibility | Owns |
+|---|---|---|
+| HTTP API | Translates HTTP into domain commands and queries | DTOs, routes |
+| Ordering | Coordinates the order lifecycle | `Order`, `OrderLine`, `OrderStatus` |
+| Billing | Generates invoices and processes payments | `Invoice`, `Charge` |
+| Auth | Authenticates and authorizes requests | `Session`, `Permission` |
+| Persistence | Concrete storage adapter | (no domain types — implements interfaces from Ordering / Billing) |
+Seams:
+- HTTP API ↔ Ordering: in-process function calls (commands + queries).
+- Ordering ↔ Persistence: port (interface) defined in Ordering; adapter implements it.
+- Ordering ↔ Billing: in-process; events emitted by Ordering, consumed by Billing.
+### 6. Validate
+Before declaring the design done, check each item:
+- [ ] Every module's responsibility fits in one sentence.
+- [ ] The dependency graph is acyclic.
+- [ ] No two modules duplicate the same responsibility.
+- [ ] Domain modules don't depend on infrastructure modules (ports inverted).
+- [ ] The whole map fits on one screen — if it doesn't, you've over-decomposed.
+Failures here are not "warnings" — they're "stop and rework". The system map is load-bearing for everything downstream.
+## Done when
+- `docs/architecture.md` exists with: module table, dependency direction, seam list, ASCII map.
+- Each module name comes from `CONTEXT.md` vocabulary (not framework conventions).
+- The dependency graph is acyclic and explicitly reviewed.
+- Every cross-module boundary has a named seam + adapter location.
+- For any decision you're least sure about, you've raised it as an open question to the user before locking in.
+## Anti-patterns
+- **Layering instead of slicing.** Naming modules `controllers`, `services`, `repositories` is layering — a category split, not a responsibility split. Prefer feature slicing (`ordering/`, `billing/`).
+- **Anaemic domain modules.** A module that's only data with no logic should usually be folded into the module that owns the logic. Anaemic modules cost a seam without paying for it.
+- **Premature service extraction.** Splitting modules into separate processes / deployments without a real reason (scale, team, fault isolation) adds operational cost without clarity benefit. Prefer in-process modules first; extract later if needed.
+- **Cycles "for now".** A cyclic dependency is a sign the modules aren't really separate. Don't accept it; merge them or insert a third module to break the cycle.
+## Handoff
+Once the system map is stable:
+- For each module that needs a public interface → run `design` per module.
+- If a topology decision is hard-to-reverse / surprising / load-bearing → run `grill-plan` and write an ADR.
+- Then proceed to `feature-doc` for the first feature using the topology.
+If the system map needs to evolve later (new module, changed boundaries, removed seam), come back to this skill — don't drift the map silently.
+## Pairing with other skills
+- **`investigate`** runs *before*. Investigates the problem and chooses the direction. `system-design` then operationalizes the chosen direction into a topology.
+- **`design`** runs *after*, per module. `system-design` says what modules should exist; `design` says what shape each module has.
+- **`grill-plan`** runs alongside or after, to stress-test high-stakes topology decisions.
+- **`improve-codebase-architecture`** is the *opposite* skill: given an existing codebase, find what to deepen. `system-design` is greenfield; that one is brownfield.