npm - @ai-agent-lead/skills - Versions diffs - 1.0.0 - Mend

@ai-agent-lead/skills 1.0.0

Files changed (48)

package/README.md +37 -0
package/bin/install.js +272 -0
package/package.json +34 -0
package/skills/LANGUAGE.md +72 -0
package/skills/README.md +156 -0
package/skills/SKILL-TEMPLATE.md +120 -0
package/skills/TRIGGERS.md +64 -0
package/skills/WORKFLOWS.md +369 -0
package/skills/bench/SKILL.md +40 -0
package/skills/bench/templates/benchmark-report.md +26 -0
package/skills/bootstrap/BOOTSTRAP.md +13 -0
package/skills/bootstrap/SKILL.md +47 -0
package/skills/code-hygiene/SKILL.md +92 -0
package/skills/debug/SKILL.md +122 -0
package/skills/design/DEEP-MODULES.md +76 -0
package/skills/design/FUNCTIONAL-CORE.md +121 -0
package/skills/design/ILLEGAL-STATES.md +102 -0
package/skills/design/OBSERVABILITY.md +49 -0
package/skills/design/PERSONAS.md +41 -0
package/skills/design/SKILL.md +139 -0
package/skills/design/TESTABILITY.md +84 -0
package/skills/feature-doc/SKILL.md +113 -0
package/skills/feature-doc/templates/feature-template.md +52 -0
package/skills/formats/ADR-FORMAT.md +51 -0
package/skills/formats/CONTEXT-FORMAT.md +109 -0
package/skills/formats/CONTEXT-MAP-FORMAT.md +6 -0
package/skills/grill-plan/SKILL.md +112 -0
package/skills/improve-codebase-architecture/DEEPENING.md +37 -0
package/skills/improve-codebase-architecture/INTERFACE-DESIGN.md +41 -0
package/skills/improve-codebase-architecture/SKILL.md +115 -0
package/skills/investigate/SKILL.md +97 -0
package/skills/investigate/templates/research-note.md +84 -0
package/skills/pr-review/SKILL.md +197 -0
package/skills/prod-ready/SKILL.md +88 -0
package/skills/security-review/SKILL.md +145 -0
package/skills/simplify/SKILL.md +105 -0
package/skills/sync-check/SKILL.md +69 -0
package/skills/system-design/SKILL.md +160 -0
package/skills/tdd/SKILL.md +121 -0
package/skills/tdd/TESTS.md +93 -0
package/skills/tdd-rounds/COMMITS.md +122 -0
package/skills/tdd-rounds/SKILL.md +96 -0
package/skills/tdd-rounds/templates/builder-brief.md +73 -0
package/skills/tdd-rounds/templates/builder-report.md +21 -0
package/skills/verify-real-deps/MOTIVATION.md +18 -0
package/skills/verify-real-deps/SKILL.md +118 -0
package/skills/verify-real-deps/templates/known-issues.md +45 -0
package/skills/zoom-out/SKILL.md +104 -0

package/skills/design/SKILL.md ADDED

@@ -0,0 +1,139 @@
+---
+name: design
+description: Module and interface design principles — deep modules and testable interfaces. Use when designing a NEW module, class, or public API; deciding what to expose vs hide; reviewing an interface before implementation; or when the user asks "how should I structure this", mentions "deep modules", "testability", or "API design". Use for NEW code shape; for finding deepening opportunities in EXISTING code, use `improve-codebase-architecture`. Skip for trivial glue, getters/setters, or single-call wrappers. Pairs with the tdd skill — good design is what makes TDD pleasant.
+complexity: medium
+expected_duration: 20 minutes
+---
+# Module and Interface Design
+Principles that make code easier to understand, change, and test.
+1. **Deep modules** — small interface, lots hidden behind it.
+2. **Testable interfaces** — accept dependencies, return results, keep the surface small.
+3. **Illegal states unrepresentable** — encode runtime invariants in the type system.
+4. **Functional core, imperative shell** — pure logic at the center, side effects at the edges.
+## When to use
+- Before writing a new module, class, or public function.
+- When refactoring something that's painful to test.
+- When reviewing an API surface (PRs, design docs).
+- When a test is hard to write — usually the design is wrong, not the test.
+## When to skip
+- Trivial glue, getters/setters, single-call wrappers — design is overkill.
+- Existing code that needs deepening — use [`improve-codebase-architecture`](../improve-codebase-architecture/SKILL.md).
+- Whole-system topology (which modules should exist) — use [`system-design`](../system-design/SKILL.md).
+## Principle 1: Deep modules
+A **deep module** has a small, simple interface that hides a lot of complexity.
+```
+┌──────────────────┐
+│  Small interface │  few methods, simple params
+├──────────────────┤
+│                  │
+│  Deep            │  complex logic, hidden
+│  implementation  │
+│                  │
+└──────────────────┘
+```
+A **shallow module** has a wide interface and thin implementation — it just passes through. Avoid.
+```
+┌──────────────────────────────┐
+│      Large interface         │  many methods, leaky params
+├──────────────────────────────┤
+│   Thin implementation        │  mostly pass-through
+└──────────────────────────────┘
+```
+When designing a module, ask:
+- Can I reduce the number of methods?
+- Can I simplify the parameters?
+- Can I hide more complexity behind the interface?
+See [DEEP-MODULES.md](DEEP-MODULES.md) for examples.
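As a rough illustration of the "small interface, deep implementation" shape, here is a minimal TypeScript sketch. `TextIndex` and its methods are hypothetical names invented for this example, not part of the package. The surface is two methods; tokenization, normalization, and the inverted index stay private.

```typescript
// Hypothetical deep module: callers see add() and search(); everything else is hidden.
class TextIndex {
  // term -> ids of documents containing it (implementation detail)
  private readonly postings = new Map<string, Set<string>>();

  add(id: string, text: string): void {
    for (const term of TextIndex.tokenize(text)) {
      let ids = this.postings.get(term);
      if (!ids) {
        ids = new Set<string>();
        this.postings.set(term, ids);
      }
      ids.add(id);
    }
  }

  // Returns the ids of documents that contain every term in the query, sorted.
  search(query: string): string[] {
    const sets = TextIndex.tokenize(query).map(
      (term) => this.postings.get(term) ?? new Set<string>(),
    );
    const candidates = new Set<string>();
    for (const s of sets) s.forEach((id) => candidates.add(id));
    return Array.from(candidates)
      .filter((id) => sets.every((s) => s.has(id)))
      .sort();
  }

  // Lowercases and splits on anything that is not a letter or digit.
  private static tokenize(text: string): string[] {
    return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
  }
}
```

A shallow version of the same module would expose `tokenize`, the postings map, and a separate intersect step, forcing every caller to know how the pieces fit together.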
+## Principle 2: Design for testability
+Good interfaces make testing natural. Three rules:
+1. **Accept dependencies, don't create them.** Pass them in.
+2. **Return results, don't produce side effects.** Pure functions are trivially testable.
+3. **Small surface area.** Fewer methods, fewer params.
+See [TESTABILITY.md](TESTABILITY.md) for examples and counter-examples.
+## Principle 3: Make illegal states unrepresentable
+Push runtime invariants into the type system. If a state must never exist, the compiler should be what rejects it, not a runtime check and not a comment.
+```
+// WEAK: both fields are independently optional, so `verifiedAt` without `email` type-checks; runtime has to validate
+type User = { email?: string; verifiedAt?: Date }
+// STRONG: the type enforces the invariant; the bad state can't compile
+type User =
+  | { kind: "unverified", email: string }
+  | { kind: "verified",   email: string, verifiedAt: Date }
+```
+The difference: *"this can't happen"* in a comment (hope) vs *"this can't compile"* in the type system (guarantee). The second eliminates whole classes of bugs at compile time.
+Applies in any language with sum types or discriminated unions (TS, Rust, Swift, OCaml, Kotlin, Python with `Literal`-tagged unions). In languages without them, approximate with enums + private constructors + factory methods, or builders that only expose `build()` once required fields are set.
+See [ILLEGAL-STATES.md](ILLEGAL-STATES.md) for more examples and the limits of the pattern.
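One way the discriminated union above might be consumed, sketched in TypeScript. The helpers `verify` and `describeUser` are hypothetical and exist only for this example. The `never` assignment in the default branch is a common idiom for having the compiler flag an unhandled variant.

```typescript
type User =
  | { kind: "unverified"; email: string }
  | { kind: "verified"; email: string; verifiedAt: Date };

// Hypothetical helper: a "verified" user can only be built by supplying verifiedAt.
function verify(user: User, at: Date): User {
  return { kind: "verified", email: user.email, verifiedAt: at };
}

// Hypothetical consumer: verifiedAt is reachable only after narrowing on `kind`.
function describeUser(user: User): string {
  switch (user.kind) {
    case "unverified":
      return `${user.email} (unverified)`;
    case "verified":
      return `${user.email} (verified ${user.verifiedAt.toISOString().slice(0, 10)})`;
    default: {
      // If a new variant is added and not handled above, this line stops compiling.
      const unreachable: never = user;
      return unreachable;
    }
  }
}
```

Writing `{ kind: "unverified", email: "a@example.com", verifiedAt: new Date() }` as a `User` literal is rejected by the excess-property check, which is the compile-time guarantee the principle describes.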
+## Principle 4: Functional core, imperative shell
+Push pure logic to the center; keep side effects (HTTP, DB, file I/O, time, randomness) at the edges. The functional core becomes trivially testable; the shell stays small enough to verify by inspection.
+```
+┌────────────────────────────────┐
+│  Imperative shell               │  ← side effects live here
+│  (small, hard to test fully)    │
+│                                 │
+│  ┌─────────────────────────┐    │
+│  │  Functional core         │   │  ← same input → same output
+│  │  (large, pure, trivial   │   │     no side effects
+│  │  to test)                │   │
+│  └─────────────────────────┘    │
+└────────────────────────────────┘
+```
+Composes with Principle 2 — the core is what tests target, so mocks shrink to a handful at the shell boundary. Composes with Principle 3 — the core often returns Result-style values that the shell pattern-matches on.
+See [FUNCTIONAL-CORE.md](FUNCTIONAL-CORE.md) for a concrete refactor example.
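A minimal sketch of the split, assuming a hypothetical invoice-reminder job (the types, `planReminders`, and `runReminders` are invented for illustration and are not taken from FUNCTIONAL-CORE.md). The core decides what to do; the shell reads the clock, loads data, and performs the effects.

```typescript
type Invoice = { id: string; dueAt: Date; paid: boolean };
type Action = { type: "remind"; invoiceId: string };

// Functional core: pure decision logic. The same inputs always yield the same actions.
function planReminders(invoices: Invoice[], now: Date): Action[] {
  return invoices
    .filter((inv) => !inv.paid && inv.dueAt.getTime() < now.getTime())
    .map((inv): Action => ({ type: "remind", invoiceId: inv.id }));
}

// Imperative shell: all side effects arrive as injected functions and run here.
async function runReminders(deps: {
  loadInvoices: () => Promise<Invoice[]>;
  sendReminder: (invoiceId: string) => Promise<void>;
  now: () => Date;
}): Promise<number> {
  const actions = planReminders(await deps.loadInvoices(), deps.now());
  for (const action of actions) {
    await deps.sendReminder(action.invoiceId);
  }
  return actions.length;
}
```

Tests for `planReminders` need only plain data and a fixed `Date`; the shell is short enough that one integration test, or a careful read, can cover it.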
+## How this connects to TDD
+If TDD feels painful — tests need elaborate setup, mocks of internals, or peeking at private state — the design is wrong. Fix the interface, not the test. The two skills work as a pair:
+- `design` decides *what the interface should be*.
+- `tdd` decides *how to build it test-first*.
+## Done when
+The interface is small enough that you can describe it in one sentence, and the testability rules hold:
+- Dependencies are accepted, not created internally.
+- The module returns results; side effects are isolated.
+- Surface area (method count + parameter count) is minimized.
+If TDD then feels painful, return here — the design needs another pass.
+## Optional artifact: design note
+Most of the time `design` is guidance only — the shape lives in the code that follows. **Capture a sibling design note when**:
+- The module shape is non-trivial enough that a future contributor would benefit from seeing the reasoning (interface options considered, why one was picked, what's hidden behind the seam).
+- The interface decisions are load-bearing for downstream rounds of `tdd-rounds`.
+- A `prod-ready` reviewer will need to verify "module map / public-interface signatures / test boundaries" against something explicit (per [prod-ready Section 7](../prod-ready/SKILL.md)).
+When captured, save as a sibling to the feature doc: `docs/features/<feature>.design.md`. Skip when the interface is small enough that the code is the design.

package/skills/design/TESTABILITY.md ADDED

@@ -0,0 +1,84 @@
+# Designing for Testability
+Three rules that make code testable without sacrificing clarity.
+> Examples below use TypeScript for readability. The rules apply identically in Python (constructor injection + `Protocol`), Go (interface as parameter, fake struct in tests), Rust (trait object, mock impl), Kotlin / Java (interface + DI), and any other language with first-class function or interface values. Translate the syntax; the shape stays.
+## 1. Accept dependencies, don't create them
+```ts
+// HARD TO TEST — gateway is hardcoded
+function processOrder(order: Order) {
+  const gateway = new StripeGateway(process.env.STRIPE_KEY);
+  return gateway.charge(order.total);
+}
+// EASY TO TEST — gateway is passed in
+function processOrder(order: Order, gateway: PaymentGateway) {
+  return gateway.charge(order.total);
+}
+```
+In tests, pass a fake or stub gateway. In production, pass the real one. No mocking framework needed.
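The snippet above leaves `Order` and `PaymentGateway` undefined, so the sketch below fills them in with assumed shapes (`ChargeResult` and `FakeGateway` are hypothetical too). It shows a hand-written fake doing the job a mocking framework would otherwise do.

```typescript
// Assumed shapes; the original example does not define them.
type Order = { total: number };
type ChargeResult = { ok: boolean; amount: number };
interface PaymentGateway {
  charge(amount: number): ChargeResult;
}

function processOrder(order: Order, gateway: PaymentGateway): ChargeResult {
  return gateway.charge(order.total);
}

// Hand-written fake: records each call and returns a canned result.
class FakeGateway implements PaymentGateway {
  readonly charged: number[] = [];
  charge(amount: number): ChargeResult {
    this.charged.push(amount);
    return { ok: true, amount };
  }
}

const gateway = new FakeGateway();
const result = processOrder({ total: 42 }, gateway);
// gateway.charged now holds the single amount that was charged.
```

The fake is a plain class, so the test can inspect `gateway.charged` directly instead of configuring expectations on a mock.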
+## 2. Return results, don't produce side effects
+```ts
+// HARD TO TEST — mutates the cart, returns nothing
+function applyDiscount(cart: Cart): void {
+  cart.total -= computeDiscount(cart);
+}
+// EASY TO TEST — pure function
+function calculateDiscount(cart: Cart): Discount {
+  return { amount: computeDiscount(cart) };
+}
+```
+Pure functions are the easiest things in software to test — same input, same output, no setup, no teardown.
+## 3. Keep the surface area small
+- Fewer methods → fewer tests needed.
+- Fewer parameters → simpler test setup.
+- Each method should do one thing.
+## Mocking guidance
+Mock only at **system boundaries**:
+- External APIs (payment, email, third-party services)
+- Time and randomness
+- File system (sometimes — prefer a temp dir)
+- Databases (sometimes — prefer a real test DB)
+**Don't mock your own code.** If you find yourself mocking an internal class to test another internal class, the design is wrong — the two are too tightly coupled. Fix the interface so they can be tested independently, or test them together as one unit.
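Time is one of the boundaries listed above. A minimal sketch of faking it, assuming a hypothetical `isExpired` function and a `Clock` type introduced only for this example: the clock is passed in, so a test pins it to a fixed instant instead of patching `Date`.

```typescript
// A clock is just a function returning the current time.
type Clock = () => Date;

// Hypothetical function under test: time is its only external dependency.
function isExpired(issuedAt: Date, ttlMinutes: number, now: Clock): boolean {
  const ageMs = now().getTime() - issuedAt.getTime();
  return ageMs > ttlMinutes * 60_000;
}

// Test helper: a clock frozen at the given ISO timestamp.
const fixedClock = (iso: string): Clock => () => new Date(iso);

const issuedAt = new Date("2024-01-01T12:00:00Z");
const stillValid = isExpired(issuedAt, 15, fixedClock("2024-01-01T12:10:00Z"));
const tooOld = isExpired(issuedAt, 15, fixedClock("2024-01-01T12:16:00Z"));
```

In production the caller would pass `() => new Date()`; nothing inside `isExpired` changes between the two settings.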
+## SDK-style over generic fetchers
+When wrapping an external API, prefer named methods over one generic call:
+```ts
+// GOOD: each call is independently mockable, type-safe per endpoint
+const api = {
+  getUser: (id: string) => fetch(`/users/${id}`),
+  // fetch needs a serialized body, so the OrderInput object is JSON-encoded
+  createOrder: (data: OrderInput) => fetch("/orders", { method: "POST", body: JSON.stringify(data) }),
+};
+// BAD: mocking requires conditional logic in the mock
+const api = {
+  call: (endpoint: string, options?: RequestInit) => fetch(endpoint, options),
+};
+```
+The SDK style means each test sets up exactly the calls it needs, and you can see at a glance which endpoints a test exercises.
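A sketch of what that looks like from the test's side. The `Api` interface, `greeting`, and the return shapes are assumptions made for this example; the original snippet returns raw `fetch` responses. The stub implements only the one named method the code under test uses.

```typescript
// Assumed types for illustration.
type User = { id: string; name: string };
type OrderInput = { sku: string; quantity: number };
interface Api {
  getUser(id: string): Promise<User>;
  createOrder(data: OrderInput): Promise<{ orderId: string }>;
}

// Code under test declares exactly which endpoint it depends on.
async function greeting(api: Pick<Api, "getUser">, id: string): Promise<string> {
  const user = await api.getUser(id);
  return `Hello, ${user.name}`;
}

// The stub covers that one method; no endpoint-matching logic is needed.
const stubApi: Pick<Api, "getUser"> = {
  getUser: async (id) => ({ id, name: "Ada" }),
};
```

With a generic `call(endpoint, options)` method, the same stub would have to branch on the endpoint string to decide what to return.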
+## The test-difficulty signal
+If a test is hard to write, the design is probably wrong. Common smells:
+- Test needs to mock five things → too many dependencies, or wrong dependencies.
+- Test needs to peek at private state → behavior isn't observable through the interface.
+- Test setup is longer than the test itself → constructor or factory is doing too much.
+Treat test pain as design feedback. Fix the code, not the test.

package/skills/feature-doc/SKILL.md ADDED

@@ -0,0 +1,113 @@
+---
+name: feature-doc
+description: One-page contract for a non-trivial feature or bug fix — Problem, User Story, Acceptance Criteria, Non-Goals. The ACs become the test list for `tdd`. Use before any non-trivial feature or bug fix; when the user mentions "spec this out", "write a feature doc", "before any non-trivial feature", or describes a feature without listing ACs. Skip for typo fixes, dependency bumps, or pure refactors. Pairs with `tdd` / `tdd-rounds` (downstream — ACs feed the test list), `investigate` (upstream — when direction itself is unclear), and `grill-plan` (when the chosen plan needs stress-testing against existing model).
+complexity: low
+expected_duration: 10 minutes
+---
+# Feature Doc
+Every non-trivial change starts with a one-page doc in `docs/features/<short-name>.md`. The doc is the **contract**: ACs become tests, Non-Goals prevent scope creep, and reviewers check the PR against this — not against memory.
+## Why this skill exists
+Without a contract, three failure modes are common:
+- **Scope creep.** "While I'm in here…" turns a 1-AC change into a 4-AC change with no review of the new scope.
+- **Untestable claims.** "Improve X" can't fail; ACs forced into Given/When/Then can.
+- **Silent regressions.** Behavior added or dropped during implementation never surfaces in review because the diff is the only thing being read.
+The doc is short by design — one page. If it grows, the feature is too big.
+## When to use
+- Any non-trivial feature or bug fix where the ACs aren't already pinned somewhere.
+- After `investigate` chooses a direction (the research note triggers a feature doc).
+- After `debug` names a root cause and the fix is more than a one-liner.
+## When to skip
+- Typo fixes, dependency bumps, lint-only / formatter-only diffs.
+- Pure refactors with no AC change — use `improve-codebase-architecture` instead.
+- One-line config tweaks with no behavior change.
+- The change already has a feature doc; you're iterating, not starting fresh.
+## Steps
+1. Copy [`templates/feature-template.md`](templates/feature-template.md) to `docs/features/<short-name>.md`.
+2. Fill in **Problem**, **User Story**, **Acceptance Criteria**, **Non-Goals**.
+3. Get one round of review on the doc **before** writing code.
+4. Update the doc if behavior changes during implementation — stale docs are worse than none.
+## Rules
+- **One page max.** If it grows longer, split the feature.
+- **Each AC must be testable** — Given / When / Then. "Improves performance" is not an AC; "p99 of `/api/orders` is < 200ms under 100 RPS" is.
+- **Non-Goals is not optional.** Empty Non-Goals usually means unclear scope. Even "none — see scope in Problem" is better than missing.
+- **Doc lives in the repo, versioned with the code.** Not in a wiki, not in a Notion page, not in a Linear ticket alone.
+- **Tick AC boxes only when the test is green AND merged** — not when implementation starts.
+## Example: weak vs strong ACs
+```md
+WEAK
+- [ ] Users can reset their password.
+- [ ] Reset link should expire.
+- [ ] Improve email deliverability.
+STRONG
+- [ ] Given a registered user, when they POST `/auth/reset` with their email,
+      then the system emails a single-use token valid for 15 minutes.
+- [ ] Given an expired or already-consumed token, when the user POSTs
+      `/auth/reset/confirm` with it, then the response is `410 Gone` and
+      no password change occurs.
+- [ ] Given a valid token, when the user POSTs a new password,
+      then the user can authenticate with the new password and the token
+      is marked consumed.
+```
+The strong list translates 1:1 into tests. The weak list translates into vibes.
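To make the 1:1 mapping concrete, here is a sketch of the second strong AC as a test, in TypeScript. The in-memory `confirmReset` model, its types, and the timestamps are assumptions for illustration; a real implementation would sit behind the HTTP endpoint and also handle the password change, which is not modeled here.

```typescript
// Hypothetical in-memory model of the confirm step.
type ResetToken = { value: string; expiresAt: Date; consumed: boolean };
type ConfirmResult = { status: 200 | 410; token: ResetToken };

function confirmReset(token: ResetToken, now: Date): ConfirmResult {
  const expired = now.getTime() >= token.expiresAt.getTime();
  if (expired || token.consumed) {
    return { status: 410, token }; // 410 Gone, token left unchanged
  }
  return { status: 200, token: { ...token, consumed: true } };
}

// AC 2 written as Given / When / Then; returns true when the AC holds.
function expiredTokenIsRejected(): boolean {
  // Given an expired token
  const token: ResetToken = {
    value: "t-1",
    expiresAt: new Date("2024-01-01T00:15:00Z"),
    consumed: false,
  };
  // When it is confirmed one minute after expiry
  const result = confirmReset(token, new Date("2024-01-01T00:16:00Z"));
  // Then the response is 410 and the token is untouched
  return result.status === 410 && result.token.consumed === false;
}
```

The weak bullet "Reset link should expire" offers no equivalent starting point: it names neither the request, the expiry window, nor the expected response.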
+## Definition of Done for the doc
+- [ ] Problem stated in 2–3 sentences.
+- [ ] At least one acceptance criterion in Given / When / Then form.
+- [ ] Non-goals listed (or explicitly "none — see scope in Problem").
+- [ ] Reviewed by at least one other person.
+## Anti-patterns
+- **AC list as task list.** "Implement the login form" is a task; "Given a user with valid credentials, when they POST `/login`, then they receive a session cookie" is an AC. Tasks belong in your TODO; ACs belong in the contract.
+- **Aspirational ACs.** "System is highly performant" — not testable, not falsifiable. Pin it: latency / throughput / error-rate threshold.
+- **Empty Non-Goals as a habit.** "We'll figure out scope later" hides the disagreement; surface it now.
+- **Spec written after the code.** The doc retrofits whatever shipped. The discipline is lost — at that point the diff is the contract, not the doc.
+- **Doc lives in a wiki.** Drift starts immediately. The doc must live with the code.
+- **Hidden behavior in the diff.** A feature doc with 3 ACs but a PR that ships 5 changes — the silent two are the most likely place a regression hides.
+## Pairing with other skills
+- **`investigate`** — runs *before* when direction is unclear. The research note's "Decided" recommendation triggers the feature doc.
+- **`grill-plan`** — runs *after* when the chosen plan needs stress-testing against existing CONTEXT.md / ADRs. Or in **bootstrap mode** when the vocabulary itself is fuzzy.
+- **`tdd`** — runs *next* for single-package, manageable AC count. Each AC becomes one TDD slice.
+- **`tdd-rounds`** — runs *next* when the AC count is large (≥10) or multi-package. The AC list maps to round splits.
+- **`security-review`** — runs *alongside* `tdd` when the feature is surface-changing (new entry point, identity flow, sensitive data path).
+- **`prod-ready`** — runs after green, against this doc. Section 7 catches doc-drift if behavior shifted during implementation.
+## Handoff
+Once the doc is reviewed and ACs are stable:
+- **Single-feature delivery** (one package, manageable AC count) → run [`tdd`](../tdd/SKILL.md). Each AC becomes one TDD slice.
+- **Larger delivery** (≥10 ACs or multi-package) → run [`tdd-rounds`](../tdd-rounds/SKILL.md). The AC list maps to round splits.
+- **Direction still feels uncertain after writing the doc**:
+  - Project has `CONTEXT.md` and/or ADRs → run [`grill-plan`](../grill-plan/SKILL.md) to stress-test against the existing model.
+  - No `CONTEXT.md` or ADRs yet → run [`investigate`](../investigate/SKILL.md) to map options first; or, if the plan is chosen but vocabulary is fuzzy, run `grill-plan` in [bootstrap mode](../bootstrap/BOOTSTRAP.md).
+- **Hard-to-reverse / surface-changing** (new auth flow, public API, sensitive data flow) → run [`security-review`](../security-review/SKILL.md) alongside `tdd` / `tdd-rounds`.
+## Done when
+- `docs/features/<short-name>.md` exists with all four required sections.
+- ACs are testable Given / When / Then statements.
+- Non-Goals is non-empty (or explicitly "none — see scope in Problem").
+- One reviewer has signed off.
+- Status is `Approved` and the next skill (`tdd` / `tdd-rounds` / `grill-plan` / `security-review`) is named.

package/skills/feature-doc/templates/feature-template.md ADDED

@@ -0,0 +1,52 @@
+# <Feature Name>
+**Status:** Draft | Approved | In Progress | Shipped
+**Owner:** <name — primary author and point of contact for follow-up>
+**Date:** YYYY-MM-DD
+<!--
+Status values:
+- Draft       — being written; not yet reviewed
+- Approved    — reviewed; ready to start implementation
+- In Progress — implementation underway
+- Shipped     — all acceptance criteria checked AND merged to main
+-->
+## Problem
+What user pain or business need does this solve? (2-3 sentences.)
+## User Story
+As a <role>, I want <capability> so that <outcome>.
+## Acceptance Criteria
+Each bullet must be testable and map to at least one test. Tick a box only once the test covering it is green and merged — not when implementation starts.
+- [ ] Given <context>, when <action>, then <result>.
+- [ ] Given <context>, when <action>, then <result>.
+## Non-Goals
+What this feature explicitly does NOT do. Prevents scope creep.
+This is about *capabilities deliberately not built*. (A research note's "Out of Scope" is about *decisions deliberately deferred* — different concept.)
+- ...
+## Related
+Cross-links to other artifacts. Omit any subsection that doesn't apply.
+- **ADRs:** [`ADR-NNNN <title>`](../adr/NNNN-slug.md) — why this one matters here
+- **Research notes:** [`<topic>`](../research/<topic>.md)
+- **Design note:** [`<short-name>.design.md`](./<short-name>.design.md) — when module shape was decided before code
+## Notes
+Open questions, links, or design tradeoffs. Optional.
+## Sign-off
+- [ ] Reviewed by <name>

package/skills/formats/ADR-FORMAT.md ADDED

@@ -0,0 +1,51 @@
+# ADR Format
+ADRs live in `docs/adr/` and use sequential numbering: `0001-slug.md`, `0002-slug.md`, etc.
+Create the `docs/adr/` directory lazily — only when the first ADR is needed.
+## Template
+```md
+# {Short title of the decision}
+**Status:** accepted
+**Date:** YYYY-MM-DD
+{1-3 sentences: what's the context, what did we decide, and why.}
+```
+That's it. An ADR can be a single paragraph plus the two header lines. The value is in recording *that* a decision was made and *why* — not in filling out sections.
+## Optional sections
+Only include these when they add genuine value. Most ADRs won't need them. **Optional sections amplify a decision; they do not replace it.** Keep them terse — if a section grows past a short bullet list, ask whether the content belongs in a feature doc or design note instead. An ADR is not a feature spec.
+- **Status** values: `proposed | accepted | deprecated | superseded by ADR-NNNN`. Default to `accepted` on creation; flip to `deprecated` or `superseded` when revisited.
+- **Considered Options** — only when the rejected alternatives are worth remembering. One line per option, plus the reason for rejection.
+- **Consequences** — only when non-obvious downstream effects need to be called out. Things the *code* will not make obvious.
+- **Related** — cross-link to the feature-doc or research-note that triggered this ADR (recommended when one exists), and to peer ADRs it interacts with.
+## Numbering
+Scan `docs/adr/` for the highest existing number and increment by one. Numbers are monotonic — never reuse a retired number, even if the ADR is deleted, deprecated, or superseded. Gaps are acceptable; reuse causes confusion.
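The scan-and-increment rule could be sketched as below. `nextAdrNumber` is a hypothetical helper that takes the file names already read from `docs/adr/`. It assumes four-digit prefixes as in `0001-slug.md`, and it sees only files that still exist, so a number retired by deleting its file has to be tracked some other way.

```typescript
// Given the file names found in docs/adr/, return the next four-digit prefix.
function nextAdrNumber(fileNames: string[]): string {
  let highest = 0;
  for (const name of fileNames) {
    // Only names shaped like NNNN-slug.md count; anything else is ignored.
    if (/^\d{4}-.+\.md$/.test(name)) {
      highest = Math.max(highest, Number(name.slice(0, 4)));
    }
  }
  return String(highest + 1).padStart(4, "0");
}

// Gaps are kept: with 0001 and 0003 present, the next number is 0004, not 0002.
const next = nextAdrNumber(["0001-event-sourced-orders.md", "0003-postgres.md", "README.md"]);
```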
+## When to offer an ADR
+All three of these must be true:
+1. **Hard to reverse** — the cost of changing your mind later is meaningful
+2. **Surprising without context** — a future reader will look at the code and wonder "why on earth did they do it this way?"
+3. **The result of a real trade-off** — there were genuine alternatives and you picked one for specific reasons
+If a decision is easy to reverse, skip it — you'll just reverse it. If it's not surprising, nobody will wonder why. If there was no real alternative, there's nothing to record beyond "we did the obvious thing."
+### What qualifies
+- **Architectural shape.** "We're using a monorepo." "The write model is event-sourced, the read model is projected into Postgres."
+- **Integration patterns between contexts.** "Ordering and Billing communicate via domain events, not synchronous HTTP."
+- **Technology choices that carry lock-in.** Database, message bus, auth provider, deployment target. Not every library — just the ones that would take a quarter to swap out.
+- **Boundary and scope decisions.** "Customer data is owned by the Customer context; other contexts reference it by ID only." The explicit "no" decisions are as valuable as the "yes" ones.
+- **Deliberate deviations from the obvious path.** "We're using manual SQL instead of an ORM because X." Anything where a reasonable reader would assume the opposite. These stop the next engineer from "fixing" something that was deliberate.
+- **Constraints not visible in the code.** "We can't use AWS because of compliance requirements." "Response times must be under 200ms because of the partner API contract."
+- **Rejected alternatives when the rejection is non-obvious.** If you considered GraphQL and picked REST for subtle reasons, record it — otherwise someone will suggest GraphQL again in six months.

package/skills/formats/CONTEXT-FORMAT.md ADDED

@@ -0,0 +1,109 @@
+# CONTEXT.md Format
+## Structure
+```md
+# {Context Name}
+{One or two sentence description of what this context is and why it exists.}
+## Language
+**Order**:
+{A concise description of the term}
+_Avoid_: Purchase, transaction
+**Invoice**:
+A request for payment sent to a customer after delivery.
+_Avoid_: Bill, payment request
+**Customer**:
+A person or organization that places orders.
+_Avoid_: Client, buyer, account
+## Relationships
+- An **Order** produces one or more **Invoices**
+- An **Invoice** belongs to exactly one **Customer**
+## Example dialogue
+> **Dev:** "When a **Customer** places an **Order**, do we create the **Invoice** immediately?"
+> **Domain expert:** "No — an **Invoice** is only generated once a **Fulfillment** is confirmed."
+## Flagged ambiguities
+- "account" was used to mean both **Customer** and **User** — resolved: these are distinct concepts.
+```
+## Rules
+- **Be opinionated.** When multiple words exist for the same concept, pick the best one and list the others as aliases under `_Avoid_:`.
+- **Flag conflicts explicitly.** If a term is used ambiguously, call it out in "Flagged ambiguities" with a clear resolution.
+- **Keep definitions tight.** One sentence max. Define what it IS, not what it does.
+- **Show relationships.** Use bold term names and express cardinality where obvious.
+- **Only include terms specific to this project's context.** General programming concepts (timeouts, error types, utility patterns) don't belong even if the project uses them extensively. Before adding a term, ask: is this a concept unique to this context, or a general programming concept? Only the former belongs.
+- **Group terms under subheadings** when natural clusters emerge. If all terms belong to a single cohesive area, a flat list is fine.
+- **Write an example dialogue.** A conversation between a dev and a domain expert that demonstrates how the terms interact naturally and clarifies boundaries between related concepts.
+- **Edit inline as terminology shifts.** `CONTEXT.md` is updated whenever a term is renamed, retired, or newly disambiguated. The `grill-plan` skill will offer to update it during planning sessions, but anyone touching the vocabulary can edit it directly — stale terminology here is worse than missing terminology.
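Because the `**Term**:` / `_Avoid_:` layout is regular, the aliases can be checked mechanically. The sketch below is one possible approach, not something the skill prescribes: `parseLanguage` and `findAvoided` are hypothetical helpers, and they assume the file follows the layout shown above exactly (term on its own line, aliases comma-separated, optional parenthetical notes).

```typescript
type GlossaryEntry = { term: string; avoid: string[] };

// Extracts terms and their avoided aliases from CONTEXT.md text.
function parseLanguage(markdown: string): GlossaryEntry[] {
  const entries: GlossaryEntry[] = [];
  let current: GlossaryEntry | null = null;
  for (const line of markdown.split("\n")) {
    const term = /^\*\*(.+?)\*\*:\s*$/.exec(line);
    const avoid = /^_Avoid_:\s*(.*)$/.exec(line);
    if (term) {
      current = { term: term[1], avoid: [] };
      entries.push(current);
    } else if (avoid && current) {
      current.avoid = avoid[1]
        .replace(/\([^)]*\)/g, "") // drop parenthetical notes
        .split(",")
        .map((a) => a.trim())
        .filter((a) => a.length > 0);
    }
  }
  return entries;
}

// Reports each avoided alias that appears in the text, with its preferred term.
function findAvoided(
  text: string,
  glossary: GlossaryEntry[],
): { alias: string; preferred: string }[] {
  const hits: { alias: string; preferred: string }[] = [];
  for (const entry of glossary) {
    for (const alias of entry.avoid) {
      const escaped = alias.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
      if (new RegExp(`\\b${escaped}\\b`, "i").test(text)) {
        hits.push({ alias, preferred: entry.term });
      }
    }
  }
  return hits;
}
```

A check like this only catches literal alias use; deciding whether a word is being used in the avoided sense still takes a reader.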
+## Minimal example (single domain)
+The opening example uses an Ordering/Billing domain to show cross-context structure. For a smaller project a `CONTEXT.md` may be just a handful of terms and no relationships:
+```md
+# Todo App
+Vocabulary used across `internal/auth` and `internal/todos`.
+## Language
+**User**:
+A person identified by an email address; owns todos and sessions.
+_Avoid_: account, profile
+**Sign-in token**:
+A single-use, time-limited token emailed to a User to prove ownership of an email address.
+_Avoid_: magic link (the *URL* containing the token), OTP, code
+**Session**:
+A long-lived bearer token issued after a sign-in token is consumed; authenticates subsequent requests.
+_Avoid_: cookie, JWT (the format is a random bearer string, not either of those)
+**Todo**:
+An item on a User's personal list — text plus a done flag. Owned by exactly one User.
+## Flagged ambiguities
+- "token" alone is ambiguous — always qualify as **sign-in token** or **session** (bearer).
+```
+## Single vs multi-context repos
+**Single context (most repos):** One `CONTEXT.md` at the repo root.
+**Multiple contexts:** A `CONTEXT-MAP.md` at the repo root lists the contexts, where they live, and how they relate to each other:
+```md
+# Context Map
+## Contexts
+- [Ordering](./src/ordering/CONTEXT.md) — receives and tracks customer orders
+- [Billing](./src/billing/CONTEXT.md) — generates invoices and processes payments
+- [Fulfillment](./src/fulfillment/CONTEXT.md) — manages warehouse picking and shipping
+## Relationships
+- **Ordering → Fulfillment**: Ordering emits `OrderPlaced` events; Fulfillment consumes them to start picking
+- **Fulfillment → Billing**: Fulfillment emits `ShipmentDispatched` events; Billing consumes them to generate invoices
+- **Ordering ↔ Billing**: Shared types for `CustomerId` and `Money`
+```
+The skill infers which structure applies:
+- If `CONTEXT-MAP.md` exists, read it to find contexts
+- If only a root `CONTEXT.md` exists, single context
+- If neither exists, create a root `CONTEXT.md` lazily when the first term is resolved
+When multiple contexts exist, infer which one the current topic relates to. If unclear, ask.
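The three-way inference above can be written as a small function. This is a sketch under assumptions: `inferContextLayout` and the `ContextLayout` type are hypothetical, and the file-existence check is injected so the rule can be exercised without a real file system.

```typescript
type ContextLayout =
  | { kind: "multi"; mapPath: string }
  | { kind: "single"; contextPath: string }
  | { kind: "none"; createOnFirstTerm: string };

// `exists` answers whether a path (relative to the repo root) is present.
function inferContextLayout(exists: (path: string) => boolean): ContextLayout {
  // A context map takes precedence: it lists where each context lives.
  if (exists("CONTEXT-MAP.md")) return { kind: "multi", mapPath: "CONTEXT-MAP.md" };
  if (exists("CONTEXT.md")) return { kind: "single", contextPath: "CONTEXT.md" };
  // Neither file exists: nothing is created until the first term is resolved.
  return { kind: "none", createOnFirstTerm: "CONTEXT.md" };
}

// Usage with an in-memory listing instead of the real repo.
const files = new Set(["CONTEXT.md", "src/index.ts"]);
const layout = inferContextLayout((path) => files.has(path));
```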

package/skills/formats/CONTEXT-MAP-FORMAT.md ADDED

@@ -0,0 +1,6 @@
+# Context Map Format
+Used when a single `CONTEXT.md` becomes a bottleneck (>100 terms).
+## Structure
+- Links to sub-contexts.

package/skills/grill-plan/SKILL.md ADDED

@@ -0,0 +1,112 @@
+---
+name: grill-plan
+description: Grilling session that stress-tests a chosen plan against the existing domain model, sharpens terminology, and updates documentation (CONTEXT.md, ADRs) inline as decisions crystallise. Use AFTER a direction has been picked (post-`investigate` or post-`feature-doc`), when the user wants to pressure-test the plan — triggered by phrases like "grill me on this", "stress-test this plan", "walk me through this", "is this consistent with our model". Skip if the direction is still being explored — use `investigate` instead.
+complexity: medium
+expected_duration: 20 minutes
+---
+## When to use
+- A direction has been picked (post-`investigate` or post-`feature-doc`) and you want to pressure-test it against the existing model.
+- Mid-`improve-codebase-architecture` — the grilling loop borrows this skill's discipline.
+## When to skip
+- The direction is still being explored — use [`investigate`](../investigate/SKILL.md) instead.
+- The plan is fully exploratory and no decisions are being committed — also `investigate`.
+- The decision is trivially reversible — there's nothing to stress-test.
+- Greenfield repo with no `CONTEXT.md` / ADRs yet → run [`bootstrap`](../bootstrap/SKILL.md) first.
+## What to do
+Interview me relentlessly about every aspect of this plan until we reach a shared understanding. Walk down each branch of the design tree, resolving dependencies between decisions one-by-one. For each question, provide your recommended answer.
+Ask the questions one at a time, waiting for feedback on each question before continuing.
+If a question can be answered by exploring the codebase, explore the codebase instead.
+## Domain awareness
+During codebase exploration, also look for existing documentation:
+### File structure
+Most repos have a single context:
+```
+/
+├── CONTEXT.md
+├── docs/
+│   └── adr/
+│       ├── 0001-event-sourced-orders.md
+│       └── 0002-postgres-for-write-model.md
+└── src/
+```
+If a `CONTEXT-MAP.md` exists at the root, the repo has multiple contexts. The map points to where each one lives:
+```
+/
+├── CONTEXT-MAP.md
+├── docs/
+│   └── adr/                          ← system-wide decisions
+├── src/
+│   ├── ordering/
+│   │   ├── CONTEXT.md
+│   │   └── docs/adr/                 ← context-specific decisions
+│   └── billing/
+│       ├── CONTEXT.md
+│       └── docs/adr/
+```
+Create files lazily — only when you have something to write. If no `CONTEXT.md` exists, create one when the first term is resolved. If no `docs/adr/` exists, create it when the first ADR is needed.
+## During the session
+### Challenge against the glossary
+When the user uses a term that conflicts with the existing language in `CONTEXT.md`, call it out immediately. "Your glossary defines 'cancellation' as X, but you seem to mean Y — which is it?"
+### Sharpen fuzzy language
+When the user uses vague or overloaded terms, propose a precise canonical term. "You're saying 'account' — do you mean the Customer or the User? Those are different things."
+### Discuss concrete scenarios
+When domain relationships are being discussed, stress-test them with specific scenarios. Invent scenarios that probe edge cases and force the user to be precise about the boundaries between concepts.
+### Cross-reference with code
+When the user states how something works, check whether the code agrees. If you find a contradiction, surface it: "Your code cancels entire Orders, but you just said partial cancellation is possible — which is right?"
+### Update CONTEXT.md inline
+When a term is resolved, update `CONTEXT.md` right there. Don't batch these up — capture them as they happen. Use the format in [`../formats/CONTEXT-FORMAT.md`](../formats/CONTEXT-FORMAT.md).
+Don't couple `CONTEXT.md` to implementation details. Only include terms that are meaningful to domain experts.
+### Offer ADRs sparingly
+Only offer to create an ADR when all three are true:
+1. **Hard to reverse** — the cost of changing your mind later is meaningful
+2. **Surprising without context** — a future reader will wonder "why did they do it this way?"
+3. **The result of a real trade-off** — there were genuine alternatives and you picked one for specific reasons
+If any of the three is missing, skip the ADR. Use the format in [`../formats/ADR-FORMAT.md`](../formats/ADR-FORMAT.md).
+## Pairing with other skills
+- **[`investigate`](../investigate/SKILL.md)** — runs *before* when direction is unclear. Once a direction is chosen, hand off here.
+- **[`sync-check`](../sync-check/SKILL.md)** — escalates here when a terminology collision or ADR contradiction is found in existing code.
+- **[`feature-doc`](../feature-doc/SKILL.md)** — runs *before* when a feature doc is the input to grilling, or *after* when grilling refines the plan into a feature doc.
+- **[`system-design`](../system-design/SKILL.md)** — defers here for hard-to-reverse topology decisions.
+- **[`improve-codebase-architecture`](../improve-codebase-architecture/SKILL.md)** — borrows this skill's grilling discipline in its own grilling loop.
+- **[`tdd-rounds`](../tdd-rounds/SKILL.md)** — escalates here mid-round when an ADR is wobbling.
+## Done when
+- No open questions remain on the chosen plan.
+- Every term used during the session has been resolved into `CONTEXT.md` (or explicitly flagged as ambiguous).
+- Any decision that meets all three ADR criteria has been offered (and either accepted or declined).
+- The user explicitly says "this is enough" or "let's move on" — don't keep grilling past their patience.