npm - @lemoncode/lemony - Versions diffs - 0.1.0 - Mend

@lemoncode/lemony 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (68) hide show

package/LICENSE +21 -0
package/PRIVACY.md +147 -0
package/README.md +189 -0
package/catalog/VERSION +1 -0
package/catalog/agents/README.md +29 -0
package/catalog/agents/architect.md +81 -0
package/catalog/agents/fit-assessment.md +94 -0
package/catalog/agents/implementer.md +67 -0
package/catalog/agents/orchestrator.md +627 -0
package/catalog/agents/reviewer.md +124 -0
package/catalog/agents/spec-author.md +69 -0
package/catalog/agents/ui-designer.md +25 -0
package/catalog/commands/add-capability.md +69 -0
package/catalog/commands/bypass.md +40 -0
package/catalog/commands/define.md +24 -0
package/catalog/commands/hotfix.md +47 -0
package/catalog/commands/pause.md +52 -0
package/catalog/commands/resume.md +56 -0
package/catalog/commands/spinoff.md +59 -0
package/catalog/commands/triage.md +24 -0
package/catalog/harness.config.schema.json +116 -0
package/catalog/hooks/README.md +56 -0
package/catalog/hooks/init.sh +281 -0
package/catalog/hooks/lib/lemony.sh +41 -0
package/catalog/hooks/lib/playbook-scan.sh +394 -0
package/catalog/hooks/lib/transcript-grep.sh +56 -0
package/catalog/hooks/require-playbook.sh +97 -0
package/catalog/hooks/session-close.sh +232 -0
package/catalog/hooks/suggest-playbook.sh +72 -0
package/catalog/playbook-format.md +198 -0
package/catalog/schemas/README.md +13 -0
package/catalog/schemas/tier2-events-history.md +104 -0
package/catalog/schemas/tier2-events.md +286 -0
package/catalog/skills/README.md +62 -0
package/catalog/skills/bootstrap-architecture/SKILL.md +78 -0
package/catalog/skills/code-explorer/SKILL.md +76 -0
package/catalog/skills/grill-with-docs/ADR-FORMAT.md +49 -0
package/catalog/skills/grill-with-docs/CONTEXT-FORMAT.md +77 -0
package/catalog/skills/grill-with-docs/SKILL.md +270 -0
package/catalog/skills/grill-with-docs/reference.md +236 -0
package/catalog/skills/mutation-testing/SKILL.md +84 -0
package/catalog/skills/note-side-finding/SKILL.md +89 -0
package/catalog/skills/playbook-iterate/SKILL.md +78 -0
package/catalog/skills/prd-to-spec/SKILL.md +181 -0
package/catalog/skills/raise-discovery/SKILL.md +112 -0
package/catalog/skills/resolve-discovery/SKILL.md +123 -0
package/catalog/skills/review-pr/SKILL.md +106 -0
package/catalog/skills/review-pr/reference.md +105 -0
package/catalog/skills/security-review/SKILL.md +90 -0
package/catalog/skills/senior-review/SKILL.md +99 -0
package/catalog/skills/silent-failure-hunter/SKILL.md +76 -0
package/catalog/skills/spec-compliance-check/SKILL.md +74 -0
package/catalog/skills/spec-to-issue/SKILL.md +88 -0
package/catalog/skills/task-closeout/SKILL.md +229 -0
package/catalog/skills/tdd/SKILL.md +171 -0
package/catalog/skills/test-gap-report/SKILL.md +71 -0
package/catalog/skills/triage-issue/SKILL.md +102 -0
package/catalog/skills/update-architecture/SKILL.md +69 -0
package/catalog/skills/verify/SKILL.md +90 -0
package/catalog/skills/write-adr/SKILL.md +77 -0
package/catalog/templates/README.md +32 -0
package/catalog/templates/claude-code/.claude/settings.json.tpl +34 -0
package/catalog/templates/claude-code/agents.md.tpl +109 -0
package/catalog/templates/claude-code/docs/playbooks/README.md.tpl +96 -0
package/catalog/templates/claude-code/harness.config.yml.tpl +59 -0
package/catalog/templates/claude-code/state/history.md.tpl +6 -0
package/dist/cli.mjs +5691 -0
package/package.json +80 -0

package/catalog/skills/grill-with-docs/SKILL.md ADDED Viewed

@@ -0,0 +1,270 @@
+---
+name: grill-with-docs
+description: Relentlessly interview the user about a plan or design until shared understanding is reached, grounded in existing code and documentation. Drills decision by decision, presents alternatives with trade-offs, updates CONTEXT.md inline and offers ADRs sparingly. Produces a PRD (and resumes WIP grills across sessions). Use when the user wants to stress-test a plan, define product requirements, or formalize a design grounded in their project.
+origin: vendor
+vendor_version: '{{vendor_version}}'
+invoked-by: [orchestrator, architect]
+---
+# Grill With Docs
+## Core Principle
+Push the user to think deeper. Drill every decision branch. Present concrete alternatives with trade-offs — don't just ask open questions. The goal is not validation; it's helping the user discover a better design through structured interrogation, grounded in the project's existing code and documentation.
+## Hard rules (the meta-feedback from past sessions)
+1. **NEVER auto-decide.** Every structural decision (architecture, naming, file structure, storage, tooling, agents, scope, etc.) goes through `AskUserQuestion`. "My recommendation: X" is fine as an OPTION in a question — never as a closed decision in the plan.
+2. **One question at a time.** Always. Wait for the answer before continuing.
+3. **Recommend, never impose.** When proposing alternatives, mark one with `(Recommended)` and explain why. The user chooses.
+4. **Don't close the plan unilaterally.** Even if you think the grill is done, ASK the user before closing. A WIP plan is acceptable; a falsely-closed plan is not.
+5. **Update artifacts inline as decisions land.** `CONTEXT.md` and `ADRs` get touched when the moment is right, not batched at the end.
+## Mode Detection
+Detect automatically from what the user provides:
+- **Validation** — user brings a structured plan with decisions already taken → challenge and verify.
+- **Co-creation** — user brings a vague idea or wish → build the design together through questions and proposals.
+If ambiguous, ask whether the user wants the plan challenged or built together.
+## Workflow
+### 1. Setup
+- Announce that by default you'll briefly explore the codebase and docs to ground your questions, and that the user can opt out.
+- If allowed, do quick reconnaissance:
+  - Project structure, stack, conventions
+  - `CLAUDE.md`, `CONTEXT.md`, `CONTEXT-MAP.md` if present
+  - `docs/architecture.md` if present — the maintained map of the system's shape; orient
+    the grill against it so proposals don't contradict existing boundaries/seams. Trust it
+    for the big picture; verify against code where a decision actually turns on it. It is
+    **absent by default and that is fine** — grill as today, and never push the user to
+    create it (decision #8; it is **not** one of the files this skill creates lazily below).
+  - `docs/adr/` if present
+  - The project's playbooks — `docs/playbooks/<topic>.md` (then the global layer),
+    discovered by topic filename
+- Detect mode (validation vs co-creation).
+### 2. Phase A — Big Picture
+High-level questions first. Validate direction before diving into details. Cover both **technical** and **product/UX** perspectives.
+Examples:
+- "Does the overall approach make sense?"
+- "Are there fundamental assumptions worth challenging?"
+- "Who's the actual user / customer / audience here?"
+### 3. Phase B — Branch by Branch
+Identify the key decisions. Walk them one at a time.
+For each branch:
+- Ask the question (with concrete alternatives + trade-offs).
+- Wait for the answer.
+- Resolve and close the branch.
+- If sub-branches emerge, note them and ask whether they're worth exploring now.
+**Checkpoints**: after closing important branches, offer pending points to the user explicitly:
+> "We've closed branches A, B, C. There are 5 open points left: [list]. Want to keep grilling all, prioritize some, or pause here?"
+### 4. Co-creation Approach — Socratic + proposals
+When the user doesn't have a formed opinion, NEVER leave it open-ended. Present alternatives:
+> "You have these options: A (pros/cons), B (pros/cons), C (pros/cons). My recommendation: B because X. But you decide."
+This helps the user learn through forced trade-off comparison, not just decide blindly.
+### 5. Documentation Grounding (the "with-docs" part)
+While grilling, weave in documentation updates as decisions crystallize.
+#### File structure
+Most repos have a single context:
+```
+/
+├── CONTEXT.md
+├── docs/
+│   ├── adr/
+│   │   ├── 0001-event-sourced-orders.md
+│   │   └── 0002-postgres-for-write-model.md
+│   └── prds/
+│       └── checkout-flow-2026-05-22.md
+└── src/
+```
+If a `CONTEXT-MAP.md` exists at the root, the repo has multiple contexts. The map points to where each one lives:
+```
+/
+├── CONTEXT-MAP.md
+├── docs/
+│   ├── adr/                          ← system-wide decisions
+│   └── prds/                         ← system-wide PRDs
+├── src/
+│   ├── ordering/
+│   │   ├── CONTEXT.md
+│   │   └── docs/adr/                 ← context-specific decisions
+│   └── billing/
+│       ├── CONTEXT.md
+│       └── docs/adr/
+```
+Create files and directories lazily — only when there's something to write. If no `CONTEXT.md` exists, create one when the first term is resolved. If no `docs/adr/` exists, create it when the first ADR is needed. Same for `docs/prds/` when the first PRD lands. **`docs/architecture.md` is the exception — never create it here.** It is client-owned (decision #8); only the Architect's `update-architecture` skill ever edits it (and only when the file already exists).
+#### Challenge against the glossary
+When the user uses a term that conflicts with `CONTEXT.md`, call it out immediately:
+> "Your glossary defines 'cancellation' as X, but you seem to mean Y — which is it?"
+#### Sharpen fuzzy language
+When the user uses vague or overloaded terms, propose precise canonical names:
+> "You're saying 'account' — do you mean the Customer or the User? Those are different things."
+#### Discuss concrete scenarios
+When domain relationships are being discussed, stress-test with specific scenarios. Invent edge cases that force the user to be precise about boundaries.
+#### Cross-reference with code
+When the user states how something works, verify against code. If contradicted, surface it:
+> "Your code cancels entire Orders, but you just said partial cancellation is possible — which is right?"
+#### Update CONTEXT.md inline
+When a term is resolved, update `CONTEXT.md` right there. Don't batch — capture it as it happens. Use the format in [CONTEXT-FORMAT.md](./CONTEXT-FORMAT.md).
+`CONTEXT.md` is a glossary, NOT a spec or scratch pad. Never put implementation details there.
+#### Offer ADRs sparingly
+Only offer an ADR when ALL THREE are true:
+1. **Hard to reverse** — meaningful cost of changing later.
+2. **Surprising without context** — future reader will ask "why this way?".
+3. **Result of a real trade-off** — genuine alternatives existed and one was picked for specific reasons.
+Use the format in [ADR-FORMAT.md](./ADR-FORMAT.md). If any of the three is missing, skip.
+### 6. Closing
+When branches are exhausted or the user says they've had enough:
+- Generate the PRD at `./docs/prds/<topic>-<date>.md` relative to the **project working directory**, creating `docs/prds/` lazily. This does NOT require a git repo or a pre-existing `docs/` — an ordinary project folder is enough.
+- The file uses a `Status:` frontmatter field: `in_progress` while the grill is WIP, `completed` when closed. SAME FILE, just different status.
+- **Location fallback (narrow)**: use `~/.claude/plans/<slug>-<date>.md` ONLY when there is genuinely no project workspace — e.g. the CWD is the home directory, or the grill is abstract and tied to no repo. The mere absence of git or of a `docs/` folder is NOT a reason to fall back. **If unsure whether the CWD is the right home for the PRD, ASK the user where to put it — never silently use the plans fallback.** A PRD that lands in `~/.claude/plans/` is invisible from the user's working directory and causes confusion later.
+- See [reference.md](./reference.md) for the PRD template (and the WIP-specific sections used while `Status: in_progress`).
+- **Before writing**: confirm with the user that the grill is done. NEVER autoclose.
+## Tone
+**Adaptive**. Start curious-collaborative. If the user gives shallow answers ("because", "I don't know"), escalate intensity. Always constructive, never hostile.
+Three levels (full detail in [reference.md](./reference.md)):
+1. **Curious-collaborative** (default).
+2. **Challenging** (when user gives surface answers).
+3. **Provocative-constructive** (when shallow thinking persists).
+**Never go beyond level 3.** Use proposals, not insults.
+## Pause & Resume
+First-class workflow. Same file as the final PRD — just a different `Status:`.
+### Pausing mid-grill
+When the user signals they want to stop for now ("let's continue tomorrow", "leave it here", or similar):
+1. **DO NOT close the plan unilaterally.** Ask the user how to proceed.
+2. Save state to the PRD path chosen per the Closing location rule (`./docs/prds/<topic>-<date>.md` in the project workspace; the `~/.claude/plans/` fallback only when there is no project — ask if unsure). Set `Status: in_progress`.
+3. Add the WIP-specific sections to the same file (see reference.md): "Decisions closed", "Open decisions" (priority order), "Side tasks pending", "Resume prompt".
+4. The Resume prompt must be the exact text the user can re-invoke later, e.g.:
+   ```
+   grill-with-docs — resume <topic>. Read <path-to-PRD>. Start at decision #N. Keep grilling mode — one question at a time, do NOT auto-decide.
+   ```
+5. Only after confirming with the user → end the turn.
+### Resuming
+When the user re-invokes the skill to pick a grill back up:
+1. Locate the in_progress PRD file (the user usually provides the path).
+2. Read it fully — especially "Decisions closed" and "Open decisions".
+3. Confirm with the user where to resume: which decisions are closed, which open one comes next, and whether to keep that order.
+4. Continue branch-by-branch.
+### Completing
+When the grill closes:
+- Flip `Status: in_progress` → `Status: completed`.
+- Distribute the content of the WIP sections into the structured PRD sections (User Stories, Decisions, etc.) and remove the WIP-only sections.
+- Same file. No moves between paths.
+## Codebase Exploration
+- **At start**: brief project reconnaissance (single bash listing + read of `CLAUDE.md`, `CONTEXT.md`, project root). NEVER deep-dive unprompted.
+- **During**: on-demand when a question can be answered by code or by reading a playbook/doc.
+- **User opt-out**: respected if requested at setup.
+- **Watch for existing partial work**: if the user has been iterating on adjacent ideas, surface what already exists (don't reinvent).
+## Scope
+Covers BOTH technical and product/UX. Don't assume product decisions are resolved unless the user explicitly says so. Many technical users benefit from being pushed on UX, business model, pricing, anti-copy, onboarding cost, etc.
+## "Beyond the basics" — proactive risk surfacing
+If the user explicitly asks for it ("go beyond the basics", "what am I not seeing"), or if you detect they're treating a strategic question as a technical one, surface 4-6 strategic risks/blind-spots they haven't named. Examples:
+- Cost / unit economics at scale (token spend, licensing).
+- Data residency / compliance for selling to regulated industries.
+- Anti-copy / IP protection if productizing.
+- Onboarding cost per customer (affects pricing).
+- Documented failure modes ("this is NOT for X").
+- Continuous-improvement mechanisms.
+Prioritize the ones most likely to kill the project if ignored.
+## Uncontemplated scenarios
+When a question or situation doesn't clearly fit these rules:
+1. Apply the closest matching approach with explicit reasoning.
+2. **Flag it**: "This scenario isn't covered by the skill. I applied X because Y. Want me to add a rule?".
+3. Offer to update the skill.
+## Cross-references with other skills
+In the harness, the grill output (PRD) feeds the Spec Author:
+```
+grill-with-docs → PRD → prd-to-spec → spec (requirements/design/tasks) → spec-to-issue → GitHub issue
+```
+| Scenario                                                          | Next skill                    |
+| ----------------------------------------------------------------- | ----------------------------- |
+| PRD completed → needs a spec (EARS requirements / design / tasks) | `prd-to-spec` (Spec Author)   |
+| Spec ready → needs the externalized GitHub issue                  | `spec-to-issue` (Spec Author) |
+| Bug spotted during exploration                                    | `triage-issue`                |
+Mention the relevant option in the "Next Steps" section of the PRD output.
+---
+See also:
+- [CONTEXT-FORMAT.md](./CONTEXT-FORMAT.md) — format for the project glossary.
+- [ADR-FORMAT.md](./ADR-FORMAT.md) — format for Architecture Decision Records.
+- [reference.md](./reference.md) — PRD template, question categories, tone calibration details.

package/catalog/skills/grill-with-docs/reference.md ADDED Viewed

@@ -0,0 +1,236 @@
+# Grill With Docs — Reference
+## PRD Output Template
+Save to `./docs/prds/<topic>-<date>.md`. Create `docs/prds/` lazily if it doesn't exist.
+**Same file for WIP and final** — distinguished only by the `Status:` field:
+- `Status: in_progress` while the grill is paused. The file includes the **WIP-specific sections** described below alongside the structured template.
+- `Status: completed` when the grill closes. The WIP-specific sections are removed; their content gets distributed into the structured sections (User Stories, Decisions, etc.).
+**Fallback path**: `~/.claude/plans/<slug>-<date>.md` — used ONLY when there is genuinely no project workspace (see the narrow rule in SKILL.md: the mere absence of git or of a `docs/` folder is NOT a reason to fall back — ask if unsure). The file structure is identical; only the location differs.
+```markdown
+# PRD: <topic>
+**Date**: YYYY-MM-DD
+**Mode**: validation | co-creation
+**Status**: in_progress | completed
+## Problem Statement
+<!-- What problem does this solve? From the user's perspective. -->
+<!-- What was discovered from codebase exploration. -->
+## User Stories
+<!-- Extensive list covering all aspects of the feature. -->
+1. As a <actor>, I want <feature>, so that <benefit>
+## Product / UX Decisions
+<!-- User-facing decisions with reasoning. -->
+- **<Decision area>**: <choice> — because <reasoning>
+## Technical Decisions
+<!-- Technical decisions with reasoning. -->
+- **<Decision area>**: <choice> — because <reasoning>
+## Testing Decisions
+<!-- What to test and how, per area. -->
+- **<Area>**: <what to test and approach>
+## Out of Scope
+<!-- Explicitly excluded to prevent scope creep. -->
+- <Thing that will NOT be addressed>
+## Discarded Alternatives
+<!-- Options considered and rejected, with reasons. -->
+- **<Alternative>**: discarded because <reason>
+## Assumptions
+<!-- Things assumed true but not validated. -->
+- <Assumption that could change and invalidate decisions>
+## Risks
+<!-- Concerns flagged during the grill, not decisions. -->
+- <Risk and its potential impact>
+## Open Points
+<!-- Branches not explored or unresolved. -->
+- [ ] <Open point to explore in the future>
+## Next Steps
+- [ ] Hand off to the Spec Author (`prd-to-spec`) to turn this PRD into a spec.
+- [ ] <Other actions if needed>
+```
+### Status values
+- `in_progress` — grill paused, can be resumed.
+- `completed` — all branches resolved or user said enough.
+### WIP-specific sections (only present while `Status: in_progress`)
+While the grill is paused, the PRD file ADDS these working sections at the top (between frontmatter and the structured sections). They coexist with whatever structured content is already filled in:
+````markdown
+## Decisions closed
+| #   | Decision | Value | Why |
+| --- | -------- | ----- | --- |
+| 1   | ...      | ...   | ... |
+## Open decisions — priority order
+1. **<Branch>**: <where we are, what's pending>.
+2. ...
+## Side tasks pending
+<!-- Things that need doing before/after fully closing: refactors, file moves, hook installs -->
+- ...
+## Resume prompt
+Exact prompt for next session:
+```
+grill-with-docs — resume <topic>. Read ./docs/prds/<topic>-<date>.md. Start at decision #N. Keep grilling mode — one question at a time, do NOT auto-decide.
+```
+````
+When the grill completes (`Status: completed`): distribute the content from these working sections into the structured PRD sections (User Stories, Decisions, etc.) and remove the WIP-only sections. Same file.
+---
+## Question categories
+### Product / UX questions
+Push the user beyond technical thinking:
+- **User**: Who uses it? What problem? How do they solve it today?
+- **Flow**: Happy path? Mistakes? How to go back?
+- **User edge cases**: Abandoned mid-flow? Poor connection? Mobile?
+- **Value**: Why this over the alternative? The "aha moment"?
+- **Prioritization**: If only one thing, what? What postpones?
+- **User stories**: For each decision, "As a X, I want Y, so that Z".
+### Technical questions
+- **Functional gaps**: What if X happens and it's not handled?
+- **Trade-offs**: Chose A, why not B? Volume / load?
+- **Technical edge cases**: Connection drops? Corrupted data? Concurrency?
+- **Scalability**: 10 users vs 10k vs 10M?
+- **Alternatives**: Considered Z instead of Y? Gain / lose?
+- **Security**: Malicious user does X — protection?
+- **Operations**: Deploy? Monitor? Rollback?
+- **Testing**: What deserves tests? Approach? What NOT to test?
+### Strategic / commercial questions (use when productizing)
+- **Unit economics**: Cost per use? Pricing? Margin?
+- **Onboarding cost**: How long to install for new customer? What does that imply for pricing?
+- **TAM constraints**: Are there segments excluded (privacy, residency, regulation)?
+- **Anti-copy**: What stops a competitor or customer from copying?
+- **Lock-in**: What makes the customer stay? What makes them leave?
+- **Failure modes**: Where does the product NOT serve well? Documented?
+### Meta questions (both modes)
+- **Assumptions**: What are you assuming? What if false?
+- **Dependencies**: What must exist first? Who else affected?
+- **Simplification**: Simpler version delivering 80% of value?
+- **Scope**: Explicitly out? Looks in-scope but shouldn't be?
+---
+## Tone calibration
+### Level 1 — Curious-collaborative (default)
+> "Interesting that you chose JWT. What led you to that decision?"
+Use when: user engaged, thoughtful answers, thinking actively.
+### Level 2 — Challenging
+> "You say JWT, but your app is monolithic without inter-service comms. Do you actually need stateless tokens or does it add complexity without value?"
+Use when: surface-level answers, conventional wisdom without reasoning.
+### Level 3 — Provocative-constructive
+> "Why JWT? Give me a real reason, not 'because everyone uses it'. Server-side sessions with Redis solve your case with half the complexity."
+Use when: shallow answers persist, user needs a strong push.
+**Never go beyond level 3.** Always constructive.
+### Escalation signals
+- _"I don't know"_ → present options with trade-offs, don't just push.
+- _"Because it's standard"_ → challenge with concrete alternative.
+- Thoughtful but incomplete → follow up on the gap.
+- Deep, well-reasoned → acknowledge and move to next branch.
+---
+## Documentation grounding — when to write what
+| Artifact                  | When to touch                                                         | Where it lives                                                                        | Format                                                                            |
+| ------------------------- | --------------------------------------------------------------------- | ------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------- |
+| `CONTEXT.md`              | A domain term is clarified or a new term emerges                      | Repo root (or per-context if multi-context)                                           | [CONTEXT-FORMAT.md](./CONTEXT-FORMAT.md)                                          |
+| `CONTEXT-MAP.md`          | Multiple contexts emerge in the same repo                             | Repo root                                                                             | See CONTEXT-FORMAT.md                                                             |
+| `NNNN-slug.md`            | All three triggers met: hard-to-reverse + surprising + real trade-off | `docs/adr/`                                                                           | [ADR-FORMAT.md](./ADR-FORMAT.md)                                                  |
+| PRD (this skill's output) | Continuously during the grill                                         | `./docs/prds/<topic>-<date>.md` (or `~/.claude/plans/<slug>-<date>.md` if no project) | Template above. `Status: in_progress` while WIP, `Status: completed` when closed. |
+Create directories lazily — `CONTEXT.md` only when the first term is captured, `docs/adr/` only when the first ADR is needed.
+---
+## Cross-references with other skills
+In the harness, the grill output (PRD) feeds the Spec Author:
+```
+grill-with-docs → PRD → prd-to-spec → spec (requirements/design/tasks) → spec-to-issue → GitHub issue
+```
+| Scenario                                         | Next skill                    |
+| ------------------------------------------------ | ----------------------------- |
+| PRD completed → needs a spec                     | `prd-to-spec` (Spec Author)   |
+| Spec ready → needs the externalized GitHub issue | `spec-to-issue` (Spec Author) |
+| Bug spotted during grilling                      | `triage-issue`                |
+Mention the relevant option in the "Next Steps" section of the PRD.
+---
+## What this skill explicitly does NOT do
+- **It does NOT auto-decide.** Every decision goes through `AskUserQuestion`. Recommendations live INSIDE options, not as closed decisions in the plan.
+- **It does NOT close the plan unilaterally.** Before closing, confirm with the user.
+- **It does NOT batch documentation updates.** `CONTEXT.md` and ADRs are touched inline as decisions land.
+- **It does NOT impose architecture.** The user (or their playbooks) drives architecture; the grill helps articulate it.
+- **It does NOT silently consume time.** When the conversation passes ~10 branches, offer a checkpoint to pause or prioritize.

package/catalog/skills/mutation-testing/SKILL.md ADDED Viewed

@@ -0,0 +1,84 @@
+---
+name: mutation-testing
+description: Behavioral test-strength analysis of a change — mutate the changed source and check whether the suite actually fails. Surviving mutants mark tests that pass regardless of behavior. Advisory, diff-scoped, and capability-gated (only lands when the project declares a `test:mutation` script). Distinct from `test-gap-report` (structural — which files lack a spec) and `verify` (which just runs the suite).
+origin: vendor
+vendor_version: '{{vendor_version}}'
+applies-when: [has-mutation-testing]
+phase: post-implementation
+invoked-by: [reviewer]
+---
+# Mutation Testing
+The **behavioral** counterpart to `test-gap-report`'s structural analysis. A structural
+gap report asks "which changed files lack a dedicated spec?"; mutation testing asks the
+harder question: **do the tests that exist actually fail when the code misbehaves?** It
+mutates the source (flips `>` to `>=`, drops a line, swaps `&&`/`||`) and re-runs the
+suite. A mutant that **survives** — the tests still pass on the broken code — marks a
+test that asserts nothing load-bearing. Line coverage can't catch this; only mutation
+can.
+This skill is **advisory** and **diff-scoped**. It never rejects on its own (mutation
+output is noisy — equivalent mutants are real and unavoidable), and it only mutates the
+code the change touched, because mutating the whole tree per review is prohibitively
+slow.
+## Precondition (capability-gated)
+This skill is only installed when the project declares a **`test:mutation`** script in
+`package.json` (the `has-mutation-testing` capability, #155). If you are reading it, the
+script exists — run it. The harness stays tool-agnostic: the script may drive Stryker or
+any mutation tool. Delegate to it exactly as `verify` delegates to the project's
+declared scripts; never assume a specific tool's CLI.
+## Process
+### 1. Run the project's mutation script, scoped to the diff
+Run the declared script focused on the **files this change modified** — not the whole
+tree. How to scope depends on the project's tool; read its config and adapt:
+- **Worked example (Stryker):** `node --run test:mutation -- --mutate "<changed files,comma-separated>"`,
+  or rely on the tool's incremental mode (Stryker's `--since=<base>`) if the project
+  configures one. The `--` forwards args to the underlying tool.
+- If the tool offers no per-file scoping, run it as the project declares it and **note
+  in your output that the run was un-scoped** (slower; don't silently imply diff-scope).
+Capture the surviving mutants with their file + line.
+### 2. Classify each surviving mutant — the two-case model
+A diff-scoped run mutates changed _files_, but a changed file still contains
+**pre-existing, untouched lines**. Where the mutant survives decides how you report it:
+| Mutant survives on…                                      | Nature                                     | Channel                                                                |
+| -------------------------------------------------------- | ------------------------------------------ | ---------------------------------------------------------------------- |
+| a line **this change added or modified**                 | a weak test **this task wrote**            | **verdict** — advisory finding; the Implementer _may_ strengthen it    |
+| a **pre-existing** line (the file was touched elsewhere) | an **independent**, pre-existing weak test | **`note-side-finding`** → the Orchestrator offers a `/spinoff` capture |
+Use the diff to decide which lines the change actually touched. When unsure whether a
+mutated line is new or pre-existing, treat it as in-scope (verdict) — over-reporting in
+the verdict is cheaper than mis-routing a real gap to a dismissable offer.
+### 3. Report
+- **In-scope surviving mutants** → list them in your review verdict as an **advisory**
+  block: file:line, the mutation that survived, and the assertion that would have caught
+  it. This is **not** a REJECT on its own (decision: advisory). The Implementer may
+  strengthen the tests; the Reviewer may still REJECT by _judgment_ if a survivor exposes
+  a genuinely dangerous untested path — but the mutation result alone never auto-blocks.
+- **Pre-existing surviving mutants** → run **`note-side-finding`**: one `## Side-findings`
+  bullet each (file:line + the gap), then keep going. The Orchestrator collects it and
+  offers the human a `/spinoff`. Don't fix it, don't reject on it — it isn't this task's.
+If no mutant survives on the changed lines, say so in one line — a clean diff-scoped
+mutation run is a meaningful positive signal about the new tests.
+## Notes
+- **Equivalent mutants** (a mutation that can't change observable behavior, so no test
+  could kill it) are expected. Don't chase them; flag a survivor only when a real
+  assertion would have caught it.
+- Mutation testing is **slow** — keep it diff-scoped. If a single review's run is
+  blowing the time budget, report what completed and note the un-covered files rather
+  than silently truncating.