@lemoncode/lemony 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (68) hide show
  1. package/LICENSE +21 -0
  2. package/PRIVACY.md +147 -0
  3. package/README.md +189 -0
  4. package/catalog/VERSION +1 -0
  5. package/catalog/agents/README.md +29 -0
  6. package/catalog/agents/architect.md +81 -0
  7. package/catalog/agents/fit-assessment.md +94 -0
  8. package/catalog/agents/implementer.md +67 -0
  9. package/catalog/agents/orchestrator.md +627 -0
  10. package/catalog/agents/reviewer.md +124 -0
  11. package/catalog/agents/spec-author.md +69 -0
  12. package/catalog/agents/ui-designer.md +25 -0
  13. package/catalog/commands/add-capability.md +69 -0
  14. package/catalog/commands/bypass.md +40 -0
  15. package/catalog/commands/define.md +24 -0
  16. package/catalog/commands/hotfix.md +47 -0
  17. package/catalog/commands/pause.md +52 -0
  18. package/catalog/commands/resume.md +56 -0
  19. package/catalog/commands/spinoff.md +59 -0
  20. package/catalog/commands/triage.md +24 -0
  21. package/catalog/harness.config.schema.json +116 -0
  22. package/catalog/hooks/README.md +56 -0
  23. package/catalog/hooks/init.sh +281 -0
  24. package/catalog/hooks/lib/lemony.sh +41 -0
  25. package/catalog/hooks/lib/playbook-scan.sh +394 -0
  26. package/catalog/hooks/lib/transcript-grep.sh +56 -0
  27. package/catalog/hooks/require-playbook.sh +97 -0
  28. package/catalog/hooks/session-close.sh +232 -0
  29. package/catalog/hooks/suggest-playbook.sh +72 -0
  30. package/catalog/playbook-format.md +198 -0
  31. package/catalog/schemas/README.md +13 -0
  32. package/catalog/schemas/tier2-events-history.md +104 -0
  33. package/catalog/schemas/tier2-events.md +286 -0
  34. package/catalog/skills/README.md +62 -0
  35. package/catalog/skills/bootstrap-architecture/SKILL.md +78 -0
  36. package/catalog/skills/code-explorer/SKILL.md +76 -0
  37. package/catalog/skills/grill-with-docs/ADR-FORMAT.md +49 -0
  38. package/catalog/skills/grill-with-docs/CONTEXT-FORMAT.md +77 -0
  39. package/catalog/skills/grill-with-docs/SKILL.md +270 -0
  40. package/catalog/skills/grill-with-docs/reference.md +236 -0
  41. package/catalog/skills/mutation-testing/SKILL.md +84 -0
  42. package/catalog/skills/note-side-finding/SKILL.md +89 -0
  43. package/catalog/skills/playbook-iterate/SKILL.md +78 -0
  44. package/catalog/skills/prd-to-spec/SKILL.md +181 -0
  45. package/catalog/skills/raise-discovery/SKILL.md +112 -0
  46. package/catalog/skills/resolve-discovery/SKILL.md +123 -0
  47. package/catalog/skills/review-pr/SKILL.md +106 -0
  48. package/catalog/skills/review-pr/reference.md +105 -0
  49. package/catalog/skills/security-review/SKILL.md +90 -0
  50. package/catalog/skills/senior-review/SKILL.md +99 -0
  51. package/catalog/skills/silent-failure-hunter/SKILL.md +76 -0
  52. package/catalog/skills/spec-compliance-check/SKILL.md +74 -0
  53. package/catalog/skills/spec-to-issue/SKILL.md +88 -0
  54. package/catalog/skills/task-closeout/SKILL.md +229 -0
  55. package/catalog/skills/tdd/SKILL.md +171 -0
  56. package/catalog/skills/test-gap-report/SKILL.md +71 -0
  57. package/catalog/skills/triage-issue/SKILL.md +102 -0
  58. package/catalog/skills/update-architecture/SKILL.md +69 -0
  59. package/catalog/skills/verify/SKILL.md +90 -0
  60. package/catalog/skills/write-adr/SKILL.md +77 -0
  61. package/catalog/templates/README.md +32 -0
  62. package/catalog/templates/claude-code/.claude/settings.json.tpl +34 -0
  63. package/catalog/templates/claude-code/agents.md.tpl +109 -0
  64. package/catalog/templates/claude-code/docs/playbooks/README.md.tpl +96 -0
  65. package/catalog/templates/claude-code/harness.config.yml.tpl +59 -0
  66. package/catalog/templates/claude-code/state/history.md.tpl +6 -0
  67. package/dist/cli.mjs +5691 -0
  68. package/package.json +80 -0
@@ -0,0 +1,270 @@
1
+ ---
2
+ name: grill-with-docs
3
+ description: Relentlessly interview the user about a plan or design until shared understanding is reached, grounded in existing code and documentation. Drills decision by decision, presents alternatives with trade-offs, updates CONTEXT.md inline and offers ADRs sparingly. Produces a PRD (and resumes WIP grills across sessions). Use when the user wants to stress-test a plan, define product requirements, or formalize a design grounded in their project.
4
+ origin: vendor
5
+ vendor_version: '{{vendor_version}}'
6
+ invoked-by: [orchestrator, architect]
7
+ ---
8
+
9
+ # Grill With Docs
10
+
11
+ ## Core Principle
12
+
13
+ Push the user to think deeper. Drill every decision branch. Present concrete alternatives with trade-offs — don't just ask open questions. The goal is not validation; it's helping the user discover a better design through structured interrogation, grounded in the project's existing code and documentation.
14
+
15
+ ## Hard rules (the meta-feedback from past sessions)
16
+
17
+ 1. **NEVER auto-decide.** Every structural decision (architecture, naming, file structure, storage, tooling, agents, scope, etc.) goes through `AskUserQuestion`. "My recommendation: X" is fine as an OPTION in a question — never as a closed decision in the plan.
18
+ 2. **One question at a time.** Always. Wait for the answer before continuing.
19
+ 3. **Recommend, never impose.** When proposing alternatives, mark one with `(Recommended)` and explain why. The user chooses.
20
+ 4. **Don't close the plan unilaterally.** Even if you think the grill is done, ASK the user before closing. A WIP plan is acceptable; a falsely-closed plan is not.
21
+ 5. **Update artifacts inline as decisions land.** `CONTEXT.md` and `ADRs` get touched when the moment is right, not batched at the end.
22
+
23
+ ## Mode Detection
24
+
25
+ Detect automatically from what the user provides:
26
+
27
+ - **Validation** — user brings a structured plan with decisions already taken → challenge and verify.
28
+ - **Co-creation** — user brings a vague idea or wish → build the design together through questions and proposals.
29
+
30
+ If ambiguous, ask whether the user wants the plan challenged or built together.
31
+
32
+ ## Workflow
33
+
34
+ ### 1. Setup
35
+
36
+ - Announce that by default you'll briefly explore the codebase and docs to ground your questions, and that the user can opt out.
37
+ - If allowed, do quick reconnaissance:
38
+ - Project structure, stack, conventions
39
+ - `CLAUDE.md`, `CONTEXT.md`, `CONTEXT-MAP.md` if present
40
+ - `docs/architecture.md` if present — the maintained map of the system's shape; orient
41
+ the grill against it so proposals don't contradict existing boundaries/seams. Trust it
42
+ for the big picture; verify against code where a decision actually turns on it. It is
43
+ **absent by default and that is fine** — grill as today, and never push the user to
44
+ create it (decision #8; it is **not** one of the files this skill creates lazily below).
45
+ - `docs/adr/` if present
46
+ - The project's playbooks — `docs/playbooks/<topic>.md` (then the global layer),
47
+ discovered by topic filename
48
+ - Detect mode (validation vs co-creation).
49
+
50
+ ### 2. Phase A — Big Picture
51
+
52
+ High-level questions first. Validate direction before diving into details. Cover both **technical** and **product/UX** perspectives.
53
+
54
+ Examples:
55
+
56
+ - "Does the overall approach make sense?"
57
+ - "Are there fundamental assumptions worth challenging?"
58
+ - "Who's the actual user / customer / audience here?"
59
+
60
+ ### 3. Phase B — Branch by Branch
61
+
62
+ Identify the key decisions. Walk them one at a time.
63
+
64
+ For each branch:
65
+
66
+ - Ask the question (with concrete alternatives + trade-offs).
67
+ - Wait for the answer.
68
+ - Resolve and close the branch.
69
+ - If sub-branches emerge, note them and ask whether they're worth exploring now.
70
+
71
+ **Checkpoints**: after closing important branches, offer pending points to the user explicitly:
72
+
73
+ > "We've closed branches A, B, C. There are 5 open points left: [list]. Want to keep grilling all, prioritize some, or pause here?"
74
+
75
+ ### 4. Co-creation Approach — Socratic + proposals
76
+
77
+ When the user doesn't have a formed opinion, NEVER leave it open-ended. Present alternatives:
78
+
79
+ > "You have these options: A (pros/cons), B (pros/cons), C (pros/cons). My recommendation: B because X. But you decide."
80
+
81
+ This helps the user learn through forced trade-off comparison, not just decide blindly.
82
+
83
+ ### 5. Documentation Grounding (the "with-docs" part)
84
+
85
+ While grilling, weave in documentation updates as decisions crystallize.
86
+
87
+ #### File structure
88
+
89
+ Most repos have a single context:
90
+
91
+ ```
92
+ /
93
+ ├── CONTEXT.md
94
+ ├── docs/
95
+ │ ├── adr/
96
+ │ │ ├── 0001-event-sourced-orders.md
97
+ │ │ └── 0002-postgres-for-write-model.md
98
+ │ └── prds/
99
+ │ └── checkout-flow-2026-05-22.md
100
+ └── src/
101
+ ```
102
+
103
+ If a `CONTEXT-MAP.md` exists at the root, the repo has multiple contexts. The map points to where each one lives:
104
+
105
+ ```
106
+ /
107
+ ├── CONTEXT-MAP.md
108
+ ├── docs/
109
+ │ ├── adr/ ← system-wide decisions
110
+ │ └── prds/ ← system-wide PRDs
111
+ ├── src/
112
+ │ ├── ordering/
113
+ │ │ ├── CONTEXT.md
114
+ │ │ └── docs/adr/ ← context-specific decisions
115
+ │ └── billing/
116
+ │ ├── CONTEXT.md
117
+ │ └── docs/adr/
118
+ ```
119
+
120
+ Create files and directories lazily — only when there's something to write. If no `CONTEXT.md` exists, create one when the first term is resolved. If no `docs/adr/` exists, create it when the first ADR is needed. Same for `docs/prds/` when the first PRD lands. **`docs/architecture.md` is the exception — never create it here.** It is client-owned (decision #8); only the Architect's `update-architecture` skill ever edits it (and only when the file already exists).
121
+
122
+ #### Challenge against the glossary
123
+
124
+ When the user uses a term that conflicts with `CONTEXT.md`, call it out immediately:
125
+
126
+ > "Your glossary defines 'cancellation' as X, but you seem to mean Y — which is it?"
127
+
128
+ #### Sharpen fuzzy language
129
+
130
+ When the user uses vague or overloaded terms, propose precise canonical names:
131
+
132
+ > "You're saying 'account' — do you mean the Customer or the User? Those are different things."
133
+
134
+ #### Discuss concrete scenarios
135
+
136
+ When domain relationships are being discussed, stress-test with specific scenarios. Invent edge cases that force the user to be precise about boundaries.
137
+
138
+ #### Cross-reference with code
139
+
140
+ When the user states how something works, verify against code. If contradicted, surface it:
141
+
142
+ > "Your code cancels entire Orders, but you just said partial cancellation is possible — which is right?"
143
+
144
+ #### Update CONTEXT.md inline
145
+
146
+ When a term is resolved, update `CONTEXT.md` right there. Don't batch — capture it as it happens. Use the format in [CONTEXT-FORMAT.md](./CONTEXT-FORMAT.md).
147
+
148
+ `CONTEXT.md` is a glossary, NOT a spec or scratch pad. Never put implementation details there.
149
+
150
+ #### Offer ADRs sparingly
151
+
152
+ Only offer an ADR when ALL THREE are true:
153
+
154
+ 1. **Hard to reverse** — meaningful cost of changing later.
155
+ 2. **Surprising without context** — future reader will ask "why this way?".
156
+ 3. **Result of a real trade-off** — genuine alternatives existed and one was picked for specific reasons.
157
+
158
+ Use the format in [ADR-FORMAT.md](./ADR-FORMAT.md). If any of the three is missing, skip.
159
+
160
+ ### 6. Closing
161
+
162
+ When branches are exhausted or the user says they've had enough:
163
+
164
+ - Generate the PRD at `./docs/prds/<topic>-<date>.md` relative to the **project working directory**, creating `docs/prds/` lazily. This does NOT require a git repo or a pre-existing `docs/` — an ordinary project folder is enough.
165
+ - The file uses a `Status:` frontmatter field: `in_progress` while the grill is WIP, `completed` when closed. SAME FILE, just different status.
166
+ - **Location fallback (narrow)**: use `~/.claude/plans/<slug>-<date>.md` ONLY when there is genuinely no project workspace — e.g. the CWD is the home directory, or the grill is abstract and tied to no repo. The mere absence of git or of a `docs/` folder is NOT a reason to fall back. **If unsure whether the CWD is the right home for the PRD, ASK the user where to put it — never silently use the plans fallback.** A PRD that lands in `~/.claude/plans/` is invisible from the user's working directory and causes confusion later.
167
+ - See [reference.md](./reference.md) for the PRD template (and the WIP-specific sections used while `Status: in_progress`).
168
+ - **Before writing**: confirm with the user that the grill is done. NEVER autoclose.
169
+
170
+ ## Tone
171
+
172
+ **Adaptive**. Start curious-collaborative. If the user gives shallow answers ("because", "I don't know"), escalate intensity. Always constructive, never hostile.
173
+
174
+ Three levels (full detail in [reference.md](./reference.md)):
175
+
176
+ 1. **Curious-collaborative** (default).
177
+ 2. **Challenging** (when user gives surface answers).
178
+ 3. **Provocative-constructive** (when shallow thinking persists).
179
+
180
+ **Never go beyond level 3.** Use proposals, not insults.
181
+
182
+ ## Pause & Resume
183
+
184
+ First-class workflow. Same file as the final PRD — just a different `Status:`.
185
+
186
+ ### Pausing mid-grill
187
+
188
+ When the user signals they want to stop for now ("let's continue tomorrow", "leave it here", or similar):
189
+
190
+ 1. **DO NOT close the plan unilaterally.** Ask the user how to proceed.
191
+ 2. Save state to the PRD path chosen per the Closing location rule (`./docs/prds/<topic>-<date>.md` in the project workspace; the `~/.claude/plans/` fallback only when there is no project — ask if unsure). Set `Status: in_progress`.
192
+ 3. Add the WIP-specific sections to the same file (see reference.md): "Decisions closed", "Open decisions" (priority order), "Side tasks pending", "Resume prompt".
193
+ 4. The Resume prompt must be the exact text the user can re-invoke later, e.g.:
194
+ ```
195
+ grill-with-docs — resume <topic>. Read <path-to-PRD>. Start at decision #N. Keep grilling mode — one question at a time, do NOT auto-decide.
196
+ ```
197
+ 5. Only after confirming with the user → end the turn.
198
+
199
+ ### Resuming
200
+
201
+ When the user re-invokes the skill to pick a grill back up:
202
+
203
+ 1. Locate the in_progress PRD file (the user usually provides the path).
204
+ 2. Read it fully — especially "Decisions closed" and "Open decisions".
205
+ 3. Confirm with the user where to resume: which decisions are closed, which open one comes next, and whether to keep that order.
206
+ 4. Continue branch-by-branch.
207
+
208
+ ### Completing
209
+
210
+ When the grill closes:
211
+
212
+ - Flip `Status: in_progress` → `Status: completed`.
213
+ - Distribute the content of the WIP sections into the structured PRD sections (User Stories, Decisions, etc.) and remove the WIP-only sections.
214
+ - Same file. No moves between paths.
215
+
216
+ ## Codebase Exploration
217
+
218
+ - **At start**: brief project reconnaissance (single bash listing + read of `CLAUDE.md`, `CONTEXT.md`, project root). NEVER deep-dive unprompted.
219
+ - **During**: on-demand when a question can be answered by code or by reading a playbook/doc.
220
+ - **User opt-out**: respected if requested at setup.
221
+ - **Watch for existing partial work**: if the user has been iterating on adjacent ideas, surface what already exists (don't reinvent).
222
+
223
+ ## Scope
224
+
225
+ Covers BOTH technical and product/UX. Don't assume product decisions are resolved unless the user explicitly says so. Many technical users benefit from being pushed on UX, business model, pricing, anti-copy, onboarding cost, etc.
226
+
227
+ ## "Beyond the basics" — proactive risk surfacing
228
+
229
+ If the user explicitly asks for it ("go beyond the basics", "what am I not seeing"), or if you detect they're treating a strategic question as a technical one, surface 4-6 strategic risks/blind-spots they haven't named. Examples:
230
+
231
+ - Cost / unit economics at scale (token spend, licensing).
232
+ - Data residency / compliance for selling to regulated industries.
233
+ - Anti-copy / IP protection if productizing.
234
+ - Onboarding cost per customer (affects pricing).
235
+ - Documented failure modes ("this is NOT for X").
236
+ - Continuous-improvement mechanisms.
237
+
238
+ Prioritize the ones most likely to kill the project if ignored.
239
+
240
+ ## Uncontemplated scenarios
241
+
242
+ When a question or situation doesn't clearly fit these rules:
243
+
244
+ 1. Apply the closest matching approach with explicit reasoning.
245
+ 2. **Flag it**: "This scenario isn't covered by the skill. I applied X because Y. Want me to add a rule?".
246
+ 3. Offer to update the skill.
247
+
248
+ ## Cross-references with other skills
249
+
250
+ In the harness, the grill output (PRD) feeds the Spec Author:
251
+
252
+ ```
253
+ grill-with-docs → PRD → prd-to-spec → spec (requirements/design/tasks) → spec-to-issue → GitHub issue
254
+ ```
255
+
256
+ | Scenario | Next skill |
257
+ | ----------------------------------------------------------------- | ----------------------------- |
258
+ | PRD completed → needs a spec (EARS requirements / design / tasks) | `prd-to-spec` (Spec Author) |
259
+ | Spec ready → needs the externalized GitHub issue | `spec-to-issue` (Spec Author) |
260
+ | Bug spotted during exploration | `triage-issue` |
261
+
262
+ Mention the relevant option in the "Next Steps" section of the PRD output.
263
+
264
+ ---
265
+
266
+ See also:
267
+
268
+ - [CONTEXT-FORMAT.md](./CONTEXT-FORMAT.md) — format for the project glossary.
269
+ - [ADR-FORMAT.md](./ADR-FORMAT.md) — format for Architecture Decision Records.
270
+ - [reference.md](./reference.md) — PRD template, question categories, tone calibration details.
@@ -0,0 +1,236 @@
1
+ # Grill With Docs — Reference
2
+
3
+ ## PRD Output Template
4
+
5
+ Save to `./docs/prds/<topic>-<date>.md`. Create `docs/prds/` lazily if it doesn't exist.
6
+
7
+ **Same file for WIP and final** — distinguished only by the `Status:` field:
8
+
9
+ - `Status: in_progress` while the grill is paused. The file includes the **WIP-specific sections** described below alongside the structured template.
10
+ - `Status: completed` when the grill closes. The WIP-specific sections are removed; their content gets distributed into the structured sections (User Stories, Decisions, etc.).
11
+
12
+ **Fallback path**: `~/.claude/plans/<slug>-<date>.md` — used ONLY when there is genuinely no project workspace (see the narrow rule in SKILL.md: the mere absence of git or of a `docs/` folder is NOT a reason to fall back — ask if unsure). The file structure is identical; only the location differs.
13
+
14
+ ```markdown
15
+ # PRD: <topic>
16
+
17
+ **Date**: YYYY-MM-DD
18
+ **Mode**: validation | co-creation
19
+ **Status**: in_progress | completed
20
+
21
+ ## Problem Statement
22
+
23
+ <!-- What problem does this solve? From the user's perspective. -->
24
+ <!-- What was discovered from codebase exploration. -->
25
+
26
+ ## User Stories
27
+
28
+ <!-- Extensive list covering all aspects of the feature. -->
29
+
30
+ 1. As a <actor>, I want <feature>, so that <benefit>
31
+
32
+ ## Product / UX Decisions
33
+
34
+ <!-- User-facing decisions with reasoning. -->
35
+
36
+ - **<Decision area>**: <choice> — because <reasoning>
37
+
38
+ ## Technical Decisions
39
+
40
+ <!-- Technical decisions with reasoning. -->
41
+
42
+ - **<Decision area>**: <choice> — because <reasoning>
43
+
44
+ ## Testing Decisions
45
+
46
+ <!-- What to test and how, per area. -->
47
+
48
+ - **<Area>**: <what to test and approach>
49
+
50
+ ## Out of Scope
51
+
52
+ <!-- Explicitly excluded to prevent scope creep. -->
53
+
54
+ - <Thing that will NOT be addressed>
55
+
56
+ ## Discarded Alternatives
57
+
58
+ <!-- Options considered and rejected, with reasons. -->
59
+
60
+ - **<Alternative>**: discarded because <reason>
61
+
62
+ ## Assumptions
63
+
64
+ <!-- Things assumed true but not validated. -->
65
+
66
+ - <Assumption that could change and invalidate decisions>
67
+
68
+ ## Risks
69
+
70
+ <!-- Concerns flagged during the grill, not decisions. -->
71
+
72
+ - <Risk and its potential impact>
73
+
74
+ ## Open Points
75
+
76
+ <!-- Branches not explored or unresolved. -->
77
+
78
+ - [ ] <Open point to explore in the future>
79
+
80
+ ## Next Steps
81
+
82
+ - [ ] Hand off to the Spec Author (`prd-to-spec`) to turn this PRD into a spec.
83
+ - [ ] <Other actions if needed>
84
+ ```
85
+
86
+ ### Status values
87
+
88
+ - `in_progress` — grill paused, can be resumed.
89
+ - `completed` — all branches resolved or user said enough.
90
+
91
+ ### WIP-specific sections (only present while `Status: in_progress`)
92
+
93
+ While the grill is paused, the PRD file ADDS these working sections at the top (between frontmatter and the structured sections). They coexist with whatever structured content is already filled in:
94
+
95
+ ````markdown
96
+ ## Decisions closed
97
+
98
+ | # | Decision | Value | Why |
99
+ | --- | -------- | ----- | --- |
100
+ | 1 | ... | ... | ... |
101
+
102
+ ## Open decisions — priority order
103
+
104
+ 1. **<Branch>**: <where we are, what's pending>.
105
+ 2. ...
106
+
107
+ ## Side tasks pending
108
+
109
+ <!-- Things that need doing before/after fully closing: refactors, file moves, hook installs -->
110
+
111
+ - ...
112
+
113
+ ## Resume prompt
114
+
115
+ Exact prompt for next session:
116
+
117
+ ```
118
+ grill-with-docs — resume <topic>. Read ./docs/prds/<topic>-<date>.md. Start at decision #N. Keep grilling mode — one question at a time, do NOT auto-decide.
119
+ ```
120
+ ````
121
+
122
+ When the grill completes (`Status: completed`): distribute the content from these working sections into the structured PRD sections (User Stories, Decisions, etc.) and remove the WIP-only sections. Same file.
123
+
124
+ ---
125
+
126
+ ## Question categories
127
+
128
+ ### Product / UX questions
129
+
130
+ Push the user beyond technical thinking:
131
+
132
+ - **User**: Who uses it? What problem? How do they solve it today?
133
+ - **Flow**: Happy path? Mistakes? How to go back?
134
+ - **User edge cases**: Abandoned mid-flow? Poor connection? Mobile?
135
+ - **Value**: Why this over the alternative? The "aha moment"?
136
+ - **Prioritization**: If only one thing, what? What postpones?
137
+ - **User stories**: For each decision, "As a X, I want Y, so that Z".
138
+
139
+ ### Technical questions
140
+
141
+ - **Functional gaps**: What if X happens and it's not handled?
142
+ - **Trade-offs**: Chose A, why not B? Volume / load?
143
+ - **Technical edge cases**: Connection drops? Corrupted data? Concurrency?
144
+ - **Scalability**: 10 users vs 10k vs 10M?
145
+ - **Alternatives**: Considered Z instead of Y? Gain / lose?
146
+ - **Security**: Malicious user does X — protection?
147
+ - **Operations**: Deploy? Monitor? Rollback?
148
+ - **Testing**: What deserves tests? Approach? What NOT to test?
149
+
150
+ ### Strategic / commercial questions (use when productizing)
151
+
152
+ - **Unit economics**: Cost per use? Pricing? Margin?
153
+ - **Onboarding cost**: How long to install for new customer? What does that imply for pricing?
154
+ - **TAM constraints**: Are there segments excluded (privacy, residency, regulation)?
155
+ - **Anti-copy**: What stops a competitor or customer from copying?
156
+ - **Lock-in**: What makes the customer stay? What makes them leave?
157
+ - **Failure modes**: Where does the product NOT serve well? Documented?
158
+
159
+ ### Meta questions (both modes)
160
+
161
+ - **Assumptions**: What are you assuming? What if false?
162
+ - **Dependencies**: What must exist first? Who else affected?
163
+ - **Simplification**: Simpler version delivering 80% of value?
164
+ - **Scope**: Explicitly out? Looks in-scope but shouldn't be?
165
+
166
+ ---
167
+
168
+ ## Tone calibration
169
+
170
+ ### Level 1 — Curious-collaborative (default)
171
+
172
+ > "Interesting that you chose JWT. What led you to that decision?"
173
+
174
+ Use when: user engaged, thoughtful answers, thinking actively.
175
+
176
+ ### Level 2 — Challenging
177
+
178
+ > "You say JWT, but your app is monolithic without inter-service comms. Do you actually need stateless tokens or does it add complexity without value?"
179
+
180
+ Use when: surface-level answers, conventional wisdom without reasoning.
181
+
182
+ ### Level 3 — Provocative-constructive
183
+
184
+ > "Why JWT? Give me a real reason, not 'because everyone uses it'. Server-side sessions with Redis solve your case with half the complexity."
185
+
186
+ Use when: shallow answers persist, user needs a strong push.
187
+
188
+ **Never go beyond level 3.** Always constructive.
189
+
190
+ ### Escalation signals
191
+
192
+ - _"I don't know"_ → present options with trade-offs, don't just push.
193
+ - _"Because it's standard"_ → challenge with concrete alternative.
194
+ - Thoughtful but incomplete → follow up on the gap.
195
+ - Deep, well-reasoned → acknowledge and move to next branch.
196
+
197
+ ---
198
+
199
+ ## Documentation grounding — when to write what
200
+
201
+ | Artifact | When to touch | Where it lives | Format |
202
+ | ------------------------- | --------------------------------------------------------------------- | ------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------- |
203
+ | `CONTEXT.md` | A domain term is clarified or a new term emerges | Repo root (or per-context if multi-context) | [CONTEXT-FORMAT.md](./CONTEXT-FORMAT.md) |
204
+ | `CONTEXT-MAP.md` | Multiple contexts emerge in the same repo | Repo root | See CONTEXT-FORMAT.md |
205
+ | `NNNN-slug.md` | All three triggers met: hard-to-reverse + surprising + real trade-off | `docs/adr/` | [ADR-FORMAT.md](./ADR-FORMAT.md) |
206
+ | PRD (this skill's output) | Continuously during the grill | `./docs/prds/<topic>-<date>.md` (or `~/.claude/plans/<slug>-<date>.md` if no project) | Template above. `Status: in_progress` while WIP, `Status: completed` when closed. |
207
+
208
+ Create directories lazily — `CONTEXT.md` only when the first term is captured, `docs/adr/` only when the first ADR is needed.
209
+
210
+ ---
211
+
212
+ ## Cross-references with other skills
213
+
214
+ In the harness, the grill output (PRD) feeds the Spec Author:
215
+
216
+ ```
217
+ grill-with-docs → PRD → prd-to-spec → spec (requirements/design/tasks) → spec-to-issue → GitHub issue
218
+ ```
219
+
220
+ | Scenario | Next skill |
221
+ | ------------------------------------------------ | ----------------------------- |
222
+ | PRD completed → needs a spec | `prd-to-spec` (Spec Author) |
223
+ | Spec ready → needs the externalized GitHub issue | `spec-to-issue` (Spec Author) |
224
+ | Bug spotted during grilling | `triage-issue` |
225
+
226
+ Mention the relevant option in the "Next Steps" section of the PRD.
227
+
228
+ ---
229
+
230
+ ## What this skill explicitly does NOT do
231
+
232
+ - **It does NOT auto-decide.** Every decision goes through `AskUserQuestion`. Recommendations live INSIDE options, not as closed decisions in the plan.
233
+ - **It does NOT close the plan unilaterally.** Before closing, confirm with the user.
234
+ - **It does NOT batch documentation updates.** `CONTEXT.md` and ADRs are touched inline as decisions land.
235
+ - **It does NOT impose architecture.** The user (or their playbooks) drives architecture; the grill helps articulate it.
236
+ - **It does NOT silently consume time.** When the conversation passes ~10 branches, offer a checkpoint to pause or prioritize.
@@ -0,0 +1,84 @@
1
+ ---
2
+ name: mutation-testing
3
+ description: Behavioral test-strength analysis of a change — mutate the changed source and check whether the suite actually fails. Surviving mutants mark tests that pass regardless of behavior. Advisory, diff-scoped, and capability-gated (only lands when the project declares a `test:mutation` script). Distinct from `test-gap-report` (structural — which files lack a spec) and `verify` (which just runs the suite).
4
+ origin: vendor
5
+ vendor_version: '{{vendor_version}}'
6
+ applies-when: [has-mutation-testing]
7
+ phase: post-implementation
8
+ invoked-by: [reviewer]
9
+ ---
10
+
11
+ # Mutation Testing
12
+
13
+ The **behavioral** counterpart to `test-gap-report`'s structural analysis. A structural
14
+ gap report asks "which changed files lack a dedicated spec?"; mutation testing asks the
15
+ harder question: **do the tests that exist actually fail when the code misbehaves?** It
16
+ mutates the source (flips `>` to `>=`, drops a line, swaps `&&`/`||`) and re-runs the
17
+ suite. A mutant that **survives** — the tests still pass on the broken code — marks a
18
+ test that asserts nothing load-bearing. Line coverage can't catch this; only mutation
19
+ can.
20
+
21
+ This skill is **advisory** and **diff-scoped**. It never rejects on its own (mutation
22
+ output is noisy — equivalent mutants are real and unavoidable), and it only mutates the
23
+ code the change touched, because mutating the whole tree per review is prohibitively
24
+ slow.
25
+
26
+ ## Precondition (capability-gated)
27
+
28
+ This skill is only installed when the project declares a **`test:mutation`** script in
29
+ `package.json` (the `has-mutation-testing` capability, #155). If you are reading it, the
30
+ script exists — run it. The harness stays tool-agnostic: the script may drive Stryker or
31
+ any mutation tool. Delegate to it exactly as `verify` delegates to the project's
32
+ declared scripts; never assume a specific tool's CLI.
33
+
34
+ ## Process
35
+
36
+ ### 1. Run the project's mutation script, scoped to the diff
37
+
38
+ Run the declared script focused on the **files this change modified** — not the whole
39
+ tree. How to scope depends on the project's tool; read its config and adapt:
40
+
41
+ - **Worked example (Stryker):** `node --run test:mutation -- --mutate "<changed files,comma-separated>"`,
42
+ or rely on the tool's incremental mode (Stryker's `--since=<base>`) if the project
43
+ configures one. The `--` forwards args to the underlying tool.
44
+ - If the tool offers no per-file scoping, run it as the project declares it and **note
45
+ in your output that the run was un-scoped** (slower; don't silently imply diff-scope).
46
+
47
+ Capture the surviving mutants with their file + line.
48
+
49
+ ### 2. Classify each surviving mutant — the two-case model
50
+
51
+ A diff-scoped run mutates changed _files_, but a changed file still contains
52
+ **pre-existing, untouched lines**. Where the mutant survives decides how you report it:
53
+
54
+ | Mutant survives on… | Nature | Channel |
55
+ | -------------------------------------------------------- | ------------------------------------------ | ---------------------------------------------------------------------- |
56
+ | a line **this change added or modified** | a weak test **this task wrote** | **verdict** — advisory finding; the Implementer _may_ strengthen it |
57
+ | a **pre-existing** line (the file was touched elsewhere) | an **independent**, pre-existing weak test | **`note-side-finding`** → the Orchestrator offers a `/spinoff` capture |
58
+
59
+ Use the diff to decide which lines the change actually touched. When unsure whether a
60
+ mutated line is new or pre-existing, treat it as in-scope (verdict) — over-reporting in
61
+ the verdict is cheaper than mis-routing a real gap to a dismissable offer.
62
+
63
+ ### 3. Report
64
+
65
+ - **In-scope surviving mutants** → list them in your review verdict as an **advisory**
66
+ block: file:line, the mutation that survived, and the assertion that would have caught
67
+ it. This is **not** a REJECT on its own (decision: advisory). The Implementer may
68
+ strengthen the tests; the Reviewer may still REJECT by _judgment_ if a survivor exposes
69
+ a genuinely dangerous untested path — but the mutation result alone never auto-blocks.
70
+ - **Pre-existing surviving mutants** → run **`note-side-finding`**: one `## Side-findings`
71
+ bullet each (file:line + the gap), then keep going. The Orchestrator collects it and
72
+ offers the human a `/spinoff`. Don't fix it, don't reject on it — it isn't this task's.
73
+
74
+ If no mutant survives on the changed lines, say so in one line — a clean diff-scoped
75
+ mutation run is a meaningful positive signal about the new tests.
76
+
77
+ ## Notes
78
+
79
+ - **Equivalent mutants** (a mutation that can't change observable behavior, so no test
80
+ could kill it) are expected. Don't chase them; flag a survivor only when a real
81
+ assertion would have caught it.
82
+ - Mutation testing is **slow** — keep it diff-scoped. If a single review's run is
83
+ blowing the time budget, report what completed and note the un-covered files rather
84
+ than silently truncating.