@ai-agent-lead/skills 1.0.0
- package/README.md +37 -0
- package/bin/install.js +272 -0
- package/package.json +34 -0
- package/skills/LANGUAGE.md +72 -0
- package/skills/README.md +156 -0
- package/skills/SKILL-TEMPLATE.md +120 -0
- package/skills/TRIGGERS.md +64 -0
- package/skills/WORKFLOWS.md +369 -0
- package/skills/bench/SKILL.md +40 -0
- package/skills/bench/templates/benchmark-report.md +26 -0
- package/skills/bootstrap/BOOTSTRAP.md +13 -0
- package/skills/bootstrap/SKILL.md +47 -0
- package/skills/code-hygiene/SKILL.md +92 -0
- package/skills/debug/SKILL.md +122 -0
- package/skills/design/DEEP-MODULES.md +76 -0
- package/skills/design/FUNCTIONAL-CORE.md +121 -0
- package/skills/design/ILLEGAL-STATES.md +102 -0
- package/skills/design/OBSERVABILITY.md +49 -0
- package/skills/design/PERSONAS.md +41 -0
- package/skills/design/SKILL.md +139 -0
- package/skills/design/TESTABILITY.md +84 -0
- package/skills/feature-doc/SKILL.md +113 -0
- package/skills/feature-doc/templates/feature-template.md +52 -0
- package/skills/formats/ADR-FORMAT.md +51 -0
- package/skills/formats/CONTEXT-FORMAT.md +109 -0
- package/skills/formats/CONTEXT-MAP-FORMAT.md +6 -0
- package/skills/grill-plan/SKILL.md +112 -0
- package/skills/improve-codebase-architecture/DEEPENING.md +37 -0
- package/skills/improve-codebase-architecture/INTERFACE-DESIGN.md +41 -0
- package/skills/improve-codebase-architecture/SKILL.md +115 -0
- package/skills/investigate/SKILL.md +97 -0
- package/skills/investigate/templates/research-note.md +84 -0
- package/skills/pr-review/SKILL.md +197 -0
- package/skills/prod-ready/SKILL.md +88 -0
- package/skills/security-review/SKILL.md +145 -0
- package/skills/simplify/SKILL.md +105 -0
- package/skills/sync-check/SKILL.md +69 -0
- package/skills/system-design/SKILL.md +160 -0
- package/skills/tdd/SKILL.md +121 -0
- package/skills/tdd/TESTS.md +93 -0
- package/skills/tdd-rounds/COMMITS.md +122 -0
- package/skills/tdd-rounds/SKILL.md +96 -0
- package/skills/tdd-rounds/templates/builder-brief.md +73 -0
- package/skills/tdd-rounds/templates/builder-report.md +21 -0
- package/skills/verify-real-deps/MOTIVATION.md +18 -0
- package/skills/verify-real-deps/SKILL.md +118 -0
- package/skills/verify-real-deps/templates/known-issues.md +45 -0
- package/skills/zoom-out/SKILL.md +104 -0
|
@@ -0,0 +1,37 @@
|
|
|
1
|
+
# Deepening
|
|
2
|
+
|
|
3
|
+
How to deepen a cluster of shallow modules safely, given its dependencies. Assumes the vocabulary in [skills/LANGUAGE.md](../LANGUAGE.md) — **module**, **interface**, **seam**, **adapter**.
|
|
4
|
+
|
|
5
|
+
## Dependency categories
|
|
6
|
+
|
|
7
|
+
When assessing a candidate for deepening, classify its dependencies. The category determines how the deepened module is tested across its seam.
|
|
8
|
+
|
|
9
|
+
### 1. In-process
|
|
10
|
+
|
|
11
|
+
Pure computation, in-memory state, no I/O. Always deepenable — merge the modules and test through the new interface directly. No adapter needed.
|
|
12
|
+
|
|
13
|
+
### 2. Local-substitutable
|
|
14
|
+
|
|
15
|
+
Dependencies that have local test stand-ins (for example, PGLite for Postgres or an in-memory filesystem). Deepenable if the stand-in exists. The deepened module is tested with the stand-in running in the test suite. The seam is internal, so there is no port at the module's external interface.
|
|
16
|
+
|
|
17
|
+
### 3. Remote but owned (Ports & Adapters)
|
|
18
|
+
|
|
19
|
+
Your own services across a network boundary (microservices, internal APIs). Define a **port** (interface) at the seam. The deep module owns the logic; the transport is injected as an **adapter**. Tests use an in-memory adapter. Production uses an HTTP/gRPC/queue adapter.
|
|
20
|
+
|
|
21
|
+
Recommendation shape: *"Define a port at the seam, implement an HTTP adapter for production and an in-memory adapter for testing, so the logic sits in one deep module even though it's deployed across a network."*
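
The recommendation shape can be sketched roughly as follows. This is a minimal illustration, not a prescribed implementation: `InventoryPort`, `InMemoryInventory`, `HttpInventory`, and `OrderIntake` are hypothetical names, and the `/stock/<sku>` endpoint and its JSON shape are assumptions made up for the example.

```python
import json
import urllib.request
from typing import Protocol


class InventoryPort(Protocol):
    """Port at the seam: all the deep module knows about the remote service."""

    def available(self, sku: str) -> int: ...


class InMemoryInventory:
    """Test adapter: satisfies the port without touching the network."""

    def __init__(self, stock: dict[str, int]) -> None:
        self._stock = dict(stock)

    def available(self, sku: str) -> int:
        return self._stock.get(sku, 0)


class HttpInventory:
    """Production adapter. The endpoint and payload shape are hypothetical."""

    def __init__(self, base_url: str) -> None:
        self._base_url = base_url.rstrip("/")

    def available(self, sku: str) -> int:
        with urllib.request.urlopen(f"{self._base_url}/stock/{sku}") as resp:
            return int(json.load(resp)["available"])


class OrderIntake:
    """Deep module: owns the logic; the transport is injected as an adapter."""

    def __init__(self, inventory: InventoryPort) -> None:
        self._inventory = inventory

    def can_fulfil(self, sku: str, quantity: int) -> bool:
        return quantity > 0 and self._inventory.available(sku) >= quantity
```

In a test, `OrderIntake(InMemoryInventory({"widget": 3}))` exercises the same logic that production runs with `HttpInventory`, so the seam has two real adapters.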
|
|
22
|
+
|
|
23
|
+
### 4. True external (Mock)
|
|
24
|
+
|
|
25
|
+
Third-party services (Stripe, Twilio, etc.) you don't control. The deepened module takes the external dependency as an injected port; tests provide a mock adapter.
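
A hedged sketch of a mock adapter for a third-party dependency. `PaymentPort`, `MockPayments`, and `Checkout` are hypothetical names chosen for illustration; no real vendor SDK call is shown, since the production adapter would wrap whatever client the provider ships.

```python
from dataclasses import dataclass, field
from typing import Protocol


class PaymentPort(Protocol):
    """Port for a payment provider the team does not control."""

    def charge(self, customer_id: str, amount_cents: int) -> str: ...


@dataclass
class MockPayments:
    """Mock adapter: records each call and returns a canned charge id."""

    charges: list[tuple[str, int]] = field(default_factory=list)

    def charge(self, customer_id: str, amount_cents: int) -> str:
        self.charges.append((customer_id, amount_cents))
        return f"mock-charge-{len(self.charges)}"


class Checkout:
    """Deepened module: takes the external dependency as an injected port."""

    def __init__(self, payments: PaymentPort) -> None:
        self._payments = payments

    def pay(self, customer_id: str, prices_cents: list[int]) -> str:
        total = sum(prices_cents)
        if total <= 0:
            raise ValueError("nothing to charge")
        return self._payments.charge(customer_id, total)
```

The test then asserts on what the mock recorded, for example that one charge of 750 cents was made, rather than on anything inside `Checkout`.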
|
|
26
|
+
|
|
27
|
+
## Seam discipline
|
|
28
|
+
|
|
29
|
+
- **One adapter means a hypothetical seam. Two adapters means a real one.** Don't introduce a port unless at least two adapters are justified (typically production + test). A single-adapter seam is just indirection.
|
|
30
|
+
- **Internal seams vs external seams.** A deep module can have internal seams (private to its implementation, used by its own tests) as well as the external seam at its interface. Don't expose internal seams through the interface just because tests use them.
|
|
31
|
+
|
|
32
|
+
## Testing strategy: replace, don't layer
|
|
33
|
+
|
|
34
|
+
- Old unit tests on shallow modules become waste once tests at the deepened module's interface exist — delete them.
|
|
35
|
+
- Write new tests at the deepened module's interface. The **interface is the test surface**.
|
|
36
|
+
- Tests assert on observable outcomes through the interface, not internal state.
|
|
37
|
+
- Tests should survive internal refactors — they describe behaviour, not implementation. If a test has to change when the implementation changes, it's testing past the interface.
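
As an illustration of a test written at the interface, here is a small sketch. `RecentItems` is a hypothetical module invented for the example; the point is that the test touches only `add` and `newest_first`, never the private list.

```python
class RecentItems:
    """Deep module: remembers the most recent distinct items, newest first."""

    def __init__(self, limit: int) -> None:
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self._limit = limit
        self._items: list[str] = []  # internal detail; tests never touch it

    def add(self, item: str) -> None:
        # Re-adding an item moves it to the front instead of duplicating it.
        if item in self._items:
            self._items.remove(item)
        self._items.insert(0, item)
        del self._items[self._limit:]

    def newest_first(self) -> list[str]:
        return list(self._items)


def test_re_adding_an_item_moves_it_to_the_front() -> None:
    recent = RecentItems(limit=2)
    for item in ["a", "b", "a", "c"]:
        recent.add(item)
    # Asserts only on what a caller can observe through the interface.
    assert recent.newest_first() == ["c", "a"]
```

If the private list were later swapped for an ordered dictionary or a deque, this test would not need to change, which is the property the bullets above ask for.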
|
|
@@ -0,0 +1,41 @@
|
|
|
1
|
+
# Interface Design
|
|
2
|
+
|
|
3
|
+
When the user wants to explore alternative interfaces for a chosen deepening candidate, use this parallel sub-agent pattern. Based on "Design It Twice" (Ousterhout) — your first idea is unlikely to be the best.
|
|
4
|
+
|
|
5
|
+
Uses the vocabulary in [skills/LANGUAGE.md](../LANGUAGE.md) — **module**, **interface**, **seam**, **adapter**, **leverage**.
|
|
6
|
+
|
|
7
|
+
## Process
|
|
8
|
+
|
|
9
|
+
### 1. Frame the problem space
|
|
10
|
+
|
|
11
|
+
Before spawning sub-agents, write a user-facing explanation of the problem space for the chosen candidate:
|
|
12
|
+
|
|
13
|
+
- The constraints any new interface would need to satisfy
|
|
14
|
+
- The dependencies it would rely on, and which category they fall into (see [DEEPENING.md](DEEPENING.md))
|
|
15
|
+
- A rough illustrative code sketch to ground the constraints — not a proposal, just a way to make the constraints concrete
|
|
16
|
+
|
|
17
|
+
Show this to the user, then immediately proceed to Step 2. The user reads and thinks while the sub-agents work in parallel.
|
|
18
|
+
|
|
19
|
+
### 2. Spawn sub-agents
|
|
20
|
+
|
|
21
|
+
Spawn 3+ sub-agents in parallel using the Agent tool. Each must produce a **radically different** interface for the deepened module.
|
|
22
|
+
|
|
23
|
+
Prompt each sub-agent with a separate technical brief (file paths, coupling details, dependency category from [DEEPENING.md](DEEPENING.md), what sits behind the seam). The brief is independent of the user-facing problem-space explanation in Step 1.
|
|
24
|
+
|
|
25
|
+
**Assign each sub-agent a specific persona from [`skills/design/PERSONAS.md`](../design/PERSONAS.md)** (e.g., Agent 1 is "The Minimalist", Agent 2 is "The Extensible").
|
|
26
|
+
|
|
27
|
+
Include both [skills/LANGUAGE.md](../LANGUAGE.md) vocabulary and CONTEXT.md vocabulary in the brief so each sub-agent names things consistently with the architecture language and the project's domain language.
|
|
28
|
+
|
|
29
|
+
Each sub-agent outputs:
|
|
30
|
+
|
|
31
|
+
1. Interface (types, methods, params — plus invariants, ordering, error modes)
|
|
32
|
+
2. Usage example showing how callers use it
|
|
33
|
+
3. What the implementation hides behind the seam
|
|
34
|
+
4. Dependency strategy and adapters (see [DEEPENING.md](DEEPENING.md))
|
|
35
|
+
5. Trade-offs — where leverage is high, where it's thin
|
|
36
|
+
|
|
37
|
+
### 3. Present and compare
|
|
38
|
+
|
|
39
|
+
Present designs sequentially so the user can absorb each one, then compare them in prose. Contrast by **depth** (leverage at the interface), **locality** (where change concentrates), and **seam placement**.
|
|
40
|
+
|
|
41
|
+
After comparing, give your own recommendation: which design you think is strongest and why. If elements from different designs would combine well, propose a hybrid. Be opinionated — the user wants a strong read, not a menu.
|
|
@@ -0,0 +1,115 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: improve-codebase-architecture
|
|
3
|
+
description: Find deepening opportunities in EXISTING code, informed by the domain language in CONTEXT.md and the decisions in docs/adr/. Use when the user wants to improve architecture, find refactoring opportunities, consolidate tightly-coupled modules, or make a codebase more testable and AI-navigable. Use for EXISTING code; for designing the shape of a new module from scratch, use `design`. Skip for single-module local refactors with no cross-module impact — use `design` or just refactor inline.
|
|
4
|
+
complexity: high
|
|
5
|
+
expected_duration: 45 minutes
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
# Improve Codebase Architecture
|
|
9
|
+
|
|
10
|
+
Surface architectural friction and propose **deepening opportunities** — refactors that turn shallow modules into deep ones. The aim is testability and AI-navigability.
|
|
11
|
+
|
|
12
|
+
## When to use
|
|
13
|
+
|
|
14
|
+
- The user wants to improve architecture, find refactoring opportunities, consolidate tightly-coupled modules, or make a codebase more testable.
|
|
15
|
+
- Mid-project in `tdd-rounds`, after the load-bearing seams exist — surface candidates that become dedicated refactor rounds.
|
|
16
|
+
- After [`zoom-out`](../zoom-out/SKILL.md) reveals shallow modules in an unfamiliar area.
|
|
17
|
+
|
|
18
|
+
## When to skip
|
|
19
|
+
|
|
20
|
+
- Designing the shape of a new module from scratch — use [`design`](../design/SKILL.md).
|
|
21
|
+
- Greenfield system topology — use [`system-design`](../system-design/SKILL.md).
|
|
22
|
+
- Single-module local refactor with no cross-module impact — use `design` or refactor inline.
|
|
23
|
+
- Line-level cleanup (renames, dead-code removal) — use [`code-hygiene`](../code-hygiene/SKILL.md) + [`simplify`](../simplify/SKILL.md).
|
|
24
|
+
|
|
25
|
+
## Glossary
|
|
26
|
+
|
|
27
|
+
Use these terms exactly in every suggestion. Consistent language is the point — don't drift into "component," "service," "API," or "boundary." Full definitions in [skills/LANGUAGE.md](../LANGUAGE.md).
|
|
28
|
+
|
|
29
|
+
- **Module** — anything with an interface and an implementation (function, class, package, slice).
|
|
30
|
+
- **Interface** — everything a caller must know to use the module: types, invariants, error modes, ordering, config. Not just the type signature.
|
|
31
|
+
- **Implementation** — the code inside.
|
|
32
|
+
- **Depth** — leverage at the interface: a lot of behaviour behind a small interface. **Deep** = high leverage. **Shallow** = interface nearly as complex as the implementation.
|
|
33
|
+
- **Seam** — where an interface lives; a place behaviour can be altered without editing in place. (Use this, not "boundary.")
|
|
34
|
+
- **Adapter** — a concrete thing satisfying an interface at a seam.
|
|
35
|
+
- **Leverage** — what callers get from depth.
|
|
36
|
+
- **Locality** — what maintainers get from depth: change, bugs, knowledge concentrated in one place.
|
|
37
|
+
|
|
38
|
+
Key principles (see [skills/LANGUAGE.md](../LANGUAGE.md) for the full list):
|
|
39
|
+
|
|
40
|
+
- **Deletion test**: imagine deleting the module. If complexity vanishes, it was a pass-through. If complexity reappears across N callers, it was earning its keep.
|
|
41
|
+
- **The interface is the test surface.**
|
|
42
|
+
- **One adapter = hypothetical seam. Two adapters = real seam.**
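
To make the deletion test concrete, a small sketch contrasting a pass-through with a module that earns its keep. Both functions and the default values are hypothetical, invented only to show the contrast.

```python
import json


def load_json(text: str) -> dict:
    """Shallow: the interface is as complex as the implementation.

    Deleting it costs callers nothing; they can call json.loads directly.
    """
    return json.loads(text)


def load_settings(text: str) -> dict:
    """Deeper: parsing, defaults, and validation sit behind one call.

    Deleting it would make this logic reappear in every caller.
    """
    raw = json.loads(text) if text.strip() else {}
    if not isinstance(raw, dict):
        raise ValueError("settings must be a JSON object")
    # Defaults are assumptions for the example, overridden by the input.
    settings = {"retries": 3, "timeout_s": 10.0, **raw}
    if not isinstance(settings["retries"], int) or settings["retries"] < 0:
        raise ValueError("retries must be a non-negative integer")
    return settings
```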
|
|
43
|
+
|
|
44
|
+
This skill is _informed_ by the project's domain model. The domain language gives names to good seams; ADRs record decisions the skill should not re-litigate.
|
|
45
|
+
|
|
46
|
+
## Process
|
|
47
|
+
|
|
48
|
+
### 1. Explore
|
|
49
|
+
|
|
50
|
+
First, read the project's domain glossary ([`docs/CONTEXT.md`](../../docs/CONTEXT.md)) and any ADRs in [`docs/adr/`](../../docs/adr/) that cover the area you're touching.
|
|
51
|
+
|
|
52
|
+
Then use the Agent tool with `subagent_type=Explore` to walk the codebase. Don't follow rigid heuristics — explore organically and note where you experience friction:
|
|
53
|
+
|
|
54
|
+
- Where does understanding one concept require bouncing between many small modules?
|
|
55
|
+
- Where are modules **shallow** — interface nearly as complex as the implementation?
|
|
56
|
+
- Where have pure functions been extracted just for testability, but the real bugs hide in how they're called (no **locality**)?
|
|
57
|
+
- Where do tightly-coupled modules leak across their seams?
|
|
58
|
+
- Which parts of the codebase are untested, or hard to test through their current interface?
|
|
59
|
+
|
|
60
|
+
Apply the **deletion test** to anything you suspect is shallow: if you deleted it and folded its code into its neighbours, would complexity concentrate in one place, or just move somewhere else? "Concentrates" is the signal you want.
|
|
61
|
+
|
|
62
|
+
### 2. Present candidates
|
|
63
|
+
|
|
64
|
+
Present a numbered list of deepening opportunities. For each candidate:
|
|
65
|
+
|
|
66
|
+
- **Files** — which files/modules are involved
|
|
67
|
+
- **Problem** — why the current architecture is causing friction
|
|
68
|
+
- **Solution** — plain English description of what would change
|
|
69
|
+
- **Benefits** — explained in terms of locality and leverage, and also in how tests would improve
|
|
70
|
+
|
|
71
|
+
**Use CONTEXT.md vocabulary for the domain, and [skills/LANGUAGE.md](../LANGUAGE.md) vocabulary for the architecture.** If `CONTEXT.md` defines "Order," talk about "the Order intake module" — not "the FooBarHandler," and not "the Order service."
|
|
72
|
+
|
|
73
|
+
**ADR conflicts**: if a candidate contradicts an existing ADR, only surface it when the friction is real enough to warrant revisiting the ADR. Mark it clearly (e.g. _"contradicts ADR-0007 — but worth reopening because…"_). Don't list every theoretical refactor an ADR forbids.
|
|
74
|
+
|
|
75
|
+
Do NOT propose interfaces yet. Ask the user: "Which of these would you like to explore?"
|
|
76
|
+
|
|
77
|
+
### 3. Grilling loop
|
|
78
|
+
|
|
79
|
+
Once the user picks a candidate, drop into a grilling conversation. Walk the design tree with them — constraints, dependencies, the shape of the deepened module, what sits behind the seam, what tests survive.
|
|
80
|
+
|
|
81
|
+
Side effects happen inline as decisions crystallize:
|
|
82
|
+
|
|
83
|
+
- **Naming a deepened module after a concept not in `CONTEXT.md`?** Add the term to `CONTEXT.md` — same discipline as `/grill-plan` (see [`../formats/CONTEXT-FORMAT.md`](../formats/CONTEXT-FORMAT.md)). Create the file lazily if it doesn't exist.
|
|
84
|
+
- **Sharpening a fuzzy term during the conversation?** Update `CONTEXT.md` right there.
|
|
85
|
+
- **User rejects the candidate with a load-bearing reason?** Offer an ADR, framed as: _"Want me to record this as an ADR so future architecture reviews don't re-suggest it?"_ Only offer when the reason would actually be needed by a future explorer to avoid re-suggesting the same thing — skip ephemeral reasons ("not worth it right now") and self-evident ones. See [`../formats/ADR-FORMAT.md`](../formats/ADR-FORMAT.md).
|
|
86
|
+
- **Want to explore alternative interfaces for the deepened module?** See [INTERFACE-DESIGN.md](INTERFACE-DESIGN.md).
|
|
87
|
+
|
|
88
|
+
### 4. Context Splitting
|
|
89
|
+
|
|
90
|
+
When a single `CONTEXT.md` becomes a bottleneck (>100 terms), the codebase is likely ready for context splitting.
|
|
91
|
+
|
|
92
|
+
- **Identify Seams**: Find the places where groups of domain terms are largely independent of each other.
|
|
93
|
+
- **Extract Sub-Contexts**: Move terms into `<module>/CONTEXT.md` files.
|
|
94
|
+
- **Update CONTEXT-MAP.md**: Create or update the root [`docs/CONTEXT-MAP.md`](../formats/CONTEXT-MAP-FORMAT.md) to point to the new sub-contexts.
|
|
95
|
+
- **AI-Navigability**: This reduces context pollution, allowing agents to focus only on the relevant vocabulary for a given module.
|
|
96
|
+
|
|
97
|
+
## Pairing with other skills
|
|
98
|
+
|
|
99
|
+
- **[`zoom-out`](../zoom-out/SKILL.md)** — runs *before* if the area is unfamiliar. Map first, then propose.
|
|
100
|
+
- **[`design`](../design/SKILL.md)** — shares vocabulary ([`LANGUAGE.md`](../LANGUAGE.md)). `design` is the greenfield twin of this skill.
|
|
101
|
+
- **[`system-design`](../system-design/SKILL.md)** — the system-level twin. Greenfield topology vs brownfield deepening.
|
|
102
|
+
- **[`grill-plan`](../grill-plan/SKILL.md)** — invoked when a candidate hits an ADR that needs revisiting (or when a rejection deserves an ADR).
|
|
103
|
+
- **[`tdd`](../tdd/SKILL.md) / [`tdd-rounds`](../tdd-rounds/SKILL.md)** — runs *after* the candidate is chosen. Refactor rounds: ACs are "all existing tests still green".
|
|
104
|
+
- **[`prod-ready`](../prod-ready/SKILL.md)** — gate before merge.
|
|
105
|
+
|
|
106
|
+
## Done when
|
|
107
|
+
|
|
108
|
+
For each candidate the user chose to explore:
|
|
109
|
+
|
|
110
|
+
- The deepened module's interface has been sketched (small surface, clear seam).
|
|
111
|
+
- Implications for tests are named (which existing tests survive; which need updating).
|
|
112
|
+
- Any new domain terms used are added to `CONTEXT.md`.
|
|
113
|
+
- Any architectural decision worth preserving is offered as an ADR.
|
|
114
|
+
|
|
115
|
+
If the user wants to deepen multiple candidates, run the skill again per candidate — don't batch them in one pass.
|
|
@@ -0,0 +1,97 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: investigate
|
|
3
|
+
description: Use when the user asks for investigation, research, a proposal, or "options" before any code lands; or proactively for non-trivial structural decisions (new dependency, framework choice, API contract change, cross-cutting refactor). Triggered by phrases like "investigate X", "research Y", "give me a proposal", "what are our options", "how would we approach", "let's explore", "should we...". Produces a durable research note in `docs/research/<topic>.md`. Skip for tasks where one obvious approach exists (typo fixes, config tweaks, mechanical refactors). Pairs with `feature-doc` (captures *what* we're building once a direction is chosen) and `grill-plan` (stress-tests a chosen plan).
|
|
4
|
+
complexity: medium
|
|
5
|
+
expected_duration: 20 minutes
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
# Investigation Workflow
|
|
9
|
+
|
|
10
|
+
Investigation is a separate phase from implementation. It produces a durable artifact — a research note in `docs/research/<topic>.md` — that captures what's true today, the viable approaches, and the recommendation with its tradeoff. The artifact survives the conversation; future contributors can reach for it.
|
|
11
|
+
|
|
12
|
+
## When to use
|
|
13
|
+
|
|
14
|
+
- The user explicitly asks for investigation, research, a proposal, options, or "how would we approach X".
|
|
15
|
+
- A non-trivial structural decision is on the table: new dependency, new architectural pattern, framework choice, contract change, cross-cutting refactor.
|
|
16
|
+
- The decision passes the same bar as an ADR: hard to reverse, surprising-without-context, or the result of a real trade-off.
|
|
17
|
+
|
|
18
|
+
## When to skip
|
|
19
|
+
|
|
20
|
+
- One obvious approach (typo fixes, config tweaks, mechanical refactors).
|
|
21
|
+
- Pure execution of an already-decided plan.
|
|
22
|
+
- Bug fixes that don't change architecture.
|
|
23
|
+
|
|
24
|
+
## Phases
|
|
25
|
+
|
|
26
|
+
Each phase is a stop. Don't start the next until the previous is grounded.
|
|
27
|
+
|
|
28
|
+
### 1. Survey the current state — read first, claim later
|
|
29
|
+
|
|
30
|
+
Do not work from impressions. Read the relevant files and ground every claim in the artifact in something cite-able (`path:line`). Specifically check:
|
|
31
|
+
|
|
32
|
+
- **The relevant code paths.** Read entire functions, not just headers. If a behavior is being challenged, read the test that pins it.
|
|
33
|
+
- **Existing ADRs** (`docs/adr/`). A prior decision may already constrain the space — surface it; do not relitigate without flagging.
|
|
34
|
+
- **`CONTEXT.md`** if the topic touches domain language. Get the terms right before writing anything down.
|
|
35
|
+
- **Feature docs** (`docs/features/`) for acceptance criteria already committed.
|
|
36
|
+
- **Open follow-ups** in the codebase (TODO/FIXME) related to the topic.
|
|
37
|
+
- **Known Issues Ledger** (`docs/known-issues.md`). Check for historical "surprises" or wire-shape drifts found during past end-to-end verifications in this area. Learning from past failures prevents repeating them.
|
|
38
|
+
|
|
39
|
+
The Context section of the artifact lists what you found, with citations. Not what you guessed.
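
One way to gather cite-able follow-ups for the survey is a small scan like the sketch below. The function name, the default suffix filter, and the simple "topic appears on the same line" match are all assumptions; a real project may prefer its own search tooling.

```python
import re
from pathlib import Path

MARKER = re.compile(r"\b(TODO|FIXME)\b")


def find_follow_ups(
    root: Path, topic: str, suffixes: tuple[str, ...] = (".py", ".md")
) -> list[str]:
    """Return `path:line` citations for TODO/FIXME lines that mention the topic."""
    citations: list[str] = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        for number, line in enumerate(lines, start=1):
            # Keep only marker lines that also mention the topic (case-insensitive).
            if MARKER.search(line) and topic.lower() in line.lower():
                citations.append(f"{path.relative_to(root).as_posix()}:{number}")
    return citations
```

Each returned string can be pasted into the Context section as a citation, then read in full before any claim is made about it.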
|
|
40
|
+
|
|
41
|
+
### 2. Map the design space — 2 to 3 options
|
|
42
|
+
|
|
43
|
+
Aim for 2 to 3 genuinely viable options. One option is not a design space; six is performance art. For each option capture:
|
|
44
|
+
|
|
45
|
+
- **Approach** — one or two sentences. Concrete, not abstract.
|
|
46
|
+
- **Pros** — what makes this attractive.
|
|
47
|
+
- **Cons** — what hurts. Don't soft-pedal — if you'd reject the option later, name the reason now.
|
|
48
|
+
- **Fit with project** — does it align with existing ADRs, conventions, the level of ceremony the team uses? Misfit isn't disqualifying but should be explicit.
|
|
49
|
+
- **Main tradeoff** — one line. The thing being accepted if this option is picked.
|
|
50
|
+
|
|
51
|
+
If you found only one option, say so and explain why other paths were ruled out. Do not pad with strawmen.
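
The shape of this phase can be captured as data and checked before writing the note. The sketch below is illustrative only: `Option` and `design_space_problems` are hypothetical names, and the checks encode nothing beyond the rules stated above (five fields per option, 2 to 3 options, a single option only with an explanation).

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Option:
    """One candidate approach, with the five fields listed above."""

    name: str
    approach: str
    pros: str
    cons: str
    fit_with_project: str
    main_tradeoff: str


def design_space_problems(
    options: list[Option], ruled_out_note: str = ""
) -> list[str]:
    """Return human-readable problems; an empty list means the section is ready."""
    problems: list[str] = []
    if not options:
        problems.append("no options recorded")
    elif len(options) == 1 and not ruled_out_note.strip():
        problems.append("single option without an explanation of what was ruled out")
    elif len(options) > 3:
        problems.append(f"{len(options)} options; narrow to the 2 to 3 genuinely viable ones")
    for option in options:
        for f in fields(option):
            # Every field must carry real content, including the cons.
            if not getattr(option, f.name).strip():
                problems.append(f"option {option.name or '?'!r}: {f.name} is empty")
    return problems
```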
|
|
52
|
+
|
|
53
|
+
### 3. Recommend — with reasoning
|
|
54
|
+
|
|
55
|
+
Pick one option. Name it. Give explicit reasoning. Name the tradeoff being accepted. If the recommendation genuinely depends on user preference, say which preferences map to which option — do not punt the decision back without structure.
|
|
56
|
+
|
|
57
|
+
### 4. Checkpoint questions
|
|
58
|
+
|
|
59
|
+
Identify the user decisions that must land before any code does. Examples:
|
|
60
|
+
|
|
61
|
+
- "Do you want X or Y?"
|
|
62
|
+
- "Is the team OK with adding dependency Z?"
|
|
63
|
+
- "Should I draft an ADR first or start the implementation?"
|
|
64
|
+
|
|
65
|
+
Each question should be answerable; if it's open-ended, sharpen it.
|
|
66
|
+
|
|
67
|
+
### 5. (Optional) Independent review
|
|
68
|
+
|
|
69
|
+
For high-stakes artifacts — specs, ADRs, anything load-bearing for cross-team alignment — spawn an independent reviewer agent (e.g., `general-purpose`) with a self-contained brief and the artifact path. Do not delegate the synthesis; ask for a critique against specific axes (correctness, completeness, internal consistency).
|
|
70
|
+
|
|
71
|
+
## The artifact
|
|
72
|
+
|
|
73
|
+
Save the research note to `docs/research/<short-topic>.md`. Use [`templates/research-note.md`](./templates/research-note.md) as the skeleton. Create `docs/research/` lazily on first use.
|
|
74
|
+
|
|
75
|
+
The note must include:
|
|
76
|
+
- **Context** with citations.
|
|
77
|
+
- **Options** — 2 to 3, each with the five fields above.
|
|
78
|
+
- **Recommendation** with reasoning and the accepted tradeoff.
|
|
79
|
+
- **Checkpoint questions** the user must answer.
|
|
80
|
+
- **Out of scope** — explicit, so adjacent decisions don't silently leak in.
|
|
81
|
+
|
|
82
|
+
## Rules
|
|
83
|
+
|
|
84
|
+
- Do **not** start implementing inside the investigation. Stop at the artifact and the recommendation; wait for the user to choose.
|
|
85
|
+
- Do **not** narrow to one option silently. If only one option survives, the artifact must explain why the others were ruled out.
|
|
86
|
+
- Do **not** conflate research with planning. A plan executes a chosen option; research surfaces options.
|
|
87
|
+
- Do **not** skip the artifact for "small" investigations. The discipline of writing it is the value; the durable trail is the bonus.
|
|
88
|
+
- Do **not** add a Recommendation that just lists the options again. Pick one.
|
|
89
|
+
|
|
90
|
+
## Handoff
|
|
91
|
+
|
|
92
|
+
Once the user picks an option:
|
|
93
|
+
|
|
94
|
+
- Mark the research note **Decided** and bold the chosen option in the Recommendation section.
|
|
95
|
+
- If the decision is hard-to-reverse / surprising-without-context / the result of a real tradeoff → write an ADR (use `grill-plan`, or write directly into `docs/adr/`). Link the ADR back from the research note.
|
|
96
|
+
- If a concrete feature is now being built → run `feature-doc` next; link it from the research note.
|
|
97
|
+
- If the chosen option requires later validation → leave the research note Open and add a "Follow-ups" section.
|
|
@@ -0,0 +1,84 @@
|
|
|
1
|
+
# Research: <topic>
|
|
2
|
+
|
|
3
|
+
**Status:** Open | Decided | Superseded
|
|
4
|
+
**Owner:** <user>
|
|
5
|
+
**Date:** YYYY-MM-DD
|
|
6
|
+
|
|
7
|
+
<!--
|
|
8
|
+
Status values:
|
|
9
|
+
- Open — investigation in progress, or written and awaiting decision
|
|
10
|
+
- Decided — user has chosen an option; record it on the Decision line below
|
|
11
|
+
- Superseded — replaced by a later research note or ADR; link the replacement
|
|
12
|
+
-->
|
|
13
|
+
|
|
14
|
+
**Decision:** _<pin the chosen option here once decided, e.g., "Option B — decided 2026-05-07">_
|
|
15
|
+
|
|
16
|
+
## Context
|
|
17
|
+
|
|
18
|
+
What's true today. Each claim should be cite-able (file:line, ADR number, doc heading). Include:
|
|
19
|
+
- The problem or question that triggered the investigation.
|
|
20
|
+
- Relevant code paths and what they do today.
|
|
21
|
+
- Prior ADRs / feature docs / `CONTEXT.md` entries that constrain the space.
|
|
22
|
+
- Existing follow-ups (TODO/FIXME) related to the topic.
|
|
23
|
+
|
|
24
|
+
## Options
|
|
25
|
+
|
|
26
|
+
### Option A — <short name>
|
|
27
|
+
|
|
28
|
+
- **Approach:** one or two sentences, concrete.
|
|
29
|
+
- **Pros:** what makes this attractive.
|
|
30
|
+
- **Cons:** what hurts. Don't soft-pedal.
|
|
31
|
+
- **Fit with project:** alignment with existing ADRs / conventions / team ceremony level.
|
|
32
|
+
- **Main tradeoff:** one line.
|
|
33
|
+
|
|
34
|
+
### Option B — <short name>
|
|
35
|
+
|
|
36
|
+
- **Approach:**
|
|
37
|
+
- **Pros:**
|
|
38
|
+
- **Cons:**
|
|
39
|
+
- **Fit with project:**
|
|
40
|
+
- **Main tradeoff:**
|
|
41
|
+
|
|
42
|
+
### Option C — <short name>
|
|
43
|
+
|
|
44
|
+
(Optional. Two to three options total. One is not a design space; six is performance art.)
|
|
45
|
+
|
|
46
|
+
## Recommendation
|
|
47
|
+
|
|
48
|
+
Option **X**.
|
|
49
|
+
|
|
50
|
+
**Reasoning:** ...
|
|
51
|
+
|
|
52
|
+
**Tradeoff being accepted:** ...
|
|
53
|
+
|
|
54
|
+
## Open Questions
|
|
55
|
+
|
|
56
|
+
Research-level unknowns the team carries forward — uncertainties to validate later, not blockers for the decision.
|
|
57
|
+
|
|
58
|
+
(Distinct from Checkpoint Questions below: those are decisions the *user* must answer before code lands; these are unknowns nobody yet has the answer to.)
|
|
59
|
+
|
|
60
|
+
- ...
|
|
61
|
+
|
|
62
|
+
## Checkpoint Questions
|
|
63
|
+
|
|
64
|
+
Decisions the user must make before any code lands. Each question should be answerable.
|
|
65
|
+
|
|
66
|
+
1. ...
|
|
67
|
+
2. ...
|
|
68
|
+
|
|
69
|
+
## Out of Scope
|
|
70
|
+
|
|
71
|
+
Decisions deliberately deferred from this investigation.
|
|
72
|
+
|
|
73
|
+
(Distinct from a feature doc's "Non-Goals", which lists *capabilities deliberately not built*.)
|
|
74
|
+
|
|
75
|
+
- Things the investigation deliberately does not cover.
|
|
76
|
+
- Adjacent decisions that should get their own research note.
|
|
77
|
+
|
|
78
|
+
## Handoff
|
|
79
|
+
|
|
80
|
+
Once a direction is picked:
|
|
81
|
+
- Update **Status** to `Decided` and fill in the **Decision** line at the top.
|
|
82
|
+
- Bold the chosen option in the Recommendation section.
|
|
83
|
+
- If the decision is hard-to-reverse, surprising-without-context, or the result of a real tradeoff → write an ADR and link it here.
|
|
84
|
+
- If a feature follows → invoke `feature-doc` and link the resulting feature doc here.
|
|
@@ -0,0 +1,197 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: pr-review
|
|
3
|
+
description: Discipline for reviewing someone else's pull request — the inverse of `prod-ready` (which is the author's pre-merge gate). Use when the user asks to "review this PR", "look over the diff", "check this change", "give feedback on", or invokes a code-review slash command. Reviews against the linked feature doc / ADRs / domain vocabulary, classifies findings by severity (blocker / suggestion / nit), and returns a structured report. Skip for trivial PRs (typo, dep bump, lint-only) — approve directly. Pairs with `prod-ready` (the author's checklist; the reviewer verifies it landed honestly), `security-review` (escalation when the diff is surface-changing), and `code-hygiene` (the line-level lens applied during the read).
|
|
4
|
+
complexity: medium
|
|
5
|
+
expected_duration: 20 minutes
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
# PR Review
|
|
9
|
+
|
|
10
|
+
The author's job is to write a defensible change. The reviewer's job is to verify the **claim** (what the PR says it does) matches the **diff** (what it actually does), against the **contract** (what the feature doc / ADR / tests said the change should be).
|
|
11
|
+
|
|
12
|
+
Most bad PR reviews are line-by-line nit fests that miss the load-bearing decisions. This skill flips the order: read the *claim* first, verify it, then walk the diff.
|
|
13
|
+
|
|
14
|
+
## Why this skill exists
|
|
15
|
+
|
|
16
|
+
- A PR reviewed line-by-line without context produces lots of comments and misses the architectural drift.
|
|
17
|
+
- A PR rubber-stamped after skimming the description misses the security / consistency / scope-creep failures.
|
|
18
|
+
- A reviewer who treats every concern as equal severity exhausts the author's attention and dilutes signal.
|
|
19
|
+
|
|
20
|
+
This skill produces a **prioritised** review where blockers are unambiguous, suggestions are framed as suggestions, and nits are clearly nits — so the author knows what must change vs what's optional.
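
A prioritised review can be modelled as findings tagged with one of the three severities. The sketch below is one possible shape, not the skill's own report format: `Severity`, `Finding`, and `render_review` are hypothetical names, and the markdown layout is an assumption.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    BLOCKER = 0     # must change before merge
    SUGGESTION = 1  # worth considering; the author decides
    NIT = 2         # optional polish


@dataclass(frozen=True)
class Finding:
    severity: Severity
    location: str  # e.g. "src/orders.py:42"
    note: str


def render_review(findings: list[Finding]) -> str:
    """Group findings by severity, blockers first, so required changes stand out."""
    lines: list[str] = []
    for severity in Severity:
        group = [f for f in findings if f.severity == severity]
        if not group:
            continue
        lines.append(f"## {severity.name.title()}s ({len(group)})")
        lines.extend(f"- `{f.location}`: {f.note}" for f in group)
    return "\n".join(lines)
```

Because every finding carries exactly one severity, the author can read the blockers section alone and know what has to change.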
|
|
21
|
+
|
|
22
|
+
## When to use
|
|
23
|
+
|
|
24
|
+
- The user asks for a PR review (any phrasing).
|
|
25
|
+
- Reviewing a Builder's round in `tdd-rounds` (the parent's verification ritual borrows from this skill).
|
|
26
|
+
- Reviewing your own work before opening the PR — last self-check after `prod-ready`.
|
|
27
|
+
|
|
28
|
+
## When to skip
|
|
29
|
+
|
|
30
|
+
- Typo / lint-only / formatter-only diffs. Approve.
|
|
31
|
+
- Dependency bumps with no API change (still: scan changelog for security advisories before approving).
|
|
32
|
+
- Trivial config tweaks with no behavioural change.
|
|
33
|
+
- PRs that are explicitly draft / WIP — give early feedback, but skip the formal severity classification until the author flags ready.
|
|
34
|
+
|
|
35
|
+
## Process

### 1. Read the claim first

Before opening the diff, read what the PR claims to do:

- **PR description / title** — what's the change? Why?
- **Linked feature doc** (`docs/features/<name>.md`) — what's the contract? Which ACs?
- **Linked ADRs** — what decisions does this change rely on or supersede?
- **Linked research note / known-issues entry** — for fix-rounds, the bug ledger entry doubles as the brief.

If there's no description / no linked artifact / no ACs, **that's the first finding**. A PR without a stated claim is not reviewable — ask the author to add one before continuing.

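The reviewability check in step 1 can be sketched as a small function. This is a minimal illustration, not part of the skill itself: `PullRequestClaim` and `missing_claim_parts` are hypothetical names, and the sketch assumes each missing part is reported separately so the reviewer can name exactly what the author needs to add.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequestClaim:
    """Hypothetical container for what a PR says about itself."""
    description: str = ""
    # Feature docs, ADRs, research notes, or ledger entries linked from the PR.
    linked_artifacts: list[str] = field(default_factory=list)
    # Acceptance-criteria identifiers the PR claims to cover, e.g. "AC-01".
    acceptance_criteria: list[str] = field(default_factory=list)


def missing_claim_parts(claim: PullRequestClaim) -> list[str]:
    """Return the parts of the claim that are absent, in a fixed order."""
    missing = []
    if not claim.description.strip():
        missing.append("description")
    if not claim.linked_artifacts:
        missing.append("linked artifact")
    if not claim.acceptance_criteria:
        missing.append("acceptance criteria")
    return missing


claim = PullRequestClaim(description="Add retry to the export job")
print(missing_claim_parts(claim))
```

A non-empty result is the first finding; an empty one means the diff can be opened.
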
### 2. Verify the claim against the diff

For each AC / each ledger entry / each promised change:

- Find the test that pins it. Read the test. Does it actually exercise the claim, or does it pass while testing something adjacent?
- Find the implementation. Does the diff match the claim? Anything extra?
- Anything **missing** from the diff that the claim said would change?
- Anything **extra** in the diff that the claim didn't mention? (Scope creep; flag as a finding.)

This step is the difference between "the PR ships what it says" and "the PR ships, plus a surprise refactor and a silent feature flag." Catch the silent additions here.

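The "missing" and "extra" questions above amount to two set differences. The sketch below assumes the claim and the diff can both be reduced to a set of touched paths, which is a simplification (a real claim is usually stated in terms of behaviour, not files); `compare_claim_to_diff` is a hypothetical helper name.

```python
def compare_claim_to_diff(claimed: set[str], changed: set[str]) -> dict[str, list[str]]:
    """Compare what the PR said it would touch with what the diff touches.

    "missing": promised by the claim but absent from the diff.
    "extra":   present in the diff but never mentioned (possible scope creep).
    """
    return {
        "missing": sorted(claimed - changed),
        "extra": sorted(changed - claimed),
    }


result = compare_claim_to_diff(
    claimed={"export/job.py", "export/retry.py"},
    changed={"export/job.py", "config/flags.py"},
)
print(result)
```

Both lists empty means the diff matches the claim at this coarse level; anything under `extra` still needs a human read to decide whether it is scope creep or a reasonable supporting change.
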
### 3. Read the diff with the right lenses

In this order — biggest-impact first:

#### 3a. Architectural / interface review

- Does the change respect the dependency direction in `docs/architecture.md` and the existing ADRs?
- Are new modules deep, or shallow? (Apply the deletion test from [skills/LANGUAGE.md](../LANGUAGE.md).)
- Are new public types / functions / endpoints named consistently with `CONTEXT.md`?
- Does anything contradict an ADR without superseding it explicitly?

#### 3b. Test review

- Does each new test exercise behaviour through the public interface?
- Does each test name describe a behaviour, not a function? (See `tdd/TESTS.md`.)
- Does the test actually fail without the implementation change? If it would still pass on `main`, it's not testing the claim.
- Are mocks at boundaries only? Internal-collaborator mocks are a smell.

#### 3c. Security review (delegate when surface-changing)

If the diff introduces or alters a trust boundary, identity flow, authorization check, sensitive data path, or external surface — flag that **`security-review` is required** and hold the PR until that review lands. Don't try to inline a half-review of a surface-changing change.

For non-surface-changing diffs: walk `prod-ready` Section 3 (defense-in-depth) bullets; flag specific gaps if you see them, otherwise move on.

#### 3d. Operational review

- Walk `prod-ready`'s checklist sections relevant to the diff. Did the author's `prod-ready` pass actually land? (Verify the timeouts, the migration idempotency, the structured logging, the doc-map.)
- A common failure mode: `prod-ready` was checked off but the diff doesn't reflect the changes the checklist would have driven. Treat that as a blocker.

#### 3e. Doc-drift audit

This is the second line of defense for `prod-ready` Section 7. The author may have missed it; the reviewer catches what's left. Walk these four questions against the diff:

- **New decision with viable alternatives** → does an ADR exist in `docs/adr/` for it? Does it name what it supersedes? If load-bearing, is it referenced from the code?
- **New or changed domain term** → has [`docs/CONTEXT.md`](../../docs/CONTEXT.md) been updated? Are `_Avoid_:` aliases listed if there's risk of confusion?
- **New / removed package, changed public interface, shifted module boundary** → is the feature's design note (`docs/features/<feature>.design.md`) updated? Module map, file layout, public-interface signatures, test boundaries.
- **Changed acceptance criteria** → does the feature doc reflect what was actually built? Silently-dropped or silently-added behaviour is the most common drift class — flag and don't accept "we'll fix in a follow-up".

If any answer is "no" without `n/a + reason`, that's a finding. Severity:
- **Blocker** — the missing doc is load-bearing for the next reader (ADR for a hard-to-reverse decision; CONTEXT.md entry for a term other PRs will use; AC drift hiding behaviour).
- **Suggestion** — the doc would help but the diff is self-explanatory in isolation.

The doc-map is small enough to walk in 2–3 minutes. Skip it and you're trading 3 minutes now for an hour of orientation in 3 months.

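The rule for turning the four doc-drift answers into findings can be sketched as follows. This is one reading of the rule and not the skill's own tooling: it assumes each question is answered `"yes"`, `"no"`, or `"n/a"`, and that an `"n/a"` only counts when it carries a reason. The function name and the question labels are hypothetical.

```python
from typing import Optional


def doc_drift_findings(answers: dict[str, tuple[str, Optional[str]]]) -> list[str]:
    """Return the questions that produce a finding.

    `answers` maps a question label to (answer, reason), where answer is
    "yes", "no", or "n/a". Assumption: "n/a" without a reason is treated
    the same as "no".
    """
    findings = []
    for question, (answer, reason) in answers.items():
        if answer == "yes":
            continue
        if answer == "n/a" and reason:
            continue
        findings.append(question)
    return findings


answers = {
    "ADR exists for the new decision": ("yes", None),
    "CONTEXT.md updated for the new term": ("no", None),
    "Design note updated": ("n/a", "no interface or boundary change"),
    "Feature doc reflects built ACs": ("n/a", None),
}
print(doc_drift_findings(answers))
```

Severity is deliberately left out of the sketch: whether a missing doc is a blocker or a suggestion depends on whether it is load-bearing for the next reader, which is a judgment call.
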
#### 3f. Hygiene (line level)

Apply `code-hygiene` as a lens here, not as a primary phase:

- Names that mislead (boolean returning non-bool, `getX` that mutates, `Manager`/`Helper` suffixes hiding what the thing is).
- Cleverness that earns its cost? Or could be boring?
- YAGNI — "in case we need it" parameters / interfaces / classes? Strip.
- Premature extraction (Rule of 3 violated)?

Save these for last — they shouldn't outweigh architectural concerns.

### 4. Classify findings

Every finding gets a severity. **The severity is part of the finding.**

- **Blocker** — must change before merge. The PR is wrong, breaks a contract, has a security gap, regresses an AC, or contradicts a load-bearing ADR.
- **Suggestion** — the author should consider; you'd prefer a change but won't block. Includes design alternatives, missing-but-non-essential tests, hygiene improvements with real impact.
- **Nit** — taste-level. Naming preferences, whitespace, tiny refactors. The author can resolve or dismiss without further discussion.
- **Question** — you genuinely don't understand and need the author to explain before you can rank it. Asking "why this approach?" is fine; using questions as passive-aggressive blockers is not.

Default to fewer blockers. A review with 12 blockers is usually a review with 1 blocker and 11 suggestions miscategorised.

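One way to make "the severity is part of the finding" concrete is to model a finding as a record that cannot be built without a severity. The sketch below is illustrative only; `Severity`, `Finding`, and `group_by_severity` are hypothetical names, and the fields mirror the file:line, issue, and next-step parts this skill asks for.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    BLOCKER = "blocker"
    SUGGESTION = "suggestion"
    NIT = "nit"
    QUESTION = "question"


@dataclass(frozen=True)
class Finding:
    severity: Severity  # required: a finding without a severity cannot be constructed
    location: str       # "file:line"
    issue: str
    next_step: str


def group_by_severity(findings: list[Finding]) -> dict[Severity, list[Finding]]:
    """Bucket findings by severity, keeping empty buckets explicit."""
    grouped: dict[Severity, list[Finding]] = {severity: [] for severity in Severity}
    for finding in findings:
        grouped[finding.severity].append(finding)
    return grouped


findings = [
    Finding(Severity.BLOCKER, "export/job.py:42", "AC-02 has no test.", "Add a test that fails on main."),
    Finding(Severity.NIT, "export/job.py:57", "Name `tmp2` is unclear.", "Rename if you agree."),
]
grouped = group_by_severity(findings)
print({severity.value: len(items) for severity, items in grouped.items()})
```

Keeping the empty buckets in the result makes it easy to see at a glance when a review has produced only nits, or an implausible number of blockers.
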
### 5. Write the review

Structured, scannable. The author should be able to triage in one read.

```md
## PR review: <title>

**Verdict**: Approve | Approve with suggestions | Request changes | Needs security-review first

**Claim verification**:
- Description matches diff: yes / partial — <what's extra or missing>
- ACs covered: AC-XX (test `name`), AC-YY (test `name`), ...
- Linked ADRs respected: yes / <which one is in tension>

**Blockers**:
- [file:line] <issue>. <why it blocks>. <what would unblock>.
- ...

**Suggestions**:
- [file:line] <issue>. <why>. <what to consider>.
- ...

**Nits**:
- [file:line] <one-liner>.
- ...

**Questions**:
- [file:line] <question>.
- ...
```

Empty sections: write `_none_` rather than omit. Explicit beats implicit.

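The `_none_` rule is easy to get wrong when a review is assembled by hand or by a script, so here is a minimal sketch of rendering the four findings sections. It covers only the findings part of the template, not the verdict or claim verification, and `render_findings` is a hypothetical helper name.

```python
SECTIONS = ("Blockers", "Suggestions", "Nits", "Questions")


def render_findings(findings_by_section: dict[str, list[str]]) -> str:
    """Render every section in template order, writing `_none_` for empty ones."""
    lines = []
    for section in SECTIONS:
        lines.append(f"**{section}**:")
        items = findings_by_section.get(section, [])
        if items:
            lines.extend(f"- {item}" for item in items)
        else:
            lines.append("_none_")  # explicit beats implicit
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


review = render_findings({
    "Blockers": ["[export/job.py:42] AC-02 has no test. The claim is unverified. Add a test that fails on main."],
})
print(review)
```

Iterating over a fixed `SECTIONS` tuple, rather than over the input dictionary, is what guarantees that no section is silently omitted.
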
## Severity calibration — one rule

If you label something a blocker, you must be able to finish this sentence: *"This change cannot merge as-is because ___."* If your reason is "I'd prefer X" or "in my style", it's a suggestion. If it's "the AC isn't covered" or "the trust boundary is open" or "the ADR contradicts this", it's a blocker.

## Anti-patterns

- **Drive-by approve.** "LGTM" without reading the linked artifacts. The artifacts exist so reviewers can verify the claim; skipping them defeats the discipline.
- **Re-litigating decided ADRs.** If the change implements an ADR-blessed approach you disagree with, the place to argue is a new ADR (or a `grill-plan` session), not the PR. Note the disagreement, don't block on it.
- **Miscategorised severity.** Calling a naming preference a blocker burns trust. Calling a missing AC a suggestion misses the point of review.
- **Architectural review as nit pile.** If you have ten line-level comments and zero architectural findings on a 500-line PR, you reviewed at the wrong altitude.
- **Reviewing the author, not the diff.** Address the change, not the person. "This function does X, but the AC says Y" — not "you didn't understand the AC."
- **Ignoring the `prod-ready` line item.** If the PR claims `prod-ready` was run, verify a sample of items. Otherwise it becomes a checkbox both sides ignore.

## When this skill is invoked by `tdd-rounds` parents

A `tdd-rounds` parent verifying a Builder's round runs a focused subset:

1. Read the Builder's structured report (`templates/builder-report.md`).
2. Run the test command independently — don't trust pasted output.
3. Read the diff, classify findings.
4. Tick AC checkboxes in the feature doc with the test names.
5. Append the round summary to `docs/STATE.md`.

The classification (blocker / suggestion / nit) lives in the parent's notes; only blockers gate the next round.

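The gating rule for a round can be stated in a few lines. This is a sketch under the assumption that the parent's notes can be reduced to `(severity, issue)` pairs; `gating_findings` is a hypothetical name and not part of `tdd-rounds`.

```python
def gating_findings(findings: list[tuple[str, str]]) -> list[str]:
    """Return the issues that must be resolved before the next round starts.

    Each finding is a (severity, issue) pair; only "blocker" gates.
    """
    return [issue for severity, issue in findings if severity == "blocker"]


round_notes = [
    ("blocker", "AC-03 is ticked but no test pins it"),
    ("suggestion", "Consider extracting the retry policy"),
    ("nit", "Rename `tmp2`"),
]
print(gating_findings(round_notes))
```

An empty result means the next round may start; suggestions and nits stay in the notes without holding anything up.
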
## Pairing with other skills

- **`prod-ready`** is the author's pre-merge checklist. Reviewer verifies it landed.
- **`sync-check`** is the diagnostic context auditor. Reviewer runs it (or verifies it was run) to catch terminology drift and ADR contradictions early.
- **`security-review`** is the surface-change escalation. Reviewer flags when required.
- **`code-hygiene`** is the line-level lens applied during the read.
- **`grill-plan`** is where load-bearing disagreements go (a new ADR, not a PR comment thread).
- **`debug`** if a finding turns out to be "this PR introduces a bug" — switch to debug to characterise it before recommending a change.

## Done when

- The claim is verified or contradicted, with citations.
- Every finding has a severity, a file:line, and a concrete next step.
- Blockers are genuinely blocking ("cannot merge because ___").
- The verdict is one of the four states; ambiguous reviews leave the author guessing.
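
The four verdict states can be derived mechanically once findings are classified. The precedence below is an assumption inferred from this skill (a required `security-review` holds the PR, blockers force changes, suggestions do not block); nits and questions are assumed not to affect the verdict. `choose_verdict` is a hypothetical name.

```python
def choose_verdict(blockers: int, suggestions: int, needs_security_review: bool) -> str:
    """Pick exactly one of the four verdict states.

    Assumed precedence: security-review first, then blockers, then suggestions.
    """
    if needs_security_review:
        return "Needs security-review first"
    if blockers > 0:
        return "Request changes"
    if suggestions > 0:
        return "Approve with suggestions"
    return "Approve"


print(choose_verdict(blockers=0, suggestions=2, needs_security_review=False))
```

Because the function always returns one of the four strings, a review built on it cannot end in an ambiguous state.
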