@lemoncode/lemony 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +21 -0
- package/PRIVACY.md +147 -0
- package/README.md +189 -0
- package/catalog/VERSION +1 -0
- package/catalog/agents/README.md +29 -0
- package/catalog/agents/architect.md +81 -0
- package/catalog/agents/fit-assessment.md +94 -0
- package/catalog/agents/implementer.md +67 -0
- package/catalog/agents/orchestrator.md +627 -0
- package/catalog/agents/reviewer.md +124 -0
- package/catalog/agents/spec-author.md +69 -0
- package/catalog/agents/ui-designer.md +25 -0
- package/catalog/commands/add-capability.md +69 -0
- package/catalog/commands/bypass.md +40 -0
- package/catalog/commands/define.md +24 -0
- package/catalog/commands/hotfix.md +47 -0
- package/catalog/commands/pause.md +52 -0
- package/catalog/commands/resume.md +56 -0
- package/catalog/commands/spinoff.md +59 -0
- package/catalog/commands/triage.md +24 -0
- package/catalog/harness.config.schema.json +116 -0
- package/catalog/hooks/README.md +56 -0
- package/catalog/hooks/init.sh +281 -0
- package/catalog/hooks/lib/lemony.sh +41 -0
- package/catalog/hooks/lib/playbook-scan.sh +394 -0
- package/catalog/hooks/lib/transcript-grep.sh +56 -0
- package/catalog/hooks/require-playbook.sh +97 -0
- package/catalog/hooks/session-close.sh +232 -0
- package/catalog/hooks/suggest-playbook.sh +72 -0
- package/catalog/playbook-format.md +198 -0
- package/catalog/schemas/README.md +13 -0
- package/catalog/schemas/tier2-events-history.md +104 -0
- package/catalog/schemas/tier2-events.md +286 -0
- package/catalog/skills/README.md +62 -0
- package/catalog/skills/bootstrap-architecture/SKILL.md +78 -0
- package/catalog/skills/code-explorer/SKILL.md +76 -0
- package/catalog/skills/grill-with-docs/ADR-FORMAT.md +49 -0
- package/catalog/skills/grill-with-docs/CONTEXT-FORMAT.md +77 -0
- package/catalog/skills/grill-with-docs/SKILL.md +270 -0
- package/catalog/skills/grill-with-docs/reference.md +236 -0
- package/catalog/skills/mutation-testing/SKILL.md +84 -0
- package/catalog/skills/note-side-finding/SKILL.md +89 -0
- package/catalog/skills/playbook-iterate/SKILL.md +78 -0
- package/catalog/skills/prd-to-spec/SKILL.md +181 -0
- package/catalog/skills/raise-discovery/SKILL.md +112 -0
- package/catalog/skills/resolve-discovery/SKILL.md +123 -0
- package/catalog/skills/review-pr/SKILL.md +106 -0
- package/catalog/skills/review-pr/reference.md +105 -0
- package/catalog/skills/security-review/SKILL.md +90 -0
- package/catalog/skills/senior-review/SKILL.md +99 -0
- package/catalog/skills/silent-failure-hunter/SKILL.md +76 -0
- package/catalog/skills/spec-compliance-check/SKILL.md +74 -0
- package/catalog/skills/spec-to-issue/SKILL.md +88 -0
- package/catalog/skills/task-closeout/SKILL.md +229 -0
- package/catalog/skills/tdd/SKILL.md +171 -0
- package/catalog/skills/test-gap-report/SKILL.md +71 -0
- package/catalog/skills/triage-issue/SKILL.md +102 -0
- package/catalog/skills/update-architecture/SKILL.md +69 -0
- package/catalog/skills/verify/SKILL.md +90 -0
- package/catalog/skills/write-adr/SKILL.md +77 -0
- package/catalog/templates/README.md +32 -0
- package/catalog/templates/claude-code/.claude/settings.json.tpl +34 -0
- package/catalog/templates/claude-code/agents.md.tpl +109 -0
- package/catalog/templates/claude-code/docs/playbooks/README.md.tpl +96 -0
- package/catalog/templates/claude-code/harness.config.yml.tpl +59 -0
- package/catalog/templates/claude-code/state/history.md.tpl +6 -0
- package/dist/cli.mjs +5691 -0
- package/package.json +80 -0
|
@@ -0,0 +1,229 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: task-closeout
|
|
3
|
+
description: Close out a finished task — record it in history.md, archive the task spec + discoveries, activate the Architect for durable capture (ADRs, the architecture map, playbooks), and finalize via a dedicated closeout PR. Use when the Orchestrator finalizes an approved+merged task, mentions "closeout", "close the task", or "wrap up the issue".
|
|
4
|
+
origin: vendor
|
|
5
|
+
vendor_version: '{{vendor_version}}'
|
|
6
|
+
invoked-by: [orchestrator]
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
# Task Closeout
|
|
10
|
+
|
|
11
|
+
## Core Principle
|
|
12
|
+
|
|
13
|
+
Closeout is **archive-on-done through a dedicated PR** (ADR 0009). The durable memory of
|
|
14
|
+
a finished task — its spec and its resolved discoveries — is **archived live**, not
|
|
15
|
+
deleted, so a future agent reads the decisions instead of doing git archaeology. The
|
|
16
|
+
genuinely architectural decisions rise further, to ADRs. And the closeout record reaches
|
|
17
|
+
the base branch **only through a PR**, never a direct push — the same branch isolation
|
|
18
|
+
every other change obeys.
|
|
19
|
+
|
|
20
|
+
Three moves, in this order:
|
|
21
|
+
|
|
22
|
+
1. **Activate the Architect** — closeout is its reliable end-of-task checkpoint (#138).
|
|
23
|
+
In cold blood, no resume pressure, it drives durable capture: `write-adr` (HITL offer
|
|
24
|
+
per resolved discovery), `update-architecture` (automatic, when the map exists), and
|
|
25
|
+
`playbook-iterate` (HITL offer, once per task). The canon outlives any task folder.
|
|
26
|
+
2. **Archive, don't delete** — `git mv` the spec + `discoveries.md` into
|
|
27
|
+
`_archive/<id>/`; drop only `progress.md` (true scratch). The high-value memory stays
|
|
28
|
+
live and grep-able.
|
|
29
|
+
3. **Land via a PR** — the `history.md` append, the archival move, and any new ADR ride a
|
|
30
|
+
dedicated `harness/closeout-<id>` PR merged with `--auto`. No direct push to the base.
|
|
31
|
+
|
|
32
|
+
Run this only when the **Reviewer has approved** and the task PR is merged. The merge is a
|
|
33
|
+
human decision (the merge gate) — closeout never merges the task; it **confirms** the
|
|
34
|
+
merge, then records it.
|
|
35
|
+
|
|
36
|
+
## Preconditions
|
|
37
|
+
|
|
38
|
+
- The Reviewer posted an explicit approval (the task passed `in-review`).
|
|
39
|
+
- **The task PR is merged — confirmed against GitHub, not the conversation.**
|
|
40
|
+
`gh pr view <pr> --json state,mergedAt` must report `"state": "MERGED"`, regardless of
|
|
41
|
+
how it was merged (GitHub UI, CLI, or the Orchestrator running `gh pr merge` after the
|
|
42
|
+
human authorized it). If it is not `MERGED`, stop — the task stays at `in-review`
|
|
43
|
+
until the human merges.
|
|
44
|
+
- The PRD for this task is `Status: completed` — `grill-with-docs` normally sets this
|
|
45
|
+
when the grill closed; if it's still `in_progress`, flip it now. (This is the
|
|
46
|
+
latest point the PRD can be marked done.)
|
|
47
|
+
- **No discovery is left open.** If `discoveries.md` exists, every entry must carry a
|
|
48
|
+
resolved `**Resolution**` block (`Decision` / `Resumed at` filled) and no
|
|
49
|
+
`harness:discovery:*` label may remain on the issue. An unresolved discovery means a
|
|
50
|
+
sub-agent is still paused — run `resolve-discovery` before closing out.
|
|
51
|
+
- **The `harness:status:closeout-pending` label exists on the remote.** `install` /
|
|
52
|
+
`update` / `repair` create it during label sync; an install that predates this label
|
|
53
|
+
needs one `lemony repair` first, or the park step's label flip will fail.
|
|
54
|
+
|
|
55
|
+
## Why post-merge (don't archive early)
|
|
56
|
+
|
|
57
|
+
Archival is **destructive** and must wait until the task PR is merged. Until that merge,
|
|
58
|
+
the task can re-enter iteration — a Reviewer rejection, or human review comments at the
|
|
59
|
+
gate (#111) — and that re-work needs the spec **live** in `tasks/<id>/spec/`. Archiving
|
|
60
|
+
pre-merge would pull the spec out from under an in-flight fix. So every step below runs
|
|
61
|
+
**after** the merge is confirmed.
|
|
62
|
+
|
|
63
|
+
## Process
|
|
64
|
+
|
|
65
|
+
Closeout runs **on the default branch, after the merge** — the task PR brought the
|
|
66
|
+
branch (spec + code) onto it. Land there first so the closeout branch forks from the
|
|
67
|
+
merged base: `git checkout <default> && git pull`.
|
|
68
|
+
|
|
69
|
+
### 1. Activate the Architect for durable capture
|
|
70
|
+
|
|
71
|
+
Closeout is the Architect's **reliable activation point** (#138): in cold blood, after
|
|
72
|
+
the merge, with no paused sub-agent pulling toward "just unblock me", it drives the three
|
|
73
|
+
durable-capture skills the on-demand triggers otherwise lose. Each one **no-ops cleanly
|
|
74
|
+
when its skill isn't installed**, so this step is safe everywhere.
|
|
75
|
+
|
|
76
|
+
The three differ in **who decides** — and the asymmetry is deliberate (ADR 0010):
|
|
77
|
+
|
|
78
|
+
- **`write-adr` — HITL offer, per resolved discovery.** Walk the **resolved** entries in
|
|
79
|
+
`discoveries.md` and **offer** the ones that settled a real decision (not a trivial
|
|
80
|
+
clarification) as candidate ADRs, then dispatch the **Architect** (`write-adr`, fresh
|
|
81
|
+
context) on the ones the human accepts. Offer liberally — closeout does **not** apply
|
|
82
|
+
the ADR criterion itself; the Architect owns the three tests (hard to reverse /
|
|
83
|
+
surprising without context / a real trade-off) and **declines** the ones that fail, so
|
|
84
|
+
a generous offer costs only a "no". If there are no resolved discoveries, or none
|
|
85
|
+
qualify, skip — never pad the canon with reversible or obvious decisions. An ADR is
|
|
86
|
+
**net-new canon**: the human curates what enters, so it stays an offer.
|
|
87
|
+
|
|
88
|
+
- **`update-architecture` — automatic dispatch, no pre-offer.** Only when
|
|
89
|
+
`docs/architecture.md` exists (the skill installs solely then). Dispatch the
|
|
90
|
+
**Architect** (`update-architecture`, fresh context) with the task's **merged diff**
|
|
91
|
+
(`gh pr diff <pr>`) plus the task's `spec/design.md` (still live here — archival is
|
|
92
|
+
step 4 below). The Architect reads the change, makes
|
|
93
|
+
the smallest true edit if the system's **shape** moved (referencing any ADR just
|
|
94
|
+
written — `see ADR-NNNN`), or reports **no-op** when nothing architectural changed.
|
|
95
|
+
There is **no human prompt here**: the map must _track reality_, not be curated, and the
|
|
96
|
+
edit is reviewed in the closeout PR diff like any other change. Run this **after**
|
|
97
|
+
`write-adr` so the map can cite the new ADR numbers.
|
|
98
|
+
|
|
99
|
+
- **`playbook-iterate` — HITL offer, once per task.** A single reflective offer: did this
|
|
100
|
+
task surface a **reusable pattern** worth extracting, or a playbook worth iterating,
|
|
101
|
+
that **no `T6 PLAYBOOK_CONFLICT` already routed** mid-task? (A conflict blocks a
|
|
102
|
+
sub-agent and is handled in the moment; this catches the _non-blocking_ learning that
|
|
103
|
+
would otherwise be lost.) On the human's "yes", dispatch the **Architect**
|
|
104
|
+
(`playbook-iterate`, fresh context). Like an ADR, a playbook change is curated canon, so
|
|
105
|
+
it stays an offer; decline costs only a "no".
|
|
106
|
+
|
|
107
|
+
Any artifact written — an ADR, the architecture edit, a playbook — lands in the working
|
|
108
|
+
tree and rides the closeout PR below.
|
|
109
|
+
|
|
110
|
+
### 2. Branch for the record
|
|
111
|
+
|
|
112
|
+
```bash
|
|
113
|
+
git checkout -b harness/closeout-<id>
|
|
114
|
+
```
|
|
115
|
+
|
|
116
|
+
All of closeout's writes go on this branch — nothing touches the base directly.
|
|
117
|
+
|
|
118
|
+
### 3. Append one line to `history.md`
|
|
119
|
+
|
|
120
|
+
`.claude/state/history.md` is the append-only, committed changelog — one entry per
|
|
121
|
+
finished task. Add the task with whatever metrics you actually have:
|
|
122
|
+
|
|
123
|
+
```markdown
|
|
124
|
+
## #<id> — <topic> (<start-date> → <end-date>)
|
|
125
|
+
|
|
126
|
+
Cycle: <h> | Iter: <review rounds> | Bugs 14d: <n/a until measured>
|
|
127
|
+
```
|
|
128
|
+
|
|
129
|
+
Record only metrics you can back up. Rich telemetry (tokens, post-merge bugs) is
|
|
130
|
+
populated once the telemetry/event machinery is authored in a later phase; until
|
|
131
|
+
then, leave those as `n/a` rather than inventing numbers.
|
|
132
|
+
|
|
133
|
+
### 4. Archive the spec + discoveries; drop the scratch
|
|
134
|
+
|
|
135
|
+
Move the durable memory into the archive and remove only the ephemeral scratch. The
|
|
136
|
+
existence guards keep this **idempotent** — a retried closeout (e.g. crash mid-run) that
|
|
137
|
+
already archived skips the move rather than erroring on an existing destination:
|
|
138
|
+
|
|
139
|
+
```bash
|
|
140
|
+
mkdir -p .claude/state/tasks/_archive/<id>
|
|
141
|
+
# Move the durables only if not already archived (retry-safe):
|
|
142
|
+
if [ -d .claude/state/tasks/<id>/spec ]; then
|
|
143
|
+
git mv .claude/state/tasks/<id>/spec .claude/state/tasks/_archive/<id>/spec
|
|
144
|
+
fi
|
|
145
|
+
# discoveries.md only exists if the task raised one — move it conditionally:
|
|
146
|
+
if [ -f .claude/state/tasks/<id>/discoveries.md ]; then
|
|
147
|
+
git mv .claude/state/tasks/<id>/discoveries.md .claude/state/tasks/_archive/<id>/discoveries.md
|
|
148
|
+
fi
|
|
149
|
+
# Drop everything that remains under tasks/<id>/ — progress.md plus any straggler an
|
|
150
|
+
# Implementer left. -r tolerates whatever is there; git doesn't track the empty dir.
|
|
151
|
+
git rm -r .claude/state/tasks/<id>/
|
|
152
|
+
```
|
|
153
|
+
|
|
154
|
+
`spec/` and `discoveries.md` survive **live** under `_archive/<id>/` (grep-able), plus in
|
|
155
|
+
git history and the GitHub issue. `progress.md` (the pause-point bitácora) and any other
|
|
156
|
+
working scratch are gone.
|
|
157
|
+
|
|
158
|
+
### 5. Open the closeout PR and auto-merge
|
|
159
|
+
|
|
160
|
+
Commit the record, push the branch, open a PR, and let GitHub apply the repo's own rules:
|
|
161
|
+
|
|
162
|
+
```bash
|
|
163
|
+
git commit -m "closeout(<id>): archive task state, record in history.md"
|
|
164
|
+
git push -u origin harness/closeout-<id>
|
|
165
|
+
gh pr create --base <default> --head harness/closeout-<id> \
|
|
166
|
+
--title "closeout(<id>): <topic>" --body "Closeout record for #<id>."
|
|
167
|
+
gh pr merge harness/closeout-<id> --auto --squash --delete-branch
|
|
168
|
+
```
|
|
169
|
+
|
|
170
|
+
The closeout PR **must not** carry `Closes #<id>` — the task PR already auto-closed the
|
|
171
|
+
issue on merge; closeout only flips the label and finalizes.
|
|
172
|
+
|
|
173
|
+
`gh pr merge` `--auto` defers to branch protection:
|
|
174
|
+
|
|
175
|
+
- **Protection is PR + checks only** → the PR self-merges once checks pass. Continue to
|
|
176
|
+
step 6 (finalize) once the closeout PR reports merged.
|
|
177
|
+
- **Protection requires human approval** → the PR waits. **Park** (see below).
|
|
178
|
+
- **Auto-merge is disabled repo-wide** → `gh pr merge` `--auto` **errors** ("auto-merge is
|
|
179
|
+
not allowed for this repository") rather than queuing. This is a repo setting,
|
|
180
|
+
independent of branch protection. Treat the error as the wait case: **park**. (You may
|
|
181
|
+
instead merge it immediately with a plain `gh pr merge` `--squash --delete-branch` if the
|
|
182
|
+
human has authorized you to merge — same gesture as the task merge gate.)
|
|
183
|
+
|
|
184
|
+
**Park:** flip the issue to `harness:status:closeout-pending`, tell the human the closeout
|
|
185
|
+
PR is open and awaiting their merge, and stop. The task issue is **already closed** (the
|
|
186
|
+
task PR's `Closes #<id>` fired on its merge), so `/resume` finds the parked closeout only
|
|
187
|
+
by listing closed issues too (`--state all`) — a default open-only queue would miss it. A
|
|
188
|
+
later `/resume` picks up at step 6 once the PR is merged. (Authority for the RESUME entry:
|
|
189
|
+
the Orchestrator.)
|
|
190
|
+
|
|
191
|
+
### 6. Finalize (once the closeout PR is merged)
|
|
192
|
+
|
|
193
|
+
Confirm the closeout PR merged (`gh pr view <pr> --json state,mergedAt` → `MERGED`; on a
|
|
194
|
+
`/resume`, pass the deterministic branch `harness/closeout-<id>` as `<pr>` — the
|
|
195
|
+
merge-confirm accepts a branch name in place of a PR number), land the merged base
|
|
196
|
+
(`git checkout <default> && git pull`), then:
|
|
197
|
+
|
|
198
|
+
```bash
|
|
199
|
+
git push origin --delete harness/<id>-<slug> # delete the task branch (skip if already gone)
|
|
200
|
+
gh issue close <id> # belt-and-suspenders if the merge didn't auto-close it
|
|
201
|
+
```
|
|
202
|
+
|
|
203
|
+
Then flip the issue to `harness:status:done` and **emit `task_done`** — both happen here,
|
|
204
|
+
in finalize, exactly once (the parked → `/resume` path also lands here, so it is the
|
|
205
|
+
single emit point for either path). You compute the envelope (cycle time, review
|
|
206
|
+
rejections, level) as the Orchestrator running this skill; the fields and the `emit`
|
|
207
|
+
command line are in `orchestrator.md` §Closeout. `events.jsonl` is local-only/gitignored
|
|
208
|
+
(ADR 0008), so the emit never dirties the base.
|
|
209
|
+
|
|
210
|
+
### 7. Report
|
|
211
|
+
|
|
212
|
+
Return a one-line summary: the task id, the `history.md` entry, the archive path
|
|
213
|
+
(`_archive/<id>/`), any ADR raised, and confirmation that the issue is closed (or that
|
|
214
|
+
closeout is parked at `closeout-pending` awaiting the record PR's merge).
|
|
215
|
+
|
|
216
|
+
## Scope note
|
|
217
|
+
|
|
218
|
+
In this build, closeout records to `history.md`, archives the spec + discoveries, raises
|
|
219
|
+
ADRs, and finalizes via the closeout PR. The session/metrics accounting beyond
|
|
220
|
+
`task_done` is wired in a later phase — closeout is the place it will attach.
|
|
221
|
+
|
|
222
|
+
## Uncontemplated Scenarios
|
|
223
|
+
|
|
224
|
+
When a scenario doesn't clearly fit these rules:
|
|
225
|
+
|
|
226
|
+
1. Apply the closest matching approach with reasoning.
|
|
227
|
+
2. **Flag it**: "This scenario isn't covered by the task-closeout skill. I applied
|
|
228
|
+
[approach] because [reason]. Want to update the skill?"
|
|
229
|
+
3. Offer to add a new rule for the case.
|
|
@@ -0,0 +1,171 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: tdd
|
|
3
|
+
description: Test-driven development with red-green-refactor loop. Use when user wants to build features or fix bugs using TDD, mentions "red-green-refactor", wants integration tests, or asks for test-first development.
|
|
4
|
+
origin: vendor
|
|
5
|
+
vendor_version: '{{vendor_version}}'
|
|
6
|
+
phase: during-implementation
|
|
7
|
+
invoked-by: [implementer]
|
|
8
|
+
---
|
|
9
|
+
|
|
10
|
+
# Test-Driven Development
|
|
11
|
+
|
|
12
|
+
## Core Principle
|
|
13
|
+
|
|
14
|
+
Tests verify behavior through public interfaces, not implementation details. Code
|
|
15
|
+
can change entirely; tests shouldn't break. A good test reads like a specification —
|
|
16
|
+
"user can checkout with valid cart" tells you exactly what capability exists.
|
|
17
|
+
|
|
18
|
+
## Workflow
|
|
19
|
+
|
|
20
|
+
### 0. Playbook preflight (mandatory — before writing any code or test)
|
|
21
|
+
|
|
22
|
+
1. Identify which of the project's playbooks apply to this task and read them.
|
|
23
|
+
Playbooks are looked up by topic — `<topic>.md` under `docs/playbooks/` (then the
|
|
24
|
+
global layer); the local entry wins, and a missing playbook is not a blocker.
|
|
25
|
+
2. Declare the test environment explicitly:
|
|
26
|
+
|
|
27
|
+
| This file type… | Use… |
|
|
28
|
+
| ------------------------------------------- | -------------------------------------------------- |
|
|
29
|
+
| Pure logic, service, hook (no DOM) | Node spec (`*.spec.ts`) — `environment: 'node'` |
|
|
30
|
+
| React component, DOM interaction, rendering | Browser spec (`*.browser.spec.tsx`) — real browser |
|
|
31
|
+
|
|
32
|
+
State aloud: "This is a [type] test → [node/browser] mode, because [reason]."
|
|
33
|
+
|
|
34
|
+
3. Output preflight log: "Playbooks loaded: [list]. Test environment: [node/browser].
|
|
35
|
+
Proceeding."
|
|
36
|
+
|
|
37
|
+
### 1. Planning
|
|
38
|
+
|
|
39
|
+
Before writing any code:
|
|
40
|
+
|
|
41
|
+
- Confirm with user what interface changes are needed
|
|
42
|
+
- Confirm with user which behaviors to test (prioritize — you can't test everything)
|
|
43
|
+
- List the behaviors to test (not implementation steps)
|
|
44
|
+
- Get user approval on the plan
|
|
45
|
+
|
|
46
|
+
Ask: "What should the public interface look like? Which behaviors are most important
|
|
47
|
+
to test?"
|
|
48
|
+
|
|
49
|
+
### 2. Tracer Bullet
|
|
50
|
+
|
|
51
|
+
Write ONE test that confirms ONE thing about the system:
|
|
52
|
+
|
|
53
|
+
```
|
|
54
|
+
RED: Write test for first behavior → test fails
|
|
55
|
+
GREEN: Write minimal code to pass → test passes
|
|
56
|
+
```
|
|
57
|
+
|
|
58
|
+
This is your tracer bullet — proves the path works end-to-end.
|
|
59
|
+
|
|
60
|
+
### 3. Incremental Loop
|
|
61
|
+
|
|
62
|
+
For each remaining behavior:
|
|
63
|
+
|
|
64
|
+
```
|
|
65
|
+
RED: Write next test → fails
|
|
66
|
+
GREEN: Minimal code to pass → passes
|
|
67
|
+
```
|
|
68
|
+
|
|
69
|
+
Rules:
|
|
70
|
+
|
|
71
|
+
- One test at a time
|
|
72
|
+
- Only enough code to pass current test
|
|
73
|
+
- Don't anticipate future tests
|
|
74
|
+
- Keep tests focused on observable behavior
|
|
75
|
+
|
|
76
|
+
### 4. Refactor
|
|
77
|
+
|
|
78
|
+
After all tests pass, look for refactor candidates:
|
|
79
|
+
|
|
80
|
+
- Never refactor while RED — get to GREEN first
|
|
81
|
+
- Run tests after each refactor step
|
|
82
|
+
|
|
83
|
+
## Capture decisions worth remembering (ADR nudge)
|
|
84
|
+
|
|
85
|
+
A green test proves _what_ the code does. It rarely captures _why_ this
|
|
86
|
+
implementation was chosen over the alternatives. After a cycle (or at the end of a
|
|
87
|
+
feature), ask: did this step embed a decision that will look arbitrary to a future
|
|
88
|
+
reader?
|
|
89
|
+
|
|
90
|
+
Three signals — if **all three** are true, write an ADR (`docs/adr/NNNN-slug.md`):
|
|
91
|
+
|
|
92
|
+
1. **Hard to reverse** — changing your mind later costs real work (data migration,
|
|
93
|
+
public API break, cross-module refactor, downstream client coordination).
|
|
94
|
+
2. **Surprising without context** — a future reader will see the code and wonder
|
|
95
|
+
"why on earth did they do it this way?"
|
|
96
|
+
3. **Result of a real trade-off** — there were genuine alternatives and you picked
|
|
97
|
+
one for specific reasons (not "the obvious thing").
|
|
98
|
+
|
|
99
|
+
Common triggers inside a TDD cycle:
|
|
100
|
+
|
|
101
|
+
- A defensive cap or limit (depth, budget, retries, page size). The _value_ you
|
|
102
|
+
picked is the artifact most likely to drift later; the reason is rarely in the code.
|
|
103
|
+
- A "consistency at every level" rule — easy to write, easy for a future patch to
|
|
104
|
+
break without noticing.
|
|
105
|
+
- A non-obvious algorithm/data structure picked to satisfy a perf or budget test.
|
|
106
|
+
- A deliberate **no-test** decision — "we are intentionally not covering X because the
|
|
107
|
+
cost outweighs the value." Future contributors will add the missing test and break
|
|
108
|
+
the design.
|
|
109
|
+
- A boundary or scope decision: "this concern lives in module A, not module B."
|
|
110
|
+
|
|
111
|
+
If only one or two signals hit, a `// Why: …` comment at the call site is usually
|
|
112
|
+
enough. The bar for an ADR is "this decision will outlive my memory of writing it".
|
|
113
|
+
|
|
114
|
+
When in doubt, flag it: "I made a non-trivial trade-off about X. Want me to draft an
|
|
115
|
+
ADR?" — let the user decide. (The Architect's `write-adr` skill owns the ADR format;
|
|
116
|
+
numbering is sequential under `docs/adr/`.)
|
|
117
|
+
|
|
118
|
+
## Anti-Pattern: Horizontal Slices
|
|
119
|
+
|
|
120
|
+
**DO NOT write all tests first, then all implementation.** This is the biggest TDD
|
|
121
|
+
mistake.
|
|
122
|
+
|
|
123
|
+
```
|
|
124
|
+
WRONG (horizontal):
|
|
125
|
+
RED: test1, test2, test3, test4, test5
|
|
126
|
+
GREEN: impl1, impl2, impl3, impl4, impl5
|
|
127
|
+
|
|
128
|
+
RIGHT (vertical):
|
|
129
|
+
RED→GREEN: test1→impl1
|
|
130
|
+
RED→GREEN: test2→impl2
|
|
131
|
+
RED→GREEN: test3→impl3
|
|
132
|
+
```
|
|
133
|
+
|
|
134
|
+
Tests written in bulk test _imagined_ behavior. You outrun your headlights,
|
|
135
|
+
committing to test structure before understanding the implementation. Each vertical
|
|
136
|
+
cycle informs the next — because you just wrote the code, you know exactly what
|
|
137
|
+
behavior matters.
|
|
138
|
+
|
|
139
|
+
## When NOT to Apply TDD
|
|
140
|
+
|
|
141
|
+
Not everything needs red-green-refactor:
|
|
142
|
+
|
|
143
|
+
- **Validation schemas** — declarative, self-validating
|
|
144
|
+
- **Configuration** (routes wiring, test config) — no logic to test
|
|
145
|
+
- **Type definitions / interfaces** — no runtime behavior
|
|
146
|
+
- **Re-exports / barrel files** — just wiring
|
|
147
|
+
- **Constants and enums** — declarative
|
|
148
|
+
- **Middleware wiring** — the middleware logic is tested, the wiring isn't
|
|
149
|
+
- **Pass-through wrappers** — re-exporting a library without added logic
|
|
150
|
+
- **Database migrations** — DDL scripts, validated by execution
|
|
151
|
+
- **Static content** — i18n strings, templates, fixed content
|
|
152
|
+
- **Logger / env config setup** — pure configuration
|
|
153
|
+
- **Prototypes / spikes** — ask the user: "Do you want tests for this spike?"
|
|
154
|
+
|
|
155
|
+
**Always test**: mappers (even small ones), services with business logic,
|
|
156
|
+
repositories (integration), routes (API behavior), components (user interaction).
|
|
157
|
+
|
|
158
|
+
## Mocking Strategy
|
|
159
|
+
|
|
160
|
+
**`vi.spyOn` as default** — mock at system boundaries only (external APIs, databases,
|
|
161
|
+
time/randomness). Don't mock your own modules' internals. DI only when it genuinely
|
|
162
|
+
adds value. For mocking recipes → see your project's testing playbook.
|
|
163
|
+
|
|
164
|
+
## Uncontemplated Scenarios
|
|
165
|
+
|
|
166
|
+
When a scenario doesn't clearly fit these rules:
|
|
167
|
+
|
|
168
|
+
1. Apply the closest matching rule with reasoning
|
|
169
|
+
2. **Flag it**: "This scenario isn't covered by the tdd skill. I applied [rule]
|
|
170
|
+
because [reason]. Want to update the skill?"
|
|
171
|
+
3. Offer to add a new rule for the case
|
|
@@ -0,0 +1,71 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: test-gap-report
|
|
3
|
+
description: Structural test-gap analysis of a change — which logic-bearing files lack a dedicated test, and which behaviors are untested. Walks the coverage matrix, not the runtime. Use during review to find missing tests before they become regressions; distinct from `verify` (which runs the suite) and `senior-review` (which judges the tests that exist).
|
|
4
|
+
origin: vendor
|
|
5
|
+
vendor_version: '{{vendor_version}}'
|
|
6
|
+
phase: post-implementation
|
|
7
|
+
invoked-by: [reviewer]
|
|
8
|
+
---
|
|
9
|
+
|
|
10
|
+
# Test Gap Report
|
|
11
|
+
|
|
12
|
+
A **structural** gap analysis: which files in the change _should_ have a dedicated test
|
|
13
|
+
and don't, and which behaviors are untested. This is separate from `verify` (which
|
|
14
|
+
_runs_ the suite) and from `senior-review` (which judges whether the tests that exist
|
|
15
|
+
are meaningful). Coverage numbers are a weak proxy — this walks the **coverage matrix**
|
|
16
|
+
by file role instead.
|
|
17
|
+
|
|
18
|
+
## Process
|
|
19
|
+
|
|
20
|
+
### 1. Classify each changed file
|
|
21
|
+
|
|
22
|
+
For every file the change adds or modifies, decide whether it needs a dedicated test
|
|
23
|
+
using the coverage matrix (consult the project's testing playbook for project-specific
|
|
24
|
+
roles):
|
|
25
|
+
|
|
26
|
+
**Always needs a dedicated `*.spec.*`** (it carries logic — conditionals, loops,
|
|
27
|
+
transformations, calculations):
|
|
28
|
+
|
|
29
|
+
| Role | What to cover |
|
|
30
|
+
| ------------------------- | ------------------------------------------------------ |
|
|
31
|
+
| service / use-case | happy path, expected errors, edge cases |
|
|
32
|
+
| mapper / transformer | every field, optional/nullable, empty collections |
|
|
33
|
+
| helper / util / validator | boundary values, error conditions |
|
|
34
|
+
| hook | state transitions, side effects, cleanup |
|
|
35
|
+
| component | renders expected content, interactions, conditional UI |
|
|
36
|
+
| processor / reducer | each input/event type, unknown input |
|
|
37
|
+
|
|
38
|
+
**Covered by integration (no dedicated spec needed)** — wiring, not logic:
|
|
39
|
+
type-only models, thin route/api wrappers, simple stores, constants, config. A file
|
|
40
|
+
here with **branching logic** is promoted to "always test."
|
|
41
|
+
|
|
42
|
+
### 2. Find the gaps
|
|
43
|
+
|
|
44
|
+
For each file in the "always" set:
|
|
45
|
+
|
|
46
|
+
1. Does a co-located `*.spec.*` exist?
|
|
47
|
+
2. If yes, does it cover the **behaviors** the change introduced (not just the ones
|
|
48
|
+
that were already there)? A new branch / error path / edge case added by this change
|
|
49
|
+
needs a new test even if the file already had a spec.
|
|
50
|
+
3. **User-visible flows** (navigation / form / list / auth) added or modified → is
|
|
51
|
+
there an E2E test, or is one warranted per the testing playbook?
|
|
52
|
+
|
|
53
|
+
### 3. Report
|
|
54
|
+
|
|
55
|
+
```
|
|
56
|
+
## Test Gap Report — <task name>
|
|
57
|
+
|
|
58
|
+
| File | Role | Spec? | Gap |
|
|
59
|
+
| --- | --- | --- | --- |
|
|
60
|
+
| `order.service.ts` | service | ✅ | error path for empty cart untested |
|
|
61
|
+
| `slug.util.ts` | util | ❌ | no spec — boundary cases untested |
|
|
62
|
+
| `cart.view.tsx` | component | ✅ | covered |
|
|
63
|
+
|
|
64
|
+
**Missing specs**: <n> · **Partial coverage**: <n>
|
|
65
|
+
**E2E**: needed / present / N/A
|
|
66
|
+
**Verdict**: adequately covered / <n> gaps to close
|
|
67
|
+
```
|
|
68
|
+
|
|
69
|
+
Report gaps; do **not** silently write the missing tests yourself (that's the
|
|
70
|
+
Implementer's loop). A gap is a finding routed back. If closing a gap reveals the
|
|
71
|
+
behavior itself is unspecified, that's a **discovery** — run `raise-discovery`.
|
|
@@ -0,0 +1,102 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: triage-issue
|
|
3
|
+
description: Triage a bug or issue by exploring the codebase to find root cause, then create a GitHub issue with a TDD-based fix plan. Use when user reports a bug, wants to file an issue, mentions "triage", or wants to investigate and plan a fix for a problem.
|
|
4
|
+
origin: vendor
|
|
5
|
+
vendor_version: '{{vendor_version}}'
|
|
6
|
+
invoked-by: [orchestrator]
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
# Triage Issue
|
|
10
|
+
|
|
11
|
+
## Core Principle
|
|
12
|
+
|
|
13
|
+
Investigate first, ask later. Find the root cause through codebase exploration
|
|
14
|
+
before asking the user anything. Minimize questions, maximize diagnosis.
|
|
15
|
+
|
|
16
|
+
## Process
|
|
17
|
+
|
|
18
|
+
### 1. Capture the problem
|
|
19
|
+
|
|
20
|
+
Get a brief description from the user. If they haven't provided one, ask ONE
|
|
21
|
+
question: "What's the problem you're seeing?"
|
|
22
|
+
|
|
23
|
+
Do NOT ask follow-up questions. Start investigating immediately.
|
|
24
|
+
|
|
25
|
+
### 2. Explore and diagnose
|
|
26
|
+
|
|
27
|
+
Deeply investigate the codebase to find:
|
|
28
|
+
|
|
29
|
+
- **Where** the bug manifests (entry points, UI, API responses)
|
|
30
|
+
- **What** code path is involved (trace the flow)
|
|
31
|
+
- **Why** it fails (the root cause, not just the symptom)
|
|
32
|
+
- **What** related code exists (similar patterns, tests, adjacent modules)
|
|
33
|
+
|
|
34
|
+
Look at:
|
|
35
|
+
|
|
36
|
+
- Related source files and their dependencies
|
|
37
|
+
- Existing tests (what's tested, what's missing)
|
|
38
|
+
- Recent changes to affected files (`git log`)
|
|
39
|
+
- Error handling in the code path
|
|
40
|
+
- Similar patterns elsewhere that work correctly
|
|
41
|
+
|
|
42
|
+
### 3. Identify the fix approach
|
|
43
|
+
|
|
44
|
+
Determine:
|
|
45
|
+
|
|
46
|
+
- The minimal change needed to fix the root cause
|
|
47
|
+
- Which modules/interfaces are affected
|
|
48
|
+
- What behaviors need to be verified via tests
|
|
49
|
+
- Whether this is a regression, missing feature, or design flaw
|
|
50
|
+
|
|
51
|
+
### 4. Design TDD fix plan
|
|
52
|
+
|
|
53
|
+
Create an ordered list of RED-GREEN cycles (vertical slices):
|
|
54
|
+
|
|
55
|
+
- **RED**: a specific test that captures the broken/missing behavior
|
|
56
|
+
- **GREEN**: the minimal code change to make that test pass
|
|
57
|
+
|
|
58
|
+
Rules:
|
|
59
|
+
|
|
60
|
+
- Tests verify behavior through public interfaces, not implementation details
|
|
61
|
+
- One test at a time (NOT all tests first, then all code)
|
|
62
|
+
- **Durability**: describe behaviors and contracts, not internal structure. A good
|
|
63
|
+
fix plan reads like a spec, a bad one reads like a diff
|
|
64
|
+
|
|
65
|
+
For TDD philosophy → see the **tdd** skill. For test recipes → see your project's
|
|
66
|
+
testing playbook.
|
|
67
|
+
|
|
68
|
+
### 5. Confirm with the user
|
|
69
|
+
|
|
70
|
+
Present the issue draft: problem summary, root cause analysis, and TDD fix plan.
|
|
71
|
+
Ask: "Does this look right? Should I create the issue?"
|
|
72
|
+
|
|
73
|
+
### 6. Create the GitHub issue
|
|
74
|
+
|
|
75
|
+
After confirmation, create with `gh issue create` and the `harness:managed` label.
|
|
76
|
+
Use a structure like:
|
|
77
|
+
|
|
78
|
+
```
|
|
79
|
+
## Problem
|
|
80
|
+
<one-paragraph summary of the observed behavior>
|
|
81
|
+
|
|
82
|
+
## Root cause
|
|
83
|
+
<the actual cause, traced to file:line>
|
|
84
|
+
|
|
85
|
+
## Fix plan (TDD)
|
|
86
|
+
- [ ] RED: <test capturing behavior 1> → GREEN: <minimal change>
|
|
87
|
+
- [ ] RED: <test capturing behavior 2> → GREEN: <minimal change>
|
|
88
|
+
|
|
89
|
+
## Affected
|
|
90
|
+
<modules / interfaces touched>
|
|
91
|
+
```
|
|
92
|
+
|
|
93
|
+
Print the issue URL and a one-line summary of the root cause.
|
|
94
|
+
|
|
95
|
+
## Uncontemplated Scenarios
|
|
96
|
+
|
|
97
|
+
When a scenario doesn't clearly fit these rules:
|
|
98
|
+
|
|
99
|
+
1. Apply the closest matching approach with reasoning
|
|
100
|
+
2. **Flag it**: "This scenario isn't covered by the triage-issue skill. I applied
|
|
101
|
+
[approach] because [reason]. Want to update the skill?"
|
|
102
|
+
3. Offer to add a new rule for the case
|
|
@@ -0,0 +1,69 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: update-architecture
|
|
3
|
+
description: Keep docs/architecture.md current after an architecturally significant change — a new module/context, a changed boundary or integration seam, a new external dependency, a data-flow change. Use when the change alters the system's shape and the project keeps an architecture doc. The Orchestrator invokes the Architect with this skill; it only lands at install when docs/architecture.md already exists.
|
|
4
|
+
origin: vendor
|
|
5
|
+
vendor_version: '{{vendor_version}}'
|
|
6
|
+
applies-when: [has-architecture-doc]
|
|
7
|
+
invoked-by: [architect]
|
|
8
|
+
trigger-condition: a change alters the architecture (or drift is surfaced) and docs/architecture.md exists
|
|
9
|
+
---
|
|
10
|
+
|
|
11
|
+
# Update Architecture
|
|
12
|
+
|
|
13
|
+
## Core Principle
|
|
14
|
+
|
|
15
|
+
`docs/architecture.md` is the **living high-level map** of the system — the shape a new
|
|
16
|
+
engineer reads first. It drifts the moment a change alters that shape and the doc isn't
|
|
17
|
+
updated. This skill keeps it true to reality after an architecturally significant change.
|
|
18
|
+
|
|
19
|
+
The trigger is usually a change in hand, but the job is the same when **drift is surfaced
|
|
20
|
+
without a change** — an orientation or review finds the map already diverged from the code
|
|
21
|
+
(a `code-explorer` Notes flag, a Reviewer side-finding; ADR 0011). Reconcile it the same
|
|
22
|
+
way, applying the same significance bar below: only shape-level divergence is worth an edit.
|
|
23
|
+
|
|
24
|
+
It is gated on `applies-when: has-architecture-doc`: it only installs when the project
|
|
25
|
+
already keeps `docs/architecture.md`. The harness **never creates** an architecture doc
|
|
26
|
+
where the client chose not to have one (decision #8 — the vendor gives the framework,
|
|
27
|
+
not the architecture). If a project has no such doc and one is wanted, that's a client
|
|
28
|
+
decision, not an automatic harness action.
|
|
29
|
+
|
|
30
|
+
## What counts as architecturally significant
|
|
31
|
+
|
|
32
|
+
Update the doc when the change touches the **shape**, not the detail:
|
|
33
|
+
|
|
34
|
+
- a new module / bounded context, or one removed or merged;
|
|
35
|
+
- a changed boundary or ownership rule ("X now owns Y; others reference by id");
|
|
36
|
+
- a new or changed integration seam (sync HTTP → domain events, a new queue);
|
|
37
|
+
- a new external dependency that carries lock-in (a database, a broker, an auth provider);
|
|
38
|
+
- a data-flow change a reader of the diagram would otherwise get wrong.
|
|
39
|
+
|
|
40
|
+
A bug fix, a refactor that preserves the shape, or a new function inside an existing
|
|
41
|
+
module is **not** architecturally significant — leave the doc alone.
|
|
42
|
+
|
|
43
|
+
## Process
|
|
44
|
+
|
|
45
|
+
1. **Read the current doc** and locate the section the change — or the surfaced drift —
|
|
46
|
+
affects. Understand the existing structure before editing — match its level of
|
|
47
|
+
abstraction and its style.
|
|
48
|
+
2. **Make the smallest true edit.** Reflect the new reality surgically: update the
|
|
49
|
+
affected section, a diagram, or a boundary statement. Don't rewrite what didn't change.
|
|
50
|
+
3. **Link the why, don't restate it.** If the change was recorded as an ADR (via
|
|
51
|
+
`write-adr`), reference it (`see ADR-NNNN`) rather than re-explaining the rationale —
|
|
52
|
+
the architecture doc holds the _shape_, the ADR the _decision_.
|
|
53
|
+
4. **Keep it high-level.** This is a map, not a code listing. Resist pulling in
|
|
54
|
+
implementation detail that belongs in code, `CLAUDE.md`, or a playbook.
|
|
55
|
+
|
|
56
|
+
## Report
|
|
57
|
+
|
|
58
|
+
Return to the Orchestrator: the section(s) updated and a one-line summary of what the
|
|
59
|
+
map now reflects. If the change turned out **not** to be architecturally significant,
|
|
60
|
+
say so and make no edit.
|
|
61
|
+
|
|
62
|
+
## Uncontemplated Scenarios
|
|
63
|
+
|
|
64
|
+
When a case doesn't clearly fit:
|
|
65
|
+
|
|
66
|
+
1. Apply the closest matching approach with reasoning.
|
|
67
|
+
2. **Flag it**: "This isn't covered by the update-architecture skill. I did [approach]
|
|
68
|
+
because [reason]. Want to refine the skill?"
|
|
69
|
+
3. Offer to add a rule for the case.
|