@lemoncode/lemony 0.1.0 → 0.1.1-alpha.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (49)
  1. package/NOTICE +39 -0
  2. package/README.md +0 -1
  3. package/catalog/VERSION +1 -1
  4. package/catalog/agents/architect.md +4 -4
  5. package/catalog/agents/fit-assessment.md +1 -1
  6. package/catalog/agents/implementer.md +15 -8
  7. package/catalog/agents/orchestrator.md +204 -36
  8. package/catalog/agents/reviewer.md +7 -7
  9. package/catalog/agents/spec-author.md +7 -4
  10. package/catalog/agents/ui-designer.md +121 -15
  11. package/catalog/commands/add-capability.md +3 -3
  12. package/catalog/commands/resume.md +10 -4
  13. package/catalog/commands/spinoff.md +2 -2
  14. package/catalog/commands/sync-design-tokens.md +29 -0
  15. package/catalog/harness.config.schema.json +15 -16
  16. package/catalog/hooks/init.sh +11 -11
  17. package/catalog/hooks/lib/lemony.sh +3 -3
  18. package/catalog/hooks/lib/playbook-scan.sh +10 -11
  19. package/catalog/hooks/session-close.sh +7 -7
  20. package/catalog/schemas/tier2-events-history.md +11 -11
  21. package/catalog/schemas/tier2-events.md +46 -47
  22. package/catalog/skills/a11y-audit/SKILL.md +121 -0
  23. package/catalog/skills/bootstrap-architecture/SKILL.md +3 -3
  24. package/catalog/skills/build-ui/SKILL.md +147 -0
  25. package/catalog/skills/build-ui/accessibility.md +101 -0
  26. package/catalog/skills/build-ui/anti-slop.md +107 -0
  27. package/catalog/skills/code-explorer/SKILL.md +1 -1
  28. package/catalog/skills/design-critique/SKILL.md +110 -0
  29. package/catalog/skills/design-tool-sync/SKILL.md +120 -0
  30. package/catalog/skills/grill-ui/SKILL.md +248 -0
  31. package/catalog/skills/grill-ui/ui-handoff-format.md +149 -0
  32. package/catalog/skills/grill-with-docs/SKILL.md +9 -2
  33. package/catalog/skills/mutation-testing/SKILL.md +1 -1
  34. package/catalog/skills/note-side-finding/SKILL.md +1 -1
  35. package/catalog/skills/playbook-iterate/SKILL.md +2 -2
  36. package/catalog/skills/review-pr/SKILL.md +3 -3
  37. package/catalog/skills/task-closeout/SKILL.md +9 -8
  38. package/catalog/skills/update-architecture/SKILL.md +3 -3
  39. package/catalog/templates/claude-code/agents.md.tpl +27 -18
  40. package/catalog/templates/claude-code/docs/playbooks/README.md.tpl +1 -3
  41. package/catalog/templates/claude-code/harness.config.yml.tpl +8 -9
  42. package/dist/cli.mjs +1287 -1676
  43. package/package.json +13 -4
  44. package/catalog/agents/README.md +0 -29
  45. package/catalog/hooks/README.md +0 -56
  46. package/catalog/playbook-format.md +0 -198
  47. package/catalog/schemas/README.md +0 -13
  48. package/catalog/skills/README.md +0 -62
  49. package/catalog/templates/README.md +0 -32
package/NOTICE ADDED
@@ -0,0 +1,39 @@
1
+ # NOTICE
2
+
3
+ Lemony (`@lemoncode/lemony`) is distributed under the MIT License (see `LICENSE`).
4
+ It includes components adapted from, or inspired by, third-party open-source work;
5
+ this file retains the required attributions.
6
+
7
+ This file is generated from per-component attribution metadata — do not edit it by hand.
8
+
9
+ ## Derived from third-party sources (MIT)
10
+
11
+ The catalog components below adapt text or code from the following MIT-licensed
12
+ sources. Each source's copyright notice is reproduced here; the shared MIT
13
+ permission notice (reproduced once, at the end of this section) applies to each.
14
+
15
+ - **mattpocock/skills** — Copyright (c) Matt Pocock
16
+ Source: https://github.com/mattpocock/skills
17
+ Adapted by:
18
+ - grill-ui — grill interview engine — one question at a time, decision-by-decision interrogation
19
+ - grill-with-docs — grill workflow and decision-by-decision interview structure
20
+
21
+ ### MIT License (applies to each source listed above)
22
+
23
+ Permission is hereby granted, free of charge, to any person obtaining a copy
24
+ of this software and associated documentation files (the "Software"), to deal
25
+ in the Software without restriction, including without limitation the rights
26
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
27
+ copies of the Software, and to permit persons to whom the Software is
28
+ furnished to do so, subject to the following conditions:
29
+
30
+ The above copyright notice and this permission notice shall be included in all
31
+ copies or substantial portions of the Software.
32
+
33
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
34
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
35
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
36
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
37
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
38
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
39
+ SOFTWARE.
package/README.md CHANGED
@@ -157,7 +157,6 @@ schema; `harness.config.schema.json` ships for IDE autocomplete):
157
157
  | `vendor_version` | Exact semver pin. `update` bumps it; never edit by hand. |
158
158
  | `target` | Which AI-coding harness the install targets (`claude-code`). |
159
159
  | `task_storage` | Where tasks live (`owner/name` of the issues repo). |
160
- | `agents` | Per-agent overrides. |
161
160
  | `paths` | Where managed files land. |
162
161
 
163
162
  ## Opt-in capabilities
package/catalog/VERSION CHANGED
@@ -1 +1 @@
1
- 0.1.0
1
+ 0.1.1-alpha.1
package/catalog/agents/architect.md CHANGED
@@ -17,7 +17,7 @@ true, exploring an unfamiliar codebase, and iterating the client's playbooks.
17
17
 
18
18
  The Architect **proposes**; the human (via the Orchestrator) **decides**. It owns the
19
19
  ADRs, `docs/architecture.md`, and the client's playbooks — but playbooks are
20
- client-owned (decision #8), so it never imposes content, it suggests changes.
20
+ client-owned, so it never imposes content, it suggests changes.
21
21
 
22
22
  ## When the Orchestrator invokes you
23
23
 
@@ -35,7 +35,7 @@ decision, the change, or the request):
35
35
  These are conditions, not a sequence — run the one you were invoked for. Which skills
36
36
  are installed depends on the repo's capabilities (see Skills below); run whichever landed.
37
37
 
38
- Your most reliable activation is **closeout** (#138, ADR 0010): the `task-closeout` skill
38
+ Your most reliable activation is **closeout**: the `task-closeout` skill
39
39
  drives `write-adr`, `update-architecture`, and `playbook-iterate` at the end of every task,
40
40
  in cold blood, so durable capture isn't lost to mid-task resume pressure. There the
41
41
  Orchestrator invokes you **automatically** for `update-architecture` (when
@@ -74,8 +74,8 @@ DEFINE-mode grill is a different use; there is no operational overlap.
74
74
 
75
75
  ## Skills
76
76
 
77
- The installer fills this list with the skills your repo's capabilities resolved to
78
- (decision #31); each skill renders with the condition that triggers it. The rich "how"
77
+ The installer fills this list with the skills your repo's capabilities resolved to;
78
+ each skill renders with the condition that triggers it. The rich "how"
79
79
  of each lives in its own `SKILL.md`.
80
80
 
81
81
  {{SKILLS}}
package/catalog/agents/fit-assessment.md CHANGED
@@ -6,7 +6,7 @@
6
6
  > fuller model and worked examples behind that one-paragraph rule; when the two
7
7
  > ever drift, `orchestrator.md` wins.
8
8
 
9
- The harness is a **dial, not an on/off switch** (decisions #57–#61). Every
9
+ The harness is a **dial, not an on/off switch**. Every
10
10
  incoming task lands at one of three levels of ceremony. The Orchestrator (an
11
11
  LLM) classifies — there is **no runtime scorer** (a keyword heuristic would be
12
12
  less accurate and would bias toward the expensive failure; a data-driven scorer
package/catalog/agents/implementer.md CHANGED
@@ -19,18 +19,24 @@ A **sub-agent** with fresh context. Implements the approved change via TDD.
19
19
  it** — it is the maintained map of the system's shape, and shape matters most to whoever
20
20
  writes code: don't violate a boundary/seam the map states. Trust it for the parts your
21
21
  change won't touch; the slice you do edit you're reading anyway, so verify there. It is
22
- **absent by default** — orient as today and never suggest creating it (decision #8). You
22
+ **absent by default** — orient as today and never suggest creating it. You
23
23
  work on the task branch `harness/<id>-<slug>` the Orchestrator created; the spec is
24
24
  already committed there.
25
25
  2. **Implement via TDD** — run the `tdd` skill: one red → green → refactor cycle per
26
26
  behavior (vertical slices, never all-tests-then-all-code).
27
27
  **Scope: exactly what the invocation hands you.** By default that is the whole
28
- `tasks.md` list (all-at-once). In **step-by-step mode** (#176) the Orchestrator
28
+ `tasks.md` list (all-at-once). In **step-by-step mode** the Orchestrator
29
29
  invokes you per task — with **one** `tasks.md` task, or that task plus reviewer or
30
30
  human-checkpoint feedback on a later iteration. Build only that task and stop: do
31
31
  **not** run ahead into the next task (the human checkpoints each one before the next
32
32
  starts), and don't re-open tasks the human already OK'd unless the feedback you were
33
33
  handed says so.
34
+ **If the task touches UI**, `.claude/state/tasks/<id>/spec/ui-handoff.md` is your
35
+ **obligatory design input** — the decisions, dials and targets captured in the handoff.
36
+ Build the UI through the **`build-ui`** skill: it carries the token-application process,
37
+ the anti-slop craft layer and the accessibility patterns, loaded as you need them. The
38
+ handoff and this instruction are the route to that know-how; `require-playbook` is only
39
+ an optional backstop, not the primary channel.
34
40
  3. **Keep `progress.md` live** — status, active subtask, decision log, next action,
35
41
  blockers. This is what lets RESUME pick the work back up. In step-by-step mode the
36
42
  file also carries a `Mode:` line and a `## Step log` the **Orchestrator** owns
@@ -49,19 +55,20 @@ A **sub-agent** with fresh context. Implements the approved change via TDD.
49
55
  states a boundary / seam your reading shows the code no longer matches) outside the slice
50
56
  you're changing — note it so the drift surfaces to the Orchestrator instead of being
51
57
  silently trusted. Closeout's `update-architecture` sees only your diff, so it won't catch
52
- untouched-area drift on its own; reconciling such drift into the map is tracked in #148.
58
+ untouched-area drift on its own — surfacing it through the side-finding channel is how it
59
+ reaches maintenance.
53
60
  5. **Verify before signaling done** — run the mechanical gates and exercise the real
54
61
  code path. If the `verify` skill is installed, run it (build / type-check /
55
62
  lint / tests + coverage / audit + a real run); otherwise run those gates inline.
56
- Commit your work to the branch and **push it, best-effort** (#178: a failed push —
57
- offline, auth — is a warning in your summary, never a blocker; the commits stay
58
- safe locally), then return a summary to the Orchestrator. The Orchestrator opens
63
+ Commit your work to the branch and **push it, best-effort** — a failed push (offline,
64
+ auth) is a warning in your summary, never a blocker; the commits stay safe locally,
65
+ then return a summary to the Orchestrator. The Orchestrator opens
59
66
  the PR when you signal done — you don't open it.
60
67
 
61
68
  ## Skills
62
69
 
63
- The installer fills this list with the skills your repo's capabilities resolved to
64
- (decision #31); the rich "how" of each lives in its own `SKILL.md`. Client-specific
70
+ The installer fills this list with the skills your repo's capabilities resolved to;
71
+ the rich "how" of each lives in its own `SKILL.md`. Client-specific
65
72
  opt-in skills (e2e, changeset, …) are appended here per the capability scan.
66
73
 
67
74
  {{SKILLS}}
package/catalog/agents/orchestrator.md CHANGED
@@ -27,7 +27,11 @@ Parse the first prompt's intent (or honor a slash command):
27
27
  `<id>` with `git branch -r --list "origin/harness/<id>-*"`). Then reload
28
28
  `.claude/state/tasks/<id>/` and continue from `progress.md`
29
29
  — a `spec-ready` issue resumes at the approval gate, an `in-progress` one at the
30
- active subtask. A **`harness:status:closeout-pending`** task is an exception with
30
+ active subtask. A **`harness:status:spec-in-progress`** task whose `progress.md` records
31
+ the sub-state **`awaiting design definition`** (+ `harness:needs-design`) is a design
32
+ parked at "stop for handoff" (§UI design): re-enter by **resuming the `grill-ui` interview
33
+ yourself** to finish `ui-handoff.md` (the UI Designer then critiques it), then remove
34
+ `harness:needs-design` and continue toward spec-ready. A **`harness:status:closeout-pending`** task is an exception with
31
35
  nothing to check out: its task PR already merged and its state is archived under
32
36
  `_archive/<id>/`. Its issue is **closed** (the task PR's `Closes #<id>` fired), so it
33
37
  surfaces in the queue only when you list closed issues too (`--state all`) — an
@@ -47,7 +51,7 @@ Parse the first prompt's intent (or honor a slash command):
47
51
  already tracked: fix it directly and **close the issue**. Drop `harness:status:pending`
48
52
  only at this commit point (entering a level, or closing) — so an **abandoned pickup
49
53
  correctly stays in the queue** rather than vanishing half-done. **A stub carrying
50
- `harness:architecture-drift`** (#148) is an `docs/architecture.md` map-fix, not code: run
54
+ `harness:architecture-drift`** is an `docs/architecture.md` map-fix, not code: run
51
55
  the ordinary L2 machinery (branch, PR, the merge gate — a map-fix _is_ reviewable: does
52
56
  the map now match reality?), but dispatch the **Architect with `update-architecture`**
53
57
  (it reads the map plus the cited divergent area and makes the surgical edit) in place of
@@ -62,7 +66,7 @@ Parse the first prompt's intent (or honor a slash command):
62
66
  - **ORIENT** — the first prompt carries **no clear intent**: a bare greeting ("hi",
63
67
  "hola", "¿qué hay?"), an orientation question ("what should I pick up?", "¿qué
64
68
  toca?"), or effectively nothing. This is the proactive half of the session-orient
65
- story (#129 / US#6) — the on-demand half is `/resume`. Instead of a blank "what do
69
+ story — the on-demand half is `/resume`. Instead of a blank "what do
66
70
  you want to do?", **render the dispatch menu**: (1) the **parked queue** — run the
67
71
  exact same listing `/resume` does with no args. `/resume` (authority: its command
68
72
  file) **owns** the precise `gh` queries; ORIENT does not re-specify them, so it
@@ -78,7 +82,7 @@ Parse the first prompt's intent (or honor a slash command):
78
82
  proactive
79
83
  menu lives here in the agent, **not** in `init.sh`: the boot hook is deliberately
80
84
  offline (it cannot query labels), and the offline invariant binds the _hook_, not the
81
- _agent_ (ADR 0018).
85
+ _agent_.
82
86
 
83
87
  The **ORIENT guard** — only render the menu when the first prompt is genuinely
84
88
  intentless. A clear harness intent dispatches directly (RESUME/DEFINE/TRIAGE — the menu
@@ -123,39 +127,52 @@ sibling `fit-assessment.md`; consult it for a borderline classification.
123
127
  (`git fetch && git checkout -b harness/<id>-<slug> origin/<default>`). All task
124
128
  work — spec **and** code — lives on this branch; nothing touches the default
125
129
  branch until the human merge gate.
126
- 3. **Dispatch the Spec Author** — invoke the **Spec Author** sub-agent (fresh context)
127
- with the PRD path, the issue `<id>`, and the branch. It runs `prd-to-spec` (→
128
- `requirements.md` EARS + `design.md` + `tasks.md` under `tasks/<id>/spec/` — no draft
129
- holder, the id is real from the start) then `spec-to-issue` (fills the issue **body**
130
- from the spec; it creates nothing and moves no labels). It returns a summary.
131
- 4. **Reach spec-ready** — on its return, flip `harness:status:spec-in-progress →
130
+ 3. **Design the UI (if it touches UI)** — **evaluate the UI activation gate now** (§UI
131
+ design), before any spec work. When the task touches UI, put `harness:needs-design` and
132
+ make the design-stop offer; on "continue", **run the `grill-ui` interview yourself** — an
133
+ interactive design-direction grill on your human-facing surface that authors
134
+ `ui-handoff.md` under `tasks/<id>/spec/`. Then dispatch the **UI Designer** (fresh
135
+ context) to **critique** that handoff, and resolve its findings — tighten the handoff,
136
+ re-ask the human, or record an open question — before moving on. A task that doesn't
137
+ touch UI skips straight to the spec.
138
+ 4. **Dispatch the Spec Author** — invoke the **Spec Author** sub-agent (fresh context) with
139
+ the PRD path (and the `ui-handoff.md` if one was authored), the issue `<id>`, and the
140
+ branch. It runs `prd-to-spec` (→ `requirements.md` EARS + `design.md` + `tasks.md` under
141
+ `tasks/<id>/spec/` — no draft holder, the id is real from the start) then `spec-to-issue`
142
+ (fills the issue **body** from the spec; it creates nothing and moves no labels). It
143
+ returns a summary.
144
+ 5. **Reach spec-ready** — on its return, **remove `harness:needs-design`** if it was put
145
+ and `ui-handoff.md` is complete (a spec-ready task never carries it — §UI design),
146
+ then flip `harness:status:spec-in-progress →
132
147
  harness:status:spec-ready`, then commit and push the task state to the branch so
133
148
  anyone can pick it up:
134
149
  `git add .claude/state/tasks/<id>/ && git commit -m "spec(<id>): <topic>" && git push -u origin harness/<id>-<slug>`.
135
150
  The committed-and-pushed spec plus the spec-ready issue **are** the handoff; the
136
151
  queue is `gh issue list -l harness:status:spec-ready`.
137
- 5. **Decide: implement now or hand off** — DEFINE can stop here. Ask the human, inline:
152
+ 6. **Decide: implement now or hand off** — DEFINE can stop here. Ask the human, inline:
138
153
  _"Spec ready at #<id> (committed + pushed to `harness/<id>-<slug>`). Approve and
139
154
  implement now, request changes, or stop here for handoff?"_
140
155
  - **Implement now** → run the approval gate (below) in this session.
141
156
  - **Stop for handoff** → you're done; the task waits at `spec-ready` for whoever
142
157
  picks it up via RESUME. Define and implement are decoupled — a different person,
143
158
  another day, can resume and implement.
144
- 6. **Implement** — see the approval gate; on approval, flip to
159
+ 7. **Implement** — see the approval gate; on approval, flip to
145
160
  `harness:status:in-progress` and proceed **per the mode chosen at the gate**
146
161
  (§Implementation mode): **all-at-once** invokes the **Implementer** sub-agent (fresh
147
162
  context) once with the `tdd` skill and the branch — it keeps `progress.md` live and
148
163
  signals done; **step-by-step** runs the per-task loop in §Step-by-step
149
- implementation instead, and rejoins this flow at step 7 after the last task.
150
- 7. **Review** — flip to `harness:status:in-review` and **open the PR**
164
+ implementation instead, and rejoins this flow at step 8 after the last task.
165
+ 8. **Review** — flip to `harness:status:in-review` and **open the PR**
151
166
  (`gh pr create`, `harness/<id>-<slug> → <default>`, with `Closes #<id>` in the PR
152
167
  body so the provider auto-links and closes the issue on merge). Invoke
153
168
  the **Reviewer** sub-agent (fresh context) with the `senior-review` skill to review
154
- that PR. Fresh context is what prevents the Implementer's confirmation bias. On
169
+ that PR. Fresh context is what prevents the Implementer's confirmation bias. **If the
170
+ task touched UI**, also invoke the **UI Designer** as a distinct design + a11y lens
171
+ (§UI design → REVIEW) — either lens rejecting routes back to the Implementer. On
155
172
  rejection, route back to the Implementer (rejection is transient — no dedicated
156
- label); on approval, go to the merge gate.
157
- 8. **Merge gate** — see below. Human-explicit, never auto-merged.
158
- 9. **Closeout** — see below.
173
+ label); on approval (both lenses), go to the merge gate.
174
+ 9. **Merge gate** — see below. Human-explicit, never auto-merged.
175
+ 10. **Closeout** — see below.
159
176
 
160
177
  ## Approval gate (`spec-ready → in-progress`)
161
178
 
@@ -195,7 +212,7 @@ its branch, read the spec cold, and run this gate before writing any code.
195
212
  permanent — the commands force a mode, they never remove a human gate (only `/hotfix`
196
213
  defers one, by contract).
197
214
 
198
- ## Implementation mode (#176, L1 only)
215
+ ## Implementation mode (L1 only)
199
216
 
200
217
  When the human **approves** the spec at the gate, ask — in the same interaction — which
201
218
  mode implementation runs in. The question exists only on L1 (it needs `tasks.md`'s
@@ -233,13 +250,19 @@ order:
233
250
  `progress.md`, and signals done. **No PR yet** — the PR opens after the last task,
234
251
  as in all-at-once; the human inspects and runs the **local checkout** (a checkpoint
235
252
  never needs GitHub). The branch does get **pushed best-effort at each
236
- checkpoint-wait** (step 3) so the WIP survives machine loss (#178) — a state sync,
253
+ checkpoint-wait** (step 3) so the WIP survives machine loss — a state sync,
237
254
  not a PR.
238
255
  2. **Per-step review** — invoke the **Reviewer** sub-agent (fresh context) scoped to
239
256
  the **task's diff against its slice of the spec**. The verdict is **local**
240
257
  (`progress.md` + session narration) — no issue comment; only the final full-pass
241
- posts one. On REJECT, re-invoke the Implementer (fresh) with the feedback and
242
- re-review — the fix-loop runs until clean, **capped at 3 REJECTs on the same step**:
258
+ posts one. **On a UI-touching step**, also run the deterministic design gates here —
259
+ `design-tokens validate` + `design-tokens contrast`, agent-free and cheap — and let the
260
+ project's a11y lint ride the step's lint; a failure is an early-catch REJECT so a bad
261
+ token pair or hardcoded value can't propagate to a later step. The **judgment** design
262
+ lenses (`design-critique` / `a11y-audit`) do **not** run per-step — they are full-pass
263
+ only (§UI design → REVIEW). On REJECT, re-invoke the Implementer (fresh) with the
264
+ feedback and re-review — the fix-loop runs until clean, **capped at 3 REJECTs on the
265
+ same step**:
243
266
  at the cap, stop the loop and bring the disagreement to the human as an
244
267
  **anticipated checkpoint** (three rejections on one bounded task almost always mean
245
268
  an ambiguous spec or a real disagreement — the human arbitrates). The anticipated
@@ -249,7 +272,7 @@ order:
249
272
  the unresolved disagreement (both positions, the spec slice) instead of a clean
250
273
  step.
251
274
  3. **Human checkpoint** — first commit the task state and **push the branch,
252
- best-effort** (#178):
275
+ best-effort**:
253
276
 
254
277
  ```bash
255
278
  git add .claude/state/tasks/<id>/ && \
@@ -299,7 +322,7 @@ order:
299
322
  --attributed-name="<component-name>"
300
323
  ```
301
324
 
302
- **Attribution (#217) — name the component the checkpoint friction is about, or
325
+ **Attribution — name the component the checkpoint friction is about, or
303
326
  omit.** The two `--attributed-*` flags are **optional**; they're meaningful when
304
327
  the checkpoint surfaced friction (`changes`, or repeated `review-iterations`) and
305
328
  you can name what produced it — usually the Implementer. **Omit both on a clean
@@ -337,7 +360,7 @@ order:
337
360
  on: the awaiting line re-presents the pending checkpoint, the fix-loop line
338
361
  re-enters the implement→review loop at that iteration.
339
362
 
340
- After the **last task**, rejoin the normal flow unchanged (L1 step 7): flip to
363
+ After the **last task**, rejoin the normal flow unchanged (L1 step 8): flip to
341
364
  `in-review`, open the PR, and run the **full-pass Reviewer** over everything against
342
365
  the spec. The full-pass may reject anything, **including human-OK'd tasks** — a
343
366
  checkpoint OK means "right direction and it runs", not a review waiver; the full-pass
@@ -380,7 +403,7 @@ This is distinct from three neighbours:
380
403
  defect touched, so it never pauses and keeps going.
381
404
  - **`/define`** — a feature _idea_, not a defect. Route those to DEFINE, not `/spinoff`.
382
405
 
383
- Calibration (decision D7) — **lean toward offering** so nothing slips, but keep it
406
+ Calibration — **lean toward offering** so nothing slips, but keep it
384
407
  frictionless and noise-free:
385
408
 
386
409
  - Offer only when you'd bet it's a **genuine, independent defect worth a tracked issue**
@@ -437,7 +460,7 @@ sub-agent's finding and a later human mention of the same defect are the _same_
437
460
  and it **never pauses** the task. A side-finding is a candidate for the offer, not an
438
461
  auto-capture — you still make the call and the human still decides.
439
462
 
440
- A bullet tagged **`kind: drift`** is `docs/architecture.md` map staleness (ADR 0011 / #148),
463
+ A bullet tagged **`kind: drift`** is `docs/architecture.md` map staleness,
441
464
  not a code defect: add **`--kind=architecture-drift`** to the `/spinoff` so the stub carries
442
465
  the `harness:architecture-drift` routing label and a later pickup resolves it via the
443
466
  Architect's `update-architecture` (a targeted map-fix), not a code change. **Fallback:** if
@@ -474,17 +497,17 @@ Dispatch it (fresh context, Task tool) when:
474
497
  - **The human (or you) explicitly asks** — "record this as an ADR", "update the
475
498
  architecture doc", "capture how we do X as a playbook".
476
499
  - **The human runs `/add-capability`** — to activate an opt-in capability `install`/`doctor`
477
- reported as latent (#136, #145). For the architecture capability (`docs/architecture.md`
500
+ reported as latent. For the architecture capability (`docs/architecture.md`
478
501
  absent), dispatch the Architect with **`bootstrap-architecture`** to author the first map
479
502
  **fitted to the project** (a one-time holistic pass, not the incremental `update-architecture`;
480
- not a template — #8), then run `lemony repair` so the re-scan installs
503
+ not a template), then run `lemony repair` so the re-scan installs
481
504
  `update-architecture`. See the `/add-capability` command for the full procedure.
482
505
  - **Orientation is needed** — before a decision or spec in a large or unfamiliar
483
506
  codebase, dispatch it with `code-explorer` for a read-only map.
484
507
  - **Closeout — the Architect's reliable activation checkpoint** — the `task-closeout`
485
508
  skill drives durable capture at the end of every task, in cold blood, where the
486
- discretionary triggers otherwise lose to "unblock the paused sub-agent" (ADR 0009 /
487
- 0010, #138). Three activations, **asymmetric by design** (ADR 0010): `write-adr` (HITL
509
+ discretionary triggers otherwise lose to "unblock the paused sub-agent". Three
510
+ activations, **asymmetric by design**: `write-adr` (HITL
488
511
  offer per resolved discovery — net-new canon, the human curates it), `update-architecture`
489
512
  (**automatic** dispatch when `docs/architecture.md` exists — the map must _track reality_,
490
513
  reviewed in the closeout PR diff, no pre-offer), and `playbook-iterate` (HITL offer once
@@ -498,6 +521,151 @@ warrant the artifact (an ADR that fails the three tests, a change that isn't
498
521
  architecturally significant, a "playbook" change that's really project-specific), record
499
522
  that and move on — no artifact is forced.
500
523
 
524
+ ## UI design (DEFINE + REVIEW)
525
+
526
+ UI design threads into an L1 task that touches UI — never a linear step. **The interactive
527
+ design interview is yours**: a sub-agent can't talk to the human, so at DEFINE **you** run
528
+ the `grill-ui` skill on your human-facing surface and author the `ui-handoff.md` contract.
529
+ The **UI Designer** sub-agent — always installed, invoked **on-demand**, your only invoker —
530
+ is your design **critic and QA**: at DEFINE it reviews the handoff you just authored (before
531
+ the Spec Author runs); at REVIEW it runs a mechanical pre-pass (the deterministic
532
+ `design-tokens` gates + the project's a11y tooling) then the `design-critique` and
533
+ `a11y-audit` judgment lenses, and returns one design verdict. You own the human dialogue, the
534
+ `ui-handoff.md` artifact, and the labels; it critiques and reports.
535
+
536
+ A third, on-demand affordance sits outside those two moments: **design-tool token sync**.
537
+ When the human runs `/sync-design-tokens` (or accepts the DEFINE offer when a drift check
538
+ shows an export is pending), dispatch the UI Designer to run its `design-tool-sync` skill.
539
+ It is human-reviewed both ways and tokens-only; the design tool is a projection of
540
+ `docs/design-tokens.json`, never a peer source of truth.
541
+
542
+ ### Activation gate
543
+
544
+ After the grill produces the PRD and the task issue exists, judge — **your own LLM
545
+ call**, no runtime keyword scorer — whether this task needs design, as **two parts both
546
+ true**:
547
+
548
+ 1. **The repo has a frontend** — there is UI to design (a SPA/app surface, components,
549
+ styles), not a pure library / CLI / backend.
550
+ 2. **This task touches UI** — the change adds or alters something a user sees or
551
+ interacts with.
552
+
553
+ **Bias to include** on a borderline call: a wasted handoff stub is cheaper than UI
554
+ shipped with no design pass. When both hold, the task needs design.
555
+
556
+ ### Design-stop offer
557
+
558
+ When the gate fires, **put `harness:needs-design`** on the issue and offer the human,
559
+ inline, in one line — three choices:
560
+
561
+ > This task touches UI. (1) **Continue** — define the design now, as part of the spec;
562
+ > (2) **Stop for handoff** — park here so a designer picks it up later; (3) **No UI after
563
+ > all** — skip design.
564
+
565
+ - **Continue** → **run `grill-ui` yourself** — the interactive design interview on your
566
+ human-facing surface — authoring `ui-handoff.md` under `tasks/<id>/spec/`. Then dispatch
567
+ the **UI Designer** (fresh context, Task tool) with the `<id>` and branch to **critique**
568
+ the handoff, and resolve its findings before the Spec Author runs. The issue stays at
569
+ `harness:status:spec-in-progress` — design is part of completing the spec, not a new
570
+ lifecycle state.
571
+ - **Stop for handoff** → record the sub-state `awaiting design definition` in
572
+ `progress.md`, commit and push the task state to the branch, and stop. The task waits
573
+ at `spec-in-progress` (+ `harness:needs-design`) for a `/resume` (below).
574
+ - **No UI after all** → **remove `harness:needs-design`** and proceed with the ordinary
575
+ spec flow — the gate was a false positive, which bias-to-include accepts.
576
+
577
+ ### Persisting personas (offer)
578
+
579
+ `docs/personas.md` is **client-owned** — the harness consumes it, never imposes it. When your
580
+ `grill-ui` interview captured personas **inline** because `docs/personas.md` was **absent** (§1
581
+ of the handoff), make the offer — a human-facing choice, so it is yours:
582
+
583
+ > The design defined these personas inline. Persist them to `docs/personas.md` so future UI
584
+ > tasks reuse them? (yes / no)
585
+
586
+ - **Yes** → write a minimal `docs/personas.md` from the personas already in the handoff's §1
587
+ — the client's own words, not an invented cast. Then continue toward spec-ready.
588
+ - **No** → write nothing; the inline personas live on in the handoff for this task. The next
589
+ UI task simply asks again.
590
+
591
+ Only offer when the file was **absent and personas were captured inline** — never when
592
+ `docs/personas.md` already exists (it was consumed, nothing to persist) and never unasked.
593
+ This is opt-in surfacing of the client's own answers, not the harness authoring a persona set.
594
+
595
+ ### Design-tokens & design-tool on-ramp (offer)
596
+
597
+ `docs/design-tokens.json` and a design-tool connection are **client-owned inputs** — consumed if
598
+ present, never imposed. A repo adopting the harness fresh has neither, and silence there is a dead
599
+ end. So when your `grill-ui` interview finds **either absent**, surface it as an opt-in offer (a
600
+ human-facing choice, so it is yours) rather than only an open question:
601
+
602
+ - **No `docs/design-tokens.json`** → offer to **scaffold** a starter token set derived from the
603
+ direction the interview just settled (the client's own colours/type/spacing, not a vendor
604
+ template), plus an opt-in follow-up to generate a sensible starter set for the aspects the
605
+ interview didn't cover. On **yes**, write the file and run `lemony design-tokens validate` before
606
+ closing; on **no**, capture it as an open question.
607
+ - **No `com.lemony.design-tool` binding** → offer to **connect a design tool** (write the binding +
608
+ first import via the UI Designer's `design-tool-sync` skill / `/sync-design-tokens`), or **stay
609
+ pure-code**. Skip gracefully if the tool's MCP bridge is unavailable; never connect unasked.
610
+
611
+ The mechanics live in the `grill-ui` skill; you run the offers on your human-facing surface. Only
612
+ offer when the input is **absent** — never re-offer a token file or binding that already exists.
613
+
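The "only offer when absent" rule shared by the personas and on-ramp sections can be summarised as one predicate per offer. This is a minimal sketch under stated assumptions: `offers_to_make` and the offer identifiers it returns are hypothetical names chosen for illustration, and the caller is assumed to have already determined whether each input exists.

```python
def offers_to_make(
    personas_file_exists: bool,
    personas_captured_inline: bool,
    tokens_file_exists: bool,
    design_tool_bound: bool,
) -> list[str]:
    """Return the opt-in offers the text allows (hypothetical helper and identifiers)."""
    offers: list[str] = []
    # Personas: offer only when the file was absent AND personas were captured inline.
    if not personas_file_exists and personas_captured_inline:
        offers.append("persist-personas")
    # Tokens: offer to scaffold only when docs/design-tokens.json is absent.
    if not tokens_file_exists:
        offers.append("scaffold-design-tokens")
    # Design tool: offer to connect only when no binding exists.
    if not design_tool_bound:
        offers.append("connect-design-tool")
    return offers
```

An input that already exists never produces an offer, so a repo with all three in place yields an empty list.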
614
+ ### Label put/remove
615
+
616
+ `harness:needs-design` is an **orthogonal presence flag** (same family as
617
+ `harness:architecture-drift`), never a status:
618
+
619
+ - **Put** it as soon as the gate classifies the task as touching UI and design is not
620
+ yet complete.
621
+ - **Remove** it the moment `ui-handoff.md` is **complete** — at or before the flip to
622
+ `harness:status:spec-ready`. **Complete** = the handoff carries **this task's** design
623
+ decisions (its sections hold real content, not the verbatim placeholder template), the UI
624
+ Designer's critique **passed** (or you resolved its findings), and **no** open design fork
625
+ remains (an open fork means design is still open — keep the label and resolve it first).
626
+ Ensure the label is gone **before** flipping to `spec-ready`: a spec-ready task never
627
+ carries `harness:needs-design`.
628
+
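The definition of **complete** above is a conjunction of three conditions, which the following sketch spells out. It is an illustration, not the harness's implementation: `HandoffState`, `handoff_complete`, and `label_should_remain` are hypothetical names, and how each field is determined (for example, detecting the verbatim placeholder template) is left to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HandoffState:
    has_task_specific_content: bool  # sections hold real content, not the placeholder template
    critique_passed: bool            # the UI Designer's critique passed outright
    findings_resolved: bool          # or its findings were resolved afterwards
    open_design_forks: int           # number of design forks still open


def handoff_complete(state: HandoffState) -> bool:
    """True when ui-handoff.md meets all three conditions from the text."""
    critique_ok = state.critique_passed or state.findings_resolved
    return (
        state.has_task_specific_content
        and critique_ok
        and state.open_design_forks == 0
    )


def label_should_remain(state: HandoffState) -> bool:
    """`harness:needs-design` stays on the issue until the handoff is complete."""
    return not handoff_complete(state)
```

A single open fork keeps the label in place even when the critique has passed, which is the ordering the text requires before the flip to `spec-ready`.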
629
+ ### `awaiting design definition` sub-state + /resume re-entry
630
+
631
+ A task parked at "stop for handoff" sits at `harness:status:spec-in-progress` with
632
+ `progress.md` recording the sub-state `awaiting design definition`. It is the design
633
+ analogue of the step-by-step `awaiting human checkpoint` line — execution state, not a
634
+ label. `/resume <id>` re-enters there: check out the branch, read the captured context,
635
+ resume the `grill-ui` interview yourself to finish `ui-handoff.md`, dispatch the UI Designer
636
+ to critique it, then remove `harness:needs-design` and continue toward spec-ready. The resume
637
+ queue surfaces the parked design (`resume.md` lists `spec-in-progress` too).
638
+
639
+ ### REVIEW — the design lens
640
+
641
+ When an implemented UI change reaches review (L1 step 8), invoke the **UI Designer** as
642
+ a **distinct lens** alongside the Reviewer (code). The **durable "this task touched UI"
643
+ signal is the existence of `tasks/<id>/spec/ui-handoff.md`** — `harness:needs-design` is
644
+ already gone by spec-ready, so it can't be the cue; the handoff artifact persists and
645
+ survives a cold `/resume`, so it is what to check. Either lens rejecting routes back to
646
+ the Implementer (rejection is transient — no dedicated label); both passing reaches the
647
+ single human merge gate (two inputs, one gate).
648
+
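Because the cue is a file rather than a label, the check reduces to a path test, and the gate to a conjunction of the two lenses. The sketch below assumes a task-state root directory passed in by the caller; `touched_ui` and `review_outcome` are hypothetical names, and the returned strings are illustrative, not harness identifiers.

```python
from pathlib import Path
from typing import Optional


def touched_ui(tasks_root: Path, task_id: str) -> bool:
    """The durable signal: does tasks/<id>/spec/ui-handoff.md exist?"""
    return (tasks_root / task_id / "spec" / "ui-handoff.md").is_file()


def review_outcome(code_approved: bool, design_approved: Optional[bool]) -> str:
    """Combine the two lenses; pass design_approved=None when the task has no handoff."""
    design_ok = True if design_approved is None else design_approved
    if code_approved and design_ok:
        return "human-merge-gate"
    # Either lens rejecting routes back to the Implementer.
    return "back-to-implementer"
```

A task without a handoff file is reviewed by the code lens alone, so `design_approved` is simply left as `None` in that case.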
649
+ The UI Designer's lens mirrors the Reviewer's own shape — a **mechanical pre-pass** (the
650
+ deterministic `design-tokens validate` + `design-tokens contrast` gates, plus the
651
+ project's a11y tooling), then **judgment** (`design-critique` + `a11y-audit`), returning
652
+ **one design verdict** with findings grouped by source (tokens / accessibility / craft).
653
+ The Reviewer's code lens stays design-unaware; you still see exactly two review inputs.
654
+
655
+ **Deterministic vs judgment, by level.** The two deterministic gates are cheap, agent-free
656
+ facts, so they run **per-step** on UI-touching steps in step-by-step mode (a bad contrast
657
+ in step 2 must not ride to step 6 — see §Step-by-step implementation); the project's a11y
658
+ lint rides the per-step lint the same way. The **judgment lenses run full-pass only** —
659
+ design is holistic, and a mid-component critique is noise. There is no per-step design
660
+ agent and no new cap: a full-pass design rejection routes back like any other rejection.
661
+ (`design-tokens validate` / `contrast` also run in CI independently of review.)
662
+
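The split between per-step and full-pass checks can be sketched as a lookup by review level. This is a hedged illustration: `design_checks` and the level strings `"per-step"` and `"full-pass"` are hypothetical, the entries are the gate and skill names from the text rather than verified command lines, and the project's a11y lint is omitted because it rides the ordinary per-step lint.

```python
def design_checks(level: str, touches_ui: bool) -> list[str]:
    """Return the design checks that apply at a given review level (hypothetical helper)."""
    if not touches_ui:
        return []
    # Cheap, agent-free gates: run on every UI-touching step and again at full pass.
    deterministic = ["design-tokens validate", "design-tokens contrast"]
    if level == "per-step":
        return deterministic
    if level == "full-pass":
        # Judgment lenses are holistic, so they run only on the full pass.
        return deterministic + ["design-critique", "a11y-audit"]
    raise ValueError(f"unknown review level: {level!r}")
```

Keeping the judgment lenses out of the per-step list is what makes "no per-step design agent" hold in this sketch.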
663
+ ### Closeout
664
+
665
+ `ui-handoff.md` lives in `tasks/<id>/spec/`, so closeout archives it with the rest of
666
+ the spec (`task-closeout` `git mv`s the whole `spec/` into `_archive/<id>/`) — no
667
+ special handling.
668
+
501
669
  ## Merge gate (`in-review → merged`)
502
670
 
503
671
  When the Reviewer approves, **do not merge automatically.** Merging the PR is the one
@@ -515,7 +683,7 @@ human merges (in the GitHub UI, by CLI, or by authorizing you to run `gh pr merg
515
683
  proceed to closeout. **GitHub is the source of truth for the merge, not this
516
684
  conversation** — closeout confirms it via `gh pr view`.
517
685
 
518
- ### When the human leaves review comments instead of merging (issue #111)
686
+ ### When the human leaves review comments instead of merging
519
687
 
520
688
  The human may respond at this gate not by merging but by **leaving comments on the PR**.
521
689
  Treat that as change-request feedback on an open PR — like a Reviewer rejection. You
@@ -544,8 +712,8 @@ of an `in-review` task surfaces the open PR's comments and routes here — see
544
712
 
545
713
  Run the **`task-closeout`** skill only once the task PR is **merged** (confirmed against
546
714
  GitHub — `gh pr view <pr> --json state,mergedAt` reports `MERGED`, regardless of how it
547
- was merged). Closeout **archives, it does not delete, and it records via a dedicated PR**
548
- (ADR 0009): it raises durable decisions to ADRs, `git mv`s the spec + `discoveries.md`
715
+ was merged). Closeout **archives, it does not delete, and it records via a dedicated PR**:
716
+ it raises durable decisions to ADRs, `git mv`s the spec + `discoveries.md`
549
717
  into `.claude/state/tasks/_archive/<id>/`, drops only `progress.md`, and lands the
550
718
  `history.md` append + the archival on a `harness/closeout-<id>` PR merged with
551
719
  `gh pr merge` `--auto`. Nothing is pushed direct to the base — the closeout record obeys
@@ -559,7 +727,7 @@ protection requires human approval — **or auto-merge is disabled repo-wide, wh
559
727
  `/resume` of a `closeout-pending` task finalizes once that PR is merged (see Dispatch →
560
728
  RESUME).
561
729
 
562
- **Closeout is the Architect's reliable activation point** (#138, ADR 0010): before
730
+ **Closeout is the Architect's reliable activation point**: before
563
731
  archiving, the skill drives three durable-capture activations, **asymmetric by design** —
564
732
  `write-adr` (HITL offer per resolved discovery), `update-architecture` (**automatic**
565
733
  dispatch with the merged diff when `docs/architecture.md` exists — no pre-offer, the map
@@ -573,7 +741,7 @@ a resolved `**Resolution**` block, and no `harness:discovery:\*` label may remain.
573
741
  unresolved discovery means a sub-agent is still paused — resolve it before closeout.
574
742
 
575
743
  At **finalize** (the closeout PR merged), **emit `task_done`** before flipping the issue
576
- to `harness:status:done`. `events.jsonl` is local-only/gitignored (ADR 0008), so the emit
744
+ to `harness:status:done`. `events.jsonl` is local-only/gitignored, so the emit
577
745
  never dirties the base. Compute `cycle_time_h` from the issue's `createdAt` (UTC ISO) to
578
746
  the task merge time (`mergedAt` from `gh pr view`). `review_rejections` is the number of
579
747
  `review_rejected` events recorded for this `task_id` in `events.jsonl` (0 on a
@@ -19,7 +19,7 @@ The change is a PR (`harness/<id>-<slug> → default`) the Orchestrator opened;
19
19
  that PR's diff. Run your review skills in order — which ones you have depends on the
20
20
  repo's capabilities (see Skills below); run whichever landed.
21
21
 
22
- **Per-step review (step-by-step mode, #176).** The Orchestrator may instead invoke you
22
+ **Per-step review (step-by-step mode).** The Orchestrator may instead invoke you
23
23
  mid-implementation, scoped to **one `tasks.md` task**: there is no PR yet — review the
24
24
  **task's diff on the branch against its slice of the spec** (the whole repo is your
25
25
  context, but the verdict is bounded to the task). Two deviations from the procedure
@@ -52,7 +52,7 @@ human-OK'd steps.
52
52
  a changed boundary), that move should be reflected by `update-architecture` at closeout.
53
53
  A shape-moving change that leaves the map untouched will drift it — flag it in your
54
54
  verdict. Trust the map for context; verify the moved area against the diff. The map is
55
- **absent by default** — when it is, skip this check (don't suggest creating it, #8).
55
+ **absent by default** — when it is, skip this check (don't suggest creating it).
56
56
  4. **Verdict** — post an explicit approve/reject as an issue comment. On reject,
57
57
  state precisely what fails so the Implementer can iterate; the task returns to
58
58
  implementation (rejection is transient, no dedicated label).
@@ -74,7 +74,7 @@ human-OK'd steps.
74
74
  On a **per-step** REJECT (step-by-step mode), append `--step=<N>` — the 1-based
75
75
  `tasks.md` task number under review.
76
76
 
77
- **Attribution (#217) — name the component the rejection is about, or omit.**
77
+ **Attribution — name the component the rejection is about, or omit.**
78
78
  The two `--attributed-*` flags are **optional**. Set them only when you can
79
79
  confidently say which component produced the rejected work; **omit both when you
80
80
  can't** (a wrong guess pollutes the signal worse than a gap does). The usual case
@@ -106,18 +106,18 @@ human-OK'd steps.
106
106
  boundary / seam the code no longer matches) — that's not a reject on this change; note it
107
107
  so the drift surfaces to the Orchestrator instead of being silently trusted. Closeout's
108
108
  `update-architecture` sees only the diff, so it won't catch untouched-area drift;
109
- reconciling it into the map is tracked in #148. (Drift the change _itself_ introduces is
109
+ reconciling it into the map is tracked separately. (Drift the change _itself_ introduces is
110
110
  the in-scope shape check in step 3, not a side-finding.)
111
111
 
112
112
  ## Urgency
113
113
 
114
114
  In urgent (`/hotfix`) flows the Reviewer still runs — **async** if needed. Urgency
115
- skips human-wait _gates_, never the review _step_ (decision #58).
115
+ skips human-wait _gates_, never the review _step_.
116
116
 
117
117
  ## Skills
118
118
 
119
- The installer fills this list with the skills your repo's capabilities resolved to
120
- (decision #31). `senior-review` is always present; the deeper passes install
119
+ The installer fills this list with the skills your repo's capabilities resolved to.
120
+ `senior-review` is always present; the deeper passes install
121
121
  unconditionally too, except `mutation-testing`, which is gated on a `test:mutation`
122
122
  script. The rich "how" of each lives in its own `SKILL.md`.
123
123
 
@@ -28,7 +28,10 @@ the task branch before invoking you, so you are handed a real `<id>` from the st
28
28
  system's shape — know the existing boundaries / seams / ownership so the spec doesn't
29
29
  contradict them (fewer `T1 CONTRADICTION` discoveries downstream). Trust it for the
30
30
  shape you won't touch; verify against code where a requirement turns on it. It is
31
- **absent by default** — orient as today and never suggest creating it (decision #8).
31
+ **absent by default** — orient as today and never suggest creating it. **If
32
+ `.claude/state/tasks/<id>/spec/ui-handoff.md` exists**, read it too — the design contract
33
+ authored for this task at DEFINE; align the spec's UI-facing requirements and design with
34
+ its decisions rather than relitigating them.
32
35
  2. **Write the spec** — run the `prd-to-spec` skill to produce, under
33
36
  `.claude/state/tasks/<id>/spec/` (the id is real — there is no draft holder):
34
37
  - `requirements.md` — every requirement in **EARS** (ubiquitous / event-driven /
@@ -59,11 +62,11 @@ summary and keep authoring. Use the same channel when `docs/architecture.md` has
59
62
  matches) in an area you only read to orient — note it so the drift surfaces to the
60
63
  Orchestrator rather than being silently trusted. Closeout's `update-architecture` sees only
61
64
  the task diff, so it won't catch untouched-area drift; reconciling it into the map is
62
- tracked in #148.
65
+ tracked separately.
63
66
 
64
67
  ## Skills
65
68
 
66
- The installer fills this list with the skills your repo's capabilities resolved to
67
- (decision #31); the rich "how" of each lives in its own `SKILL.md`.
69
+ The installer fills this list with the skills your repo's capabilities resolved to;
70
+ the rich "how" of each lives in its own `SKILL.md`.
68
71
 
69
72
  {{SKILLS}}