@lemoncode/lemony 0.1.0 → 0.1.1-alpha.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (49)
  1. package/NOTICE +39 -0
  2. package/README.md +0 -1
  3. package/catalog/VERSION +1 -1
  4. package/catalog/agents/architect.md +4 -4
  5. package/catalog/agents/fit-assessment.md +1 -1
  6. package/catalog/agents/implementer.md +15 -8
  7. package/catalog/agents/orchestrator.md +204 -36
  8. package/catalog/agents/reviewer.md +7 -7
  9. package/catalog/agents/spec-author.md +7 -4
  10. package/catalog/agents/ui-designer.md +121 -15
  11. package/catalog/commands/add-capability.md +3 -3
  12. package/catalog/commands/resume.md +10 -4
  13. package/catalog/commands/spinoff.md +2 -2
  14. package/catalog/commands/sync-design-tokens.md +29 -0
  15. package/catalog/harness.config.schema.json +15 -16
  16. package/catalog/hooks/init.sh +11 -11
  17. package/catalog/hooks/lib/lemony.sh +3 -3
  18. package/catalog/hooks/lib/playbook-scan.sh +10 -11
  19. package/catalog/hooks/session-close.sh +7 -7
  20. package/catalog/schemas/tier2-events-history.md +11 -11
  21. package/catalog/schemas/tier2-events.md +46 -47
  22. package/catalog/skills/a11y-audit/SKILL.md +121 -0
  23. package/catalog/skills/bootstrap-architecture/SKILL.md +3 -3
  24. package/catalog/skills/build-ui/SKILL.md +147 -0
  25. package/catalog/skills/build-ui/accessibility.md +101 -0
  26. package/catalog/skills/build-ui/anti-slop.md +107 -0
  27. package/catalog/skills/code-explorer/SKILL.md +1 -1
  28. package/catalog/skills/design-critique/SKILL.md +110 -0
  29. package/catalog/skills/design-tool-sync/SKILL.md +120 -0
  30. package/catalog/skills/grill-ui/SKILL.md +248 -0
  31. package/catalog/skills/grill-ui/ui-handoff-format.md +149 -0
  32. package/catalog/skills/grill-with-docs/SKILL.md +9 -2
  33. package/catalog/skills/mutation-testing/SKILL.md +1 -1
  34. package/catalog/skills/note-side-finding/SKILL.md +1 -1
  35. package/catalog/skills/playbook-iterate/SKILL.md +2 -2
  36. package/catalog/skills/review-pr/SKILL.md +3 -3
  37. package/catalog/skills/task-closeout/SKILL.md +9 -8
  38. package/catalog/skills/update-architecture/SKILL.md +3 -3
  39. package/catalog/templates/claude-code/agents.md.tpl +27 -18
  40. package/catalog/templates/claude-code/docs/playbooks/README.md.tpl +1 -3
  41. package/catalog/templates/claude-code/harness.config.yml.tpl +8 -9
  42. package/dist/cli.mjs +1287 -1676
  43. package/package.json +13 -4
  44. package/catalog/agents/README.md +0 -29
  45. package/catalog/hooks/README.md +0 -56
  46. package/catalog/playbook-format.md +0 -198
  47. package/catalog/schemas/README.md +0 -13
  48. package/catalog/skills/README.md +0 -62
  49. package/catalog/templates/README.md +0 -32
@@ -8,17 +8,16 @@
8
8
  > dispatch-on-read).
9
9
 
10
10
  Tier 1 (client-local) writes append-only JSONL from day one so the data is
11
- forward-compatible with the Tier 2 central backend designed in Fase 1+ (decision
12
- \#24, #25, #27, #51).
11
+ forward-compatible with the Tier 2 central backend planned for a later phase.
13
12
 
14
13
  ## Storage
15
14
 
16
15
  - One event per line, UTF-8, in `.claude/state/events.jsonl`. The stream is
17
- **local-only and gitignored — never committed** (ADR 0008, retiring decision
18
- #18/#21): it sits in the managed `GITIGNORE_BLOCK` beside `current-*.md` /
19
- `sessions/`. There is no Tier 2 consumer yet, so committing only dirtied the
20
- base; transport to Tier 2 is the sink designed in #137.
21
- - **Append-only, except confirmed-sent prefix-prune** (#240, ADR 0008 §Amendment).
16
+ **local-only and gitignored — never committed**: it sits in the managed
17
+ `GITIGNORE_BLOCK` beside `current-*.md` / `sessions/`. There is no Tier 2
18
+ consumer yet, so committing only dirtied the base; transport to Tier 2 is a
19
+ planned sink.
20
+ - **Append-only, except confirmed-sent prefix-prune.**
22
21
  Emitters only ever append. The send engine may **collapse the already-delivered prefix**
23
22
  (`[0:cursor]`) once it exceeds ~5MB — never dropping unsent bytes — which rewrites the
24
23
  file and may **reorder** the unsent tail relative to concurrent appends. Order is not a
@@ -41,7 +40,7 @@ same top level — there is no nested `payload`, so Zod discriminated unions key
41
40
  | ----------------- | ------ | -------- | --------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
42
41
  | `type` | string | yes | `internal-enum` | One of the 9 event types listed below. Discriminator. |
43
42
  | `ts` | string | yes | `metric` | UTC ISO 8601 with `Z` suffix (e.g. `2026-05-28T14:30:00.000Z`). **No local offsets.** |
44
- | `user` | string | yes | `local-only` | `git config user.email` of the actor (decision #53). Never exported in any tier (D7). |
43
+ | `user` | string | yes | `local-only` | `git config user.email` of the actor. Never exported in any tier. |
45
44
  | `project` | string | yes | `identity` | `task_storage.repo` slug (e.g. `acme/widgets`), from `harness.config.yml`. **Never `OWNER/REPO`** — the CLI refuses to emit while that placeholder is the value (see [Placeholder guard](#placeholder-guard)). |
46
45
  | `task_id` | string | no | `identity` | Task issue id (e.g. `42`) when the event has a task context. Absent on session/global events. A per-project correlator — only meaningful alongside `project`, so it shares the `identity` axis. |
47
46
  | `harness_version` | string | yes | `metric` | `version` of the **installed** `@lemoncode/lemony` package — _not_ `vendor_version` from config. |
@@ -94,12 +93,12 @@ only `internal-enum` + `metric`; **`project`** (opt-in, for dogfood) additionall
94
93
  keeps `identity` + `free-text`. `local-only` is dropped in both. `identity` and
95
94
  `free-text` share today's policy but stay distinct axes for future divergence (a
96
95
  hashed `user_hash` would be `identity`-with-hashing, not `free-text`). v1 wires only
97
- the `anonymous` branch (decisions D7/D8/D9, ADR 0020).
96
+ the `anonymous` branch.
98
97
 
99
98
  `attributed_name` is deliberately an `internal-enum` even though Zod types it as a
100
99
  bounded free string: the axis is policy-oriented (keep-always), and the field is the
101
- moat metric (#1, "which component causes friction") — a roster component name shared
102
- across all installs, not sensitive free text (D8). The aggregation script flags names
100
+ moat metric ("which component causes friction") — a roster component name shared
101
+ across all installs, not sensitive free text. The aggregation script flags names
103
102
  outside the known roster as the data-quality thermometer (see [Attribution](#attribution)).
104
103
 
105
104
  ---
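The axis policy in this hunk (the `anonymous` tier keeps only `internal-enum` + `metric`, the `project` tier additionally keeps `identity` + `free-text`, and `local-only` is dropped in both) can be sketched as a small projection. This is a minimal illustration, not the package's sanitizer: `AXES`, `KEPT`, `projectEvent` and the sample event are hypothetical names, and dropping fields with no known axis is an assumption made here so the sketch fails closed.

```typescript
// Hypothetical sketch of the export projection; not the package's actual sanitizer.
type Axis = "internal-enum" | "metric" | "identity" | "free-text" | "local-only";
type Tier = "anonymous" | "project";

// Axis assignment for a few fields, copied from the schema tables in this diff.
const AXES: Record<string, Axis> = {
  type: "internal-enum",
  ts: "metric",
  user: "local-only",
  project: "identity",
  task_id: "identity",
  harness_version: "metric",
  reason: "free-text",
  attributed_name: "internal-enum",
};

// Axes each tier keeps; `local-only` appears in neither list.
const KEPT: Record<Tier, Axis[]> = {
  anonymous: ["internal-enum", "metric"],
  project: ["internal-enum", "metric", "identity", "free-text"],
};

function projectEvent(
  event: Record<string, unknown>,
  tier: Tier,
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [field, value] of Object.entries(event)) {
    const axis = AXES[field];
    // Assumption: a field with no known axis is dropped rather than exported.
    if (axis !== undefined && KEPT[tier].includes(axis)) {
      out[field] = value;
    }
  }
  return out;
}

// Illustrative event; the values are made up.
const sample: Record<string, unknown> = {
  type: "review_rejected",
  ts: "2026-05-28T14:30:00.000Z",
  user: "dev@example.com",
  project: "acme/widgets",
  task_id: "42",
  harness_version: "0.1.1-alpha.1",
  reason: "missing tests",
  attributed_name: "implementer",
};
```

Under these assumptions, `projectEvent(sample, "anonymous")` keeps `type`, `ts`, `harness_version` and `attributed_name`, which matches the note that `attributed_name` is the one non-numeric field surviving the `anonymous` projection.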
@@ -108,8 +107,8 @@ outside the known roster as the data-quality thermometer (see [Attribution](#att
108
107
 
109
108
  Five are emitted in P5. `bug_post_merge` is deferred to P8 (meta-test). `l3_bypass`
110
109
  is deferred to P6 (the `/bypass` command). `followup_captured` is emitted by the
111
- `/spinoff` command (#112). `step_completed` is emitted by the Orchestrator in
112
- step-by-step mode (#176). The schema covers all nine so the file is
110
+ `/spinoff` command. `step_completed` is emitted by the Orchestrator in
111
+ step-by-step mode. The schema covers all nine so the file is
113
112
  forward-compatible — readers dispatch on `type` and ignore unknowns.
114
113
 
115
114
  ### 1. `session_closed` _(P5)_
@@ -150,28 +149,28 @@ Emitted by the Orchestrator when it transitions `spec-in-progress → spec-ready
150
149
  Emitted by the Orchestrator at closeout (after `gh pr view` confirms `MERGED`,
151
150
  before `git rm` of the task state).
152
151
 
153
- | Field | Type | Required | Axis | Notes |
154
- | ------------------- | ------ | -------- | --------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- |
155
- | `task_id` | string | yes | `identity` | Required for this type. |
156
- | `level` | string | yes | `internal-enum` | `L1` \| `L2` \| `L3` — the task-fit dial value used. |
157
- | `cycle_time_h` | number | yes | `metric` | Wall-clock hours from issue creation to merge. ≥ 0, finite. |
158
- | `review_rejections` | number | yes | `metric` | Count of `review_rejected` events for this `task_id` (≥ 0, int). |
159
- | `mode` | string | no | `internal-enum` | `all_at_once` \| `step_by_step` — the mode chosen at the L1 approval gate (#176). **Absent on L2** (the question only exists where `tasks.md` does). |
160
- | `steps` | number | no | `metric` | Count of `step_completed` events for this task (≥ 1, int). Only meaningful when `mode` is `step_by_step`; < total tasks after a mid-task downgrade. |
152
+ | Field | Type | Required | Axis | Notes |
153
+ | ------------------- | ------ | -------- | --------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
154
+ | `task_id` | string | yes | `identity` | Required for this type. |
155
+ | `level` | string | yes | `internal-enum` | `L1` \| `L2` \| `L3` — the task-fit dial value used. |
156
+ | `cycle_time_h` | number | yes | `metric` | Wall-clock hours from issue creation to merge. ≥ 0, finite. |
157
+ | `review_rejections` | number | yes | `metric` | Count of `review_rejected` events for this `task_id` (≥ 0, int). |
158
+ | `mode` | string | no | `internal-enum` | `all_at_once` \| `step_by_step` — the mode chosen at the L1 approval gate. **Absent on L2** (the question only exists where `tasks.md` does). |
159
+ | `steps` | number | no | `metric` | Count of `step_completed` events for this task (≥ 1, int). Only meaningful when `mode` is `step_by_step`; < total tasks after a mid-task downgrade. |
161
160
 
162
161
  ### 5. `review_rejected` _(P5)_
163
162
 
164
- Emitted by the Reviewer when the verdict is REJECT (decision #25, transient
165
- state — no dedicated label).
163
+ Emitted by the Reviewer when the verdict is REJECT (transient state — no
164
+ dedicated label).
166
165
 
167
- | Field | Type | Required | Axis | Notes |
168
- | ----------------- | ------ | -------- | --------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
169
- | `task_id` | string | yes | `identity` | Required for this type. |
170
- | `reason` | string | yes | `free-text` | Short human-readable reason (one line; never the full review comment). 1-500 chars. |
171
- | `iteration` | number | yes | `metric` | 1-based: the Nth rejection of this task (≥ 1, int). |
172
- | `step` | number | no | `metric` | The step (1-based `tasks.md` task number) whose per-step review rejected (#176). **Absent** on full-pass and all-at-once rejections. |
173
- | `attributed_kind` | string | no | `internal-enum` | `agent` \| `skill` \| `playbook` — the kind of component the friction is attributed to (#217). **Omitted when the emitter can't attribute.** |
174
- | `attributed_name` | string | no | `internal-enum` | The component's name (free string, 1-200 chars), e.g. `implementer`. Independently optional in the schema; emitters pair it with `attributed_kind` and omit both when they can't attribute (#217). Free-string by design — see [Attribution](#attribution). |
166
+ | Field | Type | Required | Axis | Notes |
167
+ | ----------------- | ------ | -------- | --------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
168
+ | `task_id` | string | yes | `identity` | Required for this type. |
169
+ | `reason` | string | yes | `free-text` | Short human-readable reason (one line; never the full review comment). 1-500 chars. |
170
+ | `iteration` | number | yes | `metric` | 1-based: the Nth rejection of this task (≥ 1, int). |
171
+ | `step` | number | no | `metric` | The step (1-based `tasks.md` task number) whose per-step review rejected. **Absent** on full-pass and all-at-once rejections. |
172
+ | `attributed_kind` | string | no | `internal-enum` | `agent` \| `skill` \| `playbook` — the kind of component the friction is attributed to. **Omitted when the emitter can't attribute.** |
173
+ | `attributed_name` | string | no | `internal-enum` | The component's name (free string, 1-200 chars), e.g. `implementer`. Independently optional in the schema; emitters pair it with `attributed_kind` and omit both when they can't attribute. Free-string by design — see [Attribution](#attribution). |
175
174
 
176
175
  ### 6. `bug_post_merge` _(P8 — schema only)_
177
176
 
@@ -194,7 +193,7 @@ from the envelope (optional) and usually absent.
194
193
  | `topic` | string | yes | `free-text` | One-line subject (typo, rename, lockfile bump, etc.). 1-200 chars. |
195
194
  | `reason` | string | yes | `free-text` | Why the harness was bypassed. 1-500 chars (≤ 200 recommended). |
196
195
 
197
- ### 8. `followup_captured` _(#112 — `/spinoff`)_
196
+ ### 8. `followup_captured` _(`/spinoff`)_
198
197
 
199
198
  Emitted by the `/spinoff` command when a non-blocking, independent defect found
200
199
  mid-task is parked as a `harness:managed` + `harness:status:pending` stub. Feeds the
@@ -207,7 +206,7 @@ post-merge / production signal; conflating them would dirty the post-merge metri
207
206
  | `parent_task_id` | string | no | `identity` | The originating task id. **Absent** when `/spinoff` runs outside any active task (a deferred stub). |
208
207
  | `severity` | string | no | `internal-enum` | `low` \| `medium` \| `high` \| `critical`. Best-effort — set only when cheaply inferred. |
209
208
 
210
- ### 9. `step_completed` _(#176 — step-by-step mode)_
209
+ ### 9. `step_completed` _(step-by-step mode)_
211
210
 
212
211
  Emitted by the Orchestrator each time a human checkpoint **resolves** in
213
212
  step-by-step mode (L1 opt-in, chosen at the approval gate). One event per
@@ -216,14 +215,14 @@ when it re-checkpoints, with the same `step`. This is the signal that justifies
216
215
  (or condemns) the mode — the rate of checkpoints that catch things, and where
217
216
  humans bail out (`ok_downgrade`).
218
217
 
219
- | Field | Type | Required | Axis | Notes |
220
- | ------------------- | ------ | -------- | --------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
221
- | `task_id` | string | yes | `identity` | Required for this type. |
222
- | `step` | number | yes | `metric` | 1-based `tasks.md` task number the checkpoint belongs to (≥ 1, int). |
223
- | `review_iterations` | number | yes | `metric` | Reviewer invocations that preceded this checkpoint (≥ 1, int — every step is reviewed before the human; resets after a "changes"). |
224
- | `checkpoint_result` | string | yes | `internal-enum` | `ok` \| `changes` \| `ok_downgrade` (OK and switch the remaining tasks to all-at-once). |
225
- | `attributed_kind` | string | no | `internal-enum` | `agent` \| `skill` \| `playbook` — the kind of component the friction is attributed to (#217). **Omitted when the emitter can't attribute.** |
226
- | `attributed_name` | string | no | `internal-enum` | The component's name (free string, 1-200 chars). Independently optional in the schema; emitters pair it with `attributed_kind` and omit both when they can't attribute (#217). Free-string by design — see [Attribution](#attribution). |
218
+ | Field | Type | Required | Axis | Notes |
219
+ | ------------------- | ------ | -------- | --------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
220
+ | `task_id` | string | yes | `identity` | Required for this type. |
221
+ | `step` | number | yes | `metric` | 1-based `tasks.md` task number the checkpoint belongs to (≥ 1, int). |
222
+ | `review_iterations` | number | yes | `metric` | Reviewer invocations that preceded this checkpoint (≥ 1, int — every step is reviewed before the human; resets after a "changes"). |
223
+ | `checkpoint_result` | string | yes | `internal-enum` | `ok` \| `changes` \| `ok_downgrade` (OK and switch the remaining tasks to all-at-once). |
224
+ | `attributed_kind` | string | no | `internal-enum` | `agent` \| `skill` \| `playbook` — the kind of component the friction is attributed to. **Omitted when the emitter can't attribute.** |
225
+ | `attributed_name` | string | no | `internal-enum` | The component's name (free string, 1-200 chars). Independently optional in the schema; emitters pair it with `attributed_kind` and omit both when they can't attribute. Free-string by design — see [Attribution](#attribution). |
227
226
 
228
227
  ---
229
228
 
@@ -231,7 +230,7 @@ humans bail out (`ok_downgrade`).
231
230
 
232
231
  The two **friction** events — `review_rejected` and `step_completed` — carry an
233
232
  optional `attributed_kind` (`agent` | `skill` | `playbook`) + `attributed_name`
234
- (free string) pair (#217). They answer "_which_ component did this friction come
233
+ (free string) pair. They answer "_which_ component did this friction come
235
234
  from", feeding the "friction attributed to a specific skill/agent" metric. They are
236
235
  set by the emitting prompts (Reviewer, Orchestrator), which list the valid roster and
237
236
  are instructed to **omit both when they can't confidently attribute** — a wrong guess
@@ -243,11 +242,11 @@ from its child `review_rejected` events, not collapsed into one fuzzy culprit.
243
242
  `attributed_name` is a **free string this phase, by design** (measure-then-decide):
244
243
  the CLI does not enforce a component registry. It rides the `internal-enum` axis (kept
245
244
  in both tiers) because it is the moat metric and a bounded roster name, not sensitive
246
- free text (D8) — it is the one non-numeric field that survives the `anonymous`
247
- projection. The aggregation script (#218) reports attribution coverage and flags names
245
+ free text — it is the one non-numeric field that survives the `anonymous`
246
+ projection. The aggregation script reports attribution coverage and flags names
248
247
  outside the known roster — that signal is the thermometer. If it shows the data quality
249
248
  is poor, the cheap next step is enum validation in `lemony emit`; an MCP-backed registry
250
- only if that proves insufficient. Historical lines (pre-#217) lack both fields, so
249
+ only if that proves insufficient. Historical lines from earlier releases lack both fields, so
251
250
  coverage starts near 0% and ramps.
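The "thermometer" described above (attribution coverage plus names outside the known roster) might look roughly like the sketch below. It is an assumption-laden illustration, not the aggregation script itself: `attributionReport`, `FrictionEvent` and the roster passed in are hypothetical, and coverage is taken here to mean the share of friction events carrying both attribution fields.

```typescript
// Hypothetical aggregation sketch; names and the coverage definition are assumptions.
interface FrictionEvent {
  type: string;
  attributed_kind?: "agent" | "skill" | "playbook";
  attributed_name?: string;
}

// The two friction event types named in the schema.
const FRICTION_TYPES = new Set(["review_rejected", "step_completed"]);

function attributionReport(
  events: FrictionEvent[],
  roster: Set<string>,
): { coverage: number; offRoster: string[] } {
  const friction = events.filter((e) => FRICTION_TYPES.has(e.type));
  // Emitters pair the two fields, so only events carrying both count as attributed.
  const attributed = friction.filter(
    (e) => e.attributed_kind !== undefined && e.attributed_name !== undefined,
  );
  const offRoster = attributed
    .map((e) => e.attributed_name as string)
    .filter((name) => !roster.has(name));
  return {
    coverage: friction.length === 0 ? 0 : attributed.length / friction.length,
    offRoster: Array.from(new Set(offRoster)), // de-duplicated
  };
}
```

Historical lines without the two fields simply lower `coverage`, which is consistent with the expectation that it starts near 0% and ramps.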
252
251
 
253
252
  ## Reader contract (forward-only, dispatch-on-read)
@@ -278,9 +277,9 @@ A writer (the `lemony emit` CLI):
278
277
 
279
278
  ## Out of scope (recorded for forward-design)
280
279
 
281
- - Tier 2 central ingestion, export, and aggregation (Fase 1+, decision #24).
280
+ - Tier 2 central ingestion, export, and aggregation (a later phase).
282
281
  - The `project`-tier export branch: the axis table assigns the `identity` /
283
282
  `free-text` policy, but the sanitizer wires only the `anonymous` branch in v1
284
- (D8/D9). The `project` branch lands with the consent/config work (#229).
283
+ The `project` branch lands with the consent/config work.
285
284
  - Backward-compatible field renames — `tier2-events-history.md` records them
286
285
  forward-only; readers add the alias when they care.
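The storage and reader rules in this file (one JSON event per line, readers dispatch on `type` and ignore unknowns) can be sketched as follows. This is a minimal, hypothetical reader: `readEvents` and `countRejectionsByTask` are illustrative names, and silently skipping blank or malformed lines is an assumption rather than documented behaviour.

```typescript
// Hypothetical reader sketch: parse JSONL, dispatch on `type`, ignore unknown types.
interface Envelope {
  type: string;
  [field: string]: unknown;
}

function readEvents(jsonl: string): Envelope[] {
  const events: Envelope[] = [];
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // assumption: a malformed line is skipped, not fatal
    }
    if (
      typeof parsed === "object" &&
      parsed !== null &&
      typeof (parsed as Envelope).type === "string"
    ) {
      events.push(parsed as Envelope);
    }
  }
  return events;
}

// Counts `review_rejected` events per task; every other type falls through untouched.
function countRejectionsByTask(events: Envelope[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const e of events) {
    switch (e.type) {
      case "review_rejected": {
        const id = String(e.task_id);
        counts.set(id, (counts.get(id) ?? 0) + 1);
        break;
      }
      default:
        break; // unknown or irrelevant types are ignored, keeping the reader forward-compatible
    }
  }
  return counts;
}
```

Because the reader never rejects an unrecognised `type`, a file written by a newer harness version would still be readable by this sketch.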
@@ -0,0 +1,121 @@
1
+ ---
2
+ name: a11y-audit
3
+ description: The accessibility-QA review lens for the UI Designer — deterministic-first and tool-orchestrating. Measures contrast from tokens, rides the project's a11y lint, runs axe on the rendered DOM when testable, then judges only what tools can't (focus management, live regions, alt-text quality, reduced motion). Use at REVIEW when a task touched UI, alongside design-critique.
4
+ origin: vendor
5
+ vendor_version: '{{vendor_version}}'
6
+ phase: post-implementation
7
+ invoked-by: [ui-designer]
8
+ ---
9
+
10
+ # Accessibility Audit
11
+
12
+ Judge the **accessibility** of an implemented UI change against the WCAG target the
13
+ `ui-handoff.md` §9 set (default WCAG 2.1 AA). This is the **objective, measurable** half of
14
+ design QA — the subjective half (does it carry the design's point of view?) is the
15
+ `design-critique` skill, run alongside this one. The build twin is the implementer's
16
+ `build-ui/accessibility.md`: this lens checks what that resource asks the implementer to
17
+ build accessible by construction. When you change an axis below, keep its build twin in step.
18
+
19
+ The method is **deterministic-first and tool-orchestrating**: gather evidence with tools
20
+ before you judge, and reserve your own judgment for what no tool can decide. A clear verdict
21
+ must not depend on whether the optional tools are present — the tiers degrade gracefully.
22
+
23
+ ## The tiered method
24
+
25
+ Run the tiers in order; each adds evidence the next builds on.
26
+
27
+ - **T0 — token-pair contrast (always available, deterministic).** Run
28
+ `lemony design-tokens contrast`. It computes the WCAG ratio of every foreground/background
29
+ token pair — including any dark-mode override — and exits non-zero on a pair below its
30
+ floor. This is the one a11y measurement doable offline, because colour comes from the
31
+ token file. It is complementary to `lemony design-tokens validate` (validate proves colour
32
+ is a token reference; contrast proves the pair meets its floor).
33
+ - **T1 — source a11y lint (rides the project).** The project's own linter usually carries an
34
+ accessibility plugin (`eslint-plugin-jsx-a11y`, Svelte's or Vue's a11y rules). Run the
35
+ project's `lint`; don't reimplement it. Read what it flags.
36
+ - **T2 — axe on the rendered DOM (when testable).** If the repo can render components in a
37
+ test (vitest-axe / jest-axe, Storybook + axe, or Playwright + `@axe-core/playwright`), run
38
+ axe over the changed views and read the violations. Browser-driven is simply the strongest
39
+ variant of this tier — opportunistic, **not required**. If nothing can render the DOM, skip
40
+ T2 and say so in the report; the verdict still stands on T0/T1/T3.
41
+ - **T3 — judgment (you).** Decide what the tools cannot: focus management on route and dialog
42
+ changes, live-region announcements, keyboard patterns for custom widgets, meaning not
43
+ carried by colour alone, **alt-text quality** (a tool sees that `alt` exists; only judgment
44
+ sees whether it is meaningful), and reduced-motion. T3 also triages and interprets the
45
+ T1/T2 output against the handoff — a lint rule a project disabled on purpose is not your
46
+ finding.
47
+
48
+ ## The axes — tier coverage plus what judgment adds
49
+
50
+ Each axis names which tier catches it and what remains for T3.
51
+
52
+ ### Semantic HTML first
53
+
54
+ Mostly T1/T2 (a clickable `<div>`, a skipped heading level, missing landmarks surface in
55
+ lint/axe). T3 confirms the right element was used for the meaning, not just a passing rule.
56
+
57
+ ### Colour and contrast
58
+
59
+ T0 measures it from tokens — the deterministic floor (text ≥ 4.5:1, large ≥ 3:1, non-text
60
+ ≥ 3:1), including dark mode. T3 checks the rendered pair where colour is _not_ from a token
61
+ (an image-on-text overlay, a hardcoded value `validate` already flags).
62
+
63
+ ### Keyboard operability
64
+
65
+ T2/axe catches missing focusability; T3 walks the actual order, confirms a **visible** focus
66
+ indicator meets the non-text floor, that there is no keyboard trap, and that focus is managed
67
+ on dynamic change (dialog open returns focus on close; route change moves it sensibly).
68
+
69
+ ### ARIA — only when semantics fall short
70
+
71
+ T1/T2 flag invalid ARIA and `aria-hidden` on focusable content. T3 judges whether ARIA was
72
+ reached for where a native element would have done, and whether state attributes
73
+ (`aria-expanded`, `aria-selected`, `aria-current`) actually track the real state.
74
+
75
+ ### Forms
76
+
77
+ T1/T2 catch an unlabelled field. T3 confirms errors are identified **in text** and linked to
78
+ the field (not colour alone), and required/invalid state is in the accessibility tree.
79
+
80
+ ### Images, icons and media
81
+
82
+ A tool sees that `alt` is present or absent; **T3 judges whether it is meaningful** — a
83
+ decorative image is empty-`alt`, a meaningful one conveys its purpose, an icon-only control
84
+ has an accessible name.
85
+
86
+ ### Motion and time
87
+
88
+ Largely T3: is `prefers-reduced-motion` honoured for non-essential animation (mirrors the
89
+ motion judgment in `design-critique`)? Nothing flashes more than three times a second; no
90
+ tight time limit without a way to extend it.
91
+
92
+ ### Touch and pointer targets
93
+
94
+ T2 can measure target size; T3 confirms interactive targets meet the 24×24 CSS px minimum
95
+ (WCAG 2.2 SC 2.5.8, Level AA) — 44×44 is the WCAG 2.1 AAA enhanced size and the comfortable
96
+ touch target — and adjacent targets are spaced so they aren't mis-tapped.
97
+
98
+ ## Confidence gating
99
+
100
+ Raise a finding when a tool flags it, or when you are **>80% confident** of a T3 judgment
101
+ issue. A measured T0/T1/T2 failure is not a confidence call — report it. Don't pad the report
102
+ with rules the project deliberately disabled.
103
+
104
+ ## Report
105
+
106
+ ```
107
+ ## Accessibility Audit — <task name>
108
+
109
+ **Target**: <WCAG level from ui-handoff §9>
110
+ **T0 contrast**: ✅ pass / ❌ <failing pairs>
111
+ **T1 a11y lint**: ✅ / ❌ <violations> / N/A
112
+ **T2 axe (rendered DOM)**: ✅ / ❌ <violations> / skipped — <why>
113
+ **T3 judgment**: <findings, or "none">
114
+
115
+ **Verdict**: approve / changes requested — <one-line reason>
116
+ ```
117
+
118
+ "Changes requested" routes back to the Implementer (transient — no label). If the handoff §9
119
+ is silent on a genuinely ambiguous decision, raise a discovery (`raise-discovery`) rather
120
+ than guessing. An independent defect unrelated to this change is a side finding
121
+ (`note-side-finding`).
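For readers unfamiliar with what a T0-style contrast check measures, the sketch below computes the WCAG 2.1 contrast ratio of one foreground/background pair and compares it with the floors quoted above (text ≥ 4.5:1, large text ≥ 3:1, non-text ≥ 3:1). It illustrates the published WCAG formula only; it makes no claim about how `lemony design-tokens contrast` is implemented, and `contrastRatio`, `meetsFloor` and `FLOORS` are hypothetical names that assume six-digit `#rrggbb` inputs.

```typescript
// WCAG 2.1 contrast ratio for two sRGB colours given as "#rrggbb".
// Illustrative only; not the lemony CLI's implementation.

// Linearise one 8-bit sRGB channel (WCAG 2.1 relative-luminance definition).
function linearChannel(c8: number): number {
  const c = c8 / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

function relativeLuminance(hex: string): number {
  const n = parseInt(hex.replace("#", ""), 16);
  const r = (n >> 16) & 0xff;
  const g = (n >> 8) & 0xff;
  const b = n & 0xff;
  return 0.2126 * linearChannel(r) + 0.7152 * linearChannel(g) + 0.0722 * linearChannel(b);
}

function contrastRatio(fg: string, bg: string): number {
  const a = relativeLuminance(fg);
  const b = relativeLuminance(bg);
  const lighter = Math.max(a, b);
  const darker = Math.min(a, b);
  return (lighter + 0.05) / (darker + 0.05);
}

// Floors as quoted in the "Colour and contrast" axis.
const FLOORS = { text: 4.5, largeText: 3, nonText: 3 } as const;

function meetsFloor(fg: string, bg: string, kind: keyof typeof FLOORS): boolean {
  return contrastRatio(fg, bg) >= FLOORS[kind];
}
```

A dark-mode override would be checked the same way, by running the pair of overridden values through `meetsFloor`.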
@@ -1,6 +1,6 @@
1
1
  ---
2
2
  name: bootstrap-architecture
3
- description: Author the first docs/architecture.md — a holistic map of the system's current shape, fitted to this project. A one-time bootstrap the Architect runs when a client opts into the architecture capability; thereafter update-architecture maintains it incrementally. The harness only does this when the human asks (decision #8).
3
+ description: Author the first docs/architecture.md — a holistic map of the system's current shape, fitted to this project. A one-time bootstrap the Architect runs when a client opts into the architecture capability; thereafter update-architecture maintains it incrementally. The harness only does this when the human asks.
4
4
  origin: vendor
5
5
  vendor_version: '{{vendor_version}}'
6
6
  invoked-by: [architect]
@@ -23,7 +23,7 @@ the file now existing) keeps it true incrementally.
23
23
  It is **not gated** — it is available before any `docs/architecture.md` exists, because its
24
24
  job is to create the first one. But the harness runs it **only when the human opts in**
25
25
  (via `/add-capability`, or an explicit request), never on its own: the vendor gives the
26
- framework, not the architecture (decision #8). The map is authored **from the client's real
26
+ framework, not the architecture. The map is authored **from the client's real
27
27
  code**, so the harness reflects the project's shape — it never imposes one.
28
28
 
29
29
  ## What to produce
@@ -66,7 +66,7 @@ Return to your invoker: the sections written, a one-line summary of the shape th
66
66
  captures, and the explicit note that `update-architecture` should be activated via `repair`.
67
67
  If the project is too small or too uniform to have a meaningful architecture (a single-purpose
68
68
  script, a flat library), **say so and write nothing** — an architecture map for a project
69
- without architecture is noise, and #8 means the client need not have one.
69
+ without architecture is noise, and the client need not have one.
70
70
 
71
71
  ## Uncontemplated Scenarios
72
72
 
@@ -0,0 +1,147 @@
1
+ ---
2
+ name: build-ui
3
+ description: The implementer's method for building UI from a ui-handoff.md — apply the project's design tokens to whatever stack the repo uses, build without generic "AI slop", and ship accessible by construction. Stack-agnostic — it owns the process and the token contract, not per-framework guides. Use when implementing a task that touches UI and a ui-handoff.md exists.
4
+ origin: vendor
5
+ vendor_version: '{{vendor_version}}'
6
+ phase: during-implementation
7
+ invoked-by: [implementer]
8
+ ---
9
+
10
+ # Build UI
11
+
12
+ The implementer's **build method**: turn a `ui-handoff.md` (the design contract) plus
13
+ `docs/design-tokens.json` (the token source of truth) into UI code that applies the
14
+ project's tokens correctly, carries the design's point of view instead of generic
15
+ defaults, and is accessible by construction.
16
+
17
+ ## What this skill owns — and what it deliberately doesn't
18
+
19
+ It owns the **harness-stable layer**: the tokens-as-code contract, the build process,
20
+ and the stack-agnostic principles. It does **not** reproduce per-framework know-how —
21
+ how to theme Tailwind v4, configure MUI's `createTheme`, or wire vanilla-extract. You
22
+ already know the stacks, and you read `package.json` to confirm which one this repo
23
+ uses. A frozen per-stack guide would rot the moment the ecosystem moves (a Tailwind
24
+ v3→v4 jump turns it into a lie); the live model does not. So this skill names the
25
+ **substrate space** and lets you map any tool — present or future — onto it.
26
+
27
+ ## The process
28
+
29
+ 1. **Read the inputs.** `ui-handoff.md` (under `.claude/state/tasks/<id>/spec/`) is your
30
+ obligatory design input — the dials, screens, components, states, motion, a11y target
31
+ and microcopy. `docs/design-tokens.json` is the token source of truth (§11 of the
32
+ handoff points at it). Read both before writing UI.
33
+ 2. **Identify the substrate.** Read `package.json` and the styling config to place the
34
+ repo in one of the three archetypes below. The archetype, not the tool name, drives
35
+ how you apply tokens.
36
+ 3. **Apply tokens through the substrate.** Build the screens and components from the
37
+ handoff at the direction its dials set — referencing tokens, never raw values.
38
+ 4. **Pull the craft resources as you build.** Read [anti-slop.md](./anti-slop.md) before
39
+ generating any visual code, and [accessibility.md](./accessibility.md) when you build
40
+ anything interactive or visual. They load on demand — don't inline their content here.
41
+ 5. **Self-validate before signalling done.** Run `lemony design-tokens validate` (it flags
42
+ raw colours/dimensions in source that should reference a token) and re-read the
43
+ handoff's dials and targets against what you built.
44
+
45
+ ## Tokens-as-code — the contract
46
+
47
+ `docs/design-tokens.json` is the **single, client-owned source of truth**, a 3-tier
48
+ W3C-DTCG model: **primitive** (raw scale values) → **semantic** (intent: `color.surface`,
49
+ `space.inset.md`) → **component** (a component's specific slots). Two rules hold on every
50
+ stack:
51
+
52
+ - **Reference semantic or component tokens; never primitives, never raw literals.** A
53
+ component reads `color.surface`, not `gray.50` and never `#fff`. Raw values in source
54
+ are what `design-tokens validate` catches.
55
+ - **Dark mode (and any theme) is an override, not a fork.** Swap the values behind the
56
+ semantic layer; the component code stays identical. You never branch component logic on
57
+ the theme.
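The two rules above can be sketched in a few lines. This is a minimal illustration, assuming a flattened token shape in which each semantic token names a primitive; the real `docs/design-tokens.json` layout may differ, and `resolve`, `cardStyle` and the sample values are hypothetical.

```typescript
// Assumed shapes for illustration only.
type Primitives = Record<string, string>; // primitive name -> raw value
type Semantic = Record<string, string>;   // semantic name -> primitive name

const primitives: Primitives = {
  'gray.50': '#fafafa',
  'gray.900': '#1a1a1a',
};

const lightSemantic: Semantic = { 'color.surface': 'gray.50' };

// Dark mode is an override of the semantic layer, not a fork of the component.
const darkSemantic: Semantic = { ...lightSemantic, 'color.surface': 'gray.900' };

function resolve(name: string, semantic: Semantic, prims: Primitives): string {
  const primitiveName = semantic[name];
  if (primitiveName === undefined) throw new Error(`Unknown semantic token: ${name}`);
  const value = prims[primitiveName];
  if (value === undefined) throw new Error(`Unknown primitive: ${primitiveName}`);
  return value;
}

// The component only ever names the semantic token; it never branches on theme.
function cardStyle(semantic: Semantic): { background: string } {
  return { background: resolve('color.surface', semantic, primitives) };
}
```

Calling `cardStyle(lightSemantic)` and `cardStyle(darkSemantic)` runs the same component code against different values, which is the property the override rule is after.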
58
+
59
+ ## Substrate archetypes
60
+
61
+ Identify which one the repo is, then apply tokens that way. Each snippet is
62
+ **illustrative, not normative** — it names the space so you can map the actual tool.
63
+
64
+ **1. CSS custom properties (runtime cascade).** Token → `--var` → `var()`. The cascade
65
+ carries theming; an override re-declares the variable.
66
+
67
+ ```css
68
+ :root {
69
+ --color-surface: #ffffff; /* from semantic token color.surface */
70
+ --space-inset-md: 12px;
71
+ }
72
+ [data-theme='dark'] {
73
+ --color-surface: #1a1a1a; /* dark mode = override the var, the .card below is untouched */
74
+ }
75
+ .card {
76
+ background: var(--color-surface);
77
+ padding: var(--space-inset-md);
78
+ }
79
+ ```
80
+
81
+ _Space: vanilla CSS, Sass, CSS Modules._
82
+
83
+ **2. Central config/theme object a library consumes.** Tokens populate one theme object;
84
+ components read it through the library's API (`className`, `sx`, `styled`), never literals.
85
+
86
+ ```ts
87
+ export const theme = {
88
+ colors: { surface: tokens.color.surface },
89
+ space: { insetMd: tokens.space.inset.md },
90
+ };
91
+ // usage: <Box sx={{ bg: 'surface', p: 'insetMd' }} /> — the library resolves it, no hard-coded values
92
+ ```
93
+
94
+ _Space: Tailwind (`config` / `@theme`), MUI / Chakra / Mantine (`createTheme`),
95
+ styled-components / Emotion._
96
+
97
+ **3. Compiled type-safe contract (zero-runtime CSS-in-TS).** Tokens become a typed
98
+ contract resolved at build; a missing token is a compile error, not a silent fallback.
99
+
100
+ ```ts
101
+ export const vars = createThemeContract({
102
+ color: { surface: null },
103
+ space: { insetMd: null },
104
+ });
105
+ export const light = createTheme(vars, {
106
+ color: { surface: tokens.color.surface },
107
+ space: { insetMd: tokens.space.inset.md },
108
+ });
109
+ // components consume `vars.color.surface`; the type system rejects an unknown token
110
+ ```
111
+
112
+ _Space: vanilla-extract, Panda CSS, StyleX, Linaria._
113
+
114
+ **Orthogonal axis — static-build ↔ dynamic-runtime theming.** Independent of the
115
+ archetype: a single fixed theme can compile away entirely, while multi-tenant / white-label
116
+ products need tokens to live as CSS variables or context so they switch at runtime. Pick the
117
+ side the product needs; it cross-cuts all three archetypes.
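For the dynamic-runtime side, one possible approach is to emit the semantic tokens as CSS custom properties under a selector per theme. The sketch below assumes a flat map of dotted token names to resolved values; `toCssVarName` and `toCssBlock` are hypothetical helpers, not part of any tool named above.

```typescript
type TokenMap = Record<string, string>; // assumed: dotted token name -> resolved value

// 'space.inset.md' becomes '--space-inset-md'
function toCssVarName(tokenName: string): string {
  return '--' + tokenName.replace(/\./g, '-');
}

// Builds one rule that declares every token as a custom property.
function toCssBlock(selector: string, tokens: TokenMap): string {
  const declarations = Object.entries(tokens)
    .map(([name, value]) => `  ${toCssVarName(name)}: ${value};`)
    .join('\n');
  return `${selector} {\n${declarations}\n}`;
}

const lightCss = toCssBlock(':root', { 'color.surface': '#ffffff' });
const darkCss = toCssBlock("[data-theme='dark']", { 'color.surface': '#1a1a1a' });
```

Each tenant or theme then contributes only a block of overrides, and component styles keep reading `var(--color-surface)` unchanged.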
118
+
119
+ **Out of scope for now — native substrates.** React Native / Compose / SwiftUI map tokens
120
+ to a native style object; recognised as a fourth archetype but not covered here yet.
121
+
122
+ ## When the handoff contradicts reality
123
+
124
+ If building reveals that the `ui-handoff.md` contradicts the codebase, is missing a
125
+ decision with more than one valid answer, or asks for something that already exists, **do
126
+ not improvise** — raise it through the implementer's discovery channel so the designer or
127
+ the human resolves it, then resume. An independent defect unrelated to your change is a
128
+ side finding, not a blocker.
129
+
130
+ ## Cross-references
131
+
132
+ ```
133
+ grill-ui → ui-handoff.md → implementer (build-ui) → REVIEW (design-critique + a11y-audit) → merge gate
134
+ ```
135
+
136
+ You consume `ui-handoff.md`; the REVIEW design lens checks what you built against it. The
137
+ `design-critique` skill mirrors [anti-slop.md](./anti-slop.md)'s principles and `a11y-audit`
138
+ mirrors [accessibility.md](./accessibility.md)'s — when you change a principle here, keep its
139
+ review twin in step.
140
+
141
+ ---
142
+
143
+ See also:
144
+
145
+ - [anti-slop.md](./anti-slop.md) — the craft layer: how to build the design's point of view
146
+ into the code instead of generic defaults.
147
+ - [accessibility.md](./accessibility.md) — WCAG 2.1 AA implementation patterns.
@@ -0,0 +1,101 @@
1
+ # Accessibility — building to WCAG 2.1 AA
2
+
3
+ Load this when you implement anything interactive or visual. The `ui-handoff.md` §9 sets
4
+ the **target** (default WCAG 2.1 AA) and any task-specific decisions; this resource is the
5
+ **implementation** layer — how to build it accessible by construction, so the `a11y-audit`
6
+ review lens finds nothing to reject. Accessibility is not a pass you bolt on at the end; it
7
+ is a property of how the markup, focus, colour and motion are built.
8
+
9
+ ## Semantic HTML first
10
+
11
+ The fastest path to accessible UI is the right element. A native `<button>`, `<a>`,
12
+ `<label>`, `<nav>`, `<main>`, `<ul>` carries role, keyboard behaviour and focus for free.
13
+
14
+ - **Use the element that means what you mean.** A clickable `<div>` is a bug — it has no
15
+ role, no keyboard handler, no focusability. Use `<button>` for actions, `<a href>` for
16
+ navigation.
17
+ - **One `<h1>` per view, headings in order.** Don't skip levels for visual sizing — size
18
+ with tokens, structure with heading level. Screen-reader users navigate by heading.
19
+ - **Landmarks frame the page.** `<header>`, `<nav>`, `<main>`, `<footer>` give assistive
20
+ tech a map. Reach for ARIA roles only to fill a gap native elements can't.
21
+
22
+ ## Colour and contrast
23
+
24
+ This is the objective floor — measured, not judged.
25
+
26
+ - **Text contrast ≥ 4.5:1** against its background; **large text (≥ 24px, or ≥ 19px bold)
27
+ ≥ 3:1**.
28
+ - **Non-text contrast ≥ 3:1** for UI component boundaries, icons that carry meaning, and
29
+ the visible focus indicator.
30
+ - **Never encode meaning in colour alone.** Pair colour with an icon, label or pattern — a
31
+ red border needs an error message, a green dot needs a "online" label.
32
+ - Because colour comes from semantic tokens, contrast is a property of the token pairing;
33
+ check the actual rendered pair, including in dark mode.
34
+
35
+ ## Keyboard operability
36
+
37
+ Everything that works with a mouse must work with a keyboard alone.
38
+
39
+ - **All interactive elements are reachable and operable by keyboard**, in a logical Tab
40
+ order that follows the visual/reading order. Native elements give you this; custom
41
+ widgets need `tabindex` and key handlers.
42
+ - **A visible focus indicator on every focusable element** — never `outline: none` without
43
+ a clearly visible replacement that meets the 3:1 non-text contrast floor.
44
+ - **No keyboard traps.** Focus can always move on; if you trap it deliberately (a modal),
45
+ provide the documented way out (Escape) and restore focus on close.
46
+ - **Manage focus on dynamic change.** Opening a dialog moves focus into it and contains it;
47
+ closing returns focus to the trigger. Route changes move focus to a sensible anchor.
48
+ - **Honour expected keys** for the widget pattern (Enter/Space activate, arrows move within
49
+ a composite like tabs or a menu, Escape dismisses).
50
+
51
+ ## ARIA — only when semantics fall short
52
+
53
+ The first rule of ARIA is don't use ARIA when a native element would do. When you do need
54
+ it:
55
+
56
+ - **Name every control.** A visible `<label>` (associated via `for`/`id`), or
57
+ `aria-label` / `aria-labelledby` when there is no visible text (icon-only buttons).
58
+ - **Reflect state, don't fake it.** `aria-expanded`, `aria-selected`, `aria-checked`,
59
+ `aria-disabled`, `aria-current` must track the real state.
60
+ - **Announce async change with live regions.** A status message, toast, or validation
61
+ result that appears without a focus change needs `aria-live` (`polite` for status,
62
+ `assertive` only for errors) so it is announced.
63
+ - **Don't break the accessibility tree.** `aria-hidden` on focusable content, redundant
64
+ roles, or mislabelled landmarks are worse than no ARIA.
65
+
66
+ ## Forms
67
+
68
+ - **Every field has a programmatically-associated label** — not a placeholder standing in
69
+ for one (placeholders vanish on input and fail contrast).
70
+ - **Errors are identified in text, linked to the field** (`aria-describedby`), and not by
71
+ colour alone. Group related fields with `<fieldset>`/`<legend>`.
72
+ - **Required and invalid states are conveyed in the accessibility tree** (`aria-required`,
73
+ `aria-invalid`), not only visually.
74
+
75
+ ## Images, icons and media
76
+
77
+ - **Meaningful images need a text alternative** (`alt`) that conveys their purpose;
78
+ **decorative images get empty `alt=""`** so they are skipped.
79
+ - **Icon-only controls need an accessible name** (see ARIA above); a standalone meaningful
80
+ icon needs a text equivalent.
81
+
82
+ ## Motion and time
83
+
84
+ - **Respect `prefers-reduced-motion`** — provide a reduced or no-motion path for non-essential
85
+ animation. This mirrors the motion guidance in [anti-slop.md](./anti-slop.md).
86
+ - **No content that flashes more than three times per second.**
87
+ - **Don't impose tight time limits**; if one exists, let the user extend or disable it.
88
+
89
+ ## Touch and pointer targets
90
+
91
+ - **Interactive targets are large enough to hit** — a minimum of 24×24 CSS px (WCAG 2.2 AA,
92
+ SC 2.5.8); 44×44 is the WCAG 2.1 AAA enhanced size and the comfortable touch target. Space
93
+ adjacent targets so they aren't mis-tapped.
94
+
95
+ ## Self-check before done
96
+
97
+ Walk the change with these, matching the `a11y-audit` review lens so nothing comes back:
98
+ keyboard-only traversal reaches and operates everything with visible focus; measured
99
+ contrast meets the floor (including dark mode); every control has an accessible name; async
100
+ updates announce; reduced-motion is honoured; targets meet the size floor. Where a decision
101
+ is genuinely ambiguous, it belonged in the handoff §9 — raise it rather than guess.