@lemoncode/lemony 0.1.0 → 0.1.1-alpha.1
- package/NOTICE +39 -0
- package/README.md +0 -1
- package/catalog/VERSION +1 -1
- package/catalog/agents/architect.md +4 -4
- package/catalog/agents/fit-assessment.md +1 -1
- package/catalog/agents/implementer.md +15 -8
- package/catalog/agents/orchestrator.md +204 -36
- package/catalog/agents/reviewer.md +7 -7
- package/catalog/agents/spec-author.md +7 -4
- package/catalog/agents/ui-designer.md +121 -15
- package/catalog/commands/add-capability.md +3 -3
- package/catalog/commands/resume.md +10 -4
- package/catalog/commands/spinoff.md +2 -2
- package/catalog/commands/sync-design-tokens.md +29 -0
- package/catalog/harness.config.schema.json +15 -16
- package/catalog/hooks/init.sh +11 -11
- package/catalog/hooks/lib/lemony.sh +3 -3
- package/catalog/hooks/lib/playbook-scan.sh +10 -11
- package/catalog/hooks/session-close.sh +7 -7
- package/catalog/schemas/tier2-events-history.md +11 -11
- package/catalog/schemas/tier2-events.md +46 -47
- package/catalog/skills/a11y-audit/SKILL.md +121 -0
- package/catalog/skills/bootstrap-architecture/SKILL.md +3 -3
- package/catalog/skills/build-ui/SKILL.md +147 -0
- package/catalog/skills/build-ui/accessibility.md +101 -0
- package/catalog/skills/build-ui/anti-slop.md +107 -0
- package/catalog/skills/code-explorer/SKILL.md +1 -1
- package/catalog/skills/design-critique/SKILL.md +110 -0
- package/catalog/skills/design-tool-sync/SKILL.md +120 -0
- package/catalog/skills/grill-ui/SKILL.md +248 -0
- package/catalog/skills/grill-ui/ui-handoff-format.md +149 -0
- package/catalog/skills/grill-with-docs/SKILL.md +9 -2
- package/catalog/skills/mutation-testing/SKILL.md +1 -1
- package/catalog/skills/note-side-finding/SKILL.md +1 -1
- package/catalog/skills/playbook-iterate/SKILL.md +2 -2
- package/catalog/skills/review-pr/SKILL.md +3 -3
- package/catalog/skills/task-closeout/SKILL.md +9 -8
- package/catalog/skills/update-architecture/SKILL.md +3 -3
- package/catalog/templates/claude-code/agents.md.tpl +27 -18
- package/catalog/templates/claude-code/docs/playbooks/README.md.tpl +1 -3
- package/catalog/templates/claude-code/harness.config.yml.tpl +8 -9
- package/dist/cli.mjs +1287 -1676
- package/package.json +13 -4
- package/catalog/agents/README.md +0 -29
- package/catalog/hooks/README.md +0 -56
- package/catalog/playbook-format.md +0 -198
- package/catalog/schemas/README.md +0 -13
- package/catalog/skills/README.md +0 -62
- package/catalog/templates/README.md +0 -32
@@ -8,17 +8,16 @@
 > dispatch-on-read).
 
 Tier 1 (client-local) writes append-only JSONL from day one so the data is
-forward-compatible with the Tier 2 central backend
-\#24, #25, #27, #51).
+forward-compatible with the Tier 2 central backend planned for a later phase.
 
 ## Storage
 
 - One event per line, UTF-8, in `.claude/state/events.jsonl`. The stream is
-  **local-only and gitignored — never committed
-
-
-
-
-- **Append-only, except confirmed-sent prefix-prune
+  **local-only and gitignored — never committed**: it sits in the managed
+  `GITIGNORE_BLOCK` beside `current-*.md` / `sessions/`. There is no Tier 2
+  consumer yet, so committing only dirtied the base; transport to Tier 2 is a
+  planned sink.
+- **Append-only, except confirmed-sent prefix-prune.**
   Emitters only ever append. The send engine may **collapse the already-delivered prefix**
   (`[0:cursor]`) once it exceeds ~5MB — never dropping unsent bytes — which rewrites the
   file and may **reorder** the unsent tail relative to concurrent appends. Order is not a
@@ -41,7 +40,7 @@ same top level — there is no nested `payload`, so Zod discriminated unions key
 | ----------------- | ------ | -------- | --------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `type` | string | yes | `internal-enum` | One of the 9 event types listed below. Discriminator. |
 | `ts` | string | yes | `metric` | UTC ISO 8601 with `Z` suffix (e.g. `2026-05-28T14:30:00.000Z`). **No local offsets.** |
-| `user` | string | yes | `local-only` | `git config user.email` of the actor
+| `user` | string | yes | `local-only` | `git config user.email` of the actor. Never exported in any tier. |
 | `project` | string | yes | `identity` | `task_storage.repo` slug (e.g. `acme/widgets`), from `harness.config.yml`. **Never `OWNER/REPO`** — the CLI refuses to emit while that placeholder is the value (see [Placeholder guard](#placeholder-guard)). |
 | `task_id` | string | no | `identity` | Task issue id (e.g. `42`) when the event has a task context. Absent on session/global events. A per-project correlator — only meaningful alongside `project`, so it shares the `identity` axis. |
 | `harness_version` | string | yes | `metric` | `version` of the **installed** `@lemoncode/lemony` package — _not_ `vendor_version` from config. |
@@ -94,12 +93,12 @@ only `internal-enum` + `metric`; **`project`** (opt-in, for dogfood) additionall
 keeps `identity` + `free-text`. `local-only` is dropped in both. `identity` and
 `free-text` share today's policy but stay distinct axes for future divergence (a
 hashed `user_hash` would be `identity`-with-hashing, not `free-text`). v1 wires only
-the `anonymous` branch
+the `anonymous` branch.
 
 `attributed_name` is deliberately an `internal-enum` even though Zod types it as a
 bounded free string: the axis is policy-oriented (keep-always), and the field is the
-moat metric (
-across all installs, not sensitive free text
+moat metric ("which component causes friction") — a roster component name shared
+across all installs, not sensitive free text. The aggregation script flags names
 outside the known roster as the data-quality thermometer (see [Attribution](#attribution)).
 
 ---
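Reviewer sketch (not part of the package): the axis policy in the hunk above can be read as a per-tier allow-list. The snippet below is a minimal illustration under that reading. The axis names and the field-to-axis pairs are taken from the schema tables in this diff; `sanitizeForTier`, `FIELD_AXIS` and `KEPT_AXES` are hypothetical names, and dropping fields with no known axis is an assumption, not documented behaviour.

```typescript
// Axis and tier names come from the schema text; everything else is illustrative.
type Axis = "internal-enum" | "metric" | "identity" | "free-text" | "local-only";
type Tier = "anonymous" | "project";

// Partial field-to-axis map, copied from the tables shown in this diff.
const FIELD_AXIS = new Map<string, Axis>([
  ["type", "internal-enum"],
  ["ts", "metric"],
  ["user", "local-only"],
  ["project", "identity"],
  ["task_id", "identity"],
  ["harness_version", "metric"],
  ["reason", "free-text"],
  ["attributed_kind", "internal-enum"],
  ["attributed_name", "internal-enum"],
]);

// anonymous keeps internal-enum + metric; project additionally keeps
// identity + free-text; local-only is kept by neither tier.
const KEPT_AXES: Record<Tier, ReadonlySet<Axis>> = {
  anonymous: new Set<Axis>(["internal-enum", "metric"]),
  project: new Set<Axis>(["internal-enum", "metric", "identity", "free-text"]),
};

// Hypothetical helper: copy only the fields whose axis the tier allows.
function sanitizeForTier(
  event: Record<string, unknown>,
  tier: Tier,
): Record<string, unknown> {
  const kept: Record<string, unknown> = {};
  for (const [field, value] of Object.entries(event)) {
    const axis = FIELD_AXIS.get(field);
    // Assumption: a field with no known axis is dropped rather than exported.
    if (axis !== undefined && KEPT_AXES[tier].has(axis)) {
      kept[field] = value;
    }
  }
  return kept;
}
```

Keeping the policy in one table rather than in per-field branches is what lets `identity` and `free-text` diverge later without touching the loop.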
@@ -108,8 +107,8 @@ outside the known roster as the data-quality thermometer (see [Attribution](#att
 
 Five are emitted in P5. `bug_post_merge` is deferred to P8 (meta-test). `l3_bypass`
 is deferred to P6 (the `/bypass` command). `followup_captured` is emitted by the
-`/spinoff` command
-step-by-step mode
+`/spinoff` command. `step_completed` is emitted by the Orchestrator in
+step-by-step mode. The schema covers all nine so the file is
 forward-compatible — readers dispatch on `type` and ignore unknowns.
 
 ### 1. `session_closed` _(P5)_
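Reviewer sketch (not part of the package): "readers dispatch on `type` and ignore unknowns" could look roughly like the reader below. It is a minimal illustration, assuming one JSON object per line as the Storage section describes. `readEvents` and its handler map are hypothetical names, and skipping malformed lines instead of failing is an assumption this diff does not confirm.

```typescript
type EventLine = { type: string; [field: string]: unknown };
type Handler = (event: EventLine) => void;

// Hypothetical reader: calls the handler registered for each line's `type`
// and returns how many non-blank lines it skipped (unknown type or unparsable).
function readEvents(jsonl: string, handlers: Map<string, Handler>): number {
  let skipped = 0;
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue; // blank lines carry no event
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      skipped++; // assumption: tolerate a malformed line rather than abort
      continue;
    }
    if (
      typeof parsed !== "object" ||
      parsed === null ||
      typeof (parsed as { type?: unknown }).type !== "string"
    ) {
      skipped++;
      continue;
    }
    const event = parsed as EventLine;
    const handler = handlers.get(event.type);
    if (handler) {
      handler(event);
    } else {
      skipped++; // unknown type: ignored, which keeps old readers forward-compatible
    }
  }
  return skipped;
}
```

Because order is not guaranteed after a prefix-prune, a handler written this way should not depend on the position of a line in the file.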
@@ -150,28 +149,28 @@ Emitted by the Orchestrator when it transitions `spec-in-progress → spec-ready
 Emitted by the Orchestrator at closeout (after `gh pr view` confirms `MERGED`,
 before `git rm` of the task state).
 
-| Field | Type | Required | Axis | Notes
-| ------------------- | ------ | -------- | --------------- |
-| `task_id` | string | yes | `identity` | Required for this type.
-| `level` | string | yes | `internal-enum` | `L1` \| `L2` \| `L3` — the task-fit dial value used.
-| `cycle_time_h` | number | yes | `metric` | Wall-clock hours from issue creation to merge. ≥ 0, finite.
-| `review_rejections` | number | yes | `metric` | Count of `review_rejected` events for this `task_id` (≥ 0, int).
-| `mode` | string | no | `internal-enum` | `all_at_once` \| `step_by_step` — the mode chosen at the L1 approval gate
-| `steps` | number | no | `metric` | Count of `step_completed` events for this task (≥ 1, int). Only meaningful when `mode` is `step_by_step`; < total tasks after a mid-task downgrade.
+| Field | Type | Required | Axis | Notes |
+| ------------------- | ------ | -------- | --------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `task_id` | string | yes | `identity` | Required for this type. |
+| `level` | string | yes | `internal-enum` | `L1` \| `L2` \| `L3` — the task-fit dial value used. |
+| `cycle_time_h` | number | yes | `metric` | Wall-clock hours from issue creation to merge. ≥ 0, finite. |
+| `review_rejections` | number | yes | `metric` | Count of `review_rejected` events for this `task_id` (≥ 0, int). |
+| `mode` | string | no | `internal-enum` | `all_at_once` \| `step_by_step` — the mode chosen at the L1 approval gate. **Absent on L2** (the question only exists where `tasks.md` does). |
+| `steps` | number | no | `metric` | Count of `step_completed` events for this task (≥ 1, int). Only meaningful when `mode` is `step_by_step`; < total tasks after a mid-task downgrade. |
 
 ### 5. `review_rejected` _(P5)_
 
-Emitted by the Reviewer when the verdict is REJECT (
-
+Emitted by the Reviewer when the verdict is REJECT (transient state — no
+dedicated label).
 
-| Field | Type | Required | Axis | Notes
-| ----------------- | ------ | -------- | --------------- |
-| `task_id` | string | yes | `identity` | Required for this type.
-| `reason` | string | yes | `free-text` | Short human-readable reason (one line; never the full review comment). 1-500 chars.
-| `iteration` | number | yes | `metric` | 1-based: the Nth rejection of this task (≥ 1, int).
-| `step` | number | no | `metric` | The step (1-based `tasks.md` task number) whose per-step review rejected
-| `attributed_kind` | string | no | `internal-enum` | `agent` \| `skill` \| `playbook` — the kind of component the friction is attributed to
-| `attributed_name` | string | no | `internal-enum` | The component's name (free string, 1-200 chars), e.g. `implementer`. Independently optional in the schema; emitters pair it with `attributed_kind` and omit both when they can't attribute
+| Field | Type | Required | Axis | Notes |
+| ----------------- | ------ | -------- | --------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `task_id` | string | yes | `identity` | Required for this type. |
+| `reason` | string | yes | `free-text` | Short human-readable reason (one line; never the full review comment). 1-500 chars. |
+| `iteration` | number | yes | `metric` | 1-based: the Nth rejection of this task (≥ 1, int). |
+| `step` | number | no | `metric` | The step (1-based `tasks.md` task number) whose per-step review rejected. **Absent** on full-pass and all-at-once rejections. |
+| `attributed_kind` | string | no | `internal-enum` | `agent` \| `skill` \| `playbook` — the kind of component the friction is attributed to. **Omitted when the emitter can't attribute.** |
+| `attributed_name` | string | no | `internal-enum` | The component's name (free string, 1-200 chars), e.g. `implementer`. Independently optional in the schema; emitters pair it with `attributed_kind` and omit both when they can't attribute. Free-string by design — see [Attribution](#attribution). |
 
 ### 6. `bug_post_merge` _(P8 — schema only)_
 
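Reviewer sketch (not part of the package): the constraints in the closeout table above (the one listing `level`, `cycle_time_h`, `review_rejections`, `mode` and `steps`) can be spelled out as plain checks. The schema text says the package itself uses Zod; this version uses hand-written checks only so that it runs without dependencies. `CloseoutFields` and `closeoutErrors` are hypothetical names, and the event's own `type` value is deliberately left out because it is not visible in this hunk.

```typescript
// Type-specific fields of the closeout event, as listed in the table.
interface CloseoutFields {
  task_id: string;
  level: "L1" | "L2" | "L3";
  cycle_time_h: number; // hours, >= 0 and finite
  review_rejections: number; // integer >= 0
  mode?: "all_at_once" | "step_by_step";
  steps?: number; // integer >= 1
}

// True when v is an integer no smaller than min.
const isCount = (v: unknown, min: number): boolean =>
  typeof v === "number" && Number.isInteger(v) && v >= min;

// Hypothetical validator: returns one message per violated constraint.
function closeoutErrors(e: Record<string, unknown>): string[] {
  const errors: string[] = [];
  const { task_id, level, cycle_time_h, review_rejections, mode, steps } = e;

  if (typeof task_id !== "string") errors.push("task_id: required string");
  if (level !== "L1" && level !== "L2" && level !== "L3") {
    errors.push("level: must be L1, L2 or L3");
  }
  if (typeof cycle_time_h !== "number" || !Number.isFinite(cycle_time_h) || cycle_time_h < 0) {
    errors.push("cycle_time_h: must be a finite number >= 0");
  }
  if (!isCount(review_rejections, 0)) {
    errors.push("review_rejections: must be an integer >= 0");
  }
  if (mode !== undefined && mode !== "all_at_once" && mode !== "step_by_step") {
    errors.push("mode: must be all_at_once or step_by_step when present");
  }
  if (steps !== undefined && !isCount(steps, 1)) {
    errors.push("steps: must be an integer >= 1 when present");
  }
  return errors;
}
```

Returning a list of messages rather than throwing on the first problem makes it easier to report every bad field of a line in one pass.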
@@ -194,7 +193,7 @@ from the envelope (optional) and usually absent.
 | `topic` | string | yes | `free-text` | One-line subject (typo, rename, lockfile bump, etc.). 1-200 chars. |
 | `reason` | string | yes | `free-text` | Why the harness was bypassed. 1-500 chars (≤ 200 recommended). |
 
-### 8. `followup_captured` _(
+### 8. `followup_captured` _(`/spinoff`)_
 
 Emitted by the `/spinoff` command when a non-blocking, independent defect found
 mid-task is parked as a `harness:managed` + `harness:status:pending` stub. Feeds the
@@ -207,7 +206,7 @@ post-merge / production signal; conflating them would dirty the post-merge metri
 | `parent_task_id` | string | no | `identity` | The originating task id. **Absent** when `/spinoff` runs outside any active task (a deferred stub). |
 | `severity` | string | no | `internal-enum` | `low` \| `medium` \| `high` \| `critical`. Best-effort — set only when cheaply inferred. |
 
-### 9. `step_completed` _(
+### 9. `step_completed` _(step-by-step mode)_
 
 Emitted by the Orchestrator each time a human checkpoint **resolves** in
 step-by-step mode (L1 opt-in, chosen at the approval gate). One event per
@@ -216,14 +215,14 @@ when it re-checkpoints, with the same `step`. This is the signal that justifies
 (or condemns) the mode — the rate of checkpoints that catch things, and where
 humans bail out (`ok_downgrade`).
 
-| Field | Type | Required | Axis | Notes
-| ------------------- | ------ | -------- | --------------- |
-| `task_id` | string | yes | `identity` | Required for this type.
-| `step` | number | yes | `metric` | 1-based `tasks.md` task number the checkpoint belongs to (≥ 1, int).
-| `review_iterations` | number | yes | `metric` | Reviewer invocations that preceded this checkpoint (≥ 1, int — every step is reviewed before the human; resets after a "changes").
-| `checkpoint_result` | string | yes | `internal-enum` | `ok` \| `changes` \| `ok_downgrade` (OK and switch the remaining tasks to all-at-once).
-| `attributed_kind` | string | no | `internal-enum` | `agent` \| `skill` \| `playbook` — the kind of component the friction is attributed to
-| `attributed_name` | string | no | `internal-enum` | The component's name (free string, 1-200 chars). Independently optional in the schema; emitters pair it with `attributed_kind` and omit both when they can't attribute
+| Field | Type | Required | Axis | Notes |
+| ------------------- | ------ | -------- | --------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `task_id` | string | yes | `identity` | Required for this type. |
+| `step` | number | yes | `metric` | 1-based `tasks.md` task number the checkpoint belongs to (≥ 1, int). |
+| `review_iterations` | number | yes | `metric` | Reviewer invocations that preceded this checkpoint (≥ 1, int — every step is reviewed before the human; resets after a "changes"). |
+| `checkpoint_result` | string | yes | `internal-enum` | `ok` \| `changes` \| `ok_downgrade` (OK and switch the remaining tasks to all-at-once). |
+| `attributed_kind` | string | no | `internal-enum` | `agent` \| `skill` \| `playbook` — the kind of component the friction is attributed to. **Omitted when the emitter can't attribute.** |
+| `attributed_name` | string | no | `internal-enum` | The component's name (free string, 1-200 chars). Independently optional in the schema; emitters pair it with `attributed_kind` and omit both when they can't attribute. Free-string by design — see [Attribution](#attribution). |
 
 ---
 
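Reviewer sketch (not part of the package): the hunk above says `step_completed` exists to show "the rate of checkpoints that catch things, and where humans bail out". One way to compute that signal is a share per `checkpoint_result`, as below. `checkpointRates` is a hypothetical name and this diff does not show how the real aggregation script works; the three result values are the ones listed in the table.

```typescript
type CheckpointResult = "ok" | "changes" | "ok_downgrade";

// Only the field this calculation needs; real events carry more.
interface StepCompleted {
  checkpoint_result: CheckpointResult;
}

// Hypothetical aggregate: fraction of checkpoints per result.
// Returns all zeros for an empty input to avoid dividing by zero.
function checkpointRates(events: StepCompleted[]): Record<CheckpointResult, number> {
  const counts: Record<CheckpointResult, number> = { ok: 0, changes: 0, ok_downgrade: 0 };
  for (const event of events) {
    counts[event.checkpoint_result] += 1;
  }
  const total = events.length;
  if (total === 0) return counts;
  return {
    ok: counts.ok / total,
    changes: counts.changes / total, // checkpoints that "caught something"
    ok_downgrade: counts.ok_downgrade / total, // where humans left the mode
  };
}
```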
@@ -231,7 +230,7 @@ humans bail out (`ok_downgrade`).
 
 The two **friction** events — `review_rejected` and `step_completed` — carry an
 optional `attributed_kind` (`agent` | `skill` | `playbook`) + `attributed_name`
-(free string) pair
+(free string) pair. They answer "_which_ component did this friction come
 from", feeding the "friction attributed to a specific skill/agent" metric. They are
 set by the emitting prompts (Reviewer, Orchestrator), which list the valid roster and
 are instructed to **omit both when they can't confidently attribute** — a wrong guess
@@ -243,11 +242,11 @@ from its child `review_rejected` events, not collapsed into one fuzzy culprit.
 `attributed_name` is a **free string this phase, by design** (measure-then-decide):
 the CLI does not enforce a component registry. It rides the `internal-enum` axis (kept
 in both tiers) because it is the moat metric and a bounded roster name, not sensitive
-free text
-projection. The aggregation script
+free text — it is the one non-numeric field that survives the `anonymous`
+projection. The aggregation script reports attribution coverage and flags names
 outside the known roster — that signal is the thermometer. If it shows the data quality
 is poor, the cheap next step is enum validation in `lemony emit`; an MCP-backed registry
-only if that proves insufficient. Historical lines
+only if that proves insufficient. Historical lines from earlier releases lack both fields, so
 coverage starts near 0% and ramps.
 
 ## Reader contract (forward-only, dispatch-on-read)
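Reviewer sketch (not part of the package): the hunk above says the aggregation script "reports attribution coverage and flags names outside the known roster". A minimal version of those two numbers might look like this. `attributionReport` is a hypothetical name, the roster passed in is whatever the caller considers valid, and counting an event as attributed only when both fields are present is an assumption based on the "omit both" rule.

```typescript
interface FrictionEvent {
  attributed_kind?: "agent" | "skill" | "playbook";
  attributed_name?: string;
}

interface AttributionReport {
  coverage: number; // share of friction events carrying both fields, 0..1
  offRoster: string[]; // distinct attributed names not in the roster, sorted
}

// Hypothetical aggregate over review_rejected / step_completed events.
function attributionReport(
  events: FrictionEvent[],
  roster: ReadonlySet<string>,
): AttributionReport {
  let attributed = 0;
  const offRoster = new Set<string>();
  for (const event of events) {
    // Assumption: an event counts as attributed only when the pair is complete.
    if (event.attributed_kind !== undefined && event.attributed_name !== undefined) {
      attributed += 1;
      if (!roster.has(event.attributed_name)) {
        offRoster.add(event.attributed_name);
      }
    }
  }
  return {
    coverage: events.length === 0 ? 0 : attributed / events.length,
    offRoster: Array.from(offRoster).sort(),
  };
}
```

Historical lines without either field simply lower `coverage`, which matches the "starts near 0% and ramps" expectation in the text.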
@@ -278,9 +277,9 @@ A writer (the `lemony emit` CLI):
 
 ## Out of scope (recorded for forward-design)
 
-- Tier 2 central ingestion, export, and aggregation (
+- Tier 2 central ingestion, export, and aggregation (a later phase).
 - The `project`-tier export branch: the axis table assigns the `identity` /
   `free-text` policy, but the sanitizer wires only the `anonymous` branch in v1
-
+  The `project` branch lands with the consent/config work.
 - Backward-compatible field renames — `tier2-events-history.md` records them
   forward-only; readers add the alias when they care.
@@ -0,0 +1,121 @@
+---
+name: a11y-audit
+description: The accessibility-QA review lens for the UI Designer — deterministic-first and tool-orchestrating. Measures contrast from tokens, rides the project's a11y lint, runs axe on the rendered DOM when testable, then judges only what tools can't (focus management, live regions, alt-text quality, reduced motion). Use at REVIEW when a task touched UI, alongside design-critique.
+origin: vendor
+vendor_version: '{{vendor_version}}'
+phase: post-implementation
+invoked-by: [ui-designer]
+---
+
+# Accessibility Audit
+
+Judge the **accessibility** of an implemented UI change against the WCAG target the
+`ui-handoff.md` §9 set (default WCAG 2.1 AA). This is the **objective, measurable** half of
+design QA — the subjective half (does it carry the design's point of view?) is the
+`design-critique` skill, run alongside this one. The build twin is the implementer's
+`build-ui/accessibility.md`: this lens checks what that resource asks the implementer to
+build accessible by construction. When you change an axis below, keep its build twin in step.
+
+The method is **deterministic-first and tool-orchestrating**: gather evidence with tools
+before you judge, and reserve your own judgment for what no tool can decide. A clear verdict
+must not depend on whether the optional tools are present — the tiers degrade gracefully.
+
+## The tiered method
+
+Run the tiers in order; each adds evidence the next builds on.
+
+- **T0 — token-pair contrast (always available, deterministic).** Run
+  `lemony design-tokens contrast`. It computes the WCAG ratio of every foreground/background
+  token pair — including any dark-mode override — and exits non-zero on a pair below its
+  floor. This is the one a11y measurement doable offline, because colour comes from the
+  token file. It is complementary to `lemony design-tokens validate` (validate proves colour
+  is a token reference; contrast proves the pair meets its floor).
+- **T1 — source a11y lint (rides the project).** The project's own linter usually carries an
+  accessibility plugin (`eslint-plugin-jsx-a11y`, Svelte's or Vue's a11y rules). Run the
+  project's `lint`; don't reimplement it. Read what it flags.
+- **T2 — axe on the rendered DOM (when testable).** If the repo can render components in a
+  test (vitest-axe / jest-axe, Storybook + axe, or Playwright + `@axe-core/playwright`), run
+  axe over the changed views and read the violations. Browser-driven is simply the strongest
+  variant of this tier — opportunistic, **not required**. If nothing can render the DOM, skip
+  T2 and say so in the report; the verdict still stands on T0/T1/T3.
+- **T3 — judgment (you).** Decide what the tools cannot: focus management on route and dialog
+  changes, live-region announcements, keyboard patterns for custom widgets, meaning not
+  carried by colour alone, **alt-text quality** (a tool sees that `alt` exists; only judgment
+  sees whether it is meaningful), and reduced-motion. T3 also triages and interprets the
+  T1/T2 output against the handoff — a lint rule a project disabled on purpose is not your
+  finding.
+
+## The axes — tier coverage plus what judgment adds
+
+Each axis names which tier catches it and what remains for T3.
+
+### Semantic HTML first
+
+Mostly T1/T2 (a clickable `<div>`, a skipped heading level, missing landmarks surface in
+lint/axe). T3 confirms the right element was used for the meaning, not just a passing rule.
+
+### Colour and contrast
+
+T0 measures it from tokens — the deterministic floor (text ≥ 4.5:1, large ≥ 3:1, non-text
+≥ 3:1), including dark mode. T3 checks the rendered pair where colour is _not_ from a token
+(an image-on-text overlay, a hardcoded value `validate` already flags).
+
+### Keyboard operability
+
+T2/axe catches missing focusability; T3 walks the actual order, confirms a **visible** focus
+indicator meets the non-text floor, that there is no keyboard trap, and that focus is managed
+on dynamic change (dialog open returns focus on close; route change moves it sensibly).
+
+### ARIA — only when semantics fall short
+
+T1/T2 flag invalid ARIA and `aria-hidden` on focusable content. T3 judges whether ARIA was
+reached for where a native element would have done, and whether state attributes
+(`aria-expanded`, `aria-selected`, `aria-current`) actually track the real state.
+
+### Forms
+
+T1/T2 catch an unlabelled field. T3 confirms errors are identified **in text** and linked to
+the field (not colour alone), and required/invalid state is in the accessibility tree.
+
+### Images, icons and media
+
+A tool sees that `alt` is present or absent; **T3 judges whether it is meaningful** — a
+decorative image is empty-`alt`, a meaningful one conveys its purpose, an icon-only control
+has an accessible name.
+
+### Motion and time
+
+Largely T3: is `prefers-reduced-motion` honoured for non-essential animation (mirrors the
+motion judgment in `design-critique`)? Nothing flashes more than three times a second; no
+tight time limit without a way to extend it.
+
+### Touch and pointer targets
+
+T2 can measure target size; T3 confirms interactive targets meet the 24×24 CSS px minimum
+(WCAG 2.2 SC 2.5.8, Level AA) — 44×44 is the WCAG 2.1 AAA enhanced size and the comfortable
+touch target — and adjacent targets are spaced so they aren't mis-tapped.
+
+## Confidence gating
+
+Raise a finding when a tool flags it, or when you are **>80% confident** of a T3 judgment
+issue. A measured T0/T1/T2 failure is not a confidence call — report it. Don't pad the report
+with rules the project deliberately disabled.
+
+## Report
+
+```
+## Accessibility Audit — <task name>
+
+**Target**: <WCAG level from ui-handoff §9>
+**T0 contrast**: ✅ pass / ❌ <failing pairs>
+**T1 a11y lint**: ✅ / ❌ <violations> / N/A
+**T2 axe (rendered DOM)**: ✅ / ❌ <violations> / skipped — <why>
+**T3 judgment**: <findings, or "none">
+
+**Verdict**: approve / changes requested — <one-line reason>
+```
+
+"Changes requested" routes back to the Implementer (transient — no label). If the handoff §9
+is silent on a genuinely ambiguous decision, raise a discovery (`raise-discovery`) rather
+than guessing. An independent defect unrelated to this change is a side finding
+(`note-side-finding`).
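Reviewer sketch (not part of the package): the T0 tier and the "Colour and contrast" axis in the skill above rest on the WCAG contrast ratio. The snippet below shows that calculation for two `#rrggbb` colours, using the relative-luminance formula from WCAG 2.1 and the floors quoted in the skill (4.5:1 text, 3:1 large text and non-text). It is an illustration of the measurement only; how `lemony design-tokens contrast` reads tokens or pairs them is not shown in this diff, and `contrastRatio`, `meetsFloor` and `FLOORS` are hypothetical names.

```typescript
// Parse "#rrggbb" into three 0..255 channel values.
function parseHex(hex: string): [number, number, number] {
  const m = /^#([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(hex);
  if (!m) throw new Error(`expected #rrggbb, got ${hex}`);
  return [parseInt(m[1], 16), parseInt(m[2], 16), parseInt(m[3], 16)];
}

// sRGB channel (0..255) to linear light, per the WCAG 2.1 definition.
function linearize(channel: number): number {
  const c = channel / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// WCAG relative luminance: weighted sum of the linearized channels.
function relativeLuminance(hex: string): number {
  const [r, g, b] = parseHex(hex).map(linearize);
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio (lighter + 0.05) / (darker + 0.05), ranging from 1 to 21.
function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

// Floors quoted in the skill text (WCAG 2.1 AA).
const FLOORS = { text: 4.5, largeText: 3, nonText: 3 } as const;

function meetsFloor(
  foreground: string,
  background: string,
  kind: keyof typeof FLOORS,
): boolean {
  return contrastRatio(foreground, background) >= FLOORS[kind];
}
```

For example, `meetsFloor("#777777", "#ffffff", "text")` is false while the same pair passes as `"largeText"`, which is why the floor has to be chosen per pair rather than globally.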
@@ -1,6 +1,6 @@
 ---
 name: bootstrap-architecture
-description: Author the first docs/architecture.md — a holistic map of the system's current shape, fitted to this project. A one-time bootstrap the Architect runs when a client opts into the architecture capability; thereafter update-architecture maintains it incrementally. The harness only does this when the human asks
+description: Author the first docs/architecture.md — a holistic map of the system's current shape, fitted to this project. A one-time bootstrap the Architect runs when a client opts into the architecture capability; thereafter update-architecture maintains it incrementally. The harness only does this when the human asks.
 origin: vendor
 vendor_version: '{{vendor_version}}'
 invoked-by: [architect]
@@ -23,7 +23,7 @@ the file now existing) keeps it true incrementally.
 It is **not gated** — it is available before any `docs/architecture.md` exists, because its
 job is to create the first one. But the harness runs it **only when the human opts in**
 (via `/add-capability`, or an explicit request), never on its own: the vendor gives the
-framework, not the architecture
+framework, not the architecture. The map is authored **from the client's real
 code**, so the harness reflects the project's shape — it never imposes one.
 
 ## What to produce
@@ -66,7 +66,7 @@ Return to your invoker: the sections written, a one-line summary of the shape th
 captures, and the explicit note that `update-architecture` should be activated via `repair`.
 If the project is too small or too uniform to have a meaningful architecture (a single-purpose
 script, a flat library), **say so and write nothing** — an architecture map for a project
-without architecture is noise, and
+without architecture is noise, and the client need not have one.
 
 ## Uncontemplated Scenarios
 
@@ -0,0 +1,147 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: build-ui
|
|
3
|
+
description: The implementer's method for building UI from a ui-handoff.md — apply the project's design tokens to whatever stack the repo uses, build without generic "AI slop", and ship accessible by construction. Stack-agnostic — it owns the process and the token contract, not per-framework guides. Use when implementing a task that touches UI and a ui-handoff.md exists.
|
|
4
|
+
origin: vendor
|
|
5
|
+
vendor_version: '{{vendor_version}}'
|
|
6
|
+
phase: during-implementation
|
|
7
|
+
invoked-by: [implementer]
|
|
8
|
+
---
|
|
9
|
+
|
|
10
|
+
# Build UI
|
|
11
|
+
|
|
12
|
+
The implementer's **build method**: turn a `ui-handoff.md` (the design contract) plus
|
|
13
|
+
`docs/design-tokens.json` (the token source of truth) into UI code that applies the
|
|
14
|
+
project's tokens correctly, carries the design's point of view instead of generic
|
|
15
|
+
defaults, and is accessible by construction.
|
|
16
|
+
|
|
17
|
+
## What this skill owns — and what it deliberately doesn't
|
|
18
|
+
|
|
19
|
+
It owns the **harness-stable layer**: the tokens-as-code contract, the build process,
|
|
20
|
+
and the stack-agnostic principles. It does **not** reproduce per-framework know-how —
|
|
21
|
+
how to theme Tailwind v4, configure MUI's `createTheme`, or wire vanilla-extract. You
|
|
22
|
+
already know the stacks, and you read `package.json` to confirm which one this repo
|
|
23
|
+
uses. A frozen per-stack guide would rot the moment the ecosystem moves (a Tailwind
|
|
24
|
+
v3→v4 jump turns it into a lie); the live model does not. So this skill names the
|
|
25
|
+
**substrate space** and lets you map any tool — present or future — onto it.
|
|
26
|
+
|
|
27
|
+
## The process
|
|
28
|
+
|
|
29
|
+
1. **Read the inputs.** `ui-handoff.md` (under `.claude/state/tasks/<id>/spec/`) is your
|
|
30
|
+
obligatory design input — the dials, screens, components, states, motion, a11y target
|
|
31
|
+
and microcopy. `docs/design-tokens.json` is the token source of truth (§11 of the
|
|
32
|
+
handoff points at it). Read both before writing UI.
|
|
33
|
+
2. **Identify the substrate.** Read `package.json` and the styling config to place the
|
|
34
|
+
repo in one of the three archetypes below. The archetype, not the tool name, drives
|
|
35
|
+
how you apply tokens.
|
|
36
|
+
3. **Apply tokens through the substrate.** Build the screens and components from the
|
|
37
|
+
handoff at the direction its dials set — referencing tokens, never raw values.
|
|
38
|
+
4. **Pull the craft resources as you build.** Read [anti-slop.md](./anti-slop.md) before
|
|
39
|
+
generating any visual code, and [accessibility.md](./accessibility.md) when you build
|
|
40
|
+
anything interactive or visual. They load on demand — don't inline their content here.
|
|
41
|
+
5. **Self-validate before signalling done.** Run `lemony design-tokens validate` (it flags
|
|
42
|
+
raw colours/dimensions in source that should reference a token) and re-read the
|
|
43
|
+
handoff's dials and targets against what you built.
|
|
44
|
+
|
|
45
|
+
## Tokens-as-code — the contract
|
|
46
|
+
|
|
47
|
+
`docs/design-tokens.json` is the **single, client-owned source of truth**, a 3-tier
|
|
48
|
+
W3C-DTCG model: **primitive** (raw scale values) → **semantic** (intent: `color.surface`,
|
|
49
|
+
`space.inset.md`) → **component** (a component's specific slots). Two rules hold on every
|
|
50
|
+
stack:
|
|
51
|
+
|
|
52
|
+
- **Reference semantic or component tokens; never primitives, never raw literals.** A
|
|
53
|
+
component reads `color.surface`, not `gray.50` and never `#fff`. Raw values in source
|
|
54
|
+
are what `design-tokens validate` catches.
|
|
55
|
+
- **Dark mode (and any theme) is an override, not a fork.** Swap the values behind the
|
|
56
|
+
semantic layer; the component code stays identical. You never branch component logic on
|
|
57
|
+
the theme.
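
The two rules can be sketched as a small resolver. This assumes the DTCG conventions of `$value` and `{path}` aliases; the token names (`primitive.gray.50`, `component.card.background`), the colour values and the `merge` / `resolve` helpers are hypothetical and only illustrate the shape of the contract.

```typescript
// Hypothetical token tree in DTCG style: `$value` is a literal or a `{path}` alias.
type Token = { $value: string };
type TokenTree = { [key: string]: Token | TokenTree };

const base: TokenTree = {
  primitive: { gray: { "50": { $value: "#fafafa" }, "900": { $value: "#1a1a1a" } } },
  semantic: { color: { surface: { $value: "{primitive.gray.50}" } } },
  component: { card: { background: { $value: "{semantic.color.surface}" } } },
};

// A theme overrides semantic values only; component tokens stay untouched.
const darkOverride: TokenTree = {
  semantic: { color: { surface: { $value: "{primitive.gray.900}" } } },
};

function isToken(node: Token | TokenTree | undefined): node is Token {
  return node !== undefined && typeof (node as Token).$value === "string";
}

// Deep-merge an override onto the base without mutating either tree.
function merge(a: TokenTree, b: TokenTree): TokenTree {
  const out: TokenTree = { ...a };
  for (const [key, value] of Object.entries(b)) {
    const existing = out[key];
    out[key] =
      isToken(value) || existing === undefined || isToken(existing)
        ? value
        : merge(existing, value);
  }
  return out;
}

// Follow `{a.b.c}` aliases until a literal value is reached.
function resolve(tree: TokenTree, path: string, seen: string[] = []): string {
  if (seen.includes(path)) throw new Error(`Alias cycle at ${path}`);
  let node: Token | TokenTree | undefined = tree;
  for (const part of path.split(".")) {
    node = node !== undefined && !isToken(node) ? node[part] : undefined;
  }
  if (!isToken(node)) throw new Error(`Unknown token: ${path}`);
  const value = node.$value;
  const isAlias = value.startsWith("{") && value.endsWith("}");
  return isAlias ? resolve(tree, value.slice(1, -1), [...seen, path]) : value;
}

console.log(resolve(base, "component.card.background"));
console.log(resolve(merge(base, darkOverride), "component.card.background"));
```

The component token is looked up by the same path in both themes; only the value behind the semantic layer changes, which is the "override, not a fork" rule in miniature.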
|
|
58
|
+
|
|
59
|
+
## Substrate archetypes
|
|
60
|
+
|
|
61
|
+
Identify which one the repo is, then apply tokens that way. Each snippet is
|
|
62
|
+
**illustrative, not normative** — it names the space so you can map the actual tool.
|
|
63
|
+
|
|
64
|
+
**1. CSS custom properties (runtime cascade).** Token → `--var` → `var()`. The cascade
|
|
65
|
+
carries theming; an override re-declares the variable.
|
|
66
|
+
|
|
67
|
+
```css
|
|
68
|
+
:root {
|
|
69
|
+
--color-surface: #ffffff; /* from semantic token color.surface */
|
|
70
|
+
--space-inset-md: 12px;
|
|
71
|
+
}
|
|
72
|
+
[data-theme='dark'] {
|
|
73
|
+
--color-surface: #1a1a1a; /* dark mode = override the var, the .card below is untouched */
|
|
74
|
+
}
|
|
75
|
+
.card {
|
|
76
|
+
background: var(--color-surface);
|
|
77
|
+
padding: var(--space-inset-md);
|
|
78
|
+
}
|
|
79
|
+
```
|
|
80
|
+
|
|
81
|
+
_Space: vanilla CSS, Sass, CSS Modules._
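
For this archetype, the token-to-variable step can be sketched as a small generator. The `flatten` and `toCssRule` helpers and the `TokenValues` shape are hypothetical, and the sketch assumes the semantic tokens have already been resolved to literal values.

```typescript
// Hypothetical shape: nested semantic tokens already resolved to literals.
type TokenValues = { [key: string]: string | TokenValues };

// Turn { color: { surface: "#fff" } } into [["--color-surface", "#fff"]].
function flatten(tokens: TokenValues, prefix: string[] = []): [string, string][] {
  return Object.entries(tokens).flatMap(([key, value]): [string, string][] =>
    typeof value === "string"
      ? [[`--${[...prefix, key].join("-")}`, value]]
      : flatten(value, [...prefix, key]),
  );
}

// Emit one CSS rule that declares every token as a custom property.
function toCssRule(selector: string, tokens: TokenValues): string {
  const body = flatten(tokens)
    .map(([name, value]) => `  ${name}: ${value};`)
    .join("\n");
  return `${selector} {\n${body}\n}`;
}

const light: TokenValues = {
  color: { surface: "#ffffff" },
  space: { inset: { md: "12px" } },
};

console.log(toCssRule(":root", light));
console.log(toCssRule("[data-theme='dark']", { color: { surface: "#1a1a1a" } }));
```

A theme override is just a second call with a different selector and only the values that change.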
|
|
82
|
+
|
|
83
|
+
**2. Central config/theme object a library consumes.** Tokens populate one theme object;
|
|
84
|
+
components read it through the library's API (`className`, `sx`, `styled`), never literals.
|
|
85
|
+
|
|
86
|
+
```ts
|
|
87
|
+
export const theme = {
|
|
88
|
+
colors: { surface: tokens.color.surface },
|
|
89
|
+
space: { insetMd: tokens.space.inset.md },
|
|
90
|
+
};
|
|
91
|
+
// usage: <Box sx={{ bg: 'surface', p: 'insetMd' }} /> — the library resolves it, no hard-coded values
|
|
92
|
+
```
|
|
93
|
+
|
|
94
|
+
_Space: Tailwind (`config` / `@theme`), MUI / Mantine (`createTheme`), Chakra (`extendTheme` / `createSystem`),
|
|
95
|
+
styled-components / Emotion._
|
|
96
|
+
|
|
97
|
+
**3. Compiled type-safe contract (zero-runtime CSS-in-TS).** Tokens become a typed
|
|
98
|
+
contract resolved at build; a missing token is a compile error, not a silent fallback.
|
|
99
|
+
|
|
100
|
+
```ts
|
|
101
|
+
export const vars = createThemeContract({
|
|
102
|
+
color: { surface: null },
|
|
103
|
+
space: { insetMd: null },
|
|
104
|
+
});
|
|
105
|
+
export const light = createTheme(vars, {
|
|
106
|
+
color: { surface: tokens.color.surface },
|
|
107
|
+
space: { insetMd: tokens.space.inset.md },
|
|
108
|
+
});
|
|
109
|
+
// components consume `vars.color.surface`; the type system rejects an unknown token
|
|
110
|
+
```
|
|
111
|
+
|
|
112
|
+
_Space: vanilla-extract, Panda CSS, StyleX, Linaria._
|
|
113
|
+
|
|
114
|
+
**Orthogonal axis — static-build ↔ dynamic-runtime theming.** Independent of the
|
|
115
|
+
archetype: a single fixed theme can compile away entirely, while multi-tenant / white-label
|
|
116
|
+
products need tokens to live as CSS variables or context so they switch at runtime. Pick the
|
|
117
|
+
side the product needs; it cross-cuts all three archetypes.
|
|
118
|
+
|
|
119
|
+
**Out of scope for now — native substrates.** React Native / Compose / SwiftUI map tokens
|
|
120
|
+
to a native style object; recognised as a fourth archetype but not covered here yet.
|
|
121
|
+
|
|
122
|
+
## When the handoff contradicts reality
|
|
123
|
+
|
|
124
|
+
If building reveals that the `ui-handoff.md` contradicts the codebase, is missing a
|
|
125
|
+
decision with more than one valid answer, or asks for something that already exists, **do
|
|
126
|
+
not improvise** — raise it through the implementer's discovery channel so the designer or
|
|
127
|
+
the human resolves it, then resume. An independent defect unrelated to your change is a
|
|
128
|
+
side finding, not a blocker.
|
|
129
|
+
|
|
130
|
+
## Cross-references
|
|
131
|
+
|
|
132
|
+
```
|
|
133
|
+
grill-ui → ui-handoff.md → implementer (build-ui) → REVIEW (design-critique + a11y-audit) → merge gate
|
|
134
|
+
```
|
|
135
|
+
|
|
136
|
+
You consume `ui-handoff.md`; the REVIEW design lens checks what you built against it. The
|
|
137
|
+
`design-critique` skill mirrors [anti-slop.md](./anti-slop.md)'s principles and `a11y-audit`
|
|
138
|
+
mirrors [accessibility.md](./accessibility.md)'s — when you change a principle here, keep its
|
|
139
|
+
review twin in step.
|
|
140
|
+
|
|
141
|
+
---
|
|
142
|
+
|
|
143
|
+
See also:
|
|
144
|
+
|
|
145
|
+
- [anti-slop.md](./anti-slop.md) — the craft layer: how to build the design's point of view
|
|
146
|
+
into the code instead of generic defaults.
|
|
147
|
+
- [accessibility.md](./accessibility.md) — WCAG 2.1 AA implementation patterns.
|
|
@@ -0,0 +1,101 @@
|
|
|
1
|
+
# Accessibility — building to WCAG 2.1 AA
|
|
2
|
+
|
|
3
|
+
Load this when you implement anything interactive or visual. The `ui-handoff.md` §9 sets
|
|
4
|
+
the **target** (default WCAG 2.1 AA) and any task-specific decisions; this resource is the
|
|
5
|
+
**implementation** layer — how to build it accessible by construction, so the `a11y-audit`
|
|
6
|
+
review lens finds nothing to reject. Accessibility is not a pass you bolt on at the end; it
|
|
7
|
+
is a property of how the markup, focus, colour and motion are built.
|
|
8
|
+
|
|
9
|
+
## Semantic HTML first
|
|
10
|
+
|
|
11
|
+
The fastest path to accessible UI is the right element. A native `<button>`, `<a>`,
|
|
12
|
+
`<label>`, `<nav>`, `<main>`, `<ul>` carries role, keyboard behaviour and focus for free.
|
|
13
|
+
|
|
14
|
+
- **Use the element that means what you mean.** A clickable `<div>` is a bug — it has no
|
|
15
|
+
role, no keyboard handler, no focusability. Use `<button>` for actions, `<a href>` for
|
|
16
|
+
navigation.
|
|
17
|
+
- **One `<h1>` per view, headings in order.** Don't skip levels for visual sizing — size
|
|
18
|
+
with tokens, structure with heading level. Screen-reader users navigate by heading.
|
|
19
|
+
- **Landmarks frame the page.** `<header>`, `<nav>`, `<main>`, `<footer>` give assistive
|
|
20
|
+
tech a map. Reach for ARIA roles only to fill a gap native elements can't.
|
|
21
|
+
|
|
22
|
+
## Colour and contrast
|
|
23
|
+
|
|
24
|
+
This is the objective floor — measured, not judged.
|
|
25
|
+
|
|
26
|
+
- **Text contrast ≥ 4.5:1** against its background; **large text (≥ 24px / 18pt, or ≥ 18.66px / 14pt bold)
|
|
27
|
+
≥ 3:1**.
|
|
28
|
+
- **Non-text contrast ≥ 3:1** for UI component boundaries, icons that carry meaning, and
|
|
29
|
+
the visible focus indicator.
|
|
30
|
+
- **Never encode meaning in colour alone.** Pair colour with an icon, label or pattern — a
|
|
31
|
+
red border needs an error message, a green dot needs an "online" label.
|
|
32
|
+
- Because colour comes from semantic tokens, contrast is a property of the token pairing;
|
|
33
|
+
check the actual rendered pair, including in dark mode.
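
Since the floor is measured, a small sketch of the measurement may help. It follows the WCAG relative-luminance and contrast-ratio definitions for sRGB colours; the function names are hypothetical and the sketch only accepts `#rrggbb` input, so it ignores alpha and other colour spaces.

```typescript
// Linearise one 8-bit sRGB channel (0-255) as in the WCAG luminance definition.
function linear(channel: number): number {
  const s = channel / 255;
  return s <= 0.04045 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
}

// Relative luminance of a colour written as #rrggbb.
function relativeLuminance(hex: string): number {
  if (!/^#[0-9a-f]{6}$/i.test(hex)) throw new Error(`Expected #rrggbb, got ${hex}`);
  const value = parseInt(hex.slice(1), 16);
  const r = linear((value >> 16) & 0xff);
  const g = linear((value >> 8) & 0xff);
  const b = linear(value & 0xff);
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio (L1 + 0.05) / (L2 + 0.05), with L1 the lighter colour.
function contrastRatio(a: string, b: string): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

// 4.5:1 for normal text, 3:1 for large text.
function meetsTextContrast(fg: string, bg: string, largeText = false): boolean {
  return contrastRatio(fg, bg) >= (largeText ? 3 : 4.5);
}

console.log(contrastRatio("#ffffff", "#000000").toFixed(2));
console.log(meetsTextContrast("#767676", "#ffffff"));
```

Run it on the resolved light and dark values of each token pairing, because a pair that passes in one theme can fail in the other.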
|
|
34
|
+
|
|
35
|
+
## Keyboard operability
|
|
36
|
+
|
|
37
|
+
Everything that works with a mouse must work with a keyboard alone.
|
|
38
|
+
|
|
39
|
+
- **All interactive elements are reachable and operable by keyboard**, in a logical Tab
|
|
40
|
+
order that follows the visual/reading order. Native elements give you this; custom
|
|
41
|
+
widgets need `tabindex` and key handlers.
|
|
42
|
+
- **A visible focus indicator on every focusable element** — never `outline: none` without
|
|
43
|
+
a clearly visible replacement that meets the 3:1 non-text contrast floor.
|
|
44
|
+
- **No keyboard traps.** Focus can always move on; if you trap it deliberately (a modal),
|
|
45
|
+
provide the documented way out (Escape) and restore focus on close.
|
|
46
|
+
- **Manage focus on dynamic change.** Opening a dialog moves focus into it and contains it;
|
|
47
|
+
closing returns focus to the trigger. Route changes move focus to a sensible anchor.
|
|
48
|
+
- **Honour expected keys** for the widget pattern (Enter/Space activate, arrows move within
|
|
49
|
+
a composite like tabs or a menu, Escape dismisses).
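
For the arrow-key rule in a composite, one common approach is a roving `tabindex`: only the active item sits in the Tab order, and arrows move the active index. The sketch below keeps the logic pure so it fits any framework; `nextIndex`, `tabIndexes` and the wrap-around behaviour are illustrative assumptions, not something the handoff prescribes.

```typescript
type NavKey = "ArrowLeft" | "ArrowRight" | "Home" | "End";

// Compute which item becomes active after a navigation key, wrapping at the ends.
function nextIndex(current: number, count: number, key: NavKey): number {
  switch (key) {
    case "ArrowRight":
      return (current + 1) % count;
    case "ArrowLeft":
      return (current - 1 + count) % count;
    case "Home":
      return 0;
    case "End":
      return count - 1;
  }
}

// tabindex per item: 0 for the active item, -1 for the rest.
function tabIndexes(active: number, count: number): number[] {
  return Array.from({ length: count }, (_, i) => (i === active ? 0 : -1));
}

console.log(nextIndex(2, 3, "ArrowRight"));
console.log(tabIndexes(1, 3));
```

In a real widget the key handler would also move DOM focus to the newly active item; the index arithmetic stays the same.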
|
|
50
|
+
|
|
51
|
+
## ARIA — only when semantics fall short
|
|
52
|
+
|
|
53
|
+
The first rule of ARIA is don't use ARIA when a native element would do. When you do need
|
|
54
|
+
it:
|
|
55
|
+
|
|
56
|
+
- **Name every control.** A visible `<label>` (associated via `for`/`id`), or
|
|
57
|
+
`aria-label` / `aria-labelledby` when there is no visible text (icon-only buttons).
|
|
58
|
+
- **Reflect state, don't fake it.** `aria-expanded`, `aria-selected`, `aria-checked`,
|
|
59
|
+
`aria-disabled`, `aria-current` must track the real state.
|
|
60
|
+
- **Announce async change with live regions.** A status message, toast, or validation
|
|
61
|
+
result that appears without a focus change needs `aria-live` (`polite` for status,
|
|
62
|
+
`assertive` only for errors) so it is announced.
|
|
63
|
+
- **Don't break the accessibility tree.** `aria-hidden` on focusable content, redundant
|
|
64
|
+
roles, or mislabelled landmarks are worse than no ARIA.
|
|
65
|
+
|
|
66
|
+
## Forms
|
|
67
|
+
|
|
68
|
+
- **Every field has a programmatically-associated label** — not a placeholder standing in
|
|
69
|
+
for one (placeholders vanish on input and often fail contrast).
|
|
70
|
+
- **Errors are identified in text, linked to the field** (`aria-describedby`), and not by
|
|
71
|
+
colour alone. Group related fields with `<fieldset>`/`<legend>`.
|
|
72
|
+
- **Required and invalid states are conveyed in the accessibility tree** (`aria-required`,
|
|
73
|
+
`aria-invalid`), not only visually.
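
The form rules above can be sketched as a pure mapping from field state to attributes. The `FieldState` type, the `fieldA11yAttributes` helper and the `<id>-error` naming convention are hypothetical; the ARIA attribute names are the standard ones.

```typescript
// Hypothetical field state; `error` is the message shown to the user, if any.
type FieldState = { id: string; required: boolean; error?: string };

// Derive the attributes an input needs so its state reaches the accessibility tree.
function fieldA11yAttributes(field: FieldState): Record<string, string> {
  const attrs: Record<string, string> = { id: field.id };
  if (field.required) attrs["aria-required"] = "true";
  if (field.error) {
    attrs["aria-invalid"] = "true";
    // Assumed convention: the error text is rendered in an element with this id.
    attrs["aria-describedby"] = `${field.id}-error`;
  }
  return attrs;
}

console.log(fieldA11yAttributes({ id: "email", required: true, error: "Enter a valid email" }));
```

The visible `<label for="email">` and the error element with id `email-error` still have to be rendered; this helper only keeps the attributes in step with the real state.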
|
|
74
|
+
|
|
75
|
+
## Images, icons and media
|
|
76
|
+
|
|
77
|
+
- **Meaningful images need a text alternative** (`alt`) that conveys their purpose;
|
|
78
|
+
**decorative images get empty `alt=""`** so they are skipped.
|
|
79
|
+
- **Icon-only controls need an accessible name** (see ARIA above); a standalone meaningful
|
|
80
|
+
icon needs a text equivalent.
|
|
81
|
+
|
|
82
|
+
## Motion and time
|
|
83
|
+
|
|
84
|
+
- **Respect `prefers-reduced-motion`** — provide a reduced or no-motion path for non-essential
|
|
85
|
+
animation. This mirrors the motion guidance in [anti-slop.md](./anti-slop.md).
|
|
86
|
+
- **No content that flashes more than three times per second.**
|
|
87
|
+
- **Don't impose tight time limits**; if one exists, let the user extend or disable it.
|
|
88
|
+
|
|
89
|
+
## Touch and pointer targets
|
|
90
|
+
|
|
91
|
+
- **Interactive targets are large enough to hit** — a minimum of 24×24 CSS px (WCAG 2.2 AA,
|
|
92
|
+
SC 2.5.8); 44×44 is the WCAG 2.1 AAA size (SC 2.5.5) and the comfortable touch target. Space
|
|
93
|
+
adjacent targets so they aren't mis-tapped.
|
|
94
|
+
|
|
95
|
+
## Self-check before done
|
|
96
|
+
|
|
97
|
+
Walk the change with these, matching the `a11y-audit` review lens so nothing comes back:
|
|
98
|
+
keyboard-only traversal reaches and operates everything with visible focus; measured
|
|
99
|
+
contrast meets the floor (including dark mode); every control has an accessible name; async
|
|
100
|
+
updates announce; reduced-motion is honoured; targets meet the size floor. Where a decision
|
|
101
|
+
is genuinely ambiguous, it belonged in the handoff §9 — raise it rather than guess.
|