npm - @lemoncode/lemony - Versions diffs - 0.1.0 → 0.1.1-alpha.1 - Mend

@lemoncode/lemony 0.1.0 → 0.1.1-alpha.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (49) hide show

package/NOTICE +39 -0
package/README.md +0 -1
package/catalog/VERSION +1 -1
package/catalog/agents/architect.md +4 -4
package/catalog/agents/fit-assessment.md +1 -1
package/catalog/agents/implementer.md +15 -8
package/catalog/agents/orchestrator.md +204 -36
package/catalog/agents/reviewer.md +7 -7
package/catalog/agents/spec-author.md +7 -4
package/catalog/agents/ui-designer.md +121 -15
package/catalog/commands/add-capability.md +3 -3
package/catalog/commands/resume.md +10 -4
package/catalog/commands/spinoff.md +2 -2
package/catalog/commands/sync-design-tokens.md +29 -0
package/catalog/harness.config.schema.json +15 -16
package/catalog/hooks/init.sh +11 -11
package/catalog/hooks/lib/lemony.sh +3 -3
package/catalog/hooks/lib/playbook-scan.sh +10 -11
package/catalog/hooks/session-close.sh +7 -7
package/catalog/schemas/tier2-events-history.md +11 -11
package/catalog/schemas/tier2-events.md +46 -47
package/catalog/skills/a11y-audit/SKILL.md +121 -0
package/catalog/skills/bootstrap-architecture/SKILL.md +3 -3
package/catalog/skills/build-ui/SKILL.md +147 -0
package/catalog/skills/build-ui/accessibility.md +101 -0
package/catalog/skills/build-ui/anti-slop.md +107 -0
package/catalog/skills/code-explorer/SKILL.md +1 -1
package/catalog/skills/design-critique/SKILL.md +110 -0
package/catalog/skills/design-tool-sync/SKILL.md +120 -0
package/catalog/skills/grill-ui/SKILL.md +248 -0
package/catalog/skills/grill-ui/ui-handoff-format.md +149 -0
package/catalog/skills/grill-with-docs/SKILL.md +9 -2
package/catalog/skills/mutation-testing/SKILL.md +1 -1
package/catalog/skills/note-side-finding/SKILL.md +1 -1
package/catalog/skills/playbook-iterate/SKILL.md +2 -2
package/catalog/skills/review-pr/SKILL.md +3 -3
package/catalog/skills/task-closeout/SKILL.md +9 -8
package/catalog/skills/update-architecture/SKILL.md +3 -3
package/catalog/templates/claude-code/agents.md.tpl +27 -18
package/catalog/templates/claude-code/docs/playbooks/README.md.tpl +1 -3
package/catalog/templates/claude-code/harness.config.yml.tpl +8 -9
package/dist/cli.mjs +1287 -1676
package/package.json +13 -4
package/catalog/agents/README.md +0 -29
package/catalog/hooks/README.md +0 -56
package/catalog/playbook-format.md +0 -198
package/catalog/schemas/README.md +0 -13
package/catalog/skills/README.md +0 -62
package/catalog/templates/README.md +0 -32

package/catalog/schemas/tier2-events.md CHANGED Viewed

@@ -8,17 +8,16 @@
 > dispatch-on-read).
 Tier 1 (client-local) writes append-only JSONL from day one so the data is
-forward-compatible with the Tier 2 central backend designed in Fase 1+ (decision
-\#24, #25, #27, #51).
+forward-compatible with the Tier 2 central backend planned for a later phase.
 ## Storage
 - One event per line, UTF-8, in `.claude/state/events.jsonl`. The stream is
-  **local-only and gitignored — never committed** (ADR 0008, retiring decision
-  #18/#21): it sits in the managed `GITIGNORE_BLOCK` beside `current-*.md` /
-  `sessions/`. There is no Tier 2 consumer yet, so committing only dirtied the
-  base; transport to Tier 2 is the sink designed in #137.
-- **Append-only, except confirmed-sent prefix-prune** (#240, ADR 0008 §Amendment).
+  **local-only and gitignored — never committed**: it sits in the managed
+  `GITIGNORE_BLOCK` beside `current-*.md` / `sessions/`. There is no Tier 2
+  consumer yet, so committing only dirtied the base; transport to Tier 2 is a
+  planned sink.
+- **Append-only, except confirmed-sent prefix-prune.**
   Emitters only ever append. The send engine may **collapse the already-delivered prefix**
   (`[0:cursor]`) once it exceeds ~5MB — never dropping unsent bytes — which rewrites the
   file and may **reorder** the unsent tail relative to concurrent appends. Order is not a
@@ -41,7 +40,7 @@ same top level — there is no nested `payload`, so Zod discriminated unions key
 | ----------------- | ------ | -------- | --------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `type`            | string | yes      | `internal-enum` | One of the 9 event types listed below. Discriminator.                                                                                                                                                          |
 | `ts`              | string | yes      | `metric`        | UTC ISO 8601 with `Z` suffix (e.g. `2026-05-28T14:30:00.000Z`). **No local offsets.**                                                                                                                          |
-| `user`            | string | yes      | `local-only`    | `git config user.email` of the actor (decision #53). Never exported in any tier (D7).                                                                                                                          |
+| `user`            | string | yes      | `local-only`    | `git config user.email` of the actor. Never exported in any tier.                                                                                                                                              |
 | `project`         | string | yes      | `identity`      | `task_storage.repo` slug (e.g. `acme/widgets`), from `harness.config.yml`. **Never `OWNER/REPO`** — the CLI refuses to emit while that placeholder is the value (see [Placeholder guard](#placeholder-guard)). |
 | `task_id`         | string | no       | `identity`      | Task issue id (e.g. `42`) when the event has a task context. Absent on session/global events. A per-project correlator — only meaningful alongside `project`, so it shares the `identity` axis.                |
 | `harness_version` | string | yes      | `metric`        | `version` of the **installed** `@lemoncode/lemony` package — _not_ `vendor_version` from config.                                                                                                               |
@@ -94,12 +93,12 @@ only `internal-enum` + `metric`; **`project`** (opt-in, for dogfood) additionall
 keeps `identity` + `free-text`. `local-only` is dropped in both. `identity` and
 `free-text` share today's policy but stay distinct axes for future divergence (a
 hashed `user_hash` would be `identity`-with-hashing, not `free-text`). v1 wires only
-the `anonymous` branch (decisions D7/D8/D9, ADR 0020).
+the `anonymous` branch.
 `attributed_name` is deliberately an `internal-enum` even though Zod types it as a
 bounded free string: the axis is policy-oriented (keep-always), and the field is the
-moat metric (#1, "which component causes friction") — a roster component name shared
-across all installs, not sensitive free text (D8). The aggregation script flags names
+moat metric ("which component causes friction") — a roster component name shared
+across all installs, not sensitive free text. The aggregation script flags names
 outside the known roster as the data-quality thermometer (see [Attribution](#attribution)).
 ---
@@ -108,8 +107,8 @@ outside the known roster as the data-quality thermometer (see [Attribution](#att
 Five are emitted in P5. `bug_post_merge` is deferred to P8 (meta-test). `l3_bypass`
 is deferred to P6 (the `/bypass` command). `followup_captured` is emitted by the
-`/spinoff` command (#112). `step_completed` is emitted by the Orchestrator in
-step-by-step mode (#176). The schema covers all nine so the file is
+`/spinoff` command. `step_completed` is emitted by the Orchestrator in
+step-by-step mode. The schema covers all nine so the file is
 forward-compatible — readers dispatch on `type` and ignore unknowns.
 ### 1. `session_closed` _(P5)_
@@ -150,28 +149,28 @@ Emitted by the Orchestrator when it transitions `spec-in-progress → spec-ready
 Emitted by the Orchestrator at closeout (after `gh pr view` confirms `MERGED`,
 before `git rm` of the task state).
-| Field               | Type   | Required | Axis            | Notes                                                                                                                                                |
-| ------------------- | ------ | -------- | --------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `task_id`           | string | yes      | `identity`      | Required for this type.                                                                                                                              |
-| `level`             | string | yes      | `internal-enum` | `L1` \| `L2` \| `L3` — the task-fit dial value used.                                                                                                 |
-| `cycle_time_h`      | number | yes      | `metric`        | Wall-clock hours from issue creation to merge. ≥ 0, finite.                                                                                          |
-| `review_rejections` | number | yes      | `metric`        | Count of `review_rejected` events for this `task_id` (≥ 0, int).                                                                                     |
-| `mode`              | string | no       | `internal-enum` | `all_at_once` \| `step_by_step` — the mode chosen at the L1 approval gate (#176). **Absent on L2** (the question only exists where `tasks.md` does). |
-| `steps`             | number | no       | `metric`        | Count of `step_completed` events for this task (≥ 1, int). Only meaningful when `mode` is `step_by_step`; < total tasks after a mid-task downgrade.  |
+| Field               | Type   | Required | Axis            | Notes                                                                                                                                               |
+| ------------------- | ------ | -------- | --------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `task_id`           | string | yes      | `identity`      | Required for this type.                                                                                                                             |
+| `level`             | string | yes      | `internal-enum` | `L1` \| `L2` \| `L3` — the task-fit dial value used.                                                                                                |
+| `cycle_time_h`      | number | yes      | `metric`        | Wall-clock hours from issue creation to merge. ≥ 0, finite.                                                                                         |
+| `review_rejections` | number | yes      | `metric`        | Count of `review_rejected` events for this `task_id` (≥ 0, int).                                                                                    |
+| `mode`              | string | no       | `internal-enum` | `all_at_once` \| `step_by_step` — the mode chosen at the L1 approval gate. **Absent on L2** (the question only exists where `tasks.md` does).       |
+| `steps`             | number | no       | `metric`        | Count of `step_completed` events for this task (≥ 1, int). Only meaningful when `mode` is `step_by_step`; < total tasks after a mid-task downgrade. |
 ### 5. `review_rejected` _(P5)_
-Emitted by the Reviewer when the verdict is REJECT (decision #25, transient
-state — no dedicated label).
+Emitted by the Reviewer when the verdict is REJECT (transient state — no
+dedicated label).
-| Field             | Type   | Required | Axis            | Notes                                                                                                                                                                                                                                                       |
-| ----------------- | ------ | -------- | --------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `task_id`         | string | yes      | `identity`      | Required for this type.                                                                                                                                                                                                                                     |
-| `reason`          | string | yes      | `free-text`     | Short human-readable reason (one line; never the full review comment). 1-500 chars.                                                                                                                                                                         |
-| `iteration`       | number | yes      | `metric`        | 1-based: the Nth rejection of this task (≥ 1, int).                                                                                                                                                                                                         |
-| `step`            | number | no       | `metric`        | The step (1-based `tasks.md` task number) whose per-step review rejected (#176). **Absent** on full-pass and all-at-once rejections.                                                                                                                        |
-| `attributed_kind` | string | no       | `internal-enum` | `agent` \| `skill` \| `playbook` — the kind of component the friction is attributed to (#217). **Omitted when the emitter can't attribute.**                                                                                                                |
-| `attributed_name` | string | no       | `internal-enum` | The component's name (free string, 1-200 chars), e.g. `implementer`. Independently optional in the schema; emitters pair it with `attributed_kind` and omit both when they can't attribute (#217). Free-string by design — see [Attribution](#attribution). |
+| Field             | Type   | Required | Axis            | Notes                                                                                                                                                                                                                                                |
+| ----------------- | ------ | -------- | --------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `task_id`         | string | yes      | `identity`      | Required for this type.                                                                                                                                                                                                                              |
+| `reason`          | string | yes      | `free-text`     | Short human-readable reason (one line; never the full review comment). 1-500 chars.                                                                                                                                                                  |
+| `iteration`       | number | yes      | `metric`        | 1-based: the Nth rejection of this task (≥ 1, int).                                                                                                                                                                                                  |
+| `step`            | number | no       | `metric`        | The step (1-based `tasks.md` task number) whose per-step review rejected. **Absent** on full-pass and all-at-once rejections.                                                                                                                        |
+| `attributed_kind` | string | no       | `internal-enum` | `agent` \| `skill` \| `playbook` — the kind of component the friction is attributed to. **Omitted when the emitter can't attribute.**                                                                                                                |
+| `attributed_name` | string | no       | `internal-enum` | The component's name (free string, 1-200 chars), e.g. `implementer`. Independently optional in the schema; emitters pair it with `attributed_kind` and omit both when they can't attribute. Free-string by design — see [Attribution](#attribution). |
 ### 6. `bug_post_merge` _(P8 — schema only)_
@@ -194,7 +193,7 @@ from the envelope (optional) and usually absent.
 | `topic`  | string | yes      | `free-text` | One-line subject (typo, rename, lockfile bump, etc.). 1-200 chars. |
 | `reason` | string | yes      | `free-text` | Why the harness was bypassed. 1-500 chars (≤ 200 recommended).     |
-### 8. `followup_captured` _(#112 — `/spinoff`)_
+### 8. `followup_captured` _(`/spinoff`)_
 Emitted by the `/spinoff` command when a non-blocking, independent defect found
 mid-task is parked as a `harness:managed` + `harness:status:pending` stub. Feeds the
@@ -207,7 +206,7 @@ post-merge / production signal; conflating them would dirty the post-merge metri
 | `parent_task_id` | string | no       | `identity`      | The originating task id. **Absent** when `/spinoff` runs outside any active task (a deferred stub). |
 | `severity`       | string | no       | `internal-enum` | `low` \| `medium` \| `high` \| `critical`. Best-effort — set only when cheaply inferred.            |
-### 9. `step_completed` _(#176 — step-by-step mode)_
+### 9. `step_completed` _(step-by-step mode)_
 Emitted by the Orchestrator each time a human checkpoint **resolves** in
 step-by-step mode (L1 opt-in, chosen at the approval gate). One event per
@@ -216,14 +215,14 @@ when it re-checkpoints, with the same `step`. This is the signal that justifies
 (or condemns) the mode — the rate of checkpoints that catch things, and where
 humans bail out (`ok_downgrade`).
-| Field               | Type   | Required | Axis            | Notes                                                                                                                                                                                                                                   |
-| ------------------- | ------ | -------- | --------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `task_id`           | string | yes      | `identity`      | Required for this type.                                                                                                                                                                                                                 |
-| `step`              | number | yes      | `metric`        | 1-based `tasks.md` task number the checkpoint belongs to (≥ 1, int).                                                                                                                                                                    |
-| `review_iterations` | number | yes      | `metric`        | Reviewer invocations that preceded this checkpoint (≥ 1, int — every step is reviewed before the human; resets after a "changes").                                                                                                      |
-| `checkpoint_result` | string | yes      | `internal-enum` | `ok` \| `changes` \| `ok_downgrade` (OK and switch the remaining tasks to all-at-once).                                                                                                                                                 |
-| `attributed_kind`   | string | no       | `internal-enum` | `agent` \| `skill` \| `playbook` — the kind of component the friction is attributed to (#217). **Omitted when the emitter can't attribute.**                                                                                            |
-| `attributed_name`   | string | no       | `internal-enum` | The component's name (free string, 1-200 chars). Independently optional in the schema; emitters pair it with `attributed_kind` and omit both when they can't attribute (#217). Free-string by design — see [Attribution](#attribution). |
+| Field               | Type   | Required | Axis            | Notes                                                                                                                                                                                                                            |
+| ------------------- | ------ | -------- | --------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `task_id`           | string | yes      | `identity`      | Required for this type.                                                                                                                                                                                                          |
+| `step`              | number | yes      | `metric`        | 1-based `tasks.md` task number the checkpoint belongs to (≥ 1, int).                                                                                                                                                             |
+| `review_iterations` | number | yes      | `metric`        | Reviewer invocations that preceded this checkpoint (≥ 1, int — every step is reviewed before the human; resets after a "changes").                                                                                               |
+| `checkpoint_result` | string | yes      | `internal-enum` | `ok` \| `changes` \| `ok_downgrade` (OK and switch the remaining tasks to all-at-once).                                                                                                                                          |
+| `attributed_kind`   | string | no       | `internal-enum` | `agent` \| `skill` \| `playbook` — the kind of component the friction is attributed to. **Omitted when the emitter can't attribute.**                                                                                            |
+| `attributed_name`   | string | no       | `internal-enum` | The component's name (free string, 1-200 chars). Independently optional in the schema; emitters pair it with `attributed_kind` and omit both when they can't attribute. Free-string by design — see [Attribution](#attribution). |
 ---
@@ -231,7 +230,7 @@ humans bail out (`ok_downgrade`).
 The two **friction** events — `review_rejected` and `step_completed` — carry an
 optional `attributed_kind` (`agent` | `skill` | `playbook`) + `attributed_name`
-(free string) pair (#217). They answer "_which_ component did this friction come
+(free string) pair. They answer "_which_ component did this friction come
 from", feeding the "friction attributed to a specific skill/agent" metric. They are
 set by the emitting prompts (Reviewer, Orchestrator), which list the valid roster and
 are instructed to **omit both when they can't confidently attribute** — a wrong guess
@@ -243,11 +242,11 @@ from its child `review_rejected` events, not collapsed into one fuzzy culprit.
 `attributed_name` is a **free string this phase, by design** (measure-then-decide):
 the CLI does not enforce a component registry. It rides the `internal-enum` axis (kept
 in both tiers) because it is the moat metric and a bounded roster name, not sensitive
-free text (D8) — it is the one non-numeric field that survives the `anonymous`
-projection. The aggregation script (#218) reports attribution coverage and flags names
+free text — it is the one non-numeric field that survives the `anonymous`
+projection. The aggregation script reports attribution coverage and flags names
 outside the known roster — that signal is the thermometer. If it shows the data quality
 is poor, the cheap next step is enum validation in `lemony emit`; an MCP-backed registry
-only if that proves insufficient. Historical lines (pre-#217) lack both fields, so
+only if that proves insufficient. Historical lines from earlier releases lack both fields, so
 coverage starts near 0% and ramps.
 ## Reader contract (forward-only, dispatch-on-read)
@@ -278,9 +277,9 @@ A writer (the `lemony emit` CLI):
 ## Out of scope (recorded for forward-design)
-- Tier 2 central ingestion, export, and aggregation (Fase 1+, decision #24).
+- Tier 2 central ingestion, export, and aggregation (a later phase).
 - The `project`-tier export branch: the axis table assigns the `identity` /
   `free-text` policy, but the sanitizer wires only the `anonymous` branch in v1
-  (D8/D9). The `project` branch lands with the consent/config work (#229).
+  The `project` branch lands with the consent/config work.
 - Backward-compatible field renames — `tier2-events-history.md` records them
   forward-only; readers add the alias when they care.

package/catalog/skills/a11y-audit/SKILL.md ADDED Viewed

@@ -0,0 +1,121 @@
+---
+name: a11y-audit
+description: The accessibility-QA review lens for the UI Designer — deterministic-first and tool-orchestrating. Measures contrast from tokens, rides the project's a11y lint, runs axe on the rendered DOM when testable, then judges only what tools can't (focus management, live regions, alt-text quality, reduced motion). Use at REVIEW when a task touched UI, alongside design-critique.
+origin: vendor
+vendor_version: '{{vendor_version}}'
+phase: post-implementation
+invoked-by: [ui-designer]
+---
+# Accessibility Audit
+Judge the **accessibility** of an implemented UI change against the WCAG target the
+`ui-handoff.md` §9 set (default WCAG 2.1 AA). This is the **objective, measurable** half of
+design QA — the subjective half (does it carry the design's point of view?) is the
+`design-critique` skill, run alongside this one. The build twin is the implementer's
+`build-ui/accessibility.md`: this lens checks what that resource asks the implementer to
+build accessible by construction. When you change an axis below, keep its build twin in step.
+The method is **deterministic-first and tool-orchestrating**: gather evidence with tools
+before you judge, and reserve your own judgment for what no tool can decide. A clear verdict
+must not depend on whether the optional tools are present — the tiers degrade gracefully.
+## The tiered method
+Run the tiers in order; each adds evidence the next builds on.
+- **T0 — token-pair contrast (always available, deterministic).** Run
+  `lemony design-tokens contrast`. It computes the WCAG ratio of every foreground/background
+  token pair — including any dark-mode override — and exits non-zero on a pair below its
+  floor. This is the one a11y measurement doable offline, because colour comes from the
+  token file. It is complementary to `lemony design-tokens validate` (validate proves colour
+  is a token reference; contrast proves the pair meets its floor).
+- **T1 — source a11y lint (rides the project).** The project's own linter usually carries an
+  accessibility plugin (`eslint-plugin-jsx-a11y`, Svelte's or Vue's a11y rules). Run the
+  project's `lint`; don't reimplement it. Read what it flags.
+- **T2 — axe on the rendered DOM (when testable).** If the repo can render components in a
+  test (vitest-axe / jest-axe, Storybook + axe, or Playwright + `@axe-core/playwright`), run
+  axe over the changed views and read the violations. Browser-driven is simply the strongest
+  variant of this tier — opportunistic, **not required**. If nothing can render the DOM, skip
+  T2 and say so in the report; the verdict still stands on T0/T1/T3.
+- **T3 — judgment (you).** Decide what the tools cannot: focus management on route and dialog
+  changes, live-region announcements, keyboard patterns for custom widgets, meaning not
+  carried by colour alone, **alt-text quality** (a tool sees that `alt` exists; only judgment
+  sees whether it is meaningful), and reduced-motion. T3 also triages and interprets the
+  T1/T2 output against the handoff — a lint rule a project disabled on purpose is not your
+  finding.
+## The axes — tier coverage plus what judgment adds
+Each axis names which tier catches it and what remains for T3.
+### Semantic HTML first
+Mostly T1/T2 (a clickable `<div>`, a skipped heading level, missing landmarks surface in
+lint/axe). T3 confirms the right element was used for the meaning, not just a passing rule.
+### Colour and contrast
+T0 measures it from tokens — the deterministic floor (text ≥ 4.5:1, large ≥ 3:1, non-text
+≥ 3:1), including dark mode. T3 checks the rendered pair where colour is _not_ from a token
+(an image-on-text overlay, a hardcoded value `validate` already flags).
+### Keyboard operability
+T2/axe catches missing focusability; T3 walks the actual order, confirms a **visible** focus
+indicator meets the non-text floor, that there is no keyboard trap, and that focus is managed
+on dynamic change (dialog open returns focus on close; route change moves it sensibly).
+### ARIA — only when semantics fall short
+T1/T2 flag invalid ARIA and `aria-hidden` on focusable content. T3 judges whether ARIA was
+reached for where a native element would have done, and whether state attributes
+(`aria-expanded`, `aria-selected`, `aria-current`) actually track the real state.
+### Forms
+T1/T2 catch an unlabelled field. T3 confirms errors are identified **in text** and linked to
+the field (not colour alone), and required/invalid state is in the accessibility tree.
+### Images, icons and media
+A tool sees that `alt` is present or absent; **T3 judges whether it is meaningful** — a
+decorative image is empty-`alt`, a meaningful one conveys its purpose, an icon-only control
+has an accessible name.
+### Motion and time
+Largely T3: is `prefers-reduced-motion` honoured for non-essential animation (mirrors the
+motion judgment in `design-critique`)? Nothing flashes more than three times a second; no
+tight time limit without a way to extend it.
+### Touch and pointer targets
+T2 can measure target size; T3 confirms interactive targets meet the 24×24 CSS px minimum
+(WCAG 2.2 SC 2.5.8, Level AA) — 44×44 is the WCAG 2.1 AAA enhanced size and the comfortable
+touch target — and adjacent targets are spaced so they aren't mis-tapped.
+## Confidence gating
+Raise a finding when a tool flags it, or when you are **>80% confident** of a T3 judgment
+issue. A measured T0/T1/T2 failure is not a confidence call — report it. Don't pad the report
+with rules the project deliberately disabled.
+## Report
+```
+## Accessibility Audit — <task name>
+**Target**: <WCAG level from ui-handoff §9>
+**T0 contrast**: ✅ pass / ❌ <failing pairs>
+**T1 a11y lint**: ✅ / ❌ <violations> / N/A
+**T2 axe (rendered DOM)**: ✅ / ❌ <violations> / skipped — <why>
+**T3 judgment**: <findings, or "none">
+**Verdict**: approve / changes requested — <one-line reason>
+```
+"Changes requested" routes back to the Implementer (transient — no label). If the handoff §9
+is silent on a genuinely ambiguous decision, raise a discovery (`raise-discovery`) rather
+than guessing. An independent defect unrelated to this change is a side finding
+(`note-side-finding`).

package/catalog/skills/bootstrap-architecture/SKILL.md CHANGED Viewed

@@ -1,6 +1,6 @@
 ---
 name: bootstrap-architecture
-description: Author the first docs/architecture.md — a holistic map of the system's current shape, fitted to this project. A one-time bootstrap the Architect runs when a client opts into the architecture capability; thereafter update-architecture maintains it incrementally. The harness only does this when the human asks (decision #8).
+description: Author the first docs/architecture.md — a holistic map of the system's current shape, fitted to this project. A one-time bootstrap the Architect runs when a client opts into the architecture capability; thereafter update-architecture maintains it incrementally. The harness only does this when the human asks.
 origin: vendor
 vendor_version: '{{vendor_version}}'
 invoked-by: [architect]
@@ -23,7 +23,7 @@ the file now existing) keeps it true incrementally.
 It is **not gated** — it is available before any `docs/architecture.md` exists, because its
 job is to create the first one. But the harness runs it **only when the human opts in**
 (via `/add-capability`, or an explicit request), never on its own: the vendor gives the
-framework, not the architecture (decision #8). The map is authored **from the client's real
+framework, not the architecture. The map is authored **from the client's real
 code**, so the harness reflects the project's shape — it never imposes one.
 ## What to produce
@@ -66,7 +66,7 @@ Return to your invoker: the sections written, a one-line summary of the shape th
 captures, and the explicit note that `update-architecture` should be activated via `repair`.
 If the project is too small or too uniform to have a meaningful architecture (a single-purpose
 script, a flat library), **say so and write nothing** — an architecture map for a project
-without architecture is noise, and #8 means the client need not have one.
+without architecture is noise, and the client need not have one.
 ## Uncontemplated Scenarios

package/catalog/skills/build-ui/SKILL.md ADDED Viewed

@@ -0,0 +1,147 @@
+---
+name: build-ui
+description: The implementer's method for building UI from a ui-handoff.md — apply the project's design tokens to whatever stack the repo uses, build without generic "AI slop", and ship accessible by construction. Stack-agnostic — it owns the process and the token contract, not per-framework guides. Use when implementing a task that touches UI and a ui-handoff.md exists.
+origin: vendor
+vendor_version: '{{vendor_version}}'
+phase: during-implementation
+invoked-by: [implementer]
+---
+# Build UI
+The implementer's **build method**: turn a `ui-handoff.md` (the design contract) plus
+`docs/design-tokens.json` (the token source of truth) into UI code that applies the
+project's tokens correctly, carries the design's point of view instead of generic
+defaults, and is accessible by construction.
+## What this skill owns — and what it deliberately doesn't
+It owns the **harness-stable layer**: the tokens-as-code contract, the build process,
+and the stack-agnostic principles. It does **not** reproduce per-framework know-how —
+how to theme Tailwind v4, configure MUI's `createTheme`, or wire vanilla-extract. You
+already know the stacks, and you read `package.json` to confirm which one this repo
+uses. A frozen per-stack guide would rot the moment the ecosystem moves (a Tailwind
+v3→v4 jump turns it into a lie); the live model does not. So this skill names the
+**substrate space** and lets you map any tool — present or future — onto it.
+## The process
+1. **Read the inputs.** `ui-handoff.md` (under `.claude/state/tasks/<id>/spec/`) is your
+   obligatory design input — the dials, screens, components, states, motion, a11y target
+   and microcopy. `docs/design-tokens.json` is the token source of truth (§11 of the
+   handoff points at it). Read both before writing UI.
+2. **Identify the substrate.** Read `package.json` and the styling config to place the
+   repo in one of the three archetypes below. The archetype, not the tool name, drives
+   how you apply tokens.
+3. **Apply tokens through the substrate.** Build the screens and components from the
+   handoff at the direction its dials set — referencing tokens, never raw values.
+4. **Pull the craft resources as you build.** Read [anti-slop.md](./anti-slop.md) before
+   generating any visual code, and [accessibility.md](./accessibility.md) when you build
+   anything interactive or visual. They load on demand — don't inline their content here.
+5. **Self-validate before signalling done.** Run `lemony design-tokens validate` (it flags
+   raw colours/dimensions in source that should reference a token) and re-read the
+   handoff's dials and targets against what you built.
+## Tokens-as-code — the contract
+`docs/design-tokens.json` is the **single, client-owned source of truth**, a 3-tier
+W3C-DTCG model: **primitive** (raw scale values) → **semantic** (intent: `color.surface`,
+`space.inset.md`) → **component** (a component's specific slots). Two rules hold on every
+stack:
+- **Reference semantic or component tokens; never primitives, never raw literals.** A
+  component reads `color.surface`, not `gray.50` and never `#fff`. Raw values in source
+  are what `design-tokens validate` catches.
+- **Dark mode (and any theme) is an override, not a fork.** Swap the values behind the
+  semantic layer; the component code stays identical. You never branch component logic on
+  the theme.
+## Substrate archetypes
+Identify which one the repo is, then apply tokens that way. Each snippet is
+**illustrative, not normative** — it names the space so you can map the actual tool.
+**1. CSS custom properties (runtime cascade).** Token → `--var` → `var()`. The cascade
+carries theming; an override re-declares the variable.
+```css
+:root {
+  --color-surface: #ffffff; /* from semantic token color.surface */
+  --space-inset-md: 12px;
+}
+[data-theme='dark'] {
+  --color-surface: #1a1a1a; /* dark mode = override the var, the .card below is untouched */
+}
+.card {
+  background: var(--color-surface);
+  padding: var(--space-inset-md);
+}
+```
+_Space: vanilla CSS, Sass, CSS Modules._
+**2. Central config/theme object a library consumes.** Tokens populate one theme object;
+components read it through the library's API (`className`, `sx`, `styled`), never literals.
+```ts
+export const theme = {
+  colors: { surface: tokens.color.surface },
+  space: { insetMd: tokens.space.inset.md },
+};
+// usage: <Box sx={{ bg: 'surface', p: 'insetMd' }} /> — the library resolves it, no hard-coded values
+```
+_Space: Tailwind (`config` / `@theme`), MUI / Chakra / Mantine (`createTheme`),
+styled-components / Emotion._
+**3. Compiled type-safe contract (zero-runtime CSS-in-TS).** Tokens become a typed
+contract resolved at build; a missing token is a compile error, not a silent fallback.
+```ts
+export const vars = createThemeContract({
+  color: { surface: null },
+  space: { insetMd: null },
+});
+export const light = createTheme(vars, {
+  color: { surface: tokens.color.surface },
+  space: { insetMd: tokens.space.inset.md },
+});
+// components consume `vars.color.surface`; the type system rejects an unknown token
+```
+_Space: vanilla-extract, Panda CSS, StyleX, Linaria._
+**Orthogonal axis — static-build ↔ dynamic-runtime theming.** Independent of the
+archetype: a single fixed theme can compile away entirely, while multi-tenant / white-label
+products need tokens to live as CSS variables or context so they switch at runtime. Pick the
+side the product needs; it cross-cuts all three archetypes.
+**Out of scope for now — native substrates.** React Native / Compose / SwiftUI map tokens
+to a native style object; recognised as a fourth archetype but not covered here yet.
+## When the handoff contradicts reality
+If building reveals that the `ui-handoff.md` contradicts the codebase, is missing a
+decision with more than one valid answer, or asks for something that already exists, **do
+not improvise** — raise it through the implementer's discovery channel so the designer or
+the human resolves it, then resume. An independent defect unrelated to your change is a
+side finding, not a blocker.
+## Cross-references
+```
+grill-ui → ui-handoff.md → implementer (build-ui) → REVIEW (design-critique + a11y-audit) → merge gate
+```
+You consume `ui-handoff.md`; the REVIEW design lens checks what you built against it. The
+`design-critique` skill mirrors [anti-slop.md](./anti-slop.md)'s principles and `a11y-audit`
+mirrors [accessibility.md](./accessibility.md)'s — when you change a principle here, keep its
+review twin in step.
+---
+See also:
+- [anti-slop.md](./anti-slop.md) — the craft layer: how to build the design's point of view
+  into the code instead of generic defaults.
+- [accessibility.md](./accessibility.md) — WCAG 2.1 AA implementation patterns.

package/catalog/skills/build-ui/accessibility.md ADDED Viewed

@@ -0,0 +1,101 @@
+# Accessibility — building to WCAG 2.1 AA
+Load this when you implement anything interactive or visual. The `ui-handoff.md` §9 sets
+the **target** (default WCAG 2.1 AA) and any task-specific decisions; this resource is the
+**implementation** layer — how to build it accessible by construction, so the `a11y-audit`
+review lens finds nothing to reject. Accessibility is not a pass you bolt on at the end; it
+is a property of how the markup, focus, colour and motion are built.
+## Semantic HTML first
+The fastest path to accessible UI is the right element. A native `<button>`, `<a>`,
+`<label>`, `<nav>`, `<main>`, `<ul>` carries role, keyboard behaviour and focus for free.
+- **Use the element that means what you mean.** A clickable `<div>` is a bug — it has no
+  role, no keyboard handler, no focusability. Use `<button>` for actions, `<a href>` for
+  navigation.
+- **One `<h1>` per view, headings in order.** Don't skip levels for visual sizing — size
+  with tokens, structure with heading level. Screen-reader users navigate by heading.
+- **Landmarks frame the page.** `<header>`, `<nav>`, `<main>`, `<footer>` give assistive
+  tech a map. Reach for ARIA roles only to fill a gap native elements can't.
+## Colour and contrast
+This is the objective floor — measured, not judged.
+- **Text contrast ≥ 4.5:1** against its background; **large text (≥ 24px, or ≥ 19px bold)
+  ≥ 3:1**.
+- **Non-text contrast ≥ 3:1** for UI component boundaries, icons that carry meaning, and
+  the visible focus indicator.
+- **Never encode meaning in colour alone.** Pair colour with an icon, label or pattern — a
+  red border needs an error message, a green dot needs a "online" label.
+- Because colour comes from semantic tokens, contrast is a property of the token pairing;
+  check the actual rendered pair, including in dark mode.
+## Keyboard operability
+Everything that works with a mouse must work with a keyboard alone.
+- **All interactive elements are reachable and operable by keyboard**, in a logical Tab
+  order that follows the visual/reading order. Native elements give you this; custom
+  widgets need `tabindex` and key handlers.
+- **A visible focus indicator on every focusable element** — never `outline: none` without
+  a clearly visible replacement that meets the 3:1 non-text contrast floor.
+- **No keyboard traps.** Focus can always move on; if you trap it deliberately (a modal),
+  provide the documented way out (Escape) and restore focus on close.
+- **Manage focus on dynamic change.** Opening a dialog moves focus into it and contains it;
+  closing returns focus to the trigger. Route changes move focus to a sensible anchor.
+- **Honour expected keys** for the widget pattern (Enter/Space activate, arrows move within
+  a composite like tabs or a menu, Escape dismisses).
+## ARIA — only when semantics fall short
+The first rule of ARIA is don't use ARIA when a native element would do. When you do need
+it:
+- **Name every control.** A visible `<label>` (associated via `for`/`id`), or
+  `aria-label` / `aria-labelledby` when there is no visible text (icon-only buttons).
+- **Reflect state, don't fake it.** `aria-expanded`, `aria-selected`, `aria-checked`,
+  `aria-disabled`, `aria-current` must track the real state.
+- **Announce async change with live regions.** A status message, toast, or validation
+  result that appears without a focus change needs `aria-live` (`polite` for status,
+  `assertive` only for errors) so it is announced.
+- **Don't break the accessibility tree.** `aria-hidden` on focusable content, redundant
+  roles, or mislabelled landmarks are worse than no ARIA.
+## Forms
+- **Every field has a programmatically-associated label** — not a placeholder standing in
+  for one (placeholders vanish on input and fail contrast).
+- **Errors are identified in text, linked to the field** (`aria-describedby`), and not by
+  colour alone. Group related fields with `<fieldset>`/`<legend>`.
+- **Required and invalid states are conveyed in the accessibility tree** (`aria-required`,
+  `aria-invalid`), not only visually.
+## Images, icons and media
+- **Meaningful images need a text alternative** (`alt`) that conveys their purpose;
+  **decorative images get empty `alt=""`** so they are skipped.
+- **Icon-only controls need an accessible name** (see ARIA above); a standalone meaningful
+  icon needs a text equivalent.
+## Motion and time
+- **Respect `prefers-reduced-motion`** — provide a reduced or no-motion path for non-essential
+  animation. This mirrors the motion guidance in [anti-slop.md](./anti-slop.md).
+- **No content that flashes more than three times per second.**
+- **Don't impose tight time limits**; if one exists, let the user extend or disable it.
+## Touch and pointer targets
+- **Interactive targets are large enough to hit** — a minimum of 24×24 CSS px (WCAG 2.2 AA,
+  SC 2.5.8); 44×44 is the WCAG 2.1 AAA enhanced size and the comfortable touch target. Space
+  adjacent targets so they aren't mis-tapped.
+## Self-check before done
+Walk the change with these, matching the `a11y-audit` review lens so nothing comes back:
+keyboard-only traversal reaches and operates everything with visible focus; measured
+contrast meets the floor (including dark mode); every control has an accessible name; async
+updates announce; reduced-motion is honoured; targets meet the size floor. Where a decision
+is genuinely ambiguous, it belonged in the handoff §9 — raise it rather than guess.