start-vibing 4.3.0 → 4.3.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,7 +1,7 @@
  {
  "name": "start-vibing",
- "version": "4.3.0",
- "description": "Setup Claude Code with 9 plugins, 6 community skills, and 8 MCP servers. Parallel install, auto-accept, superpowers + ralph-loop. super-design 0.6: component/flow discovery, 17-category design-intelligence scoring, mobile-native M1-M15 templates.",
+ "version": "4.3.1",
+ "description": "Setup Claude Code with 9 plugins, 6 community skills, and 8 MCP servers. Parallel install, auto-accept, superpowers + ralph-loop. super-design 0.6.1: compass-artifact alignment — WCAG 2.2 SCs, CrUX field data, tool triangulation, Baymard sub-rules, Doherty/Tesler/Postel rationale, atomic state write, unshallow ladder.",
  "type": "module",
  "bin": {
  "start-vibing": "./dist/cli.js"
@@ -150,6 +150,22 @@ Phase E — Form state coverage
  all valid, simulated 500, offline, paste into password, autocomplete
  tokens, Tab order vs visual order, Enter submits, mobile input zoom
  (font-size < 16px on iOS Safari).
+
+ Postel's Law robustness check (artifact Part 1 law table; "be liberal in
+ what you accept, conservative in what you send"). Per text/tel/email/date
+ input, verify the field is liberal on input:
+ - trims leading/trailing whitespace before validation;
+ - accepts the common format variants users actually type (phone:
+ "+55 (11) 9 9999-9999", "5511999999999", "11999999999"; date:
+ "2026-04-19", "19/04/2026", "Apr 19 2026"; email: case-insensitive
+ local-part where the provider allows it);
+ - accepts pasted values with mixed whitespace / soft hyphens / unicode
+ thin spaces without rejecting.
+ And strict on output: the value submitted to the backend and the value
+ re-rendered to the user are canonicalized (E.164 phone, ISO-8601 date,
+ trimmed). Record any field that rejects a legitimate variant a
+ reasonable user would type under finding code `form-postel-<slug>`
+ (severity MEDIUM unless it blocks the primary conversion → HIGH).
  ```
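The liberal-in / strict-out rule above can be sketched as a pair of canonicalizers. This is a minimal illustration, not the package's code: the helper names, the Brazilian country-code default, and the variant coverage are all assumptions matching the examples listed.

```javascript
// Liberal on input, strict on output (Postel). Illustrative helpers only.
function canonicalizePhone(raw, countryCode = "55") {
  // Liberal: strip whitespace (incl. unicode thin space), soft hyphens,
  // parens, dots, dashes, and a leading + before validating.
  const digits = raw.replace(/[\s\u2009\u00AD().\-+]/g, "");
  // Strict: emit E.164, prepending the country code when the user omitted it.
  const national = digits.startsWith(countryCode)
    ? digits.slice(countryCode.length)
    : digits;
  return `+${countryCode}${national}`;
}

function canonicalizeDate(raw) {
  const s = raw.trim();
  if (/^\d{4}-\d{2}-\d{2}$/.test(s)) return s;        // already ISO-8601
  const dmy = s.match(/^(\d{2})\/(\d{2})\/(\d{4})$/);  // "19/04/2026"
  if (dmy) return `${dmy[3]}-${dmy[2]}-${dmy[1]}`;
  const d = new Date(s);                               // "Apr 19 2026" etc.
  // Caveat: Date parses in local time; in UTC+ timezones toISOString can
  // shift a day — a real implementation should parse in a fixed zone.
  return isNaN(d) ? null : d.toISOString().slice(0, 10);
}
```

Paste tolerance falls out of the strip step: `"+55 (11) 9 9999-9999"`, `"5511999999999"`, and `"11999999999"` all canonicalize to the same E.164 value.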
 
  **Budget rule:** On large apps, cap to top 5 triggers per page (ranked by
@@ -172,11 +188,79 @@ For each of 10 heuristics (methodology §1), work audit questions. Score 0–4 v
  ### 3c. WCAG 2.2 AA manual pass
  Items NOT covered by axe (methodology §2.3): keyboard traps, focus-order-matches-visual-order, `:focus-visible` quality, reflow at 320px, text-spacing override, `prefers-reduced-motion`, alt text quality, link/button text adequacy.
 
+ **WCAG 2.2 new Success Criteria — explicit checks (finding code prefix `a11y-wcag22-<sc>`):**
+
+ - **2.4.11 Focus Not Obscured (Minimum) — AA** → `a11y-wcag22-2.4.11`. Tab/Shift+Tab through every page at all 3 viewports; verify every focused control is at least partly visible. Common fail: sticky headers/footers (`position:fixed`) covering focused links/inputs. Fix pattern: `html { scroll-padding-top: <header-h>; scroll-padding-bottom: <footer-h>; }`.
+ - **2.5.7 Dragging Movements — AA** → `a11y-wcag22-2.5.7`. Enumerate every `draggable="true"`, drag-to-reorder list, range slider, kanban column, canvas drag. Each must expose a single-pointer non-dragging alternative (up/down buttons, ± steppers, numeric input, menu action). Keyboard-only is NOT sufficient (touch-only users).
+ - **3.2.6 Consistent Help — A** → `a11y-wcag22-3.2.6`. If help mechanisms exist (contact, chat, self-help link, support email), verify they appear in the same relative DOM order across every page they occur on. Record snapshot quote per page, diff order, file finding if inconsistent.
+ - **3.3.7 Redundant Entry — A** → `a11y-wcag22-3.3.7`. In any multi-step process (checkout, onboarding, registration), verify information previously entered is auto-populated or available for selection (e.g., "billing same as shipping" prefilled). Exceptions: essential re-entry (password confirmation), security-related, stale data. Browser autocomplete does not satisfy — the site must provide the value.
+ - **3.3.8 Accessible Authentication (Minimum) — AA** → `a11y-wcag22-3.3.8`. On every auth surface (login, re-auth, 2FA, password reset), confirm no cognitive-function test (memorize password, transcribe OTP, puzzle CAPTCHA) is required unless an alternative exists (passkey, magic link), a mechanism helps (paste allowed + `autocomplete="username | current-password | one-time-code"`), or an object/personal-content exception applies. Fail pattern: `onpaste="return false"` or `autocomplete="off"` on password.
+ - **3.3.9 Accessible Authentication (Enhanced) — AAA** → `a11y-wcag22-3.3.9` (advisory only, not required for AA audits). Flag as an advisory finding when AA passes only via the object-recognition or personal-content exception (e.g., "select all crosswalks" CAPTCHA). Passkeys / WebAuthn / biometrics / magic links clear this bar.
+
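The 2.4.11 geometry check above reduces to box arithmetic. A hedged sketch — names are illustrative; in practice you would run this inside `page.evaluate()` with `getBoundingClientRect()` values for the focused element and every fixed/sticky bar:

```javascript
// Fraction of the focused element left visible after subtracting the areas
// covered by fixed/sticky obstructions. Rects are {left, top, right, bottom}.
function visibleFraction(focus, obstructions) {
  const area = (r) =>
    Math.max(0, r.right - r.left) * Math.max(0, r.bottom - r.top);
  const overlap = (a, b) => area({
    left: Math.max(a.left, b.left),
    top: Math.max(a.top, b.top),
    right: Math.min(a.right, b.right),
    bottom: Math.min(a.bottom, b.bottom),
  });
  const covered = obstructions.reduce((sum, o) => sum + overlap(focus, o), 0);
  // Approximation: assumes the obstructions do not overlap each other.
  return area(focus) === 0 ? 0 : Math.max(0, 1 - covered / area(focus));
}
// SC 2.4.11 (Minimum) requires the focused control to stay at least partly
// visible, i.e. visibleFraction(...) > 0. A result of 0 is a finding.
```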
  ### 3d. Baymard (if e-commerce detected)
  If `package.json` has stripe/shopify/medusajs/saleor OR routes include /checkout /cart /products: checkout-flow + form-design + filter + PDP checklist (methodology §3).
 
+ ### 3e.0 Phase 0 — CrUX field data (MUST run before 3e synthetic lab audit)
+
+ Lab numbers (Lighthouse / Playwright / web-vitals injected at audit time)
+ are deterministic but reflect a single throttled machine. Google ranks on
+ **field** data — Chrome User Experience Report (CrUX), 28-day p75 over real
+ users. A site can score 95 in the lab and "Poor" in field due to device
+ diversity. Field is authoritative; lab is indicative only.
+
+ Before the synthetic pass in 3e, fetch CrUX for the origin (and for each
+ templated page type if a key is configured):
+
+ ```bash
+ # CrUX API (Chrome UX Report) — requires $CRUX_KEY or PageSpeed Insights key
+ curl -s "https://chromeuxreport.googleapis.com/v1/records:queryOrigin?key=$CRUX_KEY" \
+ -H 'Content-Type: application/json' \
+ -d "{\"origin\":\"<site-origin>\",\"formFactor\":\"PHONE\"}" \
+ > "$SESSION_DIR/vitals/crux_origin_mobile.json"
+
+ curl -s "https://chromeuxreport.googleapis.com/v1/records:queryOrigin?key=$CRUX_KEY" \
+ -H 'Content-Type: application/json' \
+ -d "{\"origin\":\"<site-origin>\",\"formFactor\":\"DESKTOP\"}" \
+ > "$SESSION_DIR/vitals/crux_origin_desktop.json"
+
+ # Optional: per-URL record (only if URL has sufficient traffic)
+ curl -s "https://chromeuxreport.googleapis.com/v1/records:queryRecord?key=$CRUX_KEY" \
+ -H 'Content-Type: application/json' \
+ -d "{\"url\":\"<full-url>\",\"formFactor\":\"PHONE\"}" \
+ > "$SESSION_DIR/vitals/crux_<slug>_mobile.json"
+ ```
+
+ Capture field `p75` for LCP / INP / CLS (and FCP / TTFB when present).
+ Outcomes:
+ - **CrUX present + sufficient traffic** → field values are the verdict; lab
+ values annotate drill-down only.
+ - **CrUX absent or insufficient traffic** → record the gap, fall back to
+ lab, and tag every performance finding as `source: "lab"`.
+
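Reading the p75 values back out of the saved responses is a one-liner per metric. A minimal sketch, assuming the CrUX API response shape (`record.metrics.<metric>.percentiles.p75`); a `null` return models the "absent or insufficient traffic" branch above:

```javascript
// Extract a field p75 from a saved CrUX record, or null when the metric is
// missing. Note: CrUX serves some percentiles (e.g. CLS) as strings, so
// coerce to Number before comparing against thresholds.
function cruxP75(record, metric) {
  const m = record?.record?.metrics?.[metric];
  if (!m || m.percentiles?.p75 === undefined) return null; // no field data
  return Number(m.percentiles.p75);
}
```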
  ### 3e. Core Web Vitals
- Parse `session_dir/vitals/<page>.json`. LCP/INP/CLS/FCP/TTFB/TBT against thresholds (methodology §4). Doherty: interactions <400ms feedback.
+ Parse `session_dir/vitals/<page>.json` (lab) AND `crux_*_mobile.json` /
+ `crux_*_desktop.json` (field). LCP/INP/CLS/FCP/TTFB/TBT against thresholds
+ (methodology §4). Doherty: interactions <400ms feedback.
+
+ **Tag every performance finding with a `source` field:**
+
+ ```json
+ {
+ "rule": "cwv-lcp",
+ "source": "lab | field | both",
+ "lab_value_ms": 3200,
+ "field_p75_ms": 4100,
+ "field_sample": "CrUX 28-day p75, PHONE",
+ "verdict": "needs-improvement"
+ }
+ ```
+
+ - `source: "both"` when lab + CrUX agree → highest confidence; proceed to fix.
+ - `source: "field"` when CrUX fails but lab passes → real users hit it; the regression is real.
+ - `source: "lab"` when CrUX is absent / insufficient → note the gap and
+ surface as "unverified by field data" in the executive summary.
+ - If lab and field disagree by > 30%, file a meta-finding
+ (`rule: perf-lab-field-divergence`) with both numbers and `source: "both"`.
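The tagging ladder above can be sketched as a single classifier — function and field names are illustrative, not part of the package:

```javascript
// Classify a performance finding's source and flag lab/field divergence.
// Divergence is relative to the larger of the two values; > 30% triggers
// the perf-lab-field-divergence meta-finding described above.
function perfSource(labValue, fieldP75) {
  if (fieldP75 == null) return { source: "lab", note: "unverified by field data" };
  if (labValue == null) return { source: "field" };
  const divergence =
    Math.abs(labValue - fieldP75) / Math.max(labValue, fieldP75);
  return {
    source: "both",
    metaFinding: divergence > 0.3 ? "perf-lab-field-divergence" : null,
  };
}
```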
 
  ### 3f. Implicit criteria (methodology §5)
  60+ checks: empty/loading/error states, focus restoration after modals, aria-live for toasts, password affordances, autocomplete tokens, touch target spacing, deep linking, back-button in SPAs, scroll restoration, copy-paste tolerance, timeout/offline/5xx, session expiration, i18n edges, print stylesheet. pass/fail/n-a with evidence.
@@ -265,6 +349,37 @@ mobile-native. Run the 21-item checklist verbatim against each mobile page:
  Each failed item → finding with `rule: mobile-pattern-M<N>`, evidence from
  Step 2.5 artifacts (NOT a fresh snapshot), `template_id: M<N>`.
 
+ **Real-device vs emulation disclaimer (MANDATORY).** Playwright MCP drives
+ Blink/Chromium in a resized viewport — it is NOT real iOS Safari (WebKit),
+ Android Chrome on a low-end device, or any in-app WebView. Emulation can
+ confirm layout, DOM, a11y tree, tab order, reduced-motion / forced-colors,
+ computed CSS. It CANNOT confirm touch haptics, iOS safe-area rendering,
+ iOS Safari font rasterization, PWA install/add-to-home-screen, iOS keyboard
+ overlap via `visualViewport`, viewport-zoom quirks under pinch, Samsung
+ Internet auto-dark, real Pointer Event latency, or hover-only fallbacks on
+ real touch (iOS "sticky hover"). See methodology §9 for the full list.
+
+ Any mobile finding whose verdict would require iOS Safari or Android Chrome
+ on a real device to confirm — touch haptics, iOS safe-area insets, PWA
+ install, pinch-zoom quirks, `@media (hover: hover)` behavior on real touch,
+ payment sheet (Apple Pay / Google Pay), biometrics, push — MUST be tagged
+ in the finding JSON as:
+
+ ```json
+ {
+ "category": "real-device-required",
+ "real_device_required": true,
+ "emulation_verdict": "likely_fail | likely_pass | indeterminate",
+ "requires": ["ios-safari", "android-chrome"],
+ "rationale": "Playwright runs Blink; iOS Safari uses WebKit; cannot confirm X on emulation."
+ }
+ ```
+
+ `sd-synthesis` MUST surface a "real-device verification needed" banner at
+ the top of the executive summary listing every finding where
+ `real_device_required=true`, grouped by `requires` platform, so the human
+ reviewer books a BrowserStack / Sauce / LambdaTest session before sign-off.
+
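The banner grouping sd-synthesis performs can be sketched as follows — the findings shape is assumed from the JSON above, and the function name is illustrative:

```javascript
// Bucket real_device_required findings by required platform so the banner
// can list "ios-safari: …" / "android-chrome: …" groups.
function realDeviceBanner(findings) {
  const byPlatform = {};
  for (const f of findings) {
    if (!f.real_device_required) continue;
    for (const platform of f.requires ?? []) {
      (byPlatform[platform] ??= []).push(f.rule ?? f.category);
    }
  }
  return byPlatform;
}
```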
  Cross-reference the competitor component vocabulary from
  `.cache/evidence/component-comparison.md` — if every competitor uses bottom
  tabs on mobile and the product uses hamburger-only, density score drops AND
@@ -328,7 +443,11 @@ this to produce the executive DIS summary.
  document.head.appendChild(s);
  });
  window.__axe = await window.axe.run(document, {
- runOnly: { type: 'tag', values: ['wcag2a','wcag2aa','wcag21a','wcag21aa','wcag22aa','best-practice'] }
+ // WCAG 2.2 rules (e.g., focus-not-obscured) ship behind axe-core's
+ // 'experimental' tag — without that tag in runOnly, SC 2.4.11 / 2.5.7 /
+ // 2.5.8 etc. simply will NOT execute. Always include it for super-design
+ // audits.
+ runOnly: { type: 'tag', values: ['wcag2a','wcag2aa','wcag21a','wcag21aa','wcag22aa','best-practice','experimental'] }
  });
  })();
  ```
@@ -195,6 +195,17 @@ Source of truth: `references/fix-agent-playbook.md` §7.
  - Swap color used in >5 files → needs_human (too broad for single fix)
  - Convert design token itself → MEDIUM, escalate
 
+ **Color-space rule (V4 and any new token):** When emitting new color tokens
+ (V4 snap-to-nearest and any fresh tokens proposed by V-templates), express
+ them in **OKLCH** — the perceptually uniform color space used by modern
+ design systems (Tailwind v4, shadcn 2024+, Radix Colors). Hex / RGB are
+ accepted ONLY when they match the existing codebase convention (e.g., the
+ project's `tokens.css` / `globals.css` already defines all colors as hex).
+ Mixing OKLCH tokens into a hex-only codebase requires a separate
+ token-migration finding and is `needs_human`. Format:
+ `--color-primary-500: oklch(0.65 0.20 265);` (lightness 0-1, chroma 0+,
+ hue 0-360).
+
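The convention check this rule implies can be sketched with two regex counts. A hedged illustration — the regexes, thresholds, and return labels are assumptions, not the playbook's actual detector:

```javascript
// Decide which color notation a token file already uses before emitting
// new tokens. `css` is the concatenated contents of tokens.css/globals.css.
function colorConvention(css) {
  const hex = (css.match(/#[0-9a-fA-F]{3,8}\b/g) ?? []).length;
  const oklch = (css.match(/\boklch\(/g) ?? []).length;
  if (oklch === 0 && hex > 0) return "hex-only"; // new OKLCH token → needs_human
  if (hex === 0 && oklch > 0) return "oklch";    // safe to emit oklch(L C H)
  return "mixed-or-empty";                       // escalate per the rule above
}
```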
  ## ux templates (U1–U10)
 
  | ID | Fix |
@@ -22,8 +22,29 @@ Output: exactly one file `docs/super-design/market-analysis.md` + evidence under
 
  3. **Detect niche.** Apply 8-signal scoring (playbook §1). Confidence = top / (top + second). If <0.55, use AskUserQuestion with 3 options from top verticals. Record reasoning to `.cache/evidence/niche.md`.
 
+ **Regulated-niche always-confirm rule.** In regulated niches, compliance-driven design choices override aesthetic preference, so always confirm. If the detected niche falls into any of the following — **fintech, healthtech, legaltech, gambling, crypto, insurance, children's-app** — ALWAYS fire `AskUserQuestion` to confirm niche + regulatory scope even when detector confidence is ≥0.95. These niches carry compliance implications (SOC2, HIPAA, PCI-DSS, GDPR, PSD2, COPPA, KYC/AML, age-gating, disclosure-mandated copy) that design directly affects — getting the niche wrong wastes the audit. Record the confirmation (selected scope, applicable regulations) to `.cache/evidence/niche.md` under `regulatory_scope:`.
+
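The confidence formula from step 3 and the always-confirm rule combine into one gate. A sketch under stated assumptions — the `scores` shape (niche → signal score) and function name are illustrative:

```javascript
// Ask the user when detector confidence is low OR the top niche is regulated.
const REGULATED = new Set(["fintech", "healthtech", "legaltech", "gambling",
  "crypto", "insurance", "children's-app"]);

function mustAskUser(scores) {
  const [top, second = 0] = Object.values(scores).sort((a, b) => b - a);
  const topNiche = Object.keys(scores)
    .reduce((a, b) => (scores[a] >= scores[b] ? a : b));
  const confidence = top / (top + second); // Confidence = top / (top + second)
  return confidence < 0.55 || REGULATED.has(topNiche);
}
```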
  4. **Discover competitors.** 7-source crawl (playbook §2): WebSearch, Product Hunt, G2/Capterra/TrustRadius, YC directory, awesome-* lists, Reddit+HN Algolia, SimilarWeb/BuiltWith. Dedupe by domain. Rank fame × similarity × design-signal. Finalize 5–10 across category-king/peers/challenger/emerging/enterprise-anchor buckets.
 
+ **4a. Neumeier insertion test during discovery (per candidate).** For every candidate competitor considered for the final 5–10, apply Neumeier's insertion test (playbook §5.4): *"If this competitor's brand mark were swapped with the project's, would users notice?"* Score each on a 0–5 scale:
+
+ | Score | Meaning |
+ |---|---|
+ | 0 | Fully swappable — no brand equity, pure commodity visual language |
+ | 1 | Mostly swappable — generic category codes only |
+ | 2–3 | Partially distinct — some ownable elements but weak |
+ | 4 | Strong distinct identity — clear ownable signals |
+ | 5 | Instantly distinct — singular, unmistakable brand mark |
+
+ Competitors scoring ≤1 are **commodity benchmarks** (show what the category looks like by default); competitors scoring ≥4 **reveal defensible territory** (show what ownable positioning looks like). Include a healthy mix of both. Record the score and one-line justification per competitor in `market-analysis.md` (competitor table) and the per-competitor row in `.cache/evidence/<slug>/component-catalog.md` under a new `Insertion-test score:` field.
+
+ **4b. Vibe-quadrant final gate (self-check before step 5).** Before moving to step 5, plot the project draft position and each finalized competitor on a 2-axis vibe quadrant:
+
+ - **X axis:** serious ↔ playful
+ - **Y axis:** minimal ↔ expressive
+
+ If the project lands in the **same quadrant as ≥3 competitors**, surface a warning in `market-analysis.md` under a `## Positioning risk` section — exact text: `crowded quadrant — positioning risk` — and recommend **one axis shift** (per Kapferer prism §4.3 / Aaker dimensions §4.2) that would move the project into a less-occupied quadrant. **Do not auto-decide the shift**; document the warning and the recommended axis for synthesis (step 8) to reconcile with the user. Save the quadrant plot data (project + competitor coordinates) to `.cache/evidence/vibe-quadrant.md`.
+
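The quadrant gate is mechanical once coordinates exist. A minimal sketch, assuming positions on [-1, 1] per axis (x < 0 serious / x > 0 playful; y < 0 minimal / y > 0 expressive):

```javascript
// Name a point's quadrant, then count competitors sharing the project's.
function quadrant(p) {
  return `${p.x >= 0 ? "playful" : "serious"}/${p.y >= 0 ? "expressive" : "minimal"}`;
}

function positioningRisk(project, competitors) {
  const same = competitors.filter((c) => quadrant(c) === quadrant(project)).length;
  // ≥3 competitors in the project's quadrant → crowded, per the gate above.
  return same >= 3 ? "crowded quadrant — positioning risk" : null;
}
```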
  5. **Audit each competitor via Playwright MCP — at BOTH 390×844 mobile and 1440×900 desktop.** Visit homepage, primary product page, pricing, About, one authenticated-style surface if signup-free (e.g., docs, app tour). Per playbook §3 PLUS component-level extraction per §3bis below. Save to `.cache/evidence/<slug>/<viewport>/`.
 
  ### §3bis. Component-level extraction (mandatory, not optional)
@@ -97,11 +118,37 @@ and sd-fix use to recommend aesthetic direction.
 
  6. **Classify each.** Archetype (§4.1), Aaker peak (§4.2), vibe class, NN/g 4D tone (§7.1), hero-pattern.
 
+ **6a. Voice/tone capture — 8–12 copy sample rule (mandatory).** For each competitor, collect **8–12 distinct copy samples** (≥8 minimum; fewer = insufficient signal), one per surface where available:
+
+ - Hero headline
+ - Primary CTA label
+ - Error message
+ - Empty state
+ - 404 page
+ - Onboarding step 1
+ - Pricing caption / plan blurb
+ - Footer blurb
+ - ToS / legal excerpt
+ - (Optional extras: subhead, feature card, support article opener, confirmation toast)
+
+ Grade **each sample** on the NN/g 4D tone dimensions (playbook §7.1) using integers in {−1, 0, +1}:
+
+ - formal ↔ casual
+ - funny ↔ serious
+ - respectful ↔ irreverent
+ - enthusiastic ↔ matter-of-fact
+
+ Report per-sample scores + verbatim quote + source URL in `.cache/evidence/<slug>/copy-samples.md`, and the **mean + variance** per axis in `market-analysis.md` (tone row per competitor). Healthy brands are constant on voice, variable on tone.
+
+ **Insufficient-signal rule.** If fewer than 8 distinct samples can be collected (static site, gated app, locale blockers), **do not compute a tone profile** — flag the competitor as `tone-inconclusive` in `market-analysis.md` with a note listing which surfaces were missing. Never fabricate samples or scores to reach the threshold.
+
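The mean + variance aggregation and the <8 guard can be sketched as one function — the axis keys and sample shape are illustrative assumptions:

```javascript
// Per-axis mean and (population) variance over integer scores in {-1, 0, +1}.
// Fewer than 8 samples → tone-inconclusive, never a fabricated profile.
const AXES = ["formality", "humor", "respect", "enthusiasm"];

function toneProfile(samples) {
  if (samples.length < 8) return { status: "tone-inconclusive" };
  const profile = {};
  for (const axis of AXES) {
    const xs = samples.map((s) => s[axis]);
    const mean = xs.reduce((a, b) => a + b, 0) / xs.length;
    const variance = xs.reduce((a, x) => a + (x - mean) ** 2, 0) / xs.length;
    profile[axis] = { mean, variance };
  }
  return { status: "ok", profile };
}
```

Low variance on an axis across surfaces is the "constant on voice" signal; high variance on, say, humor between the 404 page and the ToS is the expected "variable on tone".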
  7. **Build category-code matrix.** Tabulate dimensions (§5.1). Frequency per column. Classify codes obey/extend/subvert/open (§5.2).
 
- 8. **Synthesize.** Archetype in whitespace via Neumeier insertion test (§5.4). Palette, typography, tone, audience, JTBD. Draft onliness statement.
+ 8. **Synthesize.** Archetype in whitespace via Neumeier insertion test (§5.4). Palette, typography, tone, audience, JTBD.
+
+ 8b. **Three-territories pitch (Q7 — Part 7 of `docs/compass_artifact_wf-2e33af6e-127f-402e-8ce6-cb506fc91b94_text_markdown.md` lines 515–519, 652–653).** Before drafting the onliness statement, produce THREE parallel variants of the design direction — **safe** (conforms to category codes), **expected** (the obvious evolution of category codes), **edgy** (the considered provocation / subversion). Each variant MUST include: palette strip (3–6 tokens), type specimen (primary + optional display), motion character (duration + easing archetype), one-line rationale tying back to archetype + category-code matrix from step 7. Build them in parallel — never serialize, or you will anchor to the first. Save to `.cache/evidence/territories/{safe,expected,edgy}.md` and include a summary table in the brief. The user chooses the primary territory (optionally stealing one detail from another) BEFORE the onliness statement lands. "Presenting one direction looks like opinion; presenting three looks like strategy" (artifact line 519).
 
- 9. **Write `market-analysis.md`** per playbook §8 schema.
+ 9. **Draft onliness statement** against the chosen territory, then **write `market-analysis.md`** per playbook §8 schema (include the three-territories summary + chosen primary).
 
  10. **Self-check.** Fix gaps before returning.
 
@@ -8,7 +8,7 @@ description: >
  UX audit (WCAG 2.2 AA, Nielsen heuristics, Baymard, CWV), and synthesized
  overview. Re-audits only what changed since last run. On explicit user request,
  applies surgical fixes with full rollback.
- version: 0.6.0
+ version: 0.6.1
  ---
 
  # super-design
@@ -86,9 +86,14 @@ Pass findings via files under `.super-design/sessions/<id>/`, not chat.
 
  ### Step 4: Write state + history
 
- - Atomic write `.audit-state.json` (.tmp then rename).
+ - Atomic write `.audit-state.json` via `scripts/write-state.sh` (takes JSON
+ on stdin, writes `.tmp`, validates with `jq`, then renames). Do NOT write
+ the state file directly.
  - Append session to `audit-history.md`.
  - `git notes --ref=super-design add -f -m <json> HEAD`.
+ - First-time notes setup (run once per clone, also in `setup-git-notes.sh`
+ if you extract it): `git config --add remote.origin.fetch '+refs/notes/super-design:refs/notes/super-design'`
+ — without this, notes don't round-trip across clones (artifact §7).
 
  ### Step 5: Return summary (≤5 sentences)
 
@@ -103,6 +108,7 @@ Do NOT paste overview into chat.
  - `--fix` — run sd-fix after audit
  - `--dry-run` — artifacts without committing state
  - `--ci` — non-interactive, create PR, exit non-zero on blockers
+ - `--update-baselines` — re-hash pages and tokens without re-auditing (use after accepted cosmetic drift)
 
  ## References (Read on demand)
 
@@ -187,6 +187,66 @@ The 2026 WebAIM Million report (released ~30 March 2026, https://webaim.org/proj
 
  **What automation CANNOT catch** (the manual 40–70%): keyboard trap detection beyond trivial cases; screen-reader announcement quality; meaningful alt text (tools detect missing, not quality); logical focus order when CSS reorders; error message clarity; appropriate heading structure (presence vs meaning); context-appropriate link text; video caption accuracy/sync; color meaning (1.4.1); reading order under CSS transforms; reflow at 320px (no tool); text-spacing override; focus-indicator quality (2.4.11 / 2.4.13); `prefers-reduced-motion` honoring; target size / spacing (partial); autocomplete correctness; captcha alternatives; content-on-hover persistence (1.4.13); page title descriptiveness; consistent navigation/identification (3.2.3 / 3.2.4).
 
+ ### 2.5 Tool triangulation for automated a11y (MANDATORY)
+
+ axe-core alone catches only ~57% of WCAG issues by volume (Deque's own
+ coverage study) and ~40% by Success Criterion count. Running a single
+ engine guarantees blind spots. sd-audit MUST run axe **plus** at least two
+ more engines and dedupe the merged results by `{rule, selector}`.
+
+ **Engines to run in parallel (don't serialize — they're cheap):**
+
+ ```bash
+ # 1. axe-core — inject via Playwright MCP (already done in sd-audit Step 2,
+ # with the 'experimental' tag in runOnly so WCAG 2.2 rules fire).
+
+ # 2. Pa11y — runs both htmlcs and axe runners for broader coverage
+ pa11y "$URL" \
+ --standard WCAG2AA \
+ --runner axe \
+ --runner htmlcs \
+ --reporter json \
+ > "$SESSION_DIR/a11y/pa11y_<slug>_<vp>.json"
+
+ # 3. WAVE API — WebAIM's detector, overlay-oriented, strong on contrast and ARIA
+ curl -s "https://wave.webaim.org/api/request?key=$WAVE_KEY&url=$(printf %s "$URL" | jq -sRr @uri)&reporttype=4" \
+ > "$SESSION_DIR/a11y/wave_<slug>_<vp>.json"
+
+ # 4. IBM Equal Access Accessibility Checker — ACT-rule aligned, ~140 rules
+ achecker "$URL" \
+ --reportLevel violation,potentialviolation \
+ --outputFormat json \
+ --outputFolder "$SESSION_DIR/a11y/achecker_<slug>_<vp>/"
+ ```
+
+ **Dedup and severity normalization:**
+
+ ```js
+ // Run in sd-audit Step 3a after parsing all engine outputs into
+ // findings = { axe: [...], pa11y: [...], wave: [...], achecker: [...] }.
+ function triangulate(findings) {
+ const key = (f) => `${f.rule}::${f.selector}`;
+ const bucket = new Map();
+ for (const engine of ['axe', 'pa11y', 'wave', 'achecker']) {
+ for (const f of findings[engine] ?? []) {
+ const k = key(f);
+ if (!bucket.has(k)) bucket.set(k, { ...f, engines: [] });
+ bucket.get(k).engines.push(engine);
+ }
+ }
+ return [...bucket.values()];
+ }
+ // Confidence ladder:
+ // engines.length === 1 → single-source, keep but tag "unverified"
+ // engines.length >= 2 → triangulated, high confidence
+ // engines.length >= 3 → near-certain, promote severity +1 step
+ ```
+
+ **Why each engine is non-redundant:**
+ - **axe-core** — strongest on `4.1.2`, `1.4.3`, `1.1.1`, `1.3.1` subsets; the 'experimental' tag gates 2.2.
+ - **Pa11y** — htmlcs runner catches rule shapes axe doesn't ship; dual-runner mode is unique.
+ - **WAVE** — contrast slider tool + in-context overlay; different heuristics for ARIA landmarks; never marks a page "passed" (useful bias — never false-clean).
+ - **IBM Equal Access** — ACT-Rules aligned (different normative base than axe); baseline-file regression; severity categories (`violation / potentialviolation / recommendation / manual`) map onto Nielsen 0–4.
+
+ Record per-finding `source_engines: [...]` so sd-synthesis can promote
+ triangulated issues and down-weight single-engine noise.
+
  ### 2.4 Regulatory context
 
  **European Accessibility Act** (Directive (EU) 2019/882) enforceable **28 June 2025** for new products/services; 28 June 2030 deadline for pre-existing. References **EN 301 549** (currently WCAG 2.1 AA baseline; being updated to 2.2). Extraterritorial.
@@ -201,6 +261,64 @@ The 2026 WebAIM Million report (released ~30 March 2026, https://webaim.org/proj
 
  Baymard (Copenhagen, founded ~2009) is used by 71% of Fortune 500 e-commerce companies. Research: 54 rounds of benchmarking, 327 top-grossing US/EU sites, 275,000+ manually assigned UX performance scores, 200,000+ research hours. Methodology page: https://baymard.com/research/methodology. Headline finding: the average large e-commerce site can gain **+35.26% conversion** through better checkout design — **$260B** in recoverable US+EU orders.
 
+ ### 3.0 Baymard sub-rule enumeration (finding code prefixes)
+
+ The single blanket "Baymard" verdict is too coarse for a structured audit.
+ Baymard organises findings by **surface** (checkout, PDP, search, etc.) and
+ each surface has a bounded, enumerable set of sub-rules. Every Baymard
+ finding raised by sd-audit MUST use one of the prefixes below so
+ sd-synthesis can group them and so the fix-playbook can route to the right
+ U-template. Rules that do not yet have an official Baymard number are
+ numbered sequentially (e.g. `baymard-cc-01` … `baymard-cc-14`) with the
+ source section cited.
+
+ | Surface | Prefix | Count | Source (methodology §) |
+ |---|---|---|---|
+ | Credit card form | `baymard-cc-<NN>` | 14 rules | §3.3 + https://baymard.com/checkout-usability/credit-card-patterns |
+ | Address form | `baymard-addr-<NN>` | 8 rules | §3.4 + https://baymard.com/blog/address-line-2 + https://baymard.com/blog/automatic-address-lookup |
+ | Search | `baymard-search-<NN>` | 12 rules | §3.6 + https://baymard.com/ecommerce-design-examples/34-autocomplete-suggestions |
+ | Filter | `baymard-filter-<NN>` | 10 rules | §3.6 + https://baymard.com/blog/promoting-product-filters + https://baymard.com/blog/have-filters-for-list-item-info |
+ | Breadcrumbs | `baymard-bread-<NN>` | 6 rules | §3.7 + https://baymard.com/blog/ecommerce-breadcrumbs |
+ | PDP / Product Detail | `baymard-pdp-<NN>` | 18 rules | §3.8 + https://baymard.com/research/product-page |
+
+ **`baymard-cc-*` — Credit card form (14 rules, §3.3):**
+ 1. `baymard-cc-01` auto-format spaces per card brand (4-4-4-4 / 4-6-5 AMEX / 4-4-4-4-3 19-digit)
+ 2. `baymard-cc-02` auto-detect card type via IIN ranges (adapt length limit, CVV help, spacing)
+ 3. `baymard-cc-03` expiration as single MM/YY field with auto-inserted slash (NOT YYYY, NOT two dropdowns)
+ 4. `baymard-cc-04` CVV field with adaptive help image (3-digit back for Visa/MC, 4-digit front for AMEX)
+ 5. `baymard-cc-05` autocomplete tokens present: `cc-number`, `cc-name`, `cc-exp`, `cc-exp-month`, `cc-exp-year`, `cc-csc`
+ 6. `baymard-cc-06` `inputmode="numeric"` (never `type="number"`) on card number, expiration, CVV
+ 7. `baymard-cc-07` never clear CC number/CVV on validation error (preserve entered data)
+ 8. `baymard-cc-08` stored-card edit uses "fake editing" (delete + re-add) per PCI; clear messaging
+ 9. `baymard-cc-09` inline validation with 5%-abandonment guardrail (no vague "Card declined" without reason)
+ 10. `baymard-cc-10` fallback entry path when wallet (Apple Pay / Google Pay) fails
+ 11. `baymard-cc-11` accept pasted numbers; strip spaces/dashes server+client
+ 12. `baymard-cc-12` surface accepted card brands near the input (logo strip) before submit
+ 13. `baymard-cc-13` billing-address reuse: "Billing same as shipping" default-checked with editable prefill
+ 14. `baymard-cc-14` error messages identify which field failed + how to fix (not "Payment failed")
+
+ > Source: §3.3 bullets + https://baymard.com/checkout-usability/credit-card-patterns. Rules 10–14 derived from §3.3 narrative; number them in this table and supersede with Baymard's official IDs when available.
+
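Rules cc-01, cc-02, and cc-11 combine into a small formatter; a sketch with a deliberately abbreviated IIN check — real brand detection needs the full IIN ranges, and the function name is illustrative:

```javascript
// cc-11: paste-tolerant — strip spaces, soft hyphens, dashes.
// cc-02: detect AMEX by IIN prefix 34/37 (abbreviated; not a full IIN table).
// cc-01: group digits 4-6-5 for AMEX, else 4-4-4-4 with a trailing 3 for
// 19-digit cards.
function formatCardNumber(raw) {
  const digits = raw.replace(/[\s\u00AD-]/g, "");
  const amex = /^3[47]/.test(digits);
  const groups = amex ? [4, 6, 5] : [4, 4, 4, 4, 3];
  const out = [];
  let i = 0;
  for (const g of groups) {
    if (i >= digits.length) break;
    out.push(digits.slice(i, i + g));
    i += g;
  }
  return out.join(" ");
}
```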
302
+ **`baymard-addr-*` — Address form (8 rules, §3.4):**
303
+ 1. `baymard-addr-01` country selector FIRST (drives all subsequent field formats/validation)
304
+ 2. `baymard-addr-02` single "Address Line 1" + optional "Address Line 2" labeled "Apt, Suite — optional" (never omit Line 2)
305
+ 3. `baymard-addr-03` automatic address autocomplete preferred (9% manual-entry typo rate without it)
306
+ 4. `baymard-addr-04` postal-code autodetect of city/state (28% mobile sites fail)
307
+ 5. `baymard-addr-05` US state as dropdown (not free text); UK optional county; hide for countries without subdivisions
308
+ 6. `baymard-addr-06` autocomplete tokens: `street-address`, `address-line1`, `address-line2`, `address-level1/2`, `postal-code`, `country`
309
+ 7. `baymard-addr-07` "Billing same as shipping" default-checked with editable prefilled fields (also WCAG 3.3.7 Redundant Entry)
310
+ 8. `baymard-addr-08` name inputs accept Unicode, hyphens, apostrophes, single-name users, >20 chars
311
+
312
+ > Source: §3.4.
313
+
314
+ **`baymard-search-*` — Search (12 rules, §3.6):** placeholders for 12 rules covering autocomplete presence, scope suggestions in autocomplete, search-within-current-category, autodirect on category match, query-term pluralization tolerance, typo tolerance, "no results" with recovery options, recent-searches memory, sort-vs-filter separation, faceted-search state in URL, submit-without-suggestion-selection, voice search on mobile. Rules `baymard-search-01` … `baymard-search-12`. Source: §3.6 bullets + https://baymard.com/ecommerce-design-examples/34-autocomplete-suggestions. Enumerate the exact wording when next Baymard PDF is purchased.
315
+
316
+ **`baymard-filter-*` — Filter (10 rules, §3.6):** placeholders `baymard-filter-01` … `baymard-filter-10` covering: promote top filters above the product grid; truncate long value lists >10 with styled "More" link; category-specific filters (megapixels, temperature rating); filters for every attribute displayed in list items; expand/collapse icons right-aligned; applied-filter pills visible and individually removable; "clear all" affordance; range sliders with keyboard + numeric input; multi-select affordance obvious; result count live-updated via `aria-live`. Source: §3.6 bullets + https://baymard.com/blog/promoting-product-filters + https://baymard.com/blog/have-filters-for-list-item-info.
317
+
318
+ **`baymard-bread-*` — Breadcrumbs (6 rules, §3.7):** placeholders `baymard-bread-01` … `baymard-bread-06` covering: present on all non-home pages; implement BOTH hierarchy-based AND history-based (68% of top 50 sub-par, 45% only hierarchy, 23% none); present on mobile (65% mobile fail); no "hidden in more" collapse on desktop without reveal; last crumb non-clickable; structured-data markup (`BreadcrumbList`). Source: §3.7 + https://baymard.com/blog/ecommerce-breadcrumbs.
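The structured-data markup that `baymard-bread-06` calls for can be sketched as a plain builder. The crumb data and URLs below are illustrative; the `@type` / `itemListElement` / `position` shape follows schema.org's `BreadcrumbList`:

```javascript
function breadcrumbJsonLd(crumbs) {
  // crumbs: [{ name, url }] in hierarchy order. The last crumb is the
  // current page; per baymard-bread-05 it renders non-clickable, but the
  // schema.org markup may still carry its URL.
  return {
    "@context": "https://schema.org",
    "@type": "BreadcrumbList",
    itemListElement: crumbs.map((c, i) => ({
      "@type": "ListItem",
      position: i + 1,
      name: c.name,
      item: c.url,
    })),
  };
}
```

Serialize with `JSON.stringify` into a `<script type="application/ld+json">` tag.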
319
+
320
+ **`baymard-pdp-*` — PDP / Product Detail (18 rules, §3.8):** placeholders `baymard-pdp-01` … `baymard-pdp-18` covering: single dominant Add-to-Cart (no 3–6 competing colorful buttons); shipping cost/ETA visible on PDP (64% of users look for it); total visible before checkout (24% abandon otherwise); accordion over horizontal tabs (67% of accordion users mis-implement; 28% use worst-performing tabs); UGC visuals present (67% lack them); minimum 3–5 images + zoom + variant-driven imagery; swatches not dropdowns for variants; out-of-stock variants visible-but-disabled (not removed); star ratings above the fold (up to +18% conversion with verified badges); stock urgency without dark patterns; delivery-date estimator; returns policy on PDP; size guide inline (not in a separate page); Q&A / reviews with filtering; cross-sell without shoving below CTA; "notify me" flow for OOS; price history for discount honesty; country/currency switcher persisted. Source: §3.8 + https://baymard.com/research/product-page.
321
+
204
322
  ### 3.1 Cart abandonment
205
323
  **70.19% average abandonment** across 49 studies 2006–2023, range 55–84.27% (https://baymard.com/lists/cart-abandonment-rate). By device: mobile 77.06%, tablet 66.39%, desktop 70.01%. **Reasons** (excluding 43% "just browsing"): extra costs 48%; forced account creation 24%; slow delivery 19%; distrust with CC 18–19%; too long/complicated 17–18%; couldn't see total up front 16%; errors/crashes 13%; returns policy 12%; declined CC 9%; limited payment methods 7%.
206
324
 
@@ -29,6 +29,8 @@ A score without evidence is invalid. Auditor records `n/a` instead of guessing.
29
29
 
30
30
  ## Category 1 — Visual hierarchy (weight 1.0)
31
31
 
32
+ **Rationale:** Reber et al. processing fluency + Tractinsky aesthetic-usability — clean dominance hierarchies literally reduce cognitive load; beauty buys friction tolerance only after hierarchy is solved (artifact Parts 1–3).
33
+
32
34
  **Question:** On this view, what is the single primary goal? Is it the most
33
35
  dominant element visually?
34
36
 
@@ -47,6 +49,8 @@ dominant element visually?
47
49
 
48
50
  ## Category 2 — Density calibration per viewport (weight 1.2)
49
51
 
52
+ **Rationale:** Fitts's Law + thumbzone ergonomics — density must respect the physical reach envelope of the device; cramming desktop density onto mobile violates motor cost.
53
+
50
54
  **Question:** Does information density match the device context?
51
55
 
52
56
  | Viewport | Expected primary entities visible above fold |
@@ -68,6 +72,8 @@ dominant element visually?
68
72
 
69
73
  ## Category 3 — Consistency: spacing scale (weight 0.8)
70
74
 
75
+ **Rationale:** Gestalt proximity + rhythm — shared spacing units fuse related elements and separate distinct ones; arbitrary magic numbers break grouping perception.
76
+
71
77
  **Question:** Do paddings, margins, gaps come from a scale (4/8px or 0.25rem) or are they arbitrary magic numbers?
72
78
 
73
79
  **Detect:**
@@ -86,12 +92,16 @@ dominant element visually?
86
92
 
87
93
  ## Category 4 — Consistency: typography scale (weight 0.8)
88
94
 
95
+ **Rationale:** Processing fluency (Reber) — a discrete type scale accelerates recognition; a shapeless set of sizes forces re-parsing of hierarchy on every screen.
96
+
89
97
  Same method for font-size, font-weight, line-height. Look for `text-\[\d+px\]` and arbitrary font-size. Expected: 6–10 sizes total in a designed system; 30+ sizes = vibecoded.
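A minimal sketch of that count, assuming the computed `font-size` values have already been collected via `browser_evaluate`. The 0.5 px bucketing and the intermediate "loose" label are assumptions of this sketch; only the 6–10 and 30+ thresholds come from the rubric:

```javascript
function typeScaleVerdict(fontSizesPx) {
  // Bucket to the nearest 0.5 px so sub-pixel rendering noise does not
  // inflate the distinct-size count.
  const unique = new Set(fontSizesPx.map((px) => Math.round(px * 2) / 2));
  const n = unique.size;
  if (n <= 10) return { sizes: n, verdict: "designed" };
  if (n < 30) return { sizes: n, verdict: "loose" };
  return { sizes: n, verdict: "vibecoded" };
}
```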
90
98
 
91
99
  ---
92
100
 
93
101
  ## Category 5 — Consistency: color palette (weight 0.8)
94
102
 
103
+ **Rationale:** Valdez & Mehrabian (1994) — saturation × value drive emotional response more than hue; a disciplined palette with controlled lightness/saturation ranges is the single highest-leverage decision for perceived quality (artifact line 43).
104
+
95
105
  **Detect:**
96
106
  - Collect computed `color`, `background-color`, `border-color` from ≥100 elements.
97
107
  - Unique colors count. <15 = disciplined. 30+ = vibecoded.
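A sketch of the uniqueness count, assuming the computed color strings are already collected from the sampled elements. The whitespace normalization and the intermediate "middling" label are this sketch's assumptions; the <15 / 30+ cutoffs are the rubric's:

```javascript
function paletteDiscipline(cssColors) {
  // Normalize so "rgb(0, 0, 0)" and "rgb(0,0,0)" count as one color.
  const norm = (c) => c.toLowerCase().replace(/\s+/g, "");
  const unique = new Set(cssColors.map(norm));
  const n = unique.size;
  return {
    unique: n,
    verdict: n < 15 ? "disciplined" : n >= 30 ? "vibecoded" : "middling",
  };
}
```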
@@ -108,6 +118,8 @@ Same method for font-size, font-weight, line-height. Look for `text-\[\d+px\]` a
108
118
 
109
119
  ## Category 6 — Whitespace & breathing room (weight 0.7)
110
120
 
121
+ **Rationale:** *Ma* (間) — negative space is substance, not absence; whitespace signals confidence and lets figure/ground perception resolve without strain.
122
+
111
123
  **Question:** Does content have room to breathe, or is it crammed?
112
124
 
113
125
  **Detect:** Compute average `padding-inline + margin-inline` per content block. Compare to container width. Measure content-to-chrome ratio.
@@ -123,6 +135,8 @@ Same method for font-size, font-weight, line-height. Look for `text-\[\d+px\]` a
123
135
 
124
136
  ## Category 7 — Text legibility (weight 1.2)
125
137
 
138
+ **Rationale:** Miller 4±1 (Cowan 2001) + Postel's robustness — legible bodies keep working-memory cost low; dense microtext forces re-reading and exhausts the 4-chunk budget that forms and scannable text depend on.
139
+
126
140
  **Detect:** `browser_evaluate` → collect `fontSize` computed px. Find minimum across visible text.
127
141
 
128
142
  | Viewport | Min body | Min meta | Min input |
@@ -142,6 +156,8 @@ Same method for font-size, font-weight, line-height. Look for `text-\[\d+px\]` a
142
156
 
143
157
  ## Category 8 — CTA hierarchy (weight 1.0)
144
158
 
159
+ **Rationale:** Hick-Hyman Law — decision time grows with the log of equally-weighted options; multiple competing primaries flatten hierarchy into a choose-your-adventure and cost measurable conversion (Baymard PDP data).
160
+
145
161
  **Question:** Is there ONE primary CTA per view?
146
162
 
147
163
  **Detect:** Count buttons with `variant=default | primary | filled` OR bg-primary class. >1 above fold = competing.
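As a sketch over button descriptors (`{ variant, top }`, with `top` the bounding-rect offset in px) extracted beforehand; the variant names come from the detect rule, while the 800 px fold default is an illustrative viewport height, not a rubric value:

```javascript
function competingPrimaries(buttons, foldPx = 800) {
  // Primary variants per the detect rule above.
  const primary = new Set(["default", "primary", "filled"]);
  const aboveFold = buttons.filter((b) => primary.has(b.variant) && b.top < foldPx);
  // >1 primary above the fold = competing CTAs.
  return { count: aboveFold.length, competing: aboveFold.length > 1 };
}
```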
@@ -159,6 +175,8 @@ Reference: Baymard PDP — 51% of e-commerce pages fail due to competing CTAs.
159
175
 
160
176
  ## Category 9 — State coverage (weight 1.1)
161
177
 
178
+ **Rationale:** Norman's "make system state visible" (Seven Stages of Action) + Nielsen H1 visibility-of-system-status — missing loading/empty/error states break the user's feedback loop and strand them in uncertainty.
179
+
162
180
  Per page, does the UI handle: default / loading / empty / error / success?
163
181
 
164
182
  **Detect per scenario:**
@@ -177,37 +195,89 @@ Per page, does the UI handle: default / loading / empty / error / success?
177
195
 
178
196
  ## Category 10 — Touch targets (mobile only, weight 1.0)
179
197
 
180
- **Detect:** `browser_evaluate` get `getBoundingClientRect` of every clickable. Count targets < 44×44 px.
198
+ **Rationale:** Fitts's Law (MT = a + b·log₂(2D/W)) — acquisition time scales inversely with target width; sub-44px targets on fingers multiply error rate and exhaust motor patience.
199
+
200
+ **Detect:** `browser_evaluate` → get `getBoundingClientRect` of every
201
+ clickable (buttons, links, `[role=button|link|tab|checkbox|radio|switch]`,
202
+ `<input>`, `<select>`, `<summary>`, anchors with click handlers). Record
203
+ the smaller of `width × height` per target.
204
+
205
+ ### Spec reconciliation
206
+
207
+ Three conflicting specs define "how big a touch target should be":
208
+
209
+ | Spec | Size | Nature | Citation |
210
+ |---|---|---|---|
211
+ | **WCAG 2.5.8 Target Size (Minimum) — AA** | **24 × 24 CSS px** | Baseline (legal/accessibility floor, with spacing exception) | https://www.w3.org/WAI/WCAG22/Understanding/target-size-minimum |
212
+ | **Apple Human Interface Guidelines** | **44 × 44 pt** | Platform-native target (iOS) | https://developer.apple.com/design/human-interface-guidelines/accessibility#Interactivity |
213
+ | **Material Design (Android)** | **48 × 48 dp** | Platform-native target (Android) | https://m3.material.io/foundations/accessible-design/accessibility-basics |
214
+ | **WCAG 2.5.5 Target Size (Enhanced) — AAA** | 44 × 44 CSS px | Advisory ceiling | https://www.w3.org/WAI/WCAG21/Understanding/target-size |
215
+
216
+ sd-audit reconciles as follows:
217
+ - **Baseline = 24 × 24 CSS px** — WCAG 2.5.8 AA pass; legally sufficient
218
+ (with 24px center-to-center spacing exception).
219
+ - **Target = 44 × 44 CSS px** — HIG / Material / WCAG AAA; the single
220
+ pragmatic "platform-native" size across iOS + Android + web (Android
221
+ 48 dp ≈ 44 CSS px at default DPI).
222
+ - Spacing exception keeps sub-44 icons compliant with WCAG AA but does NOT
223
+ earn full design-intelligence points — they still feel cramped on a
224
+ phone, which is what Fitts's Law above predicts.
225
+
226
+ ### Scoring ladder
227
+
228
+ Per-target classification:
229
+ - **Full points** (≥ 44 × 44 CSS px) — platform-native, HIG/Material-clean.
230
+ - **Half points** (24 – 43 CSS px, min dimension) — WCAG AA pass but
231
+ sub-optimal; counts as half a compliant target.
232
+ - **Zero points** (< 24 CSS px, min dimension) — WCAG AA FAIL; raises a
233
+ separate `a11y-wcag22-2.5.8` finding in addition to pulling this score.
234
+
235
+ Let `N` = total targets, `n44` = count ≥ 44×44, `n24` = count in
236
+ [24, 44), `n0` = count < 24. Compute
237
+ `compliance = (n44 + 0.5 × n24) / N`.
181
238
 
182
239
  | Score | Criteria |
183
240
  |---|---|
184
- | 10 | 100% targets 44×44 OR 24×24 with 8px+ gap |
185
- | 7 | 80–99% compliant |
186
- | 4 | 50–80% compliant (common: icon-only buttons) |
187
- | 0 | Widespread <24px targets |
241
+ | 10 | `compliance ≥ 0.95` AND `n0 == 0` — essentially all targets ≥ 44 |
242
+ | 7 | `0.80 ≤ compliance < 0.95` AND `n0 == 0` — some half-credit (24–43 px) targets, none under 24 |
243
+ | 4 | `0.50 ≤ compliance < 0.80` OR `0 < n0 ≤ 2` — common icon-only button fail; any WCAG breach |
244
+ | 0 | `compliance < 0.50` OR `n0 ≥ 3` — widespread <24 px targets, structural problem |
245
+
246
+ Any `n0 > 0` ALWAYS also raises a separate finding with prefix
247
+ `a11y-wcag22-2.5.8` (the rubric scores design intelligence; the finding
248
+ records the legal breach).
188
249
 
189
250
  ---
190
251
 
191
- ## Category 11 — Motion & feedback (weight 0.6)
252
+ ## Category 11 — Motion & feedback / perceived performance (weight 0.6)
192
253
 
193
- **Question:** Do interactions give feedback? Are animations tasteful and respect `prefers-reduced-motion`?
254
+ **Rationale:** Doherty threshold (Doherty & Thadhani, IBM 1982) — system response <400 ms sustains flow; above that, perceived unresponsiveness begins. Paired with INP (Core Web Vitals) for the measurable proxy.
255
+
256
+ **Question:** Do interactions give feedback? Are animations tasteful, and do they respect `prefers-reduced-motion`? Does every interaction land within the Doherty ceiling?
194
257
 
195
258
  **Detect:**
196
259
  - `browser_evaluate` with `matchMedia('(prefers-reduced-motion: reduce)')` + check for `transition` / `animation` on interactive elements.
197
260
  - Missing hover/focus feedback on buttons = major fail.
198
261
  - >3s animations = excessive.
199
262
 
263
+ **Perceived-performance sub-criterion (Doherty 400 ms ceiling, alongside INP).**
264
+ - Parse `session_dir/vitals/<page>.json` for INP (from `web-vitals@5` attribution build).
265
+ - For each primary interaction (Step 2.5 Phase A enumeration — clicks on CTA, form submit, nav link, combobox, modal trigger), compute **end-to-end response time** = click → visual feedback (spinner / state change / new pixels painted), not just INP.
266
+ - Fail rule: an interaction that **passes INP** (≤ 200 ms rating "good") but whose **user-perceivable response exceeds 400 ms** (e.g., INP fires at 150 ms but the resulting navigation/paint lands at 900 ms with no intermediate skeleton/optimistic UI) **penalizes C11**. Doherty is the ceiling; INP is the low-floor subset. Apply the Nielsen 0.1 s / 1 s / 10 s progress rule for anything over 400 ms (skeleton, optimistic UI, determinate bar + ETA + cancel for >10 s).
267
+
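The fail rule reduces to a small predicate per interaction. A sketch; the input field names (`inpMs`, `endToEndMs`, `hasIntermediateState`) are this sketch's own, not web-vitals API names:

```javascript
function dohertyVerdict({ inpMs, endToEndMs, hasIntermediateState }) {
  // inpMs: the interaction's INP from the attribution build.
  // endToEndMs: click → meaningful paint, measured separately.
  // hasIntermediateState: skeleton / optimistic UI shown in the gap.
  const inpGood = inpMs <= 200;           // CWV "good" INP rating
  const underDoherty = endToEndMs <= 400; // Doherty ceiling
  // Passing INP does not excuse a blank wait past 400 ms.
  const penalized = !underDoherty && !hasIntermediateState;
  return { inpGood, underDoherty, penalizesC11: penalized };
}
```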
200
268
  | Score | Criteria |
201
269
  |---|---|
202
- | 10 | Hover/focus/active feedback everywhere; animations ≤300ms; reduced-motion respected |
203
- | 7 | Most interactions feedback; reduced-motion partial |
204
- | 4 | Some interactions static; reduced-motion ignored |
205
- | 0 | No hover/focus feedback at all OR autoplay video + parallax with no disable |
270
+ | 10 | Hover/focus/active feedback everywhere; animations ≤300 ms; reduced-motion respected; every interaction under Doherty 400 ms OR shows skeleton/optimistic state |
271
+ | 7 | Most interactions feedback; reduced-motion partial; occasional >400 ms interaction without feedback |
272
+ | 4 | Some interactions static; reduced-motion ignored; multiple interactions cross Doherty with no intermediate state |
273
+ | 0 | No hover/focus feedback at all OR autoplay video + parallax with no disable OR interactions routinely exceed 400 ms with blank waits |
206
274
 
207
275
  ---
208
276
 
209
277
  ## Category 12 — Nav pattern matches platform (weight 1.0)
210
278
 
279
+ **Rationale:** Fitts's Law + Hick-Hyman Law — nav patterns succeed when they minimize both motor cost (thumbzone/edge placement) and choice cost (limited top-level destinations, chunked per Miller 4±1).
280
+
211
281
  | Viewport | Expected nav |
212
282
  |---|---|
213
283
  | Mobile (≤768) | Bottom tab bar (3–5), full-screen menus, gesture back |
@@ -226,6 +296,9 @@ Per page, does the UI handle: default / loading / empty / error / success?
226
296
 
227
297
  ## Category 13 — Table-on-mobile detection (weight 1.2, mobile only)
228
298
 
299
+ **Rationale:** Platform affordance + thumbzone — desktop tables violate mobile reading models (microtext, horizontal overflow, no visible sort); transformation to card/list is the minimum cost to preserve parse-ability.
300
+
301
+
229
302
  **Detect:** At ≤768px, find `<table>` with >3 visible columns OR `display: table` containers with horizontal scroll AND text < 13px.
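A sketch of the same check over table descriptors measured beforehand (`columns`, `hasHScroll`, `minFontPx` are this sketch's shorthand for the DOM measurements named above: visible column count, `scrollWidth > clientWidth`, smallest computed font size inside the table):

```javascript
function desktopTableOnMobile(tables, viewportPx = 375) {
  if (viewportPx > 768) return []; // rule is mobile-only
  // Flag tables that are too wide OR scroll horizontally with microtext.
  return tables.filter(
    (t) => t.columns > 3 || (t.hasHScroll && t.minFontPx < 13)
  );
}
```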
230
303
 
231
304
  | Score | Criteria |
@@ -240,6 +313,8 @@ Per page, does the UI handle: default / loading / empty / error / success?
240
313
 
241
314
  ## Category 14 — Modal/sheet appropriateness (weight 0.8)
242
315
 
316
+ **Rationale:** Fitts's Law + thumbzone — on mobile, close affordances belong where the thumb lives; centered dialogs with top-right dismiss violate reach on phones and strand users in forced-modal states.
317
+
243
318
  | Viewport | Expected modal pattern |
244
319
  |---|---|
245
320
  | Mobile | Bottom sheet (slide-up) or full-screen with close top-left |
@@ -258,6 +333,8 @@ Per page, does the UI handle: default / loading / empty / error / success?
258
333
 
259
334
  ## Category 15 — Color semantics (weight 0.6)
260
335
 
336
+ **Rationale:** Jakob's Law (users spend most time on other products) + learned convention — red/green/amber mappings are pre-installed in users' mental models; using them decoratively forces re-learning and breaks status recognition at a glance.
337
+
261
338
  **Detect:** Collect colors used on: error messages, success states, warnings, info. Red = error? Green = success? Or decorative-only?
262
339
 
263
340
  | Score | Criteria |
@@ -270,6 +347,8 @@ Per page, does the UI handle: default / loading / empty / error / success?
270
347
 
271
348
  ## Category 16 — Design-system coherence (weight 1.1)
272
349
 
350
+ **Rationale:** Tesler's Law (conservation of complexity) + von Neumann consistency — complexity does not disappear, it moves; a disciplined system absorbs variation once inside tokens/variants/primitives so every downstream surface stays predictable. Incoherent systems push the same complexity onto users (re-learning each screen) and onto engineers (ad-hoc classes per component). This is why C16 carries one of the highest weights: coherence is not polish, it is the mechanism that conserves attention.
351
+
273
352
  **The meta-category.** Does the app LOOK like it was designed by one team with one vision? Or does it look like a collection of shadcn defaults?
274
353
 
275
354
  **Detect (aesthetic signal):**
@@ -290,6 +369,8 @@ session. See `design-skills-catalog.md`.
290
369
 
291
370
  ## Category 17 — Vibecode detection (weight 1.0)
292
371
 
372
+ **Rationale:** Norman's reflective layer (Emotional Design, 2004 — artifact line 547) — vibecoded surfaces pass the visceral/behavioral layers but fail reflective judgment; code that reads as "hand-assembled divs" telegraphs lack of intentional system, which is exactly what distinguishes "designed" from "vibecoded" output.
373
+
293
374
  **Question:** Does the code follow patterns (components, variants, tokens)
294
375
  or is it hand-assembled divs with inline styles?
295
376
 
@@ -62,6 +62,37 @@ it follows the skill's specific tokens + patterns instead of defaults.
62
62
  | Editorial / blog | Readable long-form | `typeui-paper` |
63
63
  | Feature-grid homepage | Modular showcase | `typeui-bento` |
64
64
 
65
+ ### Vibe → typeui skill (primary → fallback)
66
+
67
+ Covers every vibe enumerated in the artifact Part 4 (12-vibe vocabulary). The
68
+ primary skill carries the aesthetic; the fallback handles adjacent contexts or
69
+ fills gaps when the primary would over-commit. When a project vibe has no
70
+ single-perfect skill (e.g. Premium/luxury, Warm/organic), the fallback plus
71
+ `/frontend-design` is the intended path.
72
+
73
+ | Part-4 vibe | Primary skill | Fallback | Notes |
74
+ |---|---|---|---|
75
+ | Minimal / clean | `typeui-clean` | `typeui-application` | Default pick for pre-launch marketing and "honest SaaS". |
76
+ | Bold / confident | `typeui-bold` | `typeui-dramatic` | Challenger brands, consumer launches. |
77
+ | Playful / friendly | `typeui-doodle` | `typeui-artistic` | Education, kids, creative tools. |
78
+ | Serious / professional (B2B) | `typeui-enterprise` | `typeui-ant` | Procurement-facing, compliance. |
79
+ | Technical / data-dense (SaaS admin) | `typeui-dashboard` | `typeui-application` | Dark-theme analytics, operator consoles. |
80
+ | Editorial / reading | `typeui-paper` | `typeui-clean` | Long-form content, publications. |
81
+ | Modular / showcase | `typeui-bento` | `typeui-application` | Feature grids, portfolios. |
82
+ | Expressive / artistic | `typeui-artistic` | `typeui-dramatic` | Design tools, non-enterprise vibe-forward. |
83
+ | Raw / statement (neobrutalism) | `typeui-neobrutalism` | `typeui-bold` | Gen-Z, indie, deliberate rule-breaking. |
84
+ | Premium / luxury | `typeui-dramatic` | `typeui-paper` | No dedicated luxury skill — combine dramatic hero with paper's typographic restraint, then commission custom tokens via `/frontend-design`. |
85
+ | Tech / cyberpunk | `typeui-dashboard` | `typeui-bold` | Dashboard dark base + bold accent/glow; extend via `/frontend-design` for neon/chromatic detail. |
86
+ | Warm / organic | `typeui-paper` | `typeui-doodle` | Paper carries the warmth via texture + typographic rhythm; doodle adds hand-made detail for craft brands. |
87
+ | Retro / nostalgic | `typeui-paper` | `typeui-doodle` | Paper's print-era cues fit mid-century/editorial retro; doodle for 90s/zine nostalgia. `/frontend-design` required for period-specific palettes. |
88
+ | Dark / cinematic | `typeui-dramatic` | `typeui-dashboard` | Dramatic for narrative hero surfaces; dashboard for operator/app surfaces that must stay dark through the product. |
89
+
90
+ Read this table as: "if the positioning brief (sd-research §4) lands on vibe X,
91
+ sd-audit/sd-fix should recommend the **primary** skill first; if the project
92
+ has constraints that rule it out (e.g. already on a light palette), fall back
93
+ to the secondary; if both are partial, log a non-blocking advisory that
94
+ `/frontend-design` is required to finish the aesthetic."
95
+
65
96
  ## Recommending a skill in a finding
66
97
 
67
98
  When `design-intelligence.categories.design_system_coherence.score ≤ 4`,
@@ -1,30 +1,122 @@
1
1
  #!/usr/bin/env bash
2
2
  # Usage: detect-changes.sh <last_sha> [<last_iso>]
3
3
  # Emits JSON: { mode, range_start, commits, files, classified }
4
+ #
5
+ # Implements the fallback ladder from artifact §6 + §14:
6
+ # 1. Anchor exists → diff with rename detection (-M90%).
7
+ # 2. Anchor missing + shallow repo → `git fetch --unshallow --no-tags`.
8
+ # 3. Still missing → fetch super-design git notes, retry.
9
+ # 4. Still missing → `--since=<iso>` time-based fallback.
10
+ # 5. Empty repo / no last_sha at all → diff against empty-tree SHA
11
+ # 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (artifact line 900).
12
+ # Also uses --first-parent + --cherry-pick --right-only when walking
13
+ # history (artifact §2.5 line 167-172).
4
14
  set -euo pipefail
5
15
 
6
16
  LAST_SHA="${1:-}"
7
17
  LAST_ISO="${2:-}"
18
+ EMPTY_TREE="4b825dc642cb6eb9a060e54bf8d69288fbee4904"
19
+
20
+ log() { printf '[detect-changes] %s\n' "$*" >&2; }
8
21
 
9
- if [[ -z "$LAST_SHA" ]]; then echo '{"error":"missing last_sha"}'; exit 2; fi
10
22
  if ! git rev-parse --git-dir >/dev/null 2>&1; then echo '{"error":"not-a-git-repo"}'; exit 3; fi
11
23
 
12
- if git rev-parse --verify --quiet "${LAST_SHA}^{commit}" >/dev/null; then
13
- if git merge-base --is-ancestor "$LAST_SHA" HEAD; then RANGE_START="$LAST_SHA"
14
- else RANGE_START="$(git merge-base HEAD "$LAST_SHA" 2>/dev/null || echo "")"; fi
15
- else RANGE_START=""; fi
16
-
17
- if [[ -z "$RANGE_START" ]]; then
18
- if [[ -n "$LAST_ISO" ]]; then
19
- FILES="$(git log --since="$LAST_ISO" --name-only --pretty=format: 2>/dev/null | sort -u | sed '/^$/d' || true)"
20
- COMMITS="$(git log --since="$LAST_ISO" --pretty=format:'%H|%s|%an|%aI' 2>/dev/null || true)"
21
- MODE="since-time"
22
- else echo '{"error":"lost-anchor-no-fallback-time"}'; exit 4; fi
24
+ # Determine HEAD availability — empty repos have no commits at all.
25
+ if ! git rev-parse --verify --quiet HEAD >/dev/null; then
26
+ log "no HEAD commit; using empty-tree SHA fallback"
27
+ LAST_SHA="$EMPTY_TREE"
28
+ fi
29
+
30
+ # If caller didn't pass a last_sha, treat it as empty-tree (first audit).
31
+ if [[ -z "$LAST_SHA" ]]; then
32
+ log "missing last_sha; defaulting to empty-tree SHA"
33
+ LAST_SHA="$EMPTY_TREE"
34
+ fi
35
+
36
+ resolve_anchor() {
37
+ # Sets RANGE_START (may be empty if anchor unrecoverable).
38
+ if [[ "$LAST_SHA" == "$EMPTY_TREE" ]]; then
39
+ RANGE_START="$EMPTY_TREE"; return 0
40
+ fi
41
+ if git rev-parse --verify --quiet "${LAST_SHA}^{commit}" >/dev/null; then
42
+ if git merge-base --is-ancestor "$LAST_SHA" HEAD 2>/dev/null; then
43
+ RANGE_START="$LAST_SHA"
44
+ else
45
+ RANGE_START="$(git merge-base HEAD "$LAST_SHA" 2>/dev/null || echo "")"
46
+ fi
47
+ return 0
48
+ fi
49
+ RANGE_START=""
50
+ return 1
51
+ }
52
+
53
+ if ! resolve_anchor; then
54
+ # Ladder step (a): unshallow if possible.
55
+ if [[ "$(git rev-parse --is-shallow-repository 2>/dev/null || echo false)" == "true" ]]; then
56
+ log "anchor missing and repo is shallow; attempting git fetch --unshallow --no-tags"
57
+ git fetch --unshallow --no-tags 2>/dev/null || log "unshallow fetch failed (continuing)"
58
+ resolve_anchor || true
59
+ fi
60
+ fi
61
+ if [[ -z "${RANGE_START:-}" ]]; then
62
+ # Ladder step (b): try to fetch super-design notes; they may pin a commit
63
+ # we don't have locally.
64
+ log "anchor still missing; fetching refs/notes/super-design"
65
+ git fetch origin '+refs/notes/super-design:refs/notes/super-design' 2>/dev/null \
66
+ || log "notes fetch failed (continuing)"
67
+ resolve_anchor || true
68
+ fi
69
+
70
+ # Ladder step (c): time-based fallback.
71
+ if [[ -z "${RANGE_START:-}" && -n "$LAST_ISO" ]]; then
72
+ log "using --since=$LAST_ISO time-based fallback"
73
+ FILES="$(git log --since="$LAST_ISO" --name-only --pretty=format: 2>/dev/null | sort -u | sed '/^$/d' || true)"
74
+ COMMITS="$(git log --first-parent --since="$LAST_ISO" --pretty=format:'%H|%s|%an|%aI' 2>/dev/null || true)"
75
+ MODE="since-time"
76
+ RANGE_START=""
77
+ elif [[ -z "${RANGE_START:-}" ]]; then
78
+ echo '{"error":"lost-anchor-no-fallback-time"}'; exit 4
23
79
  else
24
- FILES="$(git diff --name-only "${RANGE_START}..HEAD" \
25
- -- ':!*.lock' ':!package-lock.json' ':!pnpm-lock.yaml' ':!yarn.lock' \
26
- ':!.github/**' ':!**/*.test.*' ':!**/*.spec.*' ':!**/*.stories.*' | sort -u)"
27
- COMMITS="$(git log --no-merges --pretty=format:'%H|%s|%an|%aI' "${RANGE_START}..HEAD" 2>/dev/null || true)"
80
+ # SHA range available. Use --name-status -M90% -z to catch renames AND
81
+ # filenames with spaces (NUL-terminated output).
82
+ RANGE="${RANGE_START}..HEAD"
83
+ if [[ "$RANGE_START" == "$EMPTY_TREE" ]]; then
84
+ # Empty-tree baseline: everything in HEAD is "new".
85
+ RANGE="${EMPTY_TREE} HEAD"
86
+ fi
87
+
88
+ # Parse NUL-terminated name-status output.
89
+ # Format: <STATUS>\0<path>[\0<new_path>] where STATUS can be A/M/D/Rnn/Cnn.
90
+ # We collapse to the post-rename path so downstream classification matches
91
+ # the current file layout.
92
+ FILES="$(
93
+ git diff --name-status -M90% -z \
94
+ $RANGE -- . ':!*.lock' ':!package-lock.json' ':!pnpm-lock.yaml' ':!yarn.lock' \
95
+ ':!.github/**' ':!**/*.test.*' ':!**/*.spec.*' ':!**/*.stories.*' \
96
+ 2>/dev/null |
97
+ awk -v RS='\0' '
98
+ BEGIN { status = "" }
99
+ {
100
+ if (status == "") { status = $0; next }
101
+ # Rename/copy has two path fields; we want the second (post-rename).
102
+ if (status ~ /^R/ || status ~ /^C/) {
103
+ if (wanted == "") { wanted = 1; next }
104
+ print; status = ""; wanted = ""
105
+ } else {
106
+ print; status = ""
107
+ }
108
+ }
109
+ ' | sort -u
110
+ )"
111
+
112
+ # --first-parent keeps one entry per merged PR (trunk-based). Combine
113
+ # with --cherry-pick --right-only to dedupe back-ports / rebased copies.
114
+ if [[ "$RANGE_START" == "$EMPTY_TREE" ]]; then
115
+ COMMITS="$(git log --first-parent --pretty=format:'%H|%s|%an|%aI' HEAD 2>/dev/null || true)"
116
+ else
117
+ COMMITS="$(git log --first-parent --cherry-pick --right-only --no-merges \
118
+ --pretty=format:'%H|%s|%an|%aI' "${RANGE_START}...HEAD" 2>/dev/null || true)"
119
+ fi
28
120
  MODE="sha-range"
29
121
  fi
30
122
 
@@ -32,7 +124,7 @@ declare -A CLASSIFIED
32
124
  while IFS= read -r p; do
33
125
  [[ -z "$p" ]] && continue
34
126
  case "$p" in
35
- tailwind.config.*|*.tokens.json|styles/tokens.css|styles/theme.css) CLASSIFIED[tokens]+="$p,";;
127
+ tailwind.config.*|*.tokens.json|*.tokens|styles/tokens.css|styles/theme.css) CLASSIFIED[tokens]+="$p,";;
36
128
  components/*|src/components/*|app/_components/*) CLASSIFIED[components]+="$p,";;
37
129
  app/*/page.*|app/page.*|app/*/route.*|app/route.*|pages/*|src/pages/*|app/routes/*|src/routes/*) CLASSIFIED[routes]+="$p,";;
38
130
  public/*|src/assets/*|assets/*) CLASSIFIED[imagery]+="$p,";;
@@ -43,7 +135,7 @@ while IFS= read -r p; do
43
135
  esac
44
136
  done <<< "$FILES"
45
137
 
46
- jq -Rn --arg mode "$MODE" --arg range_start "$RANGE_START" --arg last_iso "$LAST_ISO" \
138
+ jq -Rn --arg mode "$MODE" --arg range_start "${RANGE_START:-}" --arg last_iso "$LAST_ISO" \
47
139
  --arg tokens "${CLASSIFIED[tokens]:-}" --arg components "${CLASSIFIED[components]:-}" \
48
140
  --arg routes "${CLASSIFIED[routes]:-}" --arg imagery "${CLASSIFIED[imagery]:-}" \
49
141
  --arg deps "${CLASSIFIED[deps]:-}" --arg theory "${CLASSIFIED[theory]:-}" \
@@ -59,3 +151,6 @@ jq -Rn --arg mode "$MODE" --arg range_start "$RANGE_START" --arg last_iso "$LAST
59
151
  deps: tolist($deps), theory: tolist($theory), content: tolist($content)
60
152
  }
61
153
  }'
154
+
155
+ # TODO(sd-audit-state §11/artifact line 902): monorepo per-app state
156
+ # (apps/*/docs/super-design/.audit-state.json) not yet supported.
@@ -1,4 +1,11 @@
1
1
  #!/usr/bin/env bash
2
+ # TODO(sd-audit-state artifact §14): dynamic routes like `/posts/[slug]`
3
+ # should be expanded to `/posts/@fixture-<id>` using a fixtures manifest
4
+ # (discovered from tests or a user-configured JSON) before being passed
5
+ # to hash-pages/sd-audit. Current impl emits the raw pattern.
6
+ # TODO(sd-audit-state artifact §8): madge-based import-graph builder
7
+ # (`madge --json --ts-config tsconfig.json src`) to compute N-hop
8
+ # component → page blast radius. Expected alongside this script.
2
9
  set -euo pipefail
3
10
 
4
11
  detect_framework() {
@@ -1,6 +1,11 @@
1
1
  #!/usr/bin/env node
2
+ // Extract + canonicalize + hash design tokens from Tailwind configs and
3
+ // CSS custom properties. Output is deterministic regardless of insertion
4
+ // order (artifact §9.2 line 722: "canonical = JSON.stringify(theme,
5
+ // Object.keys(theme).sort())").
2
6
  import fs from "node:fs";
3
7
  import path from "node:path";
8
+ import { createHash } from "node:crypto";
4
9
  import { pathToFileURL } from "node:url";
5
10
 
6
11
  const out = {};
@@ -28,12 +33,24 @@ try {
28
33
  }
29
34
  } catch (e) { out._postcss_error = String(e.message || e); }
30
35
 
31
- console.log(JSON.stringify(out, null, 2));
36
+ // TODO(sd-audit-state §9.1 artifact line 707-709): DTCG *.tokens.json and
37
+ // Tokens Studio support — parse JSON, resolve `{alias}` refs per §9.2 line
38
+ // 747 before hashing, then merge into `out` with prefix `dtcg:`.
39
+
40
+ // Deterministic canonical form: keys sorted top-to-bottom.
41
+ const sorted = Object.keys(out).sort().reduce((acc, k) => { acc[k] = out[k]; return acc; }, {});
42
+ const canonical = JSON.stringify(sorted);
43
+ const tokens_hash = "sha256:" + createHash("sha256").update(canonical).digest("hex");
44
+
45
+ console.log(JSON.stringify({ tokens: sorted, tokens_hash }, null, 2));
32
46
 
33
47
  function flatten(obj, prefix, acc) {
34
48
  if (obj == null) return;
35
49
  if (typeof obj !== "object") { acc[prefix] = String(obj); return; }
36
- for (const [k, v] of Object.entries(obj)) {
50
+ // Sort keys so hash is stable across JS engines / config formatters.
51
+ // Artifact §9.2 line 722 calls this out explicitly.
52
+ for (const k of Object.keys(obj).sort()) {
53
+ const v = obj[k];
37
54
  const key = `${prefix}.${k}`;
38
55
  if (v && typeof v === "object" && !Array.isArray(v)) flatten(v, key, acc);
39
56
  else acc[key] = Array.isArray(v) ? v.join(",") : String(v);
@@ -1,5 +1,11 @@
1
1
  #!/usr/bin/env bash
2
2
  # Usage: hash-pages.sh <urls_file>
3
+ #
4
+ # TODO(sd-audit-state artifact §10 line 492, §16 line 1367-1384):
5
+ # Per-viewport hashes (mobile_375 / tablet_768 / desktop_1280) + pHash
6
+ # for perceptual similarity, plus mask_selectors passed to
7
+ # page.screenshot({ mask: [...] }) for deterministic diffs. Current impl
8
+ # hashes a single desktop viewport only.
3
9
  set -euo pipefail
4
10
  URLS="${1:?usage: hash-pages.sh <urls_file>}"
5
11
  OUT_DIR="${OUT_DIR:-docs/super-design/.cache/hashes}"
@@ -0,0 +1,21 @@
1
+ #!/usr/bin/env bash
2
+ # Usage: setup-git-notes.sh
3
+ #
4
+ # One-shot setup so `git notes --ref=super-design` round-trips across
5
+ # clones. Without the remote refspec, git fetch ignores notes by default
6
+ # (artifact §7 line 570-573).
7
+ set -euo pipefail
8
+
9
+ if ! git rev-parse --git-dir >/dev/null 2>&1; then
10
+ echo '{"error":"not-a-git-repo"}' >&2; exit 3
11
+ fi
12
+
13
+ # Idempotent: only add if absent.
14
+ if git config --get-all remote.origin.fetch 2>/dev/null |
15
+ grep -q 'refs/notes/super-design'; then
16
+ echo '{"status":"already-configured"}'
17
+ else
18
+ git config --add remote.origin.fetch \
19
+ '+refs/notes/super-design:refs/notes/super-design'
20
+ echo '{"status":"added","ref":"refs/notes/super-design"}'
21
+ fi
@@ -1,14 +1,46 @@
  #!/usr/bin/env bash
+ # Usage: validate-state.sh [<state_path>]
+ #
+ # Validates the super-design audit state file. On schema/parse errors,
+ # moves the broken file aside (artifact §3 "Graceful corruption handling"
+ # line 74) and emits a JSON verdict. Also enforces schema_version major
+ # compatibility (artifact §12 line 934).
  set -euo pipefail
  STATE="${1:-docs/super-design/.audit-state.json}"
+
+ # Current schema major is either read from a sibling .schema-version file
+ # (so the number can be bumped without editing shell) or falls back to 1.
+ SCHEMA_VERSION_FILE="$(dirname "$0")/../.schema-version"
+ if [[ -f "$SCHEMA_VERSION_FILE" ]]; then
+ CURRENT_SCHEMA_MAJOR="$(cut -d. -f1 <"$SCHEMA_VERSION_FILE" | tr -d '[:space:]')"
+ else
+ CURRENT_SCHEMA_MAJOR=1
+ fi
+
  if [[ ! -f "$STATE" ]]; then echo '{"status":"missing"}'; exit 2; fi
- jq -e '
+
+ # Parse + shape check. On failure, rename so the user can inspect and we
+ # fall through to first-audit (SKILL.md Step 1 treats "corrupt" that way).
+ if ! jq -e '
  (.schema_version | type == "string") and
  (.last_audit_at | fromdateiso8601 | . > 0) and
  (.git_sha_at_audit | test("^[0-9a-f]{7,64}$")) and
  (.skill_version | type == "string") and
  (.tools | type == "object")
- ' "$STATE" >/dev/null 2>&1 || { echo '{"status":"corrupt"}'; exit 2; }
+ ' "$STATE" >/dev/null 2>&1; then
+ mv "$STATE" "$STATE.corrupt-$(date +%s)" 2>/dev/null || true
+ echo '{"status":"corrupt"}'; exit 2
+ fi
+
+ # schema_version major-bump check — if state was written by a newer OR
+ # incompatible-older skill, force a full re-audit rather than silently
+ # trusting the shape.
+ STATE_MAJOR="$(jq -r '.schema_version' "$STATE" | cut -d. -f1)"
+ if [[ -z "$STATE_MAJOR" || "$STATE_MAJOR" != "$CURRENT_SCHEMA_MAJOR" ]]; then
+ echo "{\"status\":\"schema-incompatible\",\"action\":\"force-full\",\"state_major\":\"${STATE_MAJOR:-unknown}\",\"current_major\":\"${CURRENT_SCHEMA_MAJOR}\"}"
+ exit 1
+ fi
+
  AGE_DAYS=$(( ( $(date -u +%s) - $(jq -r '.last_audit_at | fromdateiso8601' "$STATE") ) / 86400 ))
  if (( AGE_DAYS > 180 )); then echo "{\"status\":\"stale-force-full\",\"age_days\":$AGE_DAYS}"; exit 1
  elif (( AGE_DAYS > 90 )); then echo "{\"status\":\"stale-refresh-research\",\"age_days\":$AGE_DAYS}"; exit 1
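The jq shape check can be exercised standalone. Here is a minimal state document that passes it, plus one value that trips the sha regex (field values are illustrative, not real audit output):

```shell
state='{"schema_version":"1.0","last_audit_at":"2026-04-19T00:00:00Z","git_sha_at_audit":"abc1234","skill_version":"0.6.1","tools":{}}'
printf '%s' "$state" | jq -e '
  (.schema_version | type == "string") and
  (.last_audit_at | fromdateiso8601 | . > 0) and
  (.git_sha_at_audit | test("^[0-9a-f]{7,64}$")) and
  (.skill_version | type == "string") and
  (.tools | type == "object")
' >/dev/null && echo "shape-ok"
# Uppercase hex fails the lowercase-only regex, so this state would be
# renamed aside and reported as corrupt:
printf '%s' '{"git_sha_at_audit":"ABC1234"}' |
  jq -e '.git_sha_at_audit | test("^[0-9a-f]{7,64}$")' >/dev/null || echo "shape-bad"
```

Note the exit-code contract callers rely on: 2 means missing/corrupt (start a first audit), 1 means stale or schema-incompatible (force a re-audit), 0 means the state is usable.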
@@ -0,0 +1,26 @@
+ #!/usr/bin/env bash
+ # Usage: write-state.sh [<target>] (reads JSON body from stdin)
+ #
+ # Atomic state writer: accepts JSON on stdin, writes to <target>.tmp,
+ # validates with jq, then renames in place. Referenced from
+ # docs/compass_artifact §11 ("Write-then-rename (atomic)") and SKILL.md
+ # Step 4 ("Atomic write .audit-state.json").
+ set -euo pipefail
+
+ TARGET="${1:-docs/super-design/.audit-state.json}"
+ TMP="${TARGET}.tmp"
+
+ mkdir -p "$(dirname "$TARGET")"
+
+ # Drain stdin into the tmp file.
+ cat >"$TMP"
+
+ # Validate it is parseable JSON before swapping.
+ if ! jq -e 'type == "object"' "$TMP" >/dev/null 2>&1; then
+ rm -f "$TMP"
+ echo '{"error":"invalid-json-on-stdin"}' >&2
+ exit 2
+ fi
+
+ mv -f "$TMP" "$TARGET"
+ echo "{\"status\":\"written\",\"path\":\"$TARGET\"}"
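End-to-end, the same write-then-rename steps can be inlined. A sketch against temp paths with illustrative field values; real callers pipe into write-state.sh and take the default docs/super-design target:

```shell
target="$(mktemp -d)/audit-state.json"
# Build a plausible state body and stage it in the .tmp sidecar.
jq -n '{schema_version:"1.0", last_audit_at:(now|todate),
        git_sha_at_audit:"abc1234", skill_version:"0.6.1", tools:{}}' >"${target}.tmp"
# Swap in only if the staged file parses as a JSON object; the rename is
# atomic because .tmp lives in the same directory as the target.
jq -e 'type == "object"' "${target}.tmp" >/dev/null && mv -f "${target}.tmp" "$target"
jq -r '.schema_version' "$target"   # → 1.0
```

A reader at any moment sees either the old complete file or the new complete file, never a half-written one, which is the point of staging through `.tmp` rather than redirecting stdin straight into the target.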