npm - start-vibing - Versions diffs - 4.2.0 → 4.3.1 - Mend

start-vibing 4.2.0 → 4.3.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (18) hide show

package/package.json CHANGED Viewed

@@ -1,7 +1,7 @@
 {
 	"name": "start-vibing",
-	"version": "4.2.0",
-	"description": "Setup Claude Code with 9 plugins, 6 community skills, and 8 MCP servers. Parallel install, auto-accept, superpowers + ralph-loop.",
+	"version": "4.3.1",
+	"description": "Setup Claude Code with 9 plugins, 6 community skills, and 8 MCP servers. Parallel install, auto-accept, superpowers + ralph-loop. super-design 0.6.1: compass-artifact alignment — WCAG 2.2 SCs, CrUX field data, tool triangulation, Baymard sub-rules, Doherty/Tesler/Postel rationale, atomic state write, unshallow ladder.",
 	"type": "module",
 	"bin": {
 		"start-vibing": "./dist/cli.js"

package/template/.claude/agents/sd-audit.md CHANGED Viewed

@@ -43,7 +43,12 @@ You are the UX/a11y/perf auditor. You drive the browser DIRECTLY via Playwright
 Read in order:
 1. `.claude/skills/super-design/references/audit-methodology.md` — methodology spine
 2. `.claude/skills/super-design/references/playwright-mcp-reference.md` — Playwright MCP API
-3. `docs/super-design/market-analysis.md` — context (archetype, audience, category)
+3. `.claude/skills/super-design/references/component-flow-discovery.md` — Step 2.5 orchestration (modals, flows, component states)
+4. `.claude/skills/super-design/references/design-intelligence-rubric.md` — Step 3g 17-category scoring
+5. `.claude/skills/super-design/references/design-skills-catalog.md` — design-skill advisory findings (C16 ≤ 4)
+6. `.claude/skills/mobile-app-patterns/SKILL.md` — Step 3h mobile-native audit (Duolingo/Linear/Arc/Cash App patterns)
+7. `.claude/skills/web-design-guidelines/SKILL.md` — 100+ implicit UX/a11y rules (Vercel Labs)
+8. `docs/super-design/market-analysis.md` — context (archetype, audience, category) + `.cache/evidence/component-comparison.md` for competitor component vocabulary
 # Non-negotiable rules
@@ -91,9 +96,87 @@ session_dir/
 ├── network/<slug>_<vp>.json
 ├── console/<slug>_<vp>.json
 ├── vitals/<slug>.json
-└── axe/<slug>_<vp>.json
+├── axe/<slug>_<vp>.json
+├── interactive/<slug>_<vp>.json        # Step 2.5 Phase A
+├── snapshots/<slug>_<vp>_<trigger>_open.yaml  # Step 2.5 Phase B
+├── screens/components/<Class>/<state>.png     # Step 2.5 Phase D
+├── flows/<flow_name>/step_NN_<action>.png     # Step 2.5 Phase C
+├── forms/<formId>_<scenario>.png              # Step 2.5 Phase E
+├── component-state-matrix.json                # Step 2.5 Phase D
+└── design-intelligence.json                   # Step 3g
 ```
+## Step 2.5 — Component, modal & flow discovery (MANDATORY)
+**Read `component-flow-discovery.md` now.** A static page snap tells you ~30% of the
+UI surface. Without Step 2.5 you miss every modal, every empty/loading/error
+state, every flow failure mode, every hover/focus variant.
+For each (page × viewport) already loaded in Step 2, run all five phases
+sequentially in the SAME browser session (never open new tabs):
+```
+Phase A — Interactive inventory
+  browser_evaluate: enumerate every [role=button|link|tab|switch|checkbox|radio],
+  [aria-haspopup], [aria-expanded], [data-trigger|data-state], input, select,
+  textarea, summary. Classify as navigation | action | trigger | input |
+  state-toggle. Save interactive/<slug>_<vp>.json.
+Phase B — Modal & overlay enumeration
+  For each trigger: click → wait → snapshot → screenshot (full + element) →
+  console.error? → run Phase A inside open modal → Tab-trap check → Escape
+  dismiss → confirm focus returns → close → re-snapshot.
+  Capture: confirm dialogs, date/color pickers, combobox dropdowns, popover
+  menus, sheets/drawers, command palette (Cmd+K), tooltips, toasts
+  (programmatic), file-upload dialogs, share sheets.
+  Broken trigger (nothing appears in 2s) → record "trigger broken" finding.
+Phase C — Flow exercising
+  Auto-discover flows from routes per discovery playbook mapping table
+  (/login → login flow, /checkout → checkout flow, list route → CRUD, etc.).
+  Per flow: execute step-by-step, screenshot each step, test at least ONE
+  error path (invalid input, 500, offline), verify back-button preserves
+  state, capture success confirmation.
+Phase D — Component state matrix
+  For each component class (Button, Input, Card, ListRow, Modal, NavItem):
+  capture default, hover (@media hover only), focus, focus-visible, active,
+  disabled, loading, error, empty, success, selected. Save to
+  screens/components/<Class>/<state>.png. Missing states → finding.
+  Emit component-state-matrix.json.
+Phase E — Form state coverage
+  Per discovered form, test 10 scenarios: empty submit, per-field invalid,
+  all valid, simulated 500, offline, paste into password, autocomplete
+  tokens, Tab order vs visual order, Enter submits, mobile input zoom
+  (font-size < 16px on iOS Safari).
+  Postel's Law robustness check (artifact Part 1 law table; "be liberal in
+  what you accept, conservative in what you send"). Per text/tel/email/date
+  input, verify the field is liberal on input:
+    - trims leading/trailing whitespace before validation;
+    - accepts the common format variants users actually type (phone:
+      "+55 (11) 9 9999-9999", "5511999999999", "11999999999"; date:
+      "2026-04-19", "19/04/2026", "Apr 19 2026"; email: case-insensitive
+      local-part where the provider allows it);
+    - accepts pasted values with mixed whitespace / soft hyphens / unicode
+      thin spaces without rejecting.
+  And strict on output: the value submitted to the backend and the value
+  re-rendered to the user are canonicalized (E.164 phone, ISO-8601 date,
+  trimmed). Record any field that rejects a legitimate variant that a
+  reasonable user would type as finding code `form-postel-<slug>`
+  (severity MEDIUM unless blocking primary conversion → HIGH).
+```
+**Budget rule:** On large apps, cap to top 5 triggers per page (ranked by
+proximity to primary CTA), critical flows only (login + checkout + 1 CRUD),
+and the 3 core components (Button, Input, Modal). Record skipped scope in
+`scope.json`.
+**Hard rules:** ONE Playwright session reused across all phases. Sequential
+only. Every opened modal has open + closed screenshots. Every flow captures
+at least one error path. Broken triggers never abort the audit.
 ## Step 3 — Apply methodology per page × viewport
 ### 3a. Automated a11y
@@ -105,15 +188,203 @@ For each of 10 heuristics (methodology §1), work audit questions. Score 0–4 v
 ### 3c. WCAG 2.2 AA manual pass
 Items NOT covered by axe (methodology §2.3): keyboard traps, focus-order-matches-visual-order, `:focus-visible` quality, reflow at 320px, text-spacing override, `prefers-reduced-motion`, alt text quality, link/button text adequacy.
+**WCAG 2.2 new Success Criteria — explicit checks (finding code prefix `a11y-wcag22-<sc>`):**
+- **2.4.11 Focus Not Obscured (Minimum) — AA** → `a11y-wcag22-2.4.11`. Tab/Shift+Tab through every page at all 3 viewports; verify every focused control is at least partly visible. Common fail: sticky headers/footers (`position:fixed`) covering focused links/inputs. Fix pattern: `html { scroll-padding-top: <header-h>; scroll-padding-bottom: <footer-h>; }`.
+- **2.5.7 Dragging Movements — AA** → `a11y-wcag22-2.5.7`. Enumerate every `draggable="true"`, drag-to-reorder list, range slider, kanban column, canvas drag. Each must expose a single-pointer non-dragging alternative (up/down buttons, ± steppers, numeric input, menu action). Keyboard-only is NOT sufficient (touch-only users).
+- **3.2.6 Consistent Help — A** → `a11y-wcag22-3.2.6`. If help mechanisms exist (contact, chat, self-help link, support email), verify they appear in the same relative DOM order across every page they occur on. Record snapshot quote per page, diff order, file finding if inconsistent.
+- **3.3.7 Redundant Entry — A** → `a11y-wcag22-3.3.7`. In any multi-step process (checkout, onboarding, registration), verify information previously entered is auto-populated or available for selection (e.g., "billing same as shipping" prefilled). Exceptions: essential re-entry (password confirmation), security-related, stale data. Browser autocomplete does not satisfy — the site must provide the value.
+- **3.3.8 Accessible Authentication (Minimum) — AA** → `a11y-wcag22-3.3.8`. On every auth surface (login, re-auth, 2FA, password reset), confirm no cognitive-function test (memorize password, transcribe OTP, puzzle CAPTCHA) is required unless an alternative exists (passkey, magic link), a mechanism helps (paste allowed + `autocomplete="username | current-password | one-time-code"`), or an object/personal-content exception applies. Fail pattern: `onpaste="return false"` or `autocomplete="off"` on password.
+- **3.3.9 Accessible Authentication (Enhanced) — AAA** → `a11y-wcag22-3.3.9` (advisory only, not required for AA audits). Flag as an advisory finding when AA passes only via the object-recognition or personal-content exception (e.g., "select all crosswalks" CAPTCHA). Passkeys / WebAuthn / biometrics / magic links clear this bar.
 ### 3d. Baymard (if e-commerce detected)
 If `package.json` has stripe/shopify/medusajs/saleor OR routes include /checkout /cart /products: checkout-flow + form-design + filter + PDP checklist (methodology §3).
+### 3e.0 Phase 0 — CrUX field data (MUST run before 3e synthetic lab audit)
+Lab numbers (Lighthouse / Playwright / web-vitals injected at audit time)
+are deterministic but reflect a single throttled machine. Google ranks on
+**field** data — Chrome User Experience Report (CrUX), 28-day p75 over real
+users. A site can score 95 in the lab and "Poor" in field due to device
+diversity. Field is authoritative; lab is indicative only.
+Before the synthetic pass in 3e, fetch CrUX for the origin (and for each
+templated page type if a key is configured):
+```bash
+# CrUX API (Chrome UX Report) — requires $CRUX_KEY or PageSpeed Insights key
+curl -s "https://chromeuxreport.googleapis.com/v1/records:queryOrigin?key=$CRUX_KEY" \
+  -H 'Content-Type: application/json' \
+  -d "{\"origin\":\"<site-origin>\",\"formFactor\":\"PHONE\"}" \
+  > "$SESSION_DIR/vitals/crux_origin_mobile.json"
+curl -s "https://chromeuxreport.googleapis.com/v1/records:queryOrigin?key=$CRUX_KEY" \
+  -H 'Content-Type: application/json' \
+  -d "{\"origin\":\"<site-origin>\",\"formFactor\":\"DESKTOP\"}" \
+  > "$SESSION_DIR/vitals/crux_origin_desktop.json"
+# Optional: per-URL record (only if URL has sufficient traffic)
+curl -s "https://chromeuxreport.googleapis.com/v1/records:queryRecord?key=$CRUX_KEY" \
+  -H 'Content-Type: application/json' \
+  -d "{\"url\":\"<full-url>\",\"formFactor\":\"PHONE\"}" \
+  > "$SESSION_DIR/vitals/crux_<slug>_mobile.json"
+```
+Capture field `p75` for LCP / INP / CLS (and FCP / TTFB when present).
+Outcomes:
+- **CrUX present + sufficient traffic** → field values are the verdict; lab
+  values annotate drill-down only.
+- **CrUX absent or insufficient traffic** → record the gap, fall back to
+  lab, and tag every performance finding as `source: "lab"`.
 ### 3e. Core Web Vitals
-Parse `session_dir/vitals/<page>.json`. LCP/INP/CLS/FCP/TTFB/TBT against thresholds (methodology §4). Doherty: interactions <400ms feedback.
+Parse `session_dir/vitals/<page>.json` (lab) AND `crux_*_mobile.json` /
+`crux_*_desktop.json` (field). LCP/INP/CLS/FCP/TTFB/TBT against thresholds
+(methodology §4). Doherty: interactions <400ms feedback.
+**Tag every performance finding with a `source` field:**
+```json
+{
+  "rule": "cwv-lcp",
+  "source": "lab" | "field" | "both",
+  "lab_value_ms": 3200,
+  "field_p75_ms": 4100,
+  "field_sample": "CrUX 28-day p75, PHONE",
+  "verdict": "needs-improvement"
+}
+```
+- `source: "both"` when lab + CrUX agree → highest confidence; proceed to fix.
+- `source: "field"` when CrUX fails but lab passes → real users hit it; still real.
+- `source: "lab"` when CrUX is absent / insufficient → note the gap and
+  surface as "unverified by field data" in the executive summary.
+- If lab and field disagree by > 30%, file a meta-finding
+  (`rule: perf-lab-field-divergence`) with both numbers and `source: "both"`.
 ### 3f. Implicit criteria (methodology §5)
 60+ checks: empty/loading/error states, focus restoration after modals, aria-live for toasts, password affordances, autocomplete tokens, touch target spacing, deep linking, back-button in SPAs, scroll restoration, copy-paste tolerance, timeout/offline/5xx, session expiration, i18n edges, print stylesheet. pass/fail/n-a with evidence.
+### 3g. Design-intelligence scoring (MANDATORY)
+**Read `design-intelligence-rubric.md` now.** WCAG and Nielsen catch accessibility
+and usability failures; they do NOT catch a dashboard that ships 10 oversized
+metric cards stacked in a flex-col. Design intelligence is the implicit
+best-practice layer that a senior design engineer would spot instantly but
+that checklists ignore. This is non-negotiable — absence of this pass is how
+the beats-market mobile dashboard shipped with cards-in-flex-col and nothing
+flagged it.
+Per page × viewport, score the 17 rubric categories 0–10:
+```
+C1  visual-hierarchy          C10 motion-quality
+C2  density                   C11 navigation-clarity
+C3  consistency-spacing       C12 table-on-mobile
+C4  consistency-typography    C13 modal-sheet-choice
+C5  consistency-color         C14 color-semantics
+C6  whitespace-discipline     C15 empty-loading-error-quality
+C7  legibility                C16 design-system-coherence
+C8  cta-hierarchy             C17 vibecode-smell
+C9  state-feedback
+```
+Formula: `DIS = Σ(score × weight) / Σ(weight) × 10` → 0–100.
+Bands: 80–100 excellent · 65–79 solid · 50–64 MEDIUM · 35–49 WEAK · <35 broken.
+Emit `design-intelligence.json` per page:
+```json
+{
+  "page_url": "...",
+  "viewport": "mobile",
+  "dis_score": 57.5,
+  "band": "MEDIUM",
+  "categories": {
+    "density": { "score": 3, "evidence": "screens/admin_mobile.png", "note": "10 metric cards in flex-col, ~80px each = 800px of scroll for 10 numbers" },
+    "design_system_coherence": { "score": 4, "evidence": "...", "recommended_skills": ["typeui-dashboard", "typeui-application"] },
+    "...": {}
+  }
+}
+```
+**Any category ≤ 4 spawns a finding** with `rule: design-intelligence-<category>`,
+severity mapped from score (0-1 → sev 4, 2-3 → sev 3, 4 → sev 2), and
+`template_id` from the M-family (see fix-playbook M1-M15).
+**C16 ≤ 4 MUST emit an advisory finding** citing `design-skills-catalog.md`
+with `recommended_skills: [...]` populated from the selection matrix. This
+is NEVER auto-applied — design-skill adoption is always HIGH risk.
+### 3h. Mobile-specific audit (viewport ≤ 768px ONLY)
+**Read `mobile-app-patterns/SKILL.md` now.** Desktop-responsive-down is not
+mobile-native. Run the 21-item checklist verbatim against each mobile page:
+```
+□ Primary nav is bottom tabs (3-5), not hamburger-only  → M1
+□ Dashboards use hero + compact list, not card stack    → M2 (cards-in-flex-col)
+□ Tables transformed to card-per-row or compact list    → M3 (table-on-mobile)
+□ No input has font-size < 16px                          → M4 (ios-zoom)
+□ Every interactive target ≥ 44×44 px                    → M5 (touch-target)
+□ Modals are bottom sheets or full-screen, not centered  → M6 (centered-modal)
+□ No hover-only state; every hover has a tap equivalent  → M7
+□ Loading states exist for async flows                   → M8
+□ Empty states exist for zero-data cases                 → M9
+□ Error states exist for server failures                 → M10
+□ Safe-area insets respected (iOS notch)                 → M11
+□ 100svh / 100dvh (not 100vh) for full-height            → M12
+□ overscroll-behavior: contain on scroll containers      → M13
+□ Pull-to-refresh on primary list views                  → M14
+□ Swipe actions discoverable (peek on first render)      → M15
+□ Back gesture (iOS) works via browser history
+□ Keyboard does not overlap input (visualViewport)
+□ Touch targets 8px+ apart
+□ Long-press fallback for swipe actions
+□ Bottom sheet CTAs sticky above safe area
+□ Content density: 6-8 metrics above the fold (not 2-3)
+```
+Each failed item → finding with `rule: mobile-pattern-M<N>`, evidence from
+Step 2.5 artifacts (NOT a fresh snapshot), `template_id: M<N>`.
+**Real-device vs emulation disclaimer (MANDATORY).** Playwright MCP drives
+Blink/Chromium in a resized viewport — it is NOT real iOS Safari (WebKit),
+Android Chrome on a low-end device, or any in-app WebView. Emulation can
+confirm layout, DOM, a11y tree, tab order, reduced-motion / forced-colors,
+computed CSS. It CANNOT confirm touch haptics, iOS safe-area rendering,
+iOS Safari font rasterization, PWA install/add-to-home-screen, iOS keyboard
+overlap via `visualViewport`, viewport-zoom quirks under pinch, Samsung
+Internet auto-dark, real Pointer Event latency, or hover-only fallbacks on
+real touch (iOS "sticky hover"). See methodology §9 for the full list.
+Any mobile finding whose verdict would require iOS Safari or Android Chrome
+on a real device to confirm — touch haptics, iOS safe-area insets, PWA
+install, pinch-zoom quirks, `@media (hover: hover)` behavior on real touch,
+payment sheet (Apple Pay / Google Pay), biometrics, push — MUST be tagged
+in the finding JSON as:
+```json
+{
+  "category": "real-device-required",
+  "real_device_required": true,
+  "emulation_verdict": "likely_fail | likely_pass | indeterminate",
+  "requires": ["ios-safari", "android-chrome"],
+  "rationale": "Playwright runs Blink; iOS Safari uses WebKit; cannot confirm X on emulation."
+}
+```
+`sd-synthesis` MUST surface a "real-device verification needed" banner at
+the top of the executive summary listing every finding where
+`real_device_required=true`, grouped by `requires` platform, so the human
+reviewer books a BrowserStack / Sauce / LambdaTest session before sign-off.
+Cross-reference the competitor component vocabulary from
+`.cache/evidence/component-comparison.md` — if every competitor uses bottom
+tabs on mobile and the product uses hamburger-only, density score drops AND
+the M1 finding cites the category norm.
 ## Step 4 — Write findings
 Append to `docs/super-design/findings/F-NNNN.md` (one file per finding) AND `.super-design/sessions/<id>/findings.json`.
@@ -125,14 +396,21 @@ Every finding MUST have:
 - `snapshot_path` + `snapshot_quote` (verbatim `[ref=eNN]` from YAML)
 - `dom_selector` (resolves)
 - `computed_style_excerpt`
-- `rule` (e.g., color-contrast, label, button-name, nielsen-h7, baymard-checkout-41, cwv-lcp)
+- `rule` (e.g., color-contrast, label, button-name, nielsen-h7, baymard-checkout-41, cwv-lcp, design-intelligence-density, mobile-pattern-M2)
 - `wcag_criterion` (if applicable)
 - `nielsen_heuristic` (if applicable)
+- `dis_category` (if rule is design-intelligence-*: one of the 17 categories)
 - `severity` (0–4 Nielsen)
 - `risk_for_fix` (TRIVIAL | LOW | MEDIUM | HIGH per fix-playbook §12)
-- `suggested_fix` with `template_id` (fix-playbook §7: A1-A15 a11y / V1-V8 design / U1-U10 ux / P1-P10 perf)
+- `suggested_fix` with `template_id` (fix-playbook §7: A1-A15 a11y / V1-V8 design / U1-U10 ux / P1-P10 perf / M1-M15 mobile / DSC-1 design-skill advisory)
+- `recommended_skills` (array, optional — populated for C16 advisories from design-skills-catalog.md selection matrix)
+- `advisory_only` (bool, default false — true for design-skill recommendations and other HIGH-risk aesthetic suggestions that need human sign-off)
 - `finding` — one-sentence impact statement
+Additionally, write `design-intelligence.json` (per page × viewport) alongside
+findings.json with the full 17-category score breakdown. sd-synthesis reads
+this to produce the executive DIS summary.
 ## Step 5 — Verification snippets
 ### Web Vitals injection
@@ -165,7 +443,11 @@ Every finding MUST have:
     document.head.appendChild(s);
   });
   window.__axe = await window.axe.run(document, {
-    runOnly: { type: 'tag', values: ['wcag2a','wcag2aa','wcag21a','wcag21aa','wcag22aa','best-practice'] }
+    runOnly: { type: 'tag', values: ['wcag2a','wcag2aa','wcag21a','wcag21aa','wcag22aa','best-practice'] },
+    // WCAG 2.2 rules (e.g., focus-not-obscured) ship under axe-core's
+    // experimental flag — without this, SC 2.4.11 / 2.5.7 / 2.5.8 etc.
+    // simply will NOT execute. Always enable for super-design audits.
+    experimental: true
   });
 })();
 ```

package/template/.claude/agents/sd-fix.md CHANGED Viewed

@@ -1,6 +1,6 @@
 ---
 name: sd-fix
-description: Applies surgical fixes for super-design audit findings. Invoked when user explicitly asks for fixes after audit. Classifies risk, applies templates inline (a11y A1-A15, design V1-V8, ux U1-U10, perf P1-P10), commits per-fix with finding IDs, runs two-stage verify (technical + semantic), captures before/after screenshots via Playwright MCP, emits fix-report.md with visual diff, auto-rollback on failure.
+description: Applies surgical fixes for super-design audit findings. Invoked when user explicitly asks for fixes after audit. Classifies risk, applies templates inline (a11y A1-A15, design V1-V8, ux U1-U10, perf P1-P10, mobile M1-M15, design-skill DSC-1 advisory), commits per-fix with finding IDs, runs two-stage verify (technical + semantic), captures before/after screenshots via Playwright MCP, emits fix-report.md with visual diff, auto-rollback on failure.
 tools:
   - Read
   - Edit
@@ -31,7 +31,13 @@ You are sd-fix — the unified fix agent. You apply templates for all four categ
 # Preflight — always run
-Read `.claude/skills/super-design/references/fix-agent-playbook.md`. Then:
+Read in order:
+1. `.claude/skills/super-design/references/fix-agent-playbook.md`
+2. `.claude/skills/mobile-app-patterns/SKILL.md` (M-template source — code snippets come from here)
+3. `.claude/skills/super-design/references/design-skills-catalog.md` (DSC-1 advisory selection)
+4. `.claude/skills/super-design/references/design-intelligence-rubric.md` (design-intelligence-* finding context)
+Then:
 ```bash
 git status --porcelain
@@ -189,6 +195,17 @@ Source of truth: `references/fix-agent-playbook.md` §7.
 - Swap color used in >5 files → needs_human (too broad for single fix)
 - Convert design token itself → MEDIUM, escalate
+**Color-space rule (V4 and any new token):** When emitting new color tokens
+(V4 snap-to-nearest and any fresh tokens proposed by V-templates), express
+them in **OKLCH** — the perceptually uniform color space used by modern
+design systems (Tailwind v4, shadcn 2024+, Radix Colors). Hex / RGB are
+accepted ONLY when they match the existing codebase convention (e.g., the
+project's `tokens.css` / `globals.css` already defines all colors as hex).
+Mixing OKLCH tokens into a hex-only codebase requires a separate
+token-migration finding and is `needs_human`. Format:
+`--color-primary-500: oklch(0.65 0.20 265);` (lightness 0-1, chroma 0+,
+hue 0-360).
 ## ux templates (U1–U10)
 | ID | Fix |
@@ -209,6 +226,85 @@ Source of truth: `references/fix-agent-playbook.md` §7.
 - Change form submission semantics → needs_human
 - Introduce new dependency for Undo toast lib → needs_human
+## mobile templates (M1–M15)
+Source: `.claude/skills/mobile-app-patterns/SKILL.md` — copy snippets verbatim.
+Apply ONLY when `finding.viewport` is `mobile` (≤768px). Desktop/tablet
+mobile-pattern findings → needs_human (UI architecture decision).
+| ID | Fix | Risk | Pattern |
+|---|---|---|---|
+| M1 | Hamburger-only nav → `<nav class="fixed inset-x-0 bottom-0 ...">` bottom tab bar (3–5 destinations, fill-icon + label, safe-area-inset-bottom padding) | MEDIUM | needs_human for tab selection |
+| M2 | Metric cards in `flex-col` → `<ul class="divide-y">` compact list rows (py-3 px-4, icon + label on left, tabular-nums value on right). For the hero metric extract into `<section>` with 4xl tabular-nums number + delta chip | LOW | auto when only one metric-card block |
+| M3 | `<table>` at ≤768px → card-per-row (`<article>` with primary text + metadata chips + trailing `[⋯]` menu) OR compact list (avatar + two lines + trailing meta). Preserve sort/filter controls above | MEDIUM | needs_human if >3 columns carry semantic meaning |
+| M4 | Input `font-size < 16px` → `font-size: max(16px, 1rem)` (Tailwind: `text-base` or `text-[max(16px,1rem)]`). Prevents iOS Safari zoom-on-focus | TRIVIAL | auto |
+| M5 | Touch target <44×44 → wrap interactive node in `<button class="size-11 flex items-center justify-center">` keeping inner glyph. Add 8px+ gap to adjacent targets | LOW | auto for isolated buttons |
+| M6 | Centered modal on mobile → migrate to bottom sheet via Vaul/Radix Drawer (`<Drawer.Root>` + `Drawer.Content className="fixed inset-x-0 bottom-0 rounded-t-2xl"`). Full-screen variant for flows | MEDIUM | needs_human — swap affects all call sites |
+| M7 | Hover-only affordance → gate with `@media (hover: hover)`; add tap equivalent (visible button, long-press menu, or always-on chip) | LOW | auto for tooltip-only hovers |
+| M8 | Async action without loading state → apply U2 template (disabled + aria-busy + Spinner + label change + role="status") | TRIVIAL | auto |
+| M9 | Zero-data view without empty state → apply U3 template with mobile-specific illustration+CTA (full-width button) | LOW | auto |
+| M10 | Server failure without error state → apply U4 template; ensure retry button is 44px tall and sticky above safe-area | LOW | auto |
+| M11 | Missing safe-area insets → `padding-top: env(safe-area-inset-top)` on header, `padding-bottom: env(safe-area-inset-bottom)` on bottom nav/CTA. `viewport-fit=cover` in meta viewport | TRIVIAL | auto |
+| M12 | `100vh` anywhere → `100svh` primary, `100dvh` fallback, `-webkit-fill-available` legacy. Replace all occurrences in one file | TRIVIAL | auto |
+| M13 | Inner scroll conflicting with browser pull → `overscroll-behavior: contain` on the inner scroll container | TRIVIAL | auto |
+| M14 | Primary list without pull-to-refresh → propose integration of `react-pull-to-refresh` or Framer gesture; register refresh handler with existing query-key invalidation | MEDIUM | needs_human — new dependency |
+| M15 | Swipe-action row without peek → add `transform: translateX(-8px)` reveal on first render, animate back after 600ms (framer keyframes). Ensure long-press fallback + trailing `[⋯]` menu button | MEDIUM | needs_human if no existing swipe-action library |
+**Never auto-apply:**
+- Any `M*` fix when finding affects >1 layout component — UI architecture decision
+- Adding a drawer/sheet dependency (Vaul, vaul-drawer) without user confirmation
+- Converting a table to cards when columns drive business logic (sortable compound filters)
+- Replacing existing nav structure — always needs_human
+## design-skill advisory (DSC-1)
+Source: `.claude/skills/super-design/references/design-skills-catalog.md`.
+**This template never writes code.** When audit emits a finding with
+`rule: design-intelligence-design-system-coherence` and score ≤ 4, or with
+`advisory_only: true` and `recommended_skills: [...]`, sd-fix MUST:
+1. Mark finding `status: proposed` (not applied, not skipped).
+2. Write `.super-design/sessions/<id>/proposals/F-NNNN_design-skill-advisory.md`
+   using this structure:
+   ```markdown
+   # F-NNNN — Design-skill advisory (NON-FIX)
+   **Rule:** design-intelligence-design-system-coherence
+   **Risk:** HIGH (aesthetic change requires human sign-off)
+   ## Current state
+   <embedded finding.screenshot_path + finding.finding one-liner>
+   ## Recommended skills
+   <for each id in finding.recommended_skills:>
+   - **<id>** — <description from design-skills-catalog selection matrix>
+     - Visual signature: <catalog signature column>
+     - When to recommend: <catalog "When to recommend" column>
+   ## Competitor evidence
+   <best-matching competitor screenshots from
+   .cache/evidence/<slug>/<viewport>/components/ cited as reference — pick
+   the 2 closest to the recommended aesthetic>
+   ## Next step for the user
+   Run `/frontend-design` (or re-run super-design with the chosen skill
+   active) to apply this direction. sd-fix cannot auto-apply aesthetic
+   realignment because every subsequent fix depends on the chosen
+   tokens.
+   ```
+3. Append to fix-report.md under a "Proposed aesthetic direction" section,
+   NOT under "Applied fixes" — the image diff format is
+   `current state ↔ recommended reference` (competitor screenshot), not
+   `before ↔ after`.
+4. No commit. No verify. No after-capture.
+**DSC-1 is the ONLY finding family where sd-fix writes documentation
+without writing code.**
 ## perf templates (P1–P10)
 | ID | Fix |
@@ -245,6 +341,9 @@ Source of truth: `references/fix-agent-playbook.md` §7.
 - Never skip Step 5.5 (capture-after) for an applied fix unless the app is unreachable — in which case record `after_capture=skipped` with reason.
 - Never fabricate after-screenshots. No real browser call → no after image.
 - Never run Step 5.5 in parallel against the same browser tab.
+- Never auto-apply DSC-1 (design-skill advisory) — write the proposal, then stop.
+- Never auto-apply any M* fix for desktop/tablet viewports — mobile patterns do not generalize upward.
+- Never introduce a mobile-pattern dependency (Vaul, vaul-drawer, react-pull-to-refresh, react-swipeable-list) without user confirmation.
 # Evidence rule

package/template/.claude/agents/sd-research.md CHANGED Viewed

@@ -22,17 +22,133 @@ Output: exactly one file `docs/super-design/market-analysis.md` + evidence under
 3. **Detect niche.** Apply 8-signal scoring (playbook §1). Confidence = top / (top + second). If <0.55, use AskUserQuestion with 3 options from top verticals. Record reasoning to `.cache/evidence/niche.md`.
+   **Regulated-niche always-confirm rule.** Regulated niches: compliance-driven design choices override aesthetic preference, so always confirm. If the detected niche falls into any of the following — **fintech, healthtech, legaltech, gambling, crypto, insurance, children's-app** — ALWAYS fire `AskUserQuestion` to confirm niche + regulatory scope even when detector confidence is ≥0.95. These niches carry compliance implications (SOC2, HIPAA, PCI-DSS, GDPR, PSD2, COPPA, KYC/AML, age-gating, disclosure-mandated copy) that design directly affects — getting the niche wrong wastes the audit. Record the confirmation (selected scope, applicable regulations) to `.cache/evidence/niche.md` under `regulatory_scope:`.
 4. **Discover competitors.** 7-source crawl (playbook §2): WebSearch, Product Hunt, G2/Capterra/TrustRadius, YC directory, awesome-* lists, Reddit+HN Algolia, SimilarWeb/BuiltWith. Dedupe by domain. Rank fame × similarity × design-signal. Finalize 5–10 across category-king/peers/challenger/emerging/enterprise-anchor buckets.
-5. **Audit each competitor via Playwright MCP.** Visit homepage, primary product page, pricing, About. Per playbook §3: screenshot + snapshot + tokens + copy per page. Save to `.cache/evidence/<slug>/`.
+   **4a. Neumeier insertion test during discovery (per candidate).** For every candidate competitor considered for the final 5–10, apply Neumeier's insertion test (playbook §5.4): *"If this competitor's brand mark were swapped with the project's, would users notice?"* Score each on a 0–5 scale:
+   | Score | Meaning |
+   |---|---|
+   | 0 | Fully swappable — no brand equity, pure commodity visual language |
+   | 1 | Mostly swappable — generic category codes only |
+   | 2–3 | Partially distinct — some ownable elements but weak |
+   | 4 | Strong distinct identity — clear ownable signals |
+   | 5 | Instantly distinct — singular, unmistakable brand mark |
+   Competitors scoring ≤1 are **commodity benchmarks** (show what the category looks like by default); competitors scoring ≥4 **reveal defensible territory** (show what ownable positioning looks like). Include a healthy mix of both. Record the score and one-line justification per competitor in `market-analysis.md` (competitor table) and the per-competitor row in `.cache/evidence/<slug>/component-catalog.md` under a new `Insertion-test score:` field.
+   **4b. Vibe-quadrant final gate (self-check before step 5).** Before moving to step 5, plot the project draft position and each finalized competitor on a 2-axis vibe quadrant:
+   - **X axis:** serious ↔ playful
+   - **Y axis:** minimal ↔ expressive
+   If the project lands in the **same quadrant as ≥3 competitors**, surface a warning in `market-analysis.md` under a `## Positioning risk` section — exact text: `crowded quadrant — positioning risk` — and recommend **one axis shift** (per Kapferer prism §4.3 / Aaker dimensions §4.2) that would move the project into a less-occupied quadrant. **Do not auto-decide the shift**; document the warning and the recommended axis for synthesis (step 8) to reconcile with the user. Save the quadrant plot data (project + competitor coordinates) to `.cache/evidence/vibe-quadrant.md`.
+5. **Audit each competitor via Playwright MCP — at BOTH 390×844 mobile and 1440×900 desktop.** Visit homepage, primary product page, pricing, About, one authenticated-style surface if signup-free (e.g., docs, app tour). Per playbook §3 PLUS component-level extraction per §3bis below. Save to `.cache/evidence/<slug>/<viewport>/`.
+### §3bis. Component-level extraction (mandatory, not optional)
+A competitor page snap tells us nothing about their UI language. Extract the
+actual design vocabulary. Per competitor, per viewport, capture:
+| Artifact | How |
+|---|---|
+| `home.png`, `pricing.png`, etc. | Full-page screenshots as before |
+| `components/button_primary.png`, `button_secondary.png`, `button_ghost.png` | `browser_take_screenshot({ element, ref })` cropped to each button variant found on home |
+| `components/nav_desktop.png`, `nav_mobile.png` | Navbar/bottom-tab crops |
+| `components/card_feature.png`, `card_metric.png`, `card_testimonial.png` | Card variant crops |
+| `components/list_row.png` | If any list pattern exists, crop one row |
+| `components/input.png`, `input_focus.png` | Form field default + focused (click into it) |
+| `components/modal.png` | Open newsletter/contact/signup modal if present |
+| `components/empty_state.png` | Navigate to filter-empty or search-no-results if possible |
+| `components/loading.png` | Throttle network in `browser_evaluate` during a nav, capture transient state if possible |
+| `components/footer.png` | Footer crop |
+| `tokens.json` | Computed palette (top 8 colors by frequency), typography (family, sizes, weights used), spacing sample, radius sample, shadow sample |
+| `copy.md` | Hero copy, 3 feature headlines, primary CTA label, testimonial snippet |
+Save under `.cache/evidence/<slug>/<viewport>/components/`.
+For each competitor, produce `.cache/evidence/<slug>/component-catalog.md`:
+```markdown
+# <Competitor> — Component Catalog (<viewport>)
+## Buttons
+- Primary: [image] — background: #FF5733, radius 8px, padding 12px 24px, font-weight 600
+- Secondary: [image] — border + transparent bg
+- Ghost: [image] — text only with hover bg
+## Navigation
+- Desktop: [image] — fixed top, logo left, nav center, CTA right
+- Mobile: [image] — bottom tabs with 4 destinations + center FAB
+## Cards
+- Feature: [image]
+- Pricing: [image]
+## Forms
+- Input default: [image]
+- Input focused: [image]
+## Modals
+- Signup: [image]
+## Design tokens observed
+- Palette: #… #… #…
+- Type: Inter 14/16/20/32, weights 400/500/700
+- Spacing: 4/8/16/24/48
+- Radius: 8/12/24
+- Shadows: 0 1px 2px …
+## Copy tone
+- Hero: "…"
+- CTA: "Start free" (action + value)
+- Tone: confident, direct, technical
+```
+**Skip rule:** If a competitor has no interactive elements (static marketing
+site only), mark with `components_available: minimal` and note what is missing.
+Never fabricate.
+**Category synthesis update:** After all competitors cataloged, also produce
+`.cache/evidence/component-comparison.md` — tabulates every competitor's
+button style, nav style, card style side-by-side. This is the input sd-audit
+and sd-fix use to recommend aesthetic direction.
 6. **Classify each.** Archetype (§4.1), Aaker peak (§4.2), vibe class, NN/g 4D tone (§7.1), hero-pattern.
+   **6a. Voice/tone capture — 8–12 copy sample rule (mandatory).** For each competitor, collect **8–12 distinct copy samples** (≥8 minimum; fewer = insufficient signal), one per surface where available:
+   - Hero headline
+   - Primary CTA label
+   - Error message
+   - Empty state
+   - 404 page
+   - Onboarding step 1
+   - Pricing caption / plan blurb
+   - Footer blurb
+   - ToS / legal excerpt
+   - (Optional extras: subhead, feature card, support article opener, confirmation toast)
+   Grade **each sample** on the NN/g 4D tone dimensions (playbook §7.1) using integers in {−1, 0, +1}:
+   - formal ↔ casual
+   - funny ↔ serious
+   - respectful ↔ irreverent
+   - enthusiastic ↔ matter-of-fact
+   Report per-sample scores + verbatim quote + source URL in `.cache/evidence/<slug>/copy-samples.md`, and the **mean + variance** per axis in `market-analysis.md` (tone row per competitor). Healthy brands are constant on voice, variable on tone.
+   **Insufficient-signal rule.** If fewer than 8 distinct samples can be collected (static site, gated app, locale blockers), **do not compute a tone profile** — flag the competitor as `tone-inconclusive` in `market-analysis.md` with a note listing which surfaces were missing. Never fabricate samples or scores to reach the threshold.
 7. **Build category-code matrix.** Tabulate dimensions (§5.1). Frequency per column. Classify codes obey/extend/subvert/open (§5.2).
-8. **Synthesize.** Archetype in whitespace via Neumeier insertion test (§5.4). Palette, typography, tone, audience, JTBD. Draft onliness statement.
+8. **Synthesize.** Archetype in whitespace via Neumeier insertion test (§5.4). Palette, typography, tone, audience, JTBD.
+8b. **Three-territories pitch (Q7 — Part 7 of `docs/compass_artifact_wf-2e33af6e-127f-402e-8ce6-cb506fc91b94_text_markdown.md` lines 515–519, 652–653).** Before drafting the onliness statement, produce THREE parallel variants of the design direction — **safe** (conforms to category codes), **expected** (the obvious evolution of category codes), **edgy** (the considered provocation / subversion). Each variant MUST include: palette strip (3–6 tokens), type specimen (primary + optional display), motion character (duration + easing archetype), one-line rationale tying back to archetype + category-code matrix from step 7. Build them in parallel — never serialize, or you will anchor to the first. Save to `.cache/evidence/territories/{safe,expected,edgy}.md` and include a summary table in the brief. The user chooses the primary territory (optionally stealing one detail from another) BEFORE the onliness statement lands. "Presenting one direction looks like opinion; presenting three looks like strategy" (artifact line 519).
-9. **Write `market-analysis.md`** per playbook §8 schema.
+9. **Draft onliness statement** against the chosen territory, then **write `market-analysis.md`** per playbook §8 schema (include the three-territories summary + chosen primary).
 10. **Self-check.** Fix gaps before returning.