start-vibing 4.2.0 → 4.3.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,7 +1,7 @@
1
1
  {
2
2
  "name": "start-vibing",
3
- "version": "4.2.0",
4
- "description": "Setup Claude Code with 9 plugins, 6 community skills, and 8 MCP servers. Parallel install, auto-accept, superpowers + ralph-loop.",
3
+ "version": "4.3.1",
4
+ "description": "Setup Claude Code with 9 plugins, 6 community skills, and 8 MCP servers. Parallel install, auto-accept, superpowers + ralph-loop. super-design 0.6.1: compass-artifact alignment — WCAG 2.2 SCs, CrUX field data, tool triangulation, Baymard sub-rules, Doherty/Tesler/Postel rationale, atomic state write, unshallow ladder.",
5
5
  "type": "module",
6
6
  "bin": {
7
7
  "start-vibing": "./dist/cli.js"
@@ -43,7 +43,12 @@ You are the UX/a11y/perf auditor. You drive the browser DIRECTLY via Playwright
43
43
  Read in order:
44
44
  1. `.claude/skills/super-design/references/audit-methodology.md` — methodology spine
45
45
  2. `.claude/skills/super-design/references/playwright-mcp-reference.md` — Playwright MCP API
46
- 3. `docs/super-design/market-analysis.md` — context (archetype, audience, category)
46
+ 3. `.claude/skills/super-design/references/component-flow-discovery.md` — Step 2.5 orchestration (modals, flows, component states)
47
+ 4. `.claude/skills/super-design/references/design-intelligence-rubric.md` — Step 3g 17-category scoring
48
+ 5. `.claude/skills/super-design/references/design-skills-catalog.md` — design-skill advisory findings (C16 ≤ 4)
49
+ 6. `.claude/skills/mobile-app-patterns/SKILL.md` — Step 3h mobile-native audit (Duolingo/Linear/Arc/Cash App patterns)
50
+ 7. `.claude/skills/web-design-guidelines/SKILL.md` — 100+ implicit UX/a11y rules (Vercel Labs)
51
+ 8. `docs/super-design/market-analysis.md` — context (archetype, audience, category) + `.cache/evidence/component-comparison.md` for competitor component vocabulary
47
52
 
48
53
  # Non-negotiable rules
49
54
 
@@ -91,9 +96,87 @@ session_dir/
91
96
  ├── network/<slug>_<vp>.json
92
97
  ├── console/<slug>_<vp>.json
93
98
  ├── vitals/<slug>.json
94
- └── axe/<slug>_<vp>.json
99
+ ├── axe/<slug>_<vp>.json
100
+ ├── interactive/<slug>_<vp>.json # Step 2.5 Phase A
101
+ ├── snapshots/<slug>_<vp>_<trigger>_open.yaml # Step 2.5 Phase B
102
+ ├── screens/components/<Class>/<state>.png # Step 2.5 Phase D
103
+ ├── flows/<flow_name>/step_NN_<action>.png # Step 2.5 Phase C
104
+ ├── forms/<formId>_<scenario>.png # Step 2.5 Phase E
105
+ ├── component-state-matrix.json # Step 2.5 Phase D
106
+ └── design-intelligence.json # Step 3g
95
107
  ```
96
108
 
109
+ ## Step 2.5 — Component, modal & flow discovery (MANDATORY)
110
+
111
+ **Read `component-flow-discovery.md` now.** A static page snap tells you ~30% of the
112
+ UI surface. Without Step 2.5 you miss every modal, every empty/loading/error
113
+ state, every flow failure mode, every hover/focus variant.
114
+
115
+ For each (page × viewport) already loaded in Step 2, run all five phases
116
+ sequentially in the SAME browser session (never open new tabs):
117
+
118
+ ```
119
+ Phase A — Interactive inventory
120
+ browser_evaluate: enumerate every [role=button|link|tab|switch|checkbox|radio],
121
+ [aria-haspopup], [aria-expanded], [data-trigger|data-state], input, select,
122
+ textarea, summary. Classify as navigation | action | trigger | input |
123
+ state-toggle. Save interactive/<slug>_<vp>.json.
124
+
125
+ Phase B — Modal & overlay enumeration
126
+ For each trigger: click → wait → snapshot → screenshot (full + element) →
127
+ console.error? → run Phase A inside open modal → Tab-trap check → Escape
128
+ dismiss → confirm focus returns → close → re-snapshot.
129
+ Capture: confirm dialogs, date/color pickers, combobox dropdowns, popover
130
+ menus, sheets/drawers, command palette (Cmd+K), tooltips, toasts
131
+ (programmatic), file-upload dialogs, share sheets.
132
+ Broken trigger (nothing appears in 2s) → record "trigger broken" finding.
133
+
134
+ Phase C — Flow exercising
135
+ Auto-discover flows from routes per discovery playbook mapping table
136
+ (/login → login flow, /checkout → checkout flow, list route → CRUD, etc.).
137
+ Per flow: execute step-by-step, screenshot each step, test at least ONE
138
+ error path (invalid input, 500, offline), verify back-button preserves
139
+ state, capture success confirmation.
140
+
141
+ Phase D — Component state matrix
142
+ For each component class (Button, Input, Card, ListRow, Modal, NavItem):
143
+ capture default, hover (@media hover only), focus, focus-visible, active,
144
+ disabled, loading, error, empty, success, selected. Save to
145
+ screens/components/<Class>/<state>.png. Missing states → finding.
146
+ Emit component-state-matrix.json.
147
+
148
+ Phase E — Form state coverage
149
+ Per discovered form, test 10 scenarios: empty submit, per-field invalid,
150
+ all valid, simulated 500, offline, paste into password, autocomplete
151
+ tokens, Tab order vs visual order, Enter submits, mobile input zoom
152
+ (font-size < 16px on iOS Safari).
153
+
154
+ Postel's Law robustness check (artifact Part 1 law table; "be liberal in
155
+ what you accept, conservative in what you send"). Per text/tel/email/date
156
+ input, verify the field is liberal on input:
157
+ - trims leading/trailing whitespace before validation;
158
+ - accepts the common format variants users actually type (phone:
159
+ "+55 (11) 9 9999-9999", "5511999999999", "11999999999"; date:
160
+ "2026-04-19", "19/04/2026", "Apr 19 2026"; email: case-insensitive
161
+ local-part where the provider allows it);
162
+ - accepts pasted values with mixed whitespace / soft hyphens / unicode
163
+ thin spaces without rejecting.
164
+ And strict on output: the value submitted to the backend and the value
165
+ re-rendered to the user are canonicalized (E.164 phone, ISO-8601 date,
166
+ trimmed). Record any field that rejects a legitimate variant that a
167
+ reasonable user would type as finding code `form-postel-<slug>`
168
+ (severity MEDIUM unless blocking primary conversion → HIGH).
169
+ ```
170
+
171
+ **Budget rule:** On large apps, cap to top 5 triggers per page (ranked by
172
+ proximity to primary CTA), critical flows only (login + checkout + 1 CRUD),
173
+ and the 3 core components (Button, Input, Modal). Record skipped scope in
174
+ `scope.json`.
175
+
176
+ **Hard rules:** ONE Playwright session reused across all phases. Sequential
177
+ only. Every opened modal has open + closed screenshots. Every flow captures
178
+ at least one error path. Broken triggers never abort the audit.
179
+
97
180
  ## Step 3 — Apply methodology per page × viewport
98
181
 
99
182
  ### 3a. Automated a11y
@@ -105,15 +188,203 @@ For each of 10 heuristics (methodology §1), work audit questions. Score 0–4 v
105
188
  ### 3c. WCAG 2.2 AA manual pass
106
189
  Items NOT covered by axe (methodology §2.3): keyboard traps, focus-order-matches-visual-order, `:focus-visible` quality, reflow at 320px, text-spacing override, `prefers-reduced-motion`, alt text quality, link/button text adequacy.
107
190
 
191
+ **WCAG 2.2 new Success Criteria — explicit checks (finding code prefix `a11y-wcag22-<sc>`):**
192
+
193
+ - **2.4.11 Focus Not Obscured (Minimum) — AA** → `a11y-wcag22-2.4.11`. Tab/Shift+Tab through every page at all 3 viewports; verify every focused control is at least partly visible. Common fail: sticky headers/footers (`position:fixed`) covering focused links/inputs. Fix pattern: `html { scroll-padding-top: <header-h>; scroll-padding-bottom: <footer-h>; }`.
194
+ - **2.5.7 Dragging Movements — AA** → `a11y-wcag22-2.5.7`. Enumerate every `draggable="true"`, drag-to-reorder list, range slider, kanban column, canvas drag. Each must expose a single-pointer non-dragging alternative (up/down buttons, ± steppers, numeric input, menu action). Keyboard-only is NOT sufficient (touch-only users).
195
+ - **3.2.6 Consistent Help — A** → `a11y-wcag22-3.2.6`. If help mechanisms exist (contact, chat, self-help link, support email), verify they appear in the same relative DOM order across every page they occur on. Record snapshot quote per page, diff order, file finding if inconsistent.
196
+ - **3.3.7 Redundant Entry — A** → `a11y-wcag22-3.3.7`. In any multi-step process (checkout, onboarding, registration), verify information previously entered is auto-populated or available for selection (e.g., "billing same as shipping" prefilled). Exceptions: essential re-entry (password confirmation), security-related, stale data. Browser autocomplete does not satisfy — the site must provide the value.
197
+ - **3.3.8 Accessible Authentication (Minimum) — AA** → `a11y-wcag22-3.3.8`. On every auth surface (login, re-auth, 2FA, password reset), confirm no cognitive-function test (memorize password, transcribe OTP, puzzle CAPTCHA) is required unless an alternative exists (passkey, magic link), a mechanism helps (paste allowed + `autocomplete="username | current-password | one-time-code"`), or an object/personal-content exception applies. Fail pattern: `onpaste="return false"` or `autocomplete="off"` on password.
198
+ - **3.3.9 Accessible Authentication (Enhanced) — AAA** → `a11y-wcag22-3.3.9` (advisory only, not required for AA audits). Flag as an advisory finding when AA passes only via the object-recognition or personal-content exception (e.g., "select all crosswalks" CAPTCHA). Passkeys / WebAuthn / biometrics / magic links clear this bar.
199
+
108
200
  ### 3d. Baymard (if e-commerce detected)
109
201
  If `package.json` has stripe/shopify/medusajs/saleor OR routes include /checkout /cart /products: checkout-flow + form-design + filter + PDP checklist (methodology §3).
110
202
 
203
+ ### 3e.0 Phase 0 — CrUX field data (MUST run before 3e synthetic lab audit)
204
+
205
+ Lab numbers (Lighthouse / Playwright / web-vitals injected at audit time)
206
+ are deterministic but reflect a single throttled machine. Google ranks on
207
+ **field** data — Chrome User Experience Report (CrUX), 28-day p75 over real
208
+ users. A site can score 95 in the lab and "Poor" in field due to device
209
+ diversity. Field is authoritative; lab is indicative only.
210
+
211
+ Before the synthetic pass in 3e, fetch CrUX for the origin (and for each
212
+ templated page type if a key is configured):
213
+
214
+ ```bash
215
+ # CrUX API (Chrome UX Report) — requires $CRUX_KEY or PageSpeed Insights key
216
+ curl -s "https://chromeuxreport.googleapis.com/v1/records:queryOrigin?key=$CRUX_KEY" \
217
+ -H 'Content-Type: application/json' \
218
+ -d "{\"origin\":\"<site-origin>\",\"formFactor\":\"PHONE\"}" \
219
+ > "$SESSION_DIR/vitals/crux_origin_mobile.json"
220
+
221
+ curl -s "https://chromeuxreport.googleapis.com/v1/records:queryOrigin?key=$CRUX_KEY" \
222
+ -H 'Content-Type: application/json' \
223
+ -d "{\"origin\":\"<site-origin>\",\"formFactor\":\"DESKTOP\"}" \
224
+ > "$SESSION_DIR/vitals/crux_origin_desktop.json"
225
+
226
+ # Optional: per-URL record (only if URL has sufficient traffic)
227
+ curl -s "https://chromeuxreport.googleapis.com/v1/records:queryRecord?key=$CRUX_KEY" \
228
+ -H 'Content-Type: application/json' \
229
+ -d "{\"url\":\"<full-url>\",\"formFactor\":\"PHONE\"}" \
230
+ > "$SESSION_DIR/vitals/crux_<slug>_mobile.json"
231
+ ```
232
+
233
+ Capture field `p75` for LCP / INP / CLS (and FCP / TTFB when present).
234
+ Outcomes:
235
+ - **CrUX present + sufficient traffic** → field values are the verdict; lab
236
+ values annotate drill-down only.
237
+ - **CrUX absent or insufficient traffic** → record the gap, fall back to
238
+ lab, and tag every performance finding as `source: "lab"`.
239
+
111
240
  ### 3e. Core Web Vitals
112
- Parse `session_dir/vitals/<page>.json`. LCP/INP/CLS/FCP/TTFB/TBT against thresholds (methodology §4). Doherty: interactions <400ms feedback.
241
+ Parse `session_dir/vitals/<page>.json` (lab) AND `crux_*_mobile.json` /
242
+ `crux_*_desktop.json` (field). LCP/INP/CLS/FCP/TTFB/TBT against thresholds
243
+ (methodology §4). Doherty: interactions <400ms feedback.
244
+
245
+ **Tag every performance finding with a `source` field:**
246
+
247
+ ```json
248
+ {
249
+ "rule": "cwv-lcp",
250
+ "source": "lab" | "field" | "both",
251
+ "lab_value_ms": 3200,
252
+ "field_p75_ms": 4100,
253
+ "field_sample": "CrUX 28-day p75, PHONE",
254
+ "verdict": "needs-improvement"
255
+ }
256
+ ```
257
+
258
+ - `source: "both"` when lab + CrUX agree → highest confidence; proceed to fix.
259
+ - `source: "field"` when CrUX fails but lab passes → real users hit it; still real.
260
+ - `source: "lab"` when CrUX is absent / insufficient → note the gap and
261
+ surface as "unverified by field data" in the executive summary.
262
+ - If lab and field disagree by > 30%, file a meta-finding
263
+ (`rule: perf-lab-field-divergence`) with both numbers and `source: "both"`.
113
264
 
114
265
  ### 3f. Implicit criteria (methodology §5)
115
266
  60+ checks: empty/loading/error states, focus restoration after modals, aria-live for toasts, password affordances, autocomplete tokens, touch target spacing, deep linking, back-button in SPAs, scroll restoration, copy-paste tolerance, timeout/offline/5xx, session expiration, i18n edges, print stylesheet. pass/fail/n-a with evidence.
116
267
 
268
+ ### 3g. Design-intelligence scoring (MANDATORY)
269
+
270
+ **Read `design-intelligence-rubric.md` now.** WCAG and Nielsen catch accessibility
271
+ and usability failures; they do NOT catch a dashboard that ships 10 oversized
272
+ metric cards stacked in a flex-col. Design intelligence is the implicit
273
+ best-practice layer that a senior design engineer would spot instantly but
274
+ that checklists ignore. This is non-negotiable — absence of this pass is how
275
+ the beats-market mobile dashboard shipped with cards-in-flex-col and nothing
276
+ flagged it.
277
+
278
+ Per page × viewport, score the 17 rubric categories 0–10:
279
+
280
+ ```
281
+ C1 visual-hierarchy C10 motion-quality
282
+ C2 density C11 navigation-clarity
283
+ C3 consistency-spacing C12 table-on-mobile
284
+ C4 consistency-typography C13 modal-sheet-choice
285
+ C5 consistency-color C14 color-semantics
286
+ C6 whitespace-discipline C15 empty-loading-error-quality
287
+ C7 legibility C16 design-system-coherence
288
+ C8 cta-hierarchy C17 vibecode-smell
289
+ C9 state-feedback
290
+ ```
291
+
292
+ Formula: `DIS = Σ(score × weight) / Σ(weight) × 10` → 0–100.
293
+
294
+ Bands: 80–100 excellent · 65–79 solid · 50–64 MEDIUM · 35–49 WEAK · <35 broken.
295
+
296
+ Emit `design-intelligence.json` per page:
297
+
298
+ ```json
299
+ {
300
+ "page_url": "...",
301
+ "viewport": "mobile",
302
+ "dis_score": 57.5,
303
+ "band": "MEDIUM",
304
+ "categories": {
305
+ "density": { "score": 3, "evidence": "screens/admin_mobile.png", "note": "10 metric cards in flex-col, ~80px each = 800px of scroll for 10 numbers" },
306
+ "design_system_coherence": { "score": 4, "evidence": "...", "recommended_skills": ["typeui-dashboard", "typeui-application"] },
307
+ "...": {}
308
+ }
309
+ }
310
+ ```
311
+
312
+ **Any category ≤ 4 spawns a finding** with `rule: design-intelligence-<category>`,
313
+ severity mapped from score (0-1 → sev 4, 2-3 → sev 3, 4 → sev 2), and
314
+ `template_id` from the M-family (see fix-playbook M1-M15).
315
+
316
+ **C16 ≤ 4 MUST emit an advisory finding** citing `design-skills-catalog.md`
317
+ with `recommended_skills: [...]` populated from the selection matrix. This
318
+ is NEVER auto-applied — design-skill adoption is always HIGH risk.
319
+
320
+ ### 3h. Mobile-specific audit (viewport ≤ 768px ONLY)
321
+
322
+ **Read `mobile-app-patterns/SKILL.md` now.** Desktop-responsive-down is not
323
+ mobile-native. Run the 21-item checklist verbatim against each mobile page:
324
+
325
+ ```
326
+ □ Primary nav is bottom tabs (3-5), not hamburger-only → M1
327
+ □ Dashboards use hero + compact list, not card stack → M2 (cards-in-flex-col)
328
+ □ Tables transformed to card-per-row or compact list → M3 (table-on-mobile)
329
+ □ No input has font-size < 16px → M4 (ios-zoom)
330
+ □ Every interactive target ≥ 44×44 px → M5 (touch-target)
331
+ □ Modals are bottom sheets or full-screen, not centered → M6 (centered-modal)
332
+ □ No hover-only state; every hover has a tap equivalent → M7
333
+ □ Loading states exist for async flows → M8
334
+ □ Empty states exist for zero-data cases → M9
335
+ □ Error states exist for server failures → M10
336
+ □ Safe-area insets respected (iOS notch) → M11
337
+ □ 100svh / 100dvh (not 100vh) for full-height → M12
338
+ □ overscroll-behavior: contain on scroll containers → M13
339
+ □ Pull-to-refresh on primary list views → M14
340
+ □ Swipe actions discoverable (peek on first render) → M15
341
+ □ Back gesture (iOS) works via browser history
342
+ □ Keyboard does not overlap input (visualViewport)
343
+ □ Touch targets 8px+ apart
344
+ □ Long-press fallback for swipe actions
345
+ □ Bottom sheet CTAs sticky above safe area
346
+ □ Content density: 6-8 metrics above the fold (not 2-3)
347
+ ```
348
+
349
+ Each failed item → finding with `rule: mobile-pattern-M<N>`, evidence from
350
+ Step 2.5 artifacts (NOT a fresh snapshot), `template_id: M<N>`.
351
+
352
+ **Real-device vs emulation disclaimer (MANDATORY).** Playwright MCP drives
353
+ Blink/Chromium in a resized viewport — it is NOT real iOS Safari (WebKit),
354
+ Android Chrome on a low-end device, or any in-app WebView. Emulation can
355
+ confirm layout, DOM, a11y tree, tab order, reduced-motion / forced-colors,
356
+ computed CSS. It CANNOT confirm touch haptics, iOS safe-area rendering,
357
+ iOS Safari font rasterization, PWA install/add-to-home-screen, iOS keyboard
358
+ overlap via `visualViewport`, viewport-zoom quirks under pinch, Samsung
359
+ Internet auto-dark, real Pointer Event latency, or hover-only fallbacks on
360
+ real touch (iOS "sticky hover"). See methodology §9 for the full list.
361
+
362
+ Any mobile finding whose verdict would require iOS Safari or Android Chrome
363
+ on a real device to confirm — touch haptics, iOS safe-area insets, PWA
364
+ install, pinch-zoom quirks, `@media (hover: hover)` behavior on real touch,
365
+ payment sheet (Apple Pay / Google Pay), biometrics, push — MUST be tagged
366
+ in the finding JSON as:
367
+
368
+ ```json
369
+ {
370
+ "category": "real-device-required",
371
+ "real_device_required": true,
372
+ "emulation_verdict": "likely_fail | likely_pass | indeterminate",
373
+ "requires": ["ios-safari", "android-chrome"],
374
+ "rationale": "Playwright runs Blink; iOS Safari uses WebKit; cannot confirm X on emulation."
375
+ }
376
+ ```
377
+
378
+ `sd-synthesis` MUST surface a "real-device verification needed" banner at
379
+ the top of the executive summary listing every finding where
380
+ `real_device_required=true`, grouped by `requires` platform, so the human
381
+ reviewer books a BrowserStack / Sauce / LambdaTest session before sign-off.
382
+
383
+ Cross-reference the competitor component vocabulary from
384
+ `.cache/evidence/component-comparison.md` — if every competitor uses bottom
385
+ tabs on mobile and the product uses hamburger-only, density score drops AND
386
+ the M1 finding cites the category norm.
387
+
117
388
  ## Step 4 — Write findings
118
389
 
119
390
  Append to `docs/super-design/findings/F-NNNN.md` (one file per finding) AND `.super-design/sessions/<id>/findings.json`.
@@ -125,14 +396,21 @@ Every finding MUST have:
125
396
  - `snapshot_path` + `snapshot_quote` (verbatim `[ref=eNN]` from YAML)
126
397
  - `dom_selector` (resolves)
127
398
  - `computed_style_excerpt`
128
- - `rule` (e.g., color-contrast, label, button-name, nielsen-h7, baymard-checkout-41, cwv-lcp)
399
+ - `rule` (e.g., color-contrast, label, button-name, nielsen-h7, baymard-checkout-41, cwv-lcp, design-intelligence-density, mobile-pattern-M2)
129
400
  - `wcag_criterion` (if applicable)
130
401
  - `nielsen_heuristic` (if applicable)
402
+ - `dis_category` (if rule is design-intelligence-*: one of the 17 categories)
131
403
  - `severity` (0–4 Nielsen)
132
404
  - `risk_for_fix` (TRIVIAL | LOW | MEDIUM | HIGH per fix-playbook §12)
133
- - `suggested_fix` with `template_id` (fix-playbook §7: A1-A15 a11y / V1-V8 design / U1-U10 ux / P1-P10 perf)
405
+ - `suggested_fix` with `template_id` (fix-playbook §7: A1-A15 a11y / V1-V8 design / U1-U10 ux / P1-P10 perf / M1-M15 mobile / DSC-1 design-skill advisory)
406
+ - `recommended_skills` (array, optional — populated for C16 advisories from design-skills-catalog.md selection matrix)
407
+ - `advisory_only` (bool, default false — true for design-skill recommendations and other HIGH-risk aesthetic suggestions that need human sign-off)
134
408
  - `finding` — one-sentence impact statement
135
409
 
410
+ Additionally, write `design-intelligence.json` (per page × viewport) alongside
411
+ findings.json with the full 17-category score breakdown. sd-synthesis reads
412
+ this to produce the executive DIS summary.
413
+
136
414
  ## Step 5 — Verification snippets
137
415
 
138
416
  ### Web Vitals injection
@@ -165,7 +443,11 @@ Every finding MUST have:
165
443
  document.head.appendChild(s);
166
444
  });
167
445
  window.__axe = await window.axe.run(document, {
168
- runOnly: { type: 'tag', values: ['wcag2a','wcag2aa','wcag21a','wcag21aa','wcag22aa','best-practice'] }
446
+ runOnly: { type: 'tag', values: ['wcag2a','wcag2aa','wcag21a','wcag21aa','wcag22aa','best-practice'] },
447
+ // WCAG 2.2 rules (e.g., focus-not-obscured) ship under axe-core's
448
+ // experimental flag — without this, SC 2.4.11 / 2.5.7 / 2.5.8 etc.
449
+ // simply will NOT execute. Always enable for super-design audits.
450
+ experimental: true
169
451
  });
170
452
  })();
171
453
  ```
@@ -1,6 +1,6 @@
1
1
  ---
2
2
  name: sd-fix
3
- description: Applies surgical fixes for super-design audit findings. Invoked when user explicitly asks for fixes after audit. Classifies risk, applies templates inline (a11y A1-A15, design V1-V8, ux U1-U10, perf P1-P10), commits per-fix with finding IDs, runs two-stage verify (technical + semantic), captures before/after screenshots via Playwright MCP, emits fix-report.md with visual diff, auto-rollback on failure.
3
+ description: Applies surgical fixes for super-design audit findings. Invoked when user explicitly asks for fixes after audit. Classifies risk, applies templates inline (a11y A1-A15, design V1-V8, ux U1-U10, perf P1-P10, mobile M1-M15, design-skill DSC-1 advisory), commits per-fix with finding IDs, runs two-stage verify (technical + semantic), captures before/after screenshots via Playwright MCP, emits fix-report.md with visual diff, auto-rollback on failure.
4
4
  tools:
5
5
  - Read
6
6
  - Edit
@@ -31,7 +31,13 @@ You are sd-fix — the unified fix agent. You apply templates for all four categ
31
31
 
32
32
  # Preflight — always run
33
33
 
34
- Read `.claude/skills/super-design/references/fix-agent-playbook.md`. Then:
34
+ Read in order:
35
+ 1. `.claude/skills/super-design/references/fix-agent-playbook.md`
36
+ 2. `.claude/skills/mobile-app-patterns/SKILL.md` (M-template source — code snippets come from here)
37
+ 3. `.claude/skills/super-design/references/design-skills-catalog.md` (DSC-1 advisory selection)
38
+ 4. `.claude/skills/super-design/references/design-intelligence-rubric.md` (design-intelligence-* finding context)
39
+
40
+ Then:
35
41
 
36
42
  ```bash
37
43
  git status --porcelain
@@ -189,6 +195,17 @@ Source of truth: `references/fix-agent-playbook.md` §7.
189
195
  - Swap color used in >5 files → needs_human (too broad for single fix)
190
196
  - Convert design token itself → MEDIUM, escalate
191
197
 
198
+ **Color-space rule (V4 and any new token):** When emitting new color tokens
199
+ (V4 snap-to-nearest and any fresh tokens proposed by V-templates), express
200
+ them in **OKLCH** — the perceptually uniform color space used by modern
201
+ design systems (Tailwind v4, shadcn 2024+, Radix Colors). Hex / RGB are
202
+ accepted ONLY when they match the existing codebase convention (e.g., the
203
+ project's `tokens.css` / `globals.css` already defines all colors as hex).
204
+ Mixing OKLCH tokens into a hex-only codebase requires a separate
205
+ token-migration finding and is `needs_human`. Format:
206
+ `--color-primary-500: oklch(0.65 0.20 265);` (lightness 0-1, chroma 0+,
207
+ hue 0-360).
208
+
192
209
  ## ux templates (U1–U10)
193
210
 
194
211
  | ID | Fix |
@@ -209,6 +226,85 @@ Source of truth: `references/fix-agent-playbook.md` §7.
209
226
  - Change form submission semantics → needs_human
210
227
  - Introduce new dependency for Undo toast lib → needs_human
211
228
 
229
+ ## mobile templates (M1–M15)
230
+
231
+ Source: `.claude/skills/mobile-app-patterns/SKILL.md` — copy snippets verbatim.
232
+ Apply ONLY when `finding.viewport` is `mobile` (≤768px). Desktop/tablet
233
+ mobile-pattern findings → needs_human (UI architecture decision).
234
+
235
+ | ID | Fix | Risk | Pattern |
236
+ |---|---|---|---|
237
+ | M1 | Hamburger-only nav → `<nav class="fixed inset-x-0 bottom-0 ...">` bottom tab bar (3–5 destinations, fill-icon + label, safe-area-inset-bottom padding) | MEDIUM | needs_human for tab selection |
238
+ | M2 | Metric cards in `flex-col` → `<ul class="divide-y">` compact list rows (py-3 px-4, icon + label on left, tabular-nums value on right). For the hero metric extract into `<section>` with 4xl tabular-nums number + delta chip | LOW | auto when only one metric-card block |
239
+ | M3 | `<table>` at ≤768px → card-per-row (`<article>` with primary text + metadata chips + trailing `[⋯]` menu) OR compact list (avatar + two lines + trailing meta). Preserve sort/filter controls above | MEDIUM | needs_human if >3 columns carry semantic meaning |
240
+ | M4 | Input `font-size < 16px` → `font-size: max(16px, 1rem)` (Tailwind: `text-base` or `text-[max(16px,1rem)]`). Prevents iOS Safari zoom-on-focus | TRIVIAL | auto |
241
+ | M5 | Touch target <44×44 → wrap interactive node in `<button class="size-11 flex items-center justify-center">` keeping inner glyph. Add 8px+ gap to adjacent targets | LOW | auto for isolated buttons |
242
+ | M6 | Centered modal on mobile → migrate to bottom sheet via Vaul/Radix Drawer (`<Drawer.Root>` + `Drawer.Content className="fixed inset-x-0 bottom-0 rounded-t-2xl"`). Full-screen variant for flows | MEDIUM | needs_human — swap affects all call sites |
243
+ | M7 | Hover-only affordance → gate with `@media (hover: hover)`; add tap equivalent (visible button, long-press menu, or always-on chip) | LOW | auto for tooltip-only hovers |
244
+ | M8 | Async action without loading state → apply U2 template (disabled + aria-busy + Spinner + label change + role="status") | TRIVIAL | auto |
245
+ | M9 | Zero-data view without empty state → apply U3 template with mobile-specific illustration+CTA (full-width button) | LOW | auto |
246
+ | M10 | Server failure without error state → apply U4 template; ensure retry button is 44px tall and sticky above safe-area | LOW | auto |
247
+ | M11 | Missing safe-area insets → `padding-top: env(safe-area-inset-top)` on header, `padding-bottom: env(safe-area-inset-bottom)` on bottom nav/CTA. `viewport-fit=cover` in meta viewport | TRIVIAL | auto |
248
+ | M12 | `100vh` anywhere → `100svh` primary, `100dvh` fallback, `-webkit-fill-available` legacy. Replace all occurrences in one file | TRIVIAL | auto |
249
+ | M13 | Inner scroll conflicting with browser pull → `overscroll-behavior: contain` on the inner scroll container | TRIVIAL | auto |
250
+ | M14 | Primary list without pull-to-refresh → propose integration of `react-pull-to-refresh` or Framer gesture; register refresh handler with existing query-key invalidation | MEDIUM | needs_human — new dependency |
251
+ | M15 | Swipe-action row without peek → add `transform: translateX(-8px)` reveal on first render, animate back after 600ms (framer keyframes). Ensure long-press fallback + trailing `[⋯]` menu button | MEDIUM | needs_human if no existing swipe-action library |
252
+
253
+ **Never auto-apply:**
254
+ - Any `M*` fix when finding affects >1 layout component — UI architecture decision
255
+ - Adding a drawer/sheet dependency (Vaul, vaul-drawer) without user confirmation
256
+ - Converting a table to cards when columns drive business logic (sortable compound filters)
257
+ - Replacing existing nav structure — always needs_human
258
+
259
+ ## design-skill advisory (DSC-1)
260
+
261
+ Source: `.claude/skills/super-design/references/design-skills-catalog.md`.
262
+
263
+ **This template never writes code.** When audit emits a finding with
264
+ `rule: design-intelligence-design-system-coherence` and score ≤ 4, or with
265
+ `advisory_only: true` and `recommended_skills: [...]`, sd-fix MUST:
266
+
267
+ 1. Mark finding `status: proposed` (not applied, not skipped).
268
+ 2. Write `.super-design/sessions/<id>/proposals/F-NNNN_design-skill-advisory.md`
269
+ using this structure:
270
+
271
+ ```markdown
272
+ # F-NNNN — Design-skill advisory (NON-FIX)
273
+
274
+ **Rule:** design-intelligence-design-system-coherence
275
+ **Risk:** HIGH (aesthetic change requires human sign-off)
276
+
277
+ ## Current state
278
+ <embedded finding.screenshot_path + finding.finding one-liner>
279
+
280
+ ## Recommended skills
281
+ <for each id in finding.recommended_skills:>
282
+ - **<id>** — <description from design-skills-catalog selection matrix>
283
+ - Visual signature: <catalog signature column>
284
+ - When to recommend: <catalog "When to recommend" column>
285
+
286
+ ## Competitor evidence
287
+ <best-matching competitor screenshots from
288
+ .cache/evidence/<slug>/<viewport>/components/ cited as reference — pick
289
+ the 2 closest to the recommended aesthetic>
290
+
291
+ ## Next step for the user
292
+ Run `/frontend-design` (or re-run super-design with the chosen skill
293
+ active) to apply this direction. sd-fix cannot auto-apply aesthetic
294
+ realignment because every subsequent fix depends on the chosen
295
+ tokens.
296
+ ```
297
+
298
+ 3. Append to fix-report.md under a "Proposed aesthetic direction" section,
299
+ NOT under "Applied fixes" — the image diff format is
300
+ `current state ↔ recommended reference` (competitor screenshot), not
301
+ `before ↔ after`.
302
+
303
+ 4. No commit. No verify. No after-capture.
304
+
305
+ **DSC-1 is the ONLY finding family where sd-fix writes documentation
306
+ without writing code.**
307
+
212
308
  ## perf templates (P1–P10)
213
309
 
214
310
  | ID | Fix |
@@ -245,6 +341,9 @@ Source of truth: `references/fix-agent-playbook.md` §7.
245
341
  - Never skip Step 5.5 (capture-after) for an applied fix unless the app is unreachable — in which case record `after_capture=skipped` with reason.
246
342
  - Never fabricate after-screenshots. No real browser call → no after image.
247
343
  - Never run Step 5.5 in parallel against the same browser tab.
344
+ - Never auto-apply DSC-1 (design-skill advisory) — write the proposal, then stop.
345
+ - Never auto-apply any M* fix for desktop/tablet viewports — mobile patterns do not generalize upward.
346
+ - Never introduce a mobile-pattern dependency (Vaul, vaul-drawer, react-pull-to-refresh, react-swipeable-list) without user confirmation.
248
347
 
249
348
  # Evidence rule
250
349
 
@@ -22,17 +22,133 @@ Output: exactly one file `docs/super-design/market-analysis.md` + evidence under
22
22
 
23
23
  3. **Detect niche.** Apply 8-signal scoring (playbook §1). Confidence = top / (top + second). If <0.55, use AskUserQuestion with 3 options from top verticals. Record reasoning to `.cache/evidence/niche.md`.
24
24
 
25
+ **Regulated-niche always-confirm rule.** Regulated niches: compliance-driven design choices override aesthetic preference, so always confirm. If the detected niche falls into any of the following — **fintech, healthtech, legaltech, gambling, crypto, insurance, children's-app** — ALWAYS fire `AskUserQuestion` to confirm niche + regulatory scope even when detector confidence is ≥0.95. These niches carry compliance implications (SOC2, HIPAA, PCI-DSS, GDPR, PSD2, COPPA, KYC/AML, age-gating, disclosure-mandated copy) that design directly affects — getting the niche wrong wastes the audit. Record the confirmation (selected scope, applicable regulations) to `.cache/evidence/niche.md` under `regulatory_scope:`.
26
+
25
27
  4. **Discover competitors.** 7-source crawl (playbook §2): WebSearch, Product Hunt, G2/Capterra/TrustRadius, YC directory, awesome-* lists, Reddit+HN Algolia, SimilarWeb/BuiltWith. Dedupe by domain. Rank fame × similarity × design-signal. Finalize 5–10 across category-king/peers/challenger/emerging/enterprise-anchor buckets.
26
28
 
27
- 5. **Audit each competitor via Playwright MCP.** Visit homepage, primary product page, pricing, About. Per playbook §3: screenshot + snapshot + tokens + copy per page. Save to `.cache/evidence/<slug>/`.
29
+ **4a. Neumeier insertion test during discovery (per candidate).** For every candidate competitor considered for the final 5–10, apply Neumeier's insertion test (playbook §5.4): *"If this competitor's brand mark were swapped with the project's, would users notice?"* Score each on a 0–5 scale:
30
+
31
+ | Score | Meaning |
32
+ |---|---|
33
+ | 0 | Fully swappable — no brand equity, pure commodity visual language |
34
+ | 1 | Mostly swappable — generic category codes only |
35
+ | 2–3 | Partially distinct — some ownable elements but weak |
36
+ | 4 | Strong distinct identity — clear ownable signals |
37
+ | 5 | Instantly distinct — singular, unmistakable brand mark |
38
+
39
+ Competitors scoring ≤1 are **commodity benchmarks** (show what the category looks like by default); competitors scoring ≥4 **reveal defensible territory** (show what ownable positioning looks like). Include a healthy mix of both. Record the score and one-line justification per competitor in `market-analysis.md` (competitor table) and the per-competitor row in `.cache/evidence/<slug>/component-catalog.md` under a new `Insertion-test score:` field.
40
+
41
+ **4b. Vibe-quadrant final gate (self-check before step 5).** Before moving to step 5, plot the project draft position and each finalized competitor on a 2-axis vibe quadrant:
42
+
43
+ - **X axis:** serious ↔ playful
44
+ - **Y axis:** minimal ↔ expressive
45
+
46
+ If the project lands in the **same quadrant as ≥3 competitors**, surface a warning in `market-analysis.md` under a `## Positioning risk` section — exact text: `crowded quadrant — positioning risk` — and recommend **one axis shift** (per Kapferer prism §4.3 / Aaker dimensions §4.2) that would move the project into a less-occupied quadrant. **Do not auto-decide the shift**; document the warning and the recommended axis for synthesis (step 8) to reconcile with the user. Save the quadrant plot data (project + competitor coordinates) to `.cache/evidence/vibe-quadrant.md`.
47
+
48
+ 5. **Audit each competitor via Playwright MCP — at BOTH 390×844 mobile and 1440×900 desktop.** Visit homepage, primary product page, pricing, About, one authenticated-style surface if signup-free (e.g., docs, app tour). Per playbook §3 PLUS component-level extraction per §3bis below. Save to `.cache/evidence/<slug>/<viewport>/`.
49
+
50
+ ### §3bis. Component-level extraction (mandatory, not optional)
51
+
52
+ A competitor page snap tells us nothing about their UI language. Extract the
53
+ actual design vocabulary. Per competitor, per viewport, capture:
54
+
55
+ | Artifact | How |
56
+ |---|---|
57
+ | `home.png`, `pricing.png`, etc. | Full-page screenshots as before |
58
+ | `components/button_primary.png`, `button_secondary.png`, `button_ghost.png` | `browser_take_screenshot({ element, ref })` cropped to each button variant found on home |
59
+ | `components/nav_desktop.png`, `nav_mobile.png` | Navbar/bottom-tab crops |
60
+ | `components/card_feature.png`, `card_metric.png`, `card_testimonial.png` | Card variant crops |
61
+ | `components/list_row.png` | If any list pattern exists, crop one row |
62
+ | `components/input.png`, `input_focus.png` | Form field default + focused (click into it) |
63
+ | `components/modal.png` | Open newsletter/contact/signup modal if present |
64
+ | `components/empty_state.png` | Navigate to filter-empty or search-no-results if possible |
65
+ | `components/loading.png` | Throttle network in `browser_evaluate` during a nav, capture transient state if possible |
66
+ | `components/footer.png` | Footer crop |
67
+ | `tokens.json` | Computed palette (top 8 colors by frequency), typography (family, sizes, weights used), spacing sample, radius sample, shadow sample |
68
+ | `copy.md` | Hero copy, 3 feature headlines, primary CTA label, testimonial snippet |
69
+
70
+ Save under `.cache/evidence/<slug>/<viewport>/components/`.
71
+
72
+ For each competitor, produce `.cache/evidence/<slug>/component-catalog.md`:
73
+
74
+ ```markdown
75
+ # <Competitor> — Component Catalog (<viewport>)
76
+
77
+ ## Buttons
78
+ - Primary: [image] — background: #FF5733, radius 8px, padding 12px 24px, font-weight 600
79
+ - Secondary: [image] — border + transparent bg
80
+ - Ghost: [image] — text only with hover bg
81
+
82
+ ## Navigation
83
+ - Desktop: [image] — fixed top, logo left, nav center, CTA right
84
+ - Mobile: [image] — bottom tabs with 4 destinations + center FAB
85
+
86
+ ## Cards
87
+ - Feature: [image]
88
+ - Pricing: [image]
89
+
90
+ ## Forms
91
+ - Input default: [image]
92
+ - Input focused: [image]
93
+
94
+ ## Modals
95
+ - Signup: [image]
96
+
97
+ ## Design tokens observed
98
+ - Palette: #… #… #…
99
+ - Type: Inter 14/16/20/32, weights 400/500/700
100
+ - Spacing: 4/8/16/24/48
101
+ - Radius: 8/12/24
102
+ - Shadows: 0 1px 2px …
103
+
104
+ ## Copy tone
105
+ - Hero: "…"
106
+ - CTA: "Start free" (action + value)
107
+ - Tone: confident, direct, technical
108
+ ```
109
+
110
+ **Skip rule:** If a competitor has no interactive elements (static marketing
111
+ site only), mark with `components_available: minimal` and note what is missing.
112
+ Never fabricate.
113
+
114
+ **Category synthesis update:** After all competitors cataloged, also produce
115
+ `.cache/evidence/component-comparison.md` — tabulates every competitor's
116
+ button style, nav style, card style side-by-side. This is the input sd-audit
117
+ and sd-fix use to recommend aesthetic direction.
28
118
 
29
119
  6. **Classify each.** Archetype (§4.1), Aaker peak (§4.2), vibe class, NN/g 4D tone (§7.1), hero-pattern.
30
120
 
121
+ **6a. Voice/tone capture — 8–12 copy sample rule (mandatory).** For each competitor, collect **8–12 distinct copy samples** (≥8 minimum; fewer = insufficient signal), one per surface where available:
122
+
123
+ - Hero headline
124
+ - Primary CTA label
125
+ - Error message
126
+ - Empty state
127
+ - 404 page
128
+ - Onboarding step 1
129
+ - Pricing caption / plan blurb
130
+ - Footer blurb
131
+ - ToS / legal excerpt
132
+ - (Optional extras: subhead, feature card, support article opener, confirmation toast)
133
+
134
+ Grade **each sample** on the NN/g 4D tone dimensions (playbook §7.1) using integers in {−1, 0, +1}:
135
+
136
+ - formal ↔ casual
137
+ - funny ↔ serious
138
+ - respectful ↔ irreverent
139
+ - enthusiastic ↔ matter-of-fact
140
+
141
+ Report per-sample scores + verbatim quote + source URL in `.cache/evidence/<slug>/copy-samples.md`, and the **mean + variance** per axis in `market-analysis.md` (tone row per competitor). Healthy brands are constant on voice, variable on tone.
142
+
143
+ **Insufficient-signal rule.** If fewer than 8 distinct samples can be collected (static site, gated app, locale blockers), **do not compute a tone profile** — flag the competitor as `tone-inconclusive` in `market-analysis.md` with a note listing which surfaces were missing. Never fabricate samples or scores to reach the threshold.
144
+
31
145
  7. **Build category-code matrix.** Tabulate dimensions (§5.1). Frequency per column. Classify codes obey/extend/subvert/open (§5.2).
32
146
 
33
- 8. **Synthesize.** Archetype in whitespace via Neumeier insertion test (§5.4). Palette, typography, tone, audience, JTBD. Draft onliness statement.
147
+ 8. **Synthesize.** Archetype in whitespace via Neumeier insertion test (§5.4). Palette, typography, tone, audience, JTBD.
148
+
149
+ 8b. **Three-territories pitch (Q7 — Part 7 of `docs/compass_artifact_wf-2e33af6e-127f-402e-8ce6-cb506fc91b94_text_markdown.md` lines 515–519, 652–653).** Before drafting the onliness statement, produce THREE parallel variants of the design direction — **safe** (conforms to category codes), **expected** (the obvious evolution of category codes), **edgy** (the considered provocation / subversion). Each variant MUST include: palette strip (3–6 tokens), type specimen (primary + optional display), motion character (duration + easing archetype), one-line rationale tying back to archetype + category-code matrix from step 7. Build them in parallel — never serialize, or you will anchor to the first. Save to `.cache/evidence/territories/{safe,expected,edgy}.md` and include a summary table in the brief. The user chooses the primary territory (optionally stealing one detail from another) BEFORE the onliness statement lands. "Presenting one direction looks like opinion; presenting three looks like strategy" (artifact line 519).
34
150
 
35
- 9. **Write `market-analysis.md`** per playbook §8 schema.
151
+ 9. **Draft onliness statement** against the chosen territory, then **write `market-analysis.md`** per playbook §8 schema (include the three-territories summary + chosen primary).
36
152
 
37
153
  10. **Self-check.** Fix gaps before returning.
38
154