start-vibing 4.3.0 → 4.3.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,7 +1,7 @@
  {
  "name": "start-vibing",
- "version": "4.3.0",
- "description": "Setup Claude Code with 9 plugins, 6 community skills, and 8 MCP servers. Parallel install, auto-accept, superpowers + ralph-loop. super-design 0.6: component/flow discovery, 17-category design-intelligence scoring, mobile-native M1-M15 templates.",
+ "version": "4.3.1",
+ "description": "Setup Claude Code with 9 plugins, 6 community skills, and 8 MCP servers. Parallel install, auto-accept, superpowers + ralph-loop. super-design 0.6.1: compass-artifact alignment — WCAG 2.2 SCs, CrUX field data, tool triangulation, Baymard sub-rules, Doherty/Tesler/Postel rationale, atomic state write, unshallow ladder.",
  "type": "module",
  "bin": {
  "start-vibing": "./dist/cli.js"
@@ -150,6 +150,22 @@ Phase E — Form state coverage
  all valid, simulated 500, offline, paste into password, autocomplete
  tokens, Tab order vs visual order, Enter submits, mobile input zoom
  (font-size < 16px on iOS Safari).
+
+ Postel's Law robustness check (artifact Part 1 law table; "be liberal in
+ what you accept, conservative in what you send"). Per text/tel/email/date
+ input, verify the field is liberal on input:
+ - trims leading/trailing whitespace before validation;
+ - accepts the common format variants users actually type (phone:
+ "+55 (11) 9 9999-9999", "5511999999999", "11999999999"; date:
+ "2026-04-19", "19/04/2026", "Apr 19 2026"; email: case-insensitive
+ local-part where the provider allows it);
+ - accepts pasted values with mixed whitespace / soft hyphens / unicode
+ thin spaces without rejecting.
+ And strict on output: the value submitted to the backend and the value
+ re-rendered to the user are canonicalized (E.164 phone, ISO-8601 date,
+ trimmed). Record any field that rejects a legitimate variant a
+ reasonable user would type under finding code `form-postel-<slug>`
+ (severity MEDIUM unless it blocks the primary conversion → HIGH).
  ```
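The liberal-in / strict-out rule above can be sketched as a pair of canonicalizers. This is a minimal illustration, not the package's code: the helper names, the Brazilian country-code default, and the variant coverage are all assumptions matching the examples listed.

```javascript
// Liberal on input, strict on output (Postel). Illustrative helpers only.
function canonicalizePhone(raw, countryCode = "55") {
  // Liberal: strip whitespace (incl. unicode thin space), soft hyphens,
  // parens, dots, dashes, and a leading + before validating.
  const digits = raw.replace(/[\s\u2009\u00AD().\-+]/g, "");
  // Strict: emit E.164, prepending the country code when the user omitted it.
  const national = digits.startsWith(countryCode)
    ? digits.slice(countryCode.length)
    : digits;
  return `+${countryCode}${national}`;
}

function canonicalizeDate(raw) {
  const s = raw.trim();
  if (/^\d{4}-\d{2}-\d{2}$/.test(s)) return s;        // already ISO-8601
  const dmy = s.match(/^(\d{2})\/(\d{2})\/(\d{4})$/);  // "19/04/2026"
  if (dmy) return `${dmy[3]}-${dmy[2]}-${dmy[1]}`;
  const d = new Date(s);                               // "Apr 19 2026" etc.
  // Caveat: Date parses in local time; in UTC+ timezones toISOString can
  // shift a day — a real implementation should parse in a fixed zone.
  return isNaN(d) ? null : d.toISOString().slice(0, 10);
}
```

Paste tolerance falls out of the strip step: `"+55 (11) 9 9999-9999"`, `"5511999999999"`, and `"11999999999"` all canonicalize to the same E.164 value.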
 
  **Budget rule:** On large apps, cap to top 5 triggers per page (ranked by
@@ -172,11 +188,79 @@ For each of 10 heuristics (methodology §1), work audit questions. Score 0–4 v
  ### 3c. WCAG 2.2 AA manual pass
  Items NOT covered by axe (methodology §2.3): keyboard traps, focus-order-matches-visual-order, `:focus-visible` quality, reflow at 320px, text-spacing override, `prefers-reduced-motion`, alt text quality, link/button text adequacy.
 
+ **WCAG 2.2 new Success Criteria — explicit checks (finding code prefix `a11y-wcag22-<sc>`):**
+
+ - **2.4.11 Focus Not Obscured (Minimum) — AA** → `a11y-wcag22-2.4.11`. Tab/Shift+Tab through every page at all 3 viewports; verify every focused control is at least partly visible. Common fail: sticky headers/footers (`position:fixed`) covering focused links/inputs. Fix pattern: `html { scroll-padding-top: <header-h>; scroll-padding-bottom: <footer-h>; }`.
+ - **2.5.7 Dragging Movements — AA** → `a11y-wcag22-2.5.7`. Enumerate every `draggable="true"`, drag-to-reorder list, range slider, kanban column, canvas drag. Each must expose a single-pointer non-dragging alternative (up/down buttons, ± steppers, numeric input, menu action). Keyboard-only is NOT sufficient (touch-only users).
+ - **3.2.6 Consistent Help — A** → `a11y-wcag22-3.2.6`. If help mechanisms exist (contact, chat, self-help link, support email), verify they appear in the same relative DOM order across every page they occur on. Record snapshot quote per page, diff order, file finding if inconsistent.
+ - **3.3.7 Redundant Entry — A** → `a11y-wcag22-3.3.7`. In any multi-step process (checkout, onboarding, registration), verify information previously entered is auto-populated or available for selection (e.g., "billing same as shipping" prefilled). Exceptions: essential re-entry (password confirmation), security-related, stale data. Browser autocomplete does not satisfy — the site must provide the value.
+ - **3.3.8 Accessible Authentication (Minimum) — AA** → `a11y-wcag22-3.3.8`. On every auth surface (login, re-auth, 2FA, password reset), confirm no cognitive-function test (memorize password, transcribe OTP, puzzle CAPTCHA) is required unless an alternative exists (passkey, magic link), a mechanism helps (paste allowed + `autocomplete="username | current-password | one-time-code"`), or an object/personal-content exception applies. Fail pattern: `onpaste="return false"` or `autocomplete="off"` on password.
+ - **3.3.9 Accessible Authentication (Enhanced) — AAA** → `a11y-wcag22-3.3.9` (advisory only, not required for AA audits). Flag as an advisory finding when AA passes only via the object-recognition or personal-content exception (e.g., "select all crosswalks" CAPTCHA). Passkeys / WebAuthn / biometrics / magic links clear this bar.
+
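The 2.4.11 geometry check above reduces to box arithmetic. A hedged sketch — names are illustrative; in practice you would run this inside `page.evaluate()` with `getBoundingClientRect()` values for the focused element and every fixed/sticky bar:

```javascript
// Fraction of the focused element left visible after subtracting the areas
// covered by fixed/sticky obstructions. Rects are {left, top, right, bottom}.
function visibleFraction(focus, obstructions) {
  const area = (r) =>
    Math.max(0, r.right - r.left) * Math.max(0, r.bottom - r.top);
  const overlap = (a, b) => area({
    left: Math.max(a.left, b.left),
    top: Math.max(a.top, b.top),
    right: Math.min(a.right, b.right),
    bottom: Math.min(a.bottom, b.bottom),
  });
  const covered = obstructions.reduce((sum, o) => sum + overlap(focus, o), 0);
  // Approximation: assumes the obstructions do not overlap each other.
  return area(focus) === 0 ? 0 : Math.max(0, 1 - covered / area(focus));
}
// SC 2.4.11 (Minimum) requires the focused control to stay at least partly
// visible, i.e. visibleFraction(...) > 0. A result of 0 is a finding.
```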
  ### 3d. Baymard (if e-commerce detected)
  If `package.json` has stripe/shopify/medusajs/saleor OR routes include /checkout /cart /products: checkout-flow + form-design + filter + PDP checklist (methodology §3).
 
+ ### 3e.0 Phase 0 — CrUX field data (MUST run before 3e synthetic lab audit)
+
+ Lab numbers (Lighthouse / Playwright / web-vitals injected at audit time)
+ are deterministic but reflect a single throttled machine. Google ranks on
+ **field** data — Chrome User Experience Report (CrUX), 28-day p75 over real
+ users. A site can score 95 in the lab and "Poor" in field due to device
+ diversity. Field is authoritative; lab is indicative only.
+
+ Before the synthetic pass in 3e, fetch CrUX for the origin (and for each
+ templated page type if a key is configured):
+
+ ```bash
+ # CrUX API (Chrome UX Report) — requires $CRUX_KEY or PageSpeed Insights key
+ curl -s "https://chromeuxreport.googleapis.com/v1/records:queryOrigin?key=$CRUX_KEY" \
+ -H 'Content-Type: application/json' \
+ -d "{\"origin\":\"<site-origin>\",\"formFactor\":\"PHONE\"}" \
+ > "$SESSION_DIR/vitals/crux_origin_mobile.json"
+
+ curl -s "https://chromeuxreport.googleapis.com/v1/records:queryOrigin?key=$CRUX_KEY" \
+ -H 'Content-Type: application/json' \
+ -d "{\"origin\":\"<site-origin>\",\"formFactor\":\"DESKTOP\"}" \
+ > "$SESSION_DIR/vitals/crux_origin_desktop.json"
+
+ # Optional: per-URL record (only if URL has sufficient traffic)
+ curl -s "https://chromeuxreport.googleapis.com/v1/records:queryRecord?key=$CRUX_KEY" \
+ -H 'Content-Type: application/json' \
+ -d "{\"url\":\"<full-url>\",\"formFactor\":\"PHONE\"}" \
+ > "$SESSION_DIR/vitals/crux_<slug>_mobile.json"
+ ```
+
+ Capture field `p75` for LCP / INP / CLS (and FCP / TTFB when present).
+ Outcomes:
+ - **CrUX present + sufficient traffic** → field values are the verdict; lab
+ values annotate drill-down only.
+ - **CrUX absent or insufficient traffic** → record the gap, fall back to
+ lab, and tag every performance finding as `source: "lab"`.
+
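Reading the p75 values back out of the saved responses is a one-liner per metric. A minimal sketch, assuming the CrUX API response shape (`record.metrics.<metric>.percentiles.p75`); a `null` return models the "absent or insufficient traffic" branch above:

```javascript
// Extract a field p75 from a saved CrUX record, or null when the metric is
// missing. Note: CrUX serves some percentiles (e.g. CLS) as strings, so
// coerce to Number before comparing against thresholds.
function cruxP75(record, metric) {
  const m = record?.record?.metrics?.[metric];
  if (!m || m.percentiles?.p75 === undefined) return null; // no field data
  return Number(m.percentiles.p75);
}
```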
  ### 3e. Core Web Vitals
- Parse `session_dir/vitals/<page>.json`. LCP/INP/CLS/FCP/TTFB/TBT against thresholds (methodology §4). Doherty: interactions <400ms feedback.
+ Parse `session_dir/vitals/<page>.json` (lab) AND `crux_*_mobile.json` /
+ `crux_*_desktop.json` (field). LCP/INP/CLS/FCP/TTFB/TBT against thresholds
+ (methodology §4). Doherty: interactions <400ms feedback.
+
+ **Tag every performance finding with a `source` field:**
+
+ ```json
+ {
+ "rule": "cwv-lcp",
+ "source": "lab | field | both",
+ "lab_value_ms": 3200,
+ "field_p75_ms": 4100,
+ "field_sample": "CrUX 28-day p75, PHONE",
+ "verdict": "needs-improvement"
+ }
+ ```
+
+ - `source: "both"` when lab + CrUX agree → highest confidence; proceed to fix.
+ - `source: "field"` when CrUX fails but lab passes → real users hit it; the regression is real.
+ - `source: "lab"` when CrUX is absent / insufficient → note the gap and
+ surface as "unverified by field data" in the executive summary.
+ - If lab and field disagree by > 30%, file a meta-finding
+ (`rule: perf-lab-field-divergence`) with both numbers and `source: "both"`.
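The tagging ladder above can be sketched as a single classifier — function and field names are illustrative, not part of the package:

```javascript
// Classify a performance finding's source and flag lab/field divergence.
// Divergence is relative to the larger of the two values; > 30% triggers
// the perf-lab-field-divergence meta-finding described above.
function perfSource(labValue, fieldP75) {
  if (fieldP75 == null) return { source: "lab", note: "unverified by field data" };
  if (labValue == null) return { source: "field" };
  const divergence =
    Math.abs(labValue - fieldP75) / Math.max(labValue, fieldP75);
  return {
    source: "both",
    metaFinding: divergence > 0.3 ? "perf-lab-field-divergence" : null,
  };
}
```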
 
  ### 3f. Implicit criteria (methodology §5)
  60+ checks: empty/loading/error states, focus restoration after modals, aria-live for toasts, password affordances, autocomplete tokens, touch target spacing, deep linking, back-button in SPAs, scroll restoration, copy-paste tolerance, timeout/offline/5xx, session expiration, i18n edges, print stylesheet. pass/fail/n-a with evidence.
@@ -265,6 +349,37 @@ mobile-native. Run the 21-item checklist verbatim against each mobile page:
  Each failed item → finding with `rule: mobile-pattern-M<N>`, evidence from
  Step 2.5 artifacts (NOT a fresh snapshot), `template_id: M<N>`.
 
+ **Real-device vs emulation disclaimer (MANDATORY).** Playwright MCP drives
+ Blink/Chromium in a resized viewport — it is NOT real iOS Safari (WebKit),
+ Android Chrome on a low-end device, or any in-app WebView. Emulation can
+ confirm layout, DOM, a11y tree, tab order, reduced-motion / forced-colors,
+ computed CSS. It CANNOT confirm touch haptics, iOS safe-area rendering,
+ iOS Safari font rasterization, PWA install/add-to-home-screen, iOS keyboard
+ overlap via `visualViewport`, viewport-zoom quirks under pinch, Samsung
+ Internet auto-dark, real Pointer Event latency, or hover-only fallbacks on
+ real touch (iOS "sticky hover"). See methodology §9 for the full list.
+
+ Any mobile finding whose verdict would require iOS Safari or Android Chrome
+ on a real device to confirm — touch haptics, iOS safe-area insets, PWA
+ install, pinch-zoom quirks, `@media (hover: hover)` behavior on real touch,
+ payment sheet (Apple Pay / Google Pay), biometrics, push — MUST be tagged
+ in the finding JSON as:
+
+ ```json
+ {
+ "category": "real-device-required",
+ "real_device_required": true,
+ "emulation_verdict": "likely_fail | likely_pass | indeterminate",
+ "requires": ["ios-safari", "android-chrome"],
+ "rationale": "Playwright runs Blink; iOS Safari uses WebKit; cannot confirm X on emulation."
+ }
+ ```
+
+ `sd-synthesis` MUST surface a "real-device verification needed" banner at
+ the top of the executive summary listing every finding where
+ `real_device_required=true`, grouped by `requires` platform, so the human
+ reviewer books a BrowserStack / Sauce / LambdaTest session before sign-off.
+
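The banner grouping sd-synthesis performs can be sketched as follows — the findings shape is assumed from the JSON above, and the function name is illustrative:

```javascript
// Bucket real_device_required findings by required platform so the banner
// can list "ios-safari: …" / "android-chrome: …" groups.
function realDeviceBanner(findings) {
  const byPlatform = {};
  for (const f of findings) {
    if (!f.real_device_required) continue;
    for (const platform of f.requires ?? []) {
      (byPlatform[platform] ??= []).push(f.rule ?? f.category);
    }
  }
  return byPlatform;
}
```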
  Cross-reference the competitor component vocabulary from
  `.cache/evidence/component-comparison.md` — if every competitor uses bottom
  tabs on mobile and the product uses hamburger-only, density score drops AND
@@ -328,7 +443,11 @@ this to produce the executive DIS summary.
  document.head.appendChild(s);
  });
  window.__axe = await window.axe.run(document, {
- runOnly: { type: 'tag', values: ['wcag2a','wcag2aa','wcag21a','wcag21aa','wcag22aa','best-practice'] }
+ // WCAG 2.2 rules (e.g., focus-not-obscured) ship behind axe-core's
+ // 'experimental' tag — without that tag in runOnly, SC 2.4.11 / 2.5.7 /
+ // 2.5.8 etc. simply will NOT execute. Always include it for super-design
+ // audits.
+ runOnly: { type: 'tag', values: ['wcag2a','wcag2aa','wcag21a','wcag21aa','wcag22aa','best-practice','experimental'] }
  });
  })();
  ```
@@ -195,6 +195,17 @@ Source of truth: `references/fix-agent-playbook.md` §7.
  - Swap color used in >5 files → needs_human (too broad for single fix)
  - Convert design token itself → MEDIUM, escalate
 
+ **Color-space rule (V4 and any new token):** When emitting new color tokens
+ (V4 snap-to-nearest and any fresh tokens proposed by V-templates), express
+ them in **OKLCH** — the perceptually uniform color space used by modern
+ design systems (Tailwind v4, shadcn 2024+, Radix Colors). Hex / RGB are
+ accepted ONLY when they match the existing codebase convention (e.g., the
+ project's `tokens.css` / `globals.css` already defines all colors as hex).
+ Mixing OKLCH tokens into a hex-only codebase requires a separate
+ token-migration finding and is `needs_human`. Format:
+ `--color-primary-500: oklch(0.65 0.20 265);` (lightness 0-1, chroma 0+,
+ hue 0-360).
+
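The convention check this rule implies can be sketched with two regex counts. A hedged illustration — the regexes, thresholds, and return labels are assumptions, not the playbook's actual detector:

```javascript
// Decide which color notation a token file already uses before emitting
// new tokens. `css` is the concatenated contents of tokens.css/globals.css.
function colorConvention(css) {
  const hex = (css.match(/#[0-9a-fA-F]{3,8}\b/g) ?? []).length;
  const oklch = (css.match(/\boklch\(/g) ?? []).length;
  if (oklch === 0 && hex > 0) return "hex-only"; // new OKLCH token → needs_human
  if (hex === 0 && oklch > 0) return "oklch";    // safe to emit oklch(L C H)
  return "mixed-or-empty";                       // escalate per the rule above
}
```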
  ## ux templates (U1–U10)
 
  | ID | Fix |
@@ -22,8 +22,29 @@ Output: exactly one file `docs/super-design/market-analysis.md` + evidence under
 
  3. **Detect niche.** Apply 8-signal scoring (playbook §1). Confidence = top / (top + second). If <0.55, use AskUserQuestion with 3 options from top verticals. Record reasoning to `.cache/evidence/niche.md`.
 
+ **Regulated-niche always-confirm rule.** In regulated niches, compliance-driven design choices override aesthetic preference, so always confirm. If the detected niche falls into any of the following — **fintech, healthtech, legaltech, gambling, crypto, insurance, children's-app** — ALWAYS fire `AskUserQuestion` to confirm niche + regulatory scope even when detector confidence is ≥0.95. These niches carry compliance implications (SOC2, HIPAA, PCI-DSS, GDPR, PSD2, COPPA, KYC/AML, age-gating, disclosure-mandated copy) that design directly affects — getting the niche wrong wastes the audit. Record the confirmation (selected scope, applicable regulations) to `.cache/evidence/niche.md` under `regulatory_scope:`.
+
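The confidence formula from step 3 and the always-confirm rule combine into one gate. A sketch under stated assumptions — the `scores` shape (niche → signal score) and function name are illustrative:

```javascript
// Ask the user when detector confidence is low OR the top niche is regulated.
const REGULATED = new Set(["fintech", "healthtech", "legaltech", "gambling",
  "crypto", "insurance", "children's-app"]);

function mustAskUser(scores) {
  const [top, second = 0] = Object.values(scores).sort((a, b) => b - a);
  const topNiche = Object.keys(scores)
    .reduce((a, b) => (scores[a] >= scores[b] ? a : b));
  const confidence = top / (top + second); // Confidence = top / (top + second)
  return confidence < 0.55 || REGULATED.has(topNiche);
}
```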
  4. **Discover competitors.** 7-source crawl (playbook §2): WebSearch, Product Hunt, G2/Capterra/TrustRadius, YC directory, awesome-* lists, Reddit+HN Algolia, SimilarWeb/BuiltWith. Dedupe by domain. Rank fame × similarity × design-signal. Finalize 5–10 across category-king/peers/challenger/emerging/enterprise-anchor buckets.
 
+ **4a. Neumeier insertion test during discovery (per candidate).** For every candidate competitor considered for the final 5–10, apply Neumeier's insertion test (playbook §5.4): *"If this competitor's brand mark were swapped with the project's, would users notice?"* Score each on a 0–5 scale:
+
+ | Score | Meaning |
+ |---|---|
+ | 0 | Fully swappable — no brand equity, pure commodity visual language |
+ | 1 | Mostly swappable — generic category codes only |
+ | 2–3 | Partially distinct — some ownable elements but weak |
+ | 4 | Strong distinct identity — clear ownable signals |
+ | 5 | Instantly distinct — singular, unmistakable brand mark |
+
+ Competitors scoring ≤1 are **commodity benchmarks** (show what the category looks like by default); competitors scoring ≥4 **reveal defensible territory** (show what ownable positioning looks like). Include a healthy mix of both. Record the score and one-line justification per competitor in `market-analysis.md` (competitor table) and the per-competitor row in `.cache/evidence/<slug>/component-catalog.md` under a new `Insertion-test score:` field.
+
+ **4b. Vibe-quadrant final gate (self-check before step 5).** Before moving to step 5, plot the project draft position and each finalized competitor on a 2-axis vibe quadrant:
+
+ - **X axis:** serious ↔ playful
+ - **Y axis:** minimal ↔ expressive
+
+ If the project lands in the **same quadrant as ≥3 competitors**, surface a warning in `market-analysis.md` under a `## Positioning risk` section — exact text: `crowded quadrant — positioning risk` — and recommend **one axis shift** (per Kapferer prism §4.3 / Aaker dimensions §4.2) that would move the project into a less-occupied quadrant. **Do not auto-decide the shift**; document the warning and the recommended axis for synthesis (step 8) to reconcile with the user. Save the quadrant plot data (project + competitor coordinates) to `.cache/evidence/vibe-quadrant.md`.
+
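The quadrant gate is mechanical once coordinates exist. A minimal sketch, assuming positions on [-1, 1] per axis (x < 0 serious / x > 0 playful; y < 0 minimal / y > 0 expressive):

```javascript
// Name a point's quadrant, then count competitors sharing the project's.
function quadrant(p) {
  return `${p.x >= 0 ? "playful" : "serious"}/${p.y >= 0 ? "expressive" : "minimal"}`;
}

function positioningRisk(project, competitors) {
  const same = competitors.filter((c) => quadrant(c) === quadrant(project)).length;
  // ≥3 competitors in the project's quadrant → crowded, per the gate above.
  return same >= 3 ? "crowded quadrant — positioning risk" : null;
}
```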
  5. **Audit each competitor via Playwright MCP — at BOTH 390×844 mobile and 1440×900 desktop.** Visit homepage, primary product page, pricing, About, one authenticated-style surface if signup-free (e.g., docs, app tour). Per playbook §3 PLUS component-level extraction per §3bis below. Save to `.cache/evidence/<slug>/<viewport>/`.
 
  ### §3bis. Component-level extraction (mandatory, not optional)
@@ -97,11 +118,37 @@ and sd-fix use to recommend aesthetic direction.
 
  6. **Classify each.** Archetype (§4.1), Aaker peak (§4.2), vibe class, NN/g 4D tone (§7.1), hero-pattern.
 
+ **6a. Voice/tone capture — 8–12 copy sample rule (mandatory).** For each competitor, collect **8–12 distinct copy samples** (≥8 minimum; fewer = insufficient signal), one per surface where available:
+
+ - Hero headline
+ - Primary CTA label
+ - Error message
+ - Empty state
+ - 404 page
+ - Onboarding step 1
+ - Pricing caption / plan blurb
+ - Footer blurb
+ - ToS / legal excerpt
+ - (Optional extras: subhead, feature card, support article opener, confirmation toast)
+
+ Grade **each sample** on the NN/g 4D tone dimensions (playbook §7.1) using integers in {−1, 0, +1}:
+
+ - formal ↔ casual
+ - funny ↔ serious
+ - respectful ↔ irreverent
+ - enthusiastic ↔ matter-of-fact
+
+ Report per-sample scores + verbatim quote + source URL in `.cache/evidence/<slug>/copy-samples.md`, and the **mean + variance** per axis in `market-analysis.md` (tone row per competitor). Healthy brands are constant on voice, variable on tone.
+
+ **Insufficient-signal rule.** If fewer than 8 distinct samples can be collected (static site, gated app, locale blockers), **do not compute a tone profile** — flag the competitor as `tone-inconclusive` in `market-analysis.md` with a note listing which surfaces were missing. Never fabricate samples or scores to reach the threshold.
+
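The mean + variance aggregation and the <8 guard can be sketched as one function — the axis keys and sample shape are illustrative assumptions:

```javascript
// Per-axis mean and (population) variance over integer scores in {-1, 0, +1}.
// Fewer than 8 samples → tone-inconclusive, never a fabricated profile.
const AXES = ["formality", "humor", "respect", "enthusiasm"];

function toneProfile(samples) {
  if (samples.length < 8) return { status: "tone-inconclusive" };
  const profile = {};
  for (const axis of AXES) {
    const xs = samples.map((s) => s[axis]);
    const mean = xs.reduce((a, b) => a + b, 0) / xs.length;
    const variance = xs.reduce((a, x) => a + (x - mean) ** 2, 0) / xs.length;
    profile[axis] = { mean, variance };
  }
  return { status: "ok", profile };
}
```

Low variance on an axis across surfaces is the "constant on voice" signal; high variance on, say, humor between the 404 page and the ToS is the expected "variable on tone".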
  7. **Build category-code matrix.** Tabulate dimensions (§5.1). Frequency per column. Classify codes obey/extend/subvert/open (§5.2).
 
- 8. **Synthesize.** Archetype in whitespace via Neumeier insertion test (§5.4). Palette, typography, tone, audience, JTBD. Draft onliness statement.
+ 8. **Synthesize.** Archetype in whitespace via Neumeier insertion test (§5.4). Palette, typography, tone, audience, JTBD.
+
+ 8b. **Three-territories pitch (Q7 — Part 7 of `docs/compass_artifact_wf-2e33af6e-127f-402e-8ce6-cb506fc91b94_text_markdown.md` lines 515–519, 652–653).** Before drafting the onliness statement, produce THREE parallel variants of the design direction — **safe** (conforms to category codes), **expected** (the obvious evolution of category codes), **edgy** (the considered provocation / subversion). Each variant MUST include: palette strip (3–6 tokens), type specimen (primary + optional display), motion character (duration + easing archetype), one-line rationale tying back to archetype + category-code matrix from step 7. Build them in parallel — never serialize, or you will anchor to the first. Save to `.cache/evidence/territories/{safe,expected,edgy}.md` and include a summary table in the brief. The user chooses the primary territory (optionally stealing one detail from another) BEFORE the onliness statement lands. "Presenting one direction looks like opinion; presenting three looks like strategy" (artifact line 519).
 
- 9. **Write `market-analysis.md`** per playbook §8 schema.
+ 9. **Draft onliness statement** against the chosen territory, then **write `market-analysis.md`** per playbook §8 schema (include the three-territories summary + chosen primary).
 
  10. **Self-check.** Fix gaps before returning.
 
@@ -8,7 +8,7 @@ description: >
  UX audit (WCAG 2.2 AA, Nielsen heuristics, Baymard, CWV), and synthesized
  overview. Re-audits only what changed since last run. On explicit user request,
  applies surgical fixes with full rollback.
- version: 0.6.0
+ version: 0.6.1
  ---
 
  # super-design
@@ -86,9 +86,14 @@ Pass findings via files under `.super-design/sessions/<id>/`, not chat.
 
  ### Step 4: Write state + history
 
- - Atomic write `.audit-state.json` (.tmp then rename).
+ - Atomic write `.audit-state.json` via `scripts/write-state.sh` (takes JSON
+ on stdin, writes `.tmp`, validates with `jq`, then renames). Do NOT write
+ the state file directly.
  - Append session to `audit-history.md`.
  - `git notes --ref=super-design add -f -m <json> HEAD`.
+ - First-time notes setup (run once per clone, also in `setup-git-notes.sh`
+ if you extract it): `git config --add remote.origin.fetch '+refs/notes/super-design:refs/notes/super-design'`
+ — without this, notes don't round-trip across clones (artifact §7).
 
  ### Step 5: Return summary (≤5 sentences)
 
@@ -103,6 +108,7 @@ Do NOT paste overview into chat.
  - `--fix` — run sd-fix after audit
  - `--dry-run` — artifacts without committing state
  - `--ci` — non-interactive, create PR, exit non-zero on blockers
+ - `--update-baselines` — re-hash pages and tokens without re-auditing (use after accepted cosmetic drift)
 
  ## References (Read on demand)
 
@@ -187,6 +187,66 @@ The 2026 WebAIM Million report (released ~30 March 2026, https://webaim.org/proj
 
  **What automation CANNOT catch** (the manual 40–70%): keyboard trap detection beyond trivial cases; screen-reader announcement quality; meaningful alt text (tools detect missing, not quality); logical focus order when CSS reorders; error message clarity; appropriate heading structure (presence vs meaning); context-appropriate link text; video caption accuracy/sync; color meaning (1.4.1); reading order under CSS transforms; reflow at 320px (no tool); text-spacing override; focus-indicator quality (2.4.11 / 2.4.13); `prefers-reduced-motion` honoring; target size / spacing (partial); autocomplete correctness; captcha alternatives; content-on-hover persistence (1.4.13); page title descriptiveness; consistent navigation/identification (3.2.3 / 3.2.4).
 
+ ### 2.5 Tool triangulation for automated a11y (MANDATORY)
+
+ axe-core alone catches only ~57% of WCAG issues by volume (Deque's own
+ coverage study) and ~40% by Success Criterion count. Running a single
+ engine guarantees blind spots. sd-audit MUST run axe **plus** at least two
+ more engines and dedupe the merged results by `{rule, selector}`.
+
+ **Engines to run in parallel (don't serialize — they're cheap):**
+
+ ```bash
+ # 1. axe-core — inject via Playwright MCP (already done in sd-audit Step 2,
+ # with the 'experimental' tag in runOnly so WCAG 2.2 rules fire).
+
+ # 2. Pa11y — runs both htmlcs and axe runners for broader coverage
+ pa11y "$URL" \
+ --standard WCAG2AA \
+ --runner axe \
+ --runner htmlcs \
+ --reporter json \
+ > "$SESSION_DIR/a11y/pa11y_<slug>_<vp>.json"
+
+ # 3. WAVE API — WebAIM's detector, overlay-oriented, strong on contrast and ARIA
+ curl -s "https://wave.webaim.org/api/request?key=$WAVE_KEY&url=$(printf %s "$URL" | jq -sRr @uri)&reporttype=4" \
+ > "$SESSION_DIR/a11y/wave_<slug>_<vp>.json"
+
+ # 4. IBM Equal Access Accessibility Checker — ACT-rule aligned, ~140 rules
+ achecker "$URL" \
+ --reportLevel violation,potentialviolation \
+ --outputFormat json \
+ --outputFolder "$SESSION_DIR/a11y/achecker_<slug>_<vp>/"
+ ```
+
+ **Dedup and severity normalization:**
+
+ ```js
+ // Run in sd-audit Step 3a after parsing all engine outputs into
+ // findings = { axe: [...], pa11y: [...], wave: [...], achecker: [...] }.
+ function triangulate(findings) {
+ const key = (f) => `${f.rule}::${f.selector}`;
+ const bucket = new Map();
+ for (const engine of ['axe', 'pa11y', 'wave', 'achecker']) {
+ for (const f of findings[engine] ?? []) {
+ const k = key(f);
+ if (!bucket.has(k)) bucket.set(k, { ...f, engines: [] });
+ bucket.get(k).engines.push(engine);
+ }
+ }
+ return [...bucket.values()];
+ }
+ // Confidence ladder:
+ // engines.length === 1 → single-source, keep but tag "unverified"
+ // engines.length >= 2 → triangulated, high confidence
+ // engines.length >= 3 → near-certain, promote severity +1 step
+ ```
+
+ **Why each engine is non-redundant:**
+ - **axe-core** — strongest on `4.1.2`, `1.4.3`, `1.1.1`, `1.3.1` subsets; the 'experimental' tag gates 2.2.
+ - **Pa11y** — htmlcs runner catches rule shapes axe doesn't ship; dual-runner mode is unique.
+ - **WAVE** — contrast slider tool + in-context overlay; different heuristics for ARIA landmarks; never marks a page "passed" (useful bias — never false-clean).
+ - **IBM Equal Access** — ACT-Rules aligned (different normative base than axe); baseline-file regression; severity categories (`violation / potentialviolation / recommendation / manual`) map onto Nielsen 0–4.
+
+ Record per-finding `source_engines: [...]` so sd-synthesis can promote
+ triangulated issues and down-weight single-engine noise.
+
  ### 2.4 Regulatory context
 
  **European Accessibility Act** (Directive (EU) 2019/882) enforceable **28 June 2025** for new products/services; 28 June 2030 deadline for pre-existing. References **EN 301 549** (currently WCAG 2.1 AA baseline; being updated to 2.2). Extraterritorial.
@@ -201,6 +261,64 @@ The 2026 WebAIM Million report (released ~30 March 2026, https://webaim.org/proj
 
  Baymard (Copenhagen, founded ~2009) is used by 71% of Fortune 500 e-commerce companies. Research: 54 rounds of benchmarking, 327 top-grossing US/EU sites, 275,000+ manually assigned UX performance scores, 200,000+ research hours. Methodology page: https://baymard.com/research/methodology. Headline finding: the average large e-commerce site can gain **+35.26% conversion** through better checkout design — **$260B** in recoverable US+EU orders.
 
+ ### 3.0 Baymard sub-rule enumeration (finding code prefixes)
+
+ The single blanket "Baymard" verdict is too coarse for a structured audit.
+ Baymard organises findings by **surface** (checkout, PDP, search, etc.) and
+ each surface has a bounded, enumerable set of sub-rules. Every Baymard
+ finding raised by sd-audit MUST use one of the prefixes below so
+ sd-synthesis can group them and so the fix-playbook can route to the right
+ U-template. Rules that do not yet have an official Baymard number are
+ numbered sequentially (e.g. `baymard-cc-01` … `baymard-cc-14`) with the
+ source section cited.
+
+ | Surface | Prefix | Count | Source (methodology §) |
+ |---|---|---|---|
+ | Credit card form | `baymard-cc-<NN>` | 14 rules | §3.3 + https://baymard.com/checkout-usability/credit-card-patterns |
+ | Address form | `baymard-addr-<NN>` | 8 rules | §3.4 + https://baymard.com/blog/address-line-2 + https://baymard.com/blog/automatic-address-lookup |
+ | Search | `baymard-search-<NN>` | 12 rules | §3.6 + https://baymard.com/ecommerce-design-examples/34-autocomplete-suggestions |
+ | Filter | `baymard-filter-<NN>` | 10 rules | §3.6 + https://baymard.com/blog/promoting-product-filters + https://baymard.com/blog/have-filters-for-list-item-info |
+ | Breadcrumbs | `baymard-bread-<NN>` | 6 rules | §3.7 + https://baymard.com/blog/ecommerce-breadcrumbs |
+ | PDP / Product Detail | `baymard-pdp-<NN>` | 18 rules | §3.8 + https://baymard.com/research/product-page |
+
+ **`baymard-cc-*` — Credit card form (14 rules, §3.3):**
+ 1. `baymard-cc-01` auto-format spaces per card brand (4-4-4-4 / 4-6-5 AMEX / 4-4-4-4-3 19-digit)
+ 2. `baymard-cc-02` auto-detect card type via IIN ranges (adapt length limit, CVV help, spacing)
+ 3. `baymard-cc-03` expiration as single MM/YY field with auto-inserted slash (NOT YYYY, NOT two dropdowns)
+ 4. `baymard-cc-04` CVV field with adaptive help image (3-digit back for Visa/MC, 4-digit front for AMEX)
+ 5. `baymard-cc-05` autocomplete tokens present: `cc-number`, `cc-name`, `cc-exp`, `cc-exp-month`, `cc-exp-year`, `cc-csc`
+ 6. `baymard-cc-06` `inputmode="numeric"` (never `type="number"`) on card number, expiration, CVV
+ 7. `baymard-cc-07` never clear CC number/CVV on validation error (preserve entered data)
+ 8. `baymard-cc-08` stored-card edit uses "fake editing" (delete + re-add) per PCI; clear messaging
+ 9. `baymard-cc-09` inline validation with 5%-abandonment guardrail (no vague "Card declined" without reason)
+ 10. `baymard-cc-10` fallback entry path when wallet (Apple Pay / Google Pay) fails
+ 11. `baymard-cc-11` accept pasted numbers; strip spaces/dashes server+client
+ 12. `baymard-cc-12` surface accepted card brands near the input (logo strip) before submit
+ 13. `baymard-cc-13` billing-address reuse: "Billing same as shipping" default-checked with editable prefill
+ 14. `baymard-cc-14` error messages identify which field failed + how to fix (not "Payment failed")
+
+ > Source: §3.3 bullets + https://baymard.com/checkout-usability/credit-card-patterns. Rules 10–14 derived from §3.3 narrative; number them in this table and supersede with Baymard's official IDs when available.
+
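Rules cc-01, cc-02, and cc-11 combine into a small formatter; a sketch with a deliberately abbreviated IIN check — real brand detection needs the full IIN ranges, and the function name is illustrative:

```javascript
// cc-11: paste-tolerant — strip spaces, soft hyphens, dashes.
// cc-02: detect AMEX by IIN prefix 34/37 (abbreviated; not a full IIN table).
// cc-01: group digits 4-6-5 for AMEX, else 4-4-4-4 with a trailing 3 for
// 19-digit cards.
function formatCardNumber(raw) {
  const digits = raw.replace(/[\s\u00AD-]/g, "");
  const amex = /^3[47]/.test(digits);
  const groups = amex ? [4, 6, 5] : [4, 4, 4, 4, 3];
  const out = [];
  let i = 0;
  for (const g of groups) {
    if (i >= digits.length) break;
    out.push(digits.slice(i, i + g));
    i += g;
  }
  return out.join(" ");
}
```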
302
+ **`baymard-addr-*` — Address form (8 rules, §3.4):**
303
+ 1. `baymard-addr-01` country selector FIRST (drives all subsequent field formats/validation)
304
+ 2. `baymard-addr-02` single "Address Line 1" + optional "Address Line 2" labeled "Apt, Suite — optional" (never omit Line 2)
305
+ 3. `baymard-addr-03` automatic address autocomplete preferred (9% manual-entry typo rate without it)
306
+ 4. `baymard-addr-04` postal-code autodetect of city/state (28% mobile sites fail)
307
+ 5. `baymard-addr-05` US state as dropdown (not free text); UK optional county; hide for countries without subdivisions
308
+ 6. `baymard-addr-06` autocomplete tokens: `street-address`, `address-line1`, `address-line2`, `address-level1/2`, `postal-code`, `country`
309
+ 7. `baymard-addr-07` "Billing same as shipping" default-checked with editable prefilled fields (also WCAG 3.3.7 Redundant Entry)
310
+ 8. `baymard-addr-08` name inputs accept Unicode, hyphens, apostrophes, single-name users, >20 chars
311
+
312
+ > Source: §3.4.
313
+
314
+ **`baymard-search-*` — Search (12 rules, §3.6):** placeholders for 12 rules covering autocomplete presence, scope suggestions in autocomplete, search-within-current-category, autodirect on category match, query-term pluralization tolerance, typo tolerance, "no results" with recovery options, recent-searches memory, sort-vs-filter separation, faceted-search state in URL, submit-without-suggestion-selection, voice search on mobile. Rules `baymard-search-01` … `baymard-search-12`. Source: §3.6 bullets + https://baymard.com/ecommerce-design-examples/34-autocomplete-suggestions. Enumerate the exact wording when next Baymard PDF is purchased.
315
+
316
+ **`baymard-filter-*` — Filter (10 rules, §3.6):** placeholders `baymard-filter-01` … `baymard-filter-10` covering: promote top filters above the product grid; truncate long value lists >10 with styled "More" link; category-specific filters (megapixels, temperature rating); filters for every attribute displayed in list items; expand/collapse icons right-aligned; applied-filter pills visible and individually removable; "clear all" affordance; range sliders with keyboard + numeric input; multi-select affordance obvious; result count live-updated via `aria-live`. Source: §3.6 bullets + https://baymard.com/blog/promoting-product-filters + https://baymard.com/blog/have-filters-for-list-item-info.
317
+
318
+ **`baymard-bread-*` — Breadcrumbs (6 rules, §3.7):** placeholders `baymard-bread-01` … `baymard-bread-06` covering: present on all non-home pages; implement BOTH hierarchy-based AND history-based (68% of top 50 sub-par, 45% only hierarchy, 23% none); present on mobile (65% mobile fail); no "hidden in more" collapse on desktop without reveal; last crumb non-clickable; structured-data markup (`BreadcrumbList`). Source: §3.7 + https://baymard.com/blog/ecommerce-breadcrumbs.
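The structured-data markup that `baymard-bread-06` calls for can be sketched as a plain builder. The crumb data and URLs below are illustrative; the `@type` / `itemListElement` / `position` shape follows schema.org's `BreadcrumbList`:

```javascript
function breadcrumbJsonLd(crumbs) {
  // crumbs: [{ name, url }] in hierarchy order. The last crumb is the
  // current page; per baymard-bread-05 it renders non-clickable, but the
  // schema.org markup may still carry its URL.
  return {
    "@context": "https://schema.org",
    "@type": "BreadcrumbList",
    itemListElement: crumbs.map((c, i) => ({
      "@type": "ListItem",
      position: i + 1,
      name: c.name,
      item: c.url,
    })),
  };
}
```

Serialize with `JSON.stringify` into a `<script type="application/ld+json">` tag.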
319
+
320
+ **`baymard-pdp-*` — PDP / Product Detail (18 rules, §3.8):** placeholders `baymard-pdp-01` … `baymard-pdp-18` covering: single dominant Add-to-Cart (no 3–6 competing colorful buttons); shipping cost/ETA visible on PDP (64% of users look for it); total visible before checkout (24% abandon otherwise); accordion over horizontal tabs (67% of accordion users mis-implement; 28% use worst-performing tabs); UGC visuals present (67% lack them); minimum 3–5 images + zoom + variant-driven imagery; swatches not dropdowns for variants; out-of-stock variants visible-but-disabled (not removed); star ratings above the fold (up to +18% conversion with verified badges); stock urgency without dark patterns; delivery-date estimator; returns policy on PDP; size guide inline (not in a separate page); Q&A / reviews with filtering; cross-sell without shoving below CTA; "notify me" flow for OOS; price history for discount honesty; country/currency switcher persisted. Source: §3.8 + https://baymard.com/research/product-page.
321
+
204
322
  ### 3.1 Cart abandonment
205
323
  **70.19% average abandonment** across 49 studies 2006–2023, range 55–84.27% (https://baymard.com/lists/cart-abandonment-rate). By device: mobile 77.06%, tablet 66.39%, desktop 70.01%. **Reasons** (excluding 43% "just browsing"): extra costs 48%; forced account creation 24%; slow delivery 19%; distrust with CC 18–19%; too long/complicated 17–18%; couldn't see total up front 16%; errors/crashes 13%; returns policy 12%; declined CC 9%; limited payment methods 7%.
206
324
 
@@ -29,6 +29,8 @@ A score without evidence is invalid. Auditor records `n/a` instead of guessing.
29
29
 
30
30
  ## Category 1 — Visual hierarchy (weight 1.0)
31
31
 
32
+ **Rationale:** Reber et al. processing fluency + Tractinsky aesthetic-usability — clean dominance hierarchies literally reduce cognitive load; beauty buys friction tolerance only after hierarchy is solved (artifact Parts 1–3).
33
+
32
34
  **Question:** On this view, what is the single primary goal? Is it the most
33
35
  dominant element visually?
34
36
 
@@ -47,6 +49,8 @@ dominant element visually?
47
49
 
48
50
  ## Category 2 — Density calibration per viewport (weight 1.2)
49
51
 
52
+ **Rationale:** Fitts's Law + thumbzone ergonomics — density must respect the physical reach envelope of the device; cramming desktop density onto mobile violates motor cost.
53
+
50
54
  **Question:** Does information density match the device context?
51
55
 
52
56
  | Viewport | Expected primary entities visible above fold |
@@ -68,6 +72,8 @@ dominant element visually?
68
72
 
69
73
  ## Category 3 — Consistency: spacing scale (weight 0.8)
70
74
 
75
+ **Rationale:** Gestalt proximity + rhythm — shared spacing units fuse related elements and separate distinct ones; arbitrary magic numbers break grouping perception.
76
+
71
77
  **Question:** Do paddings, margins, gaps come from a scale (4/8px or 0.25rem) or are they arbitrary magic numbers?
72
78
 
73
79
  **Detect:**
@@ -86,12 +92,16 @@ dominant element visually?
86
92
 
87
93
  ## Category 4 — Consistency: typography scale (weight 0.8)
88
94
 
95
+ **Rationale:** Processing fluency (Reber) — a discrete type scale accelerates recognition; a shapeless set of sizes forces re-parsing of hierarchy on every screen.
96
+
89
97
  Same method for font-size, font-weight, line-height. Look for `text-\[\d+px\]` and arbitrary font-size. Expected: 6–10 sizes total in a designed system; 30+ sizes = vibecoded.
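A minimal sketch of that count, assuming the computed `font-size` values have already been collected via `browser_evaluate`. The 0.5 px bucketing and the intermediate "loose" label are assumptions of this sketch; only the 6–10 and 30+ thresholds come from the rubric:

```javascript
function typeScaleVerdict(fontSizesPx) {
  // Bucket to the nearest 0.5 px so sub-pixel rendering noise does not
  // inflate the distinct-size count.
  const unique = new Set(fontSizesPx.map((px) => Math.round(px * 2) / 2));
  const n = unique.size;
  if (n <= 10) return { sizes: n, verdict: "designed" };
  if (n < 30) return { sizes: n, verdict: "loose" };
  return { sizes: n, verdict: "vibecoded" };
}
```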
90
98
 
91
99
  ---
92
100
 
93
101
  ## Category 5 — Consistency: color palette (weight 0.8)
94
102
 
103
+ **Rationale:** Valdez & Mehrabian (1994) — saturation × value drive emotional response more than hue; a disciplined palette with controlled lightness/saturation ranges is the single highest-leverage decision for perceived quality (artifact line 43).
104
+
95
105
  **Detect:**
96
106
  - Collect computed `color`, `background-color`, `border-color` from ≥100 elements.
97
107
  - Unique colors count. <15 = disciplined. 30+ = vibecoded.
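A sketch of the uniqueness count, assuming the computed color strings are already collected from the sampled elements. The whitespace normalization and the intermediate "middling" label are this sketch's assumptions; the <15 / 30+ cutoffs are the rubric's:

```javascript
function paletteDiscipline(cssColors) {
  // Normalize so "rgb(0, 0, 0)" and "rgb(0,0,0)" count as one color.
  const norm = (c) => c.toLowerCase().replace(/\s+/g, "");
  const unique = new Set(cssColors.map(norm));
  const n = unique.size;
  return {
    unique: n,
    verdict: n < 15 ? "disciplined" : n >= 30 ? "vibecoded" : "middling",
  };
}
```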
@@ -108,6 +118,8 @@ Same method for font-size, font-weight, line-height. Look for `text-\[\d+px\]` a
108
118
 
109
119
  ## Category 6 — Whitespace & breathing room (weight 0.7)
110
120
 
121
+ **Rationale:** *Ma* (間) — negative space is substance, not absence; whitespace signals confidence and lets figure/ground perception resolve without strain.
122
+
111
123
  **Question:** Does content have room to breathe, or is it crammed?
112
124
 
113
125
  **Detect:** Compute average `padding-inline + margin-inline` per content block. Compare to container width. Measure content-to-chrome ratio.
@@ -123,6 +135,8 @@ Same method for font-size, font-weight, line-height. Look for `text-\[\d+px\]` a
123
135
 
124
136
  ## Category 7 — Text legibility (weight 1.2)
125
137
 
138
+ **Rationale:** Miller 4±1 (Cowan 2001) + Postel's robustness — legible bodies keep working-memory cost low; dense microtext forces re-reading and exhausts the 4-chunk budget that forms and scannable text depend on.
139
+
126
140
  **Detect:** `browser_evaluate` → collect `fontSize` computed px. Find minimum across visible text.
127
141
 
128
142
  | Viewport | Min body | Min meta | Min input |
@@ -142,6 +156,8 @@ Same method for font-size, font-weight, line-height. Look for `text-\[\d+px\]` a
142
156
 
143
157
  ## Category 8 — CTA hierarchy (weight 1.0)
144
158
 
159
+ **Rationale:** Hick-Hyman Law — decision time grows with the log of equally-weighted options; multiple competing primaries flatten hierarchy into a choose-your-adventure and cost measurable conversion (Baymard PDP data).
160
+
145
161
  **Question:** Is there ONE primary CTA per view?
146
162
 
147
163
  **Detect:** Count buttons with `variant=default | primary | filled` OR bg-primary class. >1 above fold = competing.
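As a sketch over button descriptors (`{ variant, top }`, with `top` the bounding-rect offset in px) extracted beforehand; the variant names come from the detect rule, while the 800 px fold default is an illustrative viewport height, not a rubric value:

```javascript
function competingPrimaries(buttons, foldPx = 800) {
  // Primary variants per the detect rule above.
  const primary = new Set(["default", "primary", "filled"]);
  const aboveFold = buttons.filter((b) => primary.has(b.variant) && b.top < foldPx);
  // >1 primary above the fold = competing CTAs.
  return { count: aboveFold.length, competing: aboveFold.length > 1 };
}
```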
@@ -159,6 +175,8 @@ Reference: Baymard PDP — 51% of e-commerce pages fail due to competing CTAs.
159
175
 
160
176
  ## Category 9 — State coverage (weight 1.1)
161
177
 
178
+ **Rationale:** Norman's "make system state visible" (Seven Stages of Action) + Nielsen H1 visibility-of-system-status — missing loading/empty/error states break the user's feedback loop and strand them in uncertainty.
179
+
162
180
  Per page, does the UI handle: default / loading / empty / error / success?
163
181
 
164
182
  **Detect per scenario:**
@@ -177,37 +195,89 @@ Per page, does the UI handle: default / loading / empty / error / success?
177
195
 
178
196
  ## Category 10 — Touch targets (mobile only, weight 1.0)
179
197
 
180
- **Detect:** `browser_evaluate` get `getBoundingClientRect` of every clickable. Count targets < 44×44 px.
198
+ **Rationale:** Fitts's Law (MT = a + b·log₂(2D/W)) — acquisition time scales inversely with target width; sub-44px targets on fingers multiply error rate and exhaust motor patience.
199
+
200
+ **Detect:** `browser_evaluate` → get `getBoundingClientRect` of every
201
+ clickable (buttons, links, `[role=button|link|tab|checkbox|radio|switch]`,
202
+ `<input>`, `<select>`, `<summary>`, anchors with click handlers). Record
203
+ the smaller of `width × height` per target.
204
+
205
+ ### Spec reconciliation
206
+
207
+ Three conflicting specs define "how big a touch target should be":
208
+
209
+ | Spec | Size | Nature | Citation |
210
+ |---|---|---|---|
211
+ | **WCAG 2.5.8 Target Size (Minimum) — AA** | **24 × 24 CSS px** | Baseline (legal/accessibility floor, with spacing exception) | https://www.w3.org/WAI/WCAG22/Understanding/target-size-minimum |
212
+ | **Apple Human Interface Guidelines** | **44 × 44 pt** | Platform-native target (iOS) | https://developer.apple.com/design/human-interface-guidelines/accessibility#Interactivity |
213
+ | **Material Design (Android)** | **48 × 48 dp** | Platform-native target (Android) | https://m3.material.io/foundations/accessible-design/accessibility-basics |
214
+ | **WCAG 2.5.5 Target Size (Enhanced) — AAA** | 44 × 44 CSS px | Advisory ceiling | https://www.w3.org/WAI/WCAG21/Understanding/target-size |
215
+
216
+ sd-audit reconciles as follows:
217
+ - **Baseline = 24 × 24 CSS px** — WCAG 2.5.8 AA pass; legally sufficient
218
+ (with 24px center-to-center spacing exception).
219
+ - **Target = 44 × 44 CSS px** — HIG / Material / WCAG AAA; the single
220
+ pragmatic "platform-native" size across iOS + Android + web (Android
221
+ 48 dp ≈ 44 CSS px at default DPI).
222
+ - Spacing exception keeps sub-44 icons compliant with WCAG AA but does NOT
223
+ earn full design-intelligence points — they still feel cramped on a
224
+ phone, which is what Fitts's Law above predicts.
225
+
226
+ ### Scoring ladder
227
+
228
+ Per-target classification:
229
+ - **Full points** (≥ 44 × 44 CSS px) — platform-native, HIG/Material-clean.
230
+ - **Half points** (24 – 43 CSS px, min dimension) — WCAG AA pass but
231
+ sub-optimal; counts as half a compliant target.
232
+ - **Zero points** (< 24 CSS px, min dimension) — WCAG AA FAIL; raises a
233
+ separate `a11y-wcag22-2.5.8` finding in addition to pulling this score.
234
+
235
+ Let `N` = total targets, `n44` = count ≥ 44×44, `n24` = count in
236
+ [24, 44), `n0` = count < 24. Compute
237
+ `compliance = (n44 + 0.5 × n24) / N`.
181
238
 
182
239
  | Score | Criteria |
183
240
  |---|---|
184
- | 10 | 100% targets 44×44 OR 24×24 with 8px+ gap |
185
- | 7 | 80–99% compliant |
186
- | 4 | 50–80% compliant (common: icon-only buttons) |
187
- | 0 | Widespread <24px targets |
241
+ | 10 | `compliance ≥ 0.95` AND `n0 == 0` — essentially all targets ≥ 44 |
242
+ | 7 | `0.80 ≤ compliance < 0.95` AND `n0 == 0` — some half-credit (24–43 px) targets, none under 24 |
243
+ | 4 | `0.50 ≤ compliance < 0.80` OR `0 < n0 ≤ 2` — common icon-only button fail; any WCAG breach |
244
+ | 0 | `compliance < 0.50` OR `n0 ≥ 3` — widespread <24 px targets, structural problem |
245
+
246
+ Any `n0 > 0` ALWAYS also raises a separate finding with prefix
247
+ `a11y-wcag22-2.5.8` (the rubric scores design intelligence; the finding
248
+ records the legal breach).
188
249
 
189
250
  ---
190
251
 
191
- ## Category 11 — Motion & feedback (weight 0.6)
252
+ ## Category 11 — Motion & feedback / perceived performance (weight 0.6)
192
253
 
193
- **Question:** Do interactions give feedback? Are animations tasteful and respect `prefers-reduced-motion`?
254
+ **Rationale:** Doherty threshold (Doherty & Thadhani, IBM 1982) — system response <400 ms sustains flow; above that, perceived unresponsiveness begins. Paired with INP (Core Web Vitals) for the measurable proxy.
255
+
256
+ **Question:** Do interactions give feedback? Are animations tasteful, and do they respect `prefers-reduced-motion`? Does every interaction land within the Doherty ceiling?
194
257
 
195
258
  **Detect:**
196
259
  - `browser_evaluate` with `matchMedia('(prefers-reduced-motion: reduce)')` + check for `transition` / `animation` on interactive elements.
197
260
  - Missing hover/focus feedback on buttons = major fail.
198
261
  - >3s animations = excessive.
199
262
 
263
+ **Perceived-performance sub-criterion (Doherty 400 ms ceiling, alongside INP).**
264
+ - Parse `session_dir/vitals/<page>.json` for INP (from `web-vitals@5` attribution build).
265
+ - For each primary interaction (Step 2.5 Phase A enumeration — clicks on CTA, form submit, nav link, combobox, modal trigger), compute **end-to-end response time** = click → visual feedback (spinner / state change / new pixels painted), not just INP.
266
+ - Fail rule: an interaction that **passes INP** (≤ 200 ms rating "good") but whose **user-perceivable response exceeds 400 ms** (e.g., INP fires at 150 ms but the resulting navigation/paint lands at 900 ms with no intermediate skeleton/optimistic UI) **penalizes C11**. Doherty is the ceiling; INP is the low-floor subset. Apply the Nielsen 0.1 s / 1 s / 10 s progress rule for anything over 400 ms (skeleton, optimistic UI, determinate bar + ETA + cancel for >10 s).
267
+
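The fail rule reduces to a small predicate per interaction. A sketch; the input field names (`inpMs`, `endToEndMs`, `hasIntermediateState`) are this sketch's own, not web-vitals API names:

```javascript
function dohertyVerdict({ inpMs, endToEndMs, hasIntermediateState }) {
  // inpMs: the interaction's INP from the attribution build.
  // endToEndMs: click → meaningful paint, measured separately.
  // hasIntermediateState: skeleton / optimistic UI shown in the gap.
  const inpGood = inpMs <= 200;           // CWV "good" INP rating
  const underDoherty = endToEndMs <= 400; // Doherty ceiling
  // Passing INP does not excuse a blank wait past 400 ms.
  const penalized = !underDoherty && !hasIntermediateState;
  return { inpGood, underDoherty, penalizesC11: penalized };
}
```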
200
268
  | Score | Criteria |
201
269
  |---|---|
202
- | 10 | Hover/focus/active feedback everywhere; animations ≤300ms; reduced-motion respected |
203
- | 7 | Most interactions feedback; reduced-motion partial |
204
- | 4 | Some interactions static; reduced-motion ignored |
205
- | 0 | No hover/focus feedback at all OR autoplay video + parallax with no disable |
270
+ | 10 | Hover/focus/active feedback everywhere; animations ≤300 ms; reduced-motion respected; every interaction under Doherty 400 ms OR shows skeleton/optimistic state |
271
+ | 7 | Most interactions feedback; reduced-motion partial; occasional >400 ms interaction without feedback |
272
+ | 4 | Some interactions static; reduced-motion ignored; multiple interactions cross Doherty with no intermediate state |
273
+ | 0 | No hover/focus feedback at all OR autoplay video + parallax with no disable OR interactions routinely exceed 400 ms with blank waits |
206
274
 
207
275
  ---
208
276
 
209
277
  ## Category 12 — Nav pattern matches platform (weight 1.0)
210
278
 
279
+ **Rationale:** Fitts's Law + Hick-Hyman Law — nav patterns succeed when they minimize both motor cost (thumbzone/edge placement) and choice cost (limited top-level destinations, chunked per Miller 4±1).
280
+
211
281
  | Viewport | Expected nav |
212
282
  |---|---|
213
283
  | Mobile (≤768) | Bottom tab bar (3–5), full-screen menus, gesture back |
@@ -226,6 +296,9 @@ Per page, does the UI handle: default / loading / empty / error / success?
226
296
 
227
297
  ## Category 13 — Table-on-mobile detection (weight 1.2, mobile only)
228
298
 
299
+ **Rationale:** Platform affordance + thumbzone — desktop tables violate mobile reading models (microtext, horizontal overflow, no visible sort); transformation to card/list is the minimum cost to preserve parse-ability.
300
+
301
+
229
302
  **Detect:** At ≤768px, find `<table>` with >3 visible columns OR `display: table` containers with horizontal scroll AND text < 13px.
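A sketch of the same check over table descriptors measured beforehand (`columns`, `hasHScroll`, `minFontPx` are this sketch's shorthand for the DOM measurements named above: visible column count, `scrollWidth > clientWidth`, smallest computed font size inside the table):

```javascript
function desktopTableOnMobile(tables, viewportPx = 375) {
  if (viewportPx > 768) return []; // rule is mobile-only
  // Flag tables that are too wide OR scroll horizontally with microtext.
  return tables.filter(
    (t) => t.columns > 3 || (t.hasHScroll && t.minFontPx < 13)
  );
}
```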
230
303
 
231
304
  | Score | Criteria |
@@ -240,6 +313,8 @@ Per page, does the UI handle: default / loading / empty / error / success?
240
313
 
241
314
  ## Category 14 — Modal/sheet appropriateness (weight 0.8)
242
315
 
316
+ **Rationale:** Fitts's Law + thumbzone — on mobile, close affordances belong where the thumb lives; centered dialogs with top-right dismiss violate reach on phones and strand users in forced-modal states.
317
+
243
318
  | Viewport | Expected modal pattern |
244
319
  |---|---|
245
320
  | Mobile | Bottom sheet (slide-up) or full-screen with close top-left |
@@ -258,6 +333,8 @@ Per page, does the UI handle: default / loading / empty / error / success?
258
333
 
259
334
  ## Category 15 — Color semantics (weight 0.6)
260
335
 
336
+ **Rationale:** Jakob's Law (users spend most time on other products) + learned convention — red/green/amber mappings are pre-installed in users' mental models; using them decoratively forces re-learning and breaks status recognition at a glance.
337
+
261
338
  **Detect:** Collect colors used on: error messages, success states, warnings, info. Red = error? Green = success? Or decorative-only?
262
339
 
263
340
  | Score | Criteria |
@@ -270,6 +347,8 @@ Per page, does the UI handle: default / loading / empty / error / success?
270
347
 
271
348
  ## Category 16 — Design-system coherence (weight 1.1)
272
349
 
350
+ **Rationale:** Tesler's Law (conservation of complexity) + von Neumann consistency — complexity does not disappear, it moves; a disciplined system absorbs variation once inside tokens/variants/primitives so every downstream surface stays predictable. Incoherent systems push the same complexity onto users (re-learning each screen) and onto engineers (ad-hoc classes per component). This is why C16 carries one of the highest weights: coherence is not polish, it is the mechanism that conserves attention.
351
+
273
352
  **The meta-category.** Does the app LOOK like it was designed by one team with one vision? Or does it look like a collection of shadcn defaults?
274
353
 
275
354
  **Detect (aesthetic signal):**
@@ -290,6 +369,8 @@ session. See `design-skills-catalog.md`.
290
369
 
291
370
  ## Category 17 — Vibecode detection (weight 1.0)
292
371
 
372
+ **Rationale:** Norman's reflective layer (Emotional Design, 2004 — artifact line 547) — vibecoded surfaces pass the visceral/behavioral layers but fail reflective judgment; code that reads as "hand-assembled divs" telegraphs lack of intentional system, which is exactly what distinguishes "designed" from "vibecoded" output.
373
+
293
374
  **Question:** Does the code follow patterns (components, variants, tokens)
294
375
  or is it hand-assembled divs with inline styles?
295
376
 
@@ -62,6 +62,37 @@ it follows the skill's specific tokens + patterns instead of defaults.
62
62
  | Editorial / blog | Readable long-form | `typeui-paper` |
63
63
  | Feature-grid homepage | Modular showcase | `typeui-bento` |
64
64
 
65
+ ### Vibe → typeui skill (primary → fallback)
66
+
67
+ Covers every vibe enumerated in the artifact Part 4 (12-vibe vocabulary). The
68
+ primary skill carries the aesthetic; the fallback handles adjacent contexts or
69
+ fills gaps when the primary would over-commit. When a project vibe has no
70
+ single-perfect skill (e.g. Premium/luxury, Warm/organic), the fallback plus
71
+ `/frontend-design` is the intended path.
72
+
73
+ | Part-4 vibe | Primary skill | Fallback | Notes |
74
+ |---|---|---|---|
75
+ | Minimal / clean | `typeui-clean` | `typeui-application` | Default pick for pre-launch marketing and "honest SaaS". |
76
+ | Bold / confident | `typeui-bold` | `typeui-dramatic` | Challenger brands, consumer launches. |
77
+ | Playful / friendly | `typeui-doodle` | `typeui-artistic` | Education, kids, creative tools. |
78
+ | Serious / professional (B2B) | `typeui-enterprise` | `typeui-ant` | Procurement-facing, compliance. |
79
+ | Technical / data-dense (SaaS admin) | `typeui-dashboard` | `typeui-application` | Dark-theme analytics, operator consoles. |
80
+ | Editorial / reading | `typeui-paper` | `typeui-clean` | Long-form content, publications. |
81
+ | Modular / showcase | `typeui-bento` | `typeui-application` | Feature grids, portfolios. |
82
+ | Expressive / artistic | `typeui-artistic` | `typeui-dramatic` | Design tools, non-enterprise vibe-forward. |
83
+ | Raw / statement (neobrutalism) | `typeui-neobrutalism` | `typeui-bold` | Gen-Z, indie, deliberate rule-breaking. |
84
+ | Premium / luxury | `typeui-dramatic` | `typeui-paper` | No dedicated luxury skill — combine dramatic hero with paper's typographic restraint, then commission custom tokens via `/frontend-design`. |
85
+ | Tech / cyberpunk | `typeui-dashboard` | `typeui-bold` | Dashboard dark base + bold accent/glow; extend via `/frontend-design` for neon/chromatic detail. |
86
+ | Warm / organic | `typeui-paper` | `typeui-doodle` | Paper carries the warmth via texture + typographic rhythm; doodle adds hand-made detail for craft brands. |
87
+ | Retro / nostalgic | `typeui-paper` | `typeui-doodle` | Paper's print-era cues fit mid-century/editorial retro; doodle for 90s/zine nostalgia. `/frontend-design` required for period-specific palettes. |
88
+ | Dark / cinematic | `typeui-dramatic` | `typeui-dashboard` | Dramatic for narrative hero surfaces; dashboard for operator/app surfaces that must stay dark through the product. |
89
+
90
+ Read this table as: "if the positioning brief (sd-research §4) lands on vibe X,
91
+ sd-audit/sd-fix should recommend the **primary** skill first; if the project
92
+ has constraints that rule it out (e.g. already on a light palette), fall back
93
+ to the secondary; if both are partial, log a non-blocking advisory that
94
+ `/frontend-design` is required to finish the aesthetic."
95
+
65
96
  ## Recommending a skill in a finding
66
97
 
67
98
  When `design-intelligence.categories.design_system_coherence.score ≤ 4`,
@@ -1,30 +1,122 @@
1
1
  #!/usr/bin/env bash
2
2
  # Usage: detect-changes.sh <last_sha> [<last_iso>]
3
3
  # Emits JSON: { mode, range_start, commits, files, classified }
4
+ #
5
+ # Implements the fallback ladder from artifact §6 + §14:
6
+ # 1. Anchor exists → diff with rename detection (-M90%).
7
+ # 2. Anchor missing + shallow repo → `git fetch --unshallow --no-tags`.
8
+ # 3. Still missing → fetch super-design git notes, retry.
9
+ # 4. Still missing → `--since=<iso>` time-based fallback.
10
+ # 5. Empty repo / no last_sha at all → diff against empty-tree SHA
11
+ # 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (artifact line 900).
12
+ # Also uses --first-parent + --cherry-pick --right-only when walking
13
+ # history (artifact §2.5 line 167-172).
4
14
  set -euo pipefail
5
15
 
6
16
  LAST_SHA="${1:-}"
7
17
  LAST_ISO="${2:-}"
18
+ EMPTY_TREE="4b825dc642cb6eb9a060e54bf8d69288fbee4904"
19
+
20
+ log() { printf '[detect-changes] %s\n' "$*" >&2; }
8
21
 
9
- if [[ -z "$LAST_SHA" ]]; then echo '{"error":"missing last_sha"}'; exit 2; fi
10
22
  if ! git rev-parse --git-dir >/dev/null 2>&1; then echo '{"error":"not-a-git-repo"}'; exit 3; fi
11
23
 
12
- if git rev-parse --verify --quiet "${LAST_SHA}^{commit}" >/dev/null; then
13
- if git merge-base --is-ancestor "$LAST_SHA" HEAD; then RANGE_START="$LAST_SHA"
14
- else RANGE_START="$(git merge-base HEAD "$LAST_SHA" 2>/dev/null || echo "")"; fi
15
- else RANGE_START=""; fi
16
-
17
- if [[ -z "$RANGE_START" ]]; then
18
- if [[ -n "$LAST_ISO" ]]; then
19
- FILES="$(git log --since="$LAST_ISO" --name-only --pretty=format: 2>/dev/null | sort -u | sed '/^$/d' || true)"
20
- COMMITS="$(git log --since="$LAST_ISO" --pretty=format:'%H|%s|%an|%aI' 2>/dev/null || true)"
21
- MODE="since-time"
22
- else echo '{"error":"lost-anchor-no-fallback-time"}'; exit 4; fi
24
+ # Determine HEAD availability — empty repos have no commits at all.
25
+ if ! git rev-parse --verify --quiet HEAD >/dev/null; then
26
+ log "no HEAD commit; using empty-tree SHA fallback"
27
+ LAST_SHA="$EMPTY_TREE"
28
+ fi
29
+
30
+ # If caller didn't pass a last_sha, treat it as empty-tree (first audit).
31
+ if [[ -z "$LAST_SHA" ]]; then
32
+ log "missing last_sha; defaulting to empty-tree SHA"
33
+ LAST_SHA="$EMPTY_TREE"
34
+ fi
35
+
36
+ resolve_anchor() {
37
+ # Sets RANGE_START (may be empty if anchor unrecoverable).
38
+ if [[ "$LAST_SHA" == "$EMPTY_TREE" ]]; then
39
+ RANGE_START="$EMPTY_TREE"; return 0
40
+ fi
41
+ if git rev-parse --verify --quiet "${LAST_SHA}^{commit}" >/dev/null; then
42
+ if git merge-base --is-ancestor "$LAST_SHA" HEAD 2>/dev/null; then
43
+ RANGE_START="$LAST_SHA"
44
+ else
45
+ RANGE_START="$(git merge-base HEAD "$LAST_SHA" 2>/dev/null || echo "")"
46
+ fi
47
+ return 0
48
+ fi
49
+ RANGE_START=""
50
+ return 1
51
+ }
52
+
53
+ if ! resolve_anchor; then
54
+ # Ladder step (a): unshallow if possible.
55
+ if [[ "$(git rev-parse --is-shallow-repository 2>/dev/null || echo false)" == "true" ]]; then
56
+ log "anchor missing and repo is shallow; attempting git fetch --unshallow --no-tags"
57
+ git fetch --unshallow --no-tags 2>/dev/null || log "unshallow fetch failed (continuing)"
58
+ resolve_anchor || true
59
+ fi
60
+ fi
61
+ if [[ -z "${RANGE_START:-}" ]]; then
62
+ # Ladder step (b): try to fetch super-design notes; they may pin a commit
63
+ # we don't have locally.
64
+ log "anchor still missing; fetching refs/notes/super-design"
65
+ git fetch origin '+refs/notes/super-design:refs/notes/super-design' 2>/dev/null \
66
+ || log "notes fetch failed (continuing)"
67
+ resolve_anchor || true
68
+ fi
69
+
70
+ # Ladder step (c): time-based fallback.
71
+ if [[ -z "${RANGE_START:-}" && -n "$LAST_ISO" ]]; then
72
+ log "using --since=$LAST_ISO time-based fallback"
73
+ FILES="$(git log --since="$LAST_ISO" --name-only --pretty=format: 2>/dev/null | sort -u | sed '/^$/d' || true)"
74
+ COMMITS="$(git log --first-parent --since="$LAST_ISO" --pretty=format:'%H|%s|%an|%aI' 2>/dev/null || true)"
75
+ MODE="since-time"
76
+ RANGE_START=""
77
+ elif [[ -z "${RANGE_START:-}" ]]; then
78
+ echo '{"error":"lost-anchor-no-fallback-time"}'; exit 4
23
79
  else
24
- FILES="$(git diff --name-only "${RANGE_START}..HEAD" \
25
- -- ':!*.lock' ':!package-lock.json' ':!pnpm-lock.yaml' ':!yarn.lock' \
26
- ':!.github/**' ':!**/*.test.*' ':!**/*.spec.*' ':!**/*.stories.*' | sort -u)"
27
- COMMITS="$(git log --no-merges --pretty=format:'%H|%s|%an|%aI' "${RANGE_START}..HEAD" 2>/dev/null || true)"
80
+ # SHA range available. Use --name-status -M90% -z to catch renames AND
81
+ # filenames with spaces (NUL-terminated output).
82
+ RANGE="${RANGE_START}..HEAD"
83
+ if [[ "$RANGE_START" == "$EMPTY_TREE" ]]; then
84
+ # Empty-tree baseline: everything in HEAD is "new".
85
+ RANGE="${EMPTY_TREE} HEAD"
86
+ fi
87
+
88
+ # Parse NUL-terminated name-status output.
89
+ # Format: <STATUS>\0<path>[\0<new_path>] where STATUS can be A/M/D/Rnn/Cnn.
90
+ # We collapse to the post-rename path so downstream classification matches
91
+ # the current file layout.
92
+ FILES="$(
93
+ git diff --name-status -M90% -z \
94
+ $RANGE -- . ':!*.lock' ':!package-lock.json' ':!pnpm-lock.yaml' ':!yarn.lock' \
95
+ ':!.github/**' ':!**/*.test.*' ':!**/*.spec.*' ':!**/*.stories.*' \
96
+ 2>/dev/null |
97
+ awk -v RS='\0' '
98
+ BEGIN { status = "" }
99
+ {
100
+ if (status == "") { status = $0; next }
101
+ # Rename/copy has two path fields; we want the second (post-rename).
102
+ if (status ~ /^R/ || status ~ /^C/) {
103
+ if (wanted == "") { wanted = 1; next }
104
+ print; status = ""; wanted = ""
105
+ } else {
106
+ print; status = ""
107
+ }
108
+ }
109
+ ' | sort -u
110
+ )"
111
+
112
+ # --first-parent keeps one entry per merged PR (trunk-based). Combine
113
+ # with --cherry-pick --right-only to dedupe back-ports / rebased copies.
114
+ if [[ "$RANGE_START" == "$EMPTY_TREE" ]]; then
115
+ COMMITS="$(git log --first-parent --pretty=format:'%H|%s|%an|%aI' HEAD 2>/dev/null || true)"
116
+ else
117
+ COMMITS="$(git log --first-parent --cherry-pick --right-only --no-merges \
118
+ --pretty=format:'%H|%s|%an|%aI' "${RANGE_START}...HEAD" 2>/dev/null || true)"
119
+ fi
28
120
  MODE="sha-range"
29
121
  fi
30
122
 
@@ -32,7 +124,7 @@ declare -A CLASSIFIED
32
124
  while IFS= read -r p; do
33
125
  [[ -z "$p" ]] && continue
34
126
  case "$p" in
35
- tailwind.config.*|*.tokens.json|styles/tokens.css|styles/theme.css) CLASSIFIED[tokens]+="$p,";;
127
+ tailwind.config.*|*.tokens.json|*.tokens|styles/tokens.css|styles/theme.css) CLASSIFIED[tokens]+="$p,";;
36
128
  components/*|src/components/*|app/_components/*) CLASSIFIED[components]+="$p,";;
37
129
  app/*/page.*|app/page.*|app/*/route.*|app/route.*|pages/*|src/pages/*|app/routes/*|src/routes/*) CLASSIFIED[routes]+="$p,";;
38
130
  public/*|src/assets/*|assets/*) CLASSIFIED[imagery]+="$p,";;
@@ -43,7 +135,7 @@ while IFS= read -r p; do
43
135
  esac
44
136
  done <<< "$FILES"
45
137
 
46
- jq -Rn --arg mode "$MODE" --arg range_start "$RANGE_START" --arg last_iso "$LAST_ISO" \
138
+ jq -Rn --arg mode "$MODE" --arg range_start "${RANGE_START:-}" --arg last_iso "$LAST_ISO" \
47
139
  --arg tokens "${CLASSIFIED[tokens]:-}" --arg components "${CLASSIFIED[components]:-}" \
48
140
  --arg routes "${CLASSIFIED[routes]:-}" --arg imagery "${CLASSIFIED[imagery]:-}" \
49
141
  --arg deps "${CLASSIFIED[deps]:-}" --arg theory "${CLASSIFIED[theory]:-}" \
@@ -59,3 +151,6 @@ jq -Rn --arg mode "$MODE" --arg range_start "$RANGE_START" --arg last_iso "$LAST
59
151
  deps: tolist($deps), theory: tolist($theory), content: tolist($content)
60
152
  }
61
153
  }'
154
+
155
+ # TODO(sd-audit-state §11/artifact line 902): monorepo per-app state
156
+ # (apps/*/docs/super-design/.audit-state.json) not yet supported.
@@ -1,4 +1,11 @@
1
1
  #!/usr/bin/env bash
2
+ # TODO(sd-audit-state artifact §14): dynamic routes like `/posts/[slug]`
3
+ # should be expanded to `/posts/@fixture-<id>` using a fixtures manifest
4
+ # (discovered from tests or a user-configured JSON) before being passed
5
+ # to hash-pages/sd-audit. Current impl emits the raw pattern.
6
+ # TODO(sd-audit-state artifact §8): madge-based import-graph builder
7
+ # (`madge --json --ts-config tsconfig.json src`) to compute N-hop
8
+ # component → page blast radius. Expected alongside this script.
2
9
  set -euo pipefail
3
10
 
4
11
  detect_framework() {
@@ -1,6 +1,11 @@
1
1
  #!/usr/bin/env node
2
+ // Extract + canonicalize + hash design tokens from Tailwind configs and
3
+ // CSS custom properties. Output is deterministic regardless of insertion
4
+ // order (artifact §9.2 line 722: "canonical = JSON.stringify(theme,
5
+ // Object.keys(theme).sort())").
2
6
  import fs from "node:fs";
3
7
  import path from "node:path";
8
+ import { createHash } from "node:crypto";
4
9
  import { pathToFileURL } from "node:url";
5
10
 
6
11
  const out = {};
@@ -28,12 +33,24 @@ try {
28
33
  }
29
34
  } catch (e) { out._postcss_error = String(e.message || e); }
30
35
 
31
- console.log(JSON.stringify(out, null, 2));
36
+ // TODO(sd-audit-state §9.1 artifact line 707-709): DTCG *.tokens.json and
37
+ // Tokens Studio support — parse JSON, resolve `{alias}` refs per §9.2 line
38
+ // 747 before hashing, then merge into `out` with prefix `dtcg:`.
39
+
40
+ // Deterministic canonical form: keys sorted top-to-bottom.
41
+ const sorted = Object.keys(out).sort().reduce((acc, k) => { acc[k] = out[k]; return acc; }, {});
42
+ const canonical = JSON.stringify(sorted);
43
+ const tokens_hash = "sha256:" + createHash("sha256").update(canonical).digest("hex");
44
+
45
+ console.log(JSON.stringify({ tokens: sorted, tokens_hash }, null, 2));
32
46
 
33
47
  function flatten(obj, prefix, acc) {
34
48
  if (obj == null) return;
35
49
  if (typeof obj !== "object") { acc[prefix] = String(obj); return; }
36
- for (const [k, v] of Object.entries(obj)) {
50
+ // Sort keys so hash is stable across JS engines / config formatters.
51
+ // Artifact §9.2 line 722 calls this out explicitly.
52
+ for (const k of Object.keys(obj).sort()) {
53
+ const v = obj[k];
37
54
  const key = `${prefix}.${k}`;
38
55
  if (v && typeof v === "object" && !Array.isArray(v)) flatten(v, key, acc);
39
56
  else acc[key] = Array.isArray(v) ? v.join(",") : String(v);
@@ -1,5 +1,11 @@
1
1
  #!/usr/bin/env bash
2
2
  # Usage: hash-pages.sh <urls_file>
3
+ #
4
+ # TODO(sd-audit-state artifact §10 line 492, §16 line 1367-1384):
5
+ # Per-viewport hashes (mobile_375 / tablet_768 / desktop_1280) + pHash
6
+ # for perceptual similarity, plus mask_selectors passed to
7
+ # page.screenshot({ mask: [...] }) for deterministic diffs. Current impl
8
+ # hashes a single desktop viewport only.
3
9
  set -euo pipefail
4
10
  URLS="${1:?usage: hash-pages.sh <urls_file>}"
5
11
  OUT_DIR="${OUT_DIR:-docs/super-design/.cache/hashes}"
@@ -0,0 +1,21 @@
1
+ #!/usr/bin/env bash
2
+ # Usage: setup-git-notes.sh
3
+ #
4
+ # One-shot setup so `git notes --ref=super-design` round-trips across
5
+ # clones. Without the remote refspec, git fetch ignores notes by default
6
+ # (artifact §7 line 570-573).
7
+ set -euo pipefail
8
+
9
+ if ! git rev-parse --git-dir >/dev/null 2>&1; then
10
+ echo '{"error":"not-a-git-repo"}' >&2; exit 3
11
+ fi
12
+
13
+ # Idempotent: only add if absent.
14
+ if git config --get-all remote.origin.fetch 2>/dev/null |
15
+ grep -q 'refs/notes/super-design'; then
16
+ echo '{"status":"already-configured"}'
17
+ else
18
+ git config --add remote.origin.fetch \
19
+ '+refs/notes/super-design:refs/notes/super-design'
20
+ echo '{"status":"added","ref":"refs/notes/super-design"}'
21
+ fi
@@ -1,14 +1,46 @@
  #!/usr/bin/env bash
+ # Usage: validate-state.sh [<state_path>]
+ #
+ # Validates the super-design audit state file. On schema/parse errors,
+ # moves the broken file aside (artifact §3 "Graceful corruption handling"
+ # line 74) and emits a JSON verdict. Also enforces schema_version major
+ # compatibility (artifact §12 line 934).
  set -euo pipefail
  STATE="${1:-docs/super-design/.audit-state.json}"
+
+ # Current schema major is either read from a sibling .schema-version file
+ # (so the number can be bumped without editing shell) or falls back to 1.
+ SCHEMA_VERSION_FILE="$(dirname "$0")/../.schema-version"
+ if [[ -f "$SCHEMA_VERSION_FILE" ]]; then
+ CURRENT_SCHEMA_MAJOR="$(cut -d. -f1 <"$SCHEMA_VERSION_FILE" | tr -d '[:space:]')"
+ else
+ CURRENT_SCHEMA_MAJOR=1
+ fi
+
  if [[ ! -f "$STATE" ]]; then echo '{"status":"missing"}'; exit 2; fi
- jq -e '
+
+ # Parse + shape check. On failure, rename so the user can inspect and we
+ # fall through to first-audit (SKILL.md Step 1 treats "corrupt" that way).
+ if ! jq -e '
  (.schema_version | type == "string") and
  (.last_audit_at | fromdateiso8601 | . > 0) and
  (.git_sha_at_audit | test("^[0-9a-f]{7,64}$")) and
  (.skill_version | type == "string") and
  (.tools | type == "object")
- ' "$STATE" >/dev/null 2>&1 || { echo '{"status":"corrupt"}'; exit 2; }
+ ' "$STATE" >/dev/null 2>&1; then
+ mv "$STATE" "$STATE.corrupt-$(date +%s)" 2>/dev/null || true
+ echo '{"status":"corrupt"}'; exit 2
+ fi
+
+ # schema_version major-bump check — if state was written by a newer OR
+ # incompatible-older skill, force a full re-audit rather than silently
+ # trusting the shape.
+ STATE_MAJOR="$(jq -r '.schema_version' "$STATE" | cut -d. -f1)"
+ if [[ -z "$STATE_MAJOR" || "$STATE_MAJOR" != "$CURRENT_SCHEMA_MAJOR" ]]; then
+ echo "{\"status\":\"schema-incompatible\",\"action\":\"force-full\",\"state_major\":\"${STATE_MAJOR:-unknown}\",\"current_major\":\"${CURRENT_SCHEMA_MAJOR}\"}"
+ exit 1
+ fi
+
  AGE_DAYS=$(( ( $(date -u +%s) - $(jq -r '.last_audit_at | fromdateiso8601' "$STATE") ) / 86400 ))
  if (( AGE_DAYS > 180 )); then echo "{\"status\":\"stale-force-full\",\"age_days\":$AGE_DAYS}"; exit 1
  elif (( AGE_DAYS > 90 )); then echo "{\"status\":\"stale-refresh-research\",\"age_days\":$AGE_DAYS}"; exit 1
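The jq shape check can be exercised standalone. Here is a minimal state document that passes it, plus one value that trips the sha regex (field values are illustrative, not real audit output):

```shell
state='{"schema_version":"1.0","last_audit_at":"2026-04-19T00:00:00Z","git_sha_at_audit":"abc1234","skill_version":"0.6.1","tools":{}}'
printf '%s' "$state" | jq -e '
  (.schema_version | type == "string") and
  (.last_audit_at | fromdateiso8601 | . > 0) and
  (.git_sha_at_audit | test("^[0-9a-f]{7,64}$")) and
  (.skill_version | type == "string") and
  (.tools | type == "object")
' >/dev/null && echo "shape-ok"
# Uppercase hex fails the lowercase-only regex, so this state would be
# renamed aside and reported as corrupt:
printf '%s' '{"git_sha_at_audit":"ABC1234"}' |
  jq -e '.git_sha_at_audit | test("^[0-9a-f]{7,64}$")' >/dev/null || echo "shape-bad"
```

Note the exit-code contract callers rely on: 2 means missing/corrupt (start a first audit), 1 means stale or schema-incompatible (force a re-audit), 0 means the state is usable.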
@@ -0,0 +1,26 @@
+ #!/usr/bin/env bash
+ # Usage: write-state.sh [<target>] (reads JSON body from stdin)
+ #
+ # Atomic state writer: accepts JSON on stdin, writes to <target>.tmp,
+ # validates with jq, then renames in place. Referenced from
+ # docs/compass_artifact §11 ("Write-then-rename (atomic)") and SKILL.md
+ # Step 4 ("Atomic write .audit-state.json").
+ set -euo pipefail
+
+ TARGET="${1:-docs/super-design/.audit-state.json}"
+ TMP="${TARGET}.tmp"
+
+ mkdir -p "$(dirname "$TARGET")"
+
+ # Drain stdin into the tmp file.
+ cat >"$TMP"
+
+ # Validate it is parseable JSON before swapping.
+ if ! jq -e 'type == "object"' "$TMP" >/dev/null 2>&1; then
+ rm -f "$TMP"
+ echo '{"error":"invalid-json-on-stdin"}' >&2
+ exit 2
+ fi
+
+ mv -f "$TMP" "$TARGET"
+ echo "{\"status\":\"written\",\"path\":\"$TARGET\"}"
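End-to-end, the same write-then-rename steps can be inlined. A sketch against temp paths with illustrative field values; real callers pipe into write-state.sh and take the default docs/super-design target:

```shell
target="$(mktemp -d)/audit-state.json"
# Build a plausible state body and stage it in the .tmp sidecar.
jq -n '{schema_version:"1.0", last_audit_at:(now|todate),
        git_sha_at_audit:"abc1234", skill_version:"0.6.1", tools:{}}' >"${target}.tmp"
# Swap in only if the staged file parses as a JSON object; the rename is
# atomic because .tmp lives in the same directory as the target.
jq -e 'type == "object"' "${target}.tmp" >/dev/null && mv -f "${target}.tmp" "$target"
jq -r '.schema_version' "$target"   # → 1.0
```

A reader at any moment sees either the old complete file or the new complete file, never a half-written one, which is the point of staging through `.tmp` rather than redirecting stdin straight into the target.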