@bastani/atomic 0.8.20 → 0.8.21-0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (124)
  1. package/CHANGELOG.md +6 -0
  2. package/dist/builtin/intercom/package.json +1 -1
  3. package/dist/builtin/mcp/package.json +1 -1
  4. package/dist/builtin/subagents/agents/code-simplifier.md +78 -22
  5. package/dist/builtin/subagents/agents/debugger.md +4 -3
  6. package/dist/builtin/subagents/package.json +1 -1
  7. package/dist/builtin/web-access/package.json +1 -1
  8. package/dist/builtin/workflows/CHANGELOG.md +19 -0
  9. package/dist/builtin/workflows/package.json +1 -1
  10. package/dist/builtin/workflows/skills/create-spec/SKILL.md +169 -125
  11. package/dist/builtin/workflows/skills/impeccable/SKILL.md +89 -80
  12. package/dist/builtin/workflows/skills/impeccable/agents/impeccable_asset_producer.toml +92 -0
  13. package/dist/builtin/workflows/skills/impeccable/agents/impeccable_manual_edit_applier.toml +95 -0
  14. package/dist/builtin/workflows/skills/impeccable/agents/openai.yaml +4 -0
  15. package/dist/builtin/workflows/skills/impeccable/reference/adapt.md +122 -1
  16. package/dist/builtin/workflows/skills/impeccable/reference/animate.md +38 -12
  17. package/dist/builtin/workflows/skills/impeccable/reference/audit.md +5 -5
  18. package/dist/builtin/workflows/skills/impeccable/reference/bolder.md +7 -7
  19. package/dist/builtin/workflows/skills/impeccable/reference/brand.md +4 -14
  20. package/dist/builtin/workflows/skills/impeccable/reference/clarify.md +115 -1
  21. package/dist/builtin/workflows/skills/impeccable/reference/codex.md +3 -3
  22. package/dist/builtin/workflows/skills/impeccable/reference/colorize.md +109 -6
  23. package/dist/builtin/workflows/skills/impeccable/reference/craft.md +7 -7
  24. package/dist/builtin/workflows/skills/impeccable/reference/critique.md +623 -94
  25. package/dist/builtin/workflows/skills/impeccable/reference/delight.md +2 -2
  26. package/dist/builtin/workflows/skills/impeccable/reference/distill.md +2 -2
  27. package/dist/builtin/workflows/skills/impeccable/reference/document.md +16 -14
  28. package/dist/builtin/workflows/skills/impeccable/reference/extract.md +1 -1
  29. package/dist/builtin/workflows/skills/impeccable/reference/harden.md +1 -1
  30. package/dist/builtin/workflows/skills/impeccable/reference/init.md +172 -0
  31. package/dist/builtin/workflows/skills/impeccable/reference/interaction-design.md +0 -6
  32. package/dist/builtin/workflows/skills/impeccable/reference/layout.md +33 -13
  33. package/dist/builtin/workflows/skills/impeccable/reference/live.md +96 -19
  34. package/dist/builtin/workflows/skills/impeccable/reference/onboard.md +1 -1
  35. package/dist/builtin/workflows/skills/impeccable/reference/optimize.md +1 -1
  36. package/dist/builtin/workflows/skills/impeccable/reference/overdrive.md +1 -1
  37. package/dist/builtin/workflows/skills/impeccable/reference/polish.md +3 -4
  38. package/dist/builtin/workflows/skills/impeccable/reference/product.md +1 -3
  39. package/dist/builtin/workflows/skills/impeccable/reference/quieter.md +2 -2
  40. package/dist/builtin/workflows/skills/impeccable/reference/shape.md +5 -5
  41. package/dist/builtin/workflows/skills/impeccable/reference/typeset.md +158 -3
  42. package/dist/builtin/workflows/skills/impeccable/scripts/cleanup-deprecated.mjs +1 -1
  43. package/dist/builtin/workflows/skills/impeccable/scripts/command-metadata.json +2 -2
  44. package/dist/builtin/workflows/skills/impeccable/scripts/context-signals.mjs +225 -0
  45. package/dist/builtin/workflows/skills/impeccable/scripts/context.mjs +266 -0
  46. package/dist/builtin/workflows/skills/impeccable/scripts/critique-storage.mjs +17 -1
  47. package/dist/builtin/workflows/skills/impeccable/scripts/design-parser.mjs +16 -1
  48. package/dist/builtin/workflows/skills/impeccable/scripts/detect.mjs +21 -0
  49. package/dist/builtin/workflows/skills/impeccable/scripts/detector/browser/injected/index.mjs +1725 -0
  50. package/dist/builtin/workflows/skills/impeccable/scripts/detector/cli/main.mjs +244 -0
  51. package/dist/builtin/workflows/skills/impeccable/scripts/detector/detect-antipatterns-browser.js +4543 -0
  52. package/dist/builtin/workflows/skills/impeccable/scripts/detector/detect-antipatterns.mjs +43 -0
  53. package/dist/builtin/workflows/skills/impeccable/scripts/detector/engines/browser/detect-url.mjs +252 -0
  54. package/dist/builtin/workflows/skills/impeccable/scripts/detector/engines/regex/detect-text.mjs +535 -0
  55. package/dist/builtin/workflows/skills/impeccable/scripts/detector/engines/static-html/css-cascade.mjs +986 -0
  56. package/dist/builtin/workflows/skills/impeccable/scripts/detector/engines/static-html/detect-html.mjs +208 -0
  57. package/dist/builtin/workflows/skills/impeccable/scripts/detector/engines/visual/screenshot-contrast.mjs +189 -0
  58. package/dist/builtin/workflows/skills/impeccable/scripts/detector/findings.mjs +12 -0
  59. package/dist/builtin/workflows/skills/impeccable/scripts/detector/node/file-system.mjs +198 -0
  60. package/dist/builtin/workflows/skills/impeccable/scripts/detector/profile/profiler.mjs +166 -0
  61. package/dist/builtin/workflows/skills/impeccable/scripts/detector/registry/antipatterns.mjs +419 -0
  62. package/dist/builtin/workflows/skills/impeccable/scripts/detector/rules/checks.mjs +2316 -0
  63. package/dist/builtin/workflows/skills/impeccable/scripts/detector/shared/color.mjs +124 -0
  64. package/dist/builtin/workflows/skills/impeccable/scripts/detector/shared/constants.mjs +101 -0
  65. package/dist/builtin/workflows/skills/impeccable/scripts/detector/shared/page.mjs +7 -0
  66. package/dist/builtin/workflows/skills/impeccable/scripts/impeccable-paths.mjs +17 -1
  67. package/dist/builtin/workflows/skills/impeccable/scripts/is-generated.mjs +2 -2
  68. package/dist/builtin/workflows/skills/impeccable/scripts/live-accept.mjs +139 -96
  69. package/dist/builtin/workflows/skills/impeccable/scripts/live-browser.js +4491 -526
  70. package/dist/builtin/workflows/skills/impeccable/scripts/live-commit-manual-edits.mjs +1241 -0
  71. package/dist/builtin/workflows/skills/impeccable/scripts/live-copy-edit-agent.mjs +683 -0
  72. package/dist/builtin/workflows/skills/impeccable/scripts/live-discard-manual-edits.mjs +51 -0
  73. package/dist/builtin/workflows/skills/impeccable/scripts/live-event-validation.mjs +136 -0
  74. package/dist/builtin/workflows/skills/impeccable/scripts/live-inject.mjs +22 -9
  75. package/dist/builtin/workflows/skills/impeccable/scripts/live-insert-ui.mjs +458 -0
  76. package/dist/builtin/workflows/skills/impeccable/scripts/live-insert.mjs +232 -0
  77. package/dist/builtin/workflows/skills/impeccable/scripts/live-manual-edit-evidence.mjs +363 -0
  78. package/dist/builtin/workflows/skills/impeccable/scripts/live-manual-edits-buffer.mjs +152 -0
  79. package/dist/builtin/workflows/skills/impeccable/scripts/live-poll.mjs +288 -110
  80. package/dist/builtin/workflows/skills/impeccable/scripts/live-resume.mjs +47 -1
  81. package/dist/builtin/workflows/skills/impeccable/scripts/live-server.mjs +1443 -100
  82. package/dist/builtin/workflows/skills/impeccable/scripts/live-session-store.mjs +17 -0
  83. package/dist/builtin/workflows/skills/impeccable/scripts/live-status.mjs +17 -3
  84. package/dist/builtin/workflows/skills/impeccable/scripts/live-wrap.mjs +216 -6
  85. package/dist/builtin/workflows/skills/impeccable/scripts/live.mjs +2 -3
  86. package/dist/builtin/workflows/skills/impeccable/scripts/palette.mjs +633 -0
  87. package/dist/builtin/workflows/skills/impeccable/scripts/pin.mjs +1 -1
  88. package/dist/builtin/workflows/src/extension/index.ts +67 -3
  89. package/dist/builtin/workflows/src/extension/render-result.ts +26 -1
  90. package/dist/builtin/workflows/src/runs/foreground/executor.ts +227 -3
  91. package/dist/builtin/workflows/src/runs/foreground/stage-runner.ts +94 -7
  92. package/dist/builtin/workflows/src/shared/stage-prompt.ts +326 -0
  93. package/dist/builtin/workflows/src/shared/stage-ui-broker.ts +62 -7
  94. package/dist/builtin/workflows/src/shared/store-types.ts +43 -0
  95. package/dist/builtin/workflows/src/shared/store.ts +37 -0
  96. package/dist/builtin/workflows/src/tui/chat-surface-message.ts +22 -4
  97. package/dist/builtin/workflows/src/tui/graph-view.ts +47 -0
  98. package/dist/builtin/workflows/src/tui/overlay-adapter.ts +43 -1
  99. package/dist/builtin/workflows/src/tui/run-detail.ts +10 -4
  100. package/dist/builtin/workflows/src/tui/stage-chat-view.ts +117 -15
  101. package/dist/builtin/workflows/src/tui/workflow-attach-pane.ts +9 -0
  102. package/dist/core/skills.d.ts.map +1 -1
  103. package/dist/core/skills.js +2 -5
  104. package/dist/core/skills.js.map +1 -1
  105. package/dist/core/system-prompt.d.ts.map +1 -1
  106. package/dist/core/system-prompt.js +11 -29
  107. package/dist/core/system-prompt.js.map +1 -1
  108. package/dist/index.d.ts +1 -0
  109. package/dist/index.d.ts.map +1 -1
  110. package/dist/index.js +3 -0
  111. package/dist/index.js.map +1 -1
  112. package/docs/quickstart.md +1 -2
  113. package/package.json +4 -4
  114. package/dist/builtin/workflows/skills/impeccable/reference/cognitive-load.md +0 -106
  115. package/dist/builtin/workflows/skills/impeccable/reference/color-and-contrast.md +0 -105
  116. package/dist/builtin/workflows/skills/impeccable/reference/heuristics-scoring.md +0 -234
  117. package/dist/builtin/workflows/skills/impeccable/reference/motion-design.md +0 -109
  118. package/dist/builtin/workflows/skills/impeccable/reference/personas.md +0 -179
  119. package/dist/builtin/workflows/skills/impeccable/reference/responsive-design.md +0 -114
  120. package/dist/builtin/workflows/skills/impeccable/reference/spatial-design.md +0 -100
  121. package/dist/builtin/workflows/skills/impeccable/reference/teach.md +0 -156
  122. package/dist/builtin/workflows/skills/impeccable/reference/typography.md +0 -159
  123. package/dist/builtin/workflows/skills/impeccable/reference/ux-writing.md +0 -107
  124. package/dist/builtin/workflows/skills/impeccable/scripts/load-context.mjs +0 -141
@@ -1,113 +1,100 @@
1
- > **Additional context needed**: what the interface is trying to accomplish.
1
+ ### Purpose
2
2
 
3
- ### Setup: Resolve Target and Load Ignore List
3
+ Resolve one stable target, run two independent assessments, synthesize a design critique, persist a snapshot, and ask the user what to improve next. The chat response is the primary deliverable; the snapshot is an archive/backlog for future commands.
4
4
 
5
- Before gathering assessments, do two small bookkeeping steps. They cost almost nothing and they're what makes critique iterative across runs.
5
+ ### Hard Invariants
6
6
 
7
- 1. **Resolve the primary artifact.** The user's phrasing ("the homepage", "the pricing flow") is not stable enough to track across runs. Resolve it to a concrete file path or URL: the same one you'd already need to scan code or open in a browser. Examples:
8
- - "the homepage" -> `site/pages/index.astro` (or `http://localhost:3000/` if you're inspecting live)
9
- - "the settings modal" -> the primary component file (e.g., `src/components/Settings.tsx`)
10
- - "this page" -> the URL or the page's source file
11
- Prefer the source file path over the dev-server URL when both exist; ports drift between runs (`bun dev` vs `bun preview`), file paths don't.
7
+ - Assessment A (design review) and Assessment B (detector/browser evidence) are both required.
8
+ - Assessment A must finish before detector findings enter the parent synthesis context. Detector output is deterministic, but it still anchors judgment.
9
+ - If sub-agents are unavailable, fall back sequentially: finish and record Assessment A first, then run Assessment B, then synthesize.
10
+ - A skipped detector is a failed critique run unless `detect.mjs` is missing or crashes after a real attempt.
11
+ - Viewable targets require browser inspection when available.
12
+ - Any local server started only for critique visualization must run in the background, have a recorded stop method, and be stopped before final reporting unless the user asks to keep it.
13
+ - Do not claim a user-visible overlay exists unless script injection succeeded and the detector ran in the page.
12
14
 
13
- 2. **Compute the slug.** Run:
15
+ ### Setup
16
+
17
+ 1. **Resolve the target** to a concrete file path or URL. Prefer a source path over a dev-server URL when both identify the same surface; ports drift, paths do not.
18
+ - "the homepage" -> `site/pages/index.astro` or `index.html`
19
+ - "the settings modal" -> the primary component file
20
+ - "this page" -> the current URL or source file
21
+ 2. **Compute the slug**:
14
22
  ```bash
15
- node .claude/skills/impeccable/scripts/critique-storage.mjs slug "<resolved-path-or-url>"
23
+ node .agents/skills/impeccable/scripts/critique-storage.mjs slug "<resolved-path-or-url>"
16
24
  ```
17
- Keep the printed slug. It identifies this target's stream across runs. If the command exits non-zero ("no stable slug for input"), skip persistence for this run and tell the user; the trend won't update but the critique still goes ahead.
18
-
19
- 3. **Read the ignore list** at `.impeccable/critique/ignore.md` if it exists. Plain markdown; each non-empty, non-comment line is something the user has marked as "do not re-raise" (deferred tradeoffs, designer-intended deviations, detector false-positives the user accepts). When a finding's text matches a line here (case-insensitive substring against rule name or snippet), **drop it silently**. Do not mention it in the report. This is the ONLY input critique consumes from prior runs; anchoring on prior findings would defeat the point of independent assessment.
20
-
21
- ### Gather Assessments
25
+ Keep it. If the command exits non-zero, skip persistence and trend for this run, but continue the critique.
26
+ 3. **Read `.impeccable/critique/ignore.md`** if it exists. Drop matching findings silently; it is the only prior-run input critique consumes.
22
27
 
23
- Launch two independent assessments. **Neither may see the other's output.** This isolation is what makes the combined score honest. Running both in one head silently anchors them to each other; do not shortcut it for cost, speed, or context-size reasons.
28
+ ### Assessment Orchestration
24
29
 
25
- Delegate each assessment to a separate sub-agent (Claude Code's `Agent` tool, Codex's subagent spawning, etc.). Each returns structured findings as text. Do NOT output findings to the user yet.
30
+ Delegate Assessment A and Assessment B to separate sub-agents when possible. They must not see each other's output. Do not show findings to the user until synthesis.
26
31
 
27
- Fall back to sequential in-head work only if the environment genuinely cannot spawn sub-agents.
32
+ Codex sub-agent gate:
33
+ - If `spawn_agent` is exposed and the user explicitly allowed sub-agents, delegation, or parallel agent work, spawn A and B immediately.
34
+ - If `spawn_agent` is exposed but the user did not explicitly allow sub-agents, ask exactly once: "Impeccable critique is designed to run two independent sub-agents for an unanchored assessment. May I use sub-agents for this critique?" Then stop until the user answers.
35
+ - If allowed, spawn A and B. If declined, run sequentially and report `Assessment independence: degraded (sub-agents declined by user)`.
36
+ - If `spawn_agent` is not exposed, do not ask; run sequentially and report `Assessment independence: degraded (spawn_agent unavailable in this session)`.
37
+ - If spawning fails after permission, run sequentially and report `Assessment independence: degraded (sub-agent spawn failed: <exact error>)`.
38
+ Prefer `fork_context: false` with self-contained prompts containing cwd, target, live URL, references, product context, and output contract. If using `fork_context: true`, omit `agent_type`, `model`, and `reasoning_effort`.
28
39
 
29
- **Tab isolation**: When browser automation is available, each assessment MUST create its own new tab. Never reuse an existing tab, even if one is already open at the correct URL. This prevents the two assessments from interfering with each other's page state.
30
-
31
- #### Assessment A: LLM Design Review
32
-
33
- Read the relevant source files (HTML, CSS, JS/TS) and, if browser automation is available, visually inspect the live page. **Create a new tab** for this; do not reuse existing tabs. After navigation, label the tab by setting the document title:
34
- ```javascript
35
- document.title = '[LLM] ' + document.title;
36
- ```
37
- Think like a design director. Evaluate:
40
+ If browser automation is available, each assessment creates its own new tab. Never reuse an existing tab, even if it is already at the right URL.
38
41
 
39
- **AI Slop Detection (CRITICAL)**: Does this look like every other AI-generated interface? Review against ALL **DON'T** guidelines from the parent impeccable skill (already loaded in this context). Check for AI color palette, gradient text, dark glows, glassmorphism, hero metric layouts, identical card grids, generic fonts, and all other tells. **The test**: If someone said "AI made this," would you believe them immediately?
42
+ ### Assessment A: Design Review
40
43
 
41
- **Holistic Design Review**: visual hierarchy (eye flow, primary action clarity), information architecture (structure, grouping, cognitive load), emotional resonance (does it match brand and audience?), discoverability (are interactive elements obvious?), composition (balance, whitespace, rhythm), typography (hierarchy, readability, font choices), color (purposeful use, cohesion, accessibility), states & edge cases (empty, loading, error, success), microcopy (clarity, tone, helpfulness).
44
+ Read relevant source files and visually inspect the live page when browser automation is available. Think like a design director.
42
45
 
43
- **Cognitive Load** (consult [cognitive-load](cognitive-load.md)):
44
- - Run the 8-item cognitive load checklist. Report failure count: 0-1 = low (good), 2-3 = moderate, 4+ = critical.
45
- - Count visible options at each decision point. If >4, flag it.
46
- - Check for progressive disclosure: is complexity revealed only when needed?
46
+ Evaluate:
47
+ - **AI slop**: Would someone believe "AI made this" immediately? Check all DON'T guidance from the parent Impeccable skill.
48
+ - **Holistic design**: hierarchy, IA, emotional fit, discoverability, composition, typography, color, accessibility, states, copy, and edge cases.
49
+ - **Cognitive load**: consult the [Cognitive Load Assessment](#cognitive-load-assessment) section below; report checklist failures and decision points with >4 visible options.
50
+ - **Emotional journey**: peak-end rule, emotional valleys, reassurance at high-stakes moments.
51
+ - **Nielsen heuristics**: consult the [Heuristics Scoring Guide](#heuristics-scoring-guide) section below; score all 10 heuristics 0-4.
47
52
 
48
- **Emotional Journey**:
49
- - What emotion does this interface evoke? Is that intentional?
50
- - **Peak-end rule**: Is the most intense moment positive? Does the experience end well?
51
- - **Emotional valleys**: Check for anxiety spikes at high-stakes moments (payment, delete, commit). Are there design interventions (progress indicators, reassurance copy, undo options)?
53
+ Return: AI slop verdict, heuristic scores, cognitive load, emotional journey, 2-3 strengths, 3-5 priority issues, persona red flags, minor observations, and provocative questions.
52
54
 
53
- **Nielsen's Heuristics** (consult [heuristics-scoring](heuristics-scoring.md)):
54
- Score each of the 10 heuristics 0-4. This scoring will be presented in the report.
55
+ ### Assessment B: Detector + Browser Evidence
55
56
 
56
- Return structured findings covering: AI slop verdict, heuristic scores, cognitive load assessment, what's working (2-3 items), priority issues (3-5 with what/why/fix), minor observations, and provocative questions.
57
+ Run the bundled detector and browser visualization evidence. Assessment B is mandatory and must remain isolated from Assessment A until both are complete.
57
58
 
58
- #### Assessment B: Automated Detection
59
-
60
- Run the bundled deterministic detector, which flags 27 specific patterns (AI slop tells + general design quality).
61
-
62
- **CLI scan**:
59
+ CLI scan:
63
60
  ```bash
64
- npx impeccable detect --json [--fast] [target]
61
+ node .agents/skills/impeccable/scripts/detect.mjs --json [target]
65
62
  ```
66
63
 
67
- - Pass HTML/JSX/TSX/Vue/Svelte files or directories as `[target]` (anything with markup). Do not pass CSS-only files.
68
- - For URLs, skip the CLI scan (it requires Puppeteer). Use browser visualization instead.
69
- - For large directories (200+ scannable files), use `--fast` (regex-only, skips jsdom)
70
- - For 500+ files, narrow scope or ask the user
71
- - Exit code 0 = clean, 2 = findings
64
+ - Pass markup files/directories as `[target]`; do not pass CSS-only files.
65
+ - For URLs, skip CLI scan and use browser visualization.
66
+ - For very large trees (500+ scannable files), narrow scope or ask.
67
+ - Exit code 0 = clean; 2 = findings.
68
+ - If the detector entrypoint is missing or fails to load, report deterministic scan unavailable and continue with browser/manual review.
72
69
 
73
- **Browser visualization**: **required** when browser automation tools are available AND the target is a viewable page. The `[Human]` overlay tab is the user-facing deliverable; the critique is incomplete without it. Skip only if the target is not a viewable page (CSS-only file, non-browser target).
70
+ Browser visualization is required for a viewable target when browser automation is available. Use a localhost dev/static URL for local files; avoid `file://` unless the available browser explicitly supports this workflow. Overlay flow:
74
71
 
75
- The overlay is a **visual aid for the user**. It highlights issues directly in their browser. Do NOT scroll through the page to screenshot overlays. Instead, read the console output to get the results programmatically.
72
+ 1. Create a fresh tab and navigate.
73
+ 2. Preflight mutable injection by setting `document.title` and appending a `<script>` tag. Read-only evaluate APIs do not count.
74
+ 3. If mutation is unavailable, skip live server, browser presentation, and injection; report fallback signal.
75
+ 4. If mutation is available, start `node .agents/skills/impeccable/scripts/live-server.mjs --background`, present the browser if supported, label `[Human]`, scroll top, inject `http://localhost:PORT/detect.js`, wait 2-3 seconds, read `impeccable` console messages, then stop the live server.
76
+ 5. For multi-view targets, inject on 3-5 representative pages.
76
77
 
77
- 1. **Start the live detection server**:
78
- ```bash
79
- npx impeccable live &
80
- ```
81
- Note the port printed to stdout (auto-assigned). Use `--port=PORT` to fix it.
82
- 2. **Create a new tab** and navigate to the page (use dev server URL for local files, or direct URL). Do not reuse existing tabs.
83
- 3. **Label the tab** via `javascript_tool` so the user can distinguish it:
84
- ```javascript
85
- document.title = '[Human] ' + document.title;
86
- ```
87
- 4. **Scroll to top** to ensure the page is scrolled to the very top before injection
88
- 5. **Inject** via `javascript_tool` (replace PORT with the port from step 1):
89
- ```javascript
90
- const s = document.createElement('script'); s.src = 'http://localhost:PORT/detect.js'; document.head.appendChild(s);
91
- ```
92
- 6. Wait 2-3 seconds for the detector to render overlays
93
- 7. **Read results from console** using `read_console_messages` with pattern `impeccable`. The detector logs all findings with the `[impeccable]` prefix. Do NOT scroll through the page to take screenshots of the overlays.
94
- 8. **Cleanup**: Stop the live server when done:
95
- ```bash
96
- npx impeccable live stop
97
- ```
78
+ Codex Browser note: Use the Browser skill. Do not spend a Browser attempt on `file://`. Only call `visibility.set(true)` after mutable script injection is confirmed for the `[Human]` overlay path; verify with `get()`. Use `tab.dev.logs({ filter: "impeccable" })` for console results. Its Playwright `evaluate(...)` surface is read-only; do not rely on it for mutation.
98
79
 
99
- For multi-view targets, inject on 3-5 representative pages. If injection fails, continue with CLI results only.
80
+ Return: CLI findings JSON/counts, browser console findings if applicable, false positives, and skipped/failed browser steps with concrete reasons.
100
81
 
101
- Return: CLI findings (JSON), browser console findings (if applicable), and any false positives noted.
82
+ After Assessment B returns usable CLI findings, reuse them. Do not rerun `detect.mjs` in the parent unless Assessment B failed, was truncated, or omitted count, rule names, or file locations.
83
+
84
+ Codex failure accounting: final Run Notes must include target slug, ignore list, assessment independence, CLI detector, browser visibility, overlay injection, live-server cleanup, temp-file cleanup, and any fallback signal used. Do not run repo status checks, late API spelunking, or unrelated verification after the report is assembled.
102
85
 
103
86
  ### Generate Combined Critique Report
104
87
 
105
88
  Synthesize both assessments into a single report. Do NOT simply concatenate. Weave the findings together, noting where the LLM review and detector agree, where the detector caught issues the LLM missed, and where detector findings are false positives.
106
89
 
90
+ The chat response is the primary user-facing deliverable. Present the full structured critique below in chat; do not replace it with a summary and a link. The persisted snapshot is only an archive/backlog for later commands.
91
+
92
+ Codex final-answer note: `$impeccable critique` produces a report artifact, so the final chat response should intentionally exceed the usual concise close-out style. Do not title the final response "Critique Summary" unless the user explicitly asked for a summary.
93
+
107
94
  Structure your feedback as a design director would:
108
95
 
109
96
  #### Design Health Score
110
- > *Consult [heuristics-scoring](heuristics-scoring.md)*
97
+ > *Consult the [Heuristics Scoring Guide](#heuristics-scoring-guide) section below.*
111
98
 
112
99
  Present the Nielsen's 10 heuristics scores as a table:
113
100
 
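The "Codex sub-agent gate" added in this hunk reads as a small decision table. The sketch below is one way to express it, not code from the package: the function name `decideAssessmentMode` and its input fields are hypothetical, and only the `Assessment independence: degraded (...)` strings are taken from the guide. The guide gives no wording for the non-degraded case, so the sketch returns `null` there.

```javascript
// Hypothetical helper illustrating the sub-agent gate; not part of the package.
// Inputs (all assumed names):
//   spawnExposed - whether a spawn_agent tool is exposed in this session
//   userAllowed  - whether the user explicitly allowed sub-agents or delegation
//   userDeclined - whether the user answered "no" to the one-time question
//   spawnError   - error text if spawning failed after permission, else null
function decideAssessmentMode({
  spawnExposed,
  userAllowed = false,
  userDeclined = false,
  spawnError = null,
}) {
  if (!spawnExposed) {
    // Do not ask; run A then B sequentially.
    return {
      action: "sequential",
      independence: "degraded (spawn_agent unavailable in this session)",
    };
  }
  if (userDeclined) {
    return {
      action: "sequential",
      independence: "degraded (sub-agents declined by user)",
    };
  }
  if (!userAllowed) {
    // Ask exactly once, then stop until the user answers.
    return { action: "ask", independence: null };
  }
  if (spawnError) {
    return {
      action: "sequential",
      independence: `degraded (sub-agent spawn failed: ${spawnError})`,
    };
  }
  // Permission granted and no spawn failure: run A and B as separate sub-agents.
  return { action: "spawn", independence: null };
}
```

In every sequential branch the ordering invariant still applies: Assessment A is finished and recorded before Assessment B runs.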
@@ -135,7 +122,7 @@ Be honest with scores. A 4 means genuinely excellent. Most real interfaces score
135
122
 
136
123
  **Deterministic scan**: Summarize what the automated detector found, with counts and file locations. Note any additional issues the detector caught that you missed, and flag any false positives.
137
124
 
138
- **Visual overlays** (if browser was used): Tell the user that overlays are now visible in the **[Human]** tab in their browser, highlighting the detected issues. Summarize what the console output reported.
125
+ **Visual overlays** (if injection succeeded): Tell the user that overlays are now visible in the **[Human]** tab in their browser, highlighting the detected issues. Summarize what the console output reported. If browser visualization was attempted but injection failed, say that no reliable user-visible overlay is available and report the fallback signal instead.
139
126
 
140
127
  #### Overall Impression
141
128
  A brief gut reaction: what works, what doesn't, and the single biggest opportunity.
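The revised "Visual overlays" wording ties the overlay claim to successful injection. A minimal sketch of that rule follows, assuming a hypothetical status object; the field names and the returned sentences are illustrative and do not come from the package.

```javascript
// Hypothetical helper: decides what the report may say about overlays.
// Assumed fields:
//   attempted      - browser visualization was tried
//   injected       - the <script> tag injection succeeded
//   detectorRan    - the detector actually ran in the page
//   fallbackSignal - short description of the fallback evidence used, if any
function overlayStatement({ attempted, injected, detectorRan, fallbackSignal }) {
  if (injected && detectorRan) {
    // Only in this case may the report claim a user-visible overlay.
    return "Overlays are visible in the [Human] tab.";
  }
  if (attempted) {
    const signal = fallbackSignal ?? "not recorded";
    return `No reliable user-visible overlay is available; fallback signal: ${signal}.`;
  }
  // Visualization was never attempted: the report makes no overlay claim.
  return null;
}
```

Requiring both `injected` and `detectorRan` follows the Hard Invariant that an overlay must not be claimed unless script injection succeeded and the detector ran in the page.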
@@ -146,16 +133,16 @@ Highlight 2-3 things done well. Be specific about why they work.
146
133
  #### Priority Issues
147
134
  The 3-5 most impactful design problems, ordered by importance.
148
135
 
149
- For each issue, tag with **P0-P3 severity** (consult [heuristics-scoring](heuristics-scoring.md) for severity definitions):
136
+ For each issue, tag with **P0-P3 severity** (see [Issue Severity below](#issue-severity-p0p3) for definitions):
150
137
  - **[P?] What**: Name the problem clearly
151
138
  - **Why it matters**: How this hurts users or undermines goals
152
139
  - **Fix**: What to do about it (be concrete)
153
- - **Suggested command**: Which command could address this (from: /impeccable adapt, /impeccable animate, /impeccable audit, /impeccable bolder, /impeccable clarify, /impeccable colorize, /impeccable critique, /impeccable delight, /impeccable distill, /impeccable document, /impeccable harden, /impeccable layout, /impeccable onboard, /impeccable optimize, /impeccable overdrive, /impeccable polish, /impeccable quieter, /impeccable shape, /impeccable typeset)
140
+ - **Suggested command**: Which command could address this (from: $impeccable adapt, $impeccable animate, $impeccable audit, $impeccable bolder, $impeccable clarify, $impeccable colorize, $impeccable critique, $impeccable delight, $impeccable distill, $impeccable document, $impeccable harden, $impeccable layout, $impeccable onboard, $impeccable optimize, $impeccable overdrive, $impeccable polish, $impeccable quieter, $impeccable shape, $impeccable typeset)
154
141
 
155
142
  #### Persona Red Flags
156
- > *Consult [personas](personas.md)*
143
+ > *Consult the [Personas reference](#persona-based-design-testing) below.*
157
144
 
158
- Auto-select 2-3 personas most relevant to this interface type (use the selection table in the reference). If `CLAUDE.md` contains a `## Design Context` section from `impeccable teach`, also generate 1-2 project-specific personas from the audience/brand info.
145
+ Auto-select 2-3 personas most relevant to this interface type (use the selection table in the reference). If `AGENTS.md` contains a `## Design Context` section from `impeccable init`, also generate 1-2 project-specific personas from the audience/brand info.
159
146
 
160
147
  For each selected persona, walk through the primary user action and list specific red flags found:
161
148
 
@@ -174,6 +161,11 @@ Provocative questions that might unlock better solutions:
174
161
  - "Does this need to feel this complex?"
175
162
  - "What would a confident version of this look like?"
176
163
 
164
+ #### Run Notes
165
+ Keep this compact. Include status for target slug, ignore list, assessment independence, CLI detector, browser visibility, overlay injection, live server cleanup, and temp-file cleanup. For failed or skipped steps, give the concrete observed reason and the fallback signal used. In the final chat response, also include snapshot write and trend read status after persistence has run.
166
+
167
+ Codex Run Notes are final-chat only. Do not include this section in the persisted snapshot body, because persistence, trend read, and temp cleanup happen after the snapshot write and would otherwise archive stale status such as "pending after persistence."
168
+
177
169
  **Remember**:
178
170
  - Be direct. Vague feedback wastes everyone's time.
179
171
  - Be specific. "The submit button," not "some elements."
@@ -184,26 +176,30 @@ Provocative questions that might unlock better solutions:
184
176
 
185
177
  ### Persist the Snapshot
186
178
 
187
- Once the report above is finalized, write it to `.impeccable/critique/` so the user can refer back, and so `/impeccable polish` can pick up the priority issues without a copy-paste.
179
+ Once the report above is finalized, write it to `.impeccable/critique/` so the user can refer back, and so `$impeccable polish` can pick up the priority issues without a copy-paste.
188
180
 
189
181
  Skip this step if the Setup slug was null (vague or root-level target).
190
182
 
191
- 1. **Write the body to a temp file** so you can pipe it to the helper. Use the full report (heuristic table, anti-patterns verdict, priority issues, persona red flags) but stop before the "Ask the User" / "Recommended Actions" sections that come later.
183
+ 1. **Write the body to a temp file** so you can pipe it to the helper. Use the full critique report (heuristic table, anti-patterns verdict, priority issues, persona red flags, minor observations, and questions), but stop before the "Ask the User" / "Recommended Actions" sections that come later.
184
+
185
+ Codex: exclude Run Notes from the temp body file; Run Notes are final-chat only because persistence, trend read, and temp cleanup happen after the snapshot write.
192
186
 
193
187
  2. **Pass the structured metadata** through `IMPECCABLE_CRITIQUE_META` (JSON), then run the write command:
194
188
  ```bash
195
189
  IMPECCABLE_CRITIQUE_META='{"target":"<user phrasing>","total_score":<n>,"p0_count":<n>,"p1_count":<n>}' \
196
- node .claude/skills/impeccable/scripts/critique-storage.mjs write <slug> <body-file>
190
+ node .agents/skills/impeccable/scripts/critique-storage.mjs write <slug> <body-file>
197
191
  ```
198
192
  The helper prints the absolute path it wrote.
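A minimal sketch of building the `IMPECCABLE_CRITIQUE_META` value programmatically, so that quotes in the user's phrasing cannot break the JSON. The four field names come from the command above; the function name `build_critique_meta` and the sample values are hypothetical and not part of the helper.

```python
import json


def build_critique_meta(target: str, total_score: int, p0_count: int, p1_count: int) -> str:
    """Serialize the four metadata fields into a compact JSON string.

    json.dumps escapes quotes and backslashes in `target`, which hand-written
    shell quoting can easily get wrong.
    """
    meta = {
        "target": target,
        "total_score": total_score,
        "p0_count": p0_count,
        "p1_count": p1_count,
    }
    return json.dumps(meta, separators=(",", ":"))


# Hypothetical values, for illustration only.
meta_json = build_critique_meta('the "checkout" page', 29, 1, 3)
print(meta_json)
```

The resulting string can then be exported as the environment variable before running the write command.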
199
193
 
200
- 3. **Read the trend** for context:
194
+ 3. **Delete the temp body file** after the write attempt completes, whether the write succeeded or failed. If deletion fails, mention `temp-file cleanup failed: <reason>` briefly in the final output, but do not block the critique.
195
+
196
+ 4. **Read the trend** for context:
201
197
  ```bash
202
- node .claude/skills/impeccable/scripts/critique-storage.mjs trend <slug> 5
198
+ node .agents/skills/impeccable/scripts/critique-storage.mjs trend <slug> 5
203
199
  ```
204
200
  This returns a JSON array of the last 5 frontmatter entries (including the one you just wrote).
205
201
 
206
- 4. **Append a single line to the user-visible output**, after the report and before the questions:
202
+ 5. **Append a single line to the user-visible output**, after the report and before the questions:
207
203
 
208
204
  > **Trend for `<slug>` (last 5 runs): 24 → 28 → 32 → 29 → 32**
209
205
  > Wrote `.impeccable/critique/<filename>`.
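The trend line above could be assembled from the helper's JSON array roughly as follows. This is a sketch under two assumptions that the source does not confirm: each frontmatter entry carries a `total_score` key (mirroring the metadata passed at write time), and the array is ordered oldest to newest. The function name `format_trend_line` is hypothetical.

```python
def format_trend_line(slug: str, entries: list[dict]) -> str:
    """Render the single user-visible trend line from frontmatter entries.

    Assumes each entry has a `total_score` key and that entries are ordered
    oldest to newest; neither is confirmed by the helper's documentation.
    """
    scores = [str(entry["total_score"]) for entry in entries]
    joined = " → ".join(scores)
    return f"**Trend for `{slug}` (last {len(scores)} runs): {joined}**"


# Hypothetical entries matching the example scores shown above.
sample_entries = [{"total_score": s} for s in (24, 28, 32, 29, 32)]
print(format_trend_line("checkout", sample_entries))
```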
@@ -214,7 +210,7 @@ This is fire-and-forget. Do not show the user the helper's JSON output; only the
214
210
 
215
211
  ### Ask the User
216
212
 
217
- **After presenting findings**, use targeted questions based on what was actually found. STOP and call the AskUserQuestion tool to clarify. These answers will shape the action plan.
213
+ **After presenting findings**, use targeted questions based on what was actually found. STOP and use Codex's structured user-input/question tool when available; if unavailable, ask directly in chat to clarify what you cannot infer. These answers will shape the action plan.
218
214
 
219
215
  Ask questions along these lines (adapt to the specific findings; do NOT ask generic questions):
220
216
 
@@ -232,6 +228,8 @@ Ask questions along these lines (adapt to the specific findings; do NOT ask gene
232
228
  - Offer concrete options, not open-ended prompts.
233
229
  - If findings are straightforward (e.g., only 1-2 clear issues), skip questions and go directly to Recommended Actions.
234
230
 
231
+ Codex final-question gate: The user-visible response must either include the targeted questions or explicitly say `Questions skipped: <reason>` because the findings were straightforward. Each question must include 2-3 concrete answer options tied to the actual critique findings. Do not end with only open-ended questions.
232
+
235
233
  ### Recommended Actions
236
234
 
237
235
  **After receiving the user's answers**, present a prioritized action summary reflecting the user's priorities and scope from Ask the User.
@@ -240,22 +238,553 @@ Ask questions along these lines (adapt to the specific findings; do NOT ask gene
240
238
 
241
239
  List recommended commands in priority order, based on the user's answers:
242
240
 
243
- 1. **`/command-name`**: Brief description of what to fix (specific context from critique findings)
244
- 2. **`/command-name`**: Brief description (specific context)
241
+ 1. **`$command-name`**: Brief description of what to fix (specific context from critique findings)
242
+ 2. **`$command-name`**: Brief description (specific context)
245
243
  ...
246
244
 
247
245
  **Rules for recommendations**:
248
- - Only recommend commands from: /impeccable adapt, /impeccable animate, /impeccable audit, /impeccable bolder, /impeccable clarify, /impeccable colorize, /impeccable critique, /impeccable delight, /impeccable distill, /impeccable document, /impeccable harden, /impeccable layout, /impeccable onboard, /impeccable optimize, /impeccable overdrive, /impeccable polish, /impeccable quieter, /impeccable shape, /impeccable typeset
246
+ - Only recommend commands from: $impeccable adapt, $impeccable animate, $impeccable audit, $impeccable bolder, $impeccable clarify, $impeccable colorize, $impeccable critique, $impeccable delight, $impeccable distill, $impeccable document, $impeccable harden, $impeccable layout, $impeccable onboard, $impeccable optimize, $impeccable overdrive, $impeccable polish, $impeccable quieter, $impeccable shape, $impeccable typeset
249
247
  - Order by the user's stated priorities first, then by impact
250
248
  - Each item's description should carry enough context that the command knows what to focus on
251
249
  - Map each Priority Issue to the appropriate command
252
250
  - Skip commands that would address zero issues
253
251
  - If the user chose a limited scope, only include items within that scope
254
252
  - If the user marked areas as off-limits, exclude commands that would touch those areas
255
- - End with `/impeccable polish` as the final step if any fixes were recommended
253
+ - End with `$impeccable polish` as the final step if any fixes were recommended
256
254
 
257
255
  After presenting the summary, tell the user:
258
256
 
259
257
  > You can ask me to run these one at a time, all at once, or in any order you prefer.
260
258
  >
261
- > Re-run `/impeccable critique` after fixes to see your score improve.
259
+ > Re-run `$impeccable critique` after fixes to see your score improve.
260
+
261
+ ---
262
+
263
+ ## Reference Material
264
+
265
+ The sections below were previously separate reference files (`cognitive-load.md`, `heuristics-scoring.md`, `personas.md`). They live inline now so the critique flow has all its deep context in one place.
266
+
267
+ ### Cognitive Load Assessment
268
+
269
+ Cognitive load is the total mental effort required to use an interface. Overloaded users make mistakes, get frustrated, and leave. This reference helps identify and fix cognitive overload.
270
+
271
+ ---
272
+
273
+ #### Three Types of Cognitive Load
274
+
275
+ ##### Intrinsic Load: The Task Itself
276
+ Complexity inherent to what the user is trying to do. You can't eliminate this, but you can structure it.
277
+
278
+ **Manage it by**:
279
+ - Breaking complex tasks into discrete steps
280
+ - Providing scaffolding (templates, defaults, examples)
281
+ - Progressive disclosure: show what's needed now, hide the rest
282
+ - Grouping related decisions together
283
+
284
+ ##### Extraneous Load: Bad Design
285
+ Mental effort caused by poor design choices. **Eliminate this ruthlessly.** It's pure waste.
286
+
287
+ **Common sources**:
288
+ - Confusing navigation that requires mental mapping
289
+ - Unclear labels that force users to guess meaning
290
+ - Visual clutter competing for attention
291
+ - Inconsistent patterns that prevent learning
292
+ - Unnecessary steps between user intent and result
293
+
294
+ ##### Germane Load: Learning Effort
295
+ Mental effort spent building understanding. This is *good* cognitive load; it leads to mastery.
296
+
297
+ **Support it by**:
298
+ - Progressive disclosure that reveals complexity gradually
299
+ - Consistent patterns that reward learning
300
+ - Feedback that confirms correct understanding
301
+ - Onboarding that teaches through action, not walls of text
302
+
303
+ ---
304
+
305
+ #### Cognitive Load Checklist
306
+
307
+ Evaluate the interface against these 8 items:
308
+
309
+ - [ ] **Single focus**: Can the user complete their primary task without distraction from competing elements?
310
+ - [ ] **Chunking**: Is information presented in digestible groups (≤4 items per group)?
311
+ - [ ] **Grouping**: Are related items visually grouped together (proximity, borders, shared background)?
312
+ - [ ] **Visual hierarchy**: Is it immediately clear what's most important on the screen?
313
+ - [ ] **One thing at a time**: Can the user focus on a single decision before moving to the next?
314
+ - [ ] **Minimal choices**: Are decisions simplified (≤4 visible options at any decision point)?
315
+ - [ ] **Working memory**: Can the user act on the current screen without having to remember information from a previous one?
316
+ - [ ] **Progressive disclosure**: Is complexity revealed only when the user needs it?
317
+
318
+ **Scoring**: Count the failed items. 0–1 failures = low cognitive load (good). 2–3 = moderate (address soon). 4+ = high cognitive load (critical fix needed).
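The scoring rule above can be sketched as a small function. The checklist names and thresholds are taken from the text; the function name and the short rating labels are illustrative choices.

```python
CHECKLIST = [
    "Single focus",
    "Chunking",
    "Grouping",
    "Visual hierarchy",
    "One thing at a time",
    "Minimal choices",
    "Working memory",
    "Progressive disclosure",
]


def cognitive_load_rating(failed_items: set[str]) -> tuple[int, str]:
    """Return (failure count, rating) for a set of failed checklist items."""
    unknown = failed_items - set(CHECKLIST)
    if unknown:
        raise ValueError(f"not checklist items: {sorted(unknown)}")
    failures = len(failed_items)
    if failures <= 1:
        return failures, "low"       # good
    if failures <= 3:
        return failures, "moderate"  # address soon
    return failures, "high"          # critical fix needed


print(cognitive_load_rating({"Chunking", "Minimal choices"}))
```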
319
+
320
+ ---
321
+
322
+ #### The Working Memory Rule
323
+
324
+ **Humans can hold ≤4 items in working memory at once** (Miller's Law revised by Cowan, 2001).
325
+
326
+ At any decision point, count the number of distinct options, actions, or pieces of information a user must simultaneously consider:
327
+ - **≤4 items**: Within working memory limits, manageable
328
+ - **5–7 items**: Pushing the boundary; consider grouping or progressive disclosure
329
+ - **8+ items**: Overloaded; users will skip, misclick, or abandon
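As a quick illustration, the three bands above map to a simple classifier. The band boundaries come from the list; the function name and returned labels are hypothetical shorthand.

```python
def working_memory_band(item_count: int) -> str:
    """Classify how many distinct items a user must consider at once."""
    if item_count < 0:
        raise ValueError("item_count must be non-negative")
    if item_count <= 4:
        return "manageable"
    if item_count <= 7:
        return "pushing the boundary"
    return "overloaded"


# Example: a toolbar exposing six equally weighted actions.
print(working_memory_band(6))
```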
330
+
331
+ **Practical applications**:
332
+ - Navigation menus: ≤5 top-level items (group the rest under clear categories)
333
+ - Form sections: ≤4 fields visible per group before a visual break
334
+ - Action buttons: 1 primary, 1–2 secondary, group the rest in a menu
335
+ - Dashboard widgets: ≤4 key metrics visible without scrolling
336
+ - Pricing tiers: ≤3 options (more causes analysis paralysis)
337
+
338
+ ---
339
+
340
+ #### Common Cognitive Load Violations
341
+
342
+ ##### 1. The Wall of Options
343
+ **Problem**: Presenting 10+ choices at once with no hierarchy.
344
+ **Fix**: Group into categories, highlight recommended, use progressive disclosure.
345
+
346
+ ##### 2. The Memory Bridge
347
+ **Problem**: User must remember info from step 1 to complete step 3.
348
+ **Fix**: Keep relevant context visible, or repeat it where it's needed.
349
+
350
+ ##### 3. The Hidden Navigation
351
+ **Problem**: User must build a mental map of where things are.
352
+ **Fix**: Always show current location (breadcrumbs, active states, progress indicators).
353
+
354
+ ##### 4. The Jargon Barrier
355
+ **Problem**: Technical or domain language forces translation effort.
356
+ **Fix**: Use plain language. If domain terms are unavoidable, define them inline.
357
+
358
+ ##### 5. The Visual Noise Floor
359
+ **Problem**: Every element has the same visual weight; nothing stands out.
360
+ **Fix**: Establish clear hierarchy: one primary element, 2–3 secondary, everything else muted.
361
+
362
+ ##### 6. The Inconsistent Pattern
363
+ **Problem**: Similar actions work differently in different places.
364
+ **Fix**: Standardize interaction patterns. Same type of action = same type of UI.
365
+
366
+ ##### 7. The Multi-Task Demand
367
+ **Problem**: Interface requires processing multiple simultaneous inputs (reading + deciding + navigating).
368
+ **Fix**: Sequence the steps. Let the user do one thing at a time.
369
+
370
+ ##### 8. The Context Switch
371
+ **Problem**: User must jump between screens/tabs/modals to gather info for a single decision.
372
+ **Fix**: Co-locate the information needed for each decision. Reduce back-and-forth.
373
+
374
+ ---
375
+
376
+ ### Heuristics Scoring Guide
377
+
378
+ Score each of Nielsen's 10 Usability Heuristics on a 0–4 scale. Be honest: a 4 means genuinely excellent, not "good enough."
379
+
380
+ #### Nielsen's 10 Heuristics
381
+
382
+ ##### 1. Visibility of System Status
383
+
384
+ Keep users informed about what's happening through timely, appropriate feedback.
385
+
386
+ **Check for**:
387
+ - Loading indicators during async operations
388
+ - Confirmation of user actions (save, submit, delete)
389
+ - Progress indicators for multi-step processes
390
+ - Current location in navigation (breadcrumbs, active states)
391
+ - Form validation feedback (inline, not just on submit)
392
+
393
+ **Scoring**:
394
+ | Score | Criteria |
395
+ |-------|----------|
396
+ | 0 | No feedback; user is guessing what happened |
397
+ | 1 | Rare feedback; most actions produce no visible response |
398
+ | 2 | Partial; some states communicated, major gaps remain |
399
+ | 3 | Good; most operations give clear feedback, minor gaps |
400
+ | 4 | Excellent; every action confirms, progress is always visible |
401
+
402
+ ##### 2. Match Between System and Real World
403
+
404
+ Speak the user's language. Follow real-world conventions. Information appears in natural, logical order.
405
+
406
+ **Check for**:
407
+ - Familiar terminology (no unexplained jargon)
408
+ - Logical information order matching user expectations
409
+ - Recognizable icons and metaphors
410
+ - Domain-appropriate language for the target audience
411
+ - Natural reading flow (left-to-right, top-to-bottom priority)
412
+
413
+ **Scoring**:
414
+ | Score | Criteria |
415
+ |-------|----------|
416
+ | 0 | Pure tech jargon, alien to users |
417
+ | 1 | Mostly confusing; requires domain expertise to navigate |
418
+ | 2 | Mixed; some plain language, some jargon leaks through |
419
+ | 3 | Mostly natural; occasional term needs context |
420
+ | 4 | Speaks the user's language fluently throughout |
421
+
422
+ ##### 3. User Control and Freedom
423
+
424
+ Users need a clear "emergency exit" from unwanted states without extended dialogue.
425
+
426
+ **Check for**:
427
+ - Undo/redo functionality
428
+ - Cancel buttons on forms and modals
429
+ - Clear navigation back to safety (home, previous)
430
+ - Easy way to clear filters, search, selections
431
+ - Escape from long or multi-step processes
432
+
433
+ **Scoring**:
434
+ | Score | Criteria |
435
+ |-------|----------|
436
+ | 0 | Users get trapped; no way out without refreshing |
437
+ | 1 | Difficult exits; must find obscure paths to escape |
438
+ | 2 | Some exits; main flows have escape, edge cases don't |
439
+ | 3 | Good control; users can exit and undo most actions |
440
+ | 4 | Full control; undo, cancel, back, and escape everywhere |
441
+
442
+ ##### 4. Consistency and Standards
443
+
444
+ Users shouldn't wonder whether different words, situations, or actions mean the same thing.
445
+
446
+ **Check for**:
447
+ - Consistent terminology throughout the interface
448
+ - Same actions produce same results everywhere
449
+ - Platform conventions followed (standard UI patterns)
450
+ - Visual consistency (colors, typography, spacing, components)
451
+ - Consistent interaction patterns (same gesture = same behavior)
452
+
453
+ **Scoring**:
454
+ | Score | Criteria |
455
+ |-------|----------|
456
+ | 0 | Inconsistent everywhere; feels like different products stitched together |
457
+ | 1 | Many inconsistencies; similar things look/behave differently |
458
+ | 2 | Partially consistent; main flows match, details diverge |
459
+ | 3 | Mostly consistent; occasional deviation, nothing confusing |
460
+ | 4 | Fully consistent; cohesive system, predictable behavior |
461
+
462
+ ##### 5. Error Prevention
463
+
464
+ Better than good error messages is a design that prevents problems in the first place.
465
+
466
+ **Check for**:
467
+ - Confirmation before destructive actions (delete, overwrite)
468
+ - Constraints preventing invalid input (date pickers, dropdowns)
469
+ - Smart defaults that reduce errors
470
+ - Clear labels that prevent misunderstanding
471
+ - Autosave and draft recovery
472
+
473
+ **Scoring**:
474
+ | Score | Criteria |
475
+ |-------|----------|
476
+ | 0 | Errors easy to make; no guardrails anywhere |
477
+ | 1 | Few safeguards; some inputs validated, most aren't |
478
+ | 2 | Partial prevention; common errors caught, edge cases slip |
479
+ | 3 | Good prevention; most error paths blocked proactively |
480
+ | 4 | Excellent; errors nearly impossible through smart constraints |
481
+
482
+ ##### 6. Recognition Rather Than Recall
483
+
484
+ Minimize memory load. Make objects, actions, and options visible or easily retrievable.
485
+
486
+ **Check for**:
487
+ - Visible options (not buried in hidden menus)
488
+ - Contextual help when needed (tooltips, inline hints)
489
+ - Recent items and history
490
+ - Autocomplete and suggestions
491
+ - Labels on icons (not icon-only navigation)
492
+
493
+ **Scoring**:
494
+ | Score | Criteria |
495
+ |-------|----------|
496
+ | 0 | Heavy memorization; users must remember paths and commands |
497
+ | 1 | Mostly recall; many hidden features, few visible cues |
498
+ | 2 | Some aids; main actions visible, secondary features hidden |
499
+ | 3 | Good recognition; most things discoverable, few memory demands |
500
+ | 4 | Everything discoverable; users never need to memorize |
501
+
502
+ ##### 7. Flexibility and Efficiency of Use
503
+
504
+ Accelerators, invisible to novices, speed up expert interaction.
505
+
506
+ **Check for**:
507
+ - Keyboard shortcuts for common actions
508
+ - Customizable interface elements
509
+ - Recent items and favorites
510
+ - Bulk/batch actions
511
+ - Power user features that don't complicate the basics
512
+
513
+ **Scoring**:
514
+ | Score | Criteria |
515
+ |-------|----------|
516
+ | 0 | One rigid path; no shortcuts or alternatives |
517
+ | 1 | Limited flexibility; few alternatives to the main path |
518
+ | 2 | Some shortcuts; basic keyboard support, limited bulk actions |
519
+ | 3 | Good accelerators; keyboard nav, some customization |
520
+ | 4 | Highly flexible; multiple paths, power features, customizable |
521
+
522
+ ##### 8. Aesthetic and Minimalist Design
523
+
524
+ Interfaces should not contain irrelevant or rarely needed information. Every element should serve a purpose.
525
+
526
+ **Check for**:
527
+ - Only necessary information visible at each step
528
+ - Clear visual hierarchy directing attention
529
+ - Purposeful use of color and emphasis
530
+ - No decorative clutter competing for attention
531
+ - Focused, uncluttered layouts
532
+
533
+ **Scoring**:
534
+ | Score | Criteria |
535
+ |-------|----------|
536
+ | 0 | Overwhelming; everything competes for attention equally |
537
+ | 1 | Cluttered; too much noise, hard to find what matters |
538
+ | 2 | Some clutter; main content clear, periphery noisy |
539
+ | 3 | Mostly clean; focused design, minor visual noise |
540
+ | 4 | Perfectly minimal; every element earns its pixel |
541
+
542
+ ##### 9. Help Users Recognize, Diagnose, and Recover from Errors
543
+
544
+ Error messages should use plain language, precisely indicate the problem, and constructively suggest a solution.
545
+
546
+ **Check for**:
547
+ - Plain language error messages (no error codes for users)
548
+ - Specific problem identification ("Email is missing @" not "Invalid input")
549
+ - Actionable recovery suggestions
550
+ - Errors displayed near the source of the problem
551
+ - Non-blocking error handling (don't wipe the form)
552
+
553
+ **Scoring**:
554
+ | Score | Criteria |
555
+ |-------|----------|
556
+ | 0 | Cryptic errors; codes, jargon, or no message at all |
557
+ | 1 | Vague errors; "Something went wrong" with no guidance |
558
+ | 2 | Clear but unhelpful; names the problem but not the fix |
559
+ | 3 | Clear with suggestions; identifies problem and offers next steps |
560
+ | 4 | Perfect recovery; pinpoints issue, suggests fix, preserves user work |
561
+
562
+ ##### 10. Help and Documentation
563
+
564
+ Even if the system is usable without docs, help should be easy to find, task-focused, and concise.
565
+
566
+ **Check for**:
567
+ - Searchable help or documentation
568
+ - Contextual help (tooltips, inline hints, guided tours)
569
+ - Task-focused organization (not feature-organized)
570
+ - Concise, scannable content
571
+ - Easy access without leaving current context
572
+
573
+ **Scoring**:
574
+ | Score | Criteria |
575
+ |-------|----------|
576
+ | 0 | No help available anywhere |
577
+ | 1 | Help exists but hard to find or irrelevant |
578
+ | 2 | Basic help; FAQ or docs exist, not contextual |
579
+ | 3 | Good documentation; searchable, mostly task-focused |
580
+ | 4 | Excellent contextual help; right info at the right moment |
581
+
582
+ ---
583
+
584
+ #### Score Summary
585
+
586
+ **Total possible**: 40 points (10 heuristics × 4 max)
587
+
588
+ | Score Range | Rating | What It Means |
589
+ |-------------|--------|---------------|
590
+ | 36–40 | Excellent | Minor polish only; ship it |
591
+ | 28–35 | Good | Solid foundation; address weak areas |
592
+ | 20–27 | Acceptable | Significant improvements needed before users are happy |
593
+ | 12–19 | Poor | Major UX overhaul required; core experience broken |
594
+ | 0–11 | Critical | Redesign needed; unusable in current state |
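A sketch of how the ten per-heuristic scores roll up into the rating bands in the table above. The bands and the 0–4 scale come from the text; the function name and the validation behavior are assumptions made for illustration.

```python
# (minimum total, rating) pairs from the Score Summary table, highest first.
RATING_BANDS = [
    (36, "Excellent"),
    (28, "Good"),
    (20, "Acceptable"),
    (12, "Poor"),
    (0, "Critical"),
]


def rate_heuristics(scores: list[int]) -> tuple[int, str]:
    """Sum ten 0-4 heuristic scores and look up the matching rating band."""
    if len(scores) != 10:
        raise ValueError("expected exactly 10 heuristic scores")
    if any(not 0 <= s <= 4 for s in scores):
        raise ValueError("each score must be between 0 and 4")
    total = sum(scores)
    for minimum, rating in RATING_BANDS:
        if total >= minimum:
            return total, rating
    raise AssertionError("unreachable: the lowest band starts at 0")


print(rate_heuristics([3, 3, 2, 4, 3, 3, 2, 3, 3, 3]))
```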
595
+
596
+ ---
597
+
598
+ #### Issue Severity (P0–P3)
599
+
600
+ Tag each individual issue found during scoring with a priority level:
601
+
602
+ | Priority | Name | Description | Action |
603
+ |----------|------|-------------|--------|
604
+ | **P0** | Blocking | Prevents task completion entirely | Fix immediately; this is a showstopper |
605
+ | **P1** | Major | Causes significant difficulty or confusion | Fix before release |
606
+ | **P2** | Minor | Annoyance, but workaround exists | Fix in next pass |
607
+ | **P3** | Polish | Nice-to-fix, no real user impact | Fix if time permits |
608
+
609
+ **Tip**: If you're unsure between two levels, ask: "Would a user contact support about this?" If yes, it's at least P1.
610
+
611
+ ---
612
+
613
+ ### Persona-Based Design Testing
614
+
615
+ Test the interface through the eyes of 5 distinct user archetypes. Each persona exposes different failure modes that a single "design director" perspective would miss.
616
+
617
+ **How to use**: Select 2–3 personas most relevant to the interface being critiqued. Walk through the primary user action as each persona. Report specific red flags, not generic concerns.
618
+
619
+ ---
620
+
621
+ #### 1. Impatient Power User: "Alex"
622
+
623
+ **Profile**: Expert with similar products. Expects efficiency, hates hand-holding. Will find shortcuts or leave.
624
+
625
+ **Behaviors**:
626
+ - Skips all onboarding and instructions
627
+ - Looks for keyboard shortcuts immediately
628
+ - Tries to bulk-select, batch-edit, and automate
629
+ - Gets frustrated by required steps that feel unnecessary
630
+ - Abandons if anything feels slow or patronizing
631
+
632
+ **Test Questions**:
633
+ - Can Alex complete the core task in under 60 seconds?
634
+ - Are there keyboard shortcuts for common actions?
635
+ - Can onboarding be skipped entirely?
636
+ - Do modals have keyboard dismiss (Esc)?
637
+ - Is there a "power user" path (shortcuts, bulk actions)?
638
+
639
+ **Red Flags** (report these specifically):
640
+ - Forced tutorials or unskippable onboarding
641
+ - No keyboard navigation for primary actions
642
+ - Slow animations that can't be skipped
643
+ - One-item-at-a-time workflows where batch would be natural
644
+ - Redundant confirmation steps for low-risk actions
645
+
646
+ ---
647
+
648
+ #### 2. Confused First-Timer: "Jordan"
649
+
650
+ **Profile**: Never used this type of product. Needs guidance at every step. Will abandon rather than figure it out.
651
+
652
+ **Behaviors**:
653
+ - Reads all instructions carefully
654
+ - Hesitates before clicking anything unfamiliar
655
+ - Looks for help or support constantly
656
+ - Misunderstands jargon and abbreviations
657
+ - Takes the most literal interpretation of any label
658
+
659
+ **Test Questions**:
660
+ - Is the first action obviously clear within 5 seconds?
661
+ - Are all icons labeled with text?
662
+ - Is there contextual help at decision points?
663
+ - Does terminology assume prior knowledge?
664
+ - Is there a clear "back" or "undo" at every step?
665
+
666
+ **Red Flags** (report these specifically):
667
+ - Icon-only navigation with no labels
668
+ - Technical jargon without explanation
669
+ - No visible help option or guidance
670
+ - Ambiguous next steps after completing an action
671
+ - No confirmation that an action succeeded
672
+
673
+ ---
674
+
675
+ #### 3. Accessibility-Dependent User: "Sam"
676
+
677
+ **Profile**: Uses a screen reader (VoiceOver/NVDA) and keyboard-only navigation. May have low vision, motor impairment, or cognitive differences.
678
+
679
+ **Behaviors**:
680
+ - Tabs through the interface linearly
681
+ - Relies on ARIA labels and heading structure
682
+ - Cannot see hover states or visual-only indicators
683
+ - Needs adequate color contrast (4.5:1 minimum)
684
+ - May use browser zoom up to 200%
685
+
686
+ **Test Questions**:
687
+ - Can the entire primary flow be completed keyboard-only?
688
+ - Are all interactive elements focusable with visible focus indicators?
689
+ - Do images have meaningful alt text?
690
+ - Is color contrast WCAG AA compliant (4.5:1 for normal text, 3:1 for large text)?
691
+ - Does the screen reader announce state changes (loading, success, errors)?
692
+
693
+ **Red Flags** (report these specifically):
694
+ - Click-only interactions with no keyboard alternative
695
+ - Missing or invisible focus indicators
696
+ - Meaning conveyed by color alone (red = error, green = success)
697
+ - Unlabeled form fields or buttons
698
+ - Time-limited actions without an extension option
699
+ - Custom components that break screen reader flow
700
+
701
+ ---
702
+
703
+ #### 4. Deliberate Stress Tester: "Riley"
704
+
705
+ **Profile**: Methodical user who pushes interfaces beyond the happy path. Tests edge cases, tries unexpected inputs, and probes for gaps in the experience.
706
+
707
+ **Behaviors**:
708
+ - Tests edge cases intentionally (empty states, long strings, special characters)
709
+ - Submits forms with unexpected data (emoji, RTL text, very long values)
710
+ - Tries to break workflows by navigating backwards, refreshing mid-flow, or opening in multiple tabs
711
+ - Looks for inconsistencies between what the UI promises and what actually happens
712
+ - Documents problems methodically
713
+
714
+ **Test Questions**:
715
+ - What happens at the edges (0 items, 1000 items, very long text)?
716
+ - Do error states recover gracefully or leave the UI in a broken state?
717
+ - What happens on refresh mid-workflow? Is state preserved?
718
+ - Are there features that appear to work but produce broken results?
719
+ - How does the UI handle unexpected input (emoji, special chars, paste from Excel)?
720
+
721
+ **Red Flags** (report these specifically):
722
+ - Features that appear to work but silently fail or produce wrong results
723
+ - Error handling that exposes technical details or leaves UI in a broken state
724
+ - Empty states that show nothing useful ("No results" with no guidance)
725
+ - Workflows that lose user data on refresh or navigation
726
+ - Inconsistent behavior between similar interactions in different parts of the UI
727
+
728
+ ---
729
+
730
+ #### 5. Distracted Mobile User: "Casey"
731
+
732
+ **Profile**: Using phone one-handed on the go. Frequently interrupted. Possibly on a slow connection.
733
+
734
+ **Behaviors**:
735
+ - Uses thumb only; prefers bottom-of-screen actions
736
+ - Gets interrupted mid-flow and returns later
737
+ - Switches between apps frequently
738
+ - Has limited attention span and low patience
739
+ - Types as little as possible, prefers taps and selections
740
+
741
+ **Test Questions**:
742
+ - Are primary actions in the thumb zone (bottom half of screen)?
743
+ - Is state preserved if the user leaves and returns?
744
+ - Does it work on slow connections (3G)?
745
+ - Can forms use autocomplete and smart defaults?
746
+ - Are touch targets at least 44×44pt?
747
+
748
+ **Red Flags** (report these specifically):
749
+ - Important actions positioned at the top of the screen (unreachable by thumb)
750
+ - No state persistence; progress lost on tab switch or interruption
751
+ - Large text inputs required where selection would work
752
+ - Heavy assets loading on every page (no lazy loading)
753
+ - Tiny tap targets or targets too close together
754
+
755
+ ---
756
+
757
+ #### Selecting Personas
758
+
759
+ Choose personas based on the interface type:
760
+
761
+ | Interface Type | Primary Personas | Why |
762
+ |---------------|-----------------|-----|
763
+ | Landing page / marketing | Jordan, Riley, Casey | First impressions, trust, mobile |
764
+ | Dashboard / admin | Alex, Sam | Power users, accessibility |
765
+ | E-commerce / checkout | Casey, Riley, Jordan | Mobile, edge cases, clarity |
766
+ | Onboarding flow | Jordan, Casey | Confusion, interruption |
767
+ | Data-heavy / analytics | Alex, Sam | Efficiency, keyboard nav |
768
+ | Form-heavy / wizard | Jordan, Sam, Casey | Clarity, accessibility, mobile |
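The selection table above can be expressed as a lookup. The mapping is copied from the table; the lowercase keys, the function name, and the choice to raise on unknown types are illustrative assumptions.

```python
PERSONAS_BY_INTERFACE = {
    "landing page / marketing": ["Jordan", "Riley", "Casey"],
    "dashboard / admin": ["Alex", "Sam"],
    "e-commerce / checkout": ["Casey", "Riley", "Jordan"],
    "onboarding flow": ["Jordan", "Casey"],
    "data-heavy / analytics": ["Alex", "Sam"],
    "form-heavy / wizard": ["Jordan", "Sam", "Casey"],
}


def select_personas(interface_type: str) -> list[str]:
    """Return the primary personas for an interface type from the table."""
    key = interface_type.strip().lower()
    if key not in PERSONAS_BY_INTERFACE:
        raise KeyError(f"no persona mapping for interface type: {interface_type!r}")
    # Return a copy so callers cannot mutate the shared table.
    return list(PERSONAS_BY_INTERFACE[key])


print(select_personas("Dashboard / admin"))
```

For interface types that fall outside the table, choosing the 2–3 most relevant personas by hand remains the fallback described earlier.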
769
+
770
+ ---
771
+
772
+ #### Project-Specific Personas
773
+
774
+ If `AGENTS.md` contains a `## Design Context` section (generated by `impeccable init`), derive 1–2 additional personas from the audience and brand information:
775
+
776
+ 1. Read the target audience description
777
+ 2. Identify the primary user archetype not covered by the 5 predefined personas
778
+ 3. Create a persona following this template:
779
+
780
+ ```
781
+ ##### [Role]: "[Name]"
782
+
783
+ **Profile**: [2-3 key characteristics derived from Design Context]
784
+
785
+ **Behaviors**: [3-4 specific behaviors based on the described audience]
786
+
787
+ **Red Flags**: [3-4 things that would alienate this specific user type]
788
+ ```
789
+
790
+ Only generate project-specific personas when real Design Context data is available. Don't invent audience details; use the 5 predefined personas when no context exists.