qualia-framework 4.5.0 → 5.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (66)
  1. package/AGENTS.md +24 -0
  2. package/CLAUDE.md +12 -75
  3. package/README.md +23 -16
  4. package/agents/builder.md +9 -21
  5. package/agents/planner.md +8 -0
  6. package/agents/verifier.md +8 -0
  7. package/agents/visual-evaluator.md +132 -0
  8. package/bin/cli.js +54 -18
  9. package/bin/install.js +369 -29
  10. package/bin/qualia-ui.js +208 -1
  11. package/bin/slop-detect.mjs +5 -0
  12. package/bin/state.js +34 -1
  13. package/docs/install-redesign-builder-prompt.md +290 -0
  14. package/docs/install-redesign-pilot.md +234 -0
  15. package/docs/playwright-loop-builder-prompt.md +185 -0
  16. package/docs/playwright-loop-design-notes.md +108 -0
  17. package/docs/playwright-loop-pilot-results.md +170 -0
  18. package/docs/playwright-loop-tester-prompt.md +213 -0
  19. package/docs/polish-loop-supervised-run.md +111 -0
  20. package/docs/reviews/matt-pocock-skills-analysis.md +300 -0
  21. package/guide.md +9 -5
  22. package/hooks/env-empty-guard.js +74 -0
  23. package/hooks/pre-compact.js +19 -9
  24. package/hooks/pre-deploy-gate.js +8 -2
  25. package/hooks/pre-push.js +26 -12
  26. package/hooks/supabase-destructive-guard.js +62 -0
  27. package/hooks/vercel-account-guard.js +91 -0
  28. package/package.json +2 -1
  29. package/rules/design-brand.md +4 -0
  30. package/rules/design-laws.md +4 -0
  31. package/rules/design-product.md +4 -0
  32. package/rules/design-rubric.md +4 -0
  33. package/rules/grounding.md +4 -0
  34. package/skills/qualia-build/SKILL.md +40 -46
  35. package/skills/qualia-discuss/SKILL.md +51 -68
  36. package/skills/qualia-handoff/SKILL.md +1 -0
  37. package/skills/qualia-hook-gen/SKILL.md +206 -0
  38. package/skills/qualia-issues/SKILL.md +151 -0
  39. package/skills/qualia-map/SKILL.md +78 -35
  40. package/skills/qualia-new/REFERENCE.md +139 -0
  41. package/skills/qualia-new/SKILL.md +45 -121
  42. package/skills/qualia-optimize/REFERENCE.md +265 -0
  43. package/skills/qualia-optimize/SKILL.md +92 -232
  44. package/skills/qualia-plan/SKILL.md +58 -65
  45. package/skills/qualia-polish-loop/REFERENCE.md +265 -0
  46. package/skills/qualia-polish-loop/SKILL.md +201 -0
  47. package/skills/qualia-polish-loop/fixtures/broken.html +117 -0
  48. package/skills/qualia-polish-loop/fixtures/clean.html +196 -0
  49. package/skills/qualia-polish-loop/scripts/loop.mjs +323 -0
  50. package/skills/qualia-polish-loop/scripts/playwright-capture.mjs +206 -0
  51. package/skills/qualia-polish-loop/scripts/score.mjs +176 -0
  52. package/skills/qualia-prd/SKILL.md +199 -0
  53. package/skills/qualia-report/SKILL.md +141 -200
  54. package/skills/qualia-research/SKILL.md +28 -33
  55. package/skills/qualia-road/SKILL.md +103 -0
  56. package/skills/qualia-ship/SKILL.md +1 -0
  57. package/skills/qualia-task/SKILL.md +1 -1
  58. package/skills/qualia-test/SKILL.md +50 -2
  59. package/skills/qualia-triage/SKILL.md +152 -0
  60. package/skills/qualia-verify/SKILL.md +63 -104
  61. package/skills/qualia-zoom/SKILL.md +51 -0
  62. package/skills/zoho-workflow/SKILL.md +1 -1
  63. package/templates/CONTEXT.md +36 -0
  64. package/templates/decisions/ADR-template.md +30 -0
  65. package/tests/bin.test.sh +598 -7
  66. package/tests/state.test.sh +58 -0
@@ -0,0 +1,234 @@ package/docs/install-redesign-pilot.md
# Install Redesign — Pilot Results (v5.1.0)

Captured output and timing measurements for the three install-target
scenarios, run on Linux 6.19 / Node 22 / non-TTY (piped stdin).

All three scenarios pass cleanly. Total installer wall-clock cost
remains under 200ms; the live-progress lifecycle adds negligible
overhead because most file copies finish in under 50ms (sub-50ms ops skip
the "doing" state and go straight to `✓`).

## Method

```bash
TMP=$(mktemp -d)
START=$(date +%s%N)
printf 'QS-FAWZI-01\n<CHOICE>\n' | HOME="$TMP" node bin/install.js > log.txt 2>&1
END=$(date +%s%N)
echo "wall-clock: $(( (END-START)/1000000 ))ms"
```

ANSI codes are stripped from the captured logs below for readability. The
real install renders the same lines with OKLCH-tinted teal / dim white
/ green / yellow per `bin/qualia-ui.js`.

## Scenario 1 — Claude Code only (target=1, the default)

**Wall-clock:** 99ms · **Lines:** 202 · **Result:** `~/.claude/` populated
(11 entries: agents, bin, CLAUDE.md, hooks, knowledge, qualia-guide.md,
qualia-references, qualia-templates, rules, settings.json, skills),
`~/.codex/` not created.

```
⬢ Q U A L I A
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Framework v5.1.0 · Qualia Solutions
Plan → Build → Verify → Ship
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Enter install code: QS-FAWZI-01

✓ Welcome, Fawzi Goussous
Role: OWNER

Where would you like to install Qualia?

[1] Claude Code only — recommended, full feature set
[2] OpenAI Codex only — AGENTS.md (Codex's open standard)
[3] Both — max compatibility

Choice [1]: 1

Target: Claude Code

▸ Skills
────────────────────────────────────────
✓ qualia
✓ qualia-build
✓ qualia-debug
... (33 skills total) ...
✓ zoho-workflow
└─ 33 skills · 4ms

▸ Agents
────────────────────────────────────────
✓ builder.md
... (9 agents total) ...
✓ visual-evaluator.md
└─ 9 agents · 0ms

▸ Rules
────────────────────────────────────────
... (10 rules) ...
└─ 10 rules · 0ms

▸ Hooks (12 wired)
▸ Templates (16 entries, recursive)
▸ Knowledge layer (initialized on first install)
▸ References (methodology docs)
▸ Configuration (CLAUDE.md role substituted)
▸ Scripts (state.js, qualia-ui.js, etc.)
▸ Knowledge Base (learned-patterns / common-fixes / client-prefs)
▸ ERP Integration (key opt-in)

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⬢ INSTALLED
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Fawzi Goussous · OWNER · v5.1.0

Targets Claude Code
Time 99ms

Skills 33 Agents 9 Hooks 12
Rules 10 Scripts 7 Templates 16

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Quick Start
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

1. Restart Claude Code (loads new settings)
2. cd into any project and run claude
3. Try /qualia-new — kickoff a new project

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Welcome to the future with Qualia.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
```

## Scenario 2 — Codex only (target=2)

**Wall-clock:** 183ms (extra cost is the `which codex` probe via
`spawnSync`) · **Lines:** 56 · **Result:** `~/.codex/AGENTS.md` written
with `Role: OWNER` substituted, `~/.claude/` not touched.

```
⬢ Q U A L I A
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Framework v5.1.0 · Qualia Solutions
...

Choice [1]: 2

Target: Codex

▸ Codex
────────────────────────────────────────
✓ AGENTS.md (configured as OWNER)
└─ Codex install scope: AGENTS.md only — Codex's runtime does not currently
consume the framework's skills/hooks/agents on disk. AGENTS.md carries
the rules; commands route through Claude Code.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⬢ INSTALLED
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Fawzi Goussous · OWNER · v5.1.0

Targets Codex
Time 183ms

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Quick Start
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

1. Open Codex in any project
2. Codex picks up ~/.codex/AGENTS.md automatically
3. Ask Codex about Qualia rules — they're in AGENTS.md
```

If `which codex` fails (CLI not installed), the section prints a soft
warning before the file write; AGENTS.md is still written so the
user is set up for when they install Codex via
`npm install -g @openai/codex`.

## Scenario 3 — Both (target=3)

**Wall-clock:** 193ms · **Lines:** 209 · **Result:** Both `~/.claude/`
and `~/.codex/AGENTS.md` populated. Final summary shows
`Targets Claude Code · Codex`.

The Claude install runs first (identical to Scenario 1), then the Codex
section appends:

```
▸ Codex
────────────────────────────────────────
✓ AGENTS.md (configured as OWNER)
└─ Codex install scope: AGENTS.md only — ...
```

Final card:

```
Targets Claude Code · Codex
Time 193ms
```

## Backward compatibility — legacy single-line piped install

```bash
echo "QS-FAWZI-01" | npx qualia-framework install
```

This still works unchanged. The target prompt sees EOF on stdin and normalizes
to `1` (Claude only). Confirmed by test 121 in `tests/bin.test.sh`.

## TTY-degradation verification

In non-TTY mode (output piped to a file), the live-progress primitives
in `bin/qualia-ui.js` skip:

- The Braille spinner (`spinner()`) — prints a one-shot `→ text` /
  `✓ text` instead of an animated frame loop.
- The cursor-up / clear-line overwrites (`step()`) — skips the
  `⏳ doing` line and prints the result line directly when finalized.
- Cursor hide/show escapes — never emitted.

Verified by test 124: grepping a piped install log for bare `\r`, `?25`
(hide-cursor), and `\[2K` (clear-line) returns nothing.

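A minimal sketch of that gating (assumed shape; the real primitives live in `bin/qualia-ui.js`):

```js
// Hypothetical sketch: gate every live-progress escape code on isTTY.
const tty = process.stdout.isTTY === true;

async function step(text, work) {
  if (tty) process.stdout.write(`⏳ ${text}`); // live "doing" line, TTY only
  const result = await work();
  if (tty) process.stdout.write('\r\x1b[2K');  // return to column 0 + clear line
  console.log(`✓ ${text}`);                    // one-shot result line either way
  return result;
}
```
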
## Timing budget

| Scenario | Wall-clock | Lines emitted | Notes |
|----------|------------|---------------|-------|
| 1 — Claude only | 99ms | 202 | Baseline |
| 2 — Codex only | 183ms | 56 | + `which codex` probe (~80ms on Node 22 / Linux) |
| 3 — Both | 193ms | 209 | Claude + Codex back-to-back |

The spec budget was "≤ 2× the current install time." v5.1 stays comfortably
under it — the live updates and section timing add only the cost of the
extra console.log calls (negligible) and a couple of `Date.now()` calls
per section.

## Known limitations / deferred to v5.2

- **Codex skills/hooks/agents not mirrored.** Codex uses `.toml` agent
  format and a different hook schema. AGENTS.md is the rule layer they
  share; the rest stays Claude-only until Codex's on-disk surface
  stabilizes for cross-mapping.
- **Spinner cosmetics on `cmd.exe`.** Braille frames may render as
  boxes on legacy `cmd.exe`. Modern Windows Terminal handles them. The
  install completes correctly either way; this is purely cosmetic.
- **`Time` row precision.** Sub-second installs show `Xms`; ≥1s installs
  show `X.Ys`. Either format is grep-friendly (regex
  `Time *[0-9]+(\.[0-9]+)?(ms|s)`).

## What this enables

A user piping `npx qualia-framework install` from a fresh box now has
a one-keystroke choice between a Claude-only install (the recommended
default) and shipping Qualia's rule layer to OpenAI Codex via the open
AGENTS.md standard. The live-progress redesign means even a 5-second
install feels like an intentional product, not a hung process. First
impressions match the rest of the framework's polish.

@@ -0,0 +1,185 @@ package/docs/playwright-loop-builder-prompt.md
# Playwright Visual-Polish Loop — Builder Agent Prompt

**Hand this entire file to a fresh Claude Code session.** Self-contained — no context from the originating session is needed.

---

## You are building a feature for the Qualia Framework v5.1

The Qualia Framework is a Claude Code workflow framework at `/home/qualia-new/qualia-framework` (npm package `qualia-framework`, current version 5.0.0). It manages full-stack project delivery for Qualia Solutions (Nicosia, Cyprus). It already has 32 skills, 12 hooks, 8 agents, 24 templates, and 260+ tests. Your job is to add ONE new flagship capability for v5.1: an autonomous visual-polish loop that uses Playwright to screenshot live pages and self-correct the frontend until it is visually correct.

## Why this exists (the friction it fixes)

Per `/insights` data from the framework owner Fawzi (122 sessions, 292 commits, 10 days), the #1 documented friction pattern is **design iteration churn** — hero videos, mobile layouts, and responsive breakpoints requiring 5-10 manual rounds before landing. Quotes from his transcripts include:

- "many frustrating iterations" on hero video layouts
- "what did u do" after a CSS regression
- "first showcase used basic HTML/CSS animation; you had to explicitly request 'proper design animation three js or framer motion'"
- "FUCK U" / "OMG I TOLD U I CHANGED THE PAGE SO U STOP LYING" — clusters around Claude not seeing what's actually rendered

**The root cause:** the framework's design QA today is text-based. Slop-detect grep-scans CSS for em-dashes/banned fonts. The verifier scores 8 design dimensions by reading TSX/CSS, not by looking at rendered pages. When something fails visually but passes code review (most hero-video, mobile-layout, and responsive-breakpoint bugs), there's no feedback loop. The user has to look, complain, iterate.

**Your feature closes that loop.** A new skill `/qualia-polish-loop` takes a URL + design brief, screenshots at multiple viewports, evaluates against the brief using vision, identifies issues, edits files, redeploys to a Vercel preview, loops up to N times until criteria pass, and stops only when it's actually correct (or hits a kill-switch).

## What "good" looks like (success criteria)

The feature must:

1. **See its own work.** Screenshots at mobile (375px), tablet (768px), desktop (1440px) at minimum, captured via Playwright MCP.
2. **Anchor evaluation rigorously.** Vision model judgments must be scored against the project's `.planning/DESIGN.md` brief AND the `rules/design-rubric.md` 8-dimension scoring (Typography / Color cohesion / Spatial rhythm / Layout originality / Shadow & depth / Motion intent / Microcopy specificity / Container depth). Each dimension scored 1-5 with evidence; ANY <3 fails the iteration.
3. **Iterate with discipline.** Max 8 iterations per loop invocation. Hard kill-switch if the same issue recurs 3 times (regression-stop). Per-iteration: identify TOP 3 issues, edit relevant files, redeploy, re-screenshot, re-evaluate.
4. **Stop only when correct.** Success = all 8 dimensions ≥ 3 AND no critical-severity issues remain.
5. **Token-discipline.** Each iteration uses ≤ 4 vision evaluations (3 viewports + 1 holistic). Estimate token cost upfront and warn user if budget will exceed 100K tokens.
6. **Never silently destroy work.** All file edits go through `git commit` per iteration so any iteration is revertable. Failed iterations leave clear `[ITERATION-N]` commit prefixes for cleanup.
7. **Integrate with the framework.** Honors all framework conventions: PRODUCT.md / DESIGN.md / CONTEXT.md as substrate, slop-detect at commit, qualia-ui banner, state.js telemetry.

## Architecture (the design is yours to refine)

```
/qualia-polish-loop {url} [--brief design-brief.md] [--max 8] [--viewports 375,768,1440]

        ↓

Pre-flight (sequential, me)
├─ Read .planning/PRODUCT.md (register, anti-references, voice)
├─ Read .planning/DESIGN.md (color strategy, scene sentence, palette)
├─ Read rules/design-rubric.md (8-dim scoring criteria)
├─ Read brief argument if provided, else use DESIGN.md
└─ Estimate token budget. Warn if > 100K. AskUserQuestion to confirm proceed.

        ↓

Loop (max 8 iterations):
├─ Iteration N starts: log to .planning/visual-polish-loop.md
├─ Capture: 3 viewports via Playwright MCP → save to /tmp/qpl-{N}/
├─ Evaluate: spawn vision agent with screenshots + brief + rubric
│   Returns: per-dim 1-5 scores + evidence + top 3 issues + severity
├─ Decide: all dims ≥ 3 AND no critical? → SUCCESS, exit loop
├─ Else: regression check — if same issue recurred 3x → KILL, exit with FAIL
├─ Else: spawn 1 builder per top-issue (parallel, max 3) to fix
│   Each builder: read affected file, apply fix, slop-detect, commit
├─ Redeploy: vercel deploy --prebuilt OR `npm run dev` heartbeat check
└─ Loop back to capture

        ↓

Post-loop:
├─ Write .planning/visual-polish-loop.md (full report: iterations, scores, fixes)
├─ Show before/after screenshots side-by-side via qualia-ui
├─ git add .planning/visual-polish-loop.md && commit
└─ Tell user: SUCCESS / KILLED-AT-N / OUT-OF-BUDGET
```

## Integration points (read these before designing)

Before writing any code, read these files to understand the framework:

1. `/home/qualia-new/qualia-framework/CLAUDE.md` — project rules, instruction-budget discipline
2. `/home/qualia-new/qualia-framework/rules/design-rubric.md` — 8-dimension 1-5 scoring criteria with anchored definitions per dimension
3. `/home/qualia-new/qualia-framework/rules/design-laws.md` — non-negotiable design rules (OKLCH-only, banned fonts, side-stripe-borders, gradient-text bans, glassmorphism, etc.)
4. `/home/qualia-new/qualia-framework/templates/PRODUCT.md` — what the agent will read as register/anti-references/voice substrate
5. `/home/qualia-new/qualia-framework/templates/DESIGN.md` — what the agent will read as visual contract
6. `/home/qualia-new/qualia-framework/skills/qualia-polish/SKILL.md` — existing scope-adaptive polish skill; understand its modes (Component / Section / App / Redesign / Critique / Quick) and how this loop fits as a 7th mode OR a separate skill
7. `/home/qualia-new/qualia-framework/agents/verifier.md` — how the existing verifier scores design dimensions today (text-based)
8. `/home/qualia-new/qualia-framework/skills/qualia-build/SKILL.md` — pattern for spawning builder subagents in parallel
9. `/home/qualia-new/qualia-framework/bin/qualia-ui.js` — the UI helper for banners/dividers/end-cards
10. `/home/qualia-new/qualia-framework/bin/state.js` — for telemetry transitions if you want to log loop iterations
11. `/home/qualia-new/qualia-framework/bin/slop-detect.mjs` — must be invoked on every committed file, every iteration

## External dependencies you'll integrate

1. **Playwright MCP** — verify it's available via `claude mcp list`; if it's missing, add it. Use `mcp__playwright__navigate`, `mcp__playwright__take_screenshot` (or equivalent — verify exact tool names by listing available MCP tools). Setup may require:
   ```
   claude mcp add playwright -- npx -y @playwright/mcp@latest
   ```
   Plus, on Linux/CI: `npx playwright install chromium` to get the browser binaries.

2. **Vision model** — Claude (you, the agent) reads images natively. The screenshots get attached to the spawned vision agent's prompt. Use the Read tool with image file paths.

3. **Vercel deploys** — use `vercel deploy` (not `--prod`) to publish a preview each iteration. Read `.vercel/project.json` for project linkage. The dev-mode alternative is `npm run dev` + curl heartbeat, but preview deploys give a stable URL for re-screenshotting from anywhere.

## Decision points the user (Fawzi) will care about

You MUST present these via `AskUserQuestion` BEFORE starting the loop on first invocation. Each is a load-bearing choice:

1. **Brief source** — use `.planning/DESIGN.md` (default) OR a separate `--brief` markdown file (override). If neither exists, halt: "No design brief found. Run /qualia-new or pass --brief."

2. **Reference screenshots (optional but recommended)** — "Do you have a reference image of what this should look like? Paste path or skip." Reference-anchored vision is dramatically more reliable than rubric-only.

3. **Auto-deploy strategy** — "Each iteration redeploys to Vercel preview (slower, real environment) or runs `npm run dev` and screenshots localhost (faster, dev artifacts may differ). Pick: vercel-preview / dev-localhost." Default: dev-localhost for iteration speed, vercel-preview for final.

4. **Token budget cap** — "Estimated 60-100K tokens for 8 iterations. Cap at 100K (default), 200K (generous), or 50K (tight)?"

## Files to create

- `/home/qualia-new/qualia-framework/skills/qualia-polish-loop/SKILL.md` — the new skill (frontmatter + workflow + decision gates). Target: <250 lines per Matt Pocock's progressive-disclosure rule.
- `/home/qualia-new/qualia-framework/skills/qualia-polish-loop/REFERENCE.md` — verbatim agent prompt templates (vision-eval prompt, fix-builder prompt, etc.).
- `/home/qualia-new/qualia-framework/skills/qualia-polish-loop/scripts/playwright-capture.mjs` — Node ESM helper that takes URL + viewports[] + outDir, drives the Playwright MCP via subprocess OR uses Playwright directly via `npm install playwright` (your call).
- `/home/qualia-new/qualia-framework/skills/qualia-polish-loop/scripts/score.mjs` — utility that takes a scored JSON object (8 dim scores + evidence) and computes pass/fail per the rubric formula (see the sketch after this list).

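A minimal sketch of the pass/fail rule `score.mjs` needs to encode (input field names are assumptions; the authoritative criteria live in `rules/design-rubric.md`):

```js
// score.mjs sketch — pass = every dimension ≥ 3 AND no critical issue left.
export function verdict({ scores, issues }) {
  const failingDims = Object.entries(scores) // e.g. { typography: 4, color_cohesion: 2, ... }
    .filter(([, score]) => score < 3)
    .map(([dim]) => dim);
  const criticals = issues.filter((i) => i.severity === 'critical').length;
  return { pass: failingDims.length === 0 && criticals === 0, failingDims, criticals };
}
```
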
## Files to modify

- `/home/qualia-new/qualia-framework/bin/install.js` — register the new skill (the skills copy is recursive and should auto-pick up the new folder, but verify).
- `/home/qualia-new/qualia-framework/skills/qualia-road/SKILL.md` — add `/qualia-polish-loop` to the v5.1 alignment-substrate list.
- `/home/qualia-new/qualia-framework/CHANGELOG.md` — add a v5.1.0 entry.
- `/home/qualia-new/qualia-framework/package.json` — bump version to 5.1.0.
- `/home/qualia-new/qualia-framework/tests/bin.test.sh` — add install assertions for the new skill folder (matching the v5.0 pattern at lines ~960-980).

## Hard constraints (non-negotiable)

1. **Vision-eval discipline.** The vision agent MUST be spawned with the rubric criteria inlined. Never spawn with "tell me what you think" — that's how you get "looks great!" hallucinations. Use the format Matt Pocock uses for grilling: ask one question at a time per dimension, require evidence on the next line.

2. **Anti-loop discipline.** Track issue fingerprints across iterations. If issue X (same file:line, same dim, same severity) appears in 3 consecutive iterations, KILL with `LOOP_REGRESSION_DETECTED` and write a diagnostic to the report.

3. **Per-iteration commits.** Every file edit gets its own commit with prefix `qpl-N: {issue-slug}`. The user must be able to `git revert` any iteration cleanly.

4. **Slop-detect gate.** Before any commit, run `node ~/.claude/bin/slop-detect.mjs {touched files}`. Critical findings BLOCK the commit. The fix-builder must rework the fix and retry.

5. **No background processes after exit.** Clean up any Playwright browser processes, temp screenshots, dev-server PIDs.

6. **Honor `prefers-reduced-motion`.** Vision evaluation must NOT penalize an absence of motion if the page is motion-reduced (read the user's OS-level setting via Playwright if possible, else default to motion-on).

7. **Do not modify framework agents/* files.** Specifically: don't touch agents/builder.md, agents/verifier.md, etc. The loop's vision evaluator is a NEW agent role file — create `agents/visual-evaluator.md` if needed.

## Self-test scenarios (you must run these before declaring DONE)

Build the feature, then run it in 3 scenarios. Document outcomes in `/home/qualia-new/qualia-framework/docs/playwright-loop-pilot-results.md`:

**Scenario 1 — Synthetic clean page.** Create a deliberately well-designed test page (use Tailwind v4, OKLCH palette, varied layout, all 7 states, 65ch line length on body). Run the loop on it. Expected: SUCCESS in 1-2 iterations with all dims ≥ 4.

**Scenario 2 — Synthetic broken page.** Create a deliberately bad page (Inter font, blue-purple gradient, identical 3-card grid, hero centered + gradient bg + 2 CTAs, em-dashes, side-stripe borders). Run the loop. Expected: identifies all anti-patterns, fixes them, ends with SUCCESS in 4-6 iterations.

**Scenario 3 — Stress test the kill-switch.** Manually inject a fix-builder that always reintroduces the same issue (e.g. always rewrites color to `#000`). Run the loop. Expected: KILLED at iteration 4 with `LOOP_REGRESSION_DETECTED` after 3 consecutive recurrences.

For each scenario record: total iterations, total tokens consumed, final scores, screenshots before/after, time elapsed.

## Things you MUST NOT do

- Do not deploy to production. Vercel preview only. Never `vercel --prod`.
- Do not touch any file in `.planning/` other than writing your own report.
- Do not add new dependencies without justification — Playwright MCP + native Node is the budget.
- Do not increase any global SKILL.md or CLAUDE.md sizes (instruction-budget discipline).
- Do not invent new design rules — score against `rules/design-rubric.md` as it is. If the rubric is wrong, that's a separate problem.
- Do not "just make it work" by iterating forever. Hard cap 8.

## Deliverables (the DONE definition)

You return DONE when ALL of these are true:

1. ✅ `/qualia-polish-loop` skill exists and installs via `node bin/install.js`
2. ✅ All 3 self-test scenarios pass per the spec above (results doc written)
3. ✅ `npm test` shows the new install assertions passing (the passing-test count went up by ≥ 4)
4. ✅ `node bin/slop-detect.mjs` clean on all new files
5. ✅ CHANGELOG v5.1.0 entry present and slop-clean
6. ✅ A 1-page integration note at `docs/playwright-loop-design-notes.md` documenting: how it integrates with existing /qualia-polish, where it differs, when to use which, and what's deferred to v5.2

Return DONE with the test results and the path to the pilot-results doc.

## When you encounter unknowns

The Playwright MCP setup, vision-eval reliability, and Vercel preview deploy timing are all real unknowns. When you hit one:

- Check `claude mcp list` to see what's actually wired in this session
- Try ONE approach, measure its reliability via Scenario 1, iterate
- If something is genuinely blocking after 2 attempts, write a `BLOCKED — {what}` report to `docs/playwright-loop-blockers.md` and surface it back to the user. Do NOT silently work around blockers — the framework owner needs to know what's brittle before relying on it.

## One last thing

Fawzi (the framework owner) will read your report. He's a senior engineer with strong design sense and very low tolerance for flakiness. If the loop kind-of-works but is unreliable, mark it experimental and say so loudly. If it works well, that's the v5.1 headline. Honest reporting beats good-news theater every time.
@@ -0,0 +1,108 @@ package/docs/playwright-loop-design-notes.md
# /qualia-polish-loop — Design notes

One-page integration narrative. Companion to the SKILL.md (`skills/qualia-polish-loop/SKILL.md`) and pilot results (`docs/playwright-loop-pilot-results.md`).

## What it is

A skill that takes a URL + design brief, screenshots at three viewports (mobile / tablet / desktop), evaluates with vision against the 8-dimension rubric, fixes the top issues with parallel builders, re-screenshots, and loops until every dimension scores at least 3 (success) or one of three kill conditions trips: regression, budget, max-iterations.

## Why it exists separately from `/qualia-polish`

`/qualia-polish` (v4.5.0+) is **scope-adaptive** with six modes (Component / Section / App / Redesign / Critique / Quick). Its evaluation is **text-first**: it reads CSS and TSX, runs `slop-detect`, runs Lighthouse if a dev server is up, and (in Redesign scope only) runs a 2-iteration vision loop as Stage 4. The vision step is one stage of one mode.

`/qualia-polish-loop` is **vision-first** and built to actually iterate. It assumes a running URL, captures real renders, and treats the screenshot as primary evidence. The loop length is configurable up to 8; regressions are tracked with fingerprints; every fix is its own revertable commit.

These are not redundant. They solve different failure modes:

| Failure mode | `/qualia-polish` catches it | `/qualia-polish-loop` catches it |
|---|---|---|
| Banned font in source | YES (slop-detect grep on CSS) | YES (vision sees Inter rendering) |
| Hardcoded hex in JSX | YES (slop-detect) | NO directly — would manifest as Color < 3 |
| Three-column card grid | YES (slop-detect) | YES (vision sees identical cards) |
| Hero video framed wrong on mobile | NO (text doesn't reveal mobile cropping) | YES (mobile screenshot + min-aggregate) |
| Touch targets < 44px | NO (slop-detect doesn't measure) | YES (visible on 375px capture) |
| Spacing rhythm "feels off" | NO | YES (vision scores Spatial < 3) |
| `prefers-reduced-motion` working correctly | YES (CSS grep) | YES (capture with reduced-motion forced — deferred to v5.1.1) |

The visual-only failures are exactly Fawzi's `/insights`-documented friction pattern: hero videos cropped wrong, mobile spacing collapsing, motion missing. `slop-detect` was never going to catch those — it doesn't see what the browser draws.

## When to use which

| User says... | Use |
|---|---|
| "fix the button styling" | `/qualia-polish src/components/Button.tsx` |
| "the whole dashboard needs a design pass" | `/qualia-polish app/dashboard` |
| "it doesn't look right on mobile" | `/qualia-polish-loop http://localhost:3000` |
| "score without fixing" | `/qualia-polish --critique` |
| "iterate on the home page until the hero video is right" | `/qualia-polish-loop http://localhost:3000 --max 6` |
| "ship-ready final check" | `/qualia-polish-loop` then `/qualia-ship` |

The smart router (`/qualia`) does not auto-route to `/qualia-polish-loop` because it requires a running URL. Users invoke it explicitly.

## Architectural choices and why

### Chromium binary as default backend

The capture script tries four backends in order:

1. `import('playwright')` — when the project has it as a dep
2. `import('playwright-core')` — same API, lighter package
3. `~/.cache/ms-playwright/chromium-{version}/chrome-{linux64,linux,mac,win}/chrome` — Playwright-cached chromium binary, used directly via `--headless=new --screenshot`
4. `which google-chrome` / `chromium` / `chromium-browser` / `chrome` — system browser

The earlier draft of this skill used `mcp__claude-in-chrome__*` tools as primary, but those require the user to install a Chrome browser extension and have Chrome running with it active — a prerequisite many environments can't meet (browserless servers, CI). The chromium-binary fallback removes that prerequisite: any machine with Google Chrome on PATH or Playwright-cached binaries can run the loop.

The Playwright SDK is preferred when available because its `waitUntil: 'networkidle'` is more deterministic than `--virtual-time-budget`. Binary mode is the safety net.

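A condensed sketch of that resolution order (assumed shape — the real logic, including the `~/.cache/ms-playwright` probe, lives in `scripts/playwright-capture.mjs`):

```js
import { execFileSync, execSync } from 'node:child_process';

// Resolve a capture backend: Playwright SDK first, system browser as fallback.
// (The cached-chromium probe from step 3 is omitted here for brevity.)
async function resolveBackend() {
  for (const pkg of ['playwright', 'playwright-core']) {
    try { return { sdk: (await import(pkg)).chromium }; } catch { /* not installed */ }
  }
  for (const bin of ['google-chrome', 'chromium', 'chromium-browser', 'chrome']) {
    try { return { binary: execSync(`which ${bin}`, { encoding: 'utf8' }).trim() }; } catch { /* keep looking */ }
  }
  throw new Error('no Playwright SDK and no Chrome/Chromium binary on PATH');
}

// Binary mode: one headless-Chrome invocation per viewport.
function captureWithBinary(binary, url, { width, height }, outFile) {
  execFileSync(binary, [
    '--headless=new',
    `--screenshot=${outFile}`,
    `--window-size=${width},${height}`,
    url,
  ]);
}
```
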
### Deterministic state outside the LLM context

`scripts/loop.mjs` is a CLI state machine. The iteration counter, token usage, fingerprint history, and verdict all live in a JSON file at `/tmp/qpl-{ts}/state.json`. Claude reads compact JSON (`{verdict, iteration, top_issues}`) per iteration — not the full state. This keeps per-iteration token cost roughly constant (~14.5K) instead of growing with iteration count.

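An illustrative shape for that compact per-iteration read (field names are assumptions; the authoritative shape is whatever `scripts/loop.mjs` writes):

```json
{
  "verdict": "continue",
  "iteration": 3,
  "top_issues": [
    {
      "dim": "spatial_rhythm",
      "file": "app/page.tsx",
      "severity": "major",
      "description": "section padding collapses at 375px"
    }
  ]
}
```
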
### Vision-evaluator anchoring

The single biggest failure mode for vision-eval is "looks great!" hallucinations. The visual-evaluator agent (`agents/visual-evaluator.md`) inlines the rubric criteria with anchored definitions (`1 = fails, 2 = below acceptable, 3 = acceptable — DEFAULT, 4 = good, 5 = excellent`) and requires evidence on the line after each score. The instruction `DEFAULT TO 3` is repeated three times in the role file. Calibration examples show "good" and "rejected" evaluations side by side.

The output is a single fenced JSON block — no prose — which the orchestrator parses without re-asking. The aggregate score is the **minimum** across viewports per dimension, so a layout that's elegant on desktop but breaks at 375px is a fail. This was deliberate: most documented visual regressions are mobile-only.

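For illustration, the kind of fenced JSON the evaluator returns (key names assumed, evidence truncated to one dimension here; the real contract is defined in `agents/visual-evaluator.md`):

```json
{
  "viewport": "375",
  "scores": {
    "typography": 4,
    "color_cohesion": 2,
    "spatial_rhythm": 3,
    "layout_originality": 3,
    "shadow_depth": 4,
    "motion_intent": 3,
    "microcopy_specificity": 4,
    "container_depth": 3
  },
  "evidence": {
    "color_cohesion": "hero CTA renders a raw blue-purple gradient against the OKLCH palette"
  },
  "top_issues": [
    {
      "dim": "color_cohesion",
      "severity": "critical",
      "file": "app/page.tsx",
      "description": "gradient text on hero heading"
    }
  ]
}
```
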
### Regression detection via fingerprints

Each top-issue is hashed to a fingerprint = `{dim}__{file_basename}__{first_32_chars_of_description}` (lowercased, non-word chars collapsed). If the same fingerprint appears in **3 consecutive iterations**, the loop kills with `LOOP_REGRESSION_DETECTED`. Non-consecutive recurrences don't kill — they're a normal pattern when a fix worked, then a different change broke the same dimension again.

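A minimal sketch of that check (assumed field names; the real tracking lives in the `scripts/loop.mjs` state machine):

```js
// Fingerprint per the scheme above: dim + file basename + first 32 chars
// of the description, lowercased with non-word runs collapsed.
const norm = (s) => s.toLowerCase().replace(/\W+/g, ' ').trim();

function fingerprint(issue) {
  const base = issue.file.split('/').pop();
  return `${norm(issue.dim)}__${base}__${norm(issue.description).slice(0, 32)}`;
}

// history: one Set of fingerprints per iteration, newest last (including
// the current iteration). Kill only on N consecutive recurrences.
function regressionDetected(history, fp, consecutive = 3) {
  if (history.length < consecutive) return false;
  return history.slice(-consecutive).every((iter) => iter.has(fp));
}
```
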
### Per-iteration commits

Every fix is its own git commit with a `qpl-{N}: {slug}` prefix. The orchestrator gates the commit through `slop-detect` first; critical findings BLOCK the commit and the fix-builder must retry. The user can `git revert` any single iteration cleanly without losing other fixes. The alternative (one squashed commit at the end) was rejected because it makes partial rollbacks impossible.

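The gate, sketched (paths per the framework's install layout; the exact invocation is an assumption):

```js
import { execFileSync } from 'node:child_process';
import { homedir } from 'node:os';
import { join } from 'node:path';

// Slop-detect runs first; a non-zero exit (critical findings) throws,
// so the commit below never happens and the fix-builder must retry.
function commitIteration(n, slug, files) {
  const slopDetect = join(homedir(), '.claude', 'bin', 'slop-detect.mjs');
  execFileSync('node', [slopDetect, ...files], { stdio: 'inherit' });
  execFileSync('git', ['add', ...files]);
  execFileSync('git', ['commit', '-m', `qpl-${n}: ${slug}`]);
}
```
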
## Deferred to v5.2

1. **`prefers-reduced-motion` capture mode** — the capture script doesn't yet force `prefers-reduced-motion: reduce` in the headless run. The vision evaluator already handles the case correctly (it scores motion on CSS-declaration quality, not visible animation, when reduced motion is on); adding `--force-prefers-reduced-motion` via Chrome flags is straightforward.

2. **Vercel-preview deploy mode end-to-end** — `--deploy preview` is wired in SKILL.md (each iteration redeploys to a Vercel preview URL), but not exercised in the pilot. Real-project use will surface deploy-latency edge cases. Once validated, this can become the default for production iteration.

3. **Multi-route sweeps** — one URL per invocation today. Multi-route would mean batching `/route-a, /route-b, /route-c` and running the loop per route, then producing a unified report. Useful for marketing-site polish where the brand has to read consistently across pages.

4. **Reference-image structural similarity** — `--ref` is accepted but the comparison is rubric-anchored (the evaluator looks at both the current screenshot and the reference, scores against the rubric). True pixel/structural-similarity comparison would need an embedding model and more careful scoring.

5. **Lighthouse + axe integration into the loop** — currently `/qualia-polish` Stage 3 runs Lighthouse and axe; the loop does not. A future version could pipe a11y/performance scores from Lighthouse into the same iteration as the rubric eval, enabling "fix design AND a11y in the same loop."

6. **Real token telemetry** — token costs are estimated (~14.5K/iter). Wiring real `tokens_used` from the Anthropic API would let `--budget` work against actual spend instead of estimates.

## What can go wrong, and how the loop handles it

| Failure mode | Handling |
|---|---|
| Vision says "looks great" to a broken page | Anchored rubric + DEFAULT TO 3 + required evidence per dimension. Without evidence the score is rejected. |
| Same issue recurs forever | Fingerprint kill-switch at 3 consecutive iterations. |
| Fix-builder introduces a different issue | The next iteration catches it; if it persists 3 iters, regression-kill fires on that fingerprint instead. |
| Token budget blown | Verdict transitions to `out_of_budget` deterministically. Loop exits with partial-progress report. |
| Dev server dies mid-loop | curl heartbeat after every redeploy; loop halts with a clear error and the user can resume from the saved state. |
| Capture fails (browser crash) | Capture script returns exit 1 with per-viewport error. Loop can retry once or HALT. |
| Slop-detect blocks a fix-builder commit | The fix-builder retries; if it can't, returns BLOCKED. Loop's regression detector sees the same issue persist and kills cleanly. |

## How to reason about cost

Each iteration = ~14.5K tokens (3 PNG reads + rubric + brief + previous-iteration delta + 3 fix-builder spawns).
Each iteration = ~6-15s wall clock for capture + vision-eval; fix-builders run in parallel.

Six iterations on a real Next.js dev server with HMR ≈ ~90K tokens, ~90 seconds. That's the realistic cost envelope. The 8-iter ceiling at 120K tokens is for projects with deep design debt where the loop has to iterate on multiple dimensions across many fix passes; in practice most invocations are ≤ 4 iterations.

The loop is cheaper than the human alternative (5-10 manual rounds at 5-15 minutes each = 25-150 minutes of human time) and terminates deterministically thanks to the hard caps on iterations and budget.