qualia-framework 4.5.0 → 5.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (66)
  1. package/AGENTS.md +24 -0
  2. package/CLAUDE.md +12 -75
  3. package/README.md +23 -16
  4. package/agents/builder.md +9 -21
  5. package/agents/planner.md +8 -0
  6. package/agents/verifier.md +8 -0
  7. package/agents/visual-evaluator.md +132 -0
  8. package/bin/cli.js +54 -18
  9. package/bin/install.js +369 -29
  10. package/bin/qualia-ui.js +208 -1
  11. package/bin/slop-detect.mjs +5 -0
  12. package/bin/state.js +34 -1
  13. package/docs/install-redesign-builder-prompt.md +290 -0
  14. package/docs/install-redesign-pilot.md +234 -0
  15. package/docs/playwright-loop-builder-prompt.md +185 -0
  16. package/docs/playwright-loop-design-notes.md +108 -0
  17. package/docs/playwright-loop-pilot-results.md +170 -0
  18. package/docs/playwright-loop-tester-prompt.md +213 -0
  19. package/docs/polish-loop-supervised-run.md +111 -0
  20. package/docs/reviews/matt-pocock-skills-analysis.md +300 -0
  21. package/guide.md +9 -5
  22. package/hooks/env-empty-guard.js +74 -0
  23. package/hooks/pre-compact.js +19 -9
  24. package/hooks/pre-deploy-gate.js +8 -2
  25. package/hooks/pre-push.js +26 -12
  26. package/hooks/supabase-destructive-guard.js +62 -0
  27. package/hooks/vercel-account-guard.js +91 -0
  28. package/package.json +2 -1
  29. package/rules/design-brand.md +4 -0
  30. package/rules/design-laws.md +4 -0
  31. package/rules/design-product.md +4 -0
  32. package/rules/design-rubric.md +4 -0
  33. package/rules/grounding.md +4 -0
  34. package/skills/qualia-build/SKILL.md +40 -46
  35. package/skills/qualia-discuss/SKILL.md +51 -68
  36. package/skills/qualia-handoff/SKILL.md +1 -0
  37. package/skills/qualia-hook-gen/SKILL.md +206 -0
  38. package/skills/qualia-issues/SKILL.md +151 -0
  39. package/skills/qualia-map/SKILL.md +78 -35
  40. package/skills/qualia-new/REFERENCE.md +139 -0
  41. package/skills/qualia-new/SKILL.md +45 -121
  42. package/skills/qualia-optimize/REFERENCE.md +265 -0
  43. package/skills/qualia-optimize/SKILL.md +92 -232
  44. package/skills/qualia-plan/SKILL.md +58 -65
  45. package/skills/qualia-polish-loop/REFERENCE.md +265 -0
  46. package/skills/qualia-polish-loop/SKILL.md +201 -0
  47. package/skills/qualia-polish-loop/fixtures/broken.html +117 -0
  48. package/skills/qualia-polish-loop/fixtures/clean.html +196 -0
  49. package/skills/qualia-polish-loop/scripts/loop.mjs +323 -0
  50. package/skills/qualia-polish-loop/scripts/playwright-capture.mjs +206 -0
  51. package/skills/qualia-polish-loop/scripts/score.mjs +176 -0
  52. package/skills/qualia-prd/SKILL.md +199 -0
  53. package/skills/qualia-report/SKILL.md +141 -200
  54. package/skills/qualia-research/SKILL.md +28 -33
  55. package/skills/qualia-road/SKILL.md +103 -0
  56. package/skills/qualia-ship/SKILL.md +1 -0
  57. package/skills/qualia-task/SKILL.md +1 -1
  58. package/skills/qualia-test/SKILL.md +50 -2
  59. package/skills/qualia-triage/SKILL.md +152 -0
  60. package/skills/qualia-verify/SKILL.md +63 -104
  61. package/skills/qualia-zoom/SKILL.md +51 -0
  62. package/skills/zoho-workflow/SKILL.md +1 -1
  63. package/templates/CONTEXT.md +36 -0
  64. package/templates/decisions/ADR-template.md +30 -0
  65. package/tests/bin.test.sh +598 -7
  66. package/tests/state.test.sh +58 -0
@@ -0,0 +1,234 @@ package/docs/install-redesign-pilot.md
# Install Redesign — Pilot Results (v5.1.0)

Captured output and timing measurements for the three install-target
scenarios, run on Linux 6.19 / Node 22 / non-TTY (piped stdin).

All three scenarios pass cleanly. Total installer wall-clock cost
remains under 200ms; the live-progress lifecycle adds negligible
overhead because most file copies finish in under 50ms (sub-50ms ops skip
the "doing" state and go straight to `✓`).

## Method

```bash
TMP=$(mktemp -d)
START=$(date +%s%N)
printf 'QS-FAWZI-01\n<CHOICE>\n' | HOME="$TMP" node bin/install.js > log.txt 2>&1
END=$(date +%s%N)
echo "wall-clock: $(( (END-START)/1000000 ))ms"
```

ANSI codes are stripped from the captured logs below for readability. The
real install renders the same lines with OKLCH-tinted teal / dim white
/ green / yellow per `bin/qualia-ui.js`.

## Scenario 1 — Claude Code only (target=1, the default)

**Wall-clock:** 99ms · **Lines:** 202 · **Result:** `~/.claude/` populated
(11 entries: agents, bin, CLAUDE.md, hooks, knowledge, qualia-guide.md,
qualia-references, qualia-templates, rules, settings.json, skills),
`~/.codex/` not created.

```
⬢ Q U A L I A
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Framework v5.1.0 · Qualia Solutions
Plan → Build → Verify → Ship
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Enter install code: QS-FAWZI-01

✓ Welcome, Fawzi Goussous
Role: OWNER

Where would you like to install Qualia?

[1] Claude Code only — recommended, full feature set
[2] OpenAI Codex only — AGENTS.md (Codex's open standard)
[3] Both — max compatibility

Choice [1]: 1

Target: Claude Code

▸ Skills
────────────────────────────────────────
✓ qualia
✓ qualia-build
✓ qualia-debug
... (33 skills total) ...
✓ zoho-workflow
└─ 33 skills · 4ms

▸ Agents
────────────────────────────────────────
✓ builder.md
... (9 agents total) ...
✓ visual-evaluator.md
└─ 9 agents · 0ms

▸ Rules
────────────────────────────────────────
... (10 rules) ...
└─ 10 rules · 0ms

▸ Hooks (12 wired)
▸ Templates (16 entries, recursive)
▸ Knowledge layer (initialized on first install)
▸ References (methodology docs)
▸ Configuration (CLAUDE.md role substituted)
▸ Scripts (state.js, qualia-ui.js, etc.)
▸ Knowledge Base (learned-patterns / common-fixes / client-prefs)
▸ ERP Integration (key opt-in)

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⬢ INSTALLED
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Fawzi Goussous · OWNER · v5.1.0

Targets Claude Code
Time 99ms

Skills 33 Agents 9 Hooks 12
Rules 10 Scripts 7 Templates 16

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Quick Start
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

1. Restart Claude Code (loads new settings)
2. cd into any project and run claude
3. Try /qualia-new — kickoff a new project

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Welcome to the future with Qualia.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
```

## Scenario 2 — Codex only (target=2)

**Wall-clock:** 183ms (extra cost is the `which codex` probe via
`spawnSync`) · **Lines:** 56 · **Result:** `~/.codex/AGENTS.md` written
with `Role: OWNER` substituted, `~/.claude/` not touched.

```
⬢ Q U A L I A
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Framework v5.1.0 · Qualia Solutions
...

Choice [1]: 2

Target: Codex

▸ Codex
────────────────────────────────────────
✓ AGENTS.md (configured as OWNER)
└─ Codex install scope: AGENTS.md only — Codex's runtime does not currently
consume the framework's skills/hooks/agents on disk. AGENTS.md carries
the rules; commands route through Claude Code.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⬢ INSTALLED
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Fawzi Goussous · OWNER · v5.1.0

Targets Codex
Time 183ms

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Quick Start
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

1. Open Codex in any project
2. Codex picks up ~/.codex/AGENTS.md automatically
3. Ask Codex about Qualia rules — they're in AGENTS.md
```

If `which codex` fails (CLI not installed), the section prints a soft
warning before the file write; AGENTS.md is still written so the
user is set up for when they install Codex via
`npm install -g @openai/codex`.

## Scenario 3 — Both (target=3)

**Wall-clock:** 193ms · **Lines:** 209 · **Result:** Both `~/.claude/`
and `~/.codex/AGENTS.md` populated. Final summary shows
`Targets Claude Code · Codex`.

The Claude install runs first (identical to Scenario 1), then the Codex
section appends:

```
▸ Codex
────────────────────────────────────────
✓ AGENTS.md (configured as OWNER)
└─ Codex install scope: AGENTS.md only — ...
```

Final card:

```
Targets Claude Code · Codex
Time 193ms
```

## Backward compatibility — legacy single-line piped install

```bash
echo "QS-FAWZI-01" | npx qualia-framework install
```

This still works unchanged. The target prompt sees EOF on stdin and normalizes
to `1` (Claude only). Confirmed by test 121 in `tests/bin.test.sh`.

## TTY-degradation verification

In non-TTY mode (output piped to a file), the live-progress primitives
in `bin/qualia-ui.js` skip:

- The Braille spinner (`spinner()`) — prints a one-shot `→ text` /
  `✓ text` instead of an animated frame loop.
- The cursor-up / clear-line overwrites (`step()`) — skips the
  `⏳ doing` line and prints the result line directly when finalized.
- Cursor hide/show escapes — never emitted.

Verified by test 124: grepping a piped install log for bare `\r`, `?25`
(hide-cursor), and `\[2K` (clear-line) returns nothing.

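A minimal sketch of that gating (assumed shape; the real primitives live in `bin/qualia-ui.js`):

```js
// Hypothetical sketch: gate every live-progress escape code on isTTY.
const tty = process.stdout.isTTY === true;

async function step(text, work) {
  if (tty) process.stdout.write(`⏳ ${text}`); // live "doing" line, TTY only
  const result = await work();
  if (tty) process.stdout.write('\r\x1b[2K');  // return to column 0 + clear line
  console.log(`✓ ${text}`);                    // one-shot result line either way
  return result;
}
```
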
## Timing budget

| Scenario | Wall-clock | Lines emitted | Notes |
|----------|------------|---------------|-------|
| 1 — Claude only | 99ms | 202 | Baseline |
| 2 — Codex only | 183ms | 56 | + `which codex` probe (~80ms on Node 22 / Linux) |
| 3 — Both | 193ms | 209 | Claude + Codex back-to-back |

The spec budget was "≤ 2× the current install time." v5.1 stays comfortably
under it — the live updates and section timing add only the cost of the
extra console.log calls (negligible) and a couple of `Date.now()` calls
per section.

## Known limitations / deferred to v5.2

- **Codex skills/hooks/agents not mirrored.** Codex uses `.toml` agent
  format and a different hook schema. AGENTS.md is the rule layer they
  share; the rest stays Claude-only until Codex's on-disk surface
  stabilizes for cross-mapping.
- **Spinner cosmetics on `cmd.exe`.** Braille frames may render as
  boxes on legacy `cmd.exe`. Modern Windows Terminal handles them. The
  install completes correctly either way; this is purely cosmetic.
- **`Time` row precision.** Sub-second installs show `Xms`; ≥1s installs
  show `X.Ys`. Either format is grep-friendly (regex
  `Time *[0-9]+(\.[0-9]+)?(ms|s)`).

## What this enables

A user piping `npx qualia-framework install` from a fresh box now has
a one-keystroke choice between a Claude-only install (the recommended
default) and shipping Qualia's rule layer to OpenAI Codex via the open
AGENTS.md standard. The live-progress redesign means even a 5-second
install feels like an intentional product, not a hung process. First
impressions match the rest of the framework's polish.

@@ -0,0 +1,185 @@ package/docs/playwright-loop-builder-prompt.md
# Playwright Visual-Polish Loop — Builder Agent Prompt

**Hand this entire file to a fresh Claude Code session.** Self-contained — no context from the originating session is needed.

---

## You are building a feature for the Qualia Framework v5.1

The Qualia Framework is a Claude Code workflow framework at `/home/qualia-new/qualia-framework` (npm package `qualia-framework`, current version 5.0.0). It manages full-stack project delivery for Qualia Solutions (Nicosia, Cyprus). It already has 32 skills, 12 hooks, 8 agents, 24 templates, and 260+ tests. Your job is to add ONE new flagship capability for v5.1: an autonomous visual-polish loop that uses Playwright to screenshot live pages and self-correct the frontend until it is visually correct.

## Why this exists (the friction it fixes)

Per `/insights` data from the framework owner Fawzi (122 sessions, 292 commits, 10 days), the #1 documented friction pattern is **design iteration churn** — hero videos, mobile layouts, and responsive breakpoints requiring 5-10 manual rounds before landing. Quotes from his transcripts include:

- "many frustrating iterations" on hero video layouts
- "what did u do" after a CSS regression
- "first showcase used basic HTML/CSS animation; you had to explicitly request 'proper design animation three js or framer motion'"
- "FUCK U" / "OMG I TOLD U I CHANGED THE PAGE SO U STOP LYING" — clusters around Claude not seeing what's actually rendered

**The root cause:** the framework's design QA today is text-based. Slop-detect grep-scans CSS for em-dashes/banned fonts. The verifier scores 8 design dimensions by reading TSX/CSS, not by looking at rendered pages. When something fails visually but passes code review (most hero-video, mobile-layout, and responsive-breakpoint bugs), there's no feedback loop. The user has to look, complain, iterate.

**Your feature closes that loop.** A new skill `/qualia-polish-loop` takes a URL + design brief, screenshots at multiple viewports, evaluates against the brief using vision, identifies issues, edits files, redeploys to a Vercel preview, loops up to N times until criteria pass, and stops only when it's actually correct (or hits a kill-switch).

## What "good" looks like (success criteria)

The feature must:

1. **See its own work.** Screenshots at mobile (375px), tablet (768px), desktop (1440px) at minimum, captured via Playwright MCP.
2. **Anchor evaluation rigorously.** Vision model judgments must be scored against the project's `.planning/DESIGN.md` brief AND the `rules/design-rubric.md` 8-dimension scoring (Typography / Color cohesion / Spatial rhythm / Layout originality / Shadow & depth / Motion intent / Microcopy specificity / Container depth). Each dimension scored 1-5 with evidence; ANY <3 fails the iteration.
3. **Iterate with discipline.** Max 8 iterations per loop invocation. Hard kill-switch if the same issue recurs 3 times (regression-stop). Per-iteration: identify TOP 3 issues, edit relevant files, redeploy, re-screenshot, re-evaluate.
4. **Stop only when correct.** Success = all 8 dimensions ≥ 3 AND no critical-severity issues remain.
5. **Token-discipline.** Each iteration uses ≤ 4 vision evaluations (3 viewports + 1 holistic). Estimate token cost upfront and warn user if budget will exceed 100K tokens.
6. **Never silently destroy work.** All file edits go through `git commit` per iteration so any iteration is revertable. Failed iterations leave clear `[ITERATION-N]` commit prefixes for cleanup.
7. **Integrate with the framework.** Honors all framework conventions: PRODUCT.md / DESIGN.md / CONTEXT.md as substrate, slop-detect at commit, qualia-ui banner, state.js telemetry.

## Architecture (the design is yours to refine)

```
/qualia-polish-loop {url} [--brief design-brief.md] [--max 8] [--viewports 375,768,1440]

        ↓

Pre-flight (sequential, me)
├─ Read .planning/PRODUCT.md (register, anti-references, voice)
├─ Read .planning/DESIGN.md (color strategy, scene sentence, palette)
├─ Read rules/design-rubric.md (8-dim scoring criteria)
├─ Read brief argument if provided, else use DESIGN.md
└─ Estimate token budget. Warn if > 100K. AskUserQuestion to confirm proceed.

        ↓

Loop (max 8 iterations):
├─ Iteration N starts: log to .planning/visual-polish-loop.md
├─ Capture: 3 viewports via Playwright MCP → save to /tmp/qpl-{N}/
├─ Evaluate: spawn vision agent with screenshots + brief + rubric
│   Returns: per-dim 1-5 scores + evidence + top 3 issues + severity
├─ Decide: all dims ≥ 3 AND no critical? → SUCCESS, exit loop
├─ Else: regression check — if same issue recurred 3x → KILL, exit with FAIL
├─ Else: spawn 1 builder per top-issue (parallel, max 3) to fix
│   Each builder: read affected file, apply fix, slop-detect, commit
├─ Redeploy: vercel deploy --prebuilt OR `npm run dev` heartbeat check
└─ Loop back to capture

        ↓

Post-loop:
├─ Write .planning/visual-polish-loop.md (full report: iterations, scores, fixes)
├─ Show before/after screenshots side-by-side via qualia-ui
├─ git add .planning/visual-polish-loop.md && commit
└─ Tell user: SUCCESS / KILLED-AT-N / OUT-OF-BUDGET
```

## Integration points (read these before designing)

Before writing any code, read these files to understand the framework:

1. `/home/qualia-new/qualia-framework/CLAUDE.md` — project rules, instruction-budget discipline
2. `/home/qualia-new/qualia-framework/rules/design-rubric.md` — 8-dimension 1-5 scoring criteria with anchored definitions per dimension
3. `/home/qualia-new/qualia-framework/rules/design-laws.md` — non-negotiable design rules (OKLCH-only, banned fonts, side-stripe-borders, gradient-text bans, glassmorphism, etc.)
4. `/home/qualia-new/qualia-framework/templates/PRODUCT.md` — what the agent will read as register/anti-references/voice substrate
5. `/home/qualia-new/qualia-framework/templates/DESIGN.md` — what the agent will read as visual contract
6. `/home/qualia-new/qualia-framework/skills/qualia-polish/SKILL.md` — existing scope-adaptive polish skill; understand its modes (Component / Section / App / Redesign / Critique / Quick) and how this loop fits as a 7th mode OR a separate skill
7. `/home/qualia-new/qualia-framework/agents/verifier.md` — how the existing verifier scores design dimensions today (text-based)
8. `/home/qualia-new/qualia-framework/skills/qualia-build/SKILL.md` — pattern for spawning builder subagents in parallel
9. `/home/qualia-new/qualia-framework/bin/qualia-ui.js` — the UI helper for banners/dividers/end-cards
10. `/home/qualia-new/qualia-framework/bin/state.js` — for telemetry transitions if you want to log loop iterations
11. `/home/qualia-new/qualia-framework/bin/slop-detect.mjs` — must be invoked on every committed file, every iteration

## External dependencies you'll integrate

1. **Playwright MCP** — verify it's available via `claude mcp list`; if it's missing, add it. Use `mcp__playwright__navigate`, `mcp__playwright__take_screenshot` (or equivalent — verify exact tool names by listing available MCP tools). Setup may require:
   ```
   claude mcp add playwright -- npx -y @playwright/mcp@latest
   ```
   Plus, on Linux/CI: `npx playwright install chromium` to get the browser binaries.

2. **Vision model** — Claude (you, the agent) reads images natively. The screenshots get attached to the spawned vision agent's prompt. Use the Read tool with image file paths.

3. **Vercel deploys** — use `vercel deploy` (not `--prod`) to publish a preview each iteration. Read `.vercel/project.json` for project linkage. The dev-mode alternative is `npm run dev` + curl heartbeat, but preview deploys give a stable URL for re-screenshotting from anywhere.

## Decision points the user (Fawzi) will care about

You MUST present these via `AskUserQuestion` BEFORE starting the loop on first invocation. Each is a load-bearing choice:

1. **Brief source** — use `.planning/DESIGN.md` (default) OR a separate `--brief` markdown file (override). If neither exists, halt: "No design brief found. Run /qualia-new or pass --brief."

2. **Reference screenshots (optional but recommended)** — "Do you have a reference image of what this should look like? Paste path or skip." Reference-anchored vision is dramatically more reliable than rubric-only.

3. **Auto-deploy strategy** — "Each iteration redeploys to Vercel preview (slower, real environment) or runs `npm run dev` and screenshots localhost (faster, dev artifacts may differ). Pick: vercel-preview / dev-localhost." Default: dev-localhost for iteration speed, vercel-preview for final.

4. **Token budget cap** — "Estimated 60-100K tokens for 8 iterations. Cap at 100K (default), 200K (generous), or 50K (tight)?"

## Files to create

- `/home/qualia-new/qualia-framework/skills/qualia-polish-loop/SKILL.md` — the new skill (frontmatter + workflow + decision gates). Target: <250 lines per Matt Pocock's progressive-disclosure rule.
- `/home/qualia-new/qualia-framework/skills/qualia-polish-loop/REFERENCE.md` — verbatim agent prompt templates (vision-eval prompt, fix-builder prompt, etc.).
- `/home/qualia-new/qualia-framework/skills/qualia-polish-loop/scripts/playwright-capture.mjs` — Node ESM helper that takes URL + viewports[] + outDir, drives the Playwright MCP via subprocess OR uses Playwright directly via `npm install playwright` (your call).
- `/home/qualia-new/qualia-framework/skills/qualia-polish-loop/scripts/score.mjs` — utility that takes a scored JSON object (8 dim scores + evidence) and computes pass/fail per the rubric formula (see the sketch after this list).

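A minimal sketch of the pass/fail rule `score.mjs` needs to encode (input field names are assumptions; the authoritative criteria live in `rules/design-rubric.md`):

```js
// score.mjs sketch — pass = every dimension ≥ 3 AND no critical issue left.
export function verdict({ scores, issues }) {
  const failingDims = Object.entries(scores) // e.g. { typography: 4, color_cohesion: 2, ... }
    .filter(([, score]) => score < 3)
    .map(([dim]) => dim);
  const criticals = issues.filter((i) => i.severity === 'critical').length;
  return { pass: failingDims.length === 0 && criticals === 0, failingDims, criticals };
}
```
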
## Files to modify

- `/home/qualia-new/qualia-framework/bin/install.js` — register the new skill (the skills copy is recursive and should auto-pick up the new folder, but verify).
- `/home/qualia-new/qualia-framework/skills/qualia-road/SKILL.md` — add `/qualia-polish-loop` to the v5.1 alignment-substrate list.
- `/home/qualia-new/qualia-framework/CHANGELOG.md` — add a v5.1.0 entry.
- `/home/qualia-new/qualia-framework/package.json` — bump version to 5.1.0.
- `/home/qualia-new/qualia-framework/tests/bin.test.sh` — add install assertions for the new skill folder (matching the v5.0 pattern at lines ~960-980).

## Hard constraints (non-negotiable)

1. **Vision-eval discipline.** The vision agent MUST be spawned with the rubric criteria inlined. Never spawn with "tell me what you think" — that's how you get "looks great!" hallucinations. Use the format Matt Pocock uses for grilling: ask one question at a time per dimension, require evidence on the next line.

2. **Anti-loop discipline.** Track issue fingerprints across iterations. If issue X (same file:line, same dim, same severity) appears in 3 consecutive iterations, KILL with `LOOP_REGRESSION_DETECTED` and write a diagnostic to the report.

3. **Per-iteration commits.** Every file edit gets its own commit with prefix `qpl-N: {issue-slug}`. The user must be able to `git revert` any iteration cleanly.

4. **Slop-detect gate.** Before any commit, run `node ~/.claude/bin/slop-detect.mjs {touched files}`. Critical findings BLOCK the commit. The fix-builder must rework the fix and retry.

5. **No background processes after exit.** Clean up any Playwright browser processes, temp screenshots, dev-server PIDs.

6. **Honor `prefers-reduced-motion`.** Vision evaluation must NOT penalize an absence of motion if the page is motion-reduced (read the user's OS-level setting via Playwright if possible, else default to motion-on).

7. **Do not modify framework agents/* files.** Specifically: don't touch agents/builder.md, agents/verifier.md, etc. The loop's vision evaluator is a NEW agent role file — create `agents/visual-evaluator.md` if needed.

## Self-test scenarios (you must run these before declaring DONE)

Build the feature, then run it in 3 scenarios. Document outcomes in `/home/qualia-new/qualia-framework/docs/playwright-loop-pilot-results.md`:

**Scenario 1 — Synthetic clean page.** Create a deliberately well-designed test page (use Tailwind v4, OKLCH palette, varied layout, all 7 states, 65ch line length on body). Run the loop on it. Expected: SUCCESS in 1-2 iterations with all dims ≥ 4.

**Scenario 2 — Synthetic broken page.** Create a deliberately bad page (Inter font, blue-purple gradient, identical 3-card grid, hero centered + gradient bg + 2 CTAs, em-dashes, side-stripe borders). Run the loop. Expected: identifies all anti-patterns, fixes them, ends with SUCCESS in 4-6 iterations.

**Scenario 3 — Stress test the kill-switch.** Manually inject a fix-builder that always reintroduces the same issue (e.g. always rewrites color to `#000`). Run the loop. Expected: KILLED at iteration 4 with `LOOP_REGRESSION_DETECTED` after 3 consecutive recurrences.

For each scenario record: total iterations, total tokens consumed, final scores, screenshots before/after, time elapsed.

## Things you MUST NOT do

- Do not deploy to production. Vercel preview only. Never `vercel --prod`.
- Do not touch any file in `.planning/` other than writing your own report.
- Do not add new dependencies without justification — Playwright MCP + native Node is the budget.
- Do not increase any global SKILL.md or CLAUDE.md sizes (instruction-budget discipline).
- Do not invent new design rules — score against `rules/design-rubric.md` as it is. If the rubric is wrong, that's a separate problem.
- Do not "just make it work" by iterating forever. Hard cap 8.

## Deliverables (the DONE definition)

You return DONE when ALL of these are true:

1. ✅ `/qualia-polish-loop` skill exists and installs via `node bin/install.js`
2. ✅ All 3 self-test scenarios pass per the spec above (results doc written)
3. ✅ `npm test` shows the new install assertions passing (the passing-test count went up by ≥ 4)
4. ✅ `node bin/slop-detect.mjs` clean on all new files
5. ✅ CHANGELOG v5.1.0 entry present and slop-clean
6. ✅ A 1-page integration note at `docs/playwright-loop-design-notes.md` documenting: how it integrates with existing /qualia-polish, where it differs, when to use which, and what's deferred to v5.2

Return DONE with the test results and the path to the pilot-results doc.

## When you encounter unknowns

The Playwright MCP setup, vision-eval reliability, and Vercel preview deploy timing are all real unknowns. When you hit one:

- Check `claude mcp list` to see what's actually wired in this session
- Try ONE approach, measure its reliability via Scenario 1, iterate
- If something is genuinely blocking after 2 attempts, write a `BLOCKED — {what}` report to `docs/playwright-loop-blockers.md` and surface it back to the user. Do NOT silently work around blockers — the framework owner needs to know what's brittle before relying on it.

## One last thing

Fawzi (the framework owner) will read your report. He's a senior engineer with strong design sense and very low tolerance for flakiness. If the loop kind-of-works but is unreliable, mark it experimental and say so loudly. If it works well, that's the v5.1 headline. Honest reporting beats good-news theater every time.
@@ -0,0 +1,108 @@ package/docs/playwright-loop-design-notes.md
# /qualia-polish-loop — Design notes

One-page integration narrative. Companion to the SKILL.md (`skills/qualia-polish-loop/SKILL.md`) and pilot results (`docs/playwright-loop-pilot-results.md`).

## What it is

A skill that takes a URL + design brief, screenshots at three viewports (mobile / tablet / desktop), evaluates with vision against the 8-dimension rubric, fixes the top issues with parallel builders, re-screenshots, and loops until every dimension scores at least 3 (success) or one of three kill conditions trips: regression, budget, max-iterations.

## Why it exists separately from `/qualia-polish`

`/qualia-polish` (v4.5.0+) is **scope-adaptive** with six modes (Component / Section / App / Redesign / Critique / Quick). Its evaluation is **text-first**: it reads CSS and TSX, runs `slop-detect`, runs Lighthouse if a dev server is up, and (in Redesign scope only) runs a 2-iteration vision loop as Stage 4. The vision step is one stage of one mode.

`/qualia-polish-loop` is **vision-first** and built to actually iterate. It assumes a running URL, captures real renders, and treats the screenshot as primary evidence. The loop length is configurable up to 8; regressions are tracked with fingerprints; every fix is its own revertable commit.

These are not redundant. They solve different failure modes:

| Failure mode | `/qualia-polish` catches it | `/qualia-polish-loop` catches it |
|---|---|---|
| Banned font in source | YES (slop-detect grep on CSS) | YES (vision sees Inter rendering) |
| Hardcoded hex in JSX | YES (slop-detect) | NO directly — would manifest as Color < 3 |
| Three-column card grid | YES (slop-detect) | YES (vision sees identical cards) |
| Hero video framed wrong on mobile | NO (text doesn't reveal mobile cropping) | YES (mobile screenshot + min-aggregate) |
| Touch targets < 44px | NO (slop-detect doesn't measure) | YES (visible on 375px capture) |
| Spacing rhythm "feels off" | NO | YES (vision scores Spatial < 3) |
| `prefers-reduced-motion` working correctly | YES (CSS grep) | YES (capture with reduced-motion forced — deferred to v5.1.1) |

The visual-only failures are exactly Fawzi's `/insights`-documented friction pattern: hero videos cropped wrong, mobile spacing collapsing, motion missing. `slop-detect` was never going to catch those — it doesn't see what the browser draws.

## When to use which

| User says... | Use |
|---|---|
| "fix the button styling" | `/qualia-polish src/components/Button.tsx` |
| "the whole dashboard needs a design pass" | `/qualia-polish app/dashboard` |
| "it doesn't look right on mobile" | `/qualia-polish-loop http://localhost:3000` |
| "score without fixing" | `/qualia-polish --critique` |
| "iterate on the home page until the hero video is right" | `/qualia-polish-loop http://localhost:3000 --max 6` |
| "ship-ready final check" | `/qualia-polish-loop` then `/qualia-ship` |

The smart router (`/qualia`) does not auto-route to `/qualia-polish-loop` because it requires a running URL. Users invoke it explicitly.

## Architectural choices and why

### Chromium binary as default backend

The capture script tries four backends in order:

1. `import('playwright')` — when the project has it as a dep
2. `import('playwright-core')` — same API, lighter package
3. `~/.cache/ms-playwright/chromium-{version}/chrome-{linux64,linux,mac,win}/chrome` — Playwright-cached chromium binary, used directly via `--headless=new --screenshot`
4. `which google-chrome` / `chromium` / `chromium-browser` / `chrome` — system browser

The earlier draft of this skill used `mcp__claude-in-chrome__*` tools as primary, but those require the user to install a Chrome browser extension and have Chrome running with it active — a prerequisite many environments can't meet (browserless servers, CI). The chromium-binary fallback removes that prerequisite: any machine with Google Chrome on PATH or Playwright-cached binaries can run the loop.

The Playwright SDK is preferred when available because its `waitUntil: 'networkidle'` is more deterministic than `--virtual-time-budget`. Binary mode is the safety net.

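A condensed sketch of that resolution order (assumed shape — the real logic, including the `~/.cache/ms-playwright` probe, lives in `scripts/playwright-capture.mjs`):

```js
import { execFileSync, execSync } from 'node:child_process';

// Resolve a capture backend: Playwright SDK first, system browser as fallback.
// (The cached-chromium probe from step 3 is omitted here for brevity.)
async function resolveBackend() {
  for (const pkg of ['playwright', 'playwright-core']) {
    try { return { sdk: (await import(pkg)).chromium }; } catch { /* not installed */ }
  }
  for (const bin of ['google-chrome', 'chromium', 'chromium-browser', 'chrome']) {
    try { return { binary: execSync(`which ${bin}`, { encoding: 'utf8' }).trim() }; } catch { /* keep looking */ }
  }
  throw new Error('no Playwright SDK and no Chrome/Chromium binary on PATH');
}

// Binary mode: one headless-Chrome invocation per viewport.
function captureWithBinary(binary, url, { width, height }, outFile) {
  execFileSync(binary, [
    '--headless=new',
    `--screenshot=${outFile}`,
    `--window-size=${width},${height}`,
    url,
  ]);
}
```
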
### Deterministic state outside the LLM context

`scripts/loop.mjs` is a CLI state machine. The iteration counter, token usage, fingerprint history, and verdict all live in a JSON file at `/tmp/qpl-{ts}/state.json`. Claude reads compact JSON (`{verdict, iteration, top_issues}`) per iteration — not the full state. This keeps per-iteration token cost roughly constant (~14.5K) instead of growing with iteration count.

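An illustrative shape for that compact per-iteration read (field names are assumptions; the authoritative shape is whatever `scripts/loop.mjs` writes):

```json
{
  "verdict": "continue",
  "iteration": 3,
  "top_issues": [
    {
      "dim": "spatial_rhythm",
      "file": "app/page.tsx",
      "severity": "major",
      "description": "section padding collapses at 375px"
    }
  ]
}
```
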
### Vision-evaluator anchoring

The single biggest failure mode for vision-eval is "looks great!" hallucinations. The visual-evaluator agent (`agents/visual-evaluator.md`) inlines the rubric criteria with anchored definitions (`1 = fails, 2 = below acceptable, 3 = acceptable — DEFAULT, 4 = good, 5 = excellent`) and requires evidence on the line after each score. The instruction `DEFAULT TO 3` is repeated three times in the role file. Calibration examples show "good" and "rejected" evaluations side by side.

The output is a single fenced JSON block — no prose — which the orchestrator parses without re-asking. The aggregate score is the **minimum** across viewports per dimension, so a layout that's elegant on desktop but breaks at 375px is a fail. This was deliberate: most documented visual regressions are mobile-only.

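For illustration, the kind of fenced JSON the evaluator returns (key names assumed, evidence truncated to one dimension here; the real contract is defined in `agents/visual-evaluator.md`):

```json
{
  "viewport": "375",
  "scores": {
    "typography": 4,
    "color_cohesion": 2,
    "spatial_rhythm": 3,
    "layout_originality": 3,
    "shadow_depth": 4,
    "motion_intent": 3,
    "microcopy_specificity": 4,
    "container_depth": 3
  },
  "evidence": {
    "color_cohesion": "hero CTA renders a raw blue-purple gradient against the OKLCH palette"
  },
  "top_issues": [
    {
      "dim": "color_cohesion",
      "severity": "critical",
      "file": "app/page.tsx",
      "description": "gradient text on hero heading"
    }
  ]
}
```
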
### Regression detection via fingerprints

Each top-issue is hashed to a fingerprint = `{dim}__{file_basename}__{first_32_chars_of_description}` (lowercased, non-word chars collapsed). If the same fingerprint appears in **3 consecutive iterations**, the loop kills with `LOOP_REGRESSION_DETECTED`. Non-consecutive recurrences don't kill — they're a normal pattern when a fix worked, then a different change broke the same dimension again.

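A minimal sketch of that check (assumed field names; the real tracking lives in the `scripts/loop.mjs` state machine):

```js
// Fingerprint per the scheme above: dim + file basename + first 32 chars
// of the description, lowercased with non-word runs collapsed.
const norm = (s) => s.toLowerCase().replace(/\W+/g, ' ').trim();

function fingerprint(issue) {
  const base = issue.file.split('/').pop();
  return `${norm(issue.dim)}__${base}__${norm(issue.description).slice(0, 32)}`;
}

// history: one Set of fingerprints per iteration, newest last (including
// the current iteration). Kill only on N consecutive recurrences.
function regressionDetected(history, fp, consecutive = 3) {
  if (history.length < consecutive) return false;
  return history.slice(-consecutive).every((iter) => iter.has(fp));
}
```
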
### Per-iteration commits

Every fix is its own git commit with a `qpl-{N}: {slug}` prefix. The orchestrator gates the commit through `slop-detect` first; critical findings BLOCK the commit and the fix-builder must retry. The user can `git revert` any single iteration cleanly without losing other fixes. The alternative (one squashed commit at the end) was rejected because it makes partial rollbacks impossible.

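The gate, sketched (paths per the framework's install layout; the exact invocation is an assumption):

```js
import { execFileSync } from 'node:child_process';
import { homedir } from 'node:os';
import { join } from 'node:path';

// Slop-detect runs first; a non-zero exit (critical findings) throws,
// so the commit below never happens and the fix-builder must retry.
function commitIteration(n, slug, files) {
  const slopDetect = join(homedir(), '.claude', 'bin', 'slop-detect.mjs');
  execFileSync('node', [slopDetect, ...files], { stdio: 'inherit' });
  execFileSync('git', ['add', ...files]);
  execFileSync('git', ['commit', '-m', `qpl-${n}: ${slug}`]);
}
```
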
## Deferred to v5.2

1. **`prefers-reduced-motion` capture mode** — the capture script doesn't yet force `prefers-reduced-motion: reduce` in the headless run. The vision evaluator already handles the case correctly (it scores motion on CSS-declaration quality, not visible animation, when reduced motion is on); adding `--force-prefers-reduced-motion` via Chrome flags is straightforward.

2. **Vercel-preview deploy mode end-to-end** — `--deploy preview` is wired in SKILL.md (each iteration redeploys to a Vercel preview URL), but not exercised in the pilot. Real-project use will surface deploy-latency edge cases. Once validated, this can become the default for production iteration.

3. **Multi-route sweeps** — one URL per invocation today. Multi-route would mean batching `/route-a, /route-b, /route-c` and running the loop per route, then producing a unified report. Useful for marketing-site polish where the brand has to read consistently across pages.

4. **Reference-image structural similarity** — `--ref` is accepted but the comparison is rubric-anchored (the evaluator looks at both the current screenshot and the reference, scores against the rubric). True pixel/structural-similarity comparison would need an embedding model and more careful scoring.

5. **Lighthouse + axe integration into the loop** — currently `/qualia-polish` Stage 3 runs Lighthouse and axe; the loop does not. A future version could pipe a11y/performance scores from Lighthouse into the same iteration as the rubric eval, enabling "fix design AND a11y in the same loop."

6. **Real token telemetry** — token costs are estimated (~14.5K/iter). Wiring real `tokens_used` from the Anthropic API would let `--budget` work against actual spend instead of estimates.

## What can go wrong, and how the loop handles it

| Failure mode | Handling |
|---|---|
| Vision says "looks great" to a broken page | Anchored rubric + DEFAULT TO 3 + required evidence per dimension. Without evidence the score is rejected. |
| Same issue recurs forever | Fingerprint kill-switch at 3 consecutive iterations. |
| Fix-builder introduces a different issue | The next iteration catches it; if it persists 3 iters, regression-kill fires on that fingerprint instead. |
| Token budget blown | Verdict transitions to `out_of_budget` deterministically. Loop exits with partial-progress report. |
| Dev server dies mid-loop | curl heartbeat after every redeploy; loop halts with a clear error and the user can resume from the saved state. |
| Capture fails (browser crash) | Capture script returns exit 1 with per-viewport error. Loop can retry once or HALT. |
| Slop-detect blocks a fix-builder commit | The fix-builder retries; if it can't, returns BLOCKED. Loop's regression detector sees the same issue persist and kills cleanly. |

## How to reason about cost

Each iteration = ~14.5K tokens (3 PNG reads + rubric + brief + previous-iteration delta + 3 fix-builder spawns).
Each iteration = ~6-15s wall clock for capture + vision-eval; fix-builders run in parallel.

Six iterations on a real Next.js dev server with HMR ≈ ~90K tokens, ~90 seconds. That's the realistic cost envelope. The 8-iter ceiling at 120K tokens is for projects with deep design debt where the loop has to iterate on multiple dimensions across many fix passes; in practice most invocations are ≤ 4 iterations.

The loop is cheaper than the human alternative (5-10 manual rounds at 5-15 minutes each = 25-150 minutes of human time) and terminates deterministically thanks to the hard caps on iterations and budget.