qualia-framework 4.5.0 → 5.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (64)
  1. package/AGENTS.md +24 -0
  2. package/CLAUDE.md +12 -75
  3. package/README.md +23 -16
  4. package/agents/builder.md +9 -21
  5. package/agents/planner.md +8 -0
  6. package/agents/verifier.md +8 -0
  7. package/agents/visual-evaluator.md +132 -0
  8. package/bin/cli.js +54 -18
  9. package/bin/install.js +369 -29
  10. package/bin/qualia-ui.js +208 -1
  11. package/bin/slop-detect.mjs +5 -0
  12. package/bin/state.js +34 -1
  13. package/docs/install-redesign-builder-prompt.md +290 -0
  14. package/docs/install-redesign-pilot.md +234 -0
  15. package/docs/playwright-loop-builder-prompt.md +185 -0
  16. package/docs/playwright-loop-design-notes.md +108 -0
  17. package/docs/playwright-loop-pilot-results.md +170 -0
  18. package/docs/playwright-loop-review-2026-05-03.md +65 -0
  19. package/docs/playwright-loop-tester-prompt.md +213 -0
  20. package/docs/reviews/matt-pocock-skills-analysis.md +300 -0
  21. package/guide.md +9 -5
  22. package/hooks/env-empty-guard.js +74 -0
  23. package/hooks/pre-compact.js +19 -9
  24. package/hooks/pre-deploy-gate.js +8 -2
  25. package/hooks/pre-push.js +26 -12
  26. package/hooks/supabase-destructive-guard.js +62 -0
  27. package/hooks/vercel-account-guard.js +91 -0
  28. package/package.json +2 -1
  29. package/rules/design-brand.md +4 -0
  30. package/rules/design-laws.md +4 -0
  31. package/rules/design-product.md +4 -0
  32. package/rules/design-rubric.md +4 -0
  33. package/rules/grounding.md +4 -0
  34. package/skills/qualia-build/SKILL.md +40 -46
  35. package/skills/qualia-discuss/SKILL.md +51 -68
  36. package/skills/qualia-handoff/SKILL.md +1 -0
  37. package/skills/qualia-issues/SKILL.md +151 -0
  38. package/skills/qualia-map/SKILL.md +78 -35
  39. package/skills/qualia-new/REFERENCE.md +139 -0
  40. package/skills/qualia-new/SKILL.md +45 -121
  41. package/skills/qualia-optimize/REFERENCE.md +202 -0
  42. package/skills/qualia-optimize/SKILL.md +72 -237
  43. package/skills/qualia-plan/SKILL.md +58 -65
  44. package/skills/qualia-polish-loop/REFERENCE.md +265 -0
  45. package/skills/qualia-polish-loop/SKILL.md +201 -0
  46. package/skills/qualia-polish-loop/fixtures/broken.html +117 -0
  47. package/skills/qualia-polish-loop/fixtures/clean.html +196 -0
  48. package/skills/qualia-polish-loop/scripts/loop.mjs +302 -0
  49. package/skills/qualia-polish-loop/scripts/playwright-capture.mjs +197 -0
  50. package/skills/qualia-polish-loop/scripts/score.mjs +176 -0
  51. package/skills/qualia-report/SKILL.md +141 -200
  52. package/skills/qualia-research/SKILL.md +28 -33
  53. package/skills/qualia-road/SKILL.md +103 -0
  54. package/skills/qualia-ship/SKILL.md +1 -0
  55. package/skills/qualia-task/SKILL.md +1 -1
  56. package/skills/qualia-test/SKILL.md +50 -2
  57. package/skills/qualia-triage/SKILL.md +152 -0
  58. package/skills/qualia-verify/SKILL.md +63 -104
  59. package/skills/qualia-zoom/SKILL.md +51 -0
  60. package/skills/zoho-workflow/SKILL.md +1 -1
  61. package/templates/CONTEXT.md +36 -0
  62. package/templates/decisions/ADR-template.md +30 -0
  63. package/tests/bin.test.sh +451 -7
  64. package/tests/state.test.sh +58 -0
@@ -0,0 +1,265 @@ package/skills/qualia-polish-loop/REFERENCE.md

# REFERENCE — /qualia-polish-loop

Verbatim agent prompts and operational details. Loaded on demand by SKILL.md, not carried in the system prompt. Per progressive-disclosure discipline (Matt Pocock): the agent reads SKILL.md first, then this file when it needs the spawn templates.

## Architecture summary

```
SKILL.md driver (Claude session)
│
├─ scripts/playwright-capture.mjs (deterministic Node — produces PNGs)
│    ↓ writes /tmp/qpl-{ts}/iter-{N}/{mobile,tablet,desktop}-*.png
│
├─ Agent({subagent_type: "qualia-visual-evaluator", ...})
│    ↓ reads PNGs, returns single JSON envelope (eval.json)
│
├─ scripts/loop.mjs record (deterministic — verdict + fingerprints)
│    ↓ exit 0=SUCCESS, 1=CONTINUE, 3=KILLED
│
├─ Agent({subagent_type: "qualia-builder", ...}) × up to 3 in parallel
│    ↓ each fixes ONE issue, calls scripts/loop.mjs commit-fix
│
└─ scripts/loop.mjs report (final markdown report)
     ↓ writes .planning/visual-polish-loop.md
```

## Capture: backend selection

The capture script (`scripts/playwright-capture.mjs`) auto-selects in this order:

1. `import('playwright')` — preferred when available; gives deterministic `waitUntil: 'networkidle'`
2. `import('playwright-core')` — same API, lighter package
3. `~/.cache/ms-playwright/chromium-{version}/chrome-{linux64,linux,mac,win}/chrome` — if Playwright browsers were ever installed but the package isn't import-resolvable
4. `which google-chrome` / `chromium` / `chromium-browser` / `chrome` — system-browser fallback

For backends 3 and 4 (binary-direct), the script uses `--headless=new --screenshot --virtual-time-budget`. Less precise than Playwright's `networkidle` waiting, but it works without any npm dependency.
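The fallback order above can be sketched as a small Node helper. This is illustrative only — the real internals of `playwright-capture.mjs` are not shown in this diff, the helper name is invented, and the backend-3 probe of the cached `~/.cache/ms-playwright` binary is omitted for brevity:

```javascript
// Probe the documented capture backends in order; return the first that works.
import { execSync } from "node:child_process";

async function pickBackend() {
  // Backends 1 and 2: importable Playwright packages.
  for (const pkg of ["playwright", "playwright-core"]) {
    try {
      const mod = await import(pkg);
      return { kind: "playwright", chromium: mod.chromium };
    } catch {
      // not installed — keep falling back
    }
  }
  // Backend 4: a system browser on PATH (backend 3, the cached
  // ms-playwright chromium binary, would be probed between these).
  for (const bin of ["google-chrome", "chromium", "chromium-browser", "chrome"]) {
    try {
      const p = execSync(`command -v ${bin}`, { encoding: "utf8" }).trim();
      if (p) return { kind: "binary", path: p };
    } catch {
      // not on PATH
    }
  }
  return null; // caller exits 2 with no_browser_backend
}
```

If every probe fails, the caller surfaces the setup hints below.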

Setup hints if all four fail:

```bash
# Option A — Playwright (best stability)
npm i -D playwright && npx playwright install chromium

# Option B — system Chrome (fastest setup if you already have Chrome installed)
# (no action needed if google-chrome is on PATH)
```

## Vision-evaluator spawn template (VERBATIM)

The vision evaluator's anchored discipline: **DEFAULT TO 3.** Only score above 3 with a cited design principle. Only score below 3 with a quoted violation. Without anchoring, vision models return "looks great!" to everything — that failure mode is the entire reason this loop exists. The full rubric criteria live in `agents/visual-evaluator.md`; this section is the spawn template.

When the loop reaches step 2 (Evaluate), spawn ONE agent with the screenshots, brief, rubric, and previous-iteration context. Inline this prompt verbatim — do not paraphrase.

```
Agent({
  subagent_type: "qualia-visual-evaluator",
  description: "Score iteration {N} screenshots against rubric",
  prompt: `
Role: @~/.claude/agents/visual-evaluator.md

<rubric>
{INLINE rules/design-rubric.md §"The 8 dimensions" through §"Aggregate score"}
</rubric>

<brief>
{INLINE the relevant excerpt from .planning/DESIGN.md — sections "Direction", "Color", "Typography"}
</brief>

<product>
{INLINE the relevant excerpt from .planning/PRODUCT.md — register, voice, anti-references}
</product>

<screenshots>
- mobile (375px): /tmp/qpl-{ts}/iter-{N}/mobile-375.png
- tablet (768px): /tmp/qpl-{ts}/iter-{N}/tablet-768.png
- desktop (1440px): /tmp/qpl-{ts}/iter-{N}/desktop-1440.png
</screenshots>

<viewport_meta>
{ "reduced_motion": {true|false}, "viewport_widths": [375, 768, 1440] }
</viewport_meta>

<previous_iteration>
{If N > 1, INLINE eval.json.top_issues from iter-{N-1} so the evaluator can verify regression vs improvement. Otherwise: "(first iteration — no prior data)"}
</previous_iteration>

<task>
This is iteration {N} of {max}. Read each screenshot. Score every dimension 1-5 with one-line evidence per dimension per viewport. Return a single fenced JSON block per the contract in your role file. No prose outside the JSON.
</task>
`
})
```

The evaluator's role file (`agents/visual-evaluator.md`) carries the trust-boundary block, the calibration examples, and the JSON output contract. Together with this spawn template, the prompt prefix is stable across iterations — Anthropic prompt caching reuses the role + rubric + brief prefix, so the per-iteration cost is roughly 3 image reads plus the previous-iteration delta.

## Fix-builder spawn template (VERBATIM)

When the loop has 1-3 issues to fix, spawn one builder per issue IN THE SAME RESPONSE TURN (parallel). Each fixes one dimension, narrowly.

```
Agent({
  subagent_type: "qualia-builder",
  description: "Fix {dim} issue: {short description}",
  prompt: `
Role: @~/.claude/agents/builder.md

<phase_context>
You are inside /qualia-polish-loop iteration {N}. The vision evaluator scored
the {dim} dimension at {score}. Your single task: fix that one dimension.

<design>
{INLINE .planning/DESIGN.md tokens relevant to {dim}}
</design>

<product>
{INLINE .planning/PRODUCT.md voice + register}
</product>
</phase_context>

<task_context>
# Issue
- Dimension: {dim}
- Severity: {severity}
- Description: {description}
- Likely file: {likely_file or "(infer from grep — start at the path the screenshot suggests)"}
- Recommended fix: {fix}

# Files probably affected
{1-3 candidate paths the loop has inferred from the URL routing}
</task_context>

<task>
1. Read the likely file. If the issue is in a different file, follow the import graph until you find the source.
2. Make the MINIMUM edit to fix this one dimension. Do not refactor. Do not change logic. Do not touch state management. Do not change copy unless this is a microcopy issue.
3. Use design tokens from DESIGN.md. Do not invent new color values, font names, or spacing.
4. After the edit, commit via the orchestrator (slop-detect-gated):
   node ~/.claude/skills/qualia-polish-loop/scripts/loop.mjs commit-fix --state {STATE} --file {file} --slug {dim}-{short-keyword}
   If slop-detect blocks (exit 2), READ the slop output and re-edit. If you cannot fix without violating slop-detect, return BLOCKED with the conflict.
5. Return DONE with: file modified, lines changed, slop-detect: pass, commit: {sha}.
</task>

<rules>
- Vision says: {evidence from eval.json.viewport_results[].evidence[{dim}]}
- Do not add features.
- Do not write tests for this fix (the loop's next iteration is the test).
- Single commit. The orchestrator handles the slug + iteration prefix.
</rules>
`
})
```

## Iteration log entry (what `loop.mjs record` writes to state.json.iterations[])

```json
{
  "iteration": 1,
  "scores": { "typography": 1, "color": 1, "spatial": 3, "layout": 1, "shadow": 3, "motion": 3, "microcopy": 1, "container": 1 },
  "aggregate": 14,
  "pass": false,
  "failing_dims": ["typography", "color", "layout", "microcopy", "container"],
  "top_issues": [
    { "dim": "color", "severity": "critical", "description": "blue→purple gradient on hero", "likely_file": "src/styles/globals.css", "fix": "replace linear-gradient with single accent var(--accent)" },
    { "dim": "typography", "severity": "critical", "description": "Inter as primary font-family", "likely_file": "src/styles/globals.css", "fix": "swap to Fraunces + JetBrains Mono per DESIGN.md §3" },
    { "dim": "layout", "severity": "high", "description": "three identical feature cards in section 2", "likely_file": "src/pages/index.tsx", "fix": "vary card sizes per design-brand.md §Layout" }
  ],
  "tokens_used": 14500,
  "timestamp": "2026-05-03T12:34:56.000Z"
}
```

## Issue fingerprint (regression detection)

The orchestrator computes a fingerprint per top_issue for each iteration:

```
fingerprint = `${dim}__${path.basename(likely_file)}__${first_32_chars_of_description}`
  .toLowerCase().replace(/\W+/g, "_")
```

State stores `state.fingerprints[fingerprint] = { iterations: [1,2,3], description, dim }`. The KILL trigger is **3 consecutive iteration numbers in `iterations[]`** — non-consecutive recurrences don't kill (the issue may have been fixed, broken by a different change, then refixed; that's a different signal from "fix-builder cannot fix this").

When the kill trigger fires, the verdict becomes `killed_regression` and `state.kill_fingerprint` records which one. The user can run `jq '.fingerprints | to_entries | map(select(.key == "{fingerprint}"))' state.json` to see the recurrence pattern.
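A minimal sketch of the fingerprint computation and the consecutive-run check, assuming the state shape above (the helper names are illustrative, not `loop.mjs` exports):

```javascript
import path from "node:path";

// fingerprint = dim + file basename + first 32 chars of description,
// lowercased, with runs of non-word characters collapsed to "_".
function fingerprintOf(issue) {
  return `${issue.dim}__${path.basename(issue.likely_file)}__${issue.description.slice(0, 32)}`
    .toLowerCase()
    .replace(/\W+/g, "_");
}

// KILL only when the iteration list contains three consecutive numbers
// (e.g. [2,3,4]); scattered recurrences like [1,3,5] do not kill.
function isKillRun(iterations) {
  for (let i = 0; i + 2 < iterations.length; i++) {
    if (iterations[i + 1] === iterations[i] + 1 &&
        iterations[i + 2] === iterations[i] + 2) {
      return true;
    }
  }
  return false;
}
```

For the example issue above, `fingerprintOf` yields `color__globals_css__blue_purple_gradient_on_hero`.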

## Token-budget table

| Iterations | Tokens (est.) | Sized for |
|---|---|---|
| 2 | ~30K | known-clean page sanity check |
| 4 | ~60K | mid-confidence |
| 6 | ~90K | default |
| 8 | ~120K | hard cap; pass `--budget 150000` to allow |

Per-iteration cost (rough):
- 3 screenshot reads ≈ 9K
- rubric + brief inlined ≈ 2K (cached after iter 1)
- previous-iteration delta ≈ 0.5K
- 3 fix-builder spawns × (file read + edit + commit-fix call) ≈ 3K
- **per-iteration ≈ 14.5K**

## Self-test scenarios (mapping to spec)

| # | Fixture | Expected | Verifier |
|---|---|---|---|
| 1 | `fixtures/clean.html` | SUCCESS in 1-2 iterations, all dims ≥ 4 | run capture, run evaluator inline, assert pass |
| 2 | `fixtures/broken.html` | SUCCESS in 4-6 iters; identifies banned font + gradient + 3-card grid + side-stripe + generic CTA | each fix-builder commits a `qpl-N:` change; final eval all dims ≥ 3 |
| 3 | Kill-switch | KILL at iter ≤ 4 with `LOOP_REGRESSION_DETECTED` | call `loop.mjs record` 3× with the same fingerprint; assert exit 3 + correct verdict |

The pilot-results doc at `docs/playwright-loop-pilot-results.md` records the actual outcome from `bash scripts/_self-tests.sh` (Scenario 3 is exercised by a deterministic unit-style invocation; Scenarios 1+2 require a real vision pass and are run by Claude when the loop ships).

## Final report template (what `loop.mjs report` emits to stdout)

```markdown
# Visual-Polish Loop Report

- **URL:** http://localhost:3000
- **Brief:** .planning/DESIGN.md
- **Started:** 2026-05-03T12:00:00Z
- **Final verdict:** SUCCESS
- **Iterations:** 4 / 8
- **Tokens used:** 58000 / 100000
- **Fixes committed:** 7

## Iteration log

### Iteration 1
- Scores: typo=1 colo=1 spat=3 layo=1 shad=3 moti=3 micr=1 cont=1
- Aggregate: 14/40 (avg 1.75)
- Pass: NO (failing: typography, color, layout, microcopy, container)
- Top issues:
  - **color** [critical] blue→purple gradient on hero → src/styles/globals.css
  - **typography** [critical] Inter as primary → src/styles/globals.css
  - **layout** [high] three identical cards → src/pages/index.tsx

### Iteration 2
- Scores: typo=3 colo=3 spat=3 layo=2 shad=3 moti=3 micr=2 cont=2
- Aggregate: 21/40 (avg 2.62)
- Pass: NO (failing: layout, microcopy, container)
- ...

### Iteration 3
- Scores: typo=4 colo=3 spat=3 layo=3 shad=3 moti=3 micr=3 cont=3
- Aggregate: 25/40 (avg 3.13)
- Pass: YES

## Fix commits (revertable)
- abc1234 qpl-1: color-gradient-removal — src/styles/globals.css
- def5678 qpl-1: typography-fraunces — src/styles/globals.css
- ...

## Issue fingerprints (regression tracker)
- color__globals_css__blue_purple_gradient — iterations [1] — fixed at iter 2
```

## Why three viewports

Per the spec's hard constraints (§5g `prefers-reduced-motion` and §5c mobile-only failures), the loop MUST evaluate at mobile (375), tablet (768), and desktop (1440). The aggregate score is the **minimum** across viewports for each dimension — a layout that's elegant on desktop but breaks at 375 is a fail, full stop.

This is intentional. Most visual regressions Fawzi has documented in `/insights` (hero videos cropped wrong on mobile, touch targets < 44px on mobile, navigation collapse misbehaving) only show up below 768. Scoring on desktop alone is how we got "looks great in dev" → "looks broken on the user's phone."
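The minimum-across-viewports rule can be sketched as a small reducer (the `eval.json` field names here are assumptions for illustration, not the evaluator's actual contract):

```javascript
// Per-dimension score = MIN across viewports; pass = every dimension >= 3.
function aggregate(viewportResults) {
  const dims = {};
  for (const vp of viewportResults) {
    for (const [dim, score] of Object.entries(vp.scores)) {
      dims[dim] = Math.min(dims[dim] ?? Infinity, score);
    }
  }
  const total = Object.values(dims).reduce((a, b) => a + b, 0);
  const pass = Object.values(dims).every((s) => s >= 3);
  return { dims, total, pass };
}
```

A layout that scores 4 on desktop but 2 on mobile aggregates to 2, so the loop keeps iterating.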

## What the loop does NOT do (deferred to v5.2)

- Cross-browser rendering checks (Firefox / WebKit) — Chromium-only, per `qualia-polish` Stage 4 precedent
- Accessibility audits beyond what the rubric scores — use `/qualia-polish` Stage 3 (Lighthouse + axe) for that
- Performance regressions — run `/qualia-polish-loop` only after the Lighthouse score passes
- Reference-image-only mode (compare to a target screenshot without a brief) — currently the brief is required; reference is supplemental
- Multi-page sweeps — one URL per invocation; chain `/qualia-polish-loop` per route for site-wide passes
@@ -0,0 +1,201 @@ package/skills/qualia-polish-loop/SKILL.md

---
name: qualia-polish-loop
description: "Autonomous visual-polish loop — screenshots a live URL at three viewports, scores 8 design dimensions against the rubric using vision, fixes issues in the codebase, re-screenshots, loops until every dimension scores ≥ 3 or the kill-switch fires. Trigger on 'polish loop', 'visual loop', 'screenshot polish', 'visual QA loop', 'fix what I see', 'make it look right', 'iterate on the design until it's correct'. v5.1 flagship — closes the design-iteration churn friction."
allowed-tools:
  - Bash
  - Read
  - Write
  - Edit
  - Grep
  - Glob
  - Agent
argument-hint: "{url} [--brief PATH] [--max 8] [--viewports 375,768,1440] [--ref PATH] [--budget 100000]"
---

# /qualia-polish-loop — Autonomous Visual-Polish Loop

See its own work. Fix its own work. Stop only when correct.

## What it does

Takes a URL + design brief. Screenshots at 3 viewports (mobile / tablet / desktop). Spawns a vision evaluator that scores the 8 dimensions of `rules/design-rubric.md` against the brief with cited evidence. Spawns up to 3 fix-builders in parallel for the top issues. Re-screenshots. Loops until all dimensions ≥ 3 or the kill-switch trips (regression, budget, or max iterations).

Different from `/qualia-polish`: that one is read+edit+slop-detect, single pass. This one is **see+edit+verify+repeat** with a real loop and real screenshots.

## When to use

- After `/qualia-build` or `/qualia-verify` when visual issues remain that text-based slop-detect can't catch (mobile-only breakage, missing motion, hero-video framing)
- When the user says "it doesn't look right" / "fix what I see" / "the mobile version is broken"
- As an opt-in upgrade to `/qualia-polish --redesign` Stage 4 (vision loop)
- Before `/qualia-ship` for a final visual QA pass

## Pre-flight (sequential, every invocation)

```bash
node ~/.claude/bin/qualia-ui.js banner polish
```

Run these in order. Halt on the first failure.

| Gate | Check | If fail |
|---|---|---|
| Substrate | `rules/design-rubric.md`, `rules/design-laws.md` exist | Run `npx qualia install` |
| Brief | `--brief` PATH if provided, else `.planning/DESIGN.md`, else PRODUCT.md | If none, HALT: "No design brief found. Pass --brief or run /qualia-new." |
| Browser | `node ~/.claude/skills/qualia-polish-loop/scripts/playwright-capture.mjs --url about:blank --out /tmp/qpl-preflight` exits 0 | HALT with the script's setup hint |
| URL reachable | `curl -fsS -o /dev/null -w '%{http_code}' "$URL"` returns 2xx/3xx | HALT — start the dev server first |
| Working tree | `git status --porcelain` is empty | HALT — "Loop commits per iteration. Stash or commit pending changes first." |
| Budget | Estimate iters × ~14.5K tokens; warn if > 100K | If `--budget` not set, default 100K. Surface the estimate to the user. |

State the preflight explicitly:

```
QPL_PREFLIGHT: substrate=pass brief={path} browser=pass url=200 git=clean budget=100K
```

## Loop (max 8 iterations, default 6)

A single state file at `/tmp/qpl-{timestamp}/state.json` keeps the deterministic counters (iteration, fingerprints, fixes, verdict) out of the LLM context.

```bash
RUN_ID="qpl-$(date +%s)"
STATE="/tmp/${RUN_ID}/state.json"
node ~/.claude/skills/qualia-polish-loop/scripts/loop.mjs init \
  --state "$STATE" --url "$URL" --brief "$BRIEF" \
  --max "${MAX:-8}" --budget "${BUDGET:-100000}"
```

Each iteration:

### 1. Capture (deterministic, ~3-5s)

```bash
ITER_DIR="/tmp/${RUN_ID}/iter-${N}"
node ~/.claude/skills/qualia-polish-loop/scripts/playwright-capture.mjs \
  --url "$URL" --out "$ITER_DIR" --viewports 375,768,1440 --wait 1500
```

Three PNGs land in `$ITER_DIR/{mobile,tablet,desktop}-*.png`. The capture script auto-selects a backend: Playwright if `import('playwright')` resolves, else the cached `~/.cache/ms-playwright/chromium-*` binary, else `google-chrome` / `chromium` on PATH.

### 2. Evaluate (vision — single Agent spawn per iteration)

Spawn `Agent` with `subagent_type="qualia-visual-evaluator"`. Inline the rubric, the brief, the screenshot paths, the viewport meta, and the previous iteration's issues (if any). The agent reads the screenshots itself and returns a single JSON envelope per the contract in `agents/visual-evaluator.md`.

Save the agent's JSON to `$ITER_DIR/eval.json`. The exact spawn template lives in REFERENCE.md (so SKILL.md stays under 250 lines per progressive-disclosure discipline).

### 3. Decide (deterministic)

```bash
node ~/.claude/skills/qualia-polish-loop/scripts/loop.mjs record \
  --state "$STATE" --eval "$ITER_DIR/eval.json"
```

Exit codes: `0` = SUCCESS (all dims ≥ 3), `1` = CONTINUE (more iterations), `3` = KILLED (regression / budget / max).

The orchestrator computes the verdict per `rules/design-rubric.md`:

- **all aggregate scores ≥ 3 AND no critical issues remain** → SUCCESS, exit loop
- **same issue fingerprint recurred 3 consecutive iterations** → KILL, `LOOP_REGRESSION_DETECTED`
- **token usage exceeds budget** → KILL, `OUT_OF_BUDGET`
- **iteration count = max** → KILL, `MAX_ITERATIONS_REACHED`
- **else** → CONTINUE
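The verdict precedence above reduces to a small pure function. This is a sketch of what `loop.mjs record` computes — the state and eval field names are assumptions, not the script's actual schema:

```javascript
// SUCCESS first, then the three kill conditions, else CONTINUE.
function decide(state, evalJson) {
  const allPass = Object.values(evalJson.scores).every((s) => s >= 3);
  const noCritical = !evalJson.top_issues.some((i) => i.severity === "critical");
  if (allPass && noCritical) return { exit: 0, verdict: "SUCCESS" };
  if (state.regression) return { exit: 3, verdict: "LOOP_REGRESSION_DETECTED" };
  if (state.tokensUsed > state.budget) return { exit: 3, verdict: "OUT_OF_BUDGET" };
  if (state.iteration >= state.max) return { exit: 3, verdict: "MAX_ITERATIONS_REACHED" };
  return { exit: 1, verdict: "CONTINUE" };
}
```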

### 4. Fix (parallel, up to 3 builders)

For the top 3 issues from `eval.json.top_issues`, spawn `qualia-builder` agents IN THE SAME RESPONSE TURN (parallel). Each builder receives the issue, the design tokens from DESIGN.md, and explicit "fix only this dimension, do not refactor" rules. Template in REFERENCE.md.

Each builder commits its fix via the orchestrator's slop-detect-gated commit:

```bash
node ~/.claude/skills/qualia-polish-loop/scripts/loop.mjs commit-fix \
  --state "$STATE" --file "$AFFECTED_FILE" --slug "{issue-slug}"
```

This stages the file, runs `slop-detect` first (critical findings BLOCK and the builder must retry), then commits with `qpl-{N}: {slug}`. Every fix is a real revertable git commit. Failed fixes leave the working tree clean — no half-edits.
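The gated commit can be sketched as a flow with an injected command runner (a sketch only — `loop.mjs` internals aren't shown in this diff, and the slop-detect install path is an assumption):

```javascript
// Stage → slop-detect gate → commit. `run(cmd, args)` returns an exit code;
// injecting it keeps the flow testable without touching a real repo.
const SLOP_DETECT = "~/.claude/bin/slop-detect.mjs"; // assumed install path

function commitFix({ file, slug, iteration }, run) {
  run("git", ["add", file]);                       // stage first
  if (run("node", [SLOP_DETECT, file]) !== 0) {    // critical findings block
    run("git", ["restore", "--staged", file]);     // leave the tree clean
    return { committed: false, exit: 2 };          // builder must re-edit
  }
  run("git", ["commit", "-m", `qpl-${iteration}: ${slug}`]);
  return { committed: true, exit: 0 };
}
```

The early unstage on failure is what guarantees "failed fixes leave the working tree clean."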

### 5. Redeploy (default: dev-localhost)

Default mode does NOT redeploy. The loop assumes the dev server is running with HMR; the next capture re-screenshots the changed page. Verify the dev server is still up:

```bash
curl -fsS -o /dev/null "$URL" || HALT "dev server died at iteration $N"
```

For Vercel-preview mode (opt-in via `--deploy preview`), run `vercel deploy` (NOT `--prod`) and update `URL` to the deploy URL before the next capture. Slower (~30-60s/iter), but it works against a fully built environment when HMR isn't reliable.

### 6. Loop back to Capture, increment N

## Post-loop

```bash
node ~/.claude/skills/qualia-polish-loop/scripts/loop.mjs report --state "$STATE" \
  > .planning/visual-polish-loop.md
git add .planning/visual-polish-loop.md
git -c user.name="Qualia Solutions" -c user.email="info@qualiasolutions.net" \
  commit -m "qpl-final: visual-polish-loop report"
```

Then a closing card via `qualia-ui`:

```bash
node ~/.claude/bin/qualia-ui.js divider
node ~/.claude/bin/qualia-ui.js ok "Iterations: {N}/{max}"
node ~/.claude/bin/qualia-ui.js ok "Final aggregate: {sum}/40 (avg {avg})"
node ~/.claude/bin/qualia-ui.js ok "Fixes committed: {count}"
node ~/.claude/bin/qualia-ui.js ok "Tokens used: {N}K of {budget}K"
node ~/.claude/bin/qualia-ui.js end "VISUAL POLISH LOOP — {SUCCESS|KILLED-AT-N|OUT-OF-BUDGET}" "/qualia-verify or /qualia-ship"
```

## Hard rules

1. **Vision-eval is anchored.** The rubric criteria are inlined in the eval prompt verbatim. Never spawn the evaluator with "tell me what you think" — that's how you get "looks great!" hallucinations. The verbatim contract is in REFERENCE.md.
2. **Per-iteration commits.** Every fix gets a `qpl-{N}: {slug}` prefix so the user can `git revert` any iteration cleanly. The orchestrator enforces this via `loop.mjs commit-fix`.
3. **Slop-detect gate.** Critical findings BLOCK the commit. The fix-builder must retry (or the loop kills if the same fingerprint recurs 3×).
4. **No silent destruction.** All edits go through git. Failed iterations leave a clear `qpl-N:` commit trail. No working-tree side effects.
5. **`prefers-reduced-motion` honored.** If reduced motion is set, the evaluator scores motion on CSS-declaration quality, not visible animation. Don't penalize "no motion" when reduced motion is on.
6. **Budget discipline.** Token usage is tracked in state. KILL at the cap. Surface partial progress.
7. **Cleanup on exit.** No orphan browser processes (the capture script spawns short-lived headless processes, but verify nothing lingers via `pgrep -f 'chrome.*--headless'`). Temp dirs are left for forensic debugging until the next run.

## Failure modes

| Symptom | Likely cause | Action |
|---|---|---|
| `playwright-capture.mjs` exits 2 with `no_browser_backend` | No Chrome/Chromium found anywhere | `npx playwright install chromium` or install Google Chrome |
| Screenshot is blank | Page not done rendering | Increase `--wait` to 3000ms; retry once |
| Vision agent says "looks great!" to everything | Rubric not inlined / prompt drift | Verify the REFERENCE.md prompt is being used verbatim. Check `agents/visual-evaluator.md` is on disk. |
| Same issue every iteration | Fix-builder not addressing root cause | Kill at 3 recurrences (automatic). Read `state.json.fingerprints[*]` for diagnosis. Hand-fix and resume. |
| Dev server died mid-loop | Port conflict / crash | Detect via curl heartbeat in step 5. HALT — restart the server, rerun the loop with `--resume STATE`. |
| Token budget blown | Too many iterations needed | KILL at cap (automatic). Report partial progress. Consider tightening DESIGN.md or pre-running `/qualia-polish` once first. |

## Setup notes for users

The loop requires headless Chrome/Chromium. Two ways to get it:

```bash
# A. Playwright (recommended — best stability + waiting semantics)
npx playwright install chromium

# B. System Chrome (fallback — works if google-chrome or chromium is on PATH)
# Already installed on most dev machines; nothing to do.
```

The capture script tries Playwright first, then the cached chromium binary, then PATH lookups. No npm dependency on Playwright is added to your project — it's optional.

## Self-tests (the spec mandates 3 scenarios)

`docs/playwright-loop-pilot-results.md` records the outcome of running the loop against:

1. `fixtures/clean.html` (well-designed page) — expect SUCCESS in 1-2 iterations
2. `fixtures/broken.html` (deliberately bad page) — expect SUCCESS in 4-6 iterations after fixes
3. Synthetic kill-switch test — verify regression detection fires by iteration 4

Run them with `bash skills/qualia-polish-loop/scripts/_self-tests.sh` (if present) or follow the manual instructions in REFERENCE.md.

## Token-budget discipline

| Max iterations | Estimated tokens | Notes |
|---|---|---|
| 2 | ~30K | tight — only for known-clean pages |
| 4 | ~60K | standard |
| 6 | ~90K | default |
| 8 | ~120K | maximum — set `--budget 150000` to allow |

Each iteration costs roughly: 3 PNG vision reads (~9K), rubric prompt (~2K), 3 fix-builder spawns with file reads (~3.5K). The orchestrator's deterministic CLI calls are negligible. The state file persists entirely outside Claude's context — only the per-iteration eval JSON and fix outcomes flow back in.
@@ -0,0 +1,117 @@ package/skills/qualia-polish-loop/fixtures/broken.html

<!doctype html>
<!--
  Deliberately broken page used by /qualia-polish-loop self-test Scenario 2.
  Hits multiple absolute-ban patterns from rules/design-laws.md and
  rules/design-brand.md so the vision evaluator has to identify them all.
  Banned font (Inter), pure white + pure black, blue-purple gradient,
  gradient text, identical 3-column card grid, "Get Started" / "Learn More"
  generic CTAs, side-stripe border-left:4px decorative, max-width:1280
  fixed container, outline:none without focus replacement. This fixture
  is intentional shipped slop. Do not lift patterns from it.
-->
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width,initial-scale=1">
  <title>BrandX</title>
  <style>
    body {
      margin: 0;
      background: #ffffff;
      color: #000000;
      font-family: Inter, sans-serif;
      font-weight: 400;
    }
    .container { max-width: 1280px; margin: 0 auto; padding: 24px; }
    header { display: flex; justify-content: space-between; align-items: center; padding: 16px 0; }
    .logo { font-size: 18px; font-weight: 700; }
    nav a { color: #000; margin-left: 24px; text-decoration: none; outline: none; }
    nav a:hover { text-decoration: underline; }

    .hero {
      text-align: center;
      padding: 80px 16px;
      background: linear-gradient(135deg, #2563eb 0%, #9333ea 100%);
      color: #ffffff;
    }
    .hero h1 {
      font-size: 56px;
      font-family: Inter, sans-serif;
      font-weight: 800;
      margin: 0 0 16px;
      background: linear-gradient(90deg, #fff, #c4b5fd);
      -webkit-background-clip: text;
      background-clip: text;
      color: transparent;
    }
    .hero p { font-size: 18px; max-width: 620px; margin: 0 auto 32px; }

    .cta-row { display: flex; gap: 12px; justify-content: center; }
    .btn {
      background: #ffffff;
      color: #000;
      padding: 12px 24px;
      border: 0;
      font-family: Inter, sans-serif;
      font-size: 16px;
      cursor: pointer;
      outline: none;
    }
    .btn-secondary { background: transparent; color: #fff; border: 1px solid #fff; }

    .features { padding: 80px 0; }
    .features h2 { font-size: 32px; text-align: center; margin: 0 0 48px; font-family: Inter, sans-serif; }
    .grid { display: grid; grid-template-columns: repeat(3, 1fr); gap: 24px; }
    .card {
      background: #f8f9fa;
      padding: 32px;
      border-left: 4px solid #2563eb;
    }
    .card h3 { font-size: 18px; margin: 0 0 8px; font-family: Inter, sans-serif; }
    .card p { color: #666; font-size: 14px; margin: 0; }

    footer { background: #000; color: #fff; padding: 32px 0; text-align: center; font-size: 14px; }
  </style>
</head>
<body>
  <div class="container">
    <header>
      <span class="logo">BrandX</span>
      <nav>
        <a href="#">Features</a>
        <a href="#">Pricing</a>
        <a href="#">About</a>
      </nav>
    </header>
  </div>

  <section class="hero">
    <h1>Welcome to BrandX</h1>
    <p>The AI-powered platform that helps you do more with less, built for modern teams who care about results.</p>
    <div class="cta-row">
      <button class="btn">Get Started</button>
      <button class="btn btn-secondary">Learn More</button>
    </div>
  </section>

  <div class="container features">
    <h2>Everything you need to succeed</h2>
    <div class="grid">
      <div class="card">
        <h3>Fast</h3>
        <p>Lightning-fast performance that scales with your business needs and grows alongside you.</p>
      </div>
      <div class="card">
        <h3>Secure</h3>
        <p>Bank-grade security with end-to-end encryption keeping your data safe from prying eyes.</p>
      </div>
      <div class="card">
        <h3>Easy</h3>
        <p>Intuitive interface designed for everyone, from beginners to power users.</p>
      </div>
    </div>
  </div>

  <footer>2026 BrandX. All rights reserved.</footer>
</body>
</html>