npm - qualia-framework - Versions diffs - 4.5.0 → 5.3.0 - Mend

qualia-framework 4.5.0 → 5.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (66) hide show

package/AGENTS.md +24 -0
package/CLAUDE.md +12 -75
package/README.md +23 -16
package/agents/builder.md +9 -21
package/agents/planner.md +8 -0
package/agents/verifier.md +8 -0
package/agents/visual-evaluator.md +132 -0
package/bin/cli.js +54 -18
package/bin/install.js +369 -29
package/bin/qualia-ui.js +208 -1
package/bin/slop-detect.mjs +5 -0
package/bin/state.js +34 -1
package/docs/install-redesign-builder-prompt.md +290 -0
package/docs/install-redesign-pilot.md +234 -0
package/docs/playwright-loop-builder-prompt.md +185 -0
package/docs/playwright-loop-design-notes.md +108 -0
package/docs/playwright-loop-pilot-results.md +170 -0
package/docs/playwright-loop-tester-prompt.md +213 -0
package/docs/polish-loop-supervised-run.md +111 -0
package/docs/reviews/matt-pocock-skills-analysis.md +300 -0
package/guide.md +9 -5
package/hooks/env-empty-guard.js +74 -0
package/hooks/pre-compact.js +19 -9
package/hooks/pre-deploy-gate.js +8 -2
package/hooks/pre-push.js +26 -12
package/hooks/supabase-destructive-guard.js +62 -0
package/hooks/vercel-account-guard.js +91 -0
package/package.json +2 -1
package/rules/design-brand.md +4 -0
package/rules/design-laws.md +4 -0
package/rules/design-product.md +4 -0
package/rules/design-rubric.md +4 -0
package/rules/grounding.md +4 -0
package/skills/qualia-build/SKILL.md +40 -46
package/skills/qualia-discuss/SKILL.md +51 -68
package/skills/qualia-handoff/SKILL.md +1 -0
package/skills/qualia-hook-gen/SKILL.md +206 -0
package/skills/qualia-issues/SKILL.md +151 -0
package/skills/qualia-map/SKILL.md +78 -35
package/skills/qualia-new/REFERENCE.md +139 -0
package/skills/qualia-new/SKILL.md +45 -121
package/skills/qualia-optimize/REFERENCE.md +265 -0
package/skills/qualia-optimize/SKILL.md +92 -232
package/skills/qualia-plan/SKILL.md +58 -65
package/skills/qualia-polish-loop/REFERENCE.md +265 -0
package/skills/qualia-polish-loop/SKILL.md +201 -0
package/skills/qualia-polish-loop/fixtures/broken.html +117 -0
package/skills/qualia-polish-loop/fixtures/clean.html +196 -0
package/skills/qualia-polish-loop/scripts/loop.mjs +323 -0
package/skills/qualia-polish-loop/scripts/playwright-capture.mjs +206 -0
package/skills/qualia-polish-loop/scripts/score.mjs +176 -0
package/skills/qualia-prd/SKILL.md +199 -0
package/skills/qualia-report/SKILL.md +141 -200
package/skills/qualia-research/SKILL.md +28 -33
package/skills/qualia-road/SKILL.md +103 -0
package/skills/qualia-ship/SKILL.md +1 -0
package/skills/qualia-task/SKILL.md +1 -1
package/skills/qualia-test/SKILL.md +50 -2
package/skills/qualia-triage/SKILL.md +152 -0
package/skills/qualia-verify/SKILL.md +63 -104
package/skills/qualia-zoom/SKILL.md +51 -0
package/skills/zoho-workflow/SKILL.md +1 -1
package/templates/CONTEXT.md +36 -0
package/templates/decisions/ADR-template.md +30 -0
package/tests/bin.test.sh +598 -7
package/tests/state.test.sh +58 -0

package/skills/qualia-plan/SKILL.md CHANGED Viewed

@@ -1,6 +1,6 @@
 ---
 name: qualia-plan
-description: "Plan the current phase — spawns planner, validates with plan-checker in a revision loop (max 2), optionally runs discuss/research first. Use when ready to plan a phase."
+description: "Plans the current phase by spawning a planner agent to break it into executable tasks with waves, then validates via a plan-checker revision loop (max 2 cycles). Supports gap-closure mode for verification failures. Use when the user says 'plan this phase', 'break this into tasks', 'create the plan', 'qualia-plan', 'plan phase 2', or after /qualia-new sets up the journey."
 allowed-tools:
   - Bash
   - Read
@@ -15,15 +15,15 @@ allowed-tools:
 # /qualia-plan — Plan a Phase
-Spawn a planner agent to break the current phase into executable tasks, then validate the plan with a checker (up to 2 revision cycles) before routing to build.
+Spawn planner to break phase into tasks, validate with checker (max 2 revision cycles), route to build.
 ## Usage
-`/qualia-plan` — plan the next unplanned phase
+`/qualia-plan` — plan next unplanned phase
 `/qualia-plan {N}` — plan specific phase N
 `/qualia-plan {N} --gaps` — plan fixes for verification failures
-`/qualia-plan {N} --skip-check` — skip the plan-checker validation loop (not recommended)
-`/qualia-plan {N} --auto` — plan + auto-chain into `/qualia-build {N} --auto` when done (no human approval between plan and build)
+`/qualia-plan {N} --skip-check` — skip plan-checker loop (not recommended)
+`/qualia-plan {N} --auto` — plan + chain into `/qualia-build {N} --auto` (no human gate)
 ## Process
@@ -38,9 +38,9 @@ node ~/.claude/bin/knowledge.js load patterns
 node ~/.claude/bin/knowledge.js load client
 ```
-If no phase number given, use the current phase from STATE.md.
+No phase number → current phase from STATE.md.
-**Read phase-specific context if it exists:**
+**Phase-specific context (if exists):**
 ```bash
 cat .planning/phase-{N}-context.md 2>/dev/null   # from /qualia-discuss
 cat .planning/phase-{N}-research.md 2>/dev/null  # from /qualia-research
@@ -48,19 +48,17 @@ cat .planning/phase-{N}-research.md 2>/dev/null  # from /qualia-research
 ### 2. Optional: Suggest Deeper Prep
-**If ROADMAP.md marked this phase as a "research flag" AND no phase-{N}-research.md exists:**
+**If ROADMAP.md flagged phase for research AND no phase-{N}-research.md:**
 - header: "Research first?"
-- question: "This phase was flagged for deeper research. Run /qualia-research {N} first?"
+- question: "Phase flagged for research. Run /qualia-research {N} first?"
 - options:
-  - "Yes, research first" — Run /qualia-research {N} inline, then continue
-  - "Skip, plan directly" — I know enough
+  - "Yes, research first"
+  - "Skip, plan directly"
-**If phase involves compliance/regulatory/architectural stakes AND no phase-{N}-context.md exists:**
+**If phase has compliance/regulatory/architectural stakes AND no phase-{N}-context.md:**
-Briefly suggest: *"Want to run /qualia-discuss {N} first to lock decisions? Optional."*
-Don't force it. Some phases don't need it.
+Suggest: *"Run /qualia-discuss {N} first to lock decisions? Optional."*
 ### 3. Spawn Planner (Fresh Context)
@@ -69,12 +67,9 @@ node ~/.claude/bin/qualia-ui.js banner plan {N} "{phase name from ROADMAP.md}"
 node ~/.claude/bin/qualia-ui.js spawn planner "Breaking phase into tasks..."
 ```
-Spawn the planner:
 ```
 Agent(prompt="
-Read your role: @~/.claude/agents/planner.md
-Grounding + rubrics: @~/.claude/rules/grounding.md
+Role: @~/.claude/agents/planner.md
 <project_context>
 @.planning/PROJECT.md
@@ -89,86 +84,84 @@ Phase {N} from ROADMAP.md:
 @.planning/ROADMAP.md
 Goal: {goal from ROADMAP.md}
-Requirements: {REQ-IDs from ROADMAP.md}
+Reqs: {REQ-IDs from ROADMAP.md}
 Success criteria: {success criteria from ROADMAP.md}
 </phase_details>
 <locked_decisions>
-{if phase-{N}-context.md exists, inline its Locked Decisions section; else 'none'}
+{phase-{N}-context.md Locked Decisions if exists, else 'none'}
 </locked_decisions>
 <research_findings>
-{if phase-{N}-research.md exists, inline its recommendation; else 'none'}
+{phase-{N}-research.md recommendation if exists, else 'none'}
 </research_findings>
-{If --gaps: Also read @.planning/phase-{N}-verification.md for failures to fix. Create gap-closure plan.}
+{--gaps → read @.planning/phase-{N}-verification.md failures. Create gap-closure plan.}
 <relevant_learnings>
-{inline any applicable patterns from knowledge/learned-patterns.md}
+{applicable patterns from knowledge/learned-patterns.md}
 </relevant_learnings>
-Create the plan at .planning/phase-{N}-plan.md (or .planning/phase-{N}-gaps-plan.md for --gaps).
+Output: .planning/phase-{N}-plan.md (or phase-{N}-gaps-plan.md for --gaps).
 ", subagent_type="qualia-planner", description="Plan phase {N}")
 ```
-### 4. Validate the Plan (unless --skip-check)
+### 4. Validate Plan (unless --skip-check)
-Read the generated plan. Spawn the plan-checker:
+Read generated plan. Spawn checker:
 ```
 Agent(prompt="
-Read your role: @~/.claude/agents/plan-checker.md
-Grounding + rubrics: @~/.claude/rules/grounding.md
+Role: @~/.claude/agents/plan-checker.md
 <plan_path>.planning/phase-{N}-plan.md</plan_path>
 <phase_goal>{goal from ROADMAP.md}</phase_goal>
 <success_criteria>{criteria from ROADMAP.md}</success_criteria>
 <project_context>@.planning/PROJECT.md</project_context>
-Validate against the 7 rules. Return PASS or REVISE with structured issues.
+Validate against 7 rules. Return PASS or REVISE with structured issues.
 ", subagent_type="qualia-plan-checker", description="Check plan phase {N}")
 ```
-**Revision loop (max 2 iterations):**
+**Revision loop (max 2):**
-- Iteration 1: Check → if REVISE, re-spawn planner with checker issues
-- Iteration 2: Re-check → if REVISE or BLOCKED, escalate to user
+- Iter 1: Check → REVISE → re-spawn planner with checker issues
+- Iter 2: Re-check → REVISE/BLOCKED → escalate to user
-Rationale: Amazon/NeurIPS 2025 measured reflection gains at 74→86% for 1 round, 88% for 3 rounds. Iteration 3 only added 2pp over iteration 1 — not worth the extra planner spawn (serial cost ~30-60s).
+(74→86% after 1 round, 88% after 3. Iter 3 adds only 2pp; not worth extra spawn.)
-For each revision:
+Per revision:
 ```
 Agent(prompt="
-Read your role: @~/.claude/agents/planner.md
+Role: @~/.claude/agents/planner.md
 <revision_mode>true</revision_mode>
 <current_plan>@.planning/phase-{N}-plan.md</current_plan>
 <checker_feedback>
-{inline REVISE output from plan-checker}
+{REVISE output from plan-checker}
 </checker_feedback>
-Revise the plan in place. Address every issue. Do NOT add new tasks or change scope
-— only fix what the checker flagged.
+Revise in place. Address every issue. Do NOT add tasks or change scope; fix only what checker flagged.
 ", subagent_type="qualia-planner", description="Revise plan phase {N}")
 ```
-After revision, spawn the checker again. Max 2 total revision cycles.
+After revision, re-spawn checker. Max 2 total cycles.
-**If checker returns BLOCKED after 2 cycles:**
+**BLOCKED after 2 cycles:**
 ```bash
 node ~/.claude/bin/qualia-ui.js fail "Plan failed validation after 2 revisions"
 ```
-Show the remaining issues. Ask:
-- "Skip validation and proceed anyway" (use `--skip-check`)
-- "Adjust the roadmap" (phase scope may be wrong)
-- "Adjust the phase goal" (success criteria may be under-specified)
+Show remaining issues. Options:
+- "Skip validation" (`--skip-check`)
+- "Adjust roadmap" (scope wrong)
+- "Adjust phase goal" (criteria under-specified)
 ### 5. Present Final Plan
-Render the story-file dashboard — this is a single command that parses the plan and shows the phase goal, waves, tasks, personas, dependencies, acceptance-criteria counts, and validation counts:
+Render story-file dashboard:
 ```bash
 node ~/.claude/bin/qualia-ui.js plan-summary .planning/phase-{N}-plan.md
@@ -184,19 +177,19 @@ If "adjust" — get feedback, re-spawn planner with revision context, re-validat
 node ~/.claude/bin/state.js transition --to planned --phase {N}
 ```
-If state.js returns an error, show it and stop. Do NOT manually edit STATE.md or tracking.json.
+Error → show, stop. Do NOT edit STATE.md or tracking.json manually.
 ### 7. Route (auto-chain aware)
-**If invoked with `--auto`:** immediately invoke `/qualia-build {N} --auto` inline. The user already approved the whole journey at `/qualia-new` time — no additional approval needed per phase plan.
+**`--auto`:** invoke `/qualia-build {N} --auto` inline. User approved at `/qualia-new`; no per-phase gate.
 ```bash
 node ~/.claude/bin/qualia-ui.js info "Auto mode — chaining into /qualia-build {N}"
 ```
-Then invoke the `qualia-build` skill inline with `--auto`.
+Then invoke `qualia-build` inline with `--auto`.
-**Otherwise (guided mode):** stop and show the next step:
+**Guided mode:** stop, show next step:
 ```bash
 node ~/.claude/bin/qualia-ui.js end "PHASE {N} PLANNED" "/qualia-build {N}"
@@ -204,22 +197,22 @@ node ~/.claude/bin/qualia-ui.js end "PHASE {N} PLANNED" "/qualia-build {N}"
 ## Gap Closure Mode (`--gaps`)
-When invoked as `/qualia-plan {N} --gaps`, the planner is in gap-closure mode:
+With `--gaps`, planner enters gap-closure mode:
-1. Read `.planning/phase-{N}-verification.md` — extract ONLY the FAIL items
-2. For each FAIL item, create a targeted fix task:
-   - **Files:** specific files that failed verification
-   - **Action:** specific fix (not "fix auth" — "add session persistence check in src/lib/auth.ts signIn function")
-   - **Acceptance Criteria:** the exact verification criterion that previously failed, restated as an observable behavior
-3. Do NOT re-plan passing items. Do NOT add new features. Gap plans are surgical.
-4. Write to `.planning/phase-{N}-gaps-plan.md` (separate from original plan)
-5. All gap tasks are Wave 1 (parallel) unless they share files
-6. Plan-checker still validates the gap plan — same 7 rules apply
+1. Read `.planning/phase-{N}-verification.md` → extract ONLY FAIL items
+2. Per FAIL, create targeted fix task:
+   - **Files:** specific files that failed
+   - **Action:** specific fix (not "fix auth"; "add session persistence check in src/lib/auth.ts signIn fn")
+   - **AC:** failed criterion restated as observable behavior
+3. Do NOT re-plan passing items. No new features. Surgical only.
+4. Output: `.planning/phase-{N}-gaps-plan.md`
+5. All gap tasks Wave 1 (parallel) unless they share files
+6. Plan-checker still validates; same 7 rules
 ## Rules
-1. **Plan-checker is mandatory by default.** Only skip with `--skip-check`, and only if you know what you're doing.
-2. **Max 3 revision cycles.** After 3 failed checks, escalate — the phase scope is probably wrong.
-3. **Honor locked decisions.** If phase-{N}-context.md exists, its locked decisions are non-negotiable.
-4. **One plan file per phase.** Don't create phase-1-plan.md AND phase-1-plan-v2.md. Edit in place.
-5. **Revision is surgical.** When revising, only fix what the checker flagged — no scope creep.
+1. **Plan-checker mandatory by default.** Skip only with `--skip-check`.
+2. **Max 3 revision cycles.** 3 fails → escalate; scope probably wrong.
+3. **Honor locked decisions.** phase-{N}-context.md locked decisions non-negotiable.
+4. **One plan file per phase.** No phase-1-plan-v2.md. Edit in place.
+5. **Revision is surgical.** Fix only what checker flagged; no scope creep.

package/skills/qualia-polish-loop/REFERENCE.md ADDED Viewed

@@ -0,0 +1,265 @@
+# REFERENCE — /qualia-polish-loop
+Verbatim agent prompts and operational details. Loaded on demand by SKILL.md, not carried in the system prompt. Per progressive-disclosure discipline (Matt Pocock): the agent reads SKILL.md first, then this file when it needs the spawn templates.
+## Architecture summary
+```
+SKILL.md driver (Claude session)
+  │
+  ├─ scripts/playwright-capture.mjs    (deterministic Node — produces PNGs)
+  │     ↓ writes /tmp/qpl-{ts}/iter-{N}/{mobile,tablet,desktop}-*.png
+  │
+  ├─ Agent({subagent_type: "qualia-visual-evaluator", ...})
+  │     ↓ reads PNGs, returns single JSON envelope (eval.json)
+  │
+  ├─ scripts/loop.mjs record           (deterministic — verdict + fingerprints)
+  │     ↓ exit 0=SUCCESS, 1=CONTINUE, 3=KILLED
+  │
+  ├─ Agent({subagent_type: "qualia-builder", ...})  × up to 3 in parallel
+  │     ↓ each fixes ONE issue, calls scripts/loop.mjs commit-fix
+  │
+  └─ scripts/loop.mjs report           (final markdown report)
+        ↓ writes .planning/visual-polish-loop.md
+```
+## Capture: backend selection
+The capture script (`scripts/playwright-capture.mjs`) auto-selects in this order:
+1. `import('playwright')` — preferred when available; gives deterministic `waitUntil: 'networkidle'`
+2. `import('playwright-core')` — same API, lighter package
+3. `~/.cache/ms-playwright/chromium-{version}/chrome-{linux64,linux,mac,win}/chrome` — if Playwright was ever installed for browsers but the package isn't import-resolvable
+4. `which google-chrome` / `chromium` / `chromium-browser` / `chrome` — system browser fallback
+For backends 3 and 4 (binary-direct), the script uses `--headless=new --screenshot --virtual-time-budget`. Less precise than Playwright's `networkidle` waiting but works without any npm dependency.
+Setup hints if all four fail:
+```bash
+# Option A — Playwright (best stability)
+npm i -D playwright && npx playwright install chromium
+# Option B — system Chrome (fastest setup if you already have Chrome installed)
+# (no action needed if google-chrome is on PATH)
+```
+## Vision-evaluator spawn template (VERBATIM)
+The vision evaluator's anchored discipline: **DEFAULT TO 3.** Only score above 3 with a cited design principle. Only score below 3 with a quoted violation. Without anchoring, vision models return "looks great!" to everything — that failure mode is the entire reason this loop exists. The full rubric criteria live in `agents/visual-evaluator.md`; this section is the spawn template.
+When the loop reaches step 2 (Evaluate), spawn ONE agent with the screenshots, brief, rubric, and previous-iteration context. Inline this prompt verbatim — do not paraphrase.
+```
+Agent({
+  subagent_type: "qualia-visual-evaluator",
+  description: "Score iteration {N} screenshots against rubric",
+  prompt: `
+Role: @~/.claude/agents/visual-evaluator.md
+<rubric>
+{INLINE rules/design-rubric.md §"The 8 dimensions" through §"Aggregate score"}
+</rubric>
+<brief>
+{INLINE the relevant excerpt from .planning/DESIGN.md — sections "Direction", "Color", "Typography"}
+</brief>
+<product>
+{INLINE the relevant excerpt from .planning/PRODUCT.md — register, voice, anti-references}
+</product>
+<screenshots>
+- mobile (375px):  /tmp/qpl-{ts}/iter-{N}/mobile-375.png
+- tablet (768px):  /tmp/qpl-{ts}/iter-{N}/tablet-768.png
+- desktop (1440px): /tmp/qpl-{ts}/iter-{N}/desktop-1440.png
+</screenshots>
+<viewport_meta>
+{ "reduced_motion": {true|false}, "viewport_widths": [375, 768, 1440] }
+</viewport_meta>
+<previous_iteration>
+{If N > 1, INLINE eval.json.top_issues from iter-{N-1} so the evaluator can verify regression vs improvement. Otherwise: "(first iteration — no prior data)"}
+</previous_iteration>
+<task>
+This is iteration {N} of {max}. Read each screenshot. Score every dimension 1-5 with one-line evidence per dimension per viewport. Return a single fenced JSON block per the contract in your role file. No prose outside the JSON.
+</task>
+`
+})
+```
+The evaluator's role file (`agents/visual-evaluator.md`) carries the trust-boundary block, the calibration examples, and the JSON output contract. Together with this spawn template, the prompt prefix is stable across iterations — Anthropic prompt caching reuses the role + rubric + brief prefix, so the per-iteration cost is roughly: 3 image reads + the previous-iteration delta.
+## Fix-builder spawn template (VERBATIM)
+When the loop has 1-3 issues to fix, spawn one builder per issue IN THE SAME RESPONSE TURN (parallel). Each fixes one dimension, narrowly.
+```
+Agent({
+  subagent_type: "qualia-builder",
+  description: "Fix {dim} issue: {short description}",
+  prompt: `
+Role: @~/.claude/agents/builder.md
+<phase_context>
+You are inside /qualia-polish-loop iteration {N}. The vision evaluator scored
+the {dim} dimension at {score}. Your single task: fix that one dimension.
+<design>
+{INLINE .planning/DESIGN.md tokens relevant to {dim}}
+</design>
+<product>
+{INLINE .planning/PRODUCT.md voice + register}
+</product>
+</phase_context>
+<task_context>
+# Issue
+- Dimension: {dim}
+- Severity: {severity}
+- Description: {description}
+- Likely file: {likely_file or "(infer from grep — start at the path the screenshot suggests)"}
+- Recommended fix: {fix}
+# Files probably affected
+{1-3 candidate paths the loop has inferred from the URL routing}
+</task_context>
+<task>
+1. Read the likely file. If the issue is in a different file, follow the import graph until you find the source.
+2. Make the MINIMUM edit to fix this one dimension. Do not refactor. Do not change logic. Do not touch state management. Do not change copy unless this is a microcopy issue.
+3. Use design tokens from DESIGN.md. Do not invent new color values, font names, or spacing.
+4. After the edit, commit via the orchestrator (slop-detect-gated):
+     node ~/.claude/skills/qualia-polish-loop/scripts/loop.mjs commit-fix --state {STATE} --file {file} --slug {dim}-{short-keyword}
+   If slop-detect blocks (exit 2), READ the slop output and re-edit. If you cannot fix without violating slop-detect, return BLOCKED with the conflict.
+5. Return DONE with: file modified, lines changed, slop-detect: pass, commit: {sha}.
+</task>
+<rules>
+- Vision says: {evidence from eval.json.viewport_results[].evidence[{dim}]}
+- Do not add features.
+- Do not write tests for this fix (the loop's next iteration is the test).
+- Single commit. The orchestrator handles the slug + iteration prefix.
+</rules>
+`
+})
+```
+## Iteration log entry (what `loop.mjs record` writes to state.json.iterations[])
+```json
+{
+  "iteration": 1,
+  "scores": { "typography": 1, "color": 1, "spatial": 3, "layout": 1, "shadow": 3, "motion": 3, "microcopy": 1, "container": 1 },
+  "aggregate": 14,
+  "pass": false,
+  "failing_dims": ["typography", "color", "layout", "microcopy", "container"],
+  "top_issues": [
+    { "dim": "color", "severity": "critical", "description": "blue→purple gradient on hero", "likely_file": "src/styles/globals.css", "fix": "replace linear-gradient with single accent var(--accent)" },
+    { "dim": "typography", "severity": "critical", "description": "Inter as primary font-family", "likely_file": "src/styles/globals.css", "fix": "swap to Fraunces + JetBrains Mono per DESIGN.md §3" },
+    { "dim": "layout", "severity": "high", "description": "three identical feature cards in section 2", "likely_file": "src/pages/index.tsx", "fix": "vary card sizes per design-brand.md §Layout" }
+  ],
+  "tokens_used": 14500,
+  "timestamp": "2026-05-03T12:34:56.000Z"
+}
+```
+## Issue fingerprint (regression detection)
+The orchestrator computes a fingerprint per top_issue for each iteration:
+```
+fingerprint = `${dim}__${path.basename(likely_file)}__${first_32_chars_of_description}`
+                      .toLowerCase().replace(/\W+/g, "_")
+```
+State stores `state.fingerprints[fingerprint] = { iterations: [1,2,3], description, dim }`. The KILL trigger is **3 consecutive integer iterations in `iterations[]`** — non-consecutive recurrences don't kill (the issue may have been fixed, broken by a different change, then refixed; that's a different signal than "fix-builder cannot fix this").
+When the kill trigger fires, the verdict becomes `killed_regression` and `state.kill_fingerprint` records which one. The user can `cat state.json | jq '.fingerprints | to_entries | map(select(.key == "{fingerprint}"))'` to see the recurrence pattern.
+## Token-budget table
+| Iterations | Tokens (est.) | Sized for |
+|---|---|---|
+| 2 | ~30K  | known-clean page sanity-check |
+| 4 | ~60K  | mid-confidence |
+| 6 | ~90K  | default |
+| 8 | ~120K | hard cap; pass `--budget 150000` to allow |
+Per-iteration cost (rough):
+- 3 screenshot reads ≈ 9K
+- rubric + brief inlined ≈ 2K (cached after iter 1)
+- previous-iteration delta ≈ 0.5K
+- 3 fix-builder spawns × (file read + edit + commit-fix call) ≈ 3K
+- **per-iteration ≈ 14.5K**
+## Self-test scenarios (mapping to spec)
+| # | Fixture | Expected | Verifier |
+|---|---|---|---|
+| 1 | `fixtures/clean.html` | SUCCESS in 1-2 iterations, all dims ≥ 4 | run capture, run evaluator inline, assert pass |
+| 2 | `fixtures/broken.html` | SUCCESS in 4-6 iters; identifies banned font + gradient + 3-card grid + side-stripe + generic CTA | each fix-builder commits a `qpl-N:` change; final eval all dims ≥ 3 |
+| 3 | Kill-switch | KILL at iter ≤ 4 with `LOOP_REGRESSION_DETECTED` | call `loop.mjs record` 3× with the same fingerprint; assert exit 3 + correct verdict |
+The pilot-results doc at `docs/playwright-loop-pilot-results.md` records the actual outcome from `bash scripts/_self-tests.sh` (Scenario 3 is exercised by a deterministic unit-style invocation; Scenarios 1+2 require a real vision pass and are run by Claude when the loop ships).
+## Final report template (what `loop.mjs report` emits to stdout)
+```markdown
+# Visual-Polish Loop Report
+- **URL:** http://localhost:3000
+- **Brief:** .planning/DESIGN.md
+- **Started:** 2026-05-03T12:00:00Z
+- **Final verdict:** SUCCESS
+- **Iterations:** 4 / 8
+- **Tokens used:** 58000 / 100000
+- **Fixes committed:** 7
+## Iteration log
+### Iteration 1
+- Scores: typo=1 colo=1 spat=3 layo=1 shad=3 moti=3 micr=1 cont=1
+- Aggregate: 14/40 (avg 1.75)
+- Pass: NO (failing: typography, color, layout, microcopy, container)
+- Top issues:
+  - **color** [critical] blue→purple gradient on hero → src/styles/globals.css
+  - **typography** [critical] Inter as primary → src/styles/globals.css
+  - **layout** [high] three identical cards → src/pages/index.tsx
+### Iteration 2
+- Scores: typo=3 colo=3 spat=3 layo=2 shad=3 moti=3 micr=2 cont=2
+- Aggregate: 21/40 (avg 2.62)
+- Pass: NO (failing: layout, microcopy, container)
+- ...
+### Iteration 3
+- Scores: typo=4 colo=3 spat=3 layo=3 shad=3 moti=3 micr=3 cont=3
+- Aggregate: 25/40 (avg 3.13)
+- Pass: YES
+## Fix commits (revertable)
+- abc1234 qpl-1: color-gradient-removal — src/styles/globals.css
+- def5678 qpl-1: typography-fraunces — src/styles/globals.css
+- ...
+## Issue fingerprints (regression tracker)
+- color__globals_css__blue_purple_gradient — iterations [1] — fixed at iter 2
+```
+## Why three viewports
+Per the spec's hard constraint (§5g `prefers-reduced-motion` and §5c mobile-only failures), the loop MUST evaluate at mobile (375), tablet (768), and desktop (1440). The aggregate score is the **minimum** across viewports for each dimension — a layout that's elegant on desktop but breaks at 375 is a fail, full stop.
+This is intentional. Most visual regressions Fawzi has documented in `/insights` (hero videos cropped wrong on mobile, touch targets < 44px on mobile, navigation collapse misbehaving) only show up below 768. Scoring on desktop alone is how we got "looks great in dev" → "looks broken on the user's phone."
+## What the loop does NOT do (deferred to v5.2)
+- Cross-browser rendering checks (Firefox / WebKit) — Chromium-only, per `qualia-polish` Stage 4 precedent
+- Accessibility audits beyond what the rubric scores — use `/qualia-polish` Stage 3 (Lighthouse + axe) for that
+- Performance regressions — use `/qualia-polish-loop` only after Lighthouse score passes
+- Reference-image-only mode (compare to a target screenshot without a brief) — currently the brief is required; reference is supplemental
+- Multi-page sweeps — one URL per invocation; chain `/qualia-polish-loop` per route for site-wide passes