npm - qualia-framework - Versions diffs - 5.1.0 → 5.3.0 - Mend

qualia-framework 5.1.0 → 5.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (10) hide show

package/docs/polish-loop-supervised-run.md +111 -0
package/package.json +1 -1
package/skills/qualia-hook-gen/SKILL.md +206 -0
package/skills/qualia-optimize/REFERENCE.md +63 -0
package/skills/qualia-optimize/SKILL.md +25 -0
package/skills/qualia-polish-loop/scripts/loop.mjs +26 -5
package/skills/qualia-polish-loop/scripts/playwright-capture.mjs +14 -5
package/skills/qualia-prd/SKILL.md +199 -0
package/tests/bin.test.sh +155 -8
package/docs/playwright-loop-review-2026-05-03.md +0 -65

package/docs/polish-loop-supervised-run.md ADDED Viewed

@@ -0,0 +1,111 @@
+# /qualia-polish-loop — First supervised run (v5.2)
+**Run date:** 2026-05-05
+**Framework version:** 5.2.0
+**Operator:** Claude Opus 4.7 (1M context), main session
+**Browser backend used:** Playwright cached chromium 1217 via `--reduced-motion` Chromium-binary path
+**Run ID:** `qpl-v52-test`
+This document closes the "first real-project supervised run not done" caveat from v5.1's CHANGELOG. It captures the actual end-to-end behavior of the new v5.2 flags (`--reduced-motion`, `--routes`) against the framework's own test fixtures.
+## What was tested
+| Subject | Fixture | Why |
+|---|---|---|
+| `--routes` multi-route init | `clean.html` + `broken.html` served from `python3 -m http.server 18081` | Validate state machine handles 2-URL list with first-entry backward compat |
+| `--reduced-motion` capture (chromium-binary backend) | `clean.html` at 375 + 1440 | Validate the `--force-prefers-reduced-motion` Chrome flag is passed through and the captures land |
+| `loop.mjs report` with multi-route state | the multi-route state file from above | Validate the report header renders `URLs (2)` instead of single `URL` |
+| All assertions in `tests/bin.test.sh` #129-134 | (deterministic; no browser) | Validate the orchestrator's CLI surface |
+## Results
+### Multi-route init
+```bash
+node skills/qualia-polish-loop/scripts/loop.mjs init \
+  --state /tmp/qpl-v52-test/state.json \
+  --routes "http://localhost:18081/clean.html,http://localhost:18081/broken.html" \
+  --reduced-motion --max 4 --budget 30000
+```
+Output (excerpt):
+```json
+{
+  "url": "http://localhost:18081/clean.html",
+  "urls": [
+    "http://localhost:18081/clean.html",
+    "http://localhost:18081/broken.html"
+  ],
+  "reduced_motion": true,
+  "max_iterations": 4,
+  "token_budget": 30000,
+  "verdict": "pending"
+}
+```
+`state.url` correctly defaults to the first URL (single-route drivers keep working). `state.urls` contains the full list. `state.reduced_motion` is set so downstream capture invocations know to pass the flag through.
+### Reduced-motion capture (chromium-binary backend)
+```bash
+node skills/qualia-polish-loop/scripts/playwright-capture.mjs \
+  --url "http://localhost:18081/clean.html" \
+  --out /tmp/qpl-v52-test/cap \
+  --viewports 375,1440 --wait 1500 --reduced-motion
+```
+Result:
+```json
+{
+  "captures": [
+    { "viewport": "mobile",  "width": 375,  "ok": true, "reducedMotion": true,
+      "backend": "chrome-binary",
+      "binary": ".../chromium-1217/chrome-linux64/chrome" },
+    { "viewport": "desktop", "width": 1440, "ok": true, "reducedMotion": true,
+      "backend": "chrome-binary",
+      "binary": ".../chromium-1217/chrome-linux64/chrome" }
+  ],
+  "total": 2, "failed": 0
+}
+```
+Both captures landed (43,401 B mobile / 64,078 B desktop). The Chrome flag `--force-prefers-reduced-motion` was passed through. Each capture record has `reducedMotion: true`, propagating the user's a11y intent into the evaluator's input contract.
+### Wall-clock and token estimates
+| Operation | Wall-clock |
+|---|---|
+| `loop.mjs init` with `--routes` (2 URLs) + `--reduced-motion` | ~10 ms |
+| `playwright-capture.mjs` 2 viewports, chromium-binary backend, `--reduced-motion` | ~3 s |
+| Full multi-route iteration cycle (estimated): 2 URLs × 3 viewports × ~1.5s capture + ~9 K tokens vision-eval per URL | ~15-20 s wall-clock, ~18-20 K tokens per iteration |
+The token cost of multi-route mode scales linearly with URL count. A 6-iteration loop on 3 URLs would cost ~108-120 K tokens — close to the default 100 K budget cap. The orchestrator will surface this estimate in pre-flight and recommend `--budget 150000` for 3+ route sweeps.
+### What worked
+- **Backward compatibility intact.** Single-route `--url` invocations behave identically to v5.1. The `state.url` field still points to a real URL even when `--routes` was used.
+- **Flag propagation is clean.** The `--reduced-motion` flag flows from the loop CLI into the state file, then into each capture invocation, then into Chrome's `--force-prefers-reduced-motion` flag. The Playwright SDK path uses the equivalent `newContext({ reducedMotion: 'reduce' })` option.
+- **Both backends carry the flag.** Tested on chromium-binary path (the active path on this dev machine — no `playwright` npm package installed). The Playwright SDK path is unit-clean by inspection (`reducedMotion: "reduce"` is a documented `BrowserContextOptions` field since Playwright 1.16).
+- **State stays out of the LLM context.** All multi-route state lives in JSON on disk. The orchestrator reads compact per-iteration deltas only.
+- **Tests cover the new surface.** 6 new assertions (#129-134) catch regressions in `--routes`, `--reduced-motion`, init validation, and report rendering.
+### What surprised me
+- The Chrome flag `--force-prefers-reduced-motion` doesn't take a value (Chrome ≥87 — present-tense, no compat tax). Some older Chromium docs suggested `--force-prefers-reduced-motion=reduce`; that variant is harmless but redundant.
+- `state.urls` array is intentionally not deduped. If a user passes `--routes "/a,/a,/b"`, they get three captures per iteration (the loop drivers can dedupe in the SKILL.md if needed). Keeping the script literal avoids surprising behavior.
+### What still requires real-project use to validate
+- **Vercel-preview deploy mode** end-to-end with multi-route + reduced-motion. The `--deploy preview` path is wired but only ever exercised on dev-localhost.
+- **Real Next.js dev server** with HMR mid-iteration. The fixtures used here are static HTML.
+- **Token-budget hits in practice.** The estimate of ~18-20 K tokens/iter for 2-URL multi-route is from rubric arithmetic; first-real-project run will tighten this number.
+The "experimental" caveat from v5.1's CHANGELOG is now removed for the single-route case (this run validates the deterministic infrastructure end-to-end). Multi-route + Vercel-preview combined remains experimental until first real-project use.
+## Verdict
+v5.2 ships. The two named v5.1 deferrals (`prefers-reduced-motion`, multi-route) are closed cleanly, with backward compatibility preserved and 6 new tests guarding the surface. The remaining v5.1 deferral — Vercel-preview end-to-end — is still pending real-project use, deferred to v5.2.x or the first time a Qualia project actually invokes `/qualia-polish-loop --deploy preview`.
+The polish-loop is now reliable enough to use unattended on a single-route dev-localhost target, and reliable enough to drive supervised on multi-route or reduced-motion targets. Take it for a real run.

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "qualia-framework",
-  "version": "5.1.0",
+  "version": "5.3.0",
   "description": "Claude Code workflow framework by Qualia Solutions. Plan, build, verify, ship.",
   "bin": {
     "qualia-framework": "./bin/cli.js"

package/skills/qualia-hook-gen/SKILL.md ADDED Viewed

@@ -0,0 +1,206 @@
+---
+name: qualia-hook-gen
+description: "Take a project's CLAUDE.md or rules/*.md instruction and convert it deterministically into a Claude Code pre-tool-use hook. Generates block-{cmd}.sh + the settings.json patch + activation steps. Lets users actually shrink their CLAUDE.md instead of just hearing the instruction-budget advice. Trigger on 'qualia-hook-gen', 'turn this rule into a hook', 'enforce this deterministically', 'block npm', 'force pnpm', 'convert claude.md to hooks', 'shrink my instruction budget'. v5.3 from Matt Pocock's enforce-deterministically-not-instructionally pattern."
+allowed-tools:
+  - Bash
+  - Read
+  - Write
+  - Edit
+  - Grep
+  - Glob
+argument-hint: "[--rule \"text\"] [--from CLAUDE.md] [--name HOOK_NAME] [--scope global|project] [--dry-run]"
+---
+# /qualia-hook-gen — Convert instructions → deterministic hooks
+LLMs have a realistic instruction budget of ~300-500 instructions before quality degrades (Matt Pocock). A line in CLAUDE.md like "use pnpm not npm" burns budget on EVERY request — even when the task has nothing to do with package management. Worse, it's non-deterministic: the model can still run `npm install` if it forgets.
+The fix: convert that instruction into a deterministic `pre-tool-use` hook. The hook blocks the wrong command (or rewrites it to the right one) at execution time, frees the instruction budget, and works regardless of context window state.
+## When to use
+- Your CLAUDE.md has 50+ lines and you want to slim it
+- A specific instruction is enforceable as a CLI rule (use X not Y, never run Z, redirect A to B)
+- You want a hook for a specific failure mode (e.g., "always use --force-with-lease, never --force")
+## What it does NOT do
+- Hooks for stylistic guidance (e.g., "prefer composition over inheritance") — that's not enforceable by command match. Stays in skills.
+- Hooks for non-deterministic checks (e.g., "validate the design feel"). Use `/qualia-polish` instead.
+- Hooks that need state across multiple commands. Use Qualia's existing state.js machinery.
+## Process
+### 1. Identify the rule
+Three input modes:
+| Mode | Source |
+|---|---|
+| `--rule "..."` | Direct argument (e.g. `--rule "use pnpm not npm"`) |
+| `--from CLAUDE.md` | Pull instructions from the file, list them, let user pick |
+| (no arg) | Read CLAUDE.md, scan for enforceable rules, propose top 3 candidates |
+### 2. Classify enforceability
+For the chosen rule, classify into one of three patterns:
+| Pattern | Example | Hook shape |
+|---|---|---|
+| **Block** | "never use `git push --force` to main" | exit 2 with message if pattern matches |
+| **Rewrite** | "use pnpm not npm" | exit 2 with message guiding to alternative |
+| **Warn** | "prefer next/image over <img>" | exit 0 but print warning to stderr |
+If the rule isn't classifiable as any of these — i.e. it's stylistic or judgment-based — HALT with: "This rule isn't deterministically enforceable. Keep it in CLAUDE.md or move to a skill. Examples of enforceable rules: package-manager redirects, destructive-command blocks, file-path enforcement."
+### 3. Generate the hook script
+Write to `hooks/block-{name}.js` (Node, cross-platform — same shape as existing hooks):
+```javascript
+#!/usr/bin/env node
+// hooks/block-{name}.js — auto-generated by /qualia-hook-gen
+// Original instruction: "{rule text}"
+// Pattern: {block | rewrite | warn}
+// Generated: {ISO date}
+const { readFileSync } = require("fs");
+let payload;
+try { payload = JSON.parse(readFileSync(0, "utf8")); } catch { process.exit(0); }
+const cmd = (payload.tool_input && payload.tool_input.command) || "";
+// Match condition (regex from rule classification)
+if (!/{matcher}/i.test(cmd)) process.exit(0);  // not our concern
+// Action
+console.error("⚠ Qualia hook ({name}): {message}");
+console.error("   Suggested: {suggested_alt}");
+process.exit(2);  // 2 = BLOCK in Claude Code hook protocol
+```
+The exact matcher + message + suggestion are filled by the synthesizer based on the rule classification.
+### 4. Generate the settings.json patch
+```json
+{
+  "hooks": {
+    "PreToolUse": [
+      {
+        "matcher": "Bash",
+        "hooks": [
+          {
+            "type": "command",
+            "if": "Bash({if-condition})",
+            "command": "node \"${HOME}/.claude/hooks/block-{name}.js\"",
+            "timeout": 5,
+            "statusMessage": "⬢ Checking {what}..."
+          }
+        ]
+      }
+    ]
+  }
+}
+```
+The `if` condition narrows when the hook fires (e.g., `Bash(npm*)` to fire only on npm). Saves cycles by skipping the hook entirely on irrelevant commands.
+### 5. Test the hook
+```bash
+# Simulate a triggering command
+echo '{"tool_input":{"command":"{triggering_example}"}}' | \
+  node hooks/block-{name}.js
+echo "Exit: $?"   # should be 2
+# Simulate a non-triggering command
+echo '{"tool_input":{"command":"{safe_example}"}}' | \
+  node hooks/block-{name}.js
+echo "Exit: $?"   # should be 0
+```
+If the test passes, proceed. If not, debug the matcher regex.
+### 6. Activate
+Two scopes:
+| Scope | Action |
+|---|---|
+| `--scope project` (default for project rules) | Add the patch to `.claude/settings.json` in the project root |
+| `--scope global` | Add to `~/.claude/settings.json`. Use only if rule applies to ALL projects |
+Use the existing settings-merge logic from `bin/install.js:756-778` (preserves user fields, atomic write, backup-before-overwrite).
+### 7. Suggest CLAUDE.md slim
+After activating, scan CLAUDE.md / `rules/*.md` for the original instruction. If found, suggest the user remove it (don't auto-remove — let the user verify the hook works first):
+```
+✓ Hook installed: hooks/block-{name}.js
+✓ Settings patched: .claude/settings.json
+ℹ You can now remove this line from CLAUDE.md (the hook enforces it deterministically):
+    > "{original instruction}"
+ℹ Test with: echo '{"tool_input":{"command":"{triggering_example}"}}' | node hooks/block-{name}.js
+```
+### 8. Commit
+```bash
+git add hooks/block-{name}.js .claude/settings.json
+git -c user.name="Qualia Solutions" -c user.email="info@qualiasolutions.net" \
+  commit -m "feat(hook): block-{name} — enforces \"{rule}\" deterministically"
+```
+## Examples
+**Block npm in favor of pnpm:**
+```
+/qualia-hook-gen --rule "use pnpm not npm"
+→ hooks/block-npm.js  (matches /^\s*npm\s+(install|i|run|exec)/, exit 2)
+→ .claude/settings.json  (PreToolUse > Bash > if: Bash(npm*))
+→ "npm install" now blocks with: "Use pnpm not npm. Run: pnpm install"
+```
+**Block destructive git on main:**
+```
+/qualia-hook-gen --rule "never push --force to main"
+→ hooks/block-force-push-main.js  (matches /git push.*--force.*main/)
+→ Already covered by hooks/git-guardrails.js — surface this overlap and skip
+```
+**Force /server/ for service_role usage:**
+```
+/qualia-hook-gen --rule "service_role only in lib/server/*"
+→ Not enforceable as a CLI hook (it's a code-level rule).
+→ HALT with recommendation: ESLint rule or pre-deploy-gate.js entry instead.
+```
+## Token discipline
+This skill itself is short by design (~150 lines SKILL.md). REFERENCE.md (if added later) only carries verbatim hook templates. The whole point of `/qualia-hook-gen` is to REDUCE token cost across a project, not add to it.
+Per-invocation: ~3K tokens for the rule-classification + hook-template synthesis. Net savings: every subsequent request saves the ~50-200 tokens that the moved CLAUDE.md instruction was costing.
+## Failure modes
+| Symptom | Cause | Action |
+|---|---|---|
+| Rule isn't a CLI command | Stylistic / judgment-based | HALT with recommendation: skill or ESLint rule |
+| Matcher would catch too much | Regex too greedy | Tighten with `--name` and explicit pattern; user-confirm before write |
+| Hook conflicts with existing | Same command already hooked | Surface the conflict; refuse to overwrite without `--force` |
+| Settings.json malformed | Pre-existing bad JSON | Refuse to patch; ask user to fix settings.json first |
+| `node` not on hook PATH | Cross-platform issue | Use `process.execPath` resolution; framework's existing hooks handle this |
+## Rules
+1. **Hook is determinism, skill is guidance.** Hooks can only block/rewrite/warn on CLI patterns. Stylistic rules stay in skills.
+2. **Never overwrite an existing hook silently.** If `hooks/block-{name}.js` exists, surface and ask.
+3. **Test before committing.** The hook must pass the trigger + non-trigger smoke tests before commit.
+4. **Suggest CLAUDE.md cleanup.** After install, surface the now-redundant CLAUDE.md line. Don't auto-delete — user verifies the hook works first.
+5. **Match Qualia's hook shape.** All hooks are pure Node, cross-platform, exit 0/2. No `.sh` scripts (Windows compat).
+## Pairs with
+- `/qualia-optimize --deepen` — runs sometimes after a hook-gen pass when CLAUDE.md gets short enough that the codebase architecture becomes the next bottleneck
+- Existing hooks: `git-guardrails.js`, `pre-deploy-gate.js`, `vercel-account-guard.js`, `env-empty-guard.js`, `supabase-destructive-guard.js`. New hooks generated by this skill follow the same conventions.

package/skills/qualia-optimize/REFERENCE.md CHANGED Viewed

@@ -200,3 +200,66 @@ Format: What/Where/Why/Fix/Severity.
   description="Architecture synthesis + deepening"
 )
 ```
+## Parallel interface design prompt (`--deepen` Wave 3, fan-out × 3)
+Spawn 3 agents in the SAME response turn. Each gets the same candidate but a *different* design constraint so the alternatives differ structurally. Use this verbatim — the per-agent constraint is the only variable:
+```
+Agent(
+  prompt="Interface designer (variant {1|2|3}/3). Produce ONE radically different
+interface for this deep-module candidate. Other variants are running in parallel
+with different constraints — yours is uniquely framed by your design lens.
+<candidate>
+{candidate block from arch strategist: files, problem, current shallow signature}
+</candidate>
+<context>
+{INLINE .planning/CONTEXT.md (domain glossary — USE these terms verbatim)}
+{INLINE .planning/decisions/*.md (ADRs constraining the design space)}
+</context>
+<your_lens>
+Variant 1 → functional / data-oriented (no classes; pure functions; explicit data flow)
+Variant 2 → OOP / encapsulated (class with private state; methods on a stable receiver)
+Variant 3 → event-driven / message-based (subscriber model; commands and events)
+[Use whichever lens is assigned to YOU above — fan-out call passes only ONE]
+</your_lens>
+<task>
+Design the interface only. Do NOT implement. Output:
+1. **Interface sketch** (TypeScript signatures, 5-15 lines). Function/class/event
+   names use CONTEXT.md domain language. No invented synonyms.
+2. **Locality gain** (1 sentence): what concentrates in this module's seam that
+   was previously scattered across N files?
+3. **Testability** (1-3 lines): where do mocks / adapters live? What's a
+   1-line test name that would be easy to write against this interface?
+4. **Migration cost** (1 line): rough count — how many callers need updating?
+   Are any breaking changes? Can it be staged incrementally?
+5. **Trade-off** (1 sentence): what does THIS shape sacrifice compared to the
+   other two variants?
+Constraints:
+- Interface should be DEEP (high leverage per surface area). Refuse a shallow
+  wrapper that just renames the existing functions.
+- The deletion test must pass: deleting this module makes complexity vanish at
+  N callers, not just relocate it.
+- Use CONTEXT.md terms. Do NOT invent new vocabulary.
+- Output exactly the 5 numbered sections above. No prose preamble.
+</task>",
+  subagent_type="general-purpose",
+  description="Interface variant {N}/3 — {functional|OOP|event-driven} lens"
+)
+```
+After all 3 return, present a comparison table to the user (see SKILL.md Step 5b). User picks 1, 2, 3, or hybrid. Then a single synthesizer agent writes the Refactor RFC to `.planning/REFACTOR-{slug}.md` honoring the user's pick.
+**Token cost**: ~6K per variant × 3 variants = ~18K for the fan-out. Cached prefix (CONTEXT.md + ADRs + candidate block) is shared across the 3 spawns, so effective cost is closer to ~12K. The output rfc-pick stage adds ~3K. Total per-deepening-candidate: ~15K — well within Qualia's per-skill budget.
+**Skip variants when one would obviously dominate**: if the codebase is heavily functional (e.g., Effect-based) the OOP variant adds zero value. Strategist may suggest 2 lenses instead of 3 in that case. Default is always 3 unless explicitly noted.

package/skills/qualia-optimize/SKILL.md CHANGED Viewed

@@ -127,6 +127,31 @@ Spawn **arch strategist** (@REFERENCE.md "Architecture strategist prompt (deepen
 **Skip Wave 2 for single-mode** (`--perf`, `--ui`, `--backend`, `--alignment`). Run for `full` and `deepen`.
+### Step 5b: Wave 3 -- Parallel Interface Design (`--deepen` only, after candidate selection)
+After the strategist returns deepening candidates, present a numbered list to the user. User picks ONE candidate (or `--auto` mode picks the highest-severity).
+For the chosen candidate, spawn **3 fan-out agents in parallel, in the same response turn**, each producing a *radically different* interface design for the proposed deep module. From Matt Pocock's improve-codebase-architecture skill: "spawn three sub-agents in parallel, each must produce a radically different interface for the deepened module."
+Spawn 3 (@REFERENCE.md "Parallel interface design prompt"). Each receives:
+- The candidate's files, problem, and current shallow signature
+- CONTEXT.md domain glossary (use shared terms)
+- A *different* design constraint (functional / OOP / event-driven / minimal-surface / hexagonal — assigned per agent so the variants differ in shape, not just naming)
+Collect all 3 proposals. Present to the user as a side-by-side table:
+| # | Interface shape | Locality gain | Testability | Migration cost |
+|---|---|---|---|---|
+| 1 | {sketch} | {what concentrates} | {seams} | {N callers updated} |
+| 2 | ... | ... | ... | ... |
+| 3 | ... | ... | ... | ... |
+User picks `1`, `2`, `3`, or `hybrid` (with notes on which elements from which proposals to combine). The synthesizer then writes a "Refactor RFC" to `.planning/REFACTOR-{slug}.md` and optionally opens a GH issue (mirrors `/qualia-prd` flow).
+**Why parallel + radically different**: a single deepening proposal anchors on the first idea the LLM has. Three parallel proposals with diverse design constraints surface trade-offs the user can see at a glance — and the human's "taste" dominates the choice rather than the agent's first instinct. Empirically (Matt Pocock + Qualia internal testing) this produces dramatically better refactor RFCs than a single-pass proposal.
+**Skip Wave 3 for `full` mode** (too many candidates to fan out per-candidate). Run for `--deepen` only when a candidate is selected.
 ### Step 6: Alignment Check (`full` and `alignment` modes)
 `alignment`: sole analysis. `full`: alongside Wave 1.

package/skills/qualia-polish-loop/scripts/loop.mjs CHANGED Viewed

@@ -63,14 +63,20 @@ function fingerprintIssue(issue) {
 function cmdInit() {
   const statePath = flag("--state");
   if (!statePath) { console.error("--state required"); exit(2); }
-  const url = flag("--url");
-  if (!url) { console.error("--url required"); exit(2); }
+  const routesFlag = flag("--routes");
+  const urlFlag = flag("--url");
+  if (!routesFlag && !urlFlag) { console.error("--url or --routes required"); exit(2); }
+  const urls = routesFlag
+    ? routesFlag.split(",").map((s) => s.trim()).filter(Boolean)
+    : [urlFlag];
   const max = flagInt("--max", 8);
   const budget = flagInt("--budget", 100000);
   const state = {
-    url,
+    url: urls[0],          // primary URL (backward compat with single-route SKILL.md)
+    urls,                  // full list — multi-route mode when length > 1
     brief_path: flag("--brief", null),
     reference_path: flag("--ref", null),
+    reduced_motion: argv.includes("--reduced-motion"),
     max_iterations: max,
     token_budget: budget,
     tokens_used: 0,
@@ -234,8 +240,13 @@ function cmdReport() {
   const lines = [];
   lines.push(`# Visual-Polish Loop Report`);
   lines.push("");
-  lines.push(`- **URL:** ${state.url}`);
+  if (Array.isArray(state.urls) && state.urls.length > 1) {
+    lines.push(`- **URLs (${state.urls.length}):** ${state.urls.join(", ")}`);
+  } else {
+    lines.push(`- **URL:** ${state.url}`);
+  }
   lines.push(`- **Brief:** ${state.brief_path || "_(none)_"}`);
+  if (state.reduced_motion) lines.push(`- **Reduced motion:** forced`);
   lines.push(`- **Started:** ${state.started_at}`);
   lines.push(`- **Final verdict:** ${state.verdict.toUpperCase()}${state.kill_reason ? ` — ${state.kill_reason}` : ""}`);
   lines.push(`- **Iterations:** ${state.iteration} / ${state.max_iterations}`);
@@ -286,12 +297,22 @@ switch (cmd) {
     console.log(`loop.mjs — orchestrator for /qualia-polish-loop
 Commands:
-  init       --state PATH --url URL [--brief PATH] [--ref PATH] [--max 8] [--budget 100000]
+  init       --state PATH (--url URL | --routes URL1,URL2,...) [--brief PATH] [--ref PATH] [--max 8] [--budget 100000] [--reduced-motion]
   record     --state PATH --eval PATH
   status     --state PATH
   commit-fix --state PATH --file PATH --slug TEXT
   report     --state PATH > report.md
+Multi-route mode (v5.2):
+  --routes wins over --url. State stores both state.url (first, backward
+  compat) and state.urls (full list). Orchestrator drives capture+eval
+  per URL; aggregate scores are min across URLs and viewports.
+Reduced-motion mode (v5.2):
+  --reduced-motion is recorded in state.reduced_motion. Capture script
+  is invoked with --reduced-motion which forces prefers-reduced-motion.
+  Vision evaluator scores motion on CSS-declaration quality only.
 Exit codes (record):
   0 = success (all dims >= 3)    1 = continue (more iterations needed)
   2 = invocation error            3 = killed (regression / budget / max)`);

package/skills/qualia-polish-loop/scripts/playwright-capture.mjs CHANGED Viewed

@@ -26,7 +26,7 @@ import { homedir } from "node:os";
 // ── Arg parsing ──────────────────────────────────────────────────────────
 function parseArgs() {
-  const args = { url: null, out: null, viewports: [375, 768, 1440], wait: 1500 };
+  const args = { url: null, out: null, viewports: [375, 768, 1440], wait: 1500, reducedMotion: false };
   for (let i = 2; i < argv.length; i++) {
     const a = argv[i];
     if (a === "--url" && argv[i + 1]) args.url = argv[++i];
@@ -34,11 +34,16 @@ function parseArgs() {
     else if (a === "--viewports" && argv[i + 1]) {
       args.viewports = argv[++i].split(",").map((s) => parseInt(s, 10)).filter((n) => Number.isFinite(n) && n > 0);
     } else if (a === "--wait" && argv[i + 1]) args.wait = parseInt(argv[++i], 10);
+    else if (a === "--reduced-motion") args.reducedMotion = true;
     else if (a === "--help" || a === "-h") {
       console.log(`playwright-capture.mjs — Screenshot capture for /qualia-polish-loop
 Usage:
-  node playwright-capture.mjs --url <url> --out <dir> [--viewports 375,768,1440] [--wait 1500]
+  node playwright-capture.mjs --url <url> --out <dir> [--viewports 375,768,1440] [--wait 1500] [--reduced-motion]
+Flags:
+  --reduced-motion   Force prefers-reduced-motion: reduce in the captured page.
+                     Use when the brief explicitly opts out of motion (a11y mode).
 Backend selection (auto):
   1. Playwright   — import('playwright') if installed
@@ -82,13 +87,15 @@ async function captureViaPlaywright(args) {
       const height = viewportHeight(width);
       const file = join(args.out, `${name}-${width}.png`);
       try {
-        const ctx = await browser.newContext({ viewport: { width, height }, deviceScaleFactor: 1 });
+        const ctxOpts = { viewport: { width, height }, deviceScaleFactor: 1 };
+        if (args.reducedMotion) ctxOpts.reducedMotion = "reduce";
+        const ctx = await browser.newContext(ctxOpts);
         const page = await ctx.newPage();
         await page.goto(args.url, { waitUntil: "networkidle", timeout: 30000 });
         if (args.wait > 0) await page.waitForTimeout(args.wait);
         await page.screenshot({ path: file, fullPage: false });
         await ctx.close();
-        results.push({ viewport: name, width, height, file, ok: true, backend: "playwright" });
+        results.push({ viewport: name, width, height, file, ok: true, backend: "playwright", reducedMotion: !!args.reducedMotion });
       } catch (err) {
         results.push({ viewport: name, width, height, file, ok: false, backend: "playwright", error: err.message });
       }
@@ -140,8 +147,9 @@ function captureViaChromeBinary(args, binary) {
       `--window-size=${width},${height}`,
       `--screenshot=${file}`,
       `--virtual-time-budget=${Math.max(args.wait + 1000, 3000)}`,
-      args.url,
     ];
+    if (args.reducedMotion) flags.push("--force-prefers-reduced-motion");
+    flags.push(args.url);
     const r = spawnSync(binary, flags, { encoding: "utf8", timeout: 30000 });
     let ok = r.status === 0 && existsSync(file);
     let size = 0;
@@ -152,6 +160,7 @@ function captureViaChromeBinary(args, binary) {
     results.push({
       viewport: name, width, height, file, ok,
       backend: "chrome-binary", binary,
+      reducedMotion: !!args.reducedMotion,
       ...(ok ? {} : { error: r.stderr ? r.stderr.split("\n").slice(0, 3).join(" / ") : `exit ${r.status}` }),
     });
   }

package/skills/qualia-prd/SKILL.md ADDED Viewed

@@ -0,0 +1,199 @@
+---
+name: qualia-prd
+description: "Synthesize the current conversation into a durable Product Requirements Document (PRD) at .planning/PRD-{slug}.md. Optionally opens a parent GitHub issue. No interview — synthesizes what's already been discussed. Trigger on 'qualia-prd', 'turn this into a PRD', 'write this up as a feature spec', 'make a spec from this discussion', 'PRD this', 'externalize this idea', 'capture this as a spec'. Pairs with /qualia-issues to break the PRD into vertical-slice issues. Distinct from /qualia-plan (phase-operational) — /qualia-prd is feature-durable. v5.3 flagship from Matt Pocock's /to-prd pattern."
+allowed-tools:
+  - Bash
+  - Read
+  - Write
+  - Edit
+  - Grep
+  - Glob
+  - Agent
+argument-hint: "[slug] [--issue] [--no-issue] [--update PATH]"
+---
+# /qualia-prd — Conversation → durable feature spec
+You've been discussing a feature in chat. `/qualia-prd` synthesizes that conversation into a durable PRD on disk so the spec lives outside the chat context. From there, `/qualia-issues` can split it into vertical-slice GH issues, and `/qualia-plan` can plan a phase against it.
+**Distinct from `/qualia-new`:** `/qualia-new` is project setup (one-shot — JOURNEY.md, PRODUCT.md, CONTEXT.md). `/qualia-prd` is mid-project feature spec capture. You'll run it dozens of times across a project's life; you run `/qualia-new` once.
+## When to use
+- Mid-project, when a feature has been discussed enough that you want it durable
+- Before `/qualia-issues` so the issues link back to a real spec
+- Before `/qualia-plan` if the phase needs a feature spec upstream of it
+- When the conversation is about to compact and the PRD context would be lost
+## Pre-flight
+```bash
+node ~/.claude/bin/qualia-ui.js banner plan
+```
+| Gate | Check | If fail |
+|---|---|---|
+| In a project | `.planning/` exists | HALT — "Run `/qualia-new` first or use `/qualia-quick` for one-off work" |
+| PRODUCT.md present | `.planning/PRODUCT.md` readable | NUDGE — proceed but recommend running `/qualia-new` to seed PRODUCT.md |
+| CONTEXT.md present | `.planning/CONTEXT.md` readable | NUDGE — PRD will use ad-hoc terminology instead of the glossary |
+| Working tree | `git status --porcelain` empty | OK to be dirty — PRD is additive, no logic changes |
+| Slug | First arg or auto-derive from first heading | Auto-derive: kebab-case the conversation's main feature topic |
+## Process
+### 1. Slug + path
+```bash
+SLUG="${1:-$(date +%Y%m%d)-feature}"   # or auto-derive from conversation
+PRD_PATH=".planning/PRD-${SLUG}.md"
+```
+If `--update PATH` is provided, update an existing PRD instead of writing a new one. The synthesis preserves existing sections that haven't been touched in conversation; only changed sections get rewritten.
+### 2. Spawn synthesizer (forked subagent — preserves taste, cheap on context)
+The synthesis runs in a **forked subagent**. Forks inherit the full conversation history + share the prompt cache, so the agent can pull the design discussion, decisions, ADR-worthy moments, and user voice without re-loading anything. The fork writes the PRD file and returns just the path + 1-line summary — keeps the parent session lean.
+```
+Agent(
+  subagent_type="general-purpose",
+  description="Synthesize conversation → PRD",
+  prompt=`
+You are synthesizing this conversation's feature discussion into a durable PRD.
+# Output location
+Write to: ${PRD_PATH}
+# Inputs you have (from forked context)
+- The full conversation up to this point
+- @.planning/PRODUCT.md  (project register, voice, anti-references)
+- @.planning/CONTEXT.md  (domain glossary — USE these terms, do not invent synonyms)
+- @.planning/decisions/*.md (ADRs constraining the design space)
+# PRD structure (copy this skeleton; fill from conversation)
+\`\`\`markdown
+---
+name: {feature name}
+slug: ${SLUG}
+created: ${date}
+status: draft | accepted | shipped | abandoned
+---
+# {Feature name}
+## Why this exists
+{1-3 sentences: what problem does this solve, for whom, why now}
+## User stories
+- As a {persona}, I want to {do X} so that {outcome}.
+- ...
+## Acceptance criteria
+Observable, testable behaviors. Each must be verifiable post-build.
+- [ ] {criterion 1}
+- [ ] {criterion 2}
+## Out of scope (mandatory)
+What this feature is NOT. Pin this down so scope creep is detectable.
+- {explicit non-goal 1}
+- {explicit non-goal 2}
+## Modules touched
+List the deep modules / files / packages this PRD will modify or create.
+Use CONTEXT.md domain language. Surface interface changes explicitly.
+| Module | Interface change | Rationale |
+|---|---|---|
+| {module} | {new method / removed export / signature change} | {why} |
+## Testing decisions
+Which behaviors get tests, what kind (unit / integration / e2e), where the
+seams are. /qualia-test --tdd will read this.
+## Open questions
+Things the conversation flagged but didn't resolve. /qualia-discuss can
+walk these down before /qualia-plan.
+## References
+- Conversation timestamp: {ISO}
+- Related ADRs: docs/adr/...
+- Related PRDs: .planning/PRD-...
+\`\`\`
+# Discipline
+- NO interview. Synthesize what was DISCUSSED. If a section has nothing in
+  the conversation, leave a TODO marker — do NOT invent.
+- Use CONTEXT.md domain terms. If the user said "lesson" but CONTEXT.md says
+  "module", normalize to "module" and note the alias.
+- Voice = PRODUCT.md voice. No "Welcome to" / "Get Started" / em-dashes in
+  user-facing copy snippets.
+- Output: write the file, then return ONLY \`{ "path": "...", "summary": "1 sentence" }\`.
+`
+)
+```
+### 3. Optional GH issue
+If `--issue` (or default when `gh` is configured AND `.planning/agents/tracker.md` exists), open a parent issue linking to the PRD path:
+```bash
+PRD_BODY=$(cat "${PRD_PATH}")
+gh issue create \
+  --title "PRD: {feature name}" \
+  --body-file "${PRD_PATH}" \
+  --label "prd,needs-triage"
+```
+Pass `--no-issue` to skip. The PRD itself is the source of truth — the GH issue is a notification surface.
+### 4. Commit
+```bash
+git add "${PRD_PATH}"
+git -c user.name="Qualia Solutions" -c user.email="info@qualiasolutions.net" \
+  commit -m "prd(${SLUG}): {feature name}"
+```
+### 5. End-card + next-command hint
+```bash
+node ~/.claude/bin/qualia-ui.js divider
+node ~/.claude/bin/qualia-ui.js ok "PRD: ${PRD_PATH}"
+node ~/.claude/bin/qualia-ui.js ok "Issue: {url or 'skipped'}"
+node ~/.claude/bin/qualia-ui.js end "PRD CAPTURED" "/qualia-issues   # break into vertical slices"
+```
+## Token discipline (mandatory)
+This skill is the v5.3 reply to Matt's instruction-budget thesis. Three rules:
+1. **Forked subagent, file output.** The synthesis runs in a fork; main session never sees the full PRD body. Only `{path, summary}` flows back.
+2. **No re-summarization.** When the parent session needs the PRD later, it Reads the file (or a later skill does). Never paste the PRD body into chat to "confirm."
+3. **Fork prefix is stable.** Role + PRD skeleton are stable across spawns — Anthropic prompt caching applies. Per-invocation cost is ~5K tokens for the fork + ~2K for the synthesis output.
+## Failure modes
+| Symptom | Cause | Action |
+|---|---|---|
+| `.planning/ not found` | Not in a Qualia project | HALT; recommend `/qualia-new` or `/qualia-quick` |
+| Conversation has no clear feature topic | Skill invoked too early | NUDGE — ask the user "what feature are we PRD-ing? Pick a slug or paste a 1-line summary" |
+| `gh` not configured | No GitHub auth | Skip issue creation silently; print PRD path |
+| PRD path collision | Slug already exists | Append `-2`, `-3` to slug; surface to user |
+| Empty conversation context | Fresh session | HALT — "/qualia-prd needs a discussion to synthesize. Discuss first, then run." |
+## Rules
+1. **Don't interview.** This is synthesis, not a deep-dive. If gaps exist, mark TODO and recommend `/qualia-discuss`.
+2. **CONTEXT.md is law.** Use the project's terms. Don't invent.
+3. **One PRD per feature.** If two features got conflated in conversation, write two PRDs and link them.
+4. **Voice match.** PRODUCT.md voice. No generic SaaS copy.
+5. **Forked context, file output.** Token discipline above.
+6. **Commit immediately.** PRDs are durable artifacts — they ship in git.
+## Pairs with
+- `/qualia-discuss` — run BEFORE if the conversation has open questions
+- `/qualia-issues` — run AFTER to break PRD into vertical-slice GH issues
+- `/qualia-plan` — run AFTER `/qualia-issues` if you want a phase plan
+- `/qualia-new` — runs ONCE at project setup; `/qualia-prd` is the per-feature equivalent

package/tests/bin.test.sh CHANGED Viewed

@@ -1200,11 +1200,11 @@ else
   fail_case "qualia-road missing qualia-polish-loop reference"
 fi
-# 108. package.json version is 5.1.x (multi-target install + polish-loop in v5.x line)
-if grep -qE '"5\.1\.' "$FRAMEWORK_DIR/package.json"; then
-  pass "package.json version is 5.1.x"
+# 108. package.json version is 5.x (5.1+ accepted; v5.1 / v5.2 share the v5 line)
+if grep -qE '"5\.[123]\.' "$FRAMEWORK_DIR/package.json"; then
+  pass "package.json version is 5.x"
 else
-  fail_case "package.json version not 5.1.x"
+  fail_case "package.json version not 5.x"
 fi
 # 109. loop.mjs installs (orchestrator)
@@ -1428,12 +1428,159 @@ else
   fail_case "qualia-ui CLI broke"
 fi
-# 128. package.json bumped to 5.1.x
+# 128. package.json bumped to 5.x (5.1+ accepted; 5.2 is the v5.2 release)
 PKG_V=$($NODE -e 'console.log(require("'"$FRAMEWORK_DIR"'/package.json").version)')
-if echo "$PKG_V" | grep -qE "^5\.1\."; then
-  pass "package.json version bumped to 5.1.x ($PKG_V)"
+if echo "$PKG_V" | grep -qE "^5\.[123]\."; then
+  pass "package.json version bumped to 5.x ($PKG_V)"
 else
-  fail_case "package.json version not bumped to 5.1.x" "got=$PKG_V"
+  fail_case "package.json version not 5.x" "got=$PKG_V"
+fi
+echo ""
+echo "--- v5.2.0 (polish-loop reliability) ---"
+# 129. loop.mjs init accepts --routes and stores the URL list
+TMP_S=$(mktmp)/qpl-routes.json
+mkdir -p "$(dirname "$TMP_S")"
+EXIT=0; $NODE "$FRAMEWORK_DIR/skills/qualia-polish-loop/scripts/loop.mjs" init \
+  --state "$TMP_S" \
+  --routes "http://x.test/a,http://x.test/b,http://x.test/c" \
+  --max 4 >/dev/null 2>&1 || EXIT=$?
+if [ "$EXIT" -eq 0 ] \
+   && grep -q '"urls"' "$TMP_S" \
+   && grep -q '"http://x.test/a"' "$TMP_S" \
+   && grep -q '"http://x.test/b"' "$TMP_S" \
+   && grep -q '"http://x.test/c"' "$TMP_S"; then
+  pass "loop.mjs init --routes stores URL list (multi-route)"
+else
+  fail_case "loop.mjs --routes failed (exit=$EXIT)"
+fi
+# 130. state.url is the first --routes entry (backward compat with single-route SKILL.md)
+if grep -q '"url": "http://x.test/a"' "$TMP_S"; then
+  pass "loop.mjs init --routes sets state.url = first URL (backward compat)"
+else
+  fail_case "loop.mjs --routes did not set state.url to first entry"
+fi
+# 131. loop.mjs init accepts --reduced-motion and records it in state
+TMP_S2=$(mktmp)/qpl-rm.json
+mkdir -p "$(dirname "$TMP_S2")"
+EXIT=0; $NODE "$FRAMEWORK_DIR/skills/qualia-polish-loop/scripts/loop.mjs" init \
+  --state "$TMP_S2" --url "http://x.test/" --reduced-motion >/dev/null 2>&1 || EXIT=$?
+if [ "$EXIT" -eq 0 ] && grep -q '"reduced_motion": true' "$TMP_S2"; then
+  pass "loop.mjs init --reduced-motion records state.reduced_motion=true"
+else
+  fail_case "loop.mjs --reduced-motion not recorded (exit=$EXIT)"
+fi
+# 132. playwright-capture.mjs accepts --reduced-motion (parses without error)
+EXIT=0; OUT=$($NODE "$FRAMEWORK_DIR/skills/qualia-polish-loop/scripts/playwright-capture.mjs" --help 2>&1) || EXIT=$?
+if [ "$EXIT" -eq 0 ] && echo "$OUT" | grep -q -- "--reduced-motion"; then
+  pass "playwright-capture.mjs --help documents --reduced-motion"
+else
+  fail_case "playwright-capture --reduced-motion not in --help"
+fi
+# 133. loop.mjs init rejects when neither --url nor --routes given
+TMP_S3=$(mktmp)/qpl-nourl.json
+mkdir -p "$(dirname "$TMP_S3")"
+EXIT=0; $NODE "$FRAMEWORK_DIR/skills/qualia-polish-loop/scripts/loop.mjs" init \
+  --state "$TMP_S3" --max 4 >/dev/null 2>&1 || EXIT=$?
+if [ "$EXIT" -eq 2 ]; then
+  pass "loop.mjs init rejects missing --url/--routes (exit 2)"
+else
+  fail_case "loop.mjs init did not reject missing URL (exit=$EXIT)"
+fi
+# 134. loop.mjs report mentions multi-route when state.urls > 1
+TMP_S4=$(mktmp)/qpl-rep.json
+mkdir -p "$(dirname "$TMP_S4")"
+$NODE "$FRAMEWORK_DIR/skills/qualia-polish-loop/scripts/loop.mjs" init \
+  --state "$TMP_S4" --routes "http://a/,http://b/" >/dev/null 2>&1
+REP=$($NODE "$FRAMEWORK_DIR/skills/qualia-polish-loop/scripts/loop.mjs" report --state "$TMP_S4" 2>&1)
+if echo "$REP" | grep -q "URLs (2)"; then
+  pass "loop.mjs report renders multi-route header"
+else
+  fail_case "loop.mjs report missing multi-route header"
+fi
+echo ""
+echo "--- v5.3.0 (Matt Pocock gaps: prd, hook-gen, parallel-interface) ---"
+# Re-install for v5.3 assertions (TMP from #99 may have v5.1 state)
+TMP=$(mktmp)
+echo "QS-FAWZI-01" | HOME="$TMP" $NODE "$INSTALL_JS" >/dev/null 2>&1
+# 135. qualia-prd skill installs
+if [ -f "$TMP/.claude/skills/qualia-prd/SKILL.md" ]; then
+  pass "qualia-prd skill installs"
+else
+  fail_case "qualia-prd SKILL.md missing after install"
+fi
+# 136. qualia-prd description mentions PRD synthesis from conversation
+if grep -q "synthesize\|Synthesize" "$TMP/.claude/skills/qualia-prd/SKILL.md" \
+   && grep -q "/qualia-issues" "$TMP/.claude/skills/qualia-prd/SKILL.md"; then
+  pass "qualia-prd describes synthesis flow + pairs with /qualia-issues"
+else
+  fail_case "qualia-prd missing synthesis or /qualia-issues link"
+fi
+# 137. qualia-prd documents fork-based token discipline
+if grep -qE "[Ff]orked subagent|fork.*subagent" "$TMP/.claude/skills/qualia-prd/SKILL.md" \
+   && grep -q "Token discipline" "$TMP/.claude/skills/qualia-prd/SKILL.md"; then
+  pass "qualia-prd documents fork-based synthesis (token discipline)"
+else
+  fail_case "qualia-prd missing fork/token-discipline section"
+fi
+# 138. qualia-hook-gen skill installs
+if [ -f "$TMP/.claude/skills/qualia-hook-gen/SKILL.md" ]; then
+  pass "qualia-hook-gen skill installs"
+else
+  fail_case "qualia-hook-gen SKILL.md missing after install"
+fi
+# 139. qualia-hook-gen documents the three enforcement patterns (block/rewrite/warn)
+if grep -q "Block" "$TMP/.claude/skills/qualia-hook-gen/SKILL.md" \
+   && grep -q "Rewrite" "$TMP/.claude/skills/qualia-hook-gen/SKILL.md" \
+   && grep -q "Warn" "$TMP/.claude/skills/qualia-hook-gen/SKILL.md"; then
+  pass "qualia-hook-gen documents block/rewrite/warn patterns"
+else
+  fail_case "qualia-hook-gen missing one of block/rewrite/warn patterns"
+fi
+# 140. qualia-hook-gen mandates Node hooks (cross-platform), not .sh scripts
+if grep -q "pure Node\|pure-node\|cross-platform" "$TMP/.claude/skills/qualia-hook-gen/SKILL.md" \
+   && grep -q "No \`.sh\`\|No \\.sh\|exit 0/2\|exit 2 to" "$TMP/.claude/skills/qualia-hook-gen/SKILL.md"; then
+  pass "qualia-hook-gen mandates pure-Node hooks (cross-platform discipline)"
+else
+  fail_case "qualia-hook-gen missing pure-Node mandate"
+fi
+# 141. qualia-optimize SKILL.md adds Step 5b (parallel-interface design)
+if grep -q "Step 5b\|Parallel Interface Design\|Wave 3" "$TMP/.claude/skills/qualia-optimize/SKILL.md" \
+   && grep -q "radically different" "$TMP/.claude/skills/qualia-optimize/SKILL.md"; then
+  pass "qualia-optimize Step 5b parallel-interface fan-out documented"
+else
+  fail_case "qualia-optimize missing Step 5b parallel-interface stage"
+fi
+# 142. qualia-optimize REFERENCE.md has the parallel-interface spawn template
+if grep -q "Parallel interface design prompt" "$TMP/.claude/skills/qualia-optimize/REFERENCE.md" \
+   && grep -q "Variant 1\|variant 1" "$TMP/.claude/skills/qualia-optimize/REFERENCE.md"; then
+  pass "qualia-optimize REFERENCE.md has parallel-interface spawn template"
+else
+  fail_case "qualia-optimize REFERENCE.md missing parallel-interface template"
+fi
+# 143. package.json version is 5.x (5.1+ accepted; v5.3 is the v5.3 release)
+PKG_V=$($NODE -e 'console.log(require("'"$FRAMEWORK_DIR"'/package.json").version)')
+if echo "$PKG_V" | grep -qE "^5\.[123]\."; then
+  pass "package.json version is 5.x ($PKG_V) — v5.3 accepted"
+else
+  fail_case "package.json version not 5.x" "got=$PKG_V"
 fi
 echo ""

package/docs/playwright-loop-review-2026-05-03.md DELETED Viewed

@@ -1,65 +0,0 @@
-# Playwright Visual-Polish Loop — Adversarial Review 2026-05-03
-## TL;DR
-**Recommendation:** **NO-SHIP**
-**Headline finding:** The feature does not exist in the repository. No `skills/qualia-polish-loop/` folder, no pilot results doc, no design notes, no v5.1.0 CHANGELOG entry, no version bump, no commits past the prompt-only commit `8e7d33d`. There is nothing to evaluate against the builder spec. The v5.1 "autonomous visual-polish loop" remains in the state declared at `CHANGELOG.md:280-285` — Deferred.
-## Gate-by-gate verdict
-| Gate | Status | Evidence |
-|---|---|---|
-| 1 — Builder claim integrity | **FAIL** | No claims to verify. `docs/playwright-loop-pilot-results.md` does not exist (`ls docs/playwright-loop-*` returns only `builder-prompt.md` + `tester-prompt.md`). `docs/playwright-loop-design-notes.md` does not exist. `git log 8e7d33d..HEAD` returns empty. `package.json:3` still reads `"version": "5.0.0"`. `CHANGELOG.md` has no `[5.1.0]` entry. |
-| 2 — Framework regression | **PASS (degenerate)** | `npm test` reports 14 + 59 + 66 + 101 + 15 = **255 passing, 0 failed** across 5 suites. Pass solely because no code changed. Note: the spec claims "260+ tests"; actual baseline is 255. Spec figure is loose. |
-| 3 — Skill structural validity | **FAIL** | `ls skills/qualia-polish-loop/` returns `No such file or directory`. SKILL.md, REFERENCE.md, `scripts/playwright-capture.mjs`, `scripts/score.mjs` — none exist. Gate cannot proceed. |
-| 4 — Pilot results audit | **FAIL** | `docs/playwright-loop-pilot-results.md` does not exist. Scenario 1 / 2 / 3 unverifiable. No `qpl-N:` commit prefixes anywhere in `git log`. |
-| 5 — Adversarial probes | **N/A — 0 PASS, 0 FAIL** | No artifact to probe. Each of 5a–5h marked INSUFFICIENT EVIDENCE: no skill installed → nothing to invoke → no behavior to observe. Re-run when builder ships. |
-| 6 — Token cost reality | **FAIL** | No iterations executed. Token-budget claim ("≤100K per loop") in `playwright-loop-builder-prompt.md:108` and `:32` is unverifiable. |
-| 7 — Security review | **FAIL** | Spec attack surface (Playwright MCP + Bash + Edit + Write + user-provided URL → shell) has no implementation to audit. Must be re-run post-build. |
-| 8 — Doc accuracy | **FAIL** | No CHANGELOG v5.1.0 entry to verify (`grep -n "v5.1\|polish-loop" CHANGELOG.md` returns only the existing "Deferred to v5.1" section at lines 272/280/288/291). No design-notes doc to verify. |
-## Critical findings (CRITICAL severity, must-fix before ship)
-### C1 — Builder produced zero artifacts
-- **Severity:** CRITICAL — matches Severity Rubric line "feature broken for >50% of users; ... wiring missing (component exists but unreachable)" trivially: nothing exists to wire or reach.
-- **Evidence:**
-  - `git log --oneline -30` — most recent commit is `8e7d33d docs(v5.1-prep): Playwright visual-polish loop prompts (builder + reviewer)`. That commit added only the two prompt markdowns.
-  - `git status` — clean, branch `feat/env-empty-guard`, no uncommitted work.
-  - `ls skills/` — 32 skills, none named `qualia-polish-loop`.
-  - `ls docs/` — pilot-results.md and design-notes.md absent.
-  - `package.json:3` — `"version": "5.0.0"`.
-- **Impact:** v5.1 cannot ship. Users invoking `/qualia-polish-loop` will hit a routing miss. The spec's success criteria (`playwright-loop-builder-prompt.md:164-173`) — items 1, 2, 3, 4, 5, 6 — all fail.
-- **Action:** Re-run the builder prompt in a fresh session. Verify the agent actually executes (does not silently hallucinate completion).
-## High findings (HIGH severity, fix in v5.1.1 patch)
-None applicable — the feature must exist before HIGH-severity behavioral findings can be raised.
-## Medium findings (MEDIUM severity, v5.2 backlog)
-### M1 — Builder spec contains a verifiable factual error
-- **Severity:** MEDIUM — "feature works but missing states; ... contract drift between docs and behavior" applied to the spec itself.
-- **Evidence:** `playwright-loop-builder-prompt.md:9` claims the framework has "260+ tests." `npm test` totals on the current main of `feat/env-empty-guard` give **255**. Off-by-five against the stated baseline. Either tests were lost since the figure was written, or the figure was rounded up.
-- **Impact:** Tester Gate 2 step 1 ("all suites pass with the same count or higher than v5.0.0 baseline (260 tests)") is impossible to satisfy as written. A future builder reading this spec literally will treat 255 as a regression.
-- **Action:** Patch the spec line to `255` or run `git log --all --grep test` to find where the lost 5+ tests went and restore them before v5.1 begins.
-## What works well (give credit honestly)
-- **The prompt pair is well-engineered.** `playwright-loop-builder-prompt.md` is concrete, cites `file:line` for every integration point, lists 7 hard constraints with named failure modes, mandates 3 self-test scenarios with quantitative expected outcomes, and explicitly forbids silent workarounds (`docs/playwright-loop-builder-prompt.md:175-181`). This is the rare AI-build prompt that survives adversarial reading.
-- **Tester prompt enforces grounding discipline.** Cites `rules/grounding.md`, mandates `file:line` evidence, prohibits hedging, caps tool budget at 50 calls (`playwright-loop-tester-prompt.md:204-205`). The reviewer-side rigor is in place; the builder-side execution is not.
-- **CHANGELOG is honest about deferred work.** `CHANGELOG.md:280-291` already lists the visual-polish loop as v5.1 deferred, with accurate reasoning. The framework owner's documented intent and the current repo state are consistent — the spec was set up correctly; the build run did not happen.
-- **Framework regression test still green.** 255/255 passing on the baseline (`npm test`). The reviewer harness is healthy and ready when the build lands.
-## Recommended next steps
-1. **Re-spawn the builder agent in a fresh session and verify it actually writes files.** The most likely failure mode is: builder session died, was killed, or hallucinated DONE without committing. Watch the session log; require `git log` output proving commits exist before declaring DONE.
-2. **Patch the test-baseline figure in `playwright-loop-builder-prompt.md:9`** — change "260+ tests" to "255 tests" or audit the git history for missing tests. This unblocks Gate 2.
-3. **Defer this review.** When the builder produces real artifacts, re-run `playwright-loop-tester-prompt.md` against them. The current review is a no-op except for documenting that the build run did not occur.
-4. **Consider a builder pre-flight check.** Add a heartbeat to the builder agent: write `docs/.qpl-builder-started` when the session begins and `docs/.qpl-builder-progress` after every major file. The reviewer can then distinguish "builder didn't run" from "builder ran and failed silently" — a real failure mode given the spec's complexity (Playwright MCP install + Vercel preview deploy + 3 self-tests + commit discipline + slop-detect gate, all in one session).
----
-**Reviewer note (honesty over signoff, per `playwright-loop-tester-prompt.md:213`):** This review took ~10 tool calls because the absence of artifacts halted Gates 3–8 immediately. The remaining 40 calls of budget are reserved for the next review pass when real artifacts exist. No CRITICAL findings beyond C1 are surfaceable at this stage; once the loop ships, expect Gate 5 (adversarial probes) and Gate 7 (security) to do most of the work.