qualia-framework 6.7.1 → 6.8.1

@@ -32,6 +32,8 @@ The capture script (`scripts/playwright-capture.mjs`) auto-selects in this order
32
32
  3. `~/.cache/ms-playwright/chromium-{version}/chrome-{linux64,linux,mac,win}/chrome` — if Playwright was ever installed for browsers but the package isn't import-resolvable
33
33
  4. `which google-chrome` / `chromium` / `chromium-browser` / `chrome` — system browser fallback
34
34
 
35
+ All four backends capture **full-page** (top-to-bottom, including below-fold content), not just the first viewport height. The evaluator scores card grids, CTAs, and footers, so the capture must contain them. Playwright uses `fullPage: true`; headless Chrome has no full-page flag, so the binary-direct backends replace the old per-viewport `--window-size` height clamp with a tall fixed window height (8000px), which approximates full-page capture for documents up to that height.
36
+
35
37
  For backends 3 and 4 (binary-direct), the script uses `--headless=new --screenshot --virtual-time-budget`. Less precise than Playwright's `networkidle` waiting but works without any npm dependency.
36
38
 
37
39
  Setup hints if all four fail:
@@ -70,6 +72,7 @@ Role: @${QUALIA_AGENTS}/visual-evaluator.md
70
72
  </product>
71
73
 
72
74
  <screenshots>
75
+ One full-page PNG per viewport (captured top-to-bottom, includes below-fold content):
73
76
  - mobile (375px): /tmp/qpl-{ts}/iter-{N}/mobile-375.png
74
77
  - tablet (768px): /tmp/qpl-{ts}/iter-{N}/tablet-768.png
75
78
  - desktop (1440px): /tmp/qpl-{ts}/iter-{N}/desktop-1440.png
@@ -150,6 +153,8 @@ the {dim} dimension at {score}. Your single task: fix that one dimension.
150
153
 
151
154
  ## Iteration log entry (what `loop.mjs record` writes to state.json.iterations[])
152
155
 
156
+ `tokens_used` is a **deterministic fixed charge** (`PER_ITER_TOKENS = 14500`) added by `loop.mjs` per iteration — NOT a number reported by the evaluator. The evaluator's JSON no longer carries a `tokens_used` field; trusting a model-guessed count to drive the budget kill-switch was unsound, so the loop charges the constant below for every iteration regardless of what any agent claims.
157
+
153
158
  ```json
154
159
  {
155
160
  "iteration": 1,
@@ -189,20 +194,20 @@ When the kill trigger fires, the verdict becomes `killed_regression` and `state.
189
194
  | 6 | ~90K | default |
190
195
  | 8 | ~120K | hard cap; pass `--budget 150000` to allow |
191
196
 
192
- Per-iteration cost (rough):
193
- - 3 screenshot reads ≈ 9K
197
+ Per-iteration cost is charged as a **fixed constant** — `loop.mjs` adds `PER_ITER_TOKENS = 14500` per iteration to `state.tokens_used` and the kill-switch trips off that deterministic sum, never off an agent's self-estimate. The breakdown below is the rationale for the constant, not a per-run measurement:
198
+ - 3 full-page screenshot reads ≈ 9K (taller PNGs than viewport-height crops, but still 3 reads)
194
199
  - rubric + brief inlined ≈ 2K (cached after iter 1)
195
200
  - previous-iteration delta ≈ 0.5K
196
201
  - 3 fix-builder spawns × (file read + edit + commit-fix call) ≈ 3K
197
- - **per-iteration 14.5K**
202
+ - **per-iteration constant = 14.5K** (deterministic — same charge every iteration)
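
As a rough illustration of how the fixed charge maps onto the budget table above, the sketch below computes how many iterations fit under a given budget. `PER_ITER_TOKENS` and the 8-iteration hard cap come from this document. The helper name `maxIterationsForBudget` is hypothetical, and so is the assumption that the loop stops before a charge that would exceed the budget; the exact comparison used by `loop.mjs` is not shown in this diff.

```javascript
// Fixed per-iteration charge documented above.
const PER_ITER_TOKENS = 14500;
// Hard cap on iterations from the budget table.
const MAX_ITERATIONS = 8;

// Hypothetical helper (not part of loop.mjs): how many whole iterations fit
// under a token budget, assuming the loop stops before a charge that would
// push state.tokens_used past the budget.
function maxIterationsForBudget(budget) {
  const affordable = Math.floor(budget / PER_ITER_TOKENS);
  return Math.min(affordable, MAX_ITERATIONS);
}

console.log(maxIterationsForBudget(100000)); // 6 (6 x 14500 = 87000 tokens)
console.log(maxIterationsForBudget(150000)); // 8 (capped; 8 x 14500 = 116000 tokens)
```

Under these assumptions the arithmetic lines up with the table rows: six iterations cost 87000 tokens (the "~90K" row) and eight cost 116000 (the "~120K" row).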
198
203
 
199
204
  ## Self-test scenarios (mapping to spec)
200
205
 
201
206
  | # | Fixture | Expected | Verifier |
202
207
  |---|---|---|---|
203
- | 1 | `fixtures/clean.html` | SUCCESS in 1-2 iterations, all dims ≥ 4 | run capture, run evaluator inline, assert pass |
204
- | 2 | `fixtures/broken.html` | SUCCESS in 4-6 iters; identifies banned font + gradient + 3-card grid + side-stripe + generic CTA | each fix-builder commits a `qpl-N:` change; final eval all dims ≥ 3 |
205
- | 3 | Kill-switch | KILL at iter ≤ 4 with `LOOP_REGRESSION_DETECTED` | call `loop.mjs record` 3× with the same fingerprint; assert exit 3 + correct verdict |
208
+ | 1 | `fixtures/clean.html` | Each viewport PNG is full-page (height > viewport-height when the fixture scrolls); SUCCESS in 1-2 iterations, all dims ≥ 4 | run capture, assert PNG is full-page, run evaluator inline, assert pass |
209
+ | 2 | `fixtures/broken.html` | SUCCESS in 4-6 iters; identifies banned font + gradient + 3-card grid + side-stripe + generic CTA, INCLUDING below-fold offenders only visible in a full-page shot | each fix-builder commits a `qpl-N:` change; final eval all dims ≥ 3 |
210
+ | 3 | Kill-switch | KILL at iter ≤ 4 with `LOOP_REGRESSION_DETECTED`; `state.tokens_used` increments by exactly `PER_ITER_TOKENS` (14500) per recorded iteration | call `loop.mjs record` 3× with the same fingerprint; assert exit 3 + correct verdict + deterministic token charge |
206
211
 
207
212
  The pilot-results doc at `docs/playwright-loop-pilot-results.md` records the actual outcome from `bash scripts/_self-tests.sh` (Scenario 3 is exercised by a deterministic unit-style invocation; Scenarios 1+2 require a real vision pass and are run by Claude when the loop ships).
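
One possible way to implement the "assert PNG is full-page" step in Scenario 1 is to read the image height straight from the PNG header and compare it with the viewport height. This is a sketch, not the verifier the framework ships. `pngHeight` and `isFullPage` are hypothetical names, and the byte offsets rely on the standard PNG layout (8-byte signature followed by the IHDR chunk).

```javascript
// Hypothetical helper: read the pixel height from a PNG's IHDR chunk.
// Layout: 8-byte signature, 4-byte chunk length, 4-byte "IHDR" type, then
// width (bytes 16-19) and height (bytes 20-23), both big-endian uint32.
function pngHeight(bytes) {
  if (bytes.byteLength < 24) {
    throw new RangeError("not enough bytes for a PNG IHDR header");
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return view.getUint32(20, false);
}

// A capture counts as full-page here when it is taller than the viewport
// height used for rendering (only meaningful when the fixture scrolls).
function isFullPage(bytes, viewportHeight) {
  return pngHeight(bytes) > viewportHeight;
}

// Usage with a synthetic 24-byte header describing a 375 x 3200 image.
const header = new Uint8Array(24);
new DataView(header.buffer).setUint32(16, 375);
new DataView(header.buffer).setUint32(20, 3200);
console.log(pngHeight(header));        // 3200
console.log(isFullPage(header, 1080)); // true
```

In a real check the bytes would come from the captured file (for example a Node `Buffer`, which is a `Uint8Array`), and the viewport height would be whatever the capture script used for its initial render.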
208
213
 
@@ -32,7 +32,7 @@ The first argument selects the scope. Stage selection follows from scope.
32
32
  | `/qualia-polish --redesign` | **Redesign** | ~30m | all + Stage 1 mandatory + 2 vision iterations |
33
33
  | `/qualia-polish --critique` | **Critique** | read-only | 0, 4, 5 (no edits) |
34
34
  | `/qualia-polish --quick` | **Quick** | ~1m | 0, 2, 7 (gates only, no vision loop) |
35
- | `/qualia-polish --vibe` | **Aesthetic pivot** | ~3m | token-only direction swap |
35
+ | `/qualia-polish --vibe` | **Aesthetic pivot** | ~3m | token-only direction swap, then 6, 7 (mandatory verify) |
36
36
  | `/qualia-polish --loop {url}` | **Loop** | ~5-15m | autonomous see/fix/verify, max 8 iterations |
37
37
 
38
38
  Other flags: `--register=brand|product` overrides register inference. Vibe-specific flags: `--variants N`, `--extract URL|image`, `--sync`, `--write`. Loop-specific flags: `--brief PATH`, `--max N`, `--viewports 375,768,1440`, `--ref PATH`, `--budget 100000`.
@@ -73,6 +73,21 @@ node ${QUALIA_SKILLS}/qualia-polish/scripts/vibe-extract.mjs --source https://ex
73
73
 
74
74
  Default flow proposes one opinionated direction per `rules/one-opinion.md`. `--variants` is opt-in only. `--extract` stages a reference screenshot for the visual evaluator to reverse-engineer into DESIGN.md tokens; the user must review low-confidence extracted values before application. `--sync --write` patches DESIGN.md to reflect tokens already present in code.
75
75
 
76
+ **Mandatory post-vibe verify (the token swap does NOT terminate the run).** After applying the new tokens, a `--vibe` run MUST NOT stop at the swap — it routes back through **Stage 6 — Verify**. Before claiming done, run:
77
+
78
+ ```bash
79
+ # 1. TypeScript still compiles (a token rename can break a typed theme map)
80
+ npx tsc --noEmit 2>&1 | tail -10
81
+
82
+ # 2. Anti-pattern scan on the files the swap touched (new tokens can reintroduce slop)
83
+ node bin/slop-detect.mjs {changed_files}
84
+ [ $? -eq 0 ] || { echo "vibe failed — slop-detect found critical issues in the new tokens"; exit 1; }
85
+ ```
86
+
87
+ 3. **Scoped vision check (best-effort, only if a dev server is reachable).** Capture one screenshot per viewport via `scripts/playwright-capture.mjs` and spawn a single `agents/visual-evaluator.md` pass scored ONLY on the dimensions a token swap can move: **color, typography, shadow, motion**. (Layout/container/microcopy/graphics are out of scope for `--vibe`; do not score or fix them here.) If there is no dev server or no resolvable browser backend, skip this step and note it in the report; never block the vibe on missing optional tooling. Any in-scope dimension scoring < 3 is a regression in the new direction: re-tune the offending tokens and re-verify; do not ship.
88
+
89
+ Only after Stage 6 passes does the `--vibe` run reach Stage 7 (commit & state).
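
The scoped vision check in step 3 can be pictured as a filter over the evaluator's `aggregate_scores`. The sketch below is illustrative only: `VIBE_DIMS` and `vibeRegressions` are hypothetical names, and treating a missing dimension as "not failing" is an assumption rather than documented behavior.

```javascript
// Dimensions a token swap can move, per the scoped vision check above.
const VIBE_DIMS = ["color", "typography", "shadow", "motion"];

// Hypothetical helper: given the evaluator's aggregate_scores, return the
// in-scope dimensions that regressed (score < 3). Out-of-scope dimensions
// such as layout or microcopy are ignored even when they score low.
// Assumption: a dimension missing from the scores is not counted as failing.
function vibeRegressions(aggregateScores) {
  return VIBE_DIMS.filter((dim) => {
    const score = aggregateScores[dim];
    return typeof score === "number" && score < 3;
  });
}

const scores = { color: 2, typography: 4, shadow: 3, motion: 4, layout: 1 };
console.log(vibeRegressions(scores)); // [ 'color' ]
```

An empty result would let the run proceed to Stage 7. A non-empty result names the tokens to re-tune before re-verifying.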
90
+
76
91
  ## Setup gates (non-optional, every scope)
77
92
 
78
93
  Before any work — design or otherwise — pass these gates. Skipping them produces generic output that ignores the project.
@@ -163,6 +178,8 @@ Each agent receives:
163
178
 
164
179
  For Component scope: do the work in main context. Read, fix, verify.
165
180
 
181
+ **Optional best-effort vision check (Component scope).** If a dev server is reachable (`http://localhost:3000` or a port detected via `lsof`) AND `scripts/playwright-capture.mjs` can resolve a browser backend, capture ONE screenshot at a single viewport (desktop 1440 is sufficient for an isolated component) and spawn one `agents/visual-evaluator.md` pass scored on component-relevant dimensions ONLY — Typography, Color, Shadow, Motion, Microcopy (skip Layout/Container/Graphics per the rubric's scope guard). This is opt-in by environment: if there is no dev server or no resolvable backend, **omit the vision check without error and record that it was unavailable** — never block a component touch-up on missing optional tooling.
182
+
166
183
  **Apply fixes scoped by what's missing:**
167
184
 
168
185
  | If the file scores low on… | Apply fix from… |
@@ -216,7 +233,7 @@ If a11y < 90 OR axe critical/serious violations: **fix programmatically** (th
216
233
 
217
234
  ### Stage 4 — Vision loop (Redesign scope only — max 2 iterations)
218
235
 
219
- Use `skills/qualia-polish/scripts/playwright-capture.mjs` (Playwright backend, with Chrome-binary fallback). Capture screenshots at 3 viewports: 375 / 768 / 1440 — these match what `agents/visual-evaluator.md` expects, and what the `--loop` mode captures. Single browser (Chromium) is fine in 2026 — cross-browser CSS rendering differences are vanishingly rare.
236
+ Use `skills/qualia-polish/scripts/playwright-capture.mjs` (Playwright backend, with Chrome-binary fallback). Capture one **full-page** screenshot at each of 3 viewports: 375 / 768 / 1440 — these match what `agents/visual-evaluator.md` expects, and what the `--loop` mode captures. The `height` below sets the viewport window, not the screenshot height — each PNG runs top-to-bottom so the evaluator scores below-fold sections too. Single browser (Chromium) is fine in 2026 — cross-browser CSS rendering differences are vanishingly rare.
220
237
 
221
238
  ```
222
239
  viewports: [
@@ -325,6 +342,6 @@ node ${QUALIA_BIN}/qualia-ui.js end "POLISHED" "{next command — depends on con
325
342
  | `bin/slop-detect.mjs not found` | Framework not installed in project | Run `npx qualia install` or pull script from framework repo |
326
343
  | `PRODUCT.md missing` | Project predates the PRODUCT.md substrate | Run setup (ask 5 questions, generate). For Component scope, can proceed with nudge. |
327
344
  | `Lighthouse not installed` | Optional tool | Skip Stage 3 numeric gates, note in report. Don't fail. |
328
- | `webapp-testing skill not present` | Optional Anthropic skill | Skip Stage 4 vision loop on Redesign scope. Note in report. Recommend installing. |
345
+ | `playwright-capture.mjs cannot resolve a browser backend` | None of the four backends resolve: no `playwright`/`playwright-core` package, no cached Chromium under `~/.cache/ms-playwright`, no system Chrome/Chromium on PATH | Skip Stage 4 vision loop (and any screenshot-dependent verify). Note in report. Install a backend: `npm i -D playwright && npx playwright install chromium` (or put `google-chrome` on PATH). |
329
346
  | Vision loop oscillates between iterations | Rubric not anchored properly | Verify rubric prompt instructs "default to 3, only 1-2 = fix". Hard cap at 2 iterations. |
330
347
  | User says "you missed X" after polish completes | Scope was too narrow | Re-run with wider scope (`/qualia-polish` whole-app). Don't argue scope. |
@@ -14,12 +14,13 @@
14
14
  *
15
15
  * Eval JSON contract (what the vision evaluator returns):
16
16
  * {
17
- * "iteration": 1,
17
+ * "iteration": 1, // optional; if present must equal state.iteration+1 (monotonicity assertion)
18
18
  * "viewport_results": [{ viewport, scores, top_issues }],
19
19
  * "aggregate_scores": { typography, color, spatial, layout, shadow, motion, microcopy, container, graphics },
20
- * "top_issues": [{ dim, severity, description, likely_file, fix }],
21
- * "tokens_used": 14500
20
+ * "top_issues": [{ dim, severity, description, likely_file, fix }]
22
21
  * }
22
+ * NOTE: any ev.tokens_used is IGNORED — the budget charges a fixed
23
+ * PER_ITER_TOKENS per iteration (deterministic kill-switch).
23
24
  *
24
25
  * State JSON shape (written by this script, read by Claude):
25
26
  * {
@@ -35,6 +36,12 @@ import { spawnSync } from "node:child_process";
35
36
 
36
37
  const DIMS = ["typography", "color", "spatial", "layout", "shadow", "motion", "microcopy", "container", "graphics"];
37
38
 
39
+ // Deterministic per-iteration token charge (evaluator + 3 builders ≈ 14500,
40
+ // per REFERENCE.md:197). The budget kill-switch must NOT trust ev.tokens_used —
41
+ // that is a model-guessed number and an under-report would defeat the budget
42
+ // guard. Charge a fixed, known cost per iteration instead.
43
+ const PER_ITER_TOKENS = 14500;
44
+
38
45
  // ── helpers ──────────────────────────────────────────────────────────────
39
46
  function loadState(p) {
40
47
  if (!existsSync(p)) { console.error(`state file not found: ${p}`); exit(2); }
@@ -98,7 +105,17 @@ function cmdRecord() {
98
105
  const state = loadState(statePath);
99
106
  const ev = JSON.parse(readFileSync(evalPath, "utf8"));
100
107
 
101
- state.iteration = (ev.iteration ?? state.iteration + 1);
108
+ // Monotonicity assertion: each record() advances exactly one iteration. If the
109
+ // evaluator reports an iteration that disagrees with the script's own counter,
110
+ // the loop is desynced — fail loudly instead of trusting the eval's number,
111
+ // which would corrupt regression fingerprinting (state.iteration is the key
112
+ // used at :108-134) and budget accounting.
113
+ const expected = state.iteration + 1;
114
+ if (ev.iteration != null && ev.iteration !== expected) {
115
+ console.error(`loop desync: eval.iteration=${ev.iteration} expected=${expected}`);
116
+ exit(2);
117
+ }
118
+ state.iteration = expected;
102
119
  const scores = ev.aggregate_scores || {};
103
120
  const failingDims = DIMS.filter((d) => (scores[d] ?? 3) < 3);
104
121
  const total = DIMS.reduce((a, d) => a + (scores[d] ?? 3), 0);
@@ -133,7 +150,8 @@ function cmdRecord() {
133
150
  return its.length >= 3 && its.every((n, i, arr) => i === 0 || n === arr[i - 1] + 1);
134
151
  });
135
152
 
136
- state.tokens_used += parseInt(ev.tokens_used || 0, 10) || 0;
153
+ // Charge the fixed per-iteration cost; do NOT trust ev.tokens_used.
154
+ state.tokens_used += PER_ITER_TOKENS;
137
155
 
138
156
  state.iterations.push({
139
157
  iteration: state.iteration,
@@ -142,7 +160,7 @@ function cmdRecord() {
142
160
  pass,
143
161
  failing_dims: failingDims,
144
162
  top_issues: issues,
145
- tokens_used: ev.tokens_used || 0,
163
+ tokens_used: PER_ITER_TOKENS,
146
164
  timestamp: new Date().toISOString(),
147
165
  });
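
Taken together, the `cmdRecord` hunks above change two things: the iteration counter always advances by exactly one, and the token charge no longer depends on the eval payload. A simplified stand-in is sketched below. `recordIteration` is a hypothetical function, it returns a new state object rather than mutating and writing `state.json`, and it throws where `loop.mjs` calls `exit(2)`.

```javascript
const PER_ITER_TOKENS = 14500;

// Simplified stand-in for the record step: advances the counter by exactly
// one, rejects a disagreeing eval.iteration, and charges the fixed cost.
// Unlike loop.mjs, this sketch throws instead of calling exit(2).
function recordIteration(state, ev) {
  const expected = state.iteration + 1;
  if (ev.iteration != null && ev.iteration !== expected) {
    throw new Error(`loop desync: eval.iteration=${ev.iteration} expected=${expected}`);
  }
  return {
    ...state,
    iteration: expected,
    // ev.tokens_used is deliberately ignored.
    tokens_used: state.tokens_used + PER_ITER_TOKENS,
  };
}

let state = { iteration: 0, tokens_used: 0 };
state = recordIteration(state, { iteration: 1, tokens_used: 999999 });
state = recordIteration(state, {}); // iteration omitted: counter still advances
console.log(state); // { iteration: 2, tokens_used: 29000 }
```

The inflated `tokens_used: 999999` in the first call has no effect on the total, which is the point of the deterministic charge.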
148
166
 
@@ -26,7 +26,7 @@ import { homedir } from "node:os";
26
26
 
27
27
  // ── Arg parsing ──────────────────────────────────────────────────────────
28
28
  function parseArgs() {
29
- const args = { url: null, out: null, viewports: [375, 768, 1440], wait: 1500, reducedMotion: false };
29
+ const args = { url: null, out: null, viewports: [375, 768, 1440], wait: 1500, reducedMotion: false, states: false, stateSelector: null };
30
30
  for (let i = 2; i < argv.length; i++) {
31
31
  const a = argv[i];
32
32
  if (a === "--url" && argv[i + 1]) args.url = argv[++i];
@@ -35,15 +35,22 @@ function parseArgs() {
35
35
  args.viewports = argv[++i].split(",").map((s) => parseInt(s, 10)).filter((n) => Number.isFinite(n) && n > 0);
36
36
  } else if (a === "--wait" && argv[i + 1]) args.wait = parseInt(argv[++i], 10);
37
37
  else if (a === "--reduced-motion") args.reducedMotion = true;
38
+ else if (a === "--states") args.states = true;
39
+ else if (a === "--state-selector" && argv[i + 1]) args.stateSelector = argv[++i];
38
40
  else if (a === "--help" || a === "-h") {
39
41
  console.log(`playwright-capture.mjs — Screenshot capture for /qualia-polish --loop
40
42
 
41
43
  Usage:
42
- node playwright-capture.mjs --url <url> --out <dir> [--viewports 375,768,1440] [--wait 1500] [--reduced-motion]
44
+ node playwright-capture.mjs --url <url> --out <dir> [--viewports 375,768,1440] [--wait 1500] [--reduced-motion] [--states] [--state-selector <css>]
43
45
 
44
46
  Flags:
45
47
  --reduced-motion Force prefers-reduced-motion: reduce in the captured page.
46
48
  Use when the brief explicitly opts out of motion (a11y mode).
49
+ --states Also capture hover/focus/loading interaction-state PNGs
50
+ (Playwright backend only; the Chrome-binary backend cannot
51
+ capture them from the CLI, so it skips them and reports states_skipped).
52
+ --state-selector CSS selector to target for --states (default: first
53
+ a, button, [role=button], or input on the page).
47
54
 
48
55
  Backend selection (auto):
49
56
  1. Playwright — import('playwright') if installed
@@ -64,11 +71,14 @@ function viewportName(width) {
64
71
  if (width <= 900) return "tablet";
65
72
  return "desktop";
66
73
  }
67
- function viewportHeight(width) {
68
- if (width <= 480) return 812;
69
- if (width <= 900) return 1024;
70
- return 900;
71
- }
74
+ // Initial render height for the Playwright context viewport. Capture is
75
+ // full-page (fullPage:true), so this is NOT a clamp on the captured PNG — it
76
+ // only sets the layout viewport before Playwright stitches the full scroll.
77
+ const INITIAL_RENDER_HEIGHT = 1080;
78
+ // Headless Chrome (binary backend) has no --full-page flag, so a tall window
79
+ // height approximates full-page capture. Replaces the old per-viewport clamp
80
+ // (812/1024/900) that cropped everything below the fold.
81
+ const FULLPAGE_WINDOW_HEIGHT = 8000;
72
82
 
73
83
  // ── Backend: Playwright (preferred when available) ──────────────────────
74
84
  async function captureViaPlaywright(args) {
@@ -84,7 +94,7 @@ async function captureViaPlaywright(args) {
84
94
  browser = await chromium.launch({ headless: true });
85
95
  for (const width of args.viewports) {
86
96
  const name = viewportName(width);
87
- const height = viewportHeight(width);
97
+ const height = INITIAL_RENDER_HEIGHT;
88
98
  const file = join(args.out, `${name}-${width}.png`);
89
99
  try {
90
100
  const ctxOpts = { viewport: { width, height }, deviceScaleFactor: 1 };
@@ -93,9 +103,14 @@ async function captureViaPlaywright(args) {
93
103
  const page = await ctx.newPage();
94
104
  await page.goto(args.url, { waitUntil: "networkidle", timeout: 30000 });
95
105
  if (args.wait > 0) await page.waitForTimeout(args.wait);
96
- await page.screenshot({ path: file, fullPage: false });
106
+ await page.screenshot({ path: file, fullPage: true });
107
+ const stateFiles = [];
108
+ if (args.states) {
109
+ const extra = await captureStates(page, args, name, width);
110
+ stateFiles.push(...extra);
111
+ }
97
112
  await ctx.close();
98
- results.push({ viewport: name, width, height, file, ok: true, backend: "playwright", reducedMotion: !!args.reducedMotion });
113
+ results.push({ viewport: name, width, height, file, ok: true, backend: "playwright", reducedMotion: !!args.reducedMotion, ...(stateFiles.length ? { state_captures: stateFiles } : {}) });
99
114
  } catch (err) {
100
115
  results.push({ viewport: name, width, height, file, ok: false, backend: "playwright", error: err.message });
101
116
  }
@@ -106,6 +121,59 @@ async function captureViaPlaywright(args) {
106
121
  return results;
107
122
  }
108
123
 
124
+ // ── Optional interaction-state capture (--states, Playwright only) ───────
125
+ // Captures hover/focus/loading variant PNGs of the first interactive element
126
+ // (or --state-selector) so the evaluator can score Motion/States honestly
127
+ // instead of guessing from a single static above-fold frame. Best-effort: any
128
+ // individual state that can't be captured is recorded as ok:false, never throws.
129
+ async function captureStates(page, args, name, width) {
130
+ const out = [];
131
+ const selector = args.stateSelector
132
+ || "a:visible, button:visible, [role=button]:visible, input:visible";
133
+ let target;
134
+ try {
135
+ target = page.locator(selector).first();
136
+ await target.waitFor({ state: "visible", timeout: 2000 });
137
+ } catch {
138
+ out.push({ state: "hover", viewport: name, width, ok: false, error: "no interactive target found" });
139
+ return out;
140
+ }
141
+
142
+ // hover
143
+ const hoverFile = join(args.out, `${name}-${width}-hover.png`);
144
+ try {
145
+ await target.hover({ timeout: 2000 });
146
+ await page.waitForTimeout(Math.min(args.wait, 400));
147
+ await target.screenshot({ path: hoverFile });
148
+ out.push({ state: "hover", viewport: name, width, file: hoverFile, ok: true });
149
+ } catch (e) {
150
+ out.push({ state: "hover", viewport: name, width, file: hoverFile, ok: false, error: e.message });
151
+ }
152
+
153
+ // focus
154
+ const focusFile = join(args.out, `${name}-${width}-focus.png`);
155
+ try {
156
+ await target.focus({ timeout: 2000 });
157
+ await page.waitForTimeout(Math.min(args.wait, 400));
158
+ await target.screenshot({ path: focusFile });
159
+ out.push({ state: "focus", viewport: name, width, file: focusFile, ok: true });
160
+ } catch (e) {
161
+ out.push({ state: "focus", viewport: name, width, file: focusFile, ok: false, error: e.message });
162
+ }
163
+
164
+ // loading — re-navigate and grab the earliest paint (commit) before networkidle
165
+ const loadingFile = join(args.out, `${name}-${width}-loading.png`);
166
+ try {
167
+ await page.goto(args.url, { waitUntil: "commit", timeout: 30000 });
168
+ await page.screenshot({ path: loadingFile });
169
+ out.push({ state: "loading", viewport: name, width, file: loadingFile, ok: true });
170
+ } catch (e) {
171
+ out.push({ state: "loading", viewport: name, width, file: loadingFile, ok: false, error: e.message });
172
+ }
173
+
174
+ return out;
175
+ }
176
+
109
177
  // ── Backend: Chrome/Chromium headless via spawn ─────────────────────────
110
178
  function findChromeBinary() {
111
179
  // 1. Playwright cached chromium (newest first)
@@ -136,7 +204,10 @@ function captureViaChromeBinary(args, binary) {
136
204
  const results = [];
137
205
  for (const width of args.viewports) {
138
206
  const name = viewportName(width);
139
- const height = viewportHeight(width);
207
+ // Tall window so headless Chrome's --screenshot is not cropped to a fixed
208
+ // above-fold height. Classic headless has no --full-page flag, so a large
209
+ // height is the closest equivalent to full-page capture on this backend.
210
+ const height = FULLPAGE_WINDOW_HEIGHT;
140
211
  const file = join(args.out, `${name}-${width}.png`);
141
212
  const flags = [
142
213
  "--headless=new",
@@ -144,6 +215,8 @@ function captureViaChromeBinary(args, binary) {
144
215
  "--disable-gpu",
145
216
  "--disable-dev-shm-usage",
146
217
  "--hide-scrollbars",
218
+ // Constrain layout WIDTH; use a tall height (not the old per-viewport
219
+ // clamp) so the capture spans well below the fold.
147
220
  `--window-size=${width},${height}`,
148
221
  `--screenshot=${file}`,
149
222
  `--virtual-time-budget=${Math.max(args.wait + 1000, 3000)}`,
@@ -189,6 +262,9 @@ async function main() {
189
262
  }
190
263
 
191
264
  const failed = captures.filter((c) => !c.ok).length;
265
+ // --states is only honored on the Playwright backend (the Chrome-binary
266
+ // backend cannot drive hover/focus from the CLI). Surface that it was skipped.
267
+ const statesSkipped = args.states && backendUsed === "chrome-binary";
192
268
  console.log(JSON.stringify({
193
269
  url: args.url,
194
270
  output_dir: args.out,
@@ -196,6 +272,8 @@ async function main() {
196
272
  captures,
197
273
  total: captures.length,
198
274
  failed,
275
+ ...(args.states ? { states_requested: true } : {}),
276
+ ...(statesSkipped ? { states_skipped: "chrome-binary backend cannot capture hover/focus/loading states" } : {}),
199
277
  }, null, 2));
200
278
  exit(failed > 0 ? 1 : 0);
201
279
  }
@@ -19,8 +19,9 @@
19
19
  */
20
20
 
21
21
  import { readFileSync, writeFileSync, existsSync, readdirSync, statSync } from "node:fs";
22
- import { join, basename, extname } from "node:path";
22
+ import path, { join, basename, extname } from "node:path";
23
23
  import { argv, exit, cwd } from "node:process";
24
+ import { spawnSync } from "node:child_process";
24
25
 
25
26
  function flag(name, fallback) {
26
27
  const i = argv.indexOf(name);
@@ -41,6 +42,7 @@ if (!CMD || CMD === "--help" || CMD === "-h") {
41
42
  Usage:
42
43
  tokens.mjs sync --design <path> [--write]
43
44
  tokens.mjs propose-variants --product <path> --design <path> --count <N>
45
+ tokens.mjs verify [--files a.css,b.tsx] # post-vibe gate: tsc --noEmit + slop-detect
44
46
 
45
47
  See skills/qualia-polish/SKILL.md.
46
48
  `);
@@ -331,11 +333,65 @@ function cmdProposeVariants() {
331
333
  exit(0);
332
334
  }
333
335
 
336
+ // ─── Post-vibe verify (script-side piece of W7.3) ─────────────────────
337
+ // Deterministic gate the orchestrator (SKILL.md Stage 6) calls AFTER a --vibe
338
+ // token swap, so a vibe pivot never ships unverified. This runs the two
339
+ // machine-checkable gates — `tsc --noEmit` and `slop-detect` on changed files.
340
+ // The screenshot + visual-evaluator pass remains orchestration (the parent
341
+ // session / another agent), not a script primitive.
342
+ function resolveSlopScript() {
343
+ const candidates = [
344
+ process.env.SLOP_DETECT_SCRIPT,
345
+ `${process.env.HOME}/.claude/bin/slop-detect.mjs`,
346
+ path.join(path.dirname(new URL(import.meta.url).pathname), "..", "..", "..", "bin", "slop-detect.mjs"),
347
+ ].filter(Boolean);
348
+ return candidates.find((p) => existsSync(p)) || null;
349
+ }
350
+
351
+ function cmdVerify() {
352
+ const filesFlag = flag("--files", null);
353
+ const files = typeof filesFlag === "string"
354
+ ? filesFlag.split(",").map((s) => s.trim()).filter(Boolean)
355
+ : [];
356
+ const gates = [];
357
+
358
+ // Gate 1 — tsc --noEmit (skipped if no tsconfig in cwd).
359
+ if (existsSync(join(cwd(), "tsconfig.json"))) {
360
+ const r = spawnSync("npx", ["tsc", "--noEmit"], { encoding: "utf8" });
361
+ gates.push({ gate: "tsc", ok: r.status === 0, output: (r.stderr || r.stdout || "").split("\n").slice(0, 20).join("\n") });
362
+ } else {
363
+ gates.push({ gate: "tsc", ok: true, skipped: "no tsconfig.json in cwd" });
364
+ }
365
+
366
+ // Gate 2 — slop-detect on the changed files (resolved like commit-fix does).
367
+ const slopScript = resolveSlopScript();
368
+ if (!slopScript) {
369
+ gates.push({ gate: "slop-detect", ok: true, skipped: "slop-detect.mjs not found" });
370
+ } else if (files.length === 0) {
371
+ gates.push({ gate: "slop-detect", ok: true, skipped: "no --files provided" });
372
+ } else {
373
+ const slopBin = process.env.SLOP_DETECT_BIN || "node";
374
+ const r = spawnSync(slopBin, [slopScript, ...files], { encoding: "utf8" });
375
+ gates.push({ gate: "slop-detect", ok: r.status !== 1, output: (r.stdout || "").split("\n").slice(0, 20).join("\n") });
376
+ }
377
+
378
+ const allPass = gates.every((g) => g.ok);
379
+ console.log(JSON.stringify({
380
+ command: "verify",
381
+ files,
382
+ gates,
383
+ pass: allPass,
384
+ note: "screenshot + visual-evaluator pass is orchestrated by SKILL.md Stage 6, not by this script",
385
+ }, null, 2));
386
+ exit(allPass ? 0 : 1);
387
+ }
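
The way `cmdVerify` derives its overall result can be sketched as a small reduction over the gate records. `summarizeGates` is a hypothetical helper, not part of `tokens.mjs`; the gate shape (`gate`, `ok`, optional `skipped`) follows the objects pushed in the function above.

```javascript
// Hypothetical helper mirroring how the verify command summarizes gates:
// a skipped gate is recorded with ok: true, so it never fails the run.
function summarizeGates(gates) {
  return {
    pass: gates.every((g) => g.ok),
    failed: gates.filter((g) => !g.ok).map((g) => g.gate),
    skipped: gates.filter((g) => Boolean(g.skipped)).map((g) => g.gate),
  };
}

const summary = summarizeGates([
  { gate: "tsc", ok: true, skipped: "no tsconfig.json in cwd" },
  { gate: "slop-detect", ok: false, output: "critical issue found" },
]);
console.log(summary);
// { pass: false, failed: [ 'slop-detect' ], skipped: [ 'tsc' ] }
```

One consequence of this design is that a run with no `tsconfig.json` and no `--files` passes with both gates skipped, so a caller that needs a real check may want to inspect the `skipped` entries as well as `pass`.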
388
+
334
389
  // ─── Dispatch ────────────────────────────────────────────────────────
335
390
 
336
391
  switch (CMD) {
337
392
  case "sync": cmdSync(); break;
338
393
  case "propose-variants": cmdProposeVariants(); break;
394
+ case "verify": cmdVerify(); break;
339
395
  default:
340
396
  console.error(`Unknown command: ${CMD}`);
341
397
  exit(2);
@@ -124,5 +124,5 @@ node ${QUALIA_BIN}/qualia-ui.js end "PHASE {N} RESEARCH DONE" "/qualia-plan {N}"
124
124
  1. **One session per run.** Don't research phases 1-5 in one call.
125
125
  2. **Must produce a file.** Research in conversation only is worthless.
126
126
  3. **Honor locked decisions.** Don't research alternatives to locked choices.
127
- 4. **Local-first.** Drain NotebookLM and `~/qualia-memory` before any external call. The team has already researched most domains we touch — querying existing notebooks is near-zero token cost AND higher-quality than fresh WebSearch.
127
+ 4. **Local-first.** Drain NotebookLM and the local memory (`projects/-home-qualia/memory/MEMORY.md`) before any external call. The team has already researched most domains we touch — querying existing notebooks is near-zero token cost AND higher-quality than fresh WebSearch.
128
128
  5. **Context7 before WebFetch.** When you do go external, Context7 first for libraries; only WebFetch for non-library content (blog posts, case studies, post-mortems).
@@ -86,13 +86,19 @@ Before high-stakes phases, run alignment skills against `.planning/CONTEXT.md` (
86
86
  ## Auxiliary commands
87
87
  ```
88
88
  Lost? → /qualia (state router — tells you the next command)
89
+ Don't-know? → /qualia-idk (deep diagnostic — three isolated scans + paste-ready command sequence)
89
90
  Health? → /qualia-doctor (install, state, contracts, memory, ERP queue)
90
91
  Stuck/weird? → /qualia (diagnostic branch — scans planning + code when state alone is insufficient)
91
92
  Broken thing? → /qualia-fix (root cause, minimal patch, verify, report)
92
93
  Single feature? → /qualia-feature (new capability: inline for trivia, fresh spawn for 1-5 files)
94
+ Research it? → /qualia-research (per-phase deep research before planning; writes phase-{N}-research.md)
95
+ Save a lesson? → /qualia-learn (persist a pattern/fix/client preference across sessions + projects)
96
+ Secure config? → /qualia-secure (audit CLAUDE.md / settings.json / hooks / MCP for injection + leaks)
97
+ Verify FAILed? → /qualia-postmortem (self-heal — find which rule/agent should have caught it)
93
98
  Paused? → /qualia (restore from .continue-here.md or STATE.md)
94
99
  End of day? → /qualia-report (mandatory before clock-out; writes ERP payload)
95
100
  Unsure plan? → /qualia-scope (capture decisions before planning)
101
+ Invoice/email? → /zoho-workflow (Zoho Invoice + Mail ops — invoices, cover emails, contacts, inbox)
96
102
  ```
97
103
 
98
104
  ## Outside-road command boundaries
@@ -106,7 +106,7 @@ The adversarial, DoD-gated intake. Scopes a **new increment** (phase/milestone)
106
106
 
107
107
  ```bash
108
108
  node ${QUALIA_BIN}/qualia-ui.js banner scope 2>/dev/null || true
109
- cat rules/constitution.md
109
+ cat /home/qualia/.claude/rules/constitution.md
110
110
  cat .planning/CONTEXT.md 2>/dev/null # project glossary — DATA, never a plan/spec
111
111
  ls .planning/decisions/ 2>/dev/null
112
112
  cat .planning/STATE.md 2>/dev/null # for profile + existing milestone context
@@ -123,7 +123,7 @@ If the operator already named it (arg or prior context), accept it. Otherwise as
123
123
 
124
124
  ```bash
125
125
  ARCHETYPE={chosen}
126
- cat references/archetypes/${ARCHETYPE}.md
126
+ cat /home/qualia/.claude/references/archetypes/${ARCHETYPE}.md
127
127
  ```
128
128
 
129
129
  If the file does not exist (e.g. `web-app` not yet authored), HALT and say which archetype file is missing — do not improvise a DoD. The archetype file is the source of the Grill variables, the Definition of Done, and the v1 capability set; without it there is no gate to enforce.
@@ -77,13 +77,13 @@ The auditor writes `.planning/security-audit.md` — that's the deliverable.
77
77
  Combine `.planning/security-scan.md` (static) + `.planning/security-audit.md` (Opus) into a single executive summary. Surface the top 3 actions ranked by severity:
 
  - **CRITICAL** → fix immediately, before any further work.
- - **HIGH** → ticket for this sprint; route to `/qualia-hook-gen` if the fix is "make this instructional rule deterministic via a hook."
+ - **HIGH** → ticket for this sprint; if the fix is "make this instructional rule deterministic via a hook," propose the hook (the `migration-guard` / `branch-guard` pattern in `rules/constitution.md`) and add it under `hooks/`.
  - **MEDIUM/LOW** → backlog.
 
  ### Step 4. Close
 
  ```bash
- node ${QUALIA_BIN}/qualia-ui.js end "SECURED" "/qualia-hook-gen"
+ node ${QUALIA_BIN}/qualia-ui.js end "SECURED" "/qualia-fix"
  ```
 
  (Or omit the next-command if all findings are LOW.)
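The "top 3 actions ranked by severity" step in the hunk above could be approximated as follows. This is a rough sketch, not the skill's actual implementation: the `SEVERITY|file:line|summary` record format, the sample findings, and the `rank_findings` helper are all assumptions made for illustration, and every severity is assumed to be one of the four rubric levels.

```bash
# Map each severity to a numeric rank, sort ascending, keep the first three.
rank_findings() {
  awk -F'|' '
    BEGIN { rank["CRITICAL"] = 0; rank["HIGH"] = 1; rank["MEDIUM"] = 2; rank["LOW"] = 3 }
    { print rank[$1] "|" $0 }
  ' | sort -t'|' -k1,1n | head -n 3 | cut -d'|' -f2-
}

# Hypothetical findings, one per line, in no particular order.
TOP=$(printf '%s\n' \
  'LOW|CLAUDE.md:12|vague rule wording' \
  'HIGH|hooks/pre-push.js:40|guard can be bypassed' \
  'CRITICAL|settings.json:8|token committed in plain text' \
  'MEDIUM|settings.json:21|overly broad tool scope' \
  | rank_findings)

printf '%s\n' "$TOP"
```

Each record keeps its `file:line` citation through the pipeline, so the summary can still point back to the exact location of every finding.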
@@ -93,13 +93,13 @@ node ${QUALIA_BIN}/qualia-ui.js end "SECURED" "/qualia-hook-gen"
  1. **Static pass is non-negotiable.** It's fast and deterministic — always runs.
  2. **Opus pass is opt-in.** It costs tokens and time. Default to skipping unless the user explicitly asks for "deep audit" or the static pass triggers HIGH+ findings.
  3. **No fake severity.** Per `rules/grounding.md`, every finding cites `file:line` and matches a category in the Severity Rubric. No hedging.
- 4. **Recommend deterministic fixes when possible.** A rule in CLAUDE.md is suggestive; a hook is enforced. The skill's bias is toward `/qualia-hook-gen` over "tell the agent to do X."
+ 4. **Recommend deterministic fixes when possible.** A rule in CLAUDE.md is suggestive; a hook is enforced. The skill's bias is toward proposing a deterministic hook (under `hooks/`) over "tell the agent to do X."
  5. **Never auto-rotate secrets.** Flag and instruct. The user rotates manually with confirmation — secrets in CI variables are the user's domain.
 
  ## When NOT to use
 
  - Application-level security review (use `/security-review` for OWASP-style code audit).
- - Production deployment health (use `/qualia-doctor` / `/qualia-status`).
- - Specific bug investigation (use `/qualia-debug` → `/qualia-fix`).
+ - Production deployment health (use `/qualia-doctor`).
+ - Specific bug investigation (use `/qualia-fix`).
 
  `/qualia-secure` is specifically for **the agent's configuration**. The hooks, the rules, the tool scopes, the MCP servers — the surfaces Claude reads to decide what to do.
@@ -12,10 +12,10 @@ file first**, then jump to the specific file(s) that match.
  |--------------------------|-------|
  | "How do we usually X?" / patterns we've used before | `learned-patterns.md` |
  | Recurring bug + fix recipes | `common-fixes.md` |
- | Supabase auth, RLS, migrations, edge functions | `supabase-patterns.md` |
- | Retell, ElevenLabs, voice agent flows | `voice-agent-patterns.md` |
- | Where a project is deployed, env vars, domains | `deployment-map.md` |
- | Who is on the team, their role, their access | `employees.md` |
+ | Supabase auth, RLS, migrations, edge functions | `supabase-patterns.md` *(created on first /qualia-learn)* |
+ | Retell, ElevenLabs, voice agent flows | `voice-agent-patterns.md` *(created on first /qualia-learn)* |
+ | Where a project is deployed, env vars, domains | `deployment-map.md` *(created on first /qualia-learn)* |
+ | Who is on the team, their role, their access | `employees.md` *(created on first /qualia-learn)* |
  | What I worked on yesterday / last week | `daily-log/YYYY-MM-DD.md` |
  | Memory layer architecture itself | `agents.md` |
 
package/tests/bin.test.sh CHANGED
@@ -733,11 +733,11 @@ else
  fail_case "qualia-flush retirement/install state"
  fi
 
- # 62. CLAUDE_AGENT_FORK_ENABLED=1 in settings.json
- if grep -q '"CLAUDE_AGENT_FORK_ENABLED": "1"' "$TMP/.claude/settings.json"; then
- pass "settings.env CLAUDE_AGENT_FORK_ENABLED=1 (forked subagents on by default)"
+ # 62. CLAUDE_CODE_FORK_SUBAGENT=1 in settings.json (official env var, Claude Code v2.1.117+)
+ if grep -q '"CLAUDE_CODE_FORK_SUBAGENT": "1"' "$TMP/.claude/settings.json"; then
+ pass "settings.env CLAUDE_CODE_FORK_SUBAGENT=1 (forked subagents on by default)"
  else
- fail_case "CLAUDE_AGENT_FORK_ENABLED not set"
+ fail_case "CLAUDE_CODE_FORK_SUBAGENT not set"
  fi
 
  # 63. research-synthesizer agent has model: haiku frontmatter
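The renamed check in test 62 above can be tried in isolation. A minimal sketch, assuming the variable lives under an `env` object in `settings.json` (suggested by the `settings.env` wording of the pass message, but not shown in this diff); the temporary file and the `check_fork_env` helper are hypothetical and exist only for this example.

```bash
# Hypothetical stand-in for $TMP/.claude/settings.json.
SETTINGS=$(mktemp)
printf '{\n  "env": {\n    "CLAUDE_CODE_FORK_SUBAGENT": "1"\n  }\n}\n' > "$SETTINGS"

# Same literal match as the test: quoted key, colon, one space, quoted "1".
check_fork_env() {
  if grep -q '"CLAUDE_CODE_FORK_SUBAGENT": "1"' "$1"; then
    echo "pass"
  else
    echo "fail"
  fi
}

RESULT=$(check_fork_env "$SETTINGS")
echo "$RESULT"
rm -f "$SETTINGS"
```

Because the pattern is a literal string, it is whitespace-sensitive: a minified file containing `"CLAUDE_CODE_FORK_SUBAGENT":"1"` would not match, and a settings file that still carries only the old `CLAUDE_AGENT_FORK_ENABLED` key fails the check.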
package/tests/lib.test.sh CHANGED
@@ -507,7 +507,7 @@ TMP=$(mktmp)
  mkdir -p "$TMP/home/.claude/bin" "$TMP/home/.claude/hooks" "$TMP/home/.claude/knowledge/daily-log" "$TMP/home/.claude/qualia-design" "$TMP/home/.claude/agents" "$TMP/home/.claude/qualia-templates" "$TMP/project"
  echo '{"installed_by":"Test","role":"OWNER","version":"6.3.0","erp":{"enabled":false}}' > "$TMP/home/.claude/.qualia-config.json"
  touch "$TMP/home/.claude/CLAUDE.md" "$TMP/home/.claude/settings.json"
- for f in runtime-manifest.js command-surface.js host-adapters.js state.js qualia-ui.js statusline.js knowledge.js knowledge-flush.js state-ledger.js plan-contract.js contract-runner.js harness-eval.js trust-score.js agent-runs.js slop-detect.mjs erp-retry.js work-packet.js report-payload.js project-snapshot.js codex-goal.js planning-hygiene.js prune-deprecated.js learning-candidates.js status-snapshot.js security-scan.js; do
+ for f in runtime-manifest.js command-surface.js host-adapters.js state.js qualia-ui.js statusline.js knowledge.js knowledge-flush.js state-ledger.js plan-contract.js contract-runner.js harness-eval.js trust-score.js agent-runs.js slop-detect.mjs erp-retry.js work-packet.js report-payload.js project-snapshot.js codex-goal.js planning-hygiene.js prune-deprecated.js learning-candidates.js status-snapshot.js security-scan.js auto-report.js; do
  touch "$TMP/home/.claude/bin/$f"
  done
  for h in session-start.js auto-update.js branch-guard.js pre-push.js pre-deploy-gate.js migration-guard.js git-guardrails.js stop-session-log.js fawzi-approval-guard.js vercel-account-guard.js env-empty-guard.js supabase-destructive-guard.js; do
@@ -622,7 +622,7 @@ TMP=$(mktmp)
  mkdir -p "$TMP/.claude/bin" "$TMP/.claude/hooks" "$TMP/.claude/knowledge/daily-log" "$TMP/.claude/qualia-design" "$TMP/.claude/agents" "$TMP/.claude/qualia-templates" "$TMP/project/.planning"
  echo '{"installed_by":"Test","role":"OWNER","erp":{"enabled":false}}' > "$TMP/.claude/.qualia-config.json"
  touch "$TMP/.claude/CLAUDE.md" "$TMP/.claude/settings.json"
- for f in runtime-manifest.js command-surface.js host-adapters.js state.js qualia-ui.js statusline.js knowledge.js knowledge-flush.js state-ledger.js plan-contract.js contract-runner.js harness-eval.js trust-score.js agent-runs.js slop-detect.mjs erp-retry.js work-packet.js report-payload.js project-snapshot.js codex-goal.js planning-hygiene.js prune-deprecated.js learning-candidates.js status-snapshot.js security-scan.js; do
+ for f in runtime-manifest.js command-surface.js host-adapters.js state.js qualia-ui.js statusline.js knowledge.js knowledge-flush.js state-ledger.js plan-contract.js contract-runner.js harness-eval.js trust-score.js agent-runs.js slop-detect.mjs erp-retry.js work-packet.js report-payload.js project-snapshot.js codex-goal.js planning-hygiene.js prune-deprecated.js learning-candidates.js status-snapshot.js security-scan.js auto-report.js; do
  touch "$TMP/.claude/bin/$f"
  done
  for h in session-start.js auto-update.js branch-guard.js pre-push.js pre-deploy-gate.js migration-guard.js git-guardrails.js stop-session-log.js fawzi-approval-guard.js vercel-account-guard.js env-empty-guard.js supabase-destructive-guard.js; do