qualia-framework 4.5.0 → 5.3.0
- package/AGENTS.md +24 -0
- package/CLAUDE.md +12 -75
- package/README.md +23 -16
- package/agents/builder.md +9 -21
- package/agents/planner.md +8 -0
- package/agents/verifier.md +8 -0
- package/agents/visual-evaluator.md +132 -0
- package/bin/cli.js +54 -18
- package/bin/install.js +369 -29
- package/bin/qualia-ui.js +208 -1
- package/bin/slop-detect.mjs +5 -0
- package/bin/state.js +34 -1
- package/docs/install-redesign-builder-prompt.md +290 -0
- package/docs/install-redesign-pilot.md +234 -0
- package/docs/playwright-loop-builder-prompt.md +185 -0
- package/docs/playwright-loop-design-notes.md +108 -0
- package/docs/playwright-loop-pilot-results.md +170 -0
- package/docs/playwright-loop-tester-prompt.md +213 -0
- package/docs/polish-loop-supervised-run.md +111 -0
- package/docs/reviews/matt-pocock-skills-analysis.md +300 -0
- package/guide.md +9 -5
- package/hooks/env-empty-guard.js +74 -0
- package/hooks/pre-compact.js +19 -9
- package/hooks/pre-deploy-gate.js +8 -2
- package/hooks/pre-push.js +26 -12
- package/hooks/supabase-destructive-guard.js +62 -0
- package/hooks/vercel-account-guard.js +91 -0
- package/package.json +2 -1
- package/rules/design-brand.md +4 -0
- package/rules/design-laws.md +4 -0
- package/rules/design-product.md +4 -0
- package/rules/design-rubric.md +4 -0
- package/rules/grounding.md +4 -0
- package/skills/qualia-build/SKILL.md +40 -46
- package/skills/qualia-discuss/SKILL.md +51 -68
- package/skills/qualia-handoff/SKILL.md +1 -0
- package/skills/qualia-hook-gen/SKILL.md +206 -0
- package/skills/qualia-issues/SKILL.md +151 -0
- package/skills/qualia-map/SKILL.md +78 -35
- package/skills/qualia-new/REFERENCE.md +139 -0
- package/skills/qualia-new/SKILL.md +45 -121
- package/skills/qualia-optimize/REFERENCE.md +265 -0
- package/skills/qualia-optimize/SKILL.md +92 -232
- package/skills/qualia-plan/SKILL.md +58 -65
- package/skills/qualia-polish-loop/REFERENCE.md +265 -0
- package/skills/qualia-polish-loop/SKILL.md +201 -0
- package/skills/qualia-polish-loop/fixtures/broken.html +117 -0
- package/skills/qualia-polish-loop/fixtures/clean.html +196 -0
- package/skills/qualia-polish-loop/scripts/loop.mjs +323 -0
- package/skills/qualia-polish-loop/scripts/playwright-capture.mjs +206 -0
- package/skills/qualia-polish-loop/scripts/score.mjs +176 -0
- package/skills/qualia-prd/SKILL.md +199 -0
- package/skills/qualia-report/SKILL.md +141 -200
- package/skills/qualia-research/SKILL.md +28 -33
- package/skills/qualia-road/SKILL.md +103 -0
- package/skills/qualia-ship/SKILL.md +1 -0
- package/skills/qualia-task/SKILL.md +1 -1
- package/skills/qualia-test/SKILL.md +50 -2
- package/skills/qualia-triage/SKILL.md +152 -0
- package/skills/qualia-verify/SKILL.md +63 -104
- package/skills/qualia-zoom/SKILL.md +51 -0
- package/skills/zoho-workflow/SKILL.md +1 -1
- package/templates/CONTEXT.md +36 -0
- package/templates/decisions/ADR-template.md +30 -0
- package/tests/bin.test.sh +598 -7
- package/tests/state.test.sh +58 -0
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: qualia-plan
|
|
3
|
-
description: "
|
|
3
|
+
description: "Plans the current phase by spawning a planner agent to break it into executable tasks with waves, then validates via a plan-checker revision loop (max 2 cycles). Supports gap-closure mode for verification failures. Use when the user says 'plan this phase', 'break this into tasks', 'create the plan', 'qualia-plan', 'plan phase 2', or after /qualia-new sets up the journey."
|
|
4
4
|
allowed-tools:
|
|
5
5
|
- Bash
|
|
6
6
|
- Read
|
|
@@ -15,15 +15,15 @@ allowed-tools:
|
|
|
15
15
|
|
|
16
16
|
# /qualia-plan — Plan a Phase
|
|
17
17
|
|
|
18
|
-
Spawn
|
|
18
|
+
Spawn planner to break phase into tasks, validate with checker (max 2 revision cycles), route to build.
|
|
19
19
|
|
|
20
20
|
## Usage
|
|
21
21
|
|
|
22
|
-
`/qualia-plan` — plan
|
|
22
|
+
`/qualia-plan` — plan next unplanned phase
|
|
23
23
|
`/qualia-plan {N}` — plan specific phase N
|
|
24
24
|
`/qualia-plan {N} --gaps` — plan fixes for verification failures
|
|
25
|
-
`/qualia-plan {N} --skip-check` — skip
|
|
26
|
-
`/qualia-plan {N} --auto` — plan +
|
|
25
|
+
`/qualia-plan {N} --skip-check` — skip plan-checker loop (not recommended)
|
|
26
|
+
`/qualia-plan {N} --auto` — plan + chain into `/qualia-build {N} --auto` (no human gate)
|
|
27
27
|
|
|
28
28
|
## Process
|
|
29
29
|
|
|
@@ -38,9 +38,9 @@ node ~/.claude/bin/knowledge.js load patterns
|
|
|
38
38
|
node ~/.claude/bin/knowledge.js load client
|
|
39
39
|
```
|
|
40
40
|
|
|
41
|
-
|
|
41
|
+
No phase number → current phase from STATE.md.
|
|
42
42
|
|
|
43
|
-
**
|
|
43
|
+
**Phase-specific context (if exists):**
|
|
44
44
|
```bash
|
|
45
45
|
cat .planning/phase-{N}-context.md 2>/dev/null # from /qualia-discuss
|
|
46
46
|
cat .planning/phase-{N}-research.md 2>/dev/null # from /qualia-research
|
|
@@ -48,19 +48,17 @@ cat .planning/phase-{N}-research.md 2>/dev/null # from /qualia-research
|
|
|
48
48
|
|
|
49
49
|
### 2. Optional: Suggest Deeper Prep
|
|
50
50
|
|
|
51
|
-
**If ROADMAP.md
|
|
51
|
+
**If ROADMAP.md flagged phase for research AND no phase-{N}-research.md:**
|
|
52
52
|
|
|
53
53
|
- header: "Research first?"
|
|
54
|
-
- question: "
|
|
54
|
+
- question: "Phase flagged for research. Run /qualia-research {N} first?"
|
|
55
55
|
- options:
|
|
56
|
-
- "Yes, research first"
|
|
57
|
-
- "Skip, plan directly"
|
|
56
|
+
- "Yes, research first"
|
|
57
|
+
- "Skip, plan directly"
|
|
58
58
|
|
|
59
|
-
**If phase
|
|
59
|
+
**If phase has compliance/regulatory/architectural stakes AND no phase-{N}-context.md:**
|
|
60
60
|
|
|
61
|
-
|
|
62
|
-
|
|
63
|
-
Don't force it. Some phases don't need it.
|
|
61
|
+
Suggest: *"Run /qualia-discuss {N} first to lock decisions? Optional."*
|
|
64
62
|
|
|
65
63
|
### 3. Spawn Planner (Fresh Context)
|
|
66
64
|
|
|
@@ -69,12 +67,9 @@ node ~/.claude/bin/qualia-ui.js banner plan {N} "{phase name from ROADMAP.md}"
|
|
|
69
67
|
node ~/.claude/bin/qualia-ui.js spawn planner "Breaking phase into tasks..."
|
|
70
68
|
```
|
|
71
69
|
|
|
72
|
-
Spawn the planner:
|
|
73
|
-
|
|
74
70
|
```
|
|
75
71
|
Agent(prompt="
|
|
76
|
-
|
|
77
|
-
Grounding + rubrics: @~/.claude/rules/grounding.md
|
|
72
|
+
Role: @~/.claude/agents/planner.md
|
|
78
73
|
|
|
79
74
|
<project_context>
|
|
80
75
|
@.planning/PROJECT.md
|
|
@@ -89,86 +84,84 @@ Phase {N} from ROADMAP.md:
|
|
|
89
84
|
@.planning/ROADMAP.md
|
|
90
85
|
|
|
91
86
|
Goal: {goal from ROADMAP.md}
|
|
92
|
-
|
|
87
|
+
Reqs: {REQ-IDs from ROADMAP.md}
|
|
93
88
|
Success criteria: {success criteria from ROADMAP.md}
|
|
94
89
|
</phase_details>
|
|
95
90
|
|
|
96
91
|
<locked_decisions>
|
|
97
|
-
{
|
|
92
|
+
{phase-{N}-context.md Locked Decisions if exists, else 'none'}
|
|
98
93
|
</locked_decisions>
|
|
99
94
|
|
|
100
95
|
<research_findings>
|
|
101
|
-
{
|
|
96
|
+
{phase-{N}-research.md recommendation if exists, else 'none'}
|
|
102
97
|
</research_findings>
|
|
103
98
|
|
|
104
|
-
{
|
|
99
|
+
{--gaps → read @.planning/phase-{N}-verification.md failures. Create gap-closure plan.}
|
|
105
100
|
|
|
106
101
|
<relevant_learnings>
|
|
107
|
-
{
|
|
102
|
+
{applicable patterns from knowledge/learned-patterns.md}
|
|
108
103
|
</relevant_learnings>
|
|
109
104
|
|
|
110
|
-
|
|
105
|
+
Output: .planning/phase-{N}-plan.md (or phase-{N}-gaps-plan.md for --gaps).
|
|
111
106
|
", subagent_type="qualia-planner", description="Plan phase {N}")
|
|
112
107
|
```
|
|
113
108
|
|
|
114
|
-
### 4. Validate
|
|
109
|
+
### 4. Validate Plan (unless --skip-check)
|
|
115
110
|
|
|
116
|
-
Read
|
|
111
|
+
Read generated plan. Spawn checker:
|
|
117
112
|
|
|
118
113
|
```
|
|
119
114
|
Agent(prompt="
|
|
120
|
-
|
|
121
|
-
Grounding + rubrics: @~/.claude/rules/grounding.md
|
|
115
|
+
Role: @~/.claude/agents/plan-checker.md
|
|
122
116
|
|
|
123
117
|
<plan_path>.planning/phase-{N}-plan.md</plan_path>
|
|
124
118
|
<phase_goal>{goal from ROADMAP.md}</phase_goal>
|
|
125
119
|
<success_criteria>{criteria from ROADMAP.md}</success_criteria>
|
|
126
120
|
<project_context>@.planning/PROJECT.md</project_context>
|
|
127
121
|
|
|
128
|
-
Validate against
|
|
122
|
+
Validate against 7 rules. Return PASS or REVISE with structured issues.
|
|
129
123
|
", subagent_type="qualia-plan-checker", description="Check plan phase {N}")
|
|
130
124
|
```
|
|
131
125
|
|
|
132
|
-
**Revision loop (max 2
|
|
126
|
+
**Revision loop (max 2):**
|
|
133
127
|
|
|
134
|
-
-
|
|
135
|
-
-
|
|
128
|
+
- Iter 1: Check → REVISE → re-spawn planner with checker issues
|
|
129
|
+
- Iter 2: Re-check → REVISE/BLOCKED → escalate to user
|
|
136
130
|
|
|
137
|
-
|
|
131
|
+
(74→86% after 1 round, 88% after 3. Iter 3 adds only 2pp; not worth extra spawn.)
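One way to read the two bullets above is as a bounded check/revise loop. The sketch below is illustrative only: `runPlanCheck`, `check`, and `revise` are hypothetical names standing in for the checker and planner spawns, and the reading that the second failed check escalates without another revision is an assumption drawn from the bullets.

```javascript
// Hypothetical sketch of the revision loop; not the skill's actual driver code.
// check():  returns { verdict: "PASS" | "REVISE", issues: [...] }  (stand-in for the plan-checker spawn)
// revise(): receives the checker issues                             (stand-in for the planner re-spawn)
function runPlanCheck(check, revise, maxCycles = 2) {
  for (let cycle = 1; cycle <= maxCycles; cycle++) {
    const result = check();
    if (result.verdict === "PASS") {
      return { outcome: "PASS", cycles: cycle };
    }
    if (cycle === maxCycles) {
      // Last allowed cycle still failed: hand the remaining issues to the user.
      return { outcome: "ESCALATE", cycles: cycle, issues: result.issues };
    }
    revise(result.issues);
  }
}
```

With `maxCycles = 2` this performs at most two checks and one revision before escalating.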
|
|
138
132
|
|
|
139
|
-
|
|
133
|
+
Per revision:
|
|
140
134
|
|
|
141
135
|
```
|
|
142
136
|
Agent(prompt="
|
|
143
|
-
|
|
137
|
+
Role: @~/.claude/agents/planner.md
|
|
144
138
|
|
|
145
139
|
<revision_mode>true</revision_mode>
|
|
146
140
|
<current_plan>@.planning/phase-{N}-plan.md</current_plan>
|
|
147
141
|
<checker_feedback>
|
|
148
|
-
{
|
|
142
|
+
{REVISE output from plan-checker}
|
|
149
143
|
</checker_feedback>
|
|
150
144
|
|
|
151
|
-
Revise
|
|
152
|
-
— only fix what the checker flagged.
|
|
145
|
+
Revise in place. Address every issue. Do NOT add tasks or change scope; fix only what checker flagged.
|
|
153
146
|
", subagent_type="qualia-planner", description="Revise plan phase {N}")
|
|
154
147
|
```
|
|
155
148
|
|
|
156
|
-
After revision, spawn
|
|
149
|
+
After revision, re-spawn checker. Max 2 total cycles.
|
|
157
150
|
|
|
158
|
-
**
|
|
151
|
+
**BLOCKED after 2 cycles:**
|
|
159
152
|
|
|
160
153
|
```bash
|
|
161
154
|
node ~/.claude/bin/qualia-ui.js fail "Plan failed validation after 2 revisions"
|
|
162
155
|
```
|
|
163
156
|
|
|
164
|
-
Show
|
|
165
|
-
- "Skip validation
|
|
166
|
-
- "Adjust
|
|
167
|
-
- "Adjust
|
|
157
|
+
Show remaining issues. Options:
|
|
158
|
+
- "Skip validation" (`--skip-check`)
|
|
159
|
+
- "Adjust roadmap" (scope wrong)
|
|
160
|
+
- "Adjust phase goal" (criteria under-specified)
|
|
168
161
|
|
|
169
162
|
### 5. Present Final Plan
|
|
170
163
|
|
|
171
|
-
Render
|
|
164
|
+
Render story-file dashboard:
|
|
172
165
|
|
|
173
166
|
```bash
|
|
174
167
|
node ~/.claude/bin/qualia-ui.js plan-summary .planning/phase-{N}-plan.md
|
|
@@ -184,19 +177,19 @@ If "adjust" — get feedback, re-spawn planner with revision context, re-validat
|
|
|
184
177
|
node ~/.claude/bin/state.js transition --to planned --phase {N}
|
|
185
178
|
```
|
|
186
179
|
|
|
187
|
-
|
|
180
|
+
Error → show, stop. Do NOT edit STATE.md or tracking.json manually.
|
|
188
181
|
|
|
189
182
|
### 7. Route (auto-chain aware)
|
|
190
183
|
|
|
191
|
-
|
|
184
|
+
**`--auto`:** invoke `/qualia-build {N} --auto` inline. User approved at `/qualia-new`; no per-phase gate.
|
|
192
185
|
|
|
193
186
|
```bash
|
|
194
187
|
node ~/.claude/bin/qualia-ui.js info "Auto mode — chaining into /qualia-build {N}"
|
|
195
188
|
```
|
|
196
189
|
|
|
197
|
-
Then invoke
|
|
190
|
+
Then invoke `qualia-build` inline with `--auto`.
|
|
198
191
|
|
|
199
|
-
**
|
|
192
|
+
**Guided mode:** stop, show next step:
|
|
200
193
|
|
|
201
194
|
```bash
|
|
202
195
|
node ~/.claude/bin/qualia-ui.js end "PHASE {N} PLANNED" "/qualia-build {N}"
|
|
@@ -204,22 +197,22 @@ node ~/.claude/bin/qualia-ui.js end "PHASE {N} PLANNED" "/qualia-build {N}"
|
|
|
204
197
|
|
|
205
198
|
## Gap Closure Mode (`--gaps`)
|
|
206
199
|
|
|
207
|
-
|
|
200
|
+
With `--gaps`, planner enters gap-closure mode:
|
|
208
201
|
|
|
209
|
-
1. Read `.planning/phase-{N}-verification.md`
|
|
210
|
-
2.
|
|
211
|
-
- **Files:** specific files that failed
|
|
212
|
-
- **Action:** specific fix (not "fix auth"
|
|
213
|
-
- **
|
|
214
|
-
3. Do NOT re-plan passing items.
|
|
215
|
-
4.
|
|
216
|
-
5. All gap tasks
|
|
217
|
-
6. Plan-checker still validates
|
|
202
|
+
1. Read `.planning/phase-{N}-verification.md` → extract ONLY FAIL items
|
|
203
|
+
2. Per FAIL, create targeted fix task:
|
|
204
|
+
- **Files:** specific files that failed
|
|
205
|
+
- **Action:** specific fix (not "fix auth"; "add session persistence check in src/lib/auth.ts signIn fn")
|
|
206
|
+
- **AC:** failed criterion restated as observable behavior
|
|
207
|
+
3. Do NOT re-plan passing items. No new features. Surgical only.
|
|
208
|
+
4. Output: `.planning/phase-{N}-gaps-plan.md`
|
|
209
|
+
5. All gap tasks Wave 1 (parallel) unless they share files
|
|
210
|
+
6. Plan-checker still validates; same 7 rules
|
|
218
211
|
|
|
219
212
|
## Rules
|
|
220
213
|
|
|
221
|
-
1. **Plan-checker
|
|
222
|
-
2. **Max 3 revision cycles.**
|
|
223
|
-
3. **Honor locked decisions.**
|
|
224
|
-
4. **One plan file per phase.**
|
|
225
|
-
5. **Revision is surgical.**
|
|
214
|
+
1. **Plan-checker mandatory by default.** Skip only with `--skip-check`.
|
|
215
|
+
2. **Max 2 revision cycles.** 2 fails → escalate; scope probably wrong.
|
|
216
|
+
3. **Honor locked decisions.** phase-{N}-context.md locked decisions non-negotiable.
|
|
217
|
+
4. **One plan file per phase.** No phase-1-plan-v2.md. Edit in place.
|
|
218
|
+
5. **Revision is surgical.** Fix only what checker flagged; no scope creep.
|
|
@@ -0,0 +1,265 @@
|
|
|
1
|
+
# REFERENCE — /qualia-polish-loop
|
|
2
|
+
|
|
3
|
+
Verbatim agent prompts and operational details. Loaded on demand by SKILL.md, not carried in the system prompt. Per progressive-disclosure discipline (Matt Pocock): the agent reads SKILL.md first, then this file when it needs the spawn templates.
|
|
4
|
+
|
|
5
|
+
## Architecture summary
|
|
6
|
+
|
|
7
|
+
```
|
|
8
|
+
SKILL.md driver (Claude session)
|
|
9
|
+
│
|
|
10
|
+
├─ scripts/playwright-capture.mjs (deterministic Node — produces PNGs)
|
|
11
|
+
│ ↓ writes /tmp/qpl-{ts}/iter-{N}/{mobile,tablet,desktop}-*.png
|
|
12
|
+
│
|
|
13
|
+
├─ Agent({subagent_type: "qualia-visual-evaluator", ...})
|
|
14
|
+
│ ↓ reads PNGs, returns single JSON envelope (eval.json)
|
|
15
|
+
│
|
|
16
|
+
├─ scripts/loop.mjs record (deterministic — verdict + fingerprints)
|
|
17
|
+
│ ↓ exit 0=SUCCESS, 1=CONTINUE, 3=KILLED
|
|
18
|
+
│
|
|
19
|
+
├─ Agent({subagent_type: "qualia-builder", ...}) × up to 3 in parallel
|
|
20
|
+
│ ↓ each fixes ONE issue, calls scripts/loop.mjs commit-fix
|
|
21
|
+
│
|
|
22
|
+
└─ scripts/loop.mjs report (final markdown report)
|
|
23
|
+
↓ writes .planning/visual-polish-loop.md
|
|
24
|
+
```
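The exit codes listed for `loop.mjs record` in the diagram above (0, 1, 3) could be consumed by the driver roughly as follows. This is a minimal sketch: `nextAction` and the action strings are hypothetical, and both the jump straight to the report on SUCCESS or KILLED and the handling of any other exit code as an error are assumptions.

```javascript
// Exit codes as listed in the architecture diagram for `loop.mjs record`.
const VERDICT_BY_EXIT_CODE = { 0: "SUCCESS", 1: "CONTINUE", 3: "KILLED" };

// Hypothetical helper: decide what the driver does after `record` exits.
function nextAction(exitCode) {
  const verdict = VERDICT_BY_EXIT_CODE[exitCode];
  if (verdict === "SUCCESS" || verdict === "KILLED") {
    return { verdict, action: "report" }; // assumed: loop ends, final report is written
  }
  if (verdict === "CONTINUE") {
    return { verdict, action: "spawn-fix-builders" }; // assumed: another fix round follows
  }
  return { verdict: "ERROR", action: "stop" }; // assumed handling for unlisted codes
}
```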
|
|
25
|
+
|
|
26
|
+
## Capture: backend selection
|
|
27
|
+
|
|
28
|
+
The capture script (`scripts/playwright-capture.mjs`) auto-selects in this order:
|
|
29
|
+
|
|
30
|
+
1. `import('playwright')` — preferred when available; gives deterministic `waitUntil: 'networkidle'`
|
|
31
|
+
2. `import('playwright-core')` — same API, lighter package
|
|
32
|
+
3. `~/.cache/ms-playwright/chromium-{version}/chrome-{linux64,linux,mac,win}/chrome` — if Playwright was ever installed for browsers but the package isn't import-resolvable
|
|
33
|
+
4. `which google-chrome` / `chromium` / `chromium-browser` / `chrome` — system browser fallback
|
|
34
|
+
|
|
35
|
+
For backends 3 and 4 (binary-direct), the script uses `--headless=new --screenshot --virtual-time-budget`. Less precise than Playwright's `networkidle` waiting but works without any npm dependency.
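The selection order above amounts to a first-match scan over four probes. The sketch below shows only that ordering logic, under the assumption that each probe has already been reduced to a boolean. The names `BACKEND_ORDER`, `selectBackend`, and the backend labels are hypothetical and do not come from `playwright-capture.mjs`.

```javascript
// Hypothetical labels for the four backends, in the documented priority order.
const BACKEND_ORDER = [
  "playwright",          // 1. import('playwright')
  "playwright-core",     // 2. import('playwright-core')
  "ms-playwright-cache", // 3. cached Chromium under ~/.cache/ms-playwright
  "system-browser",      // 4. google-chrome / chromium found on PATH
];

// probes: object mapping a backend label to true when that probe succeeded.
// Returns the first available backend, or null when all four fail.
function selectBackend(probes) {
  for (const name of BACKEND_ORDER) {
    if (probes[name]) {
      // Backends 3 and 4 drive the browser binary directly instead of using the library API.
      const mode = name.startsWith("playwright") ? "library" : "binary-direct";
      return { name, mode };
    }
  }
  return null;
}
```

A `null` result corresponds to the case where the setup hints below apply.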
|
|
36
|
+
|
|
37
|
+
Setup hints if all four fail:
|
|
38
|
+
|
|
39
|
+
```bash
|
|
40
|
+
# Option A — Playwright (best stability)
|
|
41
|
+
npm i -D playwright && npx playwright install chromium
|
|
42
|
+
|
|
43
|
+
# Option B — system Chrome (fastest setup if you already have Chrome installed)
|
|
44
|
+
# (no action needed if google-chrome is on PATH)
|
|
45
|
+
```
|
|
46
|
+
|
|
47
|
+
## Vision-evaluator spawn template (VERBATIM)
|
|
48
|
+
|
|
49
|
+
The vision evaluator's anchored discipline: **DEFAULT TO 3.** Only score above 3 with a cited design principle. Only score below 3 with a quoted violation. Without anchoring, vision models return "looks great!" to everything — that failure mode is the entire reason this loop exists. The full rubric criteria live in `agents/visual-evaluator.md`; this section is the spawn template.
|
|
50
|
+
|
|
51
|
+
When the loop reaches step 2 (Evaluate), spawn ONE agent with the screenshots, brief, rubric, and previous-iteration context. Inline this prompt verbatim — do not paraphrase.
|
|
52
|
+
|
|
53
|
+
```
|
|
54
|
+
Agent({
|
|
55
|
+
subagent_type: "qualia-visual-evaluator",
|
|
56
|
+
description: "Score iteration {N} screenshots against rubric",
|
|
57
|
+
prompt: `
|
|
58
|
+
Role: @~/.claude/agents/visual-evaluator.md
|
|
59
|
+
|
|
60
|
+
<rubric>
|
|
61
|
+
{INLINE rules/design-rubric.md §"The 8 dimensions" through §"Aggregate score"}
|
|
62
|
+
</rubric>
|
|
63
|
+
|
|
64
|
+
<brief>
|
|
65
|
+
{INLINE the relevant excerpt from .planning/DESIGN.md — sections "Direction", "Color", "Typography"}
|
|
66
|
+
</brief>
|
|
67
|
+
|
|
68
|
+
<product>
|
|
69
|
+
{INLINE the relevant excerpt from .planning/PRODUCT.md — register, voice, anti-references}
|
|
70
|
+
</product>
|
|
71
|
+
|
|
72
|
+
<screenshots>
|
|
73
|
+
- mobile (375px): /tmp/qpl-{ts}/iter-{N}/mobile-375.png
|
|
74
|
+
- tablet (768px): /tmp/qpl-{ts}/iter-{N}/tablet-768.png
|
|
75
|
+
- desktop (1440px): /tmp/qpl-{ts}/iter-{N}/desktop-1440.png
|
|
76
|
+
</screenshots>
|
|
77
|
+
|
|
78
|
+
<viewport_meta>
|
|
79
|
+
{ "reduced_motion": {true|false}, "viewport_widths": [375, 768, 1440] }
|
|
80
|
+
</viewport_meta>
|
|
81
|
+
|
|
82
|
+
<previous_iteration>
|
|
83
|
+
{If N > 1, INLINE eval.json.top_issues from iter-{N-1} so the evaluator can verify regression vs improvement. Otherwise: "(first iteration — no prior data)"}
|
|
84
|
+
</previous_iteration>
|
|
85
|
+
|
|
86
|
+
<task>
|
|
87
|
+
This is iteration {N} of {max}. Read each screenshot. Score every dimension 1-5 with one-line evidence per dimension per viewport. Return a single fenced JSON block per the contract in your role file. No prose outside the JSON.
|
|
88
|
+
</task>
|
|
89
|
+
`
|
|
90
|
+
})
|
|
91
|
+
```
|
|
92
|
+
|
|
93
|
+
The evaluator's role file (`agents/visual-evaluator.md`) carries the trust-boundary block, the calibration examples, and the JSON output contract. Together with this spawn template, the prompt prefix is stable across iterations — Anthropic prompt caching reuses the role + rubric + brief prefix, so the per-iteration cost is roughly: 3 image reads + the previous-iteration delta.
|
|
94
|
+
|
|
95
|
+
## Fix-builder spawn template (VERBATIM)
|
|
96
|
+
|
|
97
|
+
When the loop has 1-3 issues to fix, spawn one builder per issue IN THE SAME RESPONSE TURN (parallel). Each fixes one dimension, narrowly.
|
|
98
|
+
|
|
99
|
+
```
|
|
100
|
+
Agent({
|
|
101
|
+
subagent_type: "qualia-builder",
|
|
102
|
+
description: "Fix {dim} issue: {short description}",
|
|
103
|
+
prompt: `
|
|
104
|
+
Role: @~/.claude/agents/builder.md
|
|
105
|
+
|
|
106
|
+
<phase_context>
|
|
107
|
+
You are inside /qualia-polish-loop iteration {N}. The vision evaluator scored
|
|
108
|
+
the {dim} dimension at {score}. Your single task: fix that one dimension.
|
|
109
|
+
|
|
110
|
+
<design>
|
|
111
|
+
{INLINE .planning/DESIGN.md tokens relevant to {dim}}
|
|
112
|
+
</design>
|
|
113
|
+
|
|
114
|
+
<product>
|
|
115
|
+
{INLINE .planning/PRODUCT.md voice + register}
|
|
116
|
+
</product>
|
|
117
|
+
</phase_context>
|
|
118
|
+
|
|
119
|
+
<task_context>
|
|
120
|
+
# Issue
|
|
121
|
+
- Dimension: {dim}
|
|
122
|
+
- Severity: {severity}
|
|
123
|
+
- Description: {description}
|
|
124
|
+
- Likely file: {likely_file or "(infer from grep — start at the path the screenshot suggests)"}
|
|
125
|
+
- Recommended fix: {fix}
|
|
126
|
+
|
|
127
|
+
# Files probably affected
|
|
128
|
+
{1-3 candidate paths the loop has inferred from the URL routing}
|
|
129
|
+
</task_context>
|
|
130
|
+
|
|
131
|
+
<task>
|
|
132
|
+
1. Read the likely file. If the issue is in a different file, follow the import graph until you find the source.
|
|
133
|
+
2. Make the MINIMUM edit to fix this one dimension. Do not refactor. Do not change logic. Do not touch state management. Do not change copy unless this is a microcopy issue.
|
|
134
|
+
3. Use design tokens from DESIGN.md. Do not invent new color values, font names, or spacing.
|
|
135
|
+
4. After the edit, commit via the orchestrator (slop-detect-gated):
|
|
136
|
+
node ~/.claude/skills/qualia-polish-loop/scripts/loop.mjs commit-fix --state {STATE} --file {file} --slug {dim}-{short-keyword}
|
|
137
|
+
If slop-detect blocks (exit 2), READ the slop output and re-edit. If you cannot fix without violating slop-detect, return BLOCKED with the conflict.
|
|
138
|
+
5. Return DONE with: file modified, lines changed, slop-detect: pass, commit: {sha}.
|
|
139
|
+
</task>
|
|
140
|
+
|
|
141
|
+
<rules>
|
|
142
|
+
- Vision says: {evidence from eval.json.viewport_results[].evidence[{dim}]}
|
|
143
|
+
- Do not add features.
|
|
144
|
+
- Do not write tests for this fix (the loop's next iteration is the test).
|
|
145
|
+
- Single commit. The orchestrator handles the slug + iteration prefix.
|
|
146
|
+
</rules>
|
|
147
|
+
`
|
|
148
|
+
})
|
|
149
|
+
```
|
|
150
|
+
|
|
151
|
+
## Iteration log entry (what `loop.mjs record` writes to state.json.iterations[])
|
|
152
|
+
|
|
153
|
+
```json
|
|
154
|
+
{
|
|
155
|
+
"iteration": 1,
|
|
156
|
+
"scores": { "typography": 1, "color": 1, "spatial": 3, "layout": 1, "shadow": 3, "motion": 3, "microcopy": 1, "container": 1 },
|
|
157
|
+
"aggregate": 14,
|
|
158
|
+
"pass": false,
|
|
159
|
+
"failing_dims": ["typography", "color", "layout", "microcopy", "container"],
|
|
160
|
+
"top_issues": [
|
|
161
|
+
{ "dim": "color", "severity": "critical", "description": "blue→purple gradient on hero", "likely_file": "src/styles/globals.css", "fix": "replace linear-gradient with single accent var(--accent)" },
|
|
162
|
+
{ "dim": "typography", "severity": "critical", "description": "Inter as primary font-family", "likely_file": "src/styles/globals.css", "fix": "swap to Fraunces + JetBrains Mono per DESIGN.md §3" },
|
|
163
|
+
{ "dim": "layout", "severity": "high", "description": "three identical feature cards in section 2", "likely_file": "src/pages/index.tsx", "fix": "vary card sizes per design-brand.md §Layout" }
|
|
164
|
+
],
|
|
165
|
+
"tokens_used": 14500,
|
|
166
|
+
"timestamp": "2026-05-03T12:34:56.000Z"
|
|
167
|
+
}
|
|
168
|
+
```
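The derived fields in the entry above (`aggregate`, `pass`, `failing_dims`) can be reproduced from `scores` alone. This is a sketch, not the code in `loop.mjs`: `summarizeScores` is a hypothetical name, and the pass threshold of 3 is an assumption inferred from the sample, where dimensions scored 3 are not listed as failing.

```javascript
// Hypothetical helper: derive the summary fields of an iteration entry from its scores.
// threshold = 3 is an assumption inferred from the sample entry, not a documented constant.
function summarizeScores(scores, threshold = 3) {
  const failing_dims = Object.entries(scores)
    .filter(([, score]) => score < threshold)
    .map(([dim]) => dim);
  const aggregate = Object.values(scores).reduce((sum, score) => sum + score, 0);
  return { aggregate, pass: failing_dims.length === 0, failing_dims };
}
```

Applied to the sample scores, this yields an aggregate of 14 and the same five failing dimensions.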
|
|
169
|
+
|
|
170
|
+
## Issue fingerprint (regression detection)
|
|
171
|
+
|
|
172
|
+
The orchestrator computes a fingerprint per top_issue for each iteration:
|
|
173
|
+
|
|
174
|
+
```
|
|
175
|
+
fingerprint = `${dim}__${path.basename(likely_file)}__${first_32_chars_of_description}`
|
|
176
|
+
.toLowerCase().replace(/\W+/g, "_")
|
|
177
|
+
```
|
|
178
|
+
|
|
179
|
+
State stores `state.fingerprints[fingerprint] = { iterations: [1,2,3], description, dim }`. The KILL trigger is **3 consecutive integer iterations in `iterations[]`** — non-consecutive recurrences don't kill (the issue may have been fixed, broken by a different change, then refixed; that's a different signal than "fix-builder cannot fix this").
|
|
180
|
+
|
|
181
|
+
When the kill trigger fires, the verdict becomes `killed_regression` and `state.kill_fingerprint` records which one. The user can `cat state.json | jq '.fingerprints | to_entries | map(select(.key == "{fingerprint}"))'` to see the recurrence pattern.
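The fingerprint formula and the consecutive-iteration rule can be sketched in a few lines. The function names below are hypothetical and the real `loop.mjs` may differ. In particular, the basename is taken with a plain split rather than `path.basename`, to keep the example free of imports.

```javascript
// Hypothetical sketch of the fingerprint formula described above.
function issueFingerprint(issue) {
  // Stand-in for path.basename(): keep the last path segment.
  const base = String(issue.likely_file || "").split(/[\\/]/).pop();
  return `${issue.dim}__${base}__${issue.description.slice(0, 32)}`
    .toLowerCase()
    .replace(/\W+/g, "_");
}

// True when `iterations` contains a run of `runLength` consecutive integers.
function hasConsecutiveRun(iterations, runLength = 3) {
  const sorted = [...new Set(iterations)].sort((a, b) => a - b);
  let run = 1;
  for (let i = 1; i < sorted.length; i++) {
    run = sorted[i] === sorted[i - 1] + 1 ? run + 1 : 1;
    if (run >= runLength) return true;
  }
  return runLength <= 1 && sorted.length > 0;
}
```

Under this sketch, an issue seen at iterations `[1, 2, 4]` does not trip the kill switch, while `[2, 3, 4]` does.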
|
|
182
|
+
|
|
183
|
+
## Token-budget table
|
|
184
|
+
|
|
185
|
+
| Iterations | Tokens (est.) | Sized for |
|
|
186
|
+
|---|---|---|
|
|
187
|
+
| 2 | ~30K | known-clean page sanity-check |
|
|
188
|
+
| 4 | ~60K | mid-confidence |
|
|
189
|
+
| 6 | ~90K | default |
|
|
190
|
+
| 8 | ~120K | hard cap; pass `--budget 150000` to allow |
|
|
191
|
+
|
|
192
|
+
Per-iteration cost (rough):
|
|
193
|
+
- 3 screenshot reads ≈ 9K
|
|
194
|
+
- rubric + brief inlined ≈ 2K (cached after iter 1)
|
|
195
|
+
- previous-iteration delta ≈ 0.5K
|
|
196
|
+
- 3 fix-builder spawns × (file read + edit + commit-fix call) ≈ 3K
|
|
197
|
+
- **per-iteration ≈ 14.5K**
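The per-iteration figure is the sum of the four line items above, and the table rows are that sum times the iteration count, rounded. The sketch below is arithmetic only: the constant and function names are hypothetical, prompt-cache savings are ignored, and the 100000 default budget is an assumption taken from the sample report elsewhere in this file.

```javascript
// Rough per-iteration costs, in tokens, taken from the list above.
const PER_ITERATION_TOKENS = {
  screenshotReads: 9000,
  rubricAndBrief: 2000,
  previousIterationDelta: 500,
  fixBuilderSpawns: 3000,
};

// Hypothetical helper: estimated tokens for a run of `iterations` iterations.
function estimateTokens(iterations) {
  const perIteration = Object.values(PER_ITERATION_TOKENS).reduce((a, b) => a + b, 0);
  return perIteration * iterations;
}

// Hypothetical helper: does the estimate fit a budget? (100000 default is an assumption.)
function fitsBudget(iterations, budget = 100000) {
  return estimateTokens(iterations) <= budget;
}
```

Under these numbers 6 iterations fit the assumed default budget and 8 do not, which is consistent with the table's note that the 8-iteration cap needs `--budget 150000`.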
|
|
198
|
+
|
|
199
|
+
## Self-test scenarios (mapping to spec)
|
|
200
|
+
|
|
201
|
+
| # | Fixture | Expected | Verifier |
|
|
202
|
+
|---|---|---|---|
|
|
203
|
+
| 1 | `fixtures/clean.html` | SUCCESS in 1-2 iterations, all dims ≥ 4 | run capture, run evaluator inline, assert pass |
|
|
204
|
+
| 2 | `fixtures/broken.html` | SUCCESS in 4-6 iters; identifies banned font + gradient + 3-card grid + side-stripe + generic CTA | each fix-builder commits a `qpl-N:` change; final eval all dims ≥ 3 |
|
|
205
|
+
| 3 | Kill-switch | KILL at iter ≤ 4 with `LOOP_REGRESSION_DETECTED` | call `loop.mjs record` 3× with the same fingerprint; assert exit 3 + correct verdict |
|
|
206
|
+
|
|
207
|
+
The pilot-results doc at `docs/playwright-loop-pilot-results.md` records the actual outcome from `bash scripts/_self-tests.sh` (Scenario 3 is exercised by a deterministic unit-style invocation; Scenarios 1+2 require a real vision pass and are run by Claude when the loop ships).
|
|
208
|
+
|
|
209
|
+
## Final report template (what `loop.mjs report` emits to stdout)
|
|
210
|
+
|
|
211
|
+
```markdown
|
|
212
|
+
# Visual-Polish Loop Report
|
|
213
|
+
|
|
214
|
+
- **URL:** http://localhost:3000
|
|
215
|
+
- **Brief:** .planning/DESIGN.md
|
|
216
|
+
- **Started:** 2026-05-03T12:00:00Z
|
|
217
|
+
- **Final verdict:** SUCCESS
|
|
218
|
+
- **Iterations:** 4 / 8
|
|
219
|
+
- **Tokens used:** 58000 / 100000
|
|
220
|
+
- **Fixes committed:** 7
|
|
221
|
+
|
|
222
|
+
## Iteration log
|
|
223
|
+
|
|
224
|
+
### Iteration 1
|
|
225
|
+
- Scores: typo=1 colo=1 spat=3 layo=1 shad=3 moti=3 micr=1 cont=1
|
|
226
|
+
- Aggregate: 14/40 (avg 1.75)
|
|
227
|
+
- Pass: NO (failing: typography, color, layout, microcopy, container)
|
|
228
|
+
- Top issues:
|
|
229
|
+
- **color** [critical] blue→purple gradient on hero → src/styles/globals.css
|
|
230
|
+
- **typography** [critical] Inter as primary → src/styles/globals.css
|
|
231
|
+
- **layout** [high] three identical cards → src/pages/index.tsx
|
|
232
|
+
|
|
233
|
+
### Iteration 2
|
|
234
|
+
- Scores: typo=3 colo=3 spat=3 layo=2 shad=3 moti=3 micr=2 cont=2
|
|
235
|
+
- Aggregate: 21/40 (avg 2.62)
|
|
236
|
+
- Pass: NO (failing: layout, microcopy, container)
|
|
237
|
+
- ...
|
|
238
|
+
|
|
239
|
+
### Iteration 3
|
|
240
|
+
- Scores: typo=4 colo=3 spat=3 layo=3 shad=3 moti=3 micr=3 cont=3
|
|
241
|
+
- Aggregate: 25/40 (avg 3.13)
|
|
242
|
+
- Pass: YES
|
|
243
|
+
|
|
244
|
+
## Fix commits (revertable)
|
|
245
|
+
- abc1234 qpl-1: color-gradient-removal — src/styles/globals.css
|
|
246
|
+
- def5678 qpl-1: typography-fraunces — src/styles/globals.css
|
|
247
|
+
- ...
|
|
248
|
+
|
|
249
|
+
## Issue fingerprints (regression tracker)
|
|
250
|
+
- color__globals_css__blue_purple_gradient — iterations [1] — fixed at iter 2
|
|
251
|
+
```
|
|
252
|
+
|
|
253
|
+
## Why three viewports
|
|
254
|
+
|
|
255
|
+
Per the spec's hard constraint (§5g `prefers-reduced-motion` and §5c mobile-only failures), the loop MUST evaluate at mobile (375), tablet (768), and desktop (1440). Each dimension's final score is the **minimum** across viewports, so a layout that's elegant on desktop but breaks at 375 is a fail, full stop.
|
|
256
|
+
|
|
257
|
+
This is intentional. Most visual regressions Fawzi has documented in `/insights` (hero videos cropped wrong on mobile, touch targets < 44px on mobile, navigation collapse misbehaving) only show up below 768. Scoring on desktop alone is how we got "looks great in dev" → "looks broken on the user's phone."
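The minimum-across-viewports rule can be illustrated with a small reducer. This is a sketch under assumptions: `minAcrossViewports` is a hypothetical name, and the input shape (viewport name mapped to per-dimension scores) is chosen for illustration rather than taken from `eval.json`.

```javascript
// Hypothetical sketch: collapse per-viewport scores into one score per dimension
// by taking the minimum, so the weakest viewport decides the result.
function minAcrossViewports(viewportScores) {
  const result = {};
  for (const scores of Object.values(viewportScores)) {
    for (const [dim, score] of Object.entries(scores)) {
      result[dim] = dim in result ? Math.min(result[dim], score) : score;
    }
  }
  return result;
}
```

For example, a layout scored 5 on desktop and 2 on mobile ends up with a layout score of 2.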
|
|
258
|
+
|
|
259
|
+
## What the loop does NOT do (deferred to v5.2)
|
|
260
|
+
|
|
261
|
+
- Cross-browser rendering checks (Firefox / WebKit) — Chromium-only, per `qualia-polish` Stage 4 precedent
|
|
262
|
+
- Accessibility audits beyond what the rubric scores — use `/qualia-polish` Stage 3 (Lighthouse + axe) for that
|
|
263
|
+
- Performance regressions — use `/qualia-polish-loop` only after Lighthouse score passes
|
|
264
|
+
- Reference-image-only mode (compare to a target screenshot without a brief) — currently the brief is required; reference is supplemental
|
|
265
|
+
- Multi-page sweeps — one URL per invocation; chain `/qualia-polish-loop` per route for site-wide passes
|