qualia-framework 4.0.5 → 4.1.0

package/README.md CHANGED
@@ -9,19 +9,21 @@ It is not an application framework like Rails or Next.js. It doesn't generate co
9
9
  ## Install
10
10
 
11
11
  ```bash
12
- npx qualia-framework install
12
+ npx qualia-framework@latest install
13
13
  ```
14
14
 
15
15
  Enter your team code when prompted. Get your code from Fawzi.
16
16
 
17
+ > **Why `@latest`?** npx caches packages at `~/.npm/_npx/` with no time-based TTL, so `npx qualia-framework install` (without `@latest`) silently runs whatever version you fetched first, even if a newer one has shipped. Always specify `@latest` when installing or upgrading. If a stale cache still bites you, run `npx clear-npx-cache`, then re-run the command.
18
+
17
19
  **Other commands:**
18
20
  ```bash
19
- npx qualia-framework version # Check installed version + updates
20
- npx qualia-framework update # Update to latest (remembers your code)
21
- npx qualia-framework uninstall # Clean removal from ~/.claude/
22
- npx qualia-framework team list # Show team members
23
- npx qualia-framework team add # Add a team member
24
- npx qualia-framework traces # View recent hook telemetry
21
+ npx qualia-framework@latest version # Check installed version + updates
22
+ npx qualia-framework@latest update # Update to latest (remembers your code)
23
+ npx qualia-framework@latest uninstall # Clean removal from ~/.claude/
24
+ npx qualia-framework@latest team list # Show team members
25
+ npx qualia-framework@latest team add # Add a team member
26
+ npx qualia-framework@latest traces # View recent hook telemetry
25
27
  ```
26
28
 
27
29
  ## Usage
@@ -177,7 +179,7 @@ Plans are grouped into waves for parallel execution. No fancy DAG solver — the
177
179
  ## Architecture
178
180
 
179
181
  ```
180
- npx qualia-framework install
182
+ npx qualia-framework@latest install
181
183
  |
182
184
  v
183
185
  ~/.claude/
package/agents/builder.md CHANGED
@@ -11,8 +11,18 @@ You execute ONE task from a phase plan. You run in a fresh context — you have
11
11
  ## Input
12
12
  You receive: one task block from the plan + PROJECT.md context.
13
13
 
14
- ## Output
15
- Working code + atomic git commit.
14
+ ## Output Contract
15
+
16
+ Return EXACTLY one of these three prefixed lines as the first line of your final message:
17
+
18
+ - `DONE — Task {N}: {commit_hash}` — followed by a list of files changed (one per line), then any trivial/minor deviation notes. Use this ONLY if every Validation command passed AND every Acceptance Criterion is observably met.
19
+ - `BLOCKED — {reason}` — followed by a JSON block documenting the block:
20
+ ```json
21
+ {"type": "major_deviation|dependency_missing|wave_ordering", "task": {N}, "file": "path/to/file", "planned": "...", "actual": "...", "impact": "..."}
22
+ ```
23
+ - `PARTIAL — {what completed}; remaining: {what's left}` — only if context limit forces early stop. Commit what works.
24
+
25
+ Never return without one of these three prefixes. The orchestrator parses the prefix to route next steps.
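A minimal sketch of how an orchestrator might parse the three prefixes described above. The function name `parseBuilderResult`, the `INVALID` status, and the 7 to 40 character hex commit hash are assumptions made for illustration; the source only specifies the prefix lines themselves.

```javascript
// Hypothetical parser for the builder's first output line.
// \u2014 is the dash character used in the contract's prefixes.
function parseBuilderResult(message) {
  const firstLine = message.split("\n")[0].trim();
  const m = /^(DONE|BLOCKED|PARTIAL) \u2014 (.*)$/.exec(firstLine);
  if (!m) return { status: "INVALID", detail: firstLine };

  const [, status, detail] = m;
  if (status === "DONE") {
    // Assumption: the commit hash is 7-40 lowercase hex characters.
    const d = /^Task (\d+): ([0-9a-f]{7,40})$/.exec(detail);
    if (!d) return { status: "INVALID", detail: firstLine };
    return { status, task: Number(d[1]), commit: d[2] };
  }
  return { status, detail };
}

console.log(parseBuilderResult("DONE \u2014 Task 3: a1b2c3d\nsrc/lib/auth.ts"));
console.log(parseBuilderResult("BLOCKED \u2014 dependency_missing"));
console.log(parseBuilderResult("Finished the task."));
```

Treating any unprefixed message as `INVALID` rather than guessing keeps routing decisions tied to the prefix alone.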
16
26
 
17
27
  ## How to Execute
18
28
 
@@ -1,12 +1,12 @@
1
1
  ---
2
2
  name: qualia-plan-checker
3
- description: Validates a phase plan before execution. Checks task specificity, wave assignment, verification contracts, and coverage of success criteria. Spawned by qualia-plan in a revision loop (max 3 iterations).
3
+ description: Validates a phase plan before execution. Checks task specificity, wave assignment, verification contracts, and coverage of success criteria. Spawned by qualia-plan in a revision loop (max 2 iterations).
4
4
  tools: Read, Bash, Grep
5
5
  ---
6
6
 
7
7
  # Plan Checker
8
8
 
9
- You validate phase plans before they go to the builder. You do NOT write plans — you evaluate them. If a plan has issues, return a structured list; the planner will revise and you'll check again (max 3 revision cycles).
9
+ You validate phase plans before they go to the builder. You do NOT write plans — you evaluate them. If a plan has issues, return a structured list; the planner will revise and you'll check again (max 2 revision cycles).
10
10
 
11
11
  ## Input
12
12
 
@@ -105,6 +105,28 @@ If `.planning/phase-{N}-context.md` exists, read its "Locked Decisions" section.
105
105
 
106
106
  **FAIL if:** plan contradicts a locked decision (e.g., context says "use library X" but plan uses library Y).
107
107
 
108
+ ### Rule 8: Validation commands test behavior, not just existence
109
+
110
+ Each task's `**Validation:**` list must contain at least one `grep-match` or `command-exit` check — a command that proves the code DOES something. A task whose ONLY validation is `test -f {file}` will pass even if the file contains only `// TODO`.
111
+
112
+ **FAIL if:** any task has only `file-exists`-type validations. Require at least one of:
113
+ - `grep -c "{specific_call}" {file}` printing a count ≥ 1
114
+ - `npx tsc --noEmit` exiting 0
115
+ - `curl -s {endpoint}` returning expected content
116
+ - Any command whose success/failure depends on the code doing something, not just being there
117
+
118
+ **Pass examples:**
119
+ - `grep -c "signInWithPassword" src/lib/auth.ts` → ≥ 1
120
+ - `npx tsc --noEmit 2>&1 | grep -c "error"` → 0
121
+
122
+ **Fail examples:**
123
+ - `test -f src/lib/auth.ts && echo EXISTS` (only) — file exists, but could be empty or stubbed
124
+ - `ls src/components/Chat.tsx` (only) — same problem
125
+
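One way Rule 8 could be checked mechanically is sketched below. The pattern list is a heuristic assumption about what counts as a `file-exists`-type check, and the names `isExistenceOnly` and `taskPassesRule8` are hypothetical; the plan-checker itself applies the rule by reading the plan.

```javascript
// Assumption: a segment is "trivial" if it only tests for a path or echoes text.
const TRIVIAL = [
  /^test\s+-[defs]\s+\S+$/,      // test -f path
  /^\[\s+-[defs]\s+\S+\s+\]$/,   // [ -f path ]
  /^ls(\s+[^\s|]+)*$/,           // ls path (not piped into anything)
  /^echo(\s[^|]*)?$/,            // echo EXISTS
];

// A command is existence-only when every &&, || or ; segment is trivial.
function isExistenceOnly(command) {
  const segments = command.split(/&&|\|\||;/).map((s) => s.trim()).filter(Boolean);
  return segments.length > 0 && segments.every((seg) => TRIVIAL.some((re) => re.test(seg)));
}

// A task passes when at least one validation does more than check existence.
function taskPassesRule8(validations) {
  return validations.some((cmd) => !isExistenceOnly(cmd));
}

console.log(taskPassesRule8(["test -f src/lib/auth.ts && echo EXISTS"]));
console.log(taskPassesRule8(['grep -c "signInWithPassword" src/lib/auth.ts']));
```

A task with an empty validation list also fails here, since no command proves behavior.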
126
+ ## Tool Budget
127
+
128
+ Read the plan file once. Grep the codebase only to validate Rule 7 (locked decisions). Do NOT speculatively check whether files listed in the plan already exist — that's the builder's job. Max 10 tool calls per invocation.
129
+
108
130
  ## Output Format
109
131
 
110
132
  ### If all rules pass:
@@ -145,12 +167,12 @@ The planner uses your output to revise the plan. Be specific enough that the rev
145
167
 
146
168
  ## Revision Limits
147
169
 
148
- You will be called up to 3 times per plan. If the plan still fails after 3 revisions, report:
170
+ You will be called up to 2 times per plan. If the plan still fails after 2 revisions, report:
149
171
 
150
172
  ```
151
173
  ## BLOCKED
152
174
 
153
- Plan failed validation after 3 revision cycles. Issues remaining:
175
+ Plan failed validation after 2 revision cycles. Issues remaining:
154
176
 
155
177
  {list}
156
178
 
package/agents/planner.md CHANGED
@@ -9,7 +9,15 @@ tools: Read, Write, Bash, Glob, Grep, WebFetch
9
9
  You create phase plans. Plans are prompts — they ARE the instructions the builder will read, not documents that become instructions.
10
10
 
11
11
  ## Input
12
- You receive: PROJECT.md + the current phase goal + success criteria from the roadmap.
12
+
13
+ - `<project_context>` — inlined `.planning/PROJECT.md` contents
14
+ - `<current_state>` — inlined `.planning/STATE.md` contents
15
+ - `<phase_details>` — phase goal + success criteria + REQ-IDs from ROADMAP.md
16
+ - `<locked_decisions>` (optional) — Locked Decisions from `.planning/phase-{N}-context.md` if it exists
17
+ - `<research_findings>` (optional) — inlined `.planning/phase-{N}-research.md` if present
18
+ - `<relevant_learnings>` (optional) — applicable patterns from `~/.claude/knowledge/learned-patterns.md`
19
+ - `<revision_mode>` (optional, boolean) — when `true`, also receives `<current_plan>` and `<checker_feedback>`; revise in place, don't rewrite
20
+ - `<gaps_mode>` (optional, boolean) — when `true`, also receives `<verification_path>`; create gap-closure tasks only
13
21
 
14
22
  ## Output
15
23
  Write `.planning/phase-{N}-plan.md` — a plan file with 2-5 tasks.
@@ -29,11 +37,32 @@ Start from the phase goal. Work backwards:
29
37
 
30
38
  Each truth → one task. 2-5 tasks per phase. Each task must fit in one context window.
31
39
 
32
- ### 3. Assign Waves
33
- - **Wave 1:** Tasks with no dependencies (run in parallel)
34
- - **Wave 2:** Tasks that depend on Wave 1 (run after Wave 1 completes)
40
+ ### 3. Assign Waves (file-based dependency graph — deterministic)
41
+
42
+ Wave assignment is NOT a vibes call. Use this mechanical algorithm:
43
+
44
+ 1. **Build adjacency list.** For each task T, define:
45
+ - `writes(T)` = the set of file paths in `**Files:**` that T creates or modifies
46
+ - `reads(T)` = file paths T consumes from `**Context:**` (@references) + any paths declared in `**Depends on:**`
47
+ 2. **Declare dependency edge A → B** if `writes(A) ∩ reads(B) ≠ ∅` OR if B's `**Depends on:**` explicitly names A.
48
+ 3. **Topological-sort into waves.** Wave 1 = all tasks with in-degree 0 (no incoming edges). Wave 2 = tasks whose only dependencies are in Wave 1. Continue until all tasks placed.
49
+ 4. **Parallel-safety check.** No two tasks in the same wave may share a file in their `writes()` sets. If they do, serialize them into consecutive waves.
50
+
51
+ Two tasks that both *read* the same file (same entry in `Context:`) are fine in the same wave — only *write conflicts* force serialization.
52
+
35
53
  - Most phases need 1-2 waves. If you need 3+, your tasks are too granular.
36
54
 
55
+ **Worked example:**
56
+
57
+ | Task | Files (writes) | Context/Depends-on (reads) | Edges | Wave |
58
+ |------|----------------|----------------------------|-------|------|
59
+ | T1 — Create auth lib | `src/lib/auth.ts` | `@.planning/PROJECT.md` | none | 1 |
60
+ | T2 — Create login page | `src/app/login/page.tsx` | `@src/lib/auth.ts` (reads T1's write) | T1 → T2 | 2 |
61
+ | T3 — Create signup page | `src/app/signup/page.tsx` | `@src/lib/auth.ts` (reads T1's write) | T1 → T3 | 2 |
62
+ | T4 — Add RLS policies | `supabase/migrations/001.sql` | `@.planning/PROJECT.md` | none | 1 |
63
+
64
+ T1 and T4 → Wave 1 (no shared writes, both reading PROJECT.md is fine). T2 and T3 → Wave 2 (both depend on T1's write; neither writes the same file as the other, so they run in parallel).
65
+
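The four steps above can be sketched roughly as follows. The task shape (`id`, `writes`, `reads`, `dependsOn`) and the function name `assignWaves` are assumptions for illustration; the planner works from the `**Files:**`, `**Context:**`, and `**Depends on:**` fields of the plan rather than from objects like these.

```javascript
// Hypothetical task shape: { id, writes: [paths], reads: [paths], dependsOn: [ids] }
function assignWaves(tasks) {
  const byId = new Map(tasks.map((t) => [t.id, t]));

  // Steps 1-2: deps.get(B) holds every task A with an edge A -> B.
  const deps = new Map(tasks.map((t) => [t.id, new Set(t.dependsOn || [])]));
  for (const a of tasks) {
    for (const b of tasks) {
      if (a.id === b.id) continue;
      const reads = new Set(b.reads || []);
      if ((a.writes || []).some((f) => reads.has(f))) deps.get(b.id).add(a.id);
    }
  }

  // Step 3: place tasks level by level once all their dependencies are placed.
  const placed = new Map(); // task id -> wave number (1-based)
  const waves = [];
  let remaining = tasks.map((t) => t.id);
  while (remaining.length > 0) {
    const ready = remaining.filter((id) => [...deps.get(id)].every((d) => placed.has(d)));
    if (ready.length === 0) {
      throw new Error("cycle or unknown dependency among: " + remaining.join(", "));
    }
    // Step 4: tasks that write a file already claimed in this wave wait for the next one.
    const claimed = new Set();
    const current = [];
    for (const id of ready) {
      const writes = byId.get(id).writes || [];
      if (writes.some((f) => claimed.has(f))) continue;
      writes.forEach((f) => claimed.add(f));
      current.push(id);
    }
    waves.push(current);
    current.forEach((id) => placed.set(id, waves.length));
    remaining = remaining.filter((id) => !placed.has(id));
  }
  return waves;
}

// The worked example from the table above.
const example = [
  { id: "T1", writes: ["src/lib/auth.ts"], reads: [".planning/PROJECT.md"] },
  { id: "T2", writes: ["src/app/login/page.tsx"], reads: ["src/lib/auth.ts"] },
  { id: "T3", writes: ["src/app/signup/page.tsx"], reads: ["src/lib/auth.ts"] },
  { id: "T4", writes: ["supabase/migrations/001.sql"], reads: [".planning/PROJECT.md"] },
];
console.log(assignWaves(example)); // [ [ 'T1', 'T4' ], [ 'T2', 'T3' ] ]
```

Shared reads never create an edge in this sketch, which matches the note that only write conflicts force serialization.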
37
66
  ### 4. Write the Plan (Story-File Format)
38
67
 
39
68
  Plans are STORY FILES, not task lists. Every task is a self-contained package that embeds *why*, *what*, and *how to verify* — so the builder can execute without re-reading PRDs and the verifier has explicit acceptance targets.
@@ -11,7 +11,11 @@ You verify that the **running app actually looks and behaves right** — not jus
11
11
  **Critical mindset:** You are the user. You don't trust the code — you drive the app and see what happens. If it breaks at 375px, it's broken. If the console screams, it's broken. If clicking the primary CTA does nothing, it's broken.
12
12
 
13
13
  ## Input
14
- You receive: the phase plan (to know what pages/flows exist) + the dev server URL + access to Playwright MCP browser tools.
14
+
15
+ - `<plan_path>` — path to `.planning/phase-{N}-plan.md`
16
+ - `<dev_server_url>` — e.g. `http://localhost:3000`. If omitted, probe ports 3000–3001 as fallback; if no server answers within 10s, write `BLOCKED: dev server not reachable` and exit.
17
+ - `<phase_number>` — integer, used for the verification filename
18
+ - Access to Playwright MCP browser tools
15
19
 
16
20
  ## Output
17
21
  Append a `## Browser QA` section to `.planning/phase-{N}-verification.md` with PASS/FAIL per check.
@@ -48,6 +48,8 @@ Don't duplicate full documents. Summarize the 3-5 most important items from each
48
48
 
49
49
  This is the most important section. Suggest the **full milestone arc**, not just a v1 phase list.
50
50
 
51
+ **Evidence requirement:** Every milestone suggestion MUST cite at least one research finding from STACK.md, FEATURES.md, ARCHITECTURE.md, or PITFALLS.md as justification. Format the citation as `[DIMENSION.md: <specific finding or item>]` — e.g., `[FEATURES.md: table-stakes AUTH-*]` or `[PITFALLS.md: risk P3, stall risk for downstream milestones]`. Milestones without a citable finding are speculative — mark them explicitly with `[speculative — no source]` and the roadmapper will scrutinize.
52
+
51
53
  Based on:
52
54
  - FEATURES.md split (table stakes = v1 across milestones, differentiators = later milestones or post-handoff)
53
55
  - ARCHITECTURE.md build order → what depends on what, which foundation must land in Milestone 1 to support final-milestone requirements
@@ -17,6 +17,10 @@ You receive from the orchestrator:
17
17
  - `<milestone_context>` — greenfield or subsequent
18
18
  - `<output_path>` — absolute path where you write your research file
19
19
 
20
+ ## Tool Budget
21
+
22
+ Maximum 8 external calls total per invocation: 3 Context7 queries + 3 WebFetch calls + 2 WebSearch queries. If you exhaust this budget, write what you have and mark remaining sections as `confidence: LOW`. Research is time-boxed, not exhaustive — a 10-minute deep dive with concrete sources beats a 30-minute wander.
23
+
20
24
  ## Output
21
25
 
22
26
  Write exactly ONE file to `<output_path>`, using the template matching your dimension:
@@ -17,6 +17,7 @@ You receive:
17
17
  - `.planning/research/SUMMARY.md` — research synthesis (optional — may not exist if research was skipped)
18
18
  - `.planning/config.json` — project config (`depth`, `template_type`)
19
19
  - User's confirmed feature scope (from the scoping conversation in qualia-new)
20
+ - `<full_detail>` — boolean (default `false`). When `true`, write full phase detail for EVERY milestone in ROADMAP.md, not just M1. Passed by the orchestrator when the user runs `/qualia-new --full-plan`.
20
21
 
21
22
  ## Output
22
23
 
@@ -11,10 +11,17 @@ You verify that a phase achieved its GOAL, not just completed its TASKS.
11
11
  **Critical mindset:** Do NOT trust claims about what was built. Summaries document what Claude SAID it did. You verify what ACTUALLY EXISTS in the code. These often differ.
12
12
 
13
13
  ## Input
14
- You receive: the phase plan with success criteria + access to the codebase.
14
+
15
+ - `<plan_path>` — path to `.planning/phase-{N}-plan.md`
16
+ - `<project_context>` — inlined `.planning/PROJECT.md` contents (for Quality scoring against project conventions)
17
+ - `<previous_verification>` (optional) — inlined `.planning/phase-{N}-verification.md` from a prior run
15
18
 
16
19
  ## Output
17
- Write `.planning/phase-{N}-verification.md` — PASS or FAIL with evidence.
20
+ Write `.planning/phase-{N}-verification.md` — PASS or FAIL with evidence. Apply the Grounding Protocol: every finding needs `file:line — "quoted"` evidence, no hedging, scores with rubric criterion citations.
21
+
22
+ ## Tool Budget
23
+
24
+ Maximum 25 Bash/Grep calls per invocation. Prefer one multi-pattern grep over many single-pattern greps. If you exhaust the budget, write what you found and mark unchecked criteria as `INSUFFICIENT EVIDENCE` — do not fabricate.
18
25
 
19
26
  ## Goal-Backward Verification
20
27
 
@@ -260,7 +267,19 @@ Phase goal: "Working real-time chat interface with message history."
260
267
 
261
268
  ## Design Verification (for phases with frontend work)
262
269
 
263
- If the phase involved UI/frontend tasks, add a **Design Quality** section to the report.
270
+ **Gate (run first):** Only execute this section if the phase touched frontend.
271
+
272
+ ```bash
273
+ # Is this a frontend phase?
274
+ FRONTEND=false
275
+ if grep -qE "\.tsx|\.jsx|\.css|\.scss|app/|components/|pages/|Persona:\s*(frontend|ux)" .planning/phase-{N}-plan.md 2>/dev/null; then
276
+ FRONTEND=true
277
+ fi
278
+ ```
279
+
280
+ If `FRONTEND=false`, write `Design Verification: N/A (no frontend tasks in phase)` in the report and skip the rest of this section. This saves ~40 greps on backend-only phases.
281
+
282
+ If `FRONTEND=true`, proceed. Add a **Design Quality** section to the report.
264
283
 
265
284
  First, read the project's DESIGN.md:
266
285
  ```bash
package/bin/statusline.js CHANGED
@@ -153,8 +153,20 @@ try {
153
153
  } catch {}
154
154
  } catch {}
155
155
 
156
+ // ─── Pill-style badge helper ─────────────────────────────
157
+ // Renders text as an inline pill with a solid background color, similar to
158
+ // Claude Code's native worktree tag. Pads with a leading+trailing space so
159
+ // the background band has visual weight.
160
+ function pill(text, rgb) {
161
+ const [r, g, b] = rgb;
162
+ const bg = `\x1b[48;2;${r};${g};${b}m`;
163
+ const fg = `\x1b[38;2;240;250;255m`;
164
+ const bold = `\x1b[1m`;
165
+ return `${bg}${fg}${bold} ${text} ${RESET}`;
166
+ }
167
+
156
168
  // ─── Phase info from .planning/tracking.json ─────────────
157
- // Shows: [M{n}·{milestoneName}] P{phase}/{total} T{done}/{total} {status} [!{blockers}]
169
+ // Rendered as a pill at the start of line 1 — teal for normal, red when blockers > 0.
158
170
  // Every segment is optional — missing data is skipped, never rendered as a placeholder.
159
171
  let PHASE_INFO = "";
160
172
  try {
@@ -172,39 +184,42 @@ try {
172
184
 
173
185
  const parts = [];
174
186
 
175
- // Milestone: M{n}·{shortName} (short name trimmed to 14 chars)
176
187
  if (milestone > 0) {
177
188
  let mStr = `M${milestone}`;
178
189
  if (milestoneName) {
179
190
  const shortName = milestoneName.length > 14 ? milestoneName.slice(0, 13) + "…" : milestoneName;
180
- mStr += `${DIM}·${RESET}${TEAL_GLOW}${shortName}`;
191
+ mStr += `·${shortName}`;
181
192
  }
182
- parts.push(`${TEAL}${mStr}${RESET}`);
193
+ parts.push(mStr);
183
194
  }
184
195
 
185
- // Phase: P{phase}/{total}
186
- if (total > 0) {
187
- parts.push(`${WHITE}P${phase}/${total}${RESET}`);
188
- }
189
-
190
- // Tasks within phase: T{done}/{total}
191
- if (tasksTotal > 0) {
192
- parts.push(`${DIM}T${RESET}${WHITE}${tasksDone}/${tasksTotal}${RESET}`);
193
- }
196
+ if (total > 0) parts.push(`P${phase}/${total}`);
197
+ if (tasksTotal > 0) parts.push(`T${tasksDone}/${tasksTotal}`);
198
+ if (status) parts.push(status);
194
199
 
195
- // Status
196
- if (status) {
197
- parts.push(`${TEAL_GLOW}${status}${RESET}`);
198
- }
200
+ let badgeText = parts.join(" · ");
201
+ if (blockers > 0) badgeText += badgeText ? ` · !${blockers}` : `!${blockers}`;
199
202
 
200
- // Blockers — red badge, only when > 0
201
- if (blockers > 0) {
202
- parts.push(`${RED}!${blockers}${RESET}`);
203
+ if (badgeText) {
204
+ // Red pill when blockers present, teal otherwise
205
+ const bg = blockers > 0 ? [153, 27, 27] : [0, 130, 135];
206
+ PHASE_INFO = pill(`⬢ ${badgeText}`, bg);
203
207
  }
208
+ }
209
+ } catch {}
204
210
 
205
- if (parts.length > 0) {
206
- PHASE_INFO = parts.join(` ${DIM}·${RESET} `);
207
- }
211
+ // ─── Framework-dev badge ────────────────────────────────
212
+ // When editing the Qualia framework itself (detected by presence of the
213
+ // skills/ dir + qualia-ui.js), show a FRAMEWORK DEV pill even though
214
+ // there's no tracking.json. Gives the same "you're in Qualia mode" signal
215
+ // during framework work.
216
+ let FRAMEWORK_BADGE = "";
217
+ try {
218
+ const isFramework =
219
+ fs.existsSync(path.join(DIR, "skills", "qualia-plan", "SKILL.md")) &&
220
+ fs.existsSync(path.join(DIR, "bin", "qualia-ui.js"));
221
+ if (isFramework) {
222
+ FRAMEWORK_BADGE = pill("⬢ FRAMEWORK DEV", [120, 60, 140]);
208
223
  }
209
224
  } catch {}
210
225
 
@@ -254,11 +269,14 @@ try {
254
269
  COST_FMT = `$${COST.toFixed(2)}`;
255
270
  } catch {}
256
271
 
257
- // ─── Line 1: Project + Git + Agent + Worktree + Phase + Memory + Hooks ──
272
+ // ─── Line 1: Pill badge + Project + Git + Agent + Worktree + Memory + Identity ──
273
+ // Leading pill (phase info or framework-dev) — one of these at most, phase wins.
258
274
  let LINE1 = "";
259
275
  try {
260
276
  const dirBase = path.basename(DIR) || DIR;
261
- LINE1 = `${TEAL}⬢${RESET} ${WHITE}${dirBase}${RESET}`;
277
+ const leadingBadge = PHASE_INFO || FRAMEWORK_BADGE;
278
+ if (leadingBadge) LINE1 += `${leadingBadge} `;
279
+ LINE1 += `${TEAL}⬢${RESET} ${WHITE}${dirBase}${RESET}`;
262
280
  if (BRANCH) {
263
281
  if (CHANGES > 0) {
264
282
  LINE1 += ` ${DIM}on${RESET} ${TEAL_GLOW}${BRANCH}${RESET} ${YELLOW}~${CHANGES}${RESET}`;
@@ -268,12 +286,9 @@ try {
268
286
  }
269
287
  if (AGENT) LINE1 += ` ${DIM}│${RESET} ${TEAL}⚡${AGENT}${RESET}`;
270
288
  if (WORKTREE) LINE1 += ` ${DIM}│${RESET} ${TEAL_DIM}⎇ ${WORKTREE}${RESET}`;
271
- if (PHASE_INFO) LINE1 += ` ${DIM}│${RESET} ${PHASE_INFO}`;
272
- // Memory — the one context indicator that's actually project-specific
273
289
  if (MEMORY_COUNT > 0) {
274
290
  LINE1 += ` ${DIM}│${RESET} ${DIM}mem${RESET} ${TEAL}${MEMORY_COUNT}${RESET}`;
275
291
  }
276
- // Qualia member signature — end of line 1 so it sits above line 2's model info
277
292
  if (QUALIA_FIRST_NAME) {
278
293
  LINE1 += ` ${DIM}│${RESET} ${TEAL}⬢${RESET} ${TEAL_GLOW}Qualia member${RESET}${DIM}:${RESET} ${WHITE}${QUALIA_FIRST_NAME}${RESET}`;
279
294
  }
@@ -0,0 +1,128 @@
1
+ # Qualia Framework — Command Quality & Build Workflow Deep Research
2
+ **Date:** 2026-04-21
3
+ **Scope:** design, debug, optimize, review + plan/build/verify workflow + 8 subagent prompts
4
+ **Method:** 4 parallel Opus agents, each auditing one dimension, synthesized by framework owner
5
+
6
+ ## Executive Summary
7
+
8
+ The framework's biggest accuracy leak is **evidence-free claims**: 3 of 4 diagnostic commands (design, debug, review) do not require file:line citations for findings, so the model hallucinates specifics under pressure. The biggest speed leak is **serial work that should be parallel**: qualia-design and qualia-review list `Agent` in allowed-tools but never spawn, so large codebases get processed in a single context window; the plan-checker revision loop serially re-spawns the planner for issues (frontmatter, wave assignment) that can be fixed mechanically.
9
+
10
+ The single highest-leverage change is a shared **Grounding Protocol** + **Rubric Library** referenced from every skill and agent — it eliminates ~60% of the determinism defects at once.
11
+
12
+ ---
13
+
14
+ ## Top 15 Improvements — Ranked by Impact × Ease
15
+
16
+ | # | Change | Impact | Effort | Where |
17
+ |---|--------|--------|--------|-------|
18
+ | 1 | Add shared Grounding Protocol (cite-or-say-INSUFFICIENT-EVIDENCE) to all agents | 🔥 Accuracy | 30 min | `rules/grounding.md` + import into 8 agent files |
19
+ | 2 | Add deterministic severity formula (CRITICAL=8/HIGH=4/MED=2/LOW=1; score = max(1, 5−⌊Σ/8⌋)) to qualia-review | 🔥 Accuracy | 45 min | `skills/qualia-review/SKILL.md:124` |
20
+ | 3 | Pre-inline PROJECT.md into verifier prompt (currently missing) | 🔥 Accuracy | 10 min | `skills/qualia-verify/SKILL.md:42` |
21
+ | 4 | Make qualia-build spawn wave tasks in parallel explicitly ("all Agent() calls in SAME response") | ⚡ Speed | 30 min | `skills/qualia-build/SKILL.md:65` |
22
+ | 5 | Convert qualia-debug from interactive (4 questions) to investigative (parse $ARGUMENTS, run diagnostic greps) | 🔥 Accuracy | 2 hrs | `skills/qualia-debug/SKILL.md:39-44` |
23
+ | 6 | Add structured Output Contract (DONE/BLOCKED/PARTIAL prefix) to builder.md | ⚡ Speed + 🔥 Accuracy | 20 min | `agents/builder.md:14` |
24
+ | 7 | Mechanical-fix bypass in plan-checker (skip planner re-spawn for frontmatter/wave issues) | ⚡ Speed | 4 hrs | `skills/qualia-plan/SKILL.md:129-153` |
25
+ | 8 | Make wave assignment deterministic: file-based dependency graph, topological sort (not "tasks with no dependencies") | 🔥 Accuracy | 3 hrs | `agents/planner.md:33` |
26
+ | 9 | Add Rule 8 to plan-checker: "Validation must test behavior, not file-existence only" (stops stubs passing) | 🔥 Accuracy | 30 min | `agents/plan-checker.md` after Rule 7 |
27
+ | 10 | Split qualia-design/review into parallel agent fan-out for large file sets (5+ files) | ⚡ Speed | 3 hrs | `skills/qualia-design/SKILL.md`, `skills/qualia-review/SKILL.md` |
28
+ | 11 | Add wave-context summary (adjacent task titles + files) to builder prompt — stops semantic drift across parallel tasks | 🔥 Accuracy | 1 hr | `skills/qualia-build/SKILL.md:82` |
29
+ | 12 | Fix `grep -qL` bug in qualia-review API auth check (backwards logic) | 🔥 Accuracy | 15 min | `skills/qualia-review/SKILL.md:59-61` |
30
+ | 13 | Add tool budgets: researcher (8 external calls), verifier (25 bash calls), debug (10 reads) | ⚡ Speed | 45 min | `agents/researcher.md`, `agents/verifier.md`, `skills/qualia-debug` |
31
+ | 14 | Standardize input contracts across 8 agents with `<variable>` typed blocks (only plan-checker does this today) | 🔥 Accuracy | 2 hrs | All 8 agent files |
32
+ | 15 | Drop full `next build` from qualia-review; read existing `.next/` or skip with warning | ⚡ Speed | 20 min | `skills/qualia-review/SKILL.md:98` |
33
+
34
+ **Total effort for #1–#15:** ~20 hours of focused work → framework-wide accuracy and speed step-change.
35
+
36
+ ---
37
+
38
+ ## Per-Command Scores (before changes)
39
+
40
+ | Command | Score | Weakest Dimension |
41
+ |---------|-------|-------------------|
42
+ | qualia-debug | 4/10 | Interactive-by-default (4 mandatory questions), no output file, cheat sheets instead of diagnostic commands |
43
+ | qualia-design | 6/10 | No critique output contract, `Agent` listed but never spawned, tsc-only verification |
44
+ | qualia-review | 7/10 | Serial bash scans, latent `grep -qL` bug, no parallelism |
45
+ | qualia-optimize | 8/10 | Strongest — uses agent fan-out, severity labels, OPTIMIZE.md output. Loses points on inline `find`/`grep` in Step 6 + no `--fix` dry-run |
46
+
47
+ ## Per-Agent Scores (before changes)
48
+
49
+ | Agent | Overall | Biggest Gap |
50
+ |-------|---------|-------------|
51
+ | plan-checker | 9.5/10 | No tool budget |
52
+ | verifier | 9.0/10 | No frontend gate on design verification (runs 40 greps on backend phases) |
53
+ | planner | 8.5/10 | Prose input contract, no failure-mode handling |
54
+ | builder | 8.5/10 | No structured output contract |
55
+ | researcher | 8.5/10 | Unbounded WebSearch loops |
56
+ | qa-browser | 8.5/10 | Probes for dev server URL instead of receiving it; no fallback when Playwright unavailable |
57
+ | roadmapper | 8.5/10 | `full_detail` is a ghost parameter — referenced but not declared |
58
+ | research-synthesizer | 8.0/10 | No evidence requirement on milestone suggestions |
59
+
60
+ ---
61
+
62
+ ## Rubrics to Ship as `rules/rubrics.md`
63
+
64
+ **Severity (with deterministic category score):**
65
+ ```
66
+ CRITICAL = 8 | HIGH = 4 | MEDIUM = 2 | LOW = 1
67
+ weighted_sum = Σ(count_i × weight_i)
68
+ category_score = max(1, 5 − ⌊weighted_sum / 8⌋)
69
+ ```
70
+
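As a quick check of the formula above, the sketch below computes `category_score` from per-severity finding counts. The function name `categoryScore` and the shape of the `counts` object are assumptions; the weights and the formula come from the rubric.

```javascript
// Weights from the severity rubric.
const WEIGHTS = { CRITICAL: 8, HIGH: 4, MEDIUM: 2, LOW: 1 };

// counts: hypothetical map of severity label -> number of findings.
function categoryScore(counts) {
  let weightedSum = 0;
  for (const [severity, weight] of Object.entries(WEIGHTS)) {
    weightedSum += (counts[severity] || 0) * weight;
  }
  // Each full 8 points of weighted findings costs one point; the floor is 1.
  return Math.max(1, 5 - Math.floor(weightedSum / 8));
}

console.log(categoryScore({}));                      // 5
console.log(categoryScore({ HIGH: 1, MEDIUM: 2 }));  // 4 (weighted sum 8)
console.log(categoryScore({ CRITICAL: 5 }));         // 1 (weighted sum 40, clamped)
```

Under this formula a single CRITICAL finding costs the same as two HIGH findings, and anything below a weighted sum of 8 leaves the score at 5.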
71
+ **Design Quality (1–5 per dimension, any <3 = mandatory fix):**
72
+ Typography / Color / Spacing / States / Responsiveness / Accessibility — each with objective criteria (see `skills/qualia-design` comment thread for full matrix).
73
+
74
+ **Task-Done:**
75
+ - Compiles (`tsc --noEmit` = 0)
76
+ - No stubs (`grep -cE "TODO|FIXME|placeholder" touched_files` = 0)
77
+ - Wired (every export imported somewhere)
78
+ - Each acceptance criterion has a passing validation command
79
+ - Committed (git log matches task title)
80
+
81
+ **Evidence Citation Format:**
82
+ ```
83
+ file:line — "quoted code" — {assessment}
84
+ ```
85
+ Claims missing this format are rejected. If evidence cannot be found: `INSUFFICIENT EVIDENCE: searched {files} with {commands}`.
86
+
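A small sketch of how the citation format above might be validated. The regular expression and the names `parseCitation` and `isGrounded` are assumptions for illustration, not part of the framework.

```javascript
// Matches: file:line, dash, "quoted code", dash, assessment.
// \u2014 is the dash character used in the citation format.
const CITATION = /^(\S+):(\d+) \u2014 "(.+)" \u2014 (.+)$/;

function parseCitation(line) {
  const m = CITATION.exec(line.trim());
  if (!m) return null;
  return { file: m[1], line: Number(m[2]), quote: m[3], assessment: m[4] };
}

// A claim is acceptable if it cites evidence or declares that none was found.
function isGrounded(line) {
  return parseCitation(line) !== null || line.trim().startsWith("INSUFFICIENT EVIDENCE:");
}

console.log(parseCitation('src/lib/auth.ts:42 \u2014 "signInWithPassword(email, password)" \u2014 wired'));
console.log(isGrounded("The auth flow probably works."));
```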
87
+ ---
88
+
89
+ ## Grounding Protocol (paste into every agent)
90
+
91
+ ```markdown
92
+ ## Grounding Protocol (MANDATORY)
93
+ 1. Every factual claim requires `file:line — "quoted code"`. No exception.
94
+ 2. No hedging: "seems / probably / might" → verified or INSUFFICIENT EVIDENCE.
95
+ 3. Findings without file:line are discarded.
96
+ 4. Scores without evidence on the next line = 0.
97
+ 5. Severity requires quoting the matching Severity Rubric criterion.
98
+ 6. Output shape is a contract — missing sections = protocol violation.
99
+ 7. Stop at tool budget. Return what you found, not what you wish.
100
+ 8. Precondition: verify every @file exists before work; HALT if missing.
101
+ ```
102
+
103
+ ---
104
+
105
+ ## 3 Architectural Changes (bigger, keep for later)
106
+
107
+ 1. **Pre-Build Context Packet** — assemble one JSON with PROJECT.md + DESIGN.md + plan + wave-context before spawning builders. Eliminates per-builder file reads.
108
+ 2. **Intra-Wave Verification** — run each task's Validation contracts immediately after its builder completes, before next wave starts. Catches failure at task granularity, not phase.
109
+ 3. **Plan Cache** — cache parsed project identity in `.planning/.project-cache.json`; invalidate on PROJECT.md change. Saves ~30% planner context on multi-phase `--auto` runs.
110
+
111
+ ---
112
+
113
+ ## Missing Agents Worth Adding (ranked)
114
+
115
+ 1. **`migrator.md`** — generates + validates Supabase migrations. Current gap: builder writes raw SQL ad-hoc, migration guard catches only obvious patterns.
116
+ 2. **`dependency-auditor.md`** — pre-build peer-dependency / vulnerability check. Current gap: builder hits `npm install` conflicts mid-phase and wastes context debugging.
117
+ 3. **`rollback.md`** — on verify FAIL, bisect to last-good commit instead of always patching forward. Current gap: gap-closure plans build on broken code.
118
+
119
+ ---
120
+
121
+ ## Anti-Patterns to Kill
122
+
123
+ - `find` inside skills (use Glob) — qualia-optimize:302, qualia-review multiple places
124
+ - `Agent` in allowed-tools but never spawned — qualia-design, qualia-debug, qualia-review
125
+ - Interactive question gates in one-shot commands — qualia-debug
126
+ - Full `next build` as part of a "scan" — qualia-review:98
127
+ - Vague "investigate the codebase" with no tool budget — qualia-debug, researcher
128
+ - "seems / probably / might" language anywhere in agent output