npm - qualia-framework - Versions diffs - 4.0.5 → 4.1.1 - Mend

qualia-framework 4.0.5 → 4.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (28) hide show

package/README.md +10 -8
package/agents/builder.md +12 -2
package/agents/plan-checker.md +26 -4
package/agents/planner.md +33 -4
package/agents/qa-browser.md +5 -1
package/agents/research-synthesizer.md +2 -0
package/agents/researcher.md +4 -0
package/agents/roadmapper.md +1 -0
package/agents/verifier.md +22 -3
package/bin/statusline.js +43 -28
package/docs/research/2026-04-21-command-quality-deep-research.md +128 -0
package/docs/research/2026-04-21-industry-best-practices.md +255 -0
package/hooks/auto-update.js +5 -0
package/hooks/pre-push.js +7 -1
package/hooks/session-start.js +68 -2
package/package.json +1 -1
package/rules/grounding.md +110 -0
package/skills/qualia-build/SKILL.md +20 -9
package/skills/qualia-debug/SKILL.md +141 -49
package/skills/qualia-design/SKILL.md +52 -5
package/skills/qualia-plan/SKILL.md +11 -8
package/skills/qualia-report/SKILL.md +82 -46
package/skills/qualia-review/SKILL.md +36 -16
package/skills/qualia-ship/SKILL.md +99 -18
package/skills/qualia-skill-new/SKILL.md +1 -1
package/skills/qualia-verify/SKILL.md +5 -1
package/templates/help.html +1 -1
package/tests/runner.js +9 -1

package/README.md CHANGED Viewed

@@ -9,19 +9,21 @@ It is not an application framework like Rails or Next.js. It doesn't generate co
 ## Install
 ```bash
-npx qualia-framework install
+npx qualia-framework@latest install
 ```
 Enter your team code when prompted. Get your code from Fawzi.
+> **Why `@latest`?** npx caches packages at `~/.npm/_npx/` and has no time-based TTL — `npx qualia-framework install` (without `@latest`) will silently run whatever version you happened to fetch the first time, even if a newer one shipped. Always pin `@latest` when installing or upgrading. If a stale cache still bites you: `npx clear-npx-cache` then re-run.
 **Other commands:**
 ```bash
-npx qualia-framework version    # Check installed version + updates
-npx qualia-framework update     # Update to latest (remembers your code)
-npx qualia-framework uninstall  # Clean removal from ~/.claude/
-npx qualia-framework team list  # Show team members
-npx qualia-framework team add   # Add a team member
-npx qualia-framework traces     # View recent hook telemetry
+npx qualia-framework@latest version    # Check installed version + updates
+npx qualia-framework@latest update     # Update to latest (remembers your code)
+npx qualia-framework@latest uninstall  # Clean removal from ~/.claude/
+npx qualia-framework@latest team list  # Show team members
+npx qualia-framework@latest team add   # Add a team member
+npx qualia-framework@latest traces     # View recent hook telemetry
 ```
 ## Usage
@@ -177,7 +179,7 @@ Plans are grouped into waves for parallel execution. No fancy DAG solver — the
 ## Architecture
 ```
-npx qualia-framework install
+npx qualia-framework@latest install
      |
      v
 ~/.claude/

package/agents/builder.md CHANGED Viewed

@@ -11,8 +11,18 @@ You execute ONE task from a phase plan. You run in a fresh context — you have
 ## Input
 You receive: one task block from the plan + PROJECT.md context.
-## Output
-Working code + atomic git commit.
+## Output Contract
+Return EXACTLY one of these three prefixed lines as the first line of your final message:
+- `DONE — Task {N}: {commit_hash}` — followed by a list of files changed (one per line), then any trivial/minor deviation notes. Use this ONLY if every Validation command passed AND every Acceptance Criterion is observably met.
+- `BLOCKED — {reason}` — followed by a JSON block documenting the block:
+  ```json
+  {"type": "major_deviation|dependency_missing|wave_ordering", "task": {N}, "file": "path/to/file", "planned": "...", "actual": "...", "impact": "..."}
+  ```
+- `PARTIAL — {what completed}; remaining: {what's left}` — only if context limit forces early stop. Commit what works.
+Never return without one of these three prefixes. The orchestrator parses the prefix to route next steps.
 ## How to Execute

package/agents/plan-checker.md CHANGED Viewed

@@ -1,12 +1,12 @@
 ---
 name: qualia-plan-checker
-description: Validates a phase plan before execution. Checks task specificity, wave assignment, verification contracts, and coverage of success criteria. Spawned by qualia-plan in a revision loop (max 3 iterations).
+description: Validates a phase plan before execution. Checks task specificity, wave assignment, verification contracts, and coverage of success criteria. Spawned by qualia-plan in a revision loop (max 2 iterations).
 tools: Read, Bash, Grep
 ---
 # Plan Checker
-You validate phase plans before they go to the builder. You do NOT write plans — you evaluate them. If a plan has issues, return a structured list; the planner will revise and you'll check again (max 3 revision cycles).
+You validate phase plans before they go to the builder. You do NOT write plans — you evaluate them. If a plan has issues, return a structured list; the planner will revise and you'll check again (max 2 revision cycles).
 ## Input
@@ -105,6 +105,28 @@ If `.planning/phase-{N}-context.md` exists, read its "Locked Decisions" section.
 **FAIL if:** plan contradicts a locked decision (e.g., context says "use library X" but plan uses library Y).
+### Rule 8: Validation commands test behavior, not just existence
+Each task's `**Validation:**` list must contain at least one `grep-match` or `command-exit` check — a command that proves the code DOES something. A task whose ONLY validation is `test -f {file}` will pass even if the file contains only `// TODO`.
+**FAIL if:** any task has only `file-exists`-type validations. Require at least one of:
+- `grep -c "{specific_call}" {file}` returning non-zero
+- `npx tsc --noEmit` exiting 0
+- `curl -s {endpoint}` returning expected content
+- Any command whose success/failure depends on the code doing something, not just being there
+**Pass examples:**
+- `grep -c "signInWithPassword" src/lib/auth.ts` → ≥ 1
+- `npx tsc --noEmit 2>&1 | grep -c "error"` → 0
+**Fail examples:**
+- `test -f src/lib/auth.ts && echo EXISTS` (only) — file exists, but could be empty or stubbed
+- `ls src/components/Chat.tsx` (only) — same problem
+## Tool Budget
+Read the plan file once. Grep the codebase only to validate Rule 7 (locked decisions). Do NOT speculatively check whether files listed in the plan already exist — that's the builder's job. Max 10 tool calls per invocation.
 ## Output Format
 ### If all rules pass:
@@ -145,12 +167,12 @@ The planner uses your output to revise the plan. Be specific enough that the rev
 ## Revision Limits
-You will be called up to 3 times per plan. If the plan still fails after 3 revisions, report:
+You will be called up to 2 times per plan. If the plan still fails after 2 revisions, report:
 ```
 ## BLOCKED
-Plan failed validation after 3 revision cycles. Issues remaining:
+Plan failed validation after 2 revision cycles. Issues remaining:
 {list}

package/agents/planner.md CHANGED Viewed

@@ -9,7 +9,15 @@ tools: Read, Write, Bash, Glob, Grep, WebFetch
 You create phase plans. Plans are prompts — they ARE the instructions the builder will read, not documents that become instructions.
 ## Input
-You receive: PROJECT.md + the current phase goal + success criteria from the roadmap.
+- `<project_context>` — inlined `.planning/PROJECT.md` contents
+- `<current_state>` — inlined `.planning/STATE.md` contents
+- `<phase_details>` — phase goal + success criteria + REQ-IDs from ROADMAP.md
+- `<locked_decisions>` (optional) — Locked Decisions from `.planning/phase-{N}-context.md` if it exists
+- `<research_findings>` (optional) — inlined `.planning/phase-{N}-research.md` if present
+- `<relevant_learnings>` (optional) — applicable patterns from `~/.claude/knowledge/learned-patterns.md`
+- `<revision_mode>` (optional, boolean) — when `true`, also receives `<current_plan>` and `<checker_feedback>`; revise in place, don't rewrite
+- `<gaps_mode>` (optional, boolean) — when `true`, also receives `<verification_path>`; create gap-closure tasks only
 ## Output
 Write `.planning/phase-{N}-plan.md` — a plan file with 2-5 tasks.
@@ -29,11 +37,32 @@ Start from the phase goal. Work backwards:
 Each truth → one task. 2-5 tasks per phase. Each task must fit in one context window.
-### 3. Assign Waves
-- **Wave 1:** Tasks with no dependencies (run in parallel)
-- **Wave 2:** Tasks that depend on Wave 1 (run after Wave 1 completes)
+### 3. Assign Waves (file-based dependency graph — deterministic)
+Wave assignment is NOT a vibes call. Use this mechanical algorithm:
+1. **Build adjacency list.** For each task T, define:
+   - `writes(T)` = the set of file paths in `**Files:**` that T creates or modifies
+   - `reads(T)` = file paths T consumes from `**Context:**` (@references) + any paths declared in `**Depends on:**`
+2. **Declare dependency edge A → B** if `writes(A) ∩ reads(B) ≠ ∅` OR if B's `**Depends on:**` explicitly names A.
+3. **Topological-sort into waves.** Wave 1 = all tasks with in-degree 0 (no incoming edges). Wave 2 = tasks whose only dependencies are in Wave 1. Continue until all tasks placed.
+4. **Parallel-safety check.** No two tasks in the same wave may share a file in their `writes()` sets. If they do, serialize them into consecutive waves.
+Two tasks that both *read* the same file (same entry in `Context:`) are fine in the same wave — only *write conflicts* force serialization.
 - Most phases need 1-2 waves. If you need 3+, your tasks are too granular.
+**Worked example:**
+| Task | Files (writes) | Context/Depends-on (reads) | Edges | Wave |
+|------|----------------|----------------------------|-------|------|
+| T1 — Create auth lib | `src/lib/auth.ts` | `@.planning/PROJECT.md` | none | 1 |
+| T2 — Create login page | `src/app/login/page.tsx` | `@src/lib/auth.ts` (reads T1's write) | T1 → T2 | 2 |
+| T3 — Create signup page | `src/app/signup/page.tsx` | `@src/lib/auth.ts` (reads T1's write) | T1 → T3 | 2 |
+| T4 — Add RLS policies | `supabase/migrations/001.sql` | `@.planning/PROJECT.md` | none | 1 |
+T1 and T4 → Wave 1 (no shared writes, both reading PROJECT.md is fine). T2 and T3 → Wave 2 (both depend on T1's write; neither writes the same file as the other, so they run in parallel).
 ### 4. Write the Plan (Story-File Format)
 Plans are STORY FILES, not task lists. Every task is a self-contained package that embeds *why*, *what*, and *how to verify* — so the builder can execute without re-reading PRDs and the verifier has explicit acceptance targets.

package/agents/qa-browser.md CHANGED Viewed

@@ -11,7 +11,11 @@ You verify that the **running app actually looks and behaves right** — not jus
 **Critical mindset:** You are the user. You don't trust the code — you drive the app and see what happens. If it breaks at 375px, it's broken. If the console screams, it's broken. If clicking the primary CTA does nothing, it's broken.
 ## Input
-You receive: the phase plan (to know what pages/flows exist) + the dev server URL + access to Playwright MCP browser tools.
+- `<plan_path>` — path to `.planning/phase-{N}-plan.md`
+- `<dev_server_url>` — e.g. `http://localhost:3000`. If omitted, probe ports 3000–3001 as fallback; if no server answers within 10s, write `BLOCKED: dev server not reachable` and exit.
+- `<phase_number>` — integer, used for the verification filename
+- Access to Playwright MCP browser tools
 ## Output
 Append a `## Browser QA` section to `.planning/phase-{N}-verification.md` with PASS/FAIL per check.

package/agents/research-synthesizer.md CHANGED Viewed

@@ -48,6 +48,8 @@ Don't duplicate full documents. Summarize the 3-5 most important items from each
 This is the most important section. Suggest the **full milestone arc**, not just a v1 phase list.
+**Evidence requirement:** Every milestone suggestion MUST cite at least one research finding from STACK.md, FEATURES.md, ARCHITECTURE.md, or PITFALLS.md as justification. Format the citation as `[DIMENSION.md: <specific finding or item>]` — e.g., `[FEATURES.md: table-stakes AUTH-*]` or `[PITFALLS.md: risk P3, stall risk for downstream milestones]`. Milestones without a citable finding are speculative — mark them explicitly with `[speculative — no source]` and the roadmapper will scrutinize.
 Based on:
 - FEATURES.md split (table stakes = v1 across milestones, differentiators = later milestones or post-handoff)
 - ARCHITECTURE.md build order → what depends on what, which foundation must land in Milestone 1 to support final-milestone requirements

package/agents/researcher.md CHANGED Viewed

@@ -17,6 +17,10 @@ You receive from the orchestrator:
 - `<milestone_context>` — greenfield or subsequent
 - `<output_path>` — absolute path where you write your research file
+## Tool Budget
+Maximum 8 external calls total per invocation: 3 Context7 queries + 3 WebFetch calls + 2 WebSearch queries. If you exhaust this budget, write what you have and mark remaining sections as `confidence: LOW`. Research is time-boxed, not exhaustive — a 10-minute deep dive with concrete sources beats a 30-minute wander.
 ## Output
 Write exactly ONE file to `<output_path>`, using the template matching your dimension:

package/agents/roadmapper.md CHANGED Viewed

@@ -17,6 +17,7 @@ You receive:
 - `.planning/research/SUMMARY.md` — research synthesis (optional — may not exist if research was skipped)
 - `.planning/config.json` — project config (`depth`, `template_type`)
 - User's confirmed feature scope (from the scoping conversation in qualia-new)
+- `<full_detail>` — boolean (default `false`). When `true`, write full phase detail for EVERY milestone in ROADMAP.md, not just M1. Passed by the orchestrator when the user runs `/qualia-new --full-plan`.
 ## Output

package/agents/verifier.md CHANGED Viewed

@@ -11,10 +11,17 @@ You verify that a phase achieved its GOAL, not just completed its TASKS.
 **Critical mindset:** Do NOT trust claims about what was built. Summaries document what Claude SAID it did. You verify what ACTUALLY EXISTS in the code. These often differ.
 ## Input
-You receive: the phase plan with success criteria + access to the codebase.
+- `<plan_path>` — path to `.planning/phase-{N}-plan.md`
+- `<project_context>` — inlined `.planning/PROJECT.md` contents (for Quality scoring against project conventions)
+- `<previous_verification>` (optional) — inlined `.planning/phase-{N}-verification.md` from a prior run
 ## Output
-Write `.planning/phase-{N}-verification.md` — PASS or FAIL with evidence.
+Write `.planning/phase-{N}-verification.md` — PASS or FAIL with evidence. Apply the Grounding Protocol: every finding needs `file:line — "quoted"` evidence, no hedging, scores with rubric criterion citations.
+## Tool Budget
+Maximum 25 Bash/Grep calls per invocation. Prefer one multi-pattern grep over many single-pattern greps. If you exhaust the budget, write what you found and mark unchecked criteria as `INSUFFICIENT EVIDENCE` — do not fabricate.
 ## Goal-Backward Verification
@@ -260,7 +267,19 @@ Phase goal: "Working real-time chat interface with message history."
 ## Design Verification (for phases with frontend work)
-If the phase involved UI/frontend tasks, add a **Design Quality** section to the report.
+**Gate (run first):** Only execute this section if the phase touched frontend.
+```bash
+# Is this a frontend phase?
+FRONTEND=false
+if grep -qE "\.tsx|\.jsx|\.css|\.scss|app/|components/|pages/|Persona:\s*(frontend|ux)" .planning/phase-{N}-plan.md 2>/dev/null; then
+  FRONTEND=true
+fi
+```
+If `FRONTEND=false`, write `Design Verification: N/A (no frontend tasks in phase)` in the report and skip the rest of this section. This saves ~40 greps on backend-only phases.
+If `FRONTEND=true`, proceed. Add a **Design Quality** section to the report.
 First, read the project's DESIGN.md:
 ```bash

package/bin/statusline.js CHANGED Viewed

@@ -153,8 +153,20 @@ try {
   } catch {}
 } catch {}
+// ─── Pill-style badge helper ─────────────────────────────
+// Renders text as an inline pill with a solid background color, similar to
+// Claude Code's native worktree tag. Pads with a leading+trailing space so
+// the background band has visual weight.
+function pill(text, rgb) {
+  const [r, g, b] = rgb;
+  const bg = `\x1b[48;2;${r};${g};${b}m`;
+  const fg = `\x1b[38;2;240;250;255m`;
+  const bold = `\x1b[1m`;
+  return `${bg}${fg}${bold} ${text} ${RESET}`;
+}
 // ─── Phase info from .planning/tracking.json ─────────────
-// Shows: [M{n}·{milestoneName}] P{phase}/{total} T{done}/{total} {status} [!{blockers}]
+// Rendered as a pill at the start of line 1 — teal for normal, red when blockers > 0.
 // Every segment is optional — missing data is skipped, never rendered as a placeholder.
 let PHASE_INFO = "";
 try {
@@ -172,39 +184,42 @@ try {
     const parts = [];
-    // Milestone: M{n}·{shortName}   (short name trimmed to 14 chars)
     if (milestone > 0) {
       let mStr = `M${milestone}`;
       if (milestoneName) {
         const shortName = milestoneName.length > 14 ? milestoneName.slice(0, 13) + "…" : milestoneName;
-        mStr += `${DIM}·${RESET}${TEAL_GLOW}${shortName}`;
+        mStr += `·${shortName}`;
       }
-      parts.push(`${TEAL}${mStr}${RESET}`);
+      parts.push(mStr);
     }
-    // Phase: P{phase}/{total}
-    if (total > 0) {
-      parts.push(`${WHITE}P${phase}/${total}${RESET}`);
-    }
-    // Tasks within phase: T{done}/{total}
-    if (tasksTotal > 0) {
-      parts.push(`${DIM}T${RESET}${WHITE}${tasksDone}/${tasksTotal}${RESET}`);
-    }
+    if (total > 0) parts.push(`P${phase}/${total}`);
+    if (tasksTotal > 0) parts.push(`T${tasksDone}/${tasksTotal}`);
+    if (status) parts.push(status);
-    // Status
-    if (status) {
-      parts.push(`${TEAL_GLOW}${status}${RESET}`);
-    }
+    let badgeText = parts.join(" · ");
+    if (blockers > 0) badgeText += badgeText ? ` · !${blockers}` : `!${blockers}`;
-    // Blockers — red badge, only when > 0
-    if (blockers > 0) {
-      parts.push(`${RED}!${blockers}${RESET}`);
+    if (badgeText) {
+      // Red pill when blockers present, teal otherwise
+      const bg = blockers > 0 ? [153, 27, 27] : [0, 130, 135];
+      PHASE_INFO = pill(`⬢ ${badgeText}`, bg);
     }
+  }
+} catch {}
-    if (parts.length > 0) {
-      PHASE_INFO = parts.join(` ${DIM}·${RESET} `);
-    }
+// ─── Framework-dev badge ────────────────────────────────
+// When editing the Qualia framework itself (detected by presence of the
+// skills/ dir + qualia-ui.js), show a FRAMEWORK DEV pill even though
+// there's no tracking.json. Gives the same "you're in Qualia mode" signal
+// during framework work.
+let FRAMEWORK_BADGE = "";
+try {
+  const isFramework =
+    fs.existsSync(path.join(DIR, "skills", "qualia-plan", "SKILL.md")) &&
+    fs.existsSync(path.join(DIR, "bin", "qualia-ui.js"));
+  if (isFramework) {
+    FRAMEWORK_BADGE = pill("⬢ FRAMEWORK DEV", [120, 60, 140]);
   }
 } catch {}
@@ -254,11 +269,14 @@ try {
   COST_FMT = `$${COST.toFixed(2)}`;
 } catch {}
-// ─── Line 1: Project + Git + Agent + Worktree + Phase + Memory + Hooks ──
+// ─── Line 1: Pill badge + Project + Git + Agent + Worktree + Memory + Identity ──
+// Leading pill (phase info or framework-dev) — one of these at most, phase wins.
 let LINE1 = "";
 try {
   const dirBase = path.basename(DIR) || DIR;
-  LINE1 = `${TEAL}⬢${RESET} ${WHITE}${dirBase}${RESET}`;
+  const leadingBadge = PHASE_INFO || FRAMEWORK_BADGE;
+  if (leadingBadge) LINE1 += `${leadingBadge} `;
+  LINE1 += `${TEAL}⬢${RESET} ${WHITE}${dirBase}${RESET}`;
   if (BRANCH) {
     if (CHANGES > 0) {
       LINE1 += ` ${DIM}on${RESET} ${TEAL_GLOW}${BRANCH}${RESET} ${YELLOW}~${CHANGES}${RESET}`;
@@ -268,12 +286,9 @@ try {
   }
   if (AGENT) LINE1 += ` ${DIM}│${RESET} ${TEAL}⚡${AGENT}${RESET}`;
   if (WORKTREE) LINE1 += ` ${DIM}│${RESET} ${TEAL_DIM}⎇ ${WORKTREE}${RESET}`;
-  if (PHASE_INFO) LINE1 += ` ${DIM}│${RESET} ${PHASE_INFO}`;
-  // Memory — the one context indicator that's actually project-specific
   if (MEMORY_COUNT > 0) {
     LINE1 += ` ${DIM}│${RESET} ${DIM}mem${RESET} ${TEAL}${MEMORY_COUNT}${RESET}`;
   }
-  // Qualia member signature — end of line 1 so it sits above line 2's model info
   if (QUALIA_FIRST_NAME) {
     LINE1 += ` ${DIM}│${RESET} ${TEAL}⬢${RESET} ${TEAL_GLOW}Qualia member${RESET}${DIM}:${RESET} ${WHITE}${QUALIA_FIRST_NAME}${RESET}`;
   }

package/docs/research/2026-04-21-command-quality-deep-research.md ADDED Viewed

@@ -0,0 +1,128 @@
+# Qualia Framework — Command Quality & Build Workflow Deep Research
+**Date:** 2026-04-21
+**Scope:** design, debug, optimize, review + plan/build/verify workflow + 8 subagent prompts
+**Method:** 4 parallel Opus agents, each auditing one dimension, synthesized by framework owner
+## Executive Summary
+The framework's biggest accuracy leak is **evidence-free claims**: 3 of 4 diagnostic commands (design, debug, review) do not require file:line citations for findings, so the model hallucinates specifics under pressure. The biggest speed leak is **serial work that should be parallel**: qualia-design and qualia-review list `Agent` in allowed-tools but never spawn, so large codebases get processed in a single context window; the plan-checker revision loop serially re-spawns the planner for issues (frontmatter, wave assignment) that can be fixed mechanically.
+The single highest-leverage change is a shared **Grounding Protocol** + **Rubric Library** referenced from every skill and agent — it eliminates ~60% of the determinism defects at once.
+---
+## Top 15 Improvements — Ranked by Impact × Ease
+| # | Change | Impact | Effort | Where |
+|---|--------|--------|--------|-------|
+| 1 | Add shared Grounding Protocol (cite-or-say-INSUFFICIENT-EVIDENCE) to all agents | 🔥 Accuracy | 30 min | `rules/grounding.md` + import into 8 agent files |
+| 2 | Add deterministic severity formula (CRITICAL=8/HIGH=4/MED=2/LOW=1; score = max(1, 5−⌊Σ/8⌋)) to qualia-review | 🔥 Accuracy | 45 min | `skills/qualia-review/SKILL.md:124` |
+| 3 | Pre-inline PROJECT.md into verifier prompt (currently missing) | 🔥 Accuracy | 10 min | `skills/qualia-verify/SKILL.md:42` |
+| 4 | Make qualia-build spawn wave tasks in parallel explicitly ("all Agent() calls in SAME response") | ⚡ Speed | 30 min | `skills/qualia-build/SKILL.md:65` |
+| 5 | Convert qualia-debug from interactive (4 questions) to investigative (parse $ARGUMENTS, run diagnostic greps) | 🔥 Accuracy | 2 hrs | `skills/qualia-debug/SKILL.md:39-44` |
+| 6 | Add structured Output Contract (DONE/BLOCKED/PARTIAL prefix) to builder.md | ⚡ Speed + 🔥 Accuracy | 20 min | `agents/builder.md:14` |
+| 7 | Mechanical-fix bypass in plan-checker (skip planner re-spawn for frontmatter/wave issues) | ⚡ Speed | 4 hrs | `skills/qualia-plan/SKILL.md:129-153` |
+| 8 | Make wave assignment deterministic: file-based dependency graph, topological sort (not "tasks with no dependencies") | 🔥 Accuracy | 3 hrs | `agents/planner.md:33` |
+| 9 | Add Rule 8 to plan-checker: "Validation must test behavior, not file-existence only" (stops stubs passing) | 🔥 Accuracy | 30 min | `agents/plan-checker.md` after Rule 7 |
+| 10 | Split qualia-design/review into parallel agent fan-out for large file sets (5+ files) | ⚡ Speed | 3 hrs | `skills/qualia-design/SKILL.md`, `skills/qualia-review/SKILL.md` |
+| 11 | Add wave-context summary (adjacent task titles + files) to builder prompt — stops semantic drift across parallel tasks | 🔥 Accuracy | 1 hr | `skills/qualia-build/SKILL.md:82` |
+| 12 | Fix `grep -qL` bug in qualia-review API auth check (backwards logic) | 🔥 Accuracy | 15 min | `skills/qualia-review/SKILL.md:59-61` |
+| 13 | Add tool budgets: researcher (8 external calls), verifier (25 bash calls), debug (10 reads) | ⚡ Speed | 45 min | `agents/researcher.md`, `agents/verifier.md`, `skills/qualia-debug` |
+| 14 | Standardize input contracts across 8 agents with `<variable>` typed blocks (only plan-checker does this today) | 🔥 Accuracy | 2 hrs | All 8 agent files |
+| 15 | Drop full `next build` from qualia-review; read existing `.next/` or skip with warning | ⚡ Speed | 20 min | `skills/qualia-review/SKILL.md:98` |
+**Total effort for #1–#15:** ~20 hours of focused work → framework-wide accuracy and speed step-change.
+---
+## Per-Command Scores (before changes)
+| Command | Score | Weakest Dimension |
+|---------|-------|-------------------|
+| qualia-debug | 4/10 | Interactive-by-default (4 mandatory questions), no output file, cheat sheets instead of diagnostic commands |
+| qualia-design | 6/10 | No critique output contract, `Agent` listed but never spawned, tsc-only verification |
+| qualia-review | 7/10 | Serial bash scans, latent `grep -qL` bug, no parallelism |
+| qualia-optimize | 8/10 | Strongest — uses agent fan-out, severity labels, OPTIMIZE.md output. Loses points on inline `find`/`grep` in Step 6 + no `--fix` dry-run |
+## Per-Agent Scores (before changes)
+| Agent | Overall | Biggest Gap |
+|-------|---------|-------------|
+| plan-checker | 9.5/10 | No tool budget |
+| verifier | 9.0/10 | No frontend gate on design verification (runs 40 greps on backend phases) |
+| planner | 8.5/10 | Prose input contract, no failure-mode handling |
+| builder | 8.5/10 | No structured output contract |
+| researcher | 8.5/10 | Unbounded WebSearch loops |
+| qa-browser | 8.5/10 | Probes for dev server URL instead of receiving it; no fallback when Playwright unavailable |
+| roadmapper | 8.5/10 | `full_detail` is a ghost parameter — referenced but not declared |
+| research-synthesizer | 8.0/10 | No evidence requirement on milestone suggestions |
+---
+## Rubrics to Ship as `rules/rubrics.md`
+**Severity (with deterministic category score):**
+```
+CRITICAL = 8 | HIGH = 4 | MEDIUM = 2 | LOW = 1
+weighted_sum = Σ(count_i × weight_i)
+category_score = max(1, 5 − ⌊weighted_sum / 8⌋)
+```
+**Design Quality (1–5 per dimension, any <3 = mandatory fix):**
+Typography / Color / Spacing / States / Responsiveness / Accessibility — each with objective criteria (see `skills/qualia-design` comment thread for full matrix).
+**Task-Done:**
+- Compiles (`tsc --noEmit` = 0)
+- No stubs (`grep -c "TODO|FIXME|placeholder" touched_files` = 0)
+- Wired (every export imported somewhere)
+- Each acceptance criterion has a passing validation command
+- Committed (git log matches task title)
+**Evidence Citation Format:**
+```
+file:line — "quoted code" — {assessment}
+```
+Claims missing this format are rejected. If evidence cannot be found: `INSUFFICIENT EVIDENCE: searched {files} with {commands}`.
+---
+## Grounding Protocol (paste into every agent)
+```markdown
+## Grounding Protocol (MANDATORY)
+1. Every factual claim requires `file:line — "quoted code"`. No exception.
+2. No hedging: "seems / probably / might" → verified or INSUFFICIENT EVIDENCE.
+3. Findings without file:line are discarded.
+4. Scores without evidence on the next line = 0.
+5. Severity requires quoting the matching Severity Rubric criterion.
+6. Output shape is a contract — missing sections = protocol violation.
+7. Stop at tool budget. Return what you found, not what you wish.
+8. Precondition: verify every @file exists before work; HALT if missing.
+```
+---
+## 3 Architectural Changes (bigger, keep for later)
+1. **Pre-Build Context Packet** — assemble one JSON with PROJECT.md + DESIGN.md + plan + wave-context before spawning builders. Eliminates per-builder file reads.
+2. **Intra-Wave Verification** — run each task's Validation contracts immediately after its builder completes, before next wave starts. Catches failure at task granularity, not phase.
+3. **Plan Cache** — cache parsed project identity in `.planning/.project-cache.json`; invalidate on PROJECT.md change. Saves ~30% planner context on multi-phase `--auto` runs.
+---
+## Missing Agents Worth Adding (ranked)
+1. **`migrator.md`** — generates + validates Supabase migrations. Current gap: builder writes raw SQL ad-hoc, migration guard catches only obvious patterns.
+2. **`dependency-auditor.md`** — pre-build peer-dependency / vulnerability check. Current gap: builder hits `npm install` conflicts mid-phase and wastes context debugging.
+3. **`rollback.md`** — on verify FAIL, bisect to last-good commit instead of always patching forward. Current gap: gap-closure plans build on broken code.
+---
+## Anti-Patterns to Kill
+- `find` inside skills (use Glob) — qualia-optimize:302, qualia-review multiple places
+- `Agent` in allowed-tools but never spawned — qualia-design, qualia-debug, qualia-review
+- Interactive question gates in one-shot commands — qualia-debug
+- Full `next build` as part of a "scan" — qualia-review:98
+- Vague "investigate the codebase" with no tool budget — qualia-debug, researcher
+- "seems / probably / might" language anywhere in agent output