npm - qualia-framework - Versions diffs - 6.9.2 → 6.22.0 - Mend

qualia-framework 6.9.2 → 6.22.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (64)

package/AGENTS.md +8 -5
package/CHANGELOG.md +208 -0
package/CLAUDE.md +3 -1
package/agents/roadmapper.md +16 -14
package/agents/verifier.md +1 -1
package/bin/agent-status.js +264 -0
package/bin/analyze-gate.js +318 -0
package/bin/branch-hygiene.js +135 -0
package/bin/command-surface.js +2 -0
package/bin/compile-instructions.js +82 -0
package/bin/eval-runner.js +218 -0
package/bin/host-adapters.js +72 -12
package/bin/install.js +27 -17
package/bin/last-report.js +207 -0
package/bin/project-sync.js +315 -0
package/bin/report-payload.js +7 -0
package/bin/runtime-manifest.js +8 -0
package/bin/state.js +257 -12
package/bin/verify-panel.js +294 -0
package/bin/wave-plan.js +211 -0
package/docs/EMPLOYEE-QUICKSTART.md +3 -3
package/docs/erp-contract.md +168 -0
package/docs/qualia-manual.html +5 -5
package/hooks/branch-guard.js +133 -63
package/hooks/pre-deploy-gate.js +38 -0
package/hooks/task-write-guard.js +165 -0
package/package.json +3 -2
package/rules/codex-goal.md +28 -26
package/rules/infrastructure.md +1 -1
package/skills/qualia/SKILL.md +6 -0
package/skills/qualia-build/SKILL.md +39 -7
package/skills/qualia-eval/SKILL.md +83 -0
package/skills/qualia-feature/SKILL.md +20 -4
package/skills/qualia-fix/SKILL.md +13 -1
package/skills/qualia-milestone/SKILL.md +12 -6
package/skills/qualia-new/REFERENCE.md +6 -4
package/skills/qualia-new/SKILL.md +27 -15
package/skills/qualia-plan/SKILL.md +2 -2
package/skills/qualia-report/SKILL.md +10 -0
package/skills/qualia-scope/SKILL.md +3 -3
package/skills/qualia-ship/SKILL.md +37 -4
package/skills/qualia-update/SKILL.md +100 -0
package/skills/qualia-verify/SKILL.md +51 -24
package/templates/instructions.md +32 -0
package/templates/journey.md +2 -2
package/templates/project-discovery.md +30 -23
package/templates/requirements.md +7 -7
package/tests/agent-status.test.sh +153 -0
package/tests/analyze-gate.test.sh +170 -0
package/tests/bin.test.sh +5 -4
package/tests/branch-hygiene.test.sh +93 -0
package/tests/eval-runner.test.sh +147 -0
package/tests/hooks.test.sh +218 -17
package/tests/install-smoke.test.sh +4 -3
package/tests/instructions.test.sh +109 -0
package/tests/last-report.test.sh +156 -0
package/tests/lib.test.sh +2 -2
package/tests/project-sync.test.sh +175 -0
package/tests/run-all.sh +9 -0
package/tests/runner.js +3 -2
package/tests/state.test.sh +187 -0
package/tests/verify-panel.test.sh +162 -0
package/tests/wave-plan.test.sh +153 -0
package/skills/qualia-discuss/SKILL.md +0 -222

package/hooks/task-write-guard.js ADDED

@@ -0,0 +1,165 @@
+#!/usr/bin/env node
+// ~/.claude/hooks/task-write-guard.js — runtime enforcement of the plan
+// contract's declared file sets. PreToolUse hook on Edit/Write.
+// Exits 2 to BLOCK. Exits 0 to allow. Cross-platform (Windows/macOS/Linux).
+//
+// WHY: plan-contract.js proves file-disjointness across parallel tasks at PLAN
+// time, but nothing stops a builder writing outside its declared set at RUN
+// time — the documented #1 cause of cross-wave merge conflicts and AI entropy
+// (files nobody planned). This turns the static check into a deterministic
+// guardrail ("a rule worth enforcing is worth a hook" — constitution).
+//
+// SCOPE & HONEST LIMITATION: Claude Code fires the same stateless hook for
+// every subagent and gives it no task identity, so this hook cannot attribute a
+// write to a *specific* task. What it CAN enforce — and does — is that, while a
+// build is in flight, every Edit/Write targets a path DECLARED by SOME task in
+// the active phase contract (files_modify ∪ files_create). Plan-time
+// disjointness already guarantees no two tasks share a path, and the builder's
+// <wave_context> prompt tells it which set is its own; so the residual gap
+// ("T3 edits T4's declared file") is prompt-guarded while the high-frequency
+// vector ("builder invents/edits a file nobody planned") is hard-blocked.
+//
+// The guard is SCOPED: it is a no-op unless a build is active (≥1 RUNNING entry
+// in .agent-status/). Outside a build it never fires, so it can't interfere with
+// the orchestrator, the verifier, or ordinary editing. Fails OPEN on any error.
+const fs = require("fs");
+const path = require("path");
+const _traceStart = Date.now();
+// ── stdin reader (same robust pattern as the other guards) ──────────────
+function sleepSync(ms) {
+  try { Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms); } catch {}
+}
+function readInput() {
+  const deadline = Date.now() + 1000;
+  const buf = Buffer.alloc(65536);
+  let data = "";
+  try {
+    while (Date.now() < deadline) {
+      let n = 0;
+      try {
+        n = fs.readSync(0, buf, 0, buf.length, null);
+      } catch (e) {
+        if (e && (e.code === "EAGAIN" || e.code === "EWOULDBLOCK")) { sleepSync(1); continue; }
+        break;
+      }
+      if (n === 0) break;
+      data += buf.slice(0, n).toString("utf8");
+    }
+    if (!data) return {};
+    return JSON.parse(data);
+  } catch {
+    return {};
+  }
+}
+function _trace(result, extra) {
+  try {
+    const os = require("os");
+    const parent = path.basename(path.dirname(__dirname));
+    const qualiaHome = process.env.QUALIA_HOME ||
+      (parent === ".codex" || parent === ".claude" ? path.dirname(__dirname) : path.join(os.homedir(), ".claude"));
+    const traceDir = path.join(qualiaHome, ".qualia-traces");
+    if (!fs.existsSync(traceDir)) fs.mkdirSync(traceDir, { recursive: true });
+    const entry = { hook: "task-write-guard", result, timestamp: new Date().toISOString(), duration_ms: Date.now() - _traceStart, ...extra };
+    fs.appendFileSync(path.join(traceDir, `${new Date().toISOString().split("T")[0]}.jsonl`), JSON.stringify(entry) + "\n");
+  } catch {}
+}
+function allow(reason, extra) { _trace("allow", { reason, ...extra }); process.exit(0); }
+// OWNER / debugging escape hatch, mirroring git-guardrails' QUALIA_ALLOW_*.
+if (process.env.QUALIA_ALLOW_OUTSIDE_CONTRACT === "1") allow("escape-hatch");
+const input = readInput();
+const ti = input.tool_input || {};
+const rawPath = String(ti.file_path || "");
+if (!rawPath) allow("no file_path");
+const root = process.cwd();
+// Reuse the status + contract libraries that ship alongside this hook (bin/ is a
+// sibling of hooks/ in both the repo and the installed layout). If they're not
+// resolvable (older/partial install), fail open.
+let agentStatus, planContract;
+try {
+  agentStatus = require(path.join(__dirname, "..", "bin", "agent-status.js"));
+  planContract = require(path.join(__dirname, "..", "bin", "plan-contract.js"));
+} catch {
+  allow("libs unavailable");
+}
+// SCOPE: only enforce during an active build (≥1 RUNNING agent-status entry).
+let running;
+try {
+  running = agentStatus.listStatuses(root).filter((s) => s.status === "RUNNING");
+} catch {
+  allow("status unreadable");
+}
+if (!running || running.length === 0) allow("no active build");
+// Locate the active phase contract. Prefer the phase declared by a RUNNING
+// builder; fall back to the sole phase-*-contract.json if unambiguous.
+function findContractPath() {
+  const phases = [...new Set(running.map((s) => s.phase).filter((p) => p != null))];
+  for (const p of phases) {
+    const cp = path.join(root, ".planning", `phase-${p}-contract.json`);
+    if (fs.existsSync(cp)) return cp;
+  }
+  try {
+    const dir = path.join(root, ".planning");
+    const matches = fs.readdirSync(dir).filter((f) => /^phase-\d+-contract\.json$/.test(f));
+    if (matches.length === 1) return path.join(dir, matches[0]);
+  } catch {}
+  return null;
+}
+const contractPath = findContractPath();
+if (!contractPath) allow("no active contract");
+let contract;
+try {
+  const loaded = planContract.readContractFile(contractPath);
+  if (!loaded.ok) allow("contract unreadable");
+  contract = loaded.contract;
+} catch {
+  allow("contract parse error");
+}
+// Build the union of writable declared paths across all tasks.
+// Edit/Write create or modify; deletes are out of band for this tool family.
+function norm(p) {
+  return String(p).replace(/\\/g, "/").replace(/^\.\//, "");
+}
+const declared = new Set();
+for (const t of contract.tasks || []) {
+  for (const f of t.files_modify || []) declared.add(norm(f));
+  for (const f of t.files_create || []) declared.add(norm(f));
+}
+// Resolve the target to a path relative to the project root.
+const abs = path.isAbsolute(rawPath) ? rawPath : path.resolve(root, rawPath);
+const rel = norm(path.relative(root, abs));
+// Out of project root → not this guard's concern (other guards handle secrets).
+if (rel.startsWith("../") || rel === "" || path.isAbsolute(rel)) allow("outside project root", { rel });
+// Framework scratch / planning artifacts are always writable during a build:
+// the status protocol, evidence, deviations, plan and contract files.
+if (rel.startsWith(".agent-status/") || rel.startsWith(".planning/")) allow("framework path", { rel });
+if (declared.has(rel)) allow("declared", { rel });
+// Not declared by any task → block.
+console.error("⬢ task-write-guard — write outside the plan contract:");
+console.error(`  ✗ ${rel}`);
+console.error("");
+console.error(`  No task in ${path.relative(root, contractPath)} declares this file`);
+console.error("  (files_modify / files_create). Builders may only write files");
+console.error("  their task planned. If this file is genuinely needed, add it to");
+console.error("  the contract via the locked-decision channel, or re-plan the phase.");
+console.error("  OWNER override: QUALIA_ALLOW_OUTSIDE_CONTRACT=1");
+_trace("block", { rel, contract: path.relative(root, contractPath) });
+process.exit(2);
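
Without stdin parsing, tracing, and the fail-open scaffolding, the hook's allow/block decision is a pure function of three inputs: the project root, the target path, and the contract's tasks. The sketch below reproduces that decision order under a few assumptions. `decideWrite` is a hypothetical name (the hook exports nothing). Paths are treated as POSIX via `path.posix`. `tasks` is assumed to carry the contract's `files_modify` / `files_create` arrays.

```javascript
const path = require("path").posix; // POSIX paths assumed for this sketch

// Same normalization as the hook's norm(): forward slashes, no leading "./".
function norm(p) {
  return String(p).replace(/\\/g, "/").replace(/^\.\//, "");
}

// Hypothetical: decide whether one Edit/Write target is allowed during a build.
function decideWrite(root, rawPath, tasks) {
  if (!rawPath) return { allowed: true, reason: "no file_path" };

  // Union of every task's declared writable paths.
  const declared = new Set();
  for (const t of tasks || []) {
    for (const f of t.files_modify || []) declared.add(norm(f));
    for (const f of t.files_create || []) declared.add(norm(f));
  }

  // Resolve the target to a path relative to the project root.
  const abs = path.isAbsolute(rawPath) ? rawPath : path.resolve(root, rawPath);
  const rel = norm(path.relative(root, abs));

  // Outside the project: left to other guards.
  if (rel.startsWith("../") || rel === "" || path.isAbsolute(rel)) {
    return { allowed: true, reason: "outside project root" };
  }
  // Framework scratch and planning artifacts stay writable.
  if (rel.startsWith(".agent-status/") || rel.startsWith(".planning/")) {
    return { allowed: true, reason: "framework path" };
  }
  if (declared.has(rel)) return { allowed: true, reason: "declared" };
  return { allowed: false, reason: "undeclared" };
}
```

Suppose a contract declares only `src/a.ts` and `src/b.ts`. A write to `/repo/src/c.ts` then returns `{ allowed: false, reason: "undeclared" }`. In the hook, that is the branch that prints the block message and exits 2.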

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "qualia-framework",
-  "version": "6.9.2",
+  "version": "6.22.0",
   "description": "Claude Code and Codex workflow framework by Qualia Solutions. Plan, build, verify, ship.",
   "bin": {
     "qualia-framework": "./bin/cli.js"
@@ -33,7 +33,8 @@
     "test:statusline": "bash tests/statusline.test.sh",
     "test:refs": "bash tests/refs.test.sh",
     "test:published-install": "bash tests/published-install-smoke.test.sh",
-    "test:shell": "bash tests/run-all.sh"
+    "test:shell": "bash tests/run-all.sh",
+    "compile:instructions": "node bin/compile-instructions.js"
   },
   "files": [
     "bin/",

package/rules/codex-goal.md CHANGED

@@ -1,46 +1,48 @@
-# Codex /goal integration
+# Work-unit goal (both runtimes)
-When this skill spawns a unit of work on **Codex** (not Claude Code), set the thread goal at the start so Codex's native token-budget + status tracking takes over.
+When a skill begins a defined **unit of work** (a phase build, a feature, a milestone, a fix), set an explicit goal — an objective + a token budget — so the session tracks burn-vs-budget and stays anchored to one outcome. Both runtimes get this; the *surface* differs.
-## Runtime detection
-You are on Codex when `~/.codex/` exists and `~/.claude/` is absent or stale. The simplest probe:
+The objective + budget come from one shared helper, regardless of runtime:
 ```bash
-test -f ~/.codex/AGENTS.md && echo codex || echo claude
+node ${QUALIA_BIN}/codex-goal.js {scope}    # scope ∈ phase · task · feature · quick
 ```
-If the answer is `claude`, **skip this entire rule** — Claude Code has no equivalent surface and emitting `/goal` text would just be noise.
+It prints two lines from `.planning/STATE.md` + `ROADMAP.md`:
+```
+/goal {objective text}
+# token_budget suggestion: {N}
+```
-## How to set the goal
+## Runtime detection
+```bash
+test -f ~/.codex/AGENTS.md && [ ! -d ~/.claude ] && echo codex || echo claude
+```
-1. Run the helper to produce the objective string + suggested token budget:
+## Codex — native `/goal`
-   ```bash
-   node ~/.codex/bin/codex-goal.js {scope}
-   ```
+Codex has a first-class goal surface (`thread_goals`: objective, token_budget, tokens_used, status).
-   `{scope}` is one of: `phase` · `task` · `feature` · `quick`. Use the scope of the current skill.
+1. **If the `update_goal` tool is available** (Codex exposes it as a model-callable tool), call it with `objective` = the text after `/goal ` and `token_budget` = the integer suggestion.
+2. **Otherwise** surface the `/goal` line for the user to paste. Don't silently skip — it's a one-second set and the only way Codex's budget telemetry knows what to track.
-2. The output is two lines:
+## Claude Code — equivalent via the harness work-list + budget
-   ```
-   /goal {objective text from STATE.md + ROADMAP.md}
-   # token_budget suggestion: {N}
-   ```
+Claude Code has no `/goal` table, but it has a native equivalent: the **session task-list** (the model's todo/task tool) and the turn **token budget**. Use them so the work unit is just as anchored and visible:
-3. **If the `update_goal` tool is available** to you (Codex exposes it as a model-callable tool), call it directly with:
-   - `objective` = the text after `/goal ` on line 1
-   - `token_budget` = the integer suggestion on line 2
+1. **Create a tracked task** for the unit with the objective as its title (e.g. *"Phase 3 — checkout + Stripe webhook"*). Mark it `in_progress` at start, `completed` at end. This is the Claude-side "active goal" — it shows in the UI and survives compaction.
+2. **Treat `token_budget` as the unit's context budget.** State it in the opening line (banner) — *"Goal: {objective} · budget ~{N} tok"* — so the operator and the model both see how much room the unit has. If a `+Nk` turn directive is set, prefer that.
+3. For a multi-wave phase, the per-task `.agent-status/` entries (see `/qualia-build`) are the sub-goals under this one.
-4. **If `update_goal` is not available**, surface the `/goal` line to the user in your next message and let them paste it. Do not silently skip — the goal-set takes 1 second and is the only way Codex's budget telemetry knows what to track.
+Either way the rule is the same: **one named objective + one budget per work unit, surfaced, not silent.**
 ## When NOT to set a goal
-- The user is on Claude Code (no `/goal` surface).
-- A goal is already active for this thread (Codex rejects `update_goal` when one exists — call `thread/goal/get` first if you're using the tool API directly).
-- The work is open-ended exploration with no clear objective (e.g. `/qualia`, `/qualia-scope`). Goals are for executing a defined scope.
+- A goal/task is already active for this unit (don't double-set; Codex rejects `update_goal` when one exists — check first).
+- Open-ended exploration with no defined scope (`/qualia`, `/qualia-scope` PROJECT MODE, `/qualia-idk`). Goals are for *executing* a defined scope, not discovering one.
 ## Why
-Codex's `thread_goals` table tracks `objective`, `token_budget`, `tokens_used`, and a `status` enum (`active | paused | blocked | usage_limited | budget_limited | complete`). Setting the goal lets the user see burn-vs-budget in the TUI without the framework reinventing it. The token-budget number also makes the model self-aware of how much context it has left for the current unit of work.
+A named objective + budget keeps a unit of work from sprawling: the model stays self-aware of how much context remains, the operator sees burn-vs-budget, and the unit has a single definition of done. On Codex this rides `thread_goals`; on Claude Code it rides the task-list + turn budget. Same discipline, native surface on each.
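
The helper's two-line output maps directly onto both surfaces. On Codex it supplies the `objective` / `token_budget` pair for `update_goal`. On Claude Code it supplies the banner line. Below is a minimal parsing sketch that assumes the output format shown in the rule. `parseGoalOutput` and `claudeBanner` are hypothetical names, and returning `null` for a missing budget is an assumption.

```javascript
// Hypothetical parser for the two-line output of codex-goal.js:
//   /goal {objective text}
//   # token_budget suggestion: {N}
function parseGoalOutput(text) {
  const lines = String(text).split(/\r?\n/);
  const goalLine = lines.find((l) => l.startsWith("/goal "));
  if (!goalLine) return null; // no objective: nothing to set
  const objective = goalLine.slice("/goal ".length).trim();

  // A missing or malformed budget line yields token_budget = null.
  const budgetLine = lines.find((l) => l.startsWith("# token_budget suggestion:"));
  const m = budgetLine ? budgetLine.match(/(\d+)\s*$/) : null;
  const token_budget = m ? Number.parseInt(m[1], 10) : null;
  return { objective, token_budget };
}

// Claude Code side: the opening banner line the rule asks for.
function claudeBanner({ objective, token_budget }) {
  const budget = token_budget == null ? "unknown" : `~${token_budget} tok`;
  return `Goal: ${objective} · budget ${budget}`;
}
```

The same parsed object feeds both runtimes, which fits the rule's "one named objective + one budget per work unit" regardless of surface.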

package/rules/infrastructure.md CHANGED

@@ -49,7 +49,7 @@ Standard services across all Qualia projects. Use these unless the project expli
 - **QualiasolutionsCY** — primary org for all Qualia Solutions projects
 - **SakaniQualia** — org for Sakani-related projects (real estate platform)
 - All repos are private by default
-- Branch protection: main/master require PR reviews (enforced by framework guards)
+- Main integration: feature branches integrate to `main` at **`/qualia-ship`** (ship is the single merge point — it fast-forwards the branch into `main`, deploys from `main`, and deletes the branch). Pushes to `main` are **allowed and recorded** by `branch-guard` (per-employee tally → ERP) — accountability, not a hard block. `/qualia-report` sweeps for branches with unshipped commits + stale PRs at clock-out so nothing lingers. Keep GitHub branch protection on `main` OFF (or with the team allowed to push) for this model; if you re-enable required reviews, switch ship to an auto-merged PR instead.
 ## Vercel Teams (admin knowledge)
 - Qualia operates across **3 Vercel teams** — projects are distributed across them

package/skills/qualia/SKILL.md CHANGED

@@ -33,6 +33,12 @@ ls .planning/phase-*-plan.md 2>/dev/null || echo "NO_PLANS"
 ls .planning/phase-*-verification.md 2>/dev/null || echo "NO_VERIFICATIONS"
 ```
+And surface where work was left off last time — the richest "where we left off" signal lives in `.planning/reports/`:
+```bash
+node ${QUALIA_BIN}/last-report.js 2>/dev/null
+```
+Exit 0 → it prints a one-line digest of the newest session report (`Last session ({date}, {age}d ago): {summary} → next: {next}`). Exit 1 → no reports yet (nothing to surface). When a project is loaded and a digest exists, print that line **at the very TOP of your output**, before the banner — so the first thing the operator (or a teammate picking the project up) sees is exactly where the last session ended.
 Read conversation context — what has the user been doing, what errors occurred.
 ### 2. Classify and Route
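
`last-report.js` is described as printing a one-line digest of the newest session report, or exiting 1 when no reports exist. The sketch below covers only the digest formatting. It assumes reports have already been parsed into `{ date, summary, next }` objects with ISO `YYYY-MM-DD` dates. Both that shape and the name `lastReportDigest` are hypothetical.

```javascript
// Hypothetical digest formatter; the report shape { date, summary, next } is assumed.
function lastReportDigest(reports, now = new Date()) {
  if (!reports || reports.length === 0) return null; // caller would exit 1: nothing to surface

  // ISO YYYY-MM-DD strings order chronologically, so the largest string is the newest.
  const newest = [...reports].sort((a, b) => (a.date < b.date ? 1 : a.date > b.date ? -1 : 0))[0];

  // Whole days elapsed since the report date (parsed as UTC midnight).
  const ageDays = Math.floor((now.getTime() - Date.parse(newest.date)) / 86400000);
  return `Last session (${newest.date}, ${ageDays}d ago): ${newest.summary} → next: ${newest.next}`;
}
```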

package/skills/qualia-build/SKILL.md CHANGED

@@ -21,12 +21,13 @@ Execute phase plan. Each task = fresh subagent. Independent tasks run parallel.
 `/qualia-build` — build current planned phase
 `/qualia-build {N}` — build specific phase
 `/qualia-build {N} --auto` — build + chain into `/qualia-verify {N} --auto` (no human gate)
+`/qualia-build {N} --parallel K` — cap concurrent builders at K (default auto: sequential under 3 tasks, else up to 5)
 ## Process
-### 0. Codex goal (Codex runtime only)
+### 0. Set the work-unit goal
-Per `rules/codex-goal.md` — set the thread goal at phase start with scope `phase`.
+Per `rules/codex-goal.md` — set the work-unit goal at phase start with scope `phase` (Codex `/goal`; on Claude Code, a tracked task + budget in the banner). One named objective + budget for the whole build.
 ### 1. Load Plan
@@ -38,6 +39,20 @@ node ${QUALIA_BIN}/plan-contract.js validate .planning/phase-{N}-contract.json
 Parse tasks, waves, file refs. Prefer the JSON contract for task ids, dependencies, file lists, and verification checks; use the Markdown plan as the human-readable context.
+### 1a. Analyze Gate (scope ↔ plan, before any build)
+`plan-contract.js` proves the contract is internally well-formed; this gate diffs it **against intent** — scope acceptance criteria (`phase-{N}-context.md`) + the CONTEXT.md glossary — to catch requirements the plan silently dropped or contradicted. This is the plan→build seam Spec-Kit calls `/analyze`.
+```bash
+node ${QUALIA_BIN}/analyze-gate.js {N}
+```
+Exit 0 → consistent, proceed. Non-zero → it lists under-covered scope criteria, orphan success criteria, glossary violations, and scope-reduction language. **Profile-aware** (the `profile` field from `state.js check`):
+- **strict** → a HIGH finding is a stop. Route to `/qualia-plan {N} --gaps` (plan dropped a requirement) or `/qualia-scope {N}` (scope itself is wrong). Do not build.
+- **standard** → surface findings to the operator and proceed only with an explicit ack; log the waiver reason to `.planning/decisions/` if you proceed past a HIGH.
+(No scope file = scope-coverage check is skipped, not a failure — `/qualia-feature` trivia and scope-less phases still build.)
 ### 1b. Recovery Reference
 Tag HEAD before executing. Reference only, no auto-rollback.
@@ -62,13 +77,15 @@ git diff --stat
 node ${QUALIA_BIN}/qualia-ui.js banner build {N} "{phase name}"
 ```
-**For each wave (sequential):**
+**Derive the build schedule from the dependency graph (don't trust hand-numbered waves, don't over-spawn):**
 ```bash
-node ${QUALIA_BIN}/qualia-ui.js wave {W} {total_waves} {tasks_in_wave}
+node ${QUALIA_BIN}/wave-plan.js .planning/phase-{N}-contract.json {--parallel K if set} --json
 ```
-**Per task in wave: spawn ALL as separate `Agent()` calls in SAME turn (concurrent). Do NOT await one before spawning next.**
+`wave-plan.js` recomputes minimal-depth waves from `depends_on` (maximal safe parallelism) and splits each into **batches capped at `max_concurrency`** (auto: 1 if <3 tasks, else 5; `--parallel K` overrides). Spawn **one batch at a time, in order** — every task in a batch is dependency-free of its batch-mates, so they run concurrently; the next batch waits for the fan-in barrier (§ after each wave). Follow the emitted `batches[]`, not the raw contract `wave` numbers.
+**Per batch: spawn ALL its tasks as separate `Agent()` calls in the SAME turn (concurrent). Do NOT await one before spawning the next.**
 ```bash
 node ${QUALIA_BIN}/qualia-ui.js task {task_num} "{task title}"
@@ -117,7 +134,13 @@ Parallel tasks Wave {W} (do NOT touch their files):
 </task_contract>
 Context tags already loaded. Only Read project code you modify.
-Execute. Commit. Return DONE/BLOCKED/PARTIAL.
+Status protocol (machine-readable fan-in — do this, do not skip):
+- First action: `node ${QUALIA_BIN}/agent-status.js write {task_id} RUNNING --phase {N} --wave {W}`
+- Last action, after committing: `node ${QUALIA_BIN}/agent-status.js write {task_id} DONE --commit $(git rev-parse --short HEAD)`
+  (use BLOCKED or PARTIAL with `--note \"why\"` instead of DONE if you could not finish)
+Execute. Commit. Write your DONE/BLOCKED/PARTIAL status. Return DONE/BLOCKED/PARTIAL.
 ", subagent_type="qualia-builder", description="Task {N}: {title}")
 ```
@@ -130,7 +153,15 @@ Execute. Commit. Return DONE/BLOCKED/PARTIAL.
 node ${QUALIA_BIN}/qualia-ui.js done {task_num} "{title}" {commit_hash}
 ```
-**After each wave:** move to next, show summary.
+**After each batch — fan-in barrier (deterministic, not "did the model notice"):**
+```bash
+node ${QUALIA_BIN}/agent-status.js barrier --tasks {comma-separated task ids in this batch}
+```
+Exit 0 ⇔ every task in the batch wrote `DONE`. Non-zero → the barrier lists which tasks are RUNNING/BLOCKED/PARTIAL/MISSING. Do NOT spawn the next batch until the barrier passes; a BLOCKED/PARTIAL task is a wave failure (§4). `agent-status.js list` shows the live view. (Gating per batch — not per contract wave — keeps the barrier aligned with the `wave-plan.js` schedule, whose derived waves needn't match the contract's declared wave numbers.)
+**After each batch:** move to the next batch in the schedule, show summary.
 ### 3. Wave Completion
@@ -141,6 +172,7 @@ node ${QUALIA_BIN}/qualia-ui.js divider
 node ${QUALIA_BIN}/qualia-ui.js ok "Tasks: {done}/{total}"
 node ${QUALIA_BIN}/qualia-ui.js ok "Commits: {count}"
 node ${QUALIA_BIN}/qualia-ui.js ok "Waves: {count}"
+node ${QUALIA_BIN}/agent-status.js clear   # drop ephemeral .agent-status/ scratch
 ```
 ### 4. Handle Failures
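
`wave-plan.js` is described as doing two things: recomputing minimal-depth waves from `depends_on`, then splitting each wave into batches capped at `max_concurrency`. A minimal sketch of that scheduling follows, assuming each task is `{ id, depends_on }`. `planWaves` is a hypothetical function, not the script's export, and its cycle and unknown-id errors are assumptions. A task's wave is the length of its longest dependency chain, so it lands in the earliest wave where all of its dependencies have already run.

```javascript
// Hypothetical sketch of dependency-derived waves + capped batches.
// tasks: [{ id: "T1", depends_on: ["T0", ...] }, ...]
function planWaves(tasks, parallel) {
  const byId = new Map(tasks.map((t) => [t.id, t]));
  const depth = new Map();
  const visiting = new Set();

  // Wave index = length of the longest depends_on chain below the task.
  function level(id) {
    if (depth.has(id)) return depth.get(id);
    if (visiting.has(id)) throw new Error(`dependency cycle at ${id}`);
    const t = byId.get(id);
    if (!t) throw new Error(`unknown task ${id}`);
    visiting.add(id);
    let d = 0;
    for (const dep of t.depends_on || []) d = Math.max(d, level(dep) + 1);
    visiting.delete(id);
    depth.set(id, d);
    return d;
  }

  const waves = [];
  for (const t of tasks) {
    const d = level(t.id);
    (waves[d] = waves[d] || []).push(t.id);
  }

  // Auto cap: sequential under 3 tasks, else up to 5; an explicit K overrides.
  const maxConcurrency = parallel != null ? Math.max(1, parallel) : (tasks.length < 3 ? 1 : 5);

  // Batches run in order; tasks in one batch share a wave, so none depends on another.
  const batches = [];
  for (const wave of waves) {
    for (let i = 0; i < wave.length; i += maxConcurrency) {
      batches.push(wave.slice(i, i + maxConcurrency));
    }
  }
  return { waves, maxConcurrency, batches };
}
```

Consider four tasks where `T3` depends on `T1`, and `T4` depends on both `T1` and `T3`. They form three waves: `[T1, T2]`, `[T3]`, and `[T4]`. There are at least three tasks, so the auto cap of 5 leaves each wave as a single batch. `planWaves(tasks, 1)` would instead emit four single-task batches.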

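The fan-in barrier passes only when every task in the batch has written `DONE`. The sketch below assumes the per-task statuses have already been read into a plain object. `barrier` is a hypothetical name, and the exit code 1 stands in for the "non-zero" the skill specifies.

```javascript
// Hypothetical barrier check over one batch.
// statuses: { [taskId]: "RUNNING" | "DONE" | "BLOCKED" | "PARTIAL" }
function barrier(batchTaskIds, statuses) {
  const notDone = [];
  for (const id of batchTaskIds) {
    // A task that never wrote a status is reported as MISSING.
    const status = Object.prototype.hasOwnProperty.call(statuses, id) ? statuses[id] : "MISSING";
    if (status !== "DONE") notDone.push({ id, status });
  }
  // In this sketch BLOCKED / PARTIAL are flagged as failures;
  // RUNNING / MISSING only keep the barrier closed.
  const failed = notDone.some((t) => t.status === "BLOCKED" || t.status === "PARTIAL");
  return { pass: notDone.length === 0, failed, notDone, exitCode: notDone.length === 0 ? 0 : 1 };
}
```

The `failed` flag separates "route to failure handling" from "keep waiting", and both leave the next batch unspawned.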
package/skills/qualia-eval/SKILL.md ADDED

@@ -0,0 +1,83 @@
+---
+name: qualia-eval
+description: "Evaluate an AI feature (chat / RAG / voice / agent) against a layered eval suite — deterministic assertions first, then llm-rubric judges — and gate on the result. Qualia gates UI and code; this is the equivalent gate for the AI artifacts a project builds. Triggers: 'eval this agent', 'test the chatbot', 'evaluate the AI feature', 'rag eval', 'does the assistant answer correctly', 'judge the model output', 'qualia-eval'."
+allowed-tools:
+  - Bash
+  - Read
+  - Write
+  - Edit
+  - Grep
+  - Glob
+  - Agent
+---
+# /qualia-eval — Evaluate an AI Feature
+`contract-runner` proves the code exists; `verify-panel` proves the code is correct. Neither can tell you whether the **chatbot actually answers the refund question**. This lane closes that gap with a layered eval suite — cheap deterministic checks first, model judgment only where a model is required — mirroring the contract-runner evidence model.
+## Usage
+`/qualia-eval {suite.json}` — run an eval suite for one AI feature
+`/qualia-eval {N}` — run every `.planning/evals/*-suite.json` for phase N (verify-step gate)
+## The suite (JSON)
+One suite per AI feature. Each case carries a captured `output` (or `output_file`) plus optional `latency_ms` / `cost_usd`, and a list of assertions:
+```json
+{
+  "feature": "support-chat",
+  "cases": [
+    { "name": "refund window", "input": "what's your refund policy?",
+      "output": "We refund within 30 days of purchase.",
+      "latency_ms": 1200, "cost_usd": 0.008,
+      "assert": [
+        { "type": "contains", "value": "30 days" },
+        { "type": "not_contains", "value": "I cannot help" },
+        { "type": "max_latency_ms", "value": 2000 },
+        { "type": "llm_rubric", "rubric": "answer is grounded in the policy, no hallucinated terms" }
+      ] } ]
+}
+```
+Deterministic assertion types (settled with no model): `contains`, `not_contains`, `equals`, `regex`, `not_regex`, `min_length`, `max_length`, `json_valid`, `json_path` (`equals`/`contains`), `max_latency_ms`, `max_cost_usd`. The model-only type is `llm_rubric`.
+## Process
+### 1. Capture outputs
+For each case, run the AI feature on `input` and record the real `output` (+ `latency_ms`/`cost_usd` if measurable) back into the suite. Use the project's own entrypoint — an API route, a script, or the agent SDK. If outputs are already captured (replay fixtures), skip to step 2.
+### 2. Judge the rubrics (one judge per llm_rubric, fresh context)
+Deterministic assertions need no model — `eval-runner.js` settles them. For each `llm_rubric` assertion, spawn a judge to return a verdict, then write `"verdict": "pass"|"fail"` onto that assertion in the suite. This mirrors how `verify-panel` consumes skeptic votes: the model judges, the runner aggregates.
+```
+Agent(prompt="
+Role: @${QUALIA_AGENTS}/verifier.md
+JUDGE one rubric against one output. No code to grep — judge the text only.
+Rubric: {rubric}
+Input: {input}
+Output to judge: {output}
+Return exactly one line: PASS — {reason}  OR  FAIL — {reason}. Default FAIL if the output does not clearly satisfy the rubric.
+", subagent_type="qualia-verifier", description="Judge rubric — {case name}")
+```
+An `llm_rubric` with no verdict is PENDING and **fails** the suite — never silently pass an unjudged rubric.
+### 3. Run the deterministic verdict
+```bash
+node ${QUALIA_BIN}/eval-runner.js {suite.json} --write
+```
+`eval-runner.js` runs every deterministic assertion itself, folds in the rubric verdicts, and exits **0 = all cases pass / 1 = any failure or unjudged rubric**. Artifact: `.planning/evals/eval-{feature}.json`.
+### 4. Gate
+Exit 0 → the AI feature meets its bar; report PASS with the per-case summary. Exit 1 → list the failing cases + assertions and route to `/qualia-fix` (behavior wrong) or back to the prompt/RAG config. When run as a phase verify-step gate (`/qualia-eval {N}`), a FAIL is a phase FAIL — same standing as a failing contract.
+```bash
+node ${QUALIA_BIN}/qualia-ui.js end "EVAL COMPLETE" "/qualia-verify {N}"
+```
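
`eval-runner.js` is described as layered in three ways. Deterministic assertions are settled in code. An `llm_rubric` is settled only by a recorded verdict. An unjudged rubric fails the suite. The sketch below is a hypothetical simplification of that layering. `checkAssertion` and `runSuite` are illustrative names, `json_path` and `output_file` are omitted, and two behaviors are assumptions: an unknown assertion type fails, and so does an unmeasured latency or cost.

```javascript
// Hypothetical sketch of layered eval checks; json_path is omitted for brevity.
function checkAssertion(c, a) {
  const out = String(c.output ?? "");
  switch (a.type) {
    case "contains": return out.includes(a.value) ? "pass" : "fail";
    case "not_contains": return out.includes(a.value) ? "fail" : "pass";
    case "equals": return out === a.value ? "pass" : "fail";
    case "regex": return new RegExp(a.value).test(out) ? "pass" : "fail";
    case "not_regex": return new RegExp(a.value).test(out) ? "fail" : "pass";
    case "min_length": return out.length >= a.value ? "pass" : "fail";
    case "max_length": return out.length <= a.value ? "pass" : "fail";
    case "json_valid":
      try { JSON.parse(out); return "pass"; } catch { return "fail"; }
    // Assumption: an unmeasured latency or cost fails its bound.
    case "max_latency_ms": return typeof c.latency_ms === "number" && c.latency_ms <= a.value ? "pass" : "fail";
    case "max_cost_usd": return typeof c.cost_usd === "number" && c.cost_usd <= a.value ? "pass" : "fail";
    // Model-judged: only a recorded verdict counts; no verdict means PENDING.
    case "llm_rubric": return a.verdict === "pass" || a.verdict === "fail" ? a.verdict : "pending";
    default: return "fail"; // assumption: unknown types fail closed
  }
}

function runSuite(suite) {
  const cases = (suite.cases || []).map((c) => {
    const results = (c.assert || []).map((a) => ({ type: a.type, result: checkAssertion(c, a) }));
    // A case passes only if every assertion passed; "pending" counts as a failure.
    return { name: c.name, pass: results.every((r) => r.result === "pass"), results };
  });
  const pass = cases.every((c) => c.pass);
  return { feature: suite.feature, pass, exitCode: pass ? 0 : 1, cases };
}
```

On the `support-chat` suite from the skill, the three deterministic assertions pass. The case still fails with exit code 1 until the rubric assertion carries `"verdict": "pass"`.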

package/skills/qualia-feature/SKILL.md CHANGED

@@ -40,9 +40,9 @@ One command for adding a small new capability outside the planned Road. Auto-det
 ## Process
-### 0. Codex goal (Codex runtime only)
+### 0. Set the work-unit goal
-Per `rules/codex-goal.md` — set the thread goal with scope matching the auto-detected bucket (`quick` for inline, `feature` for spawn). Do this AFTER Step 2 (auto-detect scope) so the budget matches the actual work shape.
+Per `rules/codex-goal.md` — set the work-unit goal (Codex `/goal`; on Claude Code, a tracked task + budget) with scope matching the auto-detected bucket (`quick` for inline, `feature` for spawn). Do this AFTER Step 2 (auto-detect scope) so the budget matches the actual work shape.
 ### 1. Capture description
@@ -50,6 +50,22 @@ If invoked without args, ask: **"What do you want to build?"**
 Wait for free-text answer. Don't paraphrase back. Capture the user's exact phrasing — it feeds both the auto-scope classifier and the eventual commit message.
+### 1b. Scope gate (anti-drift — keep work on the milestone arc)
+Before building, check whether this work belongs to the active milestone. This is what stops feature/fix from drifting off-plan.
+```bash
+node ${QUALIA_BIN}/state.js check 2>/dev/null   # → milestone, profile; JOURNEY.md = the arc
+node ${QUALIA_BIN}/state.js reqs-check 2>/dev/null   # current milestone's open REQ-IDs
+```
+- **No active project / no milestone** (`.planning/` absent) → not governed; proceed normally (skip to Step 2).
+- **Active milestone**: decide if this work serves it.
+  - **In-scope** (it advances the current milestone's goal or an open REQ-ID) → proceed. Record it tagged to scope in Steps 4/5: add `--scope in --ref {REQ-ID or phase}` to the `state.js transition --to note` call.
+  - **Off-road** (a new capability/feature that isn't in the current milestone): this is exactly the drift the framework guards against. Resolve by profile (`state.js check` → `profile`):
+    - **strict** → STOP. Do not build off-road. Route to `/qualia-scope` to fold it into the arc (a phase/REQ in the current or a future milestone) or `/qualia-milestone` if it's a new milestone. Off-road building is blocked.
+    - **standard** → allowed, but **recorded**: build it, then record with `--scope off --ref "{what + why off-road}"` so the OWNER + ERP see the off-road tally (it is never silent).
 ### 2. Auto-detect scope
 Classify the description into one of three buckets:
@@ -116,7 +132,7 @@ git commit -m "fix: {description}"
 5. Record in state:
 ```bash
-node ${QUALIA_BIN}/state.js transition --to note --notes "{brief description}" --tasks-done 1
+node ${QUALIA_BIN}/state.js transition --to note --notes "{brief description}" --tasks-done 1 {--scope in --ref {REQ/phase}  |  --scope off --ref "{why off-road}" — from the §1b scope gate}
 ```
 6. End with:
@@ -184,7 +200,7 @@ node ${QUALIA_BIN}/qualia-ui.js end "FEATURE SHIPPED (spawn)"
 5. Record in state:
 ```bash
-node ${QUALIA_BIN}/state.js transition --to note --notes "{description}" --tasks-done 1
+node ${QUALIA_BIN}/state.js transition --to note --notes "{description}" --tasks-done 1 {--scope in --ref {REQ/phase}  |  --scope off --ref "{why off-road}" — from the §1b scope gate}
 ```
 ### 6. Execute the refuse path
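
The §1b scope gate in `/qualia-feature` resolves to one of three outcomes, and so does the matching scope tag in `/qualia-fix`. The sketch below expresses that decision table. It assumes milestone membership and profile have already been determined from `state.js check` / `reqs-check`. `scopeGate` is a hypothetical helper that returns the extra `state.js transition` flags. Treating any non-`strict` profile as `standard` is an assumption.

```javascript
// Hypothetical decision table for the anti-drift scope gate.
// profile: "strict" | "standard" (anything other than "strict" is treated as standard here).
function scopeGate({ hasMilestone, inScope, profile, ref }) {
  // No .planning/ milestone: the work is not governed.
  if (!hasMilestone) return { action: "proceed", flags: [] };
  // In-scope work is built and tagged to the REQ-ID or phase it serves.
  if (inScope) return { action: "proceed", flags: ["--scope", "in", "--ref", ref] };
  // Off-road work: strict stops and routes; standard builds but records it.
  if (profile === "strict") return { action: "stop", route: "/qualia-scope", flags: [] };
  return { action: "proceed", flags: ["--scope", "off", "--ref", ref] };
}
```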

package/skills/qualia-fix/SKILL.md CHANGED

@@ -48,6 +48,10 @@ Fix is the practical lane for "this used to work, or should work, and now it doe
 node ${QUALIA_BIN}/qualia-ui.js banner fix
 ```
+### 0. Set the work-unit goal
+Per `rules/codex-goal.md` — set the work-unit goal (Codex `/goal`; on Claude Code, a tracked task + budget) with scope `quick` for `--quick`, else `feature`. Anchors the fix to one objective + budget so root-cause work doesn't sprawl.
 ### 1. Classify The Request
 Parse `$ARGUMENTS` into:
@@ -70,6 +74,14 @@ If the request is phase-sized, stop and route:
 node ${QUALIA_BIN}/qualia-ui.js end "ROUTED" "/qualia-plan"
 ```
+### 1b. Scope tag (anti-drift)
+```bash
+node ${QUALIA_BIN}/state.js check 2>/dev/null   # milestone + profile
+```
+Repairing broken behavior in what the current milestone already built is **in-scope** — proceed, and tag the record `--scope in --ref {REQ/phase}` in Step 7. But a "fix" that is really **new off-road behavior** (a capability the milestone never included, dressed as a bug) is drift: in **strict** profile, STOP and route to `/qualia-scope` to fold it into the arc; in **standard**, proceed but record `--scope off --ref "{why off-road}"` so it's counted, never silent. No active milestone → not governed, proceed.
 ### 2. Build The Feedback Loop
 Use the cheapest check that can prove the bug is real and later prove it is fixed.
@@ -175,7 +187,7 @@ git commit -m "fix: {short symptom/root-cause summary}"
 Record state:
 ```bash
-node ${QUALIA_BIN}/state.js transition --to note --notes "{short fix summary}" --tasks-done 1
+node ${QUALIA_BIN}/state.js transition --to note --notes "{short fix summary}" --tasks-done 1 {--scope in --ref {REQ/phase}  |  --scope off --ref "{why off-road}" — from the §1b scope tag}
 ```
 ### 8. Output

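The §1b scope tag in `/qualia-fix` reads as a small decision table. The following is a minimal sketch. It assumes the three inputs have already been determined: the profile, whether a milestone is active, and whether the request repairs existing behavior or adds new behavior. `fix_scope_decision` and its output strings are made up for illustration. In the skill itself, this judgment is prose guidance, not code.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the §1b scope-tag decision.
#   $1 profile: strict | standard
#   $2 milestone active: yes | no
#   $3 kind: repair (fixes what the milestone built) | new (capability it never included)
fix_scope_decision() {
  local profile="$1" milestone_active="$2" kind="$3"
  if [[ "$milestone_active" != yes ]]; then
    echo "proceed"                 # no active milestone: not governed
  elif [[ "$kind" == repair ]]; then
    echo "proceed --scope in"      # repairing existing behavior is in-scope
  elif [[ "$profile" == strict ]]; then
    echo "route /qualia-scope"     # strict: off-road behavior stops here
  else
    echo "proceed --scope off"     # standard: allowed, but counted
  fi
}

fix_scope_decision strict   yes repair   # proceed --scope in
fix_scope_decision strict   yes new      # route /qualia-scope
fix_scope_decision standard yes new      # proceed --scope off
fix_scope_decision strict   no  new      # proceed
```

The milestone check comes first because, per the text, nothing is governed without an active milestone. The profile only matters once a request has been judged off-road.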
package/skills/qualia-milestone/SKILL.md CHANGED

@@ -30,13 +30,17 @@ Triggered after `/qualia-verify` passes on the LAST phase of the current milesto
 ```bash
 node ${QUALIA_BIN}/state.js check
+node ${QUALIA_BIN}/state.js reqs-check   # this milestone's REQ-ID completion
 ```
-`state.js close-milestone` enforces two guards:
+`state.js close-milestone` enforces three guards:
 - `MILESTONE_NOT_READY` — any phase not verified
 - `MILESTONE_TOO_SMALL` — milestone has < 2 phases
+- `MILESTONE_REQS_INCOMPLETE` — a REQ-ID mapped to this milestone in REQUIREMENTS.md is not yet `Complete` (strict profile blocks; standard profile proceeds but the unfinished REQs are surfaced as `warnings` to log). This is what stops "finishing a milestone with scope still open."
-If either fires (without `--force`), stop and show the error. The user must verify remaining phases first (or add `--force` for explicit bypass on a preview/demo milestone).
+If any fires (without `--force`), stop and show the error. Resolve before closing: verify remaining phases, finish the open requirements, or **explicitly defer** a requirement by moving it to `Out of Scope` in REQUIREMENTS.md (a conscious deferral, not silent). `--force` bypasses all three for retroactive bookkeeping only.
+Run `reqs-check` first so the user sees exactly which requirements are still open before the close attempt — Step 4 (mark Complete) should already have flipped the finished ones.
 ### 1b. Demo-Extension Branch
@@ -59,7 +63,7 @@ If `PROJECT_TYPE=demo` AND `MILESTONE_COUNT=1`, the demo's one milestone is clos
 **If "Client signed — extend to full project":**
 1. Update `.planning/PROJECT.md` frontmatter: `project_type: full`.
-2. Run a brief discovery top-up — invoke `/qualia-scope` in PROJECT MODE, but only ask §9-§14 (the full-project-only questions). This adds the milestone arc, compliance, integrations, content ownership, handoff team, and budget shape.
+2. Run a brief discovery top-up — invoke `/qualia-scope` in PROJECT MODE, but only ask §9–§15 (the full-project-only questions). This adds the **capability inventory** (the whole project's scope), the **whole-project definition of done**, shipping order, compliance, integrations, content ownership, handoff team, and budget shape.
 3. Spawn the roadmapper in `extend-to-full` mode (see prompt below). It reads the existing single milestone (now M1), the updated discovery, and produces a full JOURNEY.md with M2..M{N-1} sketches plus the Handoff milestone.
 4. Then proceed with the standard close-milestone flow (Steps 2-9) — M1 closes, M2 opens, the user is asked to continue.
@@ -75,11 +79,13 @@ Read your role: @${QUALIA_AGENTS}/roadmapper.md
 <task>
 The existing JOURNEY.md has 1 milestone (the demo, now M1 and shipped). Extend it
-into a 2-5 milestone arc to Handoff:
+into the FULL milestone arc to Handoff — as many milestones as the agreed scope
+needs (no cap), covering the entire capability inventory:
 - Keep M1 exactly as-is (it shipped).
-- Add M2..M{N-1} based on §9 of project-discovery.md (the milestone-arc question
-  the user answered when converting from demo).
+- Add M2..M{N-1} covering every capability in §9 of project-discovery.md (the
+  capability inventory), ordered per §11 (shipping order). Every §9 capability
+  must land in a milestone — nothing agreed is left unplanned.
 - Append a Handoff milestone (fixed 4 phases: Polish, Content + SEO, Final QA,
   Handoff).
 - Update REQUIREMENTS.md to add REQ-IDs for the new milestones.

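The three `close-milestone` guards and the `--force` bypass can be sketched as follows. This illustrates the rules as written in the milestone skill, not `state.js`'s implementation. The numeric arguments stand in for what the real command reads from `.planning/` (phase verification status, REQ-ID status in REQUIREMENTS.md). The function name and output strings are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the close-milestone guards (not state.js's code).
#   $1 total phases   $2 verified phases   $3 open REQ-IDs for this milestone
#   $4 profile (strict|standard)           $5 force (yes|no)
close_milestone_guards() {
  local total="$1" verified="$2" open_reqs="$3" profile="$4" force="$5"
  local errors=() warnings=()

  if [[ "$force" == yes ]]; then
    echo "ok (forced)"; return 0   # --force bypasses all three guards
  fi
  (( verified < total )) && errors+=(MILESTONE_NOT_READY)
  (( total < 2 ))        && errors+=(MILESTONE_TOO_SMALL)
  if (( open_reqs > 0 )); then
    if [[ "$profile" == strict ]]; then
      errors+=(MILESTONE_REQS_INCOMPLETE)       # strict blocks
    else
      warnings+=("MILESTONE_REQS_INCOMPLETE: $open_reqs open")   # standard warns
    fi
  fi

  if (( ${#errors[@]} > 0 )); then
    echo "blocked: ${errors[*]}"; return 1
  fi
  if (( ${#warnings[@]} > 0 )); then
    echo "ok with warnings: ${warnings[*]}"
  else
    echo "ok"
  fi
}

close_milestone_guards 3 3 0 strict   no   # ok
close_milestone_guards 3 2 0 strict   no   # blocked: MILESTONE_NOT_READY
close_milestone_guards 3 3 2 standard no   # ok with warnings: MILESTONE_REQS_INCOMPLETE: 2 open
close_milestone_guards 3 3 2 strict   no   # blocked: MILESTONE_REQS_INCOMPLETE
```

Only `MILESTONE_REQS_INCOMPLETE` depends on the profile. The other two guards block in either profile unless `--force` is given.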
package/skills/qualia-new/REFERENCE.md CHANGED

@@ -59,8 +59,10 @@ Read your role: @${QUALIA_AGENTS}/research-synthesizer.md
 Merge the 4 research files at .planning/research/ into .planning/research/SUMMARY.md.
 This is a multi-milestone project -- the SUMMARY must suggest a FULL milestone arc
-(2-5 milestones including Handoff), not just a v1 phase list. Include roadmap
-implications AND handoff implications (what client takeover requires).
+that covers the ENTIRE capability set to its done-state (as many milestones as the
+scope needs, ending in Handoff for client projects -- no milestone cap), not just a
+v1 phase list. Include roadmap implications AND handoff implications (what client
+takeover requires).
 ", subagent_type="qualia-research-synthesizer", description="Synthesize research")
 ```
@@ -74,7 +76,7 @@ Read your role: @${QUALIA_AGENTS}/roadmapper.md
 <task>
 Create the FULL JOURNEY for this project:
-  - .planning/JOURNEY.md -- all milestones (2-5 including Handoff) with exit criteria
+  - .planning/JOURNEY.md -- all milestones (≥2, no upper cap; ending in Handoff for client projects) covering every capability from discovery §9, with exit criteria
   - .planning/REQUIREMENTS.md -- requirements grouped by milestone
   - .planning/ROADMAP.md -- Milestone 1's phase detail (and ALL milestones if full_detail=true)
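The requirement that JOURNEY.md cover every capability from discovery §9, with nothing agreed left unplanned, amounts to a coverage check. The following is a rough sketch. It assumes hypothetical plain-text inputs: one capability per line for the inventory, and `capability|milestone` lines for the plan. The real project-discovery.md and JOURNEY.md formats are not shown in this diff, and `coverage_report` is not a qualia-framework command.

```bash
#!/usr/bin/env bash
# Hypothetical coverage check: does every inventory capability appear in the plan?
#   $1 inventory: one capability per line
#   $2 mapping:   "capability|milestone" lines
coverage_report() {
  local inventory="$1" mapping="$2"
  local total=0 mapped=0 cap
  local unmapped=()
  while IFS= read -r cap; do
    [[ -z "$cap" ]] && continue
    total=$((total + 1))
    # Exact, literal match against the capability column of the mapping.
    if cut -d'|' -f1 <<<"$mapping" | grep -qxF -- "$cap"; then
      mapped=$((mapped + 1))
    else
      unmapped+=("$cap")
    fi
  done <<<"$inventory"
  echo "$mapped/$total capabilities mapped (${#unmapped[@]} unmapped)"
  if (( ${#unmapped[@]} > 0 )); then
    printf '  unmapped: %s\n' "${unmapped[@]}"
    return 1
  fi
}

inventory=$'Auth\nBilling\nReports'
mapping=$'Auth|M1\nBilling|M2'
coverage_report "$inventory" "$mapping"
# 2/3 capabilities mapped (1 unmapped)
#   unmapped: Reports
```

A non-zero exit on any unmapped capability makes "0 unmapped" a hard condition rather than a number someone has to read.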
@@ -115,7 +117,7 @@ The branded journey ladder rendered in Step 11. Use `node ${QUALIA_BIN}/qualia-u
 ```
 ## Proposed Journey
-**{N} milestones to handoff** | **{X} requirements mapped** | All v1 requirements covered
+**{N} milestones to handoff** | **{X}/{X} capabilities mapped** | Full §9 inventory covered (0 unmapped)
   +-- Milestone 1 . {Name}               [CURRENT]
   |  Why now: {one line}