qualia-framework 5.8.0 → 5.9.0
- package/agents/plan-checker.md +8 -0
- package/agents/qa-browser.md +7 -0
- package/agents/roadmapper.md +8 -0
- package/agents/verifier.md +14 -1
- package/bin/cli.js +30 -1
- package/bin/erp-retry.js +289 -0
- package/bin/install.js +6 -0
- package/bin/state.js +10 -1
- package/docs/onboarding.html +3 -5
- package/docs/playwright-loop-pilot-results.md +7 -5
- package/docs/research/2026-05-11-deep-research.md +189 -0
- package/hooks/session-start.js +18 -0
- package/package.json +3 -2
- package/rules/speed.md +1 -2
- package/skills/qualia-report/SKILL.md +64 -2
- package/skills/qualia-verify/SKILL.md +16 -0
- package/templates/help.html +2 -3
- package/tests/bin.test.sh +23 -5
- package/tests/refs.test.sh +146 -0
package/agents/plan-checker.md
CHANGED
|
@@ -2,8 +2,16 @@
|
|
|
2
2
|
name: qualia-plan-checker
|
|
3
3
|
description: Validates a phase plan before execution. Checks task specificity, wave assignment, verification contracts, and coverage of success criteria. Spawned by qualia-plan in a revision loop (max 2 iterations).
|
|
4
4
|
tools: Read, Bash, Grep
|
|
5
|
+
model: sonnet
|
|
5
6
|
---
|
|
6
7
|
|
|
8
|
+
<!-- v5.9: Sonnet, not Opus. The checker runs an 11-rule checklist against the
|
|
9
|
+
plan — every rule is a deterministic match (task has a Why?, AC is
|
|
10
|
+
observable?, wave assignment correct?). Structured validation, not plan
|
|
11
|
+
synthesis. Plan WRITING is on Opus (agents/planner.md); plan CHECKING is
|
|
12
|
+
on Sonnet because it's pattern-matching. -->
|
|
13
|
+
|
|
14
|
+
|
|
7
15
|
# Plan Checker
|
|
8
16
|
|
|
9
17
|
You validate phase plans before they go to the builder. You do NOT write plans — you evaluate them. If a plan has issues, return a structured list; the planner will revise and you'll check again (max 2 revision cycles).
|
package/agents/qa-browser.md
CHANGED
|
@@ -2,8 +2,15 @@
|
|
|
2
2
|
name: qualia-qa-browser
|
|
3
3
|
description: Real-browser QA. Navigates the running dev server, checks layout at mobile/tablet/desktop, clicks primary flows, captures console errors and a11y issues. Spawned by /qualia-verify on phases with frontend work.
|
|
4
4
|
tools: Read, Bash, Grep, Glob
|
|
5
|
+
model: sonnet
|
|
5
6
|
---
|
|
6
7
|
|
|
8
|
+
<!-- v5.9: Sonnet, not Opus. QA-browser drives the browser through scripted
|
|
9
|
+
flows and reports console + a11y findings. Mechanical interaction +
|
|
10
|
+
finding-collection, not architectural reasoning. Vision interpretation
|
|
11
|
+
for design quality lives in visual-evaluator.md, which stays on Opus. -->
|
|
12
|
+
|
|
13
|
+
|
|
7
14
|
# Qualia QA Browser
|
|
8
15
|
|
|
9
16
|
You verify that the **running app actually looks and behaves right** — not just that the code compiles and greps clean. Fresh context, no memory of what was built.
|
package/agents/roadmapper.md
CHANGED
|
@@ -2,8 +2,16 @@
|
|
|
2
2
|
name: qualia-roadmapper
|
|
3
3
|
description: Creates JOURNEY.md (full multi-milestone arc), REQUIREMENTS.md (multi-milestone, REQ-IDs), and ROADMAP.md (current milestone's phase detail) from PROJECT.md and research. Spawned by qualia-new after research completes.
|
|
4
4
|
tools: Read, Write, Bash
|
|
5
|
+
model: sonnet
|
|
5
6
|
---
|
|
6
7
|
|
|
8
|
+
<!-- v5.9: Sonnet, not Opus. The roadmapper fills mostly-deterministic templates
|
|
9
|
+
(JOURNEY.md, REQUIREMENTS.md, ROADMAP.md) from PROJECT.md + research
|
|
10
|
+
synthesis. Project-specific shape, but the milestone-decomposition logic
|
|
11
|
+
is bounded and structured — not novel synthesis. Builder and planner stay
|
|
12
|
+
on Opus where real architectural reasoning lives. -->
|
|
13
|
+
|
|
14
|
+
|
|
7
15
|
# Qualia Roadmapper
|
|
8
16
|
|
|
9
17
|
You produce the **full project journey** — every milestone from kickoff to handoff. This is the North Star for the rest of the project. Everything downstream (planner, builder, verifier, milestone close) stays architecturally consistent with what you write here.
|
package/agents/verifier.md
CHANGED
|
@@ -2,8 +2,17 @@
|
|
|
2
2
|
name: qualia-verifier
|
|
3
3
|
description: Goal-backward verification. Checks if the phase ACTUALLY works, not just if tasks ran.
|
|
4
4
|
tools: Read, Bash, Grep, Glob
|
|
5
|
+
model: sonnet
|
|
5
6
|
---
|
|
6
7
|
|
|
8
|
+
<!-- v5.9: Sonnet, not Opus. The verifier executes a deterministic protocol —
|
|
9
|
+
run greps against acceptance criteria, score the 8-dim design rubric, walk
|
|
10
|
+
stub-detection patterns. Pattern-matching + structured output, not novel
|
|
11
|
+
architectural reasoning. Opus is overkill; the inherited-Opus default cost
|
|
12
|
+
~3x what Sonnet does without measurably better verdicts.
|
|
13
|
+
Builder and planner stay on Opus (they do real synthesis). -->
|
|
14
|
+
|
|
15
|
+
|
|
7
16
|
# Qualia Verifier
|
|
8
17
|
|
|
9
18
|
You verify that a phase achieved its GOAL, not just completed its TASKS.
|
|
@@ -44,7 +53,11 @@ Write `.planning/phase-{N}-verification.md` — PASS or FAIL with evidence. Appl
|
|
|
44
53
|
|
|
45
54
|
## Tool Budget
|
|
46
55
|
|
|
47
|
-
|
|
56
|
+
Budget scales with phase size: **`max(25, task_count * 5)` Bash/Grep calls** per invocation. The plan file's `## Task N` count determines `task_count` — a 3-task phase gets 25 calls, a 10-task phase gets 50.
|
|
57
|
+
|
|
58
|
+
Prefer one multi-pattern grep over many single-pattern greps. If you exhaust the budget, write what you found and mark unchecked criteria as `INSUFFICIENT EVIDENCE` — do not fabricate.
|
|
59
|
+
|
|
60
|
+
**INSUFFICIENT EVIDENCE is a hard FAIL signal.** The orchestrator (`/qualia-verify`) treats any `INSUFFICIENT EVIDENCE` line in the verification file as a phase FAIL, not silent PASS. Don't use it as a pass-through — only when budget genuinely exhausted before a criterion could be checked.
|
|
48
61
|
|
|
49
62
|
## Goal-Backward Verification
|
|
50
63
|
|
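Note on the Tool Budget hunk above: the rule `max(25, task_count * 5)` can be sketched as a small function. This is an illustrative sketch only, not code shipped in the package. `countTasks` and `verifierBudget` are hypothetical names, and the `## Task N` heading regex is an assumption about how the plan file is laid out.

```javascript
// Hypothetical helpers, not part of qualia-framework.

// Count "## Task N" headings in a plan file's text (assumed heading format).
function countTasks(planText) {
  const matches = planText.match(/^## Task \d+/gm);
  return matches ? matches.length : 0;
}

// Budget rule quoted in the hunk: max(25, task_count * 5) Bash/Grep calls.
function verifierBudget(taskCount) {
  return Math.max(25, taskCount * 5);
}

const plan = "## Task 1\n...\n## Task 2\n...\n## Task 3\n...";
console.log(verifierBudget(countTasks(plan))); // 25 (3 tasks, floor applies)
console.log(verifierBudget(10));               // 50
```

The floor of 25 means any phase with five or fewer tasks gets the same budget; the budget only starts scaling from the sixth task.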
package/bin/cli.js
CHANGED
|
@@ -160,7 +160,7 @@ const QUALIA_AGENT_FILES = [
|
|
|
160
160
|
];
|
|
161
161
|
|
|
162
162
|
// 3 Qualia bin scripts.
|
|
163
|
-
const QUALIA_BIN_FILES = ["state.js", "qualia-ui.js", "statusline.js", "knowledge.js", "knowledge-flush.js", "plan-contract.js", "agent-runs.js", "slop-detect.mjs"];
|
|
163
|
+
const QUALIA_BIN_FILES = ["state.js", "qualia-ui.js", "statusline.js", "knowledge.js", "knowledge-flush.js", "plan-contract.js", "agent-runs.js", "slop-detect.mjs", "erp-retry.js"];
|
|
164
164
|
|
|
165
165
|
// Qualia rules — security, deployment, infra, grounding, plus the v4.5.0 design substrate.
|
|
166
166
|
// frontend.md and design-reference.md are kept for backward compat; new projects use design-laws/brand/product/rubric.
|
|
@@ -1035,6 +1035,29 @@ function cmdSetErpKey() {
|
|
|
1035
1035
|
// ─── Flush: convenience wrapper around knowledge-flush.js ───────
|
|
1036
1036
|
// Exposes the cron-runnable script as a top-level CLI command so users can
|
|
1037
1037
|
// run `qualia-framework flush` ad-hoc. All args after the command pass through.
|
|
1038
|
+
// ─── erp-flush: drain the ERP retry queue verbosely ───────────
|
|
1039
|
+
// Thin wrapper around bin/erp-retry.js. Used when an employee wants to
|
|
1040
|
+
// retry stranded reports on demand (e.g., after the ERP came back online,
|
|
1041
|
+
// or after rotating the API key). All args pass through.
|
|
1042
|
+
function cmdErpFlush() {
|
|
1043
|
+
const retryScript = path.join(CLAUDE_DIR, "bin", "erp-retry.js");
|
|
1044
|
+
if (!fs.existsSync(retryScript)) {
|
|
1045
|
+
console.log(` ${RED}✗${RESET} erp-retry.js not installed at ${retryScript}`);
|
|
1046
|
+
console.log(` ${DIM}Run: npx qualia-framework@latest install${RESET}`);
|
|
1047
|
+
process.exit(1);
|
|
1048
|
+
}
|
|
1049
|
+
banner();
|
|
1050
|
+
console.log("");
|
|
1051
|
+
const args = process.argv.slice(3);
|
|
1052
|
+
// Default: drain action. Allow `qualia-framework erp-flush show` / `clear` too.
|
|
1053
|
+
const action = (args[0] && !args[0].startsWith("--")) ? args.shift() : "drain";
|
|
1054
|
+
const r = spawnSync(process.execPath, [retryScript, action, ...args], {
|
|
1055
|
+
stdio: "inherit",
|
|
1056
|
+
shell: false,
|
|
1057
|
+
});
|
|
1058
|
+
process.exit(r.status || 0);
|
|
1059
|
+
}
|
|
1060
|
+
|
|
1038
1061
|
function cmdFlush() {
|
|
1039
1062
|
const flushScript = path.join(CLAUDE_DIR, "bin", "knowledge-flush.js");
|
|
1040
1063
|
if (!fs.existsSync(flushScript)) {
|
|
@@ -1073,6 +1096,7 @@ function cmdDoctor() {
|
|
|
1073
1096
|
path.join(CLAUDE_DIR, "bin", "statusline.js"),
|
|
1074
1097
|
path.join(CLAUDE_DIR, "bin", "knowledge.js"),
|
|
1075
1098
|
path.join(CLAUDE_DIR, "bin", "knowledge-flush.js"),
|
|
1099
|
+
path.join(CLAUDE_DIR, "bin", "erp-retry.js"),
|
|
1076
1100
|
path.join(CLAUDE_DIR, "CLAUDE.md"),
|
|
1077
1101
|
CONFIG_FILE,
|
|
1078
1102
|
];
|
|
@@ -1247,6 +1271,7 @@ function cmdHelp() {
|
|
|
1247
1271
|
console.log(` qualia-framework ${TEAL}agents${RESET} Show per-agent run history (${DIM}--failed|--task ID|--phase N|prune --before${RESET})`);
|
|
1248
1272
|
console.log(` qualia-framework ${TEAL}set-erp-key${RESET} Save/enable the ERP API key`);
|
|
1249
1273
|
console.log(` qualia-framework ${TEAL}erp-ping${RESET} Verify ERP connectivity + API key`);
|
|
1274
|
+
console.log(` qualia-framework ${TEAL}erp-flush${RESET} Retry queued ERP report uploads (${DIM}show|clear${RESET})`);
|
|
1250
1275
|
console.log(` qualia-framework ${TEAL}doctor${RESET} Health-check the install (files, hooks, settings)`);
|
|
1251
1276
|
console.log(` qualia-framework ${TEAL}flush${RESET} Promote daily-log → curated knowledge (memory layer)`);
|
|
1252
1277
|
console.log("");
|
|
@@ -1312,6 +1337,10 @@ switch (cmd) {
|
|
|
1312
1337
|
case "ping":
|
|
1313
1338
|
cmdErpPing();
|
|
1314
1339
|
break;
|
|
1340
|
+
case "erp-flush":
|
|
1341
|
+
case "erp-retry":
|
|
1342
|
+
cmdErpFlush();
|
|
1343
|
+
break;
|
|
1315
1344
|
case "doctor":
|
|
1316
1345
|
case "health":
|
|
1317
1346
|
case "health-check":
|
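Note on `cmdErpFlush()` above: the action-selection line treats the first argument as the action unless it looks like a flag, and falls back to `drain`. A minimal sketch of that behaviour, assuming the same rule, is below. `splitAction` is a hypothetical name introduced for illustration; the shipped code does this inline and mutates `args` with `shift()`.

```javascript
// Hypothetical helper restating the action-selection line in cmdErpFlush().
// Works on a copy so the caller's array is not mutated.
function splitAction(argv) {
  const args = argv.slice();
  const action = args[0] && !args[0].startsWith("--") ? args.shift() : "drain";
  return { action, rest: args };
}

// No args: default action, nothing to pass through.
console.log(splitAction([]).action); // drain

// Explicit action: consumed from the front, remaining flags pass through.
console.log(splitAction(["show", "--quiet"]).action); // show

// Flags only: the default action is used and every flag passes through.
console.log(splitAction(["--quiet", "--max=3"]).rest.length); // 2
```

One consequence of this rule is that any non-flag first argument is forwarded as the action, so an unknown word reaches `erp-retry.js`, which rejects it with exit code 2.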
package/bin/erp-retry.js
ADDED
|
@@ -0,0 +1,289 @@
|
|
|
1
|
+
#!/usr/bin/env node
|
|
2
|
+
// ~/.claude/bin/erp-retry.js
|
|
3
|
+
//
|
|
4
|
+
// ERP report retry queue. /qualia-report enqueues a report here if the
|
|
5
|
+
// inline 3-attempt-with-backoff upload fails. session-start.js drains the
|
|
6
|
+
// queue quietly on the next Claude Code launch; `qualia-framework erp-flush`
|
|
7
|
+
// drains it verbosely on demand.
|
|
8
|
+
//
|
|
9
|
+
// Why a queue: the prior v5.8 implementation told the user "$REPORT will
|
|
10
|
+
// appear in ERP after retry" but had no retry mechanism — data was stranded
|
|
11
|
+
// locally until the employee manually re-ran /qualia-report. Found by the
|
|
12
|
+
// 2026-05-11 deep-research audit. This module makes the promise real.
|
|
13
|
+
//
|
|
14
|
+
// Idempotency: every enqueued item carries the Idempotency-Key header that
|
|
15
|
+
// the original report attempt used. The ERP UPSERTs on
|
|
16
|
+
// (project_id, client_report_id) and deduplicates Idempotency-Key for 24h,
|
|
17
|
+
// so retry-of-a-just-succeeded item is safe.
|
|
18
|
+
//
|
|
19
|
+
// Hard rules:
|
|
20
|
+
// - Never throw out the queue file on parse error — back it up and start fresh.
|
|
21
|
+
// - Max 10 attempts per item before marking give_up=true (stops the cycle).
|
|
22
|
+
// - 401/422 are permanent failures: keep the item but mark give_up=true so
|
|
23
|
+
// the user can see it and resolve manually (typically: fix the API key).
|
|
24
|
+
// - The CLI invocation NEVER exits non-zero unless the queue file is unreadable —
|
|
25
|
+
// we don't want session-start.js to surface "hook error" red banners.
|
|
26
|
+
//
|
|
27
|
+
// Usage:
|
|
28
|
+
// node erp-retry.js # drain all (with default 3-attempt budget per item)
|
|
29
|
+
// node erp-retry.js --quiet # no stdout unless errors
|
|
30
|
+
// node erp-retry.js --max=3 # drain at most 3 items this run
|
|
31
|
+
// node erp-retry.js --timeout=3000 # ms timeout per upload attempt
|
|
32
|
+
// node erp-retry.js show # print queue contents, drain nothing
|
|
33
|
+
// node erp-retry.js clear # delete the queue (use after manual fix)
|
|
34
|
+
|
|
35
|
+
const fs = require("fs");
|
|
36
|
+
const path = require("path");
|
|
37
|
+
const os = require("os");
|
|
38
|
+
const https = require("https");
|
|
39
|
+
const http = require("http");
|
|
40
|
+
const urlLib = require("url");
|
|
41
|
+
|
|
42
|
+
const HOME = os.homedir();
|
|
43
|
+
const QUEUE_FILE = path.join(HOME, ".claude", ".erp-retry-queue.json");
|
|
44
|
+
const API_KEY_FILE = path.join(HOME, ".claude", ".erp-api-key");
|
|
45
|
+
const CONFIG_FILE = path.join(HOME, ".claude", ".qualia-config.json");
|
|
46
|
+
|
|
47
|
+
const MAX_GIVE_UP_ATTEMPTS = 10;
|
|
48
|
+
const DEFAULT_TIMEOUT_MS = 5000;
|
|
49
|
+
const DEFAULT_MAX_ITEMS = 100;
|
|
50
|
+
|
|
51
|
+
// ─── Args ───────────────────────────────────────────────
|
|
52
|
+
const args = process.argv.slice(2);
|
|
53
|
+
const ACTION = (args[0] && !args[0].startsWith("--")) ? args[0] : "drain";
|
|
54
|
+
const QUIET = args.includes("--quiet");
|
|
55
|
+
function flag(name, fallback) {
|
|
56
|
+
const found = args.find((a) => a.startsWith(`--${name}=`));
|
|
57
|
+
if (!found) return fallback;
|
|
58
|
+
const v = found.split("=", 2)[1];
|
|
59
|
+
const n = Number(v);
|
|
60
|
+
return Number.isFinite(n) ? n : fallback;
|
|
61
|
+
}
|
|
62
|
+
const MAX_ITEMS = flag("max", DEFAULT_MAX_ITEMS);
|
|
63
|
+
const TIMEOUT_MS = flag("timeout", DEFAULT_TIMEOUT_MS);
|
|
64
|
+
|
|
65
|
+
function log(msg) { if (!QUIET) process.stdout.write(msg + "\n"); }
|
|
66
|
+
function logErr(msg) { process.stderr.write(msg + "\n"); }
|
|
67
|
+
|
|
68
|
+
// ─── Queue I/O ──────────────────────────────────────────
|
|
69
|
+
function readQueue() {
|
|
70
|
+
if (!fs.existsSync(QUEUE_FILE)) return { queue: [] };
|
|
71
|
+
try {
|
|
72
|
+
const raw = fs.readFileSync(QUEUE_FILE, "utf8");
|
|
73
|
+
const parsed = JSON.parse(raw);
|
|
74
|
+
if (!parsed || !Array.isArray(parsed.queue)) return { queue: [] };
|
|
75
|
+
return parsed;
|
|
76
|
+
} catch (e) {
|
|
77
|
+
// Corrupt queue — back up and start fresh. Never silently destroy data.
|
|
78
|
+
const bak = `${QUEUE_FILE}.bak.${new Date().toISOString().replace(/[:.]/g, "-")}`;
|
|
79
|
+
try { fs.copyFileSync(QUEUE_FILE, bak); } catch {}
|
|
80
|
+
logErr(`erp-retry: queue file unparseable; backed up to ${bak} and starting fresh`);
|
|
81
|
+
return { queue: [] };
|
|
82
|
+
}
|
|
83
|
+
}
|
|
84
|
+
|
|
85
|
+
function writeQueue(data) {
|
|
86
|
+
const dir = path.dirname(QUEUE_FILE);
|
|
87
|
+
if (!fs.existsSync(dir)) fs.mkdirSync(dir, { recursive: true });
|
|
88
|
+
// Atomic: write tmp, rename. Mode 0600 because the payload contains
|
|
89
|
+
// session notes that may reference internal project state.
|
|
90
|
+
const tmp = `${QUEUE_FILE}.tmp.${process.pid}`;
|
|
91
|
+
fs.writeFileSync(tmp, JSON.stringify(data, null, 2) + "\n", { mode: 0o600 });
|
|
92
|
+
try { fs.chmodSync(tmp, 0o600); } catch {}
|
|
93
|
+
fs.renameSync(tmp, QUEUE_FILE);
|
|
94
|
+
}
|
|
95
|
+
|
|
96
|
+
function enqueue({ client_report_id, idempotency_key, url, payload, last_error }) {
|
|
97
|
+
if (!client_report_id || !url || !payload) {
|
|
98
|
+
throw new Error("enqueue: client_report_id, url, payload are required");
|
|
99
|
+
}
|
|
100
|
+
const data = readQueue();
|
|
101
|
+
// Dedupe — if this report id is already queued, update the existing item
|
|
102
|
+
// instead of appending a duplicate.
|
|
103
|
+
const existing = data.queue.find((it) => it.client_report_id === client_report_id);
|
|
104
|
+
if (existing) {
|
|
105
|
+
existing.idempotency_key = idempotency_key || existing.idempotency_key;
|
|
106
|
+
existing.url = url;
|
|
107
|
+
existing.payload = payload;
|
|
108
|
+
existing.last_error = last_error || existing.last_error || "";
|
|
109
|
+
existing.attempts = existing.attempts || 0;
|
|
110
|
+
existing.give_up = false; // unblock a retry if user fixed the underlying issue
|
|
111
|
+
existing.enqueued_at = new Date().toISOString();
|
|
112
|
+
} else {
|
|
113
|
+
data.queue.push({
|
|
114
|
+
client_report_id,
|
|
115
|
+
idempotency_key: idempotency_key || "",
|
|
116
|
+
url,
|
|
117
|
+
payload,
|
|
118
|
+
enqueued_at: new Date().toISOString(),
|
|
119
|
+
attempts: 0,
|
|
120
|
+
last_error: last_error || "",
|
|
121
|
+
give_up: false,
|
|
122
|
+
});
|
|
123
|
+
}
|
|
124
|
+
writeQueue(data);
|
|
125
|
+
}
|
|
126
|
+
|
|
127
|
+
// ─── HTTP upload (native https — no curl bearer leak via /proc) ──
|
|
128
|
+
function postOnce(item, apiKey) {
|
|
129
|
+
return new Promise((resolve) => {
|
|
130
|
+
let u;
|
|
131
|
+
try { u = urlLib.parse(item.url); } catch {
|
|
132
|
+
return resolve({ code: "—", body: "", error: "invalid url" });
|
|
133
|
+
}
|
|
134
|
+
const lib = u.protocol === "https:" ? https : http;
|
|
135
|
+
const headers = {
|
|
136
|
+
"Authorization": `Bearer ${apiKey}`,
|
|
137
|
+
"Content-Type": "application/json",
|
|
138
|
+
"Content-Length": Buffer.byteLength(item.payload),
|
|
139
|
+
};
|
|
140
|
+
if (item.idempotency_key) headers["Idempotency-Key"] = item.idempotency_key;
|
|
141
|
+
const req = lib.request({
|
|
142
|
+
method: "POST",
|
|
143
|
+
hostname: u.hostname,
|
|
144
|
+
port: u.port || (u.protocol === "https:" ? 443 : 80),
|
|
145
|
+
path: u.path,
|
|
146
|
+
headers,
|
|
147
|
+
timeout: TIMEOUT_MS,
|
|
148
|
+
}, (res) => {
|
|
149
|
+
let chunks = "";
|
|
150
|
+
res.setEncoding("utf8");
|
|
151
|
+
res.on("data", (c) => { chunks += c; });
|
|
152
|
+
res.on("end", () => resolve({ code: String(res.statusCode), body: chunks.trim(), error: null }));
|
|
153
|
+
});
|
|
154
|
+
req.on("error", (e) => resolve({ code: "—", body: "", error: e.message || "request failed" }));
|
|
155
|
+
req.on("timeout", () => { try { req.destroy(new Error("timeout")); } catch {} });
|
|
156
|
+
req.write(item.payload);
|
|
157
|
+
req.end();
|
|
158
|
+
});
|
|
159
|
+
}
|
|
160
|
+
|
|
161
|
+
function readApiKey() {
|
|
162
|
+
try { return fs.readFileSync(API_KEY_FILE, "utf8").trim(); } catch { return ""; }
|
|
163
|
+
}
|
|
164
|
+
|
|
165
|
+
function readConfig() {
|
|
166
|
+
try { return JSON.parse(fs.readFileSync(CONFIG_FILE, "utf8")); } catch { return {}; }
|
|
167
|
+
}
|
|
168
|
+
|
|
169
|
+
// ─── Actions ────────────────────────────────────────────
|
|
170
|
+
async function actionDrain() {
|
|
171
|
+
const data = readQueue();
|
|
172
|
+
if (data.queue.length === 0) {
|
|
173
|
+
log("erp-retry: queue empty");
|
|
174
|
+
return { drained: 0, kept: 0, give_up: 0 };
|
|
175
|
+
}
|
|
176
|
+
|
|
177
|
+
const cfg = readConfig();
|
|
178
|
+
const erpEnabled = !cfg.erp || cfg.erp.enabled !== false;
|
|
179
|
+
if (!erpEnabled) {
|
|
180
|
+
log("erp-retry: ERP disabled in config; skipping drain (queue preserved)");
|
|
181
|
+
return { drained: 0, kept: data.queue.length, give_up: 0 };
|
|
182
|
+
}
|
|
183
|
+
|
|
184
|
+
const apiKey = readApiKey();
|
|
185
|
+
if (!apiKey) {
|
|
186
|
+
log("erp-retry: API key missing; skipping drain (queue preserved)");
|
|
187
|
+
return { drained: 0, kept: data.queue.length, give_up: 0 };
|
|
188
|
+
}
|
|
189
|
+
|
|
190
|
+
let drained = 0;
|
|
191
|
+
let give_up = 0;
|
|
192
|
+
const remaining = [];
|
|
193
|
+
let processed = 0;
|
|
194
|
+
|
|
195
|
+
for (const item of data.queue) {
|
|
196
|
+
// Already given up — keep but don't try.
|
|
197
|
+
if (item.give_up) {
|
|
198
|
+
remaining.push(item);
|
|
199
|
+
continue;
|
|
200
|
+
}
|
|
201
|
+
// Respect the per-run cap.
|
|
202
|
+
if (processed >= MAX_ITEMS) {
|
|
203
|
+
remaining.push(item);
|
|
204
|
+
continue;
|
|
205
|
+
}
|
|
206
|
+
processed++;
|
|
207
|
+
|
|
208
|
+
const result = await postOnce(item, apiKey);
|
|
209
|
+
|
|
210
|
+
if (result.code === "200") {
|
|
211
|
+
drained++;
|
|
212
|
+
log(` ✓ uploaded ${item.client_report_id}`);
|
|
213
|
+
continue; // omit from remaining
|
|
214
|
+
}
|
|
215
|
+
|
|
216
|
+
item.attempts = (item.attempts || 0) + 1;
|
|
217
|
+
item.last_error = result.error
|
|
218
|
+
? `network: ${result.error}`
|
|
219
|
+
: `HTTP ${result.code}${result.body ? `: ${result.body.slice(0, 200)}` : ""}`;
|
|
220
|
+
|
|
221
|
+
if (result.code === "401" || result.code === "422") {
|
|
222
|
+
// Permanent — surface to user.
|
|
223
|
+
item.give_up = true;
|
|
224
|
+
give_up++;
|
|
225
|
+
log(` ✗ ${item.client_report_id} permanent fail (HTTP ${result.code}) — leaving in queue for manual review`);
|
|
226
|
+
} else if (item.attempts >= MAX_GIVE_UP_ATTEMPTS) {
|
|
227
|
+
item.give_up = true;
|
|
228
|
+
give_up++;
|
|
229
|
+
log(` ✗ ${item.client_report_id} gave up after ${item.attempts} attempts (last: ${item.last_error})`);
|
|
230
|
+
} else {
|
|
231
|
+
log(` · ${item.client_report_id} retry pending (attempt ${item.attempts}, last: ${item.last_error})`);
|
|
232
|
+
}
|
|
233
|
+
remaining.push(item);
|
|
234
|
+
}
|
|
235
|
+
|
|
236
|
+
writeQueue({ queue: remaining });
|
|
237
|
+
|
|
238
|
+
if (drained > 0 || give_up > 0 || !QUIET) {
|
|
239
|
+
log(`erp-retry: drained=${drained} kept=${remaining.length} give_up=${give_up}`);
|
|
240
|
+
}
|
|
241
|
+
return { drained, kept: remaining.length, give_up };
|
|
242
|
+
}
|
|
243
|
+
|
|
244
|
+
function actionShow() {
|
|
245
|
+
const data = readQueue();
|
|
246
|
+
if (data.queue.length === 0) {
|
|
247
|
+
log("queue empty");
|
|
248
|
+
return;
|
|
249
|
+
}
|
|
250
|
+
log(`${data.queue.length} item(s) in queue:`);
|
|
251
|
+
for (const item of data.queue) {
|
|
252
|
+
log(` ${item.client_report_id} enqueued=${item.enqueued_at} attempts=${item.attempts || 0}${item.give_up ? " GIVE_UP" : ""}`);
|
|
253
|
+
if (item.last_error) log(` last_error: ${item.last_error}`);
|
|
254
|
+
}
|
|
255
|
+
}
|
|
256
|
+
|
|
257
|
+
function actionClear() {
|
|
258
|
+
if (!fs.existsSync(QUEUE_FILE)) {
|
|
259
|
+
log("queue already absent");
|
|
260
|
+
return;
|
|
261
|
+
}
|
|
262
|
+
// Back up rather than rm — destructive ops on user data deserve a recovery point.
|
|
263
|
+
const bak = `${QUEUE_FILE}.bak.${new Date().toISOString().replace(/[:.]/g, "-")}`;
|
|
264
|
+
try { fs.copyFileSync(QUEUE_FILE, bak); } catch {}
|
|
265
|
+
try { fs.unlinkSync(QUEUE_FILE); } catch {}
|
|
266
|
+
log(`queue cleared (backup at ${bak})`);
|
|
267
|
+
}
|
|
268
|
+
|
|
269
|
+
// ─── Export for in-process use (qualia-report skill enqueues directly) ──
|
|
270
|
+
module.exports = { enqueue, readQueue, writeQueue };
|
|
271
|
+
|
|
272
|
+
// ─── CLI entrypoint ─────────────────────────────────────
|
|
273
|
+
if (require.main === module) {
|
|
274
|
+
(async () => {
|
|
275
|
+
try {
|
|
276
|
+
if (ACTION === "drain") await actionDrain();
|
|
277
|
+
else if (ACTION === "show") actionShow();
|
|
278
|
+
else if (ACTION === "clear") actionClear();
|
|
279
|
+
else {
|
|
280
|
+
logErr(`erp-retry: unknown action "${ACTION}". Use: drain | show | clear`);
|
|
281
|
+
process.exit(2);
|
|
282
|
+
}
|
|
283
|
+
} catch (e) {
|
|
284
|
+
logErr(`erp-retry: ${e && e.message ? e.message : e}`);
|
|
285
|
+
// Soft-fail so session-start.js never blocks.
|
|
286
|
+
process.exit(0);
|
|
287
|
+
}
|
|
288
|
+
})();
|
|
289
|
+
}
|
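Note on `actionDrain()` above: the per-item branching can be restated as a pure function, which makes the retry policy easier to read at a glance. This is a sketch under the assumption that it mirrors the diff exactly; `classify` is a hypothetical name and does not exist in `erp-retry.js`.

```javascript
const MAX_GIVE_UP_ATTEMPTS = 10; // same value as the constant in the diff

// Hypothetical pure restatement of the per-item branching in actionDrain().
// `code` is the HTTP status as a string, or "—" on a network error, matching
// what postOnce() resolves with. `attempts` is the count after this try.
function classify(code, attempts) {
  if (code === "200") return "drained";                   // removed from queue
  if (code === "401" || code === "422") return "give_up"; // permanent failure
  if (attempts >= MAX_GIVE_UP_ATTEMPTS) return "give_up"; // attempt cap reached
  return "retry";                                         // kept for next drain
}

console.log(classify("200", 0)); // drained
console.log(classify("401", 1)); // give_up
console.log(classify("503", 3)); // retry
console.log(classify("—", 10));  // give_up
```

As written in the diff, only an exact `200` counts as success; any other 2xx status would be kept in the queue and retried.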
package/bin/install.js
CHANGED
|
@@ -665,6 +665,12 @@ async function main() {
|
|
|
665
665
|
);
|
|
666
666
|
fs.chmodSync(path.join(binDest, "slop-detect.mjs"), 0o755);
|
|
667
667
|
ok("slop-detect.mjs (anti-pattern scanner — runs pre-commit on frontend builds)");
|
|
668
|
+
copy(
|
|
669
|
+
path.join(FRAMEWORK_DIR, "bin", "erp-retry.js"),
|
|
670
|
+
path.join(binDest, "erp-retry.js")
|
|
671
|
+
);
|
|
672
|
+
fs.chmodSync(path.join(binDest, "erp-retry.js"), 0o755);
|
|
673
|
+
ok("erp-retry.js (ERP report retry queue — drained by session-start hook and erp-flush CLI)");
|
|
668
674
|
} catch (e) {
|
|
669
675
|
warn(`scripts — ${e.message}`);
|
|
670
676
|
}
|
package/bin/state.js
CHANGED
|
@@ -329,7 +329,16 @@ function parseStateMd(content) {
|
|
|
329
329
|
|
|
330
330
|
// ─── STATE.md Writer ─────────────────────────────────────
|
|
331
331
|
function writeStateMd(s) {
|
|
332
|
-
|
|
332
|
+
// Completed phases count toward progress. A phase counts as "done" once the
|
|
333
|
+
// state has advanced past `built` (i.e. status in verified/polished/shipped/handed_off).
|
|
334
|
+
// Without this adjustment, a 3-phase project shows 66% at completion and a
|
|
335
|
+
// 1-phase project shows 0% — the bar would never reach 100%.
|
|
336
|
+
const DONE_STATUSES = new Set(["verified", "polished", "shipped", "handed_off"]);
|
|
337
|
+
const currentDone = DONE_STATUSES.has(s.status) && s.verification !== "fail" ? 1 : 0;
|
|
338
|
+
const phaseFrac = Math.min(
|
|
339
|
+
100,
|
|
340
|
+
Math.round(((s.phase - 1 + currentDone) / s.total_phases) * 100)
|
|
341
|
+
);
|
|
333
342
|
const filled = Math.round(phaseFrac / 10);
|
|
334
343
|
const bar = "█".repeat(filled) + "░".repeat(10 - filled);
|
|
335
344
|
const now = new Date().toISOString().split("T")[0];
|
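Note on the `writeStateMd()` hunk above: the new progress computation can be lifted into a standalone function to check the edge cases the comment describes. This is a sketch only; `phaseProgress` is a hypothetical name, and the input object carries just the four fields the hunk reads.

```javascript
// Statuses the diff treats as "advanced past built".
const DONE_STATUSES = new Set(["verified", "polished", "shipped", "handed_off"]);

// Hypothetical standalone version of the progress computation in writeStateMd().
// `s` needs: phase (1-based), total_phases, status, verification.
function phaseProgress(s) {
  const currentDone =
    DONE_STATUSES.has(s.status) && s.verification !== "fail" ? 1 : 0;
  const pct = Math.min(
    100,
    Math.round(((s.phase - 1 + currentDone) / s.total_phases) * 100)
  );
  const filled = Math.round(pct / 10);
  return { pct, bar: "█".repeat(filled) + "░".repeat(10 - filled) };
}

// Last phase of three, verified: the bar now reaches 100%.
console.log(phaseProgress({ phase: 3, total_phases: 3, status: "verified", verification: "pass" }).pct); // 100

// Single-phase project that is only built: still 0%.
console.log(phaseProgress({ phase: 1, total_phases: 1, status: "built", verification: "pending" }).pct); // 0
```

The current phase only counts once its status is in the done set and verification has not failed, so a failed verification on the last phase keeps the bar below 100%.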
package/docs/onboarding.html
CHANGED
|
@@ -505,8 +505,8 @@
|
|
|
505
505
|
<div class="kit-group">
|
|
506
506
|
<h4>Mid-flight design & tests</h4>
|
|
507
507
|
<dl>
|
|
508
|
-
<dt>/qualia-polish</dt><dd>
|
|
509
|
-
<dt>/qualia-polish
|
|
508
|
+
<dt>/qualia-polish</dt><dd>Design pass — scope-adaptive: component, route, full app, redesign, critique, quick.</dd>
|
|
509
|
+
<dt>/qualia-polish --loop</dt><dd>Autonomous screenshot → score → fix loop until rubric passes.</dd>
|
|
510
510
|
<dt>/qualia-test</dt><dd>Generate tests, run tests, or drive a feature TDD-style.</dd>
|
|
511
511
|
</dl>
|
|
512
512
|
</div>
|
|
@@ -521,8 +521,7 @@
|
|
|
521
521
|
<div class="kit-group">
|
|
522
522
|
<h4>Build something small</h4>
|
|
523
523
|
<dl>
|
|
524
|
-
<dt>/qualia-
|
|
525
|
-
<dt>/qualia-task</dt><dd>Single feature, fresh builder spawn, atomic commit. 1 to 5 files.</dd>
|
|
524
|
+
<dt>/qualia-feature</dt><dd>Auto-scoped: inline for trivia (typo, config tweak), fresh builder spawn for 1-5 file features. Refuses and routes to /qualia-plan when scope is phase-sized.</dd>
|
|
526
525
|
</dl>
|
|
527
526
|
</div>
|
|
528
527
|
<div class="kit-group">
|
|
@@ -553,7 +552,6 @@
|
|
|
553
552
|
<dl>
|
|
554
553
|
<dt>/qualia-discuss</dt><dd>Without args: project-level non-technical discovery (mandatory at kickoff). With <code>N</code>: phase-level technical alignment with locked decisions and ADRs.</dd>
|
|
555
554
|
<dt>/qualia-research</dt><dd>Deep-research a niche library or domain before planning a phase.</dd>
|
|
556
|
-
<dt>/qualia-prd</dt><dd>Synthesize the current conversation into a durable PRD document.</dd>
|
|
557
555
|
</dl>
|
|
558
556
|
</div>
|
|
559
557
|
<div class="kit-group">
|
package/docs/playwright-loop-pilot-results.md
CHANGED
|
@@ -1,10 +1,12 @@
|
|
|
1
|
-
# /qualia-polish
|
|
1
|
+
# /qualia-polish --loop — Pilot results
|
|
2
|
+
|
|
3
|
+
> Historical pilot from v5.1.0 (when the command was `/qualia-polish-loop`). Paths updated to the current `skills/qualia-polish/` location after the v5.8.0 consolidation into the `--loop` flag.
|
|
2
4
|
|
|
3
5
|
**Run date:** 2026-05-03
|
|
4
6
|
**Framework version:** 5.1.0 (this commit)
|
|
5
7
|
**Operator:** Claude Opus 4.7 (1M context), main session, autonomous build
|
|
6
8
|
**Browser backend used:** Playwright cached chromium 1217 (`~/.cache/ms-playwright/chromium-1217/chrome-linux64/chrome`) — auto-selected by `playwright-capture.mjs` after `import('playwright')` failed (no `playwright` npm package in the framework repo) and the cache lookup found a usable binary
|
|
7
|
-
**Fixture server:** `python3 -m http.server 18080` against `skills/qualia-polish
|
|
9
|
+
**Fixture server:** `python3 -m http.server 18080` against `skills/qualia-polish/fixtures/`
|
|
8
10
|
**Captures:** `/tmp/qpl-pilot-1777778113/{scenario1,scenario2}/{mobile,tablet,desktop}-*.png`
|
|
9
11
|
|
|
10
12
|
This pilot replaces an earlier draft that pre-dated the actual capture pipeline. The numbers below come from real runs of `playwright-capture.mjs` against the committed fixtures, not from architectural reasoning.
|
|
@@ -13,7 +15,7 @@ This pilot replaces an earlier draft that pre-dated the actual capture pipeline.
|
|
|
13
15
|
|
|
14
16
|
## Scenario 1 — Synthetic clean page
|
|
15
17
|
|
|
16
|
-
**Fixture:** `skills/qualia-polish
|
|
18
|
+
**Fixture:** `skills/qualia-polish/fixtures/clean.html` (170 lines). Fraunces + JetBrains Mono pair, OKLCH palette tinted toward 220° (cyan), asymmetric hero (`1.4fr / 1fr`), full-width with `clamp()` padding, varied work-grid (no card monotony), border-only headers, focus-visible rings, `prefers-reduced-motion` respected, 65ch line-length cap on prose.
|
|
17
19
|
|
|
18
20
|
**Expected per spec:** SUCCESS in 1-2 iterations with all dims ≥ 4.
|
|
19
21
|
|
|
@@ -53,7 +55,7 @@ This pilot replaces an earlier draft that pre-dated the actual capture pipeline.
|
|
|
53
55
|
|
|
54
56
|
## Scenario 2 — Synthetic broken page
|
|
55
57
|
|
|
56
|
-
**Fixture:** `skills/qualia-polish
|
|
58
|
+
**Fixture:** `skills/qualia-polish/fixtures/broken.html` (115 lines). Deliberately shipped slop. Inter font as primary, pure `#fff` + `#000`, blue→purple linear-gradient hero, `background-clip: text` gradient on h1, three identical 1/3-width feature cards with `border-left: 4px solid #2563eb` side-stripes, "Get Started" + "Learn More" CTAs, `max-width: 1280px` container, `outline: none` on nav links without replacement.
|
|
57
59
|
|
|
58
60
|
**Expected per spec:** Loop identifies all anti-patterns, fixes them, ends SUCCESS in 4-6 iterations.
|
|
59
61
|
|
|
@@ -110,7 +112,7 @@ This pilot replaces an earlier draft that pre-dated the actual capture pipeline.
|
|
|
110
112
|
|
|
111
113
|
```bash
|
|
112
114
|
# Initialize a fresh state file
|
|
113
|
-
node skills/qualia-polish
|
|
115
|
+
node skills/qualia-polish/scripts/loop.mjs init \
|
|
114
116
|
--state /tmp/.../qpl-kill.json --url http://localhost:3000 --max 8
|
|
115
117
|
|
|
116
118
|
# Iteration 1: write an eval where typography==1 with a fixed issue fingerprint
|
package/docs/research/2026-05-11-deep-research.md
ADDED
|
@@ -0,0 +1,189 @@
|
|
|
1
|
+
# Qualia Framework — Deep Research Audit
|
|
2
|
+
|
|
3
|
+
**Date:** 2026-05-11
|
|
4
|
+
**Version audited:** v5.8.0 (tip `387c422`)
|
|
5
|
+
**Auditors:** 4 parallel investigators (surface health, ERP integration, token economy + personalization, workflow outcomes)
|
|
6
|
+
**Method:** Grounded protocol — every claim carries `file:line` citation with quoted snippet.
|
|
7
|
+
|
|
8
|
+
---
|
|
9
|
+
|
|
10
|
+
## Headline verdict
|
|
11
|
+
|
|
12
|
+
The framework is **structurally solid** but carries three real failure modes the user was right to suspect:
|
|
13
|
+
|
|
14
|
+
1. **Documentation drift is the loudest silent failure.** Three user-facing surfaces (`rules/speed.md`, `templates/help.html`, `docs/onboarding.html`) still list `/qualia-quick`, `/qualia-task`, `/qualia-design`, `/qualia-prd`, `/qualia-polish-loop` — commands removed in v5.7/v5.8. A new hire following onboarding.html immediately hits dead ends.
|
|
15
|
+
2. **ERP health is better than the user feared, but one promise is a lie.** After 3 failed upload attempts the message says "will appear in ERP after retry" — there is no retry mechanism. No queue, no cron, no session-start re-try. Data sits locally until the employee manually re-runs `/qualia-report`. The retry logic that DOES exist (1s/3s/9s backoff, 401/422 permanent-fail distinction) is correct.
|
|
16
|
+
3. **Always-loaded substrate is ~2× larger than the "Pocock discipline" claim implies.** CLAUDE.md is genuinely 24 lines, but the 8 rules files (~480 lines) + 33 skill descriptions (~14.7 KB) total **~10,300 tokens** on every session start. ~5,400 of those are recoverable without losing functionality.
|
|
17
|
+
|
|
18
|
+
Production-readiness score (framework as a product): **77 / 100**
|
|
19
|
+
|
|
20
|
+
| Dimension | Score | One-line |
|
|
21
|
+
|---|---:|---|
|
|
22
|
+
| Surface honesty | 6/10 | Dead refs in 3 user-facing files |
|
|
23
|
+
| ERP health | 7/10 | Real retry, false retry-promise, missing idempotency |
|
|
24
|
+
| Token discipline | 6/10 | Real where claimed, but ~5.4K tokens of recoverable bloat |
|
|
25
|
+
| Personalization | 3/10 | 4 employees are identical clones in the framework's eyes |
|
|
26
|
+
| Workflow speed | 7/10 | Road works; kickoff has redundant questions, no fast path |
|
|
27
|
+
| Verifier strictness | 7/10 | Strong protocol, INSUFFICIENT EVIDENCE silently treated as PASS |
|
|
28
|
+
| Test coverage | 8/10 | State machine excellently tested, workflow loop untested |
|
|
29
|
+
| Hooks (safety) | 9/10 | Genuinely well-engineered, zero token tax, real enforcement |
|
|
30
|
+
|
|
31
|
+
---
|
|
32
|
+
|
|
33
|
+
## CRITICAL findings (zero this round)
|
|
34
|
+
|
|
35
|
+
None of the four audits surfaced a CRITICAL severity issue (security breach / data loss / auth bypass / crash on happy path). The framework's safety hooks (`branch-guard`, `git-guardrails`, `migration-guard`, `pre-deploy-gate`) and state-machine locking are real defenses.
|
|
36
|
+
|
|
37
|
+
This is meaningful: the user's "I think this is very fucked up" suspicion about the ERP was wrong at the data-safety level — local commit always happens before upload, retry logic is correct, API key file is mode `0600`, no shell injection.
|
|
38
|
+
|
|
39
|
+
---
|
|
40
|
+
|
|
41
|
+
## HIGH findings (10 — fix before next minor)
|
|
42
|
+
|
|
43
|
+
### Surface honesty
|
|
44
|
+
|
|
45
|
+
**H1. `rules/speed.md:50-51` lists removed `/qualia-quick` + `/qualia-task` as active.**
|
|
46
|
+
`speed.md` is read by users looking for shortcuts. Users will invoke commands that don't exist.
|
|
47
|
+
|
|
48
|
+
**H2. `templates/help.html:377-379` shows `/qualia-quick`, `/qualia-task`, `/qualia-design`.**
|
|
49
|
+
This is what `/qualia-help` opens in the browser. It's the canonical reference page.
|
|
50
|
+
|
|
51
|
+
**H3. `docs/onboarding.html:509, 524, 525, 556` shows 4 removed commands.**
|
|
52
|
+
This is the file the README explicitly recommends sending to new hires.
|
|
53
|
+
|
|
54
|
+
### ERP
|
|
55
|
+
|
|
56
|
+
**H4. `skills/qualia-report/SKILL.md:238` — "will appear in ERP after retry" is a lie.**
|
|
57
|
+
There is no retry mechanism. No background queue, no cron, no session-start drain. Either build a retry queue at `~/.claude/.erp-retry-queue.json` and drain on session-start, OR change the message to honest "Re-run /qualia-report to retry."
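The queue option in H4 can be sketched as follows. This is a minimal illustration, assuming a JSON array in a single file that is rewritten atomically; the function names (`enqueueReport`, `drainQueue`), the entry shape, and the synchronous `send` callback are hypothetical and are not taken from the framework's own `bin/erp-retry.js`.

```javascript
// Hypothetical sketch of a file-backed retry queue; not the shipped erp-retry.js.
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Read the queue, treating a missing or corrupt file as empty.
function readQueue(file) {
  try {
    const parsed = JSON.parse(fs.readFileSync(file, "utf8"));
    return Array.isArray(parsed) ? parsed : [];
  } catch {
    return [];
  }
}

// Write via tmp file + rename so a crash never leaves a half-written queue.
function writeQueue(file, entries) {
  const tmp = `${file}.${process.pid}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(entries, null, 2), { mode: 0o600 });
  fs.renameSync(tmp, file);
}

// Add a failed report; re-enqueueing the same report id replaces the old entry.
function enqueueReport(file, entry) {
  const rest = readQueue(file).filter(
    (e) => e.client_report_id !== entry.client_report_id,
  );
  writeQueue(file, [...rest, { ...entry, queued_at: new Date().toISOString() }]);
}

// Try each queued entry once and keep only the ones that still fail.
// `send` is a caller-supplied function returning true on success.
function drainQueue(file, send) {
  const remaining = readQueue(file).filter((entry) => {
    try {
      return !send(entry);
    } catch {
      return true; // a throwing sender counts as "still failing"
    }
  });
  writeQueue(file, remaining);
  return remaining.length;
}

// Demo against a throwaway file in the OS temp directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "erp-queue-"));
const demoFile = path.join(demoDir, "queue.json");
enqueueReport(demoFile, { client_report_id: "r-1", payload: "{}" });
enqueueReport(demoFile, { client_report_id: "r-2", payload: "{}" });
const left = drainQueue(demoFile, (entry) => entry.client_report_id === "r-1");
console.log(left); // 1
```

A real sender would be asynchronous (an HTTPS request with a timeout); the synchronous callback here only keeps the sketch short.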
|
|
58
|
+
|
|
59
|
+
**H5. `skills/qualia-report/SKILL.md:220-222` — no `Idempotency-Key` header sent.**
|
|
60
|
+
The ERP contract documents idempotency support with a 24h replay window (`docs/erp-contract.md:42-49`). The framework ignores this. The UPSERT on `(project_id, client_report_id)` covers most cases, but retries after a response-lost-mid-flight could double-count without the explicit header.
|
|
61
|
+
|
|
62
|
+
**H6. `session_duration_minutes` documented in ERP contract example but never sent.**
|
|
63
|
+
`docs/erp-contract.md:93` shows it; the payload builder at `skills/qualia-report/SKILL.md:192-205` never computes it. Trivial fix: `Math.round((Date.now() - new Date(t.session_started_at)) / 60000)`.
|
|
64
|
+
|
|
65
|
+
### Workflow quality
|
|
66
|
+
|
|
67
|
+
**H7. `agents/verifier.md:47` — 25-call tool budget + `skills/qualia-verify/SKILL.md` doesn't block on INSUFFICIENT EVIDENCE.**
|
|
68
|
+
The verifier is told "mark unchecked criteria as INSUFFICIENT EVIDENCE" when budget exhausts. The orchestrator does not grep for that string before declaring PASS. Phases with 8+ tasks can pass verification with criteria literally not checked. **This is the #1 false-pass vector.**
|
|
69
|
+
|
|
70
|
+
**H8. `/qualia-new` has 15-21+ user questions before any code is written.**
|
|
71
|
+
14 discovery + 1 design vibe + 1 client + 5 PRODUCT.md + N feature scoping. The PRODUCT.md questions at `skills/qualia-new/SKILL.md:163-169` overlap with discovery questions 2-5. Demos hit ~15 interactions despite the "8 questions for demos" framing.
|
|
72
|
+
|
|
73
|
+
### Personalization
|
|
74
|
+
|
|
75
|
+
**H9. All 4 employees (Hasan, Moayad, Rama, Sally) have identical role descriptions.**
|
|
76
|
+
`bin/install.js:31-57` — every EMPLOYEE entry has the description "Developer. Feature branches only. Cannot push to main." No stack expertise, seniority, specialization. The framework cannot adapt explanation depth, task assignment, or review style per developer.
|
|
77
|
+
|
|
78
|
+
**H10. `architecture.md` is always-loaded but explicitly says "Do not auto-load this on quick fixes."**
|
|
79
|
+
`rules/architecture.md` is 125 lines (~1,560 tokens). It lives in `~/.claude/rules/` where Claude Code auto-loads everything. The file itself contradicts its location.
|
|
80
|
+
|
|
81
|
+
---
|
|
82
|
+
|
|
83
|
+
## MEDIUM findings (12 — fix this quarter)
|
|
84
|
+
|
|
85
|
+
| # | Where | What |
|
|
86
|
+
|---|---|---|
|
|
87
|
+
| M1 | `bin/state.js:332` | Progress bar formula `(phase-1)/total_phases` — completed project shows 66%, never 100% |
|
|
88
|
+
| M2 | `agents/builder.md:155`, `agents/planner.md:147`, `agents/research-synthesizer.md:91` | "likely", "probably" — hedging language the grounding protocol explicitly bans |
|
|
89
|
+
| M3 | `bin/state.js:375-376` | `polished → shipped` mandatory; no skip for API-only / backend-only projects |
|
|
90
|
+
| M4 | `skills/zoho-workflow/` | Completely unreferenced — orphan skill |
|
|
91
|
+
| M5 | `tests/skills.test.sh` | Tests structure, not behavior — no integration test for the plan→build→verify loop |
|
|
92
|
+
| M6 | `agents/plan-checker.md:83` | "Any shared file = wave conflict" forces unnecessary serialization (no read-only-overlap distinction) |
|
|
93
|
+
| M7 | `agents/plan-checker.md:173-179` | Scope-reduction Rule 10 substring-matches "v1", "basic version" — false REVISE on legit plans |
|
|
94
|
+
| M8 | `agents/verifier.md:315-317` + `:356-362` | Design rubric fires full 8-dim on any `.tsx` file presence — backend-heavy phases get disproportionate design scrutiny |
|
|
95
|
+
| M9 | `bin/cli.js:874-888` | `erp-ping` sends synthetic payload — does not validate current schema |
|
|
96
|
+
| M10 | `skills/qualia-report/SKILL.md:197` | `framework_version` reads from config snapshot at install time — stale after `npm update` |
|
|
97
|
+
| M11 | `skills/qualia-new/SKILL.md:163-169` | PRODUCT.md questions duplicate discovery questions 2-5 |
|
|
98
|
+
| M12 | `skills/qualia/SKILL.md:38-55` | Router has no row for "I want a one-off change outside the Road" — users must already know `/qualia-feature` exists |
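To make M1 concrete, the sketch below evaluates the quoted formula for a three-phase project sitting on its last phase. Floor rounding is assumed here because it reproduces the 66% figure in the table; `progressFromCompleted` and its `completedPhases` input are hypothetical and are not confirmed fields of `bin/state.js`.

```javascript
// Formula quoted in M1: progress is derived from the current phase index alone.
function progressAsReported(phase, totalPhases) {
  return Math.floor(((phase - 1) / totalPhases) * 100);
}

// Hypothetical alternative: count phases that are actually finished.
// `completedPhases` is an assumed input, not a field confirmed in bin/state.js.
function progressFromCompleted(completedPhases, totalPhases) {
  if (totalPhases <= 0) return 0;
  const clamped = Math.min(Math.max(completedPhases, 0), totalPhases);
  return Math.floor((clamped / totalPhases) * 100);
}

console.log(progressAsReported(3, 3)); // 66
console.log(progressFromCompleted(3, 3)); // 100
```

With the phase-index formula the last phase can never report more than `(total - 1) / total`, which is why a finished project tops out below 100%.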
|
|
99
|
+
|
|
100
|
+
---
|
|
101
|
+
|
|
102
|
+
## LOW findings (5)
|
|
103
|
+
|
|
104
|
+
- L1. `hooks/pre-compact.js:80-81` — default `--no-verify` + `--no-gpg-sign` (configurable, documented, but silent default)
|
|
105
|
+
- L2. `docs/playwright-loop-pilot-results.md:7,16,56,113` — references the renamed `skills/qualia-polish-loop/` path
|
|
106
|
+
- L3. `skills/qualia-report/SKILL.md:152` — error table shows the old `set-erp-key <key>` positional syntax (CLI now requires piped)
|
|
107
|
+
- L4. `bin/state.js:130` — trace probabilistic pruning (1% chance) may let `.qualia-traces/` grow unnecessarily on heavy installs
|
|
108
|
+
- L5. `agents/visual-evaluator.md:91` — `likely_file` (as JSON field name, borderline)
|
|
109
|
+
|
|
110
|
+
---
|
|
111
|
+
|
|
112
|
+
## Cross-cutting patterns
|
|
113
|
+
|
|
114
|
+
### Pattern 1: Surface drift outpaces test coverage
|
|
115
|
+
|
|
116
|
+
`tests/skills.test.sh` validates that every SKILL.md has the right frontmatter. It does NOT validate that command references inside SKILL.md, rules/, docs/, templates/ point to skills that still exist. Three v5.7/v5.8 removals slipped through because the test surface is structural, not referential.
|
|
117
|
+
|
|
118
|
+
**Fix:** Add `tests/refs.test.sh` that greps every `.md` and `.html` for `/qualia-{name}` and asserts each name has a matching `skills/qualia-{name}/SKILL.md`.
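The core of such a referential check might look like the sketch below. It is an illustration only: the function name `findDeadRefs` and the sample inputs are hypothetical, and the check is written in JavaScript for brevity even though the proposed test is a shell script.

```javascript
// Hypothetical sketch of the referential check described above.
// Matches backticked commands such as `/qualia-ship` and captures the skill name.
const COMMAND_REF = /`\/(qualia(?:-[a-z]+){0,3})`/g;

// Return every referenced command name that has no shipped skill, sorted.
function findDeadRefs(text, shippedSkills) {
  const dead = new Set();
  for (const match of text.matchAll(COMMAND_REF)) {
    if (!shippedSkills.has(match[1])) dead.add(match[1]);
  }
  return [...dead].sort();
}

// In a real test the set would come from listing skills/*/SKILL.md.
const shipped = new Set(["qualia", "qualia-feature", "qualia-ship"]);
const sample = "Use `/qualia-feature` or `/qualia-quick`, then `/qualia-ship`.";
console.log(findDeadRefs(sample, shipped)); // [ 'qualia-quick' ]
```

Restricting the match to backticked references keeps path-like strings such as `/qualia-ui.js` out of the result, at the cost of missing commands written without backticks.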
|
|
119
|
+
|
|
120
|
+
### Pattern 2: The framework is more disciplined than its tooling enforces
|
|
121
|
+
|
|
122
|
+
CLAUDE.md is genuinely lean. Design substrate was correctly moved off the always-loaded path. Hooks enforce deterministically. But:
|
|
123
|
+
|
|
124
|
+
- `rules/architecture.md` lives where it auto-loads despite its own warning
|
|
125
|
+
- Skill descriptions accumulate flavor text ("Karpathy-style", "v5.3 from Matt Pocock's...") on top of trigger phrases — every change invalidates the cache prefix
|
|
126
|
+
- 8 rules files always-load when 3 are sufficient for most sessions
|
|
127
|
+
|
|
128
|
+
**Fix:** A `tests/budget.test.sh` that asserts total always-loaded substrate stays under ~6,000 tokens.
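A budget assertion of this kind could be sketched as below. The four-characters-per-token ratio is a rough heuristic assumed for illustration, not a figure from the audit, and the document names and sizes are placeholders.

```javascript
// Rough heuristic, assumed for illustration: about 4 characters per token.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Sum the estimate over every always-loaded document and compare to the limit.
function checkBudget(docs, limitTokens) {
  const total = docs.reduce((sum, doc) => sum + estimateTokens(doc.text), 0);
  return { total, limitTokens, ok: total <= limitTokens };
}

// Placeholder documents; a real test would read the installed files from disk.
const result = checkBudget(
  [
    { name: "CLAUDE.md", text: "x".repeat(2000) },
    { name: "rules/example.md", text: "x".repeat(6000) },
  ],
  6000,
);
console.log(result); // { total: 2000, limitTokens: 6000, ok: true }
```

Because the estimate is approximate, the threshold is best treated as a tripwire for growth rather than an exact token count.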
|
|
129
|
+
|
|
130
|
+
### Pattern 3: The verifier knows what to check but can't always afford to check it
|
|
131
|
+
|
|
132
|
+
The 3-level verification (Truths / Artifacts / Wiring) is the right abstraction. The 25-call budget makes it unaffordable on phases with 8+ tasks. INSUFFICIENT EVIDENCE is the escape hatch, but the orchestrator doesn't penalize it. So the verifier silently approves under-verified phases.
|
|
133
|
+
|
|
134
|
+
**Fix:** Budget = `max(25, tasks * 5)`. AND: any INSUFFICIENT EVIDENCE in the verification file → verdict downgraded to FAIL.
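Both halves of this fix are small enough to sketch directly. The helper names below (`verifierBudget`, `finalVerdict`) are hypothetical; only the `max(25, tasks * 5)` formula and the downgrade rule come from the text above.

```javascript
// Hypothetical helpers illustrating the Pattern 3 fix; names are not from the framework.
const BASE_BUDGET = 25;
const CALLS_PER_TASK = 5;

// Budget = max(25, tasks * 5), as proposed above.
function verifierBudget(taskCount) {
  return Math.max(BASE_BUDGET, taskCount * CALLS_PER_TASK);
}

// Any INSUFFICIENT EVIDENCE marker in the report forces FAIL,
// regardless of the verdict the verifier itself reported.
function finalVerdict(reportText, verifierVerdict) {
  const unchecked = (reportText.match(/INSUFFICIENT EVIDENCE/g) || []).length;
  if (unchecked > 0) return { verdict: "FAIL", unchecked };
  return { verdict: verifierVerdict, unchecked };
}

console.log(verifierBudget(3)); // 25
console.log(verifierBudget(8)); // 40
console.log(finalVerdict("AC-1: PASS\nAC-2: INSUFFICIENT EVIDENCE", "PASS").verdict); // FAIL
```

The floor of 25 keeps small phases unchanged, while an 8-task phase gets 40 calls instead of running out partway through.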
|
|
135
|
+
|
|
136
|
+
### Pattern 4: Personalization is structurally absent
|
|
137
|
+
|
|
138
|
+
There are 4 distinct humans (Hasan, Moayad, Rama, Sally) who get an identical one-sentence description. The daily-log, knowledge layer, learned-patterns, and commit history all contain per-user signal that is collected and never read for personalization.
|
|
139
|
+
|
|
140
|
+
**Fix:** Per-employee profile file under `~/.claude/team/{code}.md`, injected into CLAUDE.md template at install time. Auto-derived from daily logs via a `/qualia-flush` extension.
|
|
141
|
+
|
|
142
|
+
---
|
|
143
|
+
|
|
144
|
+
## Top 10 fixes ranked by ROI
|
|
145
|
+
|
|
146
|
+
1. **Find-and-replace the 6 dead command references in `speed.md`, `help.html`, `onboarding.html`** — 15 min. Eliminates every user-facing dead end.
|
|
147
|
+
2. **Add INSUFFICIENT EVIDENCE blocker to `qualia-verify`** — 20 min. Eliminates the #1 false-pass vector.
|
|
148
|
+
3. **Stop lying about retry: either build the queue or change the message** — 30 min for the message change, ~3 hours for the queue.
|
|
149
|
+
4. **Send `Idempotency-Key` + `session_duration_minutes` in ERP payload** — 20 min. Completes the documented contract.
|
|
150
|
+
5. **Move `architecture.md` to `~/.claude/qualia-substrate/` (lazy-load)** — 10 min. Saves ~1,560 tokens per session.
|
|
151
|
+
6. **Trim skill descriptions to trigger-phrases-only** — 1 hour. Saves ~1,500 tokens per session + improves cache stability.
|
|
152
|
+
7. **Per-employee profile files (`team/{code}.md`)** — 2 hours. Transforms personalization from 0 to material.
|
|
153
|
+
8. **Scale verifier budget to `max(25, tasks*5)`** — 5 min. Eliminates INSUFFICIENT EVIDENCE on large phases.
|
|
154
|
+
9. **Merge PRODUCT.md questions into the discovery interview** — 30 min. Cuts ~3 questions from every kickoff.
|
|
155
|
+
10. **Add `tests/refs.test.sh`** — 45 min. Prevents the next surface-drift incident.
|
|
156
|
+
|
|
157
|
+
**Total effort for all 10:** ~8-10 hours.
|
|
158
|
+
**Effort-weighted impact:** removes every found false-pass vector, every documented dead-link, ~3K of recoverable tokens, the framework's biggest UX papercut (personalization), and the largest invisible quality risk (INSUFFICIENT EVIDENCE silently passing).
|
|
159
|
+
|
|
160
|
+
---
|
|
161
|
+
|
|
162
|
+
## What the framework does WELL (honest acknowledgement)
|
|
163
|
+
|
|
164
|
+
These are not patronizing — they came up across all four audits.
|
|
165
|
+
|
|
166
|
+
1. **State machine** (`bin/state.js`). 57 behavioral tests. Atomic dual-file writes. File-based locking with stale detection. Crash-recovery journaling. Gap-cycle circuit breaker with configurable limit. Schema validation + repair. This is real engineering.
|
|
167
|
+
2. **Hook architecture.** Pure Node.js (Windows-safe). Zero model-token tax (deterministic enforcement, not instructional). Real protections: service_role leak scan, force-push-to-main block (role-aware), migration safety, Vercel account guard, env-empty guard.
|
|
168
|
+
3. **Verifier abstraction** (Truths / Artifacts / Wiring). Most AI coding tools check "did the task run." The Qualia verifier checks "is the artifact substantive, is it imported, is it called." The stub-detection patterns are operational learning, not theory.
|
|
169
|
+
4. **Polish-loop kill-switch.** Fingerprint regression detection + budget cap. Real engineering against the known infinite-loop failure mode of vision-model feedback loops.
|
|
170
|
+
5. **ERP security hardening.** Native `https.request` instead of curl (no bearer in `/proc/cmdline`). API key mode `0600`. Refuses positional CLI args. Env-var passing in payload builder (no shell injection). Atomic tmp+rename writes.
|
|
171
|
+
|
|
172
|
+
---
|
|
173
|
+
|
|
174
|
+
## Resources & references
|
|
175
|
+
|
|
176
|
+
- All findings cite specific files. Original investigator outputs from this audit are not committed (held in conversation context).
|
|
177
|
+
- Grounding protocol: `/home/qualia-new/.claude/rules/grounding.md`
|
|
178
|
+
- Severity criteria: same file.
|
|
179
|
+
- Pocock instruction-budget pattern: referenced in README:8.
|
|
180
|
+
- ERP contract spec: `docs/erp-contract.md` (this file exists and was used by the ERP-integration audit).
|
|
181
|
+
|
|
182
|
+
---
|
|
183
|
+
|
|
184
|
+
## Open questions for the user
|
|
185
|
+
|
|
186
|
+
1. **The retry-queue approach for ERP** — would you prefer (a) an honest message change ("re-run to retry"), (b) a queue file drained on session-start, or (c) a cron job? Each has different operational characteristics.
|
|
187
|
+
2. **Personalization depth** — willing to spend an afternoon writing 4 profile files for Hasan/Moayad/Rama/Sally? Or want this auto-derived from daily logs over time?
|
|
188
|
+
3. **Token cuts vs cache stability** — some of the cuts (e.g. trimming skill descriptions) will invalidate prompt caches for one cycle. Worth it once, or keep stable?
|
|
189
|
+
4. **The 14-question discovery interview** — keep depth at the kickoff cost, or shave 4-5 questions and accept slightly fuzzier project framing?
|
package/hooks/session-start.js
CHANGED
|
@@ -26,6 +26,8 @@ const STATE_FILE = path.join(".planning", "STATE.md");
|
|
|
26
26
|
const CONTINUE_HERE = ".continue-here.md";
|
|
27
27
|
const NOTIF_FILE = path.join(HOME, ".claude", ".qualia-update-available.json");
|
|
28
28
|
const HEALTH_FILE = path.join(HOME, ".claude", ".qualia-install-health.json");
|
|
29
|
+
const ERP_RETRY = path.join(HOME, ".claude", "bin", "erp-retry.js");
|
|
30
|
+
const ERP_QUEUE = path.join(HOME, ".claude", ".erp-retry-queue.json");
|
|
29
31
|
|
|
30
32
|
// Critical files referenced by skills via @-import. If any are missing, skills
|
|
31
33
|
// silently get empty context and produce ungrounded output. We spot-check these
|
|
@@ -115,6 +117,21 @@ function fallbackText() {
|
|
|
115
117
|
}
|
|
116
118
|
}
|
|
117
119
|
|
|
120
|
+
function maybeDrainErpQueue() {
|
|
121
|
+
// Fire-and-forget drain of any reports stranded from a prior /qualia-report
|
|
122
|
+
// upload failure. Cheap fast-path: only spawn if the queue file exists and
|
|
123
|
+
// erp-retry.js is installed. Quiet mode + small max so we never block the
|
|
124
|
+
// session-start critical path. The script itself exits 0 even on internal
|
|
125
|
+
// errors — see erp-retry.js's CLI tail.
|
|
126
|
+
try {
|
|
127
|
+
if (!fs.existsSync(ERP_QUEUE) || !fs.existsSync(ERP_RETRY)) return;
|
|
128
|
+
spawnSync(process.execPath, [ERP_RETRY, "drain", "--quiet", "--max=5", "--timeout=2500"], {
|
|
129
|
+
stdio: "ignore",
|
|
130
|
+
timeout: 8000,
|
|
131
|
+
});
|
|
132
|
+
} catch {}
|
|
133
|
+
}
|
|
134
|
+
|
|
118
135
|
function maybeRenderUpdateBanner() {
|
|
119
136
|
// EMPLOYEE-only sticky banner. auto-update.js writes NOTIF_FILE when a new
|
|
120
137
|
// version is detected; we render it every session until the user actually
|
|
@@ -144,6 +161,7 @@ function renderHealthWarning(missing) {
|
|
|
144
161
|
|
|
145
162
|
try {
|
|
146
163
|
maybeRenderUpdateBanner();
|
|
164
|
+
maybeDrainErpQueue();
|
|
147
165
|
|
|
148
166
|
const healthMissing = checkInstallHealth();
|
|
149
167
|
if (healthMissing) renderHealthWarning(healthMissing);
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "qualia-framework",
|
|
3
|
-
"version": "5.
|
|
3
|
+
"version": "5.9.0",
|
|
4
4
|
"description": "Claude Code workflow framework by Qualia Solutions. Plan, build, verify, ship.",
|
|
5
5
|
"bin": {
|
|
6
6
|
"qualia-framework": "./bin/cli.js"
|
|
@@ -31,7 +31,8 @@
|
|
|
31
31
|
"test:skills": "bash tests/skills.test.sh",
|
|
32
32
|
"test:slop-detect": "bash tests/slop-detect.test.sh",
|
|
33
33
|
"test:statusline": "bash tests/statusline.test.sh",
|
|
34
|
-
"test:
|
|
34
|
+
"test:refs": "bash tests/refs.test.sh",
|
|
35
|
+
"test:shell": "bash tests/statusline.test.sh && bash tests/state.test.sh && bash tests/hooks.test.sh && bash tests/bin.test.sh && bash tests/lib.test.sh && bash tests/skills.test.sh && bash tests/refs.test.sh && bash tests/slop-detect.test.sh"
|
|
35
36
|
},
|
|
36
37
|
"files": [
|
|
37
38
|
"bin/",
|
package/rules/speed.md
CHANGED
|
@@ -47,8 +47,7 @@ The pattern: **on-demand by default; always-on only when the data is irreducibly
|
|
|
47
47
|
|
|
48
48
|
When a Qualia command exists for the situation, use it — don't reinvent:
|
|
49
49
|
- `/qualia` — what's my next step?
|
|
50
|
-
- `/qualia-
|
|
51
|
-
- `/qualia-task` — single focused task, fresh builder spawn, atomic commit
|
|
50
|
+
- `/qualia-feature` — single feature, auto-scoped: inline for trivia, fresh builder spawn for 1-5 file features
|
|
52
51
|
- `/qualia-ship` — full deploy pipeline (quality gates → commit → deploy → verify)
|
|
53
52
|
- `/qualia-review` — production audit
|
|
54
53
|
- `/qualia-pause` — save context before clearing the conversation
|
package/skills/qualia-report/SKILL.md
CHANGED
|
@@ -168,6 +168,17 @@ REPORT_FILE=".planning/reports/report-{date}.md"
|
|
|
168
168
|
SUBMITTED_BY=$(git config user.name || echo "unknown")
|
|
169
169
|
SUBMITTED_AT=$(date -u +%Y-%m-%dT%H:%M:%SZ)
|
|
170
170
|
|
|
171
|
+
# Idempotency key — deterministic per (client_report_id, submitted_at). A retry
|
|
172
|
+
# of the same shift report carries the same key, so the ERP can dedupe at the
|
|
173
|
+
# header level in addition to the UPSERT on (project_id, client_report_id).
|
|
174
|
+
IDEMPOTENCY_KEY=$(node -e "
|
|
175
|
+
const c=require('crypto');
|
|
176
|
+
const seed='$CLIENT_REPORT_ID|$SUBMITTED_AT|'+require('path').basename(process.cwd());
|
|
177
|
+
// RFC 4122 v5-style: deterministic UUID from sha1 of the seed
|
|
178
|
+
const h=c.createHash('sha1').update(seed).digest('hex');
|
|
179
|
+
console.log([h.slice(0,8),h.slice(8,12),'5'+h.slice(13,16),'8'+h.slice(17,20),h.slice(20,32)].join('-'));
|
|
180
|
+
")
|
|
181
|
+
|
|
171
182
|
# Guard: API key required for upload (otherwise curl posts an empty bearer)
|
|
172
183
|
if [ "$ERP_ENABLED" = "true" ] && [ -z "$API_KEY" ] && [ "$DRY_RUN" != "true" ]; then
|
|
173
184
|
node ~/.claude/bin/qualia-ui.js warn "ERP API key missing (~/.claude/.erp-api-key). Run: qualia-framework set-erp-key <key>"
|
|
@@ -189,6 +200,17 @@ PAYLOAD=$(
|
|
|
189
200
|
const commits=[];try{const r=spawnSync('git',['log','--oneline','--since=8 hours ago','--format=%h'],{encoding:'utf8',timeout:3000});if(r.stdout)commits.push(...r.stdout.trim().split('\n').filter(Boolean));}catch{}
|
|
190
201
|
const gitRemote=t.git_remote||git(['config','--get','remote.origin.url']);
|
|
191
202
|
const projectKey=t.project_id||repoSlug(gitRemote)||require('path').basename(process.cwd());
|
|
203
|
+
// Session duration: minutes from session_started_at to submitted_at. The ERP's
|
|
204
|
+
// example payload (docs/erp-contract.md:93) includes this; without it the ERP
|
|
205
|
+
// can't compute shift-length analytics without parsing notes.
|
|
206
|
+
let sessionDurationMinutes=0;
|
|
207
|
+
if(t.session_started_at){
|
|
208
|
+
const startMs=Date.parse(t.session_started_at);
|
|
209
|
+
const endMs=Date.parse(process.env.SUBMITTED_AT)||Date.now();
|
|
210
|
+
if(!Number.isNaN(startMs)&&endMs>startMs){
|
|
211
|
+
sessionDurationMinutes=Math.round((endMs-startMs)/60000);
|
|
212
|
+
}
|
|
213
|
+
}
|
|
192
214
|
console.log(JSON.stringify({
|
|
193
215
|
project:t.project||require('path').basename(process.cwd()),
|
|
194
216
|
project_id:projectKey,team_id:t.team_id||'qualia-solutions',git_remote:gitRemote,
|
|
@@ -200,6 +222,7 @@ PAYLOAD=$(
|
|
|
200
222
|
gap_cycles:(t.gap_cycles||{})[String(t.phase)]||0,build_count:t.build_count||0,
|
|
201
223
|
deploy_count:t.deploy_count||0,deployed_url:t.deployed_url||'',
|
|
202
224
|
session_started_at:t.session_started_at||'',last_pushed_at:t.last_pushed_at||'',
|
|
225
|
+
session_duration_minutes:sessionDurationMinutes,
|
|
203
226
|
lifetime:t.lifetime||{},commits:commits,notes:notes,
|
|
204
227
|
submitted_by:process.env.SUBMITTED_BY||'unknown',submitted_at:process.env.SUBMITTED_AT
|
|
205
228
|
}));
|
|
@@ -214,11 +237,15 @@ if [ "$DRY_RUN" = "true" ]; then
|
|
|
214
237
|
exit 0
|
|
215
238
|
fi
|
|
216
239
|
|
|
217
|
-
# Upload — 3 attempts with 1s/3s/9s backoff
|
|
240
|
+
# Upload — 3 attempts with 1s/3s/9s backoff.
|
|
241
|
+
# Idempotency-Key header carries a deterministic UUID per (client_report_id, submitted_at)
|
|
242
|
+
# so the ERP can dedupe at the request level in addition to the UPSERT key on the body.
|
|
243
|
+
# Documented in docs/erp-contract.md:42-49 with a 24h replay window.
|
|
218
244
|
if [ "$ERP_ENABLED" = "true" ]; then
|
|
219
245
|
for ATTEMPT in 1 2 3; do
|
|
220
246
|
RESPONSE=$(curl -sS -X POST "$ERP_URL/api/v1/reports" \
|
|
221
247
|
-H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
|
|
248
|
+
-H "Idempotency-Key: $IDEMPOTENCY_KEY" \
|
|
222
249
|
-d "$PAYLOAD" --max-time 10 -w "\n__HTTP__%{http_code}" 2>&1)
|
|
223
250
|
HTTP_CODE=$(echo "$RESPONSE" | grep -o "__HTTP__[0-9]*" | sed 's/__HTTP__//')
|
|
224
251
|
BODY=$(echo "$RESPONSE" | sed 's/__HTTP__[0-9]*//g')
|
|
@@ -235,7 +262,42 @@ if [ "$ERP_ENABLED" = "true" ]; then
|
|
|
235
262
|
fi
|
|
236
263
|
[ $ATTEMPT -lt 3 ] && { SLEEP=$((1 * 3 ** (ATTEMPT - 1))); node ~/.claude/bin/qualia-ui.js warn "Attempt $ATTEMPT failed (HTTP ${HTTP_CODE:-timeout}), retrying in ${SLEEP}s..."; sleep $SLEEP; }
|
|
237
264
|
done
|
|
238
|
-
|
|
265
|
+
|
|
266
|
+
# If all 3 in-process attempts failed, enqueue the report into the persistent
|
|
267
|
+
# retry queue (~/.claude/.erp-retry-queue.json). session-start.js drains it on
|
|
268
|
+
# the next Claude Code launch; `qualia-framework erp-flush` drains it on demand.
|
|
269
|
+
# This replaces the prior "will appear after retry" message which was a lie —
|
|
270
|
+
# no retry mechanism existed before v5.9.
|
|
271
|
+
if [ "$ATTEMPT" = "3" ] && [ "$HTTP_CODE" != "200" ]; then
|
|
272
|
+
LAST_ERR="HTTP ${HTTP_CODE:-timeout}"
|
|
273
|
+
if [ -n "$BODY" ]; then LAST_ERR="$LAST_ERR: $(echo "$BODY" | head -c 200)"; fi
|
|
274
|
+
PAYLOAD="$PAYLOAD" \
|
|
275
|
+
CLIENT_REPORT_ID="$CLIENT_REPORT_ID" \
|
|
276
|
+
IDEMPOTENCY_KEY="$IDEMPOTENCY_KEY" \
|
|
277
|
+
ERP_URL="$ERP_URL" \
|
|
278
|
+
LAST_ERR="$LAST_ERR" \
|
|
279
|
+
node -e "
|
|
280
|
+
try {
|
|
281
|
+
const {enqueue} = require(require('os').homedir() + '/.claude/bin/erp-retry.js');
|
|
282
|
+
enqueue({
|
|
283
|
+
client_report_id: process.env.CLIENT_REPORT_ID,
|
|
284
|
+
idempotency_key: process.env.IDEMPOTENCY_KEY,
|
|
285
|
+
url: process.env.ERP_URL + '/api/v1/reports',
|
|
286
|
+
payload: process.env.PAYLOAD,
|
|
287
|
+
last_error: process.env.LAST_ERR,
|
|
288
|
+
});
|
|
289
|
+
process.stdout.write('enqueued');
|
|
290
|
+
} catch (e) {
|
|
291
|
+
process.stderr.write('enqueue failed: ' + (e.message || e));
|
|
292
|
+
process.exit(1);
|
|
293
|
+
}
|
|
294
|
+
" 2>/dev/null && {
|
|
295
|
+
node ~/.claude/bin/qualia-ui.js warn "ERP upload failed after 3 attempts — $CLIENT_REPORT_ID enqueued for auto-retry on next session"
|
|
296
|
+
node ~/.claude/bin/qualia-ui.js info "Drain manually with: qualia-framework erp-flush"
|
|
297
|
+
} || {
|
|
298
|
+
node ~/.claude/bin/qualia-ui.js warn "ERP upload failed after 3 attempts AND queue enqueue failed. $CLIENT_REPORT_ID is committed locally — re-run /qualia-report later to retry."
|
|
299
|
+
}
|
|
300
|
+
fi
|
|
239
301
|
fi
|
|
240
302
|
|
|
241
303
|
[ "$ERP_ENABLED" != "true" ] && node ~/.claude/bin/qualia-ui.js info "ERP upload skipped (disabled). $CLIENT_REPORT_ID committed locally."
|
package/skills/qualia-verify/SKILL.md
CHANGED
|
@@ -104,6 +104,22 @@ Append '## Adversarial Findings' to verification file. Empty section fine if not
|
|
|
104
104
|
|
|
105
105
|
Findings merge into main report. Union PASS/FAIL: either pass found CRITICAL/HIGH → phase FAIL.
|
|
106
106
|
|
|
107
|
+
### 2d. INSUFFICIENT EVIDENCE downgrade (mandatory)
|
|
108
|
+
|
|
109
|
+
The verifier marks criteria it could not check (budget exhaustion, missing context) as `INSUFFICIENT EVIDENCE`. The orchestrator treats those as silent PASS unless explicitly downgraded — that's the #1 false-pass vector. Grep the verification file before declaring PASS:
|
|
110
|
+
|
|
111
|
+
```bash
|
|
112
|
+
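# grep -c already prints 0 when nothing matches (exit 1); adding "|| echo 0" would emit a second 0
IE_COUNT=$(grep -c "INSUFFICIENT EVIDENCE" .planning/phase-{N}-verification.md 2>/dev/null); IE_COUNT=${IE_COUNT:-0}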
|
|
113
|
+
if [ "$IE_COUNT" -gt 0 ]; then
|
|
114
|
+
node ~/.claude/bin/qualia-ui.js warn "${IE_COUNT} criteria marked INSUFFICIENT EVIDENCE — downgrading verdict to FAIL"
|
|
115
|
+
# Rewrite the verdict line in-place
|
|
116
|
+
sed -i 's/^result: PASS$/result: FAIL/' .planning/phase-{N}-verification.md
|
|
117
|
+
sed -i 's/^## Verdict$/## Verdict\n\n**Downgraded to FAIL:** '"${IE_COUNT}"' criteria left unchecked. Re-run with larger budget (`max(25, tasks*5)` already applied) or simplify the phase plan./' .planning/phase-{N}-verification.md
|
|
118
|
+
fi
|
|
119
|
+
```
|
|
120
|
+
|
|
121
|
+
The same check runs after the adversarial pass if it executed.
|
|
122
|
+
|
|
107
123
|
### 3. Present Results
|
|
108
124
|
|
|
109
125
|
Read verification report. Present:
|
package/templates/help.html
CHANGED
|
@@ -374,9 +374,8 @@
|
|
|
374
374
|
<h3>Quick Paths</h3>
|
|
375
375
|
<p class="cmd-group-note">Lightweight alternatives when the full road is overkill.</p>
|
|
376
376
|
<div class="commands">
|
|
377
|
-
<div class="cmd"><span class="cmd-name">/qualia-
|
|
378
|
-
<div class="cmd"><span class="cmd-name">/qualia-
|
|
379
|
-
<div class="cmd"><span class="cmd-name">/qualia-design</span><span class="cmd-desc">One-shot design transformation — critiques, fixes, polishes, hardens, makes responsive. No reports, no choices, just makes it professional.</span></div>
|
|
377
|
+
<div class="cmd"><span class="cmd-name">/qualia-feature</span><span class="cmd-desc">Auto-scoped single-feature build. Inline for trivia (typo, config), fresh builder spawn for 1-5 file features. Refuses and routes to /qualia-plan for phase-sized work. Flags: --force-spawn, --force-inline.</span></div>
|
|
378
|
+
<div class="cmd"><span class="cmd-name">/qualia-polish</span><span class="cmd-desc">Design pass, scope-adaptive — component, route, full app, redesign, critique, quick. Add --loop for the autonomous screenshot → score → fix loop.</span></div>
|
|
380
379
|
</div>
|
|
381
380
|
</div>
|
|
382
381
|
|
package/tests/bin.test.sh
CHANGED
|
@@ -719,18 +719,36 @@ else
|
|
|
719
719
|
fail_case "research-synthesizer missing model frontmatter"
|
|
720
720
|
fi
|
|
721
721
|
|
|
722
|
-
# 64.
|
|
723
|
-
|
|
722
|
+
# 64. v5.9 model tiering: structured agents (verifier, plan-checker, roadmapper,
|
|
723
|
+
# qa-browser) use Sonnet. Real-reasoning agents (planner, builder, researcher,
|
|
724
|
+
# visual-evaluator) keep inherited Opus.
|
|
725
|
+
SONNET_AGENTS=("verifier.md" "plan-checker.md" "roadmapper.md" "qa-browser.md")
|
|
726
|
+
OPUS_AGENTS=("planner.md" "builder.md" "researcher.md" "visual-evaluator.md")
|
|
727
|
+
|
|
728
|
+
ALL_OK=1
|
|
729
|
+
for a in "${SONNET_AGENTS[@]}"; do
|
|
730
|
+
if ! grep -q '^model: sonnet$' "$TMP/.claude/agents/$a" 2>/dev/null; then
|
|
731
|
+
ALL_OK=0
|
|
732
|
+
echo " missing 'model: sonnet' in $a"
|
|
733
|
+
fi
|
|
734
|
+
done
|
|
735
|
+
if [ "$ALL_OK" = "1" ]; then
|
|
736
|
+
pass "structured agents (verifier/plan-checker/roadmapper/qa-browser) use sonnet (v5.9 tiering)"
|
|
737
|
+
else
|
|
738
|
+
fail_case "v5.9 sonnet-tier agent has wrong model frontmatter"
|
|
739
|
+
fi
|
|
740
|
+
|
|
724
741
|
ALL_OK=1
|
|
725
|
-
for a in "${
|
|
742
|
+
for a in "${OPUS_AGENTS[@]}"; do
|
|
726
743
|
if grep -q '^model: ' "$TMP/.claude/agents/$a" 2>/dev/null; then
|
|
727
744
|
ALL_OK=0
|
|
745
|
+
echo " unexpected 'model:' in $a (should inherit Opus)"
|
|
728
746
|
fi
|
|
729
747
|
done
|
|
730
748
|
if [ "$ALL_OK" = "1" ]; then
|
|
731
|
-
pass "
|
|
749
|
+
pass "reasoning agents (planner/builder/researcher/visual-evaluator) inherit Opus"
|
|
732
750
|
else
|
|
733
|
-
fail_case "
|
|
751
|
+
fail_case "Opus-tier agent has unexpected model frontmatter"
|
|
734
752
|
fi
|
|
735
753
|
|
|
736
754
|
echo ""
|
package/tests/refs.test.sh
ADDED
|
@@ -0,0 +1,146 @@
|
|
|
1
|
+
#!/bin/bash
|
|
2
|
+
# Qualia Framework — surface-drift guard
|
|
3
|
+
# Greps every active surface for backtick-quoted /qualia-{name} command references.
|
|
4
|
+
# Asserts each name has a matching skills/qualia-{name}/SKILL.md.
|
|
5
|
+
#
|
|
6
|
+
# Why: v5.7 + v5.8 removed /qualia-quick, /qualia-task, /qualia-prd, /qualia-design,
|
|
7
|
+
# /qualia-polish-loop. Three user-facing files (rules/speed.md, templates/help.html,
|
|
8
|
+
# docs/onboarding.html) still pointed at the removed commands. New hires hit dead
|
|
9
|
+
# ends. This test fails when that happens again.
|
|
10
|
+
#
|
|
11
|
+
# Scoping rules:
|
|
12
|
+
# - Only matches references quoted in markdown backticks (`/qualia-foo`) or shown
|
|
13
|
+
# as a slash-command at line start. Bare path refs (/qualia-templates/,
|
|
14
|
+
# /qualia-ui.js) are excluded — they're directories or scripts, not commands.
|
|
15
|
+
# - Historical surfaces (docs/reviews/, docs/research/, CHANGELOG) are excluded —
|
|
16
|
+
# they intentionally describe past states.
|
|
17
|
+
# - Migration mentions on active surfaces (README/guide describing v5.7 removals)
|
|
18
|
+
# are excluded via a context prefix check.
|
|
19
|
+
#
|
|
20
|
+
# Run: bash tests/refs.test.sh
|
|
21
|
+
|
|
22
|
+
FRAMEWORK_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
|
|
23
|
+
SKILLS_DIR="$FRAMEWORK_ROOT/skills"
|
|
24
|
+
|
|
25
|
+
PASS=0
|
|
26
|
+
FAIL=0
|
|
27
|
+
|
|
28
|
+
pass() {
|
|
29
|
+
echo " ✓ $1"
|
|
30
|
+
PASS=$((PASS + 1))
|
|
31
|
+
}
|
|
32
|
+
|
|
33
|
+
fail_case() {
|
|
34
|
+
echo " ✗ $1"
|
|
35
|
+
echo " $2"
|
|
36
|
+
FAIL=$((FAIL + 1))
|
|
37
|
+
}
|
|
38
|
+
|
|
39
|
+
# Files we never scan — historical records, backups, vendored deps.
|
|
40
|
+
EXCLUDE_REGEX='/docs/reviews/|/docs/research/|/docs/playwright-loop-pilot-results\.md$|/CHANGELOG\.md$|\.bak\.|\.git/|/node_modules/|/\.continue-here\.md$'
|
|
41
|
+
|
|
42
|
+
# When a `/qualia-foo` ref appears AFTER one of these context tokens on the same line,
|
|
43
|
+
# it's a migration-explainer ("Replaces /qualia-quick" / "deprecated in v5.7"), not
|
|
44
|
+
# a live command reference. Treat it as exempt.
|
|
45
|
+
MIGRATION_CONTEXT_REGEX='Replaces|Removed|removed in|consolidated|deprecated|renamed|former|previously|was the|now the|now\s+`?/qualia|absorbed|superseded|legacy|migrated|after\s+`?/qualia.*-(quick|task|prd|design|polish-loop)'
|
|
46
|
+
|
|
47
|
+
ACTIVE_DIRS=(
|
|
48
|
+
"$FRAMEWORK_ROOT/rules"
|
|
49
|
+
"$FRAMEWORK_ROOT/skills"
|
|
50
|
+
"$FRAMEWORK_ROOT/agents"
|
|
51
|
+
"$FRAMEWORK_ROOT/hooks"
|
|
52
|
+
"$FRAMEWORK_ROOT/templates"
|
|
53
|
+
)
|
|
54
|
+
|
|
55
|
+
ACTIVE_FILES=(
|
|
56
|
+
"$FRAMEWORK_ROOT/README.md"
|
|
57
|
+
"$FRAMEWORK_ROOT/guide.md"
|
|
58
|
+
"$FRAMEWORK_ROOT/CLAUDE.md"
|
|
59
|
+
"$FRAMEWORK_ROOT/AGENTS.md"
|
|
60
|
+
"$FRAMEWORK_ROOT/docs/onboarding.html"
|
|
61
|
+
)
|
|
62
|
+
|
|
63
|
+
echo "refs.test.sh — surface-drift guard (active /qualia-* references must point at shipped skills)"
|
|
64
|
+
echo ""
|
|
65
|
+
|
|
66
|
+
SCAN_FILES=$(
|
|
67
|
+
{
|
|
68
|
+
for d in "${ACTIVE_DIRS[@]}"; do
|
|
69
|
+
[ -d "$d" ] && find "$d" -type f \( -name "*.md" -o -name "*.html" \)
|
|
70
|
+
done
|
|
71
|
+
for f in "${ACTIVE_FILES[@]}"; do
|
|
72
|
+
[ -f "$f" ] && echo "$f"
|
|
73
|
+
done
|
|
74
|
+
} | grep -Ev "$EXCLUDE_REGEX" | sort -u
|
|
75
|
+
)
|
|
76
|
+
|
|
77
|
+
# Extract command refs in two forms only:
|
|
78
|
+
# 1. `/qualia-foo` — backticked, the canonical command-doc style
|
|
79
|
+
# 2. <dt>/qualia-foo</dt> — HTML help/onboarding pages
|
|
80
|
+
# We capture name + file:line so we can show context per failure.
|
|
81
|
+
declare -A SEEN_REFS
|
|
82
|
+
declare -A REF_LOCATIONS
|
|
83
|
+
|
|
84
|
+
while IFS= read -r file; do
|
|
85
|
+
# Pattern A: backtick-quoted commands. Allow trailing flag/word but only capture base name.
|
|
86
|
+
while IFS=: read -r path lineno line; do
|
|
87
|
+
[ -z "$line" ] && continue
|
|
88
|
+
# Skip migration-context lines.
|
|
89
|
+
if echo "$line" | grep -qE "$MIGRATION_CONTEXT_REGEX"; then
|
|
90
|
+
continue
|
|
91
|
+
fi
|
|
92
|
+
# Extract every backticked /qualia-foo on this line.
|
|
93
|
+
matches=$(echo "$line" | grep -oE '`/qualia(-[a-z]+){0,3}`' | sed 's/^`//; s/`$//')
|
|
94
|
+
for ref in $matches; do
|
|
95
|
+
SEEN_REFS["$ref"]=1
|
|
96
|
+
REF_LOCATIONS["$ref"]="${REF_LOCATIONS[$ref]:+${REF_LOCATIONS[$ref]}, }$(basename "$path"):$lineno"
|
|
97
|
+
done
|
|
98
|
+
done < <(grep -HnE '`/qualia(-[a-z]+){0,3}`' "$file" 2>/dev/null) # -H forces the "path:" prefix that the read above expects, even for a single file
|
|
99
|
+
|
|
100
|
+
# Pattern B: HTML <dt>/qualia-foo</dt>.
|
|
101
|
+
while IFS=: read -r path lineno line; do
|
|
102
|
+
[ -z "$line" ] && continue
|
|
103
|
+
if echo "$line" | grep -qE "$MIGRATION_CONTEXT_REGEX"; then
|
|
104
|
+
continue
|
|
105
|
+
fi
|
|
106
|
+
matches=$(echo "$line" | grep -oE '<dt>/qualia(-[a-z]+){0,3}( [^<]*)?</dt>' | sed -E 's|<dt>(/qualia(-[a-z]+){0,3}).*|\1|')
|
|
107
|
+
for ref in $matches; do
|
|
108
|
+
SEEN_REFS["$ref"]=1
|
|
109
|
+
REF_LOCATIONS["$ref"]="${REF_LOCATIONS[$ref]:+${REF_LOCATIONS[$ref]}, }$(basename "$path"):$lineno"
|
|
110
|
+
done
|
|
111
|
+
done < <(grep -HnE '<dt>/qualia' "$file" 2>/dev/null) # -H forces the "path:" prefix that the read above expects, even for a single file
|
|
112
|
+
done <<<"$SCAN_FILES"
|
|
113
|
+
|
|
114
|
+
if [ ${#SEEN_REFS[@]} -eq 0 ]; then
|
|
115
|
+
fail_case "scan" "no backticked /qualia-* references found across active surfaces (scanner broken?)"
|
|
116
|
+
echo ""
|
|
117
|
+
echo "Results: $PASS passed, $FAIL failed"
|
|
118
|
+
exit 1
|
|
119
|
+
fi
|
|
120
|
+
|
|
121
|
+
# Sort refs for deterministic output.
|
|
122
|
+
for ref in $(printf '%s\n' "${!SEEN_REFS[@]}" | sort); do
|
|
123
|
+
name="${ref#/}"
|
|
124
|
+
skill_dir="$SKILLS_DIR/$name"
|
|
125
|
+
locations="${REF_LOCATIONS[$ref]}"
|
|
126
|
+
if [ -d "$skill_dir" ] && [ -f "$skill_dir/SKILL.md" ]; then
|
|
127
|
+
pass "$ref → skills/$name/SKILL.md"
|
|
128
|
+
continue
|
|
129
|
+
fi
|
|
130
|
+
fail_case "$ref" "no skills/$name/SKILL.md — referenced by: $locations"
|
|
131
|
+
done
|
|
132
|
+
|
|
133
|
+
echo ""
|
|
134
|
+
echo "Results: $PASS passed, $FAIL failed"
|
|
135
|
+
|
|
136
|
+
if [ "$FAIL" -gt 0 ]; then
|
|
137
|
+
echo ""
|
|
138
|
+
echo "Surface drift detected. To fix, do one of:"
|
|
139
|
+
echo " 1. Restore the missing skill at skills/<name>/SKILL.md"
|
|
140
|
+
echo " 2. Update the offending file to point at the replacement command"
|
|
141
|
+
echo " 3. Reframe the mention as a migration note (this test skips lines containing"
|
|
142
|
+
echo " 'Replaces', 'consolidated', 'deprecated', 'now the', 'former', 'superseded', etc.)"
|
|
143
|
+
exit 1
|
|
144
|
+
fi
|
|
145
|
+
|
|
146
|
+
exit 0
|