npm - qualia-framework - Versions diffs - 4.3.0 → 4.5.0 - Mend

qualia-framework 4.3.0 → 4.5.0

Files changed (42)

package/CLAUDE.md +13 -1
package/README.md +16 -13
package/agents/builder.md +12 -20
package/agents/plan-checker.md +18 -0
package/agents/planner.md +9 -0
package/agents/verifier.md +62 -0
package/bin/agent-runs.js +233 -0
package/bin/cli.js +225 -21
package/bin/install.js +25 -5
package/bin/plan-contract.js +220 -0
package/bin/slop-detect.mjs +357 -0
package/bin/state.js +199 -10
package/docs/agent-runs.md +273 -0
package/docs/erp-contract.md +5 -0
package/docs/plan-contract.md +321 -0
package/hooks/auto-update.js +3 -7
package/hooks/pre-compact.js +22 -11
package/hooks/pre-deploy-gate.js +16 -2
package/hooks/pre-push.js +22 -2
package/hooks/stop-session-log.js +1 -1
package/package.json +8 -2
package/rules/design-brand.md +110 -0
package/rules/design-laws.md +144 -0
package/rules/design-product.md +110 -0
package/rules/design-rubric.md +153 -0
package/skills/qualia-build/SKILL.md +5 -5
package/skills/qualia-flush/SKILL.md +1 -1
package/skills/qualia-new/SKILL.md +40 -3
package/skills/qualia-polish/SKILL.md +180 -136
package/skills/qualia-quick/SKILL.md +1 -1
package/skills/qualia-report/SKILL.md +25 -5
package/skills/qualia-ship/SKILL.md +12 -10
package/skills/zoho-workflow/SKILL.md +64 -0
package/templates/DESIGN.md +229 -435
package/templates/PRODUCT.md +95 -0
package/templates/help.html +13 -7
package/tests/bin.test.sh +6 -3
package/tests/hooks.test.sh +9 -20
package/tests/lib.test.sh +217 -0
package/tests/runner.js +96 -75
package/tests/state.test.sh +4 -3
package/skills/qualia-design/SKILL.md +0 -169

package/docs/agent-runs.md ADDED

@@ -0,0 +1,273 @@
+# Agent Runs Telemetry
+Append-only JSONL ledger of every subagent spawn, recorded per project. Substrate for `qualia-framework agents`, postmortem analysis, and ERP enrichment.
+Status: **draft, v1.** Pressure-test the shape against real spawns before locking.
+## Why this exists
+Today, `traces.jsonl` records hook-level events only. There is zero per-agent telemetry: no record of which builder ran for how long on which task, which verifier failed and why, which researcher hit a rate limit. The data needed to answer "which task failed twice and required a postmortem" doesn't exist.
+This file specifies a per-spawn record that lives next to the project (not in `~/.claude/`) so it travels with the repo, is committed alongside other planning artifacts, and stays attributable to a specific phase.
+## File layout
+```
+.planning/
+  agent-runs.jsonl          # all-time, append-only
+  agent-runs/
+    2026-04-28.jsonl        # daily rotation (optional, see below)
+```
+**Rotation:** start with single-file. If `agent-runs.jsonl` exceeds 5MB, rotate to dated subfile. Cheap, no dependency.
+**Privacy:** records contain file paths, task ids, durations, token counts, error strings — never command output, never file contents, never user prompts. The schema below is the upper bound of what we capture. `QUALIA_TELEMETRY=off` env var disables writes.
+## Schema (v1)
+OpenTelemetry GenAI semantic conventions where they fit; framework-specific fields where they don't.
+```ts
+interface AgentRunRecord {
+  // Identity
+  schema_version: 1;
+  run_id: string;                    // ULID — sortable, monotonic
+  parent_run_id?: string;            // ONLY for true nesting (an agent spawned this one); null otherwise
+  skill_invocation_id: string;       // groups runs from one skill call (sequential or parallel siblings)
+  session_id?: string;                // Claude Code session id when reachable; per-process UUID fallback
+  // What ran
+  agent_type: AgentType;
+  agent_name?: string;               // for custom agents (e.g. "frontend-agent")
+  model: string;                     // "claude-opus-4-7", "claude-sonnet-4-6", etc.
+  effort?: "low" | "medium" | "high" | "max";
+  // Where in the road
+  project: string;                   // tracking.json.project
+  phase?: number;                    // current phase if applicable
+  milestone?: number;
+  task_id?: string;                  // contract task id ("T1", "T2") for builders
+  wave?: number;
+  retry_of?: string;                 // run_id of the prior failed attempt this one is retrying
+  // Lifecycle
+  status: AgentStatus;
+  started_at: string;                // ISO 8601 UTC
+  finished_at: string;               // ISO 8601 UTC
+  duration_ms: number;
+  // Cost (OTel-aligned, optional — only if obtainable from spawn shape)
+  input_tokens?: number;             // gen_ai.usage.input_tokens
+  output_tokens?: number;            // gen_ai.usage.output_tokens
+  cache_read_tokens?: number;        // gen_ai.usage.cache_read.input_tokens
+  cache_creation_tokens?: number;    // gen_ai.usage.cache_creation.input_tokens
+  // Activity
+  tool_calls_count?: number;
+  files_changed?: string[];          // repo-relative, deduped
+  commit_sha?: string;               // if the run produced a commit
+  // Outcome detail
+  // status = did the agent process complete cleanly (success/failure/timeout/...)
+  // verification_result = did the code under test pass (only on agent_type="verifier")
+  // a verifier with status="success" + verification_result="fail" = the verifier ran fine and the code failed.
+  // a verifier with status="failure" = the verifier itself errored (timeout, infra, etc.)
+  verifier_score?: number;           // 1-5 if agent_type=verifier
+  verification_result?: "pass" | "fail" | "partial";
+  failure_reason?: string;           // short, machine-classifiable; see "Failure taxonomy" below
+  failure_detail?: string;           // last 500 chars of stderr/error — keep the tail (newest content), drop the head
+  // Self-link — only set on failure
+  log_file?: string;                 // .planning/agent-runs/<run_id>.log if status != success
+}
+type AgentType =
+  | "planner"
+  | "plan-checker"
+  | "builder"
+  | "verifier"
+  | "qa-browser"
+  | "researcher"
+  | "research-synthesizer"
+  | "roadmapper"
+  | "team-orchestrator"
+  | "custom";                        // user-defined agents
+type AgentStatus =
+  | "success"                        // completed, no failure_reason
+  | "partial"                        // completed but flagged issues (e.g. builder PARTIAL)
+  | "blocked"                        // builder hit a precondition gate (e.g. file lock)
+  | "failure"                        // explicit failure (verifier fail, builder error)
+  | "timeout"                        // exceeded budget
+  | "interrupted";                   // user cancelled / parent killed
+```
+### Failure taxonomy
+`failure_reason` is a closed enum so analytics can classify without parsing free text. Add new values via PR — don't free-text.
+| Code | Meaning |
+|---|---|
+| `tsc-failed` | TypeScript compilation errors |
+| `lint-failed` | ESLint violations |
+| `tests-failed` | Test runner non-zero exit |
+| `build-failed` | Production build broke |
+| `verification-criteria-unmet` | Verifier ran cleanly but criteria failed |
+| `verification-evidence-missing` | Behavioral check lacked required citations |
+| `verification-execution-error` | Check itself errored (binary missing, timeout, cwd missing) — distinct from criteria failure |
+| `file-not-found` | Referenced file absent |
+| `dependency-missing` | Referenced npm/pip/etc package absent |
+| `lock-timeout` | `.planning/.state.lock` not acquired |
+| `network-error` | Outbound HTTP failed (research, ERP) |
+| `rate-limited` | LLM API 429 |
+| `context-overflow` | Prompt exceeded model context |
+| `tool-misuse` | Builder called a forbidden tool |
+| `precondition-unmet` | Required state/file missing before run |
+| `unknown` | Catch-all; should be rare and trigger triage |
+## Example records
+**Successful builder run** (sequential under `/qualia-build` skill invocation `sk_42`):
+```json
+{"schema_version":1,"run_id":"01HXY8N3W2K7Q5MZP9V4F8R6T1","skill_invocation_id":"sk_42","session_id":"sess_abc123","agent_type":"builder","model":"claude-sonnet-4-6","effort":"medium","project":"acme-portal","phase":2,"milestone":1,"task_id":"T1","wave":1,"status":"success","started_at":"2026-04-28T14:32:11Z","finished_at":"2026-04-28T14:34:02Z","duration_ms":111000,"input_tokens":12450,"output_tokens":1820,"cache_read_tokens":11200,"tool_calls_count":7,"files_changed":["src/lib/auth.ts","src/lib/auth-schema.ts"],"commit_sha":"a3f5e1c"}
+```
+**Failed verifier run** (same skill invocation, no parent — it's a sibling of the builder, not nested):
+```json
+{"schema_version":1,"run_id":"01HXY8P5R8K7Q5MZP9V4F8R6T2","skill_invocation_id":"sk_42","session_id":"sess_abc123","agent_type":"verifier","model":"claude-opus-4-7","project":"acme-portal","phase":2,"milestone":1,"status":"failure","started_at":"2026-04-28T14:38:10Z","finished_at":"2026-04-28T14:39:55Z","duration_ms":105000,"input_tokens":18200,"output_tokens":2100,"tool_calls_count":12,"verifier_score":2,"verification_result":"fail","failure_reason":"verification-criteria-unmet","failure_detail":"Task T2 acceptance criterion 'Redirect to /dashboard on 200' could not be verified — page.tsx contains no redirect() call","log_file":".planning/agent-runs/01HXY8P5R8K7Q5MZP9V4F8R6T2.log"}
+```
+**Researcher spawned by team-orchestrator** (true nesting — parent_run_id is set):
+```json
+{"schema_version":1,"run_id":"01HXY8QF1ZK7Q5MZP9V4F8R6T3","parent_run_id":"01HXY8QE9V2K7Q5MZP9V4F8R6T0","skill_invocation_id":"sk_43","session_id":"sess_abc123","agent_type":"researcher","model":"claude-sonnet-4-6","project":"acme-portal","status":"failure","started_at":"2026-04-28T14:42:00Z","finished_at":"2026-04-28T14:42:12Z","duration_ms":12000,"tool_calls_count":1,"failure_reason":"rate-limited","failure_detail":"WebFetch returned 429 from context7.com after 1 attempt","log_file":".planning/agent-runs/01HXY8QF1ZK7Q5MZP9V4F8R6T3.log"}
+```
+### `parent_run_id` vs `skill_invocation_id` — when to use which
+- **`skill_invocation_id`:** every record carries one. Groups all agents that ran under a single user-triggered skill (`/qualia-build phase 2` → planner, all builders, verifier all share an id).
+- **`parent_run_id`:** rare. Set only when one agent literally spawned another via the `Agent` tool — for example, `team-orchestrator` fanning out to `frontend-agent` + `backend-agent`. Sequential planner→builder→verifier under one skill is *not* nesting; those are siblings.
+## How records get written
+A small helper at `bin/lib/agent-runs.js`:
+```js
+// pseudocode
+const ar = require('./lib/agent-runs');
+const run = ar.start({
+  agent_type: 'builder',
+  task_id: 'T1',
+  phase: 2,
+  model: process.env.QUALIA_MODEL || 'claude-sonnet-4-6',
+});
+// ... spawn agent, capture result ...
+ar.finish(run, {
+  status: 'success',
+  files_changed: ['src/lib/auth.ts'],
+  commit_sha: getHeadSha(),
+  input_tokens: 12450,
+  output_tokens: 1820,
+});
+```
+`start()` returns a token; `finish()` writes the full record via a single `fs.appendFileSync` call. Crash between start and finish leaves no partial record on disk — the in-memory record is lost, which is fine.
+**Concurrency:** `qualia-build` spawns multiple builders in the same wave, each calling `finish()` concurrently. Atomicity guarantee: `fs.appendFileSync` opens with `O_APPEND` and issues a single `write()` syscall per call. On Linux ext4/btrfs/xfs and macOS APFS, `write()` to a regular file with `O_APPEND` is atomic for sizes up to the filesystem block size (typically 4096 bytes). Records run ~600–800 bytes — well under. On Windows NTFS, `O_APPEND` semantics are implemented by Node's libuv via an internal seek+write under a file lock; effectively atomic for our sizes. This is *not* the POSIX pipe `PIPE_BUF` guarantee — that applies to pipes, not regular files. The protection here is the kernel's regular-file `O_APPEND` + single-`write()` behavior. If records ever exceed ~3.5KB, switch to a per-write lock (`proper-lockfile` or a `.planning/.agent-runs.lock` flock).
+On `status != "success"`, `finish()` also writes `.planning/agent-runs/<run_id>.log` with the full stderr/error output. JSONL stays lean for analytics; debugging context lives in the side files. Successful runs leave no log file.
+**Where called from:**
+- The skills that orchestrate spawns (`/qualia-build`, `/qualia-verify`, `/qualia-plan`, etc.) wrap each Agent invocation in `ar.start` / `ar.finish`.
+- Skills can't easily measure tokens — those fields are populated when the harness exposes them via `Task` tool result metadata and left undefined otherwise. Don't gate the design on data we may or may not have.
+## How records get read
+`qualia-framework agents` — summary table:
+```
+$ qualia-framework agents
+Agent runs (last 50, project: acme-portal)
+  TIME    AGENT     PHASE  TASK  STATUS    DURATION  TOKENS    NOTE
+  14:34   builder   2      T1    success      111s   14k in    src/lib/auth.ts
+  14:38   verifier  2      —     failure      105s   20k in    verification-criteria-unmet
+  14:42   builder   2      T1    success       89s   13k in    fix: redirect after signin
+  14:45   verifier  2      —     success       97s   19k in    pass
+```
+`qualia-framework agents --failed` — only failure/partial/timeout/blocked:
+```
+$ qualia-framework agents --failed
+2 failures in last 7 days
+  2026-04-28 14:38  verifier   phase 2  verification-criteria-unmet
+  2026-04-26 09:22  builder    phase 1  tsc-failed
+```
+`qualia-framework agents --task T1` — all runs for one task (gap cycles visible):
+```
+$ qualia-framework agents --task T1
+T1 — Add email/password sign-in handler  (3 runs)
+  2026-04-28 14:32  builder   success      111s
+  2026-04-28 14:38  verifier  failure      105s   verification-criteria-unmet
+  2026-04-28 14:42  builder   success       89s   ← retry after gap
+```
+`qualia-framework analytics` extends with: agent failure rate, slowest agents (p50/p95), verifier fail rate by phase, repeated gap-cycles by task.
+## ERP integration (additive, non-breaking)
+Report payload v2 (in `docs/erp-contract.md`) gains:
+```json
+"agent_runs": {
+  "count": 14,
+  "failures": 2,
+  "verifier_fail_rate": 0.14,
+  "slowest_agent_ms": 312000,
+  "by_type": { "builder": 9, "verifier": 4, "planner": 1 }
+}
+```
+Aggregated counts only — never raw records. ERP backend treats the field as optional; old reports without it still parse. The full JSONL stays local.
+## Privacy and opt-out
+| Captured | Not captured |
+|---|---|
+| Agent type, model | User prompts |
+| Phase, task id | LLM responses |
+| File paths (repo-relative) | File contents |
+| Token counts | Command output |
+| Duration, status | Stderr/stdout beyond `failure_detail` last 500 chars |
+| Failure code | Network response bodies |
+| Commit SHA | Git diffs |
+Disable new writes: `export QUALIA_TELEMETRY=off`. The helper short-circuits *writes* — reads (`qualia-framework agents`) still surface previously recorded data. Opting out doesn't erase history.
+## Design decisions (locked v1)
+These were called out as open questions during draft; resolved here so implementation can proceed.
+1. **Token counts:** all token fields are optional. Populate when the harness exposes them via `Task` tool metadata; leave undefined otherwise. The schema doesn't depend on always having them.
+2. **`session_id`:** optional. If Claude Code exposes a stable id to skills/hooks, use it. Otherwise the `bin/lib/agent-runs.js` writer generates a per-process UUID on first call and reuses it for the lifetime of that process.
+3. **Tool-call telemetry:** aggregate `tool_calls_count` only. No per-call spans. If a future analytics need demands per-call detail, add a separate `agent-tool-calls.jsonl` — don't bloat the main ledger.
+4. **`parent_run_id` vs `skill_invocation_id`:** added `skill_invocation_id` as the common grouping key (every record has one). `parent_run_id` is reserved strictly for true agent-spawned-agent nesting (team-orchestrator → fan-out children). Documented inline above.
+5. **`failure_detail`:** capped at 500 chars; keep the tail (most recent stderr), drop the head. The newest content is usually the most useful for classification.
+6. **Side log files:** on `status != success`, `finish()` writes `.planning/agent-runs/<run_id>.log` with full stderr. Lean JSONL stays grep-friendly; debugging context survives.
+7. **Cross-project rollup:** rejected. ERP does fleet-wide aggregation. A `~/.claude/agent-runs.jsonl` mirror would add a sync surface for marginal benefit.
+8. **Append atomicity:** relies on `O_APPEND` + single-`write()` syscall behavior for regular files (Linux/macOS/Windows). Atomic up to filesystem block size; our records are well under. Detailed in the "Concurrency" note above.
+9. **Cleanup:** `qualia-framework agents prune --before YYYY-MM-DD` removes records and matching log files older than the cutoff. Never auto-prunes — operator-driven only.
+10. **`QUALIA_TELEMETRY=off` semantics:** disables *writes* only. Reads (`qualia-framework agents`) still surface existing records — opting out of new collection does not retroactively hide history. Set before a session to silence that session's spawns.
+## Migration plan
+1. Add `bin/lib/agent-runs.js` (writer) + `bin/cli.js agents` (reader). Helper is a no-op if `.planning/` doesn't exist.
+2. Wire `ar.start`/`ar.finish` calls into the orchestrating skills (`/qualia-build`, `/qualia-verify`, `/qualia-plan`, `/qualia-research`, `/qualia-postmortem`).
+3. Add `agents` table to `qualia-framework analytics`.
+4. After two milestones produce real data, extend ERP payload (v2) with aggregated metrics. Coordinate with ERP backend.
+5. Defer postmortem feedback loop and ERP feedback analyzer until ≥4 weeks of real data exist.
+No hard cutover. Pre-existing projects acquire the JSONL on first spawn after upgrade — older runs are simply absent.
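
Reviewer sketch (not part of the package): the `start()` / `finish()` flow described under "How records get written" could look roughly like the following. This is a minimal illustration under stated assumptions, not the shipped `bin/agent-runs.js`: the `opts.dir` and `opts.now` parameters are hypothetical test hooks, `crypto.randomUUID()` stands in for the ULID the schema calls for, and the side log file written on failure is omitted.

```javascript
// Hypothetical sketch of the writer described in docs/agent-runs.md.
const fs = require("fs");
const path = require("path");
const crypto = require("crypto");

const DETAIL_CAP = 500; // failure_detail keeps the tail (newest content)

// start() only builds an in-memory token; nothing touches disk yet,
// so a crash before finish() leaves no partial record.
function start(fields, now = Date.now()) {
  // Assumption: randomUUID() is a stand-in; the spec asks for a ULID.
  return { ...fields, schema_version: 1, run_id: crypto.randomUUID(), _t0: now };
}

// finish() assembles the full record and appends it as one JSONL line.
function finish(run, result, opts = {}) {
  const dir = opts.dir ?? ".planning";
  const now = opts.now ?? Date.now();
  const { _t0, ...identity } = run;
  const record = {
    ...identity,
    ...result,
    started_at: new Date(_t0).toISOString(),
    finished_at: new Date(now).toISOString(),
    duration_ms: now - _t0,
  };
  if (typeof record.failure_detail === "string") {
    // Keep the last 500 characters, drop the head.
    record.failure_detail = record.failure_detail.slice(-DETAIL_CAP);
  }
  // Opt-out and a missing .planning/ directory both skip the write.
  if (process.env.QUALIA_TELEMETRY === "off" || !fs.existsSync(dir)) return record;
  // A single appendFileSync call per record, as the concurrency note requires.
  fs.appendFileSync(path.join(dir, "agent-runs.jsonl"), JSON.stringify(record) + "\n");
  return record;
}
```

Returning the record even when the write is skipped is a choice made here for testability; the document itself only specifies that writes are short-circuited.
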

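Reviewer sketch (not part of the package): the aggregated `agent_runs` payload from the "ERP integration" section could be derived from the ledger along these lines. The function name `summarizeRuns` is hypothetical, and two definitions are assumptions: `failures` counts the statuses listed for `agents --failed` (failure, partial, timeout, blocked), and `verifier_fail_rate` is taken as failed verifier runs divided by all verifier runs. The document does not pin down either formula.

```javascript
// Hypothetical aggregator: JSONL ledger text -> "agent_runs" summary object.
const FAILED_STATUSES = new Set(["failure", "partial", "timeout", "blocked"]);

function summarizeRuns(jsonlText) {
  // One JSON record per non-empty line.
  const runs = jsonlText.split("\n").filter(Boolean).map((line) => JSON.parse(line));

  const verifiers = runs.filter((r) => r.agent_type === "verifier");
  const verifierFails = verifiers.filter((r) => r.verification_result === "fail").length;

  const byType = {};
  for (const r of runs) byType[r.agent_type] = (byType[r.agent_type] || 0) + 1;

  return {
    count: runs.length,
    failures: runs.filter((r) => FAILED_STATUSES.has(r.status)).length,
    // Assumption: rate is relative to verifier runs, not to all runs.
    verifier_fail_rate: verifiers.length ? verifierFails / verifiers.length : 0,
    slowest_agent_ms: runs.reduce((max, r) => Math.max(max, r.duration_ms || 0), 0),
    by_type: byType,
  };
}
```

Only counts and maxima leave the function, which matches the "aggregated counts only, never raw records" constraint in the document.
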
package/docs/erp-contract.md CHANGED

@@ -200,6 +200,11 @@ Authorization: Bearer <api-key>
   external callers. Internal idempotent UPSERT on `(project_id,
   client_report_id)` retries is the one exception (see "Idempotent UPSERT
   on retry" above).
+- The ERP resolves each report to a canonical internal project when possible.
+  Repository URL is the strongest signal, followed by repo/project slug, then
+  configured aliases, then the human report project name. This keeps legacy
+  repo/report names like `USD-Academy` or `USD-ACVADEMY` correctly linked to
+  ERP project names like `Underdog-Sales-Academy`.
 - **`dry_run` retention (v4.0.4+):** The ERP deletes rows where
   `dry_run = true AND submitted_at < now() - 7 days` via a daily cron at
   03:00 UTC. Production report views (list, project tree, email digests)
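
Reviewer sketch (not part of the package): the signal precedence added in this hunk (repository URL, then slug, then aliases, then report project name) amounts to an ordered list of matchers where the first hit wins. The ERP backend is not in this diff, so everything below is hypothetical: the function name, the field names `repo_url`, `slug`, `aliases`, `name`, and `project`, and the case-insensitive comparison.

```javascript
// Hypothetical resolver illustrating the precedence order only.
function resolveProject(report, projects) {
  const norm = (s) => String(s || "").trim().toLowerCase();
  const same = (a, b) => Boolean(a) && Boolean(b) && norm(a) === norm(b);

  // Strongest signal first; each entry is a predicate over a candidate project.
  const signals = [
    (p) => same(p.repo_url, report.repo_url),
    (p) => same(p.slug, report.slug),
    (p) => (p.aliases || []).some((a) => same(a, report.slug) || same(a, report.project)),
    (p) => same(p.name, report.project),
  ];

  for (const matches of signals) {
    const hit = projects.find(matches);
    if (hit) return hit;
  }
  return null; // unresolved: caller decides what to do
}
```

With an alias list containing `USD-Academy` and `USD-ACVADEMY`, a report carrying either legacy name would resolve to the `Underdog-Sales-Academy` project in this sketch.
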

package/docs/plan-contract.md ADDED

@@ -0,0 +1,321 @@
+# Plan Contract
+Machine-readable plan format consumed by builder, verifier, plan-checker, and `state.js`. Replaces ad-hoc markdown re-parsing — markdown plans become presentation, this JSON contract is truth.
+Status: **draft, v1.** Pressure-test the shape against real phases before locking.
+## Why this exists
+Today, `templates/plan.md` is structured markdown. Planner emits it, builder re-interprets it, verifier re-interprets it, plan-checker re-interprets it. Three independent LLM interpretations of the same prose = drift. The drift is invisible until verification fails for a reason that doesn't match the planner's intent.
+The contract shifts every machine-driven step (task assignment, dependency check, verification execution) onto deterministic JSON. Prose stays in `phase-N-plan.md` for humans; code reads `phase-N-contract.json`.
+## File layout
+```
+.planning/
+  phase-1-plan.md           # human-facing prose (existing)
+  phase-1-contract.json     # machine truth (NEW)
+  phase-1-deviations.json   # builder→verifier deltas (existing)
+  phase-1-verification.md   # verifier output (existing)
+```
+`contract.json` is committed. It is regenerated only by re-running `/qualia-plan` or `qualia-framework state.js compile-plan`.
+## Schema (v1)
+TypeScript-flavored for readability. Authoritative validator lives at `bin/lib/plan-contract.js` (Zod or hand-rolled — TBD; framework currently has zero deps).
+```ts
+interface PlanContract {
+  version: 1;                    // bump on breaking change
+  phase: number;                 // 1-indexed
+  goal: string;                  // 1-2 sentences, what's TRUE when done
+  why: string;                   // unlocks-what; one sentence
+  generated_at: string;          // ISO 8601 UTC
+  generated_by: "planner" | "compile-plan" | "manual";
+  source_plan_hash: string;      // sha256 of phase-N-plan.md at compile time; "" if generated_by="manual"
+  tasks: Task[];
+  success_criteria: string[];    // phase-level user-facing truths
+}
+interface Task {
+  id: string;                    // "T1", "T2" — stable across reorders
+  title: string;
+  wave: number;                  // 1-indexed; tasks in same wave run in parallel
+  depends_on: string[];          // task ids this task needs
+  persona?: PersonaTag;          // optional, for agent specialization
+  files_modify: string[];        // repo-relative paths
+  files_create: string[];        // repo-relative paths
+  files_delete: string[];        // repo-relative paths (for refactors that remove code)
+  acceptance_criteria: string[]; // observable behaviors (human-facing)
+  action: string;                // concrete builder steps (advisory prose, max 500 chars)
+  context_files: string[];       // repo-relative paths the builder should read
+  verification: VerificationCheck[];
+}
+type PersonaTag =
+  | "security" | "architect" | "ux" | "frontend"
+  | "backend" | "data" | "performance" | "none";
+type VerificationCheck =
+  | FileExistsCheck
+  | GrepMatchCheck
+  | CommandExitCheck
+  | BehavioralCheck;
+interface FileExistsCheck {
+  type: "file-exists";
+  path: string;                  // repo-relative
+  must_contain?: string;         // optional substring assertion
+}
+interface GrepMatchCheck {
+  type: "grep-match";
+  path: string;                  // file or glob
+  pattern: string;               // regex
+  expect: "present" | "absent";
+}
+interface CommandExitCheck {
+  type: "command-exit";
+  command: string;               // executed via execFile, NOT shell
+  args: string[];                // positional args (no shell parsing)
+  cwd?: string;                  // repo-relative; default = repo root
+  expected_exit: number;         // typically 0
+  timeout_ms?: number;           // default 30000
+  expect_stdout_match?: string;  // regex; optional
+}
+interface BehavioralCheck {
+  type: "behavioral";
+  description: string;           // human-readable; verifier interprets
+  evidence_required: Evidence[]; // structured citation requirements; vibes-based passes blocked at schema level
+}
+interface Evidence {
+  path: string;                  // repo-relative file path the verifier must cite
+  matcher?: string;              // optional regex the cited line must satisfy
+  description: string;           // what the cited line should demonstrate
+}
+```
+### Why these four check types
+They map 1:1 with the existing markdown Verification Contract section, so compilation is mechanical:
+| Markdown section | Maps to |
+|---|---|
+| `Check type: file-exists` | `FileExistsCheck` |
+| `Check type: grep-match` | `GrepMatchCheck` |
+| `Check type: command-exit` | `CommandExitCheck` |
+| `Check type: behavioral` | `BehavioralCheck` (last resort) |
+`behavioral` is the only check that retains LLM interpretation — and even there, the schema forces evidence-required so the verifier can't produce vibes-based passes.
+## Example: a real phase contract
+```json
+{
+  "version": 1,
+  "phase": 2,
+  "goal": "Authenticated users can sign in with email/password and reach the dashboard.",
+  "why": "Session persistence is the #1 abandonment trigger in onboarding — verification emails are wasted without it.",
+  "generated_at": "2026-04-28T14:32:00Z",
+  "generated_by": "planner",
+  "source_plan_hash": "sha256:9c1ae6f2b4d8e1f3a5c7b9d0e2f4a6c8e0b1d3f5a7c9e1b3d5f7a9c1e3b5d7f9",
+  "tasks": [
+    {
+      "id": "T1",
+      "title": "Add email/password sign-in handler",
+      "wave": 1,
+      "depends_on": [],
+      "persona": "backend",
+      "files_modify": ["src/lib/auth.ts"],
+      "files_create": ["src/lib/auth-schema.ts"],
+      "files_delete": [],
+      "acceptance_criteria": [
+        "POST /api/auth/signin returns 200 with valid creds",
+        "POST /api/auth/signin returns 401 with invalid creds",
+        "Session cookie is httpOnly and sameSite=lax"
+      ],
+      "action": "Use supabase.auth.signInWithPassword. Validate email/password with Zod schema. Set cookie via Next.js Response API.",
+      "context_files": [
+        "src/lib/supabase/server.ts",
+        "src/lib/supabase/client.ts"
+      ],
+      "verification": [
+        {
+          "type": "file-exists",
+          "path": "src/lib/auth-schema.ts",
+          "must_contain": "z.object"
+        },
+        {
+          "type": "command-exit",
+          "command": "npx",
+          "args": ["tsc", "--noEmit"],
+          "expected_exit": 0,
+          "timeout_ms": 60000
+        },
+        {
+          "type": "grep-match",
+          "path": "src/lib/auth.ts",
+          "pattern": "signInWithPassword",
+          "expect": "present"
+        }
+      ]
+    },
+    {
+      "id": "T2",
+      "title": "Wire sign-in form to handler",
+      "wave": 2,
+      "depends_on": ["T1"],
+      "persona": "frontend",
+      "files_modify": ["src/app/(auth)/signin/page.tsx"],
+      "files_create": [],
+      "files_delete": [],
+      "acceptance_criteria": [
+        "Form posts to /api/auth/signin",
+        "Error toast shows on 401",
+        "Redirect to /dashboard on 200"
+      ],
+      "action": "Add server action; show error state via useFormState; redirect via redirect() from next/navigation.",
+      "context_files": ["src/app/(auth)/signin/page.tsx"],
+      "verification": [
+        {
+          "type": "behavioral",
+          "description": "Form submission with valid creds redirects to /dashboard",
+          "evidence_required": [
+            {
+              "path": "src/app/(auth)/signin/page.tsx",
+              "matcher": "redirect\\(['\"]/dashboard",
+              "description": "redirect() call targeting /dashboard after successful signin"
+            },
+            {
+              "path": "src/app/(auth)/signin/page.tsx",
+              "matcher": "useFormState|action=",
+              "description": "form is wired to a server action or POST handler"
+            }
+          ]
+        }
+      ]
+    }
+  ],
+  "success_criteria": [
+    "User can sign in with valid credentials and land on /dashboard",
+    "User sees a clear error message on invalid credentials without leaving the page",
+    "Session persists across page reloads"
+  ]
+}
+```
+## Validation rules (enforced at emission)
+1. **`tasks[].id` must be unique** within the phase.
+2. **Task ids must match** `^T\d+$` — `T1`, `T2`, etc. The compiler prefixes markdown task numbers (`## Task 1` → `T1`).
+3. **`depends_on` must reference ids that exist** in the same contract.
+4. **No cycles in `depends_on`.**
+5. **Wave assignment must respect dependencies** — a task's wave must be `>` than the max wave of its dependencies. (Trivially: if T2 depends on T1, T2.wave > T1.wave.)
+6. **At least one verification check per task.** Empty `verification: []` is rejected.
+7. **`files_modify`, `files_create`, `files_delete` are pairwise disjoint** — a file is in at most one of the three.
+8. **`command-exit` checks must use execFile-safe args** — no shell metacharacters in `command`; `args[]` carries positional values.
+9. **`success_criteria` minimum 1 entry.**
+10. **`action` ≤ 500 characters** — enforced. Keeps planner from over-specifying implementation.
+11. **`evidence_required[].path` must be repo-relative** and `matcher` (when present) must be a valid regex.
+`bin/state.js validate-plan` runs these. Failures block transition to `built`.
+Validator implementation: hand-rolled at `bin/lib/plan-contract.js`, ~80 LOC, zero dependencies. Framework's no-deps posture is preserved.
+## Drift detection (contract vs markdown)
+Manual edits to `phase-N-plan.md` happen in practice. Without detection, the contract silently goes stale: builder reads JSON truth that no longer matches what humans see in markdown.
+`source_plan_hash` is `sha256(plan_md_contents)` at compile time, prefixed `sha256:`. Stored in the contract.
+`bin/state.js validate-plan --check-drift` re-hashes the current plan markdown and compares. Drift behavior:
+| Scenario | Action |
+|---|---|
+| Hashes match | OK, no output |
+| Hashes differ | Exit 2, message: `plan.md drifted from contract; run compile-plan --refresh` |
+| Contract missing `source_plan_hash` (legacy or `manual`) | Warn but pass — drift checking disabled for that contract |
+`compile-plan --refresh` re-reads markdown, regenerates contract, updates hash. Builder/verifier refuse to run if `--check-drift` fails.
+## Verification execution errors
+A check that *cannot run* (binary missing, timeout, cwd doesn't exist) is distinct from a check that *ran and failed*. The verifier records:
+| Outcome | `verification_result` | `failure_reason` |
+|---|---|---|
+| Check ran, passed | `pass` | — |
+| Check ran, criteria unmet | `fail` | `verification-criteria-unmet` |
+| Behavioral check, evidence missing | `fail` | `verification-evidence-missing` |
+| Check itself errored (cmd not found, timeout, etc.) | `partial` | `verification-execution-error` |
+Execution errors are NOT verification failures. They block phase advance the same way, but a postmortem treats them differently — fix the infrastructure, then re-run.
+## How builder reads it
+```js
+// pseudocode — the actual implementation lives in skills/qualia-build
+const contract = JSON.parse(fs.readFileSync(`.planning/phase-${N}-contract.json`));
+const myTask = contract.tasks.find(t => t.id === assignedTaskId);
+// builder gets:
+//   - exact files to touch
+//   - acceptance_criteria as the "definition of done"
+//   - context_files to read first
+//   - verification[] is the self-check before declaring DONE
+```
+The builder still receives the Action prose as advisory guidance. The contract is the boundary.
+## How verifier reads it
+For each task in the contract:
+1. Walk `verification[]`.
+2. For deterministic checks (`file-exists`, `grep-match`, `command-exit`): execute and record pass/fail with stdout/stderr captured. Distinguish "ran and failed" (`verification-criteria-unmet`) from "could not run" (`verification-execution-error`).
+3. For `behavioral` checks: for each `evidence_required[i]`, the verifier MUST produce a `{path, line, snippet}` citation. If `matcher` is present, the cited line must satisfy the regex. Missing evidence or matcher mismatch → automatic `verification-evidence-missing`.
+4. Aggregate per-task → per-phase pass/fail.
+5. Write `phase-N-verification.json` (machine output) alongside `phase-N-verification.md` (human output).
+This eliminates the "verifier wrote a glowing pass when half the criteria weren't actually met" failure mode — `evidence_required[]` is structured, so vibes-based passes are blocked at the schema level.
+## Compile mode (migrating in-flight projects)
+`bin/state.js compile-plan --phase N` reads `phase-N-plan.md` and emits a best-effort `phase-N-contract.json`:
+- Frontmatter → `phase`, `goal`
+- `## Task N — title` blocks → `tasks[]`
+- `**Files:**` line → `files_modify` (cannot distinguish create vs modify from prose; defaults to modify, warns)
+- `**Acceptance Criteria:**` bullets → `acceptance_criteria`
+- `### Contract for Task N` blocks → `verification[]`
+- Missing fields → `compile-plan` exits non-zero with a list of gaps
+Compile is a one-time bridge. New plans emit JSON directly from the planner agent.
+## Design decisions (locked v1)
+These were called out as open questions during draft; resolved here so implementation can proceed.
+1. **Persona enum:** dropped `data` — covered by `backend`. Six personas + `none`.
+2. **`acceptance_criteria` vs `verification[]`:** kept separate. AC is the human-facing definition of done (lands in commit messages, milestone summaries, ERP reports). `verification[]` is the mechanical execution path. The verifier never interprets AC — it executes `verification[]`. This separation is the whole point of the contract.
+3. **`action` cap:** 500 chars. Advisory only. Validator enforces.
+4. **Versioning:** in-place migration via `compile-plan --upgrade`. `version` field tells the loader which schema to apply. No filename suffixes — canonical filename stays `phase-N-contract.json`.
+5. **Wave placement:** lives on the task. The validator enforces `task.wave > max(deps wave)` so the redundancy with `depends_on` is contained. Wave is a scheduling/display hint; `depends_on` is the constraint.
+6. **`behavioral` checks:** permanent. UX feel, error message clarity, animation timing — none of these are deterministic. The escape hatch is healthy. The `evidence_required[]` field forces verifier to cite proof; vibes-based passes are blocked at the schema level.
+7. **Validator:** hand-rolled in plain Node. Framework keeps zero npm dependencies. Zod is rejected for this layer.
+## Migration plan
+1. Add schema + validator + `compile-plan` command. No callers yet.
+2. Backfill contracts for active projects via `compile-plan` — manual review of warnings.
+3. Update planner agent prompt to emit JSON alongside markdown.
+4. Update builder skill to read JSON for files/AC/verification; markdown still readable.
+5. Update verifier agent to execute `verification[]` deterministically; keep prose verification report for humans.
+6. Update plan-checker to validate JSON.
+7. After two milestones run cleanly on JSON, mark prose plan as advisory-only in docs.
+No hard cutover. Both formats coexist during migration.
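
Reviewer sketch (not part of the package): several of the "Validation rules" (unique ids, id format, dependency existence, cycles, wave ordering, non-empty verification, disjoint file lists, `action` length, `success_criteria` minimum) can be checked in plain Node with no dependencies. The function name `validateContract` and the error strings are hypothetical; this is an illustration of the rules, not the shipped `bin/plan-contract.js`, and rules 8 and 11 are left out for brevity.

```javascript
// Hypothetical zero-dependency validator covering rules 1-7, 9 and 10.
function validateContract(contract) {
  const errors = [];
  const byId = new Map();

  for (const t of contract.tasks) {
    if (!/^T\d+$/.test(t.id)) errors.push(`${t.id}: id must match ^T\\d+$`);
    if (byId.has(t.id)) errors.push(`${t.id}: duplicate id`);
    byId.set(t.id, t);
    if (!t.verification || t.verification.length === 0) errors.push(`${t.id}: no verification checks`);
    if ((t.action || "").length > 500) errors.push(`${t.id}: action exceeds 500 chars`);
    // A path may appear in at most one files_* list (a repeat inside one list is flagged too).
    const files = [...t.files_modify, ...t.files_create, ...t.files_delete];
    if (new Set(files).size !== files.length) errors.push(`${t.id}: files_* lists overlap`);
  }

  for (const t of contract.tasks) {
    for (const dep of t.depends_on) {
      const d = byId.get(dep);
      if (!d) errors.push(`${t.id}: unknown dependency ${dep}`);
      else if (t.wave <= d.wave) errors.push(`${t.id}: wave must be > wave of ${dep}`);
    }
  }

  // Depth-first search; meeting a task that is still "visiting" means a cycle.
  const state = new Map();
  const visit = (id) => {
    if (state.get(id) === "done") return false;
    if (state.get(id) === "visiting") return true;
    state.set(id, "visiting");
    const deps = (byId.get(id)?.depends_on || []).filter((d) => byId.has(d));
    const cyclic = deps.some((d) => visit(d));
    state.set(id, "done");
    return cyclic;
  };
  if ([...byId.keys()].some((id) => visit(id))) errors.push("depends_on contains a cycle");

  if (!contract.success_criteria || contract.success_criteria.length < 1) {
    errors.push("success_criteria needs at least one entry");
  }
  return errors;
}
```

Strict wave ordering already rules out cycles (a cycle would need a wave greater than itself), so the explicit search is redundant in principle; it is kept here because a dedicated cycle message is easier to act on than a wave error.
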

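Reviewer sketch (not part of the package): the drift check in "Drift detection (contract vs markdown)" reduces to re-hashing the plan markdown and comparing it with `source_plan_hash`. The function names `hashPlan` and `checkDrift` and the returned `{ code, message }` shape are hypothetical; the `sha256:` prefix, the exit code 2, and the drift message are taken from the document.

```javascript
// Hypothetical drift check mirroring the table in docs/plan-contract.md.
const crypto = require("crypto");

// sha256 of the plan markdown, prefixed "sha256:" as the contract stores it.
function hashPlan(markdown) {
  return "sha256:" + crypto.createHash("sha256").update(markdown, "utf8").digest("hex");
}

function checkDrift(contract, planMarkdown) {
  // Legacy or manual contracts carry no hash: warn but pass.
  if (!contract.source_plan_hash) {
    return { code: 0, message: "warning: no source_plan_hash; drift check skipped" };
  }
  if (contract.source_plan_hash === hashPlan(planMarkdown)) {
    return { code: 0, message: "" };
  }
  return { code: 2, message: "plan.md drifted from contract; run compile-plan --refresh" };
}
```

A caller would typically pass the result to `process.exit(result.code)` after printing `result.message`, so that builder and verifier can refuse to run on drift.
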
package/hooks/auto-update.js CHANGED

@@ -63,12 +63,12 @@ try {
   }
   // Synchronously fetch the latest version from npm. Tight timeout so the hook
-  // never blocks Claude Code for long. The cache timestamp is written ONLY if
-  // this fetch succeeds — otherwise the next session retries (no 24h blackout
-  // when the network is unreachable).
+  // never blocks Claude Code for long. Stamp the cache before the network call:
+  // if npm/DNS is down, we avoid paying the timeout on every Bash command.
   let latest = "";
   try {
     fs.writeFileSync(LOCK_FILE, String(process.pid));
+    fs.writeFileSync(CACHE_FILE, String(Math.floor(Date.now() / 1000)));
     const r = spawnSync("npm", ["view", "qualia-framework", "version"], {
       encoding: "utf8",
       timeout: 3000,
@@ -80,14 +80,10 @@ try {
   try { fs.unlinkSync(LOCK_FILE); } catch {}
   if (!latest) {
-    // Fetch failed — leave cache untouched so the next call retries.
     _trace("auto-update", "allow", { reason: "npm-fetch-failed" });
     process.exit(0);
   }
-  // Successful fetch — debounce future checks for 24h.
-  fs.writeFileSync(CACHE_FILE, String(Math.floor(Date.now() / 1000)));
   const cmp = (a, b) => {
     const pa = a.split(".").map(Number), pb = b.split(".").map(Number);
     for (let i = 0; i < 3; i++) {