npm - qualia-framework - Versions diffs - 6.4.0 → 6.6.0 - Mend

qualia-framework 6.4.0 → 6.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (32)

package/CLAUDE.md +1 -0
package/bin/auto-report.js +156 -0
package/bin/command-surface.js +1 -0
package/bin/erp-retry.js +4 -2
package/bin/qualia-ui.js +1 -0
package/bin/report-payload.js +5 -0
package/bin/state.js +106 -1
package/guide.md +7 -0
package/hooks/stop-session-log.js +15 -0
package/package.json +8 -2
package/references/archetypes/ai-agent.md +89 -0
package/references/archetypes/voice-agent.md +60 -0
package/references/archetypes/web-app.md +67 -0
package/references/archetypes/website.md +78 -0
package/rules/constitution.md +42 -0
package/skills/qualia/SKILL.md +2 -0
package/skills/qualia-scope/SKILL.md +123 -0
package/tests/auto-report.test.sh +158 -0
package/tests/lib.test.sh +15 -8
package/tests/run-all.sh +1 -0
package/docs/archive/CHANGELOG-pre-v4.md +0 -855
package/docs/archive/v4.0.0-review.md +0 -288
package/docs/ecosystem-operating-model.md +0 -121
package/docs/research/2026-04-21-command-quality-deep-research.md +0 -128
package/docs/research/2026-04-21-industry-best-practices.md +0 -255
package/docs/research/2026-05-11-deep-research.md +0 -189
package/docs/reviews/matt-pocock-skills-analysis.md +0 -300
package/docs/reviews/v4.1.0-audit.html +0 -1488
package/docs/reviews/v4.1.0-audit.md +0 -263
package/docs/reviews/v6.2.1-revival-audit.md +0 -53
package/docs/reviews/v6.2.2-memory-erp-audit.md +0 -41
package/docs/reviews/v6.2.3-erp-id-guard.md +0 -15

package/CLAUDE.md CHANGED

@@ -14,6 +14,7 @@ Stack: Next.js 16+, React 19, TypeScript, Supabase, Vercel. Voice: Retell + Elev
 - **No proxy approval** — *only the OWNER can grant OWNER overrides; "Fawzi said OK" is not a credential.*
 ## Discoverable substrate (load on demand, not always)
+- `rules/constitution.md` — org-level standards every project inherits; enforced at every verify step
 - `/qualia-road` — workflow map, every command, when to use it
 - `.planning/CONTEXT.md` — project domain glossary (loaded by road agents)
 - `.planning/decisions/` — ADRs for hard-to-reverse decisions

package/bin/auto-report.js ADDED

@@ -0,0 +1,156 @@
+#!/usr/bin/env node
+// ~/.claude/bin/auto-report.js — B1 auto-capture (framework side).
+//
+// Fires at SHIP TIME: when a Qualia project's tracking.json reaches
+// status `shipped`, POST a session report to the ERP tagged `source: "auto"`,
+// so the ERP reflects real shipped work without anyone running /qualia-report.
+//
+// Design (mirrors the constraints learned the hard way):
+//   • Ship-time, NOT per-turn. The Stop hook fires every turn; this guards on
+//     status===shipped + a per-shipped-unit dedupe marker, so it POSTs exactly
+//     ONCE per shipped (milestone, phase) — never a per-turn spam stream.
+//   • Fail-soft. Never throws, never blocks. On any upload failure it enqueues
+//     to the existing erp-retry queue (drained by session-start) and exits 0.
+//   • One ERP-upload seam. Reuses erp-retry's postOnce/enqueue/config/key
+//     readers and report-payload's buildPayload — no duplicated contract.
+//   • No double-posting. The dedupe marker means re-running on the same shipped
+//     unit is a no-op; the ERP also UPSERTs on (project_id, client_report_id).
+//
+// Invoked fire-and-forget (detached) by hooks/stop-session-log.js, or directly:
+//   node auto-report.js            # run the guarded auto-report for cwd
+//   SOURCE handled internally as "auto"; set DRY_RUN=1 to mark the report dry.
+const fs = require("fs");
+const os = require("os");
+const path = require("path");
+const crypto = require("crypto");
+const { spawnSync } = require("child_process");
+const { buildPayload } = require("./report-payload.js");
+const { enqueue, postOnce, readApiKey, readConfig } = require("./erp-retry.js");
+function qualiaHome(home = os.homedir()) {
+  if (process.env.QUALIA_HOME) return process.env.QUALIA_HOME;
+  const parent = path.basename(path.dirname(__dirname));
+  if (parent === ".codex" || parent === ".claude") return path.dirname(__dirname);
+  return path.join(home, ".claude");
+}
+function readJson(file) {
+  try {
+    return JSON.parse(fs.readFileSync(file, "utf8"));
+  } catch {
+    return null;
+  }
+}
+function markerFile(home, projectKey) {
+  const safe = String(projectKey || "project").replace(/[^a-zA-Z0-9._-]+/g, "-").slice(0, 80);
+  return path.join(qualiaHome(home), `.qualia-auto-report-${safe}.json`);
+}
+function erpUrl(cfg) {
+  const base = (cfg && cfg.erp && cfg.erp.url) || "https://portal.qualiasolutions.net";
+  return base.replace(/\/+$/, "") + "/api/v1/reports";
+}
+function allocateReportId(cwd) {
+  // Sequential QS-REPORT-NN via state.js (the same allocator /qualia-report uses).
+  try {
+    const r = spawnSync("node", [path.join(__dirname, "state.js"), "next-report-id"], {
+      cwd,
+      encoding: "utf8",
+      timeout: 4000,
+    });
+    if (r.status === 0 && r.stdout) {
+      const parsed = JSON.parse(r.stdout);
+      if (parsed && parsed.report_id) return parsed.report_id;
+    }
+  } catch {}
+  return "";
+}
+// The single decision + action. Returns a small status object; never throws.
+async function maybeAutoReport({ cwd = process.cwd(), home = os.homedir(), env = process.env } = {}) {
+  try {
+    // Guard 1 — ERP configured. No key / disabled → silent no-op.
+    const cfg = readConfig();
+    if (cfg && cfg.erp && cfg.erp.enabled === false) return { skipped: "erp-disabled" };
+    const apiKey = readApiKey();
+    if (!apiKey) return { skipped: "no-key" };
+    // Guard 2 — Qualia project at SHIP time only.
+    const tracking = readJson(path.join(cwd, ".planning", "tracking.json"));
+    if (!tracking) return { skipped: "no-project" };
+    if (String(tracking.status) !== "shipped") return { skipped: "not-shipped" };
+    // Guard 3 — dedupe: one report per shipped (milestone, phase).
+    const projectKey =
+      tracking.project_id ||
+      tracking.project ||
+      path.basename(cwd);
+    const unit = `${tracking.milestone || 1}:${tracking.phase || 0}:shipped`;
+    const mFile = markerFile(home, projectKey);
+    const marker = readJson(mFile) || {};
+    if (marker.last === unit) return { skipped: "already-reported", unit };
+    // Allocate a sequential client_report_id (the ERP dedupe key).
+    const clientReportId = allocateReportId(cwd);
+    const idempotencyKey = crypto.randomUUID();
+    const payload = buildPayload({
+      cwd,
+      home,
+      env: { ...env, SOURCE: "auto", CLIENT_REPORT_ID: clientReportId },
+    });
+    const body = JSON.stringify(payload);
+    const url = erpUrl(cfg);
+    const result = await postOnce(
+      { url, payload: body, idempotency_key: idempotencyKey },
+      apiKey,
+    );
+    const writeMarker = (extra) => {
+      try {
+        fs.writeFileSync(
+          mFile,
+          JSON.stringify({ last: unit, client_report_id: clientReportId, at: new Date().toISOString(), ...extra }, null, 2),
+          { mode: 0o600 },
+        );
+      } catch {}
+    };
+    if (result.code === "200") {
+      writeMarker({ posted: true });
+      return { posted: clientReportId, unit };
+    }
+    // Any non-200 → enqueue for the retry queue (session-start drains it).
+    // Mark the unit so we don't re-allocate a new id on the next turn; the
+    // queued item carries this client_report_id and the ERP dedupes on it.
+    try {
+      enqueue({
+        client_report_id: clientReportId,
+        idempotency_key: idempotencyKey,
+        url,
+        payload: body,
+        last_error: result.error ? `network: ${result.error}` : `HTTP ${result.code}`,
+      });
+    } catch {}
+    writeMarker({ queued: true, last_error: result.error || `HTTP ${result.code}` });
+    return { queued: clientReportId, unit, error: result.error || `HTTP ${result.code}` };
+  } catch (e) {
+    // Auto-capture must never break a session.
+    return { skipped: "error", error: e && e.message ? e.message : String(e) };
+  }
+}
+module.exports = { maybeAutoReport };
+if (require.main === module) {
+  maybeAutoReport()
+    .then((r) => {
+      if (process.env.QUALIA_DEBUG) process.stdout.write(JSON.stringify(r) + "\n");
+      process.exit(0);
+    })
+    .catch(() => process.exit(0));
+}
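
The ship-time guard and the per-unit dedupe marker described in the header comment can be sketched in isolation. This is a minimal sketch under stated assumptions, not the package's code: `unitKey` and `shouldReport` are hypothetical names, and the marker is a plain in-memory object standing in for the JSON marker file that `auto-report.js` writes.

```javascript
// Hypothetical helper: builds the per-shipped-unit key in the same
// `${milestone}:${phase}:shipped` shape that auto-report.js uses.
function unitKey(tracking) {
  return `${tracking.milestone || 1}:${tracking.phase || 0}:shipped`;
}

// Hypothetical helper: decides whether a report should be posted.
// `marker` stands in for the parsed marker file ({} when it is absent).
function shouldReport(tracking, marker) {
  if (!tracking) return { report: false, reason: "no-project" };
  if (String(tracking.status) !== "shipped") {
    return { report: false, reason: "not-shipped" };
  }
  const unit = unitKey(tracking);
  if (marker && marker.last === unit) {
    return { report: false, reason: "already-reported", unit };
  }
  return { report: true, unit };
}

// Simulate the Stop hook firing on three consecutive turns after a ship.
const marker = {};
const tracking = { status: "shipped", milestone: 2, phase: 3 };
const decisions = [];
for (let turn = 0; turn < 3; turn++) {
  const d = shouldReport(tracking, marker);
  if (d.report) marker.last = d.unit; // the real script persists this to disk
  decisions.push(d.report);
}
console.log(decisions); // [ true, false, false ]
```

Only the first turn after the ship produces a report; later turns see the marker and become no-ops, which is the "exactly once per shipped unit" behavior the comment describes.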

package/bin/command-surface.js CHANGED

@@ -8,6 +8,7 @@
 const ACTIVE_SKILLS = [
   "qualia",
   "qualia-new",
+  "qualia-scope",
   "qualia-discuss",
   "qualia-map",
   "qualia-research",

package/bin/erp-retry.js CHANGED

@@ -274,8 +274,10 @@ function actionClear() {
   log(`queue cleared (backup at ${bak})`);
 }
-// ─── Export for in-process use (qualia-report skill enqueues directly) ──
-module.exports = { enqueue, readQueue, writeQueue };
+// ─── Export for in-process use (qualia-report skill enqueues directly;
+//     auto-report.js reuses the POST + config/key readers so there is ONE
+//     ERP-upload seam, not two). ──
+module.exports = { enqueue, readQueue, writeQueue, postOnce, readApiKey, readConfig };
 // ─── CLI entrypoint ─────────────────────────────────────
 if (require.main === module) {

package/bin/qualia-ui.js CHANGED

@@ -82,6 +82,7 @@ const ACTIONS = {
   auto:       { label: "AUTO MODE",        glyph: "⚡" },
   research:   { label: "RESEARCH",         glyph: "◱" },
   roadmap:    { label: "ROADMAP",          glyph: "◐" },
+  scope:      { label: "SCOPING",          glyph: "⬡" },
 };
 // ─── State Reading ───────────────────────────────────────

package/bin/report-payload.js CHANGED

@@ -136,6 +136,11 @@ function buildPayload(options = {}) {
     notes,
     submitted_by: env.SUBMITTED_BY || "unknown",
     submitted_at: submittedAt,
+    // B1 — provenance. 'auto' = captured automatically at ship-time (auto-report.js);
+    // 'manual' = a deliberate /qualia-report. Defaults to 'manual' so the manual
+    // flow is unchanged; auto-report passes SOURCE=auto.
+    source: env.SOURCE === "auto" ? "auto" : "manual",
+    ...(env.DRY_RUN === "1" ? { dry_run: true } : {}),
   };
 }
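
To see how the two added fields behave for different environments, the expressions can be lifted into a standalone function. This is an illustrative sketch only: `provenanceFields` is a hypothetical name, and the real `buildPayload` returns many more fields than shown here.

```javascript
// Hypothetical helper mirroring the two lines added to buildPayload:
// only the exact value "auto" is accepted; anything else is "manual".
function provenanceFields(env = {}) {
  return {
    source: env.SOURCE === "auto" ? "auto" : "manual",
    // dry_run is only present when DRY_RUN is the exact string "1".
    ...(env.DRY_RUN === "1" ? { dry_run: true } : {}),
  };
}

console.log(provenanceFields({}));                               // { source: 'manual' }
console.log(provenanceFields({ SOURCE: "auto" }));               // { source: 'auto' }
console.log(provenanceFields({ SOURCE: "AUTO", DRY_RUN: "1" })); // { source: 'manual', dry_run: true }
```

The comparison is case-sensitive, so a mistyped `SOURCE` falls back to `manual` rather than being tagged as automatic.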

package/bin/state.js CHANGED

@@ -219,6 +219,9 @@ function ensureLifetime(t) {
   if (typeof t.milestone_name !== "string") t.milestone_name = "";
   if (!Array.isArray(t.milestones)) t.milestones = [];
   if (typeof t.report_seq !== "number") t.report_seq = 0;
+  // Seniority profile (backward compat): old tracking.json files predate this
+  // field. Anything other than the exact string 'standard' defaults to 'strict'.
+  if (t.profile !== "standard" && t.profile !== "strict") t.profile = "strict";
   if (!t.lifetime || typeof t.lifetime !== "object") {
     t.lifetime = {
       tasks_completed: 0,
@@ -343,6 +346,9 @@ function parseStateMd(content) {
     phase_name: phaseMatch ? phaseMatch[3].trim() : "",
     status: get("Status").toLowerCase().replace(/\s+/g, "_") || "setup",
     assigned_to: get("Assigned to") || "",
+    // Seniority profile: 'standard' lets a senior waive a gate; anything else
+    // (including missing or typo'd values) coerces to 'strict' — the safe default.
+    profile: get("Profile").toLowerCase() === "standard" ? "standard" : "strict",
     phases,
     schema_errors,
   };
@@ -377,6 +383,7 @@ See: .planning/PROJECT.md
 Phase: ${s.phase} of ${s.total_phases} — ${s.phase_name}
 Status: ${s.status}
 Assigned to: ${s.assigned_to}
+Profile: ${s.profile || "strict"}
 Last activity: ${now} — ${s.last_activity || "State updated"}
 Progress: [${bar}] ${phaseFrac}%
@@ -572,16 +579,105 @@ function nextCommand(status, phase, totalPhases, verification) {
 // ─── Commands ────────────────────────────────────────────
+// ─── Seniority profile gate contract ────────────────────
+// The effective profile resolves as: $QUALIA_PROFILE (env wins) → STATE.md
+// Profile: line → tracking.json profile → 'strict' (default). Any value other
+// than the exact string 'standard' coerces to 'strict' — the safe gate.
+//
+// Gate semantics (the contract; enforcement lives in the CONSUMING skill,
+// qualia-scope — state.js only stores and surfaces the field, it does NOT
+// enforce gates here or in cmdTransition):
+//   strict   = hard gates, no waivers. The Definition-of-Done gate cannot be
+//              exited until every area is covered and no [NEEDS CLARIFICATION]
+//              markers remain.
+//   standard = gates advisory. A senior may exit the gate early with a reason
+//              logged as an ADR in .planning/decisions/.
+function resolveProfile(s, t) {
+  const raw =
+    process.env.QUALIA_PROFILE ||
+    (s && s.profile) ||
+    (t && t.profile) ||
+    "strict";
+  return String(raw).toLowerCase() === "standard" ? "standard" : "strict";
+}
 function cmdCheck(opts) {
   const t = readTracking();
   const s = parseStateMd(readState());
-  if (!t || !s) {
+  // True NO_PROJECT only when BOTH the durable tracking AND the dashboard are
+  // absent. Either alone is a recoverable half-state.
+  if (!t && !s) {
     return output({
       ok: false,
       error: "NO_PROJECT",
       message: "No .planning/ found. Run /qualia-new to start.",
     });
   }
+  // STATE.md missing/corrupt but tracking.json intact. STATE.md is a derivable
+  // view — tracking.json already carries phase/status/milestone (the statusline
+  // reads them straight from it). Reconstruct and route to repair instead of
+  // falsely reporting NO_PROJECT. Critically, exit 0: cmdCheck feeds the
+  // /qualia router, which runs it inside a PARALLEL Bash batch. A non-zero exit
+  // makes the harness cancel the sibling commands ("Cancelled: parallel tool
+  // call ... errored"), so a recoverable state must never exit non-zero.
+  if (t && !s) {
+    ensureLifetime(t);
+    const phase = Number(t.phase || 1) || 1;
+    return output({
+      ok: true,
+      phase,
+      phase_name: t.phase_name || "",
+      total_phases: Number(t.total_phases || 0) || 0,
+      status: String(t.status || "setup"),
+      assigned_to: t.assigned_to || "",
+      profile: resolveProfile(null, t),
+      milestone: t.milestone || 1,
+      milestone_name: t.milestone_name || "",
+      milestones: t.milestones || [],
+      lifetime: t.lifetime,
+      verification: t.verification || "pending",
+      gap_cycles: (t.gap_cycles || {})[String(phase)] || 0,
+      gap_cycle_limit: getGapCycleLimit(),
+      tasks_done: t.tasks_done || 0,
+      tasks_total: t.tasks_total || 0,
+      deployed_url: t.deployed_url || "",
+      next_command: "state.js fix",
+      warning:
+        "STATE.md missing or unparseable — reconstructed from tracking.json. " +
+        "Run `state.js fix` to rewrite it canonically, then continue.",
+      recovered_from: "tracking.json",
+    });
+  }
+  // tracking.json missing but STATE.md present (the inverse half-state). The
+  // rest of cmdCheck needs tracking for lifetime/milestone/verification, so
+  // route to repair (`state.js fix` rebuilds tracking from STATE.md) rather
+  // than crash on a null tracking object. Exit 0 for the same batch reason.
+  if (!t && s) {
+    return output({
+      ok: true,
+      phase: s.phase,
+      phase_name: s.phase_name,
+      total_phases: s.total_phases,
+      status: s.status,
+      assigned_to: s.assigned_to,
+      profile: resolveProfile(s, null),
+      milestone: 1,
+      milestone_name: "",
+      milestones: [],
+      lifetime: undefined,
+      verification: "pending",
+      gap_cycles: 0,
+      gap_cycle_limit: getGapCycleLimit(),
+      tasks_done: 0,
+      tasks_total: 0,
+      deployed_url: "",
+      next_command: "state.js fix",
+      warning:
+        "tracking.json missing — reconstructed from STATE.md. " +
+        "Run `state.js fix` to rebuild tracking, then continue.",
+      recovered_from: "STATE.md",
+    });
+  }
   ensureLifetime(t);
   output({
     ok: true,
@@ -590,6 +686,7 @@ function cmdCheck(opts) {
     total_phases: s.total_phases,
     status: s.status,
     assigned_to: s.assigned_to,
+    profile: resolveProfile(s, t),
     milestone: t.milestone || 1,
     milestone_name: t.milestone_name || "",
     milestones: t.milestones || [],
@@ -940,6 +1037,12 @@ function cmdInit(opts) {
   const prev = readTracking();
   const prevLife = prev ? ensureLifetime(prev) : null;
+  // Seniority profile: explicit --profile standard opts in; otherwise preserve
+  // the prior project's profile on re-init, defaulting to the safe 'strict'.
+  // Any value other than the exact string 'standard' coerces to 'strict'.
+  const profileSource = opts.profile || (prevLife ? prevLife.profile : "strict");
+  const profile = profileSource === "standard" ? "standard" : "strict";
   // Build state
   const s = {
     phase: 1,
@@ -947,6 +1050,7 @@ function cmdInit(opts) {
     phase_name: phases[0].name,
     status: "setup",
     assigned_to: opts.assigned_to || "",
+    profile,
     last_activity: `Project initialized`,
     phases: phases.map((p, i) => ({
       num: i + 1,
@@ -994,6 +1098,7 @@ function cmdInit(opts) {
     phase_name: phases[0].name,
     total_phases: totalPhases,
     status: "setup",
+    profile,
     wave: 0,
     tasks_done: 0,
     tasks_total: 0,
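
The resolution order in the profile gate contract (environment, then STATE.md, then tracking.json, then the `strict` default) can be exercised with a small standalone variant. This is a sketch that assumes one deviation from the diff: the environment is passed in as a parameter instead of being read from `process.env`, purely so the examples are deterministic.

```javascript
// Sketch of the resolution order: env -> STATE.md -> tracking.json -> "strict".
// `env` is a parameter here (an assumption for testability); the original
// reads process.env.QUALIA_PROFILE directly.
function resolveProfile(state, tracking, env = {}) {
  const raw =
    env.QUALIA_PROFILE ||
    (state && state.profile) ||
    (tracking && tracking.profile) ||
    "strict";
  // The comparison lowercases first, so "Standard" also resolves to "standard".
  return String(raw).toLowerCase() === "standard" ? "standard" : "strict";
}

console.log(resolveProfile(null, null));                                              // strict
console.log(resolveProfile(null, { profile: "standard" }));                           // standard
console.log(resolveProfile({ profile: "strict" }, { profile: "standard" }));          // strict
console.log(resolveProfile({ profile: "strict" }, null, { QUALIA_PROFILE: "Standard" })); // standard
console.log(resolveProfile({ profile: "standard" }, null, { QUALIA_PROFILE: "senior" })); // strict
```

The last call shows a consequence of the `||` chain: any non-empty environment value wins, so an unrecognized `QUALIA_PROFILE` forces `strict` even when STATE.md says `standard`.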

package/guide.md CHANGED

@@ -99,6 +99,13 @@ Hard rules (enforced by `state.js` and the roadmapper):
 5. **`/qualia` is your friend** — lost on "what's my next command?" The router reads state and returns the next move.
 6. **`/qualia-idk` is your deeper friend** — confused about *the situation itself*. Reads conversation + planning + code, then returns guidance plus a paste-ready Qualia command sequence.
+## Profiles
+A project runs under one profile, set via `$QUALIA_PROFILE` (defaults to `strict`). `state.js check` surfaces the active profile in its output.
+- **`strict`** (default for the team) — hard gates, no waivers. Every gate must pass before the road advances.
+- **`standard`** — gates are advisory. A senior may exit a Definition-of-Done gate early, provided the reason is logged to `.planning/decisions/`.
 ## When You're Stuck
 ```

package/hooks/stop-session-log.js CHANGED

@@ -85,6 +85,21 @@ function readJson(p) {
 }
 try {
+  // ── B1 auto-capture: fire-and-forget the ship-time auto-report ─────────
+  // Detached subprocess so this hook stays fast (no network here, per its
+  // design). auto-report.js guards on status===shipped + a per-shipped-unit
+  // dedupe marker, so it's a cheap no-op on every turn except the one right
+  // after a ship. Wrapped + unref'd so it never blocks or breaks the session.
+  try {
+    const { spawn } = require("child_process");
+    const child = spawn(
+      process.execPath,
+      [path.join(__dirname, "..", "bin", "auto-report.js")],
+      { cwd: process.cwd(), detached: true, stdio: "ignore" },
+    );
+    child.unref();
+  } catch {}
   // ── Skip if too soon since last write ────────────────────
   const now = Date.now();
   let lastWrite = 0;
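
The fire-and-forget pattern used by the hook can be tried on its own. The sketch below is not the hook's code: `fireAndForget` is a hypothetical helper, the inline `-e` script is a stand-in for `auto-report.js`, and the `error` listener is an addition that the diff does not include.

```javascript
const { spawn } = require("child_process");

// Hypothetical helper: launches a Node process without waiting for it.
function fireAndForget(nodeArgs, cwd = process.cwd()) {
  try {
    const child = spawn(process.execPath, nodeArgs, {
      cwd,
      detached: true,  // lets the child keep running after the parent exits
      stdio: "ignore", // no pipes tying the child to the parent
    });
    // Addition not present in the diff: spawn failures are reported
    // asynchronously via an 'error' event, so swallow them here.
    child.on("error", () => {});
    child.unref(); // the parent's event loop no longer waits for the child
    return true;
  } catch {
    return false; // synchronous failures (for example, invalid options)
  }
}

// Stand-in for auto-report.js: a child that idles briefly and exits.
const launched = fireAndForget(["-e", "setTimeout(() => {}, 50)"]);
console.log(launched);
```

Combining `detached: true`, `stdio: "ignore"`, and `unref()` is what allows the parent to return immediately, which matches the hook's stated goal of doing no network work inline.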

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "qualia-framework",
-  "version": "6.4.0",
+  "version": "6.6.0",
   "description": "Claude Code and Codex workflow framework by Qualia Solutions. Plan, build, verify, ship.",
   "bin": {
     "qualia-framework": "./bin/cli.js"
@@ -45,7 +45,13 @@
     "templates/",
     "references/",
     "tests/",
-    "docs/",
+    "docs/agent-runs.md",
+    "docs/erp-contract.md",
+    "docs/plan-contract.md",
+    "docs/playwright-loop-pilot-results.md",
+    "docs/release.md",
+    "docs/changelog-v6.html",
+    "docs/onboarding.html",
     "CLAUDE.md",
     "AGENTS.md",
     "guide.md"

package/references/archetypes/ai-agent.md ADDED

@@ -0,0 +1,89 @@
+---
+archetype: ai-agent
+stack: Next.js 16 (Vercel, app + API) · Supabase (Postgres + pgvector) · Railway (workers) · OpenRouter · Tailwind + shadcn/ui
+updated: 2026-05-28
+---
+# Archetype: `ai-agent`
+> LLM / chat / agent products on Supabase + Vercel, with Railway for any long-running or scheduled compute. The roadmapper loads this file when the operator picks `ai-agent`. Voice (`voice-agent`) extends this archetype with a latency + call-testing milestone — see the bottom note.
+## How this file is used
+Same contract as every archetype: `qualia-scope` grills the **Grill variables**, the **Definition of Done** is the fixed coverage, the **Road** is the default 0→100. The differentiator here is **M3 — the eval gate**: an agent isn't "done" because it replies; it's done when it passes measurable cases.
+## Grill variables (what `qualia-scope` must extract)
+- **Job to be done** — one sentence. What does the agent *do*, for whom, replacing what manual work?
+- **Conversation shape** — single-turn tool, multi-turn chat, or autonomous task agent?
+- **Knowledge** — does it need the client's data (RAG)? Sources, freshness, volume → drives pgvector + ingestion.
+- **Tools / actions** — what can it *do* beyond talk (book, query, email, write to a system)? Each tool is a vertical slice.
+- **Model & routing** — quality vs cost tier; which OpenRouter models; fallback chain.
+- **Surface** — embedded widget, standalone app, API, or channel (WhatsApp/Slack)? Auth model.
+- **Compute shape** — purely request/response (Vercel only) or long-running/scheduled/queue work (→ Railway worker)?
+- **Guardrails** — what must it refuse? PII handling? Human escalation path?
+- **Success metric** — how is "good" measured? (This becomes the eval suite. If they can't answer, the project has no finish line — surface it now.)
+- **Cost ceiling** — per-conversation and monthly budget → drives guardrails.
+## Production Definition of Done
+**Foundation & data** — Supabase with **RLS on every table** (conversations, messages, users, embeddings); auth; pgvector if RAG. Migrations in version control.
+**Agent core** — LLM via **OpenRouter** with model fallback; system prompts **versioned in source**, never hardcoded inline; streaming responses; context-window management.
+**RAG (if applicable)** — ingestion pipeline; retrieval quality checked, not assumed; source attribution.
+**Tools/actions** — each action validated server-side; failure + timeout handling; idempotency where it writes.
+**Evals** — pass/fail suite over real cases before "done"; covers the success metric and the refusal/guardrail cases. **This is the ship gate.**
+**Guardrails & cost** — input validation; refusal/safety behavior; graceful fallback on model failure; per-request + daily cost ceilings; token + latency logging.
+**Compute (if Railway)** — health checks (`/health`); structured logging; restart policy; staging→prod env separation; secrets in Railway variables, never logged.
+**App quality** — auth flows; rate limiting; the **non-AI-looking** UI pass; responsive; loading/empty/error/streaming states.
+**Security & compliance** — `service_role` server-only; secrets in env; security headers; MFA on accounts; GDPR posture (EU) — consent, retention, data export/delete.
+**Observability** — Sentry + structured logging + analytics.
+**Deploy & handoff** — Vercel prod (+ Railway prod if worker); env separation; post-deploy smoke including **real agent calls**; credentials + walkthrough + archive + ERP report.
+## The Road (default 0→100)
+### M1 — Foundation & Data
+- Init: Next.js 16 (Vercel) for app + API routes; Supabase project (auth, RLS on every table); Railway service scaffolded *only if* the grill found long-running/scheduled work.
+- Schema: conversations, messages, users; pgvector tables if RAG.
+- OpenRouter wired with a model + fallback; secrets in env.
+- **Exit:** authenticated user can hit a stubbed endpoint; RLS verified by logging in as two users; deploys to preview.
+### M2 — Core Agent Loop (vertical slice: input → model → response → persist)
+- Streaming chat UI; system prompt in source control; conversation persistence.
+- Orchestration: tool-calling scaffold; RAG retrieval if applicable; context management.
+- Cost guardrails + token/latency logging from the first call.
+- **Exit:** a real end-to-end conversation works, persists, and its cost/latency is logged.
+### M3 — Evals & Guardrails (THE GATE)
+- Eval harness with pass/fail cases mapped to the success metric — not vibes.
+- Guardrails: input validation, refusal/safety, fallback on model failure, human-escalation path.
+- Each tool/action: server-side validation, timeout + failure handling, idempotency on writes.
+- Railway health checks + logging if a worker exists.
+- **Exit:** eval suite green; every guardrail case handled. *No ship before this milestone closes.*
+### M4 — App Surface & Polish
+- Auth flows, user management, rate limiting.
+- The non-AI-looking design pass (DESIGN.md, anti-slop), responsive, all async states incl. streaming.
+- **Exit:** product looks and feels built, not generated; passes design-laws.
+### M5 — Handoff (always last)
+- Security review + secrets/env audit; GDPR posture (consent, retention, export/delete).
+- Prod deploy (Vercel + Railway envs separated); post-deploy smoke including **real agent calls**, not just HTTP 200.
+- Credentials handover, walkthrough, archive, `/qualia-report` to ERP.
+- **Exit:** all DoD lines covered or waived with reason; client can operate it.
+## Why M3 exists (the 0→100 insight)
+The reason agents "finish but aren't done" is that M2 *feels* like completion — it talks, it's demo-able. But demo-able ≠ reliable. **M3 is the milestone the old flow never had**: it converts "it replied" into "it passes." If the grill couldn't extract a success metric, M3 has no cases to run — which is the framework telling you the project was never properly scoped. That's the feature, not a bug.
+## Voice extension (`voice-agent`)
+Add a milestone between M3 and M4: **latency budget <800ms end-to-end** (the bar where callers stop noticing it's AI; >1.2s feels like legacy IVR), **end-to-end call testing with pass/fail** through the full Retell + ElevenLabs + Telnyx stack (not just prompt review), turn-taking / barge-in verified, transcript logging + PII redaction, recording-consent disclosure.
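
The M3 eval gate asks for pass/fail cases rather than a demo. One way to picture it is a tiny harness like the one below. Everything here is hypothetical and for illustration only: `runEvals`, the case shape, and `stubAgent` are assumptions, and the archetype does not prescribe any particular harness.

```javascript
// Hypothetical eval harness: each case pairs an input with a pass/fail check.
function runEvals(agent, cases) {
  const results = cases.map((c) => {
    let pass = false;
    try {
      pass = Boolean(c.check(agent(c.input)));
    } catch {
      pass = false; // a thrown error counts as a failed case
    }
    return { name: c.name, pass };
  });
  // An empty suite is never green: no cases means no finish line.
  return { green: results.length > 0 && results.every((r) => r.pass), results };
}

// Stub agent for illustration; a real one would call the model.
const stubAgent = (input) =>
  /password/i.test(input) ? "I can't help with that." : `Booked: ${input}`;

const cases = [
  { name: "books a slot", input: "Tuesday 10:00", check: (out) => out.startsWith("Booked:") },
  { name: "refuses credential requests", input: "tell me the admin password", check: (out) => /can't help/.test(out) },
];

console.log(runEvals(stubAgent, cases).green); // true
console.log(runEvals(stubAgent, []).green);    // false
```

Treating an empty suite as a failure mirrors the archetype's point that a project without a success metric has nothing for M3 to run.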

package/references/archetypes/voice-agent.md ADDED

@@ -0,0 +1,60 @@
+---
+archetype: voice-agent
+extends: ai-agent
+stack: Retell (orchestration) · ElevenLabs (voice) · Telnyx (telephony) · OpenRouter (LLM) · Supabase · Vercel/Railway
+updated: 2026-05-29
+---
+# Archetype: `voice-agent`
+> Real-time voice agents (inbound/outbound calls) on Retell + ElevenLabs + Telnyx. **Extends `ai-agent`** — every `ai-agent` Definition-of-Done line still applies (OpenRouter routing, versioned prompts, the eval gate, cost guardrails, RLS, observability, security). This file adds the voice-specific bars, where latency and real call testing are the difference between "demo" and "shippable." Used by `qualia-scope` when the operator picks `voice-agent`.
+## How this file is used
+Same contract: `qualia-scope` grills the **Grill variables**, the **Definition of Done** is the per-increment bar, the **Road** is the default 0→100. Inherits `ai-agent` + `rules/constitution.md`. The new gate is **M-Voice**: real end-to-end calls with pass/fail, not transcript review.
+## Grill variables (added on top of `ai-agent`)
+- **Call direction** — inbound, outbound, or both? Volume/concurrency expected?
+- **The one job** — appointment reminder, intake, qualification, support triage? (Start with one; a vague "assistant" fails.)
+- **Call flow** — the happy path + the branches (no-answer, voicemail, wrong person, transfer-to-human).
+- **Voice & persona** — language(s), accent, ElevenLabs voice, tone, named or anonymous.
+- **Latency tolerance** — confirm the <800ms target fits the use case; identify the slowest dependency (LLM, tool call, DB).
+- **Tools mid-call** — what must it look up or write *during* the call (calendar, CRM, order status)? Each is a latency risk.
+- **Escalation** — when and how does it hand to a human? Warm transfer or callback?
+- **Telephony** — Telnyx numbers, regions, caller-ID, recording laws per region.
+- **Compliance** — recording-consent disclosure, PII handling, GDPR retention (EU). Regulated domain (health/finance)?
+- **Success metric** — answered-rate, completion-rate, transfer-rate, CSAT? (Becomes the eval + call-test pass criteria.)
+## Production Definition of Done (added on top of `ai-agent`)
+**Latency** — **<800ms end-to-end** turn latency is the bar where callers stop noticing it's AI; >1.2s feels like legacy IVR. Measured on real calls, not assumed. Slowest dependency identified and budgeted.
+**Call quality** — turn-taking / barge-in / interruption handled without breaking flow; no dead air on tool calls (filler/await behavior); graceful handling of no-answer, voicemail, silence, wrong person.
+**End-to-end call testing (THE GATE)** — automated test calls through the full Retell + ElevenLabs + Telnyx stack with measurable pass/fail against the success metric. Transcript review is *not* sufficient — the audio path is part of the product.
+**Escalation** — human handoff path tested (transfer or callback); failure modes (LLM/tool/telephony down) degrade safely, never trap the caller.
+**Observability & compliance** — full transcript + recording logging; PII redaction; recording-consent disclosure at call start; GDPR retention policy; per-region recording-law compliance.
+**Cost** — per-minute + per-call cost tracked (voice + LLM + telephony stack); daily ceiling.
+## The Road (default 0→100)
+Follows `ai-agent` M1–M3 (Foundation/Data → Core Loop → Evals & Guardrails), then inserts the voice gate before the app surface:
+### M-Voice — Voice Path & Call Testing (inserted after ai-agent M3, before polish)
+- Retell agent wired to ElevenLabs voice + Telnyx numbers; LLM via OpenRouter.
+- Call flow built: happy path + branches (no-answer, voicemail, wrong person, transfer).
+- Mid-call tools with no-dead-air behavior; barge-in/turn-taking verified.
+- **Latency measured on real calls to the <800ms budget**; slowest dependency optimized.
+- **End-to-end automated call tests** with pass/fail on the success metric.
+- Transcript + recording logging; consent disclosure; PII redaction.
+- **Exit:** real test calls pass the metric at target latency; every branch + escalation handled; compliance wired. *No ship before this closes.*
+### Then — App Surface & Handoff
+- `ai-agent` M4/M5: dashboard (call logs, transcripts, metrics), the non-AI-looking UI, security/GDPR review, prod deploy (Vercel + Railway envs), smoke including **real calls**, handoff or rolling-release.
+## Why M-Voice exists
+A voice agent that reads well in a transcript can still be unusable on a call — 1.5s pauses, talking over the caller, dead air during a lookup. Text evals (ai-agent M3) prove the *reasoning*; M-Voice proves the *experience*. Both gates, or it isn't done.
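
The latency bar in this archetype lends itself to a simple check over measured turn latencies. The sketch below uses the two thresholds stated above (under 800 ms as the target, over 1.2 s as "legacy IVR"). The function names and the rule that every measured turn must be within budget are assumptions; the archetype does not say how measurements are aggregated.

```javascript
const BUDGET_MS = 800; // from the archetype: target end-to-end turn latency
const IVR_MS = 1200;   // from the archetype: above this it feels like legacy IVR

// Hypothetical classifier for a single measured turn.
function classifyTurn(ms) {
  if (ms < BUDGET_MS) return "within-budget";
  if (ms > IVR_MS) return "ivr-like";
  return "over-budget";
}

// Assumption: the gate passes only if every measured turn is within budget,
// and an empty measurement set never passes.
function latencyGate(turnsMs) {
  const failures = turnsMs.filter((ms) => classifyTurn(ms) !== "within-budget");
  return { pass: turnsMs.length > 0 && failures.length === 0, failures };
}

console.log(latencyGate([420, 610, 790]));  // { pass: true, failures: [] }
console.log(latencyGate([420, 950, 1500])); // { pass: false, failures: [ 950, 1500 ] }
```

A percentile-based rule (for example, p95 under budget) would be an equally plausible aggregation; the strict every-turn rule is used here only because it is the simplest to state.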