npm - bossbuild - Versions diffs - 0.97.0 - Mend

bossbuild 0.97.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (128)

package/LICENSE +21 -0
package/PRINCIPLES.md +70 -0
package/README.md +213 -0
package/VERSION +1 -0
package/bin/boss +3 -0
package/library/README.md +19 -0
package/library/agents/.gitkeep +0 -0
package/library/agents/mentor-venture.md +57 -0
package/library/hooks/.gitkeep +0 -0
package/library/hooks/auto-log.js +133 -0
package/library/hooks/memory-cue.js +82 -0
package/library/hooks/secrets-guard.js +87 -0
package/library/memory-seed/README.md +29 -0
package/library/memory-seed/durable-facts-example.md +16 -0
package/library/practices/.gitkeep +0 -0
package/library/practices/agent-security.md +111 -0
package/library/practices/ai-adoption-culture.md +104 -0
package/library/practices/ai-ux-patterns.md +246 -0
package/library/practices/celebration-of-done.md +100 -0
package/library/practices/conscience-voicing.md +121 -0
package/library/practices/context-discipline.md +116 -0
package/library/practices/design-system.md +152 -0
package/library/practices/git-workflow.md +119 -0
package/library/practices/harm-taxonomy.md +45 -0
package/library/practices/quality-ratchet.md +48 -0
package/library/practices/revalidation.md +57 -0
package/library/practices/scalable-architecture.md +111 -0
package/library/practices/ship-it-live.md +149 -0
package/library/practices/skill-authoring.md +70 -0
package/library/skills/.gitkeep +0 -0
package/library/skills/boss-learn/SKILL.md +63 -0
package/library/skills/boss-sync/SKILL.md +48 -0
package/package.json +49 -0
package/registry/CHANGELOG.md +2737 -0
package/src/board.js +655 -0
package/src/brain.js +288 -0
package/src/cli.js +542 -0
package/src/conscience.js +426 -0
package/src/insights.js +147 -0
package/src/learn.js +92 -0
package/src/map.js +103 -0
package/src/modes.js +82 -0
package/src/paths.js +36 -0
package/src/registry.js +34 -0
package/src/scaffold.js +138 -0
package/src/sync.js +292 -0
package/src/team.js +103 -0
package/stages/L0-quickstart/manifest.json +12 -0
package/stages/L0-quickstart/template/.claude/agents/coder-generalist.md +31 -0
package/stages/L0-quickstart/template/.claude/agents/mentor-venture.md +57 -0
package/stages/L0-quickstart/template/.claude/agents/pm.md +28 -0
package/stages/L0-quickstart/template/.claude/hooks/conscience.js +89 -0
package/stages/L0-quickstart/template/.claude/hooks/lib/loop-runtime.js +507 -0
package/stages/L0-quickstart/template/.claude/hooks/lib/yaml.js +163 -0
package/stages/L0-quickstart/template/.claude/hooks/memory-cue.js +82 -0
package/stages/L0-quickstart/template/.claude/hooks/secrets-guard.js +87 -0
package/stages/L0-quickstart/template/.claude/rules/your-app-code.md +17 -0
package/stages/L0-quickstart/template/.claude/settings.json +36 -0
package/stages/L0-quickstart/template/.claude/skills/boss/SKILL.md +161 -0
package/stages/L0-quickstart/template/.claude/skills/boss-learn/SKILL.md +63 -0
package/stages/L0-quickstart/template/.claude/skills/boss-sync/SKILL.md +55 -0
package/stages/L0-quickstart/template/.claude/skills/canvas/SKILL.md +112 -0
package/stages/L0-quickstart/template/.claude/skills/comprehend/SKILL.md +72 -0
package/stages/L0-quickstart/template/.claude/skills/decide/SKILL.md +122 -0
package/stages/L0-quickstart/template/.claude/skills/feedback/SKILL.md +68 -0
package/stages/L0-quickstart/template/.claude/skills/import/SKILL.md +73 -0
package/stages/L0-quickstart/template/.claude/skills/persona/SKILL.md +92 -0
package/stages/L0-quickstart/template/.claude/skills/prototype/SKILL.md +114 -0
package/stages/L0-quickstart/template/.claude/skills/triage/SKILL.md +104 -0
package/stages/L0-quickstart/template/.claude/skills/welcome/SKILL.md +262 -0
package/stages/L0-quickstart/template/AGENTS.md +31 -0
package/stages/L0-quickstart/template/CLAUDE.md +57 -0
package/stages/L0-quickstart/template/docs/IDS.md +42 -0
package/stages/L0-quickstart/template/docs/ideas/INDEX.md +24 -0
package/stages/L0-quickstart/template/docs/loops/canvas-loop.md +90 -0
package/stages/L0-quickstart/template/docs/loops/capture-loop.md +64 -0
package/stages/L1-mvp/manifest.json +12 -0
package/stages/L1-mvp/template/.claude/agents/mentor-architect.md +124 -0
package/stages/L1-mvp/template/.claude/agents/mentor-cofounder.md +85 -0
package/stages/L1-mvp/template/.claude/agents/mentor-gtm.md +49 -0
package/stages/L1-mvp/template/.claude/agents/program-manager.md +46 -0
package/stages/L1-mvp/template/.claude/agents/tester.md +42 -0
package/stages/L1-mvp/template/.claude/hooks/auto-log.js +133 -0
package/stages/L1-mvp/template/.claude/rules/feature-context.md +18 -0
package/stages/L1-mvp/template/.claude/skills/ai-cost/SKILL.md +249 -0
package/stages/L1-mvp/template/.claude/skills/ai-failure-states/SKILL.md +226 -0
package/stages/L1-mvp/template/.claude/skills/ai-first-init/SKILL.md +227 -0
package/stages/L1-mvp/template/.claude/skills/close/SKILL.md +170 -0
package/stages/L1-mvp/template/.claude/skills/consult/SKILL.md +72 -0
package/stages/L1-mvp/template/.claude/skills/cost-review/SKILL.md +204 -0
package/stages/L1-mvp/template/.claude/skills/design-tokens-init/SKILL.md +192 -0
package/stages/L1-mvp/template/.claude/skills/drift-deep/SKILL.md +170 -0
package/stages/L1-mvp/template/.claude/skills/evals/SKILL.md +154 -0
package/stages/L1-mvp/template/.claude/skills/extract/SKILL.md +209 -0
package/stages/L1-mvp/template/.claude/skills/judge-traces/SKILL.md +68 -0
package/stages/L1-mvp/template/.claude/skills/log/SKILL.md +64 -0
package/stages/L1-mvp/template/.claude/skills/practice/SKILL.md +92 -0
package/stages/L1-mvp/template/.claude/skills/pretotype/SKILL.md +95 -0
package/stages/L1-mvp/template/.claude/skills/red-team/SKILL.md +137 -0
package/stages/L1-mvp/template/.claude/skills/revalidate/SKILL.md +51 -0
package/stages/L1-mvp/template/.claude/skills/ship/SKILL.md +105 -0
package/stages/L1-mvp/template/.claude/skills/smoke/SKILL.md +43 -0
package/stages/L1-mvp/template/.claude/skills/spec/SKILL.md +145 -0
package/stages/L1-mvp/template/claude-append.md +122 -0
package/stages/L1-mvp/template/docs/loops/ai-failure-state-loop.md +107 -0
package/stages/L1-mvp/template/docs/loops/coordination-loop.md +116 -0
package/stages/L1-mvp/template/docs/loops/cost-budget-loop.md +117 -0
package/stages/L1-mvp/template/docs/loops/cost-review-loop.md +113 -0
package/stages/L1-mvp/template/docs/loops/design-tokens-loop.md +98 -0
package/stages/L1-mvp/template/docs/loops/drift-loop.md +149 -0
package/stages/L1-mvp/template/docs/loops/extraction-loop.md +128 -0
package/stages/L1-mvp/template/docs/loops/focus-loop.md +106 -0
package/stages/L1-mvp/template/docs/loops/pretotype-loop.md +88 -0
package/stages/L1-mvp/template/docs/loops/spec-loop.md +83 -0
package/stages/L2-v1/manifest.json +12 -0
package/stages/L2-v1/template/.claude/agents/db-architect.md +91 -0
package/stages/L2-v1/template/.claude/agents/mentor-business.md +124 -0
package/stages/L2-v1/template/.claude/agents/mentor-fundraising.md +72 -0
package/stages/L2-v1/template/.claude/agents/mentor-pitch.md +84 -0
package/stages/L2-v1/template/.claude/agents/mentor-talent.md +84 -0
package/stages/L2-v1/template/.claude/agents/ui-designer.md +81 -0
package/stages/L2-v1/template/.claude/agents/ux-designer.md +87 -0
package/stages/L2-v1/template/.claude/skills/board/SKILL.md +98 -0
package/stages/L2-v1/template/.claude/skills/design-review/SKILL.md +77 -0
package/stages/L2-v1/template/.claude/skills/ux-check/SKILL.md +93 -0
package/stages/L2-v1/template/claude-append.md +59 -0
package/stages/L2-v1/template/docs/loops/design-drift-loop.md +108 -0
package/stages/L3-scale/README.md +13 -0

package/stages/L1-mvp/template/.claude/hooks/auto-log.js ADDED

@@ -0,0 +1,133 @@
+#!/usr/bin/env node
+// BOSS auto-log — a SubagentStop hook (OPT-IN; the trace substrate for IDEA-025).
+//
+// Ported UP from the dhun dogfood (Principle #1), Node-ported for BOSS's zero-dep
+// rule. WHAT IT DOES: after a writer subagent finishes, append one honest line to
+// `.boss/trace.jsonl` recording what that agent actually touched — session, agent,
+// changed files, timestamp. This is the *trace substrate*: the within-session
+// counterpart to `boss insights` (which reads the cross-project registry). It is
+// the raw material a trace-native judge (IDEA-025 Phase 2, `/judge-traces`) and the
+// sleep-time learn loop (Phase 3) read later. It does NOT judge, score, or send
+// anything anywhere.
+//
+// HUMANE CONTRACT (inherits IDEA-021 / IDEA-013, non-negotiable):
+//   - LOCAL-ONLY. Writes one file in this repo. Never transmits. Never shares up.
+//   - APPEND-ONLY, facts not estimates. Records what the git tree shows changed.
+//   - MEASURE, DON'T INSTRUMENT THE HUMAN. It reads the work's own trace, not you.
+//   - READ ON DEMAND. Nothing consumes this file unless you run a skill that does.
+//
+// WHY OPT-IN: a SubagentStop hook fires a process after every subagent — real
+// latency on multi-agent sessions. Ship it dormant; turn it on when you want the
+// trace (regulated/high-stakes cohorts, or BOSS's own repo eating its dogfood).
+//
+// TO TURN IT ON — add to .claude/settings.json (registration IS the on-switch):
+//   "hooks": {
+//     "SubagentStop": [
+//       { "matcher": "",
+//         "hooks": [ { "type": "command",
+//                      "command": "node \"$CLAUDE_PROJECT_DIR/.claude/hooks/auto-log.js\"",
+//                      "timeout": 10 } ] }
+//     ]
+//   }
+//
+// Fail-open: any surprise exits 0. A trace line missed is fine; a broken session is not.
+import process from 'node:process';
+import { execFileSync } from 'node:child_process';
+import { existsSync, mkdirSync, readFileSync, appendFileSync } from 'node:fs';
+import { join } from 'node:path';
+// Read-only agent types never write files — skip them (no trace to record).
+const READ_ONLY = new Set([
+  'Explore', 'Plan', 'claude-code-guide', 'general-purpose-readonly',
+  'mentor-venture', 'mentor-architect', 'mentor-gtm', 'mentor-business',
+  'mentor-fundraising', 'mentor-talent', 'mentor-humane', 'mentor-pitch',
+]);
+function readStdin() {
+  return new Promise((resolve) => {
+    let data = '';
+    process.stdin.setEncoding('utf8');
+    process.stdin.on('data', (c) => (data += c));
+    process.stdin.on('end', () => resolve(data));
+    setTimeout(() => resolve(data), 1500);
+  });
+}
+const git = (repo, args) => {
+  try {
+    // Trailing-only trim: porcelain's leading status columns are significant
+    // (line 1 is " M path" — a full .trim() would eat that leading space and
+    // shift the column parse by one).
+    return execFileSync('git', args, { cwd: repo, encoding: 'utf8' }).replace(/\n+$/, '');
+  } catch {
+    return '';
+  }
+};
+const main = async () => {
+  const repo = process.env.CLAUDE_PROJECT_DIR || process.cwd();
+  let agent = 'unknown';
+  let session = 'unknown';
+  try {
+    const json = JSON.parse(await readStdin());
+    agent = json.subagent_type || json.tool_input?.subagent_type || json.agent_type || 'unknown';
+    session = json.session_id || 'unknown';
+  } catch {
+    process.exit(0); // fail-open
+  }
+  if (READ_ONLY.has(agent)) process.exit(0);
+  // Facts not estimates: read what the working tree actually shows changed —
+  // both tracked modifications AND new untracked files (coders create files,
+  // which a plain `git diff HEAD` would miss). `git status --porcelain` carries
+  // both; columns 0-1 are the status code, the path starts at column 3.
+  const changed = [
+    ...new Set(
+      git(repo, ['status', '--porcelain', '--untracked-files=all'])
+        .split('\n')
+        .filter(Boolean)
+        .map((l) => l.slice(3).split(' -> ').pop().replace(/^"|"$/g, ''))
+        .filter(Boolean)
+    ),
+  ];
+  if (changed.length === 0) process.exit(0); // nothing written → nothing to trace
+  const files = changed.slice(0, 25); // cap the line; a 200-file diff doesn't need every path
+  const record = {
+    ts: new Date().toISOString(),
+    session,
+    agent,
+    files,
+    file_count: changed.length,
+  };
+  const dir = join(repo, '.boss');
+  const out = join(dir, 'trace.jsonl');
+  try {
+    if (!existsSync(dir)) mkdirSync(dir, { recursive: true });
+    // Dedup: SubagentStop can fire several times for one change set. Skip if the
+    // last line has the same (session, agent, files) signature.
+    if (existsSync(out)) {
+      const lines = readFileSync(out, 'utf8').trim().split('\n');
+      const last = lines[lines.length - 1];
+      if (last) {
+        try {
+          const p = JSON.parse(last);
+          const same =
+            p.session === session &&
+            p.agent === agent &&
+            JSON.stringify(p.files) === JSON.stringify(files);
+          if (same) process.exit(0);
+        } catch { /* fall through and write */ }
+      }
+    }
+    appendFileSync(out, JSON.stringify(record) + '\n');
+  } catch {
+    process.exit(0); // fail-open: a write surprise must not break the session
+  }
+  process.exit(0);
+};
+main();
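
The porcelain parse in the hook above can be tried in isolation. What follows is a minimal sketch, not code from the package: `parsePorcelain` is a hypothetical helper name, and the sample status text is hand-written rather than captured from a real repository. It assumes porcelain v1 output, where columns 0-1 carry the status code and the path starts at column 3. The sketch splits on the rename arrow before stripping quotes, so a quoted rename target comes out without a stray leading quote.

```javascript
// Hypothetical helper (not part of bossbuild): pulls changed paths out of
// `git status --porcelain` (v1) text.
function parsePorcelain(text) {
  const paths = text
    .split('\n')
    .filter(Boolean)
    // Columns 0-1 are the status code, column 2 is a space, the path starts at column 3.
    .map((line) => line.slice(3))
    // Renames read "old -> new"; keep only the new path.
    .map((entry) => entry.split(' -> ').pop())
    // Git wraps paths with spaces or special characters in double quotes.
    .map((p) => p.replace(/^"|"$/g, ''))
    .filter(Boolean);
  // De-duplicate while preserving first-seen order.
  return [...new Set(paths)];
}

// Hand-written sample, for illustration only.
const sample = [
  ' M src/app.js',
  '?? docs/new-note.md',
  'R  src/old.js -> src/new.js',
  'A  "src/with space.js"',
].join('\n');

console.log(parsePorcelain(sample));
```

The sketch does not decode the C-style escapes git uses inside quoted paths; a path containing a literal ` -> ` would also be misread as a rename.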

package/stages/L1-mvp/template/.claude/rules/feature-context.md ADDED

@@ -0,0 +1,18 @@
+---
+paths:
+  - "src/**"
+---
+<!-- MVP working-context. Like the Quickstart rule, this loads only when Claude opens a matching file.
+     In MVP you build feature by feature (FEAT-NNN). Keep the *live* feature's working notes here — the
+     local decisions, gotchas, and "don't redo this" that matter while it's in flight, not forever.
+     When the feature ships, /close will compress this to a one-line outcome (BOSS FEAT-020, Phases 2-3).
+     Until then: prune it by hand when a feature lands. Rescope paths: to where the feature's code lives. -->
+# Working context — current feature
+_What the model needs while building the live FEAT. Ephemeral by design — prune when it ships._
+- **Active FEAT:** (e.g. FEAT-003 — checkout flow)
+- **Local decisions:** (the choices that bind this feature's code, with a one-line why)
+- **Gotchas / don't-redo:** (the traps you already hit, so the model doesn't re-walk them)

package/stages/L1-mvp/template/.claude/skills/ai-cost/SKILL.md ADDED

@@ -0,0 +1,249 @@
+---
+name: ai-cost
+description: Establish AI spend discipline for {{PROJECT_NAME}} — declare per-user / per-feature / monthly budgets, name the model choices, wire a per-call cost logger, set a review cadence. Cohort-aware (first-product gets a tight cap; vibe-virtuoso gets inspect-only; domain-expert gets privacy-first logging). Run at the first inflection where the app actually calls an LLM. Usage - /ai-cost
+---
+# /ai-cost — name the bill before it surprises you
+The cost of an AI-mediated app is the single most-load-bearing operating decision you make once
+your code reaches the model. Token math is small per call and large per cohort. *"Just call GPT-5
+and see"* is a perfectly fine demo posture and a perfectly destructive production posture.
+This skill is the gate between *"the app calls an LLM"* and *"the app is in front of users."* It
+makes you declare the budget BEFORE the bill, wire a logger so you can SEE the bill, and pair
+the cost shape with the right mentor (architecture for shape; business for unit economics).
+## When to run it
+- A FEAT puts an LLM call in the user-facing control flow (not a one-off dev script).
+- You're about to ship that FEAT to anyone other than yourself.
+- You see the conscience's `cost` moment open — the `cost-budget-loop` detected LLM calls in
+  `src/` and no `docs/ai-cost-budget.md`. Run this skill to close the loop.
+- After a model swap or prompt rewrite. The bill changes; the budget should be re-checked.
+- After a real-bill surprise. The bill IS the design signal — codify the lesson here.
+## What this skill produces
+1. **`docs/ai-cost-budget.md`** — the declared contract. Budgets, model choices, alert
+   thresholds, review cadence. The single file your future-self reads when the bill spikes.
+2. **A cost-logger wrapper** — a small function in your stack that wraps every LLM SDK call,
+   records `{ feat, model, input_tokens, output_tokens, estimated_usd, ts }` to a local ledger.
+3. **`.boss/cost-log.jsonl`** — the running ledger (gitignored; local-only by default; ship to
+   a real datastore when you have real users).
+## How to run it
+### 1. Read the cohort
+Read `cohort` from `.boss/config.json`. The cohort decides the default posture. If unset, ask the
+one open question from `/boss` step 6, then continue with the answer.
+### 2. Survey the LLM surface
+Scan `src/` for LLM SDK call sites (`anthropic`, `openai`, `@anthropic-ai/sdk`, `messages.create`,
+`chat.completions.create`, `generateText`, `streamText`, `Anthropic(`, `OpenAI(`, etc.). For each
+hit, identify:
+- **Which FEAT** it serves (link to a `FEAT-NNN`).
+- **Which model** it uses (e.g., `claude-sonnet-4-6`, `gpt-5-mini`, `claude-haiku-4-5`).
+- **Per-call shape** — prompt size order-of-magnitude (small / medium / large), expected outputs.
+- **Call frequency** — once per session? Per user action? Per page render?
+Don't audit every call — find the **three most expensive call patterns** by order of magnitude.
+Most apps are 80/20: a small number of call patterns dominate cost.
+### 3. Pick the budget shape (cohort-aware)
+Walk the founder through the budget framework. Cohort defaults below are *starting points*,
+not prescriptions — they're calibrated to the cohort's risk and operating style. The founder
+picks; the skill records.
+| Cohort | Default per-user/day | Monthly cap | Posture |
+|---|---|---|---|
+| `first-product` | $5 | $100 | Conservative. Hard cap. Auto-fallback to cheaper model on breach. |
+| `vibe-coder-newbie` | $5 | $50 | Strict — protect from runaway. Define cap in plain dollars, not tokens. |
+| `non-tech-founder` | $10 | $200 | Plain-language framing. Show: *"each user costs about $X/day."* |
+| `vibe-virtuoso` | (inspect-only) | (inspect-only) | No gate. Logger on; budget tracked; show the numbers. Override-friendly. |
+| `eng-builder` | (BYO) | (BYO) | Logger on; no opinion on caps. Transparent + inspectable; they'll wire alerts themselves. |
+| `indie-hacker` | $3 (sustainable margin) | $50 | Frame as **% of revenue per user** (target: <30% of MRR per user). Calm-company math. |
+| `returning-founder` | $10 | $300 | Frame as **cost-per-acquired-user** + **cost-per-active-user**. They know unit economics. |
+| `domain-expert` | $20 | $500 | Higher per-user is fine in regulated domains. **Privacy-first logging — NO PII, NO prompt body unless redacted.** Cite the regulatory context. |
+| _(no cohort declared)_ | $10 | $200 | Generic conservative; revisit when cohort sharpens. |
+For each row, the founder edits to fit the actual bet. The numbers are a **starting frame**;
+the founder's read of the business is the real signal.
+### 4. Pick model choices deliberately
+For each call site identified in step 2, name **why** the chosen model. Three valid answers:
+- **"Quality requires it"** — name the failure mode that the cheaper model exhibits. (If you
+  can't, the cheaper model probably works.)
+- **"Speed requires it"** — name the latency budget. (If there's no SLA, latency probably
+  doesn't require the bigger model.)
+- **"Default we haven't tested"** — *valid only as a TODO.* Schedule the A/B against the
+  cheaper model in the same `docs/ai-cost-budget.md`.
+The most common cost win is **downgrading non-load-bearing calls** to cheaper models. The
+second most common is **caching** (Anthropic prompt caching, response caching). The third is
+**batching** (Anthropic batch API; OpenAI batch). Each gets a line in the budget doc.
+### 5. Wire the logger
+A ~30-line wrapper around the LLM SDK that records each call. Stack-agnostic shape:
+```typescript
+// src/lib/ai-cost-logger.ts
+import { appendFileSync, mkdirSync } from 'node:fs';
+import { dirname, join } from 'node:path';
+const LEDGER = join(process.cwd(), '.boss', 'cost-log.jsonl');
+// Model price table (USD per million tokens). UPDATE WHEN YOU SWAP MODELS. The math is wrong otherwise.
+const PRICE_PER_M_TOKENS: Record<string, { input: number; output: number }> = {
+  'claude-sonnet-4-6':   { input: 3.00, output: 15.00 },
+  'claude-haiku-4-5':    { input: 1.00, output:  5.00 },
+  'claude-opus-4-7':     { input: 15.00, output: 75.00 },
+  'gpt-5-mini':          { input: 0.25, output:  2.00 },
+  // add yours
+};
+type LogCallArgs = { feat: string; model: string; inputTokens: number; outputTokens: number; userId?: string };
+export function logCall({ feat, model, inputTokens, outputTokens, userId }: LogCallArgs) {
+  // A model missing from the table is priced at $0, so the ledger undercounts until you add it.
+  const p = PRICE_PER_M_TOKENS[model] || { input: 0, output: 0 };
+  const usd = (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
+  const entry = {
+    ts: new Date().toISOString(),
+    feat, model, userId,
+    input_tokens: inputTokens,
+    output_tokens: outputTokens,
+    estimated_usd: Number(usd.toFixed(6)),
+  };
+  // Create .boss/ on first use; appendFileSync throws if the directory is missing.
+  mkdirSync(dirname(LEDGER), { recursive: true });
+  appendFileSync(LEDGER, JSON.stringify(entry) + '\n');
+  return entry;
+}
+```
+```python
+# src/ai_cost_logger.py
+import json, os, datetime
+LEDGER = os.path.join(os.getcwd(), ".boss", "cost-log.jsonl")
+PRICE_PER_M = {
+    "claude-sonnet-4-6":   {"input": 3.00, "output": 15.00},
+    "claude-haiku-4-5":    {"input": 1.00, "output":  5.00},
+    "claude-opus-4-7":     {"input": 15.00, "output": 75.00},
+    "gpt-5-mini":          {"input": 0.25, "output":  2.00},
+}
+def log_call(feat, model, input_tokens, output_tokens, user_id=None):
+    p = PRICE_PER_M.get(model, {"input": 0, "output": 0})
+    usd = (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000
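+    entry = {
+        # utcnow() is deprecated since Python 3.12; use an aware UTC timestamp instead.
+        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat().replace("+00:00", "Z"),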
+        "feat": feat, "model": model, "user_id": user_id,
+        "input_tokens": input_tokens, "output_tokens": output_tokens,
+        "estimated_usd": round(usd, 6),
+    }
+    os.makedirs(os.path.dirname(LEDGER), exist_ok=True)  # create .boss/ on first use
+    with open(LEDGER, "a") as f:
+        f.write(json.dumps(entry) + "\n")
+    return entry
+```
+Wrap each LLM call. The wrapper is the *only* path to the SDK — make it impossible to bypass:
+add a lint rule or a code review note that says *"if you imported `@anthropic-ai/sdk` directly,
+this is a bug — go through `lib/ai-cost-logger`."*
+**Privacy note (domain-expert and any health/legal/financial project):** the logger above
+records token counts and metadata, NOT prompt or response content. Keep it that way. If you
+need to log content for debugging, do it in a separate file with explicit consent + retention,
+and exclude it from any shared logs.
+### 6. Write `docs/ai-cost-budget.md`
+The contract doc. Use this skeleton (frontmatter included so it's discoverable like every other
+BOSS doc):
+```markdown
+---
+id: ai-cost-budget
+type: budget
+owner: pm
+status: declared
+updated: {{DATE}}
+---
+# AI cost budget — {{PROJECT_NAME}}
+## Cohort + posture
+- Cohort: <cohort name from .boss/config.json>
+- Posture: <strict cap | inspect-only | BYO | % of revenue>
+## Budgets
+- **Per user, per day:** $X.XX  (alert at 80% — $X.XX)
+- **Per user, per month:** $X.XX
+- **Monthly cap (all users):** $X.XX  (hard ceiling: pause the feature, don't quietly overrun)
+## Model choices (one row per call site)
+| Call site / FEAT | Model | Why this model | Cheaper-model A/B status |
+|---|---|---|---|
+| <FEAT-001 / classify-intent> | <claude-sonnet-4-6> | <quality requires it: classifier fails below this> | <tested 2026-MM-DD; haiku 92%, sonnet 96%; kept sonnet> |
+## Cost levers (revisit when budget breached)
+- [ ] Prompt caching (Anthropic prompt caching for stable system prompts)
+- [ ] Response caching (identical prompts in <N> minutes)
+- [ ] Batch API (non-realtime calls)
+- [ ] Downgrade to cheaper model for non-load-bearing calls
+- [ ] Truncate context (do you really need the whole document?)
+- [ ] Structured outputs (Liu) — smaller schemas = smaller responses
+## Review cadence
+- Weekly during MVP — read `.boss/cost-log.jsonl`, total by FEAT + by user, sanity-check.
+- Monthly during V1 — daily totals; cohort cost-per-user; cost-as-%-of-revenue.
+## Breach grammar (per IDEA-008)
+- When per-user/day spend exceeds the budget by <Y%>, the hook should surface the `cost` moment.
+- Override (when legitimate): record in `docs/devlog.md`:
+  - **OVERRIDE:** `cost-budget-loop` overrun on <date> — rationale: <e.g., one power user
+    running a long workflow; not representative; expected to come back into budget by week-end>.
+```
+### 7. Set the review cadence
+Add a reminder to `docs/RESUME.md` next-tasks: *"Review `.boss/cost-log.jsonl` weekly through
+MVP."* This is the discipline part — without the cadence, the ledger fills up unread.
+### 8. Pair with mentors (when warranted)
+After writing the budget doc, optionally:
+- **`mentor-architect`** — when the cost shape implies an architectural decision (batching vs.
+  realtime, caching layer, model fallback strategy). Hand off with: *"`mentor-architect`, the
+  cost shape says X — what architecture decisions does that imply?"*
+- **`mentor-business`** — when unit economics get load-bearing (cost-per-acquired-user, cost
+  vs. willingness-to-pay, pricing implications). Hand off with: *"`mentor-business`, our
+  cost-per-active-user is X; what should the pricing carry?"*
+Don't auto-invoke either. Surface the question; let the founder decide whether to consult.
+## Connection to other loops
+- **Upstream:** `pretotype-loop` closed — you know the demand exists. Don't optimize cost
+  before you've validated the bet; you'll spend on the wrong thing.
+- **Downstream:** `cost-budget-loop` — the conscience moment that fires when LLM calls are
+  present and the budget doc is missing (or spend breaches it). This skill closes that loop.
+- **Adjacent:** `/evals` — the eval set IS a cost lever (Husain). Cheaper models pass enough
+  evals → ship the cheaper model.
+## Rules
+- **Declare BEFORE the bill.** A budget written after the surprise is a post-mortem, not a budget.
+- **Token math is not optional once users are real.** "I'll watch it" is a budget of $0 with
+  a guarantee of overrun.
+- **Right It before It right (Savoia) — but also Right Costs before Costs Right.** Don't
+  optimize the bill on a feature that hasn't earned its existence.
+- **The logger is the only path to the SDK.** If founders can call the SDK directly, the
+  ledger lies. Lint it; review it; convention it.
+- **Privacy-first logging.** Token counts and metadata are fine. Prompt and response bodies
+  are NOT fine to ship to shared logs without consent + retention discipline.
+- **The cost moment is a nudge, not a gate.** The conscience surfaces drift; the founder
+  decides. Override grammar in `docs/devlog.md` per IDEA-008.
+- **Per-cohort math is real math.** A first-product cohort and a domain-expert cohort don't
+  have the same budget shape; pretending they do produces wrong defaults for both.
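
The weekly ledger review and the 80% alert threshold described in the skill above can be sketched as two small pure functions over the ledger text. This is an illustration under stated assumptions, not code from the package: `summarizeLedger` and `budgetStatus` are hypothetical names, the field names follow the record shape the skill's logger writes (`userId` in the TypeScript variant, `user_id` in the Python one), and the ledger lines and the $5 per-user/day budget are made up.

```javascript
// Hypothetical review helpers (not part of bossbuild). Input is the raw text of
// .boss/cost-log.jsonl: one JSON object per line.
function summarizeLedger(jsonlText) {
  const byFeat = {};
  const byUserDay = {};
  let total = 0;
  for (const line of jsonlText.split('\n')) {
    if (!line.trim()) continue;
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // skip a torn or hand-edited line rather than abort the review
    }
    const usd = Number(entry.estimated_usd) || 0;
    const day = String(entry.ts || '').slice(0, 10); // "YYYY-MM-DD" from an ISO timestamp
    const user = entry.userId ?? entry.user_id ?? 'anonymous';
    const userDay = `${user}|${day}`;
    byFeat[entry.feat] = (byFeat[entry.feat] || 0) + usd;
    byUserDay[userDay] = (byUserDay[userDay] || 0) + usd;
    total += usd;
  }
  return { total, byFeat, byUserDay };
}

// Classify spend against a declared budget: "ok", "alert" (at or above the
// alert fraction, 80% by default), or "breach" (above the budget itself).
function budgetStatus(spentUsd, budgetUsd, alertFraction = 0.8) {
  if (spentUsd > budgetUsd) return 'breach';
  if (spentUsd >= budgetUsd * alertFraction) return 'alert';
  return 'ok';
}

// Made-up ledger lines, for illustration only.
const ledger = [
  '{"ts":"2026-01-05T10:00:00.000Z","feat":"FEAT-001","model":"m","userId":"u1","estimated_usd":2.5}',
  '{"ts":"2026-01-05T11:00:00.000Z","feat":"FEAT-002","model":"m","userId":"u1","estimated_usd":1.75}',
  '{"ts":"2026-01-05T12:00:00.000Z","feat":"FEAT-001","model":"m","userId":"u2","estimated_usd":0.5}',
].join('\n');

const summary = summarizeLedger(ledger);
for (const [userDay, usd] of Object.entries(summary.byUserDay)) {
  console.log(userDay, usd.toFixed(2), budgetStatus(usd, 5));
}
```

Keeping both functions free of file I/O means the same logic could run against a local ledger file or rows pulled from a datastore; reading the file and choosing the budget figures are left to the caller.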

package/stages/L1-mvp/template/.claude/skills/ai-failure-states/SKILL.md ADDED

@@ -0,0 +1,226 @@
+---
+name: ai-failure-states
+description: Design what {{PROJECT_NAME}} does when the AI fails — the five failure states (garbage / refusal / hallucination / timeout / cost-spike) and the declared response for each. Names the UX *before* the failure happens, not after. Cohort-aware (first-product gets named patterns; eng-builder gets lint-anchored unhandled-path discipline; domain-expert gets humane-fallback when stakes are real). Run during /ai-first-init, or any time a FEAT puts an LLM in the user-facing path. Usage - /ai-failure-states
+---
+# /ai-failure-states — name the failure before the user finds it
+Most AI-mediated apps ship the **happy path** and discover the **failure modes** in production.
+The failure modes were always going to happen. What was missing was the *declared response* —
+the UX answer for each, designed before the user encountered it.
+This skill is the design step that costs an hour and saves the next ten. Five failure states.
+One declared response per state. Cohort-aware delivery. Recorded in `docs/ai-failure-states.md`
+so the next FEAT inherits the discipline.
+## The five failure states
+These are the failure modes that **always exist** in AI-mediated code. Naming each one + its
+declared response IS the design. The skill walks you through each in order.
+| # | Failure | What it looks like | Default declared response |
+|---|---|---|---|
+| 1 | **Garbage output** | Model returns nonsense, malformed JSON, off-topic prose, content that violates the schema. | Reject + retry once with a "be more careful" preamble; on second failure, surface the structured-error UI. |
+| 2 | **Refusal** | Model refuses ("I can't help with that"), gives an over-cautious non-answer, returns a safety-template. | Detect refusal patterns; route to a human-handoff or a deterministic fallback. Don't loop on the same prompt. |
+| 3 | **Hallucination** | Model returns confidently-wrong content (made-up citations, invented APIs, fabricated facts). | If the FEAT depends on factual accuracy: add a verification step (citation lookup, schema validation, second-pass cross-check); if not, lower the temperature + tighten the prompt. |
+| 4 | **Timeout / network failure** | The call hangs, the network drops, the provider returns 5xx. | Hard timeout (declared per call site); on timeout, return the *last-known-good* result, a graceful degradation, or a queued retry — never the spinner-forever. |
+| 5 | **Cost spike** | A request consumes 10x the expected tokens (long input, runaway output, prompt injection eating context). | Per-call token cap (input AND output); on cap-hit, truncate gracefully with a labeled response ("this answer was capped at N tokens — refine your question or upgrade"). |
+Other failure modes exist (rate-limit, model deprecation, eval regression, etc.) but these five
+are the **load-bearing** ones — every AI-mediated FEAT has all five. Design responses for each.
+## When to run it
+- During `/ai-first-init` — the conductor calls this as step 5.
+- Before any FEAT that puts an LLM call in the user-visible path ships (acceptance criteria
+  should reference the failure-state response, not assume the happy path).
+- When the `ai-failure-state-loop` opens — the conscience surfaces a `failure-mode` moment
+  saying *"the code calls an LLM but no failure-states doc exists; what does the UI do when
+  the model fails?"*
+- After a real-production failure surprise — codify the new failure mode here so the next
+  FEAT inherits the answer.
+## How to run it
+### 1. Read the cohort + the project's AI-first declaration
+Read `cohort` from `.boss/config.json`. If `docs/ai-first.md` exists, read it — it names what's
+AI-mediated in this project (decides which failure states warrant the most design).
+### 2. Walk the founder through each failure state
+For each of the five, ask **two questions**:
+- **What does it look like in this project?** (Concrete, not abstract — "the user asked for
+  a recipe and got a wall of unrelated text" beats "garbage output.")
+- **What does the UI do?** (Concrete — "show the structured-error card with a retry button"
+  beats "handle gracefully.")
+Cohort-aware delivery:
+- **`first-product`**: walk through each with a named example. Don't assume they know what
+  hallucination looks like in practice. Show the pattern, then ask them to declare.
+- **`vibe-coder-newbie`**: similar — patterns over abstractions. Cite "this is the thing
+  where Claude makes up citations" not "epistemic failure mode #3."
+- **`non-tech-founder`**: plain language. Each failure described as *"the user sees X; the
+  app should do Y."*
+- **`eng-builder`**: terse + inspectable. Hand them the table; they'll declare the responses
+  in a paragraph. They'll likely add their own (e.g., "model deprecation = pin model
+  version + feature flag for swap").
+- **`vibe-virtuoso`**: ship a starter declaration in one pass; they'll edit. Don't coach.
+- **`indie-hacker`**: frame failure as **cost-of-an-unhappy-customer**. Each declared
+  response is a calm-company artifact (no panic UX; calibrated degradation).
+- **`returning-founder`**: skip the 101. *"You've seen these — what's your declared
+  response for each in this project?"*
+- **`domain-expert`**: stakes are real. For **hallucination** in medical/legal/financial
+  contexts: **the declared response is almost always a human-in-the-loop escalation, not a
+  retry.** For **refusal**: the route to a human escalation has to exist *as a first-class UI
+  path,* not a fallback. Cite the regulatory frame in the doc itself.
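The cohort-aware delivery list above can be sketched as a lookup table. This is an illustration only: the `Delivery` shape, the `DELIVERY` table, and `deliveryFor` are hypothetical names, the `style` strings paraphrase the bullets above, and `escalateToHuman` encodes the domain-expert guidance on hallucination.

```typescript
type Cohort =
  | "first-product" | "vibe-coder-newbie" | "non-tech-founder" | "eng-builder"
  | "vibe-virtuoso" | "indie-hacker" | "returning-founder" | "domain-expert";

// Hypothetical shape: how to walk a founder through the five failure states.
interface Delivery {
  style: string;            // one-line paraphrase of the guidance above
  escalateToHuman: boolean; // prefer human-in-the-loop over retry for hallucination
}

const DELIVERY: Record<Cohort, Delivery> = {
  "first-product": { style: "named example per state, then ask them to declare", escalateToHuman: false },
  "vibe-coder-newbie": { style: "patterns over abstractions", escalateToHuman: false },
  "non-tech-founder": { style: "plain language: the user sees X, the app should do Y", escalateToHuman: false },
  "eng-builder": { style: "terse and inspectable; hand over the table", escalateToHuman: false },
  "vibe-virtuoso": { style: "ship a starter declaration; do not coach", escalateToHuman: false },
  "indie-hacker": { style: "frame failure as cost of an unhappy customer", escalateToHuman: false },
  "returning-founder": { style: "skip the 101; ask for declared responses", escalateToHuman: false },
  "domain-expert": { style: "stakes are real; cite the regulatory frame", escalateToHuman: true },
};

// Record<Cohort, Delivery> makes the compiler reject a table that misses a cohort.
function deliveryFor(cohort: Cohort): Delivery {
  return DELIVERY[cohort];
}
```

Typing the table as `Record<Cohort, Delivery>` means adding a cohort to the union without adding its delivery entry fails to compile.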
+### 3. Write `docs/ai-failure-states.md`
+Use this skeleton (frontmatter included so it's discoverable like every other BOSS doc).
+The **Eval-tested** field on each state (v0.30.0+) closes the *"stub forever"* loophole —
+naming which eval case actually exercises the handler turns the declaration into a contract.
+```markdown
+---
+id: ai-failure-states
+type: design-decisions
+owner: pm
+status: declared
+updated: {{DATE}}
+---
+# AI failure states — {{PROJECT_NAME}}
+## Cohort + context
+- Cohort: <cohort name from .boss/config.json>
+- AI-mediated surfaces: <which features depend on the model; pulled from docs/ai-first.md>
+- Stakes: <low / moderate / high — names the regulatory or human-stakes context>
+## The five failure states
+### 1. Garbage output
+- **Looks like:** <project-specific example>
+- **Declared response:** <what the UI does, in code-level detail>
+- **Fallback handler:** <name the function/component that owns this — e.g., `handleGarbageResponse()`,
+  `<ErrorBoundary kind="malformed">`>
+- **Eval-tested:** _(v0.30.0+)_ <eval case id that exercises this — e.g., `feat-007-fail-001-garbage`>
+  or **STUB** (handler exists but no eval — record an override or write the eval).
+### 2. Refusal
+- **Looks like:** ...
+- **Declared response:** ...
+- **Fallback handler:** ...
+- **Eval-tested:** <eval case id> or **STUB**
+### 3. Hallucination
+- **Looks like:** ...
+- **Declared response:** ...
+- **Fallback handler:** ...
+- **Eval-tested:** <eval case id> or **STUB**
+### 4. Timeout / network failure
+- **Looks like:** ...
+- **Declared response:** ...
+- **Hard timeout (ms):** <per-call-site declaration>
+- **Fallback handler:** ...
+- **Eval-tested:** <eval case id> or **STUB**
+### 5. Cost spike
+- **Looks like:** ...
+- **Declared response:** ...
+- **Per-call token cap (in / out):** <numbers>
+- **Fallback handler:** ...
+- **Eval-tested:** <eval case id> or **STUB**
+## Verification cadence
+- Eval set covers each failure state (Husain): yes / no / partial.
+  See `docs/evals/FEAT-NNN.yml`. **v0.30.0+: the `/evals` skill requires AI-mediated FEATs
+  to include at least one `should-fail` case per declared failure state, categorized by
+  `failure_mode` matching the names above.**
+- Production telemetry: how do we know a failure happened? <log signal, alert, etc.>
+- Review cadence: <weekly during MVP / monthly during V1>
+## Override grammar (per IDEA-008)
+When a failure-state response is intentionally not implemented (legitimate skip — e.g., feature
+is dev-only and not user-facing yet) OR when **Eval-tested = STUB** is acceptable for now,
+record in `docs/devlog.md`:
+- **OVERRIDE:** skipped <failure-state-N> response on <date> — rationale: <why; expected
+  re-open condition>.
+- **OVERRIDE:** kept <failure-state-N> as STUB on <date> — rationale: <e.g., handler is a
+  stub because production traffic hasn't surfaced this failure yet; will write the eval +
+  implementation when FEAT-MMM ships>.
+```
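One way to see which states are still stubs is to scan a filled-in `docs/ai-failure-states.md` for its **Eval-tested** fields. The sketch below is an assumption about how such a check could work, not the `/evals` skill's actual logic: `findEvalStatus` is a hypothetical helper, and it assumes a filled-in doc writes the eval case id in backticks (as in the skeleton's example) and writes **STUB** otherwise.

```typescript
interface EvalStatus {
  state: string;           // heading text, e.g. "1. Garbage output"
  evalCase: string | null; // eval case id, or null when marked STUB or absent
}

// Hypothetical helper: walks the doc line by line, tracking the current
// "### N. ..." heading and recording the Eval-tested value beneath it.
function findEvalStatus(doc: string): EvalStatus[] {
  const results: EvalStatus[] = [];
  let current: EvalStatus | null = null;
  for (const rawLine of doc.split("\n")) {
    const line = rawLine.trim();
    const heading = /^###\s+(\d+\.\s+.+)$/.exec(line);
    if (heading) {
      current = { state: heading[1], evalCase: null };
      results.push(current);
      continue;
    }
    const field = /^-\s+\*\*Eval-tested:\*\*\s*(.*)$/.exec(line);
    if (field && current) {
      // Assumption: a real eval case id is wrapped in backticks.
      const id = /`([^`]+)`/.exec(field[1]);
      current.evalCase = id ? id[1] : null;
    }
  }
  return results;
}
```

Any entry whose `evalCase` is `null` needs either an eval case or an override line in `docs/devlog.md`.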
+### 4. Wire the fallback handlers in code (or stub them)
+For each failure state, add at minimum a **stub handler** in the code path that wraps the LLM
+call. This satisfies the `ai-failure-state-loop` exit predicate AND prevents the
+forgot-to-handle-this regression.
+```typescript
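+// src/lib/ai-handlers.ts: stubs for the five declared responses.
+// Each stub throws until implemented, so an unhandled failure state is loud, not silent.
+// The rest-parameter signatures are placeholders; narrow them per call site.
+export function handleGarbageResponse(raw: string, retry: () => Promise<unknown>) {
+  // 1. Validate against schema (Liu discipline). If invalid: retry once with stricter prompt.
+  // 2. On second failure: return structured error for the UI.
+  throw new Error('TODO: implement per docs/ai-failure-states.md §1');
+}
+export function handleRefusal(modelText: string) {
+  throw new Error('TODO: implement per docs/ai-failure-states.md §2');
+}
+export function handleHallucination(..._args: unknown[]) {
+  throw new Error('TODO: implement per docs/ai-failure-states.md §3');
+}
+export function handleTimeout(..._args: unknown[]) {
+  throw new Error('TODO: implement per docs/ai-failure-states.md §4');
+}
+export function handleCostSpike(..._args: unknown[]) {
+  throw new Error('TODO: implement per docs/ai-failure-states.md §5');
+}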
+```
+The stubs exist so the founder *cannot forget*. The loop's exit predicate scans for these
+handler names — if they exist (even as stubs), the loop closes. The discipline is that the
+declaration exists *before* the FEAT ships; the implementation can happen incrementally.
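The handler-name scan can be sketched as follows. The loop's real exit predicate is not shown here, so this is a guess at its shape: it assumes a plain text search for `function <name>` declarations, and `missingHandlers` and `loopCanClose` are hypothetical names. The five handler names come from the stub file above.

```typescript
const HANDLER_NAMES = [
  "handleGarbageResponse",
  "handleRefusal",
  "handleHallucination",
  "handleTimeout",
  "handleCostSpike",
] as const;

// Returns the handler names with no `function <name>` declaration in the source text.
function missingHandlers(source: string): string[] {
  return HANDLER_NAMES.filter((name) => {
    const declared = new RegExp(`\\bfunction\\s+${name}\\b`);
    return !declared.test(source);
  });
}

// Assumption: the loop closes once every handler is declared, stub or not.
function loopCanClose(source: string): boolean {
  return missingHandlers(source).length === 0;
}
```

A text scan like this cannot tell a stub from a real implementation, which is why the **Eval-tested** field carries that distinction.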
+### 5. Update existing AI-mediated FEAT specs
+For each `docs/ideas/FEAT-NNN.md` that puts an LLM in the user-visible path:
+- Add a **Failure states** section to the spec (the v0.26 `/spec` upgrade adds this field
+  automatically for new FEATs).
+- Reference the declared response from `docs/ai-failure-states.md`.
+- Update **Acceptance criteria** to include at least one failure-state path (e.g., *"refusal
+  routes to /support, not the spinner"*).
+## Connection to other loops
+- **Upstream:** `cost-budget-loop` closed (budget exists; cost-spike has a number to compare
+  against). `/evals` running (eval set categorizes failure modes per Husain).
+- **Same loop:** `ai-failure-state-loop` — opens when LLM call sites exist without a
+  declared failure-states doc + at least one handler reference at the call site.
+- **Downstream:** Structured outputs (Liu) — each declared response often *depends* on the
+  output being schema-validated; if you haven't declared a schema, garbage detection is
+  guesswork.
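To illustrate why garbage detection depends on a declared schema, here is a dependency-free sketch using a type guard. The `Recipe` shape is a hypothetical example (borrowing the recipe scenario from step 2), and `isRecipe` and `parseModelOutput` are illustrative names; a real project might use a schema library instead.

```typescript
// Hypothetical schema for a model response.
interface Recipe {
  title: string;
  ingredients: string[];
  steps: string[];
}

function isStringArray(x: unknown): x is string[] {
  return Array.isArray(x) && x.every((item) => typeof item === "string");
}

// Type guard: true only when the value matches the declared Recipe shape.
function isRecipe(value: unknown): value is Recipe {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v["title"] === "string" && isStringArray(v["ingredients"]) && isStringArray(v["steps"]);
}

type ParseResult =
  | { ok: true; recipe: Recipe }
  | { ok: false; reason: "not-json" | "schema-mismatch" };

// Classifies raw model text: valid recipe, unparseable text, or wrong shape.
function parseModelOutput(raw: string): ParseResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, reason: "not-json" };
  }
  return isRecipe(parsed) ? { ok: true, recipe: parsed } : { ok: false, reason: "schema-mismatch" };
}
```

Both failure reasons would route to the garbage-output handler; keeping them distinct gives the telemetry a more useful signal.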
+## What this skill is NOT
+- **Not a UI library.** It declares the *response*, not the visual. The visual lives in your
+  component library / design tokens.
+- **Not a substitute for evals.** Evals catch the failures; the failure-states doc *names what
+  to do* when they happen. Both are required.
+- **Not a guarantee.** Designing the response doesn't mean it works on first try; it means
+  the next FEAT inherits a starting point, and the override grammar records what you skipped.
+## Rules
+- **Name the five.** Each AI-mediated FEAT has all five failure modes. Pretending one doesn't
+  apply is the mistake that produces the spinner-forever / silent-fail / wallet-drain bug a
+  month later.
+- **Concrete over abstract.** *"Show the structured-error card with retry"* beats *"handle
+  gracefully."* If you can't name what the UI does, you haven't designed it.
+- **Stubs over nothing — but not stubs forever.** A `handleHallucination()` that throws
+  *"TODO: implement per §3"* is better than no function at all — it satisfies the loop AND
+  prevents the silent regression. **But:** the `Eval-tested` field is what turns a stub
+  into a contract. If you've shipped a stub, you've also committed to writing the eval case
+  that will eventually exercise it — OR to recording the override per IDEA-008 with a
+  re-open condition (v0.30.0+).
+- **Domain-expert exception.** In high-stakes domains, the declared response for hallucination
+  is **almost never a retry** — it's a human-in-the-loop escalation. Don't design AI-as-final-
+  answer in regulated contexts.
+- **Override is legitimate.** Skip a state when the founder has a real reason (dev-only
+  feature; no user-facing path). Record the override in devlog per IDEA-008.
+- **The doc is a living artifact.** When a new failure mode shows up in production, add it as
+  a sixth (and seventh, etc.) — the five are the floor, not the ceiling.