npm - bossbuild - Versions diffs - 0.97.0 - Mend

bossbuild 0.97.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (128) hide show

package/LICENSE +21 -0
package/PRINCIPLES.md +70 -0
package/README.md +213 -0
package/VERSION +1 -0
package/bin/boss +3 -0
package/library/README.md +19 -0
package/library/agents/.gitkeep +0 -0
package/library/agents/mentor-venture.md +57 -0
package/library/hooks/.gitkeep +0 -0
package/library/hooks/auto-log.js +133 -0
package/library/hooks/memory-cue.js +82 -0
package/library/hooks/secrets-guard.js +87 -0
package/library/memory-seed/README.md +29 -0
package/library/memory-seed/durable-facts-example.md +16 -0
package/library/practices/.gitkeep +0 -0
package/library/practices/agent-security.md +111 -0
package/library/practices/ai-adoption-culture.md +104 -0
package/library/practices/ai-ux-patterns.md +246 -0
package/library/practices/celebration-of-done.md +100 -0
package/library/practices/conscience-voicing.md +121 -0
package/library/practices/context-discipline.md +116 -0
package/library/practices/design-system.md +152 -0
package/library/practices/git-workflow.md +119 -0
package/library/practices/harm-taxonomy.md +45 -0
package/library/practices/quality-ratchet.md +48 -0
package/library/practices/revalidation.md +57 -0
package/library/practices/scalable-architecture.md +111 -0
package/library/practices/ship-it-live.md +149 -0
package/library/practices/skill-authoring.md +70 -0
package/library/skills/.gitkeep +0 -0
package/library/skills/boss-learn/SKILL.md +63 -0
package/library/skills/boss-sync/SKILL.md +48 -0
package/package.json +49 -0
package/registry/CHANGELOG.md +2737 -0
package/src/board.js +655 -0
package/src/brain.js +288 -0
package/src/cli.js +542 -0
package/src/conscience.js +426 -0
package/src/insights.js +147 -0
package/src/learn.js +92 -0
package/src/map.js +103 -0
package/src/modes.js +82 -0
package/src/paths.js +36 -0
package/src/registry.js +34 -0
package/src/scaffold.js +138 -0
package/src/sync.js +292 -0
package/src/team.js +103 -0
package/stages/L0-quickstart/manifest.json +12 -0
package/stages/L0-quickstart/template/.claude/agents/coder-generalist.md +31 -0
package/stages/L0-quickstart/template/.claude/agents/mentor-venture.md +57 -0
package/stages/L0-quickstart/template/.claude/agents/pm.md +28 -0
package/stages/L0-quickstart/template/.claude/hooks/conscience.js +89 -0
package/stages/L0-quickstart/template/.claude/hooks/lib/loop-runtime.js +507 -0
package/stages/L0-quickstart/template/.claude/hooks/lib/yaml.js +163 -0
package/stages/L0-quickstart/template/.claude/hooks/memory-cue.js +82 -0
package/stages/L0-quickstart/template/.claude/hooks/secrets-guard.js +87 -0
package/stages/L0-quickstart/template/.claude/rules/your-app-code.md +17 -0
package/stages/L0-quickstart/template/.claude/settings.json +36 -0
package/stages/L0-quickstart/template/.claude/skills/boss/SKILL.md +161 -0
package/stages/L0-quickstart/template/.claude/skills/boss-learn/SKILL.md +63 -0
package/stages/L0-quickstart/template/.claude/skills/boss-sync/SKILL.md +55 -0
package/stages/L0-quickstart/template/.claude/skills/canvas/SKILL.md +112 -0
package/stages/L0-quickstart/template/.claude/skills/comprehend/SKILL.md +72 -0
package/stages/L0-quickstart/template/.claude/skills/decide/SKILL.md +122 -0
package/stages/L0-quickstart/template/.claude/skills/feedback/SKILL.md +68 -0
package/stages/L0-quickstart/template/.claude/skills/import/SKILL.md +73 -0
package/stages/L0-quickstart/template/.claude/skills/persona/SKILL.md +92 -0
package/stages/L0-quickstart/template/.claude/skills/prototype/SKILL.md +114 -0
package/stages/L0-quickstart/template/.claude/skills/triage/SKILL.md +104 -0
package/stages/L0-quickstart/template/.claude/skills/welcome/SKILL.md +262 -0
package/stages/L0-quickstart/template/AGENTS.md +31 -0
package/stages/L0-quickstart/template/CLAUDE.md +57 -0
package/stages/L0-quickstart/template/docs/IDS.md +42 -0
package/stages/L0-quickstart/template/docs/ideas/INDEX.md +24 -0
package/stages/L0-quickstart/template/docs/loops/canvas-loop.md +90 -0
package/stages/L0-quickstart/template/docs/loops/capture-loop.md +64 -0
package/stages/L1-mvp/manifest.json +12 -0
package/stages/L1-mvp/template/.claude/agents/mentor-architect.md +124 -0
package/stages/L1-mvp/template/.claude/agents/mentor-cofounder.md +85 -0
package/stages/L1-mvp/template/.claude/agents/mentor-gtm.md +49 -0
package/stages/L1-mvp/template/.claude/agents/program-manager.md +46 -0
package/stages/L1-mvp/template/.claude/agents/tester.md +42 -0
package/stages/L1-mvp/template/.claude/hooks/auto-log.js +133 -0
package/stages/L1-mvp/template/.claude/rules/feature-context.md +18 -0
package/stages/L1-mvp/template/.claude/skills/ai-cost/SKILL.md +249 -0
package/stages/L1-mvp/template/.claude/skills/ai-failure-states/SKILL.md +226 -0
package/stages/L1-mvp/template/.claude/skills/ai-first-init/SKILL.md +227 -0
package/stages/L1-mvp/template/.claude/skills/close/SKILL.md +170 -0
package/stages/L1-mvp/template/.claude/skills/consult/SKILL.md +72 -0
package/stages/L1-mvp/template/.claude/skills/cost-review/SKILL.md +204 -0
package/stages/L1-mvp/template/.claude/skills/design-tokens-init/SKILL.md +192 -0
package/stages/L1-mvp/template/.claude/skills/drift-deep/SKILL.md +170 -0
package/stages/L1-mvp/template/.claude/skills/evals/SKILL.md +154 -0
package/stages/L1-mvp/template/.claude/skills/extract/SKILL.md +209 -0
package/stages/L1-mvp/template/.claude/skills/judge-traces/SKILL.md +68 -0
package/stages/L1-mvp/template/.claude/skills/log/SKILL.md +64 -0
package/stages/L1-mvp/template/.claude/skills/practice/SKILL.md +92 -0
package/stages/L1-mvp/template/.claude/skills/pretotype/SKILL.md +95 -0
package/stages/L1-mvp/template/.claude/skills/red-team/SKILL.md +137 -0
package/stages/L1-mvp/template/.claude/skills/revalidate/SKILL.md +51 -0
package/stages/L1-mvp/template/.claude/skills/ship/SKILL.md +105 -0
package/stages/L1-mvp/template/.claude/skills/smoke/SKILL.md +43 -0
package/stages/L1-mvp/template/.claude/skills/spec/SKILL.md +145 -0
package/stages/L1-mvp/template/claude-append.md +122 -0
package/stages/L1-mvp/template/docs/loops/ai-failure-state-loop.md +107 -0
package/stages/L1-mvp/template/docs/loops/coordination-loop.md +116 -0
package/stages/L1-mvp/template/docs/loops/cost-budget-loop.md +117 -0
package/stages/L1-mvp/template/docs/loops/cost-review-loop.md +113 -0
package/stages/L1-mvp/template/docs/loops/design-tokens-loop.md +98 -0
package/stages/L1-mvp/template/docs/loops/drift-loop.md +149 -0
package/stages/L1-mvp/template/docs/loops/extraction-loop.md +128 -0
package/stages/L1-mvp/template/docs/loops/focus-loop.md +106 -0
package/stages/L1-mvp/template/docs/loops/pretotype-loop.md +88 -0
package/stages/L1-mvp/template/docs/loops/spec-loop.md +83 -0
package/stages/L2-v1/manifest.json +12 -0
package/stages/L2-v1/template/.claude/agents/db-architect.md +91 -0
package/stages/L2-v1/template/.claude/agents/mentor-business.md +124 -0
package/stages/L2-v1/template/.claude/agents/mentor-fundraising.md +72 -0
package/stages/L2-v1/template/.claude/agents/mentor-pitch.md +84 -0
package/stages/L2-v1/template/.claude/agents/mentor-talent.md +84 -0
package/stages/L2-v1/template/.claude/agents/ui-designer.md +81 -0
package/stages/L2-v1/template/.claude/agents/ux-designer.md +87 -0
package/stages/L2-v1/template/.claude/skills/board/SKILL.md +98 -0
package/stages/L2-v1/template/.claude/skills/design-review/SKILL.md +77 -0
package/stages/L2-v1/template/.claude/skills/ux-check/SKILL.md +93 -0
package/stages/L2-v1/template/claude-append.md +59 -0
package/stages/L2-v1/template/docs/loops/design-drift-loop.md +108 -0
package/stages/L3-scale/README.md +13 -0

package/stages/L1-mvp/template/.claude/skills/cost-review/SKILL.md ADDED Viewed

@@ -0,0 +1,204 @@
+---
+name: cost-review
+description: Read the AI cost ledger and produce a dated review. Reads .boss/cost-log.jsonl (the per-call ledger /ai-cost wires), summarizes spend by FEAT + user + cohort, compares against docs/ai-cost-budget.md, flags overages and surprises, and writes docs/cost-reviews/REVIEW-YYYY-MM-DD.md. Closes the cadence that /ai-cost only declared. Cohort-aware. Usage - /cost-review
+---
+# /cost-review — read the bill, not just the budget
+`/ai-cost` writes the **budget** — what you intend to spend. `/cost-review` reads the
+**ledger** — what you actually spent. The discipline only works when both halves run.
+The v0.25 audit named this as a gap: *"the weekly review cadence is declared in `/ai-cost` but
+no skill reads `.boss/cost-log.jsonl`. The cadence is unenforced."* This skill closes that. It
+doesn't replace the founder's judgment — it surfaces the numbers so the judgment has data.
+## When to run it
+- The conscience surfaced the `cost-stale` moment (cost-review-loop opened — you have a
+  budget declared but no review on record).
+- Weekly during MVP — pair with `/close` at session end on the first session of each week.
+  The ritual is what makes the discipline stick.
+- Monthly during V1 — daily totals; cost-per-active-user; cost-as-%-of-revenue if pricing
+  exists.
+- After a real-bill surprise. The invoice IS the signal; this skill is the post-mortem doc.
+- After any change to model selection, prompt structure, caching layer, or rate-limit
+  posture. Cost shapes shift; the ledger should be re-read.
+## How to run it
+### 1. Orient
+Read, silently:
+- `docs/ai-cost-budget.md` — the declared contract (per-user/day, monthly cap, model rows,
+  cohort, levers checked off).
+- `.boss/cost-log.jsonl` — the running ledger written by the logger wrapper.
+- `docs/cost-reviews/` (if it exists) — prior reviews; the cadence trend lives here.
+- `.boss/config.json` — cohort (decides framing) and any conscience pause state.
+Don't announce these reads. Just orient.
+### 2. Parse the ledger
+`.boss/cost-log.jsonl` is one JSON object per line, each shape:
+```json
+{
+  "ts": "2026-05-24T14:32:11Z",
+  "feat": "FEAT-007",
+  "model": "claude-haiku-4-5",
+  "userId": "u_abc",
+  "input_tokens": 1240,
+  "output_tokens": 89,
+  "estimated_usd": 0.0017
+}
+```
+Aggregate three ways:
+- **By FEAT** — which features dominate spend?
+- **By user (or `userId == null` for dev calls)** — is there a power-user skewing the
+  numbers? Is the dev/test traffic noise level reasonable?
+- **By model** — is the model-mix matching the budget's stated rationale, or did a recent
+  change shift toward a more expensive model?
+Compute totals: last 7 days, last 30 days, since-last-review (if any).
+### 3. Compare against the budget
+For each declared budget line in `docs/ai-cost-budget.md`, name the actual:
+- **Per-user/day budget vs. observed:** declared $X, observed $Y. *Under / at / over.*
+  Compute the percentage of users above the alert threshold.
+- **Monthly cap vs. observed run-rate:** declared $X cap, observed run-rate $Y/month
+  (extrapolate from the last 14 days).
+- **Per-model rationale check:** for each call-site row in the budget, look at actual
+  usage. If the budget says *"using Sonnet because Haiku fails this classifier,"* but the
+  ledger shows 90% Haiku calls, the rationale is stale.
+### 4. Flag surprises
+The whole point of the review is to find things you'd otherwise miss. Look for:
+- **Cost outliers** — calls > 10x the median. Probable causes: prompt injection eating
+  context, runaway output, a workflow that's looping when it shouldn't.
+- **User outliers** — single users spending > 10x the cohort median. Either a power user
+  (interesting; flag for product) or a misbehaving client (interesting; flag for fix).
+- **FEAT skew** — one FEAT eating > 80% of spend that wasn't designed to. Often a sign
+  the cheaper-model A/B never got run.
+- **Quiet drift** — gradual upward trend over weeks with no FEAT explanation. Often a model
+  swap or prompt change that wasn't captured in the budget doc.
+### 5. Write the review file
+`docs/cost-reviews/REVIEW-{{DATE}}.md`:
+```markdown
+---
+id: REVIEW-{{DATE}}
+type: cost-review
+owner: pm
+status: recorded
+created: {{DATE}}
+budget_version: <date the budget doc was last updated>
+window: <last 7 days | since-last-review | custom>
+---
+# AI cost review — {{PROJECT_NAME}} — {{DATE}}
+## Headline
+_One sentence the founder reads in the inbox / Slack scroll. If only this line is read, the
+review should still land. Examples: "On-budget; one outlier user worth investigating."
+"Over by 18%; classifier FEAT is the culprit; consider Haiku A/B." "Under; safe to ship the
+deferred AI feature."_
+## Numbers
+- **Window:** <date range>
+- **Total spend:** $X.XX  (<N> calls, <M> distinct users, <K> distinct FEATs)
+- **Per-user/day:** observed $X.XX (median) / $Y.YY (p95)  vs. declared budget $Z.ZZ
+- **Monthly run-rate:** $X.XX/month projected  vs. declared cap $Y.YY
+- **By FEAT** (top 3): FEAT-NNN $X, FEAT-MMM $Y, FEAT-PPP $Z
+- **By model** (top 3): <model> $X (<%>), <model> $Y (<%>), <model> $Z (<%>)
+## Variance against budget
+- <line for each declared budget item: under / at / over + delta>
+## Surprises
+_Concrete findings from §4 above. One bullet per real surprise. Empty section is honest;
+"no surprises" is a finding worth recording when it's true._
+- <example: "FEAT-007 cost 4x its expected median — single user ran a 14k-token prompt repeatedly; investigate.">
+- <example: "Sonnet share grew from 30% → 65% over the window; no budget update recorded — stale rationale.">
+## Actions
+_What the founder is doing about it. Concrete, dated. The point of the review is to drive
+action, not to file numbers._
+- [ ] <action 1 — owner — by when>
+- [ ] <action 2>
+## Next review
+- Cadence: weekly / monthly / event-driven
+- Next planned: <YYYY-MM-DD>
+```
+### 6. Update the budget doc if the rationale shifted
+If §3's per-model rationale check flagged stale rows (the budget says *"using model X because
+Y"* but the ledger shows the actual usage diverged), update `docs/ai-cost-budget.md`'s
+model-choices table. The rationale should match reality OR explicitly name the gap with a
+TODO + a check-by date.
+### 7. If overages are real — surface the mentor handoff
+When the review shows overages > 10% of monthly cap:
+- `mentor-architect` for **shape**: is this a batching/caching/cheaper-fallback issue?
+  *"`mentor-architect`, our cost shape is breaking the budget; what architecture decisions
+  does that imply?"*
+- `mentor-business` for **economics**: does the cost-per-active-user fit the pricing?
+  *"`mentor-business`, our cost-per-active-user is $X; what should the pricing carry?"*
+Don't auto-invoke either. Surface the question; let the founder decide whether to consult.
+## Cohort-aware delivery
+| Cohort | Posture |
+|---|---|
+| `first-product` | Plain-dollar framing. "You spent $X this week; budget was $Y." Define alert threshold inline. Lean conservative. |
+| `vibe-coder-newbie` | Show the top 3 cost outliers with one-line *"this is probably because..."* explanations. Teach by example. |
+| `non-tech-founder` | Plain language. "Each user costs ~$X/day on average; one user cost $Y this week (review their usage)." No token-math jargon. |
+| `eng-builder` | Inspectable. Show raw numbers + percentiles; let them choose the interpretation. They'll write their own follow-ups. |
+| `vibe-virtuoso` | They want to ship faster. Lead with the *biggest lever* (the one optimization that would unblock budget) — not the full inventory. |
+| `indie-hacker` | Cost-as-%-of-revenue (or path-to-revenue) framing. Calm-company math: *"current run-rate is N% of MRR per user; sustainable margin = <X%>; you're at <Y%>."* |
+| `returning-founder` | Unit economics framing. Cost-per-acquired-user + cost-per-active-user; LTV/CAC implications. Skip the explanation. |
+| `domain-expert` | **Privacy-first framing first.** Confirm the ledger contains no PII / prompt bodies before showing any review data. Then standard surfacing. Cite the regulatory context if relevant. |
+## Connection to other loops + skills
+- **Upstream:** `cost-budget-loop` closed (budget doc + logger wired). Without that, there's
+  nothing to review.
+- **Same loop:** `cost-review-loop` — opens when the budget exists but no review has been
+  recorded. Closes when this skill writes the first review file.
+- **Adjacent:** `/ai-cost` (re-run if the budget shape needs to change based on review
+  findings); `/ai-failure-states` (cost-spike handler is in scope here too — if reviews show
+  recurring cost-spikes, the handler should fire).
+## What this skill is NOT
+- **Not a real-time dashboard.** This is a periodic review pattern. For real-time monitoring
+  the founder ships the ledger to an actual datastore + observability layer when users are
+  real. The skill is the discipline; the tooling scales beyond it.
+- **Not a budget-enforcement gate.** It surfaces variance; it doesn't block calls. The
+  override grammar (per IDEA-008) applies — if the review shows a legitimate overage (single
+  product launch event, expected spike), record the override in the review file itself.
+- **Not a substitute for the budget doc.** `/ai-cost` declares; this skill reads. Both halves
+  required; running only one is half the discipline.
+## Rules
+- **Numbers before narrative.** Lead with the top-line spend, then the percentile, then the
+  variance. The founder's eye goes to numbers; structure for that.
+- **Surprises are the point.** A review with no surprises section is a stamp; a review with
+  named surprises is a tool. Empty *"no surprises"* is honest when it's true; never fabricate.
+- **Actions are dated.** *"Improve cost"* is not an action. *"A/B Haiku on FEAT-007 by
+  2026-06-15"* is.
+- **Privacy-first for domain-expert cohort.** No PII, no prompt bodies, no model output text
+  in the review. Token counts + metadata only.
+- **The review IS the cadence.** A skill that's run once isn't enforcing a cadence — it's a
+  one-off. Pair with `/close` weekly; consider `/cost-review` part of session-end ritual.
+- **Cite the budget version.** Reviews compare against a specific budget snapshot. When the
+  budget changes, name which version this review was against.

package/stages/L1-mvp/template/.claude/skills/design-tokens-init/SKILL.md ADDED Viewed

@@ -0,0 +1,192 @@
+---
+name: design-tokens-init
+description: Scaffold the minimal three-layer design token system at the first UI commit. Prevents the 47-blues / pattern-reinvention / billion-line-drift failure modes that happen by default when AI generates UI without discipline (IDEA-010). Cohort-aware delivery — vibe-coder-newbie gets SHOWING; eng-builder gets OFFERING; vibe-virtuoso gets OVERRIDE-FRIENDLY. JIT — runs when design-tokens-loop opens. Usage - /design-tokens-init
+---
+# /design-tokens-init — scaffold the design tokens system
+The discipline that prevents the most-common AI-generated-UI failure modes. Without it, every new
+screen Claude generates will derive its own colors, spacing, and component patterns — and the
+codebase grows linearly with screens (the "billion-line drift" from IDEA-010). With it, the
+tokens file is the single source of truth that survives AI generation because the AI is
+*explicitly told to use it.*
+This skill is **JIT**: ships in MVP mode, but only runs when `design-tokens-loop` opens (i.e.,
+the project starts accumulating UI without a tokens file). For a backend-only project, this skill
+never runs. For a UI-heavy project, it runs at the first UI commit.
+## When to run it
+- `design-tokens-loop` is open (UI exists, no tokens file).
+- Founder ran the skill explicitly because they want the system before any UI.
+- After a design audit revealed drift — re-init to consolidate.
+## How to run it — cohort-aware delivery (v0.20+)
+Read `.boss/config.json` for the `cohort` field. Adjust the delivery accordingly:
+### `vibe-coder-newbie` / `first-product` — SHOW
+Generate the tokens file + a worked example. Don't just create — refactor one existing component
+to use the new tokens so the founder *sees* the pattern. Plain language:
+> *"I'm setting up a tokens file. Colors, spacing, typography in one place. I'll also refactor
+> your `<existing-button>` to use the tokens — so you can see what the pattern looks like.
+> From now on, when you ask me for new UI, I'll reference these tokens. New colors only get
+> added here; never inline. Sound good?"*
+### `eng-builder` / `returning-founder` — OFFER + SKIP THE 101
+> *"Setting up three-layer tokens: primitives → semantic → component. Standard architecture.
+> Want me to scaffold the Style Dictionary config too, or are you using your own pipeline?"*
+### `vibe-virtuoso` — OVERRIDE-FRIENDLY
+> *"Tokens system, three layers. The override pattern's there if you want to skip — record
+> rationale in the devlog. Most likely failure mode you'll hit if you skip: 47 blues by sprint
+> 3. Your call."*
+### `non-tech-founder` / `domain-expert` — PLAIN-LANGUAGE COACH
+> *"Quick setup before the UI accumulates. The problem this prevents: as you build more
+> screens, each one will use slightly different colors and spacing unless we put them in one
+> file the AI reads every time. I'll do that now — takes a minute. Domain-specific note: if
+> your colors carry meaning (e.g., medical severity, regulated states), name them by meaning
+> in the tokens, not by hue."*
+### `indie-hacker` — RIGHT-SIZED
+> *"Minimal tokens setup. Stack-portable — won't lock you into a tooling choice. I'll skip the
+> heavy Style Dictionary / Theo machinery; we can add it later if the system earns it."*
+### Unspecified — neutral plain language
+> *"Setting up a tokens file so style stays one source of truth. Reduces the chance the
+> codebase grows 47 shades of blue. Takes about a minute."*
+## The minimum three-layer token system (Nathan Curtis layer-cake)
+Create `docs/design/DESIGN_TOKENS.md` (the human-readable spec) + the stack-specific token file
+(JSON / CSS variables / TypeScript / etc. depending on stack):
+### Layer 1 — Primitives (raw values)
+Color scales, spacing scale, typography scale, radius scale, elevation, motion. Names are
+descriptive of the *value*, not the use:
+```yaml
+color.gray.100: '#f7f7f7'
+color.gray.500: '#888888'
+color.blue.500: '#3B82F6'
+spacing.s: 4
+spacing.m: 8
+spacing.l: 16
+```
+### Layer 2 — Semantic (meaning)
+Names describe the *role*, not the value. AI uses these. **This is the AI-tolerant layer.**
+```yaml
+color.action.primary: { ref: color.blue.500 }
+color.surface.background: { ref: color.gray.100 }
+color.surface.foreground: { ref: color.gray.900 }
+color.text.body: { ref: color.gray.700 }
+color.text.muted: { ref: color.gray.500 }
+spacing.element: { ref: spacing.s }
+spacing.section: { ref: spacing.l }
+```
+### Layer 3 — Component (occasionally needed)
+Component-specific tokens when a component has unique constraints. Most projects don't need
+this layer until V1. If you're reaching for it at MVP, you're probably premature.
+### The brand-anchor
+Read `docs/ideas/CANVAS.md` Promises cell. The brand voice declared there should anchor the
+*choice of primitives*. Don't default to Tailwind blues; pick primitives that match the
+declared brand. If the canvas's Promises cell is `_(not yet)_`, flag it — design tokens without
+a brand anchor will drift back to internet-defaults.
+### The 5-token distinctiveness pass (2026 — the "shadcn trap" fix)
+The fastest tell of a generic AI app: shadcn defaults straight out of the box — slate neutrals,
+Inter, 8px radius, indigo accent. AI codegen reaches for these because they're the internet
+average. Five deliberate overrides break the sameness without a redesign — do this *as part of
+picking primitives*, anchored to the canvas Promises cell:
+1. **Warm (or cool) the neutral scale** — pick a neutral with a temperature, not pure slate gray.
+2. **Choose a radius on purpose** — sharp (0–2px) or soft (12px+), not the default 8.
+3. **Intentional type pairing** — one characterful display/heading face + a clean body face; not
+   Inter-on-Inter.
+4. **One saturated accent** the brand owns — a single confident color, not indigo-by-default.
+5. **One "signature token" the defaults omit** — a texture, a shadow language, a custom easing
+   curve. This is the brandable one; it's what makes the UI *yours*.
+Cohort-aware: `vibe-coder-newbie` / `first-product` → just *do* the 5 overrides and show the
+before/after. `eng-builder` / `vibe-virtuoso` → *offer* them as a checklist, override-friendly.
+`domain-expert` → keep it sober; distinctiveness ≠ playful when the stakes are clinical.
+### Name by purpose, build on the standard (DTCG)
+Name semantic tokens by **purpose, not appearance** — `color.text.error`, never `color.red.500`
+at the semantic layer — so the AI can *reason* about intent rather than guess from a hue (Nathan
+Curtis). Where the stack allows, emit tokens in the **W3C DTCG** format (it reached a first stable
+version) so the system is portable and you're not locked into one vendor's token dialect — which
+also honors the canvas's "don't monetize lock-in" line.
+## After scaffolding
+1. Update the project's CLAUDE.md (or claude-append.md) to declare the token discipline:
+   ```markdown
+   ## Design tokens (added by /design-tokens-init)
+   This project uses three-layer design tokens. **All style values come from
+   `docs/design/DESIGN_TOKENS.md` + the corresponding code file.**
+   When generating new UI:
+   1. Search `src/components/` for similar patterns first; reuse before creating.
+   2. Reference tokens by semantic name (`color.action.primary`), never raw hex.
+   3. Specify all 5 states (default / hover / active / disabled / empty) explicitly.
+   4. Reject "make it pretty" prompts; anchor to the canvas Promises cell.
+   Semantic → primitive map (the AI reads this without opening another file):
+   | semantic                 | primitive        |
+   |--------------------------|------------------|
+   | color.action.primary     | <your accent>    |
+   | color.surface.background  | <your neutral>   |
+   | color.text.body           | <your ink>       |
+   | radius.default            | <your radius>    |
+   | font.display / font.body  | <your pairing>   |
+   | <signature token>         | <the one that's yours> |
+   ```
+   Inlining the map in CLAUDE.md (not just `DESIGN_TOKENS.md`) means the agent inherits the
+   brand for free on every turn — the single most useful artifact for a Claude-Code-native scaffold.
+2. The `design-tokens-loop` exit predicate now passes — loop closes.
+3. Going forward, `design-drift-loop` (V1) watches for drift (raw hex codes appearing, near-
+   duplicate components, tokens file untouched while components grow).
+## Rules
+- **Three layers, not two.** Two-layer (primitives → component) is fragile under AI generation.
+  Three layers (primitives → semantic → component) gives AI a meaningful name to grab.
+- **Brand-anchor the primitives.** Don't default to internet-aesthetic. Canvas Promises cell IS
+  the brief.
+- **Tokens file LIVES in the prompt context.** When asking AI for UI work, *always* pass
+  `DESIGN_TOKENS.md` as context. This single discipline prevents 80% of the drift.
+- **JIT — only scaffold when needed.** Don't init tokens before there's UI to use them. The
+  loop's entry predicate is the trigger; don't pre-empt it.
+- **Override is recorded, not blocked.** A founder skipping this skill is legitimate; record
+  in devlog with substantive rationale.
+- **Run the 5-token distinctiveness pass.** Three layers prevent drift; the 5 overrides prevent
+  *sameness* (the generic-AI-app look). Both, not one. The "signature token" is the brandable one.
+- **Name by purpose; emit DTCG where the stack allows.** Semantic tokens reason about intent
+  (`color.text.error`), not hue (`color.red.500`); DTCG keeps it portable (no lock-in).
+- **Cite the field.** Brad Frost (Atomic Design), Nathan Curtis (layer-cake + purpose-naming), W3C
+  Design Tokens Community Group (DTCG stable). 2026 distinctiveness tactic per the AI-UX scan. Not
+  BOSS's inventions — applied with build-integration. Interaction-layer companion:
+  [`library/practices/ai-ux-patterns.md`](../../../../../library/practices/ai-ux-patterns.md).

package/stages/L1-mvp/template/.claude/skills/drift-deep/SKILL.md ADDED Viewed

@@ -0,0 +1,170 @@
+---
+name: drift-deep
+description: The deep, whole-project version of the conscience's drift check — "am I fooling myself across EVERYTHING I've built?" Reads the entire project (canvas, all devlog, all FEAT specs, the actual code, the ideas) and judges, honestly, whether the body of work is validating the named riskiest assumption or building around it. The deliberate, founder-invoked counterpart to the cheap always-on `drift` hook moment (which reads only the last ~5 entries). Uses the model's full context, not a bounded peek. Writes docs/drift-audits/DRIFT-YYYY-MM-DD.md. Run when you want the real audit, not the tripwire. Usage - /drift-deep
+---
+# /drift-deep — the whole-project "am I fooling myself?" audit
+> *"BOSS helps founders build faster without fooling themselves — continuously checking the work
+> against real pain, real workflows, real buyers, real economics, real distribution."* — PRINCIPLES.md.
+The conscience's `drift` hook moment is a **cheap tripwire**: it reads ~5 recent entries and asks
+"you named a risk, you're piling work, nothing tests it — is the recent work on-aim?" That's the
+right thing to fire unprompted on every prompt, because it's affordable to.
+This is the **deep version** — the one a bounded read can't do. You run it deliberately when you
+want the honest, whole-body-of-work audit: *read everything I've built and tell me, across the
+entire project, whether I'm actually validating my riskiest bet or quietly building around it.* It
+uses the model's full context, not a peek. It costs more — which is exactly why it's a skill you
+**invoke**, not a moment that fires automatically. (That restraint is the point; the conscience
+must not become the expensive-AI app it warns founders about.)
+It's broader than the hook moment in two ways:
+- **It runs even when a validation plan exists.** A plan on the canvas closes the cheap gate — but
+  did you *execute* it, or write the experiment line and then drift from running it? The deep audit
+  judges the work, not the presence of a plan-line.
+- **It reads the actual code, not just the devlog.** What you *built* is the ground truth of what
+  you bet on. The deep pass reads `src/` structurally and checks it against the named risk.
+## When to run it
+- Before a mode unlock, a fundraise conversation, a milestone — any moment you want an honest "where
+  do I actually stand against my bet" read, not a glance.
+- When the cheap `drift` moment has fired a few times (check `boss conscience activity`) and you
+  want the thorough version instead of the recurring nudge.
+- When you *have* a validation plan but suspect you've drifted from executing it.
+- Periodically on a longer project — the bounded hook moment can miss slow, cumulative drift that's
+  only visible across the whole body of work.
+- On BOSS itself (self-hosted) at each capability arc.
+## How to run it
+### 1. Read the whole project (this is the 1M-context move — read widely, on purpose)
+Unlike the hook moment's bounded ~5-entry read, here you read **everything that bears on the bet**:
+- **The canvas** — `docs/ideas/*-canvas.md`: the riskiest assumption (the bet), the experiment line
+  (the plan, if any), and the surrounding cells (people, problem, metrics, risks & harms).
+- **The full devlog** — `docs/devlog.md`, *all* entries, not the last five. The arc, not the tail.
+- **Every FEAT spec** — `docs/specs/FEAT-*.md` (or wherever specs live): what was committed to build
+  and its acceptance criteria. Specs are the stated intent; compare against the bet.
+- **The actual code** — `src/` structurally: what modules/features exist, what the app *does*. What
+  you built is the truest record of what you bet on. Read the shape, not every line; follow the
+  parts that touch (or conspicuously avoid) the named risk.
+- **The ideas + extractions + cost/drift history** — `docs/ideas/`, `docs/extractions/`,
+  `docs/cost-reviews/`, prior `docs/drift-audits/` — the thinking record.
+Read silently. For a large project, prioritize by relevance to the named risk, not by recency
+alone — but don't quietly cap coverage; if you skip areas, say which and why in the audit.
+### 2. Judge each area of work against the bet
+For each meaningful body of work (a FEAT, a devlog cluster, a src/ area), ask the one question the
+predicate can't: **does this *test* the riskiest assumption, or does it build *around* it?**
+Be specific and honest. "Built the auth system" doesn't test "teams will switch from spreadsheets" —
+it's necessary plumbing, but it's not validation. "Ran 4 migration sessions with target teams" does.
+The judgment is: of everything built, how much points at the bet vs. away from it?
+Watch for the sharpest drift signal: **building the very thing the risk says to defer** (e.g. the
+risk is "buyers pay before integrations" and you built the integration framework). Name it plainly.
+### 3. Reach a verdict — on-aim / drifting / mixed
+- **on-aim** — the body of work is substantially testing the named risk. Say so plainly; this is
+  the affirming read, the deep counterpart to staying silent. Don't manufacture concern.
+- **drifting** — the work has been building around the bet. Name *where* and *what's missing*.
+- **mixed** — some work tests the bet, some drifts. The common real case. Quantify honestly.
+The verdict is a judgment, not a score. State your confidence and what you couldn't see.
+### 4. Name the smallest re-aim
+If drifting or mixed: the single next step that would point the work back at the risk — the smallest
+experiment, interview, or instrument that tests the bet. Point at `/canvas` (to set or sharpen the
+experiment line) or `/pretotype` (to run a cheap demand test). One step, not a plan.
+If on-aim: name what to keep doing, and the next checkpoint where drift could creep in.
+### 5. Write `docs/drift-audits/DRIFT-YYYY-MM-DD.md`
+```markdown
+---
+id: DRIFT-YYYY-MM-DD
+type: drift-audit
+owner: pm
+status: recorded
+created: {{DATE}}
+verdict: on-aim | drifting | mixed
+---
+# Drift audit — YYYY-MM-DD
+## The named bet
+- **Riskiest assumption:** <verbatim from the canvas>
+- **Validation plan on record:** <the experiment line, or "none recorded">
+## What the body of work actually does (vs. the bet)
+| Area | What it does | Tests the bet? |
+|---|---|---|
+| <FEAT / devlog cluster / src area> | <one line> | yes / no / partial |
+| … | … | … |
+## Verdict: <on-aim / drifting / mixed>
+<2-4 sentences, honest. Where you started, what's solid, where the work points.>
+## The specific gaps
+- <where the work diverged from the risk — name the work and why it doesn't test the bet>
+## Smallest re-aim
+- <the one next step that points work at the risk> → `/canvas` (set the experiment) or
+  `/pretotype` (run it)
+## Confidence + what this audit couldn't see
+- <areas skipped and why; ambiguity; cohort caveats>
+```
+## Cohort-aware delivery
+| Cohort | Posture |
+|---|---|
+| `first-product` | Teach what "test your riskiest bet" *means* with examples from THIS project. The verdict is a mirror, never a grade — name what's solid first. Define terms inline. |
+| `vibe-coder-newbie` | Show the per-area table concretely (their actual FEATs). Avoid abstract "validation" talk; say "does building X tell you whether Y is true?" |
+| `non-tech-founder` | Plain language. Lead with the business bet, not the code. The audit is "are you building toward the thing that makes this work, or away from it?" |
+| `eng-builder` | Terse, inspectable, evidence-first. They'll accept a hard verdict if the per-area evidence is concrete. Skip the teaching. |
+| `vibe-virtuoso` | This is the cohort the deep audit serves most — they ship a lot and validate little. Be direct: of N things built, how many test the bet? Sharper question, not praise. |
+| `indie-hacker` | Calm-company framing. "Is the work earning its keep against the bet?" Understatement; "this is fine" is high praise. |
+| `returning-founder` | Skip the 101. The harder cut: "is the body of work at the level of conviction this bet needs — or are you busy?" They can take it straight. |
+| `domain-expert` | High-stakes: an *un-validated* risk in a regulated domain is a who-could-be-harmed question, not just a business one. Lead with the humane lens — mentor-humane has override authority. Name the real-world stakes of the bet being wrong, conservatively. |
+## Connection to the conscience
+- **The deep counterpart to the `drift` hook moment** (`drift-loop`). The hook moment is the cheap,
+  bounded, always-on tripwire; this is the deliberate, whole-project, founder-invoked audit. The
+  moment can *point here* for the thorough version.
+- **Routes back to** `/canvas` (set/sharpen the experiment) and `/pretotype` (run the demand test).
+- **Not a loop, not a new moment** — there is no `drift-deep-loop` and no nudge to "run your audit."
+  It's on-demand by design; making it a recurring obligation would be the premature ceremony BOSS
+  avoids. You run it when you want the truth, not when a predicate says to.
+## What this skill is NOT
+- **Not the cheap moment.** Don't reach for `/drift-deep` reflexively — the hook `drift` moment is
+  the everyday tripwire. This is the deliberate, more-expensive audit. Restraint is the design.
+- **Not a gate.** It judges and reports; it never blocks. If you disagree with the verdict, record
+  why (the override grammar) and move on.
+- **Not graded by the judgment-eval surface.** Like `/extract`, this is a deliberate skill judgment;
+  its quality is tested in use, not by `conscience-evals/judgment/` (which covers the hook moments).
+## Rules
+- **Read widely, then judge — don't judge from the tail.** The whole point is the full body of
+  work. If you only read recent entries, you've just done the cheap moment by hand.
+- **The verdict is honest, not kind.** on-aim when it's earned; drifting when it's true. Manufacturing
+  concern erodes trust as much as missing real drift.
+- **Name the work, not the founder.** "FEAT-003 builds X, which doesn't test the bet" — never "you
+  wasted time." It's an audit of the work against the bet.
+- **No silent caps.** If the project's too big to read fully, say what you skipped and why — don't
+  let partial coverage read as a whole-project verdict.
+- **Cite the bet verbatim.** The audit is only as honest as the riskiest-assumption it's measured
+  against. Quote it; if it's vague, that's the first finding (point at `/canvas`).