bossbuild 0.97.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (128) hide show
  1. package/LICENSE +21 -0
  2. package/PRINCIPLES.md +70 -0
  3. package/README.md +213 -0
  4. package/VERSION +1 -0
  5. package/bin/boss +3 -0
  6. package/library/README.md +19 -0
  7. package/library/agents/.gitkeep +0 -0
  8. package/library/agents/mentor-venture.md +57 -0
  9. package/library/hooks/.gitkeep +0 -0
  10. package/library/hooks/auto-log.js +133 -0
  11. package/library/hooks/memory-cue.js +82 -0
  12. package/library/hooks/secrets-guard.js +87 -0
  13. package/library/memory-seed/README.md +29 -0
  14. package/library/memory-seed/durable-facts-example.md +16 -0
  15. package/library/practices/.gitkeep +0 -0
  16. package/library/practices/agent-security.md +111 -0
  17. package/library/practices/ai-adoption-culture.md +104 -0
  18. package/library/practices/ai-ux-patterns.md +246 -0
  19. package/library/practices/celebration-of-done.md +100 -0
  20. package/library/practices/conscience-voicing.md +121 -0
  21. package/library/practices/context-discipline.md +116 -0
  22. package/library/practices/design-system.md +152 -0
  23. package/library/practices/git-workflow.md +119 -0
  24. package/library/practices/harm-taxonomy.md +45 -0
  25. package/library/practices/quality-ratchet.md +48 -0
  26. package/library/practices/revalidation.md +57 -0
  27. package/library/practices/scalable-architecture.md +111 -0
  28. package/library/practices/ship-it-live.md +149 -0
  29. package/library/practices/skill-authoring.md +70 -0
  30. package/library/skills/.gitkeep +0 -0
  31. package/library/skills/boss-learn/SKILL.md +63 -0
  32. package/library/skills/boss-sync/SKILL.md +48 -0
  33. package/package.json +49 -0
  34. package/registry/CHANGELOG.md +2737 -0
  35. package/src/board.js +655 -0
  36. package/src/brain.js +288 -0
  37. package/src/cli.js +542 -0
  38. package/src/conscience.js +426 -0
  39. package/src/insights.js +147 -0
  40. package/src/learn.js +92 -0
  41. package/src/map.js +103 -0
  42. package/src/modes.js +82 -0
  43. package/src/paths.js +36 -0
  44. package/src/registry.js +34 -0
  45. package/src/scaffold.js +138 -0
  46. package/src/sync.js +292 -0
  47. package/src/team.js +103 -0
  48. package/stages/L0-quickstart/manifest.json +12 -0
  49. package/stages/L0-quickstart/template/.claude/agents/coder-generalist.md +31 -0
  50. package/stages/L0-quickstart/template/.claude/agents/mentor-venture.md +57 -0
  51. package/stages/L0-quickstart/template/.claude/agents/pm.md +28 -0
  52. package/stages/L0-quickstart/template/.claude/hooks/conscience.js +89 -0
  53. package/stages/L0-quickstart/template/.claude/hooks/lib/loop-runtime.js +507 -0
  54. package/stages/L0-quickstart/template/.claude/hooks/lib/yaml.js +163 -0
  55. package/stages/L0-quickstart/template/.claude/hooks/memory-cue.js +82 -0
  56. package/stages/L0-quickstart/template/.claude/hooks/secrets-guard.js +87 -0
  57. package/stages/L0-quickstart/template/.claude/rules/your-app-code.md +17 -0
  58. package/stages/L0-quickstart/template/.claude/settings.json +36 -0
  59. package/stages/L0-quickstart/template/.claude/skills/boss/SKILL.md +161 -0
  60. package/stages/L0-quickstart/template/.claude/skills/boss-learn/SKILL.md +63 -0
  61. package/stages/L0-quickstart/template/.claude/skills/boss-sync/SKILL.md +55 -0
  62. package/stages/L0-quickstart/template/.claude/skills/canvas/SKILL.md +112 -0
  63. package/stages/L0-quickstart/template/.claude/skills/comprehend/SKILL.md +72 -0
  64. package/stages/L0-quickstart/template/.claude/skills/decide/SKILL.md +122 -0
  65. package/stages/L0-quickstart/template/.claude/skills/feedback/SKILL.md +68 -0
  66. package/stages/L0-quickstart/template/.claude/skills/import/SKILL.md +73 -0
  67. package/stages/L0-quickstart/template/.claude/skills/persona/SKILL.md +92 -0
  68. package/stages/L0-quickstart/template/.claude/skills/prototype/SKILL.md +114 -0
  69. package/stages/L0-quickstart/template/.claude/skills/triage/SKILL.md +104 -0
  70. package/stages/L0-quickstart/template/.claude/skills/welcome/SKILL.md +262 -0
  71. package/stages/L0-quickstart/template/AGENTS.md +31 -0
  72. package/stages/L0-quickstart/template/CLAUDE.md +57 -0
  73. package/stages/L0-quickstart/template/docs/IDS.md +42 -0
  74. package/stages/L0-quickstart/template/docs/ideas/INDEX.md +24 -0
  75. package/stages/L0-quickstart/template/docs/loops/canvas-loop.md +90 -0
  76. package/stages/L0-quickstart/template/docs/loops/capture-loop.md +64 -0
  77. package/stages/L1-mvp/manifest.json +12 -0
  78. package/stages/L1-mvp/template/.claude/agents/mentor-architect.md +124 -0
  79. package/stages/L1-mvp/template/.claude/agents/mentor-cofounder.md +85 -0
  80. package/stages/L1-mvp/template/.claude/agents/mentor-gtm.md +49 -0
  81. package/stages/L1-mvp/template/.claude/agents/program-manager.md +46 -0
  82. package/stages/L1-mvp/template/.claude/agents/tester.md +42 -0
  83. package/stages/L1-mvp/template/.claude/hooks/auto-log.js +133 -0
  84. package/stages/L1-mvp/template/.claude/rules/feature-context.md +18 -0
  85. package/stages/L1-mvp/template/.claude/skills/ai-cost/SKILL.md +249 -0
  86. package/stages/L1-mvp/template/.claude/skills/ai-failure-states/SKILL.md +226 -0
  87. package/stages/L1-mvp/template/.claude/skills/ai-first-init/SKILL.md +227 -0
  88. package/stages/L1-mvp/template/.claude/skills/close/SKILL.md +170 -0
  89. package/stages/L1-mvp/template/.claude/skills/consult/SKILL.md +72 -0
  90. package/stages/L1-mvp/template/.claude/skills/cost-review/SKILL.md +204 -0
  91. package/stages/L1-mvp/template/.claude/skills/design-tokens-init/SKILL.md +192 -0
  92. package/stages/L1-mvp/template/.claude/skills/drift-deep/SKILL.md +170 -0
  93. package/stages/L1-mvp/template/.claude/skills/evals/SKILL.md +154 -0
  94. package/stages/L1-mvp/template/.claude/skills/extract/SKILL.md +209 -0
  95. package/stages/L1-mvp/template/.claude/skills/judge-traces/SKILL.md +68 -0
  96. package/stages/L1-mvp/template/.claude/skills/log/SKILL.md +64 -0
  97. package/stages/L1-mvp/template/.claude/skills/practice/SKILL.md +92 -0
  98. package/stages/L1-mvp/template/.claude/skills/pretotype/SKILL.md +95 -0
  99. package/stages/L1-mvp/template/.claude/skills/red-team/SKILL.md +137 -0
  100. package/stages/L1-mvp/template/.claude/skills/revalidate/SKILL.md +51 -0
  101. package/stages/L1-mvp/template/.claude/skills/ship/SKILL.md +105 -0
  102. package/stages/L1-mvp/template/.claude/skills/smoke/SKILL.md +43 -0
  103. package/stages/L1-mvp/template/.claude/skills/spec/SKILL.md +145 -0
  104. package/stages/L1-mvp/template/claude-append.md +122 -0
  105. package/stages/L1-mvp/template/docs/loops/ai-failure-state-loop.md +107 -0
  106. package/stages/L1-mvp/template/docs/loops/coordination-loop.md +116 -0
  107. package/stages/L1-mvp/template/docs/loops/cost-budget-loop.md +117 -0
  108. package/stages/L1-mvp/template/docs/loops/cost-review-loop.md +113 -0
  109. package/stages/L1-mvp/template/docs/loops/design-tokens-loop.md +98 -0
  110. package/stages/L1-mvp/template/docs/loops/drift-loop.md +149 -0
  111. package/stages/L1-mvp/template/docs/loops/extraction-loop.md +128 -0
  112. package/stages/L1-mvp/template/docs/loops/focus-loop.md +106 -0
  113. package/stages/L1-mvp/template/docs/loops/pretotype-loop.md +88 -0
  114. package/stages/L1-mvp/template/docs/loops/spec-loop.md +83 -0
  115. package/stages/L2-v1/manifest.json +12 -0
  116. package/stages/L2-v1/template/.claude/agents/db-architect.md +91 -0
  117. package/stages/L2-v1/template/.claude/agents/mentor-business.md +124 -0
  118. package/stages/L2-v1/template/.claude/agents/mentor-fundraising.md +72 -0
  119. package/stages/L2-v1/template/.claude/agents/mentor-pitch.md +84 -0
  120. package/stages/L2-v1/template/.claude/agents/mentor-talent.md +84 -0
  121. package/stages/L2-v1/template/.claude/agents/ui-designer.md +81 -0
  122. package/stages/L2-v1/template/.claude/agents/ux-designer.md +87 -0
  123. package/stages/L2-v1/template/.claude/skills/board/SKILL.md +98 -0
  124. package/stages/L2-v1/template/.claude/skills/design-review/SKILL.md +77 -0
  125. package/stages/L2-v1/template/.claude/skills/ux-check/SKILL.md +93 -0
  126. package/stages/L2-v1/template/claude-append.md +59 -0
  127. package/stages/L2-v1/template/docs/loops/design-drift-loop.md +108 -0
  128. package/stages/L3-scale/README.md +13 -0
@@ -0,0 +1,204 @@
1
+ ---
2
+ name: cost-review
3
+ description: Read the AI cost ledger and produce a dated review. Reads .boss/cost-log.jsonl (the per-call ledger /ai-cost wires), summarizes spend by FEAT + user + cohort, compares against docs/ai-cost-budget.md, flags overages and surprises, and writes docs/cost-reviews/REVIEW-YYYY-MM-DD.md. Closes the cadence that /ai-cost only declared. Cohort-aware. Usage - /cost-review
4
+ ---
5
+
6
+ # /cost-review — read the bill, not just the budget
7
+
8
+ `/ai-cost` writes the **budget** — what you intend to spend. `/cost-review` reads the
9
+ **ledger** — what you actually spent. The discipline only works when both halves run.
10
+
11
+ The v0.25 audit named this as a gap: *"the weekly review cadence is declared in `/ai-cost` but
12
+ no skill reads `.boss/cost-log.jsonl`. The cadence is unenforced."* This skill closes that. It
13
+ doesn't replace the founder's judgment — it surfaces the numbers so the judgment has data.
14
+
15
+ ## When to run it
16
+
17
+ - The conscience surfaced the `cost-stale` moment (cost-review-loop opened — you have a
18
+ budget declared but no review on record).
19
+ - Weekly during MVP — pair with `/close` at session end on the first session of each week.
20
+ The ritual is what makes the discipline stick.
21
+ - Monthly during V1 — daily totals; cost-per-active-user; cost-as-%-of-revenue if pricing
22
+ exists.
23
+ - After a real-bill surprise. The invoice IS the signal; this skill is the post-mortem doc.
24
+ - After any change to model selection, prompt structure, caching layer, or rate-limit
25
+ posture. Cost shapes shift; the ledger should be re-read.
26
+
27
+ ## How to run it
28
+
29
+ ### 1. Orient
30
+
31
+ Read, silently:
32
+ - `docs/ai-cost-budget.md` — the declared contract (per-user/day, monthly cap, model rows,
33
+ cohort, levers checked off).
34
+ - `.boss/cost-log.jsonl` — the running ledger written by the logger wrapper.
35
+ - `docs/cost-reviews/` (if it exists) — prior reviews; the cadence trend lives here.
36
+ - `.boss/config.json` — cohort (decides framing) and any conscience pause state.
37
+
38
+ Don't announce these reads. Just orient.
39
+
40
+ ### 2. Parse the ledger
41
+
42
+ `.boss/cost-log.jsonl` is one JSON object per line, each shape:
43
+
44
+ ```json
45
+ {
46
+ "ts": "2026-05-24T14:32:11Z",
47
+ "feat": "FEAT-007",
48
+ "model": "claude-haiku-4-5",
49
+ "userId": "u_abc",
50
+ "input_tokens": 1240,
51
+ "output_tokens": 89,
52
+ "estimated_usd": 0.0017
53
+ }
54
+ ```
55
+
56
+ Aggregate three ways:
57
+ - **By FEAT** — which features dominate spend?
58
+ - **By user (or `userId == null` for dev calls)** — is there a power-user skewing the
59
+ numbers? Is the dev/test traffic noise level reasonable?
60
+ - **By model** — is the model-mix matching the budget's stated rationale, or did a recent
61
+ change shift toward a more expensive model?
62
+
63
+ Compute totals: last 7 days, last 30 days, since-last-review (if any).
64
+
65
+ ### 3. Compare against the budget
66
+
67
+ For each declared budget line in `docs/ai-cost-budget.md`, name the actual:
68
+ - **Per-user/day budget vs. observed:** declared $X, observed $Y. *Under / at / over.*
69
+ Compute the percentage of users above the alert threshold.
70
+ - **Monthly cap vs. observed run-rate:** declared $X cap, observed run-rate $Y/month
71
+ (extrapolate from the last 14 days).
72
+ - **Per-model rationale check:** for each call-site row in the budget, look at actual
73
+ usage. If the budget says *"using Sonnet because Haiku fails this classifier,"* but the
74
+ ledger shows 90% Haiku calls, the rationale is stale.
75
+
76
+ ### 4. Flag surprises
77
+
78
+ The whole point of the review is to find things you'd otherwise miss. Look for:
79
+ - **Cost outliers** — calls > 10x the median. Probable causes: prompt injection eating
80
+ context, runaway output, a workflow that's looping when it shouldn't.
81
+ - **User outliers** — single users spending > 10x the cohort median. Either a power user
82
+ (interesting; flag for product) or a misbehaving client (interesting; flag for fix).
83
+ - **FEAT skew** — one FEAT eating > 80% of spend that wasn't designed to. Often a sign
84
+ the cheaper-model A/B never got run.
85
+ - **Quiet drift** — gradual upward trend over weeks with no FEAT explanation. Often a model
86
+ swap or prompt change that wasn't captured in the budget doc.
87
+
88
+ ### 5. Write the review file
89
+
90
+ `docs/cost-reviews/REVIEW-{{DATE}}.md`:
91
+
92
+ ```markdown
93
+ ---
94
+ id: REVIEW-{{DATE}}
95
+ type: cost-review
96
+ owner: pm
97
+ status: recorded
98
+ created: {{DATE}}
99
+ budget_version: <date the budget doc was last updated>
100
+ window: <last 7 days | since-last-review | custom>
101
+ ---
102
+
103
+ # AI cost review — {{PROJECT_NAME}} — {{DATE}}
104
+
105
+ ## Headline
106
+ _One sentence the founder reads in the inbox / Slack scroll. If only this line is read, the
107
+ review should still land. Examples: "On-budget; one outlier user worth investigating."
108
+ "Over by 18%; classifier FEAT is the culprit; consider Haiku A/B." "Under; safe to ship the
109
+ deferred AI feature."_
110
+
111
+ ## Numbers
112
+ - **Window:** <date range>
113
+ - **Total spend:** $X.XX (<N> calls, <M> distinct users, <K> distinct FEATs)
114
+ - **Per-user/day:** observed $X.XX (median) / $Y.YY (p95) vs. declared budget $Z.ZZ
115
+ - **Monthly run-rate:** $X.XX/month projected vs. declared cap $Y.YY
116
+ - **By FEAT** (top 3): FEAT-NNN $X, FEAT-MMM $Y, FEAT-PPP $Z
117
+ - **By model** (top 3): <model> $X (<%>), <model> $Y (<%>), <model> $Z (<%>)
118
+
119
+ ## Variance against budget
120
+ - <line for each declared budget item: under / at / over + delta>
121
+
122
+ ## Surprises
123
+ _Concrete findings from §4 above. One bullet per real surprise. Empty section is honest;
124
+ "no surprises" is a finding worth recording when it's true._
125
+ - <example: "FEAT-007 cost 4x its expected median — single user ran a 14k-token prompt repeatedly; investigate.">
126
+ - <example: "Sonnet share grew from 30% → 65% over the window; no budget update recorded — stale rationale.">
127
+
128
+ ## Actions
129
+ _What the founder is doing about it. Concrete, dated. The point of the review is to drive
130
+ action, not to file numbers._
131
+ - [ ] <action 1 — owner — by when>
132
+ - [ ] <action 2>
133
+
134
+ ## Next review
135
+ - Cadence: weekly / monthly / event-driven
136
+ - Next planned: <YYYY-MM-DD>
137
+ ```
138
+
139
+ ### 6. Update the budget doc if the rationale shifted
140
+
141
+ If §3's per-model rationale check flagged stale rows (the budget says *"using model X because
142
+ Y"* but the ledger shows the actual usage diverged), update `docs/ai-cost-budget.md`'s
143
+ model-choices table. The rationale should match reality OR explicitly name the gap with a
144
+ TODO + a check-by date.
145
+
146
+ ### 7. If overages are real — surface the mentor handoff
147
+
148
+ When the review shows overages > 10% of monthly cap:
149
+ - `mentor-architect` for **shape**: is this a batching/caching/cheaper-fallback issue?
150
+ *"`mentor-architect`, our cost shape is breaking the budget; what architecture decisions
151
+ does that imply?"*
152
+ - `mentor-business` for **economics**: does the cost-per-active-user fit the pricing?
153
+ *"`mentor-business`, our cost-per-active-user is $X; what should the pricing carry?"*
154
+
155
+ Don't auto-invoke either. Surface the question; let the founder decide whether to consult.
156
+
157
+ ## Cohort-aware delivery
158
+
159
+ | Cohort | Posture |
160
+ |---|---|
161
+ | `first-product` | Plain-dollar framing. "You spent $X this week; budget was $Y." Define alert threshold inline. Lean conservative. |
162
+ | `vibe-coder-newbie` | Show the top 3 cost outliers with one-line *"this is probably because..."* explanations. Teach by example. |
163
+ | `non-tech-founder` | Plain language. "Each user costs ~$X/day on average; one user cost $Y this week (review their usage)." No token-math jargon. |
164
+ | `eng-builder` | Inspectable. Show raw numbers + percentiles; let them choose the interpretation. They'll write their own follow-ups. |
165
+ | `vibe-virtuoso` | They want to ship faster. Lead with the *biggest lever* (the one optimization that would unblock budget) — not the full inventory. |
166
+ | `indie-hacker` | Cost-as-%-of-revenue (or path-to-revenue) framing. Calm-company math: *"current run-rate is N% of MRR per user; sustainable margin = <X%>; you're at <Y%>."* |
167
+ | `returning-founder` | Unit economics framing. Cost-per-acquired-user + cost-per-active-user; LTV/CAC implications. Skip the explanation. |
168
+ | `domain-expert` | **Privacy-first framing first.** Confirm the ledger contains no PII / prompt bodies before showing any review data. Then standard surfacing. Cite the regulatory context if relevant. |
169
+
170
+ ## Connection to other loops + skills
171
+
172
+ - **Upstream:** `cost-budget-loop` closed (budget doc + logger wired). Without that, there's
173
+ nothing to review.
174
+ - **Same loop:** `cost-review-loop` — opens when the budget exists but no review has been
175
+ recorded. Closes when this skill writes the first review file.
176
+ - **Adjacent:** `/ai-cost` (re-run if the budget shape needs to change based on review
177
+ findings); `/ai-failure-states` (cost-spike handler is in scope here too — if reviews show
178
+ recurring cost-spikes, the handler should fire).
179
+
180
+ ## What this skill is NOT
181
+
182
+ - **Not a real-time dashboard.** This is a periodic review pattern. For real-time monitoring
183
+ the founder ships the ledger to an actual datastore + observability layer when users are
184
+ real. The skill is the discipline; the tooling scales beyond it.
185
+ - **Not a budget-enforcement gate.** It surfaces variance; it doesn't block calls. The
186
+ override grammar (per IDEA-008) applies — if the review shows a legitimate overage (single
187
+ product launch event, expected spike), record the override in the review file itself.
188
+ - **Not a substitute for the budget doc.** `/ai-cost` declares; this skill reads. Both halves
189
+ required; running only one is half the discipline.
190
+
191
+ ## Rules
192
+
193
+ - **Numbers before narrative.** Lead with the top-line spend, then the percentile, then the
194
+ variance. The founder's eye goes to numbers; structure for that.
195
+ - **Surprises are the point.** A review with no surprises section is a stamp; a review with
196
+ named surprises is a tool. Empty *"no surprises"* is honest when it's true; never fabricate.
197
+ - **Actions are dated.** *"Improve cost"* is not an action. *"A/B Haiku on FEAT-007 by
198
+ 2026-06-15"* is.
199
+ - **Privacy-first for domain-expert cohort.** No PII, no prompt bodies, no model output text
200
+ in the review. Token counts + metadata only.
201
+ - **The review IS the cadence.** A skill that's run once isn't enforcing a cadence — it's a
202
+ one-off. Pair with `/close` weekly; consider `/cost-review` part of session-end ritual.
203
+ - **Cite the budget version.** Reviews compare against a specific budget snapshot. When the
204
+ budget changes, name which version this review was against.
@@ -0,0 +1,192 @@
1
+ ---
2
+ name: design-tokens-init
3
+ description: Scaffold the minimal three-layer design token system at the first UI commit. Prevents the 47-blues / pattern-reinvention / billion-line-drift failure modes that happen by default when AI generates UI without discipline (IDEA-010). Cohort-aware delivery — vibe-coder-newbie gets SHOWING; eng-builder gets OFFERING; vibe-virtuoso gets OVERRIDE-FRIENDLY. JIT — runs when design-tokens-loop opens. Usage - /design-tokens-init
4
+ ---
5
+
6
+ # /design-tokens-init — scaffold the design tokens system
7
+
8
+ The discipline that prevents the most-common AI-generated-UI failure modes. Without it, every new
9
+ screen Claude generates will derive its own colors, spacing, and component patterns — and the
10
+ codebase grows linearly with screens (the "billion-line drift" from IDEA-010). With it, the
11
+ tokens file is the single source of truth that survives AI generation because the AI is
12
+ *explicitly told to use it.*
13
+
14
+ This skill is **JIT**: ships in MVP mode, but only runs when `design-tokens-loop` opens (i.e.,
15
+ the project starts accumulating UI without a tokens file). For a backend-only project, this skill
16
+ never runs. For a UI-heavy project, it runs at the first UI commit.
17
+
18
+ ## When to run it
19
+
20
+ - `design-tokens-loop` is open (UI exists, no tokens file).
21
+ - Founder ran the skill explicitly because they want the system before any UI.
22
+ - After a design audit revealed drift — re-init to consolidate.
23
+
24
+ ## How to run it — cohort-aware delivery (v0.20+)
25
+
26
+ Read `.boss/config.json` for the `cohort` field. Adjust the delivery accordingly:
27
+
28
+ ### `vibe-coder-newbie` / `first-product` — SHOW
29
+
30
+ Generate the tokens file + a worked example. Don't just create — refactor one existing component
31
+ to use the new tokens so the founder *sees* the pattern. Plain language:
32
+
33
+ > *"I'm setting up a tokens file. Colors, spacing, typography in one place. I'll also refactor
34
+ > your `<existing-button>` to use the tokens — so you can see what the pattern looks like.
35
+ > From now on, when you ask me for new UI, I'll reference these tokens. New colors only get
36
+ > added here; never inline. Sound good?"*
37
+
38
+ ### `eng-builder` / `returning-founder` — OFFER + SKIP THE 101
39
+
40
+ > *"Setting up three-layer tokens: primitives → semantic → component. Standard architecture.
41
+ > Want me to scaffold the Style Dictionary config too, or are you using your own pipeline?"*
42
+
43
+ ### `vibe-virtuoso` — OVERRIDE-FRIENDLY
44
+
45
+ > *"Tokens system, three layers. The override pattern's there if you want to skip — record
46
+ > rationale in the devlog. Most likely failure mode you'll hit if you skip: 47 blues by sprint
47
+ > 3. Your call."*
48
+
49
+ ### `non-tech-founder` / `domain-expert` — PLAIN-LANGUAGE COACH
50
+
51
+ > *"Quick setup before the UI accumulates. The problem this prevents: as you build more
52
+ > screens, each one will use slightly different colors and spacing unless we put them in one
53
+ > file the AI reads every time. I'll do that now — takes a minute. Domain-specific note: if
54
+ > your colors carry meaning (e.g., medical severity, regulated states), name them by meaning
55
+ > in the tokens, not by hue."*
56
+
57
+ ### `indie-hacker` — RIGHT-SIZED
58
+
59
+ > *"Minimal tokens setup. Stack-portable — won't lock you into a tooling choice. I'll skip the
60
+ > heavy Style Dictionary / Theo machinery; we can add it later if the system earns it."*
61
+
62
+ ### Unspecified — neutral plain language
63
+
64
+ > *"Setting up a tokens file so style stays one source of truth. Reduces the chance the
65
+ > codebase grows 47 shades of blue. Takes about a minute."*
66
+
67
+ ## The minimum three-layer token system (Nathan Curtis layer-cake)
68
+
69
+ Create `docs/design/DESIGN_TOKENS.md` (the human-readable spec) + the stack-specific token file
70
+ (JSON / CSS variables / TypeScript / etc. depending on stack):
71
+
72
+ ### Layer 1 — Primitives (raw values)
73
+
74
+ Color scales, spacing scale, typography scale, radius scale, elevation, motion. Names are
75
+ descriptive of the *value*, not the use:
76
+
77
+ ```yaml
78
+ color.gray.100: '#f7f7f7'
79
+ color.gray.500: '#888888'
80
+ color.blue.500: '#3B82F6'
81
+ spacing.s: 4
82
+ spacing.m: 8
83
+ spacing.l: 16
84
+ ```
85
+
86
+ ### Layer 2 — Semantic (meaning)
87
+
88
+ Names describe the *role*, not the value. AI uses these. **This is the AI-tolerant layer.**
89
+
90
+ ```yaml
91
+ color.action.primary: { ref: color.blue.500 }
92
+ color.surface.background: { ref: color.gray.100 }
93
+ color.surface.foreground: { ref: color.gray.900 }
94
+ color.text.body: { ref: color.gray.700 }
95
+ color.text.muted: { ref: color.gray.500 }
96
+ spacing.element: { ref: spacing.s }
97
+ spacing.section: { ref: spacing.l }
98
+ ```
99
+
100
+ ### Layer 3 — Component (occasionally needed)
101
+
102
+ Component-specific tokens when a component has unique constraints. Most projects don't need
103
+ this layer until V1. If you're reaching for it at MVP, you're probably premature.
104
+
105
+ ### The brand-anchor
106
+
107
+ Read `docs/ideas/CANVAS.md` Promises cell. The brand voice declared there should anchor the
108
+ *choice of primitives*. Don't default to Tailwind blues; pick primitives that match the
109
+ declared brand. If the canvas's Promises cell is `_(not yet)_`, flag it — design tokens without
110
+ a brand anchor will drift back to internet-defaults.
111
+
112
+ ### The 5-token distinctiveness pass (2026 — the "shadcn trap" fix)
113
+
114
+ The fastest tell of a generic AI app: shadcn defaults straight out of the box — slate neutrals,
115
+ Inter, 8px radius, indigo accent. AI codegen reaches for these because they're the internet
116
+ average. Five deliberate overrides break the sameness without a redesign — do this *as part of
117
+ picking primitives*, anchored to the canvas Promises cell:
118
+
119
+ 1. **Warm (or cool) the neutral scale** — pick a neutral with a temperature, not pure slate gray.
120
+ 2. **Choose a radius on purpose** — sharp (0–2px) or soft (12px+), not the default 8.
121
+ 3. **Intentional type pairing** — one characterful display/heading face + a clean body face; not
122
+ Inter-on-Inter.
123
+ 4. **One saturated accent** the brand owns — a single confident color, not indigo-by-default.
124
+ 5. **One "signature token" the defaults omit** — a texture, a shadow language, a custom easing
125
+ curve. This is the brandable one; it's what makes the UI *yours*.
126
+
127
+ Cohort-aware: `vibe-coder-newbie` / `first-product` → just *do* the 5 overrides and show the
128
+ before/after. `eng-builder` / `vibe-virtuoso` → *offer* them as a checklist, override-friendly.
129
+ `domain-expert` → keep it sober; distinctiveness ≠ playful when the stakes are clinical.
130
+
131
+ ### Name by purpose, build on the standard (DTCG)
132
+
133
+ Name semantic tokens by **purpose, not appearance** — `color.text.error`, never `color.red.500`
134
+ at the semantic layer — so the AI can *reason* about intent rather than guess from a hue (Nathan
135
+ Curtis). Where the stack allows, emit tokens in the **W3C DTCG** format (it reached a first stable
136
+ version) so the system is portable and you're not locked into one vendor's token dialect — which
137
+ also honors the canvas's "don't monetize lock-in" line.
138
+
139
+ ## After scaffolding
140
+
141
+ 1. Update the project's CLAUDE.md (or claude-append.md) to declare the token discipline:
142
+
143
+ ```markdown
144
+ ## Design tokens (added by /design-tokens-init)
145
+
146
+ This project uses three-layer design tokens. **All style values come from
147
+ `docs/design/DESIGN_TOKENS.md` + the corresponding code file.**
148
+
149
+ When generating new UI:
150
+ 1. Search `src/components/` for similar patterns first; reuse before creating.
151
+ 2. Reference tokens by semantic name (`color.action.primary`), never raw hex.
152
+ 3. Specify all 5 states (default / hover / active / disabled / empty) explicitly.
153
+ 4. Reject "make it pretty" prompts; anchor to the canvas Promises cell.
154
+
155
+ Semantic → primitive map (the AI reads this without opening another file):
156
+ | semantic | primitive |
157
+ |--------------------------|------------------|
158
+ | color.action.primary | <your accent> |
159
+ | color.surface.background | <your neutral> |
160
+ | color.text.body | <your ink> |
161
+ | radius.default | <your radius> |
162
+ | font.display / font.body | <your pairing> |
163
+ | <signature token> | <the one that's yours> |
164
+ ```
165
+ Inlining the map in CLAUDE.md (not just `DESIGN_TOKENS.md`) means the agent inherits the
166
+ brand for free on every turn — the single most useful artifact for a Claude-Code-native scaffold.
167
+
168
+ 2. The `design-tokens-loop` exit predicate now passes — loop closes.
169
+
170
+ 3. Going forward, `design-drift-loop` (V1) watches for drift (raw hex codes appearing, near-
171
+ duplicate components, tokens file untouched while components grow).
172
+
173
+ ## Rules
174
+
175
+ - **Three layers, not two.** Two-layer (primitives → component) is fragile under AI generation.
176
+ Three layers (primitives → semantic → component) gives AI a meaningful name to grab.
177
+ - **Brand-anchor the primitives.** Don't default to internet-aesthetic. Canvas Promises cell IS
178
+ the brief.
179
+ - **Tokens file LIVES in the prompt context.** When asking AI for UI work, *always* pass
180
+ `DESIGN_TOKENS.md` as context. This single discipline prevents 80% of the drift.
181
+ - **JIT — only scaffold when needed.** Don't init tokens before there's UI to use them. The
182
+ loop's entry predicate is the trigger; don't pre-empt it.
183
+ - **Override is recorded, not blocked.** A founder skipping this skill is legitimate; record
184
+ in devlog with substantive rationale.
185
+ - **Run the 5-token distinctiveness pass.** Three layers prevent drift; the 5 overrides prevent
186
+ *sameness* (the generic-AI-app look). Both, not one. The "signature token" is the brandable one.
187
+ - **Name by purpose; emit DTCG where the stack allows.** Semantic tokens reason about intent
188
+ (`color.text.error`), not hue (`color.red.500`); DTCG keeps it portable (no lock-in).
189
+ - **Cite the field.** Brad Frost (Atomic Design), Nathan Curtis (layer-cake + purpose-naming), W3C
190
+ Design Tokens Community Group (DTCG stable). 2026 distinctiveness tactic per the AI-UX scan. Not
191
+ BOSS's inventions — applied with build-integration. Interaction-layer companion:
192
+ [`library/practices/ai-ux-patterns.md`](../../../../../library/practices/ai-ux-patterns.md).
@@ -0,0 +1,170 @@
1
+ ---
2
+ name: drift-deep
3
+ description: The deep, whole-project version of the conscience's drift check — "am I fooling myself across EVERYTHING I've built?" Reads the entire project (canvas, all devlog, all FEAT specs, the actual code, the ideas) and judges, honestly, whether the body of work is validating the named riskiest assumption or building around it. The deliberate, founder-invoked counterpart to the cheap always-on `drift` hook moment (which reads only the last ~5 entries). Uses the model's full context, not a bounded peek. Writes docs/drift-audits/DRIFT-YYYY-MM-DD.md. Run when you want the real audit, not the tripwire. Usage - /drift-deep
4
+ ---
5
+
6
+ # /drift-deep — the whole-project "am I fooling myself?" audit
7
+
8
+ > *"BOSS helps founders build faster without fooling themselves — continuously checking the work
9
+ > against real pain, real workflows, real buyers, real economics, real distribution."* — PRINCIPLES.md.
10
+
11
+ The conscience's `drift` hook moment is a **cheap tripwire**: it reads ~5 recent entries and asks
12
+ "you named a risk, you're piling work, nothing tests it — is the recent work on-aim?" That's the
13
+ right thing to fire unprompted on every prompt, because it's affordable to.
14
+
15
+ This is the **deep version** — the one a bounded read can't do. You run it deliberately when you
16
+ want the honest, whole-body-of-work audit: *read everything I've built and tell me, across the
17
+ entire project, whether I'm actually validating my riskiest bet or quietly building around it.* It
18
+ uses the model's full context, not a peek. It costs more — which is exactly why it's a skill you
19
+ **invoke**, not a moment that fires automatically. (That restraint is the point; the conscience
20
+ must not become the expensive-AI app it warns founders about.)
21
+
22
+ It's broader than the hook moment in two ways:
23
+ - **It runs even when a validation plan exists.** A plan on the canvas closes the cheap gate — but
24
+ did you *execute* it, or write the experiment line and then drift from running it? The deep audit
25
+ judges the work, not the presence of a plan-line.
26
+ - **It reads the actual code, not just the devlog.** What you *built* is the ground truth of what
27
+ you bet on. The deep pass reads `src/` structurally and checks it against the named risk.
28
+
29
+ ## When to run it
30
+
31
+ - Before a mode unlock, a fundraise conversation, a milestone — any moment you want an honest "where
32
+ do I actually stand against my bet" read, not a glance.
33
+ - When the cheap `drift` moment has fired a few times (check `boss conscience activity`) and you
34
+ want the thorough version instead of the recurring nudge.
35
+ - When you *have* a validation plan but suspect you've drifted from executing it.
36
+ - Periodically on a longer project — the bounded hook moment can miss slow, cumulative drift that's
37
+ only visible across the whole body of work.
38
+ - On BOSS itself (self-hosted) at each capability arc.
39
+
40
+ ## How to run it
41
+
42
+ ### 1. Read the whole project (this is the 1M-context move — read widely, on purpose)
43
+
44
+ Unlike the hook moment's bounded ~5-entry read, here you read **everything that bears on the bet**:
45
+
46
+ - **The canvas** — `docs/ideas/*-canvas.md`: the riskiest assumption (the bet), the experiment line
47
+ (the plan, if any), and the surrounding cells (people, problem, metrics, risks & harms).
48
+ - **The full devlog** — `docs/devlog.md`, *all* entries, not the last five. The arc, not the tail.
49
+ - **Every FEAT spec** — `docs/specs/FEAT-*.md` (or wherever specs live): what was committed to build
50
+ and its acceptance criteria. Specs are the stated intent; compare against the bet.
51
+ - **The actual code** — `src/` structurally: what modules/features exist, what the app *does*. What
52
+ you built is the truest record of what you bet on. Read the shape, not every line; follow the
53
+ parts that touch (or conspicuously avoid) the named risk.
54
+ - **The ideas + extractions + cost/drift history** — `docs/ideas/`, `docs/extractions/`,
55
+ `docs/cost-reviews/`, prior `docs/drift-audits/` — the thinking record.
56
+
57
+ Read silently. For a large project, prioritize by relevance to the named risk, not by recency
58
+ alone — but don't quietly cap coverage; if you skip areas, say which and why in the audit.
59
+
60
+ ### 2. Judge each area of work against the bet
61
+
62
+ For each meaningful body of work (a FEAT, a devlog cluster, a src/ area), ask the one question the
63
+ predicate can't: **does this *test* the riskiest assumption, or does it build *around* it?**
64
+
65
+ Be specific and honest. "Built the auth system" doesn't test "teams will switch from spreadsheets" —
66
+ it's necessary plumbing, but it's not validation. "Ran 4 migration sessions with target teams" does.
67
+ The judgment is: of everything built, how much points at the bet vs. away from it?
68
+
69
+ Watch for the sharpest drift signal: **building the very thing the risk says to defer** (e.g. the
70
+ risk is "buyers pay before integrations" and you built the integration framework). Name it plainly.
71
+
72
+ ### 3. Reach a verdict — on-aim / drifting / mixed
73
+
74
+ - **on-aim** — the body of work is substantially testing the named risk. Say so plainly; this is
75
+ the affirming read, the deep counterpart to staying silent. Don't manufacture concern.
76
+ - **drifting** — the work has been building around the bet. Name *where* and *what's missing*.
77
+ - **mixed** — some work tests the bet, some drifts. The common real case. Quantify honestly.
78
+
79
+ The verdict is a judgment, not a score. State your confidence and what you couldn't see.
80
+
81
+ ### 4. Name the smallest re-aim
82
+
83
+ If drifting or mixed: the single next step that would point the work back at the risk — the smallest
84
+ experiment, interview, or instrument that tests the bet. Point at `/canvas` (to set or sharpen the
85
+ experiment line) or `/pretotype` (to run a cheap demand test). One step, not a plan.
86
+
87
+ If on-aim: name what to keep doing, and the next checkpoint where drift could creep in.
88
+
89
+ ### 5. Write `docs/drift-audits/DRIFT-YYYY-MM-DD.md`
90
+
91
+ ```markdown
92
+ ---
93
+ id: DRIFT-YYYY-MM-DD
94
+ type: drift-audit
95
+ owner: pm
96
+ status: recorded
97
+ created: {{DATE}}
98
+ verdict: on-aim | drifting | mixed
99
+ ---
100
+
101
+ # Drift audit — YYYY-MM-DD
102
+
103
+ ## The named bet
104
+ - **Riskiest assumption:** <verbatim from the canvas>
105
+ - **Validation plan on record:** <the experiment line, or "none recorded">
106
+
107
+ ## What the body of work actually does (vs. the bet)
108
+ | Area | What it does | Tests the bet? |
109
+ |---|---|---|
110
+ | <FEAT / devlog cluster / src area> | <one line> | yes / no / partial |
111
+ | … | … | … |
112
+
113
+ ## Verdict: <on-aim / drifting / mixed>
114
+ <2-4 sentences, honest. Where you started, what's solid, where the work points.>
115
+
116
+ ## The specific gaps
117
+ - <where the work diverged from the risk — name the work and why it doesn't test the bet>
118
+
119
+ ## Smallest re-aim
120
+ - <the one next step that points work at the risk> → `/canvas` (set the experiment) or
121
+ `/pretotype` (run it)
122
+
123
+ ## Confidence + what this audit couldn't see
124
+ - <areas skipped and why; ambiguity; cohort caveats>
125
+ ```
126
+
127
+ ## Cohort-aware delivery
128
+
129
+ | Cohort | Posture |
130
+ |---|---|
131
+ | `first-product` | Teach what "test your riskiest bet" *means* with examples from THIS project. The verdict is a mirror, never a grade — name what's solid first. Define terms inline. |
132
+ | `vibe-coder-newbie` | Show the per-area table concretely (their actual FEATs). Avoid abstract "validation" talk; say "does building X tell you whether Y is true?" |
133
+ | `non-tech-founder` | Plain language. Lead with the business bet, not the code. The audit is "are you building toward the thing that makes this work, or away from it?" |
134
+ | `eng-builder` | Terse, inspectable, evidence-first. They'll accept a hard verdict if the per-area evidence is concrete. Skip the teaching. |
135
+ | `vibe-virtuoso` | This is the cohort the deep audit serves most — they ship a lot and validate little. Be direct: of N things built, how many test the bet? Sharper question, not praise. |
136
+ | `indie-hacker` | Calm-company framing. "Is the work earning its keep against the bet?" Understatement; "this is fine" is high praise. |
137
+ | `returning-founder` | Skip the 101. The harder cut: "is the body of work at the level of conviction this bet needs — or are you busy?" They can take it straight. |
138
+ | `domain-expert` | High-stakes: an *un-validated* risk in a regulated domain is a who-could-be-harmed question, not just a business one. Lead with the humane lens — mentor-humane has override authority. Name the real-world stakes of the bet being wrong, conservatively. |
139
+
140
+ ## Connection to the conscience
141
+
142
+ - **The deep counterpart to the `drift` hook moment** (`drift-loop`). The hook moment is the cheap,
143
+ bounded, always-on tripwire; this is the deliberate, whole-project, founder-invoked audit. The
144
+ moment can *point here* for the thorough version.
145
+ - **Routes back to** `/canvas` (set/sharpen the experiment) and `/pretotype` (run the demand test).
146
+ - **Not a loop, not a new moment** — there is no `drift-deep-loop` and no nudge to "run your audit."
147
+ It's on-demand by design; making it a recurring obligation would be the premature ceremony BOSS
148
+ avoids. You run it when you want the truth, not when a predicate says to.
149
+
150
+ ## What this skill is NOT
151
+
152
+ - **Not the cheap moment.** Don't reach for `/drift-deep` reflexively — the hook `drift` moment is
153
+ the everyday tripwire. This is the deliberate, more-expensive audit. Restraint is the design.
154
+ - **Not a gate.** It judges and reports; it never blocks. If you disagree with the verdict, record
155
+ why (the override grammar) and move on.
156
+ - **Not graded by the judgment-eval surface.** Like `/extract`, this is a deliberate skill judgment;
157
+ its quality is tested in use, not by `conscience-evals/judgment/` (which covers the hook moments).
158
+
159
+ ## Rules
160
+
161
+ - **Read widely, then judge — don't judge from the tail.** The whole point is the full body of
162
+ work. If you only read recent entries, you've just done the cheap moment by hand.
163
+ - **The verdict is honest, not kind.** on-aim when it's earned; drifting when it's true. Manufacturing
164
+ concern erodes trust as much as missing real drift.
165
+ - **Name the work, not the founder.** "FEAT-003 builds X, which doesn't test the bet" — never "you
166
+ wasted time." It's an audit of the work against the bet.
167
+ - **No silent caps.** If the project's too big to read fully, say what you skipped and why — don't
168
+ let partial coverage read as a whole-project verdict.
169
+ - **Cite the bet verbatim.** The audit is only as honest as the riskiest-assumption it's measured
170
+ against. Quote it; if it's vague, that's the first finding (point at `/canvas`).