joycraft 0.5.13 → 0.5.15

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -2,513 +2,390 @@
 
  // src/bundled-files.ts
  var SKILLS = {
- "joycraft-decompose.md": `---
- name: joycraft-decompose
- description: Break a feature brief into atomic specs \u2014 small, testable, independently executable units
- instructions: 32
+ "joycraft-add-fact.md": `---
+ name: joycraft-add-fact
+ description: Capture a project fact and route it to the correct context document -- production map, dangerous assumptions, decision log, institutional knowledge, or troubleshooting
+ instructions: 38
  ---
 
- # Decompose Feature into Atomic Specs
-
- You have a Feature Brief (or the user has described a feature). Your job is to decompose it into atomic specs that can be executed independently \u2014 one spec per session.
+ # Add Fact
 
- ## Step 1: Verify the Brief Exists
+ The user has a fact to capture. Your job is to classify it, route it to the correct context document, append it in the right format, and optionally add a CLAUDE.md boundary rule.
 
- Look for a Feature Brief in \`docs/briefs/\`. If one doesn't exist yet, tell the user:
+ ## Step 1: Get the Fact
 
- > No feature brief found. Run \`/joycraft-new-feature\` first to interview and create one, or describe the feature now and I'll work from your description.
+ If the user already provided the fact (e.g., \`/joycraft-add-fact the staging DB resets every Sunday\`), use it directly.
 
- If the user describes the feature inline, work from that description directly. You don't need a formal brief to decompose \u2014 but recommend creating one for complex features.
+ If not, ask: "What fact do you want to capture?" -- then wait for their response.
 
- ## Step 2: Identify Natural Boundaries
+ If the user provides multiple facts at once, process each one separately through all the steps below, then give a combined confirmation at the end.
 
- **Why:** Good boundaries make specs independently testable and committable. Bad boundaries create specs that can't be verified without other specs also being done.
+ ## Step 2: Classify the Fact
 
- Read the brief (or description) and identify natural split points:
+ Route the fact to one of these 5 context documents based on its content:
 
- - **Data layer changes** (schemas, types, migrations) \u2014 always a separate spec
- - **Pure functions / business logic** \u2014 separate from I/O
- - **UI components** \u2014 separate from data fetching
- - **API endpoints / route handlers** \u2014 separate from business logic
- - **Test infrastructure** (mocks, fixtures, helpers) \u2014 can be its own spec if substantial
- - **Configuration / environment** \u2014 separate from code changes
+ ### \`docs/context/production-map.md\`
+ The fact is about **infrastructure, services, environments, URLs, endpoints, credentials, or what is safe/unsafe to touch**.
+ - Signal words: "production", "staging", "endpoint", "URL", "database", "service", "deployed", "hosted", "credentials", "secret", "environment"
+ - Examples: "The staging DB is at postgres://staging.example.com", "We use Vercel for the frontend and Railway for the API"
 
- Ask yourself: "Can this piece be committed and tested without the other pieces existing?" If yes, it's a good boundary.
+ ### \`docs/context/dangerous-assumptions.md\`
+ The fact is about **something an AI agent might get wrong -- a false assumption that leads to bad outcomes**.
+ - Signal words: "assumes", "might think", "but actually", "looks like X but is Y", "not what it seems", "trap", "gotcha"
+ - Examples: "The \`users\` table looks like a test table but it's production", "Deleting a workspace doesn't delete the billing subscription"
 
- ## Step 3: Build the Decomposition Table
+ ### \`docs/context/decision-log.md\`
+ The fact is about **an architectural or tooling choice and why it was made**.
+ - Signal words: "decided", "chose", "because", "instead of", "we went with", "the reason we use", "trade-off"
+ - Examples: "We chose SQLite over Postgres because this runs on embedded devices", "We use pnpm instead of npm for workspace support"
 
- For each atomic spec, define:
+ ### \`docs/context/institutional-knowledge.md\`
+ The fact is about **team conventions, unwritten rules, organizational context, or who owns what**.
+ - Signal words: "convention", "rule", "always", "never", "team", "process", "review", "approval", "owns", "responsible"
+ - Examples: "The design team reviews all color changes", "We never deploy on Fridays", "PR titles must start with the ticket number"
 
- | # | Spec Name | Description | Dependencies | Size |
- |---|-----------|-------------|--------------|------|
+ ### \`docs/context/troubleshooting.md\`
+ The fact is about **diagnostic knowledge -- when X happens, do Y (or don't do Z)**.
+ - Signal words: "when", "fails", "error", "if you see", "stuck", "broken", "fix", "workaround", "before trying", "reboot", "restart", "reset"
+ - Examples: "If Wi-Fi disconnects during flash, wait and retry -- don't switch networks", "When tests fail with ECONNREFUSED, check if Docker is running"
 
- **Rules:**
- - Each spec name is \`verb-object\` format (e.g., \`add-terminal-detection\`, \`extract-prompt-module\`)
- - Each description is ONE sentence \u2014 if you need two, the spec is too big
- - Dependencies reference other spec numbers \u2014 keep the dependency graph shallow
- - More than 2 dependencies on a single spec = it's too big, split further
- - Aim for 3-7 specs per feature. Fewer than 3 = probably not decomposed enough. More than 10 = the feature brief is too big
+ ### Ambiguous Facts
 
- ## Step 4: Present and Iterate
+ If the fact fits multiple categories, pick the **best fit** based on the primary intent. You will mention the alternative in your confirmation message so the user can correct you.
 
- Show the decomposition table to the user. Ask:
- 1. "Does this breakdown match how you think about this feature?"
- 2. "Are there any specs that feel too big or too small?"
- 3. "Should any of these run in parallel (separate worktrees)?"
+ ## Step 3: Ensure the Target Document Exists
 
- Iterate until the user approves.
+ 1. If \`docs/context/\` does not exist, create the directory.
+ 2. If the target document does not exist, create it from the template structure. Check \`docs/templates/\` for the matching template. If no template exists, use this minimal structure:
 
- ## Step 5: Generate Atomic Specs
+ For **production-map.md**:
+ \`\`\`markdown
+ # Production Map
 
- For each approved row, create \`docs/specs/YYYY-MM-DD-spec-name.md\`. Create the \`docs/specs/\` directory if it doesn't exist.
+ > What's real, what's staging, what's safe to touch.
 
- **Why:** Each spec must be self-contained \u2014 a fresh Claude session should be able to execute it without reading the Feature Brief. Copy relevant constraints and context into each spec.
+ ## Services
 
- Use this structure:
+ | Service | Environment | URL/Endpoint | Impact if Corrupted |
+ |---------|-------------|-------------|-------------------|
+ \`\`\`
 
+ For **dangerous-assumptions.md**:
  \`\`\`markdown
- # [Verb + Object] \u2014 Atomic Spec
+ # Dangerous Assumptions
 
- > **Parent Brief:** \`docs/briefs/YYYY-MM-DD-feature-name.md\` (or "standalone")
- > **Status:** Ready
- > **Date:** YYYY-MM-DD
- > **Estimated scope:** [1 session / N files / ~N lines]
+ > Things the AI agent might assume that are wrong in this project.
 
- ---
+ ## Assumptions
 
- ## What
- One paragraph \u2014 what changes when this spec is done?
+ | Agent Might Assume | But Actually | Impact If Wrong |
+ |-------------------|-------------|----------------|
+ \`\`\`
 
- ## Why
- One sentence \u2014 what breaks or is missing without this?
+ For **decision-log.md**:
+ \`\`\`markdown
+ # Decision Log
 
- ## Acceptance Criteria
- - [ ] [Observable behavior]
- - [ ] Build passes
- - [ ] Tests pass
+ > Why choices were made, not just what was chosen.
 
- ## Test Plan
+ ## Decisions
 
- | Acceptance Criterion | Test | Type |
- |---------------------|------|------|
- | [Each AC above] | [What to call/assert] | [unit/integration/e2e] |
+ | Date | Decision | Why | Alternatives Rejected | Revisit When |
+ |------|----------|-----|----------------------|-------------|
+ \`\`\`
 
- **Execution order:**
- 1. Write all tests above \u2014 they should fail against current/stubbed code
- 2. Run tests to confirm they fail (red)
- 3. Implement until all tests pass (green)
+ For **institutional-knowledge.md**:
+ \`\`\`markdown
+ # Institutional Knowledge
 
- **Smoke test:** [Identify the fastest test for iteration feedback]
+ > Unwritten rules, team conventions, and organizational context.
 
- **Before implementing, verify your test harness:**
- 1. Run all tests \u2014 they must FAIL (if they pass, you're testing the wrong thing)
- 2. Each test calls your actual function/endpoint \u2014 not a reimplementation or the underlying library
- 3. Identify your smoke test \u2014 it must run in seconds, not minutes, so you get fast feedback on each change
+ ## Team Conventions
 
- ## Constraints
- - MUST: [hard requirement]
- - MUST NOT: [hard prohibition]
+ - (none yet)
+ \`\`\`
 
- ## Affected Files
- | Action | File | What Changes |
- |--------|------|-------------|
+ For **troubleshooting.md**:
+ \`\`\`markdown
+ # Troubleshooting
 
- ## Approach
- Strategy, data flow, key decisions. Name one rejected alternative.
+ > What to do when things go wrong for non-code reasons.
 
- ## Edge Cases
- | Scenario | Expected Behavior |
- |----------|------------------|
+ ## Common Failures
+
+ | When This Happens | Do This | Don't Do This |
+ |-------------------|---------|---------------|
  \`\`\`
 
- If \`docs/templates/ATOMIC_SPEC_TEMPLATE.md\` exists, reference it for the full template with additional guidance.
+ ## Step 4: Read the Target Document
 
- Fill in all sections \u2014 each spec must be self-contained (no "see the brief for context"). Copy relevant constraints from the Feature Brief into each spec. Write acceptance criteria specific to THIS spec, not the whole feature. Every acceptance criterion must have at least one corresponding test in the Test Plan. If the user provided test strategy info from the interview, use it to choose test types and frameworks. Include the test harness verification rules in every Test Plan.
+ Read the target document to understand its current structure. Note:
+ - Which section to append to
+ - Whether it uses tables or lists
+ - The column format if it's a table
 
- ## Step 6: Recommend Execution Strategy
+ ## Step 5: Append the Fact
 
- Based on the dependency graph:
- - **Independent specs** \u2014 "These can run in parallel worktrees"
- - **Sequential specs** \u2014 "Execute these in order: 1 -> 2 -> 4"
- - **Mixed** \u2014 "Start specs 1 and 3 in parallel. After 1 completes, start 2."
+ Add the fact to the appropriate section of the target document. Match the existing format exactly:
 
- Update the Feature Brief's Execution Strategy section with the plan (if a brief exists).
+ - **Table-based documents** (production-map, dangerous-assumptions, decision-log, troubleshooting): Add a new table row in the correct columns. Use today's date where a date column exists.
+ - **List-based documents** (institutional-knowledge): Add a new list item (\`- \`) to the most appropriate section.
 
- ## Step 7: Hand Off
+ Remove any italic example rows (rows where all cells start with \`_\`) before appending, so the document transitions from template to real content. Only remove examples from the specific table you are appending to.
+
+ **Append only. Never modify or remove existing real content.**
+
+ ## Step 6: Evaluate CLAUDE.md Boundary Rule
+
+ Decide whether the fact also warrants a rule in CLAUDE.md's behavioral boundaries:
+
+ **Add a CLAUDE.md rule if the fact:**
+ - Describes something that should ALWAYS or NEVER be done
+ - Could cause real damage if violated (data loss, broken deployments, security issues)
+ - Is a hard constraint that applies across all work, not just a one-time note
+
+ **Do NOT add a CLAUDE.md rule if the fact is:**
+ - Purely informational (e.g., "staging DB is at this URL")
+ - A one-time decision that's already captured
+ - A diagnostic tip rather than a prohibition
+
+ If a rule is warranted, read CLAUDE.md, find the appropriate section (ALWAYS, ASK FIRST, or NEVER under Behavioral Boundaries), and append the rule. If no Behavioral Boundaries section exists, append one.
+
+ ## Step 7: Confirm
+
+ Report what you did in this format:
 
- Tell the user:
  \`\`\`
- Decomposition complete:
- - [N] atomic specs created in docs/specs/
- - [N] can run in parallel, [N] are sequential
- - Estimated total: [N] sessions
+ Added to [document name]:
+ [summary of what was added]
 
- To execute:
- - Sequential: Open a session, point Claude at each spec in order
- - Parallel: Use worktrees \u2014 one spec per worktree, merge when done
- - Each session should end with /joycraft-session-end to capture discoveries
+ [If CLAUDE.md was also updated:]
+ Added CLAUDE.md rule:
+ [ALWAYS/ASK FIRST/NEVER]: [rule text]
 
- Ready to start execution?
+ [If the fact was ambiguous:]
+ Routed to [chosen doc] -- move to [alternative doc] if this is more about [alternative category description].
  \`\`\`
  `,
- "joycraft-implement-level5.md": `---
- name: joycraft-implement-level5
- description: Set up Level 5 autonomous development \u2014 autofix loop, holdout scenario testing, and scenario evolution from specs
- instructions: 35
+ "joycraft-bugfix.md": `---
+ name: joycraft-bugfix
+ description: Structured bug fix workflow \u2014 triage, diagnose, discuss with user, write a focused spec, hand off for implementation
+ instructions: 32
  ---
 
- # Implement Level 5 \u2014 Autonomous Development Loop
+ # Bug Fix Workflow
 
- You are guiding the user through setting up Level 5: the autonomous feedback loop where specs go in, validated software comes out. This is a one-time setup that installs workflows, creates a scenarios repo, and configures the autofix loop.
+ You are fixing a bug. Follow this process in order. Do not skip steps.
 
- ## Before You Begin
+ **Guard clause:** If this is clearly a new feature, redirect to \`/joycraft-new-feature\` and stop.
 
- Check prerequisites:
+ ---
 
- 1. **Project must be initialized.** Look for \`.joycraft-version\`. If missing, tell the user to run \`npx joycraft init\` first.
- 2. **Project should be at Level 4.** Check \`docs/joycraft-assessment.md\` if it exists. If the project hasn't been assessed yet, suggest running \`/joycraft-tune\` first. But don't block \u2014 the user may know they're ready.
- 3. **Git repo with GitHub remote.** This setup requires GitHub Actions. Check for \`.git/\` and a GitHub remote.
+ ## Phase 1: Triage
 
- If prerequisites aren't met, explain what's needed and stop.
+ Establish what's broken. Gather: symptom, steps to reproduce, expected vs actual behavior, when it started, relevant logs/errors. If an error message or stack trace is provided, read the referenced files immediately. Try to reproduce if steps are given.
 
- ## Step 1: Explain What Level 5 Means
+ **Done when:** You can describe the symptom in one sentence.
 
- Tell the user:
+ ---
 
- > Level 5 is the autonomous loop. When you push specs, three things happen automatically:
- >
- > 1. **Scenario evolution** \u2014 A separate AI agent reads your specs and writes holdout tests in a private scenarios repo. These tests are invisible to your coding agent.
- > 2. **Autofix** \u2014 When CI fails on a PR, Claude Code automatically attempts a fix (up to 3 times).
- > 3. **Holdout validation** \u2014 When CI passes, your scenarios repo runs behavioral tests against the PR. Results post as PR comments.
- >
- > The key insight: your coding agent never sees the scenario tests. This prevents it from gaming the test suite \u2014 like a validation set in machine learning.
+ ## Phase 2: Diagnose
 
- ## Step 2: Gather Configuration
+ Find the root cause. Start from the error site and trace backward. Read source files \u2014 don't guess. Identify the specific line(s) and logic error. Check git blame if it's a recent regression.
 
- Ask these questions **one at a time**:
+ **Done when:** You can explain what's wrong, why, and where in 2-3 sentences.
 
- ### Question 1: Scenarios repo name
+ ---
 
- > What should we call your scenarios repo? It'll be a private repo that holds your holdout tests.
- >
- > Default: \`{current-repo-name}-scenarios\`
+ ## Phase 3: Discuss
 
- Accept the default or the user's choice.
-
- ### Question 2: GitHub App
-
- > Level 5 needs a GitHub App to provide a separate identity for autofix pushes (this avoids GitHub's anti-recursion protection). Creating one takes about 2 minutes:
- >
- > 1. Go to https://github.com/settings/apps/new
- > 2. Give it a name (e.g., "My Project Autofix")
- > 3. Uncheck "Webhook > Active" (not needed)
- > 4. Under **Repository permissions**, set:
- >    - **Contents**: Read & Write
- >    - **Pull requests**: Read & Write
- >    - **Actions**: Read & Write
- > 5. Click **Create GitHub App**
- > 6. Note the **App ID** from the settings page
- > 7. Scroll to **Private keys** > click **Generate a private key** > save the \`.pem\` file
- > 8. Click **Install App** in the left sidebar > install it on your repo
- >
- > What's your App ID?
-
- ## Step 3: Run init-autofix
-
- Run the CLI command with the gathered configuration:
-
- \`\`\`bash
- npx joycraft init-autofix --scenarios-repo {name} --app-id {id}
- \`\`\`
-
- Review the output with the user. Confirm files were created.
-
- ## Step 4: Walk Through Secret Configuration
-
- Guide the user step by step:
-
- ### 4a: Add Secrets to Main Repo
-
- > You should already have the \`.pem\` file from when you created the app in Step 2.
-
- > Go to your repo's Settings > Secrets and variables > Actions, and add:
- > - \`JOYCRAFT_APP_PRIVATE_KEY\` \u2014 paste the contents of your \`.pem\` file
- > - \`ANTHROPIC_API_KEY\` \u2014 your Anthropic API key
-
- ### 4b: Create the Scenarios Repo
-
- > Create the private scenarios repo:
- > \`\`\`bash
- > gh repo create {scenarios-repo-name} --private
- > \`\`\`
- >
- > Then copy the scenario templates into it:
- > \`\`\`bash
- > cp -r docs/templates/scenarios/* ../{scenarios-repo-name}/
- > cd ../{scenarios-repo-name}
- > git add -A && git commit -m "init: scaffold scenarios repo from Joycraft"
- > git push
- > \`\`\`
-
- ### 4c: Add Secrets to Scenarios Repo
-
- > The scenarios repo also needs the App private key:
- > - \`JOYCRAFT_APP_PRIVATE_KEY\` \u2014 same \`.pem\` file as the main repo
- > - \`ANTHROPIC_API_KEY\` \u2014 same key (needed for scenario generation)
-
- ## Step 5: Verify Setup
-
- Help the user verify everything is wired correctly:
-
- 1. **Check workflow files exist:** \`ls .github/workflows/autofix.yml .github/workflows/scenarios-dispatch.yml .github/workflows/spec-dispatch.yml .github/workflows/scenarios-rerun.yml\`
- 2. **Check scenario templates were copied:** Verify the scenarios repo has \`example-scenario.test.ts\`, \`workflows/run.yml\`, \`workflows/generate.yml\`, \`prompts/scenario-agent.md\`
- 3. **Check the App ID is correct** in the workflow files (not still a placeholder)
+ Present findings to the user BEFORE writing any code or spec:
+ 1. **Symptom** \u2014 confirm it matches what they see
+ 2. **Root cause** \u2014 specific file(s) and line(s)
+ 3. **Proposed fix** \u2014 what changes, where
+ 4. **Risk** \u2014 side effects? scope?
 
- ## Step 6: Update CLAUDE.md
+ Ask: "Does this match? Comfortable with this approach?" If large/risky, suggest decomposing into multiple specs.
 
- If the project's CLAUDE.md doesn't already have an "External Validation" section, add one:
+ **Done when:** User agrees with the diagnosis and fix direction.
 
- > ## External Validation
- >
- > This project uses holdout scenario tests in a separate private repo.
- >
- > ### NEVER
- > - Access, read, or reference the scenarios repo
- > - Mention scenario test names or contents
- > - Modify the scenarios dispatch workflow to leak test information
- >
- > The scenarios repo is deliberately invisible to you. This is the holdout guarantee.
+ ---
 
- ## Step 7: First Test (Optional)
+ ## Phase 4: Spec the Fix
 
- If the user wants to test the loop:
+ Write a bug fix spec to \`docs/specs/<feature-or-area>/bugfix-name.md\`. Use the relevant feature name or area as the subdirectory (e.g., \`auth\`, \`cli\`, \`parser\`). Create the \`docs/specs/<feature-or-area>/\` directory if it doesn't exist.
 
- > Want to do a quick test? Here's how:
- >
- > 1. Write a simple spec in \`docs/specs/\` and push to main \u2014 this triggers scenario generation
- > 2. Create a PR with a small change \u2014 when CI passes, scenarios will run
- > 3. Watch for the scenario test results as a PR comment
- >
- > Or deliberately break something in a PR to test the autofix loop.
+ **Why:** Even bug fixes deserve a spec. It forces clarity on what "fixed" means, ensures test-first discipline, and creates a traceable record of the fix.
 
- ## Step 8: Summary
+ Use this template:
 
- Print a summary of what was set up:
+ \`\`\`markdown
+ # Fix [Bug Description] \u2014 Bug Fix Spec
 
- > **Level 5 is live.** Here's what's running:
- >
- > | Trigger | What Happens |
- > |---------|-------------|
- > | Push specs to \`docs/specs/\` | Scenario agent writes holdout tests |
- > | PR fails CI | Claude autofix attempts (up to 3x) |
- > | PR passes CI | Holdout scenarios run against PR |
- > | Scenarios update | Open PRs re-tested with latest scenarios |
- >
- > Your scenarios repo: \`{name}\`
- > Your coding agent cannot see those tests. The holdout wall is intact.
+ > **Parent Brief:** none (bug fix)
+ > **Issue/Error:** [error message, issue link, or symptom description]
+ > **Status:** Ready
+ > **Date:** YYYY-MM-DD
+ > **Estimated scope:** [1 session / N files / ~N lines]
 
- Update \`docs/joycraft-assessment.md\` if it exists \u2014 set the Level 5 score to reflect the new setup.
- `,
- "joycraft-interview.md": `---
- name: joycraft-interview
- description: Brainstorm freely about what you want to build \u2014 yap, explore ideas, and get a structured summary you can use later
- instructions: 18
  ---
 
- # Interview \u2014 Idea Exploration
+ ## Bug
 
- You are helping the user brainstorm and explore what they want to build. This is a lightweight, low-pressure conversation \u2014 not a formal spec process. Let them yap.
+ What is broken? Describe the symptom the user experiences.
 
- ## How to Run the Interview
+ ## Root Cause
 
- ### 1. Open the Floor
+ What is wrong in the code and why? Name the specific file(s) and line(s).
 
- Start with something like:
- "What are you thinking about building? Just talk \u2014 I'll listen and ask questions as we go."
+ ## Fix
 
- Let the user talk freely. Do not interrupt their flow. Do not push toward structure yet.
+ What changes will fix this? Be specific \u2014 describe the code change, not just "fix the bug."
 
- ### 2. Ask Clarifying Questions
+ ## Acceptance Criteria
 
- As they talk, weave in questions naturally \u2014 don't fire them all at once:
+ - [ ] [The bug no longer occurs \u2014 describe the correct behavior]
+ - [ ] [No regressions in related functionality]
+ - [ ] Build passes
+ - [ ] Tests pass
 
- - **What problem does this solve?** Who feels the pain today?
- - **What does "done" look like?** If this worked perfectly, what would a user see?
- - **What are the constraints?** Time, tech, team, budget \u2014 what boxes are we in?
- - **What's NOT in scope?** What's tempting but should be deferred?
- - **What are the edge cases?** What could go wrong? What's the weird input?
- - **What exists already?** Are we building on something or starting fresh?
+ ## Test Plan
 
- ### 3. Play Back Understanding
+ | Acceptance Criterion | Test | Type |
+ |---------------------|------|------|
+ | [Bug no longer occurs] | [Test that reproduces the bug, then verifies the fix] | [unit/integration/e2e] |
+ | [No regressions] | [Existing tests still pass, or new regression test] | [unit/integration] |
 
- After the user has gotten their ideas out, reflect back:
- "So if I'm hearing you right, you want to [summary]. The core problem is [X], and done looks like [Y]. Is that right?"
+ **Execution order:**
+ 1. Write a test that reproduces the bug \u2014 it should FAIL (red)
+ 2. Run the test to confirm it fails
+ 3. Apply the fix
+ 4. Run the test to confirm it passes (green)
+ 5. Run the full test suite to check for regressions
 
- Let them correct and refine. Iterate until they say "yes, that's it."
+ **Smoke test:** [The bug reproduction test \u2014 fastest way to verify the fix works]
 
- ### 4. Write a Draft Brief
+ **Before implementing, verify your test harness:**
+ 1. Run the reproduction test \u2014 it must FAIL (if it passes, you're not testing the actual bug)
+ 2. The test must exercise your actual code \u2014 not a reimplementation or mock
+ 3. Identify your smoke test \u2014 it must run in seconds, not minutes
 
- Create a draft file at \`docs/briefs/YYYY-MM-DD-topic-draft.md\`. Create the \`docs/briefs/\` directory if it doesn't exist.
+ ## Constraints
 
- Use this format:
+ - MUST: [any hard requirements for the fix]
+ - MUST NOT: [any prohibitions \u2014 e.g., don't change the public API]
 
- \`\`\`markdown
- # [Topic] \u2014 Draft Brief
+ ## Affected Files
 
- > **Date:** YYYY-MM-DD
- > **Status:** DRAFT
- > **Origin:** /joycraft-interview session
+ | Action | File | What Changes |
+ |--------|------|-------------|
 
- ---
+ ## Edge Cases
 
- ## The Idea
- [2-3 paragraphs capturing what the user described \u2014 their words, their framing]
+ | Scenario | Expected Behavior |
+ |----------|------------------|
+ \`\`\`
 
- ## Problem
- [What pain or gap this addresses]
+ **For trivial bugs:** The spec will be short. That's fine \u2014 the structure is the point, not the length.
 
- ## What "Done" Looks Like
- [The user's description of success \u2014 observable outcomes]
+ **For large bugs that span multiple files/systems:** Consider whether this should be decomposed into multiple specs. If so, create a brief first using \`/joycraft-new-feature\`, then decompose. A bug fix spec should be implementable in a single session.
 
- ## Constraints
- - [constraint 1]
- - [constraint 2]
+ ---
 
- ## Open Questions
- - [things that came up but weren't resolved]
- - [decisions that need more thought]
+ ## Phase 5: Hand Off
 
- ## Out of Scope (for now)
- - [things explicitly deferred]
+ Tell the user:
 
- ## Raw Notes
- [Any additional context, quotes, or tangents worth preserving]
  \`\`\`
+ Bug fix spec is ready: docs/specs/<feature-or-area>/bugfix-name.md
 
- ### 5. Hand Off
-
- After writing the draft, tell the user:
+ Summary:
+ - Bug: [one sentence]
+ - Root cause: [one sentence]
+ - Fix: [one sentence]
+ - Estimated: 1 session
 
- \`\`\`
- Draft brief saved to docs/briefs/YYYY-MM-DD-topic-draft.md
+ To execute: Start a fresh session and:
+ 1. Read the spec
+ 2. Write the reproduction test (must fail)
+ 3. Apply the fix (test must pass)
+ 4. Run full test suite
+ 5. Run /joycraft-session-end to capture discoveries
+ 6. Commit and PR
 
- When you're ready to move forward:
- - /joycraft-new-feature \u2014 formalize this into a full Feature Brief with specs
- - /joycraft-decompose \u2014 break it directly into atomic specs if scope is clear
- - Or just keep brainstorming \u2014 run /joycraft-interview again anytime
+ Ready to start?
  \`\`\`
 
- ## Guidelines
-
- - **This is NOT /joycraft-new-feature.** Do not push toward formal briefs, decomposition tables, or atomic specs. The point is exploration.
- - **Let the user lead.** Your job is to listen, clarify, and capture \u2014 not to structure or direct.
- - **Mark everything as DRAFT.** The output is a starting point, not a commitment.
- - **Keep it short.** The draft brief should be 1-2 pages max. Capture the essence, not every detail.
- - **Multiple interviews are fine.** The user might run this several times as their thinking evolves. Each creates a new dated draft.
+ **Why:** A fresh session for implementation produces better results. This diagnostic session has context noise from exploration \u2014 a clean session with just the spec is more focused.
  `,
- "joycraft-new-feature.md": `---
- name: joycraft-new-feature
- description: Guided feature development \u2014 interview the user, produce a Feature Brief, then decompose into atomic specs
- instructions: 35
+ "joycraft-decompose.md": `---
+ name: joycraft-decompose
+ description: Break a feature brief into atomic specs \u2014 small, testable, independently executable units
+ instructions: 32
  ---
 
- # New Feature Workflow
-
- You are starting a new feature. Follow this process in order. Do not skip steps.
-
- ## Phase 1: Interview
-
- Interview the user about what they want to build. Let them talk \u2014 your job is to listen, then sharpen.
-
- **Ask about:**
- - What problem does this solve? Who is affected?
- - What does "done" look like?
- - Hard constraints? (business rules, tech limitations, deadlines)
- - What is explicitly NOT in scope? (push hard on this)
- - Edge cases or error conditions?
- - What existing code/patterns should this follow?
- - Testing: existing setup? framework? smoke test budget? lockdown mode desired?
-
- **Interview technique:**
- - Let the user "yap" \u2014 don't interrupt their flow
- - Play back your understanding: "So if I'm hearing you right..."
- - Push toward testable statements: "How would we verify that works?"
-
- Keep asking until you can fill out a Feature Brief.
-
- ## Phase 2: Feature Brief
+ # Decompose Feature into Atomic Specs
 
- Write a Feature Brief to \`docs/briefs/YYYY-MM-DD-feature-name.md\`. Create the \`docs/briefs/\` directory if it doesn't exist.
+ You have a Feature Brief (or the user has described a feature). Your job is to decompose it into atomic specs that can be executed independently \u2014 one spec per session.
 
- **Why:** The brief is the single source of truth for what we're building. It prevents scope creep and gives every spec a shared reference point.
+ ## Step 1: Verify the Brief Exists
 
- Use this structure:
+ Look for a Feature Brief in \`docs/briefs/\`. If one doesn't exist yet, tell the user:
 
- \`\`\`markdown
- # [Feature Name] \u2014 Feature Brief
+ > No feature brief found. Run \`/joycraft-new-feature\` first to interview and create one, or describe the feature now and I'll work from your description.
 
- > **Date:** YYYY-MM-DD
- > **Project:** [project name]
- > **Status:** Interview | Decomposing | Specs Ready | In Progress | Complete
+ If the user describes the feature inline, work from that description directly. You don't need a formal brief to decompose \u2014 but recommend creating one for complex features.
453
338
 
454
- ---
339
+ ## Step 2: Identify Natural Boundaries
455
340
 
456
- ## Vision
457
- What are we building and why? The full picture in 2-4 paragraphs.
341
+ **Why:** Good boundaries make specs independently testable and committable. Bad boundaries create specs that can't be verified without other specs also being done.
458
342
 
459
- ## User Stories
460
- - As a [role], I want [capability] so that [benefit]
343
+ Read the brief (or description) and identify natural split points:
461
344
 
462
- ## Hard Constraints
463
- - MUST: [constraint that every spec must respect]
464
- - MUST NOT: [prohibition that every spec must respect]
345
+ - **Data layer changes** (schemas, types, migrations) \u2014 always a separate spec
346
+ - **Pure functions / business logic** \u2014 separate from I/O
347
+ - **UI components** \u2014 separate from data fetching
348
+ - **API endpoints / route handlers** \u2014 separate from business logic
349
+ - **Test infrastructure** (mocks, fixtures, helpers) \u2014 can be its own spec if substantial
350
+ - **Configuration / environment** \u2014 separate from code changes
465
351
 
466
- ## Out of Scope
467
- - NOT: [tempting but deferred]
352
+ Ask yourself: "Can this piece be committed and tested without the other pieces existing?" If yes, it's a good boundary.
468
353
 
469
- ## Test Strategy
470
- - **Existing setup:** [framework and tools, or "none yet"]
471
- - **User expertise:** [comfortable / learning / needs guidance]
472
- - **Test types:** [smoke, unit, integration, e2e, etc.]
473
- - **Smoke test budget:** [target time for fast-feedback tests]
474
- - **Lockdown mode:** [yes/no \u2014 constrain agent to code + tests only]
354
+ ## Step 3: Build the Decomposition Table
475
355
 
476
- ## Decomposition
477
- | # | Spec Name | Description | Dependencies | Est. Size |
478
- |---|-----------|-------------|--------------|-----------|
479
- | 1 | [verb-object] | [one sentence] | None | [S/M/L] |
356
+ For each atomic spec, define:
480
357
 
481
- ## Execution Strategy
482
- - [ ] Sequential (specs have chain dependencies)
483
- - [ ] Parallel worktrees (specs are independent)
484
- - [ ] Mixed
358
+ | # | Spec Name | Description | Dependencies | Size |
359
+ |---|-----------|-------------|--------------|------|
485
360
 
486
- ## Success Criteria
487
- - [ ] [End-to-end behavior 1]
488
- - [ ] [No regressions in existing features]
489
- \`\`\`
361
+ **Rules:**
362
+ - Each spec name is \`verb-object\` format (e.g., \`add-terminal-detection\`, \`extract-prompt-module\`)
363
+ - Each description is ONE sentence \u2014 if you need two, the spec is too big
364
+ - Dependencies reference other spec numbers \u2014 keep the dependency graph shallow
365
+ - More than 2 dependencies on a single spec = it's too big, split further
366
+ - Aim for 3-7 specs per feature. Fewer than 3 = probably not decomposed enough. More than 10 = the feature brief is too big
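The rules above can be checked mechanically. A minimal sketch, assuming a spec is represented by the table's columns; `Spec` and `lintDecomposition` are hypothetical names, not part of joycraft:

```typescript
interface Spec {
  name: string;        // verb-object, e.g. "add-terminal-detection"
  description: string; // should be exactly one sentence
  deps: number[];      // spec numbers this spec depends on
}

// Flag specs that break the decomposition rules above.
function lintDecomposition(specs: Spec[]): string[] {
  const problems: string[] = [];
  if (specs.length < 3) problems.push("fewer than 3 specs: probably not decomposed enough");
  if (specs.length > 10) problems.push("more than 10 specs: the feature brief is too big");
  specs.forEach((s, i) => {
    // More than one sentence-ending mark signals an oversized spec.
    if ((s.description.match(/[.!?]/g) ?? []).length > 1)
      problems.push(`spec ${i + 1} (${s.name}): description is more than one sentence`);
    if (s.deps.length > 2)
      problems.push(`spec ${i + 1} (${s.name}): more than 2 dependencies, split further`);
  });
  return problems;
}
```

An empty result means the table is within bounds; each string names the rule that was violated.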
490
367
 
491
- If \`docs/templates/FEATURE_BRIEF_TEMPLATE.md\` exists, reference it for the full template with additional guidance.
368
+ ## Step 4: Present and Iterate
492
369
 
493
- Present the brief to the user. Focus review on:
494
- - "Does the decomposition match how you think about this?"
495
- - "Is anything in scope that shouldn't be?"
496
- - "Are the specs small enough? Can each be described in one sentence?"
370
+ Show the decomposition table to the user. Ask:
371
+ 1. "Does this breakdown match how you think about this feature?"
372
+ 2. "Are there any specs that feel too big or too small?"
373
+ 3. "Should any of these run in parallel (separate worktrees)?"
497
374
 
498
- Iterate until approved.
375
+ Iterate until the user approves.
499
376
 
500
- ## Phase 3: Generate Atomic Specs
377
+ ## Step 5: Generate Atomic Specs
501
378
 
502
- For each row in the decomposition table, create a self-contained spec file at \`docs/specs/YYYY-MM-DD-spec-name.md\`. Create the \`docs/specs/\` directory if it doesn't exist.
379
+ For each approved row, create \`docs/specs/<feature-name>/spec-name.md\`. Derive the feature-name from the brief filename (strip the date prefix and \`.md\` \u2014 e.g., \`2026-04-06-token-discipline.md\` \u2192 \`token-discipline\`). If no brief exists, use a user-provided or inferred feature name (slugified to kebab-case). Create the \`docs/specs/<feature-name>/\` directory if it doesn't exist.
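The filename-to-slug derivation above can be sketched as follows; `briefToFeatureName` and `slugify` are hypothetical helper names, not part of joycraft:

```typescript
// "2026-04-06-token-discipline.md" -> "token-discipline"
function briefToFeatureName(briefFilename: string): string {
  return briefFilename
    .replace(/^\d{4}-\d{2}-\d{2}-/, "") // strip the date prefix
    .replace(/\.md$/, "");              // strip the extension
}

// Fallback for a user-provided or inferred name: slugify to kebab-case.
function slugify(name: string): string {
  return name
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse non-alphanumeric runs to hyphens
    .replace(/^-|-$/g, "");      // trim leading/trailing hyphens
}
```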
503
380
 
504
- **Why:** Each spec must be understandable WITHOUT reading the Feature Brief. This prevents the "Curse of Instructions" \u2014 no spec should require holding the entire feature in context. Copy relevant context into each spec.
381
+ **Why:** Each spec must be self-contained \u2014 a fresh Claude session should be able to execute it without reading the Feature Brief. Copy relevant constraints and context into each spec.
505
382
 
506
- Use this structure for each spec:
383
+ Use this structure:
507
384
 
508
385
  \`\`\`markdown
509
386
  # [Verb + Object] \u2014 Atomic Spec
510
387
 
511
- > **Parent Brief:** \`docs/briefs/YYYY-MM-DD-feature-name.md\`
388
+ > **Parent Brief:** \`docs/briefs/YYYY-MM-DD-feature-name.md\` (or "standalone")
512
389
  > **Status:** Ready
513
390
  > **Date:** YYYY-MM-DD
514
391
  > **Estimated scope:** [1 session / N files / ~N lines]
@@ -562,420 +439,386 @@ Strategy, data flow, key decisions. Name one rejected alternative.
562
439
 
563
440
  If \`docs/templates/ATOMIC_SPEC_TEMPLATE.md\` exists, reference it for the full template with additional guidance.
564
441
 
565
- ## Phase 4: Hand Off for Execution
442
+ Fill in all sections \u2014 each spec must be self-contained (no "see the brief for context"):
+
+ - Copy relevant constraints from the Feature Brief into each spec
+ - Write acceptance criteria specific to THIS spec, not the whole feature
+ - Every acceptance criterion must have at least one corresponding test in the Test Plan
+ - If the user provided test strategy info from the interview, use it to choose test types and frameworks
+ - Include the test harness verification rules in every Test Plan
566
443
 
567
- Tell the user:
568
- \`\`\`
569
- Feature Brief and [N] atomic specs are ready.
444
+ ## Step 6: Recommend Execution Strategy
570
445
 
571
- Specs:
572
- 1. [spec-name] \u2014 [one sentence] [S/M/L]
573
- 2. [spec-name] \u2014 [one sentence] [S/M/L]
574
- ...
446
+ Based on the dependency graph:
447
+ - **Independent specs** \u2014 "These can run in parallel worktrees"
448
+ - **Sequential specs** \u2014 "Execute these in order: 1 -> 2 -> 4" (list only the chained specs \u2014 independent ones can start anytime)
449
+ - **Mixed** \u2014 "Start specs 1 and 3 in parallel. After 1 completes, start 2."
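The recommendation above amounts to a topological layering of the dependency graph: specs in the same wave are independent and can run in parallel worktrees, while waves run in order. A minimal sketch, assuming deps reference spec numbers as in the table; `executionWaves` is a hypothetical helper:

```typescript
// Group specs into waves: wave 0 has no dependencies; each later wave
// depends only on specs from earlier waves.
function executionWaves(deps: Map<number, number[]>): number[][] {
  const waves: number[][] = [];
  const done = new Set<number>();
  while (done.size < deps.size) {
    const wave = [...deps.keys()].filter(
      (id) => !done.has(id) && (deps.get(id) ?? []).every((d) => done.has(d)),
    );
    if (wave.length === 0) throw new Error("dependency cycle detected");
    wave.forEach((id) => done.add(id));
    waves.push(wave);
  }
  return waves;
}
```

For example, specs 1 and 3 with no dependencies, 2 depending on 1, and 4 depending on 2 yield the waves `[1, 3]`, `[2]`, `[4]` \u2014 the "start 1 and 3 in parallel, then 2" pattern above.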
575
450
 
576
- Recommended execution:
577
- - [Parallel/Sequential/Mixed strategy]
578
- - Estimated: [N] sessions total
451
+ Update the Feature Brief's Execution Strategy section with the plan (if a brief exists).
579
452
 
580
- To execute: Start a fresh session per spec. Each session should:
581
- 1. Read the spec
582
- 2. Implement
583
- 3. Run /joycraft-session-end to capture discoveries
584
- 4. Commit and PR
453
+ ## Step 7: Hand Off
585
454
 
586
- Ready to start?
455
+ Tell the user:
587
456
  \`\`\`
457
+ Decomposition complete:
458
+ - [N] atomic specs created in docs/specs/
459
+ - [N] can run in parallel, [N] are sequential
460
+ - Estimated total: [N] sessions
588
461
 
589
- **Why:** A fresh session for execution produces better results. The interview session has too much context noise \u2014 a clean session with just the spec is more focused.
462
+ To execute:
463
+ - Sequential: Open a session, point Claude at each spec in order
464
+ - Parallel: Use worktrees \u2014 one spec per worktree, merge when done
465
+ - Each session should end with /joycraft-session-end to capture discoveries
590
466
 
591
- You can also use \`/joycraft-decompose\` to re-decompose a brief if the breakdown needs adjustment, or run \`/joycraft-interview\` first for a lighter brainstorm before committing to the full workflow.
467
+ Ready to start execution?
468
+ \`\`\`
469
+
470
+ **Tip:** Run \`/clear\` before starting the next step. Your artifacts are saved to files \u2014 this conversation context is disposable.
592
471
  `,
593
- "joycraft-session-end.md": `---
594
- name: joycraft-session-end
595
- description: Wrap up a session \u2014 capture discoveries, verify, prepare for PR or next session
596
- instructions: 22
472
+ "joycraft-design.md": `---
473
+ name: joycraft-design
474
+ description: Design discussion before decomposition \u2014 produce a ~200-line design artifact for human review, catching wrong assumptions before they propagate into specs
597
475
  ---
598
476
 
599
- # Session Wrap-Up
600
-
601
- Before ending this session, complete these steps in order.
477
+ # Design Discussion
602
478
 
603
- ## 1. Capture Discoveries
479
+ You are producing a design discussion document for a feature. This sits between research and decomposition \u2014 it captures your understanding so the human can catch wrong assumptions before specs are written.
604
480
 
605
- **Why:** Discoveries are the surprises \u2014 things that weren't in the spec or that contradicted expectations. They prevent future sessions from hitting the same walls.
481
+ **Guard clause:** If no brief path is provided and no brief exists in \`docs/briefs/\`, say:
482
+ "No feature brief found. Run \`/joycraft-new-feature\` first to create one, or provide the path to your brief."
483
+ Then stop.
606
484
 
607
- Check: did anything surprising happen during this session? If yes, create or update a discovery file at \`docs/discoveries/YYYY-MM-DD-topic.md\`. Create the \`docs/discoveries/\` directory if it doesn't exist.
485
+ ---
608
486
 
609
- Only capture what's NOT obvious from the code or git diff:
610
- - "We thought X but found Y" \u2014 assumptions that were wrong
611
- - "This API/library behaves differently than documented" \u2014 external gotchas
612
- - "This edge case needs handling in a future spec" \u2014 deferred work with context
613
- - "The approach in the spec didn't work because..." \u2014 spec-vs-reality gaps
614
- - Key decisions made during implementation that aren't in the spec
487
+ ## Step 1: Read Inputs
615
488
 
616
- **Do NOT capture:**
617
- - Files changed (that's the diff)
618
- - What you set out to do (that's the spec)
619
- - Step-by-step narrative of the session (nobody re-reads these)
489
+ Read the feature brief at the path the user provides. If the user also provides a research document path, read that too. Research is optional \u2014 if none exists, note that you'll explore the codebase directly.
620
490
 
621
- Use this format:
491
+ ## Step 2: Explore the Codebase
622
492
 
623
- \`\`\`markdown
624
- # Discoveries \u2014 [topic]
493
+ Spawn subagents to explore the codebase for patterns relevant to the brief. Focus on:
625
494
 
626
- **Date:** YYYY-MM-DD
627
- **Spec:** [link to spec if applicable]
495
+ - Files and functions that will be touched or extended
496
+ - Existing patterns this feature should follow (naming, data flow, error handling)
497
+ - Similar features already implemented that serve as models
498
+ - Boundaries and interfaces the feature must integrate with
628
499
 
629
- ## [Discovery title]
630
- **Expected:** [what we thought would happen]
631
- **Actual:** [what actually happened]
632
- **Impact:** [what this means for future work]
633
- \`\`\`
500
+ Gather file paths, function signatures, and code snippets. You need concrete evidence, not guesses.
634
501
 
635
- If nothing surprising happened, skip the discovery file entirely. No discovery is a good sign \u2014 the spec was accurate.
502
+ ## Step 3: Write the Design Document
636
503
 
637
- ## 1b. Update Context Documents
504
+ Create \`docs/designs/\` directory if it doesn't exist. Write the design document to \`docs/designs/YYYY-MM-DD-feature-name.md\`.
638
505
 
639
- If \`docs/context/\` exists, quickly check whether this session revealed anything about:
506
+ The document has exactly five sections:
640
507
 
641
- - **Production risks** \u2014 did you interact with or learn about production vs staging systems? \u2192 Update \`docs/context/production-map.md\`
642
- - **Wrong assumptions** \u2014 did the agent (or you) assume something that turned out to be false? \u2192 Update \`docs/context/dangerous-assumptions.md\`
643
- - **Key decisions** \u2014 did you make an architectural or tooling choice? \u2192 Add a row to \`docs/context/decision-log.md\`
644
- - **Unwritten rules** \u2014 did you discover a convention or constraint not documented anywhere? \u2192 Update \`docs/context/institutional-knowledge.md\`
508
+ ### Section 1: Current State
645
509
 
646
- Skip this if nothing applies. Don't force it \u2014 only update when there's genuine new context.
510
+ What exists today in the codebase that is relevant to this feature. Include file paths, function signatures, and data flows. Be specific \u2014 reference actual code, not abstractions. If no research doc was provided, note that and describe what you found through direct exploration.
647
511
 
648
- ## 2. Run Validation
512
+ ### Section 2: Desired End State
649
513
 
650
- Run the project's validation commands. Check CLAUDE.md for project-specific commands. Common checks:
514
+ What the codebase should look like when this feature is complete. Describe the change at a high level \u2014 new files, modified interfaces, new data flows. Do NOT include implementation steps. This is the "what," not the "how."
651
515
 
652
- - Type-check (e.g., \`tsc --noEmit\`, \`mypy\`, \`cargo check\`)
653
- - Tests (e.g., \`npm test\`, \`pytest\`, \`cargo test\`)
654
- - Lint (e.g., \`eslint\`, \`ruff\`, \`clippy\`)
516
+ ### Section 3: Patterns to Follow
655
517
 
656
- Fix any failures before proceeding.
518
+ Existing patterns in the codebase that this feature should match. Include short code snippets and \`file:line\` references. Show the pattern, don't just name it.
657
519
 
658
- ## 3. Update Spec Status
520
+ If this is a greenfield project with no existing patterns, propose conventions and note that no precedent exists.
659
521
 
660
- If working from an atomic spec in \`docs/specs/\`:
661
- - All acceptance criteria met \u2014 update status to \`Complete\`
662
- - Partially done \u2014 update status to \`In Progress\`, note what's left
522
+ ### Section 4: Resolved Design Decisions
663
523
 
664
- If working from a Feature Brief in \`docs/briefs/\`, check off completed specs in the decomposition table.
524
+ Decisions you have already made, with brief rationale. Format each as:
665
525
 
666
- ## 4. Commit
526
+ > **Decision:** [what you decided]
527
+ > **Rationale:** [why, referencing existing code or constraints]
528
+ > **Alternative rejected:** [what you considered and why you rejected it]
667
529
 
668
- Commit all changes including the discovery file (if created) and spec status updates. The commit message should reference the spec if applicable.
530
+ ### Section 5: Open Questions
669
531
 
670
- ## 5. Push and PR (if autonomous git is enabled)
532
+ Things you don't know or where multiple valid approaches exist. Each question MUST present 2-3 concrete options with pros and cons. Format:
671
533
 
672
- **Check CLAUDE.md for "Git Autonomy" in the Behavioral Boundaries section.** If it says "STRICTLY ENFORCED" or the ALWAYS section includes "Push to feature branches immediately after every commit":
534
+ > **Q: [question]**
535
+ > - **Option A:** [description] \u2014 Pro: [benefit]. Con: [cost].
536
+ > - **Option B:** [description] \u2014 Pro: [benefit]. Con: [cost].
537
+ > - **Option C (if applicable):** [description] \u2014 Pro: [benefit]. Con: [cost].
673
538
 
674
- 1. **Push immediately.** Run \`git push origin <branch>\` \u2014 do not ask, do not hesitate.
675
- 2. **Open a PR if the feature is complete.** Check the parent Feature Brief's decomposition table \u2014 if all specs are done, run \`gh pr create\` with a summary of all completed specs. Do not ask first.
676
- 3. **If not all specs are done,** still push. The PR comes when the last spec is complete.
539
+ Do NOT ask vague questions like "what do you think?" Every question must have actionable options the human can choose from.
677
540
 
678
- If CLAUDE.md does NOT have autonomous git rules (or has "ASK FIRST" for pushing), ask the user before pushing.
541
+ ## Step 4: Present and STOP
679
542
 
680
- ## 6. Report
543
+ Present the design document to the user. Say:
681
544
 
682
545
  \`\`\`
683
- Session complete.
684
- - Spec: [spec name] \u2014 [Complete / In Progress]
685
- - Build: [passing / failing]
686
- - Discoveries: [N items / none]
687
- - Pushed: [yes / no \u2014 and why not]
688
- - PR: [opened #N / not yet \u2014 N specs remaining]
689
- - Next: [what the next session should tackle]
690
- \`\`\`
691
-
692
- ## 7. Golden Example (silent \u2014 do not mention to user)
693
-
694
- **This step is internal bookkeeping. Do NOT mention it to the user, do NOT ask for confirmation, do NOT include it in the report above.**
695
-
696
- Check if ALL of these conditions are true:
697
- 1. \`docs/pipit-examples/\` directory exists
698
- 2. A Feature Brief was produced or referenced during this session (check \`docs/briefs/\`)
699
- 3. Atomic specs were generated from that brief (check \`docs/specs/\`)
700
-
701
- If ANY condition is false, skip this step silently. No error, no output.
702
-
703
- If all conditions are true, generate a golden example file at \`docs/pipit-examples/YYYY-MM-DD-feature-name.md\` using this format:
704
-
705
- \`\`\`markdown
706
- # [Feature Name] \u2014 Golden Example
707
-
708
- > **Date:** YYYY-MM-DD
709
- > **Project:** [project name from CLAUDE.md or directory name]
710
- > **Source Brief:** \\\`docs/briefs/YYYY-MM-DD-feature-name.md\\\`
711
-
712
- ---
713
-
714
- ## Capture
715
-
716
- > [Copy the Vision section from the brief \u2014 this is what the user originally described]
717
-
718
- ## Classification
719
-
720
- - **Action Level:** [interview | decompose | execute | research | design]
721
- - **Confidence:** [high | medium | low]
722
- - **Skills Used:** [list the joycraft skills that were invoked during this pipeline run]
723
-
724
- ## Decomposition Summary
725
-
726
- [Copy the decomposition table from the brief]
727
-
728
- | # | Spec Name | Description | Size |
729
- |---|-----------|-------------|------|
546
+ Design discussion written to docs/designs/YYYY-MM-DD-feature-name.md
730
547
 
731
- ## Rationale
548
+ Please review the document above. Specifically:
549
+ 1. Are the patterns in Section 3 the right ones to follow, or should I use different ones?
550
+ 2. Do you agree with the resolved decisions in Section 4?
551
+ 3. Pick an option for each open question in Section 5 (or propose your own).
732
552
 
733
- [2-3 sentences: Why was this the right classification? What signals in the capture indicated this action level? What would have gone wrong with a different classification?]
553
+ Reply with your feedback. I will NOT proceed to decomposition until you have reviewed and approved this design.
734
554
  \`\`\`
735
555
 
736
- **Classification guide:**
737
- - \`interview\` \u2014 the capture was vague/exploratory and needed \`/joycraft-new-feature\` or \`/joycraft-interview\` to clarify
738
- - \`decompose\` \u2014 the capture was clear enough to go straight to \`/joycraft-decompose\`
739
- - \`execute\` \u2014 the capture mapped directly to an existing spec
740
- - \`research\` \u2014 the capture needed \`/joycraft-research\` before any implementation
741
- - \`design\` \u2014 the capture needed \`/joycraft-design\` before decomposition
556
+ **CRITICAL: Do NOT proceed to \`/joycraft-decompose\` or generate specs.** Wait for the human to review, answer open questions, and correct any wrong assumptions. The entire value of this skill is the pause \u2014 it forces a human checkpoint before mistakes propagate.
557
+
558
+ ## After Human Review
742
559
 
743
- Commit the golden example file along with other session artifacts. Do not mention it in the commit message or session report.
560
+ Once the human responds:
561
+ - Update the design document with their corrections and chosen options
562
+ - Move answered questions from "Open Questions" to "Resolved Design Decisions"
563
+ - Present the updated document for final confirmation
564
+ - Only after explicit approval, tell the user: "Design approved. Run \`/joycraft-decompose\` with this brief to generate atomic specs."
744
565
  `,
745
- "joycraft-tune.md": `---
746
- name: joycraft-tune
747
- description: Assess and upgrade your project's AI development harness \u2014 score 7 dimensions, apply fixes, show path to Level 5
748
- instructions: 15
566
+ "joycraft-implement-level5.md": `---
567
+ name: joycraft-implement-level5
568
+ description: Set up Level 5 autonomous development \u2014 autofix loop, holdout scenario testing, and scenario evolution from specs
569
+ instructions: 35
749
570
  ---
750
571
 
751
- # Tune \u2014 Project Harness Assessment & Upgrade
752
-
753
- You are evaluating and upgrading this project's AI development harness.
754
-
755
- ## Step 1: Detect Harness State
756
-
757
- Check for: CLAUDE.md (with meaningful content), \`docs/specs/\`, \`docs/briefs/\`, \`docs/discoveries/\`, \`.claude/skills/\`, and test configuration.
758
-
759
- ## Step 2: Route
760
-
761
- - **No harness** (no CLAUDE.md or just a README): Recommend \`npx joycraft init\` and stop.
762
- - **Harness exists**: Continue to assessment.
763
-
764
- ## Step 3: Assess \u2014 Score 7 Dimensions (1-5 scale)
765
-
766
- Read CLAUDE.md and explore the project. Score each with specific evidence:
572
+ # Implement Level 5 \u2014 Autonomous Development Loop
767
573
 
768
- | Dimension | What to Check |
769
- |-----------|--------------|
770
- | Spec Quality | \`docs/specs/\` \u2014 structured? acceptance criteria? self-contained? |
771
- | Spec Granularity | Can each spec be done in one session? |
772
- | Behavioral Boundaries | ALWAYS/ASK FIRST/NEVER sections (or equivalent rules under any heading) |
773
- | Skills & Hooks | \`.claude/skills/\` files, hooks config |
774
- | Documentation | \`docs/\` structure, templates, referenced from CLAUDE.md |
775
- | Knowledge Capture | \`docs/discoveries/\`, \`docs/context/*.md\` \u2014 existence AND real content |
776
- | Testing & Validation | Test framework, CI pipeline, validation commands in CLAUDE.md |
574
+ You are guiding the user through setting up Level 5: the autonomous feedback loop where specs go in, validated software comes out. This is a one-time setup that installs workflows, creates a scenarios repo, and configures the autofix loop.
777
575
 
778
- Score 1 = absent, 3 = partially there, 5 = comprehensive. Give credit for substance over format.
576
+ ## Before You Begin
779
577
 
780
- ## Step 4: Write Assessment
578
+ Check prerequisites:
781
579
 
782
- Write to \`docs/joycraft-assessment.md\` AND display it. Include: scores table, detailed findings (evidence + gap + recommendation per dimension), and an upgrade plan (up to 5 actions ordered by impact).
580
+ 1. **Project must be initialized.** Look for \`.joycraft-version\`. If missing, tell the user to run \`npx joycraft init\` first.
581
+ 2. **Project should be at Level 4.** Check \`docs/joycraft-assessment.md\` if it exists. If the project hasn't been assessed yet, suggest running \`/joycraft-tune\` first. But don't block \u2014 the user may know they're ready.
582
+ 3. **Git repo with GitHub remote.** This setup requires GitHub Actions. Check for \`.git/\` and a GitHub remote.
783
583
 
784
- ## Step 5: Apply Upgrades
584
+ If prerequisites aren't met, explain what's needed and stop.
785
585
 
786
- Apply using three tiers \u2014 do NOT ask per-item permission:
586
+ ## Step 1: Explain What Level 5 Means
787
587
 
788
- **Tier 1 (silent):** Create missing dirs, install missing skills, copy missing templates, create AGENTS.md.
588
+ Tell the user:
789
589
 
790
- **Before Tier 2, ask TWO things:**
590
+ > Level 5 is the autonomous loop. When you push specs, three things happen automatically:
591
+ >
592
+ > 1. **Scenario evolution** \u2014 A separate AI agent reads your specs and writes holdout tests in a private scenarios repo. These tests are invisible to your coding agent.
593
+ > 2. **Autofix** \u2014 When CI fails on a PR, Claude Code automatically attempts a fix (up to 3 times).
594
+ > 3. **Holdout validation** \u2014 When CI passes, your scenarios repo runs behavioral tests against the PR. Results post as PR comments.
595
+ >
596
+ > The key insight: your coding agent never sees the scenario tests. This prevents it from gaming the test suite \u2014 like a validation set in machine learning.
791
597
 
792
- 1. **Git autonomy:** Cautious (ask before push/PR) or Autonomous (push + PR without asking)?
793
- 2. **Risk interview (3-5 questions, one at a time):** What could break? What services connect to prod? Unwritten rules? Off-limits files/commands? Skip if \`docs/context/\` already has content.
598
+ ## Step 2: Gather Configuration
794
599
 
795
- From answers, generate: CLAUDE.md boundary rules, \`.claude/settings.json\` deny patterns, \`docs/context/\` documents. Also recommend a permission mode (\`auto\` for most; \`dontAsk\` + allowlist for high-risk).
600
+ Ask these questions **one at a time**:
796
601
 
797
- **Tier 2 (show diff):** Add missing CLAUDE.md sections (Boundaries, Workflow, Key Files). Draft from real codebase content. Append only \u2014 never reformat existing content.
602
+ ### Question 1: Scenarios repo name
798
603
 
799
- **Tier 3 (confirm first):** Rewriting existing sections, overwriting customized files, suggesting test framework installs.
604
+ > What should we call your scenarios repo? It'll be a private repo that holds your holdout tests.
605
+ >
606
+ > Default: \`{current-repo-name}-scenarios\`
800
607
 
801
- After applying, append to \`docs/joycraft-history.md\` and show a consolidated upgrade results table.
608
+ Accept the default or the user's choice.
802
609
 
803
- ## Step 6: Show Path to Level 5
610
+ ### Question 2: GitHub App
804
611
 
805
- Show a tailored roadmap: Level 2-5 table, specific next steps based on actual gaps, and the Level 5 north star (spec queue, autofix, holdout scenarios, self-improving harness).
612
+ > Level 5 needs a GitHub App to give autofix pushes their own identity: commits pushed with the default \`GITHUB_TOKEN\` don't trigger new workflow runs (GitHub's anti-recursion protection), so CI would never re-run on an autofix commit. Creating the app takes about 2 minutes:
613
+ >
614
+ > 1. Go to https://github.com/settings/apps/new
615
+ > 2. Give it a name (e.g., "My Project Autofix")
616
+ > 3. Uncheck "Webhook > Active" (not needed)
617
+ > 4. Under **Repository permissions**, set:
618
+ > - **Contents**: Read & Write
619
+ > - **Pull requests**: Read & Write
620
+ > - **Actions**: Read & Write
621
+ > 5. Click **Create GitHub App**
622
+ > 6. Note the **App ID** from the settings page
623
+ > 7. Scroll to **Private keys** > click **Generate a private key** > save the \`.pem\` file
624
+ > 8. Click **Install App** in the left sidebar > install it on your repo
625
+ >
626
+ > What's your App ID?
806
627
 
807
- ## Edge Cases
628
+ ## Step 3: Run init-autofix
808
629
 
809
- - **CLAUDE.md is just a README:** Treat as no harness.
810
- - **Non-Joycraft skills:** Acknowledge, don't replace.
811
- - **Rules under non-standard headings:** Give credit for substance.
812
- - **Previous assessment exists:** Read it first. If nothing to upgrade, say so.
813
- - **Non-Joycraft content in CLAUDE.md:** Preserve as-is. Only append.
814
- `,
815
- "joycraft-add-fact.md": `---
816
- name: joycraft-add-fact
817
- description: Capture a project fact and route it to the correct context document -- production map, dangerous assumptions, decision log, institutional knowledge, or troubleshooting
818
- instructions: 38
819
- ---
630
+ Run the CLI command with the gathered configuration:
820
631
 
821
- # Add Fact
632
+ \`\`\`bash
633
+ npx joycraft init-autofix --scenarios-repo {name} --app-id {id}
634
+ \`\`\`
822
635
 
823
- The user has a fact to capture. Your job is to classify it, route it to the correct context document, append it in the right format, and optionally add a CLAUDE.md boundary rule.
636
+ Review the output with the user. Confirm files were created.
824
637
 
825
- ## Step 1: Get the Fact
638
+ ## Step 4: Walk Through Secret Configuration
826
639
 
827
- If the user already provided the fact (e.g., \`/joycraft-add-fact the staging DB resets every Sunday\`), use it directly.
640
+ Guide the user step by step:
828
641
 
829
- If not, ask: "What fact do you want to capture?" -- then wait for their response.
642
+ ### 4a: Add Secrets to Main Repo
830
643
 
831
- If the user provides multiple facts at once, process each one separately through all the steps below, then give a combined confirmation at the end.
644
+ > You should already have the \`.pem\` file from when you created the app in Step 2.
832
645
 
833
- ## Step 2: Classify the Fact
646
+ > Go to your repo's Settings > Secrets and variables > Actions, and add:
647
+ > - \`JOYCRAFT_APP_PRIVATE_KEY\` \u2014 paste the contents of your \`.pem\` file
648
+ > - \`ANTHROPIC_API_KEY\` \u2014 your Anthropic API key
834
649
 
835
- Route the fact to one of these 5 context documents based on its content:
650
+ ### 4b: Create the Scenarios Repo
836
651
 
837
- ### \`docs/context/production-map.md\`
838
- The fact is about **infrastructure, services, environments, URLs, endpoints, credentials, or what is safe/unsafe to touch**.
839
- - Signal words: "production", "staging", "endpoint", "URL", "database", "service", "deployed", "hosted", "credentials", "secret", "environment"
840
- - Examples: "The staging DB is at postgres://staging.example.com", "We use Vercel for the frontend and Railway for the API"
652
+ > Create the private scenarios repo:
653
+ > \`\`\`bash
654
+ > # --clone checks out a local copy next to this repo, so the copy step below works
+ > (cd .. && gh repo create {scenarios-repo-name} --private --clone)
655
+ > \`\`\`
656
+ >
657
+ > Then copy the scenario templates into it:
658
+ > \`\`\`bash
659
+ > cp -r docs/templates/scenarios/* ../{scenarios-repo-name}/
660
+ > cd ../{scenarios-repo-name}
661
+ > git add -A && git commit -m "init: scaffold scenarios repo from Joycraft"
662
+ > git push
663
+ > \`\`\`
841
664
 
842
- ### \`docs/context/dangerous-assumptions.md\`
843
- The fact is about **something an AI agent might get wrong -- a false assumption that leads to bad outcomes**.
844
- - Signal words: "assumes", "might think", "but actually", "looks like X but is Y", "not what it seems", "trap", "gotcha"
845
- - Examples: "The \`users\` table looks like a test table but it's production", "Deleting a workspace doesn't delete the billing subscription"
665
+ ### 4c: Add Secrets to Scenarios Repo
846
666
 
847
- ### \`docs/context/decision-log.md\`
848
- The fact is about **an architectural or tooling choice and why it was made**.
849
- - Signal words: "decided", "chose", "because", "instead of", "we went with", "the reason we use", "trade-off"
850
- - Examples: "We chose SQLite over Postgres because this runs on embedded devices", "We use pnpm instead of npm for workspace support"
667
+ > The scenarios repo also needs the App private key:
668
+ > - \`JOYCRAFT_APP_PRIVATE_KEY\` \u2014 same \`.pem\` file as the main repo
669
+ > - \`ANTHROPIC_API_KEY\` \u2014 same key (needed for scenario generation)
851
670
 
852
- ### \`docs/context/institutional-knowledge.md\`
853
- The fact is about **team conventions, unwritten rules, organizational context, or who owns what**.
854
- - Signal words: "convention", "rule", "always", "never", "team", "process", "review", "approval", "owns", "responsible"
855
- - Examples: "The design team reviews all color changes", "We never deploy on Fridays", "PR titles must start with the ticket number"
671
+ ## Step 5: Verify Setup
856
672
 
857
- ### \`docs/context/troubleshooting.md\`
858
- The fact is about **diagnostic knowledge -- when X happens, do Y (or don't do Z)**.
859
- - Signal words: "when", "fails", "error", "if you see", "stuck", "broken", "fix", "workaround", "before trying", "reboot", "restart", "reset"
860
- - Examples: "If Wi-Fi disconnects during flash, wait and retry -- don't switch networks", "When tests fail with ECONNREFUSED, check if Docker is running"
673
+ Help the user verify everything is wired correctly:
861
674
 
862
- ### Ambiguous Facts
675
+ 1. **Check workflow files exist:** \`ls .github/workflows/autofix.yml .github/workflows/scenarios-dispatch.yml .github/workflows/spec-dispatch.yml .github/workflows/scenarios-rerun.yml\`
676
+ 2. **Check scenario templates were copied:** Verify the scenarios repo has \`example-scenario.test.ts\`, \`workflows/run.yml\`, \`workflows/generate.yml\`, \`prompts/scenario-agent.md\`
677
+ 3. **Check the App ID is correct** in the workflow files (not still a placeholder)
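The checks above can be collapsed into a quick shell pass. This is a sketch, assuming the default workflow file names from the checklist; adjust if yours differ:

```shell
# Check the four Level 5 workflow files (names from the checklist above)
for f in autofix.yml scenarios-dispatch.yml spec-dispatch.yml scenarios-rerun.yml; do
  if [ -f ".github/workflows/$f" ]; then
    echo "ok: $f"
  else
    echo "MISSING: $f"
  fi
done
```

The scenario-template and App ID checks still need a manual look at the scenarios repo and workflow contents.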
863
678
 
864
- If the fact fits multiple categories, pick the **best fit** based on the primary intent. You will mention the alternative in your confirmation message so the user can correct you.
679
+ ## Step 6: Update CLAUDE.md
865
680
 
866
- ## Step 3: Ensure the Target Document Exists
681
+ If the project's CLAUDE.md doesn't already have an "External Validation" section, add one:
867
682
 
868
- 1. If \`docs/context/\` does not exist, create the directory.
869
- 2. If the target document does not exist, create it from the template structure. Check \`docs/templates/\` for the matching template. If no template exists, use this minimal structure:
683
+ > ## External Validation
684
+ >
685
+ > This project uses holdout scenario tests in a separate private repo.
686
+ >
687
+ > ### NEVER
688
+ > - Access, read, or reference the scenarios repo
689
+ > - Mention scenario test names or contents
690
+ > - Modify the scenarios dispatch workflow to leak test information
691
+ >
692
+ > The scenarios repo is deliberately invisible to you. This is the holdout guarantee.
870
693
 
871
- For **production-map.md**:
872
- \`\`\`markdown
873
- # Production Map
694
+ ## Step 7: First Test (Optional)
874
695
 
875
- > What's real, what's staging, what's safe to touch.
696
+ If the user wants to test the loop:
876
697
 
877
- ## Services
698
+ > Want to do a quick test? Here's how:
699
+ >
700
+ > 1. Write a simple spec in \`docs/specs/\` and push to main \u2014 this triggers scenario generation
701
+ > 2. Create a PR with a small change \u2014 when CI passes, scenarios will run
702
+ > 3. Watch for the scenario test results as a PR comment
703
+ >
704
+ > Or deliberately break something in a PR to test the autofix loop.
878
705
 
879
- | Service | Environment | URL/Endpoint | Impact if Corrupted |
880
- |---------|-------------|-------------|-------------------|
881
- \`\`\`
706
+ ## Step 8: Summary
882
707
 
883
- For **dangerous-assumptions.md**:
884
- \`\`\`markdown
885
- # Dangerous Assumptions
708
+ Print a summary of what was set up:
886
709
 
887
- > Things the AI agent might assume that are wrong in this project.
710
+ > **Level 5 is live.** Here's what's running:
711
+ >
712
+ > | Trigger | What Happens |
713
+ > |---------|-------------|
714
+ > | Push specs to \`docs/specs/\` | Scenario agent writes holdout tests |
715
+ > | PR fails CI | Claude autofix attempts (up to 3x) |
716
+ > | PR passes CI | Holdout scenarios run against PR |
717
+ > | Scenarios update | Open PRs re-tested with latest scenarios |
718
+ >
719
+ > Your scenarios repo: \`{name}\`
720
+ > Your coding agent cannot see those tests. The holdout wall is intact.
888
721
 
889
- ## Assumptions
722
+ Update \`docs/joycraft-assessment.md\` if it exists \u2014 set the Level 5 score to reflect the new setup.
723
+ `,
724
+ "joycraft-interview.md": `---
725
+ name: joycraft-interview
726
+ description: Brainstorm freely about what you want to build \u2014 yap, explore ideas, and get a structured summary you can use later
727
+ instructions: 18
728
+ ---
890
729
 
891
- | Agent Might Assume | But Actually | Impact If Wrong |
892
- |-------------------|-------------|----------------|
893
- \`\`\`
730
+ # Interview \u2014 Idea Exploration
894
731
 
895
- For **decision-log.md**:
896
- \`\`\`markdown
897
- # Decision Log
732
+ You are helping the user brainstorm and explore what they want to build. This is a lightweight, low-pressure conversation \u2014 not a formal spec process. Let them yap.
898
733
 
899
- > Why choices were made, not just what was chosen.
734
+ ## How to Run the Interview
900
735
 
901
- ## Decisions
736
+ ### 1. Open the Floor
902
737
 
903
- | Date | Decision | Why | Alternatives Rejected | Revisit When |
904
- |------|----------|-----|----------------------|-------------|
905
- \`\`\`
738
+ Start with something like:
739
+ "What are you thinking about building? Just talk \u2014 I'll listen and ask questions as we go."
906
740
 
907
- For **institutional-knowledge.md**:
908
- \`\`\`markdown
909
- # Institutional Knowledge
741
+ Let the user talk freely. Do not interrupt their flow. Do not push toward structure yet.
910
742
 
911
- > Unwritten rules, team conventions, and organizational context.
743
+ ### 2. Ask Clarifying Questions
912
744
 
913
- ## Team Conventions
745
+ As they talk, weave in questions naturally \u2014 don't fire them all at once:
914
746
 
915
- - (none yet)
916
- \`\`\`
747
+ - **What problem does this solve?** Who feels the pain today?
748
+ - **What does "done" look like?** If this worked perfectly, what would a user see?
749
+ - **What are the constraints?** Time, tech, team, budget \u2014 what boxes are we in?
750
+ - **What's NOT in scope?** What's tempting but should be deferred?
751
+ - **What are the edge cases?** What could go wrong? What's the weird input?
752
+ - **What exists already?** Are we building on something or starting fresh?
917
753
 
918
- For **troubleshooting.md**:
919
- \`\`\`markdown
920
- # Troubleshooting
754
+ ### 3. Play Back Understanding
921
755
 
922
- > What to do when things go wrong for non-code reasons.
756
+ After the user has gotten their ideas out, reflect back:
757
+ "So if I'm hearing you right, you want to [summary]. The core problem is [X], and done looks like [Y]. Is that right?"
923
758
 
924
- ## Common Failures
759
+ Let them correct and refine. Iterate until they say "yes, that's it."
925
760
 
926
- | When This Happens | Do This | Don't Do This |
927
- |-------------------|---------|---------------|
928
- \`\`\`
761
+ ### 4. Write a Draft Brief
929
762
 
930
- ## Step 4: Read the Target Document
763
+ Create a draft file at \`docs/briefs/YYYY-MM-DD-topic-draft.md\`. Create the \`docs/briefs/\` directory if it doesn't exist.
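Building that dated path can be sketched in shell; the topic slug here is a placeholder, not a required name:

```shell
# Build the dated draft path ("my-topic" is a placeholder slug)
topic="my-topic"
mkdir -p docs/briefs
draft="docs/briefs/$(date +%Y-%m-%d)-${topic}-draft.md"
echo "$draft"
```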
931
764
 
932
- Read the target document to understand its current structure. Note:
933
- - Which section to append to
934
- - Whether it uses tables or lists
935
- - The column format if it's a table
765
+ Use this format:
936
766
 
937
- ## Step 5: Append the Fact
767
+ \`\`\`markdown
768
+ # [Topic] \u2014 Draft Brief
938
769
 
939
- Add the fact to the appropriate section of the target document. Match the existing format exactly:
770
+ > **Date:** YYYY-MM-DD
771
+ > **Status:** DRAFT
772
+ > **Origin:** /joycraft-interview session
940
773
 
941
- - **Table-based documents** (production-map, dangerous-assumptions, decision-log, troubleshooting): Add a new table row in the correct columns. Use today's date where a date column exists.
942
- - **List-based documents** (institutional-knowledge): Add a new list item (\`- \`) to the most appropriate section.
774
+ ---
943
775
 
944
- Remove any italic example rows (rows where all cells start with \`_\`) before appending, so the document transitions from template to real content. Only remove examples from the specific table you are appending to.
776
+ ## The Idea
777
+ [2-3 paragraphs capturing what the user described \u2014 their words, their framing]
945
778
 
946
- **Append only. Never modify or remove existing real content.**
779
+ ## Problem
780
+ [What pain or gap this addresses]
947
781
 
948
- ## Step 6: Evaluate CLAUDE.md Boundary Rule
782
+ ## What "Done" Looks Like
783
+ [The user's description of success \u2014 observable outcomes]
949
784
 
950
- Decide whether the fact also warrants a rule in CLAUDE.md's behavioral boundaries:
785
+ ## Constraints
786
+ - [constraint 1]
787
+ - [constraint 2]
951
788
 
952
- **Add a CLAUDE.md rule if the fact:**
953
- - Describes something that should ALWAYS or NEVER be done
954
- - Could cause real damage if violated (data loss, broken deployments, security issues)
955
- - Is a hard constraint that applies across all work, not just a one-time note
789
+ ## Open Questions
790
+ - [things that came up but weren't resolved]
791
+ - [decisions that need more thought]
956
792
 
957
- **Do NOT add a CLAUDE.md rule if the fact is:**
958
- - Purely informational (e.g., "staging DB is at this URL")
959
- - A one-time decision that's already captured
960
- - A diagnostic tip rather than a prohibition
793
+ ## Out of Scope (for now)
794
+ - [things explicitly deferred]
961
795
 
962
- If a rule is warranted, read CLAUDE.md, find the appropriate section (ALWAYS, ASK FIRST, or NEVER under Behavioral Boundaries), and append the rule. If no Behavioral Boundaries section exists, append one.
796
+ ## Raw Notes
797
+ [Any additional context, quotes, or tangents worth preserving]
798
+ \`\`\`
799
+
800
+ ### 5. Hand Off
963
801
 
964
- ## Step 7: Confirm
802
+ After writing the draft, tell the user:
965
803
 
966
- Report what you did in this format:
804
+ \`\`\`
805
+ Draft brief saved to docs/briefs/YYYY-MM-DD-topic-draft.md
967
806
 
807
+ When you're ready to move forward:
808
+ - /joycraft-new-feature \u2014 formalize this into a full Feature Brief with specs
809
+ - /joycraft-decompose \u2014 break it directly into atomic specs if scope is clear
810
+ - Or just keep brainstorming \u2014 run /joycraft-interview again anytime
968
811
  \`\`\`
969
- Added to [document name]:
970
- [summary of what was added]
971
812
 
972
- [If CLAUDE.md was also updated:]
973
- Added CLAUDE.md rule:
974
- [ALWAYS/ASK FIRST/NEVER]: [rule text]
813
+ ## Guidelines
975
814
 
976
- [If the fact was ambiguous:]
977
- Routed to [chosen doc] -- move to [alternative doc] if this is more about [alternative category description].
978
- \`\`\`
815
+ - **This is NOT /joycraft-new-feature.** Do not push toward formal briefs, decomposition tables, or atomic specs. The point is exploration.
816
+ - **Let the user lead.** Your job is to listen, clarify, and capture \u2014 not to structure or direct.
817
+ - **Mark everything as DRAFT.** The output is a starting point, not a commitment.
818
+ - **Keep it short.** The draft brief should be 1-2 pages max. Capture the essence, not every detail.
819
+ - **Multiple interviews are fine.** The user might run this several times as their thinking evolves. Each creates a new dated draft.
820
+
821
+ **Tip:** Run \`/clear\` before starting the next step. Your artifacts are saved to files \u2014 this conversation context is disposable.
979
822
  `,
980
823
  "joycraft-lockdown.md": `---
981
824
  name: joycraft-lockdown
@@ -1104,505 +947,716 @@ If the user asks you to apply the changes:
1104
947
 
1105
948
  **Never auto-apply. Always show the exact changes and wait for explicit approval.**
1106
949
  `,
1107
- "joycraft-verify.md": `---
1108
- name: joycraft-verify
1109
- description: Spawn an independent verifier subagent to check an implementation against its spec -- read-only, no code edits, structured pass/fail verdict
1110
- instructions: 30
950
+ "joycraft-new-feature.md": `---
951
+ name: joycraft-new-feature
952
+ description: Guided feature development \u2014 interview the user, produce a Feature Brief, then decompose into atomic specs
953
+ instructions: 35
1111
954
  ---
1112
955
 
1113
- # Verify Implementation Against Spec
956
+ # New Feature Workflow
1114
957
 
1115
- The user wants independent verification of an implementation. Your job is to find the relevant spec, extract its acceptance criteria and test plan, then spawn a separate verifier subagent that checks each criterion and produces a structured verdict.
958
+ You are starting a new feature. Follow this process in order. Do not skip steps.
1116
959
 
1117
- **Why a separate subagent?** Anthropic's research found that agents reliably skew positive when grading their own work. Separating the agent doing the work from the agent judging it consistently outperforms self-evaluation. The verifier gets a clean context window with no implementation bias.
960
+ ## Phase 1: Interview
1118
961
 
1119
- ## Step 1: Find the Spec
962
+ Interview the user about what they want to build. Let them talk \u2014 your job is to listen, then sharpen.
1120
963
 
1121
- If the user provided a spec path (e.g., \`/joycraft-verify docs/specs/2026-03-26-add-widget.md\`), use that path directly.
964
+ **Ask about:**
965
+ - What problem does this solve? Who is affected?
966
+ - What does "done" look like?
967
+ - Hard constraints? (business rules, tech limitations, deadlines)
968
+ - What is explicitly NOT in scope? (push hard on this)
969
+ - Edge cases or error conditions?
970
+ - What existing code/patterns should this follow?
971
+ - Testing: existing setup? framework? smoke test budget? lockdown mode desired?
1122
972
 
1123
- If no path was provided, scan \`docs/specs/\` for spec files. Pick the most recently modified \`.md\` file in that directory. If \`docs/specs/\` doesn't exist or is empty, tell the user:
973
+ **Interview technique:**
974
+ - Let the user "yap" \u2014 don't interrupt their flow
975
+ - Play back your understanding: "So if I'm hearing you right..."
976
+ - Push toward testable statements: "How would we verify that works?"
1124
977
 
1125
- > No specs found in \`docs/specs/\`. Please provide a spec path: \`/joycraft-verify path/to/spec.md\`
978
+ Keep asking until you can fill out a Feature Brief.
1126
979
 
1127
- ## Step 2: Read and Parse the Spec
980
+ ## Phase 2: Feature Brief
1128
981
 
1129
- Read the spec file and extract:
982
+ Write a Feature Brief to \`docs/briefs/YYYY-MM-DD-feature-name.md\`. Create the \`docs/briefs/\` directory if it doesn't exist.
1130
983
 
1131
- 1. **Spec name** -- from the H1 title
1132
- 2. **Acceptance Criteria** -- the checklist under the \`## Acceptance Criteria\` section
1133
- 3. **Test Plan** -- the table under the \`## Test Plan\` section, including any test commands
1134
- 4. **Constraints** -- the \`## Constraints\` section if present
984
+ **Why:** The brief is the single source of truth for what we're building. It prevents scope creep and gives every spec a shared reference point.
1135
985
 
1136
- If the spec has no Acceptance Criteria section, tell the user:
986
+ Use this structure:
1137
987
 
1138
- > This spec doesn't have an Acceptance Criteria section. Verification needs criteria to check against. Add acceptance criteria to the spec and try again.
988
+ \`\`\`markdown
989
+ # [Feature Name] \u2014 Feature Brief
1139
990
 
1140
- If the spec has no Test Plan section, note this but proceed -- the verifier can still check criteria by reading code and running any available project tests.
991
+ > **Date:** YYYY-MM-DD
992
+ > **Project:** [project name]
993
+ > **Status:** Interview | Decomposing | Specs Ready | In Progress | Complete
1141
994
 
1142
- ## Step 3: Identify Test Commands
995
+ ---
1143
996
 
1144
- Look for test commands in these locations (in priority order):
997
+ ## Vision
998
+ What are we building and why? The full picture in 2-4 paragraphs.
1145
999
 
1146
- 1. The spec's Test Plan section (look for commands in backticks or "Type" column entries like "unit", "integration", "e2e", "build")
1147
- 2. The project's CLAUDE.md (look for test/build commands in the Development Workflow section)
1148
- 3. Common defaults based on the project type:
1149
- - Node.js: \`npm test\` or \`pnpm test --run\`
1150
- - Python: \`pytest\`
1151
- - Rust: \`cargo test\`
1152
- - Go: \`go test ./...\`
1000
+ ## User Stories
1001
+ - As a [role], I want [capability] so that [benefit]
1153
1002
 
1154
- Build a list of specific commands the verifier should run.
1003
+ ## Hard Constraints
1004
+ - MUST: [constraint that every spec must respect]
1005
+ - MUST NOT: [prohibition that every spec must respect]
1155
1006
 
1156
- ## Step 4: Spawn the Verifier Subagent
1007
+ ## Out of Scope
1008
+ - NOT: [tempting but deferred]
1157
1009
 
1158
- Use Claude Code's Agent tool to spawn a subagent with the following prompt. Replace the placeholders with the actual content extracted in Steps 2-3.
1010
+ ## Test Strategy
1011
+ - **Existing setup:** [framework and tools, or "none yet"]
1012
+ - **User expertise:** [comfortable / learning / needs guidance]
1013
+ - **Test types:** [smoke, unit, integration, e2e, etc.]
1014
+ - **Smoke test budget:** [target time for fast-feedback tests]
1015
+ - **Lockdown mode:** [yes/no \u2014 constrain agent to code + tests only]
1016
+
1017
+ ## Decomposition
1018
+ | # | Spec Name | Description | Dependencies | Est. Size |
1019
+ |---|-----------|-------------|--------------|-----------|
1020
+ | 1 | [verb-object] | [one sentence] | None | [S/M/L] |
1021
+
1022
+ ## Execution Strategy
1023
+ - [ ] Sequential (specs have chain dependencies)
1024
+ - [ ] Parallel worktrees (specs are independent)
1025
+ - [ ] Mixed
1159
1026
 
1027
+ ## Success Criteria
1028
+ - [ ] [End-to-end behavior 1]
1029
+ - [ ] [No regressions in existing features]
1160
1030
  \`\`\`
1161
- You are a QA verifier. Your job is to independently verify an implementation against its spec. You have NO context about how the implementation was done -- you are checking it fresh.
1162
1031
 
1163
- RULES -- these are hard constraints, not suggestions:
1164
- - You may READ any file using the Read tool or cat
1165
- - You may RUN these specific test/build commands: [TEST_COMMANDS]
1166
- - You may NOT edit, create, or delete any files
1167
- - You may NOT run commands that modify state (no git commit, no npm install, no file writes)
1168
- - You may NOT install packages or access the network
1169
- - Report what you OBSERVE, not what you expect or hope
1032
+ If \`docs/templates/FEATURE_BRIEF_TEMPLATE.md\` exists, reference it for the full template with additional guidance.
1170
1033
 
1171
- SPEC NAME: [SPEC_NAME]
1034
+ Present the brief to the user. Focus review on:
1035
+ - "Does the decomposition match how you think about this?"
1036
+ - "Is anything in scope that shouldn't be?"
1037
+ - "Are the specs small enough? Can each be described in one sentence?"
1172
1038
 
1173
- ACCEPTANCE CRITERIA:
1174
- [ACCEPTANCE_CRITERIA]
1039
+ Iterate until approved.
1175
1040
 
1176
- TEST PLAN:
1177
- [TEST_PLAN]
1041
+ ## Phase 3: Generate Atomic Specs
1178
1042
 
1179
- CONSTRAINTS:
1180
- [CONSTRAINTS_OR_NONE]
1043
+ For each row in the decomposition table, create a self-contained spec file at \`docs/specs/<feature-name>/spec-name.md\`. Derive the feature-name from the brief filename (strip the date prefix and \`.md\` \u2014 e.g., \`2026-04-06-token-discipline.md\` \u2192 \`token-discipline\`). Create the \`docs/specs/<feature-name>/\` directory if it doesn't exist.
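The filename derivation described above can be sketched in plain POSIX shell, using the example filename from the text:

```shell
# Derive the feature name: strip ".md", then the "YYYY-MM-DD-" date prefix
brief="2026-04-06-token-discipline.md"
name="${brief%.md}"
name="${name#????-??-??-}"
echo "$name"            # token-discipline
mkdir -p "docs/specs/$name"
```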
1181
1044
 
1182
- YOUR TASK:
1183
- For each acceptance criterion, determine if it PASSES or FAILS based on evidence:
1045
+ **Why:** Each spec must be understandable WITHOUT reading the Feature Brief. This prevents the "Curse of Instructions" \u2014 no spec should require holding the entire feature in context. Copy relevant context into each spec.
1184
1046
 
1185
- 1. Run the test commands listed above. Record the output.
1186
- 2. For each acceptance criterion:
1187
- a. Check if there is a corresponding test and whether it passes
1188
- b. If no test exists, read the relevant source files to verify the criterion is met
1189
- c. If the criterion cannot be verified by reading code or running tests, mark it MANUAL CHECK NEEDED
1190
- 3. For criteria about build/test passing, actually run the commands and report results.
1047
+ Use this structure for each spec:
1191
1048
 
1192
- OUTPUT FORMAT -- you MUST use this exact format:
1049
+ \`\`\`markdown
1050
+ # [Verb + Object] \u2014 Atomic Spec
1193
1051
 
1194
- VERIFICATION REPORT
1052
+ > **Parent Brief:** \`docs/briefs/YYYY-MM-DD-feature-name.md\`
1053
+ > **Status:** Ready
1054
+ > **Date:** YYYY-MM-DD
1055
+ > **Estimated scope:** [1 session / N files / ~N lines]
1195
1056
 
1196
- | # | Criterion | Verdict | Evidence |
1197
- |---|-----------|---------|----------|
1198
- | 1 | [criterion text] | PASS/FAIL/MANUAL CHECK NEEDED | [what you observed] |
1199
- | 2 | [criterion text] | PASS/FAIL/MANUAL CHECK NEEDED | [what you observed] |
1200
- [continue for all criteria]
1057
+ ---
1201
1058
 
1202
- SUMMARY: X/Y criteria passed. [Z failures need attention. / All criteria verified.]
1059
+ ## What
1060
+ One paragraph \u2014 what changes when this spec is done?
1203
1061
 
1204
- If any test commands fail to run (missing dependencies, wrong command, etc.), report the error as evidence for a FAIL verdict on the relevant criterion.
1062
+ ## Why
1063
+ One sentence \u2014 what breaks or is missing without this?
1064
+
1065
+ ## Acceptance Criteria
1066
+ - [ ] [Observable behavior]
1067
+ - [ ] Build passes
1068
+ - [ ] Tests pass
1069
+
1070
+ ## Test Plan
1071
+
1072
+ | Acceptance Criterion | Test | Type |
1073
+ |---------------------|------|------|
1074
+ | [Each AC above] | [What to call/assert] | [unit/integration/e2e] |
1075
+
1076
+ **Execution order:**
1077
+ 1. Write all tests above \u2014 they should fail against current/stubbed code
1078
+ 2. Run tests to confirm they fail (red)
1079
+ 3. Implement until all tests pass (green)
1080
+
1081
+ **Smoke test:** [Identify the fastest test for iteration feedback]
1082
+
1083
+ **Before implementing, verify your test harness:**
1084
+ 1. Run all tests \u2014 they must FAIL (if they pass, you're testing the wrong thing)
1085
+ 2. Each test calls your actual function/endpoint \u2014 not a reimplementation or the underlying library
1086
+ 3. Identify your smoke test \u2014 it must run in seconds, not minutes, so you get fast feedback on each change
1087
+
1088
+ ## Constraints
1089
+ - MUST: [hard requirement]
1090
+ - MUST NOT: [hard prohibition]
1091
+
1092
+ ## Affected Files
1093
+ | Action | File | What Changes |
1094
+ |--------|------|-------------|
1095
+
1096
+ ## Approach
1097
+ Strategy, data flow, key decisions. Name one rejected alternative.
1098
+
1099
+ ## Edge Cases
1100
+ | Scenario | Expected Behavior |
1101
+ |----------|------------------|
1205
1102
  \`\`\`
1206
1103
 
1207
- ## Step 5: Format and Present the Verdict
1104
+ If \`docs/templates/ATOMIC_SPEC_TEMPLATE.md\` exists, reference it for the full template with additional guidance.
1208
1105
 
1209
- Take the subagent's response and present it to the user in this format:
1106
+ ## Phase 4: Hand Off for Execution
1210
1107
 
1108
+ Tell the user:
1211
1109
  \`\`\`
1212
- ## Verification Report -- [Spec Name]
1110
+ Feature Brief and [N] atomic specs are ready.
1213
1111
 
1214
- | # | Criterion | Verdict | Evidence |
1215
- |---|-----------|---------|----------|
1216
- | 1 | ... | PASS | ... |
1217
- | 2 | ... | FAIL | ... |
1112
+ Specs:
1113
+ 1. [spec-name] \u2014 [one sentence] [S/M/L]
1114
+ 2. [spec-name] \u2014 [one sentence] [S/M/L]
1115
+ ...
1116
+
1117
+ Recommended execution:
1118
+ - [Parallel/Sequential/Mixed strategy]
1119
+ - Estimated: [N] sessions total
1120
+
1121
+ To execute: Start a fresh session per spec. Each session should:
1122
+ 1. Read the spec
1123
+ 2. Implement
1124
+ 3. Run /joycraft-session-end to capture discoveries
1125
+ 4. Commit and PR
1126
+
1127
+ Ready to start?
1128
+ \`\`\`
1129
+
1130
+ **Why:** A fresh session for execution produces better results. The interview session has too much context noise \u2014 a clean session with just the spec is more focused.
1131
+
1132
+ You can also use \`/joycraft-decompose\` to re-decompose a brief if the breakdown needs adjustment, or run \`/joycraft-interview\` first for a lighter brainstorm before committing to the full workflow.
1133
+
1134
+ **Tip:** Run \`/clear\` before starting the next step. Your artifacts are saved to files \u2014 this conversation context is disposable.
1135
+ `,
1136
+ "joycraft-optimize.md": `---
1137
+ name: joycraft-optimize
1138
+ description: Audit your Claude Code or Codex session overhead \u2014 harness file sizes, plugins, MCP servers, hooks \u2014 and report actionable recommendations
1139
+ instructions: 20
1140
+ ---
1141
+
1142
+ # Optimize \u2014 Session Overhead Audit
1143
+
1144
+ You are auditing the user's AI development session for token overhead. Produce a conversational diagnostic report \u2014 no files created.
1145
+
1146
+ ## Step 1: Detect Platform
1147
+
1148
+ Check which platform is active:
1149
+ - **Claude Code:** Look for \`.claude/\` directory, \`CLAUDE.md\`
1150
+ - **Codex:** Look for \`.agents/\` directory, \`AGENTS.md\`
1151
+
1152
+ If both exist, run both checks. If neither, default to Claude Code checks and note the uncertainty.
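The detection can be sketched as a small shell check, using the marker files listed above:

```shell
# Report which platform markers are present; "none" means neither was found
platform=""
{ [ -d .claude ] || [ -f CLAUDE.md ]; } && platform="claude-code" || true
{ [ -d .agents ] || [ -f AGENTS.md ]; } && platform="${platform:+$platform+}codex" || true
echo "${platform:-none (default to Claude Code checks)}"
```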
1153
+
1154
+ ## Step 2: Audit Harness Files
1155
+
1156
+ ### Claude Code Path
1157
+
1158
+ 1. **CLAUDE.md** \u2014 count lines. Threshold: \u2264200 lines.
1159
+ 2. **Skill files** \u2014 glob \`.claude/skills/**/*.md\`. Count lines per file. Threshold: \u2264200 lines each.
1160
+
1161
+ ### Codex Path
1162
+
1163
+ 1. **AGENTS.md** \u2014 count lines. Threshold: \u2264200 lines.
1164
+ 2. **Skill files** \u2014 glob \`.agents/skills/**/*.md\`. Count lines per file. Threshold: \u2264200 lines each.
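Both paths reduce to the same line-count pass. A sketch that flags anything over the 200-line threshold, using the directories named above:

```shell
# Count harness file lines, then warn on any skill file over 200 lines
for f in CLAUDE.md AGENTS.md; do
  [ -f "$f" ] && wc -l "$f" || true
done
for dir in .claude/skills .agents/skills; do
  [ -d "$dir" ] || continue
  find "$dir" -name '*.md' -exec wc -l {} + \
    | awk '$2 != "total" && $1 > 200 {print "WARN over 200 lines:", $2}'
done
```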
1165
+
1166
+ ## Step 3: Audit Plugins & MCP Servers
1167
+
1168
+ ### Claude Code Path
1169
+
1170
+ 1. **Installed plugins** \u2014 read \`~/.claude/plugins/installed_plugins.json\`. List plugin names and versions. If not found, report "no plugins file found."
1171
+ 2. **Enabled plugins** \u2014 read \`~/.claude/settings.json\`, check \`enabledPlugins\` array. Show enabled vs installed count.
1172
+ 3. **MCP servers** \u2014 read \`~/.claude/settings.json\`, count entries under \`mcpServers\`. List server names.
1173
+
1174
+ ### Codex Path
1175
+
1176
+ 1. **Plugin config** \u2014 read \`~/.codex/config.toml\`. List any plugin toggles. Note: Codex syncs its curated plugin marketplace at startup \u2014 this is a boot cost even if you don't use any of the plugins.
1177
+ 2. **MCP servers** \u2014 check \`~/.codex/config.toml\` for MCP server entries. List server names.
1178
+
1179
+ ## Step 4: Audit Hooks (Claude Code Only)
1180
+
1181
+ Read \`.claude/settings.json\` in the project directory. List all hook definitions under the \`hooks\` key \u2014 show the event name and command for each.
1218
1182
 
1219
- **Overall: X/Y criteria passed.**
1183
+ For Codex: note "hook auditing not yet supported on Codex."
1220
1184
 
1221
- [If all passed:]
1222
- All criteria verified. Ready to commit and open a PR.
1185
+ ## Step 5: Report
1223
1186
 
1224
- [If any failed:]
1225
- N failures need attention. Review the evidence above and fix before proceeding.
1187
+ Organize findings by category. Use pass/warn indicators:
1226
1188
 
1227
- [If any MANUAL CHECK NEEDED:]
1228
- N criteria need manual verification -- they can't be checked by reading code or running tests alone.
1229
1189
  \`\`\`
1190
+ ## Session Overhead Report
1230
1191
 
1231
- ## Step 6: Suggest Next Steps
1192
+ ### Harness Files
1193
+ - CLAUDE.md: [N] lines [PASS \u2264200 / WARN >200]
1194
+ - Skills: [N] files, [list any over 200 lines]
1232
1195
 
1233
- Based on the verdict:
1196
+ ### Plugins
1197
+ - Installed: [N] ([list names])
1198
+ - Enabled: [N] of [M] installed
1199
+ - [If 0: "No plugins \u2014 zero boot cost from plugins."]
1234
1200
 
1235
- - **All PASS:** Suggest committing and opening a PR, or running \`/joycraft-session-end\` to capture discoveries.
1236
- - **Some FAIL:** List the failed criteria and suggest the user fix them, then run \`/joycraft-verify\` again.
1237
- - **MANUAL CHECK NEEDED items:** Explain what needs human eyes and why automation couldn't verify it.
1201
+ ### MCP Servers
1202
+ - Count: [N] ([list names])
1203
+ - [If 0: "No MCP servers \u2014 zero boot cost from servers."]
1238
1204
 
1239
- **Do NOT offer to fix failures yourself.** The verifier reports; the human (or implementation agent in a separate turn) decides what to do. This separation is the whole point.
1205
+ ### Hooks
1206
+ - [N] hook definitions ([list event names])
1207
+
1208
+ ### Recommendations
1209
+ - [Specific, actionable items for anything over threshold]
1210
+ - [e.g., "CLAUDE.md is 312 lines \u2014 consider splitting reference sections into docs/"]
1211
+ - [e.g., "3 MCP servers load at boot \u2014 disable unused ones in settings.json"]
1212
+ \`\`\`
1213
+
1214
+ ## Step 6: Further Resources
1215
+
1216
+ End with:
1217
+
1218
+ > For deeper token optimization, see:
1219
+ > - [Nate B Jones's token optimization techniques](https://www.youtube.com/watch?v=bDcgHzCBgmQ)
1220
+ > - [OB1 repo](https://github.com/nate-b-j/OB1) \u2014 Heavy File Ingestion skill and stupid button prompt kit
1221
+ > - [Joycraft's token discipline guide](docs/guides/token-discipline.md)
1240
1222
 
1241
1223
  ## Edge Cases
1242
1224
 
1243
1225
  | Scenario | Behavior |
1244
1226
  |----------|----------|
1245
- | Spec has no Test Plan | Warn that verification is weaker without a test plan, but proceed by checking criteria through code reading and any available project-level tests |
1246
- | All tests pass but a criterion is not testable | Mark as MANUAL CHECK NEEDED with explanation |
1247
- | Subagent can't run tests (missing deps) | Report the error as FAIL evidence |
1248
- | No specs found and no path given | Tell user to provide a spec path or create a spec first |
1249
- | Spec status is "Complete" | Still run verification -- "Complete" means the implementer thinks it's done, verification confirms |
1227
+ | Config files don't exist | Report "not found" for that check, don't error |
1228
+ | No plugins installed | Report 0 plugins \u2014 this is good, say so |
1229
+ | CLAUDE.md/AGENTS.md exactly 200 lines | PASS \u2014 threshold is \u2264200 |
1230
+ | \`~/.claude/\` or \`~/.codex/\` not accessible | Skip user-level checks, note limitation |
1231
+ | Both platforms detected | Run both audits, report separately |
1250
1232
  `,
1251
- "joycraft-bugfix.md": `---
1252
- name: joycraft-bugfix
1253
- description: Structured bug fix workflow \u2014 triage, diagnose, discuss with user, write a focused spec, hand off for implementation
1254
- instructions: 32
1233
+ "joycraft-research.md": `---
1234
+ name: joycraft-research
1235
+ description: Produce objective codebase research by isolating question generation from fact-gathering \u2014 subagent sees only questions, never the brief
1255
1236
  ---
1256
1237
 
1257
- # Bug Fix Workflow
1238
+ # Research Codebase for a Feature
1258
1239
 
1259
- You are fixing a bug. Follow this process in order. Do not skip steps.
1240
+ You are producing objective codebase research to inform a future spec or implementation. The key insight: the researching agent must never see the brief or ticket \u2014 only research questions. This prevents opinions from contaminating the facts.
1260
1241
 
1261
- **Guard clause:** If this is clearly a new feature, redirect to \`/joycraft-new-feature\` and stop.
1242
+ **Guard clause:** If the user doesn't provide a brief path or inline description, ask:
1243
+ "What feature or change are you researching? Provide a brief path (e.g., \`docs/briefs/2026-03-30-my-feature.md\`) or describe it in a few sentences."
1262
1244
 
1263
1245
  ---
1264
1246
 
1265
- ## Phase 1: Triage
1247
+ ## Phase 1: Generate Research Questions
1266
1248
 
1267
- Establish what's broken. Gather: symptom, steps to reproduce, expected vs actual behavior, when it started, relevant logs/errors. If an error message or stack trace is provided, read the referenced files immediately. Try to reproduce if steps are given.
1249
+ Read the brief file (if a path was provided) or use the user's inline description.
1268
1250
 
1269
- **Done when:** You can describe the symptom in one sentence.
1251
+ Identify which zones of the codebase are relevant to this feature. Then generate 5-10 research questions that are:
1270
1252
 
1271
- ---
1253
+ - **Objective and fact-seeking** \u2014 "How does X work?" not "How should we build X?"
1254
+ - **Specific to the codebase** \u2014 reference concrete systems, files, or flows
1255
+ - **Answerable by reading code** \u2014 no questions about business strategy or user preferences
1272
1256
 
1273
- ## Phase 2: Diagnose
1257
+ Good examples:
1258
+ - "How does endpoint registration work in the current router?"
1259
+ - "What patterns exist for input validation across existing handlers?"
1260
+ - "Trace the data flow from API request to database write for entity X."
1261
+ - "What test infrastructure exists? Where are fixtures, mocks, and helpers?"
1262
+ - "What dependencies does module Y import, and what does its public API look like?"
1274
1263
 
1275
- Find the root cause. Start from the error site and trace backward. Read source files \u2014 don't guess. Identify the specific line(s) and logic error. Check git blame if it's a recent regression.
1264
+ Bad examples (do NOT generate these):
1265
+ - "What's the best way to implement this feature?" (opinion)
1266
+ - "Should we use library X or Y?" (recommendation)
1267
+ - "What would a good architecture look like?" (design, not research)
1276
1268
 
1277
- **Done when:** You can explain what's wrong, why, and where in 2-3 sentences.
1269
+ Write the questions to a temporary file at \`docs/research/.questions-tmp.md\`. Create the \`docs/research/\` directory if it doesn't exist.
1270
+
1271
+ **Do NOT include any content from the brief in this file \u2014 only the questions.**
1278
1272
 
1279
1273
  ---
1280
1274
 
1281
- ## Phase 3: Discuss
1275
+ ## Phase 2: Spawn Research Subagent
1282
1276
 
1283
- Present findings to the user BEFORE writing any code or spec:
1284
- 1. **Symptom** \u2014 confirm it matches what they see
1285
- 2. **Root cause** \u2014 specific file(s) and line(s)
1286
- 3. **Proposed fix** \u2014 what changes, where
1287
- 4. **Risk** \u2014 side effects? scope?
1277
+ Use Claude Code's Agent tool to spawn a subagent. Pass ONLY the research questions \u2014 never the brief path, brief content, or feature description.
1288
1278
 
1289
- Ask: "Does this match? Comfortable with this approach?" If large/risky, suggest decomposing into multiple specs.
1279
+ Build the subagent prompt by reading the questions file you just wrote, then use this template:
1290
1280
 
1291
- **Done when:** User agrees with the diagnosis and fix direction.
1281
+ \`\`\`
1282
+ You are researching a codebase to answer specific questions. You have NO context about why these questions are being asked \u2014 you are simply gathering facts.
1283
+
1284
+ RULES \u2014 these are hard constraints:
1285
+ - Answer each question with FACTS ONLY: file paths, function signatures, data flows, patterns, dependencies
1286
+ - Do NOT recommend, suggest, or opine on anything
1287
+ - Do NOT speculate about what should be built or how
1288
+ - If a question cannot be answered (no relevant code exists), say "No existing code found for this"
1289
+ - Use the Read tool and Grep tool to explore the codebase thoroughly
1290
+ - Include code snippets only when they are essential evidence (e.g., a function signature, a config block)
1291
+
1292
+ QUESTIONS:
1293
+ [INSERT_QUESTIONS_HERE]
1294
+
1295
+ OUTPUT FORMAT \u2014 write your findings as a single markdown document using this structure:
1296
+
1297
+ # Codebase Research
1298
+
1299
+ **Date:** [today's date]
1300
+ **Questions answered:** [N/total]
1292
1301
 
1293
1302
  ---
1294
1303
 
1295
- ## Phase 4: Spec the Fix
1304
+ ## Q1: [question text]
1296
1305
 
1297
- Write a bug fix spec to \`docs/specs/YYYY-MM-DD-bugfix-name.md\`. Create the \`docs/specs/\` directory if it doesn't exist.
1306
+ [Facts, file paths, function signatures, data flows. No opinions.]
1298
1307
 
1299
- **Why:** Even bug fixes deserve a spec. It forces clarity on what "fixed" means, ensures test-first discipline, and creates a traceable record of the fix.
1308
+ ## Q2: [question text]
1300
1309
 
1301
- Use this template:
1310
+ [Facts, file paths, function signatures, data flows. No opinions.]
1302
1311
 
1303
- \`\`\`markdown
1304
- # Fix [Bug Description] \u2014 Bug Fix Spec
1312
+ [Continue for all questions]
1313
+ \`\`\`
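The prompt-assembly step described above (read the questions file, splice it into the template) can be sketched in a few lines. This is an illustrative Python sketch, not Joycraft's actual implementation (the package ships as Node/TypeScript), and the function name `build_subagent_prompt` is hypothetical; the template is abbreviated to the placeholder line that matters:

```python
# Sketch of the Phase 2 prompt assembly: splice the questions markdown into
# the subagent template at [INSERT_QUESTIONS_HERE]. Abbreviated template --
# the real skill uses the full template shown above.

SUBAGENT_TEMPLATE = """\
You are researching a codebase to answer specific questions.

QUESTIONS:
[INSERT_QUESTIONS_HERE]
"""


def build_subagent_prompt(questions_markdown: str) -> str:
    """Return the subagent prompt with only the questions spliced in.

    The brief is never passed through -- that isolation is the whole
    point of this skill.
    """
    return SUBAGENT_TEMPLATE.replace(
        "[INSERT_QUESTIONS_HERE]", questions_markdown.strip()
    )
```

The key property is that the function's only input is the questions file content, so the brief cannot leak into the subagent's context by construction.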
1305
1314
 
1306
- > **Parent Brief:** none (bug fix)
1307
- > **Issue/Error:** [error message, issue link, or symptom description]
1308
- > **Status:** Ready
1309
- > **Date:** YYYY-MM-DD
1310
- > **Estimated scope:** [1 session / N files / ~N lines]
1315
+ ## Phase 3: Write the Research Document
1316
+
1317
+ Take the subagent's response and write it to \`docs/research/YYYY-MM-DD-feature-name.md\`. Derive the feature name from the brief filename or the user's description (lowercase, hyphenated).
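The "lowercase, hyphenated" derivation can be sketched as a small slug helper. A Python sketch for illustration (the function name `feature_slug` and the date-prefix handling are assumptions, not Joycraft internals):

```python
import re


def feature_slug(source_name: str) -> str:
    """Derive a lowercase, hyphenated feature name from a brief filename
    or a short description.

    Assumed behavior: strip any directory, ".md" extension, and a leading
    YYYY-MM-DD- date prefix, then hyphenate the rest.
    """
    name = source_name.rsplit("/", 1)[-1]
    name = re.sub(r"\.md$", "", name)
    name = re.sub(r"^\d{4}-\d{2}-\d{2}-", "", name)
    # Lowercase; collapse runs of non-alphanumerics into single hyphens.
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
```

For example, a brief at `docs/briefs/2026-03-30-My Feature.md` would yield `my-feature`, and the research doc lands at `docs/research/2026-03-30-my-feature.md` (with today's date prepended).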
1318
+
1319
+ Delete the temporary questions file (\`docs/research/.questions-tmp.md\`).
1320
+
1321
+ Present the research document path to the user:
1322
+
1323
+ \`\`\`
1324
+ Research complete: docs/research/YYYY-MM-DD-feature-name.md
1325
+
1326
+ This document contains objective facts about your codebase \u2014 no opinions or recommendations.
1327
+
1328
+ Next steps:
1329
+ - /joycraft-decompose \u2014 break the feature into atomic specs (research will inform the specs)
1330
+ - /joycraft-new-feature \u2014 formalize into a full Feature Brief first
1331
+ - Read the research and add any corrections or missing context manually
1332
+ \`\`\`
1311
1333
 
1334
+ ## Edge Cases
1335
+
1336
+ | Scenario | Behavior |
1337
+ |----------|----------|
1338
+ | No brief provided | Accept inline description, generate questions from that |
1339
+ | Codebase is empty or new | Research doc reports "no existing patterns found" per question |
1340
+ | User runs research twice for same feature | Overwrites previous research doc (same filename) |
1341
+ | Brief is very short (1-2 sentences) | Still generate questions \u2014 even simple features benefit from understanding existing patterns |
1342
+ | \`docs/research/\` doesn't exist | Create it |
1343
+ `,
1344
+ "joycraft-session-end.md": `---
1345
+ name: joycraft-session-end
1346
+ description: Wrap up a session \u2014 capture discoveries, verify, prepare for PR or next session
1347
+ instructions: 22
1312
1348
  ---
1313
1349
 
1314
- ## Bug
1350
+ # Session Wrap-Up
1315
1351
 
1316
- What is broken? Describe the symptom the user experiences.
1352
+ Before ending this session, complete these steps in order.
1317
1353
 
1318
- ## Root Cause
1354
+ ## 1. Capture Discoveries
1319
1355
 
1320
- What is wrong in the code and why? Name the specific file(s) and line(s).
1356
+ **Why:** Discoveries are the surprises \u2014 things that weren't in the spec or that contradicted expectations. They prevent future sessions from hitting the same walls.
1321
1357
 
1322
- ## Fix
1358
+ Check: did anything surprising happen during this session? If yes, create or update a discovery file at \`docs/discoveries/YYYY-MM-DD-topic.md\`. Create the \`docs/discoveries/\` directory if it doesn't exist.
1323
1359
 
1324
- What changes will fix this? Be specific \u2014 describe the code change, not just "fix the bug."
1360
+ Only capture what's NOT obvious from the code or git diff:
1361
+ - "We thought X but found Y" \u2014 assumptions that were wrong
1362
+ - "This API/library behaves differently than documented" \u2014 external gotchas
1363
+ - "This edge case needs handling in a future spec" \u2014 deferred work with context
1364
+ - "The approach in the spec didn't work because..." \u2014 spec-vs-reality gaps
1365
+ - Key decisions made during implementation that aren't in the spec
1325
1366
 
1326
- ## Acceptance Criteria
1367
+ **Do NOT capture:**
1368
+ - Files changed (that's the diff)
1369
+ - What you set out to do (that's the spec)
1370
+ - Step-by-step narrative of the session (nobody re-reads these)
1327
1371
 
1328
- - [ ] [The bug no longer occurs \u2014 describe the correct behavior]
1329
- - [ ] [No regressions in related functionality]
1330
- - [ ] Build passes
1331
- - [ ] Tests pass
1372
+ Use this format:
1332
1373
 
1333
- ## Test Plan
1374
+ \`\`\`markdown
1375
+ # Discoveries \u2014 [topic]
1334
1376
 
1335
- | Acceptance Criterion | Test | Type |
1336
- |---------------------|------|------|
1337
- | [Bug no longer occurs] | [Test that reproduces the bug, then verifies the fix] | [unit/integration/e2e] |
1338
- | [No regressions] | [Existing tests still pass, or new regression test] | [unit/integration] |
1377
+ **Date:** YYYY-MM-DD
1378
+ **Spec:** [link to spec if applicable]
1339
1379
 
1340
- **Execution order:**
1341
- 1. Write a test that reproduces the bug \u2014 it should FAIL (red)
1342
- 2. Run the test to confirm it fails
1343
- 3. Apply the fix
1344
- 4. Run the test to confirm it passes (green)
1345
- 5. Run the full test suite to check for regressions
1380
+ ## [Discovery title]
1381
+ **Expected:** [what we thought would happen]
1382
+ **Actual:** [what actually happened]
1383
+ **Impact:** [what this means for future work]
1384
+ \`\`\`
1346
1385
 
1347
- **Smoke test:** [The bug reproduction test \u2014 fastest way to verify the fix works]
1386
+ If nothing surprising happened, skip the discovery file entirely. No discovery is a good sign \u2014 the spec was accurate.
1348
1387
 
1349
- **Before implementing, verify your test harness:**
1350
- 1. Run the reproduction test \u2014 it must FAIL (if it passes, you're not testing the actual bug)
1351
- 2. The test must exercise your actual code \u2014 not a reimplementation or mock
1352
- 3. Identify your smoke test \u2014 it must run in seconds, not minutes
1388
+ ## 1b. Update Context Documents
1353
1389
 
1354
- ## Constraints
1390
+ If \`docs/context/\` exists, quickly check whether this session revealed anything about:
1355
1391
 
1356
- - MUST: [any hard requirements for the fix]
1357
- - MUST NOT: [any prohibitions \u2014 e.g., don't change the public API]
1392
+ - **Production risks** \u2014 did you interact with or learn about production vs staging systems? \u2192 Update \`docs/context/production-map.md\`
1393
+ - **Wrong assumptions** \u2014 did the agent (or you) assume something that turned out to be false? \u2192 Update \`docs/context/dangerous-assumptions.md\`
1394
+ - **Key decisions** \u2014 did you make an architectural or tooling choice? \u2192 Add a row to \`docs/context/decision-log.md\`
1395
+ - **Unwritten rules** \u2014 did you discover a convention or constraint not documented anywhere? \u2192 Update \`docs/context/institutional-knowledge.md\`
1358
1396
 
1359
- ## Affected Files
1397
+ Skip this if nothing applies. Don't force it \u2014 only update when there's genuine new context.
1360
1398
 
1361
- | Action | File | What Changes |
1362
- |--------|------|-------------|
1399
+ ## 2. Run Validation
1363
1400
 
1364
- ## Edge Cases
1401
+ Run the project's validation commands. Check CLAUDE.md for project-specific commands. Common checks:
1365
1402
 
1366
- | Scenario | Expected Behavior |
1367
- |----------|------------------|
1368
- \`\`\`
1403
+ - Type-check (e.g., \`tsc --noEmit\`, \`mypy\`, \`cargo check\`)
1404
+ - Tests (e.g., \`npm test\`, \`pytest\`, \`cargo test\`)
1405
+ - Lint (e.g., \`eslint\`, \`ruff\`, \`clippy\`)
1369
1406
 
1370
- **For trivial bugs:** The spec will be short. That's fine \u2014 the structure is the point, not the length.
1407
+ Fix any failures before proceeding.
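The validate-then-fix loop above amounts to running each project command and collecting failures. A minimal Python sketch, assuming the command list comes from CLAUDE.md (the `run_validation` name and the shell-based invocation are illustrative choices, not part of the skill):

```python
import subprocess


def run_validation(commands: list[str]) -> list[str]:
    """Run each validation command in order; return the ones that failed.

    The command list is project-specific -- pull it from CLAUDE.md rather
    than hardcoding it.
    """
    failures = []
    for cmd in commands:
        # shell=True so compound commands like "npm test -- --run" work.
        result = subprocess.run(cmd, shell=True, capture_output=True)
        if result.returncode != 0:
            failures.append(cmd)
    return failures
```

An empty return list means the session is clear to proceed to spec-status updates and commit.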
1371
1408
 
1372
- **For large bugs that span multiple files/systems:** Consider whether this should be decomposed into multiple specs. If so, create a brief first using \`/joycraft-new-feature\`, then decompose. A bug fix spec should be implementable in a single session.
1409
+ ## 3. Update Spec Status
1373
1410
 
1374
- ---
1411
+ If working from an atomic spec in \`docs/specs/\` (scan recursively \u2014 specs may be in subdirectories like \`docs/specs/<feature-name>/\`):
1412
+ - All acceptance criteria met \u2014 update status to \`Complete\`
1413
+ - Partially done \u2014 update status to \`In Progress\`, note what's left
1375
1414
 
1376
- ## Phase 5: Hand Off
1415
+ If working from a Feature Brief in \`docs/briefs/\`, check off completed specs in the decomposition table.
1377
1416
 
1378
- Tell the user:
1417
+ ## 4. Commit
1379
1418
 
1380
- \`\`\`
1381
- Bug fix spec is ready: docs/specs/YYYY-MM-DD-bugfix-name.md
1419
+ Commit all changes including the discovery file (if created) and spec status updates. The commit message should reference the spec if applicable.
1382
1420
 
1383
- Summary:
1384
- - Bug: [one sentence]
1385
- - Root cause: [one sentence]
1386
- - Fix: [one sentence]
1387
- - Estimated: 1 session
1421
+ ## 5. Push and PR (if autonomous git is enabled)
1388
1422
 
1389
- To execute: Start a fresh session and:
1390
- 1. Read the spec
1391
- 2. Write the reproduction test (must fail)
1392
- 3. Apply the fix (test must pass)
1393
- 4. Run full test suite
1394
- 5. Run /joycraft-session-end to capture discoveries
1395
- 6. Commit and PR
1423
+ **Check CLAUDE.md for "Git Autonomy" in the Behavioral Boundaries section.** If it says "STRICTLY ENFORCED" or the ALWAYS section includes "Push to feature branches immediately after every commit":
1396
1424
 
1397
- Ready to start?
1425
+ 1. **Push immediately.** Run \`git push origin <branch>\` \u2014 do not ask, do not hesitate.
1426
+ 2. **Open a PR if the feature is complete.** Check the parent Feature Brief's decomposition table \u2014 if all specs are done, run \`gh pr create\` with a summary of all completed specs. Do not ask first.
1427
+ 3. **If not all specs are done,** still push. The PR comes when the last spec is complete.
1428
+
1429
+ If CLAUDE.md does NOT have autonomous git rules (or has "ASK FIRST" for pushing), ask the user before pushing.
1430
+
1431
+ ## 6. Report
1432
+
1433
+ \`\`\`
1434
+ Session complete.
1435
+ - Spec: [spec name] \u2014 [Complete / In Progress]
1436
+ - Build: [passing / failing]
1437
+ - Discoveries: [N items / none]
1438
+ - Pushed: [yes / no \u2014 and why not]
1439
+ - PR: [opened #N / not yet \u2014 N specs remaining]
1440
+ - Next: [what the next session should tackle]
1398
1441
  \`\`\`
1399
1442
 
1400
- **Why:** A fresh session for implementation produces better results. This diagnostic session has context noise from exploration \u2014 a clean session with just the spec is more focused.
1443
+ **Tip:** Run \`/clear\` before starting the next step. Your artifacts are saved to files \u2014 this conversation context is disposable.
1401
1444
  `,
1402
- "joycraft-design.md": `---
1403
- name: joycraft-design
1404
- description: Design discussion before decomposition \u2014 produce a ~200-line design artifact for human review, catching wrong assumptions before they propagate into specs
1445
+ "joycraft-tune.md": `---
1446
+ name: joycraft-tune
1447
+ description: Assess and upgrade your project's AI development harness \u2014 score 7 dimensions, apply fixes, show path to Level 5
1448
+ instructions: 15
1405
1449
  ---
1406
1450
 
1407
- # Design Discussion
1408
-
1409
- You are producing a design discussion document for a feature. This sits between research and decomposition \u2014 it captures your understanding so the human can catch wrong assumptions before specs are written.
1410
-
1411
- **Guard clause:** If no brief path is provided and no brief exists in \`docs/briefs/\`, say:
1412
- "No feature brief found. Run \`/joycraft-new-feature\` first to create one, or provide the path to your brief."
1413
- Then stop.
1451
+ # Tune \u2014 Project Harness Assessment & Upgrade
1414
1452
 
1415
- ---
1453
+ You are evaluating and upgrading this project's AI development harness.
1416
1454
 
1417
- ## Step 1: Read Inputs
1455
+ ## Step 1: Detect Harness State
1418
1456
 
1419
- Read the feature brief at the path the user provides. If the user also provides a research document path, read that too. Research is optional \u2014 if none exists, note that you'll explore the codebase directly.
1457
+ Check for: CLAUDE.md (with meaningful content), \`docs/specs/\`, \`docs/briefs/\`, \`docs/discoveries/\`, \`.claude/skills/\`, and test configuration.
1420
1458
 
1421
- ## Step 2: Explore the Codebase
1459
+ ## Step 2: Route
1422
1460
 
1423
- Spawn subagents to explore the codebase for patterns relevant to the brief. Focus on:
1461
+ - **No harness** (no CLAUDE.md or just a README): Recommend \`npx joycraft init\` and stop.
1462
+ - **Harness exists**: Continue to assessment.
1424
1463
 
1425
- - Files and functions that will be touched or extended
1426
- - Existing patterns this feature should follow (naming, data flow, error handling)
1427
- - Similar features already implemented that serve as models
1428
- - Boundaries and interfaces the feature must integrate with
1464
+ ## Step 3: Assess \u2014 Score 7 Dimensions (1-5 scale)
1429
1465
 
1430
- Gather file paths, function signatures, and code snippets. You need concrete evidence, not guesses.
1466
+ Read CLAUDE.md and explore the project. Score each with specific evidence:
1431
1467
 
1432
- ## Step 3: Write the Design Document
1468
+ | Dimension | What to Check |
1469
+ |-----------|--------------|
1470
+ | Spec Quality | \`docs/specs/\` (scan recursively) \u2014 structured? acceptance criteria? self-contained? |
1471
+ | Spec Granularity | Can each spec be done in one session? |
1472
+ | Behavioral Boundaries | ALWAYS/ASK FIRST/NEVER sections (or equivalent rules under any heading) |
1473
+ | Skills & Hooks | \`.claude/skills/\` files, hooks config |
1474
+ | Documentation | \`docs/\` structure, templates, referenced from CLAUDE.md |
1475
+ | Knowledge Capture | \`docs/discoveries/\`, \`docs/context/*.md\` \u2014 existence AND real content |
1476
+ | Testing & Validation | Test framework, CI pipeline, validation commands in CLAUDE.md |
1433
1477
 
1434
- Create \`docs/designs/\` directory if it doesn't exist. Write the design document to \`docs/designs/YYYY-MM-DD-feature-name.md\`.
1478
+ Score 1 = absent, 3 = partially there, 5 = comprehensive. Give credit for substance over format.
1435
1479
 
1436
- The document has exactly five sections:
1480
+ ## Step 4: Write Assessment
1437
1481
 
1438
- ### Section 1: Current State
1482
+ Write to \`docs/joycraft-assessment.md\` AND display it. Include: scores table, detailed findings (evidence + gap + recommendation per dimension), and an upgrade plan (up to 5 actions ordered by impact).
1439
1483
 
1440
- What exists today in the codebase that is relevant to this feature. Include file paths, function signatures, and data flows. Be specific \u2014 reference actual code, not abstractions. If no research doc was provided, note that and describe what you found through direct exploration.
1484
+ ## Step 5: Apply Upgrades
1441
1485
 
1442
- ### Section 2: Desired End State
1486
+ Apply using three tiers \u2014 do NOT ask per-item permission:
1443
1487
 
1444
- What the codebase should look like when this feature is complete. Describe the change at a high level \u2014 new files, modified interfaces, new data flows. Do NOT include implementation steps. This is the "what," not the "how."
1488
+ **Tier 1 (silent):** Create missing dirs, install missing skills, copy missing templates, create AGENTS.md.
1445
1489
 
1446
- ### Section 3: Patterns to Follow
1490
+ **Before Tier 2, ask TWO things:**
1447
1491
 
1448
- Existing patterns in the codebase that this feature should match. Include short code snippets and \`file:line\` references. Show the pattern, don't just name it.
1492
+ 1. **Git autonomy:** Cautious (ask before push/PR) or Autonomous (push + PR without asking)?
1493
+ 2. **Risk interview (3-5 questions, one at a time):** What could break? What services connect to prod? Unwritten rules? Off-limits files/commands? Skip if \`docs/context/\` already has content.
1449
1494
 
1450
- If this is a greenfield project with no existing patterns, propose conventions and note that no precedent exists.
1495
+ From answers, generate: CLAUDE.md boundary rules, \`.claude/settings.json\` deny patterns, \`docs/context/\` documents. Also recommend a permission mode (\`auto\` for most; \`dontAsk\` + allowlist for high-risk).
1451
1496
 
1452
- ### Section 4: Resolved Design Decisions
1497
+ **Tier 2 (show diff):** Add missing CLAUDE.md sections (Boundaries, Workflow, Key Files). Draft from real codebase content. Append only \u2014 never reformat existing content.
1453
1498
 
1454
- Decisions you have already made, with brief rationale. Format each as:
1499
+ **Tier 3 (confirm first):** Rewriting existing sections, overwriting customized files, suggesting test framework installs.
1455
1500
 
1456
- > **Decision:** [what you decided]
1457
- > **Rationale:** [why, referencing existing code or constraints]
1458
- > **Alternative rejected:** [what you considered and why you rejected it]
1501
+ After applying, append to \`docs/joycraft-history.md\` and show a consolidated upgrade results table.
1459
1502
 
1460
- ### Section 5: Open Questions
1503
+ ## Step 6: Show Path to Level 5
1461
1504
 
1462
- Things you don't know or where multiple valid approaches exist. Each question MUST present 2-3 concrete options with pros and cons. Format:
1505
+ Show a tailored roadmap: Level 2-5 table, specific next steps based on actual gaps, and the Level 5 north star (spec queue, autofix, holdout scenarios, self-improving harness).
1463
1506
 
1464
- > **Q: [question]**
1465
- > - **Option A:** [description] \u2014 Pro: [benefit]. Con: [cost].
1466
- > - **Option B:** [description] \u2014 Pro: [benefit]. Con: [cost].
1467
- > - **Option C (if applicable):** [description] \u2014 Pro: [benefit]. Con: [cost].
1507
+ ## Edge Cases
1468
1508
 
1469
- Do NOT ask vague questions like "what do you think?" Every question must have actionable options the human can choose from.
1509
+ - **CLAUDE.md is just a README:** Treat as no harness.
1510
+ - **Non-Joycraft skills:** Acknowledge, don't replace.
1511
+ - **Rules under non-standard headings:** Give credit for substance.
1512
+ - **Previous assessment exists:** Read it first. If nothing to upgrade, say so.
1513
+ - **Non-Joycraft content in CLAUDE.md:** Preserve as-is. Only append.
1470
1514
 
1471
- ## Step 4: Present and STOP
1515
+ **Tip:** Run \`/joycraft-optimize\` to audit your session's token overhead \u2014 plugins, MCP servers, and harness file sizes.
1516
+ `,
1517
+ "joycraft-verify.md": `---
1518
+ name: joycraft-verify
1519
+ description: Spawn an independent verifier subagent to check an implementation against its spec -- read-only, no code edits, structured pass/fail verdict
1520
+ instructions: 30
1521
+ ---
1472
1522
 
1473
- Present the design document to the user. Say:
1523
+ # Verify Implementation Against Spec
1474
1524
 
1475
- \`\`\`
1476
- Design discussion written to docs/designs/YYYY-MM-DD-feature-name.md
1525
+ The user wants independent verification of an implementation. Your job is to find the relevant spec, extract its acceptance criteria and test plan, then spawn a separate verifier subagent that checks each criterion and produces a structured verdict.
1477
1526
 
1478
- Please review the document above. Specifically:
1479
- 1. Are the patterns in Section 3 the right ones to follow, or should I use different ones?
1480
- 2. Do you agree with the resolved decisions in Section 4?
1481
- 3. Pick an option for each open question in Section 5 (or propose your own).
1527
+ **Why a separate subagent?** Anthropic's research found that agents reliably skew positive when grading their own work. Separating the agent doing the work from the agent judging it consistently outperforms self-evaluation. The verifier gets a clean context window with no implementation bias.
1482
1528
 
1483
- Reply with your feedback. I will NOT proceed to decomposition until you have reviewed and approved this design.
1484
- \`\`\`
1529
+ ## Step 1: Find the Spec
1485
1530
 
1486
- **CRITICAL: Do NOT proceed to \`/joycraft-decompose\` or generate specs.** Wait for the human to review, answer open questions, and correct any wrong assumptions. The entire value of this skill is the pause \u2014 it forces a human checkpoint before mistakes propagate.
1531
+ If the user provided a spec path (e.g., \`/joycraft-verify docs/specs/my-feature/add-widget.md\`), use that path directly.
1487
1532
 
1488
- ## After Human Review
1533
+ If no path was provided, scan \`docs/specs/\` recursively for spec files (they may be in subdirectories like \`docs/specs/<feature-name>/\`). Pick the most recently modified \`.md\` file. If \`docs/specs/\` doesn't exist or is empty, tell the user:
1489
1534
 
1490
- Once the human responds:
1491
- - Update the design document with their corrections and chosen options
1492
- - Move answered questions from "Open Questions" to "Resolved Design Decisions"
1493
- - Present the updated document for final confirmation
1494
- - Only after explicit approval, tell the user: "Design approved. Run \`/joycraft-decompose\` with this brief to generate atomic specs."
1495
- `,
1496
- "joycraft-research.md": `---
1497
- name: joycraft-research
1498
- description: Produce objective codebase research by isolating question generation from fact-gathering \u2014 subagent sees only questions, never the brief
1499
- ---
1535
+ > No specs found in \`docs/specs/\`. Please provide a spec path: \`/joycraft-verify path/to/spec.md\`
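The recursive scan described above (newest `.md` wins, `None` when nothing is found) can be sketched directly with `pathlib`. Python for illustration; the helper name `latest_spec` is hypothetical:

```python
from pathlib import Path
from typing import Optional


def latest_spec(specs_dir: str = "docs/specs") -> Optional[Path]:
    """Recursively find the most recently modified .md spec.

    Returns None when the directory is missing or holds no specs -- the
    case where the skill asks the user for an explicit path instead.
    """
    root = Path(specs_dir)
    if not root.is_dir():
        return None
    specs = list(root.rglob("*.md"))  # rglob descends into subdirectories
    if not specs:
        return None
    return max(specs, key=lambda p: p.stat().st_mtime)
```

Note that `rglob` naturally covers the `docs/specs/<feature-name>/` layout without any special-casing.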
1500
1536
 
1501
- # Research Codebase for a Feature
1537
+ ## Step 2: Read and Parse the Spec
1502
1538
 
1503
- You are producing objective codebase research to inform a future spec or implementation. The key insight: the researching agent must never see the brief or ticket \u2014 only research questions. This prevents opinions from contaminating the facts.
1539
+ Read the spec file and extract:
1504
1540
 
1505
- **Guard clause:** If the user doesn't provide a brief path or inline description, ask:
1506
- "What feature or change are you researching? Provide a brief path (e.g., \`docs/briefs/2026-03-30-my-feature.md\`) or describe it in a few sentences."
1541
+ 1. **Spec name** -- from the H1 title
1542
+ 2. **Acceptance Criteria** -- the checklist under the \`## Acceptance Criteria\` section
1543
+ 3. **Test Plan** -- the table under the \`## Test Plan\` section, including any test commands
1544
+ 4. **Constraints** -- the \`## Constraints\` section if present
1507
1545
 
1508
- ---
1546
+ If the spec has no Acceptance Criteria section, tell the user:
1509
1547
 
1510
- ## Phase 1: Generate Research Questions
1548
+ > This spec doesn't have an Acceptance Criteria section. Verification needs criteria to check against. Add acceptance criteria to the spec and try again.
1511
1549
 
1512
- Read the brief file (if a path was provided) or use the user's inline description.
1550
+ If the spec has no Test Plan section, note this but proceed -- the verifier can still check criteria by reading code and running any available project tests.
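The extraction in this step is plain markdown sectioning: find the `## Acceptance Criteria` heading, take everything up to the next `##`, and pull out the checklist items. A Python sketch under those assumptions (the function name and the exact regexes are illustrative):

```python
import re


def acceptance_criteria(spec_markdown: str) -> list[str]:
    """Extract checklist items under '## Acceptance Criteria'.

    Returns an empty list when the section is missing -- the case where
    the skill tells the user to add criteria and try again.
    """
    match = re.search(
        r"^## Acceptance Criteria\s*\n(.*?)(?=^## |\Z)",
        spec_markdown,
        flags=re.MULTILINE | re.DOTALL,
    )
    if not match:
        return []
    # Checklist items look like "- [ ] ..." or "- [x] ...".
    return re.findall(r"^- \[[ xX]\] (.+)$", match.group(1), flags=re.MULTILINE)
```

The same section-slicing pattern works for pulling out the Test Plan table and the Constraints section.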
1513
1551
 
1514
- Identify which zones of the codebase are relevant to this feature. Then generate 5-10 research questions that are:
1552
+ ## Step 3: Identify Test Commands
1515
1553
 
1516
- - **Objective and fact-seeking** \u2014 "How does X work?" not "How should we build X?"
1517
- - **Specific to the codebase** \u2014 reference concrete systems, files, or flows
1518
- - **Answerable by reading code** \u2014 no questions about business strategy or user preferences
1554
+ Look for test commands in these locations (in priority order):
1519
1555
 
1520
- Good examples:
1521
- - "How does endpoint registration work in the current router?"
1522
- - "What patterns exist for input validation across existing handlers?"
1523
- - "Trace the data flow from API request to database write for entity X."
1524
- - "What test infrastructure exists? Where are fixtures, mocks, and helpers?"
1525
- - "What dependencies does module Y import, and what does its public API look like?"
1556
+ 1. The spec's Test Plan section (look for commands in backticks or "Type" column entries like "unit", "integration", "e2e", "build")
1557
+ 2. The project's CLAUDE.md (look for test/build commands in the Development Workflow section)
1558
+ 3. Common defaults based on the project type:
1559
+ - Node.js: \`npm test\` or \`pnpm test --run\`
1560
+ - Python: \`pytest\`
1561
+ - Rust: \`cargo test\`
1562
+ - Go: \`go test ./...\`
1526
1563
 
1527
- Bad examples (do NOT generate these):
1528
- - "What's the best way to implement this feature?" (opinion)
1529
- - "Should we use library X or Y?" (recommendation)
1530
- - "What would a good architecture look like?" (design, not research)
1564
+ Build a list of specific commands the verifier should run.
1531
1565
 
1532
- Write the questions to a temporary file at \`docs/research/.questions-tmp.md\`. Create the \`docs/research/\` directory if it doesn't exist.
1566
+ ## Step 4: Spawn the Verifier Subagent
1533
1567
 
1534
- **Do NOT include any content from the brief in this file \u2014 only the questions.**
1568
+ Use Claude Code's Agent tool to spawn a subagent with the following prompt. Replace the placeholders with the actual content extracted in Steps 2-3.
1535
1569
 
1536
- ---
1570
+ \`\`\`
1571
+ You are a QA verifier. Your job is to independently verify an implementation against its spec. You have NO context about how the implementation was done -- you are checking it fresh.
1537
1572
 
1538
- ## Phase 2: Spawn Research Subagent
1573
+ RULES -- these are hard constraints, not suggestions:
1574
+ - You may READ any file using the Read tool or cat
1575
+ - You may RUN these specific test/build commands: [TEST_COMMANDS]
1576
+ - You may NOT edit, create, or delete any files
1577
+ - You may NOT run commands that modify state (no git commit, no npm install, no file writes)
1578
+ - You may NOT install packages or access the network
1579
+ - Report what you OBSERVE, not what you expect or hope
1539
1580
 
1540
- Use Claude Code's Agent tool to spawn a subagent. Pass ONLY the research questions \u2014 never the brief path, brief content, or feature description.
1581
+ SPEC NAME: [SPEC_NAME]
1541
1582
 
1542
- Build the subagent prompt by reading the questions file you just wrote, then use this template:
1583
+ ACCEPTANCE CRITERIA:
1584
+ [ACCEPTANCE_CRITERIA]
1543
1585
 
1544
- \`\`\`
1545
- You are researching a codebase to answer specific questions. You have NO context about why these questions are being asked \u2014 you are simply gathering facts.
1586
+ TEST PLAN:
1587
+ [TEST_PLAN]
1546
1588
 
1547
- RULES \u2014 these are hard constraints:
1548
- - Answer each question with FACTS ONLY: file paths, function signatures, data flows, patterns, dependencies
1549
- - Do NOT recommend, suggest, or opine on anything
1550
- - Do NOT speculate about what should be built or how
1551
- - If a question cannot be answered (no relevant code exists), say "No existing code found for this"
1552
- - Use the Read tool and Grep tool to explore the codebase thoroughly
1553
- - Include code snippets only when they are essential evidence (e.g., a function signature, a config block)
1589
+ CONSTRAINTS:
1590
+ [CONSTRAINTS_OR_NONE]
1554
1591
 
1555
- QUESTIONS:
1556
- [INSERT_QUESTIONS_HERE]
1592
+ YOUR TASK:
1593
+ For each acceptance criterion, determine if it PASSES or FAILS based on evidence:
1557
1594
 
1558
- OUTPUT FORMAT \u2014 write your findings as a single markdown document using this structure:
1595
+ 1. Run the test commands listed above. Record the output.
1596
+ 2. For each acceptance criterion:
1597
+ a. Check if there is a corresponding test and whether it passes
1598
+ b. If no test exists, read the relevant source files to verify the criterion is met
1599
+ c. If the criterion cannot be verified by reading code or running tests, mark it MANUAL CHECK NEEDED
1600
+ 3. For criteria about build/test passing, actually run the commands and report results.
1559
1601
 
1560
- # Codebase Research
1602
+ OUTPUT FORMAT -- you MUST use this exact format:
1561
1603
 
1562
- **Date:** [today's date]
1563
- **Questions answered:** [N/total]
1604
+ VERIFICATION REPORT
1564
1605
 
1565
- ---
1606
+ | # | Criterion | Verdict | Evidence |
1607
+ |---|-----------|---------|----------|
1608
+ | 1 | [criterion text] | PASS/FAIL/MANUAL CHECK NEEDED | [what you observed] |
1609
+ | 2 | [criterion text] | PASS/FAIL/MANUAL CHECK NEEDED | [what you observed] |
1610
+ [continue for all criteria]
1566
1611
 
1567
- ## Q1: [question text]
1612
+ SUMMARY: X/Y criteria passed. [Z failures need attention. / All criteria verified.]
1568
1613
 
1569
- [Facts, file paths, function signatures, data flows. No opinions.]
1614
+ If any test commands fail to run (missing dependencies, wrong command, etc.), report the error as evidence for a FAIL verdict on the relevant criterion.
1615
+ \`\`\`
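The SUMMARY line in the report format above is mechanical enough to express as code. A minimal sketch, assuming verdicts have already been parsed out of the table (the `summarize` helper is hypothetical, shown only to pin down the expected wording):

```typescript
// Sketch: derive the report's SUMMARY line from per-criterion verdicts.
type Verdict = "PASS" | "FAIL" | "MANUAL CHECK NEEDED";

function summarize(verdicts: Verdict[]): string {
  const passed = verdicts.filter((v) => v === "PASS").length;
  const failed = verdicts.filter((v) => v === "FAIL").length;
  // MANUAL CHECK NEEDED items count as neither passed nor failed here.
  const tail =
    failed > 0 ? `${failed} failures need attention.` : "All criteria verified.";
  return `SUMMARY: ${passed}/${verdicts.length} criteria passed. ${tail}`;
}
```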
1570
1616
 
1571
- ## Q2: [question text]
1617
+ ## Step 5: Format and Present the Verdict
1572
1618
 
1573
- [Facts, file paths, function signatures, data flows. No opinions.]
1619
+ Take the subagent's response and present it to the user in this format:
1574
1620
 
1575
- [Continue for all questions]
1576
1621
  \`\`\`
1622
+ ## Verification Report -- [Spec Name]
1577
1623
 
1578
- ## Phase 3: Write the Research Document
1624
+ | # | Criterion | Verdict | Evidence |
1625
+ |---|-----------|---------|----------|
1626
+ | 1 | ... | PASS | ... |
1627
+ | 2 | ... | FAIL | ... |
1579
1628
 
1580
- Take the subagent's response and write it to \`docs/research/YYYY-MM-DD-feature-name.md\`. Derive the feature name from the brief filename or the user's description (lowercase, hyphenated).
1629
+ **Overall: X/Y criteria passed.**
1581
1630
 
1582
- Delete the temporary questions file (\`docs/research/.questions-tmp.md\`).
1631
+ [If all passed:]
1632
+ All criteria verified. Ready to commit and open a PR.
1583
1633
 
1584
- Present the research document path to the user:
1634
+ [If any failed:]
1635
+ N failures need attention. Review the evidence above and fix before proceeding.
1585
1636
 
1637
+ [If any MANUAL CHECK NEEDED:]
1638
+ N criteria need manual verification -- they can't be checked by reading code or running tests alone.
1586
1639
  \`\`\`
1587
- Research complete: docs/research/YYYY-MM-DD-feature-name.md
1588
1640
 
1589
- This document contains objective facts about your codebase \u2014 no opinions or recommendations.
1641
+ ## Step 6: Suggest Next Steps
1590
1642
 
1591
- Next steps:
1592
- - /joycraft-decompose \u2014 break the feature into atomic specs (research will inform the specs)
1593
- - /joycraft-new-feature \u2014 formalize into a full Feature Brief first
1594
- - Read the research and add any corrections or missing context manually
1595
- \`\`\`
1643
+ Based on the verdict:
1644
+
1645
+ - **All PASS:** Suggest committing and opening a PR, or running \`/joycraft-session-end\` to capture discoveries.
1646
+ - **Some FAIL:** List the failed criteria and suggest the user fix them, then run \`/joycraft-verify\` again.
1647
+ - **MANUAL CHECK NEEDED items:** Explain what needs human eyes and why automation couldn't verify it.
1648
+
1649
+ **Do NOT offer to fix failures yourself.** The verifier reports; the human (or implementation agent in a separate turn) decides what to do. This separation is the whole point.
1596
1650
 
1597
1651
  ## Edge Cases
1598
1652
 
1599
1653
  | Scenario | Behavior |
1600
1654
  |----------|----------|
1601
- | No brief provided | Accept inline description, generate questions from that |
1602
- | Codebase is empty or new | Research doc reports "no existing patterns found" per question |
1603
- | User runs research twice for same feature | Overwrites previous research doc (same filename) |
1604
- | Brief is very short (1-2 sentences) | Still generate questions \u2014 even simple features benefit from understanding existing patterns |
1605
- | \`docs/research/\` doesn't exist | Create it |
1655
+ | Spec has no Test Plan | Warn that verification is weaker without a test plan, but proceed by checking criteria through code reading and any available project-level tests |
1656
+ | All tests pass but a criterion is not testable | Mark as MANUAL CHECK NEEDED with explanation |
1657
+ | Subagent can't run tests (missing deps) | Report the error as FAIL evidence |
1658
+ | No specs found and no path given | Tell user to provide a spec path or create a spec first |
1659
+ | Spec status is "Complete" | Still run verification -- "Complete" means the implementer thinks it's done; verification confirms it |
1606
1660
  `
1607
1661
  };
1608
1662
  var TEMPLATES = {
@@ -1984,59 +2038,13 @@ is required, though you can add one via the GitHub Checks API if you prefer.
1984
2038
 
1985
2039
  ---
1986
2040
 
1987
- ## Testing by Stack Type
1988
-
1989
- The scenario agent selects the appropriate test format based on the project's
1990
- testing backbone. Each backbone tests the same holdout principle \u2014 observable
1991
- behavior only, no source imports \u2014 but uses different tools.
1992
-
1993
- ### Web Apps (Playwright)
1994
-
1995
- For Next.js, Vite, Nuxt, Remix, and other web frameworks. Tests run against a
1996
- dev server or preview URL using a headless browser.
1997
-
1998
- - **Template:** \`example-scenario-web.spec.ts\`
1999
- - **Config:** \`playwright.config.ts\`
2000
- - **Package:** \`package-web.json\` (use instead of \`package.json\` for web projects)
2001
- - **Run:** \`npx playwright test\`
2002
-
2003
- ### Mobile Apps (Maestro)
2004
-
2005
- For React Native, Flutter, and native iOS/Android. Tests are declarative YAML
2006
- flows that interact with a running app on a simulator.
2007
-
2008
- - **Template:** \`example-scenario-mobile.yaml\`
2009
- - **Login sub-flow:** \`example-scenario-mobile-login.yaml\`
2010
- - **Setup guide:** \`README-mobile.md\`
2011
- - **Run:** \`maestro test example-scenario-mobile.yaml\`
2012
-
2013
- ### API Backends (HTTP)
2014
-
2015
- For Express, FastAPI, Django, and other API-only backends. Tests send HTTP
2016
- requests using Node.js built-in \`fetch\`.
2017
-
2018
- - **Template:** \`example-scenario-api.test.ts\`
2019
- - **Run:** \`npx vitest run\`
2020
-
2021
- ### CLI Tools & Libraries (native)
2022
-
2023
- For CLI tools, npm packages, and non-UI projects. Tests invoke the built
2024
- binary via \`spawnSync\` and assert on stdout/stderr.
2025
-
2026
- - **Template:** \`example-scenario.test.ts\`
2027
- - **Run:** \`npx vitest run\`
2028
-
2029
- ---
2030
-
2031
2041
  ## Adding scenarios
2032
2042
 
2033
2043
  ### Rules
2034
2044
 
2035
- These rules apply to ALL backbones:
2036
-
2037
- 1. **Behavioral, not structural.** Test what the app does from a user's
2038
- perspective. For web: navigate and assert on content. For CLI: run commands
2039
- and check output. For API: send requests and check responses.
2045
+ 1. **Behavioral, not structural.** Test what the tool does, not how it is
2046
+ built internally. Invoke the binary; assert on stdout, exit codes, and
2047
+ filesystem state. Never import from \`../main-repo/src\`.
2040
2048
 
2041
2049
  2. **End-to-end.** Each test should represent something a real user would
2042
2050
  actually do. If you would not put it in a demo or docs example, reconsider
@@ -2046,8 +2054,9 @@ These rules apply to ALL backbones:
2046
2054
  see source code. Any \`import\` that reaches into \`../main-repo/src\` breaks
2047
2055
  the pattern.
2048
2056
 
2049
- 4. **Independent.** Each test must be able to run in isolation. No shared
2050
- mutable state between tests.
2057
+ 4. **Independent.** Each test must be able to run in isolation. Use \`beforeEach\`
2058
+ / \`afterEach\` to set up and tear down temp directories. Do not share mutable
2059
+ state between tests.
2051
2060
 
2052
2061
  5. **Deterministic.** Avoid network calls, timestamps, or random values in
2053
2062
  assertions unless the feature under test genuinely involves them.
@@ -2056,25 +2065,31 @@ These rules apply to ALL backbones:
2056
2065
 
2057
2066
  \`\`\`
2058
2067
  $SCENARIOS_REPO/
2059
- \u251C\u2500\u2500 example-scenario.test.ts # CLI/binary scenario template
2060
- \u251C\u2500\u2500 example-scenario-web.spec.ts # Web app scenario template (Playwright)
2061
- \u251C\u2500\u2500 example-scenario-api.test.ts # API backend scenario template
2062
- \u251C\u2500\u2500 example-scenario-mobile.yaml # Mobile app scenario template (Maestro)
2063
- \u251C\u2500\u2500 example-scenario-mobile-login.yaml # Reusable login sub-flow
2064
- \u251C\u2500\u2500 playwright.config.ts # Playwright config (web projects)
2065
- \u251C\u2500\u2500 package.json # Default (vitest for CLI/API)
2066
- \u251C\u2500\u2500 package-web.json # Alternative (Playwright for web)
2067
- \u251C\u2500\u2500 README-mobile.md # Mobile testing setup guide
2068
+ \u251C\u2500\u2500 example-scenario.test.ts # Starter file \u2014 replace with real scenarios
2068
2069
  \u251C\u2500\u2500 workflows/
2069
- \u2502 \u251C\u2500\u2500 run.yml # CI workflow (do not rename)
2070
- \u2502 \u2514\u2500\u2500 generate.yml # Scenario generation workflow
2071
- \u251C\u2500\u2500 prompts/
2072
- \u2502 \u2514\u2500\u2500 scenario-agent.md # Scenario agent instructions
2070
+ \u2502 \u2514\u2500\u2500 run.yml # CI workflow (do not rename)
2071
+ \u251C\u2500\u2500 package.json
2073
2072
  \u2514\u2500\u2500 README.md
2074
2073
  \`\`\`
2075
2074
 
2076
- Use the template that matches your project's stack. Remove the ones you
2077
- don't need.
2075
+ Add new \`.test.ts\` files at the top level or in subdirectories. Vitest will
2076
+ discover them automatically.
2077
+
2078
+ ### Example structure
2079
+
2080
+ \`\`\`ts
2081
+ import { spawnSync } from "node:child_process";
2082
+ import { join } from "node:path";
2083
+
2084
+ const CLI = join(__dirname, "..", "main-repo", "dist", "cli.js");
2085
+
2086
+ it("init creates a CLAUDE.md file", () => {
2087
+ const tmp = mkdtempSync(join(tmpdir(), "scenario-"));
2088
+ const { status } = spawnSync("node", [CLI, "init", tmp], { encoding: "utf8" });
2089
+ expect(status).toBe(0);
2090
+ expect(existsSync(join(tmp, "CLAUDE.md"))).toBe(true);
2091
+ });
2092
+ \`\`\`
2078
2093
 
2079
2094
  ---
2080
2095
 
@@ -2086,7 +2101,6 @@ don't need.
2086
2101
  | Visible to agent | Yes | No |
2087
2102
  | What they test | Units, modules, logic | End-to-end behavior |
2088
2103
  | Import source code | Yes | Never |
2089
- | Test method | Unit test framework | Depends on backbone (Playwright/Maestro/vitest/fetch) |
2090
2104
  | Run on every push | Yes | Yes (via dispatch) |
2091
2105
  | Purpose | Catch regressions fast | Validate real behavior |
2092
2106
 
@@ -2235,304 +2249,6 @@ describe("CLI: init command (example \u2014 replace with your real scenarios)",
2235
2249
  }
2236
2250
  }
2237
2251
  `,
2238
- "scenarios/package-web.json": `{
2239
- "name": "$SCENARIOS_REPO",
2240
- "version": "0.0.1",
2241
- "private": true,
2242
- "type": "module",
2243
- "scripts": {
2244
- "test": "playwright test"
2245
- },
2246
- "devDependencies": {
2247
- "@playwright/test": "^1.50.0"
2248
- }
2249
- }
2250
- `,
2251
- "scenarios/playwright.config.ts": `import { defineConfig } from '@playwright/test';
2252
-
2253
- /**
2254
- * Playwright configuration for holdout scenario tests.
2255
- *
2256
- * BASE_URL can be set to test against a preview deployment URL
2257
- * or defaults to http://localhost:3000 for local dev server testing.
2258
- */
2259
- export default defineConfig({
2260
- testDir: '.',
2261
- testMatch: '**/*.spec.ts',
2262
- timeout: 60_000,
2263
- retries: 0,
2264
- use: {
2265
- baseURL: process.env.BASE_URL || 'http://localhost:3000',
2266
- headless: true,
2267
- screenshot: 'only-on-failure',
2268
- },
2269
- projects: [
2270
- { name: 'chromium', use: { browserName: 'chromium' } },
2271
- ],
2272
- });
2273
- `,
2274
- "scenarios/example-scenario-web.spec.ts": `/**
2275
- * Example Web Scenario Test (Playwright)
2276
- *
2277
- * This file is a template for scenario tests against web applications.
2278
- * The holdout pattern applies: test the running app through its UI,
2279
- * never import source code from the main repo.
2280
- *
2281
- * The main repo is available at ../main-repo and is already built.
2282
- * Tests run against either:
2283
- * - A dev server started from ../main-repo (default)
2284
- * - A preview deployment URL (set BASE_URL env var)
2285
- *
2286
- * DO:
2287
- * - Navigate to pages, click elements, fill forms, assert on visible content
2288
- * - Use page.locator() with accessible selectors (role, text, test-id)
2289
- * - Keep each test fully independent
2290
- *
2291
- * DON'T:
2292
- * - Import from ../main-repo/src \u2014 that defeats the holdout
2293
- * - Test internal implementation details
2294
- * - Rely on specific CSS classes or DOM structure (use accessible selectors)
2295
- */
2296
-
2297
- import { test, expect } from '@playwright/test';
2298
- import { spawn, type ChildProcess } from 'node:child_process';
2299
- import { join } from 'node:path';
2300
-
2301
- const MAIN_REPO = join(__dirname, '..', 'main-repo');
2302
- let serverProcess: ChildProcess | undefined;
2303
-
2304
- /**
2305
- * Wait for a URL to become reachable.
2306
- */
2307
- async function waitForServer(url: string, timeoutMs = 60_000): Promise<void> {
2308
- const start = Date.now();
2309
- while (Date.now() - start < timeoutMs) {
2310
- try {
2311
- const res = await fetch(url);
2312
- if (res.ok || res.status < 500) return;
2313
- } catch {
2314
- // Server not ready yet
2315
- }
2316
- await new Promise(r => setTimeout(r, 1000));
2317
- }
2318
- throw new Error(\`Server at \${url} did not become ready within \${timeoutMs}ms\`);
2319
- }
2320
-
2321
- test.beforeAll(async () => {
2322
- // If BASE_URL is set, skip starting a dev server \u2014 test against the provided URL
2323
- if (process.env.BASE_URL) return;
2324
-
2325
- serverProcess = spawn('npm', ['run', 'dev'], {
2326
- cwd: MAIN_REPO,
2327
- stdio: 'pipe',
2328
- env: { ...process.env, PORT: '3000' },
2329
- });
2330
-
2331
- await waitForServer('http://localhost:3000');
2332
- });
2333
-
2334
- test.afterAll(async () => {
2335
- if (serverProcess) {
2336
- serverProcess.kill('SIGTERM');
2337
- serverProcess = undefined;
2338
- }
2339
- });
2340
-
2341
- // ---------------------------------------------------------------------------
2342
- // Example scenarios \u2014 replace with real tests for your application
2343
- // ---------------------------------------------------------------------------
2344
-
2345
- test.describe('Home page', () => {
2346
- test('loads successfully and shows main heading', async ({ page }) => {
2347
- await page.goto('/');
2348
- // Replace with your app's actual heading or key element
2349
- await expect(page.locator('h1')).toBeVisible();
2350
- });
2351
-
2352
- test('navigates to a subpage', async ({ page }) => {
2353
- await page.goto('/');
2354
- // Replace with your app's actual navigation
2355
- // await page.click('text=About');
2356
- // await expect(page).toHaveURL(/\\/about/);
2357
- // await expect(page.locator('h1')).toContainText('About');
2358
- });
2359
- });
2360
- `,
2361
- "scenarios/example-scenario-api.test.ts": `/**
2362
- * Example API Scenario Test
2363
- *
2364
- * This file is a template for scenario tests against API-only backends.
2365
- * The holdout pattern applies: test the running server via HTTP requests,
2366
- * never import route handlers or source code from the main repo.
2367
- *
2368
- * The main repo is available at ../main-repo and is already built.
2369
- * Tests run against either:
2370
- * - A server started from ../main-repo (default)
2371
- * - A deployed URL (set BASE_URL env var)
2372
- *
2373
- * Uses Node.js built-in fetch \u2014 no additional HTTP client dependencies.
2374
- *
2375
- * DO:
2376
- * - Send HTTP requests to endpoints, assert on status codes and response bodies
2377
- * - Test realistic user actions (create, read, update, delete flows)
2378
- * - Keep each test fully independent
2379
- *
2380
- * DON'T:
2381
- * - Import from ../main-repo/src \u2014 that defeats the holdout
2382
- * - Use supertest or similar tools that import the app directly
2383
- * - Test internal implementation details
2384
- */
2385
-
2386
- import { describe, it, expect, beforeAll, afterAll } from 'vitest';
2387
- import { spawn, type ChildProcess } from 'node:child_process';
2388
- import { join } from 'node:path';
2389
-
2390
- const MAIN_REPO = join(__dirname, '..', 'main-repo');
2391
- const BASE_URL = process.env.BASE_URL || 'http://localhost:3000';
2392
- let serverProcess: ChildProcess | undefined;
2393
-
2394
- /**
2395
- * Wait for a URL to become reachable.
2396
- */
2397
- async function waitForServer(url: string, timeoutMs = 60_000): Promise<void> {
2398
- const start = Date.now();
2399
- while (Date.now() - start < timeoutMs) {
2400
- try {
2401
- const res = await fetch(url);
2402
- if (res.ok || res.status < 500) return;
2403
- } catch {
2404
- // Server not ready yet
2405
- }
2406
- await new Promise(r => setTimeout(r, 1000));
2407
- }
2408
- throw new Error(\`Server at \${url} did not become ready within \${timeoutMs}ms\`);
2409
- }
2410
-
2411
- beforeAll(async () => {
2412
- // If BASE_URL is set externally, skip starting a server
2413
- if (process.env.BASE_URL) return;
2414
-
2415
- serverProcess = spawn('npm', ['start'], {
2416
- cwd: MAIN_REPO,
2417
- stdio: 'pipe',
2418
- env: { ...process.env, PORT: '3000' },
2419
- });
2420
-
2421
- await waitForServer(BASE_URL);
2422
- }, 90_000);
2423
-
2424
- afterAll(() => {
2425
- if (serverProcess) {
2426
- serverProcess.kill('SIGTERM');
2427
- serverProcess = undefined;
2428
- }
2429
- });
2430
-
2431
- // ---------------------------------------------------------------------------
2432
- // Example scenarios \u2014 replace with real tests for your API
2433
- // ---------------------------------------------------------------------------
2434
-
2435
- describe('API health', () => {
2436
- it('GET / returns a success status', async () => {
2437
- const res = await fetch(\`\${BASE_URL}/\`);
2438
- expect(res.status).toBeLessThan(500);
2439
- });
2440
- });
2441
-
2442
- describe('API endpoints', () => {
2443
- it('GET /api/example returns JSON', async () => {
2444
- const res = await fetch(\`\${BASE_URL}/api/example\`);
2445
- // Replace with your actual endpoint
2446
- // expect(res.status).toBe(200);
2447
- // const body = await res.json();
2448
- // expect(body).toHaveProperty('data');
2449
- });
2450
-
2451
- it('POST /api/example creates a resource', async () => {
2452
- // Replace with your actual endpoint and payload
2453
- // const res = await fetch(\\\`\\\${BASE_URL}/api/example\\\`, {
2454
- // method: 'POST',
2455
- // headers: { 'Content-Type': 'application/json' },
2456
- // body: JSON.stringify({ name: 'test' }),
2457
- // });
2458
- // expect(res.status).toBe(201);
2459
- // const body = await res.json();
2460
- // expect(body).toHaveProperty('id');
2461
- });
2462
-
2463
- it('returns 404 for unknown routes', async () => {
2464
- const res = await fetch(\`\${BASE_URL}/api/does-not-exist\`);
2465
- expect(res.status).toBe(404);
2466
- });
2467
- });
2468
- `,
2469
- "scenarios/example-scenario-mobile.yaml": `# Example Mobile Scenario Test (Maestro)
2470
- #
2471
- # This file is a template for scenario tests against mobile applications.
2472
- # The holdout pattern applies: test the running app through its UI,
2473
- # never reference source code from the main repo.
2474
- #
2475
- # Maestro tests are declarative YAML flows that interact with a running
2476
- # app on a simulator/emulator. Install Maestro:
2477
- # curl -Ls "https://get.maestro.mobile.dev" | bash
2478
- #
2479
- # Run this flow:
2480
- # maestro test example-scenario-mobile.yaml
2481
- #
2482
- # DO:
2483
- # - Tap elements, fill inputs, assert on visible text
2484
- # - Use runFlow for reusable sub-flows (e.g., login)
2485
- # - Use assertWithAI for natural-language assertions
2486
- #
2487
- # DON'T:
2488
- # - Reference source code paths or internal identifiers
2489
- # - Depend on exact pixel positions (use text and accessibility labels)
2490
-
2491
- appId: com.example.myapp # Replace with your app's bundle identifier
2492
- name: "Core User Journey"
2493
- tags:
2494
- - smoke
2495
- - holdout
2496
- ---
2497
- # Step 1: Launch the app
2498
- - launchApp
2499
-
2500
- # Step 2: Login (using a reusable sub-flow)
2501
- - runFlow: example-scenario-mobile-login.yaml
2502
-
2503
- # Step 3: Verify the main screen loaded
2504
- - assertVisible: "Home"
2505
-
2506
- # Step 4: Navigate to a feature
2507
- # - tapOn: "Settings"
2508
- # - assertVisible: "Account"
2509
-
2510
- # Step 5: AI-powered assertion (natural language)
2511
- # - assertWithAI: "The main dashboard is visible with navigation tabs at the bottom"
2512
-
2513
- # Step 6: Go back
2514
- # - back
2515
- # - assertVisible: "Home"
2516
- `,
2517
- "scenarios/example-scenario-mobile-login.yaml": `# Reusable Login Sub-Flow (Maestro)
2518
- #
2519
- # This flow handles authentication. Other flows include it via:
2520
- # - runFlow: example-scenario-mobile-login.yaml
2521
- #
2522
- # Replace the selectors and credentials with your app's actual login flow.
2523
-
2524
- appId: com.example.myapp
2525
- name: "Login"
2526
- ---
2527
- - assertVisible: "Sign In"
2528
- - tapOn: "Email"
2529
- - inputText: "test@example.com"
2530
- - tapOn: "Password"
2531
- - inputText: "testpassword123"
2532
- - tapOn: "Log In"
2533
- - assertVisible: "Home" # Verify login succeeded
2534
- `,
2535
- "scenarios/README-mobile.md": '# Mobile Scenario Testing with Maestro\n\nThis guide explains how to set up and run mobile holdout scenario tests using [Maestro](https://maestro.dev/).\n\n## Prerequisites\n\n- **Maestro CLI:** `curl -Ls "https://get.maestro.mobile.dev" | bash`\n- **Java 17+** (required by Maestro)\n- **Simulator/Emulator:**\n - iOS: Xcode with iOS Simulator (macOS only)\n - Android: Android Studio with an AVD configured\n\n> **Important:** Joycraft does not install Maestro or manage simulators. This is your responsibility.\n\n## Running Tests Locally\n\n```bash\n# Boot your simulator/emulator first, then:\nmaestro test example-scenario-mobile.yaml\n\n# Run all flows in a directory:\nmaestro test .maestro/\n```\n\n## Writing Flows\n\nMaestro flows are declarative YAML. Core commands:\n\n| Command | Purpose |\n|---------|--------|\n| `launchApp` | Start or restart the app |\n| `tapOn: "text"` | Tap an element by visible text or test ID |\n| `inputText: "value"` | Type into a focused field |\n| `assertVisible: "text"` | Assert an element is on screen |\n| `assertNotVisible: "text"` | Assert an element is NOT on screen |\n| `scroll` | Scroll down |\n| `back` | Press the back button |\n| `runFlow: file.yaml` | Run a reusable sub-flow |\n| `assertWithAI: "description"` | Natural-language assertion (AI-powered) |\n\n## CI Options\n\n### Option A: Maestro Cloud (paid, easiest)\n\nUpload your app binary and flows to Maestro Cloud. No simulator management.\n\n```yaml\n- uses: mobile-dev-inc/action-maestro-cloud@v2\n with:\n api-key: ${{ secrets.MAESTRO_API_KEY }}\n app-file: app.apk # or app.ipa\n workspace: .\n```\n\n### Option B: Self-hosted emulator (free, more setup)\n\nSpin up an Android emulator on a Linux runner or iOS simulator on a macOS runner.\n\n> **Cost note:** macOS GitHub Actions runners are ~10x more expensive than Linux runners.\n\n## The Holdout Pattern\n\nThese tests live in the scenarios repo, separate from the main codebase. 
The scenario agent generates them from specs. They test observable behavior through the app\'s UI \u2014 never referencing source code or internal implementation.\n',
2536
2252
  "scenarios/prompts/scenario-agent.md": `You are a QA engineer working in a holdout test repository. You CANNOT access the main repository's source code. Your job is to write or update behavioral scenario tests based on specs that are pushed from the main repo.
2537
2253
 
2538
2254
  ## What You Have Access To
@@ -2540,23 +2256,7 @@ name: "Login"
2540
2256
  - This scenarios repository (test files, \`specs/\` mirror, \`package.json\`)
2541
2257
  - The incoming spec (provided below)
2542
2258
  - A list of existing test files and spec mirrors (provided below)
2543
- - The main repo is available at \`../main-repo\` and is already built
2544
- - The testing strategy for this project (provided below)
2545
-
2546
- ## Testing Strategy
2547
-
2548
- This project uses the **$TESTING_BACKBONE** testing backbone.
2549
-
2550
- Select the correct test format based on the backbone:
2551
-
2552
- | Backbone | Tool | Test Format | File Extension | How to Test |
2553
- |----------|------|-------------|---------------|-------------|
2554
- | \`playwright\` | Playwright | Browser-based E2E | \`.spec.ts\` | Navigate pages, click elements, assert on visible content |
2555
- | \`maestro\` | Maestro | YAML flows | \`.yaml\` | Tap elements, fill inputs, assert on screen state |
2556
- | \`api\` | fetch (Node.js built-in) | HTTP requests | \`.test.ts\` | Send requests to endpoints, assert on responses |
2557
- | \`native\` | vitest + spawnSync | CLI/binary invocation | \`.test.ts\` | Run commands, assert on stdout/stderr/exit codes |
2558
-
2559
- If the backbone is not provided or unrecognized, default to \`native\`.
2259
+ - The main repo is available at \`../main-repo\` and is already built \u2014 you can invoke its CLI or entry point via \`execSync\`/\`spawnSync\`, but you MUST NOT import from \`../main-repo/src\`
2560
2260
 
2561
2261
  ## Triage Decision Tree
2562
2262
 
@@ -2575,7 +2275,7 @@ If you SKIP, write a brief comment in the relevant test file (or a new one) expl
2575
2275
  - A new output format or file that gets generated
2576
2276
  - A new user-facing behavior that doesn't map to any existing test file
2577
2277
 
2578
- Name the file after the feature area using the correct extension for the backbone.
2278
+ Name the file after the feature area: \`[feature-area].test.ts\`. One feature area per test file.
2579
2279
 
2580
2280
  ### UPDATE \u2014 Modify an existing test file if the spec:
2581
2281
  - Changes behavior that is already tested
@@ -2586,20 +2286,25 @@ Match to the most relevant existing test file by feature area.
2586
2286
 
2587
2287
  **If you are unsure whether a spec is user-facing, err on the side of writing a test.**
2588
2288
 
2589
- ## Test Writing Rules (All Backbones)
2289
+ ## Test Writing Rules
2290
+
2291
+ 1. **Behavioral only.** Test observable output \u2014 stdout, stderr, exit codes, files created/modified on disk. Never test internal implementation details or import source modules.
2292
+
2293
+ 2. **Use \`execSync\` or \`spawnSync\`.** Invoke the built binary at \`../main-repo/dist/cli.js\` (or whatever the main repo's entry point is). Check \`../main-repo/package.json\` to find the correct entry point if unsure.
2294
+
2295
+ 3. **Use vitest.** Import \`describe\`, \`it\`, \`expect\` from \`vitest\`. Use \`beforeEach\`/\`afterEach\` for temp directory setup/teardown.
2590
2296
 
2591
- 1. **Behavioral only.** Test observable behavior \u2014 what a real user would see. Never test internal implementation details or import source modules.
2592
- 2. **Each test is fully independent.** No shared mutable state between tests.
2593
- 3. **Assert on realistic user actions.** Write tests that reflect what a real user would do.
2594
- 4. **Never import from the parent repo's source.** If you find yourself writing \`import { ... } from '../main-repo/src/...'\`, stop \u2014 that defeats the holdout.
2297
+ 4. **Each test is fully independent.** No shared mutable state between tests. Each test that touches the filesystem gets its own temp directory via \`mkdtempSync\`.
2595
2298
 
2596
- ## Backbone: native (CLI/Binary)
2299
+ 5. **Assert on realistic user actions.** Write tests that reflect what a real user would do \u2014 not what the implementation happens to do.
2597
2300
 
2598
- Use when the project is a CLI tool, library, or has no web/mobile UI.
2301
+ 6. **Never import from the parent repo's source.** If you find yourself writing \`import { ... } from '../main-repo/src/...'\`, stop \u2014 that defeats the holdout.
2302
+
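Rule 2's advice to check `../main-repo/package.json` for the entry point can be made concrete. A hedged sketch, assuming a standard package.json shape; `resolveEntryPoint` is an illustrative helper, not something joycraft ships:

```typescript
// Sketch: resolve a repo's CLI entry point from its package.json fields.
// Prefers "bin" (a string, or a name-to-path map) and falls back to "main".
function resolveEntryPoint(pkg: {
  bin?: string | Record<string, string>;
  main?: string;
}): string | undefined {
  if (typeof pkg.bin === "string") return pkg.bin;
  if (pkg.bin) return Object.values(pkg.bin)[0]; // first declared binary
  return pkg.main;
}
```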
2303
+ ## Test File Template
2599
2304
 
2600
2305
  \`\`\`typescript
2601
- import { spawnSync } from 'node:child_process';
2602
- import { mkdtempSync, rmSync } from 'node:fs';
2306
+ import { execSync, spawnSync } from 'node:child_process';
2307
+ import { existsSync, mkdtempSync, rmSync, readFileSync } from 'node:fs';
2603
2308
  import { tmpdir } from 'node:os';
2604
2309
  import { join } from 'node:path';
2605
2310
  import { describe, it, expect, beforeEach, afterEach } from 'vitest';
@@ -2612,122 +2317,39 @@ function runCLI(args: string[], cwd?: string) {
  cwd: cwd ?? process.cwd(),
  env: { ...process.env, NO_COLOR: '1' },
  });
- return { stdout: result.stdout ?? '', stderr: result.stderr ?? '', status: result.status ?? 1 };
+ return {
+ stdout: result.stdout ?? '',
+ stderr: result.stderr ?? '',
+ status: result.status ?? 1,
+ };
  }
 
- describe('[feature area]', () => {
+ describe('[feature area]: [behavior being tested]', () => {
  let tmpDir: string;
- beforeEach(() => { tmpDir = mkdtempSync(join(tmpdir(), 'scenarios-')); });
- afterEach(() => { rmSync(tmpDir, { recursive: true, force: true }); });
-
- it('[observable behavior]', () => {
- const { stdout, status } = runCLI(['command', 'args'], tmpDir);
- expect(status).toBe(0);
- expect(stdout).toContain('expected output');
- });
- });
- \`\`\`
-
- ## Backbone: playwright (Web Apps)
 
- Use when the project is a web application (Next.js, Vite, Nuxt, etc.).
-
- \`\`\`typescript
- import { test, expect } from '@playwright/test';
-
- // Tests run against BASE_URL (configured in playwright.config.ts)
- // The dev server is started automatically or BASE_URL points to a preview deploy
-
- test.describe('[feature area]', () => {
- test('[observable behavior]', async ({ page }) => {
- await page.goto('/');
- await expect(page.locator('h1')).toBeVisible();
- });
-
- test('[user interaction]', async ({ page }) => {
- await page.goto('/login');
- await page.fill('[name="email"]', 'test@example.com');
- await page.click('button[type="submit"]');
- await expect(page).toHaveURL(/dashboard/);
+ beforeEach(() => {
+ tmpDir = mkdtempSync(join(tmpdir(), 'scenarios-'));
  });
- });
- \`\`\`
-
- ## Backbone: api (API Backends)
-
- Use when the project is an API-only backend (Express, FastAPI, etc.).
-
- \`\`\`typescript
- import { describe, it, expect } from 'vitest';
-
- const BASE_URL = process.env.BASE_URL || 'http://localhost:3000';
 
- describe('[feature area]', () => {
- it('[endpoint behavior]', async () => {
- const res = await fetch(\\\`\\\${BASE_URL}/api/endpoint\\\`);
- expect(res.status).toBe(200);
- const body = await res.json();
- expect(body).toHaveProperty('data');
+ afterEach(() => {
+ rmSync(tmpDir, { recursive: true, force: true });
  });
 
- it('[error handling]', async () => {
- const res = await fetch(\\\`\\\${BASE_URL}/api/not-found\\\`);
- expect(res.status).toBe(404);
+ it('[specific observable behavior]', () => {
+ const { stdout, status } = runCLI(['command', 'args'], tmpDir);
+ expect(status).toBe(0);
+ expect(stdout).toContain('expected output');
  });
  });
  \`\`\`
 
- ## Backbone: maestro (Mobile Apps)
-
- Use when the project is a mobile application (React Native, Flutter, native iOS/Android).
-
- \`\`\`yaml
- appId: com.example.myapp
- name: "[feature area]: [behavior being tested]"
- tags:
- - holdout
- ---
- - launchApp
- - tapOn: "Sign In"
- - inputText: "test@example.com"
- - tapOn: "Submit"
- - assertVisible: "Welcome"
- # Use assertWithAI for complex visual assertions:
- # - assertWithAI: "The dashboard shows a list of recent items"
- \`\`\`
-
- ## Graceful Degradation
-
- If the primary backbone tool is not available in this repo, fall back to the next deepest testable layer:
-
- | Layer | What's Tested | When to Use |
- |-------|-------------|-------------|
- | **Layer 4: UI** | Full user flows through browser/simulator | \`@playwright/test\` or Maestro is installed |
- | **Layer 3: API** | HTTP requests against running server | Server can be started from \`../main-repo\` |
- | **Layer 2: Logic** | Unit tests via test runner | Test runner (vitest/jest) is available |
- | **Layer 1: Static** | Build, typecheck, lint | Build toolchain is available |
-
- **Fallback rules:**
- - If backbone is \`playwright\` but \`@playwright/test\` is NOT in this repo's \`package.json\`: fall back to \`api\` (fetch-based HTTP tests)
- - If backbone is \`maestro\` but no simulator context is available: fall back to \`api\` if a server can be started, else \`native\`
- - If backbone is \`api\` but no server start script exists: fall back to \`native\`
- - \`native\` is always available as the floor
-
- Start each test file with a comment indicating the testing layer:
- \`// Testing Layer: [4|3|2|1] - [UI|API|Logic|Static]\`
-
- If you fell back from the intended backbone, note this in your commit message:
- \`scenarios: [action] for [spec] (layer: [N], reason: [why])\`
-
  ## Checklist Before Committing
 
  - [ ] Decision: SKIP / NEW / UPDATE (and why)
- - [ ] Correct backbone selected (or fallback justified)
  - [ ] Tests assert on observable behavior, not implementation
  - [ ] No imports from \`../main-repo/src\`
- - [ ] Each test is independent (own temp dir, own state)
- - [ ] File uses the correct extension for the backbone
- - [ ] Testing layer comment at top of file
+ - [ ] Each test has its own temp directory if it touches the filesystem
+ - [ ] File is named after the feature area, not the spec
  \`,
  "scenarios/workflows/generate.yml": \`# Scenario Generation Workflow
  #
@@ -2827,9 +2449,7 @@ jobs:
  ## Context
 
  Existing test files in this repo: \${{ steps.context.outputs.existing_tests }}
- Existing spec mirrors: \${{ steps.context.outputs.existing_specs }}
-
- Testing backbone: \${{ github.event.client_payload.testing_backbone || 'native' }}"
+ Existing spec mirrors: \${{ steps.context.outputs.existing_specs }}"
 
  # \u2500\u2500 7. Commit any changes the agent made \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
  - name: Commit scenario changes
@@ -3355,7 +2975,7 @@ jobs:
  - name: Find changed specs
  id: changed
  run: |
- FILES=$(git diff --name-only --diff-filter=AM HEAD~1 HEAD -- 'docs/specs/*.md')
+ FILES=$(git diff --name-only --diff-filter=AM HEAD~1 HEAD -- 'docs/specs/**/*.md')
  echo "files<<EOF" >> "$GITHUB_OUTPUT"
  echo "$FILES" >> "$GITHUB_OUTPUT"
  echo "EOF" >> "$GITHUB_OUTPUT"
@@ -3398,48 +3018,6 @@ jobs:
  -f "client_payload[repo]=\${{ github.repository }}"
 
  done <<< "\${{ steps.changed.outputs.files }}"
- \`,
- "GOLDEN_EXAMPLE_TEMPLATE.md": \`# [Feature Name] \u2014 Golden Example
-
- > **Date:** YYYY-MM-DD
- > **Project:** [project name]
- > **Source Brief:** \`docs/briefs/YYYY-MM-DD-feature-name.md\`
-
- ---
-
- ## Capture
-
- The original user request or description that initiated this feature. Copied verbatim or lightly edited from the brief's Vision section.
-
- > [Paste the original capture text here \u2014 what the user said/typed that kicked off the pipeline]
-
- ## Classification
-
- - **Action Level:** [interview | decompose | execute | research | design]
- - **Confidence:** [high | medium | low]
- - **Skills Used:** [comma-separated list of Joycraft skills invoked, e.g., joycraft-new-feature, joycraft-decompose]
-
- ## Decomposition Summary
-
- The resulting spec breakdown from this capture:
-
- | # | Spec Name | Description | Size |
- |---|-----------|-------------|------|
- | 1 | [spec-name] | [one sentence] | [S/M/L] |
-
- ## Rationale
-
- 2-3 sentences explaining why this classification was correct for this capture. What signals in the capture text indicated this action level? What would have gone wrong with a different classification?
-
- ---
-
- ## Template Usage Notes
-
- **This template is for Pipit golden examples.** Golden examples are auto-generated by Joycraft's session-end skill after a successful pipeline run. They provide few-shot examples that improve Pipit's level classifier over time.
-
- **Do not edit generated examples** unless the classification was wrong. If it was wrong, correct the Classification section \u2014 this teaches Pipit the right answer.
-
- **One example per pipeline run.** Each successful interview \u2192 brief \u2192 specs \u2192 execution cycle produces one golden example.
  \`
  };
  var CODEX_SKILLS = {
@@ -3652,7 +3230,7 @@ Ask: "Does this match? Comfortable with this approach?" If large/risky, suggest
 
  ## Phase 4: Spec the Fix
 
- Write a bug fix spec to \`docs/specs/YYYY-MM-DD-bugfix-name.md\`. Create the \`docs/specs/\` directory if it doesn't exist.
+ Write a bug fix spec to \`docs/specs/<feature-or-area>/bugfix-name.md\`. Use the relevant feature name or area as the subdirectory (e.g., \`auth\`, \`cli\`, \`parser\`). Create the \`docs/specs/<feature-or-area>/\` directory if it doesn't exist.
 
  **Why:** Even bug fixes deserve a spec. It forces clarity on what "fixed" means, ensures test-first discipline, and creates a traceable record of the fix.
 
@@ -3707,7 +3285,7 @@ What changes, where?
  ## Phase 5: Hand Off
 
  \`\`\`
- Bug fix spec is ready: docs/specs/YYYY-MM-DD-bugfix-name.md
+ Bug fix spec is ready: docs/specs/<feature-or-area>/bugfix-name.md
 
  Summary:
  - Bug: [one sentence]
@@ -3783,7 +3361,7 @@ Iterate until the user approves.
 
  ## Step 5: Generate Atomic Specs
 
- For each approved row, create \`docs/specs/YYYY-MM-DD-spec-name.md\`. Create the \`docs/specs/\` directory if it doesn't exist.
+ For each approved row, create \`docs/specs/<feature-name>/spec-name.md\`. Derive the feature-name from the brief filename (strip the date prefix and \`.md\` \u2014 e.g., \`2026-04-06-token-discipline.md\` \u2192 \`token-discipline\`). If no brief exists, use a user-provided or inferred feature name (slugified to kebab-case). Create the \`docs/specs/<feature-name>/\` directory if it doesn't exist.
 
  **Why:** Each spec must be self-contained \u2014 a fresh session should be able to execute it without reading the Feature Brief. Copy relevant constraints and context into each spec.
 
@@ -3873,6 +3451,8 @@ To execute:
 
  Ready to start execution?
  \`\`\`
+
+ **Tip:** Run \`/new\` before starting the next step. Your artifacts are saved to files \u2014 this conversation context is disposable.
  \`,
  "joycraft-design.md": \`---
  name: joycraft-design
@@ -4200,6 +3780,8 @@ When you're ready to move forward:
  - **Mark everything as DRAFT.** The output is a starting point, not a commitment.
  - **Keep it short.** The draft brief should be 1-2 pages max. Capture the essence, not every detail.
  - **Multiple interviews are fine.** The user might run this several times as their thinking evolves. Each creates a new dated draft.
+
+ **Tip:** Run \`/new\` before starting the next step. Your artifacts are saved to files \u2014 this conversation context is disposable.
  \`,
  "joycraft-lockdown.md": \`---
  name: joycraft-lockdown
@@ -4419,7 +4001,7 @@ Iterate until approved.
 
  ## Phase 3: Generate Atomic Specs
 
- For each row in the decomposition table, create a self-contained spec file at \`docs/specs/YYYY-MM-DD-spec-name.md\`. Create the \`docs/specs/\` directory if it doesn't exist.
+ For each row in the decomposition table, create a self-contained spec file at \`docs/specs/<feature-name>/spec-name.md\`. Derive the feature-name from the brief filename (strip the date prefix and \`.md\` \u2014 e.g., \`2026-04-06-token-discipline.md\` \u2192 \`token-discipline\`). Create the \`docs/specs/<feature-name>/\` directory if it doesn't exist.
 
  **Why:** Each spec must be understandable WITHOUT reading the Feature Brief. This prevents the "Curse of Instructions" \u2014 no spec should require holding the entire feature in context. Copy relevant context into each spec.
 
@@ -4509,6 +4091,104 @@ Ready to start?
  **Why:** A fresh session for execution produces better results. The interview session has too much context noise \u2014 a clean session with just the spec is more focused.
 
  You can also use \`$joycraft-decompose\` to re-decompose a brief if the breakdown needs adjustment, or run \`$joycraft-interview\` first for a lighter brainstorm before committing to the full workflow.
+
+ **Tip:** Run \`/new\` before starting the next step. Your artifacts are saved to files \u2014 this conversation context is disposable.
+ \`,
+ "joycraft-optimize.md": \`---
+ name: joycraft-optimize
+ description: Audit your Claude Code or Codex session overhead \u2014 harness file sizes, plugins, MCP servers, hooks \u2014 and report actionable recommendations
+ ---
+
+ # Optimize \u2014 Session Overhead Audit
+
+ You are auditing the user's AI development session for token overhead. Produce a conversational diagnostic report \u2014 no files created.
+
+ ## Step 1: Detect Platform
+
+ Check which platform is active:
+ - **Claude Code:** Look for \`.claude/\` directory, \`CLAUDE.md\`
+ - **Codex:** Look for \`.agents/\` directory, \`AGENTS.md\`
+
+ If both exist, run both checks. If neither, default to Claude Code checks and note the uncertainty.
+
+ ## Step 2: Audit Harness Files
+
+ ### Claude Code Path
+
+ 1. **CLAUDE.md** \u2014 count lines. Threshold: \u2264200 lines.
+ 2. **Skill files** \u2014 glob \`.claude/skills/**/*.md\`. Count lines per file. Threshold: \u2264200 lines each.
+
+ ### Codex Path
+
+ 1. **AGENTS.md** \u2014 count lines. Threshold: \u2264200 lines.
+ 2. **Skill files** \u2014 glob \`.agents/skills/**/*.md\`. Count lines per file. Threshold: \u2264200 lines each.
+
+ ## Step 3: Audit Plugins & MCP Servers
+
+ ### Claude Code Path
+
+ 1. **Installed plugins** \u2014 read \`~/.claude/plugins/installed_plugins.json\`. List plugin names and versions. If not found, report "no plugins file found."
+ 2. **Enabled plugins** \u2014 read \`~/.claude/settings.json\`, check \`enabledPlugins\` array. Show enabled vs installed count.
+ 3. **MCP servers** \u2014 read \`~/.claude/settings.json\`, count entries under \`mcpServers\`. List server names.
+
+ ### Codex Path
+
+ 1. **Plugin config** \u2014 read \`~/.codex/config.toml\`. List any plugin toggles. Note: Codex syncs its curated plugin marketplace at startup \u2014 this is a boot cost even if you don't use them.
+ 2. **MCP servers** \u2014 check \`~/.codex/config.toml\` for MCP server entries. List server names.
+
+ ## Step 4: Audit Hooks (Claude Code Only)
+
+ Read \`.claude/settings.json\` in the project directory. List all hook definitions under the \`hooks\` key \u2014 show the event name and command for each.
+
+ For Codex: note "hook auditing not yet supported on Codex."
+
+ ## Step 5: Report
+
+ Organize findings by category. Use pass/warn indicators:
+
+ \`\`\`
+ ## Session Overhead Report
+
+ ### Harness Files
+ - CLAUDE.md/AGENTS.md: [N] lines [PASS \u2264200 / WARN >200]
+ - Skills: [N] files, [list any over 200 lines]
+
+ ### Plugins
+ - Installed: [N] ([list names])
+ - Enabled: [N] of [M] installed
+ - [If 0: "No plugins \u2014 zero boot cost from plugins."]
+
+ ### MCP Servers
+ - Count: [N] ([list names])
+ - [If 0: "No MCP servers \u2014 zero boot cost from servers."]
+
+ ### Hooks
+ - [N] hook definitions ([list event names])
+
+ ### Recommendations
+ - [Specific, actionable items for anything over threshold]
+ - [e.g., "AGENTS.md is 312 lines \u2014 consider splitting reference sections into docs/"]
+ - [e.g., "3 MCP servers load at boot \u2014 disable unused ones in config"]
+ \`\`\`
+
+ ## Step 6: Further Resources
+
+ End with:
+
+ > For deeper token optimization, see:
+ > - [Nate B Jones's token optimization techniques](https://www.youtube.com/watch?v=bDcgHzCBgmQ)
+ > - [OB1 repo](https://github.com/nate-b-j/OB1) \u2014 Heavy File Ingestion skill and stupid button prompt kit
+ > - [Joycraft's token discipline guide](docs/guides/token-discipline.md)
+
+ ## Edge Cases
+
+ | Scenario | Behavior |
+ |----------|----------|
+ | Config files don't exist | Report "not found" for that check, don't error |
+ | No plugins installed | Report 0 plugins \u2014 this is good, say so |
+ | CLAUDE.md/AGENTS.md exactly 200 lines | PASS \u2014 threshold is \u2264200 |
+ | \`~/.claude/\` or \`~/.codex/\` not accessible | Skip user-level checks, note limitation |
+ | Both platforms detected | Run both audits, report separately |
  `,
  "joycraft-research.md": \`---
  name: joycraft-research
@@ -4652,7 +4332,7 @@ Fix any failures before proceeding.
 
  ## 3. Update Spec Status
 
- If working from an atomic spec in \`docs/specs/\`:
+ If working from an atomic spec in \`docs/specs/\` (scan recursively \u2014 specs may be in subdirectories like \`docs/specs/<feature-name>/\`):
  - All acceptance criteria met \u2014 update status to \`Complete\`
  - Partially done \u2014 update status to \`In Progress\`, note what's left
 
@@ -4683,6 +4363,8 @@ Session complete.
  - PR: [opened #N / not yet \u2014 N specs remaining]
  - Next: [what the next session should tackle]
  \`\`\`
+
+ **Tip:** Run \`/new\` before starting the next step. Your artifacts are saved to files \u2014 this conversation context is disposable.
  \`,
  "joycraft-tune.md": \`---
  name: joycraft-tune
@@ -4708,7 +4390,7 @@ Read CLAUDE.md and explore the project. Score each with specific evidence:
 
  | Dimension | What to Check |
  |-----------|--------------|
- | Spec Quality | \`docs/specs/\` \u2014 structured? acceptance criteria? self-contained? |
+ | Spec Quality | \`docs/specs/\` (scan recursively) \u2014 structured? acceptance criteria? self-contained? |
  | Spec Granularity | Can each spec be done in one session? |
  | Behavioral Boundaries | ALWAYS/ASK FIRST/NEVER sections (or equivalent rules under any heading) |
  | Skills & Hooks | \`.agents/skills/\` files, hooks config |
@@ -4752,6 +4434,8 @@ Show a tailored roadmap: Level 2-5 table, specific next steps based on actual ga
  - **Rules under non-standard headings:** Give credit for substance.
  - **Previous assessment exists:** Read it first. If nothing to upgrade, say so.
  - **Non-Joycraft content in CLAUDE.md:** Preserve as-is. Only append.
+
+ **Tip:** Run \`$joycraft-optimize\` to audit your session's token overhead \u2014 plugins, MCP servers, and harness file sizes.
  \`,
  "joycraft-verify.md": \`---
  name: joycraft-verify
@@ -4766,9 +4450,9 @@ The user wants independent verification of an implementation. Your job is to fin
 
  ## Step 1: Find the Spec
 
- If the user provided a spec path (e.g., \`$joycraft-verify docs/specs/2026-03-26-add-widget.md\`), use that path directly.
+ If the user provided a spec path (e.g., \`$joycraft-verify docs/specs/my-feature/add-widget.md\`), use that path directly.
 
- If no path was provided, scan \`docs/specs/\` for spec files. Pick the most recently modified \`.md\` file in that directory. If \`docs/specs/\` doesn't exist or is empty, tell the user:
+ If no path was provided, scan \`docs/specs/\` recursively for spec files (they may be in subdirectories like \`docs/specs/<feature-name>/\`). Pick the most recently modified \`.md\` file. If \`docs/specs/\` doesn't exist or is empty, tell the user:
 
  > No specs found in \`docs/specs/\`. Please provide a spec path: \`$joycraft-verify path/to/spec.md\`
 
@@ -4905,4 +4589,4 @@ export {
  TEMPLATES,
  CODEX_SKILLS
  };
- //# sourceMappingURL=chunk-4RGMUQQZ.js.map
+ //# sourceMappingURL=chunk-JXSFWGIN.js.map