joycraft 0.5.14 → 0.5.16

@@ -1,4379 +0,0 @@
1
- #!/usr/bin/env node
2
-
3
- // src/bundled-files.ts
4
- var SKILLS = {
5
- "joycraft-add-fact.md": `---
6
- name: joycraft-add-fact
7
- description: Capture a project fact and route it to the correct context document -- production map, dangerous assumptions, decision log, institutional knowledge, or troubleshooting
8
- instructions: 38
9
- ---
10
-
11
- # Add Fact
12
-
13
- The user has a fact to capture. Your job is to classify it, route it to the correct context document, append it in the right format, and optionally add a CLAUDE.md boundary rule.
14
-
15
- ## Step 1: Get the Fact
16
-
17
- If the user already provided the fact (e.g., \`/joycraft-add-fact the staging DB resets every Sunday\`), use it directly.
18
-
19
- If not, ask: "What fact do you want to capture?" -- then wait for their response.
20
-
21
- If the user provides multiple facts at once, process each one separately through all the steps below, then give a combined confirmation at the end.
22
-
23
- ## Step 2: Classify the Fact
24
-
25
- Route the fact to one of these 5 context documents based on its content:
26
-
27
- ### \`docs/context/production-map.md\`
28
- The fact is about **infrastructure, services, environments, URLs, endpoints, credentials, or what is safe/unsafe to touch**.
29
- - Signal words: "production", "staging", "endpoint", "URL", "database", "service", "deployed", "hosted", "credentials", "secret", "environment"
30
- - Examples: "The staging DB is at postgres://staging.example.com", "We use Vercel for the frontend and Railway for the API"
31
-
32
- ### \`docs/context/dangerous-assumptions.md\`
33
- The fact is about **something an AI agent might get wrong -- a false assumption that leads to bad outcomes**.
34
- - Signal words: "assumes", "might think", "but actually", "looks like X but is Y", "not what it seems", "trap", "gotcha"
35
- - Examples: "The \`users\` table looks like a test table but it's production", "Deleting a workspace doesn't delete the billing subscription"
36
-
37
- ### \`docs/context/decision-log.md\`
38
- The fact is about **an architectural or tooling choice and why it was made**.
39
- - Signal words: "decided", "chose", "because", "instead of", "we went with", "the reason we use", "trade-off"
40
- - Examples: "We chose SQLite over Postgres because this runs on embedded devices", "We use pnpm instead of npm for workspace support"
41
-
42
- ### \`docs/context/institutional-knowledge.md\`
43
- The fact is about **team conventions, unwritten rules, organizational context, or who owns what**.
44
- - Signal words: "convention", "rule", "always", "never", "team", "process", "review", "approval", "owns", "responsible"
45
- - Examples: "The design team reviews all color changes", "We never deploy on Fridays", "PR titles must start with the ticket number"
46
-
47
- ### \`docs/context/troubleshooting.md\`
48
- The fact is about **diagnostic knowledge -- when X happens, do Y (or don't do Z)**.
49
- - Signal words: "when", "fails", "error", "if you see", "stuck", "broken", "fix", "workaround", "before trying", "reboot", "restart", "reset"
50
- - Examples: "If Wi-Fi disconnects during flash, wait and retry -- don't switch networks", "When tests fail with ECONNREFUSED, check if Docker is running"
51
-
52
- ### Ambiguous Facts
53
-
54
- If the fact fits multiple categories, pick the **best fit** based on the primary intent. You will mention the alternative in your confirmation message so the user can correct you.
55
-
56
- ## Step 3: Ensure the Target Document Exists
57
-
58
- 1. If \`docs/context/\` does not exist, create the directory.
59
- 2. If the target document does not exist, create it from the template structure. Check \`docs/templates/\` for the matching template. If no template exists, use this minimal structure:
60
-
61
- For **production-map.md**:
62
- \`\`\`markdown
63
- # Production Map
64
-
65
- > What's real, what's staging, what's safe to touch.
66
-
67
- ## Services
68
-
69
- | Service | Environment | URL/Endpoint | Impact if Corrupted |
70
- |---------|-------------|-------------|-------------------|
71
- \`\`\`
72
-
73
- For **dangerous-assumptions.md**:
74
- \`\`\`markdown
75
- # Dangerous Assumptions
76
-
77
- > Things the AI agent might assume that are wrong in this project.
78
-
79
- ## Assumptions
80
-
81
- | Agent Might Assume | But Actually | Impact If Wrong |
82
- |-------------------|-------------|----------------|
83
- \`\`\`
84
-
85
- For **decision-log.md**:
86
- \`\`\`markdown
87
- # Decision Log
88
-
89
- > Why choices were made, not just what was chosen.
90
-
91
- ## Decisions
92
-
93
- | Date | Decision | Why | Alternatives Rejected | Revisit When |
94
- |------|----------|-----|----------------------|-------------|
95
- \`\`\`
96
-
97
- For **institutional-knowledge.md**:
98
- \`\`\`markdown
99
- # Institutional Knowledge
100
-
101
- > Unwritten rules, team conventions, and organizational context.
102
-
103
- ## Team Conventions
104
-
105
- - (none yet)
106
- \`\`\`
107
-
108
- For **troubleshooting.md**:
109
- \`\`\`markdown
110
- # Troubleshooting
111
-
112
- > What to do when things go wrong for non-code reasons.
113
-
114
- ## Common Failures
115
-
116
- | When This Happens | Do This | Don't Do This |
117
- |-------------------|---------|---------------|
118
- \`\`\`
119
-
120
- ## Step 4: Read the Target Document
121
-
122
- Read the target document to understand its current structure. Note:
123
- - Which section to append to
124
- - Whether it uses tables or lists
125
- - The column format if it's a table
126
-
127
- ## Step 5: Append the Fact
128
-
129
- Add the fact to the appropriate section of the target document. Match the existing format exactly:
130
-
131
- - **Table-based documents** (production-map, dangerous-assumptions, decision-log, troubleshooting): Add a new table row in the correct columns. Use today's date where a date column exists.
132
- - **List-based documents** (institutional-knowledge): Add a new list item (\`- \`) to the most appropriate section.
133
-
134
- Remove any italic example rows (rows where all cells start with \`_\`) before appending, so the document transitions from template to real content. Only remove examples from the specific table you are appending to.
135
-
136
- **Append only. Never modify or remove existing real content.**
137
-
138
- ## Step 6: Evaluate CLAUDE.md Boundary Rule
139
-
140
- Decide whether the fact also warrants a rule in CLAUDE.md's behavioral boundaries:
141
-
142
- **Add a CLAUDE.md rule if the fact:**
143
- - Describes something that should ALWAYS or NEVER be done
144
- - Could cause real damage if violated (data loss, broken deployments, security issues)
145
- - Is a hard constraint that applies across all work, not just a one-time note
146
-
147
- **Do NOT add a CLAUDE.md rule if the fact is:**
148
- - Purely informational (e.g., "staging DB is at this URL")
149
- - A one-time decision that's already captured
150
- - A diagnostic tip rather than a prohibition
151
-
152
- If a rule is warranted, read CLAUDE.md, find the appropriate section (ALWAYS, ASK FIRST, or NEVER under Behavioral Boundaries), and append the rule. If no Behavioral Boundaries section exists, append one.
153
-
154
- ## Step 7: Confirm
155
-
156
- Report what you did in this format:
157
-
158
- \`\`\`
159
- Added to [document name]:
160
- [summary of what was added]
161
-
162
- [If CLAUDE.md was also updated:]
163
- Added CLAUDE.md rule:
164
- [ALWAYS/ASK FIRST/NEVER]: [rule text]
165
-
166
- [If the fact was ambiguous:]
167
- Routed to [chosen doc] -- move to [alternative doc] if this is more about [alternative category description].
168
- \`\`\`
169
- `,
170
- "joycraft-bugfix.md": `---
171
- name: joycraft-bugfix
172
- description: Structured bug fix workflow \u2014 triage, diagnose, discuss with user, write a focused spec, hand off for implementation
173
- instructions: 32
174
- ---
175
-
176
- # Bug Fix Workflow
177
-
178
- You are fixing a bug. Follow this process in order. Do not skip steps.
179
-
180
- **Guard clause:** If this is clearly a new feature, redirect to \`/joycraft-new-feature\` and stop.
181
-
182
- ---
183
-
184
- ## Phase 1: Triage
185
-
186
- Establish what's broken. Gather: symptom, steps to reproduce, expected vs actual behavior, when it started, relevant logs/errors. If an error message or stack trace is provided, read the referenced files immediately. Try to reproduce if steps are given.
187
-
188
- **Done when:** You can describe the symptom in one sentence.
189
-
190
- ---
191
-
192
- ## Phase 2: Diagnose
193
-
194
- Find the root cause. Start from the error site and trace backward. Read source files \u2014 don't guess. Identify the specific line(s) and logic error. Check git blame if it's a recent regression.
195
-
196
- **Done when:** You can explain what's wrong, why, and where in 2-3 sentences.
197
-
198
- ---
199
-
200
- ## Phase 3: Discuss
201
-
202
- Present findings to the user BEFORE writing any code or spec:
203
- 1. **Symptom** \u2014 confirm it matches what they see
204
- 2. **Root cause** \u2014 specific file(s) and line(s)
205
- 3. **Proposed fix** \u2014 what changes, where
206
- 4. **Risk** \u2014 side effects? scope?
207
-
208
- Ask: "Does this match? Comfortable with this approach?" If large/risky, suggest decomposing into multiple specs.
209
-
210
- **Done when:** User agrees with the diagnosis and fix direction.
211
-
212
- ---
213
-
214
- ## Phase 4: Spec the Fix
215
-
216
- Write a bug fix spec to \`docs/specs/YYYY-MM-DD-bugfix-name.md\`. Create the \`docs/specs/\` directory if it doesn't exist.
217
-
218
- **Why:** Even bug fixes deserve a spec. It forces clarity on what "fixed" means, ensures test-first discipline, and creates a traceable record of the fix.
219
-
220
- Use this template:
221
-
222
- \`\`\`markdown
223
- # Fix [Bug Description] \u2014 Bug Fix Spec
224
-
225
- > **Parent Brief:** none (bug fix)
226
- > **Issue/Error:** [error message, issue link, or symptom description]
227
- > **Status:** Ready
228
- > **Date:** YYYY-MM-DD
229
- > **Estimated scope:** [1 session / N files / ~N lines]
230
-
231
- ---
232
-
233
- ## Bug
234
-
235
- What is broken? Describe the symptom the user experiences.
236
-
237
- ## Root Cause
238
-
239
- What is wrong in the code and why? Name the specific file(s) and line(s).
240
-
241
- ## Fix
242
-
243
- What changes will fix this? Be specific \u2014 describe the code change, not just "fix the bug."
244
-
245
- ## Acceptance Criteria
246
-
247
- - [ ] [The bug no longer occurs \u2014 describe the correct behavior]
248
- - [ ] [No regressions in related functionality]
249
- - [ ] Build passes
250
- - [ ] Tests pass
251
-
252
- ## Test Plan
253
-
254
- | Acceptance Criterion | Test | Type |
255
- |---------------------|------|------|
256
- | [Bug no longer occurs] | [Test that reproduces the bug, then verifies the fix] | [unit/integration/e2e] |
257
- | [No regressions] | [Existing tests still pass, or new regression test] | [unit/integration] |
258
-
259
- **Execution order:**
260
- 1. Write a test that reproduces the bug \u2014 it should FAIL (red)
261
- 2. Run the test to confirm it fails
262
- 3. Apply the fix
263
- 4. Run the test to confirm it passes (green)
264
- 5. Run the full test suite to check for regressions
265
-
266
- **Smoke test:** [The bug reproduction test \u2014 fastest way to verify the fix works]
267
-
268
- **Before implementing, verify your test harness:**
269
- 1. Run the reproduction test \u2014 it must FAIL (if it passes, you're not testing the actual bug)
270
- 2. The test must exercise your actual code \u2014 not a reimplementation or mock
271
- 3. Identify your smoke test \u2014 it must run in seconds, not minutes
272
-
273
- ## Constraints
274
-
275
- - MUST: [any hard requirements for the fix]
276
- - MUST NOT: [any prohibitions \u2014 e.g., don't change the public API]
277
-
278
- ## Affected Files
279
-
280
- | Action | File | What Changes |
281
- |--------|------|-------------|
282
-
283
- ## Edge Cases
284
-
285
- | Scenario | Expected Behavior |
286
- |----------|------------------|
287
- \`\`\`
288
-
289
- **For trivial bugs:** The spec will be short. That's fine \u2014 the structure is the point, not the length.
290
-
291
- **For large bugs that span multiple files/systems:** Consider whether this should be decomposed into multiple specs. If so, create a brief first using \`/joycraft-new-feature\`, then decompose. A bug fix spec should be implementable in a single session.
292
-
293
- ---
294
-
295
- ## Phase 5: Hand Off
296
-
297
- Tell the user:
298
-
299
- \`\`\`
300
- Bug fix spec is ready: docs/specs/YYYY-MM-DD-bugfix-name.md
301
-
302
- Summary:
303
- - Bug: [one sentence]
304
- - Root cause: [one sentence]
305
- - Fix: [one sentence]
306
- - Estimated: 1 session
307
-
308
- To execute: Start a fresh session and:
309
- 1. Read the spec
310
- 2. Write the reproduction test (must fail)
311
- 3. Apply the fix (test must pass)
312
- 4. Run full test suite
313
- 5. Run /joycraft-session-end to capture discoveries
314
- 6. Commit and PR
315
-
316
- Ready to start?
317
- \`\`\`
318
-
319
- **Why:** A fresh session for implementation produces better results. This diagnostic session has context noise from exploration \u2014 a clean session with just the spec is more focused.
320
- `,
321
- "joycraft-decompose.md": `---
322
- name: joycraft-decompose
323
- description: Break a feature brief into atomic specs \u2014 small, testable, independently executable units
324
- instructions: 32
325
- ---
326
-
327
- # Decompose Feature into Atomic Specs
328
-
329
- You have a Feature Brief (or the user has described a feature). Your job is to decompose it into atomic specs that can be executed independently \u2014 one spec per session.
330
-
331
- ## Step 1: Verify the Brief Exists
332
-
333
- Look for a Feature Brief in \`docs/briefs/\`. If one doesn't exist yet, tell the user:
334
-
335
- > No feature brief found. Run \`/joycraft-new-feature\` first to interview and create one, or describe the feature now and I'll work from your description.
336
-
337
- If the user describes the feature inline, work from that description directly. You don't need a formal brief to decompose \u2014 but recommend creating one for complex features.
338
-
339
- ## Step 2: Identify Natural Boundaries
340
-
341
- **Why:** Good boundaries make specs independently testable and committable. Bad boundaries create specs that can't be verified without other specs also being done.
342
-
343
- Read the brief (or description) and identify natural split points:
344
-
345
- - **Data layer changes** (schemas, types, migrations) \u2014 always a separate spec
346
- - **Pure functions / business logic** \u2014 separate from I/O
347
- - **UI components** \u2014 separate from data fetching
348
- - **API endpoints / route handlers** \u2014 separate from business logic
349
- - **Test infrastructure** (mocks, fixtures, helpers) \u2014 can be its own spec if substantial
350
- - **Configuration / environment** \u2014 separate from code changes
351
-
352
- Ask yourself: "Can this piece be committed and tested without the other pieces existing?" If yes, it's a good boundary.
353
-
354
- ## Step 3: Build the Decomposition Table
355
-
356
- For each atomic spec, define:
357
-
358
- | # | Spec Name | Description | Dependencies | Size |
359
- |---|-----------|-------------|--------------|------|
360
-
361
- **Rules:**
362
- - Each spec name is \`verb-object\` format (e.g., \`add-terminal-detection\`, \`extract-prompt-module\`)
363
- - Each description is ONE sentence \u2014 if you need two, the spec is too big
364
- - Dependencies reference other spec numbers \u2014 keep the dependency graph shallow
365
- - More than 2 dependencies on a single spec = it's too big, split further
366
- - Aim for 3-7 specs per feature. Fewer than 3 = probably not decomposed enough. More than 10 = the feature brief is too big
367
-
368
- ## Step 4: Present and Iterate
369
-
370
- Show the decomposition table to the user. Ask:
371
- 1. "Does this breakdown match how you think about this feature?"
372
- 2. "Are there any specs that feel too big or too small?"
373
- 3. "Should any of these run in parallel (separate worktrees)?"
374
-
375
- Iterate until the user approves.
376
-
377
- ## Step 5: Generate Atomic Specs
378
-
379
- For each approved row, create \`docs/specs/YYYY-MM-DD-spec-name.md\`. Create the \`docs/specs/\` directory if it doesn't exist.
380
-
381
- **Why:** Each spec must be self-contained \u2014 a fresh Claude session should be able to execute it without reading the Feature Brief. Copy relevant constraints and context into each spec.
382
-
383
- Use this structure:
384
-
385
- \`\`\`markdown
386
- # [Verb + Object] \u2014 Atomic Spec
387
-
388
- > **Parent Brief:** \`docs/briefs/YYYY-MM-DD-feature-name.md\` (or "standalone")
389
- > **Status:** Ready
390
- > **Date:** YYYY-MM-DD
391
- > **Estimated scope:** [1 session / N files / ~N lines]
392
-
393
- ---
394
-
395
- ## What
396
- One paragraph \u2014 what changes when this spec is done?
397
-
398
- ## Why
399
- One sentence \u2014 what breaks or is missing without this?
400
-
401
- ## Acceptance Criteria
402
- - [ ] [Observable behavior]
403
- - [ ] Build passes
404
- - [ ] Tests pass
405
-
406
- ## Test Plan
407
-
408
- | Acceptance Criterion | Test | Type |
409
- |---------------------|------|------|
410
- | [Each AC above] | [What to call/assert] | [unit/integration/e2e] |
411
-
412
- **Execution order:**
413
- 1. Write all tests above \u2014 they should fail against current/stubbed code
414
- 2. Run tests to confirm they fail (red)
415
- 3. Implement until all tests pass (green)
416
-
417
- **Smoke test:** [Identify the fastest test for iteration feedback]
418
-
419
- **Before implementing, verify your test harness:**
420
- 1. Run all tests \u2014 they must FAIL (if they pass, you're testing the wrong thing)
421
- 2. Each test calls your actual function/endpoint \u2014 not a reimplementation or the underlying library
422
- 3. Identify your smoke test \u2014 it must run in seconds, not minutes, so you get fast feedback on each change
423
-
424
- ## Constraints
425
- - MUST: [hard requirement]
426
- - MUST NOT: [hard prohibition]
427
-
428
- ## Affected Files
429
- | Action | File | What Changes |
430
- |--------|------|-------------|
431
-
432
- ## Approach
433
- Strategy, data flow, key decisions. Name one rejected alternative.
434
-
435
- ## Edge Cases
436
- | Scenario | Expected Behavior |
437
- |----------|------------------|
438
- \`\`\`
439
-
440
- If \`docs/templates/ATOMIC_SPEC_TEMPLATE.md\` exists, reference it for the full template with additional guidance.
441
-
442
- Fill in all sections \u2014 each spec must be self-contained (no "see the brief for context"). Copy relevant constraints from the Feature Brief into each spec. Write acceptance criteria specific to THIS spec, not the whole feature. Every acceptance criterion must have at least one corresponding test in the Test Plan. If the user provided test strategy info from the interview, use it to choose test types and frameworks. Include the test harness verification rules in every Test Plan.
443
-
444
- ## Step 6: Recommend Execution Strategy
445
-
446
- Based on the dependency graph:
447
- - **Independent specs** \u2014 "These can run in parallel worktrees"
448
- - **Sequential specs** \u2014 "Execute these in order: 1 -> 2 -> 4"
449
- - **Mixed** \u2014 "Start specs 1 and 3 in parallel. After 1 completes, start 2."
450
-
451
- Update the Feature Brief's Execution Strategy section with the plan (if a brief exists).
452
-
453
- ## Step 7: Hand Off
454
-
455
- Tell the user:
456
- \`\`\`
457
- Decomposition complete:
458
- - [N] atomic specs created in docs/specs/
459
- - [N] can run in parallel, [N] are sequential
460
- - Estimated total: [N] sessions
461
-
462
- To execute:
463
- - Sequential: Open a session, point Claude at each spec in order
464
- - Parallel: Use worktrees \u2014 one spec per worktree, merge when done
465
- - Each session should end with /joycraft-session-end to capture discoveries
466
-
467
- Ready to start execution?
468
- \`\`\`
469
- `,
470
- "joycraft-design.md": `---
471
- name: joycraft-design
472
- description: Design discussion before decomposition \u2014 produce a ~200-line design artifact for human review, catching wrong assumptions before they propagate into specs
473
- ---
474
-
475
- # Design Discussion
476
-
477
- You are producing a design discussion document for a feature. This sits between research and decomposition \u2014 it captures your understanding so the human can catch wrong assumptions before specs are written.
478
-
479
- **Guard clause:** If no brief path is provided and no brief exists in \`docs/briefs/\`, say:
480
- "No feature brief found. Run \`/joycraft-new-feature\` first to create one, or provide the path to your brief."
481
- Then stop.
482
-
483
- ---
484
-
485
- ## Step 1: Read Inputs
486
-
487
- Read the feature brief at the path the user provides. If the user also provides a research document path, read that too. Research is optional \u2014 if none exists, note that you'll explore the codebase directly.
488
-
489
- ## Step 2: Explore the Codebase
490
-
491
- Spawn subagents to explore the codebase for patterns relevant to the brief. Focus on:
492
-
493
- - Files and functions that will be touched or extended
494
- - Existing patterns this feature should follow (naming, data flow, error handling)
495
- - Similar features already implemented that serve as models
496
- - Boundaries and interfaces the feature must integrate with
497
-
498
- Gather file paths, function signatures, and code snippets. You need concrete evidence, not guesses.
499
-
500
- ## Step 3: Write the Design Document
501
-
502
- Create \`docs/designs/\` directory if it doesn't exist. Write the design document to \`docs/designs/YYYY-MM-DD-feature-name.md\`.
503
-
504
- The document has exactly five sections:
505
-
506
- ### Section 1: Current State
507
-
508
- What exists today in the codebase that is relevant to this feature. Include file paths, function signatures, and data flows. Be specific \u2014 reference actual code, not abstractions. If no research doc was provided, note that and describe what you found through direct exploration.
509
-
510
- ### Section 2: Desired End State
511
-
512
- What the codebase should look like when this feature is complete. Describe the change at a high level \u2014 new files, modified interfaces, new data flows. Do NOT include implementation steps. This is the "what," not the "how."
513
-
514
- ### Section 3: Patterns to Follow
515
-
516
- Existing patterns in the codebase that this feature should match. Include short code snippets and \`file:line\` references. Show the pattern, don't just name it.
517
-
518
- If this is a greenfield project with no existing patterns, propose conventions and note that no precedent exists.
519
-
520
- ### Section 4: Resolved Design Decisions
521
-
522
- Decisions you have already made, with brief rationale. Format each as:
523
-
524
- > **Decision:** [what you decided]
525
- > **Rationale:** [why, referencing existing code or constraints]
526
- > **Alternative rejected:** [what you considered and why you rejected it]
527
-
528
- ### Section 5: Open Questions
529
-
530
- Things you don't know or where multiple valid approaches exist. Each question MUST present 2-3 concrete options with pros and cons. Format:
531
-
532
- > **Q: [question]**
533
- > - **Option A:** [description] \u2014 Pro: [benefit]. Con: [cost].
534
- > - **Option B:** [description] \u2014 Pro: [benefit]. Con: [cost].
535
- > - **Option C (if applicable):** [description] \u2014 Pro: [benefit]. Con: [cost].
536
-
537
- Do NOT ask vague questions like "what do you think?" Every question must have actionable options the human can choose from.
538
-
539
- ## Step 4: Present and STOP
540
-
541
- Present the design document to the user. Say:
542
-
543
- \`\`\`
544
- Design discussion written to docs/designs/YYYY-MM-DD-feature-name.md
545
-
546
- Please review the document above. Specifically:
547
- 1. Are the patterns in Section 3 the right ones to follow, or should I use different ones?
548
- 2. Do you agree with the resolved decisions in Section 4?
549
- 3. Pick an option for each open question in Section 5 (or propose your own).
550
-
551
- Reply with your feedback. I will NOT proceed to decomposition until you have reviewed and approved this design.
552
- \`\`\`
553
-
554
- **CRITICAL: Do NOT proceed to \`/joycraft-decompose\` or generate specs.** Wait for the human to review, answer open questions, and correct any wrong assumptions. The entire value of this skill is the pause \u2014 it forces a human checkpoint before mistakes propagate.
555
-
556
- ## After Human Review
557
-
558
- Once the human responds:
559
- - Update the design document with their corrections and chosen options
560
- - Move answered questions from "Open Questions" to "Resolved Design Decisions"
561
- - Present the updated document for final confirmation
562
- - Only after explicit approval, tell the user: "Design approved. Run \`/joycraft-decompose\` with this brief to generate atomic specs."
563
- `,
564
- "joycraft-implement-level5.md": `---
565
- name: joycraft-implement-level5
566
- description: Set up Level 5 autonomous development \u2014 autofix loop, holdout scenario testing, and scenario evolution from specs
567
- instructions: 35
568
- ---
569
-
570
- # Implement Level 5 \u2014 Autonomous Development Loop
571
-
572
- You are guiding the user through setting up Level 5: the autonomous feedback loop where specs go in, validated software comes out. This is a one-time setup that installs workflows, creates a scenarios repo, and configures the autofix loop.
573
-
574
- ## Before You Begin
575
-
576
- Check prerequisites:
577
-
578
- 1. **Project must be initialized.** Look for \`.joycraft-version\`. If missing, tell the user to run \`npx joycraft init\` first.
579
- 2. **Project should be at Level 4.** Check \`docs/joycraft-assessment.md\` if it exists. If the project hasn't been assessed yet, suggest running \`/joycraft-tune\` first. But don't block \u2014 the user may know they're ready.
580
- 3. **Git repo with GitHub remote.** This setup requires GitHub Actions. Check for \`.git/\` and a GitHub remote.
581
-
582
- If prerequisites aren't met, explain what's needed and stop.
583
-
584
- ## Step 1: Explain What Level 5 Means
585
-
586
- Tell the user:
587
-
588
- > Level 5 is the autonomous loop. When you push specs, three things happen automatically:
589
- >
590
- > 1. **Scenario evolution** \u2014 A separate AI agent reads your specs and writes holdout tests in a private scenarios repo. These tests are invisible to your coding agent.
591
- > 2. **Autofix** \u2014 When CI fails on a PR, Claude Code automatically attempts a fix (up to 3 times).
592
- > 3. **Holdout validation** \u2014 When CI passes, your scenarios repo runs behavioral tests against the PR. Results post as PR comments.
593
- >
594
- > The key insight: your coding agent never sees the scenario tests. This prevents it from gaming the test suite \u2014 like a validation set in machine learning.
595
-
596
- ## Step 2: Gather Configuration
597
-
598
- Ask these questions **one at a time**:
599
-
600
- ### Question 1: Scenarios repo name
601
-
602
- > What should we call your scenarios repo? It'll be a private repo that holds your holdout tests.
603
- >
604
- > Default: \`{current-repo-name}-scenarios\`
605
-
606
- Accept the default or the user's choice.
607
-
608
- ### Question 2: GitHub App
609
-
610
- > Level 5 needs a GitHub App to provide a separate identity for autofix pushes (this avoids GitHub's anti-recursion protection). Creating one takes about 2 minutes:
611
- >
612
- > 1. Go to https://github.com/settings/apps/new
613
- > 2. Give it a name (e.g., "My Project Autofix")
614
- > 3. Uncheck "Webhook > Active" (not needed)
615
- > 4. Under **Repository permissions**, set:
616
- > - **Contents**: Read & Write
617
- > - **Pull requests**: Read & Write
618
- > - **Actions**: Read & Write
619
- > 5. Click **Create GitHub App**
620
- > 6. Note the **App ID** from the settings page
621
- > 7. Scroll to **Private keys** > click **Generate a private key** > save the \`.pem\` file
622
- > 8. Click **Install App** in the left sidebar > install it on your repo
623
- >
624
- > What's your App ID?
625
-
626
- ## Step 3: Run init-autofix
627
-
628
- Run the CLI command with the gathered configuration:
629
-
630
- \`\`\`bash
631
- npx joycraft init-autofix --scenarios-repo {name} --app-id {id}
632
- \`\`\`
633
-
634
- Review the output with the user. Confirm files were created.
635
-
636
- ## Step 4: Walk Through Secret Configuration
637
-
638
- Guide the user step by step:
639
-
640
- ### 4a: Add Secrets to Main Repo
641
-
642
- > You should already have the \`.pem\` file from when you created the app in Step 2.
643
-
644
- > Go to your repo's Settings > Secrets and variables > Actions, and add:
645
- > - \`JOYCRAFT_APP_PRIVATE_KEY\` \u2014 paste the contents of your \`.pem\` file
646
- > - \`ANTHROPIC_API_KEY\` \u2014 your Anthropic API key
647
-
648
- ### 4b: Create the Scenarios Repo
649
-
650
- > Create the private scenarios repo:
651
- > \`\`\`bash
652
- > gh repo create {scenarios-repo-name} --private
653
- > \`\`\`
654
- >
655
- > Then copy the scenario templates into it:
656
- > \`\`\`bash
657
- > cp -r docs/templates/scenarios/* ../{scenarios-repo-name}/
658
- > cd ../{scenarios-repo-name}
659
- > git add -A && git commit -m "init: scaffold scenarios repo from Joycraft"
660
- > git push
661
- > \`\`\`
662
-
663
- ### 4c: Add Secrets to Scenarios Repo
664
-
665
- > The scenarios repo also needs the App private key:
666
- > - \`JOYCRAFT_APP_PRIVATE_KEY\` \u2014 same \`.pem\` file as the main repo
667
- > - \`ANTHROPIC_API_KEY\` \u2014 same key (needed for scenario generation)
668
-
669
- ## Step 5: Verify Setup
670
-
671
- Help the user verify everything is wired correctly:
672
-
673
- 1. **Check workflow files exist:** \`ls .github/workflows/autofix.yml .github/workflows/scenarios-dispatch.yml .github/workflows/spec-dispatch.yml .github/workflows/scenarios-rerun.yml\`
674
- 2. **Check scenario templates were copied:** Verify the scenarios repo has \`example-scenario.test.ts\`, \`workflows/run.yml\`, \`workflows/generate.yml\`, \`prompts/scenario-agent.md\`
675
- 3. **Check the App ID is correct** in the workflow files (not still a placeholder)
676
-
677
- ## Step 6: Update CLAUDE.md
678
-
679
- If the project's CLAUDE.md doesn't already have an "External Validation" section, add one:
680
-
681
- > ## External Validation
682
- >
683
- > This project uses holdout scenario tests in a separate private repo.
684
- >
685
- > ### NEVER
686
- > - Access, read, or reference the scenarios repo
687
- > - Mention scenario test names or contents
688
- > - Modify the scenarios dispatch workflow to leak test information
689
- >
690
- > The scenarios repo is deliberately invisible to you. This is the holdout guarantee.
691
-
692
- ## Step 7: First Test (Optional)
693
-
694
- If the user wants to test the loop:
695
-
696
- > Want to do a quick test? Here's how:
697
- >
698
- > 1. Write a simple spec in \`docs/specs/\` and push to main \u2014 this triggers scenario generation
699
- > 2. Create a PR with a small change \u2014 when CI passes, scenarios will run
700
- > 3. Watch for the scenario test results as a PR comment
701
- >
702
- > Or deliberately break something in a PR to test the autofix loop.
703
-
704
- ## Step 8: Summary
705
-
706
- Print a summary of what was set up:
707
-
708
- > **Level 5 is live.** Here's what's running:
709
- >
710
- > | Trigger | What Happens |
711
- > |---------|-------------|
712
- > | Push specs to \`docs/specs/\` | Scenario agent writes holdout tests |
713
- > | PR fails CI | Claude autofix attempts (up to 3x) |
714
- > | PR passes CI | Holdout scenarios run against PR |
715
- > | Scenarios update | Open PRs re-tested with latest scenarios |
716
- >
717
- > Your scenarios repo: \`{name}\`
718
- > Your coding agent cannot see those tests. The holdout wall is intact.
719
-
720
- Update \`docs/joycraft-assessment.md\` if it exists \u2014 set the Level 5 score to reflect the new setup.
721
- `,
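The file check in Step 5 of the skill above can be sketched as a pure function. This is a minimal illustration, not part of the package: `missingWorkflows` and `EXPECTED_WORKFLOWS` are hypothetical names, and the caller is assumed to supply the list of paths that actually exist in the repository.

```typescript
// Workflow files named in Step 5 of the skill text.
const EXPECTED_WORKFLOWS: string[] = [
  ".github/workflows/autofix.yml",
  ".github/workflows/scenarios-dispatch.yml",
  ".github/workflows/spec-dispatch.yml",
  ".github/workflows/scenarios-rerun.yml",
];

// Hypothetical helper: returns the expected workflow paths that are
// absent from `present` (the paths found in the repository).
function missingWorkflows(present: string[]): string[] {
  return EXPECTED_WORKFLOWS.filter((path) => present.indexOf(path) === -1);
}
```

An empty result corresponds to the first verification item passing; any returned path is a workflow that still needs to be copied.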
722
- "joycraft-interview.md": `---
723
- name: joycraft-interview
724
- description: Brainstorm freely about what you want to build \u2014 yap, explore ideas, and get a structured summary you can use later
725
- instructions: 18
726
- ---
727
-
728
- # Interview \u2014 Idea Exploration
729
-
730
- You are helping the user brainstorm and explore what they want to build. This is a lightweight, low-pressure conversation \u2014 not a formal spec process. Let them yap.
731
-
732
- ## How to Run the Interview
733
-
734
- ### 1. Open the Floor
735
-
736
- Start with something like:
737
- "What are you thinking about building? Just talk \u2014 I'll listen and ask questions as we go."
738
-
739
- Let the user talk freely. Do not interrupt their flow. Do not push toward structure yet.
740
-
741
- ### 2. Ask Clarifying Questions
742
-
743
- As they talk, weave in questions naturally \u2014 don't fire them all at once:
744
-
745
- - **What problem does this solve?** Who feels the pain today?
746
- - **What does "done" look like?** If this worked perfectly, what would a user see?
747
- - **What are the constraints?** Time, tech, team, budget \u2014 what boxes are we in?
748
- - **What's NOT in scope?** What's tempting but should be deferred?
749
- - **What are the edge cases?** What could go wrong? What's the weird input?
750
- - **What exists already?** Are we building on something or starting fresh?
751
-
752
- ### 3. Play Back Understanding
753
-
754
- After the user has gotten their ideas out, reflect back:
755
- "So if I'm hearing you right, you want to [summary]. The core problem is [X], and done looks like [Y]. Is that right?"
756
-
757
- Let them correct and refine. Iterate until they say "yes, that's it."
758
-
759
- ### 4. Write a Draft Brief
760
-
761
- Create a draft file at \`docs/briefs/YYYY-MM-DD-topic-draft.md\`. Create the \`docs/briefs/\` directory if it doesn't exist.
762
-
763
- Use this format:
764
-
765
- \`\`\`markdown
766
- # [Topic] \u2014 Draft Brief
767
-
768
- > **Date:** YYYY-MM-DD
769
- > **Status:** DRAFT
770
- > **Origin:** /joycraft-interview session
771
-
772
- ---
773
-
774
- ## The Idea
775
- [2-3 paragraphs capturing what the user described \u2014 their words, their framing]
776
-
777
- ## Problem
778
- [What pain or gap this addresses]
779
-
780
- ## What "Done" Looks Like
781
- [The user's description of success \u2014 observable outcomes]
782
-
783
- ## Constraints
784
- - [constraint 1]
785
- - [constraint 2]
786
-
787
- ## Open Questions
788
- - [things that came up but weren't resolved]
789
- - [decisions that need more thought]
790
-
791
- ## Out of Scope (for now)
792
- - [things explicitly deferred]
793
-
794
- ## Raw Notes
795
- [Any additional context, quotes, or tangents worth preserving]
796
- \`\`\`
797
-
798
- ### 5. Hand Off
799
-
800
- After writing the draft, tell the user:
801
-
802
- \`\`\`
803
- Draft brief saved to docs/briefs/YYYY-MM-DD-topic-draft.md
804
-
805
- When you're ready to move forward:
806
- - /joycraft-new-feature \u2014 formalize this into a full Feature Brief with specs
807
- - /joycraft-decompose \u2014 break it directly into atomic specs if scope is clear
808
- - Or just keep brainstorming \u2014 run /joycraft-interview again anytime
809
- \`\`\`
810
-
811
- ## Guidelines
812
-
813
- - **This is NOT /joycraft-new-feature.** Do not push toward formal briefs, decomposition tables, or atomic specs. The point is exploration.
814
- - **Let the user lead.** Your job is to listen, clarify, and capture \u2014 not to structure or direct.
815
- - **Mark everything as DRAFT.** The output is a starting point, not a commitment.
816
- - **Keep it short.** The draft brief should be 1-2 pages max. Capture the essence, not every detail.
817
- - **Multiple interviews are fine.** The user might run this several times as their thinking evolves. Each creates a new dated draft.
818
- `,
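The draft path pattern `docs/briefs/YYYY-MM-DD-topic-draft.md` from the interview skill above could be produced along these lines. This is a sketch under stated assumptions: the skill text does not define how a topic becomes a filename, so the lowercase, hyphen-separated slug rule and the use of the UTC date are illustrative choices, and both function names are hypothetical.

```typescript
// Hypothetical helper: lowercase, hyphen-separated slug (an assumed convention).
function slugify(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of other characters into one hyphen
    .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
}

// Builds docs/briefs/YYYY-MM-DD-topic-draft.md using the UTC calendar date.
function draftBriefPath(topic: string, date: Date): string {
  const day = date.toISOString().slice(0, 10); // "YYYY-MM-DD" in UTC
  return "docs/briefs/" + day + "-" + slugify(topic) + "-draft.md";
}
```

Because the date is part of the name, repeated interviews on different days yield separate drafts, which matches the guideline that each session creates a new dated draft.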
819
- "joycraft-lockdown.md": `---
820
- name: joycraft-lockdown
821
- description: Generate constrained execution boundaries for an implementation session -- NEVER rules and deny patterns to prevent agent overreach
822
- instructions: 28
823
- ---
824
-
825
- # Lockdown Mode
826
-
827
- The user wants to constrain agent behavior for an implementation session. Your job is to interview them about what should be off-limits, then generate CLAUDE.md NEVER rules and \`.claude/settings.json\` deny patterns they can review and apply.
828
-
829
- ## When Is Lockdown Useful?
830
-
831
- Lockdown is most valuable for:
832
- - **Complex tech stacks** (hardware, firmware, multi-device) where agents can cause real damage
833
- - **Long-running autonomous sessions** where you won't be monitoring every action
834
- - **Production-adjacent work** where accidental network calls or package installs are risky
835
-
836
- For simple feature work on a well-tested codebase, lockdown is usually overkill. Mention this context to the user so they can decide.
837
-
838
- ## Step 1: Check for Tests
839
-
840
- Before starting the interview, check if the project has test files or directories (look for \`tests/\`, \`test/\`, \`__tests__/\`, \`spec/\`, or files matching \`*.test.*\`, \`*.spec.*\`).
841
-
842
- If no tests are found, tell the user:
843
-
844
- > Lockdown mode is most useful when you already have tests in place -- it prevents the agent from modifying them while constraining behavior to writing code and running tests. Consider running \`/joycraft-new-feature\` first to set up a test-driven workflow, then come back to lock it down.
845
-
846
- If the user wants to proceed anyway, continue with the interview.
847
-
848
- ## Step 2: Interview -- What to Lock Down
849
-
850
- Ask these three questions, one at a time. Wait for the user's response before proceeding to the next question.
851
-
852
- ### Question 1: Read-Only Files
853
-
854
- > What test files or directories should be off-limits for editing? (e.g., \`tests/\`, \`__tests__/\`, \`spec/\`, specific test files)
855
- >
856
- > I'll generate NEVER rules to prevent editing these.
857
-
858
- If the user isn't sure, suggest the test directories you found in Step 1.
859
-
860
- ### Question 2: Allowed Commands
861
-
862
- > What commands should the agent be allowed to run? Defaults:
863
- > - Write and edit source code files
864
- > - Run the project's smoke test command
865
- > - Run the full test suite
866
- >
867
- > Any other commands to explicitly allow? Or should I restrict to just these?
868
-
869
- ### Question 3: Denied Commands
870
-
871
- > What commands should be denied? Defaults:
872
- > - Package installs (\`npm install\`, \`pip install\`, \`cargo add\`, \`go get\`, etc.)
873
- > - Network tools (\`curl\`, \`wget\`, \`ping\`, \`ssh\`)
874
- > - Direct log file reading
875
- >
876
- > Any specific commands to add or remove from this list?
877
-
878
- **Edge case -- user wants to allow some network access:** If the user mentions API tests or specific endpoints that need network access, exclude those from the deny list and note the exception in the output.
879
-
880
- **Edge case -- user wants to lock down file writes:** If the user wants to prevent ALL file writes, warn them:
881
-
882
- > Denying all file writes would prevent the agent from doing any work. I recommend keeping source code writes allowed and only locking down test files, config files, or other sensitive directories.
883
-
884
- ## Step 3: Generate Boundaries
885
-
886
- Based on the interview responses, generate output in this exact format:
887
-
888
- \`\`\`
889
- ## Lockdown boundaries generated
890
-
891
- Review these suggestions and add them to your project:
892
-
893
- ### CLAUDE.md -- add to NEVER section:
894
-
895
- - Edit any file in \`[user's test directories]\`
896
- - Run \`[denied package manager commands]\`
897
- - Use \`[denied network tools]\`
898
- - Read log files directly -- interact with logs only through test assertions
899
- - [Any additional NEVER rules based on user responses]
900
-
901
- ### .claude/settings.json -- suggested deny patterns:
902
-
903
- Add these to the \`permissions.deny\` array:
904
-
905
- ["[command1]", "[command2]", "[command3]"]
906
-
907
- ---
908
-
909
- Copy these into your project manually, or tell me to apply them now (I'll show you the exact changes for approval first).
910
- \`\`\`
911
-
912
- Adjust the content based on the actual interview responses:
913
- - Only include deny patterns for commands the user confirmed should be denied
914
- - Only include NEVER rules for directories/files the user specified
915
- - If the user allowed certain network tools or package managers, exclude those
916
-
917
- ## Recommended Permission Mode
918
-
919
- After generating the boundaries above, also recommend a Claude Code permission mode. Include this section in your output:
920
-
921
- \`\`\`
922
- ### Recommended Permission Mode
923
-
924
- You don't need \`--dangerously-skip-permissions\`. Safer alternatives exist:
925
-
926
- | Your situation | Use | Why |
927
- |---|---|---|
928
- | Autonomous spec execution | \`--permission-mode dontAsk\` + allowlist above | Only pre-approved commands run |
929
- | Long session with some trust | \`--permission-mode auto\` | Safety classifier reviews each action |
930
- | Interactive development | \`--permission-mode acceptEdits\` | Auto-approves file edits, prompts for commands |
931
-
932
- **For lockdown mode, we recommend \`--permission-mode dontAsk\`** combined with the deny patterns above. This gives you full autonomy for allowed operations while blocking everything else -- no classifier overhead, no prompts, and no safety bypass.
933
-
934
- \`--dangerously-skip-permissions\` disables ALL safety checks. The modes above give you autonomy without removing the guardrails.
935
- \`\`\`
936
-
937
- ## Step 4: Offer to Apply
938
-
939
- If the user asks you to apply the changes:
940
-
941
- 1. **For CLAUDE.md:** Read the existing CLAUDE.md, find the Behavioral Boundaries section, and show the user the exact diff for the NEVER section. Ask for confirmation before writing.
942
- 2. **For settings.json:** Read the existing \`.claude/settings.json\`, show the user what the \`permissions.deny\` array will look like after adding the new patterns. Ask for confirmation before writing.
943
-
944
- **Never auto-apply. Always show the exact changes and wait for explicit approval.**
945
- `,
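Step 4 of the lockdown skill above asks the agent to show what the `permissions.deny` array will look like after the new patterns are added. A minimal sketch of that merge follows. The `ClaudeSettings` type and `mergeDenyPatterns` name are assumptions for illustration, and the pattern strings are treated as opaque: their exact syntax should be taken from the Claude Code settings documentation, not from this example.

```typescript
// Assumed minimal shape of .claude/settings.json; other keys pass through untouched.
interface ClaudeSettings {
  permissions?: { deny?: string[]; [key: string]: unknown };
  [key: string]: unknown;
}

// Hypothetical helper: returns a new settings object whose permissions.deny
// holds the existing patterns followed by any new ones, without duplicates.
// The input object is not mutated, so the result can be shown for approval first.
function mergeDenyPatterns(settings: ClaudeSettings, patterns: string[]): ClaudeSettings {
  const existing = settings.permissions?.deny ?? [];
  const merged = [...existing, ...patterns].filter(
    (pattern, index, all) => all.indexOf(pattern) === index,
  );
  return { ...settings, permissions: { ...settings.permissions, deny: merged } };
}
```

Returning a copy rather than editing in place fits the skill's rule of never auto-applying: the before and after objects can be diffed and presented to the user before anything is written.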
946
- "joycraft-new-feature.md": `---
947
- name: joycraft-new-feature
948
- description: Guided feature development \u2014 interview the user, produce a Feature Brief, then decompose into atomic specs
949
- instructions: 35
950
- ---
951
-
952
- # New Feature Workflow
953
-
954
- You are starting a new feature. Follow this process in order. Do not skip steps.
955
-
956
- ## Phase 1: Interview
957
-
958
- Interview the user about what they want to build. Let them talk \u2014 your job is to listen, then sharpen.
959
-
960
- **Ask about:**
961
- - What problem does this solve? Who is affected?
962
- - What does "done" look like?
963
- - Hard constraints? (business rules, tech limitations, deadlines)
964
- - What is explicitly NOT in scope? (push hard on this)
965
- - Edge cases or error conditions?
966
- - What existing code/patterns should this follow?
967
- - Testing: existing setup? framework? smoke test budget? lockdown mode desired?
968
-
969
- **Interview technique:**
970
- - Let the user "yap" \u2014 don't interrupt their flow
971
- - Play back your understanding: "So if I'm hearing you right..."
972
- - Push toward testable statements: "How would we verify that works?"
973
-
974
- Keep asking until you can fill out a Feature Brief.
975
-
976
- ## Phase 2: Feature Brief
977
-
978
- Write a Feature Brief to \`docs/briefs/YYYY-MM-DD-feature-name.md\`. Create the \`docs/briefs/\` directory if it doesn't exist.
979
-
980
- **Why:** The brief is the single source of truth for what we're building. It prevents scope creep and gives every spec a shared reference point.
981
-
982
- Use this structure:
983
-
984
- \`\`\`markdown
985
- # [Feature Name] \u2014 Feature Brief
986
-
987
- > **Date:** YYYY-MM-DD
988
- > **Project:** [project name]
989
- > **Status:** Interview | Decomposing | Specs Ready | In Progress | Complete
990
-
991
- ---
992
-
993
- ## Vision
994
- What are we building and why? The full picture in 2-4 paragraphs.
995
-
996
- ## User Stories
997
- - As a [role], I want [capability] so that [benefit]
998
-
999
- ## Hard Constraints
1000
- - MUST: [constraint that every spec must respect]
1001
- - MUST NOT: [prohibition that every spec must respect]
1002
-
1003
- ## Out of Scope
1004
- - NOT: [tempting but deferred]
1005
-
1006
- ## Test Strategy
1007
- - **Existing setup:** [framework and tools, or "none yet"]
1008
- - **User expertise:** [comfortable / learning / needs guidance]
1009
- - **Test types:** [smoke, unit, integration, e2e, etc.]
1010
- - **Smoke test budget:** [target time for fast-feedback tests]
1011
- - **Lockdown mode:** [yes/no \u2014 constrain agent to code + tests only]
1012
-
1013
- ## Decomposition
1014
- | # | Spec Name | Description | Dependencies | Est. Size |
1015
- |---|-----------|-------------|--------------|-----------|
1016
- | 1 | [verb-object] | [one sentence] | None | [S/M/L] |
1017
-
1018
- ## Execution Strategy
1019
- - [ ] Sequential (specs have chain dependencies)
1020
- - [ ] Parallel worktrees (specs are independent)
1021
- - [ ] Mixed
1022
-
1023
- ## Success Criteria
1024
- - [ ] [End-to-end behavior 1]
1025
- - [ ] [No regressions in existing features]
1026
- \`\`\`
1027
-
1028
- If \`docs/templates/FEATURE_BRIEF_TEMPLATE.md\` exists, reference it for the full template with additional guidance.
1029
-
1030
- Present the brief to the user. Focus review on:
1031
- - "Does the decomposition match how you think about this?"
1032
- - "Is anything in scope that shouldn't be?"
1033
- - "Are the specs small enough? Can each be described in one sentence?"
1034
-
1035
- Iterate until approved.
1036
-
1037
- ## Phase 3: Generate Atomic Specs
1038
-
1039
- For each row in the decomposition table, create a self-contained spec file at \`docs/specs/YYYY-MM-DD-spec-name.md\`. Create the \`docs/specs/\` directory if it doesn't exist.
1040
-
1041
- **Why:** Each spec must be understandable WITHOUT reading the Feature Brief. This prevents the "Curse of Instructions" \u2014 no spec should require holding the entire feature in context. Copy relevant context into each spec.
1042
-
1043
- Use this structure for each spec:
1044
-
1045
- \`\`\`markdown
1046
- # [Verb + Object] \u2014 Atomic Spec
1047
-
1048
- > **Parent Brief:** \`docs/briefs/YYYY-MM-DD-feature-name.md\`
1049
- > **Status:** Ready
1050
- > **Date:** YYYY-MM-DD
1051
- > **Estimated scope:** [1 session / N files / ~N lines]
1052
-
1053
- ---
1054
-
1055
- ## What
1056
- One paragraph \u2014 what changes when this spec is done?
1057
-
1058
- ## Why
1059
- One sentence \u2014 what breaks or is missing without this?
1060
-
1061
- ## Acceptance Criteria
1062
- - [ ] [Observable behavior]
1063
- - [ ] Build passes
1064
- - [ ] Tests pass
1065
-
1066
- ## Test Plan
1067
-
1068
- | Acceptance Criterion | Test | Type |
1069
- |---------------------|------|------|
1070
- | [Each AC above] | [What to call/assert] | [unit/integration/e2e] |
1071
-
1072
- **Execution order:**
1073
- 1. Write all tests above \u2014 they should fail against current/stubbed code
1074
- 2. Run tests to confirm they fail (red)
1075
- 3. Implement until all tests pass (green)
1076
-
1077
- **Smoke test:** [Identify the fastest test for iteration feedback]
1078
-
1079
- **Before implementing, verify your test harness:**
1080
- 1. Run all tests \u2014 they must FAIL (if they pass, you're testing the wrong thing)
1081
- 2. Each test calls your actual function/endpoint \u2014 not a reimplementation or the underlying library
1082
- 3. Identify your smoke test \u2014 it must run in seconds, not minutes, so you get fast feedback on each change
1083
-
1084
- ## Constraints
1085
- - MUST: [hard requirement]
1086
- - MUST NOT: [hard prohibition]
1087
-
1088
- ## Affected Files
1089
- | Action | File | What Changes |
1090
- |--------|------|-------------|
1091
-
1092
- ## Approach
1093
- Strategy, data flow, key decisions. Name one rejected alternative.
1094
-
1095
- ## Edge Cases
1096
- | Scenario | Expected Behavior |
1097
- |----------|------------------|
1098
- \`\`\`
1099
-
1100
- If \`docs/templates/ATOMIC_SPEC_TEMPLATE.md\` exists, reference it for the full template with additional guidance.
1101
-
1102
- ## Phase 4: Hand Off for Execution
1103
-
1104
- Tell the user:
1105
- \`\`\`
1106
- Feature Brief and [N] atomic specs are ready.
1107
-
1108
- Specs:
1109
- 1. [spec-name] \u2014 [one sentence] [S/M/L]
1110
- 2. [spec-name] \u2014 [one sentence] [S/M/L]
1111
- ...
1112
-
1113
- Recommended execution:
1114
- - [Parallel/Sequential/Mixed strategy]
1115
- - Estimated: [N] sessions total
1116
-
1117
- To execute: Start a fresh session per spec. Each session should:
1118
- 1. Read the spec
1119
- 2. Implement
1120
- 3. Run /joycraft-session-end to capture discoveries
1121
- 4. Commit and PR
1122
-
1123
- Ready to start?
1124
- \`\`\`
1125
-
1126
- **Why:** A fresh session for execution produces better results. The interview session has too much context noise \u2014 a clean session with just the spec is more focused.
1127
-
1128
- You can also use \`/joycraft-decompose\` to re-decompose a brief if the breakdown needs adjustment, or run \`/joycraft-interview\` first for a lighter brainstorm before committing to the full workflow.
1129
- `,
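The Execution Strategy checklist in the Feature Brief template above (Sequential, Parallel worktrees, Mixed) depends on the Dependencies column of the decomposition table. The skill text does not say how to choose between them, so the following is only a rough heuristic under assumed rules: `SpecRow`, `Strategy`, and `suggestStrategy` are hypothetical names, and "sequential" is approximated as "every spec except one has a dependency".

```typescript
type Strategy = "parallel" | "sequential" | "mixed";

// One row of the decomposition table; dependsOn is empty when the column says "None".
interface SpecRow {
  name: string;
  dependsOn: string[];
}

// Hypothetical heuristic, not the package's logic:
// - no spec has dependencies           -> parallel
// - all specs but one have dependencies -> sequential (approximates a chain)
// - anything else                       -> mixed
function suggestStrategy(rows: SpecRow[]): Strategy {
  const withDeps = rows.filter((row) => row.dependsOn.length > 0).length;
  if (withDeps === 0) return "parallel";
  if (withDeps === rows.length - 1) return "sequential";
  return "mixed";
}
```

A real decision would also look at whether specs touch the same files, which this sketch ignores.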
1130
- "joycraft-research.md": `---
1131
- name: joycraft-research
1132
- description: Produce objective codebase research by isolating question generation from fact-gathering \u2014 subagent sees only questions, never the brief
1133
- ---
1134
-
1135
- # Research Codebase for a Feature
1136
-
1137
- You are producing objective codebase research to inform a future spec or implementation. The key insight: the researching agent must never see the brief or ticket \u2014 only research questions. This prevents opinions from contaminating the facts.
1138
-
1139
- **Guard clause:** If the user doesn't provide a brief path or inline description, ask:
1140
- "What feature or change are you researching? Provide a brief path (e.g., \`docs/briefs/2026-03-30-my-feature.md\`) or describe it in a few sentences."
1141
-
1142
- ---
1143
-
1144
- ## Phase 1: Generate Research Questions
1145
-
1146
- Read the brief file (if a path was provided) or use the user's inline description.
1147
-
1148
- Identify which zones of the codebase are relevant to this feature. Then generate 5-10 research questions that are:
1149
-
1150
- - **Objective and fact-seeking** \u2014 "How does X work?" not "How should we build X?"
1151
- - **Specific to the codebase** \u2014 reference concrete systems, files, or flows
1152
- - **Answerable by reading code** \u2014 no questions about business strategy or user preferences
1153
-
1154
- Good examples:
1155
- - "How does endpoint registration work in the current router?"
1156
- - "What patterns exist for input validation across existing handlers?"
1157
- - "Trace the data flow from API request to database write for entity X."
1158
- - "What test infrastructure exists? Where are fixtures, mocks, and helpers?"
1159
- - "What dependencies does module Y import, and what does its public API look like?"
1160
-
1161
- Bad examples (do NOT generate these):
1162
- - "What's the best way to implement this feature?" (opinion)
1163
- - "Should we use library X or Y?" (recommendation)
1164
- - "What would a good architecture look like?" (design, not research)
1165
-
1166
- Write the questions to a temporary file at \`docs/research/.questions-tmp.md\`. Create the \`docs/research/\` directory if it doesn't exist.
1167
-
1168
- **Do NOT include any content from the brief in this file \u2014 only the questions.**
1169
-
1170
- ---
1171
-
1172
- ## Phase 2: Spawn Research Subagent
1173
-
1174
- Use Claude Code's Agent tool to spawn a subagent. Pass ONLY the research questions \u2014 never the brief path, brief content, or feature description.
1175
-
1176
- Build the subagent prompt by reading the questions file you just wrote, then use this template:
1177
-
1178
- \`\`\`
1179
- You are researching a codebase to answer specific questions. You have NO context about why these questions are being asked \u2014 you are simply gathering facts.
1180
-
1181
- RULES \u2014 these are hard constraints:
1182
- - Answer each question with FACTS ONLY: file paths, function signatures, data flows, patterns, dependencies
1183
- - Do NOT recommend, suggest, or opine on anything
1184
- - Do NOT speculate about what should be built or how
1185
- - If a question cannot be answered (no relevant code exists), say "No existing code found for this"
1186
- - Use the Read tool and Grep tool to explore the codebase thoroughly
1187
- - Include code snippets only when they are essential evidence (e.g., a function signature, a config block)
1188
-
1189
- QUESTIONS:
1190
- [INSERT_QUESTIONS_HERE]
1191
-
1192
- OUTPUT FORMAT \u2014 write your findings as a single markdown document using this structure:
1193
-
1194
- # Codebase Research
1195
-
1196
- **Date:** [today's date]
1197
- **Questions answered:** [N/total]
1198
-
1199
- ---
1200
-
1201
- ## Q1: [question text]
1202
-
1203
- [Facts, file paths, function signatures, data flows. No opinions.]
1204
-
1205
- ## Q2: [question text]
1206
-
1207
- [Facts, file paths, function signatures, data flows. No opinions.]
1208
-
1209
- [Continue for all questions]
1210
- \`\`\`
1211
-
1212
- ## Phase 3: Write the Research Document
1213
-
1214
- Take the subagent's response and write it to \`docs/research/YYYY-MM-DD-feature-name.md\`. Derive the feature name from the brief filename or the user's description (lowercase, hyphenated).
1215
-
1216
- Delete the temporary questions file (\`docs/research/.questions-tmp.md\`).
1217
-
1218
- Present the research document path to the user:
1219
-
1220
- \`\`\`
1221
- Research complete: docs/research/YYYY-MM-DD-feature-name.md
1222
-
1223
- This document contains objective facts about your codebase \u2014 no opinions or recommendations.
1224
-
1225
- Next steps:
1226
- - /joycraft-decompose \u2014 break the feature into atomic specs (research will inform the specs)
1227
- - /joycraft-new-feature \u2014 formalize into a full Feature Brief first
1228
- - Read the research and add any corrections or missing context manually
1229
- \`\`\`
1230
-
1231
- ## Edge Cases
1232
-
1233
- | Scenario | Behavior |
1234
- |----------|----------|
1235
- | No brief provided | Accept inline description, generate questions from that |
1236
- | Codebase is empty or new | Research doc reports "no existing patterns found" per question |
1237
- | User runs research twice for same feature | Overwrites the previous research doc when the filename (date and feature name) is the same |
1238
- | Brief is very short (1-2 sentences) | Still generate questions \u2014 even simple features benefit from understanding existing patterns |
1239
- | \`docs/research/\` doesn't exist | Create it |
1240
- `,
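Phase 2 of the research skill above builds the subagent prompt from the questions alone. One way to make that isolation structural is a builder whose signature accepts only the template and the questions, so the brief cannot be passed by accident. This is a sketch: `buildSubagentPrompt` is a hypothetical name, and numbering the questions is an assumed formatting choice. Only the `[INSERT_QUESTIONS_HERE]` placeholder comes from the skill text.

```typescript
const PLACEHOLDER = "[INSERT_QUESTIONS_HERE]";

// Hypothetical helper: fills the prompt template with a numbered question list.
// It takes no brief path or brief content, mirroring the isolation rule.
function buildSubagentPrompt(template: string, questions: string[]): string {
  if (template.indexOf(PLACEHOLDER) === -1) {
    throw new Error("template is missing " + PLACEHOLDER);
  }
  const numbered = questions
    .map((question, index) => String(index + 1) + ". " + question)
    .join("\n");
  // split/join avoids the special "$" handling of String.prototype.replace
  return template.split(PLACEHOLDER).join(numbered);
}
```

Using `split` and `join` instead of `replace` means a question containing `$&` or similar sequences is inserted literally.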
1241
- "joycraft-session-end.md": `---
1242
- name: joycraft-session-end
1243
- description: Wrap up a session \u2014 capture discoveries, verify, prepare for PR or next session
1244
- instructions: 22
1245
- ---
1246
-
1247
- # Session Wrap-Up
1248
-
1249
- Before ending this session, complete these steps in order.
1250
-
1251
- ## 1. Capture Discoveries
1252
-
1253
- **Why:** Discoveries are the surprises \u2014 things that weren't in the spec or that contradicted expectations. They prevent future sessions from hitting the same walls.
1254
-
1255
- Check: did anything surprising happen during this session? If yes, create or update a discovery file at \`docs/discoveries/YYYY-MM-DD-topic.md\`. Create the \`docs/discoveries/\` directory if it doesn't exist.
1256
-
1257
- Only capture what's NOT obvious from the code or git diff:
1258
- - "We thought X but found Y" \u2014 assumptions that were wrong
1259
- - "This API/library behaves differently than documented" \u2014 external gotchas
1260
- - "This edge case needs handling in a future spec" \u2014 deferred work with context
1261
- - "The approach in the spec didn't work because..." \u2014 spec-vs-reality gaps
1262
- - Key decisions made during implementation that aren't in the spec
1263
-
1264
- **Do NOT capture:**
1265
- - Files changed (that's the diff)
1266
- - What you set out to do (that's the spec)
1267
- - Step-by-step narrative of the session (nobody re-reads these)
1268
-
1269
- Use this format:
1270
-
1271
- \`\`\`markdown
1272
- # Discoveries \u2014 [topic]
1273
-
1274
- **Date:** YYYY-MM-DD
1275
- **Spec:** [link to spec if applicable]
1276
-
1277
- ## [Discovery title]
1278
- **Expected:** [what we thought would happen]
1279
- **Actual:** [what actually happened]
1280
- **Impact:** [what this means for future work]
1281
- \`\`\`
1282
-
1283
- If nothing surprising happened, skip the discovery file entirely. No discovery is a good sign \u2014 the spec was accurate.
1284
-
1285
- ## 1b. Update Context Documents
1286
-
1287
- If \`docs/context/\` exists, quickly check whether this session revealed anything about:
1288
-
1289
- - **Production risks** \u2014 did you interact with or learn about production vs staging systems? \u2192 Update \`docs/context/production-map.md\`
1290
- - **Wrong assumptions** \u2014 did the agent (or you) assume something that turned out to be false? \u2192 Update \`docs/context/dangerous-assumptions.md\`
1291
- - **Key decisions** \u2014 did you make an architectural or tooling choice? \u2192 Add a row to \`docs/context/decision-log.md\`
1292
- - **Unwritten rules** \u2014 did you discover a convention or constraint not documented anywhere? \u2192 Update \`docs/context/institutional-knowledge.md\`
1293
-
1294
- Skip this if nothing applies. Don't force it \u2014 only update when there's genuine new context.
1295
-
1296
- ## 2. Run Validation
1297
-
1298
- Run the project's validation commands. Check CLAUDE.md for project-specific commands. Common checks:
1299
-
1300
- - Type-check (e.g., \`tsc --noEmit\`, \`mypy\`, \`cargo check\`)
1301
- - Tests (e.g., \`npm test\`, \`pytest\`, \`cargo test\`)
1302
- - Lint (e.g., \`eslint\`, \`ruff\`, \`cargo clippy\`)
1303
-
1304
- Fix any failures before proceeding.
1305
-
1306
- ## 3. Update Spec Status
1307
-
1308
- If working from an atomic spec in \`docs/specs/\`:
1309
- - All acceptance criteria met \u2014 update status to \`Complete\`
1310
- - Partially done \u2014 update status to \`In Progress\`, note what's left
1311
-
1312
- If working from a Feature Brief in \`docs/briefs/\`, check off completed specs in the decomposition table.
1313
-
1314
- ## 4. Commit
1315
-
1316
- Commit all changes including the discovery file (if created) and spec status updates. The commit message should reference the spec if applicable.
1317
-
1318
- ## 5. Push and PR (if autonomous git is enabled)
1319
-
1320
- **Check CLAUDE.md for "Git Autonomy" in the Behavioral Boundaries section.** If it says "STRICTLY ENFORCED" or the ALWAYS section includes "Push to feature branches immediately after every commit":
1321
-
1322
- 1. **Push immediately.** Run \`git push origin <branch>\` \u2014 do not ask, do not hesitate.
1323
- 2. **Open a PR if the feature is complete.** Check the parent Feature Brief's decomposition table \u2014 if all specs are done, run \`gh pr create\` with a summary of all completed specs. Do not ask first.
1324
- 3. **If not all specs are done,** still push. The PR comes when the last spec is complete.
1325
-
1326
- If CLAUDE.md does NOT have autonomous git rules (or has "ASK FIRST" for pushing), ask the user before pushing.
1327
-
1328
- ## 6. Report
1329
-
1330
- \`\`\`
1331
- Session complete.
1332
- - Spec: [spec name] \u2014 [Complete / In Progress]
1333
- - Build: [passing / failing]
1334
- - Discoveries: [N items / none]
1335
- - Pushed: [yes / no \u2014 and why not]
1336
- - PR: [opened #N / not yet \u2014 N specs remaining]
1337
- - Next: [what the next session should tackle]
1338
- \`\`\`
1339
- `,
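The push decision in step 5 of the session-end skill above reduces to two text checks on CLAUDE.md. The sketch below simplifies deliberately: it searches the whole file instead of locating the Behavioral Boundaries or ALWAYS sections, and `mayPushWithoutAsking` is a hypothetical name. The quoted phrases are the ones given in the skill text.

```typescript
// Hypothetical helper: true when CLAUDE.md contains the autonomous git rules
// described in the skill, false when the agent should ask before pushing.
// Simplification: plain substring checks over the whole file, not per section.
function mayPushWithoutAsking(claudeMd: string): boolean {
  const has = (needle: string): boolean => claudeMd.indexOf(needle) !== -1;
  const strictlyEnforced = has("Git Autonomy") && has("STRICTLY ENFORCED");
  const alwaysPush = has("Push to feature branches immediately after every commit");
  return strictlyEnforced || alwaysPush;
}
```

A section-aware version would be stricter, since this one would also match the phrases if they appeared in an unrelated part of the file.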
1340
- "joycraft-tune.md": `---
1341
- name: joycraft-tune
1342
- description: Assess and upgrade your project's AI development harness \u2014 score 7 dimensions, apply fixes, show path to Level 5
1343
- instructions: 15
1344
- ---
1345
-
1346
- # Tune \u2014 Project Harness Assessment & Upgrade
1347
-
1348
- You are evaluating and upgrading this project's AI development harness.
1349
-
1350
- ## Step 1: Detect Harness State
1351
-
1352
- Check for: CLAUDE.md (with meaningful content), \`docs/specs/\`, \`docs/briefs/\`, \`docs/discoveries/\`, \`.claude/skills/\`, and test configuration.
1353
-
1354
- ## Step 2: Route
1355
-
1356
- - **No harness** (no CLAUDE.md or just a README): Recommend \`npx joycraft init\` and stop.
1357
- - **Harness exists**: Continue to assessment.
1358
-
1359
- ## Step 3: Assess \u2014 Score 7 Dimensions (1-5 scale)
1360
-
1361
- Read CLAUDE.md and explore the project. Score each with specific evidence:
1362
-
1363
- | Dimension | What to Check |
1364
- |-----------|--------------|
1365
- | Spec Quality | \`docs/specs/\` \u2014 structured? acceptance criteria? self-contained? |
1366
- | Spec Granularity | Can each spec be done in one session? |
1367
- | Behavioral Boundaries | ALWAYS/ASK FIRST/NEVER sections (or equivalent rules under any heading) |
1368
- | Skills & Hooks | \`.claude/skills/\` files, hooks config |
1369
- | Documentation | \`docs/\` structure, templates, referenced from CLAUDE.md |
1370
- | Knowledge Capture | \`docs/discoveries/\`, \`docs/context/*.md\` \u2014 existence AND real content |
1371
- | Testing & Validation | Test framework, CI pipeline, validation commands in CLAUDE.md |
1372
-
1373
- Score 1 = absent, 3 = partially there, 5 = comprehensive. Give credit for substance over format.
1374
-
1375
- ## Step 4: Write Assessment
1376
-
1377
- Write to \`docs/joycraft-assessment.md\` AND display it. Include: scores table, detailed findings (evidence + gap + recommendation per dimension), and an upgrade plan (up to 5 actions ordered by impact).
1378
-
1379
- ## Step 5: Apply Upgrades
1380
-
1381
- Apply using three tiers \u2014 do NOT ask per-item permission:
1382
-
1383
- **Tier 1 (silent):** Create missing dirs, install missing skills, copy missing templates, create AGENTS.md.
1384
-
1385
- **Before Tier 2, ask TWO things:**
1386
-
1387
- 1. **Git autonomy:** Cautious (ask before push/PR) or Autonomous (push + PR without asking)?
1388
- 2. **Risk interview (3-5 questions, one at a time):** What could break? What services connect to prod? Unwritten rules? Off-limits files/commands? Skip if \`docs/context/\` already has content.
1389
-
1390
- From answers, generate: CLAUDE.md boundary rules, \`.claude/settings.json\` deny patterns, \`docs/context/\` documents. Also recommend a permission mode (\`auto\` for most; \`dontAsk\` + allowlist for high-risk).
1391
-
1392
- **Tier 2 (show diff):** Add missing CLAUDE.md sections (Boundaries, Workflow, Key Files). Draft from real codebase content. Append only \u2014 never reformat existing content.
1393
-
1394
- **Tier 3 (confirm first):** Rewriting existing sections, overwriting customized files, suggesting test framework installs.
1395
-
1396
- After applying, append to \`docs/joycraft-history.md\` and show a consolidated upgrade results table.
1397
-
1398
- ## Step 6: Show Path to Level 5
1399
-
1400
- Show a tailored roadmap: Level 2-5 table, specific next steps based on actual gaps, and the Level 5 north star (spec queue, autofix, holdout scenarios, self-improving harness).
1401
-
1402
- ## Edge Cases
1403
-
1404
- - **CLAUDE.md is just a README:** Treat as no harness.
1405
- - **Non-Joycraft skills:** Acknowledge, don't replace.
1406
- - **Rules under non-standard headings:** Give credit for substance.
1407
- - **Previous assessment exists:** Read it first. If nothing to upgrade, say so.
1408
- - **Non-Joycraft content in CLAUDE.md:** Preserve as-is. Only append.
1409
- `,
1410
- "joycraft-verify.md": `---
1411
- name: joycraft-verify
1412
- description: Spawn an independent verifier subagent to check an implementation against its spec -- read-only, no code edits, structured pass/fail verdict
1413
- instructions: 30
1414
- ---
1415
-
1416
- # Verify Implementation Against Spec
1417
-
1418
- The user wants independent verification of an implementation. Your job is to find the relevant spec, extract its acceptance criteria and test plan, then spawn a separate verifier subagent that checks each criterion and produces a structured verdict.
1419
-
1420
- **Why a separate subagent?** Anthropic's research found that agents reliably skew positive when grading their own work. Separating the agent doing the work from the agent judging it consistently outperforms self-evaluation. The verifier gets a clean context window with no implementation bias.
1421
-
1422
- ## Step 1: Find the Spec
1423
-
1424
- If the user provided a spec path (e.g., \`/joycraft-verify docs/specs/2026-03-26-add-widget.md\`), use that path directly.
1425
-
1426
- If no path was provided, scan \`docs/specs/\` for spec files. Pick the most recently modified \`.md\` file in that directory. If \`docs/specs/\` doesn't exist or is empty, tell the user:
1427
-
1428
- > No specs found in \`docs/specs/\`. Please provide a spec path: \`/joycraft-verify path/to/spec.md\`
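Annotation: the fallback in Step 1 (pick the most recently modified `.md` file, or report that nothing was found) can be sketched as a small pure function. This is only an illustration under assumptions: `SpecEntry` and `pickLatestSpec` are hypothetical names that do not appear in Joycraft, and the entries are assumed to have been collected already, for example from a directory listing of `docs/specs/`.

```typescript
// Hypothetical shape: one entry per file found under docs/specs/.
interface SpecEntry {
  path: string;
  mtimeMs: number; // last-modified time in milliseconds
}

// Returns the most recently modified .md entry, or undefined when there is
// nothing to pick (directory missing, empty, or containing no .md files).
function pickLatestSpec(entries: SpecEntry[]): SpecEntry | undefined {
  let latest: SpecEntry | undefined = undefined;
  for (const entry of entries) {
    if (!entry.path.endsWith(".md")) continue; // ignore non-spec files
    if (latest === undefined || entry.mtimeMs > latest.mtimeMs) {
      latest = entry;
    }
  }
  return latest;
}
```

An `undefined` result corresponds to the "No specs found" message quoted above.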
1429
-
1430
- ## Step 2: Read and Parse the Spec
1431
-
1432
- Read the spec file and extract:
1433
-
1434
- 1. **Spec name** -- from the H1 title
1435
- 2. **Acceptance Criteria** -- the checklist under the \`## Acceptance Criteria\` section
1436
- 3. **Test Plan** -- the table under the \`## Test Plan\` section, including any test commands
1437
- 4. **Constraints** -- the \`## Constraints\` section if present
1438
-
1439
- If the spec has no Acceptance Criteria section, tell the user:
1440
-
1441
- > This spec doesn't have an Acceptance Criteria section. Verification needs criteria to check against. Add acceptance criteria to the spec and try again.
1442
-
1443
- If the spec has no Test Plan section, note this but proceed -- the verifier can still check criteria by reading code and running any available project tests.
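Annotation: the extraction in Step 2 amounts to slicing a Markdown document by level-2 heading and reading checklist items. The sketch below is one possible way to do that, not how the skill itself works (the skill is a prompt, and the agent does the parsing). `extractSection` and `parseChecklist` are hypothetical helpers, and the sketch assumes that no `## ` line appears inside a fenced code block.

```typescript
// Returns the text under a level-2 heading such as "## Acceptance Criteria",
// stopping at the next level-2 heading. Returns undefined if the heading is absent.
function extractSection(markdown: string, heading: string): string | undefined {
  const lines = markdown.split("\n");
  const start = lines.findIndex((line) => line.trim() === "## " + heading);
  if (start === -1) return undefined;
  const rest = lines.slice(start + 1);
  const end = rest.findIndex((line) => line.startsWith("## "));
  const body = end === -1 ? rest : rest.slice(0, end);
  return body.join("\n").trim();
}

// Collects checklist items of the form "- [ ] text" or "- [x] text".
function parseChecklist(section: string): string[] {
  const items: string[] = [];
  for (const line of section.split("\n")) {
    const match = /^- \[[ xX]\] (.+)$/.exec(line.trim());
    if (match !== null && match[1] !== undefined) items.push(match[1]);
  }
  return items;
}
```

A missing Acceptance Criteria section shows up as `undefined`, which maps to the "try again" message above. A missing Test Plan is a weaker signal and does not stop verification.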
1444
-
1445
- ## Step 3: Identify Test Commands
1446
-
1447
- Look for test commands in these locations (in priority order):
1448
-
1449
- 1. The spec's Test Plan section (look for commands in backticks or "Type" column entries like "unit", "integration", "e2e", "build")
1450
- 2. The project's CLAUDE.md (look for test/build commands in the Development Workflow section)
1451
- 3. Common defaults based on the project type:
1452
- - Node.js: \`npm test\` or \`pnpm test --run\`
1453
- - Python: \`pytest\`
1454
- - Rust: \`cargo test\`
1455
- - Go: \`go test ./...\`
1456
-
1457
- Build a list of specific commands the verifier should run.
1458
-
1459
- ## Step 4: Spawn the Verifier Subagent
1460
-
1461
- Use Claude Code's Agent tool to spawn a subagent with the following prompt. Replace the placeholders with the actual content extracted in Steps 2-3.
1462
-
1463
- \`\`\`
1464
- You are a QA verifier. Your job is to independently verify an implementation against its spec. You have NO context about how the implementation was done -- you are checking it fresh.
1465
-
1466
- RULES -- these are hard constraints, not suggestions:
1467
- - You may READ any file using the Read tool or cat
1468
- - You may RUN these specific test/build commands: [TEST_COMMANDS]
1469
- - You may NOT edit, create, or delete any files
1470
- - You may NOT run commands that modify state (no git commit, no npm install, no file writes)
1471
- - You may NOT install packages or access the network
1472
- - Report what you OBSERVE, not what you expect or hope
1473
-
1474
- SPEC NAME: [SPEC_NAME]
1475
-
1476
- ACCEPTANCE CRITERIA:
1477
- [ACCEPTANCE_CRITERIA]
1478
-
1479
- TEST PLAN:
1480
- [TEST_PLAN]
1481
-
1482
- CONSTRAINTS:
1483
- [CONSTRAINTS_OR_NONE]
1484
-
1485
- YOUR TASK:
1486
- For each acceptance criterion, determine if it PASSES or FAILS based on evidence:
1487
-
1488
- 1. Run the test commands listed above. Record the output.
1489
- 2. For each acceptance criterion:
1490
- a. Check if there is a corresponding test and whether it passes
1491
- b. If no test exists, read the relevant source files to verify the criterion is met
1492
- c. If the criterion cannot be verified by reading code or running tests, mark it MANUAL CHECK NEEDED
1493
- 3. For criteria about build/test passing, actually run the commands and report results.
1494
-
1495
- OUTPUT FORMAT -- you MUST use this exact format:
1496
-
1497
- VERIFICATION REPORT
1498
-
1499
- | # | Criterion | Verdict | Evidence |
1500
- |---|-----------|---------|----------|
1501
- | 1 | [criterion text] | PASS/FAIL/MANUAL CHECK NEEDED | [what you observed] |
1502
- | 2 | [criterion text] | PASS/FAIL/MANUAL CHECK NEEDED | [what you observed] |
1503
- [continue for all criteria]
1504
-
1505
- SUMMARY: X/Y criteria passed. [Z failures need attention. / All criteria verified.]
1506
-
1507
- If any test commands fail to run (missing dependencies, wrong command, etc.), report the error as evidence for a FAIL verdict on the relevant criterion.
1508
- \`\`\`
1509
-
1510
- ## Step 5: Format and Present the Verdict
1511
-
1512
- Take the subagent's response and present it to the user in this format:
1513
-
1514
- \`\`\`
1515
- ## Verification Report -- [Spec Name]
1516
-
1517
- | # | Criterion | Verdict | Evidence |
1518
- |---|-----------|---------|----------|
1519
- | 1 | ... | PASS | ... |
1520
- | 2 | ... | FAIL | ... |
1521
-
1522
- **Overall: X/Y criteria passed.**
1523
-
1524
- [If all passed:]
1525
- All criteria verified. Ready to commit and open a PR.
1526
-
1527
- [If any failed:]
1528
- N failures need attention. Review the evidence above and fix before proceeding.
1529
-
1530
- [If any MANUAL CHECK NEEDED:]
1531
- N criteria need manual verification -- they can't be checked by reading code or running tests alone.
1532
- \`\`\`
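Annotation: the summary lines in the report format above follow mechanically from the per-criterion verdicts. A minimal sketch, assuming the subagent's table has already been parsed into objects. `CriterionResult` and `summarize` are hypothetical names, and treating "all passed" as "no FAIL and no MANUAL CHECK NEEDED" is an assumption about the template's intent.

```typescript
type Verdict = "PASS" | "FAIL" | "MANUAL CHECK NEEDED";

interface CriterionResult {
  criterion: string;
  verdict: Verdict;
  evidence: string;
}

// Builds the closing lines of the verification report from the verdict rows.
function summarize(results: CriterionResult[]): string {
  const count = (v: Verdict): number => results.filter((r) => r.verdict === v).length;
  const passed = count("PASS");
  const failed = count("FAIL");
  const manual = count("MANUAL CHECK NEEDED");

  const lines: string[] = ["**Overall: " + passed + "/" + results.length + " criteria passed.**"];
  if (failed === 0 && manual === 0) {
    lines.push("All criteria verified. Ready to commit and open a PR.");
  }
  if (failed > 0) {
    lines.push(failed + " failures need attention. Review the evidence above and fix before proceeding.");
  }
  if (manual > 0) {
    lines.push(manual + " criteria need manual verification.");
  }
  return lines.join("\n");
}
```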
1533
-
1534
- ## Step 6: Suggest Next Steps
1535
-
1536
- Based on the verdict:
1537
-
1538
- - **All PASS:** Suggest committing and opening a PR, or running \`/joycraft-session-end\` to capture discoveries.
1539
- - **Some FAIL:** List the failed criteria and suggest the user fix them, then run \`/joycraft-verify\` again.
1540
- - **MANUAL CHECK NEEDED items:** Explain what needs human eyes and why automation couldn't verify it.
1541
-
1542
- **Do NOT offer to fix failures yourself.** The verifier reports; the human (or implementation agent in a separate turn) decides what to do. This separation is the whole point.
1543
-
1544
- ## Edge Cases
1545
-
1546
- | Scenario | Behavior |
1547
- |----------|----------|
1548
- | Spec has no Test Plan | Warn that verification is weaker without a test plan, but proceed by checking criteria through code reading and any available project-level tests |
1549
- | All tests pass but a criterion is not testable | Mark as MANUAL CHECK NEEDED with explanation |
1550
- | Subagent can't run tests (missing deps) | Report the error as FAIL evidence |
1551
- | No specs found and no path given | Tell user to provide a spec path or create a spec first |
1552
- | Spec status is "Complete" | Still run verification -- "Complete" means the implementer thinks it's done, verification confirms |
1553
- `
1554
- };
1555
- var TEMPLATES = {
1556
- "context/dangerous-assumptions.md": `# Dangerous Assumptions
1557
-
1558
- > Things the AI agent might assume that are wrong in this project.
1559
- > Generated by Joycraft risk interview. Update when you discover new gotchas.
1560
-
1561
- ## Assumptions
1562
-
1563
- | Agent Might Assume | But Actually | Impact If Wrong |
1564
- |-------------------|-------------|----------------|
1565
- | _Example: All databases are dev/test_ | _The default connection is production_ | _Data loss_ |
1566
- | _Example: Deleting and recreating is safe_ | _Some resources have manual config not in code_ | _Hours of manual recovery_ |
1567
-
1568
- ## Historical Incidents
1569
-
1570
- | Date | What Happened | Lesson | Rule Added |
1571
- |------|-------------|--------|------------|
1572
- | _Example: 2026-03-15_ | _Agent deleted staging infra thinking it was temp_ | _Always verify environment before destructive ops_ | _NEVER: Delete cloud resources without listing them first_ |
1573
- `,
1574
- "context/decision-log.md": `# Decision Log
1575
-
1576
- > Why choices were made, not just what was chosen.
1577
- > Update this when making architectural, tooling, or process decisions.
1578
- > This is the institutional memory that prevents re-litigating settled questions.
1579
-
1580
- ## Decisions
1581
-
1582
- | Date | Decision | Why | Alternatives Rejected | Revisit When |
1583
- |------|----------|-----|----------------------|-------------|
1584
- | _Example: 2026-03-15_ | _Use Supabase over Firebase_ | _Postgres flexibility, row-level security, self-hostable_ | _Firebase (vendor lock-in), PlanetScale (no RLS)_ | _If we need real-time sync beyond Supabase's capabilities_ |
1585
-
1586
- ## Principles
1587
-
1588
- _Capture recurring decision patterns here \u2014 they save time on future choices._
1589
-
1590
- - _Example: "Prefer tools we can self-host over pure SaaS \u2014 reduces vendor risk"_
1591
- - _Example: "Choose boring technology for infrastructure, cutting-edge only for core differentiators"_
1592
- `,
1593
- "context/institutional-knowledge.md": `# Institutional Knowledge
1594
-
1595
- > Unwritten rules, team conventions, and organizational context that AI agents can't derive from code.
1596
- > This is the knowledge that takes a new developer months to absorb.
1597
- > Update when you catch yourself saying "oh, you didn't know about that?"
1598
-
1599
- ## Team Conventions
1600
-
1601
- _Things everyone on the team knows but nobody wrote down._
1602
-
1603
- - _Example: "We never deploy on Fridays"_
1604
- - _Example: "The CEO reviews all UI changes before they ship"_
1605
- - _Example: "PR titles must reference the Jira ticket number"_
1606
-
1607
- ## Organizational Constraints
1608
-
1609
- _Business rules, compliance requirements, or political realities that affect technical decisions._
1610
-
1611
- - _Example: "Legal requires all user data to be stored in EU regions"_
1612
- - _Example: "The payments team owns the billing schema \u2014 never modify without their approval"_
1613
- - _Example: "We have an informal agreement with Vendor X about API rate limits"_
1614
-
1615
- ## Historical Context
1616
-
1617
- _Why things are the way they are \u2014 especially when it looks wrong._
1618
-
1619
- - _Example: "The auth module uses an old pattern because it predates our TypeScript migration \u2014 don't refactor without a spec"_
1620
- - _Example: "The caching layer has a 5-second TTL because we had a consistency bug in 2025 \u2014 increasing it requires careful testing"_
1621
-
1622
- ## People & Ownership
1623
-
1624
- _Who owns what, who to ask, who cares about what._
1625
-
1626
- - _Example: "Alice owns the payment pipeline \u2014 all changes need her review"_
1627
- - _Example: "The data team is sensitive about query performance on the analytics tables"_
1628
- `,
1629
- "context/production-map.md": `# Production Map
1630
-
1631
- > What's real, what's staging, what's safe to touch.
1632
- > Generated by Joycraft risk interview. Update as your infrastructure evolves.
1633
-
1634
- ## Services
1635
-
1636
- | Service | Environment | URL/Endpoint | Impact if Corrupted |
1637
- |---------|-------------|-------------|-------------------|
1638
- | _Example: Main DB_ | _Production_ | _postgres://prod.example.com_ | _1.9M user records lost_ |
1639
- | _Example: Staging DB_ | _Staging_ | _postgres://staging.example.com_ | _Test data only, safe to reset_ |
1640
-
1641
- ## Secrets & Credentials
1642
-
1643
- | Secret | Location | Notes |
1644
- |--------|----------|-------|
1645
- | _Example: DATABASE_URL_ | _.env.local_ | _Production connection \u2014 NEVER commit_ |
1646
-
1647
- ## Safe to Touch
1648
-
1649
- - [ ] Staging environment at [URL]
1650
- - [ ] Test/fixture data in [location]
1651
- - [ ] Development API keys
1652
-
1653
- ## NEVER Touch Without Explicit Approval
1654
-
1655
- - [ ] Production database
1656
- - [ ] Live API endpoints
1657
- - [ ] User-facing infrastructure
1658
- `,
1659
- "context/troubleshooting.md": `# Troubleshooting
1660
-
1661
- > What to do when things go wrong for non-code reasons.
1662
- > Environment issues, flaky dependencies, hardware quirks, and diagnostic steps.
1663
- > Update when you discover new failure modes and their fixes.
1664
-
1665
- ## Common Failures
1666
-
1667
- | When This Happens | Do This | Don't Do This |
1668
- |-------------------|---------|---------------|
1669
- | _Example: Tests fail with ECONNREFUSED_ | _Check if the dev database is running_ | _Don't rewrite the test or mock the connection_ |
1670
- | _Example: Build fails with out-of-memory_ | _Increase Node heap size or close other processes_ | _Don't simplify the code to reduce bundle size_ |
1671
- | _Example: Lint passes locally but fails in CI_ | _Check Node/tool version mismatch between local and CI_ | _Don't disable the lint rule_ |
1672
-
1673
- ## Environment Issues
1674
-
1675
- | Symptom | Likely Cause | Fix |
1676
- |---------|-------------|-----|
1677
- | _Example: "Module not found" after branch switch_ | _Dependencies changed on the new branch_ | _Run the package manager install command_ |
1678
- | _Example: Port already in use_ | _Previous dev server didn't shut down cleanly_ | _Kill the process on that port or use a different one_ |
1679
- | _Example: Permission denied on file/directory_ | _File ownership or permission mismatch_ | _Check and fix file permissions, don't run as root_ |
1680
-
1681
- ## Diagnostic Steps
1682
-
1683
- _When something fails unexpectedly, follow this sequence before trying to fix the code:_
1684
-
1685
- 1. **Check the error message literally** -- don't assume what it means, read it
1686
- 2. **Check environment prerequisites** -- are all services running? Correct versions?
1687
- 3. **Check recent changes** -- did a config file, dependency, or environment variable change?
1688
- 4. **Check network/connectivity** -- is the internet up? Are external services reachable?
1689
- 5. **Search project docs first** -- check this file and \`docs/discoveries/\` before web searching
1690
-
1691
- ## "Stop and Ask" Scenarios
1692
-
1693
- _Situations where the AI agent should stop and ask the human instead of trying to fix things._
1694
-
1695
- - _Example: Hardware device not responding -- the human may need to physically reconnect it_
1696
- - _Example: Authentication token expired -- the human needs to re-authenticate manually_
1697
- - _Example: CI pipeline blocked by a required approval -- a human needs to approve it_
1698
- - _Example: Error messages referencing infrastructure the agent doesn't have access to_
1699
- `,
1700
- "examples/example-brief.md": `# Add User Notifications \u2014 Feature Brief
1701
-
1702
- > **Date:** 2026-03-15
1703
- > **Project:** acme-web
1704
- > **Status:** Specs Ready
1705
-
1706
- ---
1707
-
1708
- ## Vision
1709
-
1710
- Our users have no idea when things happen in their account. A teammate comments on their pull request, a deployment finishes, a billing threshold is hit \u2014 they find out by accident, minutes or hours later. This is the #1 complaint in our last user survey.
1711
-
1712
- We are building a notification system that delivers real-time and batched notifications across in-app, email, and (later) Slack channels. Users will have fine-grained control over what they receive and how. When this ships, no important event goes unnoticed, and no user gets buried in noise they didn't ask for.
1713
-
1714
- The system is designed to be extensible \u2014 new event types plug in without touching the notification infrastructure. We start with three event types (PR comments, deploy status, billing alerts) and prove the pattern works before expanding.
1715
-
1716
- ## User Stories
1717
-
1718
- - As a developer, I want to see a notification badge in the app when someone comments on my PR so that I can respond quickly
1719
- - As a team lead, I want to receive an email when a production deployment fails so that I can coordinate the response
1720
- - As a billing admin, I want to get alerted when usage exceeds 80% of our plan limit so that I can upgrade before service is disrupted
1721
- - As any user, I want to control which notifications I receive and through which channels so that I am not overwhelmed
1722
-
1723
- ## Hard Constraints
1724
-
1725
- - MUST: All notifications go through a single event bus \u2014 no direct coupling between event producers and delivery channels
1726
- - MUST: Email delivery uses the existing SendGrid integration (do not add a new email provider)
1727
- - MUST: Respect user preferences before delivering \u2014 never send a notification the user has opted out of
1728
- - MUST NOT: Store notification content in plaintext in the database \u2014 use the existing encryption-at-rest pattern
1729
- - MUST NOT: Send more than 50 emails per user per day (batch if necessary)
1730
-
1731
- ## Out of Scope
1732
-
1733
- - NOT: Slack/Discord integration (Phase 2)
1734
- - NOT: Push notifications / mobile (Phase 2)
1735
- - NOT: Notification templates with rich HTML \u2014 plain text and simple markdown only for now
1736
- - NOT: Admin dashboard for monitoring notification delivery rates
1737
- - NOT: Retroactive notifications for events that happened before the feature ships
1738
-
1739
- ## Decomposition
1740
-
1741
- | # | Spec Name | Description | Dependencies | Est. Size |
1742
- |---|-----------|-------------|--------------|-----------|
1743
- | 1 | add-notification-preferences-api | Create REST endpoints for users to read and update their notification preferences | None | M |
1744
- | 2 | add-event-bus-infrastructure | Set up the internal event bus that decouples event producers from notification delivery | None | M |
1745
- | 3 | add-notification-delivery-service | Build the service that consumes events, checks preferences, and dispatches to channels (in-app, email) | Spec 1, Spec 2 | L |
1746
- | 4 | add-in-app-notification-ui | Add notification bell, dropdown, and badge count to the app header | Spec 3 | M |
1747
- | 5 | add-email-batching | Implement daily digest batching for email notifications that exceed the per-user threshold | Spec 3 | S |
1748
-
1749
- ## Execution Strategy
1750
-
1751
- - [x] Agent teams (parallel teammates within phases, sequential between phases)
1752
-
1753
- \`\`\`
1754
- Phase 1: Teammate A -> Spec 1 (preferences API), Teammate B -> Spec 2 (event bus)
1755
- Phase 2: Teammate A -> Spec 3 (delivery service) \u2014 depends on Phase 1
1756
- Phase 3: Teammate A -> Spec 4 (UI), Teammate B -> Spec 5 (batching) \u2014 both depend on Spec 3
1757
- \`\`\`
1758
-
1759
- ## Success Criteria
1760
-
1761
- - [ ] User updates notification preferences via API, and subsequent events respect those preferences
1762
- - [ ] A PR comment event triggers an in-app notification visible in the UI within 2 seconds
1763
- - [ ] A deploy failure event sends an email to subscribed users via SendGrid
1764
- - [ ] When email threshold (50/day) is exceeded, remaining notifications are batched into a daily digest
1765
- - [ ] No regressions in existing PR, deployment, or billing features
1766
-
1767
- ## External Scenarios
1768
-
1769
- | Scenario | What It Tests | Pass Criteria |
1770
- |----------|--------------|---------------|
1771
- | opt-out-respected | User disables email for deploy events, deploy fails | No email sent, in-app notification still appears |
1772
- | batch-threshold | Send 51 email-eligible events for one user in a day | 50 individual emails + 1 digest containing the overflow |
1773
- | preference-persistence | User sets preferences, logs out, logs back in | Preferences are unchanged |
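Annotation: the `batch-threshold` scenario above implies a simple split between individually delivered emails and digest overflow. A minimal sketch, assuming events arrive in order and that the day's sent count is tracked elsewhere. `planEmails` and `DAILY_EMAIL_LIMIT` are hypothetical names, and whether the digest itself counts toward the 50-per-day limit is not stated in the brief, so it is left out here.

```typescript
// Limit taken from the brief's hard constraint (50 emails per user per day).
const DAILY_EMAIL_LIMIT = 50;

// Splits today's email-eligible events into those sent individually and those
// deferred to the daily digest, given how many emails were already sent today.
function planEmails<T>(events: T[], alreadySentToday: number): { individual: T[]; digest: T[] } {
  const remaining = Math.max(0, DAILY_EMAIL_LIMIT - alreadySentToday);
  return {
    individual: events.slice(0, remaining),
    digest: events.slice(remaining),
  };
}
```

With 51 events and nothing sent yet, this yields 50 individual emails and one event in the digest, which is the outcome the scenario table expects.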
1774
- `,
1775
- "examples/example-spec.md": `# Add Notification Preferences API \u2014 Atomic Spec
1776
-
1777
- > **Parent Brief:** \`docs/briefs/2026-03-15-add-user-notifications.md\`
1778
- > **Status:** Ready
1779
- > **Date:** 2026-03-15
1780
- > **Estimated scope:** 1 session / 4 files / ~250 lines
1781
-
1782
- ---
1783
-
1784
- ## What
1785
-
1786
- Add REST API endpoints that let users read and update their notification preferences. Each user gets a preferences record with per-event-type, per-channel toggles (e.g., "PR comments: in-app=on, email=off"). Preferences default to all-on for new users and are stored encrypted alongside the user profile.
1787
-
1788
- ## Why
1789
-
1790
- The notification delivery service (Spec 3) needs to check preferences before dispatching. Without this API, there is no way for users to control what they receive, and we cannot build the delivery pipeline.
1791
-
1792
- ## Acceptance Criteria
1793
-
1794
- - [ ] \`GET /api/v1/notifications/preferences\` returns the current user's preferences as JSON
1795
- - [ ] \`PATCH /api/v1/notifications/preferences\` updates one or more preference fields and returns the updated record
1796
- - [ ] New users get default preferences (all channels enabled for all event types) on first read
1797
- - [ ] Preferences are validated \u2014 unknown event types or channels return 400
1798
- - [ ] Preferences are stored using the existing encryption-at-rest pattern (\`EncryptedJsonColumn\`)
1799
- - [ ] Endpoint requires authentication (returns 401 for unauthenticated requests)
1800
- - [ ] Build passes
1801
- - [ ] Tests pass (unit + integration)
1802
-
1803
- ## Test Plan
1804
-
1805
- | Acceptance Criterion | Test | Type |
1806
- |---------------------|------|------|
1807
- | GET returns preferences as JSON | Call GET with authenticated user, assert 200 + JSON shape matches preferences schema | integration |
1808
- | PATCH updates preferences | Call PATCH with valid partial update, assert 200 + returned record reflects changes | integration |
1809
- | New users get defaults | Resolve preferences for a user with no existing record (no HTTP or database), assert default preferences (all channels enabled) | unit |
1810
- | Unknown event types return 400 | Call PATCH with \`{"foo": {"email": true}}\`, assert 400 + validation error | unit |
1811
- | Stored with EncryptedJsonColumn | Verify model uses EncryptedJsonColumn for preferences field | unit |
1812
- | Auth required | Call GET/PATCH without auth token, assert 401 | integration |
1813
- | Build passes | Verified by build step \u2014 no separate test needed | build |
1814
- | Tests pass | Verified by test runner \u2014 no separate test needed | meta |
1815
-
1816
- **Execution order:**
1817
- 1. Write all tests above \u2014 they should fail against current/stubbed code
1818
- 2. Run tests to confirm they fail (red)
1819
- 3. Implement until all tests pass (green)
1820
-
1821
- **Smoke test:** The "New users get defaults" unit test \u2014 no database or HTTP needed, fastest feedback loop.
1822
-
1823
- **Before implementing, verify your test harness:**
1824
- 1. Run all tests \u2014 they must FAIL (if they pass, you're testing the wrong thing)
1825
- 2. Each test calls your actual function/endpoint \u2014 not a reimplementation or the underlying library
1826
- 3. Identify your smoke test \u2014 it must run in seconds, not minutes, so you get fast feedback on each change
1827
-
1828
- ## Constraints
1829
-
1830
- - MUST: Use the existing \`EncryptedJsonColumn\` utility for storage \u2014 do not roll a new encryption pattern
1831
- - MUST: Follow the existing REST controller pattern in \`src/controllers/\`
1832
- - MUST NOT: Expose other users' preferences (scope queries to authenticated user only)
1833
- - SHOULD: Return the full preferences object on PATCH (not just the changed fields), so the frontend can replace state without merging
1834
-
1835
- ## Affected Files
1836
-
1837
- | Action | File | What Changes |
1838
- |--------|------|-------------|
1839
- | Create | \`src/controllers/notification-preferences.controller.ts\` | New controller with GET and PATCH handlers |
1840
- | Create | \`src/models/notification-preferences.model.ts\` | Sequelize model with EncryptedJsonColumn for preferences blob |
1841
- | Create | \`src/migrations/20260315-add-notification-preferences.ts\` | Database migration to create notification_preferences table |
1842
- | Create | \`tests/controllers/notification-preferences.test.ts\` | Unit and integration tests for both endpoints |
1843
- | Modify | \`src/routes/index.ts\` | Register the new controller routes |
1844
-
1845
- ## Approach
1846
-
1847
- Create a \`NotificationPreferences\` model backed by a single \`notification_preferences\` table with columns: \`id\`, \`user_id\` (unique FK), \`preferences\` (EncryptedJsonColumn), \`created_at\`, \`updated_at\`. The \`preferences\` column stores a JSON blob shaped like \`{ "pr_comment": { "in_app": true, "email": true }, "deploy_status": { ... } }\`.
1848
-
1849
- The GET endpoint does a find-or-create: if no record exists for the user, create one with defaults and return it. The PATCH endpoint deep-merges the request body into the existing preferences, validates the result against a known schema of event types and channels, and saves.
1850
-
1851
- **Rejected alternative:** Storing preferences as individual rows (one per event-type-channel pair). This would make queries more complex and would require N rows per user instead of 1. The JSON blob approach is simpler and matches how the frontend will consume the data.
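Annotation: the merge-then-validate flow in the Approach section can be sketched without any framework. This is an illustration under assumptions, not the example project's code. `mergeAndValidate`, `defaultPreferences`, and `PatchResult` are hypothetical names. `pr_comment`, `deploy_status`, `in_app`, and `email` come from the spec, while `billing_alert` is an assumed key for the third event type. The sketch also skips checking that each value is a boolean.

```typescript
// Known schema of event types and channels ("billing_alert" is hypothetical).
const EVENT_TYPES: readonly string[] = ["pr_comment", "deploy_status", "billing_alert"];
const CHANNELS: readonly string[] = ["in_app", "email"];

type Preferences = Record<string, Record<string, boolean>>;
type PatchResult =
  | { status: 200; body: Preferences }
  | { status: 400; errors: string[] };

// Defaults: every channel enabled for every event type.
function defaultPreferences(): Preferences {
  const prefs: Preferences = {};
  for (const event of EVENT_TYPES) {
    const channels: Record<string, boolean> = {};
    for (const channel of CHANNELS) channels[channel] = true;
    prefs[event] = channels;
  }
  return prefs;
}

// Deep-merges the patch into the current preferences, then validates the result.
function mergeAndValidate(current: Preferences, patch: Preferences): PatchResult {
  const merged: Preferences = {};
  for (const event of Object.keys(current)) merged[event] = { ...current[event] };
  for (const event of Object.keys(patch)) merged[event] = { ...merged[event], ...patch[event] };

  const errors: string[] = [];
  for (const event of Object.keys(merged)) {
    if (EVENT_TYPES.indexOf(event) === -1) {
      errors.push("unknown event type: " + event);
      continue;
    }
    for (const channel of Object.keys(merged[event] ?? {})) {
      if (CHANNELS.indexOf(channel) === -1) errors.push("unknown channel: " + channel);
    }
  }
  if (errors.length > 0) return { status: 400, errors };
  return { status: 200, body: merged };
}
```

An empty patch returns the current preferences unchanged with status 200, and an unknown key such as `foo` produces a 400, matching the edge cases the spec lists.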
1852
-
1853
- ## Edge Cases
1854
-
1855
- | Scenario | Expected Behavior |
1856
- |----------|------------------|
1857
- | PATCH with empty body \`{}\` | Return 200 with unchanged preferences (no-op) |
1858
- | PATCH with unknown event type \`{"foo": {"email": true}}\` | Return 400 with validation error listing valid event types |
1859
- | GET for user with no existing record | Create default preferences, return 200 |
1860
- | Concurrent PATCH requests | Last-write-wins (optimistic, no locking) \u2014 acceptable for user preferences |
1861
- `,
1862
- "scenarios/README.md": `# $SCENARIOS_REPO
1863
-
1864
- Holdout scenario tests for the main project. These tests run in CI against the
1865
- built artifact of each PR \u2014 but they live here, in a separate repository, so
1866
- the coding agent working on the main project cannot see them.
1867
-
1868
- ---
1869
-
1870
- ## What is the holdout pattern?
1871
-
1872
- Think of it like a validation set in machine learning. When you train a model,
1873
- you keep a slice of your data hidden from the training process. If the model
1874
- scores well on data it has never seen, you can trust that it has actually
1875
- learned something \u2014 not just memorized the training examples.
1876
-
1877
- Scenario tests work the same way. The coding agent writes code and passes
1878
- internal tests in the main repo. These scenario tests then check whether the
1879
- result behaves correctly from a real user's perspective, using only the public
1880
- interface of the built artifact.
1881
-
1882
- Because the agent cannot read this repository, it cannot tailor its code to
1883
- the tests. A passing scenario run is strong evidence that the feature works.
1884
-
1885
- ---
1886
-
1887
- ## Why a separate repository?
1888
-
1889
- A single repository would expose the tests to the agent. Claude Code reads
1890
- files in the working directory; if scenario tests lived in the main repo, the
1891
- agent could (and would) read them when fixing failures, which defeats the
1892
- purpose.
1893
-
1894
- A separate repo also means:
1895
-
1896
- - The test suite can be updated by humans without triggering the autofix loop
1897
- - Scenarios can reference multiple projects over time
1898
- - Access controls are independent \u2014 the scenarios repo can be more restricted
1899
-
1900
- ---
1901
-
1902
- ## How the CI pipeline works
1903
-
1904
- \`\`\`
1905
- Main repo PR opened
1906
- |
1907
- v
1908
- Main repo CI runs (unit + integration tests)
1909
- |
1910
- | passes
1911
- v
1912
- scenarios-dispatch.yml fires a repository_dispatch event
1913
- |
1914
- v
1915
- This repo: run.yml receives the event
1916
- |
1917
- +-- clones main-repo PR branch to ../main-repo
1918
- |
1919
- +-- builds the artifact (npm ci && npm run build)
1920
- |
1921
- +-- runs: NO_COLOR=1 npx vitest run
1922
- |
1923
- +-- captures exit code + output
1924
- |
1925
- v
1926
- Posts PASS / FAIL comment on the originating PR
1927
- \`\`\`
1928
-
1929
- The PR author sees the scenario result as a comment. No separate status check
1930
- is required, though you can add one via the GitHub Checks API if you prefer.
1931
-
1932
- ---
1933
-
1934
- ## Adding scenarios
1935
-
1936
- ### Rules
1937
-
1938
- 1. **Behavioral, not structural.** Test what the tool does, not how it is
1939
- built internally. Invoke the binary; assert on stdout, exit codes, and
1940
- filesystem state. Never import from \`../main-repo/src\`.
1941
-
1942
- 2. **End-to-end.** Each test should represent something a real user would
1943
- actually do. If you would not put it in a demo or docs example, reconsider
1944
- whether it belongs here.
1945
-
1946
- 3. **No source imports.** The entire point of the holdout is that tests cannot
1947
- see source code. Any \`import\` that reaches into \`../main-repo/src\` breaks
1948
- the pattern.
1949
-
1950
- 4. **Independent.** Each test must be able to run in isolation. Use \`beforeEach\`
1951
- / \`afterEach\` to set up and tear down temp directories. Do not share mutable
1952
- state between tests.
1953
-
1954
- 5. **Deterministic.** Avoid network calls, timestamps, or random values in
1955
- assertions unless the feature under test genuinely involves them.
1956
-
1957
- ### File layout
1958
-
1959
- \`\`\`
1960
- $SCENARIOS_REPO/
1961
- \u251C\u2500\u2500 example-scenario.test.ts # Starter file \u2014 replace with real scenarios
1962
- \u251C\u2500\u2500 workflows/
1963
- \u2502 \u2514\u2500\u2500 run.yml # CI workflow (do not rename)
1964
- \u251C\u2500\u2500 package.json
1965
- \u2514\u2500\u2500 README.md
1966
- \`\`\`
1967
-
1968
- Add new \`.test.ts\` files at the top level or in subdirectories. Vitest will
1969
- discover them automatically.
1970
-
1971
- ### Example structure
1972
-
1973
- \`\`\`ts
1974
- import { spawnSync } from "node:child_process";
1975
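- import { existsSync, mkdtempSync } from "node:fs";
- import { tmpdir } from "node:os";
- import { join } from "node:path";
- import { expect, it } from "vitest";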
1976
-
1977
- const CLI = join(__dirname, "..", "main-repo", "dist", "cli.js");
1978
-
1979
- it("init creates a CLAUDE.md file", () => {
1980
- const tmp = mkdtempSync(join(tmpdir(), "scenario-"));
1981
- const { status } = spawnSync("node", [CLI, "init", tmp], { encoding: "utf8" });
1982
- expect(status).toBe(0);
1983
- expect(existsSync(join(tmp, "CLAUDE.md"))).toBe(true);
1984
- });
1985
- \`\`\`
1986
-
1987
- ---
1988
-
1989
- ## Internal tests vs scenario tests
1990
-
1991
- | | Internal tests (main repo) | Scenario tests (this repo) |
1992
- |---|---|---|
1993
- | Location | \`tests/\` in main repo | This repo |
1994
- | Visible to agent | Yes | No |
1995
- | What they test | Units, modules, logic | End-to-end behavior |
1996
- | Import source code | Yes | Never |
1997
- | When they run | On every push | After a PR passes its internal tests (via dispatch) |
1998
- | Purpose | Catch regressions fast | Validate real behavior |
1999
-
2000
- ---
2001
-
2002
- ## Relationship to Joycraft
2003
-
2004
- This repository was bootstrapped by \`npx joycraft init --autofix\`. Joycraft
2005
- manages the \`run.yml\` workflow and keeps it in sync when you run
2006
- \`npx joycraft upgrade\`. The test files are yours \u2014 Joycraft will never
2007
- overwrite them.
2008
-
2009
- If the \`run.yml\` workflow needs updating (e.g., a new version of
2010
- \`actions/create-github-app-token\`), run \`npx joycraft upgrade\` in this repo
2011
- and review the diff before applying.
2012
- `,
2013
- "scenarios/example-scenario.test.ts": `/**
2014
- * Example Scenario Test
2015
- *
2016
- * This file is a template for scenario tests in your holdout repository.
2017
- * Scenarios are behavioral, end-to-end tests that run against the BUILT
2018
- * artifact of your main project \u2014 not its source code.
2019
- *
2020
- * The Holdout Pattern
2021
- * -------------------
2022
- * These tests live in a SEPARATE repository that your coding agent cannot
2023
- * see. This is intentional: if the agent could read these tests, it could
2024
- * write code that passes them without actually solving the problem correctly
2025
- * (the same way a student who sees the exam beforehand can score well without
2026
- * understanding the material).
2027
- *
2028
- * In CI, the main repo is cloned to ../main-repo (relative to this repo's
2029
- * checkout). The run.yml workflow builds the artifact there before running
2030
- * these tests, so \`../main-repo\` is always available and already built.
2031
- *
2032
- * How to Write Scenarios
2033
- * ----------------------
2034
- * DO:
2035
- * - Invoke the built binary / entry point via child_process (execSync, spawnSync)
2036
- * - Test observable behavior: exit codes, stdout/stderr content, file system state
2037
- * - Write scenarios around things a real user would actually do
2038
- * - Keep each test fully independent \u2014 no shared state between tests
2039
- *
2040
- * DON'T:
2041
- * - Import from ../main-repo/src \u2014 that defeats the holdout
2042
- * - Test internal implementation details (function names, module structure)
2043
- * - Rely on network access unless your tool genuinely requires it
2044
- * - Share mutable fixtures across tests
2045
- */
2046
-
2047
- import { execSync, spawnSync } from "node:child_process";
2048
- import { existsSync, mkdtempSync, rmSync } from "node:fs";
2049
- import { tmpdir } from "node:os";
2050
- import { join } from "node:path";
2051
- import { afterEach, beforeEach, describe, expect, it } from "vitest";
2052
-
2053
- // Path to the built CLI entry point in the main repo.
2054
- // The run.yml workflow clones the main repo to ../main-repo and builds it
2055
- // before this test file runs, so this path is always valid in CI.
2056
- const CLI = join(__dirname, "..", "main-repo", "dist", "cli.js");
2057
-
2058
- // ---------------------------------------------------------------------------
2059
- // Helpers
2060
- // ---------------------------------------------------------------------------
2061
-
2062
- /** Run the CLI and return { stdout, stderr, status }. Never throws. */
2063
- function runCLI(args: string[], cwd?: string) {
2064
- const result = spawnSync("node", [CLI, ...args], {
2065
- encoding: "utf8",
2066
- cwd: cwd ?? process.cwd(),
2067
- env: { ...process.env, NO_COLOR: "1" },
2068
- });
2069
- return {
2070
- stdout: result.stdout ?? "",
2071
- stderr: result.stderr ?? "",
2072
- status: result.status ?? 1,
2073
- };
2074
- }
2075
-
2076
- // ---------------------------------------------------------------------------
2077
- // Basic invocation scenarios
2078
- // ---------------------------------------------------------------------------
2079
-
2080
- describe("CLI: basic invocation", () => {
2081
- it("--help prints usage information", () => {
2082
- const { stdout, status } = runCLI(["--help"]);
2083
- expect(status).toBe(0);
2084
- expect(stdout).toContain("Usage:");
2085
- });
2086
-
2087
- it("--version returns a semver string", () => {
2088
- const { stdout, status } = runCLI(["--version"]);
2089
- expect(status).toBe(0);
2090
- // Matches x.y.z, x.y.z-alpha.1, etc.
2091
- expect(stdout.trim()).toMatch(/^\\d+\\.\\d+\\.\\d+/);
2092
- });
2093
-
2094
- it("unknown command exits non-zero", () => {
2095
- const { status } = runCLI(["not-a-real-command"]);
2096
- expect(status).not.toBe(0);
2097
- });
2098
- });
2099
-
2100
- // ---------------------------------------------------------------------------
2101
- // Example: filesystem interaction scenario
2102
- //
2103
- // This pattern is useful when your CLI creates or modifies files.
2104
- // Each test gets a fresh temp directory so they can't interfere.
2105
- // ---------------------------------------------------------------------------
2106
-
2107
- describe("CLI: init command (example \u2014 replace with your real scenarios)", () => {
2108
- let tmpDir: string;
2109
-
2110
- beforeEach(() => {
2111
- tmpDir = mkdtempSync(join(tmpdir(), "scenarios-"));
2112
- });
2113
-
2114
- afterEach(() => {
2115
- rmSync(tmpDir, { recursive: true, force: true });
2116
- });
2117
-
2118
- it("init creates expected output in an empty directory", () => {
2119
- // This is a placeholder. Replace with whatever your CLI actually does.
2120
- // The point is: invoke the binary, observe side effects, assert on them.
2121
- const { status } = runCLI(["init", tmpDir]);
2122
-
2123
- // Example assertions \u2014 adjust to your tool's actual behavior:
2124
- // expect(status).toBe(0);
2125
- // expect(existsSync(join(tmpDir, "CLAUDE.md"))).toBe(true);
2126
-
2127
- // Remove this line once you've written a real assertion above:
2128
- expect(typeof status).toBe("number"); // placeholder
2129
- });
2130
- });
2131
- `,
2132
- "scenarios/package.json": `{
2133
- "name": "$SCENARIOS_REPO",
2134
- "version": "0.0.1",
2135
- "private": true,
2136
- "type": "module",
2137
- "scripts": {
2138
- "test": "vitest run"
2139
- },
2140
- "devDependencies": {
2141
- "vitest": "^3.0.0"
2142
- }
2143
- }
2144
- `,
2145
- "scenarios/prompts/scenario-agent.md": `You are a QA engineer working in a holdout test repository. You CANNOT access the main repository's source code. Your job is to write or update behavioral scenario tests based on specs that are pushed from the main repo.
2146
-
2147
- ## What You Have Access To
2148
-
2149
- - This scenarios repository (test files, \`specs/\` mirror, \`package.json\`)
2150
- - The incoming spec (provided below)
2151
- - A list of existing test files and spec mirrors (provided below)
2152
- - The main repo is available at \`../main-repo\` and is already built \u2014 you can invoke its CLI or entry point via \`execSync\`/\`spawnSync\`, but you MUST NOT import from \`../main-repo/src\`
2153
-
2154
- ## Triage Decision Tree
2155
-
2156
- Read the incoming spec carefully. Decide which of these three actions to take:
2157
-
2158
- ### SKIP \u2014 Do nothing if the spec is:
2159
- - An internal refactor with no user-facing behavior change (e.g., "extract module", "rename internal type")
2160
- - CI or dev tooling changes (e.g., "add lint rule", "update GitHub Actions workflow")
2161
- - Documentation-only changes
2162
- - Performance improvements with identical observable behavior
2163
-
2164
- If you SKIP, write a brief comment in the relevant test file (or a new one) explaining why, then stop.
2165
-
2166
- ### NEW \u2014 Create a new test file if the spec describes:
2167
- - A new command, flag, or subcommand
2168
- - A new output format or file that gets generated
2169
- - A new user-facing behavior that doesn't map to any existing test file
2170
-
2171
- Name the file after the feature area: \`[feature-area].test.ts\`. One feature area per test file.
2172
-
2173
- ### UPDATE \u2014 Modify an existing test file if the spec:
2174
- - Changes behavior that is already tested
2175
- - Adds a flag or option to an existing command
2176
- - Modifies output format for an existing feature
2177
-
2178
- Match to the most relevant existing test file by feature area.
2179
-
2180
- **If you are unsure whether a spec is user-facing, err on the side of writing a test.**
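
The decision tree above can be sketched as a small function. This is an illustrative encoding only: the `SpecSummary` shape and its field names are assumptions made for the example, and the judgment about whether a spec is user-facing still comes from reading the spec.

```typescript
type Triage = "SKIP" | "NEW" | "UPDATE";

// Hypothetical summary of a spec; the field names are illustrative only.
interface SpecSummary {
  // true or false when clear from the spec, undefined when unsure.
  userFacing?: boolean;
  // Existing test file that covers the same feature area, if any.
  matchingTestFile?: string;
}

function triage(spec: SpecSummary): Triage {
  // Skip only when the spec is clearly not user-facing.
  // "Unsure" (undefined) falls through, so a test still gets written.
  if (spec.userFacing === false) return "SKIP";
  // Behavior already covered by a test file: update that file.
  if (spec.matchingTestFile) return "UPDATE";
  // Otherwise the behavior is new: create [feature-area].test.ts.
  return "NEW";
}
```

Note that an unsure spec with no matching test file resolves to `NEW`, which mirrors the rule to err on the side of writing a test.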
2181
-
2182
- ## Test Writing Rules
2183
-
2184
- 1. **Behavioral only.** Test observable output \u2014 stdout, stderr, exit codes, files created/modified on disk. Never test internal implementation details or import source modules.
2185
-
2186
- 2. **Use \`execSync\` or \`spawnSync\`.** Invoke the built binary at \`../main-repo/dist/cli.js\` (or whatever the main repo's entry point is). Check \`../main-repo/package.json\` to find the correct entry point if unsure.
2187
-
2188
- 3. **Use vitest.** Import \`describe\`, \`it\`, \`expect\` from \`vitest\`. Use \`beforeEach\`/\`afterEach\` for temp directory setup/teardown.
2189
-
2190
- 4. **Each test is fully independent.** No shared mutable state between tests. Each test that touches the filesystem gets its own temp directory via \`mkdtempSync\`.
2191
-
2192
- 5. **Assert on realistic user actions.** Write tests that reflect what a real user would do \u2014 not what the implementation happens to do.
2193
-
2194
- 6. **Never import from the parent repo's source.** If you find yourself writing \`import { ... } from '../main-repo/src/...'\`, stop \u2014 that defeats the holdout.
2195
-
2196
- ## Test File Template
2197
-
2198
- \`\`\`typescript
2199
- import { execSync, spawnSync } from 'node:child_process';
2200
- import { existsSync, mkdtempSync, rmSync, readFileSync } from 'node:fs';
2201
- import { tmpdir } from 'node:os';
2202
- import { join } from 'node:path';
2203
- import { describe, it, expect, beforeEach, afterEach } from 'vitest';
2204
-
2205
- const CLI = join(__dirname, '..', 'main-repo', 'dist', 'cli.js');
2206
-
2207
- function runCLI(args: string[], cwd?: string) {
2208
- const result = spawnSync('node', [CLI, ...args], {
2209
- encoding: 'utf8',
2210
- cwd: cwd ?? process.cwd(),
2211
- env: { ...process.env, NO_COLOR: '1' },
2212
- });
2213
- return {
2214
- stdout: result.stdout ?? '',
2215
- stderr: result.stderr ?? '',
2216
- status: result.status ?? 1,
2217
- };
2218
- }
2219
-
2220
- describe('[feature area]: [behavior being tested]', () => {
2221
- let tmpDir: string;
2222
-
2223
- beforeEach(() => {
2224
- tmpDir = mkdtempSync(join(tmpdir(), 'scenarios-'));
2225
- });
2226
-
2227
- afterEach(() => {
2228
- rmSync(tmpDir, { recursive: true, force: true });
2229
- });
2230
-
2231
- it('[specific observable behavior]', () => {
2232
- const { stdout, status } = runCLI(['command', 'args'], tmpDir);
2233
- expect(status).toBe(0);
2234
- expect(stdout).toContain('expected output');
2235
- });
2236
- });
2237
- \`\`\`
2238
-
2239
- ## Checklist Before Committing
2240
-
2241
- - [ ] Decision: SKIP / NEW / UPDATE (and why)
2242
- - [ ] Tests assert on observable behavior, not implementation
2243
- - [ ] No imports from \`../main-repo/src\`
2244
- - [ ] Each test has its own temp directory if it touches the filesystem
2245
- - [ ] File is named after the feature area, not the spec
2246
- `,
2247
- "scenarios/workflows/generate.yml": `# Scenario Generation Workflow
2248
- #
2249
- # Triggered by a \`spec-pushed\` repository_dispatch event sent from the main
2250
- # project when a spec is added or modified on main. A scenario agent triages
2251
- # the spec and writes or updates holdout tests in this repo.
2252
- #
2253
- # After the agent commits changes, fires \`scenarios-updated\` back to the main
2254
- # repo so that any open PRs are re-tested with the new/updated scenarios.
2255
- #
2256
- # Prerequisites:
2257
- # - ANTHROPIC_API_KEY secret: Anthropic API key for Claude Code
2258
- # - JOYCRAFT_APP_PRIVATE_KEY secret: GitHub App private key (.pem)
2259
- # - $JOYCRAFT_APP_ID is replaced with the actual App ID number at install time
2260
-
2261
- name: Generate Scenarios
2262
-
2263
- on:
2264
- repository_dispatch:
2265
- types: [spec-pushed]
2266
-
2267
- jobs:
2268
- generate:
2269
- name: Run scenario agent
2270
- runs-on: ubuntu-latest
2271
-
2272
- steps:
2273
- # \u2500\u2500 1. Check out the scenarios repo \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2274
- - name: Checkout scenarios repo
2275
- uses: actions/checkout@v4
2276
-
2277
- # \u2500\u2500 2. Save incoming spec to local mirror \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2278
- # The agent reads this file to understand what changed.
2279
- - name: Save spec to mirror
2280
- run: |
2281
- mkdir -p specs
2282
- cat > "specs/\${{ github.event.client_payload.spec_filename }}" << 'SPEC_EOF'
2283
- \${{ github.event.client_payload.spec_content }}
2284
- SPEC_EOF
2285
- echo "Saved \${{ github.event.client_payload.spec_filename }} to specs/"
2286
-
2287
- # \u2500\u2500 3. Gather context for the agent \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2288
- # Bounded context: filenames only (not file contents) to stay within
2289
- # token limits. The agent uses these lists to decide whether to create
2290
- # a new test file or update an existing one.
2291
- - name: Gather context
2292
- id: context
2293
- run: |
2294
- EXISTING_TESTS=$(find . -name "*.test.ts" -not -path "./.git/*" \\
2295
- | sed 's|^\\./||' | sort | tr '\\n' ',' | sed 's/,$//')
2296
- EXISTING_SPECS=$(find specs/ -name "*.md" 2>/dev/null \\
2297
- | sed 's|^specs/||' | sort | tr '\\n' ',' | sed 's/,$//')
2298
-
2299
- echo "existing_tests=$EXISTING_TESTS" >> "$GITHUB_OUTPUT"
2300
- echo "existing_specs=$EXISTING_SPECS" >> "$GITHUB_OUTPUT"
2301
- echo "Existing test files: $EXISTING_TESTS"
2302
- echo "Existing spec mirrors: $EXISTING_SPECS"
2303
-
2304
- # \u2500\u2500 4. Set up Node.js \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2305
- - name: Set up Node.js
2306
- uses: actions/setup-node@v4
2307
- with:
2308
- node-version: "20"
2309
-
2310
- # \u2500\u2500 5. Install Claude Code CLI \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2311
- - name: Install Claude Code
2312
- run: npm install -g @anthropic-ai/claude-code
2313
-
2314
- # \u2500\u2500 6. Run scenario agent \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2315
- # - Uses \`claude -p\` (prompt mode) for non-interactive execution.
2316
- # - No --model flag: the environment's default model is used.
2317
- # - --dangerously-skip-permissions lets Claude write files without prompts.
2318
- # - --max-turns 20 caps the agentic loop so it can't run indefinitely.
2319
- - name: Run scenario agent
2320
- id: agent
2321
- env:
2322
- ANTHROPIC_API_KEY: \${{ secrets.ANTHROPIC_API_KEY }}
2323
- run: |
2324
- PROMPT=$(cat .claude/prompts/scenario-agent.md 2>/dev/null || cat prompts/scenario-agent.md)
2325
-
2326
- claude -p \\
2327
- --dangerously-skip-permissions \\
2328
- --max-turns 20 \\
2329
- "\${PROMPT}
2330
-
2331
- ---
2332
-
2333
- ## Incoming Spec
2334
-
2335
- Filename: \${{ github.event.client_payload.spec_filename }}
2336
-
2337
- Content:
2338
- $(cat 'specs/\${{ github.event.client_payload.spec_filename }}')
2339
-
2340
- ---
2341
-
2342
- ## Context
2343
-
2344
- Existing test files in this repo: \${{ steps.context.outputs.existing_tests }}
2345
- Existing spec mirrors: \${{ steps.context.outputs.existing_specs }}"
2346
-
2347
- # \u2500\u2500 7. Commit any changes the agent made \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2348
- - name: Commit scenario changes
2349
- id: commit
2350
- run: |
2351
- git config user.name "Joycraft Scenario Agent"
2352
- git config user.email "joycraft-scenarios@users.noreply.github.com"
2353
-
2354
- git add -A
2355
-
2356
- if git diff --cached --quiet; then
2357
- echo "No changes to commit \u2014 spec triaged as no-op."
2358
- echo "committed=false" >> "$GITHUB_OUTPUT"
2359
- exit 0
2360
- fi
2361
-
2362
- git commit -m "scenarios: update tests for \${{ github.event.client_payload.spec_filename }}"
2363
- git push
2364
- echo "committed=true" >> "$GITHUB_OUTPUT"
2365
-
2366
- # \u2500\u2500 8. Generate GitHub App token for cross-repo dispatch \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2367
- # Only needed if the agent committed changes (otherwise nothing to re-run).
2368
- - name: Generate GitHub App token
2369
- id: app-token
2370
- if: steps.commit.outputs.committed == 'true'
2371
- uses: actions/create-github-app-token@v1
2372
- with:
2373
- app-id: $JOYCRAFT_APP_ID
2374
- private-key: \${{ secrets.JOYCRAFT_APP_PRIVATE_KEY }}
2375
- repositories: \${{ github.event.client_payload.repo }}
2376
-
2377
- # \u2500\u2500 9. Notify main repo that scenarios were updated \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2378
- # Fires \`scenarios-updated\` so the main repo's re-run workflow can
2379
- # trigger scenario runs against any open PRs that may now be affected.
2380
- - name: Dispatch scenarios-updated to main repo
2381
- if: steps.commit.outputs.committed == 'true'
2382
- env:
2383
- GH_TOKEN: \${{ steps.app-token.outputs.token }}
2384
- run: |
2385
- REPO="\${{ github.event.client_payload.repo }}"
2386
- REPO_OWNER="\${REPO%%/*}"
2387
- REPO_NAME="\${REPO##*/}"
2388
-
2389
- gh api "repos/\${REPO_OWNER}/\${REPO_NAME}/dispatches" \\
2390
- -f event_type=scenarios-updated \\
2391
- -f "client_payload[spec_filename]=\${{ github.event.client_payload.spec_filename }}" \\
2392
- -f "client_payload[scenarios_repo]=\${{ github.repository }}"
2393
-
2394
- echo "Dispatched scenarios-updated to \${REPO}"
2395
- `,
2396
- "scenarios/workflows/run.yml": `# Scenarios Run Workflow
2397
- #
2398
- # Triggered by a \`repository_dispatch\` event (type: run-scenarios) sent from
2399
- # the main project's CI pipeline after a PR passes its internal tests.
2400
- #
2401
- # This workflow:
2402
- # 1. Clones the main repo's PR branch to ../main-repo
2403
- # 2. Builds the artifact
2404
- # 3. Runs the scenario tests in this repo
2405
- # 4. Posts a PASS or FAIL comment on the originating PR
2406
- #
2407
- # Prerequisites:
2408
- # - A GitHub App ("Joycraft Autofix" or equivalent) installed on BOTH repos.
2409
- # $JOYCRAFT_APP_ID is replaced with the actual App ID number at install time.
2410
- # JOYCRAFT_APP_PRIVATE_KEY must be stored as a repository secret in this repo.
2411
- # - This scenarios repo must be added to the App's repository access list.
2412
-
2413
- name: Run Scenarios
2414
-
2415
- on:
2416
- repository_dispatch:
2417
- types: [run-scenarios]
2418
-
2419
- jobs:
2420
- run-scenarios:
2421
- name: Run holdout scenario tests
2422
- runs-on: ubuntu-latest
2423
-
2424
- steps:
2425
- # \u2500\u2500 1. Check out the scenarios repo \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2426
- - name: Checkout scenarios repo
2427
- uses: actions/checkout@v4
2428
- with:
2429
- path: scenarios
2430
-
2431
- # \u2500\u2500 2. Mint a GitHub App token \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2432
- # $JOYCRAFT_APP_ID is replaced with the numeric App ID at install time
2433
- # (e.g., app-id: 3180156). It is NOT a secret \u2014 App IDs are public.
2434
- - name: Generate GitHub App token
2435
- id: app-token
2436
- uses: actions/create-github-app-token@v1
2437
- with:
2438
- app-id: $JOYCRAFT_APP_ID
2439
- private-key: \${{ secrets.JOYCRAFT_APP_PRIVATE_KEY }}
2440
- repositories: \${{ github.event.client_payload.repo }}
2441
-
2442
- # \u2500\u2500 3. Clone the main repo's PR branch \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2443
- # Cloned to ./main-repo so scenario tests can reference ../main-repo
2444
- # relative to the scenarios/ checkout.
2445
- - name: Clone main repo PR branch
2446
- env:
2447
- GH_TOKEN: \${{ steps.app-token.outputs.token }}
2448
- run: |
2449
- git clone \\
2450
- --branch \${{ github.event.client_payload.branch }} \\
2451
- --depth 1 \\
2452
- https://x-access-token:\${GH_TOKEN}@github.com/\${{ github.event.client_payload.repo }}.git \\
2453
- main-repo
2454
-
2455
- # \u2500\u2500 4. Set up Node.js \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2456
- - name: Set up Node.js
2457
- uses: actions/setup-node@v4
2458
- with:
2459
- node-version: "20"
2460
- cache: "npm"
2461
- cache-dependency-path: main-repo/package-lock.json
2462
-
2463
- # \u2500\u2500 5. Build the main repo artifact \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2464
- - name: Build main repo
2465
- working-directory: main-repo
2466
- run: npm ci && npm run build
2467
-
2468
- # \u2500\u2500 6. Install scenario test dependencies \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2469
- - name: Install scenario dependencies
2470
- working-directory: scenarios
2471
- run: npm ci
2472
-
2473
- # \u2500\u2500 7. Run scenario tests \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2474
- # set +e \u2014 don't abort on non-zero exit; we capture it manually
2475
- # set -o pipefail \u2014 propagate failures through pipes (for tee)
2476
- # NO_COLOR=1 \u2014 strip color codes before they reach tee
2477
- # ANSI codes are also stripped via sed as a belt-and-suspenders measure
2478
- - name: Run scenario tests
2479
- id: scenarios
2480
- working-directory: scenarios
2481
- run: |
2482
- set +e
2483
- set -o pipefail
2484
- NO_COLOR=1 npx vitest run 2>&1 \\
2485
- | sed 's/\\x1b\\[[0-9;]*m//g' \\
2486
- | tee test-output.txt
2487
- VITEST_EXIT=$?
2488
- echo "exit_code=$VITEST_EXIT" >> "$GITHUB_OUTPUT"
2489
- exit $VITEST_EXIT
2490
-
2491
- # \u2500\u2500 8. Post PASS or FAIL comment on the originating PR \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2492
- # Always runs so the PR author always gets feedback.
2493
- - name: Post result comment on PR
2494
- if: always()
2495
- env:
2496
- GH_TOKEN: \${{ steps.app-token.outputs.token }}
2497
- PR_NUMBER: \${{ github.event.client_payload.pr_number }}
2498
- MAIN_REPO: \${{ github.event.client_payload.repo }}
2499
- VITEST_EXIT: \${{ steps.scenarios.outputs.exit_code }}
2500
- run: |
2501
- # Read test output (cap at 100 lines to keep the comment manageable)
2502
- OUTPUT=$(head -100 scenarios/test-output.txt 2>/dev/null || echo "(no output captured)")
2503
-
2504
- if [ "$VITEST_EXIT" = "0" ]; then
2505
- STATUS_LINE="**Scenario tests: PASS**"
2506
- else
2507
- STATUS_LINE="**Scenario tests: FAIL** (exit code: $VITEST_EXIT)"
2508
- fi
2509
-
2510
- BODY="\${STATUS_LINE}
2511
-
2512
- <details>
2513
- <summary>Test output</summary>
2514
-
2515
- \\\`\\\`\\\`
2516
- \${OUTPUT}
2517
- \\\`\\\`\\\`
2518
-
2519
- </details>
2520
-
2521
- Run triggered by commit \\\`\${{ github.event.client_payload.sha }}\\\`."
2522
-
2523
- gh api "repos/\${MAIN_REPO}/issues/\${PR_NUMBER}/comments" \\
2524
- -f body="$BODY"
2525
- `,
2526
- "workflows/autofix.yml": `# Autofix Workflow
2527
- #
2528
- # Triggered when CI fails on a PR. Uses Claude Code to attempt an automated fix,
2529
- # then pushes a commit and re-triggers CI. Limits to 3 autofix attempts per PR
2530
- # before escalating to human review.
2531
- #
2532
- # Prerequisites:
2533
- # - A GitHub App called "Joycraft Autofix" (or equivalent) installed on the repo.
2534
- # Its credentials must be stored as repository secrets:
2535
- # JOYCRAFT_APP_ID \u2014 the App's numeric ID
2536
- # JOYCRAFT_APP_PRIVATE_KEY \u2014 the App's PEM private key
2537
- # - ANTHROPIC_API_KEY secret for Claude Code
2538
-
2539
- name: Autofix
2540
-
2541
- on:
2542
- workflow_run:
2543
- # Replace with the exact name of your CI workflow
2544
- workflows: ["CI"]
2545
- types: [completed]
2546
-
2547
- # One autofix run per PR at a time \u2014 cancel in-flight runs for the same PR
2548
- concurrency:
2549
- group: autofix-pr-\${{ github.event.workflow_run.pull_requests[0].number }}
2550
- cancel-in-progress: true
2551
-
2552
- jobs:
2553
- autofix:
2554
- name: Attempt automated fix
2555
- runs-on: ubuntu-latest
2556
-
2557
- # Only run when CI failed and the triggering workflow was on a PR
2558
- if: |
2559
- github.event.workflow_run.conclusion == 'failure' &&
2560
- github.event.workflow_run.pull_requests[0] != null
2561
-
2562
- steps:
2563
- # \u2500\u2500 1. Mint a short-lived GitHub App token \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2564
- # Using a dedicated App identity lets this workflow push commits without
2565
- # triggering GitHub's anti-recursion protection on the GITHUB_TOKEN.
2566
- - name: Generate GitHub App token
2567
- id: app-token
2568
- uses: actions/create-github-app-token@v1
2569
- with:
2570
- # $JOYCRAFT_APP_ID is replaced with the actual App ID number at install time
2571
- app-id: $JOYCRAFT_APP_ID
2572
- private-key: \${{ secrets.JOYCRAFT_APP_PRIVATE_KEY }}
2573
-
2574
- # \u2500\u2500 2. Check out the PR branch \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2575
- # We check out the exact branch (not a merge ref) so that any commit we
2576
- # push lands directly on the PR branch.
2577
- - name: Checkout PR branch
2578
- uses: actions/checkout@v4
2579
- with:
2580
- token: \${{ steps.app-token.outputs.token }}
2581
- ref: \${{ github.event.workflow_run.pull_requests[0].head.ref }}
2582
- fetch-depth: 0
2583
-
2584
- # \u2500\u2500 3. Count previous autofix attempts \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2585
- # Count "autofix:" commits in the log. If we have already made 3 attempts
2586
- # on this PR, stop and ask a human to review instead.
2587
- - name: Check autofix iteration count
2588
- id: iteration
2589
- run: |
2590
- COUNT=$(git log --oneline | grep "autofix:" | wc -l | tr -d ' ')
2591
- echo "count=$COUNT" >> "$GITHUB_OUTPUT"
2592
- echo "Autofix attempts so far: $COUNT"
2593
-
2594
- # \u2500\u2500 4. Post "human review needed" and exit if limit reached \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2595
- - name: Post human-review comment and exit
2596
- if: steps.iteration.outputs.count >= 3
2597
- env:
2598
- GH_TOKEN: \${{ steps.app-token.outputs.token }}
2599
- PR_NUMBER: \${{ github.event.workflow_run.pull_requests[0].number }}
2600
- run: |
2601
- gh pr comment "$PR_NUMBER" \\
2602
- --body "**Autofix limit reached (3 attempts).** Please review manually \u2014 Claude was unable to resolve the CI failures automatically."
2603
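-           echo "Max iterations reached. Failing this step so the remaining autofix steps are skipped."
-           # A non-zero exit is required here: exiting 0 ends only this step,
-           # and the later steps (including the Claude run) would still execute.
-           exit 1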
2605
-
2606
- # \u2500\u2500 5. Fetch the CI failure logs \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2607
- # Download logs from the failed workflow run so Claude has concrete
2608
- # failure context to work from. ANSI escape codes are stripped so the
2609
- # logs are readable as plain text.
2610
- - name: Fetch CI failure logs
2611
- id: logs
2612
- env:
2613
- GH_TOKEN: \${{ github.token }}
2614
- RUN_ID: \${{ github.event.workflow_run.id }}
2615
- run: |
2616
- gh run view "$RUN_ID" --log-failed 2>&1 \\
2617
- | sed 's/\\x1b\\[[0-9;]*m//g' \\
2618
- > /tmp/ci-failure.log
2619
- echo "=== CI failure log (first 200 lines) ==="
2620
- head -200 /tmp/ci-failure.log
2621
-
2622
- # \u2500\u2500 6. Set up Node.js (adjust version to match your project) \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2623
- - name: Set up Node.js
2624
- uses: actions/setup-node@v4
2625
- with:
2626
- node-version: "20"
2627
- cache: "npm"
2628
-
2629
- # \u2500\u2500 7. Install project dependencies \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2630
- - name: Install dependencies
2631
- run: npm ci
2632
-
2633
- # \u2500\u2500 8. Install Claude Code CLI \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2634
- - name: Install Claude Code
2635
- run: npm install -g @anthropic-ai/claude-code
2636
-
2637
- # \u2500\u2500 9. Run Claude Code to fix the failure \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2638
- # - Uses \`claude -p\` (prompt mode) so it runs non-interactively.
2639
- # - No --model flag: the environment's default model is used.
2640
- # - --dangerously-skip-permissions lets Claude edit files without prompts.
2641
- # - --max-turns 20 caps the agentic loop so it can't run indefinitely.
2642
- # - set +e keeps the step from aborting on a failure so the exit code can be captured.
2643
- # - set -o pipefail makes the pipeline's exit status reflect a failure in any stage (e.g. claude), not only the final tee.
2644
- - name: Run Claude Code autofix
2645
- id: claude
2646
- env:
2647
- ANTHROPIC_API_KEY: \${{ secrets.ANTHROPIC_API_KEY }}
2648
- GH_TOKEN: \${{ steps.app-token.outputs.token }}
2649
- run: |
2650
- set +e
2651
- set -o pipefail
2652
-
2653
- FAILURE_LOG=$(cat /tmp/ci-failure.log)
2654
-
2655
- claude -p \\
2656
- --dangerously-skip-permissions \\
2657
- --max-turns 20 \\
2658
- "CI is failing on this PR. Here are the failure logs:
2659
-
2660
- \${FAILURE_LOG}
2661
-
2662
- Please investigate the root cause, fix the code, and make sure the tests pass.
2663
- Do not modify workflow files. Focus only on source code and test files.
2664
- After making changes, run the test suite to verify the fix works." \\
2665
- 2>&1 | sed 's/\\x1b\\[[0-9;]*m//g' | tee /tmp/claude-output.log
2666
-
2667
- CLAUDE_EXIT=$?
2668
- echo "exit_code=$CLAUDE_EXIT" >> "$GITHUB_OUTPUT"
2669
- exit $CLAUDE_EXIT
2670
-
2671
- # \u2500\u2500 10. Commit and push any changes Claude made \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2672
- # If Claude modified files, commit them with an "autofix:" prefix so the
2673
- # iteration counter in step 3 can find them on future runs.
2674
- - name: Commit and push autofix changes
2675
- if: steps.claude.outputs.exit_code == '0'
2676
- env:
2677
- GH_TOKEN: \${{ steps.app-token.outputs.token }}
2678
- run: |
2679
- git config user.name "Joycraft Autofix"
2680
- git config user.email "autofix@joycraft.dev"
2681
-
2682
- git add -A
2683
-
2684
- if git diff --cached --quiet; then
2685
- echo "No changes to commit \u2014 Claude made no file modifications."
2686
- exit 0
2687
- fi
2688
-
2689
- ITERATION=\${{ steps.iteration.outputs.count }}
2690
- NEXT=$(( ITERATION + 1 ))
2691
-
2692
- git commit -m "autofix: attempt $NEXT \u2014 fix CI failures [skip autofix]"
2693
- git push
2694
-
2695
- # \u2500\u2500 11. Post a summary comment on the PR \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2696
- # Always post a comment so the PR author knows what happened.
2697
- - name: Post result comment
2698
- if: always()
2699
- env:
2700
- GH_TOKEN: \${{ steps.app-token.outputs.token }}
2701
- PR_NUMBER: \${{ github.event.workflow_run.pull_requests[0].number }}
2702
- CLAUDE_EXIT: \${{ steps.claude.outputs.exit_code }}
2703
- run: |
2704
- if [ "$CLAUDE_EXIT" = "0" ]; then
2705
- BODY="**Autofix pushed a fix.** CI has been re-triggered. If it still fails, another autofix attempt will run (up to 3 total)."
2706
- else
2707
- BODY="**Autofix ran but could not produce a clean fix** (exit code: $CLAUDE_EXIT). Please review the logs and fix manually."
2708
- fi
2709
-
2710
- gh pr comment "$PR_NUMBER" --body "$BODY"
2711
- `,
2712
- "workflows/scenarios-dispatch.yml": `# Scenarios Dispatch Workflow
2713
- #
2714
- # Triggered when CI passes on a PR. Fires a \`repository_dispatch\` event to a
2715
- # separate scenarios repository so that integration / end-to-end scenario tests
2716
- # can run against the PR's code without living in this repo.
2717
- #
2718
- # Prerequisites:
2719
- # - JOYCRAFT_APP_PRIVATE_KEY secret: GitHub App private key (.pem)
2720
- # - $JOYCRAFT_APP_ID and $SCENARIOS_REPO are replaced with the actual App ID and scenarios repo name at install time
2721
-
2722
- name: Scenarios Dispatch
2723
-
2724
- on:
2725
- workflow_run:
2726
- # Replace with the exact name of your CI workflow
2727
- workflows: ["CI"]
2728
- types: [completed]
2729
-
2730
- jobs:
2731
- dispatch:
2732
- name: Fire scenarios dispatch
2733
- runs-on: ubuntu-latest
2734
-
2735
- # Only run when CI succeeded and the triggering workflow was on a PR
2736
- if: |
2737
- github.event.workflow_run.conclusion == 'success' &&
2738
- github.event.workflow_run.pull_requests[0] != null
2739
-
2740
- steps:
2741
- # \u2500\u2500 1. Generate GitHub App token for cross-repo dispatch \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2742
- - name: Generate GitHub App token
2743
- id: app-token
2744
- uses: actions/create-github-app-token@v1
2745
- with:
2746
- app-id: $JOYCRAFT_APP_ID
2747
- private-key: \${{ secrets.JOYCRAFT_APP_PRIVATE_KEY }}
2748
- repositories: $SCENARIOS_REPO
2749
-
2750
- # \u2500\u2500 2. Fire repository_dispatch to the scenarios repo \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2751
- # Sends a \`run-scenarios\` event carrying enough context for the scenarios
2752
- # repo to check out the correct branch/SHA and know which PR triggered it.
2753
- # $SCENARIOS_REPO is replaced with the actual repo name at install time.
2754
- - name: Dispatch run-scenarios event
2755
- env:
2756
- GH_TOKEN: \${{ steps.app-token.outputs.token }}
2757
- run: |
2758
- PR_NUMBER=\${{ github.event.workflow_run.pull_requests[0].number }}
2759
- BRANCH=\${{ github.event.workflow_run.pull_requests[0].head.ref }}
2760
- SHA=\${{ github.event.workflow_run.head_sha }}
2761
-
2762
- gh api repos/\${{ github.repository_owner }}/$SCENARIOS_REPO/dispatches \\
2763
- -f event_type=run-scenarios \\
2764
- -f "client_payload[pr_number]=$PR_NUMBER" \\
2765
- -f "client_payload[branch]=$BRANCH" \\
2766
- -f "client_payload[sha]=$SHA" \\
2767
- -f "client_payload[repo]=\${{ github.repository }}"
2768
-
2769
- echo "Dispatched run-scenarios to $SCENARIOS_REPO for PR #$PR_NUMBER"
2770
- `,
2771
- "workflows/scenarios-rerun.yml": `# Scenarios Re-run Workflow
2772
- #
2773
- # Triggered when the scenarios repo reports that it has updated its tests
2774
- # (type: scenarios-updated). Finds all open PRs and fires a \`run-scenarios\`
2775
- # dispatch to the scenarios repo for each one, so that newly generated or
2776
- # updated tests are exercised against in-flight PR branches.
2777
- #
2778
- # This handles the race condition where a PR's implementation completes before
2779
- # the scenario agent has finished writing its holdout tests.
2780
- #
2781
- # Prerequisites:
2782
- # - JOYCRAFT_APP_PRIVATE_KEY secret: GitHub App private key (.pem)
2783
- # - $JOYCRAFT_APP_ID is replaced with the actual App ID number at install time
2784
- # - $SCENARIOS_REPO is replaced with the actual scenarios repo name at install time
2785
-
2786
- name: Scenarios Re-run
2787
-
2788
- on:
2789
- repository_dispatch:
2790
- types: [scenarios-updated]
2791
-
2792
- jobs:
2793
- rerun:
2794
- name: Re-run scenarios against open PRs
2795
- runs-on: ubuntu-latest
2796
-
2797
- steps:
2798
- # \u2500\u2500 1. Generate GitHub App token for cross-repo dispatch \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2799
- - name: Generate GitHub App token
2800
- id: app-token
2801
- uses: actions/create-github-app-token@v1
2802
- with:
2803
- app-id: $JOYCRAFT_APP_ID
2804
- private-key: \${{ secrets.JOYCRAFT_APP_PRIVATE_KEY }}
2805
- repositories: $SCENARIOS_REPO
2806
-
2807
- # \u2500\u2500 2. List open PRs and dispatch run-scenarios for each \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2808
- # If there are no open PRs, exits cleanly \u2014 nothing to do.
2809
- - name: Dispatch run-scenarios for each open PR
2810
- env:
2811
- GH_TOKEN: \${{ steps.app-token.outputs.token }}
2812
- run: |
2813
- OPEN_PRS=$(gh api repos/\${{ github.repository }}/pulls \\
2814
- --jq '.[] | "\\(.number) \\(.head.ref) \\(.head.sha)"')
2815
-
2816
- if [ -z "$OPEN_PRS" ]; then
2817
- echo "No open PRs \u2014 nothing to re-run."
2818
- exit 0
2819
- fi
2820
-
2821
- while IFS=' ' read -r PR_NUMBER BRANCH SHA; do
2822
- [ -z "$PR_NUMBER" ] && continue
2823
-
2824
- echo "Dispatching run-scenarios for PR #$PR_NUMBER (branch: $BRANCH, sha: $SHA)"
2825
-
2826
- gh api repos/\${{ github.repository_owner }}/$SCENARIOS_REPO/dispatches \\
2827
- -f event_type=run-scenarios \\
2828
- -f "client_payload[pr_number]=$PR_NUMBER" \\
2829
- -f "client_payload[branch]=$BRANCH" \\
2830
- -f "client_payload[sha]=$SHA" \\
2831
- -f "client_payload[repo]=\${{ github.repository }}"
2832
-
2833
- done <<< "$OPEN_PRS"
2834
- `,
2835
- "workflows/spec-dispatch.yml": `# Spec Dispatch Workflow
2836
- #
2837
- # Triggered when specs are pushed to main. For each added or modified spec,
2838
- # fires a \`spec-pushed\` repository_dispatch event to the scenarios repo so
2839
- # that a scenario agent can triage the spec and write/update holdout tests.
2840
- #
2841
- # Prerequisites:
2842
- # - JOYCRAFT_APP_PRIVATE_KEY secret: GitHub App private key (.pem)
2843
- # - $JOYCRAFT_APP_ID is replaced with the actual App ID number at install time
2844
- # - $SCENARIOS_REPO is replaced with the actual scenarios repo name at install time
2845
-
2846
- name: Spec Dispatch
2847
-
2848
- on:
2849
- push:
2850
- branches: [main]
2851
- paths:
2852
- - "docs/specs/**"
2853
-
2854
- jobs:
2855
- dispatch:
2856
- name: Dispatch changed specs to scenarios repo
2857
- runs-on: ubuntu-latest
2858
-
2859
- steps:
2860
- # \u2500\u2500 1. Check out with depth 2 to enable HEAD~1 diff \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2861
- - name: Checkout
2862
- uses: actions/checkout@v4
2863
- with:
2864
- fetch-depth: 2
2865
-
2866
- # \u2500\u2500 2. Find added or modified spec files \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2867
- # --diff-filter=AM: Added or Modified only \u2014 ignore deletions.
2868
- - name: Find changed specs
2869
- id: changed
2870
- run: |
2871
- FILES=$(git diff --name-only --diff-filter=AM HEAD~1 HEAD -- 'docs/specs/*.md')
2872
- echo "files<<EOF" >> "$GITHUB_OUTPUT"
2873
- echo "$FILES" >> "$GITHUB_OUTPUT"
2874
- echo "EOF" >> "$GITHUB_OUTPUT"
2875
- echo "Changed specs: $FILES"
2876
-
2877
- # \u2500\u2500 3. Generate GitHub App token for cross-repo dispatch \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2878
- # Skipped if no specs changed (token unused, save the round-trip).
2879
- - name: Generate GitHub App token
2880
- id: app-token
2881
- if: steps.changed.outputs.files != ''
2882
- uses: actions/create-github-app-token@v1
2883
- with:
2884
- app-id: $JOYCRAFT_APP_ID
2885
- private-key: \${{ secrets.JOYCRAFT_APP_PRIVATE_KEY }}
2886
- repositories: $SCENARIOS_REPO
2887
-
2888
- # \u2500\u2500 4. Dispatch each changed spec to the scenarios repo \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2889
- # Sends a \`spec-pushed\` event with the spec filename, full content,
2890
- # commit SHA, branch, and originating repo. The scenario agent uses
2891
- # this payload to triage and generate/update tests.
2892
- - name: Dispatch spec-pushed events
2893
- if: steps.changed.outputs.files != ''
2894
- env:
2895
- GH_TOKEN: \${{ steps.app-token.outputs.token }}
2896
- run: |
2897
- while IFS= read -r SPEC_FILE; do
2898
- [ -z "$SPEC_FILE" ] && continue
2899
-
2900
- SPEC_FILENAME=$(basename "$SPEC_FILE")
2901
- SPEC_CONTENT=$(cat "$SPEC_FILE")
2902
-
2903
- echo "Dispatching spec-pushed for $SPEC_FILENAME"
2904
-
2905
- gh api repos/\${{ github.repository_owner }}/$SCENARIOS_REPO/dispatches \\
2906
- -f event_type=spec-pushed \\
2907
- -f "client_payload[spec_filename]=$SPEC_FILENAME" \\
2908
- -f "client_payload[spec_content]=$SPEC_CONTENT" \\
2909
- -f "client_payload[commit_sha]=\${{ github.sha }}" \\
2910
- -f "client_payload[branch]=\${{ github.ref_name }}" \\
2911
- -f "client_payload[repo]=\${{ github.repository }}"
2912
-
2913
- done <<< "\${{ steps.changed.outputs.files }}"
2914
- `
2915
- };
2916
- var CODEX_SKILLS = {
2917
- "joycraft-add-fact.md": `---
2918
- name: joycraft-add-fact
2919
- description: Capture a project fact and route it to the correct context document -- production map, dangerous assumptions, decision log, institutional knowledge, or troubleshooting
2920
- ---
2921
-
2922
- # Add Fact
2923
-
2924
- The user has a fact to capture. Your job is to classify it, route it to the correct context document, append it in the right format, and optionally add a boundary rule to CLAUDE.md or AGENTS.md.
2925
-
2926
- ## Step 1: Get the Fact
2927
-
2928
- If the user already provided the fact (e.g., \`$joycraft-add-fact the staging DB resets every Sunday\`), use it directly.
2929
-
2930
- If not, ask: "What fact do you want to capture?" -- then wait for their response.
2931
-
2932
- If the user provides multiple facts at once, process each one separately through all the steps below, then give a combined confirmation at the end.
2933
-
2934
- ## Step 2: Classify the Fact
2935
-
2936
- Route the fact to one of these 5 context documents based on its content:
2937
-
2938
- ### \`docs/context/production-map.md\`
2939
- The fact is about **infrastructure, services, environments, URLs, endpoints, credentials, or what is safe/unsafe to touch**.
2940
- - Signal words: "production", "staging", "endpoint", "URL", "database", "service", "deployed", "hosted", "credentials", "secret", "environment"
2941
- - Examples: "The staging DB is at postgres://staging.example.com", "We use Vercel for the frontend and Railway for the API"
2942
-
2943
- ### \`docs/context/dangerous-assumptions.md\`
2944
- The fact is about **something an AI agent might get wrong -- a false assumption that leads to bad outcomes**.
2945
- - Signal words: "assumes", "might think", "but actually", "looks like X but is Y", "not what it seems", "trap", "gotcha"
2946
- - Examples: "The \`users\` table looks like a test table but it's production", "Deleting a workspace doesn't delete the billing subscription"
2947
-
2948
- ### \`docs/context/decision-log.md\`
2949
- The fact is about **an architectural or tooling choice and why it was made**.
2950
- - Signal words: "decided", "chose", "because", "instead of", "we went with", "the reason we use", "trade-off"
2951
- - Examples: "We chose SQLite over Postgres because this runs on embedded devices", "We use pnpm instead of npm for workspace support"
2952
-
2953
- ### \`docs/context/institutional-knowledge.md\`
2954
- The fact is about **team conventions, unwritten rules, organizational context, or who owns what**.
2955
- - Signal words: "convention", "rule", "always", "never", "team", "process", "review", "approval", "owns", "responsible"
2956
- - Examples: "The design team reviews all color changes", "We never deploy on Fridays", "PR titles must start with the ticket number"
2957
-
2958
- ### \`docs/context/troubleshooting.md\`
2959
- The fact is about **diagnostic knowledge -- when X happens, do Y (or don't do Z)**.
2960
- - Signal words: "when", "fails", "error", "if you see", "stuck", "broken", "fix", "workaround", "before trying", "reboot", "restart", "reset"
2961
- - Examples: "If Wi-Fi disconnects during flash, wait and retry -- don't switch networks", "When tests fail with ECONNREFUSED, check if Docker is running"
2962
-
2963
- ### Ambiguous Facts
2964
-
2965
- If the fact fits multiple categories, pick the **best fit** based on the primary intent. You will mention the alternative in your confirmation message so the user can correct you.
2966
-
2967
- ## Step 3: Ensure the Target Document Exists
2968
-
2969
- 1. If \`docs/context/\` does not exist, create the directory.
2970
- 2. If the target document does not exist, create it from the template structure. Check \`docs/templates/\` for the matching template. If no template exists, use this minimal structure:
2971
-
2972
- For **production-map.md**:
2973
- \`\`\`markdown
2974
- # Production Map
2975
-
2976
- > What's real, what's staging, what's safe to touch.
2977
-
2978
- ## Services
2979
-
2980
- | Service | Environment | URL/Endpoint | Impact if Corrupted |
2981
- |---------|-------------|-------------|-------------------|
2982
- \`\`\`
2983
-
2984
- For **dangerous-assumptions.md**:
2985
- \`\`\`markdown
2986
- # Dangerous Assumptions
2987
-
2988
- > Things the AI agent might assume that are wrong in this project.
2989
-
2990
- ## Assumptions
2991
-
2992
- | Agent Might Assume | But Actually | Impact If Wrong |
2993
- |-------------------|-------------|----------------|
2994
- \`\`\`
2995
-
2996
- For **decision-log.md**:
2997
- \`\`\`markdown
2998
- # Decision Log
2999
-
3000
- > Why choices were made, not just what was chosen.
3001
-
3002
- ## Decisions
3003
-
3004
- | Date | Decision | Why | Alternatives Rejected | Revisit When |
3005
- |------|----------|-----|----------------------|-------------|
3006
- \`\`\`
3007
-
3008
- For **institutional-knowledge.md**:
3009
- \`\`\`markdown
3010
- # Institutional Knowledge
3011
-
3012
- > Unwritten rules, team conventions, and organizational context.
3013
-
3014
- ## Team Conventions
3015
-
3016
- - (none yet)
3017
- \`\`\`
3018
-
3019
- For **troubleshooting.md**:
3020
- \`\`\`markdown
3021
- # Troubleshooting
3022
-
3023
- > What to do when things go wrong for non-code reasons.
3024
-
3025
- ## Common Failures
3026
-
3027
- | When This Happens | Do This | Don't Do This |
3028
- |-------------------|---------|---------------|
3029
- \`\`\`
3030
-
3031
- ## Step 4: Read the Target Document
3032
-
3033
- Read the target document to understand its current structure. Note:
3034
- - Which section to append to
3035
- - Whether it uses tables or lists
3036
- - The column format if it's a table
3037
-
3038
- ## Step 5: Append the Fact
3039
-
3040
- Add the fact to the appropriate section of the target document. Match the existing format exactly:
3041
-
3042
- - **Table-based documents** (production-map, dangerous-assumptions, decision-log, troubleshooting): Add a new table row in the correct columns. Use today's date where a date column exists.
3043
- - **List-based documents** (institutional-knowledge): Add a new list item (\`- \`) to the most appropriate section.
3044
-
3045
- Remove any italic example rows (rows where all cells start with \`_\`) before appending, so the document transitions from template to real content. Only remove examples from the specific table you are appending to.
3046
-
3047
- **Append only. Never modify or remove existing real content.**
3048
-
3049
- ## Step 6: Evaluate Boundary Rule
3050
-
3051
- Decide whether the fact also warrants a rule in the project's boundary configuration (CLAUDE.md and/or AGENTS.md -- check which files the project uses and update accordingly):
3052
-
3053
- **Add a boundary rule if the fact:**
3054
- - Describes something that should ALWAYS or NEVER be done
3055
- - Could cause real damage if violated (data loss, broken deployments, security issues)
3056
- - Is a hard constraint that applies across all work, not just a one-time note
3057
-
3058
- **Do NOT add a boundary rule if the fact is:**
3059
- - Purely informational (e.g., "staging DB is at this URL")
3060
- - A one-time decision that's already captured
3061
- - A diagnostic tip rather than a prohibition
3062
-
3063
- If a rule is warranted, read the project's boundary file(s) -- CLAUDE.md and/or AGENTS.md -- find the appropriate section (ALWAYS, ASK FIRST, or NEVER under Behavioral Boundaries), and append the rule. If no Behavioral Boundaries section exists, append one. Update whichever boundary files the project uses (some projects have CLAUDE.md, some have AGENTS.md, some have both).
3064
-
3065
- ## Step 7: Confirm
3066
-
3067
- Report what you did in this format:
3068
-
3069
- \`\`\`
3070
- Added to [document name]:
3071
- [summary of what was added]
3072
-
3073
- [If boundary file(s) were also updated:]
3074
- Added boundary rule to [CLAUDE.md / AGENTS.md / both]:
3075
- [ALWAYS/ASK FIRST/NEVER]: [rule text]
3076
-
3077
- [If the fact was ambiguous:]
3078
- Routed to [chosen doc] -- move to [alternative doc] if this is more about [alternative category description].
3079
- \`\`\`
3080
- `,
3081
- "joycraft-bugfix.md": `---
3082
- name: joycraft-bugfix
3083
- description: Structured bug fix workflow \u2014 triage, diagnose, discuss with user, write a focused spec, hand off for implementation
3084
- ---
3085
-
3086
- # Bug Fix Workflow
3087
-
3088
- You are fixing a bug. Follow this process in order. Do not skip steps.
3089
-
3090
- **Guard clause:** If this is clearly a new feature, redirect to \`$joycraft-new-feature\` and stop.
3091
-
3092
- ---
3093
-
3094
- ## Phase 1: Triage
3095
-
3096
- Establish what's broken. Gather: symptom, steps to reproduce, expected vs actual behavior, when it started, relevant logs/errors. If an error message or stack trace is provided, read the referenced files immediately. Try to reproduce if steps are given.
3097
-
3098
- **Done when:** You can describe the symptom in one sentence.
3099
-
3100
- ---
3101
-
3102
- ## Phase 2: Diagnose
3103
-
3104
- Find the root cause. Start from the error site and trace backward. Search the codebase and read files \u2014 don't guess. Identify the specific line(s) and logic error. Check git blame if it's a recent regression.
3105
-
3106
- **Done when:** You can explain what's wrong, why, and where in 2-3 sentences.
3107
-
3108
- ---
3109
-
3110
- ## Phase 3: Discuss
3111
-
3112
- Present findings to the user BEFORE writing any code or spec:
3113
- 1. **Symptom** \u2014 confirm it matches what they see
3114
- 2. **Root cause** \u2014 specific file(s) and line(s)
3115
- 3. **Proposed fix** \u2014 what changes, where
3116
- 4. **Risk** \u2014 side effects? scope?
3117
-
3118
- Ask: "Does this match? Comfortable with this approach?" If large/risky, suggest decomposing into multiple specs.
3119
-
3120
- **Done when:** User agrees with the diagnosis and fix direction.
3121
-
3122
- ---
3123
-
3124
- ## Phase 4: Spec the Fix
3125
-
3126
- Write a bug fix spec to \`docs/specs/YYYY-MM-DD-bugfix-name.md\`. Create the \`docs/specs/\` directory if it doesn't exist.
3127
-
3128
- **Why:** Even bug fixes deserve a spec. It forces clarity on what "fixed" means, ensures test-first discipline, and creates a traceable record of the fix.
3129
-
3130
- Use this structure:
3131
-
3132
- \`\`\`markdown
3133
- # [Bug Name] \u2014 Bug Fix Spec
3134
-
3135
- > **Status:** Ready
3136
- > **Date:** YYYY-MM-DD
3137
- > **Estimated scope:** [1 session / N files / ~N lines]
3138
-
3139
- ---
3140
-
3141
- ## Bug
3142
- One sentence \u2014 what's broken?
3143
-
3144
- ## Root Cause
3145
- What's actually wrong, in which file(s) and line(s)?
3146
-
3147
- ## Fix
3148
- What changes, where?
3149
-
3150
- ## Acceptance Criteria
3151
- - [ ] [Observable behavior that proves the fix works]
3152
- - [ ] No regressions \u2014 existing tests still pass
3153
- - [ ] Build passes
3154
-
3155
- ## Test Plan
3156
- 1. Write a reproduction test that fails before the fix
3157
- 2. Apply the fix
3158
- 3. Reproduction test passes
3159
- 4. Full test suite passes
3160
-
3161
- ## Constraints
3162
- - MUST: [hard requirement]
3163
- - MUST NOT: [hard prohibition]
3164
-
3165
- ## Affected Files
3166
- | Action | File | What Changes |
3167
- |--------|------|-------------|
3168
-
3169
- ## Edge Cases
3170
- | Scenario | Expected Behavior |
3171
- |----------|------------------|
3172
- \`\`\`
3173
-
3174
- **For large bugs that span multiple files/systems:** Consider whether this should be decomposed into multiple specs. If so, create a brief first using \`$joycraft-new-feature\`, then decompose.
3175
-
3176
- ---
3177
-
3178
- ## Phase 5: Hand Off
3179
-
3180
- \`\`\`
3181
- Bug fix spec is ready: docs/specs/YYYY-MM-DD-bugfix-name.md
3182
-
3183
- Summary:
3184
- - Bug: [one sentence]
3185
- - Root cause: [one sentence]
3186
- - Fix: [one sentence]
3187
- - Estimated: 1 session
3188
-
3189
- To execute: Start a fresh session and:
3190
- 1. Read the spec
3191
- 2. Write the reproduction test (must fail)
3192
- 3. Apply the fix (test must pass)
3193
- 4. Run full test suite
3194
- 5. Run $joycraft-session-end to capture discoveries
3195
- 6. Commit and PR
3196
-
3197
- Ready to start?
3198
- \`\`\`
3199
- `,
3200
- "joycraft-decompose.md": `---
3201
- name: joycraft-decompose
3202
- description: Break a feature brief into atomic specs \u2014 small, testable, independently executable units
3203
- ---
3204
-
3205
- # Decompose Feature into Atomic Specs
3206
-
3207
- You have a Feature Brief (or the user has described a feature). Your job is to decompose it into atomic specs that can be executed independently \u2014 one spec per session.
3208
-
3209
- ## Step 1: Verify the Brief Exists
3210
-
3211
- Look for a Feature Brief in \`docs/briefs/\`. If one doesn't exist yet, tell the user:
3212
-
3213
- > No feature brief found. Run \`$joycraft-new-feature\` first to interview and create one, or describe the feature now and I'll work from your description.
3214
-
3215
- If the user describes the feature inline, work from that description directly. You don't need a formal brief to decompose \u2014 but recommend creating one for complex features.
3216
-
3217
- ## Step 2: Identify Natural Boundaries
3218
-
3219
- **Why:** Good boundaries make specs independently testable and committable. Bad boundaries create specs that can't be verified without other specs also being done.
3220
-
3221
- Read the brief (or description) and identify natural split points:
3222
-
3223
- - **Data layer changes** (schemas, types, migrations) \u2014 always a separate spec
3224
- - **Pure functions / business logic** \u2014 separate from I/O
3225
- - **UI components** \u2014 separate from data fetching
3226
- - **API endpoints / route handlers** \u2014 separate from business logic
3227
- - **Test infrastructure** (mocks, fixtures, helpers) \u2014 can be its own spec if substantial
3228
- - **Configuration / environment** \u2014 separate from code changes
3229
-
3230
- Ask yourself: "Can this piece be committed and tested without the other pieces existing?" If yes, it's a good boundary.
3231
-
3232
- ## Step 3: Build the Decomposition Table
3233
-
3234
- For each atomic spec, define:
3235
-
3236
- | # | Spec Name | Description | Dependencies | Size |
3237
- |---|-----------|-------------|--------------|------|
3238
-
3239
- **Rules:**
3240
- - Each spec name is \`verb-object\` format (e.g., \`add-terminal-detection\`, \`extract-prompt-module\`)
3241
- - Each description is ONE sentence \u2014 if you need two, the spec is too big
3242
- - Dependencies reference other spec numbers \u2014 keep the dependency graph shallow
3243
- - More than 2 dependencies on a single spec = it's too big, split further
3244
- - Aim for 3-7 specs per feature. Fewer than 3 = probably not decomposed enough. More than 10 = the feature brief is too big
3245
-
3246
- ## Step 4: Present and Iterate
3247
-
3248
- Show the decomposition table to the user. Ask:
3249
- 1. "Does this breakdown match how you think about this feature?"
3250
- 2. "Are there any specs that feel too big or too small?"
3251
- 3. "Should any of these run in parallel (separate branches)?"
3252
-
3253
- Iterate until the user approves.
3254
-
3255
- ## Step 5: Generate Atomic Specs
3256
-
3257
- For each approved row, create \`docs/specs/YYYY-MM-DD-spec-name.md\`. Create the \`docs/specs/\` directory if it doesn't exist.
3258
-
3259
- **Why:** Each spec must be self-contained \u2014 a fresh session should be able to execute it without reading the Feature Brief. Copy relevant constraints and context into each spec.
3260
-
3261
- Use this structure:
3262
-
3263
- \`\`\`markdown
3264
- # [Verb + Object] \u2014 Atomic Spec
3265
-
3266
- > **Parent Brief:** \`docs/briefs/YYYY-MM-DD-feature-name.md\` (or "standalone")
3267
- > **Status:** Ready
3268
- > **Date:** YYYY-MM-DD
3269
- > **Estimated scope:** [1 session / N files / ~N lines]
3270
-
3271
- ---
3272
-
3273
- ## What
3274
- One paragraph \u2014 what changes when this spec is done?
3275
-
3276
- ## Why
3277
- One sentence \u2014 what breaks or is missing without this?
3278
-
3279
- ## Acceptance Criteria
3280
- - [ ] [Observable behavior]
3281
- - [ ] Build passes
3282
- - [ ] Tests pass
3283
-
3284
- ## Test Plan
3285
-
3286
- | Acceptance Criterion | Test | Type |
3287
- |---------------------|------|------|
3288
- | [Each AC above] | [What to call/assert] | [unit/integration/e2e] |
3289
-
3290
- **Execution order:**
3291
- 1. Write all tests above \u2014 they should fail against current/stubbed code
3292
- 2. Run tests to confirm they fail (red)
3293
- 3. Implement until all tests pass (green)
3294
-
3295
- **Smoke test:** [Identify the fastest test for iteration feedback]
3296
-
3297
- **Before implementing, verify your test harness:**
3298
- 1. Run all tests \u2014 they must FAIL (if they pass, you're testing the wrong thing)
3299
- 2. Each test calls your actual function/endpoint \u2014 not a reimplementation or the underlying library
3300
- 3. Identify your smoke test \u2014 it must run in seconds, not minutes, so you get fast feedback on each change
3301
-
3302
- ## Constraints
3303
- - MUST: [hard requirement]
3304
- - MUST NOT: [hard prohibition]
3305
-
3306
- ## Affected Files
3307
- | Action | File | What Changes |
3308
- |--------|------|-------------|
3309
-
3310
- ## Approach
3311
- Strategy, data flow, key decisions. Name one rejected alternative.
3312
-
3313
- ## Edge Cases
3314
- | Scenario | Expected Behavior |
3315
- |----------|------------------|
3316
- \`\`\`
3317
-
3318
- If \`docs/templates/ATOMIC_SPEC_TEMPLATE.md\` exists, reference it for the full template with additional guidance.
3319
-
3320
- Fill in all sections \u2014 each spec must be self-contained (no "see the brief for context"). Copy relevant constraints from the Feature Brief into each spec. Write acceptance criteria specific to THIS spec, not the whole feature. Every acceptance criterion must have at least one corresponding test in the Test Plan. If the user provided test strategy info from the interview, use it to choose test types and frameworks. Include the test harness verification rules in every Test Plan.
3321
-
3322
- ## Step 6: Recommend Execution Strategy
3323
-
3324
- Based on the dependency graph:
3325
- - **Independent specs** \u2014 "These can run in parallel branches"
3326
- - **Sequential specs** \u2014 "Execute these in order: 1 -> 2 -> 4"
3327
- - **Mixed** \u2014 "Start specs 1 and 3 in parallel. After 1 completes, start 2."
3328
-
3329
- Update the Feature Brief's Execution Strategy section with the plan (if a brief exists).
3330
-
3331
- ## Step 7: Hand Off
3332
-
3333
- Tell the user:
3334
- \`\`\`
3335
- Decomposition complete:
3336
- - [N] atomic specs created in docs/specs/
3337
- - [N] can run in parallel, [N] are sequential
3338
- - Estimated total: [N] sessions
3339
-
3340
- To execute:
3341
- - Sequential: Open a session, point at each spec in order
3342
- - Parallel: One spec per branch, merge when done
3343
- - Each session should end with $joycraft-session-end to capture discoveries
3344
-
3345
- Ready to start execution?
3346
- \`\`\`
3347
- `,
3348
- "joycraft-design.md": `---
3349
- name: joycraft-design
3350
- description: Design discussion before decomposition \u2014 produce a ~200-line design artifact for human review, catching wrong assumptions before they propagate into specs
3351
- ---
3352
-
3353
- # Design Discussion
3354
-
3355
- You are producing a design discussion document for a feature. This sits between research and decomposition \u2014 it captures your understanding so the human can catch wrong assumptions before specs are written.
3356
-
3357
- **Guard clause:** If no brief path is provided and no brief exists in \`docs/briefs/\`, say:
3358
- "No feature brief found. Run \`$joycraft-new-feature\` first to create one, or provide the path to your brief."
3359
- Then stop.
3360
-
3361
- ---
3362
-
3363
- ## Step 1: Read Inputs
3364
-
3365
- Read the feature brief at the path the user provides. If the user also provides a research document path, read that too.
3366
-
3367
- ## Step 2: Explore the Codebase
3368
-
3369
- Spawn concurrent subagent threads to explore the codebase for patterns relevant to the brief. Focus on:
3370
-
3371
- - Files and functions that will be touched or extended
3372
- - Existing patterns this feature should follow
3373
- - Similar features already implemented that serve as models
3374
- - Boundaries and interfaces the feature must integrate with
3375
-
3376
- Each subagent should search the codebase and read files to gather file paths, function signatures, and code snippets.
3377
-
3378
- ## Step 3: Write the Design Document
3379
-
3380
- Create \`docs/designs/\` directory if it doesn't exist. Write to \`docs/designs/YYYY-MM-DD-feature-name.md\`.
3381
-
3382
- The document has exactly five sections:
3383
-
3384
- ### Section 1: Current State
3385
- What exists today in the codebase. Include file paths, function signatures, data flows. Be specific.
3386
-
3387
- ### Section 2: Desired End State
3388
- What the codebase should look like when this feature is complete.
3389
-
3390
- ### Section 3: Patterns to Follow
3391
- Existing patterns in the codebase that this feature should match. Include code snippets and \`file:line\` references.
3392
-
3393
- ### Section 4: Resolved Design Decisions
3394
- Decisions made with rationale. Format: Decision, Rationale, Alternative rejected.
3395
-
3396
- ### Section 5: Open Questions
3397
- Things where multiple valid approaches exist. Each question MUST present 2-3 concrete options with pros and cons.
3398
-
3399
- ## Step 4: Present and STOP
3400
-
3401
- Present the design document. Say:
3402
- \`\`\`
3403
- Design discussion written to docs/designs/YYYY-MM-DD-feature-name.md
3404
-
3405
- Please review. Specifically:
3406
- 1. Are the patterns in Section 3 right?
3407
- 2. Do you agree with the resolved decisions?
3408
- 3. Pick an option for each open question.
3409
-
3410
- Reply with your feedback. I will NOT proceed to decomposition until you have reviewed and approved.
3411
- \`\`\`
3412
-
3413
- **CRITICAL: Do NOT proceed to \`$joycraft-decompose\` or generate specs.** Wait for human review.
3414
-
3415
- ## After Human Review
3416
-
3417
- - Update the design document with corrections
3418
- - Move answered questions to Resolved Design Decisions
3419
- - Present for final confirmation
3420
- - Only after explicit approval: "Design approved. Run \`$joycraft-decompose\` with this brief to generate atomic specs."
3421
- `,
3422
- "joycraft-implement-level5.md": `---
3423
- name: joycraft-implement-level5
3424
- description: Set up Level 5 autonomous development \u2014 autofix loop, holdout scenario testing, and scenario evolution from specs
3425
- ---
3426
-
3427
- # Implement Level 5 \u2014 Autonomous Development Loop
3428
-
3429
- You are guiding the user through setting up Level 5: the autonomous feedback loop where specs go in, validated software comes out. This is a one-time setup that installs workflows, creates a scenarios repo, and configures the autofix loop.
3430
-
3431
- ## Before You Begin
3432
-
3433
- Check prerequisites:
3434
-
3435
- 1. **Project must be initialized.** Search for \`.joycraft-version\`. If missing, tell the user to run \`npx joycraft init\` first.
3436
- 2. **Project should be at Level 4.** Read \`docs/joycraft-assessment.md\` if it exists. If the project hasn't been assessed yet, suggest running \`$joycraft-tune\` first. But don't block -- the user may know they're ready.
3437
- 3. **Git repo with GitHub remote.** This setup requires GitHub Actions. Check for \`.git/\` and a GitHub remote.
3438
-
3439
- If prerequisites aren't met, explain what's needed and stop.
3440
-
3441
- ## Step 1: Explain What Level 5 Means
3442
-
3443
- Tell the user:
3444
-
3445
- > Level 5 is the autonomous loop. When you push specs, three things happen automatically:
3446
- >
3447
- > 1. **Scenario evolution** -- An AI agent reads your specs and writes holdout tests in a private scenarios repo. These tests are invisible to your coding agent.
3448
- > 2. **Autofix** -- When CI fails on a PR, the agent automatically attempts a fix (up to 3 times).
3449
- > 3. **Holdout validation** -- When CI passes, your scenarios repo runs behavioral tests against the PR. Results post as PR comments.
3450
- >
3451
- > The key insight: your coding agent never sees the scenario tests. This prevents it from gaming the test suite -- like a validation set in machine learning.
3452
-
3453
- ## Step 2: Gather Configuration
3454
-
3455
- Ask these questions **one at a time**:
3456
-
3457
- ### Question 1: Scenarios repo name
3458
-
3459
- > What should we call your scenarios repo? It'll be a private repo that holds your holdout tests.
3460
- >
3461
- > Default: \`{current-repo-name}-scenarios\`
3462
-
3463
- Accept the default or the user's choice.
3464
-
3465
- ### Question 2: GitHub App
3466
-
3467
- > Level 5 needs a GitHub App to provide a separate identity for autofix pushes (this avoids GitHub's anti-recursion protection). Creating one takes about 2 minutes:
3468
- >
3469
- > 1. Go to https://github.com/settings/apps/new
3470
- > 2. Give it a name (e.g., "My Project Autofix")
3471
- > 3. Uncheck "Webhook > Active" (not needed)
3472
- > 4. Under **Repository permissions**, set:
3473
- > - **Contents**: Read & Write
3474
- > - **Pull requests**: Read & Write
3475
- > - **Actions**: Read & Write
3476
- > 5. Click **Create GitHub App**
3477
- > 6. Note the **App ID** from the settings page
3478
- > 7. Scroll to **Private keys** > click **Generate a private key** > save the \`.pem\` file
3479
- > 8. Click **Install App** in the left sidebar > install it on your repo
3480
- >
3481
- > What's your App ID?
3482
-
3483
- ## Step 3: Run init-autofix
3484
-
3485
- Run the CLI command with the gathered configuration:
3486
-
3487
- \`\`\`bash
3488
- npx joycraft init-autofix --scenarios-repo {name} --app-id {id}
3489
- \`\`\`
3490
-
3491
- Review the output with the user. Confirm files were created.
3492
-
3493
- ## Step 4: Walk Through Secret Configuration
3494
-
3495
- Guide the user step by step:
3496
-
3497
- ### 4a: Add Secrets to Main Repo
3498
-
3499
- > You should already have the \`.pem\` file from when you created the app in Step 2.
3500
-
3501
- > Go to your repo's Settings > Secrets and variables > Actions, and add:
3502
- > - \`JOYCRAFT_APP_PRIVATE_KEY\` -- paste the contents of your \`.pem\` file
3503
- > - \`ANTHROPIC_API_KEY\` -- your Anthropic API key (or the appropriate AI provider key for your setup)
3504
-
3505
- ### 4b: Create the Scenarios Repo
3506
-
3507
- > Create the private scenarios repo:
3508
- > \`\`\`bash
3509
- > gh repo create {scenarios-repo-name} --private
3510
- > \`\`\`
3511
- >
3512
- > Then copy the scenario templates into it:
3513
- > \`\`\`bash
3514
- > cp -r docs/templates/scenarios/* ../{scenarios-repo-name}/
3515
- > cd ../{scenarios-repo-name}
3516
- > git add -A && git commit -m "init: scaffold scenarios repo from Joycraft"
3517
- > git push
3518
- > \`\`\`
3519
-
3520
- ### 4c: Add Secrets to Scenarios Repo
3521
-
3522
- > The scenarios repo needs the same two secrets:
3523
- > - \`JOYCRAFT_APP_PRIVATE_KEY\` -- same \`.pem\` file as the main repo
3524
- > - \`ANTHROPIC_API_KEY\` -- same key (needed for scenario generation)
3525
-
3526
- ## Step 5: Verify Setup
3527
-
3528
- Help the user verify everything is wired correctly:
3529
-
3530
- 1. **Check workflow files exist:** \`ls .github/workflows/autofix.yml .github/workflows/scenarios-dispatch.yml .github/workflows/spec-dispatch.yml .github/workflows/scenarios-rerun.yml\`
3531
- 2. **Check scenario templates were copied:** Verify the scenarios repo has \`example-scenario.test.ts\`, \`workflows/run.yml\`, \`workflows/generate.yml\`, \`prompts/scenario-agent.md\`
3532
- 3. **Check the App ID is correct** in the workflow files (not still a placeholder)
3533
-
3534
- ## Step 6: Update AGENTS.md
3535
-
3536
- If the project's AGENTS.md doesn't already have an "External Validation" section, add one:
3537
-
3538
- > ## External Validation
3539
- >
3540
- > This project uses holdout scenario tests in a separate private repo.
3541
- >
3542
- > ### NEVER
3543
- > - Access, read, or reference the scenarios repo
3544
- > - Mention scenario test names or contents
3545
- > - Modify the scenarios dispatch workflow to leak test information
3546
- >
3547
- > The scenarios repo is deliberately invisible to you. This is the holdout guarantee.
3548
-
3549
- ## Step 7: First Test (Optional)
3550
-
3551
- If the user wants to test the loop:
3552
-
3553
- > Want to do a quick test? Here's how:
3554
- >
3555
- > 1. Write a simple spec in \`docs/specs/\` and push to main -- this triggers scenario generation
3556
- > 2. Create a PR with a small change -- when CI passes, scenarios will run
3557
- > 3. Watch for the scenario test results as a PR comment
3558
- >
3559
- > Or deliberately break something in a PR to test the autofix loop.
3560
-
3561
- ## Step 8: Summary
3562
-
3563
- Print a summary of what was set up:
3564
-
3565
- > **Level 5 is live.** Here's what's running:
3566
- >
3567
- > | Trigger | What Happens |
3568
- > |---------|-------------|
3569
- > | Push specs to \`docs/specs/\` | Scenario agent writes holdout tests |
3570
- > | PR fails CI | Autofix agent attempts a fix (up to 3x) |
3571
- > | PR passes CI | Holdout scenarios run against PR |
3572
- > | Scenarios update | Open PRs re-tested with latest scenarios |
3573
- >
3574
- > Your scenarios repo: \`{name}\`
3575
- > Your coding agent cannot see those tests. The holdout wall is intact.
3576
-
3577
- Update \`docs/joycraft-assessment.md\` if it exists -- set the Level 5 score to reflect the new setup.
3578
- `,
3579
- "joycraft-interview.md": `---
3580
- name: joycraft-interview
3581
- description: Brainstorm freely about what you want to build \u2014 yap, explore ideas, and get a structured summary you can use later
3582
- ---
3583
-
3584
- # Interview \u2014 Idea Exploration
3585
-
3586
- You are helping the user brainstorm and explore what they want to build. This is a lightweight, low-pressure conversation \u2014 not a formal spec process. Let them yap.
3587
-
3588
- ## How to Run the Interview
3589
-
3590
- ### 1. Open the Floor
3591
-
3592
- Start with something like:
3593
- "What are you thinking about building? Just talk \u2014 I'll listen and ask questions as we go."
3594
-
3595
- Let the user talk freely. Do not interrupt their flow. Do not push toward structure yet.
3596
-
3597
- ### 2. Ask Clarifying Questions
3598
-
3599
- As they talk, weave in questions naturally \u2014 don't fire them all at once:
3600
-
3601
- - **What problem does this solve?** Who feels the pain today?
3602
- - **What does "done" look like?** If this worked perfectly, what would a user see?
3603
- - **What are the constraints?** Time, tech, team, budget \u2014 what boxes are we in?
3604
- - **What's NOT in scope?** What's tempting but should be deferred?
3605
- - **What are the edge cases?** What could go wrong? What's the weird input?
3606
- - **What exists already?** Are we building on something or starting fresh?
3607
-
3608
- ### 3. Play Back Understanding
3609
-
3610
- After the user has gotten their ideas out, reflect back:
3611
- "So if I'm hearing you right, you want to [summary]. The core problem is [X], and done looks like [Y]. Is that right?"
3612
-
3613
- Let them correct and refine. Iterate until they say "yes, that's it."
3614
-
3615
- ### 4. Write a Draft Brief
3616
-
3617
- Create a draft file at \`docs/briefs/YYYY-MM-DD-topic-draft.md\`. Create the \`docs/briefs/\` directory if it doesn't exist.
3618
-
3619
- Use this format:
3620
-
3621
- \`\`\`markdown
3622
- # [Topic] \u2014 Draft Brief
3623
-
3624
- > **Date:** YYYY-MM-DD
3625
- > **Status:** DRAFT
3626
- > **Origin:** $joycraft-interview session
3627
-
3628
- ---
3629
-
3630
- ## The Idea
3631
- [2-3 paragraphs capturing what the user described \u2014 their words, their framing]
3632
-
3633
- ## Problem
3634
- [What pain or gap this addresses]
3635
-
3636
- ## What "Done" Looks Like
3637
- [The user's description of success \u2014 observable outcomes]
3638
-
3639
- ## Constraints
3640
- - [constraint 1]
3641
- - [constraint 2]
3642
-
3643
- ## Open Questions
3644
- - [things that came up but weren't resolved]
3645
- - [decisions that need more thought]
3646
-
3647
- ## Out of Scope (for now)
3648
- - [things explicitly deferred]
3649
-
3650
- ## Raw Notes
3651
- [Any additional context, quotes, or tangents worth preserving]
3652
- \`\`\`
3653
-
3654
- ### 5. Hand Off
3655
-
3656
- After writing the draft, tell the user:
3657
-
3658
- \`\`\`
3659
- Draft brief saved to docs/briefs/YYYY-MM-DD-topic-draft.md
3660
-
3661
- When you're ready to move forward:
3662
- - $joycraft-new-feature \u2014 formalize this into a full Feature Brief with specs
3663
- - $joycraft-decompose \u2014 break it directly into atomic specs if scope is clear
3664
- - Or just keep brainstorming \u2014 run $joycraft-interview again anytime
3665
- \`\`\`
3666
-
3667
- ## Guidelines
3668
-
3669
- - **This is NOT $joycraft-new-feature.** Do not push toward formal briefs, decomposition tables, or atomic specs. The point is exploration.
3670
- - **Let the user lead.** Your job is to listen, clarify, and capture \u2014 not to structure or direct.
3671
- - **Mark everything as DRAFT.** The output is a starting point, not a commitment.
3672
- - **Keep it short.** The draft brief should be 1-2 pages max. Capture the essence, not every detail.
3673
- - **Multiple interviews are fine.** The user might run this several times as their thinking evolves. Each creates a new dated draft.
3674
- `,
3675
- "joycraft-lockdown.md": `---
3676
- name: joycraft-lockdown
3677
- description: Generate constrained execution boundaries for an implementation session -- NEVER rules and deny patterns to prevent agent overreach
3678
- ---
3679
-
3680
- # Lockdown Mode
3681
-
3682
- The user wants to constrain agent behavior for an implementation session. Your job is to interview them about what should be off-limits, then generate AGENTS.md NEVER rules and Codex configuration deny patterns they can review and apply.
3683
-
3684
- ## When Is Lockdown Useful?
3685
-
3686
- Lockdown is most valuable for:
3687
- - **Complex tech stacks** (hardware, firmware, multi-device) where agents can cause real damage
3688
- - **Long-running autonomous sessions** where you won't be monitoring every action
3689
- - **Production-adjacent work** where accidental network calls or package installs are risky
3690
-
3691
- For simple feature work on a well-tested codebase, lockdown is usually overkill. Mention this context to the user so they can decide.
3692
-
3693
- ## Step 1: Check for Tests
3694
-
3695
- Before starting the interview, search the codebase for test files or directories (look for \`tests/\`, \`test/\`, \`__tests__/\`, \`spec/\`, or files matching \`*.test.*\`, \`*.spec.*\`).
3696
-
3697
- If no tests are found, tell the user:
3698
-
3699
- > Lockdown mode is most useful when you already have tests in place -- it prevents the agent from modifying them while constraining behavior to writing code and running tests. Consider running \`$joycraft-new-feature\` first to set up a test-driven workflow, then come back to lock it down.
3700
-
3701
- If the user wants to proceed anyway, continue with the interview.
3702
-
3703
- ## Step 2: Interview -- What to Lock Down
3704
-
3705
- Ask these three questions, one at a time. Wait for the user's response before proceeding to the next question.
3706
-
3707
- ### Question 1: Read-Only Files
3708
-
3709
- > What test files or directories should be off-limits for editing? (e.g., \`tests/\`, \`__tests__/\`, \`spec/\`, specific test files)
3710
- >
3711
- > I'll generate NEVER rules to prevent editing these.
3712
-
3713
- If the user isn't sure, suggest the test directories you found in Step 1.
3714
-
3715
- ### Question 2: Allowed Commands
3716
-
3717
- > What commands should the agent be allowed to run? Defaults:
3718
- > - Write and edit source code files
3719
- > - Run the project's smoke test command
3720
- > - Run the full test suite
3721
- >
3722
- > Any other commands to explicitly allow? Or should I restrict to just these?
3723
-
3724
- ### Question 3: Denied Commands
3725
-
3726
- > What commands should be denied? Defaults:
3727
- > - Package installs (\`npm install\`, \`pip install\`, \`cargo add\`, \`go get\`, etc.)
3728
- > - Network tools (\`curl\`, \`wget\`, \`ping\`, \`ssh\`)
3729
- > - Direct log file reading
3730
- >
3731
- > Any specific commands to add or remove from this list?
3732
-
3733
- **Edge case -- user wants to allow some network access:** If the user mentions API tests or specific endpoints that need network access, exclude those from the deny list and note the exception in the output.
3734
-
3735
- **Edge case -- user wants to lock down file writes:** If the user wants to prevent ALL file writes, warn them:
3736
-
3737
- > Denying all file writes would prevent the agent from doing any work. I recommend keeping source code writes allowed and only locking down test files, config files, or other sensitive directories.
3738
-
3739
- ## Step 3: Generate Boundaries
3740
-
3741
- Based on the interview responses, generate output in this exact format:
3742
-
3743
- \`\`\`
3744
- ## Lockdown boundaries generated
3745
-
3746
- Review these suggestions and add them to your project:
3747
-
3748
- ### AGENTS.md -- add to NEVER section:
3749
-
3750
- - Edit any file in \`[user's test directories]\`
3751
- - Run \`[denied package manager commands]\`
3752
- - Use \`[denied network tools]\`
3753
- - Read log files directly -- interact with logs only through test assertions
3754
- - [Any additional NEVER rules based on user responses]
3755
-
3756
- ### Codex configuration -- suggested deny patterns:
3757
-
3758
- Add these to your Codex sandbox configuration to restrict command execution:
3759
-
3760
- ["[command1]", "[command2]", "[command3]"]
3761
-
3762
- ---
3763
-
3764
- Copy these into your project manually, or tell me to apply them now (I'll show you the exact changes for approval first).
3765
- \`\`\`
3766
-
3767
- Adjust the content based on the actual interview responses:
3768
- - Only include deny patterns for commands the user confirmed should be denied
3769
- - Only include NEVER rules for directories/files the user specified
3770
- - If the user allowed certain network tools or package managers, exclude those
3771
-
3772
- ## Recommended Execution Model
3773
-
3774
- After generating the boundaries above, also recommend a Codex execution configuration. Include this section in your output:
3775
-
3776
- \`\`\`
3777
- ### Recommended Execution Configuration
3778
-
3779
- Codex runs in a sandboxed environment by default. To maximize safety during lockdown:
3780
-
3781
- | Your situation | Configuration | Why |
3782
- |---|---|---|
3783
- | Autonomous spec execution | Sandbox with deny patterns above | Only pre-approved commands run |
3784
- | Long session with some trust | Default sandbox | Network-disabled sandbox prevents external access |
3785
- | Interactive development | Default with manual review | Review outputs before applying |
3786
-
3787
- **For lockdown mode, we recommend the default sandboxed execution** combined with the deny patterns above. Codex's sandbox already disables network access by default -- the deny patterns add file-level and command-level restrictions on top.
3788
-
3789
- If you need network access for specific commands (e.g., API tests), configure explicit network allowances in your Codex setup rather than disabling the sandbox entirely.
3790
- \`\`\`
3791
-
3792
- ## Step 4: Offer to Apply
3793
-
3794
- If the user asks you to apply the changes:
3795
-
3796
- 1. **For AGENTS.md:** Read the existing AGENTS.md, find the Behavioral Boundaries section, and show the user the exact diff for the NEVER section. Ask for confirmation before writing.
3797
- 2. **For Codex configuration:** Show the user what the deny patterns will look like after adding the new restrictions. Ask for confirmation before writing.
3798
-
3799
- **Never auto-apply. Always show the exact changes and wait for explicit approval.**
3800
- `,
3801
- "joycraft-new-feature.md": `---
3802
- name: joycraft-new-feature
3803
- description: Guided feature development \u2014 interview the user, produce a Feature Brief, then decompose into atomic specs
3804
- ---
3805
-
3806
- # New Feature Workflow
3807
-
3808
- You are starting a new feature. Follow this process in order. Do not skip steps.
3809
-
3810
- ## Phase 1: Interview
3811
-
3812
- Interview the user about what they want to build. Let them talk \u2014 your job is to listen, then sharpen.
3813
-
3814
- **Ask about:**
3815
- - What problem does this solve? Who is affected?
3816
- - What does "done" look like?
3817
- - Hard constraints? (business rules, tech limitations, deadlines)
3818
- - What is explicitly NOT in scope? (push hard on this)
3819
- - Edge cases or error conditions?
3820
- - What existing code/patterns should this follow?
3821
- - Testing: existing setup? framework? smoke test budget? lockdown mode desired?
3822
-
3823
- **Interview technique:**
3824
- - Let the user "yap" \u2014 don't interrupt their flow
3825
- - Play back your understanding: "So if I'm hearing you right..."
3826
- - Push toward testable statements: "How would we verify that works?"
3827
-
3828
- Keep asking until you can fill out a Feature Brief.
3829
-
3830
- ## Phase 2: Feature Brief
3831
-
3832
- Write a Feature Brief to \`docs/briefs/YYYY-MM-DD-feature-name.md\`. Create the \`docs/briefs/\` directory if it doesn't exist.
3833
-
3834
- **Why:** The brief is the single source of truth for what we're building. It prevents scope creep and gives every spec a shared reference point.
3835
-
3836
- Use this structure:
3837
-
3838
- \`\`\`markdown
3839
- # [Feature Name] \u2014 Feature Brief
3840
-
3841
- > **Date:** YYYY-MM-DD
3842
- > **Project:** [project name]
3843
- > **Status:** Interview | Decomposing | Specs Ready | In Progress | Complete
3844
-
3845
- ---
3846
-
3847
- ## Vision
3848
- What are we building and why? The full picture in 2-4 paragraphs.
3849
-
3850
- ## User Stories
3851
- - As a [role], I want [capability] so that [benefit]
3852
-
3853
- ## Hard Constraints
3854
- - MUST: [constraint that every spec must respect]
3855
- - MUST NOT: [prohibition that every spec must respect]
3856
-
3857
- ## Out of Scope
3858
- - NOT: [tempting but deferred]
3859
-
3860
- ## Test Strategy
3861
- - **Existing setup:** [framework and tools, or "none yet"]
3862
- - **User expertise:** [comfortable / learning / needs guidance]
3863
- - **Test types:** [smoke, unit, integration, e2e, etc.]
3864
- - **Smoke test budget:** [target time for fast-feedback tests]
3865
- - **Lockdown mode:** [yes/no \u2014 constrain agent to code + tests only]
3866
-
3867
- ## Decomposition
3868
- | # | Spec Name | Description | Dependencies | Est. Size |
3869
- |---|-----------|-------------|--------------|-----------|
3870
- | 1 | [verb-object] | [one sentence] | None | [S/M/L] |
3871
-
3872
- ## Execution Strategy
3873
- - [ ] Sequential (specs have chain dependencies)
3874
- - [ ] Parallel (specs are independent)
3875
- - [ ] Mixed
3876
-
3877
- ## Success Criteria
3878
- - [ ] [End-to-end behavior 1]
3879
- - [ ] [No regressions in existing features]
3880
- \`\`\`
3881
-
3882
- If \`docs/templates/FEATURE_BRIEF_TEMPLATE.md\` exists, reference it for the full template with additional guidance.
3883
-
3884
- Present the brief to the user. Focus review on:
3885
- - "Does the decomposition match how you think about this?"
3886
- - "Is anything in scope that shouldn't be?"
3887
- - "Are the specs small enough? Can each be described in one sentence?"
3888
-
3889
- Iterate until approved.
3890
-
3891
- ## Phase 3: Generate Atomic Specs
3892
-
3893
- For each row in the decomposition table, create a self-contained spec file at \`docs/specs/YYYY-MM-DD-spec-name.md\`. Create the \`docs/specs/\` directory if it doesn't exist.
3894
-
3895
- **Why:** Each spec must be understandable WITHOUT reading the Feature Brief. This prevents the "Curse of Instructions" \u2014 no spec should require holding the entire feature in context. Copy relevant context into each spec.
3896
-
3897
- Use this structure for each spec:
3898
-
3899
- \`\`\`markdown
3900
- # [Verb + Object] \u2014 Atomic Spec
3901
-
3902
- > **Parent Brief:** \`docs/briefs/YYYY-MM-DD-feature-name.md\`
3903
- > **Status:** Ready
3904
- > **Date:** YYYY-MM-DD
3905
- > **Estimated scope:** [1 session / N files / ~N lines]
3906
-
3907
- ---
3908
-
3909
- ## What
3910
- One paragraph \u2014 what changes when this spec is done?
3911
-
3912
- ## Why
3913
- One sentence \u2014 what breaks or is missing without this?
3914
-
3915
- ## Acceptance Criteria
3916
- - [ ] [Observable behavior]
3917
- - [ ] Build passes
3918
- - [ ] Tests pass
3919
-
3920
- ## Test Plan
3921
-
3922
- | Acceptance Criterion | Test | Type |
3923
- |---------------------|------|------|
3924
- | [Each AC above] | [What to call/assert] | [unit/integration/e2e] |
3925
-
3926
- **Execution order:**
3927
- 1. Write all tests above \u2014 they should fail against current/stubbed code
3928
- 2. Run tests to confirm they fail (red)
3929
- 3. Implement until all tests pass (green)
3930
-
3931
- **Smoke test:** [Identify the fastest test for iteration feedback]
3932
-
3933
- **Before implementing, verify your test harness:**
3934
- 1. Run all tests \u2014 they must FAIL (if they pass, you're testing the wrong thing)
3935
- 2. Each test calls your actual function/endpoint \u2014 not a reimplementation or the underlying library
3936
- 3. Identify your smoke test \u2014 it must run in seconds, not minutes, so you get fast feedback on each change
3937
-
3938
- ## Constraints
3939
- - MUST: [hard requirement]
3940
- - MUST NOT: [hard prohibition]
3941
-
3942
- ## Affected Files
3943
- | Action | File | What Changes |
3944
- |--------|------|-------------|
3945
-
3946
- ## Approach
3947
- Strategy, data flow, key decisions. Name one rejected alternative.
3948
-
3949
- ## Edge Cases
3950
- | Scenario | Expected Behavior |
3951
- |----------|------------------|
3952
- \`\`\`
3953
-
3954
- If \`docs/templates/ATOMIC_SPEC_TEMPLATE.md\` exists, reference it for the full template with additional guidance.
3955
-
3956
- ## Phase 4: Hand Off for Execution
3957
-
3958
- Tell the user:
3959
- \`\`\`
3960
- Feature Brief and [N] atomic specs are ready.
3961
-
3962
- Specs:
3963
- 1. [spec-name] \u2014 [one sentence] [S/M/L]
3964
- 2. [spec-name] \u2014 [one sentence] [S/M/L]
3965
- ...
3966
-
3967
- Recommended execution:
3968
- - [Parallel/Sequential/Mixed strategy]
3969
- - Estimated: [N] sessions total
3970
-
3971
- To execute: Start a fresh session per spec. Each session should:
3972
- 1. Read the spec
3973
- 2. Implement
3974
- 3. Run $joycraft-session-end to capture discoveries
3975
- 4. Commit and PR
3976
-
3977
- Ready to start?
3978
- \`\`\`
3979
-
3980
- **Why:** A fresh session for execution produces better results. The interview session has too much context noise \u2014 a clean session with just the spec is more focused.
3981
-
3982
- You can also use \`$joycraft-decompose\` to re-decompose a brief if the breakdown needs adjustment, or run \`$joycraft-interview\` first for a lighter brainstorm before committing to the full workflow.
3983
- `,
3984
- "joycraft-research.md": `---
3985
- name: joycraft-research
3986
- description: Produce objective codebase research by isolating question generation from fact-gathering \u2014 subagent sees only questions, never the brief
3987
- ---
3988
-
3989
- # Research Codebase for a Feature
3990
-
3991
- You are producing objective codebase research to inform a future spec or implementation. The key insight: the researching agent must never see the brief or ticket \u2014 only research questions. This prevents opinions from contaminating the facts.
3992
-
3993
- **Guard clause:** If the user doesn't provide a brief path or inline description, ask:
3994
- "What feature or change are you researching? Provide a brief path or describe it."
3995
-
3996
- ---
3997
-
3998
- ## Phase 1: Generate Research Questions
3999
-
4000
- Read the brief and identify which zones of the codebase are relevant. Generate 5-10 research questions that are:
4001
- - **Objective and fact-seeking** \u2014 "How does X work?" not "How should we build X?"
4002
- - **Specific to the codebase**
4003
- - **Answerable by reading code**
4004
-
4005
- Write the questions to \`docs/research/.questions-tmp.md\`. **Do NOT include any content from the brief.**
4006
-
4007
- ---
4008
-
4009
- ## Phase 2: Spawn Research Subagent
4010
-
4011
- Spawn a subagent to perform the research. Pass ONLY the research questions \u2014 never the brief.
4012
-
4013
- Subagent prompt:
4014
- \`\`\`
4015
- You are researching a codebase to answer specific questions. You have NO context about why these questions are being asked.
4016
-
4017
- RULES:
4018
- - Answer each question with FACTS ONLY: file paths, function signatures, data flows, patterns, dependencies
4019
- - Do NOT recommend, suggest, or opine
4020
- - Do NOT speculate about what should be built
4021
- - If a question cannot be answered, say "No existing code found for this"
4022
- - Search the codebase and read files thoroughly
4023
- - Include code snippets only when they are essential evidence
4024
-
4025
- QUESTIONS:
4026
- [INSERT_QUESTIONS_HERE]
4027
-
4028
- OUTPUT FORMAT:
4029
-
4030
- # Codebase Research
4031
-
4032
- **Date:** [today]
4033
- **Questions answered:** [N/total]
4034
-
4035
- ---
4036
-
4037
- ## Q1: [question]
4038
- [Facts only]
4039
-
4040
- ## Q2: [question]
4041
- [Facts only]
4042
- \`\`\`
4043
-
4044
- ## Phase 3: Write the Research Document
4045
-
4046
- Write the subagent's response to \`docs/research/YYYY-MM-DD-feature-name.md\`. Delete the temporary questions file.
4047
-
4048
- Present:
4049
- \`\`\`
4050
- Research complete: docs/research/YYYY-MM-DD-feature-name.md
4051
-
4052
- This document contains objective facts \u2014 no opinions or recommendations.
4053
-
4054
- Next steps:
4055
- - $joycraft-decompose \u2014 break the feature into atomic specs
4056
- - $joycraft-new-feature \u2014 formalize into a full Feature Brief first
4057
- - Read the research and add corrections manually
4058
- \`\`\`
4059
- `,
4060
- "joycraft-session-end.md": `---
4061
- name: joycraft-session-end
4062
- description: Wrap up a session \u2014 capture discoveries, verify, prepare for PR or next session
4063
- ---
4064
-
4065
- # Session Wrap-Up
4066
-
4067
- Before ending this session, complete these steps in order.
4068
-
4069
- ## 1. Capture Discoveries
4070
-
4071
- **Why:** Discoveries are the surprises \u2014 things that weren't in the spec or that contradicted expectations. They prevent future sessions from hitting the same walls.
4072
-
4073
- Check: did anything surprising happen during this session? If yes, create or update a discovery file at \`docs/discoveries/YYYY-MM-DD-topic.md\`. Create the \`docs/discoveries/\` directory if it doesn't exist.
4074
-
4075
- Only capture what's NOT obvious from the code or git diff:
4076
- - "We thought X but found Y" \u2014 assumptions that were wrong
4077
- - "This API/library behaves differently than documented" \u2014 external gotchas
4078
- - "This edge case needs handling in a future spec" \u2014 deferred work with context
4079
- - "The approach in the spec didn't work because..." \u2014 spec-vs-reality gaps
4080
- - Key decisions made during implementation that aren't in the spec
4081
-
4082
- **Do NOT capture:**
4083
- - Files changed (that's the diff)
4084
- - What you set out to do (that's the spec)
4085
- - Step-by-step narrative of the session (nobody re-reads these)
4086
-
4087
- Use this format:
4088
-
4089
- \`\`\`markdown
4090
- # Discoveries \u2014 [topic]
4091
-
4092
- **Date:** YYYY-MM-DD
4093
- **Spec:** [link to spec if applicable]
4094
-
4095
- ## [Discovery title]
4096
- **Expected:** [what we thought would happen]
4097
- **Actual:** [what actually happened]
4098
- **Impact:** [what this means for future work]
4099
- \`\`\`
4100
-
4101
- If nothing surprising happened, skip the discovery file entirely. No discovery is a good sign \u2014 the spec was accurate.
4102
-
4103
- ## 1b. Update Context Documents
4104
-
4105
- If \`docs/context/\` exists, quickly check whether this session revealed anything about:
4106
-
4107
- - **Production risks** \u2014 did you interact with or learn about production vs staging systems? Update \`docs/context/production-map.md\`
4108
- - **Wrong assumptions** \u2014 did you assume something that turned out to be false? Update \`docs/context/dangerous-assumptions.md\`
4109
- - **Key decisions** \u2014 did you make an architectural or tooling choice? Add a row to \`docs/context/decision-log.md\`
4110
- - **Unwritten rules** \u2014 did you discover a convention or constraint not documented anywhere? Update \`docs/context/institutional-knowledge.md\`
4111
-
4112
- Skip this if nothing applies. Don't force it \u2014 only update when there's genuine new context.
4113
-
4114
- ## 2. Run Validation
4115
-
4116
- Run the project's validation commands. Check CLAUDE.md or AGENTS.md for project-specific commands. Common checks:
4117
-
4118
- - Type-check (e.g., \`tsc --noEmit\`, \`mypy\`, \`cargo check\`)
4119
- - Tests (e.g., \`npm test\`, \`pytest\`, \`cargo test\`)
4120
- - Lint (e.g., \`eslint\`, \`ruff\`, \`cargo clippy\`)
4121
-
4122
- Fix any failures before proceeding.
4123
-
4124
- ## 3. Update Spec Status
4125
-
4126
- If working from an atomic spec in \`docs/specs/\`:
4127
- - All acceptance criteria met \u2014 update status to \`Complete\`
4128
- - Partially done \u2014 update status to \`In Progress\`, note what's left
4129
-
4130
- If working from a Feature Brief in \`docs/briefs/\`, check off completed specs in the decomposition table.
4131
-
4132
- ## 4. Commit
4133
-
4134
- Commit all changes including the discovery file (if created) and spec status updates. The commit message should reference the spec if applicable.
4135
-
4136
- ## 5. Push and PR (if autonomous git is enabled)
4137
-
4138
- **Check CLAUDE.md or AGENTS.md for "Git Autonomy" in the Behavioral Boundaries section.** If it says "STRICTLY ENFORCED" or the ALWAYS section includes "Push to feature branches immediately after every commit":
4139
-
4140
- 1. **Push immediately.** Run \`git push origin <branch>\` \u2014 do not ask, do not hesitate.
4141
- 2. **Open a PR if the feature is complete.** Check the parent Feature Brief's decomposition table \u2014 if all specs are done, run \`gh pr create\` with a summary of all completed specs. Do not ask first.
4142
- 3. **If not all specs are done,** still push. The PR comes when the last spec is complete.
4143
-
4144
- If CLAUDE.md or AGENTS.md does NOT have autonomous git rules (or has "ASK FIRST" for pushing), ask the user before pushing.
4145
-
4146
- ## 6. Report
4147
-
4148
- \`\`\`
4149
- Session complete.
4150
- - Spec: [spec name] \u2014 [Complete / In Progress]
4151
- - Build: [passing / failing]
4152
- - Discoveries: [N items / none]
4153
- - Pushed: [yes / no \u2014 and why not]
4154
- - PR: [opened #N / not yet \u2014 N specs remaining]
4155
- - Next: [what the next session should tackle]
4156
- \`\`\`
4157
- `,
4158
- "joycraft-tune.md": `---
4159
- name: joycraft-tune
4160
- description: Assess and upgrade your project's AI development harness \u2014 score 7 dimensions, apply fixes, show path to Level 5
4161
- ---
4162
-
4163
- # Tune \u2014 Project Harness Assessment & Upgrade
4164
-
4165
- You are evaluating and upgrading this project's AI development harness.
4166
-
4167
- ## Step 1: Detect Harness State
4168
-
4169
- Search the codebase for: CLAUDE.md (with meaningful content), \`docs/specs/\`, \`docs/briefs/\`, \`docs/discoveries/\`, \`.agents/skills/\`, and test configuration.
4170
-
4171
- ## Step 2: Route
4172
-
4173
- - **No harness** (no CLAUDE.md or just a README): Recommend \`npx joycraft init\` and stop.
4174
- - **Harness exists**: Continue to assessment.
4175
-
4176
- ## Step 3: Assess \u2014 Score 7 Dimensions (1-5 scale)
4177
-
4178
- Read CLAUDE.md and explore the project. Score each with specific evidence:
4179
-
4180
- | Dimension | What to Check |
4181
- |-----------|--------------|
4182
- | Spec Quality | \`docs/specs/\` \u2014 structured? acceptance criteria? self-contained? |
4183
- | Spec Granularity | Can each spec be done in one session? |
4184
- | Behavioral Boundaries | ALWAYS/ASK FIRST/NEVER sections (or equivalent rules under any heading) |
4185
- | Skills & Hooks | \`.agents/skills/\` files, hooks config |
4186
- | Documentation | \`docs/\` structure, templates, referenced from CLAUDE.md |
4187
- | Knowledge Capture | \`docs/discoveries/\`, \`docs/context/*.md\` \u2014 existence AND real content |
4188
- | Testing & Validation | Test framework, CI pipeline, validation commands in CLAUDE.md |
4189
-
4190
- Score 1 = absent, 3 = partially there, 5 = comprehensive. Give credit for substance over format.
4191
-
4192
- ## Step 4: Write Assessment
4193
-
4194
- Write to \`docs/joycraft-assessment.md\` AND display it. Include: scores table, detailed findings (evidence + gap + recommendation per dimension), and an upgrade plan (up to 5 actions ordered by impact).
4195
-
4196
- ## Step 5: Apply Upgrades
4197
-
4198
- Apply using three tiers \u2014 do NOT ask per-item permission:
4199
-
4200
- **Tier 1 (silent):** Create missing dirs, install missing skills, copy missing templates, create AGENTS.md.
4201
-
4202
- **Before Tier 2, ask TWO things:**
4203
-
4204
- 1. **Git autonomy:** Cautious (ask before push/PR) or Autonomous (push + PR without asking)?
4205
- 2. **Risk interview (3-5 questions, one at a time):** What could break? What services connect to prod? Unwritten rules? Off-limits files/commands? Skip if \`docs/context/\` already has content.
4206
-
4207
- From answers, generate: CLAUDE.md boundary rules, deny patterns configuration, \`docs/context/\` documents. Also recommend a permission mode (\`auto\` for most; \`dontAsk\` + allowlist for high-risk).
4208
-
4209
- **Tier 2 (show diff):** Add missing CLAUDE.md sections (Boundaries, Workflow, Key Files). Draft from real codebase content. Append only \u2014 never reformat existing content.
4210
-
4211
- **Tier 3 (confirm first):** Rewriting existing sections, overwriting customized files, suggesting test framework installs.
4212
-
4213
- After applying, append to \`docs/joycraft-history.md\` and show a consolidated upgrade results table.
4214
-
4215
- ## Step 6: Show Path to Level 5
4216
-
4217
- Show a tailored roadmap: Level 2-5 table, specific next steps based on actual gaps, and the Level 5 north star (spec queue, autofix, holdout scenarios, self-improving harness).
4218
-
4219
- ## Edge Cases
4220
-
4221
- - **CLAUDE.md is just a README:** Treat as no harness.
4222
- - **Non-Joycraft skills:** Acknowledge, don't replace.
4223
- - **Rules under non-standard headings:** Give credit for substance.
4224
- - **Previous assessment exists:** Read it first. If nothing to upgrade, say so.
4225
- - **Non-Joycraft content in CLAUDE.md:** Preserve as-is. Only append.
4226
- `,
4227
- "joycraft-verify.md": `---
4228
- name: joycraft-verify
4229
- description: Spawn an independent verifier subagent to check an implementation against its spec -- read-only, no code edits, structured pass/fail verdict
4230
- ---
4231
-
4232
- # Verify Implementation Against Spec
4233
-
4234
- The user wants independent verification of an implementation. Your job is to find the relevant spec, extract its acceptance criteria and test plan, then spawn a separate verifier subagent that checks each criterion and produces a structured verdict.
4235
-
4236
- **Why a separate subagent?** Research found that agents reliably skew positive when grading their own work. Separating the agent doing the work from the agent judging it consistently outperforms self-evaluation. The verifier gets a clean context window with no implementation bias.
4237
-
4238
- ## Step 1: Find the Spec
4239
-
4240
- If the user provided a spec path (e.g., \`$joycraft-verify docs/specs/2026-03-26-add-widget.md\`), use that path directly.
4241
-
4242
- If no path was provided, scan \`docs/specs/\` for spec files. Pick the most recently modified \`.md\` file in that directory. If \`docs/specs/\` doesn't exist or is empty, tell the user:
4243
-
4244
- > No specs found in \`docs/specs/\`. Please provide a spec path: \`$joycraft-verify path/to/spec.md\`
4245
-
4246
- ## Step 2: Read and Parse the Spec
4247
-
4248
- Read the spec file and extract:
4249
-
4250
- 1. **Spec name** -- from the H1 title
4251
- 2. **Acceptance Criteria** -- the checklist under the \`## Acceptance Criteria\` section
4252
- 3. **Test Plan** -- the table under the \`## Test Plan\` section, including any test commands
4253
- 4. **Constraints** -- the \`## Constraints\` section if present
4254
-
4255
- If the spec has no Acceptance Criteria section, tell the user:
4256
-
4257
- > This spec doesn't have an Acceptance Criteria section. Verification needs criteria to check against. Add acceptance criteria to the spec and try again.
4258
-
4259
- If the spec has no Test Plan section, note this but proceed -- the verifier can still check criteria by reading code and running any available project tests.
4260
-
4261
- ## Step 3: Identify Test Commands
4262
-
4263
- Look for test commands in these locations (in priority order):
4264
-
4265
- 1. The spec's Test Plan section (look for commands in backticks or "Type" column entries like "unit", "integration", "e2e", "build")
4266
- 2. The project's CLAUDE.md or AGENTS.md (look for test/build commands in the Development Workflow section)
4267
- 3. Common defaults based on the project type:
4268
-    - Node.js: \`npm test\` or \`pnpm test --run\`
4269
-    - Python: \`pytest\`
4270
-    - Rust: \`cargo test\`
4271
-    - Go: \`go test ./...\`
4272
-
4273
- Build a list of specific commands the verifier should run.
4274
-
4275
- ## Step 4: Spawn the Verifier Subagent
4276
-
4277
- Spawn a concurrent subagent thread with the following prompt. Replace the placeholders with the actual content extracted in Steps 2-3.
4278
-
4279
- **Important:** The subagent must be given read-only constraints. It may search the codebase, read files, and run the specified test/build commands, but it must NOT edit or create any files.
4280
-
4281
- \`\`\`
4282
- You are a QA verifier. Your job is to independently verify an implementation against its spec. You have NO context about how the implementation was done -- you are checking it fresh.
4283
-
4284
- RULES -- these are hard constraints, not suggestions:
4285
- - You may search the codebase and read any file
4286
- - You may RUN these specific test/build commands: [TEST_COMMANDS]
4287
- - You may NOT edit, create, or delete any files
4288
- - You may NOT run commands that modify state (no git commit, no npm install, no file writes)
4289
- - You may NOT install packages or access the network
4290
- - Report what you OBSERVE, not what you expect or hope
4291
-
4292
- SPEC NAME: [SPEC_NAME]
4293
-
4294
- ACCEPTANCE CRITERIA:
4295
- [ACCEPTANCE_CRITERIA]
4296
-
4297
- TEST PLAN:
4298
- [TEST_PLAN]
4299
-
4300
- CONSTRAINTS:
4301
- [CONSTRAINTS_OR_NONE]
4302
-
4303
- YOUR TASK:
4304
- For each acceptance criterion, determine if it PASSES or FAILS based on evidence:
4305
-
4306
- 1. Run the test commands listed above. Record the output.
4307
- 2. For each acceptance criterion:
4308
-    a. Check if there is a corresponding test and whether it passes
4309
-    b. If no test exists, read the relevant source files to verify the criterion is met
4310
-    c. If the criterion cannot be verified by reading code or running tests, mark it MANUAL CHECK NEEDED
4311
- 3. For criteria about build/test passing, actually run the commands and report results.
4312
-
4313
- OUTPUT FORMAT -- you MUST use this exact format:
4314
-
4315
- VERIFICATION REPORT
4316
-
4317
- | # | Criterion | Verdict | Evidence |
4318
- |---|-----------|---------|----------|
4319
- | 1 | [criterion text] | PASS/FAIL/MANUAL CHECK NEEDED | [what you observed] |
4320
- | 2 | [criterion text] | PASS/FAIL/MANUAL CHECK NEEDED | [what you observed] |
4321
- [continue for all criteria]
4322
-
4323
- SUMMARY: X/Y criteria passed. [Z failures need attention. / All criteria verified.]
4324
-
4325
- If any test commands fail to run (missing dependencies, wrong command, etc.), report the error as evidence for a FAIL verdict on the relevant criterion.
4326
- \`\`\`
4327
-
4328
- ## Step 5: Format and Present the Verdict
4329
-
4330
- Take the subagent's response and present it to the user in this format:
4331
-
4332
- \`\`\`
4333
- ## Verification Report -- [Spec Name]
4334
-
4335
- | # | Criterion | Verdict | Evidence |
4336
- |---|-----------|---------|----------|
4337
- | 1 | ... | PASS | ... |
4338
- | 2 | ... | FAIL | ... |
4339
-
4340
- **Overall: X/Y criteria passed.**
4341
-
4342
- [If all passed:]
4343
- All criteria verified. Ready to commit and open a PR.
4344
-
4345
- [If any failed:]
4346
- N failures need attention. Review the evidence above and fix before proceeding.
4347
-
4348
- [If any MANUAL CHECK NEEDED:]
4349
- N criteria need manual verification -- they can't be checked by reading code or running tests alone.
4350
- \`\`\`
4351
-
4352
- ## Step 6: Suggest Next Steps
4353
-
4354
- Based on the verdict:
4355
-
4356
- - **All PASS:** Suggest committing and opening a PR, or running \`$joycraft-session-end\` to capture discoveries.
4357
- - **Some FAIL:** List the failed criteria and suggest the user fix them, then run \`$joycraft-verify\` again.
4358
- - **MANUAL CHECK NEEDED items:** Explain what needs human eyes and why automation couldn't verify it.
4359
-
4360
- **Do NOT offer to fix failures yourself.** The verifier reports; the human (or implementation agent in a separate turn) decides what to do. This separation is the whole point.
4361
-
4362
- ## Edge Cases
4363
-
4364
- | Scenario | Behavior |
4365
- |----------|----------|
4366
- | Spec has no Test Plan | Warn that verification is weaker without a test plan, but proceed by checking criteria through code reading and any available project-level tests |
4367
- | All tests pass but a criterion is not testable | Mark as MANUAL CHECK NEEDED with explanation |
4368
- | Subagent can't run tests (missing deps) | Report the error as FAIL evidence |
4369
- | No specs found and no path given | Tell user to provide a spec path or create a spec first |
4370
- | Spec status is "Complete" | Still run verification -- "Complete" means the implementer thinks it's done, verification confirms |
4371
- `
4372
- };
4373
-
4374
- export {
4375
- SKILLS,
4376
- TEMPLATES,
4377
- CODEX_SKILLS
4378
- };
4379
- //# sourceMappingURL=chunk-QU5VHXMV.js.map
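
The removed chunk exported `SKILLS` as a map from file name to a markdown string that opens with a `---` frontmatter block (`name`, `description`). As a rough illustration of how a consumer might read those fields, here is a minimal sketch. It is not the package's own code: `parseSkillFrontmatter` and `listSkills` are hypothetical helper names, and the `SKILLS` object below is a shortened stand-in for the real export.

```javascript
// Shortened, hypothetical stand-in for the exported SKILLS map shown in the diff.
const SKILLS = {
  "joycraft-verify.md": `---
name: joycraft-verify
description: Spawn an independent verifier subagent to check an implementation against its spec
---

# Verify Implementation Against Spec
`,
};

// Hypothetical helper: collect the `key: value` lines between the two `---` fences.
// Assumes simple single-line fields, which is all the skills in this chunk use.
function parseSkillFrontmatter(markdown) {
  const match = /^---\n([\s\S]*?)\n---\n/.exec(markdown);
  if (!match) return {}; // no frontmatter block at the top of the document
  const fields = {};
  for (const line of match[1].split("\n")) {
    const idx = line.indexOf(":"); // split on the first colon only
    if (idx === -1) continue;
    fields[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return fields;
}

// Hypothetical helper: one summary record per bundled skill file.
function listSkills(skills) {
  return Object.entries(skills).map(([file, body]) => ({
    file,
    ...parseSkillFrontmatter(body),
  }));
}

console.log(listSkills(SKILLS));
```

Splitting on the first colon only keeps descriptions that themselves contain a colon intact; a real consumer would more likely use a YAML parser.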