joycraft 0.4.0 → 0.5.2

@@ -0,0 +1,2297 @@
1
+ #!/usr/bin/env node
2
+
3
+ // src/bundled-files.ts
4
+ var SKILLS = {
5
+ "joycraft-decompose.md": `---
6
+ name: joycraft-decompose
7
+ description: Break a feature brief into atomic specs \u2014 small, testable, independently executable units
8
+ ---
9
+
10
+ # Decompose Feature into Atomic Specs
11
+
12
+ You have a Feature Brief (or the user has described a feature). Your job is to decompose it into atomic specs that can be executed independently \u2014 one spec per session.
13
+
14
+ ## Step 1: Verify the Brief Exists
15
+
16
+ Look for a Feature Brief in \`docs/briefs/\`. If one doesn't exist yet, tell the user:
17
+
18
+ > No feature brief found. Run \`/joycraft-new-feature\` first to interview and create one, or describe the feature now and I'll work from your description.
19
+
20
+ If the user describes the feature inline, work from that description directly. You don't need a formal brief to decompose \u2014 but recommend creating one for complex features.
21
+
22
+ ## Step 2: Identify Natural Boundaries
23
+
24
+ **Why:** Good boundaries make specs independently testable and committable. Bad boundaries create specs that can't be verified without other specs also being done.
25
+
26
+ Read the brief (or description) and identify natural split points:
27
+
28
+ - **Data layer changes** (schemas, types, migrations) \u2014 always a separate spec
29
+ - **Pure functions / business logic** \u2014 separate from I/O
30
+ - **UI components** \u2014 separate from data fetching
31
+ - **API endpoints / route handlers** \u2014 separate from business logic
32
+ - **Test infrastructure** (mocks, fixtures, helpers) \u2014 can be its own spec if substantial
33
+ - **Configuration / environment** \u2014 separate from code changes
34
+
35
+ Ask yourself: "Can this piece be committed and tested without the other pieces existing?" If yes, it's a good boundary.
36
+
37
+ ## Step 3: Build the Decomposition Table
38
+
39
+ For each atomic spec, define:
40
+
41
+ | # | Spec Name | Description | Dependencies | Size |
42
+ |---|-----------|-------------|--------------|------|
43
+
44
+ **Rules:**
45
+ - Each spec name is \`verb-object\` format (e.g., \`add-terminal-detection\`, \`extract-prompt-module\`)
46
+ - Each description is ONE sentence \u2014 if you need two, the spec is too big
47
+ - Dependencies reference other spec numbers \u2014 keep the dependency graph shallow
48
+ - More than 2 dependencies on a single spec = it's too big, split further
49
+ - Aim for 3-7 specs per feature. Fewer than 3 = probably not decomposed enough. More than 10 = the feature brief is too big
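The sizing rules above can be sketched as a quick lint pass over the decomposition table. This is a minimal illustration, not part of Joycraft — the `SpecRow` shape and `lintDecomposition` helper are hypothetical, and the one-sentence check is a rough punctuation heuristic:

```typescript
// Hypothetical shape for one row of the decomposition table.
interface SpecRow {
  name: string;         // verb-object, e.g. "add-terminal-detection"
  description: string;  // one sentence
  dependencies: number[];
}

// Flag rows that break the sizing rules: non-verb-object names,
// multi-sentence descriptions, more than 2 dependencies, or a
// spec count outside the 3-7 sweet spot.
function lintDecomposition(rows: SpecRow[]): string[] {
  const problems: string[] = [];
  if (rows.length < 3) problems.push("probably not decomposed enough");
  if (rows.length > 10) problems.push("feature brief is too big");
  rows.forEach((row, i) => {
    if (!/^[a-z]+(-[a-z0-9]+)+$/.test(row.name))
      problems.push(`spec ${i + 1}: name is not verb-object format`);
    // Crude heuristic: more than one sentence-ending mark = too big.
    if ((row.description.match(/[.!?]/g) ?? []).length > 1)
      problems.push(`spec ${i + 1}: description is more than one sentence`);
    if (row.dependencies.length > 2)
      problems.push(`spec ${i + 1}: too many dependencies \u2014 split further`);
  });
  return problems;
}
```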
50
+
51
+ ## Step 4: Present and Iterate
52
+
53
+ Show the decomposition table to the user. Ask:
54
+ 1. "Does this breakdown match how you think about this feature?"
55
+ 2. "Are there any specs that feel too big or too small?"
56
+ 3. "Should any of these run in parallel (separate worktrees)?"
57
+
58
+ Iterate until the user approves.
59
+
60
+ ## Step 5: Generate Atomic Specs
61
+
62
+ For each approved row, create \`docs/specs/YYYY-MM-DD-spec-name.md\`. Create the \`docs/specs/\` directory if it doesn't exist.
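The dated filename can be derived mechanically — a minimal sketch; `specPath` is an illustrative helper, not a Joycraft API:

```typescript
// Build docs/specs/YYYY-MM-DD-spec-name.md from a spec name and a date.
function specPath(specName: string, date: Date): string {
  const stamp = date.toISOString().slice(0, 10); // YYYY-MM-DD
  return `docs/specs/${stamp}-${specName}.md`;
}
```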
63
+
64
+ **Why:** Each spec must be self-contained \u2014 a fresh Claude session should be able to execute it without reading the Feature Brief. Copy relevant constraints and context into each spec.
65
+
66
+ Use this structure:
67
+
68
+ \`\`\`markdown
69
+ # [Verb + Object] \u2014 Atomic Spec
70
+
71
+ > **Parent Brief:** \`docs/briefs/YYYY-MM-DD-feature-name.md\` (or "standalone")
72
+ > **Status:** Ready
73
+ > **Date:** YYYY-MM-DD
74
+ > **Estimated scope:** [1 session / N files / ~N lines]
75
+
76
+ ---
77
+
78
+ ## What
79
+ One paragraph \u2014 what changes when this spec is done?
80
+
81
+ ## Why
82
+ One sentence \u2014 what breaks or is missing without this?
83
+
84
+ ## Acceptance Criteria
85
+ - [ ] [Observable behavior]
86
+ - [ ] Build passes
87
+ - [ ] Tests pass
88
+
89
+ ## Constraints
90
+ - MUST: [hard requirement]
91
+ - MUST NOT: [hard prohibition]
92
+
93
+ ## Affected Files
94
+ | Action | File | What Changes |
95
+ |--------|------|-------------|
96
+
97
+ ## Approach
98
+ Strategy, data flow, key decisions. Name one rejected alternative.
99
+
100
+ ## Edge Cases
101
+ | Scenario | Expected Behavior |
102
+ |----------|------------------|
103
+ \`\`\`
104
+
105
+ If \`docs/templates/ATOMIC_SPEC_TEMPLATE.md\` exists, reference it for the full template with additional guidance.
106
+
107
+ Fill in all sections \u2014 each spec must be self-contained (no "see the brief for context"). Copy relevant constraints from the Feature Brief into each spec. Write acceptance criteria specific to THIS spec, not the whole feature.
108
+
109
+ ## Step 6: Recommend Execution Strategy
110
+
111
+ Based on the dependency graph:
112
+ - **Independent specs** \u2014 "These can run in parallel worktrees"
113
+ - **Sequential specs** \u2014 "Execute these in order: 1 -> 2 -> 4"
114
+ - **Mixed** \u2014 "Start specs 1 and 3 in parallel. After 1 completes, start 2."
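The strategy above falls out of the dependency graph directly: group specs into "waves" where every dependency is satisfied by an earlier wave, and each wave can run in parallel worktrees. A sketch under that assumption — `executionWaves` is illustrative, not part of Joycraft:

```typescript
// Group specs into execution waves. deps maps each spec number to the
// spec numbers it depends on. Every spec in a wave has all of its
// dependencies completed by earlier waves, so a wave can run in parallel.
function executionWaves(deps: Map<number, number[]>): number[][] {
  const done = new Set<number>();
  const waves: number[][] = [];
  while (done.size < deps.size) {
    const wave = [...deps.keys()].filter(
      (id) => !done.has(id) && (deps.get(id) ?? []).every((d) => done.has(d))
    );
    // If nothing is runnable but specs remain, the graph has a cycle.
    if (wave.length === 0) throw new Error("dependency cycle in decomposition");
    wave.forEach((id) => done.add(id));
    waves.push(wave);
  }
  return waves;
}
```

One wave means "all parallel"; one spec per wave means "fully sequential"; anything else is the mixed case.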
115
+
116
+ Update the Feature Brief's Execution Strategy section with the plan (if a brief exists).
117
+
118
+ ## Step 7: Hand Off
119
+
120
+ Tell the user:
121
+ \`\`\`
122
+ Decomposition complete:
123
+ - [N] atomic specs created in docs/specs/
124
+ - [N] can run in parallel, [N] are sequential
125
+ - Estimated total: [N] sessions
126
+
127
+ To execute:
128
+ - Sequential: Open a session, point Claude at each spec in order
129
+ - Parallel: Use worktrees \u2014 one spec per worktree, merge when done
130
+ - Each session should end with /joycraft-session-end to capture discoveries
131
+
132
+ Ready to start execution?
133
+ \`\`\`
134
+ `,
135
+ "joycraft-implement-level5.md": `---
136
+ name: joycraft-implement-level5
137
+ description: Set up Level 5 autonomous development \u2014 autofix loop, holdout scenario testing, and scenario evolution from specs
138
+ ---
139
+
140
+ # Implement Level 5 \u2014 Autonomous Development Loop
141
+
142
+ You are guiding the user through setting up Level 5: the autonomous feedback loop where specs go in, validated software comes out. This is a one-time setup that installs workflows, creates a scenarios repo, and configures the autofix loop.
143
+
144
+ ## Before You Begin
145
+
146
+ Check prerequisites:
147
+
148
+ 1. **Project must be initialized.** Look for \`.joycraft-version\`. If missing, tell the user to run \`npx joycraft init\` first.
149
+ 2. **Project should be at Level 4.** Check \`docs/joycraft-assessment.md\` if it exists. If the project hasn't been assessed yet, suggest running \`/joycraft-tune\` first. But don't block \u2014 the user may know they're ready.
150
+ 3. **Git repo with GitHub remote.** This setup requires GitHub Actions. Check for \`.git/\` and a GitHub remote.
151
+
152
+ If prerequisites aren't met, explain what's needed and stop.
153
+
154
+ ## Step 1: Explain What Level 5 Means
155
+
156
+ Tell the user:
157
+
158
+ > Level 5 is the autonomous loop. When you push specs, three things happen automatically:
159
+ >
160
+ > 1. **Scenario evolution** \u2014 A separate AI agent reads your specs and writes holdout tests in a private scenarios repo. These tests are invisible to your coding agent.
161
+ > 2. **Autofix** \u2014 When CI fails on a PR, Claude Code automatically attempts a fix (up to 3 times).
162
+ > 3. **Holdout validation** \u2014 When CI passes, your scenarios repo runs behavioral tests against the PR. Results post as PR comments.
163
+ >
164
+ > The key insight: your coding agent never sees the scenario tests. This prevents it from gaming the test suite \u2014 like a validation set in machine learning.
165
+
166
+ ## Step 2: Gather Configuration
167
+
168
+ Ask these questions **one at a time**:
169
+
170
+ ### Question 1: Scenarios repo name
171
+
172
+ > What should we call your scenarios repo? It'll be a private repo that holds your holdout tests.
173
+ >
174
+ > Default: \`{current-repo-name}-scenarios\`
175
+
176
+ Accept the default or the user's choice.
177
+
178
+ ### Question 2: GitHub App
179
+
180
+ > Level 5 needs a GitHub App to provide a separate identity for autofix pushes (this avoids GitHub's anti-recursion protection).
181
+ >
182
+ > **Option A:** Install the shared Joycraft Autofix app (quickest \u2014 1 click)
183
+ > **Option B:** Create your own GitHub App (more control)
184
+ >
185
+ > Which do you prefer?
186
+
187
+ If Option A: The App ID is \`3180156\`. Note this for later.
188
+ If Option B: Guide them to create an app at \`https://github.com/settings/apps/new\` with permissions: Contents (Read & Write), Pull Requests (Read & Write), Actions (Read & Write). They'll need the App ID from the settings page.
189
+
190
+ ### Question 3: App ID
191
+
192
+ If they chose Option B, ask for their App ID. If Option A, use \`3180156\`.
193
+
194
+ ## Step 3: Run init-autofix
195
+
196
+ Run the CLI command with the gathered configuration:
197
+
198
+ \`\`\`bash
199
+ npx joycraft init-autofix --scenarios-repo {name} --app-id {id}
200
+ \`\`\`
201
+
202
+ Review the output with the user. Confirm files were created.
203
+
204
+ ## Step 4: Walk Through Secret Configuration
205
+
206
+ Guide the user step by step:
207
+
208
+ ### 4a: GitHub App Private Key
209
+
210
+ > If you chose the shared Joycraft Autofix app, you'll need to generate a private key:
211
+ > 1. Go to https://github.com/settings/apps/joycraft-autofix
212
+ > 2. Scroll to "Private keys" and generate one
213
+ > 3. Download the \`.pem\` file
214
+ >
215
+ > If you created your own app, generate a private key from your app's settings page.
216
+
217
+ ### 4b: Add Secrets to Main Repo
218
+
219
+ > Go to your repo's Settings > Secrets and variables > Actions, and add:
220
+ > - \`JOYCRAFT_APP_PRIVATE_KEY\` \u2014 paste the contents of your \`.pem\` file
221
+ > - \`ANTHROPIC_API_KEY\` \u2014 your Anthropic API key
222
+
223
+ ### 4c: Install the App
224
+
225
+ > The GitHub App needs to be installed on your repo:
226
+ > - Shared app: https://github.com/apps/joycraft-autofix/installations/new
227
+ > - Own app: Go to your app's settings > Install App
228
+
229
+ ### 4d: Create the Scenarios Repo
230
+
231
+ > Create the private scenarios repo:
232
+ > \`\`\`bash
233
+ > gh repo create {scenarios-repo-name} --private
234
+ > \`\`\`
235
+ >
236
+ > Then copy the scenario templates into it:
237
+ > \`\`\`bash
238
+ > cp -r docs/templates/scenarios/* ../{scenarios-repo-name}/
239
+ > cd ../{scenarios-repo-name}
240
+ > git add -A && git commit -m "init: scaffold scenarios repo from Joycraft"
241
+ > git push
242
+ > \`\`\`
243
+
244
+ ### 4e: Add Secrets to Scenarios Repo
245
+
246
+ > The scenarios repo also needs the App private key:
247
+ > - \`JOYCRAFT_APP_PRIVATE_KEY\` \u2014 same \`.pem\` file as the main repo
248
+ > - \`ANTHROPIC_API_KEY\` \u2014 same key (needed for scenario generation)
249
+
250
+ ## Step 5: Verify Setup
251
+
252
+ Help the user verify everything is wired correctly:
253
+
254
+ 1. **Check workflow files exist:** \`ls .github/workflows/autofix.yml .github/workflows/scenarios-dispatch.yml .github/workflows/spec-dispatch.yml .github/workflows/scenarios-rerun.yml\`
255
+ 2. **Check scenario templates were copied:** Verify the scenarios repo has \`example-scenario.test.ts\`, \`workflows/run.yml\`, \`workflows/generate.yml\`, \`prompts/scenario-agent.md\`
256
+ 3. **Check the App ID is correct** in the workflow files (not still a placeholder)
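The file checks above amount to a set difference — a minimal sketch as a pure function, with the workflow list copied from step 1; `missing` is an illustrative helper:

```typescript
// The workflow files Step 5 expects, relative to the repo root.
const REQUIRED_WORKFLOWS = [
  ".github/workflows/autofix.yml",
  ".github/workflows/scenarios-dispatch.yml",
  ".github/workflows/spec-dispatch.yml",
  ".github/workflows/scenarios-rerun.yml",
];

// Return whichever required paths are absent from a directory listing.
function missing(required: string[], present: string[]): string[] {
  const have = new Set(present);
  return required.filter((path) => !have.has(path));
}
```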
257
+
258
+ ## Step 6: Update CLAUDE.md
259
+
260
+ If the project's CLAUDE.md doesn't already have an "External Validation" section, add one:
261
+
262
+ > ## External Validation
263
+ >
264
+ > This project uses holdout scenario tests in a separate private repo.
265
+ >
266
+ > ### NEVER
267
+ > - Access, read, or reference the scenarios repo
268
+ > - Mention scenario test names or contents
269
+ > - Modify the scenarios dispatch workflow to leak test information
270
+ >
271
+ > The scenarios repo is deliberately invisible to you. This is the holdout guarantee.
272
+
273
+ ## Step 7: First Test (Optional)
274
+
275
+ If the user wants to test the loop:
276
+
277
+ > Want to do a quick test? Here's how:
278
+ >
279
+ > 1. Write a simple spec in \`docs/specs/\` and push to main \u2014 this triggers scenario generation
280
+ > 2. Create a PR with a small change \u2014 when CI passes, scenarios will run
281
+ > 3. Watch for the scenario test results as a PR comment
282
+ >
283
+ > Or deliberately break something in a PR to test the autofix loop.
284
+
285
+ ## Step 8: Summary
286
+
287
+ Print a summary of what was set up:
288
+
289
+ > **Level 5 is live.** Here's what's running:
290
+ >
291
+ > | Trigger | What Happens |
292
+ > |---------|-------------|
293
+ > | Push specs to \`docs/specs/\` | Scenario agent writes holdout tests |
294
+ > | PR fails CI | Claude autofix attempts (up to 3x) |
295
+ > | PR passes CI | Holdout scenarios run against PR |
296
+ > | Scenarios update | Open PRs re-tested with latest scenarios |
297
+ >
298
+ > Your scenarios repo: \`{name}\`
299
+ > Your coding agent cannot see those tests. The holdout wall is intact.
300
+
301
+ Update \`docs/joycraft-assessment.md\` if it exists \u2014 set the Level 5 score to reflect the new setup.
302
+ `,
303
+ "joycraft-interview.md": `---
304
+ name: joycraft-interview
305
+ description: Brainstorm freely about what you want to build \u2014 yap, explore ideas, and get a structured summary you can use later
306
+ ---
307
+
308
+ # Interview \u2014 Idea Exploration
309
+
310
+ You are helping the user brainstorm and explore what they want to build. This is a lightweight, low-pressure conversation \u2014 not a formal spec process. Let them yap.
311
+
312
+ ## How to Run the Interview
313
+
314
+ ### 1. Open the Floor
315
+
316
+ Start with something like:
317
+ "What are you thinking about building? Just talk \u2014 I'll listen and ask questions as we go."
318
+
319
+ Let the user talk freely. Do not interrupt their flow. Do not push toward structure yet.
320
+
321
+ ### 2. Ask Clarifying Questions
322
+
323
+ As they talk, weave in questions naturally \u2014 don't fire them all at once:
324
+
325
+ - **What problem does this solve?** Who feels the pain today?
326
+ - **What does "done" look like?** If this worked perfectly, what would a user see?
327
+ - **What are the constraints?** Time, tech, team, budget \u2014 what boxes are we in?
328
+ - **What's NOT in scope?** What's tempting but should be deferred?
329
+ - **What are the edge cases?** What could go wrong? What's the weird input?
330
+ - **What exists already?** Are we building on something or starting fresh?
331
+
332
+ ### 3. Play Back Understanding
333
+
334
+ After the user has gotten their ideas out, reflect back:
335
+ "So if I'm hearing you right, you want to [summary]. The core problem is [X], and done looks like [Y]. Is that right?"
336
+
337
+ Let them correct and refine. Iterate until they say "yes, that's it."
338
+
339
+ ### 4. Write a Draft Brief
340
+
341
+ Create a draft file at \`docs/briefs/YYYY-MM-DD-topic-draft.md\`. Create the \`docs/briefs/\` directory if it doesn't exist.
342
+
343
+ Use this format:
344
+
345
+ \`\`\`markdown
346
+ # [Topic] \u2014 Draft Brief
347
+
348
+ > **Date:** YYYY-MM-DD
349
+ > **Status:** DRAFT
350
+ > **Origin:** /joycraft-interview session
351
+
352
+ ---
353
+
354
+ ## The Idea
355
+ [2-3 paragraphs capturing what the user described \u2014 their words, their framing]
356
+
357
+ ## Problem
358
+ [What pain or gap this addresses]
359
+
360
+ ## What "Done" Looks Like
361
+ [The user's description of success \u2014 observable outcomes]
362
+
363
+ ## Constraints
364
+ - [constraint 1]
365
+ - [constraint 2]
366
+
367
+ ## Open Questions
368
+ - [things that came up but weren't resolved]
369
+ - [decisions that need more thought]
370
+
371
+ ## Out of Scope (for now)
372
+ - [things explicitly deferred]
373
+
374
+ ## Raw Notes
375
+ [Any additional context, quotes, or tangents worth preserving]
376
+ \`\`\`
377
+
378
+ ### 5. Hand Off
379
+
380
+ After writing the draft, tell the user:
381
+
382
+ \`\`\`
383
+ Draft brief saved to docs/briefs/YYYY-MM-DD-topic-draft.md
384
+
385
+ When you're ready to move forward:
386
+ - /joycraft-new-feature \u2014 formalize this into a full Feature Brief with specs
387
+ - /joycraft-decompose \u2014 break it directly into atomic specs if scope is clear
388
+ - Or just keep brainstorming \u2014 run /joycraft-interview again anytime
389
+ \`\`\`
390
+
391
+ ## Guidelines
392
+
393
+ - **This is NOT /joycraft-new-feature.** Do not push toward formal briefs, decomposition tables, or atomic specs. The point is exploration.
394
+ - **Let the user lead.** Your job is to listen, clarify, and capture \u2014 not to structure or direct.
395
+ - **Mark everything as DRAFT.** The output is a starting point, not a commitment.
396
+ - **Keep it short.** The draft brief should be 1-2 pages max. Capture the essence, not every detail.
397
+ - **Multiple interviews are fine.** The user might run this several times as their thinking evolves. Each creates a new dated draft.
398
+ `,
399
+ "joycraft-new-feature.md": `---
400
+ name: joycraft-new-feature
401
+ description: Guided feature development \u2014 interview the user, produce a Feature Brief, then decompose into atomic specs
402
+ ---
403
+
404
+ # New Feature Workflow
405
+
406
+ You are starting a new feature. Follow this process in order. Do not skip steps.
407
+
408
+ ## Phase 1: Interview
409
+
410
+ Interview the user about what they want to build. Let them talk \u2014 your job is to listen, then sharpen.
411
+
412
+ **Why:** A thorough interview prevents wasted implementation time. Most failed features fail because the problem wasn't understood, not because the code was wrong.
413
+
414
+ **Ask about:**
415
+ - What problem does this solve? Who is affected?
416
+ - What does "done" look like? How will a user know this works?
417
+ - What are the hard constraints? (business rules, tech limitations, deadlines)
418
+ - What is explicitly NOT in scope? (push hard on this \u2014 aggressive scoping is key)
419
+ - Are there edge cases or error conditions we need to handle?
420
+ - What existing code/patterns should this follow?
421
+
422
+ **Interview technique:**
423
+ - Let the user "yap" \u2014 don't interrupt their flow of ideas
424
+ - After they finish, play back your understanding: "So if I'm hearing you right..."
425
+ - Ask clarifying questions that force specificity: "When you say 'handle errors,' what should the user see?"
426
+ - Push toward testable statements: "How would we verify that works?"
427
+
428
+ Keep asking until you can fill out a Feature Brief. When ready, say:
429
+ "I have enough context. Let me write the Feature Brief for your review."
430
+
431
+ ## Phase 2: Feature Brief
432
+
433
+ Write a Feature Brief to \`docs/briefs/YYYY-MM-DD-feature-name.md\`. Create the \`docs/briefs/\` directory if it doesn't exist.
434
+
435
+ **Why:** The brief is the single source of truth for what we're building. It prevents scope creep and gives every spec a shared reference point.
436
+
437
+ Use this structure:
438
+
439
+ \`\`\`markdown
440
+ # [Feature Name] \u2014 Feature Brief
441
+
442
+ > **Date:** YYYY-MM-DD
443
+ > **Project:** [project name]
444
+ > **Status:** Interview | Decomposing | Specs Ready | In Progress | Complete
445
+
446
+ ---
447
+
448
+ ## Vision
449
+ What are we building and why? The full picture in 2-4 paragraphs.
450
+
451
+ ## User Stories
452
+ - As a [role], I want [capability] so that [benefit]
453
+
454
+ ## Hard Constraints
455
+ - MUST: [constraint that every spec must respect]
456
+ - MUST NOT: [prohibition that every spec must respect]
457
+
458
+ ## Out of Scope
459
+ - NOT: [tempting but deferred]
460
+
461
+ ## Decomposition
462
+ | # | Spec Name | Description | Dependencies | Est. Size |
463
+ |---|-----------|-------------|--------------|-----------|
464
+ | 1 | [verb-object] | [one sentence] | None | [S/M/L] |
465
+
466
+ ## Execution Strategy
467
+ - [ ] Sequential (specs have chain dependencies)
468
+ - [ ] Parallel worktrees (specs are independent)
469
+ - [ ] Mixed
470
+
471
+ ## Success Criteria
472
+ - [ ] [End-to-end behavior 1]
473
+ - [ ] [No regressions in existing features]
474
+ \`\`\`
475
+
476
+ If \`docs/templates/FEATURE_BRIEF_TEMPLATE.md\` exists, reference it for the full template with additional guidance.
477
+
478
+ Present the brief to the user. Focus review on:
479
+ - "Does the decomposition match how you think about this?"
480
+ - "Is anything in scope that shouldn't be?"
481
+ - "Are the specs small enough? Can each be described in one sentence?"
482
+
483
+ Iterate until approved.
484
+
485
+ ## Phase 3: Generate Atomic Specs
486
+
487
+ For each row in the decomposition table, create a self-contained spec file at \`docs/specs/YYYY-MM-DD-spec-name.md\`. Create the \`docs/specs/\` directory if it doesn't exist.
488
+
489
+ **Why:** Each spec must be understandable WITHOUT reading the Feature Brief. This prevents the "Curse of Instructions" \u2014 no spec should require holding the entire feature in context. Copy relevant context into each spec.
490
+
491
+ Use this structure for each spec:
492
+
493
+ \`\`\`markdown
494
+ # [Verb + Object] \u2014 Atomic Spec
495
+
496
+ > **Parent Brief:** \`docs/briefs/YYYY-MM-DD-feature-name.md\`
497
+ > **Status:** Ready
498
+ > **Date:** YYYY-MM-DD
499
+ > **Estimated scope:** [1 session / N files / ~N lines]
500
+
501
+ ---
502
+
503
+ ## What
504
+ One paragraph \u2014 what changes when this spec is done?
505
+
506
+ ## Why
507
+ One sentence \u2014 what breaks or is missing without this?
508
+
509
+ ## Acceptance Criteria
510
+ - [ ] [Observable behavior]
511
+ - [ ] Build passes
512
+ - [ ] Tests pass
513
+
514
+ ## Constraints
515
+ - MUST: [hard requirement]
516
+ - MUST NOT: [hard prohibition]
517
+
518
+ ## Affected Files
519
+ | Action | File | What Changes |
520
+ |--------|------|-------------|
521
+
522
+ ## Approach
523
+ Strategy, data flow, key decisions. Name one rejected alternative.
524
+
525
+ ## Edge Cases
526
+ | Scenario | Expected Behavior |
527
+ |----------|------------------|
528
+ \`\`\`
529
+
530
+ If \`docs/templates/ATOMIC_SPEC_TEMPLATE.md\` exists, reference it for the full template with additional guidance.
531
+
532
+ ## Phase 4: Hand Off for Execution
533
+
534
+ Tell the user:
535
+ \`\`\`
536
+ Feature Brief and [N] atomic specs are ready.
537
+
538
+ Specs:
539
+ 1. [spec-name] \u2014 [one sentence] [S/M/L]
540
+ 2. [spec-name] \u2014 [one sentence] [S/M/L]
541
+ ...
542
+
543
+ Recommended execution:
544
+ - [Parallel/Sequential/Mixed strategy]
545
+ - Estimated: [N] sessions total
546
+
547
+ To execute: Start a fresh session per spec. Each session should:
548
+ 1. Read the spec
549
+ 2. Implement
550
+ 3. Run /joycraft-session-end to capture discoveries
551
+ 4. Commit and PR
552
+
553
+ Ready to start?
554
+ \`\`\`
555
+
556
+ **Why:** A fresh session for execution produces better results. The interview session has too much context noise \u2014 a clean session with just the spec is more focused.
557
+
558
+ You can also use \`/joycraft-decompose\` to re-decompose a brief if the breakdown needs adjustment, or run \`/joycraft-interview\` first for a lighter brainstorm before committing to the full workflow.
559
+ `,
560
+ "joycraft-session-end.md": `---
561
+ name: joycraft-session-end
562
+ description: Wrap up a session \u2014 capture discoveries, verify, prepare for PR or next session
563
+ ---
564
+
565
+ # Session Wrap-Up
566
+
567
+ Before ending this session, complete these steps in order.
568
+
569
+ ## 1. Capture Discoveries
570
+
571
+ **Why:** Discoveries are the surprises \u2014 things that weren't in the spec or that contradicted expectations. They prevent future sessions from hitting the same walls.
572
+
573
+ Check: did anything surprising happen during this session? If yes, create or update a discovery file at \`docs/discoveries/YYYY-MM-DD-topic.md\`. Create the \`docs/discoveries/\` directory if it doesn't exist.
574
+
575
+ Only capture what's NOT obvious from the code or git diff:
576
+ - "We thought X but found Y" \u2014 assumptions that were wrong
577
+ - "This API/library behaves differently than documented" \u2014 external gotchas
578
+ - "This edge case needs handling in a future spec" \u2014 deferred work with context
579
+ - "The approach in the spec didn't work because..." \u2014 spec-vs-reality gaps
580
+ - Key decisions made during implementation that aren't in the spec
581
+
582
+ **Do NOT capture:**
583
+ - Files changed (that's the diff)
584
+ - What you set out to do (that's the spec)
585
+ - Step-by-step narrative of the session (nobody re-reads these)
586
+
587
+ Use this format:
588
+
589
+ \`\`\`markdown
590
+ # Discoveries \u2014 [topic]
591
+
592
+ **Date:** YYYY-MM-DD
593
+ **Spec:** [link to spec if applicable]
594
+
595
+ ## [Discovery title]
596
+ **Expected:** [what we thought would happen]
597
+ **Actual:** [what actually happened]
598
+ **Impact:** [what this means for future work]
599
+ \`\`\`
600
+
601
+ If nothing surprising happened, skip the discovery file entirely. No discovery is a good sign \u2014 the spec was accurate.
602
+
603
+ ## 1b. Update Context Documents
604
+
605
+ If \`docs/context/\` exists, quickly check whether this session revealed anything about:
606
+
607
+ - **Production risks** \u2014 did you interact with or learn about production vs staging systems? \u2192 Update \`docs/context/production-map.md\`
608
+ - **Wrong assumptions** \u2014 did the agent (or you) assume something that turned out to be false? \u2192 Update \`docs/context/dangerous-assumptions.md\`
609
+ - **Key decisions** \u2014 did you make an architectural or tooling choice? \u2192 Add a row to \`docs/context/decision-log.md\`
610
+ - **Unwritten rules** \u2014 did you discover a convention or constraint not documented anywhere? \u2192 Update \`docs/context/institutional-knowledge.md\`
611
+
612
+ Skip this if nothing applies. Don't force it \u2014 only update when there's genuine new context.
613
+
614
+ ## 2. Run Validation
615
+
616
+ Run the project's validation commands. Check CLAUDE.md for project-specific commands. Common checks:
617
+
618
+ - Type-check (e.g., \`tsc --noEmit\`, \`mypy\`, \`cargo check\`)
619
+ - Tests (e.g., \`npm test\`, \`pytest\`, \`cargo test\`)
620
+ - Lint (e.g., \`eslint\`, \`ruff\`, \`clippy\`)
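The common checks above could be selected mechanically from whichever manifest the project has. A sketch only — the command table is illustrative defaults, and the commands documented in CLAUDE.md always take precedence:

```typescript
// Map common project manifests to typical validation commands.
// Illustrative defaults; prefer what CLAUDE.md documents for the project.
const VALIDATORS: Record<string, string[]> = {
  "package.json": ["tsc --noEmit", "npm test", "eslint ."],
  "pyproject.toml": ["mypy .", "pytest", "ruff check ."],
  "Cargo.toml": ["cargo check", "cargo test", "cargo clippy"],
};

function validationCommands(manifests: string[]): string[] {
  return manifests.flatMap((m) => VALIDATORS[m] ?? []);
}
```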
621
+
622
+ Fix any failures before proceeding.
623
+
624
+ ## 3. Update Spec Status
625
+
626
+ If working from an atomic spec in \`docs/specs/\`:
627
+ - All acceptance criteria met \u2014 update status to \`Complete\`
628
+ - Partially done \u2014 update status to \`In Progress\`, note what's left
629
+
630
+ If working from a Feature Brief in \`docs/briefs/\`, check off completed specs in the decomposition table.
631
+
632
+ ## 4. Commit
633
+
634
+ Commit all changes including the discovery file (if created) and spec status updates. The commit message should reference the spec if applicable.
635
+
636
+ ## 5. Report
637
+
638
+ \`\`\`
639
+ Session complete.
640
+ - Spec: [spec name] \u2014 [Complete / In Progress]
641
+ - Build: [passing / failing]
642
+ - Discoveries: [N items / none]
643
+ - Next: [what the next session should tackle, or "ready for PR"]
644
+ \`\`\`
645
+ `,
646
+ "joycraft-tune.md": `---
647
+ name: joycraft-tune
648
+ description: Assess and upgrade your project's AI development harness \u2014 score 7 dimensions, apply fixes, show path to Level 5
649
+ ---
650
+
651
+ # Tune \u2014 Project Harness Assessment & Upgrade
652
+
653
+ You are evaluating and upgrading this project's AI development harness. Follow these steps in order.
654
+
655
+ ## Step 1: Detect Harness State
656
+
657
+ Check the following and note what exists:
658
+
659
+ 1. **CLAUDE.md** \u2014 Read it if it exists. Check whether it contains meaningful content (not just a project name or generic README).
660
+ 2. **Key directories** \u2014 Check for: \`docs/specs/\`, \`docs/briefs/\`, \`docs/discoveries/\`, \`docs/templates/\`, \`.claude/skills/\`
661
+ 3. **Boundary framework** \u2014 Look for \`Always\`, \`Ask First\`, and \`Never\` sections in CLAUDE.md (or similar behavioral constraints under any heading).
662
+ 4. **Skills infrastructure** \u2014 Check \`.claude/skills/\` for installed skill files.
663
+ 5. **Test configuration** \u2014 Look for test commands in package.json, pyproject.toml, Cargo.toml, Makefile, or CI config files.
664
+
665
+ ## Step 2: Route Based on State
666
+
667
+ ### If No Harness (no CLAUDE.md, or CLAUDE.md is just a README with no structured sections):
668
+
669
+ Tell the user:
670
+ - Their project has no AI development harness
671
+ - Recommend running \`npx joycraft init\` to scaffold one
672
+ - Briefly explain what it sets up: CLAUDE.md with boundaries, spec/brief templates, skills, documentation structure
673
+ - **Stop here** \u2014 do not run the full assessment on a bare project
674
+
675
+ ### If Harness Exists (CLAUDE.md has structured content \u2014 boundaries, commands, architecture, or domain rules):
676
+
677
+ Continue to Step 3 for the full assessment.
678
+
679
+ ## Step 3: Score 7 Dimensions
680
+
681
+ Read CLAUDE.md thoroughly. Explore the project structure. Score each dimension on a 1-5 scale with specific evidence.
682
+
683
+ ### Dimension 1: Spec Quality
684
+
685
+ Look in \`docs/specs/\` for specification files.
686
+
687
+ | Score | Criteria |
688
+ |-------|----------|
689
+ | 1 | No specs directory or no spec files |
690
+ | 2 | Specs exist but are informal notes or TODOs |
691
+ | 3 | Specs have structure (sections, some criteria) but lack consistency |
692
+ | 4 | Specs are structured with clear acceptance criteria and constraints |
693
+ | 5 | Atomic specs: self-contained, acceptance criteria, constraints, edge cases, affected files |
694
+
695
+ **Evidence:** Number of specs found, example of best/worst, whether acceptance criteria are present.
696
+
697
+ ### Dimension 2: Spec Granularity
698
+
699
+ Can each spec be completed in a single coding session?
700
+
701
+ | Score | Criteria |
702
+ |-------|----------|
703
+ | 1 | No specs |
704
+ | 2 | Specs cover entire features or epics |
705
+ | 3 | Specs are feature-sized (multi-session but bounded) |
706
+ | 4 | Most specs are session-sized with clear scope |
707
+ | 5 | All specs are atomic \u2014 one session, one concern, clear done state |
708
+
709
+ ### Dimension 3: Behavioral Boundaries
710
+
711
+ Read CLAUDE.md for explicit behavioral constraints.
712
+
713
+ | Score | Criteria |
714
+ |-------|----------|
715
+ | 1 | No CLAUDE.md or no behavioral guidance |
716
+ | 2 | CLAUDE.md exists with general instructions but no structured boundaries |
717
+ | 3 | Some boundaries exist but not organized as Always/Ask First/Never |
718
+ | 4 | Always/Ask First/Never sections present with reasonable coverage |
719
+ | 5 | Comprehensive boundaries covering code style, testing, deployment, dependencies, and dangerous operations |
720
+
721
+ **Important:** Projects may have strong rules under different headings (e.g., "Critical Rules", "Constraints"). Give credit for substance over format \u2014 a project with clear, enforced rules scores higher than one with empty Always/Ask First/Never sections.
722
+
723
+ ### Dimension 4: Skills & Hooks
724
+
725
+ Look in \`.claude/skills/\` for skill files. Check for hooks configuration.
726
+
727
+ | Score | Criteria |
728
+ |-------|----------|
729
+ | 1 | No .claude/ directory |
730
+ | 2 | .claude/ exists but empty or minimal |
731
+ | 3 | A few skills installed, no hooks |
732
+ | 4 | Multiple relevant skills, basic hooks |
733
+ | 5 | Comprehensive skills covering workflow, hooks for validation |
734
+
735
+ ### Dimension 5: Documentation
736
+
737
+ Examine \`docs/\` directory structure and content.
738
+
739
+ | Score | Criteria |
740
+ |-------|----------|
741
+ | 1 | No docs/ directory |
742
+ | 2 | docs/ exists with ad-hoc files |
743
+ | 3 | Some structure (subdirectories) but inconsistent |
744
+ | 4 | Structured docs/ with templates and clear organization |
745
+ | 5 | Full structure: briefs/, specs/, templates/, architecture docs, referenced from CLAUDE.md |
746
+
747
+ ### Dimension 6: Knowledge Capture & Contextual Stewardship
748
+
749
+ Look for discoveries, decisions, session notes, and context documents.
750
+
751
+ | Score | Criteria |
752
+ |-------|----------|
753
+ | 1 | No knowledge capture mechanism |
754
+ | 2 | Ad-hoc notes or a discoveries directory with no entries |
755
+ | 3 | Discoveries directory with some entries, or context docs exist but empty |
756
+ | 4 | Active discoveries + at least 2 context docs with content (production-map, dangerous-assumptions, decision-log, institutional-knowledge) |
757
+ | 5 | Full contextual stewardship: discoveries with entries, all 4 context docs maintained, session-end workflow in active use |
758
+
759
+ **Check for:** \`docs/discoveries/\`, \`docs/context/production-map.md\`, \`docs/context/dangerous-assumptions.md\`, \`docs/context/decision-log.md\`, \`docs/context/institutional-knowledge.md\`. Score based on both existence AND whether they have real content (not just templates).
760
+
761
+ ### Dimension 7: Testing & Validation
762
+
763
+ Look for test config, CI setup, and validation commands.
764
+
765
+ | Score | Criteria |
766
+ |-------|----------|
767
+ | 1 | No test configuration |
768
+ | 2 | Test framework installed but few/no tests |
769
+ | 3 | Tests exist with reasonable coverage |
770
+ | 4 | Tests + CI pipeline configured |
771
+ | 5 | Tests + CI + validation commands in CLAUDE.md + scenario tests |
772
+
773
+ ## Step 4: Write Assessment
774
+
775
+ Write the assessment to \`docs/joycraft-assessment.md\` AND display it in the conversation. Use this format:
776
+
777
+ \`\`\`markdown
778
+ # Joycraft Assessment \u2014 [Project Name]
779
+
780
+ **Date:** [today's date]
781
+ **Overall Level:** [1-5, based on average score]
782
+
783
+ ## Scores
784
+
785
+ | Dimension | Score | Summary |
786
+ |-----------|-------|---------|
787
+ | Spec Quality | X/5 | [one-line summary] |
788
+ | Spec Granularity | X/5 | [one-line summary] |
789
+ | Behavioral Boundaries | X/5 | [one-line summary] |
790
+ | Skills & Hooks | X/5 | [one-line summary] |
791
+ | Documentation | X/5 | [one-line summary] |
792
+ | Knowledge Capture | X/5 | [one-line summary] |
793
+ | Testing & Validation | X/5 | [one-line summary] |
794
+
795
+ **Average:** X.X/5
796
+
797
+ ## Detailed Findings
798
+
799
+ ### [Dimension Name] \u2014 X/5
800
+ **Evidence:** [specific files, paths, counts found]
801
+ **Gap:** [what's missing]
802
+ **Recommendation:** [specific action to improve]
803
+
804
+ ## Upgrade Plan
805
+
806
+ To reach Level [current + 1], complete these steps:
807
+ 1. [Most impactful action] \u2014 addresses [dimension] (X -> Y)
808
+ 2. [Next action] \u2014 addresses [dimension] (X -> Y)
809
+ [up to 5 actions, ordered by impact]
810
+ \`\`\`
811
+
812
+ ## Step 5: Apply Upgrades
813
+
814
+ Immediately after presenting the assessment, apply upgrades using the three-tier model below. Do NOT ask for per-item permission \u2014 batch everything and show a consolidated report at the end.
815
+
816
+ ### Tier 1: Silent Apply (just do it)
817
+ These are safe, additive operations. Apply them without asking:
818
+ - Create missing directories (\`docs/specs/\`, \`docs/briefs/\`, \`docs/discoveries/\`, \`docs/templates/\`)
819
+ - Install missing skills to \`.claude/skills/\`
820
+ - Copy missing templates to \`docs/templates/\`
821
+ - Create AGENTS.md if it doesn't exist
822
+
823
+ ### Git Autonomy Preference
824
+
825
+ Before applying Behavioral Boundaries to CLAUDE.md, ask the user ONE question:
826
+
827
+ > How autonomous should git operations be?
828
+ > 1. **Cautious** \u2014 commits freely, asks before pushing or opening PRs *(good for learning the workflow)*
829
+ > 2. **Autonomous** \u2014 commits, pushes to branches, and opens PRs without asking *(good for spec-driven development)*
830
+
831
+ Based on their answer, use the appropriate git rules in the Behavioral Boundaries section:
832
+
833
+ **If Cautious (default):**
834
+ \`\`\`
835
+ ### ASK FIRST
836
+ - Pushing to remote
837
+ - Creating or merging pull requests
838
+ - Any destructive git operation (force-push, reset --hard, branch deletion)
839
+
840
+ ### NEVER
841
+ - Push directly to main/master without approval
842
+ - Amend commits that have been pushed
843
+ \`\`\`
844
+
845
+ **If Autonomous:**
846
+ \`\`\`
847
+ ### ALWAYS
848
+ - Push to feature branches after each commit
849
+ - Open a PR when all specs in a feature are complete
850
+ - Use descriptive branch names: feature/spec-name
851
+
852
+ ### ASK FIRST
853
+ - Merging PRs to main/master
854
+ - Any destructive git operation (force-push, reset --hard, branch deletion)
855
+
856
+ ### NEVER
857
+ - Push directly to main/master (always use feature branches + PR)
858
+ - Amend commits that have been pushed to remote
859
+ \`\`\`
860
+
861
+ ### Risk Interview
862
+
863
+ Before applying upgrades, ask 3-5 targeted questions to capture what's dangerous in this project. Skip this if \`docs/context/production-map.md\` or \`docs/context/dangerous-assumptions.md\` already exist (offer to update instead).
864
+
865
+ **Question 1:** "What could this agent break that would ruin your day? Think: production databases, live APIs, billing systems, user data, infrastructure."
866
+
867
+ From the answer, generate:
868
+ - NEVER rules for CLAUDE.md (e.g., "NEVER connect to production DB at postgres://prod.example.com")
869
+ - Deny patterns for .claude/settings.json (e.g., deny Bash commands containing production hostnames)
870
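+
+ For example, deny rules in \`.claude/settings.json\` can block commands that mention production hosts. The hostnames below are placeholders \u2014 substitute the ones from the user's answer, and note that exact matcher syntax can vary by Claude Code version:
+
+ \`\`\`json
+ {
+   "permissions": {
+     "deny": [
+       "Bash(*prod.example.com*)",
+       "Read(./.env.production)"
+     ]
+   }
+ }
+ \`\`\`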
+
871
+ **Question 2:** "What external services does this project connect to? Which are production vs. staging/dev?"
872
+
873
+ From the answer, generate:
874
+ - \`docs/context/production-map.md\` documenting what's real vs safe to touch
875
+ - Include: service name, URL/endpoint, environment (prod/staging/dev), what happens if corrupted
876
+
877
+ **Question 3:** "What are the unwritten rules a new developer would need months to learn about this project?"
878
+
879
+ From the answer, generate:
880
+ - Additions to CLAUDE.md boundaries (new ALWAYS/ASK FIRST/NEVER rules)
881
+ - \`docs/context/dangerous-assumptions.md\` with "Agent might assume X, but actually Y"
882
+
883
+ **Question 4 (optional):** "What happened last time something went wrong with an automated tool or deploy?"
884
+
885
+ If the user has a story, capture the lesson as a specific NEVER rule and add to dangerous-assumptions.md.
886
+
887
+ **Question 5:** "Any files, directories, or commands that should be completely off-limits?"
888
+
889
+ From the answer, generate deny rules for .claude/settings.json and add to NEVER section.
890
+
891
+ **Rules for the interview:**
892
+ - Ask questions ONE AT A TIME, not all at once
893
+ - If the user says "nothing" or "skip", respect that and move on
894
+ - Keep it to 2-3 minutes total \u2014 don't interrogate
895
+ - Generate artifacts as answers arrive \u2014 don't wait until every question has been asked
896
+ - This is the SECOND and LAST set of questions during /joycraft-tune (first is git autonomy)
897
+
898
+ ### Tier 2: Apply and Show Diff (do it, then report)
899
+ These modify important files but are additive (append-only). Apply them, then show what changed so the user can review. Git is the undo button.
900
+ - Add missing sections to CLAUDE.md (Behavioral Boundaries, Development Workflow, Getting Started with Joycraft, Key Files, Common Gotchas)
901
+ - Use the git autonomy preference from above when generating the Behavioral Boundaries section
902
+ - Draft section content from the actual codebase \u2014 not generic placeholders. Read the project's real rules, real commands, real structure.
903
+ - Only append \u2014 never modify or reformat existing content
904
+
905
+ ### Tier 3: Confirm First (ask before acting)
906
+ These are potentially destructive or opinionated. Ask before proceeding:
907
+ - Rewriting or reorganizing existing CLAUDE.md sections
908
+ - Overwriting files the user has customized
909
+ - Suggesting test framework installation or CI setup (present as recommendations, don't auto-install)
910
+
911
+ ### Reading a Previous Assessment
912
+
913
+ If \`docs/joycraft-assessment.md\` already exists, read it first. If all recommendations have been applied, report "nothing to upgrade" and offer to re-assess.
914
+
915
+ ### After Applying
916
+
917
+ Append a history entry to \`docs/joycraft-history.md\` (create it with a \`| Date | Avg Score | Change | Summary |\` header row if needed):
918
+ \`\`\`
919
+ | [date] | [new avg score] | [change from last] | [summary of what changed] |
920
+ \`\`\`
921
+
922
+ Then display a single consolidated report:
923
+
924
+ \`\`\`markdown
925
+ ## Upgrade Results
926
+
927
+ | Dimension | Before | After | Change |
928
+ |------------------------|--------|-------|--------|
929
+ | Spec Quality | X/5 | X/5 | +X |
930
+ | ... | ... | ... | ... |
931
+
932
+ **Previous Level:** X \u2014 **New Level:** X
933
+
934
+ ### What Changed
935
+ - [list each change applied]
936
+
937
+ ### Remaining Gaps
938
+ - [anything still below 3.5, with specific next action]
939
+ \`\`\`
940
+
941
+ Update \`docs/joycraft-assessment.md\` with the new scores and today's date.
942
+
943
+ ## Step 6: Show Path to Level 5
944
+
945
+ After the upgrade report, always show the Level 5 roadmap tailored to the project's current state:
946
+
947
+ \`\`\`markdown
948
+ ## Path to Level 5 \u2014 Autonomous Development
949
+
950
+ You're at Level [X]. Here's what each level looks like:
951
+
952
+ | Level | You | AI | Key Skill |
953
+ |-------|-----|-----|-----------|
954
+ | 2 | Guide direction | Multi-file changes | AI-native tooling |
955
+ | 3 | Review diffs | Primary developer | Code review at scale |
956
+ | 4 | Write specs, check tests | End-to-end development | Specification writing |
957
+ | 5 | Define what + why | Specs in, software out | Systems design |
958
+
959
+ ### Your Next Steps Toward Level [X+1]:
960
+ 1. [Specific action based on current gaps \u2014 e.g., "Write your first atomic spec using /joycraft-new-feature"]
961
+ 2. [Next action \u2014 e.g., "Add vitest and write tests for your core logic"]
962
+ 3. [Next action \u2014 e.g., "Use /joycraft-session-end consistently to build your discoveries log"]
963
+
964
+ ### What Level 5 Looks Like (Your North Star):
965
+ - A backlog of ready specs that agents pull from and execute autonomously
966
+ - CI failures auto-generate fix specs \u2014 no human triage for regressions
967
+ - Multi-agent execution with parallel worktrees, one spec per agent
968
+ - External holdout scenarios (tests the agent can't see) prevent overfitting
969
+ - CLAUDE.md evolves from discoveries \u2014 the harness improves itself
970
+
971
+ ### You'll Know You're at Level 5 When:
972
+ - You describe a feature in one sentence and walk away
973
+ - The system produces a PR with tests, docs, and discoveries \u2014 without further input
974
+ - Failed CI runs generate their own fix specs
975
+ - Your harness improves without you manually editing CLAUDE.md
976
+
977
+ This is a significant journey. Most teams are at Level 2. Getting to Level 4 with Joycraft's workflow is achievable \u2014 Level 5 requires building validation infrastructure (scenario tests, spec queues, CI feedback loops) that goes beyond what Joycraft scaffolds today. But the harness you're building now is the foundation.
978
+ \`\`\`
979
+
980
+ Tailor the "Next Steps" section based on the project's actual gaps \u2014 don't show generic advice.
981
+
982
+ ## Edge Cases
983
+
984
+ - **Not a git repo:** Note this. Joycraft works best in a git repo.
985
+ - **CLAUDE.md is just a README:** Treat as "no harness."
986
+ - **Non-Joycraft skills already installed:** Acknowledge them. Do not replace \u2014 suggest additions.
987
+ - **Monorepo:** Assess the root CLAUDE.md. Note if component-level CLAUDE.md files exist.
988
+ - **Project has rules under non-standard headings:** Give credit. Suggest reformatting as Always/Ask First/Never but acknowledge the rules are there.
989
+ - **Assessment file missing when upgrading:** Run the full assessment first, then offer to apply.
990
+ - **Assessment is stale:** Warn and offer to re-assess before proceeding.
991
+ - **All recommendations already applied:** Report "nothing to upgrade" and stop.
992
+ - **User declines a recommendation:** Skip it, continue, and note the skipped item in the upgrade report.
993
+ - **CLAUDE.md does not exist at all:** Create it with recommended sections, but ask the user first.
994
+ - **Non-Joycraft content in CLAUDE.md:** Preserve exactly as-is. Only append or merge \u2014 never remove or reformat existing content.
995
+ `
996
+ };
997
+ var TEMPLATES = {
998
+ "context/dangerous-assumptions.md": `# Dangerous Assumptions
999
+
1000
+ > Things the AI agent might assume that are wrong in this project.
1001
+ > Generated by Joycraft risk interview. Update when you discover new gotchas.
1002
+
1003
+ ## Assumptions
1004
+
1005
+ | Agent Might Assume | But Actually | Impact If Wrong |
1006
+ |-------------------|-------------|----------------|
1007
+ | _Example: All databases are dev/test_ | _The default connection is production_ | _Data loss_ |
1008
+ | _Example: Deleting and recreating is safe_ | _Some resources have manual config not in code_ | _Hours of manual recovery_ |
1009
+
1010
+ ## Historical Incidents
1011
+
1012
+ | Date | What Happened | Lesson | Rule Added |
1013
+ |------|-------------|--------|------------|
1014
+ | _Example: 2026-03-15_ | _Agent deleted staging infra thinking it was temp_ | _Always verify environment before destructive ops_ | _NEVER: Delete cloud resources without listing them first_ |
1015
+ `,
1016
+ "context/decision-log.md": `# Decision Log
1017
+
1018
+ > Why choices were made, not just what was chosen.
1019
+ > Update this when making architectural, tooling, or process decisions.
1020
+ > This is the institutional memory that prevents re-litigating settled questions.
1021
+
1022
+ ## Decisions
1023
+
1024
+ | Date | Decision | Why | Alternatives Rejected | Revisit When |
1025
+ |------|----------|-----|----------------------|-------------|
1026
+ | _Example: 2026-03-15_ | _Use Supabase over Firebase_ | _Postgres flexibility, row-level security, self-hostable_ | _Firebase (vendor lock-in), PlanetScale (no RLS)_ | _If we need real-time sync beyond Supabase's capabilities_ |
1027
+
1028
+ ## Principles
1029
+
1030
+ _Capture recurring decision patterns here \u2014 they save time on future choices._
1031
+
1032
+ - _Example: "Prefer tools we can self-host over pure SaaS \u2014 reduces vendor risk"_
1033
+ - _Example: "Choose boring technology for infrastructure, cutting-edge only for core differentiators"_
1034
+ `,
1035
+ "context/institutional-knowledge.md": `# Institutional Knowledge
1036
+
1037
+ > Unwritten rules, team conventions, and organizational context that AI agents can't derive from code.
1038
+ > This is the knowledge that takes a new developer months to absorb.
1039
+ > Update when you catch yourself saying "oh, you didn't know about that?"
1040
+
1041
+ ## Team Conventions
1042
+
1043
+ _Things everyone on the team knows but nobody wrote down._
1044
+
1045
+ - _Example: "We never deploy on Fridays"_
1046
+ - _Example: "The CEO reviews all UI changes before they ship"_
1047
+ - _Example: "PR titles must reference the Jira ticket number"_
1048
+
1049
+ ## Organizational Constraints
1050
+
1051
+ _Business rules, compliance requirements, or political realities that affect technical decisions._
1052
+
1053
+ - _Example: "Legal requires all user data to be stored in EU regions"_
1054
+ - _Example: "The payments team owns the billing schema \u2014 never modify without their approval"_
1055
+ - _Example: "We have an informal agreement with Vendor X about API rate limits"_
1056
+
1057
+ ## Historical Context
1058
+
1059
+ _Why things are the way they are \u2014 especially when it looks wrong._
1060
+
1061
+ - _Example: "The auth module uses an old pattern because it predates our TypeScript migration \u2014 don't refactor without a spec"_
1062
+ - _Example: "The caching layer has a 5-second TTL because we had a consistency bug in 2025 \u2014 increasing it requires careful testing"_
1063
+
1064
+ ## People & Ownership
1065
+
1066
+ _Who owns what, who to ask, who cares about what._
1067
+
1068
+ - _Example: "Alice owns the payment pipeline \u2014 all changes need her review"_
1069
+ - _Example: "The data team is sensitive about query performance on the analytics tables"_
1070
+ `,
1071
+ "context/production-map.md": `# Production Map
1072
+
1073
+ > What's real, what's staging, what's safe to touch.
1074
+ > Generated by Joycraft risk interview. Update as your infrastructure evolves.
1075
+
1076
+ ## Services
1077
+
1078
+ | Service | Environment | URL/Endpoint | Impact if Corrupted |
1079
+ |---------|-------------|-------------|-------------------|
1080
+ | _Example: Main DB_ | _Production_ | _postgres://prod.example.com_ | _1.9M user records lost_ |
1081
+ | _Example: Staging DB_ | _Staging_ | _postgres://staging.example.com_ | _Test data only, safe to reset_ |
1082
+
1083
+ ## Secrets & Credentials
1084
+
1085
+ | Secret | Location | Notes |
1086
+ |--------|----------|-------|
1087
+ | _Example: DATABASE_URL_ | _.env.local_ | _Production connection \u2014 NEVER commit_ |
1088
+
1089
+ ## Safe to Touch
1090
+
1091
+ - [ ] Staging environment at [URL]
1092
+ - [ ] Test/fixture data in [location]
1093
+ - [ ] Development API keys
1094
+
1095
+ ## NEVER Touch Without Explicit Approval
1096
+
1097
+ - [ ] Production database
1098
+ - [ ] Live API endpoints
1099
+ - [ ] User-facing infrastructure
1100
+ `,
1101
+ "examples/example-brief.md": `# Add User Notifications \u2014 Feature Brief
1102
+
1103
+ > **Date:** 2026-03-15
1104
+ > **Project:** acme-web
1105
+ > **Status:** Specs Ready
1106
+
1107
+ ---
1108
+
1109
+ ## Vision
1110
+
1111
+ Our users have no idea when things happen in their account. A teammate comments on their pull request, a deployment finishes, a billing threshold is hit \u2014 they find out by accident, minutes or hours later. This is the #1 complaint in our last user survey.
1112
+
1113
+ We are building a notification system that delivers real-time and batched notifications across in-app, email, and (later) Slack channels. Users will have fine-grained control over what they receive and how. When this ships, no important event goes unnoticed, and no user gets buried in noise they didn't ask for.
1114
+
1115
+ The system is designed to be extensible \u2014 new event types plug in without touching the notification infrastructure. We start with three event types (PR comments, deploy status, billing alerts) and prove the pattern works before expanding.
1116
+
1117
+ ## User Stories
1118
+
1119
+ - As a developer, I want to see a notification badge in the app when someone comments on my PR so that I can respond quickly
1120
+ - As a team lead, I want to receive an email when a production deployment fails so that I can coordinate the response
1121
+ - As a billing admin, I want to get alerted when usage exceeds 80% of our plan limit so that I can upgrade before service is disrupted
1122
+ - As any user, I want to control which notifications I receive and through which channels so that I am not overwhelmed
1123
+
1124
+ ## Hard Constraints
1125
+
1126
+ - MUST: All notifications go through a single event bus \u2014 no direct coupling between event producers and delivery channels
1127
+ - MUST: Email delivery uses the existing SendGrid integration (do not add a new email provider)
1128
+ - MUST: Respect user preferences before delivering \u2014 never send a notification the user has opted out of
1129
+ - MUST NOT: Store notification content in plaintext in the database \u2014 use the existing encryption-at-rest pattern
1130
+ - MUST NOT: Send more than 50 emails per user per day (batch if necessary)
1131
+
1132
+ ## Out of Scope
1133
+
1134
+ - NOT: Slack/Discord integration (Phase 2)
1135
+ - NOT: Push notifications / mobile (Phase 2)
1136
+ - NOT: Notification templates with rich HTML \u2014 plain text and simple markdown only for now
1137
+ - NOT: Admin dashboard for monitoring notification delivery rates
1138
+ - NOT: Retroactive notifications for events that happened before the feature ships
1139
+
1140
+ ## Decomposition
1141
+
1142
+ | # | Spec Name | Description | Dependencies | Est. Size |
1143
+ |---|-----------|-------------|--------------|-----------|
1144
+ | 1 | add-notification-preferences-api | Create REST endpoints for users to read and update their notification preferences | None | M |
1145
+ | 2 | add-event-bus-infrastructure | Set up the internal event bus that decouples event producers from notification delivery | None | M |
1146
+ | 3 | add-notification-delivery-service | Build the service that consumes events, checks preferences, and dispatches to channels (in-app, email) | Spec 1, Spec 2 | L |
1147
+ | 4 | add-in-app-notification-ui | Add notification bell, dropdown, and badge count to the app header | Spec 3 | M |
1148
+ | 5 | add-email-batching | Implement daily digest batching for email notifications that exceed the per-user threshold | Spec 3 | S |
1149
+
1150
+ ## Execution Strategy
1151
+
1152
+ - [x] Agent teams (parallel teammates within phases, sequential between phases)
1153
+
1154
+ \`\`\`
1155
+ Phase 1: Teammate A -> Spec 1 (preferences API), Teammate B -> Spec 2 (event bus)
1156
+ Phase 2: Teammate A -> Spec 3 (delivery service) \u2014 depends on Phase 1
1157
+ Phase 3: Teammate A -> Spec 4 (UI), Teammate B -> Spec 5 (batching) \u2014 both depend on Spec 3
1158
+ \`\`\`
1159
+
1160
+ ## Success Criteria
1161
+
1162
+ - [ ] User updates notification preferences via API, and subsequent events respect those preferences
1163
+ - [ ] A PR comment event triggers an in-app notification visible in the UI within 2 seconds
1164
+ - [ ] A deploy failure event sends an email to subscribed users via SendGrid
1165
+ - [ ] When email threshold (50/day) is exceeded, remaining notifications are batched into a daily digest
1166
+ - [ ] No regressions in existing PR, deployment, or billing features
1167
+
1168
+ ## External Scenarios
1169
+
1170
+ | Scenario | What It Tests | Pass Criteria |
1171
+ |----------|--------------|---------------|
1172
+ | opt-out-respected | User disables email for deploy events, deploy fails | No email sent, in-app notification still appears |
1173
+ | batch-threshold | Send 51 email-eligible events for one user in a day | 50 individual emails + 1 digest containing the overflow |
1174
+ | preference-persistence | User sets preferences, logs out, logs back in | Preferences are unchanged |
1175
+ `,
1176
+ "examples/example-spec.md": `# Add Notification Preferences API \u2014 Atomic Spec
1177
+
1178
+ > **Parent Brief:** \`docs/briefs/2026-03-15-add-user-notifications.md\`
1179
+ > **Status:** Ready
1180
+ > **Date:** 2026-03-15
1181
+ > **Estimated scope:** 1 session / 5 files / ~250 lines
1182
+
1183
+ ---
1184
+
1185
+ ## What
1186
+
1187
+ Add REST API endpoints that let users read and update their notification preferences. Each user gets a preferences record with per-event-type, per-channel toggles (e.g., "PR comments: in-app=on, email=off"). Preferences default to all-on for new users and are stored encrypted alongside the user profile.
1188
+
1189
+ ## Why
1190
+
1191
+ The notification delivery service (Spec 3) needs to check preferences before dispatching. Without this API, there is no way for users to control what they receive, and we cannot build the delivery pipeline.
1192
+
1193
+ ## Acceptance Criteria
1194
+
1195
+ - [ ] \`GET /api/v1/notifications/preferences\` returns the current user's preferences as JSON
1196
+ - [ ] \`PATCH /api/v1/notifications/preferences\` updates one or more preference fields and returns the updated record
1197
+ - [ ] New users get default preferences (all channels enabled for all event types) on first read
1198
+ - [ ] Preferences are validated \u2014 unknown event types or channels return 400
1199
+ - [ ] Preferences are stored using the existing encryption-at-rest pattern (\`EncryptedJsonColumn\`)
1200
+ - [ ] Endpoint requires authentication (returns 401 for unauthenticated requests)
1201
+ - [ ] Build passes
1202
+ - [ ] Tests pass (unit + integration)
1203
+
1204
+ ## Constraints
1205
+
1206
+ - MUST: Use the existing \`EncryptedJsonColumn\` utility for storage \u2014 do not roll your own encryption pattern
1207
+ - MUST: Follow the existing REST controller pattern in \`src/controllers/\`
1208
+ - MUST NOT: Expose other users' preferences (scope queries to authenticated user only)
1209
+ - SHOULD: Return the full preferences object on PATCH (not just the changed fields), so the frontend can replace state without merging
1210
+
1211
+ ## Affected Files
1212
+
1213
+ | Action | File | What Changes |
1214
+ |--------|------|-------------|
1215
+ | Create | \`src/controllers/notification-preferences.controller.ts\` | New controller with GET and PATCH handlers |
1216
+ | Create | \`src/models/notification-preferences.model.ts\` | Sequelize model with EncryptedJsonColumn for preferences blob |
1217
+ | Create | \`src/migrations/20260315-add-notification-preferences.ts\` | Database migration to create notification_preferences table |
1218
+ | Create | \`tests/controllers/notification-preferences.test.ts\` | Unit and integration tests for both endpoints |
1219
+ | Modify | \`src/routes/index.ts\` | Register the new controller routes |
1220
+
1221
+ ## Approach
1222
+
1223
+ Create a \`NotificationPreferences\` model backed by a single \`notification_preferences\` table with columns: \`id\`, \`user_id\` (unique FK), \`preferences\` (EncryptedJsonColumn), \`created_at\`, \`updated_at\`. The \`preferences\` column stores a JSON blob shaped like \`{ "pr_comment": { "in_app": true, "email": true }, "deploy_status": { ... } }\`.
1224
+
1225
+ The GET endpoint does a find-or-create: if no record exists for the user, create one with defaults and return it. The PATCH endpoint deep-merges the request body into the existing preferences, validates the result against a known schema of event types and channels, and saves.
1226
+
1227
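+ To make the merge semantics concrete, here is a hypothetical PATCH exchange (field names mirror the blob shape above):
+
+ \`\`\`jsonc
+ // PATCH /api/v1/notifications/preferences \u2014 request body
+ { "pr_comment": { "email": false } }
+
+ // 200 response \u2014 full merged object, untouched fields preserved
+ { "pr_comment": { "in_app": true, "email": false }, "deploy_status": { "in_app": true, "email": true } }
+ \`\`\`
+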
+ **Rejected alternative:** Storing preferences as individual rows (one per event-type-channel pair). This would make queries more complex and would require N rows per user instead of 1. The JSON blob approach is simpler and matches how the frontend will consume the data.
1228
+
1229
+ ## Edge Cases
1230
+
1231
+ | Scenario | Expected Behavior |
1232
+ |----------|------------------|
1233
+ | PATCH with empty body \`{}\` | Return 200 with unchanged preferences (no-op) |
1234
+ | PATCH with unknown event type \`{"foo": {"email": true}}\` | Return 400 with validation error listing valid event types |
1235
+ | GET for user with no existing record | Create default preferences, return 200 |
1236
+ | Concurrent PATCH requests | Last-write-wins (optimistic, no locking) \u2014 acceptable for user preferences |
1237
+ `,
1238
+ "scenarios/README.md": `# $SCENARIOS_REPO
1239
+
1240
+ Holdout scenario tests for the main project. These tests run in CI against the
1241
+ built artifact of each PR \u2014 but they live here, in a separate repository, so
1242
+ the coding agent working on the main project cannot see them.
1243
+
1244
+ ---
1245
+
1246
+ ## What is the holdout pattern?
1247
+
1248
+ Think of it like a validation set in machine learning. When you train a model,
1249
+ you keep a slice of your data hidden from the training process. If the model
1250
+ scores well on data it has never seen, you can trust that it has actually
1251
+ learned something \u2014 not just memorized the training examples.
1252
+
1253
+ Scenario tests work the same way. The coding agent writes code and passes
1254
+ internal tests in the main repo. These scenario tests then check whether the
1255
+ result behaves correctly from a real user's perspective, using only the public
1256
+ interface of the built artifact.
1257
+
1258
+ Because the agent cannot read this repository, it cannot game the tests. A
1259
+ passing scenario run means the feature genuinely works.
1260
+
1261
+ ---
1262
+
1263
+ ## Why a separate repository?
1264
+
1265
+ A single repository would expose the tests to the agent. Claude Code reads
1266
+ files in the working directory; if scenario tests lived in the main repo, the
1267
+ agent could (and would) read them when fixing failures, which defeats the
1268
+ purpose.
1269
+
1270
+ A separate repo also means:
1271
+
1272
+ - The test suite can be updated by humans without triggering the autofix loop
1273
+ - Scenarios can reference multiple projects over time
1274
+ - Access controls are independent \u2014 the scenarios repo can be more restricted
1275
+
1276
+ ---
1277
+
1278
+ ## How the CI pipeline works
1279
+
1280
+ \`\`\`
1281
+ Main repo PR opened
1282
+ |
1283
+ v
1284
+ Main repo CI runs (unit + integration tests)
1285
+ |
1286
+ | passes
1287
+ v
1288
+ scenarios-dispatch.yml fires a repository_dispatch event
1289
+ |
1290
+ v
1291
+ This repo: run.yml receives the event
1292
+ |
1293
+ +-- clones main-repo PR branch to ../main-repo
1294
+ |
1295
+ +-- builds the artifact (npm ci && npm run build)
1296
+ |
1297
+ +-- runs: NO_COLOR=1 npx vitest run
1298
+ |
1299
+ +-- captures exit code + output
1300
+ |
1301
+ v
1302
+ Posts PASS / FAIL comment on the originating PR
1303
+ \`\`\`
1304
+
1305
+ The PR author sees the scenario result as a comment. No separate status check
1306
+ is required, though you can add one via the GitHub Checks API if you prefer.
1307
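+
+ A sketch of the sender step in the main repo's \`scenarios-dispatch.yml\` (the action version and secret name here are assumptions \u2014 adapt to your setup, e.g. a GitHub App token instead of a PAT):
+
+ \`\`\`yaml
+ - name: Trigger scenario run
+   uses: peter-evans/repository-dispatch@v3
+   with:
+     token: \${{ secrets.SCENARIOS_DISPATCH_TOKEN }}
+     repository: your-org/$SCENARIOS_REPO
+     event-type: scenario-run
+     client-payload: '{"pr": "\${{ github.event.pull_request.number }}", "ref": "\${{ github.head_ref }}"}'
+ \`\`\`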
+
1308
+ ---
1309
+
1310
+ ## Adding scenarios
1311
+
1312
+ ### Rules
1313
+
1314
+ 1. **Behavioral, not structural.** Test what the tool does, not how it is
1315
+ built internally. Invoke the binary; assert on stdout, exit codes, and
1316
+ filesystem state. Never import from \`../main-repo/src\`.
1317
+
1318
+ 2. **End-to-end.** Each test should represent something a real user would
1319
+ actually do. If you would not put it in a demo or docs example, reconsider
1320
+ whether it belongs here.
1321
+
1322
+ 3. **No source imports.** The entire point of the holdout is that tests cannot
1323
+ see source code. Any \`import\` that reaches into \`../main-repo/src\` breaks
1324
+ the pattern.
1325
+
1326
+ 4. **Independent.** Each test must be able to run in isolation. Use \`beforeEach\`
1327
+ / \`afterEach\` to set up and tear down temp directories. Do not share mutable
1328
+ state between tests.
1329
+
1330
+ 5. **Deterministic.** Avoid network calls, timestamps, or random values in
1331
+ assertions unless the feature under test genuinely involves them.
1332
+
1333
+ ### File layout
1334
+
1335
+ \`\`\`
1336
+ $SCENARIOS_REPO/
1337
+ \u251C\u2500\u2500 example-scenario.test.ts # Starter file \u2014 replace with real scenarios
1338
+ \u251C\u2500\u2500 .github/
1339
+ \u2502 \u2514\u2500\u2500 workflows/
+ \u2502   \u2514\u2500\u2500 run.yml # CI workflow (do not rename)
1340
+ \u251C\u2500\u2500 package.json
1341
+ \u2514\u2500\u2500 README.md
1342
+ \`\`\`
1343
+
1344
+ Add new \`.test.ts\` files at the top level or in subdirectories. Vitest will
1345
+ discover them automatically.
1346
+
1347
+ ### Example structure
1348
+
1349
+ \`\`\`ts
1350
+ import { spawnSync } from "node:child_process";
+ import { existsSync, mkdtempSync } from "node:fs";
+ import { tmpdir } from "node:os";
+ import { join } from "node:path";
+ import { expect, it } from "vitest";
1352
+
1353
+ const CLI = join(__dirname, "..", "main-repo", "dist", "cli.js");
1354
+
1355
+ it("init creates a CLAUDE.md file", () => {
1356
+ const tmp = mkdtempSync(join(tmpdir(), "scenario-"));
1357
+ const { status } = spawnSync("node", [CLI, "init", tmp], { encoding: "utf8" });
1358
+ expect(status).toBe(0);
1359
+ expect(existsSync(join(tmp, "CLAUDE.md"))).toBe(true);
1360
+ });
1361
+ \`\`\`
1362
+
1363
+ ---
1364
+
1365
+ ## Internal tests vs scenario tests
1366
+
1367
+ | | Internal tests (main repo) | Scenario tests (this repo) |
1368
+ |---|---|---|
1369
+ | Location | \`tests/\` in main repo | This repo |
1370
+ | Visible to agent | Yes | No |
1371
+ | What they test | Units, modules, logic | End-to-end behavior |
1372
+ | Import source code | Yes | Never |
1373
+ | Run on every push | Yes | Yes (via dispatch) |
1374
+ | Purpose | Catch regressions fast | Validate real behavior |
1375
+
1376
+ ---
1377
+
1378
+ ## Relationship to Joycraft
1379
+
1380
+ This repository was bootstrapped by \`npx joycraft init --autofix\`. Joycraft
1381
+ manages the \`run.yml\` workflow and keeps it in sync when you run
1382
+ \`npx joycraft upgrade\`. The test files are yours \u2014 Joycraft will never
1383
+ overwrite them.
1384
+
1385
+ If the \`run.yml\` workflow needs updating (e.g., a new version of
1386
+ \`actions/create-github-app-token\`), run \`npx joycraft upgrade\` in this repo
1387
+ and review the diff before applying.
1388
+ `,
1389
+ "scenarios/example-scenario.test.ts": `/**
1390
+ * Example Scenario Test
1391
+ *
1392
+ * This file is a template for scenario tests in your holdout repository.
1393
+ * Scenarios are behavioral, end-to-end tests that run against the BUILT
1394
+ * artifact of your main project \u2014 not its source code.
1395
+ *
1396
+ * The Holdout Pattern
1397
+ * -------------------
1398
+ * These tests live in a SEPARATE repository that your coding agent cannot
1399
+ * see. This is intentional: if the agent could read these tests, it could
1400
+ * write code that passes them without actually solving the problem correctly
1401
+ * (the same way a student who sees the exam beforehand can score well without
1402
+ * understanding the material).
1403
+ *
1404
+ * In CI, the main repo is cloned to ../main-repo (relative to this repo's
1405
+ * checkout). The run.yml workflow builds the artifact there before running
1406
+ * these tests, so \`../main-repo\` is always available and already built.
1407
+ *
1408
+ * How to Write Scenarios
1409
+ * ----------------------
1410
+ * DO:
1411
+ * - Invoke the built binary / entry point via child_process (execSync, spawnSync)
1412
+ * - Test observable behavior: exit codes, stdout/stderr content, file system state
1413
+ * - Write scenarios around things a real user would actually do
1414
+ * - Keep each test fully independent \u2014 no shared state between tests
1415
+ *
1416
+ * DON'T:
1417
+ * - Import from ../main-repo/src \u2014 that defeats the holdout
1418
+ * - Test internal implementation details (function names, module structure)
1419
+ * - Rely on network access unless your tool genuinely requires it
1420
+ * - Share mutable fixtures across tests
1421
+ */
1422
+
1423
+ import { execSync, spawnSync } from "node:child_process";
1424
+ import { existsSync, mkdtempSync, rmSync } from "node:fs";
1425
+ import { tmpdir } from "node:os";
1426
+ import { join } from "node:path";
1427
+ import { afterEach, beforeEach, describe, expect, it } from "vitest";
1428
+
1429
+ // Path to the built CLI entry point in the main repo.
1430
+ // The run.yml workflow clones the main repo to ../main-repo and builds it
1431
+ // before this test file runs, so this path is always valid in CI.
1432
+ const CLI = join(__dirname, "..", "main-repo", "dist", "cli.js");
1433
+
1434
+ // ---------------------------------------------------------------------------
1435
+ // Helpers
1436
+ // ---------------------------------------------------------------------------
1437
+
1438
+ /** Run the CLI and return { stdout, stderr, status }. Never throws. */
1439
+ function runCLI(args: string[], cwd?: string) {
1440
+ const result = spawnSync("node", [CLI, ...args], {
1441
+ encoding: "utf8",
1442
+ cwd: cwd ?? process.cwd(),
1443
+ env: { ...process.env, NO_COLOR: "1" },
1444
+ });
1445
+ return {
1446
+ stdout: result.stdout ?? "",
1447
+ stderr: result.stderr ?? "",
1448
+ status: result.status ?? 1,
1449
+ };
1450
+ }
1451
+
1452
+ // ---------------------------------------------------------------------------
1453
+ // Basic invocation scenarios
1454
+ // ---------------------------------------------------------------------------
1455
+
1456
+ describe("CLI: basic invocation", () => {
1457
+ it("--help prints usage information", () => {
1458
+ const { stdout, status } = runCLI(["--help"]);
1459
+ expect(status).toBe(0);
1460
+ expect(stdout).toContain("Usage:");
1461
+ });
1462
+
1463
+ it("--version returns a semver string", () => {
1464
+ const { stdout, status } = runCLI(["--version"]);
1465
+ expect(status).toBe(0);
1466
+ // Matches x.y.z, x.y.z-alpha.1, etc.
1467
+ expect(stdout.trim()).toMatch(/^\\d+\\.\\d+\\.\\d+/);
1468
+ });
1469
+
1470
+ it("unknown command exits non-zero", () => {
1471
+ const { status } = runCLI(["not-a-real-command"]);
1472
+ expect(status).not.toBe(0);
1473
+ });
1474
+ });
1475
+
1476
+ // ---------------------------------------------------------------------------
1477
+ // Example: filesystem interaction scenario
1478
+ //
1479
+ // This pattern is useful when your CLI creates or modifies files.
1480
+ // Each test gets a fresh temp directory so they can't interfere.
1481
+ // ---------------------------------------------------------------------------
1482
+
1483
+ describe("CLI: init command (example \u2014 replace with your real scenarios)", () => {
1484
+ let tmpDir: string;
1485
+
1486
+ beforeEach(() => {
1487
+ tmpDir = mkdtempSync(join(tmpdir(), "scenarios-"));
1488
+ });
1489
+
1490
+ afterEach(() => {
1491
+ rmSync(tmpDir, { recursive: true, force: true });
1492
+ });
1493
+
1494
+ it("init creates expected output in an empty directory", () => {
1495
+ // This is a placeholder. Replace with whatever your CLI actually does.
1496
+ // The point is: invoke the binary, observe side effects, assert on them.
1497
+ const { status } = runCLI(["init", tmpDir]);
1498
+
1499
+ // Example assertions \u2014 adjust to your tool's actual behavior:
1500
+ // expect(status).toBe(0);
1501
+ // expect(existsSync(join(tmpDir, "CLAUDE.md"))).toBe(true);
1502
+
1503
+ // Remove this line once you've written a real assertion above:
1504
+ expect(typeof status).toBe("number"); // placeholder
1505
+ });
1506
+ });
1507
+ `,
1508
+ "scenarios/package.json": `{
1509
+ "name": "$SCENARIOS_REPO",
1510
+ "version": "0.0.1",
1511
+ "private": true,
1512
+ "type": "module",
1513
+ "scripts": {
1514
+ "test": "vitest run"
1515
+ },
1516
+ "devDependencies": {
1517
+ "vitest": "^3.0.0"
1518
+ }
1519
+ }
1520
+ `,
1521
+ "scenarios/prompts/scenario-agent.md": `You are a QA engineer working in a holdout test repository. You CANNOT access the main repository's source code. Your job is to write or update behavioral scenario tests based on specs that are pushed from the main repo.
1522
+
1523
+ ## What You Have Access To
1524
+
1525
+ - This scenarios repository (test files, \`specs/\` mirror, \`package.json\`)
1526
+ - The incoming spec (provided below)
1527
+ - A list of existing test files and spec mirrors (provided below)
1528
+ - The main repo is available at \`../main-repo\` and is already built \u2014 you can invoke its CLI or entry point via \`execSync\`/\`spawnSync\`, but you MUST NOT import from \`../main-repo/src\`
1529
+
1530
+ ## Triage Decision Tree
1531
+
1532
+ Read the incoming spec carefully. Decide which of these three actions to take:
1533
+
1534
+ ### SKIP \u2014 Do nothing if the spec is:
1535
+ - An internal refactor with no user-facing behavior change (e.g., "extract module", "rename internal type")
1536
+ - CI or dev tooling changes (e.g., "add lint rule", "update GitHub Actions workflow")
1537
+ - Documentation-only changes
1538
+ - Performance improvements with identical observable behavior
1539
+
1540
+ If you SKIP, write a brief comment in the relevant test file (or a new one) explaining why, then stop.
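+
+ For example (the spec filename here is hypothetical):
+
+ \`\`\`typescript
+ // SKIP: specs/extract-config-module.md is an internal refactor with no
+ // user-visible behavior change; existing scenarios already exercise the
+ // affected commands end-to-end.
+ \`\`\`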
1541
+
1542
+ ### NEW \u2014 Create a new test file if the spec describes:
1543
+ - A new command, flag, or subcommand
1544
+ - A new output format or file that gets generated
1545
+ - A new user-facing behavior that doesn't map to any existing test file
1546
+
1547
+ Name the file after the feature area: \`[feature-area].test.ts\`. One feature area per test file.
1548
+
1549
+ ### UPDATE \u2014 Modify an existing test file if the spec:
1550
+ - Changes behavior that is already tested
1551
+ - Adds a flag or option to an existing command
1552
+ - Modifies output format for an existing feature
1553
+
1554
+ Match to the most relevant existing test file by feature area.
1555
+
1556
+ **If you are unsure whether a spec is user-facing, err on the side of writing a test.**
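
The decision tree above can be sketched as a function. This is illustrative only; `userFacing` and `area` are hypothetical inputs standing in for judgments you make by reading the spec:

```typescript
// Illustrative sketch of the SKIP / NEW / UPDATE triage above.
type Action = "SKIP" | "NEW" | "UPDATE";

function triage(
  spec: { userFacing: boolean; area: string },
  existingTests: string[],
): Action {
  if (!spec.userFacing) return "SKIP"; // refactors, CI, docs, perf
  const file = spec.area + ".test.ts"; // one feature area per test file
  return existingTests.includes(file) ? "UPDATE" : "NEW";
}

console.log(triage({ userFacing: true, area: "init" }, ["init.test.ts"])); // UPDATE
```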
1557
+
1558
+ ## Test Writing Rules
1559
+
1560
+ 1. **Behavioral only.** Test observable output \u2014 stdout, stderr, exit codes, files created/modified on disk. Never test internal implementation details or import source modules.
1561
+
1562
+ 2. **Use \`execSync\` or \`spawnSync\`.** Invoke the built binary at \`../main-repo/dist/cli.js\` (or whatever the main repo's entry point is). Check \`../main-repo/package.json\` to find the correct entry point if unsure.
1563
+
1564
+ 3. **Use vitest.** Import \`describe\`, \`it\`, \`expect\` from \`vitest\`. Use \`beforeEach\`/\`afterEach\` for temp directory setup/teardown.
1565
+
1566
+ 4. **Each test is fully independent.** No shared mutable state between tests. Each test that touches the filesystem gets its own temp directory via \`mkdtempSync\`.
1567
+
1568
+ 5. **Assert on realistic user actions.** Write tests that reflect what a real user would do \u2014 not what the implementation happens to do.
1569
+
1570
+ 6. **Never import from the parent repo's source.** If you find yourself writing \`import { ... } from '../main-repo/src/...'\`, stop \u2014 that defeats the holdout.
1571
+
1572
+ ## Test File Template
1573
+
1574
+ \`\`\`typescript
1575
+ import { execSync, spawnSync } from 'node:child_process';
1576
+ import { existsSync, mkdtempSync, rmSync, readFileSync } from 'node:fs';
1577
+ import { tmpdir } from 'node:os';
1578
+ import { join } from 'node:path';
1579
+ import { describe, it, expect, beforeEach, afterEach } from 'vitest';
1580
+
1581
+ const CLI = join(__dirname, '..', 'main-repo', 'dist', 'cli.js');
1582
+
1583
+ function runCLI(args: string[], cwd?: string) {
1584
+ const result = spawnSync('node', [CLI, ...args], {
1585
+ encoding: 'utf8',
1586
+ cwd: cwd ?? process.cwd(),
1587
+ env: { ...process.env, NO_COLOR: '1' },
1588
+ });
1589
+ return {
1590
+ stdout: result.stdout ?? '',
1591
+ stderr: result.stderr ?? '',
1592
+ status: result.status ?? 1,
1593
+ };
1594
+ }
1595
+
1596
+ describe('[feature area]: [behavior being tested]', () => {
1597
+ let tmpDir: string;
1598
+
1599
+ beforeEach(() => {
1600
+ tmpDir = mkdtempSync(join(tmpdir(), 'scenarios-'));
1601
+ });
1602
+
1603
+ afterEach(() => {
1604
+ rmSync(tmpDir, { recursive: true, force: true });
1605
+ });
1606
+
1607
+ it('[specific observable behavior]', () => {
1608
+ const { stdout, status } = runCLI(['command', 'args'], tmpDir);
1609
+ expect(status).toBe(0);
1610
+ expect(stdout).toContain('expected output');
1611
+ });
1612
+ });
1613
+ \`\`\`
1614
+
1615
+ ## Checklist Before Committing
1616
+
1617
+ - [ ] Decision: SKIP / NEW / UPDATE (and why)
1618
+ - [ ] Tests assert on observable behavior, not implementation
1619
+ - [ ] No imports from \`../main-repo/src\`
1620
+ - [ ] Each test has its own temp directory if it touches the filesystem
1621
+ - [ ] File is named after the feature area, not the spec
1622
+ `,
1623
+ "scenarios/workflows/generate.yml": `# Scenario Generation Workflow
1624
+ #
1625
+ # Triggered by a \`spec-pushed\` repository_dispatch event sent from the main
1626
+ # project when a spec is added or modified on main. A scenario agent triages
1627
+ # the spec and writes or updates holdout tests in this repo.
1628
+ #
1629
+ # After the agent commits changes, fires \`scenarios-updated\` back to the main
1630
+ # repo so that any open PRs are re-tested with the new/updated scenarios.
1631
+ #
1632
+ # Prerequisites:
1633
+ # - ANTHROPIC_API_KEY secret: Anthropic API key for Claude Code
1634
+ # - JOYCRAFT_APP_PRIVATE_KEY secret: GitHub App private key (.pem)
1635
+ # - $JOYCRAFT_APP_ID is replaced with the actual App ID number at install time
1636
+
1637
+ name: Generate Scenarios
1638
+
1639
+ on:
1640
+ repository_dispatch:
1641
+ types: [spec-pushed]
1642
+
1643
+ jobs:
1644
+ generate:
1645
+ name: Run scenario agent
1646
+ runs-on: ubuntu-latest
1647
+
1648
+ steps:
1649
+ # \u2500\u2500 1. Check out the scenarios repo \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
1650
+ - name: Checkout scenarios repo
1651
+ uses: actions/checkout@v4
1652
+
1653
+ # \u2500\u2500 2. Save incoming spec to local mirror \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
1654
+ # The agent reads this file to understand what changed.
1655
+ - name: Save spec to mirror
1656
+ run: |
1657
+ mkdir -p specs
1658
+ cat > "specs/\${{ github.event.client_payload.spec_filename }}" << 'SPEC_EOF'
1659
+ \${{ github.event.client_payload.spec_content }}
1660
+ SPEC_EOF
1661
+ echo "Saved \${{ github.event.client_payload.spec_filename }} to specs/"
1662
+
1663
+ # \u2500\u2500 3. Gather context for the agent \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
1664
+ # Bounded context: filenames only (not file contents) to stay within
1665
+ # token limits. The agent uses these lists to decide whether to create
1666
+ # a new test file or update an existing one.
1667
+ - name: Gather context
1668
+ id: context
1669
+ run: |
1670
+ EXISTING_TESTS=$(find . -name "*.test.ts" -not -path "./.git/*" \\
1671
+ | sed 's|^\\./||' | sort | tr '\\n' ',' | sed 's/,$//')
1672
+ EXISTING_SPECS=$(find specs/ -name "*.md" 2>/dev/null \\
1673
+ | sed 's|^specs/||' | sort | tr '\\n' ',' | sed 's/,$//')
1674
+
1675
+ echo "existing_tests=$EXISTING_TESTS" >> "$GITHUB_OUTPUT"
1676
+ echo "existing_specs=$EXISTING_SPECS" >> "$GITHUB_OUTPUT"
1677
+ echo "Existing test files: $EXISTING_TESTS"
1678
+ echo "Existing spec mirrors: $EXISTING_SPECS"
1679
+
1680
+ # \u2500\u2500 4. Set up Node.js \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
1681
+ - name: Set up Node.js
1682
+ uses: actions/setup-node@v4
1683
+ with:
1684
+ node-version: "20"
1685
+
1686
+ # \u2500\u2500 5. Install Claude Code CLI \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
1687
+ - name: Install Claude Code
1688
+ run: npm install -g @anthropic-ai/claude-code
1689
+
1690
+ # \u2500\u2500 6. Run scenario agent \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
1691
+ # - Uses \`claude -p\` (prompt mode) for non-interactive execution.
1692
+ # - No --model flag: the environment's default model is used.
1693
+ # - --dangerously-skip-permissions lets Claude write files without prompts.
1694
+ # - --max-turns 20 caps the agentic loop so it can't run indefinitely.
1695
+ - name: Run scenario agent
1696
+ id: agent
1697
+ env:
1698
+ ANTHROPIC_API_KEY: \${{ secrets.ANTHROPIC_API_KEY }}
1699
+ run: |
1700
+ PROMPT=$(cat .claude/prompts/scenario-agent.md 2>/dev/null || cat prompts/scenario-agent.md)
1701
+
1702
+ claude -p \\
1703
+ --dangerously-skip-permissions \\
1704
+ --max-turns 20 \\
1705
+ "\${PROMPT}
1706
+
1707
+ ---
1708
+
1709
+ ## Incoming Spec
1710
+
1711
+ Filename: \${{ github.event.client_payload.spec_filename }}
1712
+
1713
+ Content:
1714
+ $(cat 'specs/\${{ github.event.client_payload.spec_filename }}')
1715
+
1716
+ ---
1717
+
1718
+ ## Context
1719
+
1720
+ Existing test files in this repo: \${{ steps.context.outputs.existing_tests }}
1721
+ Existing spec mirrors: \${{ steps.context.outputs.existing_specs }}"
1722
+
1723
+ # \u2500\u2500 7. Commit any changes the agent made \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
1724
+ - name: Commit scenario changes
1725
+ id: commit
1726
+ run: |
1727
+ git config user.name "Joycraft Scenario Agent"
1728
+ git config user.email "joycraft-scenarios@users.noreply.github.com"
1729
+
1730
+ git add -A
1731
+
1732
+ if git diff --cached --quiet; then
1733
+ echo "No changes to commit \u2014 spec triaged as no-op."
1734
+ echo "committed=false" >> "$GITHUB_OUTPUT"
1735
+ exit 0
1736
+ fi
1737
+
1738
+ git commit -m "scenarios: update tests for \${{ github.event.client_payload.spec_filename }}"
1739
+ git push
1740
+ echo "committed=true" >> "$GITHUB_OUTPUT"
1741
+
1742
+ # \u2500\u2500 8. Generate GitHub App token for cross-repo dispatch \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
1743
+ # Only needed if the agent committed changes (otherwise nothing to re-run).
1744
+ - name: Generate GitHub App token
1745
+ id: app-token
1746
+ if: steps.commit.outputs.committed == 'true'
1747
+ uses: actions/create-github-app-token@v1
1748
+ with:
1749
+ app-id: $JOYCRAFT_APP_ID
1750
+ private-key: \${{ secrets.JOYCRAFT_APP_PRIVATE_KEY }}
1751
+ repositories: \${{ github.event.client_payload.repo }}
1752
+
1753
+ # \u2500\u2500 9. Notify main repo that scenarios were updated \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
1754
+ # Fires \`scenarios-updated\` so the main repo's re-run workflow can
1755
+ # trigger scenario runs against any open PRs that may now be affected.
1756
+ - name: Dispatch scenarios-updated to main repo
1757
+ if: steps.commit.outputs.committed == 'true'
1758
+ env:
1759
+ GH_TOKEN: \${{ steps.app-token.outputs.token }}
1760
+ run: |
1761
+ REPO="\${{ github.event.client_payload.repo }}"
1762
+ REPO_OWNER="\${REPO%%/*}"
1763
+ REPO_NAME="\${REPO##*/}"
1764
+
1765
+ gh api "repos/\${REPO_OWNER}/\${REPO_NAME}/dispatches" \\
1766
+ -f event_type=scenarios-updated \\
1767
+ -f "client_payload[spec_filename]=\${{ github.event.client_payload.spec_filename }}" \\
1768
+ -f "client_payload[scenarios_repo]=\${{ github.repository }}"
1769
+
1770
+ echo "Dispatched scenarios-updated to \${REPO}"
1771
+ `,
1772
+ "scenarios/workflows/run.yml": `# Scenarios Run Workflow
1773
+ #
1774
+ # Triggered by a \`repository_dispatch\` event (type: run-scenarios) sent from
1775
+ # the main project's CI pipeline after a PR passes its internal tests.
1776
+ #
1777
+ # This workflow:
1778
+ # 1. Clones the main repo's PR branch to ../main-repo
1779
+ # 2. Builds the artifact
1780
+ # 3. Runs the scenario tests in this repo
1781
+ # 4. Posts a PASS or FAIL comment on the originating PR
1782
+ #
1783
+ # Prerequisites:
1784
+ # - A GitHub App ("Joycraft Autofix" or equivalent) installed on BOTH repos.
1785
+ # $JOYCRAFT_APP_ID is replaced with the actual App ID number at install time.
1786
+ # JOYCRAFT_APP_PRIVATE_KEY must be stored as a repository secret in this repo.
1787
+ # - This scenarios repo must be added to the App's repository access list.
1788
+
1789
+ name: Run Scenarios
1790
+
1791
+ on:
1792
+ repository_dispatch:
1793
+ types: [run-scenarios]
1794
+
1795
+ jobs:
1796
+ run-scenarios:
1797
+ name: Run holdout scenario tests
1798
+ runs-on: ubuntu-latest
1799
+
1800
+ steps:
1801
+ # \u2500\u2500 1. Check out the scenarios repo \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
1802
+ - name: Checkout scenarios repo
1803
+ uses: actions/checkout@v4
1804
+ with:
1805
+ path: scenarios
1806
+
1807
+ # \u2500\u2500 2. Mint a GitHub App token \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
1808
+ # $JOYCRAFT_APP_ID is replaced with the numeric App ID at install time
1809
+ # (e.g., app-id: 3180156). It is NOT a secret \u2014 App IDs are public.
1810
+ - name: Generate GitHub App token
1811
+ id: app-token
1812
+ uses: actions/create-github-app-token@v1
1813
+ with:
1814
+ app-id: $JOYCRAFT_APP_ID
1815
+ private-key: \${{ secrets.JOYCRAFT_APP_PRIVATE_KEY }}
1816
+ repositories: \${{ github.event.client_payload.repo }}
1817
+
1818
+ # \u2500\u2500 3. Clone the main repo's PR branch \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
1819
+ # Cloned to ./main-repo so scenario tests can reference ../main-repo
1820
+ # relative to the scenarios/ checkout.
1821
+ - name: Clone main repo PR branch
1822
+ env:
1823
+ GH_TOKEN: \${{ steps.app-token.outputs.token }}
1824
+ run: |
1825
+ git clone \\
1826
+ --branch \${{ github.event.client_payload.branch }} \\
1827
+ --depth 1 \\
1828
+ https://x-access-token:\${GH_TOKEN}@github.com/\${{ github.event.client_payload.repo }}.git \\
1829
+ main-repo
1830
+
1831
+ # \u2500\u2500 4. Set up Node.js \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
1832
+ - name: Set up Node.js
1833
+ uses: actions/setup-node@v4
1834
+ with:
1835
+ node-version: "20"
1836
+ cache: "npm"
1837
+ cache-dependency-path: main-repo/package-lock.json
1838
+
1839
+ # \u2500\u2500 5. Build the main repo artifact \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
1840
+ - name: Build main repo
1841
+ working-directory: main-repo
1842
+ run: npm ci && npm run build
1843
+
1844
+ # \u2500\u2500 6. Install scenario test dependencies \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
1845
+ - name: Install scenario dependencies
1846
+ working-directory: scenarios
1847
+ run: npm ci
1848
+
1849
+ # \u2500\u2500 7. Run scenario tests \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
1850
+ # set +e \u2014 don't abort on non-zero exit; we capture it manually
1851
+ # set -o pipefail \u2014 propagate failures through pipes (for tee)
1852
+ # NO_COLOR=1 \u2014 ask vitest not to emit color codes in the first place
+ # any ANSI codes that slip through are stripped via sed as a belt-and-suspenders measure
1854
+ - name: Run scenario tests
1855
+ id: scenarios
1856
+ working-directory: scenarios
1857
+ run: |
1858
+ set +e
1859
+ set -o pipefail
1860
+ NO_COLOR=1 npx vitest run 2>&1 \\
1861
+ | sed 's/\\x1b\\[[0-9;]*m//g' \\
1862
+ | tee test-output.txt
1863
+ VITEST_EXIT=$?
1864
+ echo "exit_code=$VITEST_EXIT" >> "$GITHUB_OUTPUT"
1865
+ exit $VITEST_EXIT
1866
+
1867
+ # \u2500\u2500 8. Post PASS or FAIL comment on the originating PR \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
1868
+ # Always runs so the PR author always gets feedback.
1869
+ - name: Post result comment on PR
1870
+ if: always()
1871
+ env:
1872
+ GH_TOKEN: \${{ steps.app-token.outputs.token }}
1873
+ PR_NUMBER: \${{ github.event.client_payload.pr_number }}
1874
+ MAIN_REPO: \${{ github.event.client_payload.repo }}
1875
+ VITEST_EXIT: \${{ steps.scenarios.outputs.exit_code }}
1876
+ run: |
1877
+ # Read test output (cap at 100 lines to keep the comment manageable)
1878
+ OUTPUT=$(head -100 scenarios/test-output.txt 2>/dev/null || echo "(no output captured)")
1879
+
1880
+ if [ "$VITEST_EXIT" = "0" ]; then
1881
+ STATUS_LINE="**Scenario tests: PASS**"
1882
+ else
1883
+ STATUS_LINE="**Scenario tests: FAIL** (exit code: $VITEST_EXIT)"
1884
+ fi
1885
+
1886
+ BODY="\${STATUS_LINE}
1887
+
1888
+ <details>
1889
+ <summary>Test output</summary>
1890
+
1891
+ \\\`\\\`\\\`
1892
+ \${OUTPUT}
1893
+ \\\`\\\`\\\`
1894
+
1895
+ </details>
1896
+
1897
+ Run triggered by commit \\\`\${{ github.event.client_payload.sha }}\\\`."
1898
+
1899
+ gh api "repos/\${MAIN_REPO}/issues/\${PR_NUMBER}/comments" \\
1900
+ -f body="$BODY"
1901
+ `,
1902
+ "workflows/autofix.yml": `# Autofix Workflow
1903
+ #
1904
+ # Triggered when CI fails on a PR. Uses Claude Code to attempt an automated fix,
1905
+ # then pushes a commit and re-triggers CI. Limits to 3 autofix attempts per PR
1906
+ # before escalating to human review.
1907
+ #
1908
+ # Prerequisites:
1909
+ # - A GitHub App called "Joycraft Autofix" (or equivalent) installed on the repo.
1910
+ # Its credentials must be stored as repository secrets:
1911
+ # JOYCRAFT_APP_ID \u2014 the App's numeric ID
1912
+ # JOYCRAFT_APP_PRIVATE_KEY \u2014 the App's PEM private key
1913
+ # - ANTHROPIC_API_KEY secret for Claude Code
1914
+
1915
+ name: Autofix
1916
+
1917
+ on:
1918
+ workflow_run:
1919
+ # Replace with the exact name of your CI workflow
1920
+ workflows: ["CI"]
1921
+ types: [completed]
1922
+
1923
+ # One autofix run per PR at a time \u2014 cancel in-flight runs for the same PR
1924
+ concurrency:
1925
+ group: autofix-pr-\${{ github.event.workflow_run.pull_requests[0].number }}
1926
+ cancel-in-progress: true
1927
+
1928
+ jobs:
1929
+ autofix:
1930
+ name: Attempt automated fix
1931
+ runs-on: ubuntu-latest
1932
+
1933
+ # Only run when CI failed and the triggering workflow was on a PR
1934
+ if: |
1935
+ github.event.workflow_run.conclusion == 'failure' &&
1936
+ github.event.workflow_run.pull_requests[0] != null
1937
+
1938
+ steps:
1939
+ # \u2500\u2500 1. Mint a short-lived GitHub App token \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
1940
+ # Commits pushed with the default GITHUB_TOKEN do not trigger other
+ # workflows (GitHub's anti-recursion protection). Pushing with a dedicated
+ # App identity means the fix commit re-triggers CI like any normal push.
1942
+ - name: Generate GitHub App token
1943
+ id: app-token
1944
+ uses: actions/create-github-app-token@v1
1945
+ with:
1946
+ # $JOYCRAFT_APP_ID is replaced with the actual App ID number at install time
1947
+ app-id: $JOYCRAFT_APP_ID
1948
+ private-key: \${{ secrets.JOYCRAFT_APP_PRIVATE_KEY }}
1949
+
1950
+ # \u2500\u2500 2. Check out the PR branch \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
1951
+ # We check out the exact branch (not a merge ref) so that any commit we
1952
+ # push lands directly on the PR branch.
1953
+ - name: Checkout PR branch
1954
+ uses: actions/checkout@v4
1955
+ with:
1956
+ token: \${{ steps.app-token.outputs.token }}
1957
+ ref: \${{ github.event.workflow_run.pull_requests[0].head.ref }}
1958
+ fetch-depth: 0
1959
+
1960
+ # \u2500\u2500 3. Count previous autofix attempts \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
1961
+ # Count "autofix:" commits in the log. If we have already made 3 attempts
1962
+ # on this PR, stop and ask a human to review instead.
1963
+ - name: Check autofix iteration count
1964
+ id: iteration
1965
+ run: |
1966
+ BASE_REF="\${{ github.event.workflow_run.pull_requests[0].base.ref }}"
+ # Only count autofix commits unique to this PR branch, not base history.
+ COUNT=$(git log --oneline "origin/\${BASE_REF}..HEAD" | grep -c "autofix:" || true)
1967
+ echo "count=$COUNT" >> "$GITHUB_OUTPUT"
1968
+ echo "Autofix attempts so far: $COUNT"
1969
+
1970
+ # \u2500\u2500 4. Post "human review needed" and exit if limit reached \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
1971
+ - name: Post human-review comment and exit
1972
+ if: steps.iteration.outputs.count >= 3
1973
+ env:
1974
+ GH_TOKEN: \${{ steps.app-token.outputs.token }}
1975
+ PR_NUMBER: \${{ github.event.workflow_run.pull_requests[0].number }}
1976
+ run: |
1977
+ gh pr comment "$PR_NUMBER" \\
1978
+ --body "**Autofix limit reached (3 attempts).** Please review manually \u2014 Claude was unable to resolve the CI failures automatically."
1979
+ echo "Max iterations reached. Failing this step so the remaining autofix steps are skipped."
+ exit 1
1981
+
1982
+ # \u2500\u2500 5. Fetch the CI failure logs \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
1983
+ # Download logs from the failed workflow run so Claude has concrete
1984
+ # failure context to work from. ANSI escape codes are stripped so the
1985
+ # logs are readable as plain text.
1986
+ - name: Fetch CI failure logs
1987
+ id: logs
1988
+ env:
1989
+ GH_TOKEN: \${{ github.token }}
1990
+ RUN_ID: \${{ github.event.workflow_run.id }}
1991
+ run: |
1992
+ gh run view "$RUN_ID" --log-failed 2>&1 \\
1993
+ | sed 's/\\x1b\\[[0-9;]*m//g' \\
1994
+ > /tmp/ci-failure.log
1995
+ echo "=== CI failure log (first 200 lines) ==="
1996
+ head -200 /tmp/ci-failure.log
1997
+
1998
+ # \u2500\u2500 6. Set up Node.js (adjust version to match your project) \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
1999
+ - name: Set up Node.js
2000
+ uses: actions/setup-node@v4
2001
+ with:
2002
+ node-version: "20"
2003
+ cache: "npm"
2004
+
2005
+ # \u2500\u2500 7. Install project dependencies \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2006
+ - name: Install dependencies
2007
+ run: npm ci
2008
+
2009
+ # \u2500\u2500 8. Install Claude Code CLI \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2010
+ - name: Install Claude Code
2011
+ run: npm install -g @anthropic-ai/claude-code
2012
+
2013
+ # \u2500\u2500 9. Run Claude Code to fix the failure \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2014
+ # - Uses \`claude -p\` (prompt mode) so it runs non-interactively.
2015
+ # - No --model flag: the environment's default model is used.
2016
+ # - --dangerously-skip-permissions lets Claude edit files without prompts.
2017
+ # - --max-turns 20 caps the agentic loop so it can't run indefinitely.
2018
+ # - set +e captures the exit code without aborting the step immediately.
2019
+ # - set -o pipefail ensures piped commands propagate failures correctly.
2020
+ - name: Run Claude Code autofix
2021
+ id: claude
2022
+ env:
2023
+ ANTHROPIC_API_KEY: \${{ secrets.ANTHROPIC_API_KEY }}
2024
+ GH_TOKEN: \${{ steps.app-token.outputs.token }}
2025
+ run: |
2026
+ set +e
2027
+ set -o pipefail
2028
+
2029
+ FAILURE_LOG=$(cat /tmp/ci-failure.log)
2030
+
2031
+ claude -p \\
2032
+ --dangerously-skip-permissions \\
2033
+ --max-turns 20 \\
2034
+ "CI is failing on this PR. Here are the failure logs:
2035
+
2036
+ \${FAILURE_LOG}
2037
+
2038
+ Please investigate the root cause, fix the code, and make sure the tests pass.
2039
+ Do not modify workflow files. Focus only on source code and test files.
2040
+ After making changes, run the test suite to verify the fix works." \\
2041
+ 2>&1 | sed 's/\\x1b\\[[0-9;]*m//g' | tee /tmp/claude-output.log
2042
+
2043
+ CLAUDE_EXIT=$?
2044
+ echo "exit_code=$CLAUDE_EXIT" >> "$GITHUB_OUTPUT"
2045
+ exit $CLAUDE_EXIT
2046
+
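The exit-code plumbing in this step (`set +e`, `set -o pipefail`, `$?` after a `tee` pipeline) can be exercised on its own. A small sketch under bash, with a stand-in command that exits 7 in place of `claude`:

```shell
set +e            # a failing pipeline must not abort the script
set -o pipefail   # pipeline exit code = the failing command's, not tee's

# Stand-in for `claude -p ... | sed ... | tee ...`: the left side fails,
# the right side (tee) succeeds.
sh -c 'echo "agent output"; exit 7' 2>&1 | tee /tmp/demo-output.log
DEMO_EXIT=$?

echo "captured exit code: $DEMO_EXIT"   # 7: pipefail surfaces sh's failure
```

Without `pipefail`, `$?` would be `tee`'s exit code (0) and a failed autofix run would be reported as a success.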
2047
+ # \u2500\u2500 10. Commit and push any changes Claude made \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2048
+ # If Claude modified files, commit them with an "autofix:" prefix so the
2049
+ # iteration counter in step 3 can find them on future runs.
2050
+ - name: Commit and push autofix changes
2051
+ if: steps.claude.outputs.exit_code == '0'
2052
+ env:
2053
+ GH_TOKEN: \${{ steps.app-token.outputs.token }}
2054
+ run: |
2055
+ git config user.name "Joycraft Autofix"
2056
+ git config user.email "autofix@joycraft.dev"
2057
+
2058
+ git add -A
2059
+
2060
+ if git diff --cached --quiet; then
2061
+ echo "No changes to commit \u2014 Claude made no file modifications."
2062
+ exit 0
2063
+ fi
2064
+
2065
+ ITERATION=\${{ steps.iteration.outputs.count }}
2066
+ NEXT=$(( ITERATION + 1 ))
2067
+
2068
+ git commit -m "autofix: attempt $NEXT \u2014 fix CI failures [skip autofix]"
2069
+ git push
2070
+
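The `git diff --cached --quiet` guard above exits 0 when the index is clean and 1 when changes are staged, which is what makes the early `exit 0` safe. A standalone sketch in a throwaway repository (paths and messages are illustrative):

```shell
REPO=$(mktemp -d)
cd "$REPO"
git init -q
git config user.email demo@example.com
git config user.name demo
git commit -q --allow-empty -m "init"

git add -A
if git diff --cached --quiet; then
  echo "index clean: nothing to commit"
fi

echo "fix" > patch.txt
git add -A
if ! git diff --cached --quiet; then
  echo "staged changes present: commit and push"
fi
```

Both branches print here: first with an empty index, then after staging `patch.txt`.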
2071
+ # \u2500\u2500 11. Post a summary comment on the PR \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2072
+ # Always post a comment so the PR author knows what happened.
2073
+ - name: Post result comment
2074
+ if: always()
2075
+ env:
2076
+ GH_TOKEN: \${{ steps.app-token.outputs.token }}
2077
+ PR_NUMBER: \${{ github.event.workflow_run.pull_requests[0].number }}
2078
+ CLAUDE_EXIT: \${{ steps.claude.outputs.exit_code }}
2079
+ run: |
2080
+ if [ "$CLAUDE_EXIT" = "0" ]; then
2081
+ BODY="**Autofix pushed a fix.** CI has been re-triggered. If it still fails, another autofix attempt will run (up to 3 total)."
2082
+ else
2083
+ BODY="**Autofix ran but could not produce a clean fix** (exit code: $CLAUDE_EXIT). Please review the logs and fix manually."
2084
+ fi
2085
+
2086
+ gh pr comment "$PR_NUMBER" --body "$BODY"
2087
+ `,
2088
+ "workflows/scenarios-dispatch.yml": `# Scenarios Dispatch Workflow
2089
+ #
2090
+ # Triggered when CI passes on a PR. Fires a \`repository_dispatch\` event to a
2091
+ # separate scenarios repository so that integration / end-to-end scenario tests
2092
+ # can run against the PR's code without living in this repo.
2093
+ #
2094
+ # Prerequisites:
2095
+ # - JOYCRAFT_APP_PRIVATE_KEY secret: GitHub App private key (.pem)
+ # - $JOYCRAFT_APP_ID is replaced with the actual App ID number at install time
2096
+ # - $SCENARIOS_REPO is replaced with the actual repo name at install time
2097
+
2098
+ name: Scenarios Dispatch
2099
+
2100
+ on:
2101
+ workflow_run:
2102
+ # Replace with the exact name of your CI workflow
2103
+ workflows: ["CI"]
2104
+ types: [completed]
2105
+
2106
+ jobs:
2107
+ dispatch:
2108
+ name: Fire scenarios dispatch
2109
+ runs-on: ubuntu-latest
2110
+
2111
+ # Only run when CI succeeded and the triggering workflow was on a PR
2112
+ if: |
2113
+ github.event.workflow_run.conclusion == 'success' &&
2114
+ github.event.workflow_run.pull_requests[0] != null
2115
+
2116
+ steps:
2117
+ # \u2500\u2500 1. Generate GitHub App token for cross-repo dispatch \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2118
+ - name: Generate GitHub App token
2119
+ id: app-token
2120
+ uses: actions/create-github-app-token@v1
2121
+ with:
2122
+ app-id: $JOYCRAFT_APP_ID
2123
+ private-key: \${{ secrets.JOYCRAFT_APP_PRIVATE_KEY }}
2124
+ repositories: $SCENARIOS_REPO
2125
+
2126
+ # \u2500\u2500 2. Fire repository_dispatch to the scenarios repo \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2127
+ # Sends a \`run-scenarios\` event carrying enough context for the scenarios
2128
+ # repo to check out the correct branch/SHA and know which PR triggered it.
2129
+ # $SCENARIOS_REPO is replaced with the actual repo name at install time.
2130
+ - name: Dispatch run-scenarios event
2131
+ env:
2132
+ GH_TOKEN: \${{ steps.app-token.outputs.token }}
2133
+ run: |
2134
+ PR_NUMBER=\${{ github.event.workflow_run.pull_requests[0].number }}
2135
+ BRANCH=\${{ github.event.workflow_run.pull_requests[0].head.ref }}
2136
+ SHA=\${{ github.event.workflow_run.head_sha }}
2137
+
2138
+ gh api repos/\${{ github.repository_owner }}/$SCENARIOS_REPO/dispatches \\
2139
+ -f event_type=run-scenarios \\
2140
+ -f "client_payload[pr_number]=$PR_NUMBER" \\
2141
+ -f "client_payload[branch]=$BRANCH" \\
2142
+ -f "client_payload[sha]=$SHA" \\
2143
+ -f "client_payload[repo]=\${{ github.repository }}"
2144
+
2145
+ echo "Dispatched run-scenarios to $SCENARIOS_REPO for PR #$PR_NUMBER"
2146
+ `,
2147
+ "workflows/scenarios-rerun.yml": `# Scenarios Re-run Workflow
2148
+ #
2149
+ # Triggered when the scenarios repo reports that it has updated its tests
2150
+ # (type: scenarios-updated). Finds all open PRs and fires a \`run-scenarios\`
2151
+ # dispatch to the scenarios repo for each one, so that newly generated or
2152
+ # updated tests are exercised against in-flight PR branches.
2153
+ #
2154
+ # This handles the race condition where a PR's implementation completes before
2155
+ # the scenario agent has finished writing its holdout tests.
2156
+ #
2157
+ # Prerequisites:
2158
+ # - JOYCRAFT_APP_PRIVATE_KEY secret: GitHub App private key (.pem)
2159
+ # - $JOYCRAFT_APP_ID is replaced with the actual App ID number at install time
2160
+ # - $SCENARIOS_REPO is replaced with the actual scenarios repo name at install time
2161
+
2162
+ name: Scenarios Re-run
2163
+
2164
+ on:
2165
+ repository_dispatch:
2166
+ types: [scenarios-updated]
2167
+
2168
+ jobs:
2169
+ rerun:
2170
+ name: Re-run scenarios against open PRs
2171
+ runs-on: ubuntu-latest
2172
+
2173
+ steps:
2174
+ # \u2500\u2500 1. Generate GitHub App token for cross-repo dispatch \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2175
+ - name: Generate GitHub App token
2176
+ id: app-token
2177
+ uses: actions/create-github-app-token@v1
2178
+ with:
2179
+ app-id: $JOYCRAFT_APP_ID
2180
+ private-key: \${{ secrets.JOYCRAFT_APP_PRIVATE_KEY }}
2181
+ repositories: $SCENARIOS_REPO
2182
+
2183
+ # \u2500\u2500 2. List open PRs and dispatch run-scenarios for each \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2184
+ # If there are no open PRs, exits cleanly \u2014 nothing to do.
2185
+ - name: Dispatch run-scenarios for each open PR
2186
+ env:
2187
+ GH_TOKEN: \${{ steps.app-token.outputs.token }}
2188
+ run: |
2189
+ OPEN_PRS=$(gh api repos/\${{ github.repository }}/pulls \\
2190
+ --jq '.[] | "\\(.number) \\(.head.ref) \\(.head.sha)"')
2191
+
2192
+ if [ -z "$OPEN_PRS" ]; then
2193
+ echo "No open PRs \u2014 nothing to re-run."
2194
+ exit 0
2195
+ fi
2196
+
2197
+ while IFS=' ' read -r PR_NUMBER BRANCH SHA; do
2198
+ [ -z "$PR_NUMBER" ] && continue
2199
+
2200
+ echo "Dispatching run-scenarios for PR #$PR_NUMBER (branch: $BRANCH, sha: $SHA)"
2201
+
2202
+ gh api repos/\${{ github.repository_owner }}/$SCENARIOS_REPO/dispatches \\
2203
+ -f event_type=run-scenarios \\
2204
+ -f "client_payload[pr_number]=$PR_NUMBER" \\
2205
+ -f "client_payload[branch]=$BRANCH" \\
2206
+ -f "client_payload[sha]=$SHA" \\
2207
+ -f "client_payload[repo]=\${{ github.repository }}"
2208
+
2209
+ done <<< "$OPEN_PRS"
2210
+ `,
2211
+ "workflows/spec-dispatch.yml": `# Spec Dispatch Workflow
2212
+ #
2213
+ # Triggered when specs are pushed to main. For each added or modified spec,
2214
+ # fires a \`spec-pushed\` repository_dispatch event to the scenarios repo so
2215
+ # that a scenario agent can triage the spec and write/update holdout tests.
2216
+ #
2217
+ # Prerequisites:
2218
+ # - JOYCRAFT_APP_PRIVATE_KEY secret: GitHub App private key (.pem)
2219
+ # - $JOYCRAFT_APP_ID is replaced with the actual App ID number at install time
2220
+ # - $SCENARIOS_REPO is replaced with the actual scenarios repo name at install time
2221
+
2222
+ name: Spec Dispatch
2223
+
2224
+ on:
2225
+ push:
2226
+ branches: [main]
2227
+ paths:
2228
+ - "docs/specs/**"
2229
+
2230
+ jobs:
2231
+ dispatch:
2232
+ name: Dispatch changed specs to scenarios repo
2233
+ runs-on: ubuntu-latest
2234
+
2235
+ steps:
2236
+ # \u2500\u2500 1. Check out with depth 2 to enable HEAD~1 diff \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2237
+ - name: Checkout
2238
+ uses: actions/checkout@v4
2239
+ with:
2240
+ fetch-depth: 2
2241
+
2242
+ # \u2500\u2500 2. Find added or modified spec files \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2243
+ # --diff-filter=AM: Added or Modified only \u2014 ignore deletions.
2244
+ - name: Find changed specs
2245
+ id: changed
2246
+ run: |
2247
+ FILES=$(git diff --name-only --diff-filter=AM HEAD~1 HEAD -- 'docs/specs/*.md')
2248
+ echo "files<<EOF" >> "$GITHUB_OUTPUT"
2249
+ echo "$FILES" >> "$GITHUB_OUTPUT"
2250
+ echo "EOF" >> "$GITHUB_OUTPUT"
2251
+ echo "Changed specs: $FILES"
2252
+
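The `files<<EOF` lines follow GitHub Actions' delimiter syntax for multi-line step outputs; a plain `files=$FILES` write would truncate the list at the first newline. A sketch of what ends up in the output file, using a scratch file in place of the runner-provided `$GITHUB_OUTPUT`:

```shell
GITHUB_OUTPUT=$(mktemp)   # the runner normally provides this path

FILES="docs/specs/spec-a.md
docs/specs/spec-b.md"

# Same three writes as the step above.
echo "files<<EOF" >> "$GITHUB_OUTPUT"
echo "$FILES" >> "$GITHUB_OUTPUT"
echo "EOF" >> "$GITHUB_OUTPUT"

cat "$GITHUB_OUTPUT"
```

The runner reads everything between `files<<EOF` and the matching `EOF` as the value of `steps.changed.outputs.files`, newlines included.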
2253
+ # \u2500\u2500 3. Generate GitHub App token for cross-repo dispatch \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2254
+ # Skipped if no specs changed (token unused, save the round-trip).
2255
+ - name: Generate GitHub App token
2256
+ id: app-token
2257
+ if: steps.changed.outputs.files != ''
2258
+ uses: actions/create-github-app-token@v1
2259
+ with:
2260
+ app-id: $JOYCRAFT_APP_ID
2261
+ private-key: \${{ secrets.JOYCRAFT_APP_PRIVATE_KEY }}
2262
+ repositories: $SCENARIOS_REPO
2263
+
2264
+ # \u2500\u2500 4. Dispatch each changed spec to the scenarios repo \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
2265
+ # Sends a \`spec-pushed\` event with the spec filename, full content,
2266
+ # commit SHA, branch, and originating repo. The scenario agent uses
2267
+ # this payload to triage and generate/update tests.
2268
+ - name: Dispatch spec-pushed events
2269
+ if: steps.changed.outputs.files != ''
2270
+ env:
2271
+ GH_TOKEN: \${{ steps.app-token.outputs.token }}
2272
+ run: |
2273
+ while IFS= read -r SPEC_FILE; do
2274
+ [ -z "$SPEC_FILE" ] && continue
2275
+
2276
+ SPEC_FILENAME=$(basename "$SPEC_FILE")
2277
+ SPEC_CONTENT=$(cat "$SPEC_FILE")
2278
+
2279
+ echo "Dispatching spec-pushed for $SPEC_FILENAME"
2280
+
2281
+ gh api repos/\${{ github.repository_owner }}/$SCENARIOS_REPO/dispatches \\
2282
+ -f event_type=spec-pushed \\
2283
+ -f "client_payload[spec_filename]=$SPEC_FILENAME" \\
2284
+ -f "client_payload[spec_content]=$SPEC_CONTENT" \\
2285
+ -f "client_payload[commit_sha]=\${{ github.sha }}" \\
2286
+ -f "client_payload[branch]=\${{ github.ref_name }}" \\
2287
+ -f "client_payload[repo]=\${{ github.repository }}"
2288
+
2289
+ done <<< "\${{ steps.changed.outputs.files }}"
2290
+ `
2291
+ };
2292
+
2293
+ export {
2294
+ SKILLS,
2295
+ TEMPLATES
2296
+ };
2297
+ //# sourceMappingURL=chunk-HHW4Q2UC.js.map
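Several templates above carry `$JOYCRAFT_APP_ID` and `$SCENARIOS_REPO` placeholders that, per their own comments, are "replaced at install time". The installer itself is outside this chunk, so the following is only a plausible sketch of that substitution step (the file name and values are hypothetical, and the real installer's mechanism may differ):

```shell
APP_ID="123456"
SCENARIOS_REPO="acme-scenarios"

# A fragment resembling the bundled templates, placeholders included.
cat > /tmp/workflow-template.yml <<'EOF'
app-id: $JOYCRAFT_APP_ID
repositories: $SCENARIOS_REPO
EOF

# Install-time substitution: replace each placeholder with its real value.
sed -e "s/\$JOYCRAFT_APP_ID/$APP_ID/g" \
    -e "s/\$SCENARIOS_REPO/$SCENARIOS_REPO/g" \
    /tmp/workflow-template.yml
```

The result is a workflow file with concrete values (`app-id: 123456`, `repositories: acme-scenarios`) ready to commit to `.github/workflows/`.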