joycraft 0.5.7 → 0.5.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -174,355 +174,9 @@ Joycraft tracks what it installed vs. what you've customized. Unmodified files u
 
  > **A note on complexity:** Setting up Level 5 does have some moving parts and, depending on the complexity of your stack (software vs. hardware, monorepo vs. single app, etc.), this will require a good amount of prompting and trial-and-error to get right. I've done my best to make this as painless as possible, but just note - this is not a one-shot-prompt-done-in-5-minutes kind of thing. For small projects and simple stacks it will be easy, but any level of complexity is going to take some iteration, so plan ahead. Full step-by-step guides along with a video coming soon.
 
- Level 5 is where specs go in and validated software comes out. Joycraft implements this as four interlocking GitHub Actions workflows, a separate scenarios repository, and two independent AI agents that can never see each other's work.
+ Level 5 is where specs go in and validated software comes out: four GitHub Actions workflows, a separate scenarios repo, and two AI agents that can never see each other's work. Run `/joycraft-implement-level5` for guided setup, or run `npx joycraft init-autofix` from the CLI.
 
- Run `/joycraft-implement-level5` in Claude Code for a guided setup, or use the CLI directly:
-
- ```bash
- npx joycraft init-autofix --scenarios-repo my-project-scenarios --app-id 3180156
- ```
-
- ### Architecture Overview
-
- Level 5 has four moving parts. Each is a GitHub Actions workflow that communicates via `repository_dispatch` events. No custom servers, no webhooks, no external services.
-
- ```mermaid
- graph TB
- subgraph "Main Repository"
- A[Push specs to docs/specs/] -->|push to main| B[Spec Dispatch Workflow]
- C[PR opened] --> D[CI runs]
- D -->|CI fails| E[Autofix Workflow]
- D -->|CI passes| F[Scenarios Dispatch Workflow]
- G[Scenarios Re-run Workflow]
- end
-
- subgraph "Scenarios Repository (private)"
- H[Scenario Generation Workflow]
- I[Scenario Run Workflow]
- J[Holdout Tests]
- K[Specs Mirror]
- end
-
- B -->|repository_dispatch: spec-pushed| H
- H -->|reads specs, writes tests| J
- H -->|repository_dispatch: scenarios-updated| G
- G -->|repository_dispatch: run-scenarios| I
- F -->|repository_dispatch: run-scenarios| I
- I -->|posts PASS/FAIL comment| C
- E -->|Claude fixes code, pushes| D
-
- style J fill:#f9f,stroke:#333
- style K fill:#bbf,stroke:#333
- ```
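Every arrow labeled `repository_dispatch` in the diagram is a plain REST call, so the plumbing can be poked by hand. Below is a sketch of the `run-scenarios` payload using the field names from the diagram; the repo names and values are invented for illustration, and the shipped workflows may shape the JSON differently:

```shell
# Build a run-scenarios dispatch body (field names from the diagram above;
# pr_number/branch/sha/repo values are made up for illustration).
payload=$(cat <<'EOF'
{
  "event_type": "run-scenarios",
  "client_payload": {
    "pr_number": 42,
    "branch": "feature/login",
    "sha": "abc1234",
    "repo": "me/my-project"
  }
}
EOF
)
echo "$payload"
# To fire it manually at a scenarios repo for debugging (requires gh auth):
#   echo "$payload" | gh api repos/me/my-project-scenarios/dispatches --input -
```

The commented `gh api` line shows how the same body could be sent to the `POST /repos/{owner}/{repo}/dispatches` endpoint outside of Actions.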
-
- ### The Four Workflows
-
- #### 1. Autofix Workflow (`autofix.yml`)
-
- Triggered when CI **fails** on a PR. Claude Code CLI reads the failure logs and attempts a fix.
-
- ```mermaid
- sequenceDiagram
- participant CI as CI Workflow
- participant AF as Autofix Workflow
- participant Claude as Claude Code CLI
- participant PR as Pull Request
-
- CI->>AF: workflow_run (conclusion: failure)
- AF->>AF: Generate GitHub App token
- AF->>AF: Checkout PR branch
- AF->>AF: Count previous autofix attempts
-
- alt attempts >= 3
- AF->>PR: Comment: "Human review needed"
- else attempts < 3
- AF->>AF: Fetch CI failure logs
- AF->>AF: Strip ANSI codes
- AF->>Claude: claude -p "Fix this CI failure..." <br/> --dangerously-skip-permissions --max-turns 20
- Claude->>Claude: Read logs, edit code, run tests
- Claude->>AF: Exit (changes committed locally)
- AF->>PR: Push fix (commit prefix: "autofix:")
- AF->>PR: Comment: summary of fix
- Note over CI,PR: CI re-runs automatically on push
- end
- ```
-
- **Key details:**
- - Uses a GitHub App identity for pushes to avoid GitHub's anti-recursion protection
- - Concurrency group per PR so only one autofix runs at a time
- - Max 3 iterations, then posts "human review needed"
- - No `--model` flag. Claude CLI handles model selection.
- - Strips ANSI escape codes from logs so Claude gets clean text
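The last bullet is easy to demo: raw Actions logs are full of color codes. A minimal sketch of the idea, assuming a POSIX `sed`; the shipped workflow's exact expression may differ:

```shell
# Simulate a colorized CI log line, then strip ANSI SGR escape sequences
# so the model receives plain text. (Illustrative regex only.)
log=$(printf '\033[31mFAIL\033[0m tests/login.test.ts')
esc=$(printf '\033')
clean=$(printf '%s' "$log" | sed "s/$esc\[[0-9;]*m//g")
echo "$clean"
```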
-
- #### 2. Scenarios Dispatch Workflow (`scenarios-dispatch.yml`)
-
- Triggered when CI **passes** on a PR. Fires a `repository_dispatch` to the scenarios repo to run holdout tests against the PR branch.
-
- ```mermaid
- sequenceDiagram
- participant CI as CI Workflow
- participant SD as Scenarios Dispatch
- participant SR as Scenarios Repo
-
- CI->>SD: workflow_run (conclusion: success, PR)
- SD->>SD: Generate GitHub App token
- SD->>SR: repository_dispatch: run-scenarios<br/>payload: {pr_number, branch, sha, repo}
- ```
-
- #### 3. Spec Dispatch Workflow (`spec-dispatch.yml`)
-
- Triggered when spec files are pushed to `main`. Sends the spec content to the scenarios repo so the scenario agent can write tests.
-
- ```mermaid
- sequenceDiagram
- participant Dev as Developer
- participant Main as Main Repo (push to main)
- participant SPD as Spec Dispatch Workflow
- participant SR as Scenarios Repo
-
- Dev->>Main: Push specs to docs/specs/
- Main->>SPD: push event (docs/specs/** changed)
- SPD->>SPD: git diff --diff-filter=AM (added/modified only)
-
- loop For each changed spec
- SPD->>SR: repository_dispatch: spec-pushed<br/>payload: {spec_filename, spec_content, commit_sha, branch, repo}
- end
-
- Note over SPD: Deleted specs are ignored -<br/>existing scenario tests remain
- ```
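The `--diff-filter=AM` behavior in this diagram can be verified in a throwaway repo; everything below (file names, contents) is invented for illustration:

```shell
# Demonstrate --diff-filter=AM: only added (A) and modified (M) specs are
# listed, so a deleted spec never triggers a spec-pushed dispatch.
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email ci@example.com && git config user.name ci
mkdir -p docs/specs
echo "auth v1" > docs/specs/auth.md
echo "legacy"  > docs/specs/old.md
git add -A && git commit -qm "initial specs"
echo "auth v2" > docs/specs/auth.md      # modified
echo "billing" > docs/specs/billing.md   # added
git rm -q docs/specs/old.md              # deleted
git add -A && git commit -qm "update specs"
changed=$(git diff --name-only --diff-filter=AM HEAD~1 HEAD -- docs/specs/)
echo "$changed"   # lists auth.md and billing.md, never old.md
```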
-
- #### 4. Scenarios Re-run Workflow (`scenarios-rerun.yml`)
-
- Triggered when the scenarios repo updates its tests. Re-dispatches all open PRs to the scenarios repo so they get tested with the latest holdout tests.
-
- ```mermaid
- sequenceDiagram
- participant SR as Scenarios Repo
- participant RR as Re-run Workflow
- participant SRun as Scenarios Run
-
- SR->>RR: repository_dispatch: scenarios-updated
- RR->>RR: List open PRs via GitHub API
-
- alt No open PRs
- RR->>RR: Exit (no-op)
- else Has open PRs
- loop For each open PR
- RR->>SRun: repository_dispatch: run-scenarios<br/>payload: {pr_number, branch, sha, repo}
- end
- end
- ```
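The per-PR fan-out in the loop above can be sketched against a canned PR list; the real workflow queries the GitHub API, and the JSON below is invented:

```shell
# One run-scenarios dispatch per open PR, planned against a fake PR list.
prs='[{"number":7,"head":{"ref":"feat-a"}},{"number":9,"head":{"ref":"feat-b"}}]'
plan=$(echo "$prs" | jq -r '.[] | "dispatch run-scenarios pr=\(.number) branch=\(.head.ref)"')
echo "$plan"
```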
-
- **Why this exists:** There's a race condition. The implementation agent might open a PR before the scenario agent finishes writing new tests. The re-run workflow handles this by re-testing all open PRs when new tests land. Worst case, a PR merges before the re-run, and the new tests protect the very next PR. You're never more than one cycle behind.
-
- ### The Holdout Wall
-
- The core safety mechanism. Two agents, two repos, one shared interface (specs):
-
- ```mermaid
- graph LR
- subgraph "Implementation Agent (main repo)"
- IA_sees["Can see:<br/>Source code<br/>Internal tests<br/>Specs"]
- IA_cant["Cannot see:<br/>Scenario tests<br/>Scenario repo"]
- end
-
- subgraph "Specs (shared interface)"
- Specs["docs/specs/*.md<br/>Describes WHAT should happen<br/>Never describes HOW it's tested"]
- end
-
- subgraph "Scenario Agent (scenarios repo)"
- SA_sees["Can see:<br/>Specs (via dispatch)<br/>Scenario tests<br/>Specs mirror"]
- SA_cant["Cannot see:<br/>Source code<br/>Internal tests"]
- end
-
- IA_sees --> Specs
- Specs --> SA_sees
-
- style IA_cant fill:#fcc,stroke:#933
- style SA_cant fill:#fcc,stroke:#933
- style Specs fill:#cfc,stroke:#393
- ```
-
- This is the same principle as a holdout set in machine learning. If the implementation agent could see the scenario tests, it would optimize to pass them specifically instead of building correct software. By keeping the wall intact, scenario tests catch real behavioral regressions, not test-gaming.
-
- ### Scenario Evolution
-
- Scenarios aren't static. When you push new specs, the scenario agent automatically triages them and writes new holdout tests.
-
- ```mermaid
- flowchart TD
- A[New spec pushed to main] --> B[Spec Dispatch sends to scenarios repo]
- B --> C[Scenario Agent reads spec]
- C --> D{Triage: is this user-facing?}
-
- D -->|Internal refactor, CI, dev tooling| E[Skip - commit note: 'No scenario changes needed']
- D -->|New user-facing behavior| F[Write new scenario test file]
- D -->|Modified existing behavior| G[Update existing scenario tests]
-
- F --> H[Commit to scenarios main]
- G --> H
- H --> I[Dispatch scenarios-updated to main repo]
- I --> J[Re-run workflow tests open PRs with new scenarios]
-
- style D fill:#ffd,stroke:#993
- style E fill:#ddd,stroke:#999
- style F fill:#cfc,stroke:#393
- style G fill:#cfc,stroke:#393
- ```
-
- **The scenario agent's prompt instructs it to:**
- - Act as a QA engineer, never a developer
- - Write only behavioral tests (invoke the built artifact, assert on output)
- - Never import source code or reference internal implementation
- - Use a triage decision tree: SKIP / NEW / UPDATE
- - Err on the side of writing a test if the spec is ambiguous
-
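To give a flavor of what "behavioral only" means, here is a toy holdout check against a stand-in artifact. Real Joycraft scenarios use vitest against the project's actual build; this sketch just shows the shape of the rule:

```shell
# A stand-in "built artifact": the scenario only runs it and asserts on
# observable output, never importing its internals.
dir=$(mktemp -d) && cd "$dir"
cat > app.sh <<'EOF'
#!/bin/sh
echo "hello $1"
EOF
chmod +x app.sh
out=$(./app.sh world)   # invoked exactly as a user would
echo "$out"
```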
- **The specs mirror:** The scenarios repo maintains a `specs/` folder that mirrors every spec it receives. This gives the scenario agent historical context ("what features already exist?") without access to the main repo's codebase.
-
- ### The Complete Loop
-
- Here's the full lifecycle from spec to shipped, validated code:
-
- ```mermaid
- sequenceDiagram
- participant Human as Human (writes specs)
- participant Main as Main Repo
- participant ScAgent as Scenario Agent
- participant ScRepo as Scenarios Repo
- participant ImplAgent as Implementation Agent
- participant Autofix as Autofix Workflow
-
- Human->>Main: Push spec to docs/specs/
- Main->>ScAgent: spec-pushed dispatch
-
- par Scenario Generation
- ScAgent->>ScAgent: Triage spec
- ScAgent->>ScRepo: Write/update holdout tests
- ScRepo->>Main: scenarios-updated dispatch
- and Implementation
- Human->>ImplAgent: Execute spec (fresh session)
- ImplAgent->>Main: Open PR
- end
-
- Main->>Main: CI runs on PR
-
- alt CI fails
- Main->>Autofix: Autofix workflow triggers
- Autofix->>Main: Push fix, CI re-runs
- end
-
- alt CI passes
- Main->>ScRepo: run-scenarios dispatch
- ScRepo->>ScRepo: Clone PR branch, build, run holdout tests
- ScRepo->>Main: Post PASS/FAIL comment on PR
- end
-
- alt Scenarios PASS
- Note over Human,Main: Ready for human review and merge
- else Scenarios FAIL
- Main->>Autofix: Autofix attempts fix
- Note over Autofix,ScRepo: Loop continues (max 3 iterations)
- end
- ```
-
- ### What Gets Installed
-
- | Where | File | Purpose |
- |-------|------|---------|
- | Main repo | `.github/workflows/autofix.yml` | CI failure → Claude fix → push |
- | Main repo | `.github/workflows/scenarios-dispatch.yml` | CI pass → trigger holdout tests |
- | Main repo | `.github/workflows/spec-dispatch.yml` | Spec push → trigger scenario generation |
- | Main repo | `.github/workflows/scenarios-rerun.yml` | New tests → re-test open PRs |
- | Scenarios repo | `workflows/run.yml` | Clone PR, build, run tests, post results |
- | Scenarios repo | `workflows/generate.yml` | Receive spec, run scenario agent |
- | Scenarios repo | `prompts/scenario-agent.md` | Scenario agent prompt template |
- | Scenarios repo | `example-scenario.test.ts` | Example holdout test |
- | Scenarios repo | `package.json` | Minimal vitest setup |
- | Scenarios repo | `README.md` | Explains holdout pattern to contributors |
-
- ### Setup Guide
-
- The fastest way: run `/joycraft-implement-level5` in Claude Code and it walks you through everything interactively. Or follow these steps manually:
-
- #### Step 1: Create a GitHub App
-
- The autofix workflow needs a GitHub App identity to push commits. GitHub blocks workflows from triggering other workflows with the default `GITHUB_TOKEN` -- a separate App identity solves this. Creating one takes about 2 minutes:
-
- 1. Go to https://github.com/settings/apps/new
- 2. Give it a name (e.g., "My Project Autofix")
- 3. Uncheck "Webhook > Active" (not needed)
- 4. Under **Repository permissions**, set:
- - **Contents**: Read & Write
- - **Pull requests**: Read & Write
- - **Actions**: Read & Write
- 5. Click **Create GitHub App**
- 6. Note the **App ID** from the settings page (you'll need it in Step 2)
- 7. Scroll to **Private keys** > click **Generate a private key**
- 8. Save the downloaded `.pem` file -- you'll need it in Step 3
- 9. Click **Install App** in the left sidebar > install it on the repo(s) you want to use
-
- > **Coming soon:** We're working on a shared Joycraft Autofix app that will reduce this to a single click. For now, creating your own app gives you full control and takes just a couple minutes.
-
- #### Step 2: Run the CLI
-
- ```bash
- npx joycraft init-autofix --scenarios-repo my-project-scenarios --app-id YOUR_APP_ID
- ```
-
- Replace `YOUR_APP_ID` with the App ID from Step 1. This installs the four workflow files in your main repo and copies scenario templates to `docs/templates/scenarios/`.
-
- #### Step 3: Add secrets to your main repo
-
- Go to your repo's **Settings > Secrets and variables > Actions** and add:
-
- | Secret | Value |
- |--------|-------|
- | `JOYCRAFT_APP_PRIVATE_KEY` | The full contents of the `.pem` file from Step 1 |
- | `ANTHROPIC_API_KEY` | Your Anthropic API key (used by the autofix workflow to run Claude) |
-
- #### Step 4: Create the scenarios repo
-
- ```bash
- # Create a private repo for holdout tests
- gh repo create my-project-scenarios --private
-
- # Copy the scenario templates into it
- cp -r docs/templates/scenarios/* ../my-project-scenarios/
- cd ../my-project-scenarios
- git add -A && git commit -m "init: scaffold scenarios repo from Joycraft"
- git push
- ```
-
- Then add the **same two secrets** (`JOYCRAFT_APP_PRIVATE_KEY` and `ANTHROPIC_API_KEY`) to the scenarios repo's Settings > Secrets.
-
498
- #### Step 5: Verify
499
-
500
- ```bash
501
- # Check workflow files exist in your main repo
502
- ls .github/workflows/autofix.yml .github/workflows/scenarios-dispatch.yml \
503
- .github/workflows/spec-dispatch.yml .github/workflows/scenarios-rerun.yml
504
-
505
- # Check scenario templates in the scenarios repo
506
- ls ../my-project-scenarios/workflows/run.yml ../my-project-scenarios/workflows/generate.yml \
507
- ../my-project-scenarios/prompts/scenario-agent.md ../my-project-scenarios/example-scenario.test.ts
508
- ```
509
-
510
- #### Step 6: Test it
511
-
512
- 1. Push a spec to `docs/specs/` on main -- this triggers scenario generation in the scenarios repo
513
- 2. Open a PR with a small change -- when CI passes, scenarios run against the PR
514
- 3. Watch for the scenario test results posted as a PR comment
515
-
516
- Or deliberately break something in a PR to test the autofix loop.
517
-
518
- ### Cost
519
-
520
- Validated in the Pipit trial (~3 minutes, one iteration, zero human intervention). With Claude Sonnet + `--max-turns 20` + max 3 iterations per PR:
521
- - **Autofix:** ~$0.50 per attempt, worst case ~$1.50 per PR (3 iterations)
522
- - **Scenario generation:** ~$0.20 per spec dispatch
523
- - **Solo dev with ~10 PRs/month:** ~$5-10/month for the full loop
524
-
525
- The iteration guard and max-turns cap prevent runaway costs.
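Those figures hang together arithmetically. A quick sanity check, assuming roughly one autofix attempt per PR in the typical case and ~10 spec dispatches a month (assumed cadence; your workflow will vary):

```shell
# Typical: 10 PRs x 1 attempt x $0.50 + 10 dispatches x $0.20 = $7/month,
# inside the quoted $5-10 range. Worst case assumes every PR exhausts all
# 3 attempts, which the trial suggests is rare. (Amounts in cents.)
typical_cents=$(( 10 * 1 * 50 + 10 * 20 ))
worst_cents=$((   10 * 3 * 50 + 10 * 20 ))
echo "typical: \$$(( typical_cents / 100 )), worst: \$$(( worst_cents / 100 ))"
```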
+ See the full **[Level 5 Autonomy Guide](docs/guides/level-5-autonomy.md)** for architecture diagrams, setup steps, workflow details, and cost estimates.
 
  ## Tuning: Risk Interview & Git Autonomy