joycraft 0.5.7 → 0.5.9
- package/README.md +2 -348
- package/dist/{chunk-G342HURJ.js → chunk-Y6GBN6R4.js} +672 -61
- package/dist/chunk-Y6GBN6R4.js.map +1 -0
- package/dist/cli.js +3 -3
- package/dist/index.d.ts +6 -0
- package/dist/index.js +78 -7
- package/dist/index.js.map +1 -1
- package/dist/{init-7FUTURUY.js → init-B7MYPB7Z.js} +82 -11
- package/dist/init-B7MYPB7Z.js.map +1 -0
- package/dist/{init-autofix-K4B5BD5V.js → init-autofix-FKZN5S3R.js} +2 -2
- package/dist/{upgrade-2Y7D2HCD.js → upgrade-WXNR3ZNI.js} +2 -2
- package/package.json +1 -1
- package/dist/chunk-G342HURJ.js.map +0 -1
- package/dist/init-7FUTURUY.js.map +0 -1
- package/dist/{init-autofix-K4B5BD5V.js.map → init-autofix-FKZN5S3R.js.map} +0 -0
- package/dist/{upgrade-2Y7D2HCD.js.map → upgrade-WXNR3ZNI.js.map} +0 -0
package/README.md
CHANGED
@@ -174,355 +174,9 @@ Joycraft tracks what it installed vs. what you've customized. Unmodified files u
 
 > **A note on complexity:** Setting up Level 5 does have some moving parts and, depending on the complexity of your stack (software vs. hardware, monorepo vs. single app, etc.), this will require a good amount of prompting and trial-and-error to get right. I've done my best to make this as painless as possible, but just note - this is not a one-shot-prompt-done-in-5-minutes kind of thing. For small projects and simple stacks it will be easy, but any level of complexity is going to take some iteration, so plan ahead. Full step-by-step guides along with a video coming soon.
 
-Level 5 is where specs go in and validated software comes out
+Level 5 is where specs go in and validated software comes out — four GitHub Actions workflows, a separate scenarios repo, and two AI agents that can never see each other's work. Run `/joycraft-implement-level5` for guided setup, or `npx joycraft init-autofix` via CLI.
 
-
-
-```bash
-npx joycraft init-autofix --scenarios-repo my-project-scenarios --app-id 3180156
-```
-
-### Architecture Overview
-
-Level 5 has four moving parts. Each is a GitHub Actions workflow that communicates via `repository_dispatch` events. No custom servers, no webhooks, no external services.
-
-```mermaid
-graph TB
-subgraph "Main Repository"
-A[Push specs to docs/specs/] -->|push to main| B[Spec Dispatch Workflow]
-C[PR opened] --> D[CI runs]
-D -->|CI fails| E[Autofix Workflow]
-D -->|CI passes| F[Scenarios Dispatch Workflow]
-G[Scenarios Re-run Workflow]
-end
-
-subgraph "Scenarios Repository (private)"
-H[Scenario Generation Workflow]
-I[Scenario Run Workflow]
-J[Holdout Tests]
-K[Specs Mirror]
-end
-
-B -->|repository_dispatch: spec-pushed| H
-H -->|reads specs, writes tests| J
-H -->|repository_dispatch: scenarios-updated| G
-G -->|repository_dispatch: run-scenarios| I
-F -->|repository_dispatch: run-scenarios| I
-I -->|posts PASS/FAIL comment| C
-E -->|Claude fixes code, pushes| D
-
-style J fill:#f9f,stroke:#333
-style K fill:#bbf,stroke:#333
-```
-
-### The Four Workflows
-
-#### 1. Autofix Workflow (`autofix.yml`)
-
-Triggered when CI **fails** on a PR. Claude Code CLI reads the failure logs and attempts a fix.
-
-```mermaid
-sequenceDiagram
-participant CI as CI Workflow
-participant AF as Autofix Workflow
-participant Claude as Claude Code CLI
-participant PR as Pull Request
-
-CI->>AF: workflow_run (conclusion: failure)
-AF->>AF: Generate GitHub App token
-AF->>AF: Checkout PR branch
-AF->>AF: Count previous autofix attempts
-
-alt attempts >= 3
-AF->>PR: Comment: "Human review needed"
-else attempts < 3
-AF->>AF: Fetch CI failure logs
-AF->>AF: Strip ANSI codes
-AF->>Claude: claude -p "Fix this CI failure..." <br/> --dangerously-skip-permissions --max-turns 20
-Claude->>Claude: Read logs, edit code, run tests
-Claude->>AF: Exit (changes committed locally)
-AF->>PR: Push fix (commit prefix: "autofix:")
-AF->>PR: Comment: summary of fix
-Note over CI,PR: CI re-runs automatically on push
-end
-```
-
-**Key details:**
-- Uses a GitHub App identity for pushes to avoid GitHub's anti-recursion protection
-- Concurrency group per PR so only one autofix runs at a time
-- Max 3 iterations, then posts "human review needed"
-- No `--model` flag. Claude CLI handles model selection.
-- Strips ANSI escape codes from logs so Claude gets clean text
-
-#### 2. Scenarios Dispatch Workflow (`scenarios-dispatch.yml`)
-
-Triggered when CI **passes** on a PR. Fires a `repository_dispatch` to the scenarios repo to run holdout tests against the PR branch.
-
-```mermaid
-sequenceDiagram
-participant CI as CI Workflow
-participant SD as Scenarios Dispatch
-participant SR as Scenarios Repo
-
-CI->>SD: workflow_run (conclusion: success, PR)
-SD->>SD: Generate GitHub App token
-SD->>SR: repository_dispatch: run-scenarios<br/>payload: {pr_number, branch, sha, repo}
-```
-
-#### 3. Spec Dispatch Workflow (`spec-dispatch.yml`)
-
-Triggered when spec files are pushed to `main`. Sends the spec content to the scenarios repo so the scenario agent can write tests.
-
-```mermaid
-sequenceDiagram
-participant Dev as Developer
-participant Main as Main Repo (push to main)
-participant SPD as Spec Dispatch Workflow
-participant SR as Scenarios Repo
-
-Dev->>Main: Push specs to docs/specs/
-Main->>SPD: push event (docs/specs/** changed)
-SPD->>SPD: git diff --diff-filter=AM (added/modified only)
-
-loop For each changed spec
-SPD->>SR: repository_dispatch: spec-pushed<br/>payload: {spec_filename, spec_content, commit_sha, branch, repo}
-end
-
-Note over SPD: Deleted specs are ignored -<br/>existing scenario tests remain
-```
-
-#### 4. Scenarios Re-run Workflow (`scenarios-rerun.yml`)
-
-Triggered when the scenarios repo updates its tests. Re-dispatches all open PRs to the scenarios repo so they get tested with the latest holdout tests.
-
-```mermaid
-sequenceDiagram
-participant SR as Scenarios Repo
-participant RR as Re-run Workflow
-participant SRun as Scenarios Run
-
-SR->>RR: repository_dispatch: scenarios-updated
-RR->>RR: List open PRs via GitHub API
-
-alt No open PRs
-RR->>RR: Exit (no-op)
-else Has open PRs
-loop For each open PR
-RR->>SRun: repository_dispatch: run-scenarios<br/>payload: {pr_number, branch, sha, repo}
-end
-end
-```
-
-**Why this exists:** There's a race condition. The implementation agent might open a PR before the scenario agent finishes writing new tests. The re-run workflow handles this by re-testing all open PRs when new tests land. Worst case, a PR merges before the re-run, and the new tests protect the very next PR. You're never more than one cycle behind.
-
-### The Holdout Wall
-
-The core safety mechanism. Two agents, two repos, one shared interface (specs):
-
-```mermaid
-graph LR
-subgraph "Implementation Agent (main repo)"
-IA_sees["Can see:<br/>Source code<br/>Internal tests<br/>Specs"]
-IA_cant["Cannot see:<br/>Scenario tests<br/>Scenario repo"]
-end
-
-subgraph "Specs (shared interface)"
-Specs["docs/specs/*.md<br/>Describes WHAT should happen<br/>Never describes HOW it's tested"]
-end
-
-subgraph "Scenario Agent (scenarios repo)"
-SA_sees["Can see:<br/>Specs (via dispatch)<br/>Scenario tests<br/>Specs mirror"]
-SA_cant["Cannot see:<br/>Source code<br/>Internal tests"]
-end
-
-IA_sees --> Specs
-Specs --> SA_sees
-
-style IA_cant fill:#fcc,stroke:#933
-style SA_cant fill:#fcc,stroke:#933
-style Specs fill:#cfc,stroke:#393
-```
-
-This is the same principle as a holdout set in machine learning. If the implementation agent could see the scenario tests, it would optimize to pass them specifically instead of building correct software. By keeping the wall intact, scenario tests catch real behavioral regressions, not test-gaming.
-
-### Scenario Evolution
-
-Scenarios aren't static. When you push new specs, the scenario agent automatically triages them and writes new holdout tests.
-
-```mermaid
-flowchart TD
-A[New spec pushed to main] --> B[Spec Dispatch sends to scenarios repo]
-B --> C[Scenario Agent reads spec]
-C --> D{Triage: is this user-facing?}
-
-D -->|Internal refactor, CI, dev tooling| E[Skip - commit note: 'No scenario changes needed']
-D -->|New user-facing behavior| F[Write new scenario test file]
-D -->|Modified existing behavior| G[Update existing scenario tests]
-
-F --> H[Commit to scenarios main]
-G --> H
-H --> I[Dispatch scenarios-updated to main repo]
-I --> J[Re-run workflow tests open PRs with new scenarios]
-
-style D fill:#ffd,stroke:#993
-style E fill:#ddd,stroke:#999
-style F fill:#cfc,stroke:#393
-style G fill:#cfc,stroke:#393
-```
-
-**The scenario agent's prompt instructs it to:**
-- Act as a QA engineer, never a developer
-- Write only behavioral tests (invoke the built artifact, assert on output)
-- Never import source code or reference internal implementation
-- Use a triage decision tree: SKIP / NEW / UPDATE
-- Err on the side of writing a test if the spec is ambiguous
-
-**The specs mirror:** The scenarios repo maintains a `specs/` folder that mirrors every spec it receives. This gives the scenario agent historical context ("what features already exist?") without access to the main repo's codebase.
-
-### The Complete Loop
-
-Here's the full lifecycle from spec to shipped, validated code:
-
-```mermaid
-sequenceDiagram
-participant Human as Human (writes specs)
-participant Main as Main Repo
-participant ScAgent as Scenario Agent
-participant ScRepo as Scenarios Repo
-participant ImplAgent as Implementation Agent
-participant Autofix as Autofix Workflow
-
-Human->>Main: Push spec to docs/specs/
-Main->>ScAgent: spec-pushed dispatch
-
-par Scenario Generation
-ScAgent->>ScAgent: Triage spec
-ScAgent->>ScRepo: Write/update holdout tests
-ScRepo->>Main: scenarios-updated dispatch
-and Implementation
-Human->>ImplAgent: Execute spec (fresh session)
-ImplAgent->>Main: Open PR
-end
-
-Main->>Main: CI runs on PR
-
-alt CI fails
-Main->>Autofix: Autofix workflow triggers
-Autofix->>Main: Push fix, CI re-runs
-end
-
-alt CI passes
-Main->>ScRepo: run-scenarios dispatch
-ScRepo->>ScRepo: Clone PR branch, build, run holdout tests
-ScRepo->>Main: Post PASS/FAIL comment on PR
-end
-
-alt Scenarios PASS
-Note over Human,Main: Ready for human review and merge
-else Scenarios FAIL
-Main->>Autofix: Autofix attempts fix
-Note over Autofix,ScRepo: Loop continues (max 3 iterations)
-end
-```
-
-### What Gets Installed
-
-| Where | File | Purpose |
-|-------|------|---------|
-| Main repo | `.github/workflows/autofix.yml` | CI failure → Claude fix → push |
-| Main repo | `.github/workflows/scenarios-dispatch.yml` | CI pass → trigger holdout tests |
-| Main repo | `.github/workflows/spec-dispatch.yml` | Spec push → trigger scenario generation |
-| Main repo | `.github/workflows/scenarios-rerun.yml` | New tests → re-test open PRs |
-| Scenarios repo | `workflows/run.yml` | Clone PR, build, run tests, post results |
-| Scenarios repo | `workflows/generate.yml` | Receive spec, run scenario agent |
-| Scenarios repo | `prompts/scenario-agent.md` | Scenario agent prompt template |
-| Scenarios repo | `example-scenario.test.ts` | Example holdout test |
-| Scenarios repo | `package.json` | Minimal vitest setup |
-| Scenarios repo | `README.md` | Explains holdout pattern to contributors |
-
-### Setup Guide
-
-The fastest way: run `/joycraft-implement-level5` in Claude Code and it walks you through everything interactively. Or follow these steps manually:
-
-#### Step 1: Create a GitHub App
-
-The autofix workflow needs a GitHub App identity to push commits. GitHub blocks workflows from triggering other workflows with the default `GITHUB_TOKEN` -- a separate App identity solves this. Creating one takes about 2 minutes:
-
-1. Go to https://github.com/settings/apps/new
-2. Give it a name (e.g., "My Project Autofix")
-3. Uncheck "Webhook > Active" (not needed)
-4. Under **Repository permissions**, set:
-- **Contents**: Read & Write
-- **Pull requests**: Read & Write
-- **Actions**: Read & Write
-5. Click **Create GitHub App**
-6. Note the **App ID** from the settings page (you'll need it in Step 2)
-7. Scroll to **Private keys** > click **Generate a private key**
-8. Save the downloaded `.pem` file -- you'll need it in Step 3
-9. Click **Install App** in the left sidebar > install it on the repo(s) you want to use
-
-> **Coming soon:** We're working on a shared Joycraft Autofix app that will reduce this to a single click. For now, creating your own app gives you full control and takes just a couple minutes.
-
-#### Step 2: Run the CLI
-
-```bash
-npx joycraft init-autofix --scenarios-repo my-project-scenarios --app-id YOUR_APP_ID
-```
-
-Replace `YOUR_APP_ID` with the App ID from Step 1. This installs the four workflow files in your main repo and copies scenario templates to `docs/templates/scenarios/`.
-
-#### Step 3: Add secrets to your main repo
-
-Go to your repo's **Settings > Secrets and variables > Actions** and add:
-
-| Secret | Value |
-|--------|-------|
-| `JOYCRAFT_APP_PRIVATE_KEY` | The full contents of the `.pem` file from Step 1 |
-| `ANTHROPIC_API_KEY` | Your Anthropic API key (used by the autofix workflow to run Claude) |
-
-#### Step 4: Create the scenarios repo
-
-```bash
-# Create a private repo for holdout tests
-gh repo create my-project-scenarios --private
-
-# Copy the scenario templates into it
-cp -r docs/templates/scenarios/* ../my-project-scenarios/
-cd ../my-project-scenarios
-git add -A && git commit -m "init: scaffold scenarios repo from Joycraft"
-git push
-```
-
-Then add the **same two secrets** (`JOYCRAFT_APP_PRIVATE_KEY` and `ANTHROPIC_API_KEY`) to the scenarios repo's Settings > Secrets.
-
-#### Step 5: Verify
-
-```bash
-# Check workflow files exist in your main repo
-ls .github/workflows/autofix.yml .github/workflows/scenarios-dispatch.yml \
-.github/workflows/spec-dispatch.yml .github/workflows/scenarios-rerun.yml
-
-# Check scenario templates in the scenarios repo
-ls ../my-project-scenarios/workflows/run.yml ../my-project-scenarios/workflows/generate.yml \
-../my-project-scenarios/prompts/scenario-agent.md ../my-project-scenarios/example-scenario.test.ts
-```
-
-#### Step 6: Test it
-
-1. Push a spec to `docs/specs/` on main -- this triggers scenario generation in the scenarios repo
-2. Open a PR with a small change -- when CI passes, scenarios run against the PR
-3. Watch for the scenario test results posted as a PR comment
-
-Or deliberately break something in a PR to test the autofix loop.
-
-### Cost
-
-Validated in the Pipit trial (~3 minutes, one iteration, zero human intervention). With Claude Sonnet + `--max-turns 20` + max 3 iterations per PR:
-- **Autofix:** ~$0.50 per attempt, worst case ~$1.50 per PR (3 iterations)
-- **Scenario generation:** ~$0.20 per spec dispatch
-- **Solo dev with ~10 PRs/month:** ~$5-10/month for the full loop
-
-The iteration guard and max-turns cap prevent runaway costs.
+See the full **[Level 5 Autonomy Guide](docs/guides/level-5-autonomy.md)** for architecture diagrams, setup steps, workflow details, and cost estimates.
 
 ## Tuning: Risk Interview & Git Autonomy
 