joycraft 0.5.6 → 0.5.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -72,6 +72,8 @@ Joycraft auto-detects your tech stack and creates:
72
72
  - `/joycraft-interview` Lightweight brainstorm. Yap about ideas, get a structured summary
73
73
  - `/joycraft-decompose` Break a brief into small, testable specs
74
74
  - `/joycraft-add-fact` Capture project knowledge on the fly -- routes to the right context doc
75
+ - `/joycraft-lockdown` Generate constrained execution boundaries (read-only tests, deny patterns)
76
+ - `/joycraft-verify` Spawn a separate subagent to independently verify implementation against spec
75
77
  - `/joycraft-session-end` Capture discoveries, verify, commit, push
76
78
  - `/joycraft-implement-level5` Set up Level 5 (autofix loop, holdout scenarios, scenario evolution)
77
79
  - **docs/** structure: `briefs/`, `specs/`, `discoveries/`, `contracts/`, `decisions/`, `context/`
@@ -96,6 +98,8 @@ After init, open Claude Code and use the installed skills:
96
98
  /joycraft-new-feature # Interview → Feature Brief → Atomic Specs → ready to execute
97
99
  /joycraft-decompose # Break any feature into small, independent specs
98
100
  /joycraft-add-fact # Capture a fact mid-session -- auto-routes to the right context doc
101
+ /joycraft-lockdown # Generate constrained execution boundaries for autonomous sessions
102
+ /joycraft-verify # Independent verification -- spawns a subagent to check your work
99
103
  /joycraft-session-end # Wrap up: discoveries, verification, commit, push
100
104
  /joycraft-implement-level5 # Set up Level 5 (autofix, holdout scenarios, evolution)
101
105
  ```
@@ -170,379 +174,92 @@ Joycraft tracks what it installed vs. what you've customized. Unmodified files u
170
174
 
171
175
  > **A note on complexity:** Setting up Level 5 has some moving parts, and depending on the complexity of your stack (software vs. hardware, monorepo vs. single app, etc.), it will take a good amount of prompting and trial-and-error to get right. I've done my best to make this as painless as possible, but note: this is not a one-shot-prompt-done-in-5-minutes kind of thing. For small projects and simple stacks it will be easy, but any level of complexity is going to take some iteration, so plan ahead. Full step-by-step guides along with a video are coming soon.
172
176
 
173
- Level 5 is where specs go in and validated software comes out. Joycraft implements this as four interlocking GitHub Actions workflows, a separate scenarios repository, and two independent AI agents that can never see each other's work.
177
 + Level 5 is where specs go in and validated software comes out: four GitHub Actions workflows, a separate scenarios repo, and two AI agents that can never see each other's work. Run `/joycraft-implement-level5` for guided setup, or `npx joycraft init-autofix` via CLI.
174
178
 
175
- Run `/joycraft-implement-level5` in Claude Code for a guided setup, or use the CLI directly:
179
+ See the full **[Level 5 Autonomy Guide](docs/guides/level-5-autonomy.md)** for architecture diagrams, setup steps, workflow details, and cost estimates.
176
180
 
177
- ```bash
178
- npx joycraft init-autofix --scenarios-repo my-project-scenarios --app-id 3180156
179
- ```
180
-
181
- ### Architecture Overview
182
-
183
- Level 5 has four moving parts. Each is a GitHub Actions workflow that communicates via `repository_dispatch` events. No custom servers, no webhooks, no external services.
184
-
185
- ```mermaid
186
- graph TB
187
- subgraph "Main Repository"
188
- A[Push specs to docs/specs/] -->|push to main| B[Spec Dispatch Workflow]
189
- C[PR opened] --> D[CI runs]
190
- D -->|CI fails| E[Autofix Workflow]
191
- D -->|CI passes| F[Scenarios Dispatch Workflow]
192
- G[Scenarios Re-run Workflow]
193
- end
194
-
195
- subgraph "Scenarios Repository (private)"
196
- H[Scenario Generation Workflow]
197
- I[Scenario Run Workflow]
198
- J[Holdout Tests]
199
- K[Specs Mirror]
200
- end
201
-
202
- B -->|repository_dispatch: spec-pushed| H
203
- H -->|reads specs, writes tests| J
204
- H -->|repository_dispatch: scenarios-updated| G
205
- G -->|repository_dispatch: run-scenarios| I
206
- F -->|repository_dispatch: run-scenarios| I
207
- I -->|posts PASS/FAIL comment| C
208
- E -->|Claude fixes code, pushes| D
209
-
210
- style J fill:#f9f,stroke:#333
211
- style K fill:#bbf,stroke:#333
212
- ```
213
-
214
- ### The Four Workflows
215
-
216
- #### 1. Autofix Workflow (`autofix.yml`)
217
-
218
- Triggered when CI **fails** on a PR. Claude Code CLI reads the failure logs and attempts a fix.
219
-
220
- ```mermaid
221
- sequenceDiagram
222
- participant CI as CI Workflow
223
- participant AF as Autofix Workflow
224
- participant Claude as Claude Code CLI
225
- participant PR as Pull Request
226
-
227
- CI->>AF: workflow_run (conclusion: failure)
228
- AF->>AF: Generate GitHub App token
229
- AF->>AF: Checkout PR branch
230
- AF->>AF: Count previous autofix attempts
231
-
232
- alt attempts >= 3
233
- AF->>PR: Comment: "Human review needed"
234
- else attempts < 3
235
- AF->>AF: Fetch CI failure logs
236
- AF->>AF: Strip ANSI codes
237
- AF->>Claude: claude -p "Fix this CI failure..." <br/> --dangerously-skip-permissions --max-turns 20
238
- Claude->>Claude: Read logs, edit code, run tests
239
- Claude->>AF: Exit (changes committed locally)
240
- AF->>PR: Push fix (commit prefix: "autofix:")
241
- AF->>PR: Comment: summary of fix
242
- Note over CI,PR: CI re-runs automatically on push
243
- end
244
- ```
245
-
246
- **Key details:**
247
- - Uses a GitHub App identity for pushes to avoid GitHub's anti-recursion protection
248
- - Concurrency group per PR so only one autofix runs at a time
249
- - Max 3 iterations, then posts "human review needed"
250
- - No `--model` flag. Claude CLI handles model selection.
251
- - Strips ANSI escape codes from logs so Claude gets clean text
252
-
253
- #### 2. Scenarios Dispatch Workflow (`scenarios-dispatch.yml`)
254
-
255
- Triggered when CI **passes** on a PR. Fires a `repository_dispatch` to the scenarios repo to run holdout tests against the PR branch.
256
-
257
- ```mermaid
258
- sequenceDiagram
259
- participant CI as CI Workflow
260
- participant SD as Scenarios Dispatch
261
- participant SR as Scenarios Repo
262
-
263
- CI->>SD: workflow_run (conclusion: success, PR)
264
- SD->>SD: Generate GitHub App token
265
- SD->>SR: repository_dispatch: run-scenarios<br/>payload: {pr_number, branch, sha, repo}
266
- ```
267
-
268
- #### 3. Spec Dispatch Workflow (`spec-dispatch.yml`)
269
-
270
- Triggered when spec files are pushed to `main`. Sends the spec content to the scenarios repo so the scenario agent can write tests.
271
-
272
- ```mermaid
273
- sequenceDiagram
274
- participant Dev as Developer
275
- participant Main as Main Repo (push to main)
276
- participant SPD as Spec Dispatch Workflow
277
- participant SR as Scenarios Repo
278
-
279
- Dev->>Main: Push specs to docs/specs/
280
- Main->>SPD: push event (docs/specs/** changed)
281
- SPD->>SPD: git diff --diff-filter=AM (added/modified only)
282
-
283
- loop For each changed spec
284
- SPD->>SR: repository_dispatch: spec-pushed<br/>payload: {spec_filename, spec_content, commit_sha, branch, repo}
285
- end
286
-
287
- Note over SPD: Deleted specs are ignored -<br/>existing scenario tests remain
288
- ```
289
-
290
- #### 4. Scenarios Re-run Workflow (`scenarios-rerun.yml`)
291
-
292
- Triggered when the scenarios repo updates its tests. Re-dispatches all open PRs to the scenarios repo so they get tested with the latest holdout tests.
293
-
294
- ```mermaid
295
- sequenceDiagram
296
- participant SR as Scenarios Repo
297
- participant RR as Re-run Workflow
298
- participant SRun as Scenarios Run
299
-
300
- SR->>RR: repository_dispatch: scenarios-updated
301
- RR->>RR: List open PRs via GitHub API
302
-
303
- alt No open PRs
304
- RR->>RR: Exit (no-op)
305
- else Has open PRs
306
- loop For each open PR
307
- RR->>SRun: repository_dispatch: run-scenarios<br/>payload: {pr_number, branch, sha, repo}
308
- end
309
- end
310
- ```
311
-
312
- **Why this exists:** There's a race condition. The implementation agent might open a PR before the scenario agent finishes writing new tests. The re-run workflow handles this by re-testing all open PRs when new tests land. Worst case, a PR merges before the re-run, and the new tests protect the very next PR. You're never more than one cycle behind.
313
-
314
- ### The Holdout Wall
315
-
316
- The core safety mechanism. Two agents, two repos, one shared interface (specs):
317
-
318
- ```mermaid
319
- graph LR
320
- subgraph "Implementation Agent (main repo)"
321
- IA_sees["Can see:<br/>Source code<br/>Internal tests<br/>Specs"]
322
- IA_cant["Cannot see:<br/>Scenario tests<br/>Scenario repo"]
323
- end
324
-
325
- subgraph "Specs (shared interface)"
326
- Specs["docs/specs/*.md<br/>Describes WHAT should happen<br/>Never describes HOW it's tested"]
327
- end
328
-
329
- subgraph "Scenario Agent (scenarios repo)"
330
- SA_sees["Can see:<br/>Specs (via dispatch)<br/>Scenario tests<br/>Specs mirror"]
331
- SA_cant["Cannot see:<br/>Source code<br/>Internal tests"]
332
- end
333
-
334
- IA_sees --> Specs
335
- Specs --> SA_sees
336
-
337
- style IA_cant fill:#fcc,stroke:#933
338
- style SA_cant fill:#fcc,stroke:#933
339
- style Specs fill:#cfc,stroke:#393
340
- ```
341
-
342
- This is the same principle as a holdout set in machine learning. If the implementation agent could see the scenario tests, it would optimize to pass them specifically instead of building correct software. By keeping the wall intact, scenario tests catch real behavioral regressions, not test-gaming.
343
-
344
- ### Scenario Evolution
345
-
346
- Scenarios aren't static. When you push new specs, the scenario agent automatically triages them and writes new holdout tests.
347
-
348
- ```mermaid
349
- flowchart TD
350
- A[New spec pushed to main] --> B[Spec Dispatch sends to scenarios repo]
351
- B --> C[Scenario Agent reads spec]
352
- C --> D{Triage: is this user-facing?}
353
-
354
- D -->|Internal refactor, CI, dev tooling| E[Skip - commit note: 'No scenario changes needed']
355
- D -->|New user-facing behavior| F[Write new scenario test file]
356
- D -->|Modified existing behavior| G[Update existing scenario tests]
357
-
358
- F --> H[Commit to scenarios main]
359
- G --> H
360
- H --> I[Dispatch scenarios-updated to main repo]
361
- I --> J[Re-run workflow tests open PRs with new scenarios]
362
-
363
- style D fill:#ffd,stroke:#993
364
- style E fill:#ddd,stroke:#999
365
- style F fill:#cfc,stroke:#393
366
- style G fill:#cfc,stroke:#393
367
- ```
368
-
369
- **The scenario agent's prompt instructs it to:**
370
- - Act as a QA engineer, never a developer
371
- - Write only behavioral tests (invoke the built artifact, assert on output)
372
- - Never import source code or reference internal implementation
373
- - Use a triage decision tree: SKIP / NEW / UPDATE
374
- - Err on the side of writing a test if the spec is ambiguous
375
-
376
- **The specs mirror:** The scenarios repo maintains a `specs/` folder that mirrors every spec it receives. This gives the scenario agent historical context ("what features already exist?") without access to the main repo's codebase.
377
-
378
- ### The Complete Loop
379
-
380
- Here's the full lifecycle from spec to shipped, validated code:
381
-
382
- ```mermaid
383
- sequenceDiagram
384
- participant Human as Human (writes specs)
385
- participant Main as Main Repo
386
- participant ScAgent as Scenario Agent
387
- participant ScRepo as Scenarios Repo
388
- participant ImplAgent as Implementation Agent
389
- participant Autofix as Autofix Workflow
390
-
391
- Human->>Main: Push spec to docs/specs/
392
- Main->>ScAgent: spec-pushed dispatch
393
-
394
- par Scenario Generation
395
- ScAgent->>ScAgent: Triage spec
396
- ScAgent->>ScRepo: Write/update holdout tests
397
- ScRepo->>Main: scenarios-updated dispatch
398
- and Implementation
399
- Human->>ImplAgent: Execute spec (fresh session)
400
- ImplAgent->>Main: Open PR
401
- end
402
-
403
- Main->>Main: CI runs on PR
404
-
405
- alt CI fails
406
- Main->>Autofix: Autofix workflow triggers
407
- Autofix->>Main: Push fix, CI re-runs
408
- end
409
-
410
- alt CI passes
411
- Main->>ScRepo: run-scenarios dispatch
412
- ScRepo->>ScRepo: Clone PR branch, build, run holdout tests
413
- ScRepo->>Main: Post PASS/FAIL comment on PR
414
- end
415
-
416
- alt Scenarios PASS
417
- Note over Human,Main: Ready for human review and merge
418
- else Scenarios FAIL
419
- Main->>Autofix: Autofix attempts fix
420
- Note over Autofix,ScRepo: Loop continues (max 3 iterations)
421
- end
422
- ```
423
-
424
- ### What Gets Installed
425
-
426
- | Where | File | Purpose |
427
- |-------|------|---------|
428
- | Main repo | `.github/workflows/autofix.yml` | CI failure → Claude fix → push |
429
- | Main repo | `.github/workflows/scenarios-dispatch.yml` | CI pass → trigger holdout tests |
430
- | Main repo | `.github/workflows/spec-dispatch.yml` | Spec push → trigger scenario generation |
431
- | Main repo | `.github/workflows/scenarios-rerun.yml` | New tests → re-test open PRs |
432
- | Scenarios repo | `workflows/run.yml` | Clone PR, build, run tests, post results |
433
- | Scenarios repo | `workflows/generate.yml` | Receive spec, run scenario agent |
434
- | Scenarios repo | `prompts/scenario-agent.md` | Scenario agent prompt template |
435
- | Scenarios repo | `example-scenario.test.ts` | Example holdout test |
436
- | Scenarios repo | `package.json` | Minimal vitest setup |
437
- | Scenarios repo | `README.md` | Explains holdout pattern to contributors |
438
-
439
- ### Setup Guide
181
+ ## Tuning: Risk Interview & Git Autonomy
440
182
 
441
- The fastest way: run `/joycraft-implement-level5` in Claude Code and it walks you through everything interactively. Or follow these steps manually:
183
+ When `/joycraft-tune` runs for the first time, it does two things:
442
184
 
443
- #### Step 1: Create a GitHub App
185
+ ### Risk interview
444
186
 
445
- The autofix workflow needs a GitHub App identity to push commits. GitHub blocks workflows from triggering other workflows with the default `GITHUB_TOKEN` -- a separate App identity solves this. Creating one takes about 2 minutes:
187
+ 3-5 targeted questions about what's dangerous in your project (production databases, live APIs, secrets, files that should be off-limits). From your answers, Joycraft generates:
446
188
 
447
- 1. Go to https://github.com/settings/apps/new
448
- 2. Give it a name (e.g., "My Project Autofix")
449
- 3. Uncheck "Webhook > Active" (not needed)
450
- 4. Under **Repository permissions**, set:
451
- - **Contents**: Read & Write
452
- - **Pull requests**: Read & Write
453
- - **Actions**: Read & Write
454
- 5. Click **Create GitHub App**
455
- 6. Note the **App ID** from the settings page (you'll need it in Step 2)
456
- 7. Scroll to **Private keys** > click **Generate a private key**
457
- 8. Save the downloaded `.pem` file -- you'll need it in Step 3
458
- 9. Click **Install App** in the left sidebar > install it on the repo(s) you want to use
189
+ - **NEVER rules** for CLAUDE.md (e.g., "NEVER connect to production DB")
190
+ - **Deny patterns** for `.claude/settings.json` (blocks dangerous bash commands)
191
+ - **`docs/context/production-map.md`** documenting what's real vs. safe to touch
192
+ - **`docs/context/dangerous-assumptions.md`** documenting "Agent might assume X, but actually Y"
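For a concrete picture, the generated deny patterns might look like the following sketch in `.claude/settings.json`. The rule strings here are hypothetical examples, not Joycraft output; check the permission-rule syntax against your Claude Code version's settings reference:

```json
{
  "permissions": {
    "deny": [
      "Bash(psql*production*)",
      "Bash(terraform apply*)",
      "Read(.env.production)"
    ]
  }
}
```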
459
193
 
460
- > **Coming soon:** We're working on a shared Joycraft Autofix app that will reduce this to a single click. For now, creating your own app gives you full control and takes just a couple minutes.
194
+ This takes 2-3 minutes and dramatically reduces the chance of your agent doing something catastrophic.
461
195
 
462
- #### Step 2: Run the CLI
196
+ ### Git autonomy
463
197
 
464
- ```bash
465
- npx joycraft init-autofix --scenarios-repo my-project-scenarios --app-id YOUR_APP_ID
466
- ```
198
+ One question: **how autonomous should git be?**
467
199
 
468
- Replace `YOUR_APP_ID` with the App ID from Step 1. This installs the four workflow files in your main repo and copies scenario templates to `docs/templates/scenarios/`.
200
+ - **Cautious** (default) commits freely but asks before pushing or opening PRs. Good for learning the workflow.
201
+ - **Autonomous** commits, pushes to feature branches, and opens PRs without asking. Good for spec-driven development where you want full send.
469
202
 
470
- #### Step 3: Add secrets to your main repo
203
+ Either way, Joycraft generates explicit git boundaries in your CLAUDE.md: commit message format (`verb: message`), specific file staging (no `git add -A`), no secrets in commits, no force-pushing.
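To make those boundaries concrete, here is a minimal shell sketch of a boundary-compliant commit in a throwaway repo (file names and the commit message are illustrative):

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email "agent@example.com" && git config user.name "agent"

echo "export const app = 1;" > app.ts
echo "API_KEY=shh" > .env                    # secrets stay out of commits

git add app.ts                               # stage specific files, never `git add -A`
git commit -q -m "feat: add app entrypoint"  # `verb: message` format
git log --format=%s -1
```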
471
204
 
472
- Go to your repo's **Settings > Secrets and variables > Actions** and add:
205
+ ## Test-First Development
473
206
 
474
- | Secret | Value |
475
- |--------|-------|
476
- | `JOYCRAFT_APP_PRIVATE_KEY` | The full contents of the `.pem` file from Step 1 |
477
- | `ANTHROPIC_API_KEY` | Your Anthropic API key (used by the autofix workflow to run Claude) |
207
+ Joycraft enforces a test-first workflow because tests are the mechanism to autonomy. Without tests, your agent implements 9 specs and you have to manually verify each one. With tests, the agent knows when it's done and you can trust the output.
478
208
 
479
- #### Step 4: Create the scenarios repo
209
+ ### How it works
480
210
 
481
- ```bash
482
- # Create a private repo for holdout tests
483
- gh repo create my-project-scenarios --private
484
-
485
- # Copy the scenario templates into it
486
- cp -r docs/templates/scenarios/* ../my-project-scenarios/
487
- cd ../my-project-scenarios
488
- git add -A && git commit -m "init: scaffold scenarios repo from Joycraft"
489
- git push
490
- ```
211
+ When you run `/joycraft-new-feature`, the interview now includes test-focused questions: what test types your project uses, how fast your tests need to run for iteration, and whether you want lockdown mode. Every atomic spec generated by `/joycraft-decompose` includes a **Test Plan** that maps each acceptance criterion to at least one test.
491
212
 
492
- Then add the **same two secrets** (`JOYCRAFT_APP_PRIVATE_KEY` and `ANTHROPIC_API_KEY`) to the scenarios repo's Settings > Secrets.
213
+ The execution order is enforced:
493
214
 
494
- #### Step 5: Verify
215
+ 1. **Write failing tests first** -- the agent writes tests from the spec's Test Plan
216
+ 2. **Run them and confirm they fail** -- if they pass immediately, something is wrong (you're testing the wrong thing)
217
+ 3. **Implement until tests pass** -- the tests are the contract
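The fail-first discipline above can be sketched in a few lines of shell; `slugify` here is a hypothetical stand-in for your real function and test runner:

```shell
slugify() { echo "TODO"; }                  # step 1: test exists, implementation does not
if [ "$(slugify 'Hello World')" = "hello-world" ]; then
  echo "passed before implementation: the test is wrong"
else
  echo "fails first, as required"           # step 2: confirm the failure
fi

slugify() {                                 # step 3: implement until the test passes
  echo "$1" | tr '[:upper:]' '[:lower:]' | tr ' ' '-'
}
[ "$(slugify 'Hello World')" = "hello-world" ] && echo "now passes"
```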
495
218
 
496
- ```bash
497
- # Check workflow files exist in your main repo
498
- ls .github/workflows/autofix.yml .github/workflows/scenarios-dispatch.yml \
499
- .github/workflows/spec-dispatch.yml .github/workflows/scenarios-rerun.yml
219
+ ### The three laws of test harnesses
500
220
 
501
- # Check scenario templates in the scenarios repo
502
- ls ../my-project-scenarios/workflows/run.yml ../my-project-scenarios/workflows/generate.yml \
503
- ../my-project-scenarios/prompts/scenario-agent.md ../my-project-scenarios/example-scenario.test.ts
504
- ```
221
+ These are baked into every spec template, discovered through real autonomous development:
505
222
 
506
- #### Step 6: Test it
223
+ 1. **Tests must fail first.** If your test harness doesn't have failing tests, the agent will write tests that pass trivially -- testing the library instead of your function.
224
+ 2. **Tests must run against your actual function.** Not a reimplementation, not a mock, not the wrapped library. The test calls your code.
225
+ 3. **Tests must detect individual changes.** You need fast smoke tests (seconds, not minutes) so you know if a single change helped or hurt.
507
226
 
508
- 1. Push a spec to `docs/specs/` on main -- this triggers scenario generation in the scenarios repo
509
- 2. Open a PR with a small change -- when CI passes, scenarios run against the PR
510
- 3. Watch for the scenario test results posted as a PR comment
227
+ ### Lockdown mode
511
228
 
512
- Or deliberately break something in a PR to test the autofix loop.
229
+ For complex stacks or long autonomous sessions, `/joycraft-lockdown` generates constrained execution boundaries:
513
230
 
514
- ### Cost
231
+ - **NEVER rules** for editing test files (read-only)
232
+ - **Deny patterns** for package installs, network access, log reading
233
+ - **Permission mode recommendations** (see below)
515
234
 
516
- Validated in the Pipit trial (~3 minutes, one iteration, zero human intervention). With Claude Sonnet + `--max-turns 20` + max 3 iterations per PR:
517
- - **Autofix:** ~$0.50 per attempt, worst case ~$1.50 per PR (3 iterations)
518
- - **Scenario generation:** ~$0.20 per spec dispatch
519
- - **Solo dev with ~10 PRs/month:** ~$5-10/month for the full loop
235
+ This prevents the agent from going rogue -- downloading SDKs, pinging random IPs, clearing test files, or filling context with log output. Lockdown is optional and most useful for complex tech stacks (hardware, firmware, multi-device workflows).
520
236
 
521
- The iteration guard and max-turns cap prevent runaway costs.
237
+ ### Independent verification
522
238
 
523
- ## Tuning: Risk Interview & Git Autonomy
239
+ `/joycraft-verify` spawns a separate subagent with a clean context window to independently check your implementation against the spec. The verifier reads the acceptance criteria, runs the tests, and produces a structured pass/fail verdict. It cannot edit any code: it gets read-only access plus test execution, nothing more.
524
240
 
525
- When `/joycraft-tune` runs for the first time, it does two things:
241
+ This follows [Anthropic's finding](https://www.anthropic.com/engineering/harness-design-long-running-apps) that "agents reliably skew positive when grading their own work" and that separating the worker from the evaluator consistently outperforms self-evaluation.
526
242
 
527
- ### Risk interview
243
+ ## Claude Code Permission Modes
528
244
 
529
- 3-5 targeted questions about what's dangerous in your project (production databases, live APIs, secrets, files that should be off-limits). From your answers, Joycraft generates:
245
+ You do **not** need `--dangerously-skip-permissions` for autonomous development. Claude Code offers safer alternatives that Joycraft recommends based on your use case:
530
246
 
531
- - **NEVER rules** for CLAUDE.md (e.g., "NEVER connect to production DB")
532
- - **Deny patterns** for `.claude/settings.json` (blocks dangerous bash commands)
533
- - **`docs/context/production-map.md`** documenting what's real vs. safe to touch
534
- - **`docs/context/dangerous-assumptions.md`** documenting "Agent might assume X, but actually Y"
247
+ | Your situation | Permission mode | What it does |
248
+ |---|---|---|
249
+ | Interactive development | `acceptEdits` | Auto-approves file edits, prompts for shell commands |
250
+ | Long autonomous session | `auto` | Safety classifier reviews each action, blocks scope escalation |
251
+ | Autonomous spec execution | `dontAsk` + allowlist | Only pre-approved commands run, everything else denied |
252
+ | Planning and exploration | `plan` | Claude can only read and propose, no edits allowed |
535
253
 
536
- This takes 2-3 minutes and dramatically reduces the chance of your agent doing something catastrophic.
254
+ ### When to use what
537
255
 
538
- ### Git autonomy
256
+ **`--permission-mode auto`** is the best default for most developers. A background classifier (Sonnet) reviews each action before execution, blocking things like: downloading unexpected packages, accessing unfamiliar infrastructure, or escalating beyond the task scope. It adds minimal latency and catches the exact problems that make autonomous development scary.
539
257
 
540
- One question: **how autonomous should git be?**
258
+ **`--permission-mode dontAsk`** is for maximum control. You define an explicit allowlist of what the agent can do (write code, run specific test commands) and everything else is silently denied. No prompts, no surprises. This is what Joycraft's `/joycraft-lockdown` skill helps you configure.
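An allowlist of this kind lives in `.claude/settings.json`. The sketch below is illustrative only; the tool names and the `defaultMode` key should be verified against your Claude Code version's settings reference:

```json
{
  "permissions": {
    "defaultMode": "dontAsk",
    "allow": ["Edit", "Bash(npm test:*)", "Bash(npm run build:*)"],
    "deny": ["WebFetch", "Bash(curl:*)"]
  }
}
```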
541
259
 
542
- - **Cautious** (default) commits freely but asks before pushing or opening PRs. Good for learning the workflow.
543
- - **Autonomous** commits, pushes to feature branches, and opens PRs without asking. Good for spec-driven development where you want full send.
260
+ **`--dangerously-skip-permissions`** should only be used in isolated containers or VMs with no internet access. It bypasses all safety checks and cannot be overridden by subagents.
544
261
 
545
- Either way, Joycraft generates explicit git boundaries in your CLAUDE.md: commit message format (`verb: message`), specific file staging (no `git add -A`), no secrets in commits, no force-pushing.
262
+ Both `/joycraft-lockdown` and `/joycraft-tune` now recommend the appropriate permission mode based on your project's risk profile.
546
263
 
547
264
  ## How It Works with AI Agents
548
265
 
@@ -581,6 +298,10 @@ Joycraft's approach is synthesized from several sources:
581
298
 
582
299
  **Behavioral boundaries.** CLAUDE.md isn't a suggestion box, it's a contract. Joycraft installs a three-tier boundary framework (Always / Ask First / Never) that prevents the most common AI development failures: overwriting user files, skipping tests, pushing without approval, hardcoding secrets. This is [Addy Osmani's](https://addyosmani.com/blog/good-spec/) "boundaries" principle made concrete.
583
300
 
301
+ **Test-first as the mechanism to autonomy.** Tests aren't a nice-to-have, they're the bridge between "agent writes code" and "agent writes *correct* code." Every spec includes a Test Plan mapping acceptance criteria to tests, and the agent must write failing tests before implementing. This follows the three laws of test harnesses discovered through real autonomous development, and aligns with [Anthropic's harness design research](https://www.anthropic.com/engineering/harness-design-long-running-apps) which found that agents reliably skip verification unless explicitly constrained.
302
+
303
+ **Separation of evaluation from implementation.** [Anthropic's research](https://www.anthropic.com/engineering/harness-design-long-running-apps) found that "agents reliably skew positive when grading their own work." Joycraft addresses this at two levels: `/joycraft-verify` spawns a separate subagent with clean context to independently verify against the spec, and Level 5's holdout scenarios provide external evaluation the implementation agent can never see.
304
+
584
305
  **Knowledge capture over session notes.** Most session notes are never re-read. Joycraft's `/joycraft-session-end` skill captures only *discoveries*: assumptions that were wrong, APIs that behaved unexpectedly, decisions made during implementation that aren't in the spec. If nothing surprising happened, you capture nothing. This keeps the signal-to-noise ratio high.
585
306
 
586
307
  **External holdout scenarios.** [StrongDM's Software Factory](https://factory.strongdm.ai/) proved that AI agents will [actively game visible test suites](https://palisaderesearch.org/blog/specification-gaming). Their solution: scenarios that live *outside* the codebase, invisible to the agent during development. Like a holdout set in ML, this prevents overfitting. Joycraft now implements this directly. `init-autofix` sets up the holdout wall, the scenario agent, and the GitHub App integration.