joycraft 0.4.0 → 0.5.1

package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Maximilian Maksutovic
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md CHANGED
@@ -1,18 +1,68 @@
1
1
  # Joycraft
2
2
 
3
+ <p align="center">
4
+ <img src="docs/joycraft-banner.png" alt="Joycraft — the craft of AI development" width="700" />
5
+ </p>
6
+
3
7
  > The craft of AI development — with joy, not darkness.
4
8
 
5
- **Joycraft** is a CLI tool and Claude Code plugin that takes any project from Level 1 to Level 4 on [Dan Shapiro's 5 Levels of Vibe Coding](https://www.danshapiro.com/blog/2026/01/the-five-levels-from-spicy-autocomplete-to-the-software-factory/). One command gives you behavioral boundaries, atomic spec workflows, skill-driven development, and structured knowledge capture.
9
+ ## What is Joycraft?
10
+
11
+ Joycraft is a CLI tool and [Claude Code](https://docs.anthropic.com/en/docs/claude-code) plugin that upgrades your AI development workflow. It installs skills, behavioral boundaries, templates, and documentation structure into any project — taking you from unstructured prompting to autonomous spec-driven development.
12
+
13
+ If you've been using Claude Code (or any AI coding tool) and your workflow looks like this:
14
+
15
+ > Prompt → wait → read output → "no, not that" → re-prompt → fix hallucination → re-prompt → manually fix → "ok close enough" → commit
16
+
17
+ ...then Joycraft is for you.
18
+
19
+ This project started as a personal exploration by [@maksutovic](https://github.com/maksutovic). I was working across multiple client projects, spending more time wrestling with prompts than building software. I knew Claude Code was capable of extraordinary work, but my *process* was holding it back. I was vibe coding — and vibe coding doesn't scale.
20
+
21
+ The spark was [Nate B Jones' video on the 5 Levels of Vibe Coding](https://www.youtube.com/watch?v=bDcgHzCBgmQ). It mapped out a progression I hadn't seen articulated before, from "spicy autocomplete" to fully autonomous development, and lit my brain up to the potential of what Claude Code could do with the right harness around it. Joycraft is the result of that exploration: a tool that encodes the patterns, boundaries, and workflows that make AI-assisted development far more predictable.
22
+
23
+ ### The core idea
24
+
25
+ Joycraft is simple. It's a set of **skills** (slash commands for Claude Code) and **instructions** (CLAUDE.md boundaries) that guide you and your agent through a structured development process:
26
+
27
+ - **Levels 1-4:** Skills like `/joycraft-tune`, `/joycraft-new-feature`, and `/joycraft-interview` replace unstructured prompting with spec-driven development. You interview, you write specs, the agent executes. No back-and-forth.
28
+ - **Level 5:** The `/joycraft-implement-level5` skill sets up the autonomous loop — where specs go in and validated software comes out, with holdout scenario testing that prevents the agent from gaming its own tests.
29
+
30
+ StrongDM calls their Level 5 fully autonomous loop a "Dark Factory." It's a cool name, but the world has so much darkness in it right now. I wanted a name that extolled more of what I believe tools like this can provide: joy and craftsmanship. Hence "Joycraft."
31
+
32
+ ### What are the levels?
33
+
34
+ [Dan Shapiro's 5 Levels of Vibe Coding](https://www.danshapiro.com/blog/2026/01/the-five-levels-from-spicy-autocomplete-to-the-software-factory/) provides the framework:
35
+
36
+ | Level | Name | What it looks like | Joycraft's role |
37
+ |-------|------|--------------------|-----------------|
38
+ | 1 | Autocomplete | Tab-complete suggestions | — |
39
+ | 2 | Junior Developer | Prompt → iterate → fix → repeat | `/joycraft-tune` assesses where you are |
40
+ | 3 | Developer as Manager | Your life is reviewing diffs | Behavioral boundaries in CLAUDE.md |
41
+ | 4 | Developer as PM | You write specs, agent writes code | `/joycraft-new-feature` + `/joycraft-decompose` |
42
+ | 5 | Software Factory | Specs in, validated software out | `/joycraft-implement-level5` sets up the autonomous loop |
6
43
 
7
- The name is a deliberate counter-narrative to "dark factory." Autonomous software development should bring craft and joy to engineering, not darkness.
44
+ Most developers plateau at Level 2. Joycraft's job is to move you up.
45
+
46
+ ### Platform support
47
+
48
+ Joycraft is currently focused on making the Claude Code experience state-of-the-art. Better [Codex](https://openai.com/codex) support is coming — `AGENTS.md` generation is already included, and deeper integration is on the roadmap.
8
49
 
9
50
  ## Quick Start
10
51
 
52
+ First, install the CLI:
53
+
54
+ ```bash
55
+ npm install -g joycraft
56
+ ```
57
+
58
+ Then navigate to your project's root directory and initialize:
59
+
11
60
  ```bash
61
+ cd /path/to/your/project
12
62
  npx joycraft init
13
63
  ```
14
64
 
15
- That's it. Joycraft auto-detects your tech stack and creates:
65
+ Joycraft auto-detects your tech stack and creates:
16
66
 
17
67
  - **CLAUDE.md** with behavioral boundaries (Always / Ask First / Never) and correct build/test/lint commands
18
68
  - **AGENTS.md** for Codex compatibility
@@ -22,8 +72,11 @@ That's it. Joycraft auto-detects your tech stack and creates:
22
72
  - `/joycraft-interview` — Lightweight brainstorm — yap about ideas, get a structured summary
23
73
  - `/joycraft-decompose` — Break a brief into small, testable specs
24
74
  - `/joycraft-session-end` — Capture discoveries, verify, commit
75
+ - `/joycraft-implement-level5` — Set up Level 5: autofix loop, holdout scenarios, scenario evolution
25
76
  - **docs/** structure — `briefs/`, `specs/`, `discoveries/`, `contracts/`, `decisions/`
26
- - **Templates** — Atomic spec, feature brief, implementation plan, boundary framework
77
+ - **Templates** — Atomic spec, feature brief, implementation plan, boundary framework, and workflow templates for scenario generation and autofix loops
78
+
79
+ Once you reach Level 4, you can set up the autonomous loop with `/joycraft-implement-level5`. See [Level 5: The Autonomous Loop](#level-5-the-autonomous-loop) below.
27
80
 
28
81
  ### Supported Stacks
29
82
 
@@ -41,6 +94,7 @@ After init, open Claude Code and use the installed skills:
41
94
  /joycraft-new-feature # Interview → Feature Brief → Atomic Specs → ready to execute
42
95
  /joycraft-decompose # Break any feature into small, independent specs
43
96
  /joycraft-session-end # Wrap up — discoveries, verification, commit
97
+ /joycraft-implement-level5 # Set up Level 5 — autofix, holdout scenarios, evolution
44
98
  ```
45
99
 
46
100
  The core loop:
@@ -49,6 +103,54 @@ The core loop:
49
103
  Interview → Spec → Fresh Session → Execute → Discoveries → Ship
50
104
  ```
51
105
 
106
+ ## The Interview: Why It Matters
107
+
108
+ The single biggest upgrade Joycraft makes to your workflow is replacing the prompt-iterate-fix cycle with a **structured interview**.
109
+
110
+ Here's the problem with how most of us use AI coding tools: we open a session and start typing. "Build me a notification system." The agent starts writing code immediately. It makes assumptions about your data model, your UI framework, your error handling strategy, your deployment target. You catch some of these mid-flight, correct them, the agent adjusts, introduces new assumptions. Three hours later you have something that *kind of* works but is built on a foundation of guesses.
111
+
112
+ Joycraft flips this. Before the agent writes a single line of code, you have a conversation about *what you're building and why*.
113
+
114
+ ### Two interview modes
115
+
116
+ **`/joycraft-interview`** — The lightweight brainstorm. You yap about an idea, the agent asks clarifying questions, and you get a structured summary saved to `docs/briefs/`. Good for early-stage thinking when you're not ready to commit to building anything yet. No pressure, no specs — just organized thought.
117
+
118
+ **`/joycraft-new-feature`** — The full workflow. This is the structured interview that produces a **Feature Brief** (the what and why) and then decomposes it into **Atomic Specs** (small, testable, independently executable units of work). Each spec is self-contained — an agent in a fresh session can pick it up and execute without reading anything else.
119
+
120
+ ### Why this works
121
+
122
+ The insight comes from [Boris Cherny](https://www.lennysnewsletter.com/p/head-of-claude-code-what-happens) (Head of Claude Code at Anthropic): interview in one session, write the spec, then execute in a *fresh session* with clean context. The interview captures your intent. The spec is the contract. The execution session has only the spec — no baggage from the conversation, no accumulated misunderstandings, no context window full of abandoned approaches.
123
+
124
+ This is what separates Level 2 (back-and-forth prompting) from Level 4 (spec-driven development). You stop being a typist correcting an agent's guesses and start being a PM defining what needs to be built.
125
+
126
+ ```mermaid
127
+ flowchart LR
128
+ A["/joycraft-interview<br/>(brainstorm)"] --> B["Draft Brief<br/>docs/briefs/"]
129
+ B --> C["/joycraft-new-feature<br/>(structured interview)"]
130
+ C --> D["Feature Brief<br/>(what & why)"]
131
+ D --> E["/joycraft-decompose"]
132
+ E --> F["Atomic Specs<br/>docs/specs/"]
133
+ F --> G["Fresh Session<br/>Execute each spec"]
134
+ G --> H["/joycraft-session-end<br/>(discoveries + commit)"]
135
+
136
+ style A fill:#e8f4fd,stroke:#369
137
+ style C fill:#e8f4fd,stroke:#369
138
+ style F fill:#cfc,stroke:#393
139
+ style G fill:#ffd,stroke:#993
140
+ ```
141
+
142
+ ### What a good spec looks like
143
+
144
+ An atomic spec produced by `/joycraft-decompose` has:
145
+
146
+ - **What** — One paragraph. A developer with zero context understands the change in 15 seconds.
147
+ - **Why** — One sentence. What breaks or is missing without this?
148
+ - **Acceptance criteria** — Checkboxes. Testable. No ambiguity.
149
+ - **Affected files** — Exact paths, what changes in each.
150
+ - **Edge cases** — Table of scenarios and expected behavior.
151
+
152
+ The agent doesn't guess. It reads the spec and executes. If something's unclear, the spec is wrong — fix the spec, not the conversation.
153
+
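The five sections above can be read as a small data structure. The following is a minimal sketch, assuming a hypothetical `AtomicSpec` shape and a hypothetical `findSpecGaps` helper; neither name comes from Joycraft, whose specs are Markdown files rather than typed objects.

```typescript
// Hypothetical model of an atomic spec. Field names are illustrative only;
// they mirror the README's five sections, not an actual Joycraft schema.
interface EdgeCase {
  scenario: string;
  expected: string;
}

interface AtomicSpec {
  what: string; // one paragraph
  why: string; // one sentence
  acceptanceCriteria: string[]; // testable checkboxes
  affectedFiles: { path: string; change: string }[];
  edgeCases: EdgeCase[];
}

// Returns a list of missing sections; an empty list means every section is filled in.
function findSpecGaps(spec: AtomicSpec): string[] {
  const gaps: string[] = [];
  if (spec.what.trim() === "") gaps.push("what is empty");
  if (spec.why.trim() === "") gaps.push("why is empty");
  if (spec.acceptanceCriteria.length === 0) gaps.push("no acceptance criteria");
  if (spec.affectedFiles.length === 0) gaps.push("no affected files");
  if (spec.edgeCases.length === 0) gaps.push("no edge cases");
  return gaps;
}
```

A check like this only catches missing sections. Whether the criteria are actually testable and unambiguous still needs a human or agent review.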
52
154
  ## Upgrade
53
155
 
54
156
  When Joycraft templates and skills evolve, update without losing your customizations:
@@ -59,9 +161,311 @@ npx joycraft upgrade
59
161
 
60
162
  Joycraft tracks what it installed vs. what you've customized. Unmodified files update automatically. Customized files show a diff and ask before overwriting. Use `--yes` for CI.
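
The package's `src/version.ts` writes a `.joycraft-version` file containing a version string and a per-file map, and exports a SHA-256 `hashContent` helper, so the tracking is presumably hash-based. The decision logic below is a sketch under that assumption; `decideUpgradeAction` and the `UpgradeAction` names are hypothetical and not taken from the package.

```typescript
// Hypothetical upgrade decision for a single tracked file.
type UpgradeAction = "up-to-date" | "auto-update" | "ask-user";

// installedHash: hash recorded when the file was last installed
// currentHash:   hash of the file as it exists on disk now
// incomingHash:  hash of the file shipped by the new version
function decideUpgradeAction(
  installedHash: string,
  currentHash: string,
  incomingHash: string,
): UpgradeAction {
  // The file on disk already matches the new version: nothing to do.
  if (currentHash === incomingHash) return "up-to-date";
  // The file is untouched since install: safe to overwrite.
  if (currentHash === installedHash) return "auto-update";
  // The user edited the file: show a diff and ask first.
  return "ask-user";
}
```

In this sketch, `--yes` would correspond to treating every `"ask-user"` result as approved.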
61
163
 
62
- ## Git Autonomy
164
+ > **Note:** If you're upgrading from an early version, deprecated skill directories (e.g., `/joy`, `/joysmith`, `/tune`) are automatically removed during upgrade.
165
+
166
+ ## Level 5: The Autonomous Loop
167
+
168
+ > **A note on complexity:** Level 5 has several moving parts. Depending on the complexity of your stack (software vs. hardware, monorepo vs. single app, etc.), setup can take a good amount of prompting and trial and error to get right. I've done my best to make this as painless as possible, but this is not a one-shot, done-in-5-minutes prompt. Small projects with simple stacks will be easy; anything more complex will take some iteration, so plan ahead. Full step-by-step guides and a video are coming soon.
63
169
 
64
- When `/joycraft-tune` runs for the first time, it asks one question: **how autonomous should git be?**
170
+ Level 5 is where specs go in and validated software comes out. Joycraft implements this as four interlocking GitHub Actions workflows, a separate scenarios repository, and two independent AI agents that can never see each other's work.
171
+
172
+ Run `/joycraft-implement-level5` in Claude Code for a guided setup, or use the CLI directly:
173
+
174
+ ```bash
175
+ npx joycraft init-autofix --scenarios-repo my-project-scenarios --app-id 3180156
176
+ ```
177
+
178
+ ### Architecture Overview
179
+
180
+ Level 5 has four moving parts. Each is a GitHub Actions workflow triggered by a `workflow_run`, `push`, or `repository_dispatch` event. Cross-repo triggering uses `repository_dispatch` events only: no custom servers, no webhooks, no external services.
181
+
182
+ ```mermaid
183
+ graph TB
184
+ subgraph "Main Repository"
185
+ A[Push specs to docs/specs/] -->|push to main| B[Spec Dispatch Workflow]
186
+ C[PR opened] --> D[CI runs]
187
+ D -->|CI fails| E[Autofix Workflow]
188
+ D -->|CI passes| F[Scenarios Dispatch Workflow]
189
+ G[Scenarios Re-run Workflow]
190
+ end
191
+
192
+ subgraph "Scenarios Repository (private)"
193
+ H[Scenario Generation Workflow]
194
+ I[Scenario Run Workflow]
195
+ J[Holdout Tests]
196
+ K[Specs Mirror]
197
+ end
198
+
199
+ B -->|repository_dispatch: spec-pushed| H
200
+ H -->|reads specs, writes tests| J
201
+ H -->|repository_dispatch: scenarios-updated| G
202
+ G -->|repository_dispatch: run-scenarios| I
203
+ F -->|repository_dispatch: run-scenarios| I
204
+ I -->|posts PASS/FAIL comment| C
205
+ E -->|Claude fixes code, pushes| D
206
+
207
+ style J fill:#f9f,stroke:#333
208
+ style K fill:#bbf,stroke:#333
209
+ ```
210
+
211
+ ### The Four Workflows
212
+
213
+ #### 1. Autofix Workflow (`autofix.yml`)
214
+
215
+ Triggered when CI **fails** on a PR. Claude Code CLI reads the failure logs and attempts a fix.
216
+
217
+ ```mermaid
218
+ sequenceDiagram
219
+ participant CI as CI Workflow
220
+ participant AF as Autofix Workflow
221
+ participant Claude as Claude Code CLI
222
+ participant PR as Pull Request
223
+
224
+ CI->>AF: workflow_run (conclusion: failure)
225
+ AF->>AF: Generate GitHub App token
226
+ AF->>AF: Checkout PR branch
227
+ AF->>AF: Count previous autofix attempts
228
+
229
+ alt attempts >= 3
230
+ AF->>PR: Comment: "Human review needed"
231
+ else attempts < 3
232
+ AF->>AF: Fetch CI failure logs
233
+ AF->>AF: Strip ANSI codes
234
+ AF->>Claude: claude -p "Fix this CI failure..." <br/> --dangerously-skip-permissions --max-turns 20
235
+ Claude->>Claude: Read logs, edit code, run tests
236
+ Claude->>AF: Exit (changes committed locally)
237
+ AF->>PR: Push fix (commit prefix: "autofix:")
238
+ AF->>PR: Comment: summary of fix
239
+ Note over CI,PR: CI re-runs automatically on push
240
+ end
241
+ ```
242
+
243
+ **Key details:**
244
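+ - Uses a GitHub App identity for pushes: commits pushed with the default `GITHUB_TOKEN` don't trigger new workflow runs (GitHub's anti-recursion protection), so CI would not re-run on the fix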
245
+ - Concurrency group per PR — only one autofix runs at a time per PR
246
+ - Max 3 iterations — posts "human review needed" if it can't fix it
247
+ - No `--model` flag — Claude CLI handles model selection
248
+ - Strips ANSI escape codes from logs so Claude gets clean text
249
+
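Two of the details above, the iteration cap and the log cleanup, are easy to sketch. This is an illustration only: it assumes attempts are counted by looking for the `autofix:` commit prefix, which the README does not confirm, and the function names are hypothetical.

```typescript
const AUTOFIX_PREFIX = "autofix:";
const MAX_ATTEMPTS = 3;

// Counts commits on the PR branch whose subject starts with the autofix prefix.
// Assumption: the real workflow may count attempts differently (for example, via PR comments).
function countAutofixAttempts(commitSubjects: string[]): number {
  return commitSubjects.filter((subject) => subject.startsWith(AUTOFIX_PREFIX)).length;
}

// True while the PR is still under the iteration cap.
function shouldAttemptAutofix(commitSubjects: string[]): boolean {
  return countAutofixAttempts(commitSubjects) < MAX_ATTEMPTS;
}

// Removes CSI escape sequences (colors, cursor movement) from log text.
// Other escape types, such as OSC sequences, are not handled by this sketch.
function stripAnsi(text: string): string {
  return text.replace(/\x1b\[[0-?]*[ -\/]*[@-~]/g, "");
}
```

When `shouldAttemptAutofix` returns `false`, the workflow's documented behavior is to comment that human review is needed instead of invoking Claude.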
250
+ #### 2. Scenarios Dispatch Workflow (`scenarios-dispatch.yml`)
251
+
252
+ Triggered when CI **passes** on a PR. Fires a `repository_dispatch` to the scenarios repo to run holdout tests against the PR branch.
253
+
254
+ ```mermaid
255
+ sequenceDiagram
256
+ participant CI as CI Workflow
257
+ participant SD as Scenarios Dispatch
258
+ participant SR as Scenarios Repo
259
+
260
+ CI->>SD: workflow_run (conclusion: success, PR)
261
+ SD->>SD: Generate GitHub App token
262
+ SD->>SR: repository_dispatch: run-scenarios<br/>payload: {pr_number, branch, sha, repo}
263
+ ```
264
+
265
+ #### 3. Spec Dispatch Workflow (`spec-dispatch.yml`)
266
+
267
+ Triggered when spec files are pushed to `main`. Sends the spec content to the scenarios repo so the scenario agent can write tests.
268
+
269
+ ```mermaid
270
+ sequenceDiagram
271
+ participant Dev as Developer
272
+ participant Main as Main Repo (push to main)
273
+ participant SPD as Spec Dispatch Workflow
274
+ participant SR as Scenarios Repo
275
+
276
+ Dev->>Main: Push specs to docs/specs/
277
+ Main->>SPD: push event (docs/specs/** changed)
278
+ SPD->>SPD: git diff --diff-filter=AM (added/modified only)
279
+
280
+ loop For each changed spec
281
+ SPD->>SR: repository_dispatch: spec-pushed<br/>payload: {spec_filename, spec_content, commit_sha, branch, repo}
282
+ end
283
+
284
+ Note over SPD: Deleted specs are ignored —<br/>existing scenario tests remain
285
+ ```
286
+
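The dispatch step can be sketched as a pure function that turns changed files into request bodies. The `event_type` and `client_payload` keys are the body shape of GitHub's repository dispatch REST endpoint, and the payload field names come from the diagram above. Everything else (`FileChange`, `buildSpecDispatches`, the `readSpec` callback, and using the path relative to `docs/specs/` as `spec_filename`) is an assumption made for illustration.

```typescript
type ChangeStatus = "A" | "M" | "D"; // added, modified, deleted

interface FileChange {
  status: ChangeStatus;
  path: string;
}

interface SpecPushedPayload {
  spec_filename: string;
  spec_content: string;
  commit_sha: string;
  branch: string;
  repo: string;
}

interface SpecDispatchBody {
  event_type: "spec-pushed";
  client_payload: SpecPushedPayload;
}

const SPECS_DIR = "docs/specs/";

// Builds one dispatch body per added or modified spec; deleted specs are ignored.
function buildSpecDispatches(
  changes: FileChange[],
  readSpec: (path: string) => string, // hypothetical file reader, injected to keep this pure
  context: { commit_sha: string; branch: string; repo: string },
): SpecDispatchBody[] {
  return changes
    .filter((change) => change.status !== "D" && change.path.startsWith(SPECS_DIR))
    .map((change) => ({
      event_type: "spec-pushed",
      client_payload: {
        spec_filename: change.path.slice(SPECS_DIR.length),
        spec_content: readSpec(change.path),
        ...context,
      },
    }));
}
```

Each returned body would then be sent to the scenarios repo's dispatch endpoint using the GitHub App token.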
287
+ #### 4. Scenarios Re-run Workflow (`scenarios-rerun.yml`)
288
+
289
+ Triggered when the scenarios repo updates its tests. Re-dispatches all open PRs to the scenarios repo so they get tested with the latest holdout tests.
290
+
291
+ ```mermaid
292
+ sequenceDiagram
293
+ participant SR as Scenarios Repo
294
+ participant RR as Re-run Workflow
295
+ participant SRun as Scenarios Run
296
+
297
+ SR->>RR: repository_dispatch: scenarios-updated
298
+ RR->>RR: List open PRs via GitHub API
299
+
300
+ alt No open PRs
301
+ RR->>RR: Exit (no-op)
302
+ else Has open PRs
303
+ loop For each open PR
304
+ RR->>SRun: repository_dispatch: run-scenarios<br/>payload: {pr_number, branch, sha, repo}
305
+ end
306
+ end
307
+ ```
308
+
309
+ **Why this exists:** There's a race condition. The implementation agent might open a PR before the scenario agent finishes writing new tests. The re-run workflow handles this — when new tests land, all open PRs get re-tested. Worst case: a PR merges before the re-run, and the new tests protect the very next PR. You're never more than one cycle behind.
310
+
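The fan-out in the re-run workflow amounts to mapping open PRs to `run-scenarios` payloads. The sketch below assumes the PR objects carry `number`, `head.ref`, and `head.sha`, as in GitHub's pull request list response; `buildRerunDispatches` and the interface names are hypothetical.

```typescript
// Subset of the fields GitHub returns for each pull request.
interface OpenPullRequest {
  number: number;
  head: { ref: string; sha: string };
}

interface RunScenariosDispatch {
  event_type: "run-scenarios";
  client_payload: { pr_number: number; branch: string; sha: string; repo: string };
}

// One dispatch per open PR; an empty PR list yields an empty array (the no-op case).
function buildRerunDispatches(openPrs: OpenPullRequest[], repo: string): RunScenariosDispatch[] {
  return openPrs.map((pr) => ({
    event_type: "run-scenarios",
    client_payload: {
      pr_number: pr.number,
      branch: pr.head.ref,
      sha: pr.head.sha,
      repo,
    },
  }));
}
```

Because the payload has the same shape as the one sent by the scenarios dispatch workflow, the scenarios repo can handle both with a single run workflow.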
311
+ ### The Holdout Wall
312
+
313
+ The core safety mechanism. Two agents, two repos, one shared interface (specs):
314
+
315
+ ```mermaid
316
+ graph LR
317
+ subgraph "Implementation Agent (main repo)"
318
+ IA_sees["Can see:<br/>Source code<br/>Internal tests<br/>Specs"]
319
+ IA_cant["Cannot see:<br/>Scenario tests<br/>Scenario repo"]
320
+ end
321
+
322
+ subgraph "Specs (shared interface)"
323
+ Specs["docs/specs/*.md<br/>Describes WHAT should happen<br/>Never describes HOW it's tested"]
324
+ end
325
+
326
+ subgraph "Scenario Agent (scenarios repo)"
327
+ SA_sees["Can see:<br/>Specs (via dispatch)<br/>Scenario tests<br/>Specs mirror"]
328
+ SA_cant["Cannot see:<br/>Source code<br/>Internal tests"]
329
+ end
330
+
331
+ IA_sees --> Specs
332
+ Specs --> SA_sees
333
+
334
+ style IA_cant fill:#fcc,stroke:#933
335
+ style SA_cant fill:#fcc,stroke:#933
336
+ style Specs fill:#cfc,stroke:#393
337
+ ```
338
+
339
+ This is the same principle as a holdout set in machine learning. If the implementation agent could see the scenario tests, it would optimize for passing those specific tests rather than for building correct software. With the wall intact, scenario tests catch real behavioral regressions instead of rewarding test-gaming.
340
+
341
+ ### Scenario Evolution
342
+
343
+ Scenarios aren't static. When you push new specs, the scenario agent automatically triages them and writes new holdout tests.
344
+
345
+ ```mermaid
346
+ flowchart TD
347
+ A[New spec pushed to main] --> B[Spec Dispatch sends to scenarios repo]
348
+ B --> C[Scenario Agent reads spec]
349
+ C --> D{Triage: is this user-facing?}
350
+
351
+ D -->|Internal refactor, CI, dev tooling| E[Skip — commit note: 'No scenario changes needed']
352
+ D -->|New user-facing behavior| F[Write new scenario test file]
353
+ D -->|Modified existing behavior| G[Update existing scenario tests]
354
+
355
+ F --> H[Commit to scenarios main]
356
+ G --> H
357
+ H --> I[Dispatch scenarios-updated to main repo]
358
+ I --> J[Re-run workflow tests open PRs with new scenarios]
359
+
360
+ style D fill:#ffd,stroke:#993
361
+ style E fill:#ddd,stroke:#999
362
+ style F fill:#cfc,stroke:#393
363
+ style G fill:#cfc,stroke:#393
364
+ ```
365
+
366
+ **The scenario agent's prompt instructs it to:**
367
+ - Act as a QA engineer, never a developer
368
+ - Write only behavioral tests (invoke the built artifact, assert on output)
369
+ - Never import source code or reference internal implementation
370
+ - Use a triage decision tree: SKIP / NEW / UPDATE
371
+ - Err on the side of writing a test if the spec is ambiguous
372
+
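The SKIP / NEW / UPDATE decision tree can be modeled as a discriminated union. This is only a sketch of the outcomes shown in the flowchart; the actual triage is done by the scenario agent from its prompt, and the `TriageDecision` type and both functions are hypothetical.

```typescript
// Hypothetical model of the three triage outcomes.
type TriageDecision =
  | { kind: "SKIP"; reason: string }
  | { kind: "NEW"; testFile: string }
  | { kind: "UPDATE"; testFiles: string[] };

// Human-readable summary of what the scenario agent does for each outcome.
function describeTriage(decision: TriageDecision): string {
  switch (decision.kind) {
    case "SKIP":
      return `No scenario changes needed: ${decision.reason}`;
    case "NEW":
      return `Write new scenario test file: ${decision.testFile}`;
    case "UPDATE":
      return `Update existing scenario tests: ${decision.testFiles.join(", ")}`;
  }
}

// Per the flowchart, only NEW and UPDATE lead to a commit and a scenarios-updated dispatch.
function shouldDispatchScenariosUpdated(decision: TriageDecision): boolean {
  return decision.kind !== "SKIP";
}
```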
373
+ **The specs mirror:** The scenarios repo maintains a `specs/` folder that mirrors every spec it receives. This gives the scenario agent historical context ("what features already exist?") without access to the main repo's codebase.
374
+
375
+ ### The Complete Loop
376
+
377
+ Here's the full lifecycle from spec to shipped, validated code:
378
+
379
+ ```mermaid
380
+ sequenceDiagram
381
+ participant Human as Human (writes specs)
382
+ participant Main as Main Repo
383
+ participant ScAgent as Scenario Agent
384
+ participant ScRepo as Scenarios Repo
385
+ participant ImplAgent as Implementation Agent
386
+ participant Autofix as Autofix Workflow
387
+
388
+ Human->>Main: Push spec to docs/specs/
389
+ Main->>ScAgent: spec-pushed dispatch
390
+
391
+ par Scenario Generation
392
+ ScAgent->>ScAgent: Triage spec
393
+ ScAgent->>ScRepo: Write/update holdout tests
394
+ ScRepo->>Main: scenarios-updated dispatch
395
+ and Implementation
396
+ Human->>ImplAgent: Execute spec (fresh session)
397
+ ImplAgent->>Main: Open PR
398
+ end
399
+
400
+ Main->>Main: CI runs on PR
401
+
402
+ alt CI fails
403
+ Main->>Autofix: Autofix workflow triggers
404
+ Autofix->>Main: Push fix, CI re-runs
405
+ end
406
+
407
+ alt CI passes
408
+ Main->>ScRepo: run-scenarios dispatch
409
+ ScRepo->>ScRepo: Clone PR branch, build, run holdout tests
410
+ ScRepo->>Main: Post PASS/FAIL comment on PR
411
+ end
412
+
413
+ alt Scenarios PASS
414
+ Note over Human,Main: Ready for human review and merge
415
+ else Scenarios FAIL
416
+ Main->>Autofix: Autofix attempts fix
417
+ Note over Autofix,ScRepo: Loop continues (max 3 iterations)
418
+ end
419
+ ```
420
+
421
+ ### What Gets Installed
422
+
423
+ | Where | File | Purpose |
424
+ |-------|------|---------|
425
+ | Main repo | `.github/workflows/autofix.yml` | CI failure → Claude fix → push |
426
+ | Main repo | `.github/workflows/scenarios-dispatch.yml` | CI pass → trigger holdout tests |
427
+ | Main repo | `.github/workflows/spec-dispatch.yml` | Spec push → trigger scenario generation |
428
+ | Main repo | `.github/workflows/scenarios-rerun.yml` | New tests → re-test open PRs |
429
+ | Scenarios repo | `workflows/run.yml` | Clone PR, build, run tests, post results |
430
+ | Scenarios repo | `workflows/generate.yml` | Receive spec, run scenario agent |
431
+ | Scenarios repo | `prompts/scenario-agent.md` | Scenario agent prompt template |
432
+ | Scenarios repo | `example-scenario.test.ts` | Example holdout test |
433
+ | Scenarios repo | `package.json` | Minimal vitest setup |
434
+ | Scenarios repo | `README.md` | Explains holdout pattern to contributors |
435
+
436
+ ### Prerequisites
437
+
438
+ - **GitHub App** — Provides a separate identity for autofix pushes (avoids GitHub's anti-recursion protection). You can install the shared [Joycraft Autofix](https://github.com/apps/joycraft-autofix) app (App ID: `3180156`) or create your own.
439
+ - **Secrets** — `JOYCRAFT_APP_PRIVATE_KEY` and `ANTHROPIC_API_KEY` on both the main and scenarios repos.
440
+ - **Scenarios repo** — A private repository where holdout tests live. Created during setup.
441
+
442
+ ### Cost
443
+
444
+ The loop was validated in the Pipit trial (~3 minutes, one iteration, zero human intervention). Estimated costs with Claude Sonnet, `--max-turns 20`, and a maximum of 3 iterations per PR:
445
+ - **Autofix:** ~$0.50 per attempt, worst case ~$1.50 per PR (3 iterations)
446
+ - **Scenario generation:** ~$0.20 per spec dispatch
447
+ - **Solo dev with ~10 PRs/month:** ~$5-10/month for the full loop
448
+
449
+ The iteration guard and max-turns cap prevent runaway costs.
450
+
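The monthly figure follows from the per-unit estimates above. Here is a rough calculator, assuming the README's approximate prices and treating the average number of autofix attempts per PR as an input; `estimateMonthlyCost` is a hypothetical helper, not part of Joycraft.

```typescript
// Approximate per-unit costs in USD, taken from the estimates above.
const AUTOFIX_COST_PER_ATTEMPT = 0.5;
const SCENARIO_GEN_COST_PER_SPEC = 0.2;
const MAX_AUTOFIX_ATTEMPTS = 3;

// Estimates monthly spend from PR volume, average autofix attempts per PR, and specs pushed.
function estimateMonthlyCost(
  prsPerMonth: number,
  avgAutofixAttemptsPerPr: number,
  specsPushedPerMonth: number,
): number {
  // The iteration guard caps attempts per PR, so the average cannot exceed the cap.
  const attempts = Math.min(avgAutofixAttemptsPerPr, MAX_AUTOFIX_ATTEMPTS);
  const autofixCost = prsPerMonth * attempts * AUTOFIX_COST_PER_ATTEMPT;
  const scenarioCost = specsPushedPerMonth * SCENARIO_GEN_COST_PER_SPEC;
  return autofixCost + scenarioCost;
}
```

With 10 PRs, one autofix attempt each, and 10 specs, this gives about $7, inside the range quoted above. If every PR hit the three-attempt cap, the same inputs would give about $17, so the quoted range assumes most PRs need at most one attempt.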
451
+ ## Tuning: Risk Interview & Git Autonomy
452
+
453
+ When `/joycraft-tune` runs for the first time, it does two things:
454
+
455
+ ### Risk interview
456
+
457
+ Joycraft asks 3-5 targeted questions about what's dangerous in your project: production databases, live APIs, secrets, files that should be off-limits. From your answers, it generates:
458
+
459
+ - **NEVER rules** for CLAUDE.md (e.g., "NEVER connect to production DB")
460
+ - **Deny patterns** for `.claude/settings.json` (blocks dangerous bash commands)
461
+ - **`docs/context/production-map.md`** — what's real vs. safe to touch
462
+ - **`docs/context/dangerous-assumptions.md`** — "Agent might assume X, but actually Y"
463
+
464
+ This takes 2-3 minutes and dramatically reduces the chance of your agent doing something catastrophic.
465
+
466
+ ### Git autonomy
467
+
468
+ One question: **how autonomous should git be?**
65
469
 
66
470
  - **Cautious** (default) — commits freely, asks before pushing or opening PRs. Good for learning the workflow.
67
471
  - **Autonomous** — commits, pushes to feature branches, and opens PRs without asking. Good for spec-driven development where you want full send.
@@ -107,7 +511,7 @@ Joycraft's approach is synthesized from several sources:
107
511
 
108
512
  **Knowledge capture over session notes.** Most session notes are never re-read. Joycraft's `/joycraft-session-end` skill captures only *discoveries* — assumptions that were wrong, APIs that behaved unexpectedly, decisions made during implementation that aren't in the spec. If nothing surprising happened, you capture nothing. This keeps the signal-to-noise ratio high.
109
513
 
110
- **External holdout scenarios.** [StrongDM's Software Factory](https://factory.strongdm.ai/) proved that AI agents will [actively game visible test suites](https://palisaderesearch.org/blog/specification-gaming). Their solution: scenarios that live *outside* the codebase, invisible to the agent during development. Like a holdout set in ML, this prevents overfitting. Joycraft provides the template for building these.
514
+ **External holdout scenarios.** [StrongDM's Software Factory](https://factory.strongdm.ai/) proved that AI agents will [actively game visible test suites](https://palisaderesearch.org/blog/specification-gaming). Their solution: scenarios that live *outside* the codebase, invisible to the agent during development. Like a holdout set in ML, this prevents overfitting. Joycraft now implements this directly rather than only providing templates: `init-autofix` sets up the holdout wall, the scenario agent, and the GitHub App integration.
111
515
 
112
516
  **The 5-level framework.** [Dan Shapiro's levels](https://www.danshapiro.com/blog/2026/01/the-five-levels-from-spicy-autocomplete-to-the-software-factory/) give you a map. Level 2 (Junior Developer) is where most teams plateau. Level 3 (Developer as Manager) means your life is diffs. Level 4 (Developer as PM) means you write specs, not code. Level 5 (Dark Factory) means specs in, software out. Joycraft's `/joycraft-tune` assessment tells you where you are and what to do next.
113
517
 
@@ -115,15 +519,29 @@ Joycraft's approach is synthesized from several sources:
115
519
 
116
520
  Joycraft synthesizes ideas and patterns from people doing extraordinary work in AI-assisted software development:
117
521
 
118
- - **[Dan Shapiro](https://www.danshapiro.com/)** ([@danshapiro](https://github.com/danshapiro)) — The [5 Levels of Vibe Coding](https://www.danshapiro.com/blog/2026/01/the-five-levels-from-spicy-autocomplete-to-the-software-factory/) framework that Joycraft's assessment and level system is built on
119
- - **[StrongDM](https://www.strongdm.com/)** / **Justin McCarthy** ([@justinmccarthy](https://github.com/justinmccarthy)) — The [Software Factory](https://factory.strongdm.ai/): spec-driven autonomous development, NLSpec, external holdout scenarios, and the proof that 3 engineers can outproduce 30
120
- - **[Boris Cherny](https://performancejs.com/)** ([@bcherny](https://github.com/bcherny)) — Head of Claude Code at Anthropic. The interview → spec → fresh session → execute pattern, and the insight that [context isolation produces better results](https://www.lennysnewsletter.com/p/head-of-claude-code-what-happens)
121
- - **[Addy Osmani](https://addyosmani.com/)** ([@addyosmani](https://github.com/addyosmani)) — [What makes a good spec for AI](https://addyosmani.com/blog/good-spec/): commands, testing, project structure, code style, git workflow, and boundaries
522
+ - **[Dan Shapiro](https://x.com/danshapiro)** — The [5 Levels of Vibe Coding](https://www.danshapiro.com/blog/2026/01/the-five-levels-from-spicy-autocomplete-to-the-software-factory/) framework that Joycraft's assessment and level system is built on
523
+ - **[StrongDM](https://www.strongdm.com/)** / **[Justin McCarthy](https://x.com/BuiltByJustin)** — The [Software Factory](https://factory.strongdm.ai/): spec-driven autonomous development, NLSpec, external holdout scenarios, and the proof that 3 engineers can outproduce 30
524
+ - **[Boris Cherny](https://x.com/bcherny)** — Head of Claude Code at Anthropic. The interview → spec → fresh session → execute pattern, and the insight that [context isolation produces better results](https://www.lennysnewsletter.com/p/head-of-claude-code-what-happens)
525
+ - **[Addy Osmani](https://x.com/addyosmani)** — [What makes a good spec for AI](https://addyosmani.com/blog/good-spec/): commands, testing, project structure, code style, git workflow, and boundaries
122
526
  - **[METR](https://metr.org/)** — The [randomized controlled trial](https://metr.org/) that proved unstructured AI use makes experienced developers slower, validating the need for harnesses
123
- - **[Nate B Jones](https://www.youtube.com/@natebj)** ([@nateBJ](https://github.com/nateBJ)) — His [video on the 5 Levels of Vibe Coding](https://www.youtube.com/watch?v=bDcgHzCBgmQ) made this research accessible and inspired turning Joycraft into a tool anyone can use
124
- - **[Simon Willison](https://simonwillison.net/)** ([@simonw](https://github.com/simonw)) — [Analysis of the Software Factory](https://simonwillison.net/2026/Feb/7/software-factory/) that helped contextualize StrongDM's approach for the broader community
527
+ - **[Nate B Jones](https://x.com/natebjones)** — His [video on the 5 Levels of Vibe Coding](https://www.youtube.com/watch?v=bDcgHzCBgmQ) made this research accessible and inspired turning Joycraft into a tool anyone can use
528
+ - **[Simon Willison](https://x.com/simonw)** — [Analysis of the Software Factory](https://simonwillison.net/2026/Feb/7/software-factory/) that helped contextualize StrongDM's approach for the broader community
125
529
  - **[Anthropic](https://www.anthropic.com/)** — Claude Code's skills, hooks, and CLAUDE.md system that makes tool-native AI development possible, and the [harness patterns for long-running agents](https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents)
126
530
 
531
+ ## Contributing
532
+
533
+ Contributions are welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for the full guide.
534
+
535
+ The short version:
536
+
537
+ 1. Fork, branch from `main`
538
+ 2. `pnpm install && pnpm test --run` to verify your setup
539
+ 3. Write tests first, then implement
540
+ 4. `pnpm test --run && pnpm typecheck && pnpm build`
541
+ 5. Open a PR — one approval required
542
+
543
+ Look for [`good first issue`](https://github.com/maksutovic/joycraft/labels/good%20first%20issue) labels if you're new. Areas we'd especially love help with: stack detection for new languages, skill improvements, documentation, and Codex integration.
544
+
127
545
  ## License
128
546
 
129
- MIT
547
+ MIT — see [LICENSE](LICENSE) for details.
@@ -0,0 +1,36 @@
+ #!/usr/bin/env node
+
+ // src/version.ts
+ import { readFileSync, writeFileSync, existsSync } from "fs";
+ import { join } from "path";
+ import { createHash } from "crypto";
+ var VERSION_FILE = ".joycraft-version";
+ function hashContent(content) {
+   return createHash("sha256").update(content).digest("hex");
+ }
+ function readVersion(dir) {
+   const filePath = join(dir, VERSION_FILE);
+   if (!existsSync(filePath)) return null;
+   try {
+     const raw = readFileSync(filePath, "utf-8");
+     const parsed = JSON.parse(raw);
+     if (typeof parsed.version === "string" && typeof parsed.files === "object") {
+       return parsed;
+     }
+     return null;
+   } catch {
+     return null;
+   }
+ }
+ function writeVersion(dir, version, files) {
+   const filePath = join(dir, VERSION_FILE);
+   const data = { version, files };
+   writeFileSync(filePath, JSON.stringify(data, null, 2) + "\n", "utf-8");
+ }
+
+ export {
+   hashContent,
+   readVersion,
+   writeVersion
+ };
+ //# sourceMappingURL=chunk-2S7KP7FU.js.map
@@ -0,0 +1 @@
1
+ {"version":3,"sources":["../src/version.ts"],"sourcesContent":["import { readFileSync, writeFileSync, existsSync } from 'node:fs';\nimport { join } from 'node:path';\nimport { createHash } from 'node:crypto';\n\nconst VERSION_FILE = '.joycraft-version';\n\nexport interface VersionInfo {\n version: string;\n files: Record<string, string>;\n}\n\nexport function hashContent(content: string): string {\n return createHash('sha256').update(content).digest('hex');\n}\n\nexport function readVersion(dir: string): VersionInfo | null {\n const filePath = join(dir, VERSION_FILE);\n if (!existsSync(filePath)) return null;\n try {\n const raw = readFileSync(filePath, 'utf-8');\n const parsed = JSON.parse(raw);\n if (typeof parsed.version === 'string' && typeof parsed.files === 'object') {\n return parsed as VersionInfo;\n }\n return null;\n } catch {\n return null;\n }\n}\n\nexport function writeVersion(dir: string, version: string, files: Record<string, string>): void {\n const filePath = join(dir, VERSION_FILE);\n const data: VersionInfo = { version, files };\n writeFileSync(filePath, JSON.stringify(data, null, 2) + '\\n', 'utf-8');\n}\n"],"mappings":";;;AAAA,SAAS,cAAc,eAAe,kBAAkB;AACxD,SAAS,YAAY;AACrB,SAAS,kBAAkB;AAE3B,IAAM,eAAe;AAOd,SAAS,YAAY,SAAyB;AACnD,SAAO,WAAW,QAAQ,EAAE,OAAO,OAAO,EAAE,OAAO,KAAK;AAC1D;AAEO,SAAS,YAAY,KAAiC;AAC3D,QAAM,WAAW,KAAK,KAAK,YAAY;AACvC,MAAI,CAAC,WAAW,QAAQ,EAAG,QAAO;AAClC,MAAI;AACF,UAAM,MAAM,aAAa,UAAU,OAAO;AAC1C,UAAM,SAAS,KAAK,MAAM,GAAG;AAC7B,QAAI,OAAO,OAAO,YAAY,YAAY,OAAO,OAAO,UAAU,UAAU;AAC1E,aAAO;AAAA,IACT;AACA,WAAO;AAAA,EACT,QAAQ;AACN,WAAO;AAAA,EACT;AACF;AAEO,SAAS,aAAa,KAAa,SAAiB,OAAqC;AAC9F,QAAM,WAAW,KAAK,KAAK,YAAY;AACvC,QAAM,OAAoB,EAAE,SAAS,MAAM;AAC3C,gBAAc,UAAU,KAAK,UAAU,MAAM,MAAM,CAAC,IAAI,MAAM,OAAO;AACvE;","names":[]}