joycraft 0.4.0 → 0.5.2
- package/LICENSE +21 -0
- package/README.md +432 -14
- package/dist/chunk-2S7KP7FU.js +36 -0
- package/dist/chunk-2S7KP7FU.js.map +1 -0
- package/dist/chunk-HHW4Q2UC.js +2297 -0
- package/dist/chunk-HHW4Q2UC.js.map +1 -0
- package/dist/cli.js +6 -2
- package/dist/cli.js.map +1 -1
- package/dist/{init-XHJDJIZW.js → init-DHVJEWGX.js} +7 -4
- package/dist/init-DHVJEWGX.js.map +1 -0
- package/dist/init-autofix-OVHXYVLB.js +118 -0
- package/dist/init-autofix-OVHXYVLB.js.map +1 -0
- package/dist/{upgrade-NOHZWQMO.js → upgrade-RN2D5RAT.js} +60 -15
- package/dist/upgrade-RN2D5RAT.js.map +1 -0
- package/package.json +11 -2
- package/dist/chunk-FNGCEYUY.js +0 -1280
- package/dist/chunk-FNGCEYUY.js.map +0 -1
- package/dist/init-XHJDJIZW.js.map +0 -1
- package/dist/upgrade-NOHZWQMO.js.map +0 -1
package/LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Maximilian Maksutovic
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
package/README.md
CHANGED
|
@@ -1,18 +1,68 @@
|
|
|
1
1
|
# Joycraft
|
|
2
2
|
|
|
3
|
+
<p align="center">
|
|
4
|
+
<img src="docs/joycraft-banner.png" alt="Joycraft — the craft of AI development" width="700" />
|
|
5
|
+
</p>
|
|
6
|
+
|
|
3
7
|
> The craft of AI development — with joy, not darkness.
|
|
4
8
|
|
|
5
|
-
|
|
9
|
+
## What is Joycraft?
|
|
10
|
+
|
|
11
|
+
Joycraft is a CLI tool and [Claude Code](https://docs.anthropic.com/en/docs/claude-code) plugin that upgrades your AI development workflow. It installs skills, behavioral boundaries, templates, and documentation structure into any project — taking you from unstructured prompting to autonomous spec-driven development.
|
|
12
|
+
|
|
13
|
+
If you've been using Claude Code (or any AI coding tool) and your workflow looks like this:
|
|
14
|
+
|
|
15
|
+
> Prompt → wait → read output → "no, not that" → re-prompt → fix hallucination → re-prompt → manually fix → "ok close enough" → commit
|
|
16
|
+
|
|
17
|
+
...then Joycraft is for you.
|
|
18
|
+
|
|
19
|
+
This project started as a personal exploration by [@maksutovic](https://github.com/maksutovic). I was working across multiple client projects, spending more time wrestling with prompts than building software. I knew Claude Code was capable of extraordinary work, but my *process* was holding it back. I was vibe coding — and vibe coding doesn't scale.
|
|
20
|
+
|
|
21
|
+
The spark was [Nate B Jones' video on the 5 Levels of Vibe Coding](https://www.youtube.com/watch?v=bDcgHzCBgmQ). It mapped out a progression I hadn't seen articulated before — from "spicy autocomplete" to fully autonomous development — and lit my brain up to the potential of what Claude Code could do with the right harness around it. Joycraft is the result of that exploration: a tool that encodes the patterns, boundaries, and workflows that make AI-assisted development actually deterministic.
|
|
22
|
+
|
|
23
|
+
### The core idea
|
|
24
|
+
|
|
25
|
+
Joycraft is simple. It's a set of **skills** (slash commands for Claude Code) and **instructions** (CLAUDE.md boundaries) that guide you and your agent through a structured development process:
|
|
26
|
+
|
|
27
|
+
- **Levels 1-4:** Skills like `/joycraft-tune`, `/joycraft-new-feature`, and `/joycraft-interview` replace unstructured prompting with spec-driven development. You interview, you write specs, the agent executes. No back-and-forth.
|
|
28
|
+
- **Level 5:** The `/joycraft-implement-level5` skill sets up the autonomous loop — where specs go in and validated software comes out, with holdout scenario testing that prevents the agent from gaming its own tests.
|
|
29
|
+
|
|
30
|
+
StrongDM calls their fully autonomous Level 5 loop a "Dark Factory." It's a cool name, but the world has so much darkness in it right now. I wanted a name that celebrates what I believe tools like this can provide: joy and craftsmanship. Hence "Joycraft."
|
|
31
|
+
|
|
32
|
+
### What are the levels?
|
|
33
|
+
|
|
34
|
+
[Dan Shapiro's 5 Levels of Vibe Coding](https://www.danshapiro.com/blog/2026/01/the-five-levels-from-spicy-autocomplete-to-the-software-factory/) provides the framework:
|
|
35
|
+
|
|
36
|
+
| Level | Name | What it looks like | Joycraft's role |
|
|
37
|
+
|-------|------|--------------------|-----------------|
|
|
38
|
+
| 1 | Autocomplete | Tab-complete suggestions | — |
|
|
39
|
+
| 2 | Junior Developer | Prompt → iterate → fix → repeat | `/joycraft-tune` assesses where you are |
|
|
40
|
+
| 3 | Developer as Manager | Your life is reviewing diffs | Behavioral boundaries in CLAUDE.md |
|
|
41
|
+
| 4 | Developer as PM | You write specs, agent writes code | `/joycraft-new-feature` + `/joycraft-decompose` |
|
|
42
|
+
| 5 | Software Factory | Specs in, validated software out | `/joycraft-implement-level5` sets up the autonomous loop |
|
|
6
43
|
|
|
7
|
-
|
|
44
|
+
Most developers plateau at Level 2. Joycraft's job is to move you up.
|
|
45
|
+
|
|
46
|
+
### Platform support
|
|
47
|
+
|
|
48
|
+
Joycraft is currently focused on making the Claude Code experience state-of-the-art. Better [Codex](https://openai.com/codex) support is coming — `AGENTS.md` generation is already included, and deeper integration is on the roadmap.
|
|
8
49
|
|
|
9
50
|
## Quick Start
|
|
10
51
|
|
|
52
|
+
First, install the CLI:
|
|
53
|
+
|
|
54
|
+
```bash
|
|
55
|
+
npm install -g joycraft
|
|
56
|
+
```
|
|
57
|
+
|
|
58
|
+
Then navigate to your project's root directory and initialize:
|
|
59
|
+
|
|
11
60
|
```bash
|
|
61
|
+
cd /path/to/your/project
|
|
12
62
|
npx joycraft init
|
|
13
63
|
```
|
|
14
64
|
|
|
15
|
-
|
|
65
|
+
Joycraft auto-detects your tech stack and creates:
|
|
16
66
|
|
|
17
67
|
- **CLAUDE.md** with behavioral boundaries (Always / Ask First / Never) and correct build/test/lint commands
|
|
18
68
|
- **AGENTS.md** for Codex compatibility
|
|
@@ -22,8 +72,11 @@ That's it. Joycraft auto-detects your tech stack and creates:
|
|
|
22
72
|
- `/joycraft-interview` — Lightweight brainstorm — yap about ideas, get a structured summary
|
|
23
73
|
- `/joycraft-decompose` — Break a brief into small, testable specs
|
|
24
74
|
- `/joycraft-session-end` — Capture discoveries, verify, commit
|
|
75
|
+
- `/joycraft-implement-level5` — Set up Level 5: autofix loop, holdout scenarios, scenario evolution
|
|
25
76
|
- **docs/** structure — `briefs/`, `specs/`, `discoveries/`, `contracts/`, `decisions/`
|
|
26
|
-
- **Templates** — Atomic spec, feature brief, implementation plan, boundary framework
|
|
77
|
+
- **Templates** — Atomic spec, feature brief, implementation plan, boundary framework, and workflow templates for scenario generation and autofix loops
|
|
78
|
+
|
|
79
|
+
Once you reach Level 4, you can set up the autonomous loop with `/joycraft-implement-level5`. See [Level 5: The Autonomous Loop](#level-5-the-autonomous-loop) below.
|
|
27
80
|
|
|
28
81
|
### Supported Stacks
|
|
29
82
|
|
|
@@ -41,6 +94,7 @@ After init, open Claude Code and use the installed skills:
|
|
|
41
94
|
/joycraft-new-feature # Interview → Feature Brief → Atomic Specs → ready to execute
|
|
42
95
|
/joycraft-decompose # Break any feature into small, independent specs
|
|
43
96
|
/joycraft-session-end # Wrap up — discoveries, verification, commit
|
|
97
|
+
/joycraft-implement-level5 # Set up Level 5 — autofix, holdout scenarios, evolution
|
|
44
98
|
```
|
|
45
99
|
|
|
46
100
|
The core loop:
|
|
@@ -49,6 +103,54 @@ The core loop:
|
|
|
49
103
|
Interview → Spec → Fresh Session → Execute → Discoveries → Ship
|
|
50
104
|
```
|
|
51
105
|
|
|
106
|
+
## The Interview: Why It Matters
|
|
107
|
+
|
|
108
|
+
The single biggest upgrade Joycraft makes to your workflow is replacing the prompt-iterate-fix cycle with a **structured interview**.
|
|
109
|
+
|
|
110
|
+
Here's the problem with how most of us use AI coding tools: we open a session and start typing. "Build me a notification system." The agent starts writing code immediately. It makes assumptions about your data model, your UI framework, your error handling strategy, your deployment target. You catch some of these mid-flight, correct them, the agent adjusts, introduces new assumptions. Three hours later you have something that *kind of* works but is built on a foundation of guesses.
|
|
111
|
+
|
|
112
|
+
Joycraft flips this. Before the agent writes a single line of code, you have a conversation about *what you're building and why*.
|
|
113
|
+
|
|
114
|
+
### Two interview modes
|
|
115
|
+
|
|
116
|
+
**`/joycraft-interview`** — The lightweight brainstorm. You yap about an idea, the agent asks clarifying questions, and you get a structured summary saved to `docs/briefs/`. Good for early-stage thinking when you're not ready to commit to building anything yet. No pressure, no specs — just organized thought.
|
|
117
|
+
|
|
118
|
+
**`/joycraft-new-feature`** — The full workflow. This is the structured interview that produces a **Feature Brief** (the what and why) and then decomposes it into **Atomic Specs** (small, testable, independently executable units of work). Each spec is self-contained — an agent in a fresh session can pick it up and execute without reading anything else.
|
|
119
|
+
|
|
120
|
+
### Why this works
|
|
121
|
+
|
|
122
|
+
The insight comes from [Boris Cherny](https://www.lennysnewsletter.com/p/head-of-claude-code-what-happens) (Head of Claude Code at Anthropic): interview in one session, write the spec, then execute in a *fresh session* with clean context. The interview captures your intent. The spec is the contract. The execution session has only the spec — no baggage from the conversation, no accumulated misunderstandings, no context window full of abandoned approaches.
|
|
123
|
+
|
|
124
|
+
This is what separates Level 2 (back-and-forth prompting) from Level 4 (spec-driven development). You stop being a typist correcting an agent's guesses and start being a PM defining what needs to be built.
|
|
125
|
+
|
|
126
|
+
```mermaid
|
|
127
|
+
flowchart LR
|
|
128
|
+
A["/joycraft-interview<br/>(brainstorm)"] --> B["Draft Brief<br/>docs/briefs/"]
|
|
129
|
+
B --> C["/joycraft-new-feature<br/>(structured interview)"]
|
|
130
|
+
C --> D["Feature Brief<br/>(what & why)"]
|
|
131
|
+
D --> E["/joycraft-decompose"]
|
|
132
|
+
E --> F["Atomic Specs<br/>docs/specs/"]
|
|
133
|
+
F --> G["Fresh Session<br/>Execute each spec"]
|
|
134
|
+
G --> H["/joycraft-session-end<br/>(discoveries + commit)"]
|
|
135
|
+
|
|
136
|
+
style A fill:#e8f4fd,stroke:#369
|
|
137
|
+
style C fill:#e8f4fd,stroke:#369
|
|
138
|
+
style F fill:#cfc,stroke:#393
|
|
139
|
+
style G fill:#ffd,stroke:#993
|
|
140
|
+
```
|
|
141
|
+
|
|
142
|
+
### What a good spec looks like
|
|
143
|
+
|
|
144
|
+
An atomic spec produced by `/joycraft-decompose` has:
|
|
145
|
+
|
|
146
|
+
- **What** — One paragraph. A developer with zero context understands the change in 15 seconds.
|
|
147
|
+
- **Why** — One sentence. What breaks or is missing without this?
|
|
148
|
+
- **Acceptance criteria** — Checkboxes. Testable. No ambiguity.
|
|
149
|
+
- **Affected files** — Exact paths, what changes in each.
|
|
150
|
+
- **Edge cases** — Table of scenarios and expected behavior.
|
|
151
|
+
|
|
152
|
+
The agent doesn't guess. It reads the spec and executes. If something's unclear, the spec is wrong — fix the spec, not the conversation.
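
To make the checklist above concrete, here is a minimal TypeScript sketch of an atomic spec as a data structure, with a completeness check. The `AtomicSpec` type, its field names, and `missingSections` are hypothetical illustrations. Joycraft's specs are Markdown files, and this is not its internal format.

```typescript
// Hypothetical shape of an atomic spec; field names are illustrative only.
interface AtomicSpec {
  what: string; // one paragraph
  why: string; // one sentence
  acceptanceCriteria: string[]; // testable checkboxes
  affectedFiles: { path: string; change: string }[];
  edgeCases: { scenario: string; expected: string }[];
}

// Returns the names of empty sections so an incomplete spec can be
// fixed before an agent tries to execute it.
function missingSections(spec: AtomicSpec): string[] {
  const missing: string[] = [];
  if (spec.what.trim() === "") missing.push("what");
  if (spec.why.trim() === "") missing.push("why");
  if (spec.acceptanceCriteria.length === 0) missing.push("acceptanceCriteria");
  if (spec.affectedFiles.length === 0) missing.push("affectedFiles");
  if (spec.edgeCases.length === 0) missing.push("edgeCases");
  return missing;
}

// Example draft (all values are made up for illustration).
const draft: AtomicSpec = {
  what: "Add an email notification when an export finishes.",
  why: "",
  acceptanceCriteria: ["Email is sent once per completed export"],
  affectedFiles: [{ path: "src/export.ts", change: "emit completion event" }],
  edgeCases: [],
};

console.log(missingSections(draft));
```

A check like this reflects the rule stated above: when something is unclear, the fix belongs in the spec, so gaps are worth catching before execution starts.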
|
|
153
|
+
|
|
52
154
|
## Upgrade
|
|
53
155
|
|
|
54
156
|
When Joycraft templates and skills evolve, update without losing your customizations:
|
|
@@ -59,9 +161,311 @@ npx joycraft upgrade
|
|
|
59
161
|
|
|
60
162
|
Joycraft tracks what it installed vs. what you've customized. Unmodified files update automatically. Customized files show a diff and ask before overwriting. Use `--yes` for CI.
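
One way to picture the upgrade decision is a three-way comparison of content hashes. The sketch below is an assumption about how such tracking could work, not Joycraft's actual implementation: the function name, the three hash inputs, and the interpretation of `--yes` as "overwrite customized files without asking" are all hypothetical.

```typescript
type UpgradeAction = "skip" | "update" | "ask";

// installedHash: hash of the file as it was originally installed (assumed to be recorded at install time)
// currentHash:   hash of the file on disk right now
// incomingHash:  hash of the new version of the template or skill
function decideUpgradeAction(
  installedHash: string,
  currentHash: string,
  incomingHash: string,
  assumeYes: boolean = false,
): UpgradeAction {
  if (currentHash === incomingHash) return "skip"; // already up to date
  if (currentHash === installedHash) return "update"; // untouched by the user
  // The file was customized: show a diff and ask, unless running non-interactively.
  return assumeYes ? "update" : "ask";
}

console.log(decideUpgradeAction("aaa", "aaa", "bbb")); // unmodified file
console.log(decideUpgradeAction("aaa", "ccc", "bbb")); // customized file
```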
|
|
61
163
|
|
|
62
|
-
|
|
164
|
+
> **Note:** If you're upgrading from an early version, deprecated skill directories (e.g., `/joy`, `/joysmith`, `/tune`) are automatically removed during upgrade.
|
|
165
|
+
|
|
166
|
+
## Level 5: The Autonomous Loop
|
|
167
|
+
|
|
168
|
+
> **A note on complexity:** Level 5 has several moving parts. Depending on the complexity of your stack (software vs. hardware, monorepo vs. single app, etc.), setup can take a good amount of prompting and trial and error. I've done my best to make this as painless as possible, but this is not a one-shot, done-in-5-minutes kind of prompt. Small projects with simple stacks will be easy; anything more complex will take some iteration, so plan ahead. Full step-by-step guides and a video are coming soon.
|
|
63
169
|
|
|
64
|
-
|
|
170
|
+
Level 5 is where specs go in and validated software comes out. Joycraft implements this as four interlocking GitHub Actions workflows, a separate scenarios repository, and two independent AI agents that can never see each other's work.
|
|
171
|
+
|
|
172
|
+
Run `/joycraft-implement-level5` in Claude Code for a guided setup, or use the CLI directly:
|
|
173
|
+
|
|
174
|
+
```bash
|
|
175
|
+
npx joycraft init-autofix --scenarios-repo my-project-scenarios --app-id 3180156
|
|
176
|
+
```
|
|
177
|
+
|
|
178
|
+
### Architecture Overview
|
|
179
|
+
|
|
180
|
+
Level 5 has four moving parts. Each is a GitHub Actions workflow triggered by a CI result (`workflow_run`), a push, or a `repository_dispatch` event sent between the two repos. There are no custom servers, no webhooks, and no external services.
|
|
181
|
+
|
|
182
|
+
```mermaid
|
|
183
|
+
graph TB
|
|
184
|
+
subgraph "Main Repository"
|
|
185
|
+
A[Push specs to docs/specs/] -->|push to main| B[Spec Dispatch Workflow]
|
|
186
|
+
C[PR opened] --> D[CI runs]
|
|
187
|
+
D -->|CI fails| E[Autofix Workflow]
|
|
188
|
+
D -->|CI passes| F[Scenarios Dispatch Workflow]
|
|
189
|
+
G[Scenarios Re-run Workflow]
|
|
190
|
+
end
|
|
191
|
+
|
|
192
|
+
subgraph "Scenarios Repository (private)"
|
|
193
|
+
H[Scenario Generation Workflow]
|
|
194
|
+
I[Scenario Run Workflow]
|
|
195
|
+
J[Holdout Tests]
|
|
196
|
+
K[Specs Mirror]
|
|
197
|
+
end
|
|
198
|
+
|
|
199
|
+
B -->|repository_dispatch: spec-pushed| H
|
|
200
|
+
H -->|reads specs, writes tests| J
|
|
201
|
+
H -->|repository_dispatch: scenarios-updated| G
|
|
202
|
+
G -->|repository_dispatch: run-scenarios| I
|
|
203
|
+
F -->|repository_dispatch: run-scenarios| I
|
|
204
|
+
I -->|posts PASS/FAIL comment| C
|
|
205
|
+
E -->|Claude fixes code, pushes| D
|
|
206
|
+
|
|
207
|
+
style J fill:#f9f,stroke:#333
|
|
208
|
+
style K fill:#bbf,stroke:#333
|
|
209
|
+
```
|
|
210
|
+
|
|
211
|
+
### The Four Workflows
|
|
212
|
+
|
|
213
|
+
#### 1. Autofix Workflow (`autofix.yml`)
|
|
214
|
+
|
|
215
|
+
Triggered when CI **fails** on a PR. Claude Code CLI reads the failure logs and attempts a fix.
|
|
216
|
+
|
|
217
|
+
```mermaid
|
|
218
|
+
sequenceDiagram
|
|
219
|
+
participant CI as CI Workflow
|
|
220
|
+
participant AF as Autofix Workflow
|
|
221
|
+
participant Claude as Claude Code CLI
|
|
222
|
+
participant PR as Pull Request
|
|
223
|
+
|
|
224
|
+
CI->>AF: workflow_run (conclusion: failure)
|
|
225
|
+
AF->>AF: Generate GitHub App token
|
|
226
|
+
AF->>AF: Checkout PR branch
|
|
227
|
+
AF->>AF: Count previous autofix attempts
|
|
228
|
+
|
|
229
|
+
alt attempts >= 3
|
|
230
|
+
AF->>PR: Comment: "Human review needed"
|
|
231
|
+
else attempts < 3
|
|
232
|
+
AF->>AF: Fetch CI failure logs
|
|
233
|
+
AF->>AF: Strip ANSI codes
|
|
234
|
+
AF->>Claude: claude -p "Fix this CI failure..." <br/> --dangerously-skip-permissions --max-turns 20
|
|
235
|
+
Claude->>Claude: Read logs, edit code, run tests
|
|
236
|
+
Claude->>AF: Exit (changes committed locally)
|
|
237
|
+
AF->>PR: Push fix (commit prefix: "autofix:")
|
|
238
|
+
AF->>PR: Comment: summary of fix
|
|
239
|
+
Note over CI,PR: CI re-runs automatically on push
|
|
240
|
+
end
|
|
241
|
+
```
|
|
242
|
+
|
|
243
|
+
**Key details:**
|
|
244
|
+
- Uses a GitHub App identity for pushes — avoids GitHub's anti-recursion protection
|
|
245
|
+
- Concurrency group per PR — only one autofix runs at a time per PR
|
|
246
|
+
- Max 3 iterations — posts "human review needed" if it can't fix it
|
|
247
|
+
- No `--model` flag — Claude CLI handles model selection
|
|
248
|
+
- Strips ANSI escape codes from logs so Claude gets clean text
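
The iteration guard and log cleanup described above can be sketched in a few lines of TypeScript. This is an illustration under stated assumptions, not the contents of `autofix.yml`: counting attempts by the `autofix:` commit prefix is a guess at how previous attempts are detected, and the function names are hypothetical.

```typescript
const MAX_AUTOFIX_ATTEMPTS = 3;

// Remove ANSI CSI escape sequences (colors, cursor movement) so the
// failure log reaches the model as plain text.
function stripAnsi(text: string): string {
  return text.replace(/\x1b\[[0-?]*[ -\/]*[@-~]/g, "");
}

// Assumption: earlier attempts are recognizable by the "autofix:" commit prefix.
function countAutofixAttempts(commitSubjects: string[]): number {
  return commitSubjects.filter((subject) => subject.startsWith("autofix:")).length;
}

// Decide whether to try another fix or hand the PR to a human.
function nextStep(commitSubjects: string[]): "attempt-fix" | "human-review" {
  return countAutofixAttempts(commitSubjects) >= MAX_AUTOFIX_ATTEMPTS
    ? "human-review"
    : "attempt-fix";
}

console.log(stripAnsi("\x1b[31mFAIL\x1b[0m src/app.test.ts"));
console.log(nextStep(["feat: add export", "autofix: fix lint error"]));
```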
|
|
249
|
+
|
|
250
|
+
#### 2. Scenarios Dispatch Workflow (`scenarios-dispatch.yml`)
|
|
251
|
+
|
|
252
|
+
Triggered when CI **passes** on a PR. Fires a `repository_dispatch` to the scenarios repo to run holdout tests against the PR branch.
|
|
253
|
+
|
|
254
|
+
```mermaid
|
|
255
|
+
sequenceDiagram
|
|
256
|
+
participant CI as CI Workflow
|
|
257
|
+
participant SD as Scenarios Dispatch
|
|
258
|
+
participant SR as Scenarios Repo
|
|
259
|
+
|
|
260
|
+
CI->>SD: workflow_run (conclusion: success, PR)
|
|
261
|
+
SD->>SD: Generate GitHub App token
|
|
262
|
+
SD->>SR: repository_dispatch: run-scenarios<br/>payload: {pr_number, branch, sha, repo}
|
|
263
|
+
```
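
For readers unfamiliar with `repository_dispatch`, the sketch below builds the request that such a dispatch amounts to. The endpoint (`POST /repos/{owner}/{repo}/dispatches`) and the `event_type` / `client_payload` body fields come from GitHub's REST API; the payload fields match the diagram above. The type names, the helper function, and the example repository names are hypothetical.

```typescript
// Payload fields as listed in the diagram above.
interface RunScenariosPayload {
  pr_number: number;
  branch: string;
  sha: string;
  repo: string;
}

interface DispatchRequest {
  method: "POST";
  url: string;
  body: { event_type: string; client_payload: RunScenariosPayload };
}

// Build (but do not send) the dispatch request for the scenarios repo.
// scenariosRepo is in "owner/name" form.
function buildRunScenariosDispatch(
  scenariosRepo: string,
  payload: RunScenariosPayload,
): DispatchRequest {
  return {
    method: "POST",
    url: `https://api.github.com/repos/${scenariosRepo}/dispatches`,
    body: { event_type: "run-scenarios", client_payload: payload },
  };
}

// Example values are made up for illustration.
const request = buildRunScenariosDispatch("acme/my-project-scenarios", {
  pr_number: 42,
  branch: "feature/export-email",
  sha: "0123abc",
  repo: "acme/my-project",
});
console.log(request.url);
console.log(JSON.stringify(request.body));
```

Sending the request would also need an `Authorization` header carrying the GitHub App token generated in the previous step, which is left out here to keep the sketch free of network calls.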
|
|
264
|
+
|
|
265
|
+
#### 3. Spec Dispatch Workflow (`spec-dispatch.yml`)
|
|
266
|
+
|
|
267
|
+
Triggered when spec files are pushed to `main`. Sends the spec content to the scenarios repo so the scenario agent can write tests.
|
|
268
|
+
|
|
269
|
+
```mermaid
|
|
270
|
+
sequenceDiagram
|
|
271
|
+
participant Dev as Developer
|
|
272
|
+
participant Main as Main Repo (push to main)
|
|
273
|
+
participant SPD as Spec Dispatch Workflow
|
|
274
|
+
participant SR as Scenarios Repo
|
|
275
|
+
|
|
276
|
+
Dev->>Main: Push specs to docs/specs/
|
|
277
|
+
Main->>SPD: push event (docs/specs/** changed)
|
|
278
|
+
SPD->>SPD: git diff --diff-filter=AM (added/modified only)
|
|
279
|
+
|
|
280
|
+
loop For each changed spec
|
|
281
|
+
SPD->>SR: repository_dispatch: spec-pushed<br/>payload: {spec_filename, spec_content, commit_sha, branch, repo}
|
|
282
|
+
end
|
|
283
|
+
|
|
284
|
+
Note over SPD: Deleted specs are ignored —<br/>existing scenario tests remain
|
|
285
|
+
```
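
The "added/modified only" rule can also be expressed in code. The sketch below parses the output of `git diff --name-status` (one `STATUS<TAB>path` entry per line) and keeps only added or modified files under `docs/specs/`, which has the same effect as `--diff-filter=AM` for these statuses. The function name is hypothetical, and the real workflow may do this filtering entirely in shell.

```typescript
// Keep added (A) and modified (M) spec files; ignore deletions, renames,
// and anything outside docs/specs/.
function changedSpecs(nameStatusOutput: string): string[] {
  return nameStatusOutput
    .split("\n")
    .map((line) => line.split("\t"))
    .filter(
      ([status, path]) =>
        (status === "A" || status === "M") &&
        path !== undefined &&
        path.startsWith("docs/specs/"),
    )
    .map(([, path]) => path);
}

// Example input in `git diff --name-status` format (paths are made up).
const diffOutput = [
  "A\tdocs/specs/export-email.md",
  "M\tdocs/specs/login-flow.md",
  "D\tdocs/specs/old-feature.md",
  "M\tsrc/index.ts",
  "",
].join("\n");

console.log(changedSpecs(diffOutput));
```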
|
|
286
|
+
|
|
287
|
+
#### 4. Scenarios Re-run Workflow (`scenarios-rerun.yml`)
|
|
288
|
+
|
|
289
|
+
Triggered when the scenarios repo updates its tests. Re-dispatches all open PRs to the scenarios repo so they get tested with the latest holdout tests.
|
|
290
|
+
|
|
291
|
+
```mermaid
|
|
292
|
+
sequenceDiagram
|
|
293
|
+
participant SR as Scenarios Repo
|
|
294
|
+
participant RR as Re-run Workflow
|
|
295
|
+
participant SRun as Scenarios Run
|
|
296
|
+
|
|
297
|
+
SR->>RR: repository_dispatch: scenarios-updated
|
|
298
|
+
RR->>RR: List open PRs via GitHub API
|
|
299
|
+
|
|
300
|
+
alt No open PRs
|
|
301
|
+
RR->>RR: Exit (no-op)
|
|
302
|
+
else Has open PRs
|
|
303
|
+
loop For each open PR
|
|
304
|
+
RR->>SRun: repository_dispatch: run-scenarios<br/>payload: {pr_number, branch, sha, repo}
|
|
305
|
+
end
|
|
306
|
+
end
|
|
307
|
+
```
|
|
308
|
+
|
|
309
|
+
**Why this exists:** There's a race condition. The implementation agent might open a PR before the scenario agent finishes writing new tests. The re-run workflow handles this — when new tests land, all open PRs get re-tested. Worst case: a PR merges before the re-run, and the new tests protect the very next PR. You're never more than one cycle behind.
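
The fan-out step of the re-run workflow is a simple mapping from open PRs to dispatch payloads. In this sketch, the `OpenPr` shape is a subset of the fields GitHub's "list pull requests" REST response returns (`number`, `head.ref`, `head.sha`); the function name and example values are hypothetical.

```typescript
// Subset of a pull request object from GitHub's REST API.
interface OpenPr {
  number: number;
  head: { ref: string; sha: string };
}

interface RerunPayload {
  pr_number: number;
  branch: string;
  sha: string;
  repo: string;
}

// One run-scenarios payload per open PR. An empty list yields an empty
// array, which corresponds to the no-op branch in the diagram above.
function rerunPayloads(openPrs: OpenPr[], repo: string): RerunPayload[] {
  return openPrs.map((pr) => ({
    pr_number: pr.number,
    branch: pr.head.ref,
    sha: pr.head.sha,
    repo,
  }));
}

const payloads = rerunPayloads(
  [
    { number: 7, head: { ref: "feature/a", sha: "aaa111" } },
    { number: 9, head: { ref: "feature/b", sha: "bbb222" } },
  ],
  "acme/my-project",
);
console.log(payloads.length);
```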
|
|
310
|
+
|
|
311
|
+
### The Holdout Wall
|
|
312
|
+
|
|
313
|
+
The core safety mechanism. Two agents, two repos, one shared interface (specs):
|
|
314
|
+
|
|
315
|
+
```mermaid
|
|
316
|
+
graph LR
|
|
317
|
+
subgraph "Implementation Agent (main repo)"
|
|
318
|
+
IA_sees["Can see:<br/>Source code<br/>Internal tests<br/>Specs"]
|
|
319
|
+
IA_cant["Cannot see:<br/>Scenario tests<br/>Scenario repo"]
|
|
320
|
+
end
|
|
321
|
+
|
|
322
|
+
subgraph "Specs (shared interface)"
|
|
323
|
+
Specs["docs/specs/*.md<br/>Describes WHAT should happen<br/>Never describes HOW it's tested"]
|
|
324
|
+
end
|
|
325
|
+
|
|
326
|
+
subgraph "Scenario Agent (scenarios repo)"
|
|
327
|
+
SA_sees["Can see:<br/>Specs (via dispatch)<br/>Scenario tests<br/>Specs mirror"]
|
|
328
|
+
SA_cant["Cannot see:<br/>Source code<br/>Internal tests"]
|
|
329
|
+
end
|
|
330
|
+
|
|
331
|
+
IA_sees --> Specs
|
|
332
|
+
Specs --> SA_sees
|
|
333
|
+
|
|
334
|
+
style IA_cant fill:#fcc,stroke:#933
|
|
335
|
+
style SA_cant fill:#fcc,stroke:#933
|
|
336
|
+
style Specs fill:#cfc,stroke:#393
|
|
337
|
+
```
|
|
338
|
+
|
|
339
|
+
This is the same principle as a holdout set in machine learning. If the implementation agent could see the scenario tests, it would optimize to pass them specifically — not to build correct software. By keeping the wall intact, scenario tests catch real behavioral regressions, not test-gaming.
|
|
340
|
+
|
|
341
|
+
### Scenario Evolution
|
|
342
|
+
|
|
343
|
+
Scenarios aren't static. When you push new specs, the scenario agent automatically triages them and writes new holdout tests.
|
|
344
|
+
|
|
345
|
+
```mermaid
|
|
346
|
+
flowchart TD
|
|
347
|
+
A[New spec pushed to main] --> B[Spec Dispatch sends to scenarios repo]
|
|
348
|
+
B --> C[Scenario Agent reads spec]
|
|
349
|
+
C --> D{Triage: is this user-facing?}
|
|
350
|
+
|
|
351
|
+
D -->|Internal refactor, CI, dev tooling| E[Skip — commit note: 'No scenario changes needed']
|
|
352
|
+
D -->|New user-facing behavior| F[Write new scenario test file]
|
|
353
|
+
D -->|Modified existing behavior| G[Update existing scenario tests]
|
|
354
|
+
|
|
355
|
+
F --> H[Commit to scenarios main]
|
|
356
|
+
G --> H
|
|
357
|
+
H --> I[Dispatch scenarios-updated to main repo]
|
|
358
|
+
I --> J[Re-run workflow tests open PRs with new scenarios]
|
|
359
|
+
|
|
360
|
+
style D fill:#ffd,stroke:#993
|
|
361
|
+
style E fill:#ddd,stroke:#999
|
|
362
|
+
style F fill:#cfc,stroke:#393
|
|
363
|
+
style G fill:#cfc,stroke:#393
|
|
364
|
+
```
|
|
365
|
+
|
|
366
|
+
**The scenario agent's prompt instructs it to:**
|
|
367
|
+
- Act as a QA engineer, never a developer
|
|
368
|
+
- Write only behavioral tests (invoke the built artifact, assert on output)
|
|
369
|
+
- Never import source code or reference internal implementation
|
|
370
|
+
- Use a triage decision tree: SKIP / NEW / UPDATE
|
|
371
|
+
- Err on the side of writing a test if the spec is ambiguous
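
The triage decision tree can be written out as a small function. In practice the scenario agent makes this judgment by reading the spec, so the sketch below only encodes the branching described above. The `SpecAssessment` shape and the `"unclear"` value are hypothetical.

```typescript
type Triage = "SKIP" | "NEW" | "UPDATE";

// Hypothetical summary of what the agent concluded after reading a spec.
interface SpecAssessment {
  userFacing: boolean | "unclear";
  changesExistingBehavior: boolean;
}

function triage(assessment: SpecAssessment): Triage {
  // Internal refactors, CI changes, and dev tooling need no scenario changes.
  if (assessment.userFacing === false) return "SKIP";
  // User-facing or ambiguous: err on the side of writing a test.
  return assessment.changesExistingBehavior ? "UPDATE" : "NEW";
}

console.log(triage({ userFacing: false, changesExistingBehavior: false }));
console.log(triage({ userFacing: "unclear", changesExistingBehavior: false }));
```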
|
|
372
|
+
|
|
373
|
+
**The specs mirror:** The scenarios repo maintains a `specs/` folder that mirrors every spec it receives. This gives the scenario agent historical context ("what features already exist?") without access to the main repo's codebase.
|
|
374
|
+
|
|
375
|
+
### The Complete Loop
|
|
376
|
+
|
|
377
|
+
Here's the full lifecycle from spec to shipped, validated code:
|
|
378
|
+
|
|
379
|
+
```mermaid
|
|
380
|
+
sequenceDiagram
|
|
381
|
+
participant Human as Human (writes specs)
|
|
382
|
+
participant Main as Main Repo
|
|
383
|
+
participant ScAgent as Scenario Agent
|
|
384
|
+
participant ScRepo as Scenarios Repo
|
|
385
|
+
participant ImplAgent as Implementation Agent
|
|
386
|
+
participant Autofix as Autofix Workflow
|
|
387
|
+
|
|
388
|
+
Human->>Main: Push spec to docs/specs/
|
|
389
|
+
Main->>ScAgent: spec-pushed dispatch
|
|
390
|
+
|
|
391
|
+
par Scenario Generation
|
|
392
|
+
ScAgent->>ScAgent: Triage spec
|
|
393
|
+
ScAgent->>ScRepo: Write/update holdout tests
|
|
394
|
+
ScRepo->>Main: scenarios-updated dispatch
|
|
395
|
+
and Implementation
|
|
396
|
+
Human->>ImplAgent: Execute spec (fresh session)
|
|
397
|
+
ImplAgent->>Main: Open PR
|
|
398
|
+
end
|
|
399
|
+
|
|
400
|
+
Main->>Main: CI runs on PR
|
|
401
|
+
|
|
402
|
+
alt CI fails
|
|
403
|
+
Main->>Autofix: Autofix workflow triggers
|
|
404
|
+
Autofix->>Main: Push fix, CI re-runs
|
|
405
|
+
end
|
|
406
|
+
|
|
407
|
+
alt CI passes
|
|
408
|
+
Main->>ScRepo: run-scenarios dispatch
|
|
409
|
+
ScRepo->>ScRepo: Clone PR branch, build, run holdout tests
|
|
410
|
+
ScRepo->>Main: Post PASS/FAIL comment on PR
|
|
411
|
+
end
|
|
412
|
+
|
|
413
|
+
alt Scenarios PASS
|
|
414
|
+
Note over Human,Main: Ready for human review and merge
|
|
415
|
+
else Scenarios FAIL
|
|
416
|
+
Main->>Autofix: Autofix attempts fix
|
|
417
|
+
Note over Autofix,ScRepo: Loop continues (max 3 iterations)
|
|
418
|
+
end
|
|
419
|
+
```
|
|
420
|
+
|
|
421
|
+
### What Gets Installed
|
|
422
|
+
|
|
423
|
+
| Where | File | Purpose |
|
|
424
|
+
|-------|------|---------|
|
|
425
|
+
| Main repo | `.github/workflows/autofix.yml` | CI failure → Claude fix → push |
|
|
426
|
+
| Main repo | `.github/workflows/scenarios-dispatch.yml` | CI pass → trigger holdout tests |
|
|
427
|
+
| Main repo | `.github/workflows/spec-dispatch.yml` | Spec push → trigger scenario generation |
|
|
428
|
+
| Main repo | `.github/workflows/scenarios-rerun.yml` | New tests → re-test open PRs |
|
|
429
|
+
| Scenarios repo | `workflows/run.yml` | Clone PR, build, run tests, post results |
|
|
430
|
+
| Scenarios repo | `workflows/generate.yml` | Receive spec, run scenario agent |
|
|
431
|
+
| Scenarios repo | `prompts/scenario-agent.md` | Scenario agent prompt template |
|
|
432
|
+
| Scenarios repo | `example-scenario.test.ts` | Example holdout test |
|
|
433
|
+
| Scenarios repo | `package.json` | Minimal vitest setup |
|
|
434
|
+
| Scenarios repo | `README.md` | Explains holdout pattern to contributors |
|
|
435
|
+
|
|
436
|
+
### Prerequisites
|
|
437
|
+
|
|
438
|
+
- **GitHub App** — Provides a separate identity for autofix pushes (avoids GitHub's anti-recursion protection). You can install the shared [Joycraft Autofix](https://github.com/apps/joycraft-autofix) app (App ID: `3180156`) or create your own.
|
|
439
|
+
- **Secrets** — `JOYCRAFT_APP_PRIVATE_KEY` and `ANTHROPIC_API_KEY` on both the main and scenarios repos.
|
|
440
|
+
- **Scenarios repo** — A private repository where holdout tests live. Created during setup.
|
|
441
|
+
|
|
442
|
+
### Cost
|
|
443
|
+
|
|
444
|
+
Validated in the Pipit trial (~3 minutes, one iteration, zero human intervention). With Claude Sonnet + `--max-turns 20` + max 3 iterations per PR:
|
|
445
|
+
- **Autofix:** ~$0.50 per attempt, worst case ~$1.50 per PR (3 iterations)
|
|
446
|
+
- **Scenario generation:** ~$0.20 per spec dispatch
|
|
447
|
+
- **Solo dev with ~10 PRs/month:** ~$5-10/month for the full loop
|
|
448
|
+
|
|
449
|
+
The iteration guard and max-turns cap prevent runaway costs.
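
As a rough worked example of the arithmetic, the sketch below combines the per-attempt and per-spec figures quoted above. It assumes one spec per PR and works in integer cents to avoid floating-point rounding; the function and constant names are hypothetical, and the figures are the approximate ones from this section, not measured prices.

```typescript
const AUTOFIX_CENTS_PER_ATTEMPT = 50; // ~$0.50 per attempt
const SCENARIO_GEN_CENTS_PER_SPEC = 20; // ~$0.20 per spec dispatch
const MAX_ATTEMPTS_PER_PR = 3; // iteration guard

// Estimated monthly cost in cents. Attempts per PR are capped by the guard.
function monthlyCostCents(prs: number, attemptsPerPr: number, specs: number): number {
  const attempts = Math.min(attemptsPerPr, MAX_ATTEMPTS_PER_PR);
  return prs * attempts * AUTOFIX_CENTS_PER_ATTEMPT + specs * SCENARIO_GEN_CENTS_PER_SPEC;
}

// 10 PRs, one autofix attempt each, 10 specs: 500 + 200 cents.
console.log(monthlyCostCents(10, 1, 10) / 100); // 7
// Same month, but every PR hits the 3-attempt cap: 1500 + 200 cents.
console.log(monthlyCostCents(10, 3, 10) / 100); // 17
```

The first case lands inside the ~$5-10/month range quoted above. The second is the ceiling the iteration guard enforces for the same workload under these assumptions.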
|
|
450
|
+
|
|
451
|
+
## Tuning: Risk Interview & Git Autonomy
|
|
452
|
+
|
|
453
|
+
When `/joycraft-tune` runs for the first time, it does two things:
|
|
454
|
+
|
|
455
|
+
### Risk interview
|
|
456
|
+
|
|
457
|
+
3-5 targeted questions about what's dangerous in your project — production databases, live APIs, secrets, files that should be off-limits. From your answers, Joycraft generates:
|
|
458
|
+
|
|
459
|
+
- **NEVER rules** for CLAUDE.md (e.g., "NEVER connect to production DB")
|
|
460
|
+
- **Deny patterns** for `.claude/settings.json` (blocks dangerous bash commands)
|
|
461
|
+
- **`docs/context/production-map.md`** — what's real vs. safe to touch
|
|
462
|
+
- **`docs/context/dangerous-assumptions.md`** — "Agent might assume X, but actually Y"
|
|
463
|
+
|
|
464
|
+
This takes 2-3 minutes and dramatically reduces the chance of your agent doing something catastrophic.
|
|
465
|
+
|
|
466
|
+
### Git autonomy
|
|
467
|
+
|
|
468
|
+
One question: **how autonomous should git be?**
|
|
65
469
|
|
|
66
470
|
- **Cautious** (default) — commits freely, asks before pushing or opening PRs. Good for learning the workflow.
|
|
67
471
|
- **Autonomous** — commits, pushes to feature branches, and opens PRs without asking. Good for spec-driven development where you want full send.
|
|
@@ -107,7 +511,7 @@ Joycraft's approach is synthesized from several sources:
|
|
|
107
511
|
|
|
108
512
|
**Knowledge capture over session notes.** Most session notes are never re-read. Joycraft's `/joycraft-session-end` skill captures only *discoveries* — assumptions that were wrong, APIs that behaved unexpectedly, decisions made during implementation that aren't in the spec. If nothing surprising happened, you capture nothing. This keeps the signal-to-noise ratio high.
|
|
109
513
|
|
|
110
|
-
**External holdout scenarios.** [StrongDM's Software Factory](https://factory.strongdm.ai/) proved that AI agents will [actively game visible test suites](https://palisaderesearch.org/blog/specification-gaming). Their solution: scenarios that live *outside* the codebase, invisible to the agent during development. Like a holdout set in ML, this prevents overfitting. Joycraft
|
|
514
|
+
**External holdout scenarios.** [StrongDM's Software Factory](https://factory.strongdm.ai/) proved that AI agents will [actively game visible test suites](https://palisaderesearch.org/blog/specification-gaming). Their solution: scenarios that live *outside* the codebase, invisible to the agent during development. Like a holdout set in ML, this prevents overfitting. Joycraft now implements this directly: `init-autofix` sets up the holdout wall, the scenario agent, and the GitHub App integration, rather than just providing templates for it.
|
|
111
515
|
|
|
112
516
|
**The 5-level framework.** [Dan Shapiro's levels](https://www.danshapiro.com/blog/2026/01/the-five-levels-from-spicy-autocomplete-to-the-software-factory/) give you a map. Level 2 (Junior Developer) is where most teams plateau. Level 3 (Developer as Manager) means your life is diffs. Level 4 (Developer as PM) means you write specs, not code. Level 5 (Dark Factory) means specs in, software out. Joycraft's `/joycraft-tune` assessment tells you where you are and what to do next.
|
|
113
517
|
|
|
@@ -115,15 +519,29 @@ Joycraft's approach is synthesized from several sources:

 Joycraft synthesizes ideas and patterns from people doing extraordinary work in AI-assisted software development:

-- **[Dan Shapiro](https://
-- **[StrongDM](https://www.strongdm.com/)** / **Justin McCarthy
-- **[Boris Cherny](https://
-- **[Addy Osmani](https://
+- **[Dan Shapiro](https://x.com/danshapiro)** — The [5 Levels of Vibe Coding](https://www.danshapiro.com/blog/2026/01/the-five-levels-from-spicy-autocomplete-to-the-software-factory/) framework that Joycraft's assessment and level system is built on
+- **[StrongDM](https://www.strongdm.com/)** / **[Justin McCarthy](https://x.com/BuiltByJustin)** — The [Software Factory](https://factory.strongdm.ai/): spec-driven autonomous development, NLSpec, external holdout scenarios, and the proof that 3 engineers can outproduce 30
+- **[Boris Cherny](https://x.com/bcherny)** — Head of Claude Code at Anthropic. The interview → spec → fresh session → execute pattern, and the insight that [context isolation produces better results](https://www.lennysnewsletter.com/p/head-of-claude-code-what-happens)
+- **[Addy Osmani](https://x.com/addyosmani)** — [What makes a good spec for AI](https://addyosmani.com/blog/good-spec/): commands, testing, project structure, code style, git workflow, and boundaries
 - **[METR](https://metr.org/)** — The [randomized control trial](https://metr.org/) that proved unstructured AI use makes experienced developers slower, validating the need for harnesses
-- **[Nate B Jones](https://
-- **[Simon Willison](https://
+- **[Nate B Jones](https://x.com/natebjones)** — His [video on the 5 Levels of Vibe Coding](https://www.youtube.com/watch?v=bDcgHzCBgmQ) made this research accessible and inspired turning Joycraft into a tool anyone can use
+- **[Simon Willison](https://x.com/simonw)** — [Analysis of the Software Factory](https://simonwillison.net/2026/Feb/7/software-factory/) that helped contextualize StrongDM's approach for the broader community
 - **[Anthropic](https://www.anthropic.com/)** — Claude Code's skills, hooks, and CLAUDE.md system that makes tool-native AI development possible, and the [harness patterns for long-running agents](https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents)

+## Contributing
+
+Contributions are welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for the full guide.
+
+The short version:
+
+1. Fork, branch from `main`
+2. `pnpm install && pnpm test --run` to verify your setup
+3. Write tests first, then implement
+4. `pnpm test --run && pnpm typecheck && pnpm build`
+5. Open a PR — one approval required
+
+Look for [`good first issue`](https://github.com/maksutovic/joycraft/labels/good%20first%20issue) labels if you're new. Areas we'd especially love help with: stack detection for new languages, skill improvements, documentation, and Codex integration.
+
 ## License

-MIT
+MIT — see [LICENSE](LICENSE) for details.
@@ -0,0 +1,36 @@
+#!/usr/bin/env node
+
+// src/version.ts
+import { readFileSync, writeFileSync, existsSync } from "fs";
+import { join } from "path";
+import { createHash } from "crypto";
+var VERSION_FILE = ".joycraft-version";
+function hashContent(content) {
+  return createHash("sha256").update(content).digest("hex");
+}
+function readVersion(dir) {
+  const filePath = join(dir, VERSION_FILE);
+  if (!existsSync(filePath)) return null;
+  try {
+    const raw = readFileSync(filePath, "utf-8");
+    const parsed = JSON.parse(raw);
+    if (typeof parsed.version === "string" && typeof parsed.files === "object") {
+      return parsed;
+    }
+    return null;
+  } catch {
+    return null;
+  }
+}
+function writeVersion(dir, version, files) {
+  const filePath = join(dir, VERSION_FILE);
+  const data = { version, files };
+  writeFileSync(filePath, JSON.stringify(data, null, 2) + "\n", "utf-8");
+}
+
+export {
+  hashContent,
+  readVersion,
+  writeVersion
+};
+//# sourceMappingURL=chunk-2S7KP7FU.js.map
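The three helpers in this chunk suggest a hash-based record of installed files. The diff does not show how `upgrade` consumes them, so the following is only a minimal sketch of one plausible use: flagging tracked files whose content no longer matches the recorded SHA-256. `findModifiedFiles`, the file names, and the demo directory are hypothetical and not part of the package. The helpers are restated with CommonJS `require` (the chunk itself uses ES module imports) so the snippet runs as a standalone Node script.

```javascript
const { mkdtempSync, readFileSync, writeFileSync, existsSync } = require("node:fs");
const { join } = require("node:path");
const { tmpdir } = require("node:os");
const { createHash } = require("node:crypto");

const VERSION_FILE = ".joycraft-version";

// Restated from the chunk above so this sketch is self-contained.
function hashContent(content) {
  return createHash("sha256").update(content).digest("hex");
}

function readVersion(dir) {
  const filePath = join(dir, VERSION_FILE);
  if (!existsSync(filePath)) return null;
  try {
    const parsed = JSON.parse(readFileSync(filePath, "utf-8"));
    if (typeof parsed.version === "string" && typeof parsed.files === "object") {
      return parsed;
    }
    return null;
  } catch {
    return null;
  }
}

function writeVersion(dir, version, files) {
  const data = { version, files };
  writeFileSync(join(dir, VERSION_FILE), JSON.stringify(data, null, 2) + "\n", "utf-8");
}

// Hypothetical helper (an assumption, not in the package): return the tracked
// paths whose current content differs from the hash stored in the version file.
function findModifiedFiles(dir) {
  const info = readVersion(dir);
  if (info === null) return null; // missing or unparseable version file
  return Object.entries(info.files)
    .filter(([relPath, recordedHash]) => {
      const fullPath = join(dir, relPath);
      if (!existsSync(fullPath)) return true; // a deleted file counts as modified
      return hashContent(readFileSync(fullPath, "utf-8")) !== recordedHash;
    })
    .map(([relPath]) => relPath);
}

// Demo in a throwaway directory: record two files, then edit one of them.
const dir = mkdtempSync(join(tmpdir(), "joycraft-demo-"));
writeFileSync(join(dir, "a.md"), "original\n", "utf-8");
writeFileSync(join(dir, "b.md"), "original\n", "utf-8");
writeVersion(dir, "0.5.2", {
  "a.md": hashContent("original\n"),
  "b.md": hashContent("original\n"),
});
writeFileSync(join(dir, "b.md"), "edited by user\n", "utf-8");
console.log(findModifiedFiles(dir));
```

One caveat when building on `readVersion`: `typeof x === "object"` is also true for `null` and for arrays, so a version file containing `"files": null` would pass the check. A caller that iterates `files`, as the sketch does, may want an additional guard.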
@@ -0,0 +1 @@
{"version":3,"sources":["../src/version.ts"],"sourcesContent":["import { readFileSync, writeFileSync, existsSync } from 'node:fs';\nimport { join } from 'node:path';\nimport { createHash } from 'node:crypto';\n\nconst VERSION_FILE = '.joycraft-version';\n\nexport interface VersionInfo {\n version: string;\n files: Record<string, string>;\n}\n\nexport function hashContent(content: string): string {\n return createHash('sha256').update(content).digest('hex');\n}\n\nexport function readVersion(dir: string): VersionInfo | null {\n const filePath = join(dir, VERSION_FILE);\n if (!existsSync(filePath)) return null;\n try {\n const raw = readFileSync(filePath, 'utf-8');\n const parsed = JSON.parse(raw);\n if (typeof parsed.version === 'string' && typeof parsed.files === 'object') {\n return parsed as VersionInfo;\n }\n return null;\n } catch {\n return null;\n }\n}\n\nexport function writeVersion(dir: string, version: string, files: Record<string, string>): void {\n const filePath = join(dir, VERSION_FILE);\n const data: VersionInfo = { version, files };\n writeFileSync(filePath, JSON.stringify(data, null, 2) + '\\n', 'utf-8');\n}\n"],"mappings":";;;AAAA,SAAS,cAAc,eAAe,kBAAkB;AACxD,SAAS,YAAY;AACrB,SAAS,kBAAkB;AAE3B,IAAM,eAAe;AAOd,SAAS,YAAY,SAAyB;AACnD,SAAO,WAAW,QAAQ,EAAE,OAAO,OAAO,EAAE,OAAO,KAAK;AAC1D;AAEO,SAAS,YAAY,KAAiC;AAC3D,QAAM,WAAW,KAAK,KAAK,YAAY;AACvC,MAAI,CAAC,WAAW,QAAQ,EAAG,QAAO;AAClC,MAAI;AACF,UAAM,MAAM,aAAa,UAAU,OAAO;AAC1C,UAAM,SAAS,KAAK,MAAM,GAAG;AAC7B,QAAI,OAAO,OAAO,YAAY,YAAY,OAAO,OAAO,UAAU,UAAU;AAC1E,aAAO;AAAA,IACT;AACA,WAAO;AAAA,EACT,QAAQ;AACN,WAAO;AAAA,EACT;AACF;AAEO,SAAS,aAAa,KAAa,SAAiB,OAAqC;AAC9F,QAAM,WAAW,KAAK,KAAK,YAAY;AACvC,QAAM,OAAoB,EAAE,SAAS,MAAM;AAC3C,gBAAc,UAAU,KAAK,UAAU,MAAM,MAAM,CAAC,IAAI,MAAM,OAAO;AACvE;","names":[]}
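The source map above follows the Source Map v3 layout, where `sourcesContent[i]` holds the original text of `sources[i]`. As a minimal sketch, the original TypeScript can be recovered by pairing the two arrays. `extractSources` and the shortened `sample` map are hypothetical names and data used only for illustration.

```javascript
// Hypothetical helper: pair each path in `sources` with its embedded original
// text from `sourcesContent` (null when the map does not embed that source).
function extractSources(mapJson) {
  const map = JSON.parse(mapJson);
  if (map.version !== 3) {
    throw new Error(`unsupported source map version: ${map.version}`);
  }
  const contents = Array.isArray(map.sourcesContent) ? map.sourcesContent : [];
  return map.sources.map((path, i) => ({ path, content: contents[i] ?? null }));
}

// Shortened stand-in for the map shown in the diff (mappings omitted).
const sample = JSON.stringify({
  version: 3,
  sources: ["../src/version.ts"],
  sourcesContent: ["const VERSION_FILE = '.joycraft-version';\n"],
  mappings: "",
  names: [],
});

for (const { path, content } of extractSources(sample)) {
  console.log(path);
  console.log(content);
}
```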