joycraft 0.5.8 → 0.5.10

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -70,6 +70,8 @@ Joycraft auto-detects your tech stack and creates:
  - `/joycraft-tune` Assess your harness, apply upgrades, see your path to Level 5
  - `/joycraft-new-feature` Interview → Feature Brief → Atomic Specs
  - `/joycraft-interview` Lightweight brainstorm. Yap about ideas, get a structured summary
+ - `/joycraft-research` Objective codebase research — subagent sees only questions, never the brief
+ - `/joycraft-design` Design discussion checkpoint — ~200-line artifact for human review before decompose
  - `/joycraft-decompose` Break a brief into small, testable specs
  - `/joycraft-add-fact` Capture project knowledge on the fly -- routes to the right context doc
  - `/joycraft-lockdown` Generate constrained execution boundaries (read-only tests, deny patterns)
@@ -96,6 +98,8 @@ After init, open Claude Code and use the installed skills:
  /joycraft-tune # Assess your harness, apply upgrades, see path to Level 5
  /joycraft-interview # Brainstorm freely, yap about ideas, get a structured summary
  /joycraft-new-feature # Interview → Feature Brief → Atomic Specs → ready to execute
+ /joycraft-research # Objective codebase research (subagent never sees the brief)
+ /joycraft-design # Design discussion — patterns, decisions, open questions for review
  /joycraft-decompose # Break any feature into small, independent specs
  /joycraft-add-fact # Capture a fact mid-session -- auto-routes to the right context doc
  /joycraft-lockdown # Generate constrained execution boundaries for autonomous sessions
@@ -107,7 +111,8 @@ After init, open Claude Code and use the installed skills:
  The core loop:
 
  ```
- Interview → Spec → Fresh Session → Execute → Discoveries → Ship
+ Interview → Brief → Research → Design → Decompose → Specs → Implement → Verify
+                     (optional)  (optional)
  ```
 
  ## The Interview: Why It Matters
@@ -135,17 +140,47 @@ flowchart LR
  A["/joycraft-interview<br/>(brainstorm)"] --> B["Draft Brief<br/>docs/briefs/"]
  B --> C["/joycraft-new-feature<br/>(structured interview)"]
  C --> D["Feature Brief<br/>(what & why)"]
- D --> E["/joycraft-decompose"]
+ D --> R["/joycraft-research<br/>(objective facts)"]
+ R --> DS["/joycraft-design<br/>(human checkpoint)"]
+ DS --> E["/joycraft-decompose"]
  E --> F["Atomic Specs<br/>docs/specs/"]
  F --> G["Fresh Session<br/>Execute each spec"]
  G --> H["/joycraft-session-end<br/>(discoveries + commit)"]
 
  style A fill:#e8f4fd,stroke:#369
  style C fill:#e8f4fd,stroke:#369
+ style R fill:#f0e8fd,stroke:#639
+ style DS fill:#f0e8fd,stroke:#639
  style F fill:#cfc,stroke:#393
  style G fill:#ffd,stroke:#993
  ```
 
+ ## Research Isolation & Design Checkpoints
+
+ These two skills were inspired by [Dex Horthy](https://x.com/dexhorthy)'s work at [HumanLayer](https://humanlayer.dev) on what went wrong with the Research-Plan-Implement (RPI) methodology and the evolution to [CRISPY](https://humanlayer.dev/blog) (Context, Research, Investigate, Structure, Plan, Yield).
+
+ ### The problem with "research the codebase"
+
+ When you tell an agent "research how endpoints work — I'm going to build a new one," the research comes back contaminated with opinions about how to build the new endpoint. Good research is pure facts. The moment the researcher knows the intent, it editorializes.
+
+ **`/joycraft-research`** fixes this with context isolation: one context window generates research questions from the brief, then a separate subagent researches the codebase using *only those questions* — it never sees the brief. The output is a research document in `docs/research/` that contains file paths, function signatures, data flows, and patterns. No recommendations. No opinions. Just compressed truth.
+
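+ As a rough illustration (every path, schema, and finding below is hypothetical), the handoff might look like this: the subagent receives only the generated questions and returns facts with file references.
+
+ ```markdown
+ <!-- docs/research/payments-endpoint.md (hypothetical example) -->
+ ## Questions (generated from the brief; the brief itself is withheld)
+ 1. Where are existing API endpoints defined and registered?
+ 2. How is request validation handled today?
+
+ ## Findings (facts only, no recommendations)
+ - Endpoints live in `src/api/*.ts`; each exports a `register(router)` function
+ - Validation: zod schemas in `src/schemas/`, applied as middleware
+ - Data flow: handler → service layer (`src/services/`) → repository (`src/db/`)
+ ```
+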
+ This is the same "query planning" technique Dex describes: separate the intent from the investigation, like a database separates query planning from execution.
+
+ ### The 200-line checkpoint
+
+ HumanLayer found that engineers were reviewing 1,000-line plans — which is the same effort as reviewing 1,000 lines of code, and the plans often diverged from what was actually implemented. The leverage was terrible.
+
+ **`/joycraft-design`** produces a ~200-line design discussion artifact instead. It contains five sections: current state, desired end state, patterns to follow, resolved design decisions, and open questions with concrete options. This is where you catch "that's not how we do atomic SQL updates — go find the pattern in `/services/billing`" *before* 2,000 lines of code follow the wrong pattern.
+
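+ A sketch of how such an artifact might be laid out (the section names come from the list above; the project details are invented for illustration):
+
+ ```markdown
+ # Design: usage-based billing
+
+ ## Current state
+ Invoices are generated monthly by a billing job; no per-event metering exists.
+
+ ## Desired end state
+ Usage events are metered and rolled into the existing invoice pipeline.
+
+ ## Patterns to follow
+ Atomic SQL updates, as done in `/services/billing`.
+
+ ## Resolved design decisions
+ Meter in the API layer, not in a background job.
+
+ ## Open questions
+ 1. Store raw events or only aggregates? (Option A: raw table plus rollups. Option B: aggregates only.)
+ ```
+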
+ [Matt Pocock](https://x.com/mattpocockuk) calls this the "design concept" — the shared understanding between you and the agent that exists separately from the code. Joycraft materializes it as a markdown document and forces a human checkpoint: the skill will not proceed to decomposition until you've reviewed and approved.
+
+ Both steps are optional. You can skip straight from brief to decompose for simple features. But for anything complex enough to get wrong, the 15 minutes of human review on a 200-line document saves hours of rework on code that followed the wrong patterns.
+
+ ### Instruction budget discipline
+
+ Every Joycraft skill now includes an `instructions` count in its frontmatter. No skill exceeds 40 instructions. This is based on [research](https://arxiv.org/pdf/2507.11538) showing that frontier LLMs can reliably follow ~150-200 instructions — but your skill shares that budget with the system prompt, CLAUDE.md, tools, and MCP servers. A skill with 85 instructions (as Joycraft's `/joycraft-tune` had before this refactor) is competing for attention with everything else in the context window. Smaller, focused skills with clear handoffs produce more reliable results than monolithic mega-prompts.
+
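+ For example, a skill's frontmatter might declare its budget like this (the exact fields and values shown are illustrative, not a fixed schema):
+
+ ```markdown
+ ---
+ name: joycraft-research
+ description: Objective codebase research, isolated from the feature brief
+ instructions: 32
+ ---
+ ```
+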
  ### What a good spec looks like
 
  An atomic spec produced by `/joycraft-decompose` has:
@@ -294,7 +329,7 @@ Joycraft's approach is synthesized from several sources:
 
  **Spec-driven development.** Instead of prompting AI in conversation, you write structured specifications. Feature Briefs capture the *what* and *why*, then Atomic Specs break work into small, testable, independently executable units. Each spec is self-contained: an agent can pick it up without reading anything else. This follows [Addy Osmani's](https://addyosmani.com/blog/good-spec/) principles for AI-consumable specs and [GitHub's Spec Kit](https://github.blog/ai-and-ml/generative-ai/spec-driven-development-with-ai-get-started-with-a-new-open-source-toolkit/) 4-phase process (Specify → Plan → Tasks → Implement).
 
- **Context isolation.** [Boris Cherny](https://www.lennysnewsletter.com/p/head-of-claude-code-what-happens) (Head of Claude Code at Anthropic) recommends: interview in one session, write the spec, then execute in a *fresh session* with clean context. Joycraft's `/joycraft-new-feature` → `/joycraft-decompose` → execute workflow enforces this naturally. The interview session captures intent; the execution session has only the spec.
+ **Context isolation.** [Boris Cherny](https://www.lennysnewsletter.com/p/head-of-claude-code-what-happens) (Head of Claude Code at Anthropic) recommends: interview in one session, write the spec, then execute in a *fresh session* with clean context. [Dex Horthy](https://humanlayer.dev) at HumanLayer took this further: even *research* should be isolated from intent — the researching agent should never see the ticket, only objective questions derived from it. Joycraft's `/joycraft-research` → `/joycraft-design` → `/joycraft-decompose` pipeline enforces this at every stage: the interview captures intent, research gathers objective facts, design aligns human and agent on approach, and the execution session has only the spec.
 
  **Behavioral boundaries.** CLAUDE.md isn't a suggestion box, it's a contract. Joycraft installs a three-tier boundary framework (Always / Ask First / Never) that prevents the most common AI development failures: overwriting user files, skipping tests, pushing without approval, hardcoding secrets. This is [Addy Osmani's](https://addyosmani.com/blog/good-spec/) "boundaries" principle made concrete.
 
@@ -314,6 +349,7 @@ Joycraft synthesizes ideas and patterns from people doing extraordinary work in
 
  - **[Dan Shapiro](https://x.com/danshapiro)** for the [5 Levels of Vibe Coding](https://www.danshapiro.com/blog/2026/01/the-five-levels-from-spicy-autocomplete-to-the-software-factory/) framework that Joycraft's assessment and level system is built on
  - **[StrongDM](https://www.strongdm.com/)** / **[Justin McCarthy](https://x.com/BuiltByJustin)** for the [Software Factory](https://factory.strongdm.ai/): spec-driven autonomous development, NLSpec, external holdout scenarios, and the proof that 3 engineers can outproduce 30
+ - **[Dex Horthy](https://x.com/dexhorthy)** / **[HumanLayer](https://humanlayer.dev)** for the [RPI to CRISPY evolution](https://humanlayer.dev/blog): research isolation (hide the ticket from the researcher), the instruction budget concept (~150-200 instructions max), design discussions as high-leverage checkpoints, vertical-over-horizontal planning, and the conviction that "if your tool requires magic words, go fix the tool"
  - **[Boris Cherny](https://x.com/bcherny)**, Head of Claude Code at Anthropic, for the interview → spec → fresh session → execute pattern and the insight that [context isolation produces better results](https://www.lennysnewsletter.com/p/head-of-claude-code-what-happens)
  - **[Addy Osmani](https://x.com/addyosmani)** for [What makes a good spec for AI](https://addyosmani.com/blog/good-spec/): commands, testing, project structure, code style, git workflow, and boundaries
  - **[METR](https://metr.org/)** for the [randomized controlled trial](https://metr.org/) that proved unstructured AI use makes experienced developers slower, validating the need for harnesses