joycraft 0.5.12 → 0.5.14
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +48 -244
- package/dist/{chunk-4RGMUQQZ.js → chunk-QU5VHXMV.js} +990 -1519
- package/dist/chunk-QU5VHXMV.js.map +1 -0
- package/dist/cli.js +3 -3
- package/dist/{init-HPU5RXOM.js → init-CNZYAFJB.js} +2 -2
- package/dist/{init-autofix-K3WRCZCJ.js → init-autofix-4KNP5RRV.js} +2 -2
- package/dist/{upgrade-HOIQM2TP.js → upgrade-KLUUI6RP.js} +2 -2
- package/package.json +1 -1
- package/dist/chunk-4RGMUQQZ.js.map +0 -1
- package/dist/{init-HPU5RXOM.js.map → init-CNZYAFJB.js.map} +0 -0
- package/dist/{init-autofix-K3WRCZCJ.js.map → init-autofix-4KNP5RRV.js.map} +0 -0
- package/dist/{upgrade-HOIQM2TP.js.map → upgrade-KLUUI6RP.js.map} +0 -0
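A diff like the one below can be reproduced locally. A minimal sketch, assuming npm >= 7 (the network-dependent commands are shown as comments); the `demo` package at the end is a hypothetical stand-in, used only to show why every path in the file list carries the `package/` prefix:

```shell
# Fetch the two published tarballs and let npm render the full diff
# (requires network access):
#   npm pack joycraft@0.5.12 joycraft@0.5.14
#   npm diff --diff=joycraft@0.5.12 --diff=joycraft@0.5.14

# npm tarballs unpack under a top-level package/ directory, which is why
# every entry in the file list above starts with "package/". Shown locally
# with a throwaway package:
mkdir -p demo/package/dist
echo '{"name":"demo","version":"0.0.1"}' > demo/package/package.json
tar -czf demo.tgz -C demo package
tar -tzf demo.tgz   # every listed entry begins with package/
```

The same `tar -tzf` listing against the real `joycraft-0.5.14.tgz` is a quick way to confirm the renamed `dist/` chunks in the file list.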
@@ -2,513 +2,390 @@

 // src/bundled-files.ts
 var SKILLS = {
-"joycraft-
-name: joycraft-
-description:
-instructions:
+"joycraft-add-fact.md": `---
+name: joycraft-add-fact
+description: Capture a project fact and route it to the correct context document -- production map, dangerous assumptions, decision log, institutional knowledge, or troubleshooting
+instructions: 38
 ---

-#
-
-You have a Feature Brief (or the user has described a feature). Your job is to decompose it into atomic specs that can be executed independently \u2014 one spec per session.
+# Add Fact

-
+The user has a fact to capture. Your job is to classify it, route it to the correct context document, append it in the right format, and optionally add a CLAUDE.md boundary rule.

-
+## Step 1: Get the Fact

-
+If the user already provided the fact (e.g., \`/joycraft-add-fact the staging DB resets every Sunday\`), use it directly.

-If
+If not, ask: "What fact do you want to capture?" -- then wait for their response.

-
+If the user provides multiple facts at once, process each one separately through all the steps below, then give a combined confirmation at the end.

-
+## Step 2: Classify the Fact

-
+Route the fact to one of these 5 context documents based on its content:

--
-
--
--
-- **Test infrastructure** (mocks, fixtures, helpers) \u2014 can be its own spec if substantial
-- **Configuration / environment** \u2014 separate from code changes
+### \`docs/context/production-map.md\`
+The fact is about **infrastructure, services, environments, URLs, endpoints, credentials, or what is safe/unsafe to touch**.
+- Signal words: "production", "staging", "endpoint", "URL", "database", "service", "deployed", "hosted", "credentials", "secret", "environment"
+- Examples: "The staging DB is at postgres://staging.example.com", "We use Vercel for the frontend and Railway for the API"

-
+### \`docs/context/dangerous-assumptions.md\`
+The fact is about **something an AI agent might get wrong -- a false assumption that leads to bad outcomes**.
+- Signal words: "assumes", "might think", "but actually", "looks like X but is Y", "not what it seems", "trap", "gotcha"
+- Examples: "The \`users\` table looks like a test table but it's production", "Deleting a workspace doesn't delete the billing subscription"

-
+### \`docs/context/decision-log.md\`
+The fact is about **an architectural or tooling choice and why it was made**.
+- Signal words: "decided", "chose", "because", "instead of", "we went with", "the reason we use", "trade-off"
+- Examples: "We chose SQLite over Postgres because this runs on embedded devices", "We use pnpm instead of npm for workspace support"

-
+### \`docs/context/institutional-knowledge.md\`
+The fact is about **team conventions, unwritten rules, organizational context, or who owns what**.
+- Signal words: "convention", "rule", "always", "never", "team", "process", "review", "approval", "owns", "responsible"
+- Examples: "The design team reviews all color changes", "We never deploy on Fridays", "PR titles must start with the ticket number"

-
-
+### \`docs/context/troubleshooting.md\`
+The fact is about **diagnostic knowledge -- when X happens, do Y (or don't do Z)**.
+- Signal words: "when", "fails", "error", "if you see", "stuck", "broken", "fix", "workaround", "before trying", "reboot", "restart", "reset"
+- Examples: "If Wi-Fi disconnects during flash, wait and retry -- don't switch networks", "When tests fail with ECONNREFUSED, check if Docker is running"

-
-- Each spec name is \`verb-object\` format (e.g., \`add-terminal-detection\`, \`extract-prompt-module\`)
-- Each description is ONE sentence \u2014 if you need two, the spec is too big
-- Dependencies reference other spec numbers \u2014 keep the dependency graph shallow
-- More than 2 dependencies on a single spec = it's too big, split further
-- Aim for 3-7 specs per feature. Fewer than 3 = probably not decomposed enough. More than 10 = the feature brief is too big
+### Ambiguous Facts

-
+If the fact fits multiple categories, pick the **best fit** based on the primary intent. You will mention the alternative in your confirmation message so the user can correct you.

-
-1. "Does this breakdown match how you think about this feature?"
-2. "Are there any specs that feel too big or too small?"
-3. "Should any of these run in parallel (separate worktrees)?"
+## Step 3: Ensure the Target Document Exists

-
+1. If \`docs/context/\` does not exist, create the directory.
+2. If the target document does not exist, create it from the template structure. Check \`docs/templates/\` for the matching template. If no template exists, use this minimal structure:

-
+For **production-map.md**:
+\`\`\`markdown
+# Production Map

-
+> What's real, what's staging, what's safe to touch.

-
+## Services

-
+| Service | Environment | URL/Endpoint | Impact if Corrupted |
+|---------|-------------|-------------|-------------------|
+\`\`\`

+For **dangerous-assumptions.md**:
 \`\`\`markdown
-#
+# Dangerous Assumptions

->
-> **Status:** Ready
-> **Date:** YYYY-MM-DD
-> **Estimated scope:** [1 session / N files / ~N lines]
+> Things the AI agent might assume that are wrong in this project.

-
+## Assumptions

-
-
+| Agent Might Assume | But Actually | Impact If Wrong |
+|-------------------|-------------|----------------|
+\`\`\`

-
-
+For **decision-log.md**:
+\`\`\`markdown
+# Decision Log

-
-- [ ] [Observable behavior]
-- [ ] Build passes
-- [ ] Tests pass
+> Why choices were made, not just what was chosen.

-##
+## Decisions

-
-
-
+| Date | Decision | Why | Alternatives Rejected | Revisit When |
+|------|----------|-----|----------------------|-------------|
+\`\`\`

-**
-
-
-3. Implement until all tests pass (green)
+For **institutional-knowledge.md**:
+\`\`\`markdown
+# Institutional Knowledge

-
+> Unwritten rules, team conventions, and organizational context.

-
-1. Run all tests \u2014 they must FAIL (if they pass, you're testing the wrong thing)
-2. Each test calls your actual function/endpoint \u2014 not a reimplementation or the underlying library
-3. Identify your smoke test \u2014 it must run in seconds, not minutes, so you get fast feedback on each change
+## Team Conventions

-
-
-- MUST NOT: [hard prohibition]
+- (none yet)
+\`\`\`

-
-
-
+For **troubleshooting.md**:
+\`\`\`markdown
+# Troubleshooting

-
-Strategy, data flow, key decisions. Name one rejected alternative.
+> What to do when things go wrong for non-code reasons.

-##
-
-
+## Common Failures
+
+| When This Happens | Do This | Don't Do This |
+|-------------------|---------|---------------|
 \`\`\`

-
+## Step 4: Read the Target Document

-
+Read the target document to understand its current structure. Note:
+- Which section to append to
+- Whether it uses tables or lists
+- The column format if it's a table

-## Step
+## Step 5: Append the Fact

-
-- **Independent specs** \u2014 "These can run in parallel worktrees"
-- **Sequential specs** \u2014 "Execute these in order: 1 -> 2 -> 4"
-- **Mixed** \u2014 "Start specs 1 and 3 in parallel. After 1 completes, start 2."
+Add the fact to the appropriate section of the target document. Match the existing format exactly:

-
+- **Table-based documents** (production-map, dangerous-assumptions, decision-log, troubleshooting): Add a new table row in the correct columns. Use today's date where a date column exists.
+- **List-based documents** (institutional-knowledge): Add a new list item (\`- \`) to the most appropriate section.

-
+Remove any italic example rows (rows where all cells start with \`_\`) before appending, so the document transitions from template to real content. Only remove examples from the specific table you are appending to.
+
+**Append only. Never modify or remove existing real content.**
+
+## Step 6: Evaluate CLAUDE.md Boundary Rule
+
+Decide whether the fact also warrants a rule in CLAUDE.md's behavioral boundaries:
+
+**Add a CLAUDE.md rule if the fact:**
+- Describes something that should ALWAYS or NEVER be done
+- Could cause real damage if violated (data loss, broken deployments, security issues)
+- Is a hard constraint that applies across all work, not just a one-time note
+
+**Do NOT add a CLAUDE.md rule if the fact is:**
+- Purely informational (e.g., "staging DB is at this URL")
+- A one-time decision that's already captured
+- A diagnostic tip rather than a prohibition
+
+If a rule is warranted, read CLAUDE.md, find the appropriate section (ALWAYS, ASK FIRST, or NEVER under Behavioral Boundaries), and append the rule. If no Behavioral Boundaries section exists, append one.
+
+## Step 7: Confirm
+
+Report what you did in this format:

-Tell the user:
 \`\`\`
-
-
-- [N] can run in parallel, [N] are sequential
-- Estimated total: [N] sessions
+Added to [document name]:
+[summary of what was added]

-
-
-
-- Each session should end with /joycraft-session-end to capture discoveries
+[If CLAUDE.md was also updated:]
+Added CLAUDE.md rule:
+[ALWAYS/ASK FIRST/NEVER]: [rule text]

-
+[If the fact was ambiguous:]
+Routed to [chosen doc] -- move to [alternative doc] if this is more about [alternative category description].
 \`\`\`
 `,
-"joycraft-
-name: joycraft-
-description:
-instructions:
+"joycraft-bugfix.md": `---
+name: joycraft-bugfix
+description: Structured bug fix workflow \u2014 triage, diagnose, discuss with user, write a focused spec, hand off for implementation
+instructions: 32
 ---

-#
+# Bug Fix Workflow

-You are
+You are fixing a bug. Follow this process in order. Do not skip steps.

-
+**Guard clause:** If this is clearly a new feature, redirect to \`/joycraft-new-feature\` and stop.

-
+---

-
-2. **Project should be at Level 4.** Check \`docs/joycraft-assessment.md\` if it exists. If the project hasn't been assessed yet, suggest running \`/joycraft-tune\` first. But don't block \u2014 the user may know they're ready.
-3. **Git repo with GitHub remote.** This setup requires GitHub Actions. Check for \`.git/\` and a GitHub remote.
+## Phase 1: Triage

-If
+Establish what's broken. Gather: symptom, steps to reproduce, expected vs actual behavior, when it started, relevant logs/errors. If an error message or stack trace is provided, read the referenced files immediately. Try to reproduce if steps are given.

-
+**Done when:** You can describe the symptom in one sentence.

-
+---

-
->
-> 1. **Scenario evolution** \u2014 A separate AI agent reads your specs and writes holdout tests in a private scenarios repo. These tests are invisible to your coding agent.
-> 2. **Autofix** \u2014 When CI fails on a PR, Claude Code automatically attempts a fix (up to 3 times).
-> 3. **Holdout validation** \u2014 When CI passes, your scenarios repo runs behavioral tests against the PR. Results post as PR comments.
->
-> The key insight: your coding agent never sees the scenario tests. This prevents it from gaming the test suite \u2014 like a validation set in machine learning.
+## Phase 2: Diagnose

-
+Find the root cause. Start from the error site and trace backward. Read source files \u2014 don't guess. Identify the specific line(s) and logic error. Check git blame if it's a recent regression.

-
+**Done when:** You can explain what's wrong, why, and where in 2-3 sentences.

-
+---

-
->
-> Default: \`{current-repo-name}-scenarios\`
+## Phase 3: Discuss

-
-
-
-
-
->
-> 1. Go to https://github.com/settings/apps/new
-> 2. Give it a name (e.g., "My Project Autofix")
-> 3. Uncheck "Webhook > Active" (not needed)
-> 4. Under **Repository permissions**, set:
-> - **Contents**: Read & Write
-> - **Pull requests**: Read & Write
-> - **Actions**: Read & Write
-> 5. Click **Create GitHub App**
-> 6. Note the **App ID** from the settings page
-> 7. Scroll to **Private keys** > click **Generate a private key** > save the \`.pem\` file
-> 8. Click **Install App** in the left sidebar > install it on your repo
->
-> What's your App ID?
-
-## Step 3: Run init-autofix
-
-Run the CLI command with the gathered configuration:
-
-\`\`\`bash
-npx joycraft init-autofix --scenarios-repo {name} --app-id {id}
-\`\`\`
-
-Review the output with the user. Confirm files were created.
-
-## Step 4: Walk Through Secret Configuration
-
-Guide the user step by step:
-
-### 4a: Add Secrets to Main Repo
-
-> You should already have the \`.pem\` file from when you created the app in Step 2.
-
-> Go to your repo's Settings > Secrets and variables > Actions, and add:
-> - \`JOYCRAFT_APP_PRIVATE_KEY\` \u2014 paste the contents of your \`.pem\` file
-> - \`ANTHROPIC_API_KEY\` \u2014 your Anthropic API key
-
-### 4b: Create the Scenarios Repo
-
-> Create the private scenarios repo:
-> \`\`\`bash
-> gh repo create {scenarios-repo-name} --private
-> \`\`\`
->
-> Then copy the scenario templates into it:
-> \`\`\`bash
-> cp -r docs/templates/scenarios/* ../{scenarios-repo-name}/
-> cd ../{scenarios-repo-name}
-> git add -A && git commit -m "init: scaffold scenarios repo from Joycraft"
-> git push
-> \`\`\`
-
-### 4c: Add Secrets to Scenarios Repo
-
-> The scenarios repo also needs the App private key:
-> - \`JOYCRAFT_APP_PRIVATE_KEY\` \u2014 same \`.pem\` file as the main repo
-> - \`ANTHROPIC_API_KEY\` \u2014 same key (needed for scenario generation)
-
-## Step 5: Verify Setup
-
-Help the user verify everything is wired correctly:
-
-1. **Check workflow files exist:** \`ls .github/workflows/autofix.yml .github/workflows/scenarios-dispatch.yml .github/workflows/spec-dispatch.yml .github/workflows/scenarios-rerun.yml\`
-2. **Check scenario templates were copied:** Verify the scenarios repo has \`example-scenario.test.ts\`, \`workflows/run.yml\`, \`workflows/generate.yml\`, \`prompts/scenario-agent.md\`
-3. **Check the App ID is correct** in the workflow files (not still a placeholder)
+Present findings to the user BEFORE writing any code or spec:
+1. **Symptom** \u2014 confirm it matches what they see
+2. **Root cause** \u2014 specific file(s) and line(s)
+3. **Proposed fix** \u2014 what changes, where
+4. **Risk** \u2014 side effects? scope?

-
+Ask: "Does this match? Comfortable with this approach?" If large/risky, suggest decomposing into multiple specs.

-
+**Done when:** User agrees with the diagnosis and fix direction.

-
->
-> This project uses holdout scenario tests in a separate private repo.
->
-> ### NEVER
-> - Access, read, or reference the scenarios repo
-> - Mention scenario test names or contents
-> - Modify the scenarios dispatch workflow to leak test information
->
-> The scenarios repo is deliberately invisible to you. This is the holdout guarantee.
+---

-##
+## Phase 4: Spec the Fix

-
+Write a bug fix spec to \`docs/specs/YYYY-MM-DD-bugfix-name.md\`. Create the \`docs/specs/\` directory if it doesn't exist.

-
->
-> 1. Write a simple spec in \`docs/specs/\` and push to main \u2014 this triggers scenario generation
-> 2. Create a PR with a small change \u2014 when CI passes, scenarios will run
-> 3. Watch for the scenario test results as a PR comment
->
-> Or deliberately break something in a PR to test the autofix loop.
+**Why:** Even bug fixes deserve a spec. It forces clarity on what "fixed" means, ensures test-first discipline, and creates a traceable record of the fix.

-
+Use this template:

-
+\`\`\`markdown
+# Fix [Bug Description] \u2014 Bug Fix Spec

-> **
->
->
->
->
-> | PR fails CI | Claude autofix attempts (up to 3x) |
-> | PR passes CI | Holdout scenarios run against PR |
-> | Scenarios update | Open PRs re-tested with latest scenarios |
->
-> Your scenarios repo: \`{name}\`
-> Your coding agent cannot see those tests. The holdout wall is intact.
+> **Parent Brief:** none (bug fix)
+> **Issue/Error:** [error message, issue link, or symptom description]
+> **Status:** Ready
+> **Date:** YYYY-MM-DD
+> **Estimated scope:** [1 session / N files / ~N lines]

-Update \`docs/joycraft-assessment.md\` if it exists \u2014 set the Level 5 score to reflect the new setup.
-`,
-"joycraft-interview.md": `---
-name: joycraft-interview
-description: Brainstorm freely about what you want to build \u2014 yap, explore ideas, and get a structured summary you can use later
-instructions: 18
 ---

-
+## Bug

-
+What is broken? Describe the symptom the user experiences.

-##
+## Root Cause

-
+What is wrong in the code and why? Name the specific file(s) and line(s).

-
-"What are you thinking about building? Just talk \u2014 I'll listen and ask questions as we go."
+## Fix

-
+What changes will fix this? Be specific \u2014 describe the code change, not just "fix the bug."

-
+## Acceptance Criteria

-
+- [ ] [The bug no longer occurs \u2014 describe the correct behavior]
+- [ ] [No regressions in related functionality]
+- [ ] Build passes
+- [ ] Tests pass

-
-- **What does "done" look like?** If this worked perfectly, what would a user see?
-- **What are the constraints?** Time, tech, team, budget \u2014 what boxes are we in?
-- **What's NOT in scope?** What's tempting but should be deferred?
-- **What are the edge cases?** What could go wrong? What's the weird input?
-- **What exists already?** Are we building on something or starting fresh?
+## Test Plan

-
+| Acceptance Criterion | Test | Type |
+|---------------------|------|------|
+| [Bug no longer occurs] | [Test that reproduces the bug, then verifies the fix] | [unit/integration/e2e] |
+| [No regressions] | [Existing tests still pass, or new regression test] | [unit/integration] |

-
-
+**Execution order:**
+1. Write a test that reproduces the bug \u2014 it should FAIL (red)
+2. Run the test to confirm it fails
+3. Apply the fix
+4. Run the test to confirm it passes (green)
+5. Run the full test suite to check for regressions

-
+**Smoke test:** [The bug reproduction test \u2014 fastest way to verify the fix works]

-
+**Before implementing, verify your test harness:**
+1. Run the reproduction test \u2014 it must FAIL (if it passes, you're not testing the actual bug)
+2. The test must exercise your actual code \u2014 not a reimplementation or mock
+3. Identify your smoke test \u2014 it must run in seconds, not minutes

-
+## Constraints

-
+- MUST: [any hard requirements for the fix]
+- MUST NOT: [any prohibitions \u2014 e.g., don't change the public API]

-
-# [Topic] \u2014 Draft Brief
+## Affected Files

-
-
-> **Origin:** /joycraft-interview session
+| Action | File | What Changes |
+|--------|------|-------------|

-
+## Edge Cases

-
-
+| Scenario | Expected Behavior |
+|----------|------------------|
+\`\`\`

-
-[What pain or gap this addresses]
+**For trivial bugs:** The spec will be short. That's fine \u2014 the structure is the point, not the length.

-
-[The user's description of success \u2014 observable outcomes]
+**For large bugs that span multiple files/systems:** Consider whether this should be decomposed into multiple specs. If so, create a brief first using \`/joycraft-new-feature\`, then decompose. A bug fix spec should be implementable in a single session.

-
-- [constraint 1]
-- [constraint 2]
+---

-##
-- [things that came up but weren't resolved]
-- [decisions that need more thought]
+## Phase 5: Hand Off

-
-- [things explicitly deferred]
+Tell the user:

-## Raw Notes
-[Any additional context, quotes, or tangents worth preserving]
 \`\`\`
+Bug fix spec is ready: docs/specs/YYYY-MM-DD-bugfix-name.md

-
-
-
+Summary:
+- Bug: [one sentence]
+- Root cause: [one sentence]
+- Fix: [one sentence]
+- Estimated: 1 session

-
-
+To execute: Start a fresh session and:
+1. Read the spec
+2. Write the reproduction test (must fail)
+3. Apply the fix (test must pass)
+4. Run full test suite
+5. Run /joycraft-session-end to capture discoveries
+6. Commit and PR

-
-- /joycraft-new-feature \u2014 formalize this into a full Feature Brief with specs
-- /joycraft-decompose \u2014 break it directly into atomic specs if scope is clear
-- Or just keep brainstorming \u2014 run /joycraft-interview again anytime
+Ready to start?
 \`\`\`

-
-
-- **This is NOT /joycraft-new-feature.** Do not push toward formal briefs, decomposition tables, or atomic specs. The point is exploration.
-- **Let the user lead.** Your job is to listen, clarify, and capture \u2014 not to structure or direct.
-- **Mark everything as DRAFT.** The output is a starting point, not a commitment.
-- **Keep it short.** The draft brief should be 1-2 pages max. Capture the essence, not every detail.
-- **Multiple interviews are fine.** The user might run this several times as their thinking evolves. Each creates a new dated draft.
+**Why:** A fresh session for implementation produces better results. This diagnostic session has context noise from exploration \u2014 a clean session with just the spec is more focused.
 `,
409
|
-
"joycraft-
|
|
410
|
-
name: joycraft-
|
|
411
|
-
description:
|
|
412
|
-
instructions:
|
|
321
|
+
"joycraft-decompose.md": `---
|
|
322
|
+
name: joycraft-decompose
|
|
323
|
+
description: Break a feature brief into atomic specs \u2014 small, testable, independently executable units
|
|
324
|
+
instructions: 32
|
|
413
325
|
---
|
|
414
326
|
|
|
415
|
-
#
|
|
416
|
-
|
|
417
|
-
You are starting a new feature. Follow this process in order. Do not skip steps.
|
|
418
|
-
|
|
419
|
-
## Phase 1: Interview
|
|
420
|
-
|
|
421
|
-
Interview the user about what they want to build. Let them talk \u2014 your job is to listen, then sharpen.
|
|
422
|
-
|
|
423
|
-
**Ask about:**
|
|
424
|
-
- What problem does this solve? Who is affected?
|
|
425
|
-
- What does "done" look like?
|
|
426
|
-
- Hard constraints? (business rules, tech limitations, deadlines)
|
|
427
|
-
- What is explicitly NOT in scope? (push hard on this)
|
|
428
|
-
- Edge cases or error conditions?
|
|
429
|
-
- What existing code/patterns should this follow?
|
|
430
|
-
- Testing: existing setup? framework? smoke test budget? lockdown mode desired?
|
|
431
|
-
|
|
432
|
-
**Interview technique:**
|
|
433
|
-
- Let the user "yap" \u2014 don't interrupt their flow
|
|
434
|
-
- Play back your understanding: "So if I'm hearing you right..."
|
|
435
|
-
- Push toward testable statements: "How would we verify that works?"
|
|
436
|
-
|
|
437
|
-
Keep asking until you can fill out a Feature Brief.
|
|
438
|
-
|
|
439
|
-
## Phase 2: Feature Brief
|
|
440
|
-
|
|
441
|
-
Write a Feature Brief to \`docs/briefs/YYYY-MM-DD-feature-name.md\`. Create the \`docs/briefs/\` directory if it doesn't exist.
+# Decompose Feature into Atomic Specs

-
+You have a Feature Brief (or the user has described a feature). Your job is to decompose it into atomic specs that can be executed independently \u2014 one spec per session.

-
+## Step 1: Verify the Brief Exists

-
-# [Feature Name] \u2014 Feature Brief
+Look for a Feature Brief in \`docs/briefs/\`. If one doesn't exist yet, tell the user:

->
-> **Project:** [project name]
-> **Status:** Interview | Decomposing | Specs Ready | In Progress | Complete
+> No feature brief found. Run \`/joycraft-new-feature\` first to interview and create one, or describe the feature now and I'll work from your description.

-
+If the user describes the feature inline, work from that description directly. You don't need a formal brief to decompose \u2014 but recommend creating one for complex features.

-##
-What are we building and why? The full picture in 2-4 paragraphs.
+## Step 2: Identify Natural Boundaries

-
-- As a [role], I want [capability] so that [benefit]
+**Why:** Good boundaries make specs independently testable and committable. Bad boundaries create specs that can't be verified without other specs also being done.

-
-- MUST: [constraint that every spec must respect]
-- MUST NOT: [prohibition that every spec must respect]
+Read the brief (or description) and identify natural split points:

-
--
+- **Data layer changes** (schemas, types, migrations) \u2014 always a separate spec
+- **Pure functions / business logic** \u2014 separate from I/O
+- **UI components** \u2014 separate from data fetching
+- **API endpoints / route handlers** \u2014 separate from business logic
+- **Test infrastructure** (mocks, fixtures, helpers) \u2014 can be its own spec if substantial
+- **Configuration / environment** \u2014 separate from code changes

-
-- **Existing setup:** [framework and tools, or "none yet"]
-- **User expertise:** [comfortable / learning / needs guidance]
-- **Test types:** [smoke, unit, integration, e2e, etc.]
-- **Smoke test budget:** [target time for fast-feedback tests]
-- **Lockdown mode:** [yes/no \u2014 constrain agent to code + tests only]
+Ask yourself: "Can this piece be committed and tested without the other pieces existing?" If yes, it's a good boundary.

-## Decomposition
-| # | Spec Name | Description | Dependencies | Est. Size |
-|---|-----------|-------------|--------------|-----------|
-| 1 | [verb-object] | [one sentence] | None | [S/M/L] |
+## Step 3: Build the Decomposition Table

-
-- [ ] Sequential (specs have chain dependencies)
-- [ ] Parallel worktrees (specs are independent)
-- [ ] Mixed
+For each atomic spec, define:

-
-
-- [ ] [No regressions in existing features]
-\`\`\`
+| # | Spec Name | Description | Dependencies | Size |
+|---|-----------|-------------|--------------|------|

-
+**Rules:**
+- Each spec name is \`verb-object\` format (e.g., \`add-terminal-detection\`, \`extract-prompt-module\`)
+- Each description is ONE sentence \u2014 if you need two, the spec is too big
+- Dependencies reference other spec numbers \u2014 keep the dependency graph shallow
+- More than 2 dependencies on a single spec = it's too big, split further
+- Aim for 3-7 specs per feature. Fewer than 3 = probably not decomposed enough. More than 10 = the feature brief is too big
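To make these rules concrete, a filled-in table for a hypothetical settings feature (all spec names and sizes below are invented for illustration, not taken from the package) might look like:

```markdown
| # | Spec Name | Description | Dependencies | Size |
|---|-----------|-------------|--------------|------|
| 1 | add-settings-schema | Add the settings table schema and migration. | None | S |
| 2 | add-settings-service | Add pure read/validate functions over the schema. | 1 | M |
| 3 | add-settings-endpoint | Expose GET/PUT route handlers that call the service. | 2 | S |
| 4 | add-settings-panel | Add the settings UI panel wired to the endpoint. | 3 | M |
```

Each spec touches one layer, each description is one sentence, and no spec has more than one dependency, so the chain stays shallow.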

-
-- "Does the decomposition match how you think about this?"
-- "Is anything in scope that shouldn't be?"
-- "Are the specs small enough? Can each be described in one sentence?"
+## Step 4: Present and Iterate

-
+Show the decomposition table to the user. Ask:
+1. "Does this breakdown match how you think about this feature?"
+2. "Are there any specs that feel too big or too small?"
+3. "Should any of these run in parallel (separate worktrees)?"

-
+Iterate until the user approves.

-
+## Step 5: Generate Atomic Specs

-
+For each approved row, create \`docs/specs/YYYY-MM-DD-spec-name.md\`. Create the \`docs/specs/\` directory if it doesn't exist.
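As a sketch of that naming convention (the spec name here is invented for illustration), the dated path can be derived in the shell:

```shell
# Illustrative only: build today's spec path for a hypothetical spec name.
mkdir -p docs/specs
spec_path="docs/specs/$(date +%F)-add-terminal-detection.md"  # %F prints YYYY-MM-DD
touch "$spec_path"
ls docs/specs
```
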

-
+**Why:** Each spec must be self-contained \u2014 a fresh Claude session should be able to execute it without reading the Feature Brief. Copy relevant constraints and context into each spec.
+
+Use this structure:

\`\`\`markdown
# [Verb + Object] \u2014 Atomic Spec

-> **Parent Brief:** \`docs/briefs/YYYY-MM-DD-feature-name.md\`
+> **Parent Brief:** \`docs/briefs/YYYY-MM-DD-feature-name.md\` (or "standalone")
> **Status:** Ready
> **Date:** YYYY-MM-DD
> **Estimated scope:** [1 session / N files / ~N lines]
@@ -562,420 +439,382 @@ Strategy, data flow, key decisions. Name one rejected alternative.

If \`docs/templates/ATOMIC_SPEC_TEMPLATE.md\` exists, reference it for the full template with additional guidance.

-
+Fill in all sections \u2014 each spec must be self-contained (no "see the brief for context"). Copy relevant constraints from the Feature Brief into each spec. Write acceptance criteria specific to THIS spec, not the whole feature. Every acceptance criterion must have at least one corresponding test in the Test Plan. If the user provided test strategy info from the interview, use it to choose test types and frameworks. Include the test harness verification rules in every Test Plan.

-
-\`\`\`
-Feature Brief and [N] atomic specs are ready.
+## Step 6: Recommend Execution Strategy

-
-
-
-
+Based on the dependency graph:
+- **Independent specs** \u2014 "These can run in parallel worktrees"
+- **Sequential specs** \u2014 "Execute these in order: 1 -> 2 -> 4"
+- **Mixed** \u2014 "Start specs 1 and 3 in parallel. After 1 completes, start 2."
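The parallel case can be sketched with git worktrees; the repo, paths, and branch names below are invented for illustration and are not part of Joycraft:

```shell
set -e
# Illustrative only: a throwaway repo with one worktree per independent spec.
git init -q demo
git -C demo -c user.name=demo -c user.email=demo@example.com \
  commit --allow-empty -q -m "init"
# Specs 1 and 3 are independent, so each gets its own worktree and branch:
git -C demo worktree add -q ../demo-spec-1 -b spec/add-terminal-detection
git -C demo worktree add -q ../demo-spec-3 -b spec/extract-prompt-module
git -C demo worktree list
```

When a spec's branch is merged, `git worktree remove <path>` cleans up its checkout.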

-
-- [Parallel/Sequential/Mixed strategy]
-- Estimated: [N] sessions total
+Update the Feature Brief's Execution Strategy section with the plan (if a brief exists).

-
-1. Read the spec
-2. Implement
-3. Run /joycraft-session-end to capture discoveries
-4. Commit and PR
+## Step 7: Hand Off

-
+Tell the user:
\`\`\`
+Decomposition complete:
+- [N] atomic specs created in docs/specs/
+- [N] can run in parallel, [N] are sequential
+- Estimated total: [N] sessions

-
+To execute:
+- Sequential: Open a session, point Claude at each spec in order
+- Parallel: Use worktrees \u2014 one spec per worktree, merge when done
+- Each session should end with /joycraft-session-end to capture discoveries

-
+Ready to start execution?
+\`\`\`
`,
-"joycraft-
-name: joycraft-
-description:
-instructions: 22
+"joycraft-design.md": `---
+name: joycraft-design
+description: Design discussion before decomposition \u2014 produce a ~200-line design artifact for human review, catching wrong assumptions before they propagate into specs
---

-#
-
-Before ending this session, complete these steps in order.
-
-## 1. Capture Discoveries
-
-**Why:** Discoveries are the surprises \u2014 things that weren't in the spec or that contradicted expectations. They prevent future sessions from hitting the same walls.
-
-Check: did anything surprising happen during this session? If yes, create or update a discovery file at \`docs/discoveries/YYYY-MM-DD-topic.md\`. Create the \`docs/discoveries/\` directory if it doesn't exist.
-
-Only capture what's NOT obvious from the code or git diff:
-- "We thought X but found Y" \u2014 assumptions that were wrong
-- "This API/library behaves differently than documented" \u2014 external gotchas
-- "This edge case needs handling in a future spec" \u2014 deferred work with context
-- "The approach in the spec didn't work because..." \u2014 spec-vs-reality gaps
-- Key decisions made during implementation that aren't in the spec
-
-**Do NOT capture:**
-- Files changed (that's the diff)
-- What you set out to do (that's the spec)
-- Step-by-step narrative of the session (nobody re-reads these)
-
-Use this format:
-
-\`\`\`markdown
-# Discoveries \u2014 [topic]
-
-**Date:** YYYY-MM-DD
-**Spec:** [link to spec if applicable]
-
-## [Discovery title]
-**Expected:** [what we thought would happen]
-**Actual:** [what actually happened]
-**Impact:** [what this means for future work]
-\`\`\`
-
-If nothing surprising happened, skip the discovery file entirely. No discovery is a good sign \u2014 the spec was accurate.
-
-## 1b. Update Context Documents
-
-If \`docs/context/\` exists, quickly check whether this session revealed anything about:
-
-- **Production risks** \u2014 did you interact with or learn about production vs staging systems? \u2192 Update \`docs/context/production-map.md\`
-- **Wrong assumptions** \u2014 did the agent (or you) assume something that turned out to be false? \u2192 Update \`docs/context/dangerous-assumptions.md\`
-- **Key decisions** \u2014 did you make an architectural or tooling choice? \u2192 Add a row to \`docs/context/decision-log.md\`
-- **Unwritten rules** \u2014 did you discover a convention or constraint not documented anywhere? \u2192 Update \`docs/context/institutional-knowledge.md\`
-
-Skip this if nothing applies. Don't force it \u2014 only update when there's genuine new context.
-
-## 2. Run Validation
-
-Run the project's validation commands. Check CLAUDE.md for project-specific commands. Common checks:
-
-- Type-check (e.g., \`tsc --noEmit\`, \`mypy\`, \`cargo check\`)
-- Tests (e.g., \`npm test\`, \`pytest\`, \`cargo test\`)
-- Lint (e.g., \`eslint\`, \`ruff\`, \`clippy\`)
+# Design Discussion

-
+You are producing a design discussion document for a feature. This sits between research and decomposition \u2014 it captures your understanding so the human can catch wrong assumptions before specs are written.

-
+**Guard clause:** If no brief path is provided and no brief exists in \`docs/briefs/\`, say:
+"No feature brief found. Run \`/joycraft-new-feature\` first to create one, or provide the path to your brief."
+Then stop.

-
-- All acceptance criteria met \u2014 update status to \`Complete\`
-- Partially done \u2014 update status to \`In Progress\`, note what's left
+---

-
+## Step 1: Read Inputs

-
+Read the feature brief at the path the user provides. If the user also provides a research document path, read that too. Research is optional \u2014 if none exists, note that you'll explore the codebase directly.

-
+## Step 2: Explore the Codebase

-
+Spawn subagents to explore the codebase for patterns relevant to the brief. Focus on:

-
+- Files and functions that will be touched or extended
+- Existing patterns this feature should follow (naming, data flow, error handling)
+- Similar features already implemented that serve as models
+- Boundaries and interfaces the feature must integrate with

-
-2. **Open a PR if the feature is complete.** Check the parent Feature Brief's decomposition table \u2014 if all specs are done, run \`gh pr create\` with a summary of all completed specs. Do not ask first.
-3. **If not all specs are done,** still push. The PR comes when the last spec is complete.
+Gather file paths, function signatures, and code snippets. You need concrete evidence, not guesses.

-
+## Step 3: Write the Design Document

-
+Create \`docs/designs/\` directory if it doesn't exist. Write the design document to \`docs/designs/YYYY-MM-DD-feature-name.md\`.

-
-Session complete.
-- Spec: [spec name] \u2014 [Complete / In Progress]
-- Build: [passing / failing]
-- Discoveries: [N items / none]
-- Pushed: [yes / no \u2014 and why not]
-- PR: [opened #N / not yet \u2014 N specs remaining]
-- Next: [what the next session should tackle]
-\`\`\`
+The document has exactly five sections:

-
+### Section 1: Current State

-
+What exists today in the codebase that is relevant to this feature. Include file paths, function signatures, and data flows. Be specific \u2014 reference actual code, not abstractions. If no research doc was provided, note that and describe what you found through direct exploration.

-
-1. \`docs/pipit-examples/\` directory exists
-2. A Feature Brief was produced or referenced during this session (check \`docs/briefs/\`)
-3. Atomic specs were generated from that brief (check \`docs/specs/\`)
+### Section 2: Desired End State

-
+What the codebase should look like when this feature is complete. Describe the change at a high level \u2014 new files, modified interfaces, new data flows. Do NOT include implementation steps. This is the "what," not the "how."

-
+### Section 3: Patterns to Follow

-
-# [Feature Name] \u2014 Golden Example
+Existing patterns in the codebase that this feature should match. Include short code snippets and \`file:line\` references. Show the pattern, don't just name it.

-
-> **Project:** [project name from CLAUDE.md or directory name]
-> **Source Brief:** \\\`docs/briefs/YYYY-MM-DD-feature-name.md\\\`
+If this is a greenfield project with no existing patterns, propose conventions and note that no precedent exists.

-
+### Section 4: Resolved Design Decisions

-
+Decisions you have already made, with brief rationale. Format each as:

-> [
+> **Decision:** [what you decided]
+> **Rationale:** [why, referencing existing code or constraints]
+> **Alternative rejected:** [what you considered and why you rejected it]

-
+### Section 5: Open Questions

-
-- **Confidence:** [high | medium | low]
-- **Skills Used:** [list the joycraft skills that were invoked during this pipeline run]
+Things you don't know or where multiple valid approaches exist. Each question MUST present 2-3 concrete options with pros and cons. Format:

-
+> **Q: [question]**
+> - **Option A:** [description] \u2014 Pro: [benefit]. Con: [cost].
+> - **Option B:** [description] \u2014 Pro: [benefit]. Con: [cost].
+> - **Option C (if applicable):** [description] \u2014 Pro: [benefit]. Con: [cost].

-
+Do NOT ask vague questions like "what do you think?" Every question must have actionable options the human can choose from.

-
-|---|-----------|-------------|------|
+## Step 4: Present and STOP

-
+Present the design document to the user. Say:

-[2-3 sentences: Why was this the right classification? What signals in the capture indicated this action level? What would have gone wrong with a different classification?]
\`\`\`
+Design discussion written to docs/designs/YYYY-MM-DD-feature-name.md

-
-
-
-
-- \`research\` \u2014 the capture needed \`/joycraft-research\` before any implementation
-- \`design\` \u2014 the capture needed \`/joycraft-design\` before decomposition
-
-Commit the golden example file along with other session artifacts. Do not mention it in the commit message or session report.
-`,
-"joycraft-tune.md": `---
-name: joycraft-tune
-description: Assess and upgrade your project's AI development harness \u2014 score 7 dimensions, apply fixes, show path to Level 5
-instructions: 15
----
-
-# Tune \u2014 Project Harness Assessment & Upgrade
-
-You are evaluating and upgrading this project's AI development harness.
+Please review the document above. Specifically:
+1. Are the patterns in Section 3 the right ones to follow, or should I use different ones?
+2. Do you agree with the resolved decisions in Section 4?
+3. Pick an option for each open question in Section 5 (or propose your own).

-
+Reply with your feedback. I will NOT proceed to decomposition until you have reviewed and approved this design.
+\`\`\`

-
+**CRITICAL: Do NOT proceed to \`/joycraft-decompose\` or generate specs.** Wait for the human to review, answer open questions, and correct any wrong assumptions. The entire value of this skill is the pause \u2014 it forces a human checkpoint before mistakes propagate.

-##
+## After Human Review

-
--
+Once the human responds:
+- Update the design document with their corrections and chosen options
+- Move answered questions from "Open Questions" to "Resolved Design Decisions"
+- Present the updated document for final confirmation
+- Only after explicit approval, tell the user: "Design approved. Run \`/joycraft-decompose\` with this brief to generate atomic specs."
+`,
+"joycraft-implement-level5.md": `---
+name: joycraft-implement-level5
+description: Set up Level 5 autonomous development \u2014 autofix loop, holdout scenario testing, and scenario evolution from specs
+instructions: 35
+---

-
+# Implement Level 5 \u2014 Autonomous Development Loop

-
+You are guiding the user through setting up Level 5: the autonomous feedback loop where specs go in, validated software comes out. This is a one-time setup that installs workflows, creates a scenarios repo, and configures the autofix loop.

-
-|-----------|--------------|
-| Spec Quality | \`docs/specs/\` \u2014 structured? acceptance criteria? self-contained? |
-| Spec Granularity | Can each spec be done in one session? |
-| Behavioral Boundaries | ALWAYS/ASK FIRST/NEVER sections (or equivalent rules under any heading) |
-| Skills & Hooks | \`.claude/skills/\` files, hooks config |
-| Documentation | \`docs/\` structure, templates, referenced from CLAUDE.md |
-| Knowledge Capture | \`docs/discoveries/\`, \`docs/context/*.md\` \u2014 existence AND real content |
-| Testing & Validation | Test framework, CI pipeline, validation commands in CLAUDE.md |
+## Before You Begin

-
+Check prerequisites:

-
+1. **Project must be initialized.** Look for \`.joycraft-version\`. If missing, tell the user to run \`npx joycraft init\` first.
+2. **Project should be at Level 4.** Check \`docs/joycraft-assessment.md\` if it exists. If the project hasn't been assessed yet, suggest running \`/joycraft-tune\` first. But don't block \u2014 the user may know they're ready.
+3. **Git repo with GitHub remote.** This setup requires GitHub Actions. Check for \`.git/\` and a GitHub remote.

-
+If prerequisites aren't met, explain what's needed and stop.

-## Step
+## Step 1: Explain What Level 5 Means

-
+Tell the user:

-
+> Level 5 is the autonomous loop. When you push specs, three things happen automatically:
+>
+> 1. **Scenario evolution** \u2014 A separate AI agent reads your specs and writes holdout tests in a private scenarios repo. These tests are invisible to your coding agent.
+> 2. **Autofix** \u2014 When CI fails on a PR, Claude Code automatically attempts a fix (up to 3 times).
+> 3. **Holdout validation** \u2014 When CI passes, your scenarios repo runs behavioral tests against the PR. Results post as PR comments.
+>
+> The key insight: your coding agent never sees the scenario tests. This prevents it from gaming the test suite \u2014 like a validation set in machine learning.

-
+## Step 2: Gather Configuration

-
-2. **Risk interview (3-5 questions, one at a time):** What could break? What services connect to prod? Unwritten rules? Off-limits files/commands? Skip if \`docs/context/\` already has content.
+Ask these questions **one at a time**:

-
+### Question 1: Scenarios repo name

-
+> What should we call your scenarios repo? It'll be a private repo that holds your holdout tests.
+>
+> Default: \`{current-repo-name}-scenarios\`

-
+Accept the default or the user's choice.

-
+### Question 2: GitHub App

-
+> Level 5 needs a GitHub App to provide a separate identity for autofix pushes (this avoids GitHub's anti-recursion protection). Creating one takes about 2 minutes:
+>
+> 1. Go to https://github.com/settings/apps/new
+> 2. Give it a name (e.g., "My Project Autofix")
+> 3. Uncheck "Webhook > Active" (not needed)
+> 4. Under **Repository permissions**, set:
+> - **Contents**: Read & Write
+> - **Pull requests**: Read & Write
+> - **Actions**: Read & Write
+> 5. Click **Create GitHub App**
+> 6. Note the **App ID** from the settings page
+> 7. Scroll to **Private keys** > click **Generate a private key** > save the \`.pem\` file
+> 8. Click **Install App** in the left sidebar > install it on your repo
+>
+> What's your App ID?

-
+## Step 3: Run init-autofix

-
+Run the CLI command with the gathered configuration:

-
--
-
-- **Previous assessment exists:** Read it first. If nothing to upgrade, say so.
-- **Non-Joycraft content in CLAUDE.md:** Preserve as-is. Only append.
-`,
-"joycraft-add-fact.md": `---
-name: joycraft-add-fact
-description: Capture a project fact and route it to the correct context document -- production map, dangerous assumptions, decision log, institutional knowledge, or troubleshooting
-instructions: 38
----
+\`\`\`bash
+npx joycraft init-autofix --scenarios-repo {name} --app-id {id}
+\`\`\`

-
+Review the output with the user. Confirm files were created.

-
+## Step 4: Walk Through Secret Configuration

-
+Guide the user step by step:

-
+### 4a: Add Secrets to Main Repo

-
+> You should already have the \`.pem\` file from when you created the app in Step 2.

-
+> Go to your repo's Settings > Secrets and variables > Actions, and add:
+> - \`JOYCRAFT_APP_PRIVATE_KEY\` \u2014 paste the contents of your \`.pem\` file
+> - \`ANTHROPIC_API_KEY\` \u2014 your Anthropic API key

-
+### 4b: Create the Scenarios Repo

-
+> Create the private scenarios repo:
+> \`\`\`bash
+> gh repo create {scenarios-repo-name} --private
+> \`\`\`
+>
+> Then copy the scenario templates into it:
+> \`\`\`bash
+> cp -r docs/templates/scenarios/* ../{scenarios-repo-name}/
+> cd ../{scenarios-repo-name}
+> git add -A && git commit -m "init: scaffold scenarios repo from Joycraft"
+> git push
+> \`\`\`

-###
-The fact is about **infrastructure, services, environments, URLs, endpoints, credentials, or what is safe/unsafe to touch**.
-- Signal words: "production", "staging", "endpoint", "URL", "database", "service", "deployed", "hosted", "credentials", "secret", "environment"
-- Examples: "The staging DB is at postgres://staging.example.com", "We use Vercel for the frontend and Railway for the API"
+### 4c: Add Secrets to Scenarios Repo

-
-
--
-- Examples: "The \`users\` table looks like a test table but it's production", "Deleting a workspace doesn't delete the billing subscription"
+> The scenarios repo also needs the App private key:
+> - \`JOYCRAFT_APP_PRIVATE_KEY\` \u2014 same \`.pem\` file as the main repo
+> - \`ANTHROPIC_API_KEY\` \u2014 same key (needed for scenario generation)

-
-The fact is about **an architectural or tooling choice and why it was made**.
-- Signal words: "decided", "chose", "because", "instead of", "we went with", "the reason we use", "trade-off"
-- Examples: "We chose SQLite over Postgres because this runs on embedded devices", "We use pnpm instead of npm for workspace support"
+## Step 5: Verify Setup

-
-The fact is about **team conventions, unwritten rules, organizational context, or who owns what**.
-- Signal words: "convention", "rule", "always", "never", "team", "process", "review", "approval", "owns", "responsible"
-- Examples: "The design team reviews all color changes", "We never deploy on Fridays", "PR titles must start with the ticket number"
+Help the user verify everything is wired correctly:

-
-
-
-- Examples: "If Wi-Fi disconnects during flash, wait and retry -- don't switch networks", "When tests fail with ECONNREFUSED, check if Docker is running"
+1. **Check workflow files exist:** \`ls .github/workflows/autofix.yml .github/workflows/scenarios-dispatch.yml .github/workflows/spec-dispatch.yml .github/workflows/scenarios-rerun.yml\`
+2. **Check scenario templates were copied:** Verify the scenarios repo has \`example-scenario.test.ts\`, \`workflows/run.yml\`, \`workflows/generate.yml\`, \`prompts/scenario-agent.md\`
+3. **Check the App ID is correct** in the workflow files (not still a placeholder)
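Check 1 can also be scripted; a minimal sketch (the four filenames are the ones listed above, run from the repo root):

```shell
# Sketch: report which of the expected workflow files are present.
for f in .github/workflows/autofix.yml \
         .github/workflows/scenarios-dispatch.yml \
         .github/workflows/spec-dispatch.yml \
         .github/workflows/scenarios-rerun.yml; do
  if [ -f "$f" ]; then echo "ok: $f"; else echo "MISSING: $f"; fi
done
```
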

-
+## Step 6: Update CLAUDE.md

-If the
+If the project's CLAUDE.md doesn't already have an "External Validation" section, add one:

-##
+> ## External Validation
+>
+> This project uses holdout scenario tests in a separate private repo.
+>
+> ### NEVER
+> - Access, read, or reference the scenarios repo
+> - Mention scenario test names or contents
+> - Modify the scenarios dispatch workflow to leak test information
+>
+> The scenarios repo is deliberately invisible to you. This is the holdout guarantee.

-
-2. If the target document does not exist, create it from the template structure. Check \`docs/templates/\` for the matching template. If no template exists, use this minimal structure:
+## Step 7: First Test (Optional)

-
-\`\`\`markdown
-# Production Map
+If the user wants to test the loop:

->
+> Want to do a quick test? Here's how:
+>
+> 1. Write a simple spec in \`docs/specs/\` and push to main \u2014 this triggers scenario generation
+> 2. Create a PR with a small change \u2014 when CI passes, scenarios will run
+> 3. Watch for the scenario test results as a PR comment
+>
+> Or deliberately break something in a PR to test the autofix loop.

-##
+## Step 8: Summary

-
-|---------|-------------|-------------|-------------------|
-\`\`\`
+Print a summary of what was set up:

-
-
-
+> **Level 5 is live.** Here's what's running:
+>
+> | Trigger | What Happens |
+> |---------|-------------|
+> | Push specs to \`docs/specs/\` | Scenario agent writes holdout tests |
+> | PR fails CI | Claude autofix attempts (up to 3x) |
+> | PR passes CI | Holdout scenarios run against PR |
+> | Scenarios update | Open PRs re-tested with latest scenarios |
+>
+> Your scenarios repo: \`{name}\`
+> Your coding agent cannot see those tests. The holdout wall is intact.

-
+Update \`docs/joycraft-assessment.md\` if it exists \u2014 set the Level 5 score to reflect the new setup.
+`,
+"joycraft-interview.md": `---
+name: joycraft-interview
+description: Brainstorm freely about what you want to build \u2014 yap, explore ideas, and get a structured summary you can use later
+instructions: 18
+---

-
+# Interview \u2014 Idea Exploration

-
-|-------------------|-------------|----------------|
-\`\`\`
+You are helping the user brainstorm and explore what they want to build. This is a lightweight, low-pressure conversation \u2014 not a formal spec process. Let them yap.

-
-\`\`\`markdown
-# Decision Log
+## How to Run the Interview

-
+### 1. Open the Floor

-
+Start with something like:
+"What are you thinking about building? Just talk \u2014 I'll listen and ask questions as we go."

-
-|------|----------|-----|----------------------|-------------|
-\`\`\`
+Let the user talk freely. Do not interrupt their flow. Do not push toward structure yet.

-
-\`\`\`markdown
-# Institutional Knowledge
+### 2. Ask Clarifying Questions

-
+As they talk, weave in questions naturally \u2014 don't fire them all at once:

-
+- **What problem does this solve?** Who feels the pain today?
+- **What does "done" look like?** If this worked perfectly, what would a user see?
+- **What are the constraints?** Time, tech, team, budget \u2014 what boxes are we in?
+- **What's NOT in scope?** What's tempting but should be deferred?
+- **What are the edge cases?** What could go wrong? What's the weird input?
+- **What exists already?** Are we building on something or starting fresh?

-
-\`\`\`
+### 3. Play Back Understanding

-
-
-# Troubleshooting
+After the user has gotten their ideas out, reflect back:
+"So if I'm hearing you right, you want to [summary]. The core problem is [X], and done looks like [Y]. Is that right?"

-
+Let them correct and refine. Iterate until they say "yes, that's it."

-
+### 4. Write a Draft Brief

-
-|-------------------|---------|---------------|
-\`\`\`
+Create a draft file at \`docs/briefs/YYYY-MM-DD-topic-draft.md\`. Create the \`docs/briefs/\` directory if it doesn't exist.

-
+Use this format:

-
-
-- Whether it uses tables or lists
-- The column format if it's a table
+\`\`\`markdown
+# [Topic] \u2014 Draft Brief

-
+> **Date:** YYYY-MM-DD
+> **Status:** DRAFT
+> **Origin:** /joycraft-interview session

-
+---

-
--
+## The Idea
+[2-3 paragraphs capturing what the user described \u2014 their words, their framing]

-
+## Problem
+[What pain or gap this addresses]

-
+## What "Done" Looks Like
+[The user's description of success \u2014 observable outcomes]

-##
+## Constraints
+- [constraint 1]
+- [constraint 2]

-
+## Open Questions
+- [things that came up but weren't resolved]
+- [decisions that need more thought]

-
|
953
|
-
-
|
|
954
|
-
- Could cause real damage if violated (data loss, broken deployments, security issues)
|
|
955
|
-
- Is a hard constraint that applies across all work, not just a one-time note
|
|
791
|
+
## Out of Scope (for now)
|
|
792
|
+
- [things explicitly deferred]
|
|
956
793
|
|
|
957
|
-
|
|
958
|
-
|
|
959
|
-
|
|
960
|
-
- A diagnostic tip rather than a prohibition
|
|
794
|
+
## Raw Notes
|
|
795
|
+
[Any additional context, quotes, or tangents worth preserving]
|
|
796
|
+
\`\`\`
|
|
961
797
|
|
|
962
|
-
|
|
798
|
+
### 5. Hand Off
|
|
963
799
|
|
|
964
|
-
|
|
800
|
+
After writing the draft, tell the user:
|
|
965
801
|
|
|
966
|
-
|
|
802
|
+
\`\`\`
|
|
803
|
+
Draft brief saved to docs/briefs/YYYY-MM-DD-topic-draft.md
|
|
967
804
|
|
|
805
|
+
When you're ready to move forward:
|
|
806
|
+
- /joycraft-new-feature \u2014 formalize this into a full Feature Brief with specs
|
|
807
|
+
- /joycraft-decompose \u2014 break it directly into atomic specs if scope is clear
|
|
808
|
+
- Or just keep brainstorming \u2014 run /joycraft-interview again anytime
|
|
968
809
|
\`\`\`
|
|
969
|
-
Added to [document name]:
|
|
970
|
-
[summary of what was added]
|
|
971
810
|
|
|
972
|
-
|
|
973
|
-
Added CLAUDE.md rule:
|
|
974
|
-
[ALWAYS/ASK FIRST/NEVER]: [rule text]
|
|
811
|
+
## Guidelines
|
|
975
812
|
|
|
976
|
-
|
|
977
|
-
|
|
978
|
-
|
|
813
|
+
- **This is NOT /joycraft-new-feature.** Do not push toward formal briefs, decomposition tables, or atomic specs. The point is exploration.
|
|
814
|
+
- **Let the user lead.** Your job is to listen, clarify, and capture \u2014 not to structure or direct.
|
|
815
|
+
- **Mark everything as DRAFT.** The output is a starting point, not a commitment.
|
|
816
|
+
- **Keep it short.** The draft brief should be 1-2 pages max. Capture the essence, not every detail.
|
|
817
|
+
- **Multiple interviews are fine.** The user might run this several times as their thinking evolves. Each creates a new dated draft.
|
|
979
818
|
`,
|
|
980
819
|
"joycraft-lockdown.md": `---
|
|
981
820
|
name: joycraft-lockdown
|
|
@@ -1104,505 +943,613 @@ If the user asks you to apply the changes:
 **Never auto-apply. Always show the exact changes and wait for explicit approval.**
 `,
-"joycraft-
-name: joycraft-
-description:
-instructions:
+"joycraft-new-feature.md": `---
+name: joycraft-new-feature
+description: Guided feature development \u2014 interview the user, produce a Feature Brief, then decompose into atomic specs
+instructions: 35
 ---
-#
+# New Feature Workflow
+You are starting a new feature. Follow this process in order. Do not skip steps.
+## Phase 1: Interview
+Interview the user about what they want to build. Let them talk \u2014 your job is to listen, then sharpen.
+**Ask about:**
+- What problem does this solve? Who is affected?
+- What does "done" look like?
+- Hard constraints? (business rules, tech limitations, deadlines)
+- What is explicitly NOT in scope? (push hard on this)
+- Edge cases or error conditions?
+- What existing code/patterns should this follow?
+- Testing: existing setup? framework? smoke test budget? lockdown mode desired?
+**Interview technique:**
+- Let the user "yap" \u2014 don't interrupt their flow
+- Play back your understanding: "So if I'm hearing you right..."
+- Push toward testable statements: "How would we verify that works?"
+Keep asking until you can fill out a Feature Brief.
-##
+## Phase 2: Feature Brief
+Write a Feature Brief to \`docs/briefs/YYYY-MM-DD-feature-name.md\`. Create the \`docs/briefs/\` directory if it doesn't exist.
-2. **Acceptance Criteria** -- the checklist under the \`## Acceptance Criteria\` section
-3. **Test Plan** -- the table under the \`## Test Plan\` section, including any test commands
-4. **Constraints** -- the \`## Constraints\` section if present
+**Why:** The brief is the single source of truth for what we're building. It prevents scope creep and gives every spec a shared reference point.
+Use this structure:
+\`\`\`markdown
+# [Feature Name] \u2014 Feature Brief
+
+> **Date:** YYYY-MM-DD
+> **Project:** [project name]
+> **Status:** Interview | Decomposing | Specs Ready | In Progress | Complete
+
+---
+
+## Vision
+What are we building and why? The full picture in 2-4 paragraphs.
+
-3. Common defaults based on the project type:
-   - Node.js: \`npm test\` or \`pnpm test --run\`
-   - Python: \`pytest\`
-   - Rust: \`cargo test\`
-   - Go: \`go test ./...\`
+## User Stories
+- As a [role], I want [capability] so that [benefit]
+
+## Hard Constraints
+- MUST: [constraint that every spec must respect]
+- MUST NOT: [prohibition that every spec must respect]
+
-##
+## Out of Scope
+- NOT: [tempting but deferred]
+
+## Test Strategy
+- **Existing setup:** [framework and tools, or "none yet"]
+- **User expertise:** [comfortable / learning / needs guidance]
+- **Test types:** [smoke, unit, integration, e2e, etc.]
+- **Smoke test budget:** [target time for fast-feedback tests]
+- **Lockdown mode:** [yes/no \u2014 constrain agent to code + tests only]
+
+## Decomposition
+| # | Spec Name | Description | Dependencies | Est. Size |
+|---|-----------|-------------|--------------|-----------|
+| 1 | [verb-object] | [one sentence] | None | [S/M/L] |
+
+## Execution Strategy
+- [ ] Sequential (specs have chain dependencies)
+- [ ] Parallel worktrees (specs are independent)
+- [ ] Mixed
+
+## Success Criteria
+- [ ] [End-to-end behavior 1]
+- [ ] [No regressions in existing features]
 \`\`\`
-You are a QA verifier. Your job is to independently verify an implementation against its spec. You have NO context about how the implementation was done -- you are checking it fresh.
-- You may READ any file using the Read tool or cat
-- You may RUN these specific test/build commands: [TEST_COMMANDS]
-- You may NOT edit, create, or delete any files
-- You may NOT run commands that modify state (no git commit, no npm install, no file writes)
-- You may NOT install packages or access the network
-- Report what you OBSERVE, not what you expect or hope
+If \`docs/templates/FEATURE_BRIEF_TEMPLATE.md\` exists, reference it for the full template with additional guidance.
+Present the brief to the user. Focus review on:
+- "Does the decomposition match how you think about this?"
+- "Is anything in scope that shouldn't be?"
+- "Are the specs small enough? Can each be described in one sentence?"
-[ACCEPTANCE_CRITERIA]
+Iterate until approved.
-[TEST_PLAN]
+## Phase 3: Generate Atomic Specs
-[CONSTRAINTS_OR_NONE]
+For each row in the decomposition table, create a self-contained spec file at \`docs/specs/YYYY-MM-DD-spec-name.md\`. Create the \`docs/specs/\` directory if it doesn't exist.
-For each acceptance criterion, determine if it PASSES or FAILS based on evidence:
+**Why:** Each spec must be understandable WITHOUT reading the Feature Brief. This prevents the "Curse of Instructions" \u2014 no spec should require holding the entire feature in context. Copy relevant context into each spec.
-2. For each acceptance criterion:
-   a. Check if there is a corresponding test and whether it passes
-   b. If no test exists, read the relevant source files to verify the criterion is met
-   c. If the criterion cannot be verified by reading code or running tests, mark it MANUAL CHECK NEEDED
-3. For criteria about build/test passing, actually run the commands and report results.
+Use this structure for each spec:
+\`\`\`markdown
+# [Verb + Object] \u2014 Atomic Spec
+
+> **Parent Brief:** \`docs/briefs/YYYY-MM-DD-feature-name.md\`
+> **Status:** Ready
+> **Date:** YYYY-MM-DD
+> **Estimated scope:** [1 session / N files / ~N lines]
+
+---
+
+## What
+One paragraph \u2014 what changes when this spec is done?
+
+## Why
+One sentence \u2014 what breaks or is missing without this?
+
+## Acceptance Criteria
+- [ ] [Observable behavior]
+- [ ] Build passes
+- [ ] Tests pass
+
+## Test Plan
+
+| Acceptance Criterion | Test | Type |
+|---------------------|------|------|
+| [Each AC above] | [What to call/assert] | [unit/integration/e2e] |
+
+**Execution order:**
+1. Write all tests above \u2014 they should fail against current/stubbed code
+2. Run tests to confirm they fail (red)
+3. Implement until all tests pass (green)
+
+**Smoke test:** [Identify the fastest test for iteration feedback]
+
+**Before implementing, verify your test harness:**
+1. Run all tests \u2014 they must FAIL (if they pass, you're testing the wrong thing)
+2. Each test calls your actual function/endpoint \u2014 not a reimplementation or the underlying library
+3. Identify your smoke test \u2014 it must run in seconds, not minutes, so you get fast feedback on each change
+
+## Constraints
+- MUST: [hard requirement]
+- MUST NOT: [hard prohibition]
+
+## Affected Files
+| Action | File | What Changes |
+|--------|------|-------------|
+
+## Approach
+Strategy, data flow, key decisions. Name one rejected alternative.
+
+## Edge Cases
+| Scenario | Expected Behavior |
+|----------|------------------|
+\`\`\`
-|---|-----------|---------|----------|
-| 1 | [criterion text] | PASS/FAIL/MANUAL CHECK NEEDED | [what you observed] |
-| 2 | [criterion text] | PASS/FAIL/MANUAL CHECK NEEDED | [what you observed] |
-[continue for all criteria]
+If \`docs/templates/ATOMIC_SPEC_TEMPLATE.md\` exists, reference it for the full template with additional guidance.
+## Phase 4: Hand Off for Execution
+Tell the user:
 \`\`\`
+Feature Brief and [N] atomic specs are ready.
+
+Specs:
+1. [spec-name] \u2014 [one sentence] [S/M/L]
+2. [spec-name] \u2014 [one sentence] [S/M/L]
+...
+
+Recommended execution:
+- [Parallel/Sequential/Mixed strategy]
+- Estimated: [N] sessions total
+
+To execute: Start a fresh session per spec. Each session should:
+1. Read the spec
+2. Implement
+3. Run /joycraft-session-end to capture discoveries
+4. Commit and PR
+
+Ready to start?
 \`\`\`
-## Verification Report -- [Spec Name]
-|---|-----------|---------|----------|
-| 1 | ... | PASS | ... |
-| 2 | ... | FAIL | ... |
+**Why:** A fresh session for execution produces better results. The interview session has too much context noise \u2014 a clean session with just the spec is more focused.
+You can also use \`/joycraft-decompose\` to re-decompose a brief if the breakdown needs adjustment, or run \`/joycraft-interview\` first for a lighter brainstorm before committing to the full workflow.
+`,
+"joycraft-research.md": `---
+name: joycraft-research
+description: Produce objective codebase research by isolating question generation from fact-gathering \u2014 subagent sees only questions, never the brief
+---
-All criteria verified. Ready to commit and open a PR.
+# Research Codebase for a Feature
-N failures need attention. Review the evidence above and fix before proceeding.
+You are producing objective codebase research to inform a future spec or implementation. The key insight: the researching agent must never see the brief or ticket \u2014 only research questions. This prevents opinions from contaminating the facts.
-\`\`\`
+**Guard clause:** If the user doesn't provide a brief path or inline description, ask:
+"What feature or change are you researching? Provide a brief path (e.g., \`docs/briefs/2026-03-30-my-feature.md\`) or describe it in a few sentences."
+---
+## Phase 1: Generate Research Questions
-- **Some FAIL:** List the failed criteria and suggest the user fix them, then run \`/joycraft-verify\` again.
-- **MANUAL CHECK NEEDED items:** Explain what needs human eyes and why automation couldn't verify it.
+Read the brief file (if a path was provided) or use the user's inline description.
+Identify which zones of the codebase are relevant to this feature. Then generate 5-10 research questions that are:
+- **Objective and fact-seeking** \u2014 "How does X work?" not "How should we build X?"
+- **Specific to the codebase** \u2014 reference concrete systems, files, or flows
+- **Answerable by reading code** \u2014 no questions about business strategy or user preferences
-| Spec status is "Complete" | Still run verification -- "Complete" means the implementer thinks it's done, verification confirms |
-`,
-"joycraft-bugfix.md": `---
-name: joycraft-bugfix
-description: Structured bug fix workflow \u2014 triage, diagnose, discuss with user, write a focused spec, hand off for implementation
-instructions: 32
----
+Good examples:
+- "How does endpoint registration work in the current router?"
+- "What patterns exist for input validation across existing handlers?"
+- "Trace the data flow from API request to database write for entity X."
+- "What test infrastructure exists? Where are fixtures, mocks, and helpers?"
+- "What dependencies does module Y import, and what does its public API look like?"
+Bad examples (do NOT generate these):
+- "What's the best way to implement this feature?" (opinion)
+- "Should we use library X or Y?" (recommendation)
+- "What would a good architecture look like?" (design, not research)
+Write the questions to a temporary file at \`docs/research/.questions-tmp.md\`. Create the \`docs/research/\` directory if it doesn't exist.
-**
+**Do NOT include any content from the brief in this file \u2014 only the questions.**
 ---
-## Phase
+## Phase 2: Spawn Research Subagent
+Use Claude Code's Agent tool to spawn a subagent. Pass ONLY the research questions \u2014 never the brief path, brief content, or feature description.
+Build the subagent prompt by reading the questions file you just wrote, then use this template:
+\`\`\`
+You are researching a codebase to answer specific questions. You have NO context about why these questions are being asked \u2014 you are simply gathering facts.
+
+RULES \u2014 these are hard constraints:
+- Answer each question with FACTS ONLY: file paths, function signatures, data flows, patterns, dependencies
+- Do NOT recommend, suggest, or opine on anything
+- Do NOT speculate about what should be built or how
+- If a question cannot be answered (no relevant code exists), say "No existing code found for this"
+- Use the Read tool and Grep tool to explore the codebase thoroughly
+- Include code snippets only when they are essential evidence (e.g., a function signature, a config block)
+
+QUESTIONS:
+[INSERT_QUESTIONS_HERE]
+
+OUTPUT FORMAT \u2014 write your findings as a single markdown document using this structure:
+
+# Codebase Research
+
+**Date:** [today's date]
+**Questions answered:** [N/total]
 ---
-##
+## Q1: [question text]
-1. **Symptom** \u2014 confirm it matches what they see
-2. **Root cause** \u2014 specific file(s) and line(s)
-3. **Proposed fix** \u2014 what changes, where
-4. **Risk** \u2014 side effects? scope?
+[Facts, file paths, function signatures, data flows. No opinions.]
+
+## Q2: [question text]
+[Facts, file paths, function signatures, data flows. No opinions.]
+
+[Continue for all questions]
+\`\`\`
-## Phase
+## Phase 3: Write the Research Document
+Take the subagent's response and write it to \`docs/research/YYYY-MM-DD-feature-name.md\`. Derive the feature name from the brief filename or the user's description (lowercase, hyphenated).
+Delete the temporary questions file (\`docs/research/.questions-tmp.md\`).
+Present the research document path to the user:
-\`\`\`
+\`\`\`
+Research complete: docs/research/YYYY-MM-DD-feature-name.md
-> **Issue/Error:** [error message, issue link, or symptom description]
-> **Status:** Ready
-> **Date:** YYYY-MM-DD
-> **Estimated scope:** [1 session / N files / ~N lines]
+This document contains objective facts about your codebase \u2014 no opinions or recommendations.
+
+Next steps:
+- /joycraft-decompose \u2014 break the feature into atomic specs (research will inform the specs)
+- /joycraft-new-feature \u2014 formalize into a full Feature Brief first
+- Read the research and add any corrections or missing context manually
+\`\`\`
-##
+## Edge Cases
+| Scenario | Behavior |
+|----------|----------|
+| No brief provided | Accept inline description, generate questions from that |
+| Codebase is empty or new | Research doc reports "no existing patterns found" per question |
+| User runs research twice for same feature | Overwrites previous research doc (same filename) |
+| Brief is very short (1-2 sentences) | Still generate questions \u2014 even simple features benefit from understanding existing patterns |
+| \`docs/research/\` doesn't exist | Create it |
+`,
+"joycraft-session-end.md": `---
+name: joycraft-session-end
+description: Wrap up a session \u2014 capture discoveries, verify, prepare for PR or next session
+instructions: 22
+---
+# Session Wrap-Up
+Before ending this session, complete these steps in order.
-##
+## 1. Capture Discoveries
+**Why:** Discoveries are the surprises \u2014 things that weren't in the spec or that contradicted expectations. They prevent future sessions from hitting the same walls.
+Check: did anything surprising happen during this session? If yes, create or update a discovery file at \`docs/discoveries/YYYY-MM-DD-topic.md\`. Create the \`docs/discoveries/\` directory if it doesn't exist.
+Only capture what's NOT obvious from the code or git diff:
+- "We thought X but found Y" \u2014 assumptions that were wrong
+- "This API/library behaves differently than documented" \u2014 external gotchas
+- "This edge case needs handling in a future spec" \u2014 deferred work with context
+- "The approach in the spec didn't work because..." \u2014 spec-vs-reality gaps
+- Key decisions made during implementation that aren't in the spec
+**Do NOT capture:**
+- Files changed (that's the diff)
+- What you set out to do (that's the spec)
+- Step-by-step narrative of the session (nobody re-reads these)
-|---------------------|------|------|
-| [Bug no longer occurs] | [Test that reproduces the bug, then verifies the fix] | [unit/integration/e2e] |
-| [No regressions] | [Existing tests still pass, or new regression test] | [unit/integration] |
+Use this format:
-2. Run the test to confirm it fails
-3. Apply the fix
-4. Run the test to confirm it passes (green)
-5. Run the full test suite to check for regressions
+\`\`\`markdown
+# Discoveries \u2014 [topic]
+
-**
+**Date:** YYYY-MM-DD
+**Spec:** [link to spec if applicable]
+
+## [Discovery title]
+**Expected:** [what we thought would happen]
+**Actual:** [what actually happened]
+**Impact:** [what this means for future work]
+\`\`\`
+If nothing surprising happened, skip the discovery file entirely. No discovery is a good sign \u2014 the spec was accurate.
-- MUST NOT: [any prohibitions \u2014 e.g., don't change the public API]
+## 1b. Update Context Documents
+If \`docs/context/\` exists, quickly check whether this session revealed anything about:
+- **Production risks** \u2014 did you interact with or learn about production vs staging systems? \u2192 Update \`docs/context/production-map.md\`
+- **Wrong assumptions** \u2014 did the agent (or you) assume something that turned out to be false? \u2192 Update \`docs/context/dangerous-assumptions.md\`
+- **Key decisions** \u2014 did you make an architectural or tooling choice? \u2192 Add a row to \`docs/context/decision-log.md\`
+- **Unwritten rules** \u2014 did you discover a convention or constraint not documented anywhere? \u2192 Update \`docs/context/institutional-knowledge.md\`
+Skip this if nothing applies. Don't force it \u2014 only update when there's genuine new context.
-|----------|------------------|
-\`\`\`
+## 2. Run Validation
+Run the project's validation commands. Check CLAUDE.md for project-specific commands. Common checks:
+- Type-check (e.g., \`tsc --noEmit\`, \`mypy\`, \`cargo check\`)
+- Tests (e.g., \`npm test\`, \`pytest\`, \`cargo test\`)
+- Lint (e.g., \`eslint\`, \`ruff\`, \`clippy\`)
+Fix any failures before proceeding.
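[Editor's note] The validation step the added skill describes amounts to a short fail-fast script. A minimal sketch, with placeholder `true` commands standing in for whatever CLAUDE.md actually lists (`run_check` and the command strings are illustrative, not part of joycraft):

```shell
# Fail-fast validation loop: run each check in order, stop at the first failure.
# The "true" commands are stand-ins for the project's real commands.
set -e
run_check() {
  echo "running: $1"
  sh -c "$2"
}
run_check "type-check" "true"   # e.g. npx tsc --noEmit
run_check "tests"      "true"   # e.g. npm test
run_check "lint"       "true"   # e.g. npx eslint .
echo "all checks passed"
```

With `set -e`, a failing check aborts the script, which matches the skill's "fix any failures before proceeding" rule.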
-##
+## 3. Update Spec Status
+If working from an atomic spec in \`docs/specs/\`:
+- All acceptance criteria met \u2014 update status to \`Complete\`
+- Partially done \u2014 update status to \`In Progress\`, note what's left
-Bug fix spec is ready: docs/specs/YYYY-MM-DD-bugfix-name.md
+If working from a Feature Brief in \`docs/briefs/\`, check off completed specs in the decomposition table.
-- Bug: [one sentence]
-- Root cause: [one sentence]
-- Fix: [one sentence]
-- Estimated: 1 session
+## 4. Commit
-1. Read the spec
-2. Write the reproduction test (must fail)
-3. Apply the fix (test must pass)
-4. Run full test suite
-5. Run /joycraft-session-end to capture discoveries
-6. Commit and PR
+Commit all changes including the discovery file (if created) and spec status updates. The commit message should reference the spec if applicable.
-\`\`\`
+## 5. Push and PR (if autonomous git is enabled)
-**
-`,
-"joycraft-design.md": `---
-name: joycraft-design
-description: Design discussion before decomposition \u2014 produce a ~200-line design artifact for human review, catching wrong assumptions before they propagate into specs
----
+**Check CLAUDE.md for "Git Autonomy" in the Behavioral Boundaries section.** If it says "STRICTLY ENFORCED" or the ALWAYS section includes "Push to feature branches immediately after every commit":
+1. **Push immediately.** Run \`git push origin <branch>\` \u2014 do not ask, do not hesitate.
+2. **Open a PR if the feature is complete.** Check the parent Feature Brief's decomposition table \u2014 if all specs are done, run \`gh pr create\` with a summary of all completed specs. Do not ask first.
+3. **If not all specs are done,** still push. The PR comes when the last spec is complete.
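[Editor's note] The autonomous push-and-PR decision above can be sketched as a script. This is a hedged sketch using a throwaway repository so the branch logic is runnable; the branch name, `specs_remaining` count, and PR title are demo values, and in a real session the echoed `git push` / `gh pr create` commands would actually run:

```shell
# Sketch of the autonomous push-and-PR decision (demo values throughout).
repo=$(mktemp -d) && cd "$repo"
git init -q --initial-branch=feature/demo
git -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "spec: demo"
branch=$(git rev-parse --abbrev-ref HEAD)
echo "would run: git push origin $branch"     # step 1: push immediately
specs_remaining=0                             # from the brief's decomposition table
if [ "$specs_remaining" -eq 0 ]; then
  # step 2: all specs done, so open the PR without asking
  echo "would run: gh pr create --title 'feat: ...' --body 'completed specs summary'"
fi
```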
|
|
1408
1325
|
|
|
1409
|
-
|
|
1326
|
+
If CLAUDE.md does NOT have autonomous git rules (or has "ASK FIRST" for pushing), ask the user before pushing.
|
|
1410
1327
|
|
|
1411
|
-
|
|
1412
|
-
"No feature brief found. Run \`/joycraft-new-feature\` first to create one, or provide the path to your brief."
|
|
1413
|
-
Then stop.
|
|
1328
|
+
## 6. Report
|
|
1414
1329
|
|
|
1330
|
+
\`\`\`
|
|
1331
|
+
Session complete.
|
|
1332
|
+
- Spec: [spec name] \u2014 [Complete / In Progress]
|
|
1333
|
+
- Build: [passing / failing]
|
|
1334
|
+
- Discoveries: [N items / none]
|
|
1335
|
+
- Pushed: [yes / no \u2014 and why not]
|
|
1336
|
+
- PR: [opened #N / not yet \u2014 N specs remaining]
|
|
1337
|
+
- Next: [what the next session should tackle]
|
|
1338
|
+
\`\`\`
|
|
1339
|
+
`,
+ "joycraft-tune.md": `---
+ name: joycraft-tune
+ description: Assess and upgrade your project's AI development harness \u2014 score 7 dimensions, apply fixes, show path to Level 5
+ instructions: 15
---

-
+ # Tune \u2014 Project Harness Assessment & Upgrade

-
+ You are evaluating and upgrading this project's AI development harness.

- ## Step
+ ## Step 1: Detect Harness State

-
+ Check for: CLAUDE.md (with meaningful content), \`docs/specs/\`, \`docs/briefs/\`, \`docs/discoveries/\`, \`.claude/skills/\`, and test configuration.
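The Step 1 check added here reduces to a handful of filesystem probes. A minimal sketch; the `detectHarness` helper and its 40-character threshold for "meaningful content" are illustrative assumptions, not part of the skill:

```typescript
// Hypothetical sketch of the Step 1 harness probe -- not part of the joycraft CLI.
import { existsSync, readFileSync } from "node:fs";
import { join } from "node:path";

export function detectHarness(root: string): { present: string[]; missing: string[] } {
  const markers = [
    "CLAUDE.md",
    "docs/specs",
    "docs/briefs",
    "docs/discoveries",
    ".claude/skills",
  ];
  const present: string[] = [];
  const missing: string[] = [];
  for (const marker of markers) {
    (existsSync(join(root, marker)) ? present : missing).push(marker);
  }
  // A CLAUDE.md with no meaningful content counts as missing (threshold is arbitrary).
  const claudeMd = join(root, "CLAUDE.md");
  if (existsSync(claudeMd) && readFileSync(claudeMd, "utf8").trim().length < 40) {
    present.splice(present.indexOf("CLAUDE.md"), 1);
    missing.push("CLAUDE.md");
  }
  return { present, missing };
}
```

A project with only a stub CLAUDE.md would then fall into the "no harness" route of Step 2.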

-
- - Existing patterns this feature should follow (naming, data flow, error handling)
- - Similar features already implemented that serve as models
- - Boundaries and interfaces the feature must integrate with
+ ## Step 2: Route

-
+ - **No harness** (no CLAUDE.md or just a README): Recommend \`npx joycraft init\` and stop.
+ - **Harness exists**: Continue to assessment.

- ## Step 3:
+ ## Step 3: Assess \u2014 Score 7 Dimensions (1-5 scale)

-
+ Read CLAUDE.md and explore the project. Score each with specific evidence:

-
+ | Dimension | What to Check |
+ |-----------|--------------|
+ | Spec Quality | \`docs/specs/\` \u2014 structured? acceptance criteria? self-contained? |
+ | Spec Granularity | Can each spec be done in one session? |
+ | Behavioral Boundaries | ALWAYS/ASK FIRST/NEVER sections (or equivalent rules under any heading) |
+ | Skills & Hooks | \`.claude/skills/\` files, hooks config |
+ | Documentation | \`docs/\` structure, templates, referenced from CLAUDE.md |
+ | Knowledge Capture | \`docs/discoveries/\`, \`docs/context/*.md\` \u2014 existence AND real content |
+ | Testing & Validation | Test framework, CI pipeline, validation commands in CLAUDE.md |

-
+ Score 1 = absent, 3 = partially there, 5 = comprehensive. Give credit for substance over format.

-
+ ## Step 4: Write Assessment

-
+ Write to \`docs/joycraft-assessment.md\` AND display it. Include: scores table, detailed findings (evidence + gap + recommendation per dimension), and an upgrade plan (up to 5 actions ordered by impact).

-
+ ## Step 5: Apply Upgrades

-
+ Apply using three tiers \u2014 do NOT ask per-item permission:

-
+ **Tier 1 (silent):** Create missing dirs, install missing skills, copy missing templates, create AGENTS.md.

-
+ **Before Tier 2, ask TWO things:**

-
+ 1. **Git autonomy:** Cautious (ask before push/PR) or Autonomous (push + PR without asking)?
+ 2. **Risk interview (3-5 questions, one at a time):** What could break? What services connect to prod? Unwritten rules? Off-limits files/commands? Skip if \`docs/context/\` already has content.

-
+ From answers, generate: CLAUDE.md boundary rules, \`.claude/settings.json\` deny patterns, \`docs/context/\` documents. Also recommend a permission mode (\`auto\` for most; \`dontAsk\` + allowlist for high-risk).

-
- > **Rationale:** [why, referencing existing code or constraints]
- > **Alternative rejected:** [what you considered and why you rejected it]
+ **Tier 2 (show diff):** Add missing CLAUDE.md sections (Boundaries, Workflow, Key Files). Draft from real codebase content. Append only \u2014 never reformat existing content.

-
+ **Tier 3 (confirm first):** Rewriting existing sections, overwriting customized files, suggesting test framework installs.

-
+ After applying, append to \`docs/joycraft-history.md\` and show a consolidated upgrade results table.

-
- > - **Option A:** [description] \u2014 Pro: [benefit]. Con: [cost].
- > - **Option B:** [description] \u2014 Pro: [benefit]. Con: [cost].
- > - **Option C (if applicable):** [description] \u2014 Pro: [benefit]. Con: [cost].
+ ## Step 6: Show Path to Level 5

-
+ Show a tailored roadmap: Level 2-5 table, specific next steps based on actual gaps, and the Level 5 north star (spec queue, autofix, holdout scenarios, self-improving harness).

- ##
+ ## Edge Cases

-
+ - **CLAUDE.md is just a README:** Treat as no harness.
+ - **Non-Joycraft skills:** Acknowledge, don't replace.
+ - **Rules under non-standard headings:** Give credit for substance.
+ - **Previous assessment exists:** Read it first. If nothing to upgrade, say so.
+ - **Non-Joycraft content in CLAUDE.md:** Preserve as-is. Only append.
+ `,
+ "joycraft-verify.md": `---
+ name: joycraft-verify
+ description: Spawn an independent verifier subagent to check an implementation against its spec -- read-only, no code edits, structured pass/fail verdict
+ instructions: 30
+ ---

-
- Design discussion written to docs/designs/YYYY-MM-DD-feature-name.md
+ # Verify Implementation Against Spec

-
- 1. Are the patterns in Section 3 the right ones to follow, or should I use different ones?
- 2. Do you agree with the resolved decisions in Section 4?
- 3. Pick an option for each open question in Section 5 (or propose your own).
+ The user wants independent verification of an implementation. Your job is to find the relevant spec, extract its acceptance criteria and test plan, then spawn a separate verifier subagent that checks each criterion and produces a structured verdict.

-
- \`\`\`
+ **Why a separate subagent?** Anthropic's research found that agents reliably skew positive when grading their own work. Separating the agent doing the work from the agent judging it consistently outperforms self-evaluation. The verifier gets a clean context window with no implementation bias.

-
+ ## Step 1: Find the Spec

-
+ If the user provided a spec path (e.g., \`/joycraft-verify docs/specs/2026-03-26-add-widget.md\`), use that path directly.

-
- - Update the design document with their corrections and chosen options
- - Move answered questions from "Open Questions" to "Resolved Design Decisions"
- - Present the updated document for final confirmation
- - Only after explicit approval, tell the user: "Design approved. Run \`/joycraft-decompose\` with this brief to generate atomic specs."
- `,
- "joycraft-research.md": `---
- name: joycraft-research
- description: Produce objective codebase research by isolating question generation from fact-gathering \u2014 subagent sees only questions, never the brief
- ---
+ If no path was provided, scan \`docs/specs/\` for spec files. Pick the most recently modified \`.md\` file in that directory. If \`docs/specs/\` doesn't exist or is empty, tell the user:

-
+ > No specs found in \`docs/specs/\`. Please provide a spec path: \`/joycraft-verify path/to/spec.md\`

-
+ ## Step 2: Read and Parse the Spec

-
- "What feature or change are you researching? Provide a brief path (e.g., \`docs/briefs/2026-03-30-my-feature.md\`) or describe it in a few sentences."
+ Read the spec file and extract:

-
+ 1. **Spec name** -- from the H1 title
+ 2. **Acceptance Criteria** -- the checklist under the \`## Acceptance Criteria\` section
+ 3. **Test Plan** -- the table under the \`## Test Plan\` section, including any test commands
+ 4. **Constraints** -- the \`## Constraints\` section if present

-
+ If the spec has no Acceptance Criteria section, tell the user:

-
+ > This spec doesn't have an Acceptance Criteria section. Verification needs criteria to check against. Add acceptance criteria to the spec and try again.

-
+ If the spec has no Test Plan section, note this but proceed -- the verifier can still check criteria by reading code and running any available project tests.
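Pulling the checklist out of the spec in Step 2 is mechanical string work. A hedged sketch of what "the checklist under `## Acceptance Criteria`" means in practice; the regexes are assumptions about the spec template, not something the skill guarantees:

```typescript
// Hypothetical parser for the Acceptance Criteria checklist -- illustrative only.
export function extractCriteria(specMarkdown: string): string[] {
  const criteria: string[] = [];
  let inSection = false;
  for (const line of specMarkdown.split("\n")) {
    if (/^##\s+Acceptance Criteria\s*$/.test(line)) {
      inSection = true; // start collecting at the section heading
      continue;
    }
    if (inSection && /^##\s/.test(line)) break; // any later H2 ends the section
    const m = /^\s*-\s*\[[ xX]\]\s+(.*)$/.exec(line); // "- [ ]" / "- [x]" items
    if (inSection && m) criteria.push((m[1] ?? "").trim());
  }
  return criteria;
}
```

An empty result is exactly the "no Acceptance Criteria section" case the skill asks you to report back to the user.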

-
- - **Specific to the codebase** \u2014 reference concrete systems, files, or flows
- - **Answerable by reading code** \u2014 no questions about business strategy or user preferences
+ ## Step 3: Identify Test Commands

-
- - "How does endpoint registration work in the current router?"
- - "What patterns exist for input validation across existing handlers?"
- - "Trace the data flow from API request to database write for entity X."
- - "What test infrastructure exists? Where are fixtures, mocks, and helpers?"
- - "What dependencies does module Y import, and what does its public API look like?"
+ Look for test commands in these locations (in priority order):

-
-
-
- -
+ 1. The spec's Test Plan section (look for commands in backticks or "Type" column entries like "unit", "integration", "e2e", "build")
+ 2. The project's CLAUDE.md (look for test/build commands in the Development Workflow section)
+ 3. Common defaults based on the project type:
+    - Node.js: \`npm test\` or \`pnpm test --run\`
+    - Python: \`pytest\`
+    - Rust: \`cargo test\`
+    - Go: \`go test ./...\`

-
+ Build a list of specific commands the verifier should run.
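The priority-3 fallback amounts to a marker-file lookup. A sketch under the assumption that one marker file per ecosystem is enough to pick a default; in practice the spec and CLAUDE.md (priorities 1 and 2) should win:

```typescript
// Hypothetical default-command table for Step 3's fallback -- illustrative only.
export function defaultTestCommand(files: string[]): string | undefined {
  const byMarker: Array<[string, string]> = [
    ["package.json", "npm test"],
    ["pyproject.toml", "pytest"],
    ["Cargo.toml", "cargo test"],
    ["go.mod", "go test ./..."],
  ];
  for (const [marker, cmd] of byMarker) {
    if (files.includes(marker)) return cmd;
  }
  return undefined; // no recognized project type -- ask the user instead of guessing
}
```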

-
+ ## Step 4: Spawn the Verifier Subagent

-
+ Use Claude Code's Agent tool to spawn a subagent with the following prompt. Replace the placeholders with the actual content extracted in Steps 2-3.

-
+ \`\`\`
+ You are a QA verifier. Your job is to independently verify an implementation against its spec. You have NO context about how the implementation was done -- you are checking it fresh.

-
+ RULES -- these are hard constraints, not suggestions:
+ - You may READ any file using the Read tool or cat
+ - You may RUN these specific test/build commands: [TEST_COMMANDS]
+ - You may NOT edit, create, or delete any files
+ - You may NOT run commands that modify state (no git commit, no npm install, no file writes)
+ - You may NOT install packages or access the network
+ - Report what you OBSERVE, not what you expect or hope

-
+ SPEC NAME: [SPEC_NAME]

-
-
+ ACCEPTANCE CRITERIA:
+ [ACCEPTANCE_CRITERIA]

-
-
- - Do NOT recommend, suggest, or opine on anything
- - Do NOT speculate about what should be built or how
- - If a question cannot be answered (no relevant code exists), say "No existing code found for this"
- - Use the Read tool and Grep tool to explore the codebase thoroughly
- - Include code snippets only when they are essential evidence (e.g., a function signature, a config block)
+ TEST PLAN:
+ [TEST_PLAN]

-
- [
+ CONSTRAINTS:
+ [CONSTRAINTS_OR_NONE]

-
+ YOUR TASK:
+ For each acceptance criterion, determine if it PASSES or FAILS based on evidence:

-
+ 1. Run the test commands listed above. Record the output.
+ 2. For each acceptance criterion:
+    a. Check if there is a corresponding test and whether it passes
+    b. If no test exists, read the relevant source files to verify the criterion is met
+    c. If the criterion cannot be verified by reading code or running tests, mark it MANUAL CHECK NEEDED
+ 3. For criteria about build/test passing, actually run the commands and report results.

-
- **Questions answered:** [N/total]
+ OUTPUT FORMAT -- you MUST use this exact format:

-
+ VERIFICATION REPORT
+
+ | # | Criterion | Verdict | Evidence |
+ |---|-----------|---------|----------|
+ | 1 | [criterion text] | PASS/FAIL/MANUAL CHECK NEEDED | [what you observed] |
+ | 2 | [criterion text] | PASS/FAIL/MANUAL CHECK NEEDED | [what you observed] |
+ [continue for all criteria]

-
+ SUMMARY: X/Y criteria passed. [Z failures need attention. / All criteria verified.]

-
+ If any test commands fail to run (missing dependencies, wrong command, etc.), report the error as evidence for a FAIL verdict on the relevant criterion.
+ \`\`\`

- ##
+ ## Step 5: Format and Present the Verdict

-
+ Take the subagent's response and present it to the user in this format:

- [Continue for all questions]
\`\`\`
+ ## Verification Report -- [Spec Name]

-
+ | # | Criterion | Verdict | Evidence |
+ |---|-----------|---------|----------|
+ | 1 | ... | PASS | ... |
+ | 2 | ... | FAIL | ... |

-
+ **Overall: X/Y criteria passed.**

-
+ [If all passed:]
+ All criteria verified. Ready to commit and open a PR.

-
+ [If any failed:]
+ N failures need attention. Review the evidence above and fix before proceeding.

+ [If any MANUAL CHECK NEEDED:]
+ N criteria need manual verification -- they can't be checked by reading code or running tests alone.

\`\`\`
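The Overall line and its conditional follow-ups can be derived from the verdict column alone. A hypothetical helper; the skill only specifies the text format, nothing requires code:

```typescript
// Illustrative summary builder for the Step 5 verdict -- not part of the skill.
type Verdict = "PASS" | "FAIL" | "MANUAL CHECK NEEDED";

export function summarize(verdicts: Verdict[]): string {
  const passed = verdicts.filter((v) => v === "PASS").length;
  const failed = verdicts.filter((v) => v === "FAIL").length;
  const manual = verdicts.length - passed - failed;
  let line = `Overall: ${passed}/${verdicts.length} criteria passed.`;
  if (failed > 0) line += ` ${failed} failure(s) need attention.`;
  if (manual > 0) line += ` ${manual} criteria need manual verification.`;
  if (failed === 0 && manual === 0) line += " All criteria verified.";
  return line;
}
```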

- Research complete: docs/research/YYYY-MM-DD-feature-name.md

-
+ ## Step 6: Suggest Next Steps

-
-
- -
- -
-
+ Based on the verdict:
+
+ - **All PASS:** Suggest committing and opening a PR, or running \`/joycraft-session-end\` to capture discoveries.
+ - **Some FAIL:** List the failed criteria and suggest the user fix them, then run \`/joycraft-verify\` again.
+ - **MANUAL CHECK NEEDED items:** Explain what needs human eyes and why automation couldn't verify it.
+
+ **Do NOT offer to fix failures yourself.** The verifier reports; the human (or implementation agent in a separate turn) decides what to do. This separation is the whole point.

## Edge Cases

| Scenario | Behavior |
|----------|----------|
-
-
-
-
-
+ | Spec has no Test Plan | Warn that verification is weaker without a test plan, but proceed by checking criteria through code reading and any available project-level tests |
+ | All tests pass but a criterion is not testable | Mark as MANUAL CHECK NEEDED with explanation |
+ | Subagent can't run tests (missing deps) | Report the error as FAIL evidence |
+ | No specs found and no path given | Tell user to provide a spec path or create a spec first |
+ | Spec status is "Complete" | Still run verification -- "Complete" means the implementer thinks it's done, verification confirms |
`
};
var TEMPLATES = {
@@ -1984,59 +1931,13 @@ is required, though you can add one via the GitHub Checks API if you prefer.

---

- ## Testing by Stack Type
-
- The scenario agent selects the appropriate test format based on the project's
- testing backbone. Each backbone tests the same holdout principle \u2014 observable
- behavior only, no source imports \u2014 but uses different tools.
-
- ### Web Apps (Playwright)
-
- For Next.js, Vite, Nuxt, Remix, and other web frameworks. Tests run against a
- dev server or preview URL using a headless browser.
-
- - **Template:** \`example-scenario-web.spec.ts\`
- - **Config:** \`playwright.config.ts\`
- - **Package:** \`package-web.json\` (use instead of \`package.json\` for web projects)
- - **Run:** \`npx playwright test\`
-
- ### Mobile Apps (Maestro)
-
- For React Native, Flutter, and native iOS/Android. Tests are declarative YAML
- flows that interact with a running app on a simulator.
-
- - **Template:** \`example-scenario-mobile.yaml\`
- - **Login sub-flow:** \`example-scenario-mobile-login.yaml\`
- - **Setup guide:** \`README-mobile.md\`
- - **Run:** \`maestro test example-scenario-mobile.yaml\`
-
- ### API Backends (HTTP)
-
- For Express, FastAPI, Django, and other API-only backends. Tests send HTTP
- requests using Node.js built-in \`fetch\`.
-
- - **Template:** \`example-scenario-api.test.ts\`
- - **Run:** \`npx vitest run\`
-
- ### CLI Tools & Libraries (native)
-
- For CLI tools, npm packages, and non-UI projects. Tests invoke the built
- binary via \`spawnSync\` and assert on stdout/stderr.
-
- - **Template:** \`example-scenario.test.ts\`
- - **Run:** \`npx vitest run\`
-
- ---
-
## Adding scenarios

### Rules

-
-
-
- perspective. For web: navigate and assert on content. For CLI: run commands
- and check output. For API: send requests and check responses.
+ 1. **Behavioral, not structural.** Test what the tool does, not how it is
+    built internally. Invoke the binary; assert on stdout, exit codes, and
+    filesystem state. Never import from \`../main-repo/src\`.

2. **End-to-end.** Each test should represent something a real user would
   actually do. If you would not put it in a demo or docs example, reconsider
@@ -2046,8 +1947,9 @@ These rules apply to ALL backbones:
   see source code. Any \`import\` that reaches into \`../main-repo/src\` breaks
   the pattern.

- 4. **Independent.** Each test must be able to run in isolation.
-
+ 4. **Independent.** Each test must be able to run in isolation. Use \`beforeEach\`
+    / \`afterEach\` to set up and tear down temp directories. Do not share mutable
+    state between tests.

5. **Deterministic.** Avoid network calls, timestamps, or random values in
   assertions unless the feature under test genuinely involves them.
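Rule 4's temp-directory discipline can be factored into two tiny helpers that a scenario file would call from `beforeEach` / `afterEach`. A framework-agnostic sketch; the helper names and the `scenario-` prefix are illustrative:

```typescript
// Sketch of rule 4 -- every test gets its own workspace, nothing is shared.
import { mkdtempSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

export function makeWorkspace(): string {
  // Fresh, uniquely named directory per test.
  return mkdtempSync(join(tmpdir(), "scenario-"));
}

export function destroyWorkspace(dir: string): void {
  // Recursive + force so teardown never fails on a half-built workspace.
  rmSync(dir, { recursive: true, force: true });
}
```

In a vitest file, `beforeEach(() => { tmp = makeWorkspace(); })` and `afterEach(() => destroyWorkspace(tmp))` keep each test isolated.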
@@ -2056,25 +1958,31 @@ These rules apply to ALL backbones:

\`\`\`
$SCENARIOS_REPO/
- \u251C\u2500\u2500 example-scenario.test.ts
- \u251C\u2500\u2500 example-scenario-web.spec.ts # Web app scenario template (Playwright)
- \u251C\u2500\u2500 example-scenario-api.test.ts # API backend scenario template
- \u251C\u2500\u2500 example-scenario-mobile.yaml # Mobile app scenario template (Maestro)
- \u251C\u2500\u2500 example-scenario-mobile-login.yaml # Reusable login sub-flow
- \u251C\u2500\u2500 playwright.config.ts # Playwright config (web projects)
- \u251C\u2500\u2500 package.json # Default (vitest for CLI/API)
- \u251C\u2500\u2500 package-web.json # Alternative (Playwright for web)
- \u251C\u2500\u2500 README-mobile.md # Mobile testing setup guide
+ \u251C\u2500\u2500 example-scenario.test.ts # Starter file \u2014 replace with real scenarios
\u251C\u2500\u2500 workflows/
- \u2502 \
- \
- \u251C\u2500\u2500 prompts/
- \u2502 \u2514\u2500\u2500 scenario-agent.md # Scenario agent instructions
+ \u2502 \u2514\u2500\u2500 run.yml # CI workflow (do not rename)
+ \u251C\u2500\u2500 package.json
\u2514\u2500\u2500 README.md
\`\`\`

-
-
+ Add new \`.test.ts\` files at the top level or in subdirectories. Vitest will
+ discover them automatically.
+
+ ### Example structure
+
+ \`\`\`ts
+ import { spawnSync } from "node:child_process";
+ import { join } from "node:path";
+ import { existsSync, mkdtempSync } from "node:fs";
+ import { tmpdir } from "node:os";
+ import { it, expect } from "vitest";
+
+ const CLI = join(__dirname, "..", "main-repo", "dist", "cli.js");
+
+ it("init creates a CLAUDE.md file", () => {
+   const tmp = mkdtempSync(join(tmpdir(), "scenario-"));
+   const { status } = spawnSync("node", [CLI, "init", tmp], { encoding: "utf8" });
+   expect(status).toBe(0);
+   expect(existsSync(join(tmp, "CLAUDE.md"))).toBe(true);
+ });
+ \`\`\`

---
@@ -2086,7 +1994,6 @@ don't need.
| Visible to agent | Yes | No |
| What they test | Units, modules, logic | End-to-end behavior |
| Import source code | Yes | Never |
- | Test method | Unit test framework | Depends on backbone (Playwright/Maestro/vitest/fetch) |
| Run on every push | Yes | Yes (via dispatch) |
| Purpose | Catch regressions fast | Validate real behavior |
@@ -2235,304 +2142,6 @@ describe("CLI: init command (example \u2014 replace with your real scenarios)",
  }
}
`,
- "scenarios/package-web.json": `{
-   "name": "$SCENARIOS_REPO",
-   "version": "0.0.1",
-   "private": true,
-   "type": "module",
-   "scripts": {
-     "test": "playwright test"
-   },
-   "devDependencies": {
-     "@playwright/test": "^1.50.0"
-   }
- }
- `,
- "scenarios/playwright.config.ts": `import { defineConfig } from '@playwright/test';
-
- /**
-  * Playwright configuration for holdout scenario tests.
-  *
-  * BASE_URL can be set to test against a preview deployment URL
-  * or defaults to http://localhost:3000 for local dev server testing.
-  */
- export default defineConfig({
-   testDir: '.',
-   testMatch: '**/*.spec.ts',
-   timeout: 60_000,
-   retries: 0,
-   use: {
-     baseURL: process.env.BASE_URL || 'http://localhost:3000',
-     headless: true,
-     screenshot: 'only-on-failure',
-   },
-   projects: [
-     { name: 'chromium', use: { browserName: 'chromium' } },
-   ],
- });
- `,
- "scenarios/example-scenario-web.spec.ts": `/**
-  * Example Web Scenario Test (Playwright)
-  *
-  * This file is a template for scenario tests against web applications.
-  * The holdout pattern applies: test the running app through its UI,
-  * never import source code from the main repo.
-  *
-  * The main repo is available at ../main-repo and is already built.
-  * Tests run against either:
-  * - A dev server started from ../main-repo (default)
-  * - A preview deployment URL (set BASE_URL env var)
-  *
-  * DO:
-  * - Navigate to pages, click elements, fill forms, assert on visible content
-  * - Use page.locator() with accessible selectors (role, text, test-id)
-  * - Keep each test fully independent
-  *
-  * DON'T:
-  * - Import from ../main-repo/src \u2014 that defeats the holdout
-  * - Test internal implementation details
-  * - Rely on specific CSS classes or DOM structure (use accessible selectors)
-  */
-
- import { test, expect } from '@playwright/test';
- import { spawn, type ChildProcess } from 'node:child_process';
- import { join } from 'node:path';
-
- const MAIN_REPO = join(__dirname, '..', 'main-repo');
- let serverProcess: ChildProcess | undefined;
-
- /**
-  * Wait for a URL to become reachable.
-  */
- async function waitForServer(url: string, timeoutMs = 60_000): Promise<void> {
-   const start = Date.now();
-   while (Date.now() - start < timeoutMs) {
-     try {
-       const res = await fetch(url);
-       if (res.ok || res.status < 500) return;
-     } catch {
-       // Server not ready yet
-     }
-     await new Promise(r => setTimeout(r, 1000));
-   }
-   throw new Error(\`Server at \${url} did not become ready within \${timeoutMs}ms\`);
- }
-
- test.beforeAll(async () => {
-   // If BASE_URL is set, skip starting a dev server \u2014 test against the provided URL
-   if (process.env.BASE_URL) return;
-
-   serverProcess = spawn('npm', ['run', 'dev'], {
-     cwd: MAIN_REPO,
-     stdio: 'pipe',
-     env: { ...process.env, PORT: '3000' },
-   });
-
-   await waitForServer('http://localhost:3000');
- });
-
- test.afterAll(async () => {
-   if (serverProcess) {
-     serverProcess.kill('SIGTERM');
-     serverProcess = undefined;
-   }
- });
-
- // ---------------------------------------------------------------------------
- // Example scenarios \u2014 replace with real tests for your application
- // ---------------------------------------------------------------------------
-
- test.describe('Home page', () => {
-   test('loads successfully and shows main heading', async ({ page }) => {
-     await page.goto('/');
-     // Replace with your app's actual heading or key element
-     await expect(page.locator('h1')).toBeVisible();
-   });
-
-   test('navigates to a subpage', async ({ page }) => {
-     await page.goto('/');
-     // Replace with your app's actual navigation
-     // await page.click('text=About');
-     // await expect(page).toHaveURL(/\\/about/);
-     // await expect(page.locator('h1')).toContainText('About');
-   });
- });
- `,
- "scenarios/example-scenario-api.test.ts": `/**
-  * Example API Scenario Test
-  *
-  * This file is a template for scenario tests against API-only backends.
-  * The holdout pattern applies: test the running server via HTTP requests,
-  * never import route handlers or source code from the main repo.
-  *
-  * The main repo is available at ../main-repo and is already built.
-  * Tests run against either:
-  * - A server started from ../main-repo (default)
-  * - A deployed URL (set BASE_URL env var)
-  *
-  * Uses Node.js built-in fetch \u2014 no additional HTTP client dependencies.
-  *
-  * DO:
-  * - Send HTTP requests to endpoints, assert on status codes and response bodies
-  * - Test realistic user actions (create, read, update, delete flows)
-  * - Keep each test fully independent
-  *
-  * DON'T:
-  * - Import from ../main-repo/src \u2014 that defeats the holdout
-  * - Use supertest or similar tools that import the app directly
-  * - Test internal implementation details
-  */
-
- import { describe, it, expect, beforeAll, afterAll } from 'vitest';
- import { spawn, type ChildProcess } from 'node:child_process';
- import { join } from 'node:path';
-
- const MAIN_REPO = join(__dirname, '..', 'main-repo');
- const BASE_URL = process.env.BASE_URL || 'http://localhost:3000';
- let serverProcess: ChildProcess | undefined;
-
- /**
-  * Wait for a URL to become reachable.
-  */
- async function waitForServer(url: string, timeoutMs = 60_000): Promise<void> {
-   const start = Date.now();
-   while (Date.now() - start < timeoutMs) {
-     try {
-       const res = await fetch(url);
-       if (res.ok || res.status < 500) return;
-     } catch {
-       // Server not ready yet
-     }
-     await new Promise(r => setTimeout(r, 1000));
-   }
-   throw new Error(\`Server at \${url} did not become ready within \${timeoutMs}ms\`);
- }
-
- beforeAll(async () => {
-   // If BASE_URL is set externally, skip starting a server
-   if (process.env.BASE_URL) return;
-
-   serverProcess = spawn('npm', ['start'], {
-     cwd: MAIN_REPO,
-     stdio: 'pipe',
-     env: { ...process.env, PORT: '3000' },
-   });
-
-   await waitForServer(BASE_URL);
- }, 90_000);
-
- afterAll(() => {
-   if (serverProcess) {
-     serverProcess.kill('SIGTERM');
-     serverProcess = undefined;
-   }
- });
-
- // ---------------------------------------------------------------------------
- // Example scenarios \u2014 replace with real tests for your API
- // ---------------------------------------------------------------------------
-
- describe('API health', () => {
-   it('GET / returns a success status', async () => {
-     const res = await fetch(\`\${BASE_URL}/\`);
-     expect(res.status).toBeLessThan(500);
-   });
- });
-
- describe('API endpoints', () => {
-   it('GET /api/example returns JSON', async () => {
-     const res = await fetch(\`\${BASE_URL}/api/example\`);
-     // Replace with your actual endpoint
-     // expect(res.status).toBe(200);
-     // const body = await res.json();
-     // expect(body).toHaveProperty('data');
-   });
-
-   it('POST /api/example creates a resource', async () => {
-     // Replace with your actual endpoint and payload
-     // const res = await fetch(\\\`\\\${BASE_URL}/api/example\\\`, {
-     //   method: 'POST',
-     //   headers: { 'Content-Type': 'application/json' },
-     //   body: JSON.stringify({ name: 'test' }),
-     // });
-     // expect(res.status).toBe(201);
-     // const body = await res.json();
-     // expect(body).toHaveProperty('id');
-   });
-
-   it('returns 404 for unknown routes', async () => {
-     const res = await fetch(\`\${BASE_URL}/api/does-not-exist\`);
-     expect(res.status).toBe(404);
-   });
- });
- `,
- "scenarios/example-scenario-mobile.yaml": `# Example Mobile Scenario Test (Maestro)
- #
- # This file is a template for scenario tests against mobile applications.
- # The holdout pattern applies: test the running app through its UI,
- # never reference source code from the main repo.
- #
- # Maestro tests are declarative YAML flows that interact with a running
- # app on a simulator/emulator. Install Maestro:
- #   curl -Ls "https://get.maestro.mobile.dev" | bash
- #
- # Run this flow:
- #   maestro test example-scenario-mobile.yaml
- #
- # DO:
- # - Tap elements, fill inputs, assert on visible text
- # - Use runFlow for reusable sub-flows (e.g., login)
- # - Use assertWithAI for natural-language assertions
- #
- # DON'T:
- # - Reference source code paths or internal identifiers
- # - Depend on exact pixel positions (use text and accessibility labels)
|
|
2490
|
-
|
|
2491
|
-
appId: com.example.myapp # Replace with your app's bundle identifier
|
|
2492
|
-
name: "Core User Journey"
|
|
2493
|
-
tags:
|
|
2494
|
-
- smoke
|
|
2495
|
-
- holdout
|
|
2496
|
-
---
|
|
2497
|
-
# Step 1: Launch the app
|
|
2498
|
-
- launchApp
|
|
2499
|
-
|
|
2500
|
-
# Step 2: Login (using a reusable sub-flow)
|
|
2501
|
-
- runFlow: example-scenario-mobile-login.yaml
|
|
2502
|
-
|
|
2503
|
-
# Step 3: Verify the main screen loaded
|
|
2504
|
-
- assertVisible: "Home"
|
|
2505
|
-
|
|
2506
|
-
# Step 4: Navigate to a feature
|
|
2507
|
-
# - tapOn: "Settings"
|
|
2508
|
-
# - assertVisible: "Account"
|
|
2509
|
-
|
|
2510
|
-
# Step 5: AI-powered assertion (natural language)
|
|
2511
|
-
# - assertWithAI: "The main dashboard is visible with navigation tabs at the bottom"
|
|
2512
|
-
|
|
2513
|
-
# Step 6: Go back
|
|
2514
|
-
# - back
|
|
2515
|
-
# - assertVisible: "Home"
|
|
2516
|
-
`,
|
|
2517
|
-
"scenarios/example-scenario-mobile-login.yaml": `# Reusable Login Sub-Flow (Maestro)
|
|
2518
|
-
#
|
|
2519
|
-
# This flow handles authentication. Other flows include it via:
|
|
2520
|
-
# - runFlow: example-scenario-mobile-login.yaml
|
|
2521
|
-
#
|
|
2522
|
-
# Replace the selectors and credentials with your app's actual login flow.
|
|
2523
|
-
|
|
2524
|
-
appId: com.example.myapp
|
|
2525
|
-
name: "Login"
|
|
2526
|
-
---
|
|
2527
|
-
- assertVisible: "Sign In"
|
|
2528
|
-
- tapOn: "Email"
|
|
2529
|
-
- inputText: "test@example.com"
|
|
2530
|
-
- tapOn: "Password"
|
|
2531
|
-
- inputText: "testpassword123"
|
|
2532
|
-
- tapOn: "Log In"
|
|
2533
|
-
- assertVisible: "Home" # Verify login succeeded
|
|
2534
|
-
`,
|
|
2535
|
-
"scenarios/README-mobile.md": '# Mobile Scenario Testing with Maestro\n\nThis guide explains how to set up and run mobile holdout scenario tests using [Maestro](https://maestro.dev/).\n\n## Prerequisites\n\n- **Maestro CLI:** `curl -Ls "https://get.maestro.mobile.dev" | bash`\n- **Java 17+** (required by Maestro)\n- **Simulator/Emulator:**\n - iOS: Xcode with iOS Simulator (macOS only)\n - Android: Android Studio with an AVD configured\n\n> **Important:** Joycraft does not install Maestro or manage simulators. This is your responsibility.\n\n## Running Tests Locally\n\n```bash\n# Boot your simulator/emulator first, then:\nmaestro test example-scenario-mobile.yaml\n\n# Run all flows in a directory:\nmaestro test .maestro/\n```\n\n## Writing Flows\n\nMaestro flows are declarative YAML. Core commands:\n\n| Command | Purpose |\n|---------|--------|\n| `launchApp` | Start or restart the app |\n| `tapOn: "text"` | Tap an element by visible text or test ID |\n| `inputText: "value"` | Type into a focused field |\n| `assertVisible: "text"` | Assert an element is on screen |\n| `assertNotVisible: "text"` | Assert an element is NOT on screen |\n| `scroll` | Scroll down |\n| `back` | Press the back button |\n| `runFlow: file.yaml` | Run a reusable sub-flow |\n| `assertWithAI: "description"` | Natural-language assertion (AI-powered) |\n\n## CI Options\n\n### Option A: Maestro Cloud (paid, easiest)\n\nUpload your app binary and flows to Maestro Cloud. No simulator management.\n\n```yaml\n- uses: mobile-dev-inc/action-maestro-cloud@v2\n with:\n api-key: ${{ secrets.MAESTRO_API_KEY }}\n app-file: app.apk # or app.ipa\n workspace: .\n```\n\n### Option B: Self-hosted emulator (free, more setup)\n\nSpin up an Android emulator on a Linux runner or iOS simulator on a macOS runner.\n\n> **Cost note:** macOS GitHub Actions runners are ~10x more expensive than Linux runners.\n\n## The Holdout Pattern\n\nThese tests live in the scenarios repo, separate from the main codebase. 
The scenario agent generates them from specs. They test observable behavior through the app\'s UI \u2014 never referencing source code or internal implementation.\n',
   "scenarios/prompts/scenario-agent.md": `You are a QA engineer working in a holdout test repository. You CANNOT access the main repository's source code. Your job is to write or update behavioral scenario tests based on specs that are pushed from the main repo.
 
 ## What You Have Access To
@@ -2540,23 +2149,7 @@ name: "Login"
 - This scenarios repository (test files, \`specs/\` mirror, \`package.json\`)
 - The incoming spec (provided below)
 - A list of existing test files and spec mirrors (provided below)
-- The main repo is available at \`../main-repo\` and is already built
-- The testing strategy for this project (provided below)
-
-## Testing Strategy
-
-This project uses the **$TESTING_BACKBONE** testing backbone.
-
-Select the correct test format based on the backbone:
-
-| Backbone | Tool | Test Format | File Extension | How to Test |
-|----------|------|-------------|---------------|-------------|
-| \`playwright\` | Playwright | Browser-based E2E | \`.spec.ts\` | Navigate pages, click elements, assert on visible content |
-| \`maestro\` | Maestro | YAML flows | \`.yaml\` | Tap elements, fill inputs, assert on screen state |
-| \`api\` | fetch (Node.js built-in) | HTTP requests | \`.test.ts\` | Send requests to endpoints, assert on responses |
-| \`native\` | vitest + spawnSync | CLI/binary invocation | \`.test.ts\` | Run commands, assert on stdout/stderr/exit codes |
-
-If the backbone is not provided or unrecognized, default to \`native\`.
+- The main repo is available at \`../main-repo\` and is already built \u2014 you can invoke its CLI or entry point via \`execSync\`/\`spawnSync\`, but you MUST NOT import from \`../main-repo/src\`
 
 ## Triage Decision Tree
 
@@ -2575,7 +2168,7 @@ If you SKIP, write a brief comment in the relevant test file (or a new one) expl
 - A new output format or file that gets generated
 - A new user-facing behavior that doesn't map to any existing test file
 
-Name the file after the feature area
+Name the file after the feature area: \`[feature-area].test.ts\`. One feature area per test file.
 
 ### UPDATE \u2014 Modify an existing test file if the spec:
 - Changes behavior that is already tested
@@ -2586,20 +2179,25 @@ Match to the most relevant existing test file by feature area.
 
 **If you are unsure whether a spec is user-facing, err on the side of writing a test.**
 
-## Test Writing Rules
+## Test Writing Rules
+
+1. **Behavioral only.** Test observable output \u2014 stdout, stderr, exit codes, files created/modified on disk. Never test internal implementation details or import source modules.
+
+2. **Use \`execSync\` or \`spawnSync\`.** Invoke the built binary at \`../main-repo/dist/cli.js\` (or whatever the main repo's entry point is). Check \`../main-repo/package.json\` to find the correct entry point if unsure.
+
+3. **Use vitest.** Import \`describe\`, \`it\`, \`expect\` from \`vitest\`. Use \`beforeEach\`/\`afterEach\` for temp directory setup/teardown.
+
+4. **Each test is fully independent.** No shared mutable state between tests. Each test that touches the filesystem gets its own temp directory via \`mkdtempSync\`.
 
-
-2. **Each test is fully independent.** No shared mutable state between tests.
-3. **Assert on realistic user actions.** Write tests that reflect what a real user would do.
-4. **Never import from the parent repo's source.** If you find yourself writing \`import { ... } from '../main-repo/src/...'\`, stop \u2014 that defeats the holdout.
+5. **Assert on realistic user actions.** Write tests that reflect what a real user would do \u2014 not what the implementation happens to do.
 
-
+6. **Never import from the parent repo's source.** If you find yourself writing \`import { ... } from '../main-repo/src/...'\`, stop \u2014 that defeats the holdout.
 
-
+## Test File Template
 
 \`\`\`typescript
-import { spawnSync } from 'node:child_process';
-import { mkdtempSync, rmSync } from 'node:fs';
+import { execSync, spawnSync } from 'node:child_process';
+import { existsSync, mkdtempSync, rmSync, readFileSync } from 'node:fs';
 import { tmpdir } from 'node:os';
 import { join } from 'node:path';
 import { describe, it, expect, beforeEach, afterEach } from 'vitest';
@@ -2612,122 +2210,39 @@ function runCLI(args: string[], cwd?: string) {
     cwd: cwd ?? process.cwd(),
     env: { ...process.env, NO_COLOR: '1' },
   });
-  return {
+  return {
+    stdout: result.stdout ?? '',
+    stderr: result.stderr ?? '',
+    status: result.status ?? 1,
+  };
 }
 
-describe('[feature area]', () => {
+describe('[feature area]: [behavior being tested]', () => {
   let tmpDir: string;
-  beforeEach(() => { tmpDir = mkdtempSync(join(tmpdir(), 'scenarios-')); });
-  afterEach(() => { rmSync(tmpDir, { recursive: true, force: true }); });
-
-  it('[observable behavior]', () => {
-    const { stdout, status } = runCLI(['command', 'args'], tmpDir);
-    expect(status).toBe(0);
-    expect(stdout).toContain('expected output');
-  });
-});
-\`\`\`
-
-## Backbone: playwright (Web Apps)
-
-Use when the project is a web application (Next.js, Vite, Nuxt, etc.).
-
-\`\`\`typescript
-import { test, expect } from '@playwright/test';
-
-// Tests run against BASE_URL (configured in playwright.config.ts)
-// The dev server is started automatically or BASE_URL points to a preview deploy
-
-test.describe('[feature area]', () => {
-  test('[observable behavior]', async ({ page }) => {
-    await page.goto('/');
-    await expect(page.locator('h1')).toBeVisible();
-  });
 
-
-
-    await page.fill('[name="email"]', 'test@example.com');
-    await page.click('button[type="submit"]');
-    await expect(page).toHaveURL(/dashboard/);
+  beforeEach(() => {
+    tmpDir = mkdtempSync(join(tmpdir(), 'scenarios-'));
   });
-});
-\`\`\`
-
-## Backbone: api (API Backends)
-
-Use when the project is an API-only backend (Express, FastAPI, etc.).
-
-\`\`\`typescript
-import { describe, it, expect } from 'vitest';
 
-
-
-describe('[feature area]', () => {
-  it('[endpoint behavior]', async () => {
-    const res = await fetch(\\\`\\\${BASE_URL}/api/endpoint\\\`);
-    expect(res.status).toBe(200);
-    const body = await res.json();
-    expect(body).toHaveProperty('data');
+  afterEach(() => {
+    rmSync(tmpDir, { recursive: true, force: true });
   });
 
-  it('[
-  const
-  expect(
+  it('[specific observable behavior]', () => {
+    const { stdout, status } = runCLI(['command', 'args'], tmpDir);
+    expect(status).toBe(0);
+    expect(stdout).toContain('expected output');
   });
 });
 \`\`\`
 
-## Backbone: maestro (Mobile Apps)
-
-Use when the project is a mobile application (React Native, Flutter, native iOS/Android).
-
-\`\`\`yaml
-appId: com.example.myapp
-name: "[feature area]: [behavior being tested]"
-tags:
-  - holdout
----
-- launchApp
-- tapOn: "Sign In"
-- inputText: "test@example.com"
-- tapOn: "Submit"
-- assertVisible: "Welcome"
-# Use assertWithAI for complex visual assertions:
-# - assertWithAI: "The dashboard shows a list of recent items"
-\`\`\`
-
-## Graceful Degradation
-
-If the primary backbone tool is not available in this repo, fall back to the next deepest testable layer:
-
-| Layer | What's Tested | When to Use |
-|-------|-------------|-------------|
-| **Layer 4: UI** | Full user flows through browser/simulator | \`@playwright/test\` or Maestro is installed |
-| **Layer 3: API** | HTTP requests against running server | Server can be started from \`../main-repo\` |
-| **Layer 2: Logic** | Unit tests via test runner | Test runner (vitest/jest) is available |
-| **Layer 1: Static** | Build, typecheck, lint | Build toolchain is available |
-
-**Fallback rules:**
-- If backbone is \`playwright\` but \`@playwright/test\` is NOT in this repo's \`package.json\`: fall back to \`api\` (fetch-based HTTP tests)
-- If backbone is \`maestro\` but no simulator context is available: fall back to \`api\` if a server can be started, else \`native\`
-- If backbone is \`api\` but no server start script exists: fall back to \`native\`
-- \`native\` is always available as the floor
-
-Start each test file with a comment indicating the testing layer:
-\`// Testing Layer: [4|3|2|1] - [UI|API|Logic|Static]\`
-
-If you fell back from the intended backbone, note this in your commit message:
-\`scenarios: [action] for [spec] (layer: [N], reason: [why])\`
-
 ## Checklist Before Committing
 
 - [ ] Decision: SKIP / NEW / UPDATE (and why)
-- [ ] Correct backbone selected (or fallback justified)
 - [ ] Tests assert on observable behavior, not implementation
 - [ ] No imports from \`../main-repo/src\`
-- [ ] Each test
-- [ ] File
-- [ ] Testing layer comment at top of file
+- [ ] Each test has its own temp directory if it touches the filesystem
+- [ ] File is named after the feature area, not the spec
 `,
   "scenarios/workflows/generate.yml": `# Scenario Generation Workflow
 #
@@ -2827,9 +2342,7 @@ jobs:
 ## Context
 
 Existing test files in this repo: \${{ steps.context.outputs.existing_tests }}
-Existing spec mirrors: \${{ steps.context.outputs.existing_specs }}
-
-Testing backbone: \${{ github.event.client_payload.testing_backbone || 'native' }}"
+Existing spec mirrors: \${{ steps.context.outputs.existing_specs }}"
 
 # \u2500\u2500 7. Commit any changes the agent made \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500
 - name: Commit scenario changes
@@ -3398,48 +2911,6 @@ jobs:
 -f "client_payload[repo]=\${{ github.repository }}"
 
 done <<< "\${{ steps.changed.outputs.files }}"
-`,
-  "GOLDEN_EXAMPLE_TEMPLATE.md": `# [Feature Name] \u2014 Golden Example
-
-> **Date:** YYYY-MM-DD
-> **Project:** [project name]
-> **Source Brief:** \`docs/briefs/YYYY-MM-DD-feature-name.md\`
-
----
-
-## Capture
-
-The original user request or description that initiated this feature. Copied verbatim or lightly edited from the brief's Vision section.
-
-> [Paste the original capture text here \u2014 what the user said/typed that kicked off the pipeline]
-
-## Classification
-
-- **Action Level:** [interview | decompose | execute | research | design]
-- **Confidence:** [high | medium | low]
-- **Skills Used:** [comma-separated list of Joycraft skills invoked, e.g., joycraft-new-feature, joycraft-decompose]
-
-## Decomposition Summary
-
-The resulting spec breakdown from this capture:
-
-| # | Spec Name | Description | Size |
-|---|-----------|-------------|------|
-| 1 | [spec-name] | [one sentence] | [S/M/L] |
-
-## Rationale
-
-2-3 sentences explaining why this classification was correct for this capture. What signals in the capture text indicated this action level? What would have gone wrong with a different classification?
-
----
-
-## Template Usage Notes
-
-**This template is for Pipit golden examples.** Golden examples are auto-generated by Joycraft's session-end skill after a successful pipeline run. They provide few-shot examples that improve Pipit's level classifier over time.
-
-**Do not edit generated examples** unless the classification was wrong. If it was wrong, correct the Classification section \u2014 this teaches Pipit the right answer.
-
-**One example per pipeline run.** Each successful interview \u2192 brief \u2192 specs \u2192 execution cycle produces one golden example.
 `
 };
 var CODEX_SKILLS = {
@@ -4905,4 +4376,4 @@ export {
   TEMPLATES,
   CODEX_SKILLS
 };
-//# sourceMappingURL=chunk-
+//# sourceMappingURL=chunk-QU5VHXMV.js.map