npm - joycraft - Versions diffs - 0.6.15 → 0.6.16 - Mend

joycraft 0.6.15 → 0.6.16

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (16)

package/README.md +31 -13
package/dist/{chunk-YE4LWG2O.js → chunk-G6WSFZQG.js} +2 -2
package/dist/chunk-G6WSFZQG.js.map +1 -0
package/dist/{chunk-XOMQIK4U.js → chunk-UEG5IO6Q.js} +6 -184
package/dist/chunk-UEG5IO6Q.js.map +1 -0
package/dist/cli.js +4 -4
package/dist/{init-LXSMLAY5.js → init-WPKDBQDN.js} +141 -55
package/dist/init-WPKDBQDN.js.map +1 -0
package/dist/{init-autofix-OZW5ITFI.js → init-autofix-ESN27L3W.js} +5 -3
package/dist/{init-autofix-OZW5ITFI.js.map → init-autofix-ESN27L3W.js.map} +1 -1
package/dist/{upgrade-P3JZS7NM.js → upgrade-LKX25GTT.js} +3 -3
package/package.json +1 -1
package/dist/chunk-XOMQIK4U.js.map +0 -1
package/dist/chunk-YE4LWG2O.js.map +0 -1
package/dist/init-LXSMLAY5.js.map +0 -1
package/dist/{upgrade-P3JZS7NM.js.map → upgrade-LKX25GTT.js.map} +0 -0

package/dist/{chunk-XOMQIK4U.js → chunk-UEG5IO6Q.js} RENAMED

@@ -10,7 +10,7 @@ var SKILLS = {
   "joycraft-design.md": '---\nname: joycraft-design\ndescription: Design discussion before decomposition \u2014 produce a ~200-line design artifact for human review, catching wrong assumptions before they propagate into specs\n---\n\n# Design Discussion\n\nYou are producing a design discussion document for a feature. This sits between research and decomposition \u2014 it captures your understanding so the human can catch wrong assumptions before specs are written.\n\n**Guard clause:** If no brief path is provided and no brief exists at `docs/features/<slug>/brief.md`, say:\n"No feature brief found. Run `/joycraft-new-feature` first to create one, or provide the path to your brief."\nThen stop.\n\n---\n\n## Step 1: Read Inputs\n\nRead the feature brief at the path the user provides. If the user also provides a research document path, read that too. Research is optional \u2014 if none exists, note that you\'ll explore the codebase directly.\n\n## Step 2: Explore the Codebase\n\nSpawn subagents to explore the codebase for patterns relevant to the brief. Focus on:\n\n- Files and functions that will be touched or extended\n- Existing patterns this feature should follow (naming, data flow, error handling)\n- Similar features already implemented that serve as models\n- Boundaries and interfaces the feature must integrate with\n\nGather file paths, function signatures, and code snippets. 
You need concrete evidence, not guesses.\n\n## Step 3: Write the Design Document\n\nDerive the slug from the brief path (`docs/features/<slug>/brief.md`).\nLazy-create the folder `docs/features/<slug>/` if needed.\nWrite the design document to `docs/features/<slug>/design.md`.\n\nThe file MUST start with YAML frontmatter \u2014 the 4-field personal schema:\n\n```yaml\n---\nstatus: active\nowner: <resolved name>\ncreated: YYYY-MM-DD\nfeature: <slug>\n---\n```\n\n**Owner resolution:** look up the owner name in this order \u2014 (1) `git config user.name`, (2) value in your auto-memory `joycraft-owner.txt` if present, (3) ask the user once and persist.\n\nThe document has exactly five sections:\n\n### Section 1: Current State\n\nWhat exists today in the codebase that is relevant to this feature. Include file paths, function signatures, and data flows. Be specific \u2014 reference actual code, not abstractions. If no research doc was provided, note that and describe what you found through direct exploration.\n\n### Section 2: Desired End State\n\nWhat the codebase should look like when this feature is complete. Describe the change at a high level \u2014 new files, modified interfaces, new data flows. Do NOT include implementation steps. This is the "what," not the "how."\n\n### Section 3: Patterns to Follow\n\nExisting patterns in the codebase that this feature should match. Include short code snippets and `file:line` references. Show the pattern, don\'t just name it.\n\nIf this is a greenfield project with no existing patterns, propose conventions and note that no precedent exists.\n\n### Section 4: Resolved Design Decisions\n\nDecisions you have already made, with brief rationale. 
Format each as:\n\n> **Decision:** [what you decided]\n> **Rationale:** [why, referencing existing code or constraints]\n> **Alternative rejected:** [what you considered and why you rejected it]\n\n### Section 5: Open Questions\n\nThings you don\'t know or where multiple valid approaches exist. Each question MUST present 2-3 concrete options with pros and cons. Format:\n\n> **Q: [question]**\n> - **Option A:** [description] \u2014 Pro: [benefit]. Con: [cost].\n> - **Option B:** [description] \u2014 Pro: [benefit]. Con: [cost].\n> - **Option C (if applicable):** [description] \u2014 Pro: [benefit]. Con: [cost].\n\nDo NOT ask vague questions like "what do you think?" Every question must have actionable options the human can choose from.\n\n### Update the Feature Brief\n\nAfter writing the design document, update the parent brief with a back-reference:\n1. Read `docs/features/<slug>/brief.md`\n2. In the header blockquote (the `>` lines at the top), add or update:\n   `> **Design:** docs/features/<slug>/design.md`\n3. If a `> **Design:**` line already exists, replace it \u2014 do NOT add a duplicate\n4. Write the brief back\n\n## Step 4: Reconcile Brief with Findings\n\nYou\'ve just written `docs/features/<slug>/design.md`. Before hand-off, the parent brief at `docs/features/<slug>/brief.md` may now disagree with what you discovered. Re-read it and check each of these sections:\n\n| Brief section | What to look for |\n|---|---|\n| Vision | Did your findings refine or contradict the framing? |\n| Hard Constraints | Are any constraints now obsolete, missing, or refined? |\n| Out of Scope | Did your findings push something in or out of scope? |\n| Decomposition | Are spec counts, names, or dependencies still accurate? |\n| Test Strategy | Do your findings change what or how to test? |\n| Success Criteria | Are the criteria still observable and still match the goal? 
|\n\n**For each section, choose one:**\n\n- **Edit in place** \u2014 small, mechanical updates: line-number corrections, clarifications, additions consistent with brief intent. No user approval needed.\n- **Diff + stop** \u2014 non-trivial changes: counts flipping, decomposition restructure, scope changes, contradiction with original brief intent. Present a diff of the proposed change, STOP, and wait for user approval before continuing.\n\nIf you make changes, note them at the bottom of `design.md` under a "Brief updates" subsection. If the brief is already in sync, note: "Reconciliation checked, no changes required." If no parent brief exists (feature was described inline), note that and skip this step.\n\n**Why this step exists:** the silent-drift gap. Without reconciliation, the brief and downstream artifacts diverge \u2014 and later decomposition is sized against the stale brief. This feature ("single-source-skills") hit exactly this: brief said "11 clean / 9 dirty" until the research re-audit forced a re-decomposition. Don\'t let it happen again.\n\n## Step 5: Present and STOP \u2014 Pre-Approval Hold\n\nPresent the design document to the user. Say:\n\n```\nDesign discussion written to docs/features/<slug>/design.md\n\nPlease review the document above. Specifically:\n1. Are the patterns in Section 3 the right ones to follow, or should I use different ones?\n2. Do you agree with the resolved decisions in Section 4?\n3. Pick an option for each open question in Section 5 (or propose your own).\n\nReply with your feedback. I will NOT proceed to decomposition until you have reviewed and approved this design.\n```\n\n**CRITICAL: Do NOT emit the canonical Handoff block at this point.** The Handoff block emits ONLY after human approval (see "Step 6: Hand Off (Post-Approval Only)" below). 
The entire value of this skill is the pause \u2014 it forces a human checkpoint before mistakes propagate.\n\n## Offer to Capture Deferred Items to Backlog\n\nIf during the design discussion the user mentions deferred work \u2014 "let\'s not do X yet," "save Y for later" \u2014 ASK before writing:\n\n> "This looks like deferred work \u2014 want me to capture it to `docs/backlog/`?"\n\nOnly on user confirmation, write a backlog entry at `docs/backlog/YYYY-MM-DD-<short-name>.md` with backlog frontmatter:\n\n```yaml\n---\nstatus: backlog\nowner: <resolved name>\ncreated: YYYY-MM-DD\nsource: docs/features/<slug>/brief.md\n---\n```\n\n**Never auto-write to `docs/backlog/`.** Every backlog entry is user-confirmed.\n\n## Step 6: Hand Off (Post-Approval Only)\n\nOnce the human approves the design:\n- Update the design document with their corrections and chosen options\n- Move answered questions from "Open Questions" to "Resolved Design Decisions"\n- Present the updated document for final confirmation\n- Once the user gives explicit approval, AND ONLY THEN, emit the canonical Handoff block:\n\n## Recommended Next Steps\n\nNext:\n```bash\n/joycraft-decompose docs/features/<slug>/brief.md\n```\nRun /clear first.\n\nInclude any backlog paths produced as a side effect.\n',
   "joycraft-gather-context.md": "---\nname: joycraft-gather-context\ndescription: First-run onboarding pass that populates the project context layer -- read what context already exists, then offer a gap-only interview and batch-write the missing fact rows and long-form reference docs\ninstructions: 40\n---\n\n# Gather Context\n\nThis is the first-run **read-then-offer** onboarding pass \u2014 the lowest-intervention way to populate the project's context layer. You read what context already exists, summarize coverage, offer a gap-only interview, and write everything in one reviewable batch at the end.\n\nThis skill is self-contained. It composes the same conventions the single-doc skills use, but everything you need is inlined below \u2014 do not call into or import another skill's logic.\n\n## Step 1: Read What Already Exists First\n\nThe user has invoked the first-run onboarding pass (e.g., `/joycraft-gather-context`). Before asking the user anything, scan the project's existing context. Default scan breadth is **README + `docs/` + CLAUDE.md only**:\n\n- The README(s) at the repo root and any obvious sub-package READMEs.\n- `docs/**` \u2014 existing design, architecture, or style docs.\n- `docs/context/*` \u2014 the flat operational fact-docs (production-map, dangerous-assumptions, decision-log, institutional-knowledge, troubleshooting) and `docs/context/reference/*` long-form docs.\n- The current CLAUDE.md content, including any `## Context Map` section.\n\nThen summarize for the user what context already exists and what's covered.\n\n**Do NOT auto-run a code-inference scan.** Reading the actual source to infer architecture costs significantly more tokens. Offer that deeper/full review ONLY if the user explicitly asks for it, and when you do, note clearly that it costs more tokens. The default pass never reads the codebase to infer context.\n\n## Step 2: Offer a Gap-Only Interview (Don't Force)\n\nFrom the summary, identify genuine gaps: no design-system doc? 
no production map? no decision log? Offer an **optional** interview that targets only those gaps. The user can decline any or all of it \u2014 offer, never force.\n\n**Per-doc skip guard (not all-or-nothing):** Never re-interview for a doc that already has real content. Skip each doc that's already populated individually, and interview only the empty or missing ones. If everything is already covered, say so and offer nothing.\n\n## Step 3: Route by Shape (Inline Test)\n\nFor each thing the user wants to capture, apply this minimal shape test inline \u2014 do not defer to another skill:\n\n- **\"Could this be one row in a table?\"** \u2192 it's an **operational fact**. Route it to one of the five flat fact-docs under `docs/context/`:\n  - `docs/context/production-map.md` \u2014 infrastructure, services, environments, URLs, credentials, safe/unsafe to touch.\n  - `docs/context/dangerous-assumptions.md` \u2014 false assumptions an agent might make.\n  - `docs/context/decision-log.md` \u2014 an architectural/tooling choice and why.\n  - `docs/context/institutional-knowledge.md` \u2014 team conventions, unwritten rules, ownership.\n  - `docs/context/troubleshooting.md` \u2014 when X happens, do Y.\n  Append it as a table row (or list item for institutional-knowledge), removing any italic example rows in that table first.\n\n- **\"Does explaining it take paragraphs?\"** \u2192 it's **long-form reference**. Scaffold `docs/context/reference/<slug>.md` from the matching template in `docs/templates/context/reference/` (`design-system`, `frontend-methodology`, `backend`, `testing`, or the generic `reference-doc` fallback), lazy-creating `docs/context/reference/` on first write.\n\nIf an item is ambiguous, apply the test literally: one row \u2192 fact bucket; paragraphs \u2192 reference doc.\n\n## Step 4: Batch-Write + One Final Confirm\n\nDo NOT write per-answer. Collect ALL of the user's gap answers across the whole interview first. Then, in ONE batch:\n\n1. 
Write all the fact rows into their fact-docs.\n2. Scaffold and write all the reference docs into `docs/context/reference/`.\n3. Add or update the `## Context Map` pointer rows in CLAUDE.md \u2014 one row per reference doc, in the form `| docs/context/reference/<slug>.md | <when to read it> |`. Create the `## Context Map` section (header + two-column table) if it doesn't exist; update an existing row in place rather than duplicating it.\n\nPresent the full set of intended changes and get ONE final confirm (\"do it in one go\") before writing. If the user aborts at the final confirm, write nothing \u2014 there are no partial writes in this batch model. The result is one clean, reviewable diff.\n\n## Step 5: Confirm and Hand Off\n\nReport the batch: which fact rows were added, which reference docs were scaffolded, and which Context Map rows were created or updated. Then end with the canonical Handoff block.\n\n## Recommended Next Steps\n\nNext:\n```bash\n/joycraft-session-end\n```\nRun /clear first.\n",
   "joycraft-implement-feature.md": "---\nname: joycraft-implement-feature\ndescription: Run a feature's entire spec queue from one invocation \u2014 fresh-context subagent per spec, fail-fast, session-end once at the end\ninstructions: 24\n---\n\n# Implement Feature (Whole-Queue Driver)\n\nOne invocation runs a feature's whole spec queue: `/joycraft-implement-feature docs/features/<slug>/`. You are the **driver** \u2014 you orchestrate; you do **not** implement specs in this conversation. Each spec runs in a **fresh-context subagent**: the subagent boundary is the context isolation, the in-session equivalent of Pi's process-per-spec loop. This is ordinary interactive use of your harness \u2014 one human invocation, no headless loop, no ToS/cost caveat.\n\n## Step 1: Load the Queue\n\n1. Resolve the specs directory: if the given path contains a `specs/` subdirectory, use it; otherwise use the path itself. Look for `.joycraft-spec-queue.json` there.\n2. **No queue** \u2192 stop:\n\n   > No spec queue found in [path]. Run `/joycraft-decompose` first \u2014 it writes the queue, the specs, and the wave plan.\n\n3. Read the sibling `README.md` (the wave plan written by `/joycraft-decompose`) \u2014 it tells you the intended order and which waves, if any, are marked **parallel-safe**.\n4. Report the plan before starting: feature slug, M specs, current statuses, the order you'll run them in.\n5. If **no `todo` specs remain**, skip to Step 4 and say why (everything is already `in-review`/`done`).\n\n## Step 2: The Loop \u2014 One Subagent per Spec\n\nRepeat until no `todo` specs remain:\n\n1. **Find the next ready spec**: the first `todo` whose `depends_on` are all `in-review`/`done`. Use `.pi/scripts/joycraft/joycraft-next-spec <specs-dir>` if installed, else read the queue JSON directly.\n2. **None ready but `todo` specs remain** \u2192 fail-fast (Step 3): report which specs are blocked and on what. Never run a spec whose dependencies are unmet.\n3. 
**Spawn one subagent** for the spec, with a prompt of this shape (fill in the concrete paths \u2014 the subagent starts with zero context):\n\n   > Implement exactly one atomic spec: `<spec-path>`.\n   > 1. Read `.claude/skills/joycraft-implement/SKILL.md` and follow it for this spec \u2014 strict TDD (write the Test Plan's tests first, confirm they fail, implement until green), every Acceptance Criterion met. IMPORTANT: skip that skill's \"continue the queue\" step \u2014 you own exactly this one spec.\n   > 2. Then perform the per-spec wrap-up defined in `.claude/skills/joycraft-spec-done/SKILL.md`: bump the spec to `in-review` in BOTH `.joycraft-spec-queue.json` and the spec file's `status:` frontmatter; write a 2-line discovery stub at `docs/discoveries/` ONLY if something contradicted the spec; commit as `spec: <spec-name>`. Do NOT push, do NOT open a PR, do NOT run session-end, do NOT touch other specs.\n   > 3. Reply with: tests written and passing (counts), each Acceptance Criterion's status, the commit hash, and the discovery stub path if any. If you could not get tests green, say so explicitly and DO NOT bump the status or commit a broken state.\n\n4. **Verify, don't trust**: when the subagent returns, confirm in the queue JSON that the spec is `in-review` and in `git log` that the `spec: <name>` commit exists. Both present \u2192 continue the loop. Either missing, or the subagent reported failure \u2192 fail-fast (Step 3).\n\n**Sequential by default.** Run a wave's specs in parallel ONLY when both hold: the README marks that wave **parallel-safe** (disjoint Affected Files), AND the user asked for parallelism. 
Never parallelize an unmarked wave \u2014 concurrent edits to shared files produce exactly the conflicts the wave plan exists to prevent.\n\n## Step 3: Fail-Fast\n\nWhen a spec fails (tests not green, wrap-up missing, subagent reports failure, or all remaining specs are blocked):\n\n- **Stop the loop.** Start no further specs.\n- Report: which spec failed and why, what reached `in-review`, what remains `todo`. Leave the queue exactly as it is \u2014 never mark anything to cover a failure.\n- Suggest the recovery path: investigate in a fresh conversation with `/joycraft-implement <failed-spec>`, then re-run `/joycraft-implement-feature` to finish the remainder.\n\n## Step 4: Finish \u2014 Session-End Once\n\nWhen no `todo` specs remain, run the once-per-feature finisher yourself, in this conversation: invoke `/joycraft-session-end` (or read and follow `.claude/skills/joycraft-session-end/SKILL.md`). It owns the gates the loop deliberately skipped: full validation (must pass before anything graduates `in-review \u2192 done`), discovery consolidation, and push/PR per the project's CLAUDE.md git autonomy rules.\n\n## Final Report\n\n```\nFeature run: <slug>\n- Specs completed: N of M (now in-review/done) \xB7 failures: [none | <spec> \u2014 <reason>]\n- Session-end: [ran \u2014 see its report | skipped: <reason>]\n- Discoveries: [n stubs consolidated | none]\n```\n",
-  "joycraft-implement-level5.md": "---\nname: joycraft-implement-level5\ndescription: Set up Level 5 autonomous development \u2014 autofix loop, holdout scenario testing, and scenario evolution from specs\ninstructions: 35\n---\n\n# Implement Level 5 \u2014 Autonomous Development Loop\n\nYou are guiding the user through setting up Level 5: the autonomous feedback loop where specs go in, validated software comes out. This is a one-time setup that installs workflows, creates a scenarios repo, and configures the autofix loop.\n\n## Before You Begin\n\nCheck prerequisites:\n\n1. **Project must be initialized.** Look for `docs/.joycraft/state.json` (older installs may still have it at the legacy `.claude/.joycraft/state.json` or a `.joycraft-version` at the repo root). If none exist, tell the user to run `npx joycraft init` first.\n2. **Project should be at Level 4.** Check `docs/joycraft-assessment.md` if it exists. If the project hasn't been assessed yet, suggest running `/joycraft-tune` first. But don't block \u2014 the user may know they're ready.\n3. **Git repo with GitHub remote.** This setup requires GitHub Actions. Check for `.git/` and a GitHub remote.\n\nIf prerequisites aren't met, explain what's needed and stop.\n\n## Step 1: Explain What Level 5 Means\n\nTell the user:\n\n> Level 5 is the autonomous loop. When you push specs, three things happen automatically:\n>\n> 1. **Scenario evolution** \u2014 A separate AI agent reads your specs and writes holdout tests in a private scenarios repo. These tests are invisible to your coding agent.\n> 2. **Autofix** \u2014 When CI fails on a PR, Claude Code automatically attempts a fix (up to 3 times).\n> 3. **Holdout validation** \u2014 When CI passes, your scenarios repo runs behavioral tests against the PR. Results post as PR comments.\n>\n> The key insight: your coding agent never sees the scenario tests. 
This prevents it from gaming the test suite \u2014 like a validation set in machine learning.\n\n## Step 2: Gather Configuration\n\nAsk these questions **one at a time**:\n\n### Question 1: Scenarios repo name\n\n> What should we call your scenarios repo? It'll be a private repo that holds your holdout tests.\n>\n> Default: `{current-repo-name}-scenarios`\n\nAccept the default or the user's choice.\n\n### Question 2: GitHub App\n\n> Level 5 needs a GitHub App to provide a separate identity for autofix pushes (this avoids GitHub's anti-recursion protection). Creating one takes about 2 minutes:\n>\n> 1. Go to https://github.com/settings/apps/new\n> 2. Give it a name (e.g., \"My Project Autofix\")\n> 3. Uncheck \"Webhook > Active\" (not needed)\n> 4. Under **Repository permissions**, set:\n>    - **Contents**: Read & Write\n>    - **Pull requests**: Read & Write\n>    - **Actions**: Read & Write\n> 5. Click **Create GitHub App**\n> 6. Note the **App ID** from the settings page\n> 7. Scroll to **Private keys** > click **Generate a private key** > save the `.pem` file\n> 8. Click **Install App** in the left sidebar > install it on your repo\n>\n> What's your App ID?\n\n## Step 3: Run init-autofix\n\nRun the CLI command with the gathered configuration:\n\n```bash\nnpx joycraft init-autofix --scenarios-repo {name} --app-id {id}\n```\n\nReview the output with the user. 
Confirm files were created.\n\n## Step 4: Walk Through Secret Configuration\n\nGuide the user step by step:\n\n### 4a: Add Secrets to Main Repo\n\n> You should already have the `.pem` file from when you created the app in Step 2.\n\n> Go to your repo's Settings > Secrets and variables > Actions, and add:\n> - `JOYCRAFT_APP_PRIVATE_KEY` \u2014 paste the contents of your `.pem` file\n> - `ANTHROPIC_API_KEY` \u2014 your Anthropic API key\n\n### 4b: Create the Scenarios Repo\n\n> Create the private scenarios repo:\n> ```bash\n> gh repo create {scenarios-repo-name} --private\n> ```\n>\n> Then copy the scenario templates into it:\n> ```bash\n> cp -r docs/templates/scenarios/* ../{scenarios-repo-name}/\n> cd ../{scenarios-repo-name}\n> git add -A && git commit -m \"init: scaffold scenarios repo from Joycraft\"\n> git push\n> ```\n\n### 4c: Add Secrets to Scenarios Repo\n\n> The scenarios repo also needs the App private key:\n> - `JOYCRAFT_APP_PRIVATE_KEY` \u2014 same `.pem` file as the main repo\n> - `ANTHROPIC_API_KEY` \u2014 same key (needed for scenario generation)\n\n## Step 5: Verify Setup\n\nHelp the user verify everything is wired correctly:\n\n1. **Check workflow files exist:** `ls .github/workflows/autofix.yml .github/workflows/scenarios-dispatch.yml .github/workflows/spec-dispatch.yml .github/workflows/scenarios-rerun.yml`\n2. **Check scenario templates were copied:** Verify the scenarios repo has `example-scenario.test.ts`, `workflows/run.yml`, `workflows/generate.yml`, `prompts/scenario-agent.md`\n3. 
**Check the App ID is correct** in the workflow files (not still a placeholder)\n\n## Step 6: Update CLAUDE.md\n\nIf the project's CLAUDE.md doesn't already have an \"External Validation\" section, add one:\n\n> ## External Validation\n>\n> This project uses holdout scenario tests in a separate private repo.\n>\n> ### NEVER\n> - Access, read, or reference the scenarios repo\n> - Mention scenario test names or contents\n> - Modify the scenarios dispatch workflow to leak test information\n>\n> The scenarios repo is deliberately invisible to you. This is the holdout guarantee.\n\n## Step 7: First Test (Optional)\n\nIf the user wants to test the loop:\n\n> Want to do a quick test? Here's how:\n>\n> 1. Write a simple spec in `docs/features/<slug>/specs/` and push to main \u2014 this triggers scenario generation\n> 2. Create a PR with a small change \u2014 when CI passes, scenarios will run\n> 3. Watch for the scenario test results as a PR comment\n>\n> Or deliberately break something in a PR to test the autofix loop.\n\n## Step 8: Summary\n\nPrint a summary of what was set up:\n\n> **Level 5 is live.** Here's what's running:\n>\n> | Trigger | What Happens |\n> |---------|-------------|\n> | Push specs to `docs/features/<slug>/specs/` | Scenario agent writes holdout tests |\n> | PR fails CI | Claude autofix attempts (up to 3x) |\n> | PR passes CI | Holdout scenarios run against PR |\n> | Scenarios update | Open PRs re-tested with latest scenarios |\n>\n> Your scenarios repo: `{name}`\n> Your coding agent cannot see those tests. The holdout wall is intact.\n\n**Important:** Tell the user:\n\n> **Before you can test the loop**, you need to merge this PR to main first. GitHub's `workflow_run` triggers only activate for workflows that exist on the default branch. 
Once merged, create a new PR with any small change \u2014 that's when you'll see Autofix, Scenarios Dispatch, and Spec Dispatch fire for the first time.\n\nUpdate `docs/joycraft-assessment.md` if it exists \u2014 set the Level 5 score to reflect the new setup.\n",
+  "joycraft-implement-level5.md": "---\nname: joycraft-implement-level5\ndescription: Set up Level 5 autonomous development \u2014 autofix loop, holdout scenario testing, and scenario evolution from specs\ninstructions: 35\n---\n\n# Implement Level 5 \u2014 Autonomous Development Loop\n\nYou are guiding the user through setting up Level 5: the autonomous feedback loop where specs go in, validated software comes out. This is a one-time setup that installs workflows, creates a scenarios repo, and configures the autofix loop.\n\n## Before You Begin\n\nCheck prerequisites:\n\n1. **Project must be initialized.** Look for `docs/.joycraft/state.json` (older installs may still have it at the legacy `.claude/.joycraft/state.json` or a `.joycraft-version` at the repo root). If none exist, tell the user to run `npx joycraft init` first.\n2. **Project should be at Level 4.** Check `docs/joycraft-assessment.md` if it exists. If the project hasn't been assessed yet, suggest running `/joycraft-tune` first. But don't block \u2014 the user may know they're ready.\n3. **Git repo with GitHub remote.** This setup requires GitHub Actions. Check for `.git/` and a GitHub remote.\n\nIf prerequisites aren't met, explain what's needed and stop.\n\n## Step 1: Explain What Level 5 Means\n\nTell the user:\n\n> Level 5 is the autonomous loop. When you push specs, three things happen automatically:\n>\n> 1. **Scenario evolution** \u2014 A separate AI agent reads your specs and writes holdout tests in a private scenarios repo. These tests are invisible to your coding agent.\n> 2. **Autofix** \u2014 When CI fails on a PR, Claude Code automatically attempts a fix (up to 3 times).\n> 3. **Holdout validation** \u2014 When CI passes, your scenarios repo runs behavioral tests against the PR. Results post as PR comments.\n>\n> The key insight: your coding agent never sees the scenario tests. 
This prevents it from gaming the test suite \u2014 like a validation set in machine learning.\n\n## Step 2: Gather Configuration\n\nAsk these questions **one at a time**:\n\n### Question 1: Scenarios repo name\n\n> What should we call your scenarios repo? It'll be a private repo that holds your holdout tests.\n>\n> Default: `{current-repo-name}-scenarios`\n\nAccept the default or the user's choice.\n\n### Question 2: GitHub App\n\n> Level 5 needs a GitHub App to provide a separate identity for autofix pushes (this avoids GitHub's anti-recursion protection). Creating one takes about 2 minutes:\n>\n> 1. Go to https://github.com/settings/apps/new\n> 2. Give it a name (e.g., \"My Project Autofix\")\n> 3. Uncheck \"Webhook > Active\" (not needed)\n> 4. Under **Repository permissions**, set:\n>    - **Contents**: Read & Write\n>    - **Pull requests**: Read & Write\n>    - **Actions**: Read & Write\n> 5. Click **Create GitHub App**\n> 6. Note the **App ID** from the settings page\n> 7. Scroll to **Private keys** > click **Generate a private key** > save the `.pem` file\n> 8. Click **Install App** in the left sidebar > install it on your repo\n>\n> What's your App ID?\n\n## Step 3: Run init-autofix\n\nRun the CLI command with the gathered configuration:\n\n```bash\nnpx joycraft init-autofix --scenarios-repo {name} --app-id {id}\n```\n\nReview the output with the user. 
Confirm files were created.\n\n## Step 4: Walk Through Secret Configuration\n\nGuide the user step by step:\n\n### 4a: Add Secrets to Main Repo\n\n> You should already have the `.pem` file from when you created the app in Step 2.\n\n> Go to your repo's Settings > Secrets and variables > Actions, and add:\n> - `JOYCRAFT_APP_PRIVATE_KEY` \u2014 paste the contents of your `.pem` file\n> - `ANTHROPIC_API_KEY` \u2014 your Anthropic API key\n\n### 4b: Create the Scenarios Repo\n\n> Create the private scenarios repo:\n> ```bash\n> gh repo create {scenarios-repo-name} --private\n> ```\n>\n> Then copy the scenario templates into it. The starter ships as\n> `example-scenario.test.ts.template` (the `.template` suffix keeps it out of\n> the *main* project's test/lint/build globs); rename it to `.test.ts` once it's\n> in the holdout repo so Vitest discovers it:\n> ```bash\n> cp -r docs/templates/scenarios/* ../{scenarios-repo-name}/\n> cd ../{scenarios-repo-name}\n> mv example-scenario.test.ts.template example-scenario.test.ts\n> git add -A && git commit -m \"init: scaffold scenarios repo from Joycraft\"\n> git push\n> ```\n\n### 4c: Add Secrets to Scenarios Repo\n\n> The scenarios repo also needs the App private key:\n> - `JOYCRAFT_APP_PRIVATE_KEY` \u2014 same `.pem` file as the main repo\n> - `ANTHROPIC_API_KEY` \u2014 same key (needed for scenario generation)\n\n## Step 5: Verify Setup\n\nHelp the user verify everything is wired correctly:\n\n1. **Check workflow files exist:** `ls .github/workflows/autofix.yml .github/workflows/scenarios-dispatch.yml .github/workflows/spec-dispatch.yml .github/workflows/scenarios-rerun.yml`\n2. **Check scenario templates were copied:** Verify the scenarios repo has `example-scenario.test.ts` (renamed from the `.template` starter), `workflows/run.yml`, `workflows/generate.yml`, `prompts/scenario-agent.md`\n3. 
**Check the App ID is correct** in the workflow files (not still a placeholder)\n\n## Step 6: Update CLAUDE.md\n\nIf the project's CLAUDE.md doesn't already have an \"External Validation\" section, add one:\n\n> ## External Validation\n>\n> This project uses holdout scenario tests in a separate private repo.\n>\n> ### NEVER\n> - Access, read, or reference the scenarios repo\n> - Mention scenario test names or contents\n> - Modify the scenarios dispatch workflow to leak test information\n>\n> The scenarios repo is deliberately invisible to you. This is the holdout guarantee.\n\n## Step 7: First Test (Optional)\n\nIf the user wants to test the loop:\n\n> Want to do a quick test? Here's how:\n>\n> 1. Write a simple spec in `docs/features/<slug>/specs/` and push to main \u2014 this triggers scenario generation\n> 2. Create a PR with a small change \u2014 when CI passes, scenarios will run\n> 3. Watch for the scenario test results as a PR comment\n>\n> Or deliberately break something in a PR to test the autofix loop.\n\n## Step 8: Summary\n\nPrint a summary of what was set up:\n\n> **Level 5 is live.** Here's what's running:\n>\n> | Trigger | What Happens |\n> |---------|-------------|\n> | Push specs to `docs/features/<slug>/specs/` | Scenario agent writes holdout tests |\n> | PR fails CI | Claude autofix attempts (up to 3x) |\n> | PR passes CI | Holdout scenarios run against PR |\n> | Scenarios update | Open PRs re-tested with latest scenarios |\n>\n> Your scenarios repo: `{name}`\n> Your coding agent cannot see those tests. The holdout wall is intact.\n\n**Important:** Tell the user:\n\n> **Before you can test the loop**, you need to merge this PR to main first. GitHub's `workflow_run` triggers only activate for workflows that exist on the default branch. 
Once merged, create a new PR with any small change \u2014 that's when you'll see Autofix, Scenarios Dispatch, and Spec Dispatch fire for the first time.\n\nUpdate `docs/joycraft-assessment.md` if it exists \u2014 set the Level 5 score to reflect the new setup.\n",
   "joycraft-implement.md": "---\nname: joycraft-implement\ndescription: Execute atomic specs with TDD \u2014 read spec, write failing tests, implement until green, wrap up and continue the queue\ninstructions: 32\n---\n\n# Implement Atomic Spec\n\nYou have exactly one atomic spec file to execute. Your job is to implement it using strict TDD \u2014 tests first, confirm they fail, then implement until green.\n\n## Step 1: Parse Arguments\n\nThe user MUST provide a path. No path = stop immediately.\n\n**If no path was provided:**\n\n> No spec path provided. Provide a spec file or a feature directory:\n> `/joycraft-implement docs/features/<slug>/specs/spec-name.md`\n> or `/joycraft-implement docs/features/<slug>/`\n\n**If the path is a directory** (ends with `/` or does not end with `.md`):\n\nLook for `specs/.joycraft-spec-queue.json` inside that directory. Read it. Find the **first `todo` spec whose dependencies are satisfied** (a dependency is satisfied once it is `in-review` or `done`). This matches what `joycraft-next-spec` serves. That single spec file is your target. Do NOT read any other specs.\n\n> Using spec queue: found [spec-file-name] as the next spec.\n\nIf the directory has no queue or no `todo` specs:\n\n> No remaining specs found in [directory].\n\n**If the path is a file** ending in `.md`:\n\nUse it directly as the spec to implement.\n\n## Step 2: Read the Sibling README.md FIRST (if present)\n\nBefore reading the spec itself, check for a sibling `README.md` in the same folder as the spec \u2014 i.e., `<spec-path>/../README.md`. This file is the wave-plan + spec-table that `/joycraft-decompose` writes per feature.\n\n- **If present:** Read the README first. It tells you the spec's position in the wave plan, its dependencies, and which sibling specs (in the same folder) need to be done before this one.\n- **If absent:** That's fine \u2014 proceed normally. 
The convention is forward-only and many legacy spec folders pre-date it.\n\n### Warn on Unmet Dependencies\n\nIf the README shows that this spec depends on other specs in the same folder, check whether those dependencies are satisfied. A dependency is satisfied once its frontmatter `status:` is `in-review` or `done` (see `docs/reference/spec-status-lifecycle.md`) \u2014 a checkpoint chain progresses on `in-review` without waiting for session-end to graduate it to `done`. A dependency still at `todo` is unmet.\n\nIf any dependency is **not** complete, tell the user:\n\n> \"This spec lists unmet dependencies in the sibling README.md: [list]. Proceed anyway, or stop?\"\n\nWait for confirmation before continuing. The user might be deliberately running out of order (a hotfix, an exploration, etc.) \u2014 your job is to surface the warning, not to gate.\n\n## Step 3: Read and Understand the Spec\n\n1. **Read the spec file.** The spec is your execution contract \u2014 the Acceptance Criteria and Test Plan define \"done.\"\n2. **Check the spec's Status field.** If it says \"Complete,\" warn the user and ask if they want to re-implement or skip.\n3. **Read the Acceptance Criteria** \u2014 these are your success conditions.\n4. **Read the Test Plan** \u2014 this tells you exactly what tests to write and in what order.\n5. **Read the Constraints** \u2014 these are hard boundaries you must not violate.\n\n### Finding Additional Context\n\nSpecs are designed to be self-contained, but if you need more context:\n\n- **Parent brief:** Linked in the spec's body (`> **Parent Brief:**` line). The new convention is `docs/features/<slug>/brief.md`. Read it for broader feature context.\n- **Related specs:** Live in the same directory (typically `docs/features/<slug>/specs/`). 
The sibling `README.md` (read in Step 2 above) is the index.\n- **Affected Files:** The spec's Affected Files table tells you which files to create or modify.\n\n\n### Before writing code against an external API:\n\n\u26A0\uFE0F If the spec references a third-party SDK or package, read its official documentation and type definitions FIRST. Never write a `declare module` stub for a package that actually exists \u2014 use the real package as a devDependency instead. The stub will make typecheck pass but the code will fail at runtime.\n\n## Step 4: Execute the TDD Cycle\n\n**This is not optional. Write tests FIRST.**\n\n### 4a. Write Tests (Red Phase)\n\nUsing the spec's Test Plan:\n\n1. Write ALL tests listed in the Test Plan. Each Acceptance Criterion must have at least one test.\n2. Tests should call the actual function/endpoint \u2014 not a reimplementation or mock of the underlying library.\n3. Run the tests. **They MUST fail.** If any test passes immediately:\n   - Flag it \u2014 either the test isn't testing the right thing, or the code already exists.\n   - Investigate before proceeding. A test that passes before implementation is a test that proves nothing.\n\n### 4b. Implement (Green Phase)\n\n1. Follow the spec's Approach section for implementation strategy.\n2. Implement the minimum code needed to make tests pass.\n3. Run tests after each meaningful change \u2014 use the spec's Smoke Test for fast feedback.\n4. Continue until ALL tests pass.\n\n### 4c. Verify Acceptance Criteria\n\nWalk through every Acceptance Criterion in the spec:\n\n- [ ] Is each one met?\n- [ ] Does the build pass?\n- [ ] Do all tests pass?\n\nIf any criterion is not met, keep implementing. Do not move on until all criteria are green.\n\n## Step 5: Handle Edge Cases\n\nCheck the spec's Edge Cases table. 
For each scenario:\n\n- Verify the expected behavior is handled.\n- If the spec says \"warn the user\" or \"prompt,\" make sure that path works.\n\n## Step 6: Wrap Up and Continue (mode-aware \u2014 do the wrap-up yourself)\n\nWhen the spec is implemented and all its tests pass, wrap up and advance according to the spec's **execution mode**. Read the `mode:` field from the spec's frontmatter (written by `joycraft-decompose`). If the spec has **no `mode:` field**, default to **`batch`** (back-compat with pre-mode specs). If the value is unrecognized, treat it as `batch` and note the unrecognized value.\n\n**You perform the wrap-up. You find the next spec. Do not stop to tell the human to run `/joycraft-spec-done` or to paste the next file path \u2014 those hand-backs carry zero information and break the feature's momentum.**\n\n### 6a. Per-spec wrap-up\n\n| Spec `mode:` | Wrap-up you perform now |\n|--------------|------------------------|\n| **batch** | **Status bump only**: set the spec to `in-review` in both systems (see below). No commit, no discovery stub \u2014 batch wraps once at feature end. (The bump is required: the queue treats a dependency as satisfied at `in-review`, so without it dependent specs would look blocked.) |\n| **checkpoint** / **isolated** | The full `joycraft-spec-done` wrap-up, performed by you (canonical definition: `.claude/skills/joycraft-spec-done/SKILL.md`): **(1)** bump status to `in-review` in both systems, **(2)** terse 2-line discovery stub at `docs/discoveries/YYYY-MM-DD-topic.md` ONLY if something contradicted the spec \u2014 usually skip, **(3)** commit `spec: <spec-name>` (implementation + status edits + stub, nothing unrelated), **(4)** no validation re-run, no push, no PR \u2014 those belong to `joycraft-session-end`. 
|\n\n**Both systems** means: the queue JSON (`joycraft-mark-done <spec-id> --to in-review <specs-dir>` if `.pi/scripts/joycraft/` is installed, else edit `.joycraft-spec-queue.json` directly) AND the spec file's `status:` frontmatter. Never `done` \u2014 the agent doesn't self-certify (`docs/reference/spec-status-lifecycle.md`).\n\n### 6b. Continue the queue (batch and checkpoint)\n\nRe-read `.joycraft-spec-queue.json` in the spec's directory and find the next `todo` spec whose dependencies are all `in-review`/`done` (same rule as Step 1). Then:\n\n- **Next ready spec exists** \u2192 announce one line \u2014 `Continuing: <next-spec> (spec N of M)` \u2014 and go back to Step 2 with it, in this same conversation.\n- **Remaining `todo` specs are all blocked** \u2192 stop and report which specs are blocked and on what.\n- **No `todo` specs remain** \u2192 this was the feature's last spec; go to 6d.\n- **No queue** (you were invoked with a bare spec file outside a queue) \u2192 report the spec complete and stop; there is nothing to continue from.\n\n### 6c. isolated \u2014 fresh context per spec\n\nA conversation cannot clear its own context, so after the wrap-up the fresh context comes from outside:\n\n- **Driver (recommended):** `/joycraft-implement-feature docs/features/<slug>/` runs the remaining queue with a fresh-context subagent per spec \u2014 in-session, interactive, no headless loop.\n- **Guided-manual:** tell the human to run `/clear`, then re-invoke `/joycraft-implement <next-spec>`. (Always fine, no ToS/cost surprise.)\n- **Pi:** the `joycraft-implement-loop` driver automates it \u2014 a fresh `pi -p` process per spec. Nothing for you to do beyond the wrap-up; the loop advances.\n- **Headless (`claude -p` / `codex exec` loop):** opt-in only. 
**Surface the caveat, don't bury it:** unattended headless loops draw metered, full-rate API usage and carry a ToS posture the user must **knowingly opt into** (Anthropic meters `claude -p` from a separate full-rate pool; routing subscription OAuth through third-party harnesses is prohibited). The responsible default is Pi (BYO API key / open weights). Do not silently auto-run a subscription-backed headless loop.\n\n### 6d. Feature's last spec (any mode)\n\nRun the once-per-feature finisher yourself: invoke `/joycraft-session-end` (or read and follow `.claude/skills/joycraft-session-end/SKILL.md`). It carries its own gates \u2014 validation is mandatory and must pass before specs graduate `in-review \u2192 done`, and push/PR honor the project's CLAUDE.md git autonomy rules \u2014 so running it automatically is safe.\n\n### Report\n\nAfter each spec's wrap-up, report tersely before continuing:\n\n```\nSpec complete: [spec name] \xB7 mode: [mode] \xB7 tests: [N] passing \xB7 [wrapped up + committed | status bumped (batch)]\n[Continuing: <next-spec> (spec N of M) | Feature complete \u2014 running session-end | Blocked: <specs + reasons>]\n```\n",
   "joycraft-interview.md": '---\nname: joycraft-interview\ndescription: Brainstorm freely about what you want to build \u2014 yap, explore ideas, and get a structured summary you can use later\ninstructions: 18\n---\n\n# Interview \u2014 Idea Exploration\n\nYou are helping the user brainstorm and explore what they want to build. This is a lightweight, low-pressure conversation \u2014 not a formal spec process. Let them yap.\n\n## How to Run the Interview\n\n### 1. Open the Floor\n\nStart with something like:\n"What are you thinking about building? Just talk \u2014 I\'ll listen and ask questions as we go."\n\nLet the user talk freely. Do not interrupt their flow. Do not push toward structure yet.\n\n### 2. Ask Clarifying Questions\n\nAs they talk, weave in questions naturally \u2014 don\'t fire them all at once:\n\n- **What problem does this solve?** Who feels the pain today?\n- **What does "done" look like?** If this worked perfectly, what would a user see?\n- **What are the constraints?** Time, tech, team, budget \u2014 what boxes are we in?\n- **What\'s NOT in scope?** What\'s tempting but should be deferred?\n- **What are the edge cases?** What could go wrong? What\'s the weird input?\n- **What exists already?** Are we building on something or starting fresh?\n\n### 3. Play Back Understanding\n\nAfter the user has gotten their ideas out, reflect back:\n"So if I\'m hearing you right, you want to [summary]. The core problem is [X], and done looks like [Y]. Is that right?"\n\nLet them correct and refine. Iterate until they say "yes, that\'s it."\n\n### 4. Write a Draft Brief\n\nDerive a slug `YYYY-MM-DD-<topic>` (today\'s date + kebab-case topic \u2014 no `-draft` suffix).\nCreate a draft file at `docs/features/<slug>/brief.md`. 
Lazy-create `docs/features/<slug>/` if it doesn\'t exist.\n\nThe file MUST start with YAML frontmatter \u2014 the 4-field personal schema with `status: draft`:\n\n```yaml\n---\nstatus: draft\nowner: <resolved name>\ncreated: YYYY-MM-DD\nfeature: <slug>\n---\n```\n\n**Owner resolution:** look up the owner name in this order \u2014 (1) `git config user.name`, (2) value in your auto-memory `joycraft-owner.txt` if present, (3) ask the user once and persist. If you can\'t get a name, leave the field as `<resolved name>` and note it for the user.\n\nUse this format for the body:\n\n```markdown\n# [Topic] \u2014 Draft Brief\n\n> **Date:** YYYY-MM-DD\n> **Origin:** /joycraft-interview session\n\n---\n\n## The Idea\n[2-3 paragraphs capturing what the user described \u2014 their words, their framing]\n\n## Problem\n[What pain or gap this addresses]\n\n## What "Done" Looks Like\n[The user\'s description of success \u2014 observable outcomes]\n\n## Constraints\n- [constraint 1]\n- [constraint 2]\n\n## Open Questions\n- [things that came up but weren\'t resolved]\n- [decisions that need more thought]\n\n## Out of Scope (for now)\n- [things explicitly deferred \u2014 see also: deferred work goes to `docs/backlog/`]\n\n## Raw Notes\n[Any additional context, quotes, or tangents worth preserving]\n```\n\n### 5. Offer to Capture Deferred Items to Backlog\n\nIf during the conversation deferred work surfaces (a tangent, a "later" item, an "out-of-scope but tempting" idea), ASK the user:\n\n> "This looks like deferred work \u2014 want me to capture it to `docs/backlog/`?"\n\nOnly on user confirmation, write a backlog entry at `docs/backlog/YYYY-MM-DD-<short-name>.md` with backlog frontmatter:\n\n```yaml\n---\nstatus: backlog\nowner: <resolved name>\ncreated: YYYY-MM-DD\nsource: docs/features/<slug>/brief.md\n---\n```\n\n**Never auto-write to `docs/backlog/`.** Every backlog entry is user-confirmed.\n\n### 6. 
Hand Off\n\nAfter writing the draft (and any backlog entries), present the canonical Handoff block.\nInclude any backlog paths produced as a side effect.\n\n## Recommended Next Steps\n\nNext:\n```bash\n/joycraft-new-feature docs/features/<slug>/brief.md\n```\nRun /clear first.\n\nIf the idea sounds complex \u2014 touches many files, involves architectural decisions, or the user is working in an unfamiliar area \u2014 nudge them toward research and design (e.g., `/joycraft-research` then `/joycraft-design`). But present it as a recommendation, not a gate.\n\n## Guidelines\n\n- **This is NOT /joycraft-new-feature.** Do not push toward formal briefs, decomposition tables, or atomic specs. The point is exploration.\n- **Let the user lead.** Your job is to listen, clarify, and capture \u2014 not to structure or direct.\n- **Mark everything as DRAFT.** The output is a starting point, not a commitment.\n- **Keep it short.** The draft brief should be 1-2 pages max. Capture the essence, not every detail.\n- **Multiple interviews are fine.** The user might run this several times as their thinking evolves. Each creates a new dated draft.\n',
   "joycraft-lockdown.md": "---\nname: joycraft-lockdown\ndescription: Generate constrained execution boundaries for an implementation session -- NEVER rules and deny patterns to prevent agent overreach\ninstructions: 28\n---\n\n# Lockdown Mode\n\nThe user wants to constrain agent behavior for an implementation session. Your job is to interview them about what should be off-limits, then generate CLAUDE.md NEVER rules and `.claude/settings.json` deny patterns they can review and apply.\n\n## When Is Lockdown Useful?\n\nLockdown is most valuable for:\n- **Complex tech stacks** (hardware, firmware, multi-device) where agents can cause real damage\n- **Long-running autonomous sessions** where you won't be monitoring every action\n- **Production-adjacent work** where accidental network calls or package installs are risky\n\nFor simple feature work on a well-tested codebase, lockdown is usually overkill. Mention this context to the user so they can decide.\n\n## Step 1: Check for Tests\n\nBefore starting the interview, check if the project has test files or directories (look for `tests/`, `test/`, `__tests__/`, `spec/`, or files matching `*.test.*`, `*.spec.*`).\n\nIf no tests are found, tell the user:\n\n> Lockdown mode is most useful when you already have tests in place -- it prevents the agent from modifying them while constraining behavior to writing code and running tests. Consider running `/joycraft-new-feature` first to set up a test-driven workflow, then come back to lock it down.\n\nIf the user wants to proceed anyway, continue with the interview.\n\n## Step 2: Interview -- What to Lock Down\n\nAsk these three questions, one at a time. Wait for the user's response before proceeding to the next question.\n\n### Question 1: Read-Only Files\n\n> What test files or directories should be off-limits for editing? 
(e.g., `tests/`, `__tests__/`, `spec/`, specific test files)\n>\n> I'll generate NEVER rules to prevent editing these.\n\nIf the user isn't sure, suggest the test directories you found in Step 1.\n\n### Question 2: Allowed Commands\n\n> What commands should the agent be allowed to run? Defaults:\n> - Write and edit source code files\n> - Run the project's smoke test command\n> - Run the full test suite\n>\n> Any other commands to explicitly allow? Or should I restrict to just these?\n\n### Question 3: Denied Commands\n\n> What commands should be denied? Defaults:\n> - Package installs (`npm install`, `pip install`, `cargo add`, `go get`, etc.)\n> - Network tools (`curl`, `wget`, `ping`, `ssh`)\n> - Direct log file reading\n>\n> Any specific commands to add or remove from this list?\n\n**Edge case -- user wants to allow some network access:** If the user mentions API tests or specific endpoints that need network access, exclude those from the deny list and note the exception in the output.\n\n**Edge case -- user wants to lock down file writes:** If the user wants to prevent ALL file writes, warn them:\n\n> Denying all file writes would prevent the agent from doing any work. 
I recommend keeping source code writes allowed and only locking down test files, config files, or other sensitive directories.\n\n## Step 3: Generate Boundaries\n\nBased on the interview responses, generate output in this exact format:\n\n```\n## Lockdown boundaries generated\n\nReview these suggestions and add them to your project:\n\n### CLAUDE.md -- add to NEVER section:\n\n- Edit any file in `[user's test directories]`\n- Run `[denied package manager commands]`\n- Use `[denied network tools]`\n- Read log files directly -- interact with logs only through test assertions\n- [Any additional NEVER rules based on user responses]\n\n### .claude/settings.json -- suggested deny patterns:\n\nAdd these to the `permissions.deny` array:\n\n[\"[command1]\", \"[command2]\", \"[command3]\"]\n\n---\n\nCopy these into your project manually, or tell me to apply them now (I'll show you the exact changes for approval first).\n```\n\nAdjust the content based on the actual interview responses:\n- Only include deny patterns for commands the user confirmed should be denied\n- Only include NEVER rules for directories/files the user specified\n- If the user allowed certain network tools or package managers, exclude those\n\n## Recommended Permission Mode\n\nAfter generating the boundaries above, also recommend a Claude Code permission mode. Include this section in your output:\n\n```\n### Recommended Permission Mode\n\nYou don't need `--dangerously-skip-permissions`. Safer alternatives exist:\n\n| Your situation | Use | Why |\n|---|---|---|\n| Autonomous spec execution | `--permission-mode dontAsk` + allowlist above | Only pre-approved commands run |\n| Long session with some trust | `--permission-mode auto` | Safety classifier reviews each action |\n| Interactive development | `--permission-mode acceptEdits` | Auto-approves file edits, prompts for commands |\n\n**For lockdown mode, we recommend `--permission-mode dontAsk`** combined with the deny patterns above. 
This gives you full autonomy for allowed operations while blocking everything else -- no classifier overhead, no prompts, and no safety bypass.\n\n`--dangerously-skip-permissions` disables ALL safety checks. The modes above give you autonomy without removing the guardrails.\n```\n\n## Step 4: Offer to Apply\n\nIf the user asks you to apply the changes:\n\n1. **For CLAUDE.md:** Read the existing CLAUDE.md, find the Behavioral Boundaries section, and show the user the exact diff for the NEVER section. Ask for confirmation before writing.\n2. **For settings.json:** Read the existing `.claude/settings.json`, show the user what the `permissions.deny` array will look like after adding the new patterns. Ask for confirmation before writing.\n\n**Never auto-apply. Always show the exact changes and wait for explicit approval.**\n",
@@ -130,186 +130,8 @@ _Situations where the AI agent should stop and ask the human instead of trying t
 `,
   "examples/example-brief.md": "# Add User Notifications \u2014 Feature Brief\n\n> **Date:** 2026-03-15\n> **Project:** acme-web\n> **Status:** Specs Ready\n\n---\n\n## Vision\n\nOur users have no idea when things happen in their account. A teammate comments on their pull request, a deployment finishes, a billing threshold is hit \u2014 they find out by accident, minutes or hours later. This is the #1 complaint in our last user survey.\n\nWe are building a notification system that delivers real-time and batched notifications across in-app, email, and (later) Slack channels. Users will have fine-grained control over what they receive and how. When this ships, no important event goes unnoticed, and no user gets buried in noise they didn't ask for.\n\nThe system is designed to be extensible \u2014 new event types plug in without touching the notification infrastructure. We start with three event types (PR comments, deploy status, billing alerts) and prove the pattern works before expanding.\n\n## User Stories\n\n- As a developer, I want to see a notification badge in the app when someone comments on my PR so that I can respond quickly\n- As a team lead, I want to receive an email when a production deployment fails so that I can coordinate the response\n- As a billing admin, I want to get alerted when usage exceeds 80% of our plan limit so that I can upgrade before service is disrupted\n- As any user, I want to control which notifications I receive and through which channels so that I am not overwhelmed\n\n## Hard Constraints\n\n- MUST: All notifications go through a single event bus \u2014 no direct coupling between event producers and delivery channels\n- MUST: Email delivery uses the existing SendGrid integration (do not add a new email provider)\n- MUST: Respect user preferences before delivering \u2014 never send a notification the user has opted out of\n- MUST NOT: Store notification content in plaintext in the database \u2014 use the existing encryption-at-rest 
pattern\n- MUST NOT: Send more than 50 emails per user per day (batch if necessary)\n\n## Out of Scope\n\n- NOT: Slack/Discord integration (Phase 2)\n- NOT: Push notifications / mobile (Phase 2)\n- NOT: Notification templates with rich HTML \u2014 plain text and simple markdown only for now\n- NOT: Admin dashboard for monitoring notification delivery rates\n- NOT: Retroactive notifications for events that happened before the feature ships\n\n## Decomposition\n\n| # | Spec Name | Description | Dependencies | Est. Size |\n|---|-----------|-------------|--------------|-----------|\n| 1 | add-notification-preferences-api | Create REST endpoints for users to read and update their notification preferences | None | M |\n| 2 | add-event-bus-infrastructure | Set up the internal event bus that decouples event producers from notification delivery | None | M |\n| 3 | add-notification-delivery-service | Build the service that consumes events, checks preferences, and dispatches to channels (in-app, email) | Spec 1, Spec 2 | L |\n| 4 | add-in-app-notification-ui | Add notification bell, dropdown, and badge count to the app header | Spec 3 | M |\n| 5 | add-email-batching | Implement daily digest batching for email notifications that exceed the per-user threshold | Spec 3 | S |\n\n## Execution Strategy\n\n- [x] Agent teams (parallel teammates within phases, sequential between phases)\n\n```\nPhase 1: Teammate A -> Spec 1 (preferences API), Teammate B -> Spec 2 (event bus)\nPhase 2: Teammate A -> Spec 3 (delivery service) \u2014 depends on Phase 1\nPhase 3: Teammate A -> Spec 4 (UI), Teammate B -> Spec 5 (batching) \u2014 both depend on Spec 3\n```\n\n## Success Criteria\n\n- [ ] User updates notification preferences via API, and subsequent events respect those preferences\n- [ ] A PR comment event triggers an in-app notification visible in the UI within 2 seconds\n- [ ] A deploy failure event sends an email to subscribed users via SendGrid\n- [ ] When email threshold (50/day) is 
exceeded, remaining notifications are batched into a daily digest\n- [ ] No regressions in existing PR, deployment, or billing features\n\n## External Scenarios\n\n| Scenario | What It Tests | Pass Criteria |\n|----------|--------------|---------------|\n| opt-out-respected | User disables email for deploy events, deploy fails | No email sent, in-app notification still appears |\n| batch-threshold | Send 51 email-eligible events for one user in a day | 50 individual emails + 1 digest containing the overflow |\n| preference-persistence | User sets preferences, logs out, logs back in | Preferences are unchanged |\n",
   "examples/example-spec.md": '# Add Notification Preferences API \u2014 Atomic Spec\n\n> **Parent Brief:** `docs/briefs/2026-03-15-add-user-notifications.md`\n> **Status:** Ready\n> **Date:** 2026-03-15\n> **Estimated scope:** 1 session / 4 files / ~250 lines\n\n---\n\n## What\n\nAdd REST API endpoints that let users read and update their notification preferences. Each user gets a preferences record with per-event-type, per-channel toggles (e.g., "PR comments: in-app=on, email=off"). Preferences default to all-on for new users and are stored encrypted alongside the user profile.\n\n## Why\n\nThe notification delivery service (Spec 3) needs to check preferences before dispatching. Without this API, there is no way for users to control what they receive, and we cannot build the delivery pipeline.\n\n## Acceptance Criteria\n\n- [ ] `GET /api/v1/notifications/preferences` returns the current user\'s preferences as JSON\n- [ ] `PATCH /api/v1/notifications/preferences` updates one or more preference fields and returns the updated record\n- [ ] New users get default preferences (all channels enabled for all event types) on first read\n- [ ] Preferences are validated \u2014 unknown event types or channels return 400\n- [ ] Preferences are stored using the existing encryption-at-rest pattern (`EncryptedJsonColumn`)\n- [ ] Endpoint requires authentication (returns 401 for unauthenticated requests)\n- [ ] Build passes\n- [ ] Tests pass (unit + integration)\n\n## Test Plan\n\n| Acceptance Criterion | Test | Type |\n|---------------------|------|------|\n| GET returns preferences as JSON | Call GET with authenticated user, assert 200 + JSON shape matches preferences schema | integration |\n| PATCH updates preferences | Call PATCH with valid partial update, assert 200 + returned record reflects changes | integration |\n| New users get defaults | Call GET for user with no existing record, assert default preferences (all channels enabled) | unit |\n| Unknown event types return 
400 | Call PATCH with `{"foo": {"email": true}}`, assert 400 + validation error | unit |\n| Stored with EncryptedJsonColumn | Verify model uses EncryptedJsonColumn for preferences field | unit |\n| Auth required | Call GET/PATCH without auth token, assert 401 | integration |\n| Build passes | Verified by build step \u2014 no separate test needed | build |\n| Tests pass | Verified by test runner \u2014 no separate test needed | meta |\n\n**Execution order:**\n1. Write all tests above \u2014 they should fail against current/stubbed code\n2. Run tests to confirm they fail (red)\n3. Implement until all tests pass (green)\n\n**Smoke test:** The "New users get defaults" unit test \u2014 no database or HTTP needed, fastest feedback loop.\n\n**Before implementing, verify your test harness:**\n1. Run all tests \u2014 they must FAIL (if they pass, you\'re testing the wrong thing)\n2. Each test calls your actual function/endpoint \u2014 not a reimplementation or the underlying library\n3. Identify your smoke test \u2014 it must run in seconds, not minutes, so you get fast feedback on each change\n\n## Constraints\n\n- MUST: Use the existing `EncryptedJsonColumn` utility for storage \u2014 do not roll a new encryption pattern\n- MUST: Follow the existing REST controller pattern in `src/controllers/`\n- MUST NOT: Expose other users\' preferences (scope queries to authenticated user only)\n- SHOULD: Return the full preferences object on PATCH (not just the changed fields), so the frontend can replace state without merging\n\n## Affected Files\n\n| Action | File | What Changes |\n|--------|------|-------------|\n| Create | `src/controllers/notification-preferences.controller.ts` | New controller with GET and PATCH handlers |\n| Create | `src/models/notification-preferences.model.ts` | Sequelize model with EncryptedJsonColumn for preferences blob |\n| Create | `src/migrations/20260315-add-notification-preferences.ts` | Database migration to create notification_preferences table 
|\n| Create | `tests/controllers/notification-preferences.test.ts` | Unit and integration tests for both endpoints |\n| Modify | `src/routes/index.ts` | Register the new controller routes |\n\n## Approach\n\nCreate a `NotificationPreferences` model backed by a single `notification_preferences` table with columns: `id`, `user_id` (unique FK), `preferences` (EncryptedJsonColumn), `created_at`, `updated_at`. The `preferences` column stores a JSON blob shaped like `{ "pr_comment": { "in_app": true, "email": true }, "deploy_status": { ... } }`.\n\nThe GET endpoint does a find-or-create: if no record exists for the user, create one with defaults and return it. The PATCH endpoint deep-merges the request body into the existing preferences, validates the result against a known schema of event types and channels, and saves.\n\n**Rejected alternative:** Storing preferences as individual rows (one per event-type-channel pair). This would make queries more complex and would require N rows per user instead of 1. The JSON blob approach is simpler and matches how the frontend will consume the data.\n\n## Edge Cases\n\n| Scenario | Expected Behavior |\n|----------|------------------|\n| PATCH with empty body `{}` | Return 200 with unchanged preferences (no-op) |\n| PATCH with unknown event type `{"foo": {"email": true}}` | Return 400 with validation error listing valid event types |\n| GET for user with no existing record | Create default preferences, return 200 |\n| Concurrent PATCH requests | Last-write-wins (optimistic, no locking) \u2014 acceptable for user preferences |\n',
-  "pi-agents/joycraft-researcher.md": '---\nname: joycraft-researcher\ndescription: Independent research agent \u2014 sees only questions, never the brief\ntools: read, grep, find, ls, bash\n---\n\n# Joycraft Researcher\n\nYou are an independent research agent. Your job is to answer objective codebase research questions by reading files and searching the codebase.\n\n## Rules\n\n- Answer each question with FACTS ONLY: file paths, function signatures, data flows, patterns, dependencies\n- Do NOT recommend, suggest, or opine\n- Do NOT speculate about what should be built\n- If a question cannot be answered, say "No existing code found for this"\n- Search the codebase and read files thoroughly\n- Include code snippets only when essential evidence\n\n## Output Format\n\n# Codebase Research\n\n**Date:** [today]\n**Questions answered:** [N/total]\n\n---\n\n## Q1: [question]\n[Facts only]\n\n## Q2: [question]\n[Facts only]\n',
-  "pi-agents/joycraft-verifier.md": "---\nname: joycraft-verifier\ndescription: Independent verification agent \u2014 checks implementation against spec, read-only\ntools: read, grep, find, ls, bash\n---\n\n# Joycraft Verifier\n\nYou are a QA verifier. Your job is to independently verify an implementation against its spec. You have NO context about how the implementation was done \u2014 you are checking it fresh.\n\n## Rules (Hard Constraints)\n\n- You may search the codebase and read any file\n- You may RUN only the test/build commands specified in your prompt\n- You may NOT edit, create, or delete any files\n- You may NOT run commands that modify state (no git commit, no npm install, no file writes)\n- You may NOT install packages or access the network\n- Report what you OBSERVE, not what you expect or hope\n\n## Output Format\n\nVERIFICATION REPORT\n\n| # | Criterion | Verdict | Evidence |\n|---|-----------|---------|----------|\n| 1 | [criterion text] | PASS/FAIL/MANUAL CHECK NEEDED | [what you observed] |\n\nSUMMARY: X/Y criteria passed. [Z failures need attention. / All criteria verified.]\n",
-  "pi-extensions/joycraft-pipeline.ts": '// joycraft-pipeline.ts \u2014 Pi extension for Joycraft pipeline advancement.\n//\n// Provides a single registration point:\n//   - A /joycraft-next-spec COMMAND (human-typable) that finds the next spec\n//     and starts a fresh session seeded with it.\n//\n// The former joycraft_next_spec TOOL (LLM-callable, in-process advance) was\n// retired: the autonomous loop is the `joycraft-implement-loop` script, which\n// gets context isolation from the OS process boundary (one fresh `pi -p` per\n// spec) \u2014 the in-process path could not isolate context. Interactive Pi still\n// uses the COMMAND below.\n\nimport type { ExtensionAPI } from "@earendil-works/pi-coding-agent";\nimport { execSync } from "node:child_process";\nimport { join } from "node:path";\n\nfunction getScriptsDir(cwd: string) {\n  return join(cwd, ".pi", "scripts", "joycraft");\n}\n\nexport default function (pi: ExtensionAPI) {\n  // \u2500\u2500 COMMAND: full pipeline, human-typable \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n  pi.registerCommand("joycraft-next-spec", {\n    description:\n      "Advance the Joycraft pipeline: find next spec and start a fresh session with it.",\n    handler: async (_args, ctx) => {\n      const scriptsDir = getScriptsDir(ctx.cwd);\n\n      // Find next spec\n      let next: string;\n      try {\n        next = execSync(`"${join(scriptsDir, "joycraft-next-spec")}"`, {\n          cwd: ctx.cwd,\n          encoding: "utf-8",\n          stdio: "pipe",\n        }).trim();\n      } catch (e: any) {\n        ctx.ui.notify(\n          `Could not determine next spec: ${e.stderr?.toString() || e.message}`,\n          "error"\n        );\n        return;\n      }\n\n      if (!next || next === "Pipeline complete") {\n        ctx.ui.notify(\n          next === "Pipeline complete"\n            ? 
"\u{1F389} Pipeline complete! All specs in this feature are done."\n            : "Could not determine next spec.",\n          "info"\n        );\n        return;\n      }\n\n      // Start fresh session with next spec\n      await ctx.newSession({\n        withSession: async (session) => {\n          session.sendUserMessage(`/skill:joycraft-implement ${next}`);\n        },\n      });\n    },\n  });\n}\n',
-  "pi-scripts/README.md": "# Joycraft Pi Scripts\n\nBash scripts that form the tool belt for Joycraft's autonomous Pi pipeline.\n\n## Scripts\n\n| Script | Purpose |\n|--------|---------|\n| `joycraft-spec-status` | Read `.joycraft-spec-queue.json` and print a formatted status table (glyphs: `[ ]` todo, `[~]` in-review, `[\u2713]` done) |\n| `joycraft-mark-done` | Transition a spec's status in the queue: `joycraft-mark-done <id> --to <state>` where `<state>` is `todo`, `in-review`, or `done` (omitting `--to` defaults to `in-review`) |\n| `joycraft-next-spec` | Find the next `todo` spec whose dependencies are satisfied (`in-review`/`done`), respecting dependency order |\n| `joycraft-session-end` | Capture discoveries, run validation, and stage changes (the once-per-feature finisher) |\n| `joycraft-implement-loop` | Isolated-mode driver: run a whole feature's queue headlessly, one fresh `pi -p` process per spec |\n\nStatus vocabulary is defined canonically in `docs/reference/spec-status-lifecycle.md` (`todo \u2192 in-review \u2192 done`).\n\n## Usage\n\nAll scripts are designed to be called from the project root.\n\n```bash\n# Check status of all specs (3-glyph table)\n.pi/scripts/joycraft/joycraft-spec-status\n\n# Mark spec #3 in-review (spec-done), or graduate it to done (session-end)\n.pi/scripts/joycraft/joycraft-mark-done 3 --to in-review\n.pi/scripts/joycraft/joycraft-mark-done 3 --to done\n\n# Get path of next spec to implement\n.pi/scripts/joycraft/joycraft-next-spec docs/features/<slug>/specs\n\n# Run the isolated-mode loop over a feature's queue (fresh process per spec)\n.pi/scripts/joycraft/joycraft-implement-loop docs/features/<slug>/specs\n\n# End a feature (validate + stage)\n.pi/scripts/joycraft/joycraft-session-end add-pi-skills\n```\n\n`joycraft-implement-loop` reads the `pi` binary from `PI_BIN` (defaults to `pi`), so it can be tested with a stub and pointed at any Pi build. 
It is for Pi with a BYO API key or open-weight model \u2014 not a Claude/ChatGPT subscription OAuth (see the ToS note in the north star).\n\n## Dependency\n\nThese scripts parse `.joycraft-spec-queue.json` \u2014 a JSON manifest generated by the `joycraft-decompose` skill. They use only POSIX-compatible `grep` and `sed` (no `jq` dependency).\n\n## Pi Pipeline Flow\n\n```\njoycraft-implement-loop  (one fresh pi -p process per spec)\n  next-spec \u2192 implement \u2192 spec-done (todo\u2192in-review + commit) \u2192 repeat\n                                      \u2193\n                          queue exhausted \u2192 session-end (validate, graduate\n                                            in-review\u2192done, push, PR) once\n```\n",
-  "pi-scripts/joycraft-implement-loop": '#!/usr/bin/env bash\n# joycraft-implement-loop \u2014 Isolated-mode driver for Pi.\n#\n# Runs a whole feature\'s spec queue headlessly, ONE FRESH OS PROCESS PER SPEC.\n# The process boundary is the context isolation (verified) \u2014 this is what\n# "isolated mode" means on Pi.\n#\n# Usage: joycraft-implement-loop <specs-dir>\n#   <specs-dir>  REQUIRED. The folder holding .joycraft-spec-queue.json\n#                (e.g. docs/features/<slug>/specs). Passed through to\n#                joycraft-next-spec verbatim \u2014 no glob-guessing.\n#\n# Loop body, per iteration:\n#   1. joycraft-next-spec <specs-dir>  \u2192 next `todo` spec path, or\n#      "Pipeline complete" \u2192 run session-end once and exit 0.\n#   2. pi -p "/skill:joycraft-implement <spec>"   (fresh process)\n#   3. pi -p "/skill:joycraft-spec-done <spec>"   (fresh process)\n#   4. repeat.\n# Any per-spec failure is fail-fast: the loop stops with a non-zero exit and\n# names the failing spec (dependency-aware-continue is intentionally out of\n# scope). When the queue is exhausted, joycraft-session-end runs exactly once.\n#\n# ToS/cost note: this driver is for Pi with a BYO API key or open-weight model\n# (Commercial/API terms \u2014 no automation restriction). Do NOT point it at a\n# Claude/ChatGPT *subscription* OAuth \u2014 that re-introduces the consumer-ToS\n# problem the Pi-first path exists to avoid.\n\nset -euo pipefail\n\n# The pi binary is overridable so tests inject a deterministic stub instead of\n# burning real API tokens. Production default is the real `pi` on PATH.\nPI_BIN="${PI_BIN:-pi}"\n\n# Require an explicit specs-dir \u2014 never glob-guess (that was pipeline-hardening\n# Bug 1: alphabetical manifest mis-pick).\nSPECS_DIR="${1:-}"\nif [ -z "$SPECS_DIR" ]; then\n  echo "Usage: joycraft-implement-loop <specs-dir>" >&2\n  echo "  e.g. joycraft-implement-loop docs/features/<slug>/specs" >&2\n  exit 1\nfi\nif [ ! 
-d "$SPECS_DIR" ]; then\n  echo "Specs dir not found: $SPECS_DIR" >&2\n  exit 1\nfi\n\n# Resolve the helper scripts. Prefer one already on PATH (lets an operator \u2014\n# or a test harness \u2014 shadow them); otherwise fall back to the sibling next to\n# this script, so the loop works from the installed location\n# (.pi/scripts/joycraft/) regardless of cwd.\nSCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"\nresolve_helper() {\n  # $1 = helper name; echo the resolved path.\n  if command -v "$1" >/dev/null 2>&1; then\n    command -v "$1"\n  else\n    echo "$SCRIPT_DIR/$1"\n  fi\n}\nNEXT_SPEC="$(resolve_helper joycraft-next-spec)"\nSESSION_END="$(resolve_helper joycraft-session-end)"\n\nwhile true; do\n  # 1. Ask for the next servable spec.\n  NEXT="$("$NEXT_SPEC" "$SPECS_DIR")"\n\n  if [ -z "$NEXT" ] || [ "$NEXT" = "Pipeline complete" ]; then\n    # Queue exhausted \u2192 the once-per-feature finisher, then done.\n    echo "\u25BA Queue complete \u2014 running session-end."\n    "$SESSION_END"\n    exit 0\n  fi\n\n  echo "\u25BA Implementing: $NEXT"\n\n  # 2. Fresh process implements exactly this one spec. Naming both the\n  #    slash-skill and the spec path makes it trigger whether pi -p honors the\n  #    /skill: prefix directly or routes via description-match.\n  if ! "$PI_BIN" -p "/skill:joycraft-implement $NEXT"; then\n    echo "\u2717 implement failed for: $NEXT \u2014 stopping (fail-fast)." >&2\n    exit 1\n  fi\n\n  # 3. Fresh process wraps it up (status bump todo\u2192in-review + commit).\n  if ! "$PI_BIN" -p "/skill:joycraft-spec-done $NEXT"; then\n    echo "\u2717 spec-done failed for: $NEXT \u2014 stopping (fail-fast)." >&2\n    exit 1\n  fi\ndone\n',
-  "pi-scripts/joycraft-mark-done": '#!/usr/bin/env bash\n# joycraft-mark-done \u2014 Transition a spec\'s status in .joycraft-spec-queue.json.\n# Usage: joycraft-mark-done <spec-id> [--to <state>] [specs-dir]\n#\n# --to <state> is one of: todo, in-review, done (see\n# docs/reference/spec-status-lifecycle.md). Omitting --to defaults to\n# in-review (the common spec-done case). session-end passes --to done.\n\nset -euo pipefail\n\nSPEC_ID=""\nTO_STATE=""\nSPECS_DIR=""\n\n# Parse args: first positional = spec id, --to <state> anywhere, optional\n# trailing positional = specs dir.\nwhile [ $# -gt 0 ]; do\n  case "$1" in\n    --to)\n      TO_STATE="${2:-}"\n      shift 2\n      ;;\n    *)\n      if [ -z "$SPEC_ID" ]; then\n        SPEC_ID="$1"\n      else\n        SPECS_DIR="$1"\n      fi\n      shift\n      ;;\n  esac\ndone\n\nif [ -z "$SPEC_ID" ]; then\n  echo "Usage: joycraft-mark-done <spec-id> [--to <state>] [specs-dir]" >&2\n  exit 1\nfi\n\n# Default transition is to in-review.\nTO_STATE="${TO_STATE:-in-review}"\n\n# Validate the target state against the exact lowercase set.\ncase "$TO_STATE" in\n  todo|in-review|done) ;;\n  *)\n    echo "Invalid --to value: \'$TO_STATE\' (expected one of: todo, in-review, done)" >&2\n    exit 1\n    ;;\nesac\n\nSPECS_DIR="${SPECS_DIR:-docs/features/*/specs}"\nMANIFEST=$(ls "$SPECS_DIR"/.joycraft-spec-queue.json 2>/dev/null | head -1)\n\nif [ -z "$MANIFEST" ]; then\n  echo "No .joycraft-spec-queue.json found in $SPECS_DIR" >&2\n  exit 1\nfi\n\n# Check spec exists \u2014 hard error, never a silent no-op.\nif ! grep -q "\\"id\\": *$SPEC_ID[,}]" "$MANIFEST"; then\n  echo "Spec #$SPEC_ID not found in manifest" >&2\n  exit 1\nfi\n\n# Replace the matching spec id\'s status \u2014 from ANY current value \u2014 to the\n# requested state, so re-running transitions (e.g. 
in-review \u2192 done) works.\n# Edit via a temp file rather than `sed -i`: in-place editing is non-portable\n# (BSD/macOS needs `-i \'\'`, GNU/Linux rejects it), so we write to a temp file\n# and move it back \u2014 identical behavior on both platforms.\nTMP_MANIFEST="$(mktemp)"\nsed -E "/\\"id\\": *$SPEC_ID[,}]/s/\\"status\\": *\\"[^\\"]*\\"/\\"status\\": \\"$TO_STATE\\"/" "$MANIFEST" > "$TMP_MANIFEST"\nmv "$TMP_MANIFEST" "$MANIFEST"\n\necho "Spec #$SPEC_ID marked $TO_STATE"\n',
-  "pi-scripts/joycraft-next-spec": `#!/usr/bin/env bash
-# joycraft-next-spec \u2014 Find the next uncompleted spec respecting dependency order.
-# Usage: joycraft-next-spec [specs-dir]
-# Outputs: file path of the next spec, or "Pipeline complete" if all done.
-#
-# Status vocabulary (see docs/reference/spec-status-lifecycle.md):
-#   todo \u2192 eligible to serve; in-review / done \u2192 not served.
-#   A dependency is "met" once it reaches in-review OR done (so checkpoint
-#   chains progress without waiting for session-end to graduate to done).
-set -euo pipefail
-SPECS_DIR="\${1:-docs/features/*/specs}"
-# Find the manifest (allow glob to expand; pick most recent if multiple)
-MANIFEST_PATH=""
-for dir in $SPECS_DIR; do
-  candidate="$dir/.joycraft-spec-queue.json"
-  if [ -f "$candidate" ]; then
-    if [ -z "$MANIFEST_PATH" ] || [ "$candidate" -nt "$MANIFEST_PATH" ]; then
-      MANIFEST_PATH="$candidate"
-    fi
-  fi
-done
-if [ -z "$MANIFEST_PATH" ]; then
-  echo "No .joycraft-spec-queue.json found" >&2
-  exit 1
-fi
-SPECS_DIR_REAL=$(dirname "$MANIFEST_PATH")
-MANIFEST="$MANIFEST_PATH"
-TMPDIR=$(mktemp -d)
-trap 'rm -rf "$TMPDIR"' EXIT
-# Extract all spec entries (use process substitution to avoid pipefail+subshell issues)
-while IFS= read -r entry; do
-  id=$(echo "$entry" | sed -n 's/.*"id": *\\([0-9]*\\).*/\\1/p')
-  file=$(echo "$entry" | sed -n 's/.*"file": *"\\([^"]*\\)".*/\\1/p')
-  status=$(echo "$entry" | sed -n 's/.*"status": *"\\([^"]*\\)".*/\\1/p')
-  deps=$(echo "$entry" | sed -n 's/.*"depends_on": *\\[\\([^]]*\\)\\].*/\\1/p')
-  if [ -n "$id" ] && [ -n "$file" ] && [ -n "$status" ]; then
-    echo "$id|$file|$status|$deps" >> "$TMPDIR/specs.txt"
-  fi
-done < <(grep -o '{[^}]*}' "$MANIFEST")
-if [ ! -f "$TMPDIR/specs.txt" ]; then
-  echo "Pipeline complete"
-  exit 0
-fi
-# Build the "satisfied" set: a dependency counts as met once it is in-review OR done.
-while IFS='|' read -r id file status deps; do
-  if [ "$status" = "in-review" ] || [ "$status" = "done" ]; then
-    echo "$id" >> "$TMPDIR/satisfied.txt"
-  fi
-done < "$TMPDIR/specs.txt"
-touch "$TMPDIR/satisfied.txt"
-# Find first todo spec whose deps are all satisfied
-while IFS='|' read -r id file status deps; do
-  if [ "$status" != "todo" ]; then
-    continue
-  fi
-  # Check dependencies
-  all_deps_met=true
-  if [ -n "$(echo "$deps" | tr -d '[:space:]')" ]; then
-    for dep_id in $(echo "$deps" | tr ',' ' '); do
-      if ! grep -q "^$dep_id$" "$TMPDIR/satisfied.txt"; then
-        all_deps_met=false
-        break
-      fi
-    done
-  fi
-  if $all_deps_met; then
-    echo "$SPECS_DIR_REAL/$file"
-    exit 0
-  fi
-done < "$TMPDIR/specs.txt"
-# If we get here, no eligible spec found.
-# \`grep -c\` prints 0 but exits non-zero when there are no matches; \`|| true\`
-# swallows that exit WITHOUT appending a second "0" (which would make
-# $remaining a two-line value and break the integer test below).
-remaining=$(grep -c '|todo|' "$TMPDIR/specs.txt" 2>/dev/null || true)
-if [ "\${remaining:-0}" -gt 0 ]; then
-  echo "All remaining specs blocked \u2014 unmet dependencies" >&2
-  exit 1
-fi
-echo "Pipeline complete"
-`,
-  "pi-scripts/joycraft-session-end": `#!/usr/bin/env bash
-# joycraft-session-end \u2014 Capture discoveries, validate, and stage changes for a Joycraft session.
-# Usage: joycraft-session-end [spec-name]
-set -euo pipefail
-SPEC_NAME="\${1:-unknown}"
-echo "Joycraft Session End"
-echo "\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550"
-echo ""
-# 1. Validate
-echo "\u25BA Running validation..."
-if pnpm test 2>/dev/null && pnpm build 2>/dev/null; then
-  echo "  \u2713 Validation passed"
-else
-  echo "  \u2717 Validation failed \u2014 fix before proceeding"
-  exit 1
-fi
-# 2. Capture discoveries (interactive prompt handled by agent, this just stages)
-DISCOVERIES_DIR="docs/discoveries"
-if [ -d "$DISCOVERIES_DIR" ]; then
-  # Check for uncommitted discoveries
-  if git diff --name-only -- "$DISCOVERIES_DIR" | grep -q .; then
-    echo "\u25BA Discoveries to review in $DISCOVERIES_DIR"
-  fi
-fi
-# 3. Stage all changes
-echo "\u25BA Staging changes..."
-git add -A
-echo ""
-echo "Session complete \u2014 $SPEC_NAME"
-echo "Ready for commit. Run: git commit -m 'spec: $SPEC_NAME'"
-`,
-  "pi-scripts/joycraft-spec-status": `#!/usr/bin/env bash
-# joycraft-spec-status \u2014 Read .joycraft-spec-queue.json and print a formatted status table.
-# Usage: joycraft-spec-status [specs-dir]
-set -euo pipefail
-SPECS_DIR="\${1:-docs/features/*/specs}"
-MANIFEST=$(ls "$SPECS_DIR"/.joycraft-spec-queue.json 2>/dev/null | head -1)
-if [ -z "$MANIFEST" ]; then
-  echo "No .joycraft-spec-queue.json found in $SPECS_DIR"
-  exit 1
-fi
-# Parse JSON with grep+sed (no jq dependency)
-echo "Spec Queue Status"
-echo "\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500"
-echo ""
-# Extract spec entries
-while IFS= read -r line; do
-  # Each spec entry looks like: { "id": 1, "file": "...", "depends_on": [...], "status": "..." }
-  id=$(echo "$line" | sed -n 's/.*"id": *\\([0-9]*\\).*/\\1/p')
-  file=$(echo "$line" | sed -n 's/.*"file": *"\\([^"]*\\)".*/\\1/p')
-  status=$(echo "$line" | sed -n 's/.*"status": *"\\([^"]*\\)".*/\\1/p')
-  if [ -n "$id" ] && [ -n "$file" ] && [ -n "$status" ]; then
-    # Glyphs per docs/reference/spec-status-lifecycle.md
-    case "$status" in
-      done)      marker="[\u2713]" ;;
-      in-review) marker="[~]" ;;
-      *)         marker="[ ]" ;;  # todo (and any unknown) render as not-started
-    esac
-    printf "%s  #%s  %s  (%s)\\n" "$marker" "$id" "$file" "$status"
-  fi
-done < <(grep -o '{[^}]*}' "$MANIFEST" | grep '"id"')
-`,
-  "scenarios/README.md": '# $SCENARIOS_REPO\n\nHoldout scenario tests for the main project. These tests run in CI against the\nbuilt artifact of each PR \u2014 but they live here, in a separate repository, so\nthe coding agent working on the main project cannot see them.\n\n---\n\n## What is the holdout pattern?\n\nThink of it like a validation set in machine learning. When you train a model,\nyou keep a slice of your data hidden from the training process. If the model\nscores well on data it has never seen, you can trust that it has actually\nlearned something \u2014 not just memorized the training examples.\n\nScenario tests work the same way. The coding agent writes code and passes\ninternal tests in the main repo. These scenario tests then check whether the\nresult behaves correctly from a real user\'s perspective, using only the public\ninterface of the built artifact.\n\nBecause the agent cannot read this repository, it cannot game the tests. A\npassing scenario run means the feature genuinely works.\n\n---\n\n## Why a separate repository?\n\nA single repository would expose the tests to the agent. 
Claude Code reads\nfiles in the working directory; if scenario tests lived in the main repo, the\nagent could (and would) read them when fixing failures, which defeats the\npurpose.\n\nA separate repo also means:\n\n- The test suite can be updated by humans without triggering the autofix loop\n- Scenarios can reference multiple projects over time\n- Access controls are independent \u2014 the scenarios repo can be more restricted\n\n---\n\n## How the CI pipeline works\n\n```\nMain repo PR opened\n        |\n        v\nMain repo CI runs (unit + integration tests)\n        |\n        | passes\n        v\nscenarios-dispatch.yml fires a repository_dispatch event\n        |\n        v\nThis repo: run.yml receives the event\n        |\n        +-- clones main-repo PR branch to ../main-repo\n        |\n        +-- builds the artifact (npm ci && npm run build)\n        |\n        +-- runs: NO_COLOR=1 npx vitest run\n        |\n        +-- captures exit code + output\n        |\n        v\nPosts PASS / FAIL comment on the originating PR\n```\n\nThe PR author sees the scenario result as a comment. No separate status check\nis required, though you can add one via the GitHub Checks API if you prefer.\n\n---\n\n## Adding scenarios\n\n### Rules\n\n1. **Behavioral, not structural.** Test what the tool does, not how it is\n   built internally. Invoke the binary; assert on stdout, exit codes, and\n   filesystem state. Never import from `../main-repo/src`.\n\n2. **End-to-end.** Each test should represent something a real user would\n   actually do. If you would not put it in a demo or docs example, reconsider\n   whether it belongs here.\n\n3. **No source imports.** The entire point of the holdout is that tests cannot\n   see source code. Any `import` that reaches into `../main-repo/src` breaks\n   the pattern.\n\n4. **Independent.** Each test must be able to run in isolation. Use `beforeEach`\n   / `afterEach` to set up and tear down temp directories. 
Do not share mutable\n   state between tests.\n\n5. **Deterministic.** Avoid network calls, timestamps, or random values in\n   assertions unless the feature under test genuinely involves them.\n\n### File layout\n\n```\n$SCENARIOS_REPO/\n\u251C\u2500\u2500 example-scenario.test.ts   # Starter file \u2014 replace with real scenarios\n\u251C\u2500\u2500 workflows/\n\u2502   \u2514\u2500\u2500 run.yml                # CI workflow (do not rename)\n\u251C\u2500\u2500 package.json\n\u2514\u2500\u2500 README.md\n```\n\nAdd new `.test.ts` files at the top level or in subdirectories. Vitest will\ndiscover them automatically.\n\n### Example structure\n\n```ts\nimport { spawnSync } from "node:child_process";\nimport { join } from "node:path";\n\nconst CLI = join(__dirname, "..", "main-repo", "dist", "cli.js");\n\nit("init creates a CLAUDE.md file", () => {\n  const tmp = mkdtempSync(join(tmpdir(), "scenario-"));\n  const { status } = spawnSync("node", [CLI, "init", tmp], { encoding: "utf8" });\n  expect(status).toBe(0);\n  expect(existsSync(join(tmp, "CLAUDE.md"))).toBe(true);\n});\n```\n\n---\n\n## Internal tests vs scenario tests\n\n| | Internal tests (main repo) | Scenario tests (this repo) |\n|---|---|---|\n| Location | `tests/` in main repo | This repo |\n| Visible to agent | Yes | No |\n| What they test | Units, modules, logic | End-to-end behavior |\n| Import source code | Yes | Never |\n| Run on every push | Yes | Yes (via dispatch) |\n| Purpose | Catch regressions fast | Validate real behavior |\n\n---\n\n## Relationship to Joycraft\n\nThis repository was bootstrapped by `npx joycraft init --autofix`. Joycraft\nmanages the `run.yml` workflow and keeps it in sync when you run\n`npx joycraft upgrade`. The test files are yours \u2014 Joycraft will never\noverwrite them.\n\nIf the `run.yml` workflow needs updating (e.g., a new version of\n`actions/create-github-app-token`), run `npx joycraft upgrade` in this repo\nand review the diff before applying.\n',
-  "scenarios/example-scenario.test.ts": `/**
+  "scenarios/README.md": '# $SCENARIOS_REPO\n\nHoldout scenario tests for the main project. These tests run in CI against the\nbuilt artifact of each PR \u2014 but they live here, in a separate repository, so\nthe coding agent working on the main project cannot see them.\n\n---\n\n## What is the holdout pattern?\n\nThink of it like a validation set in machine learning. When you train a model,\nyou keep a slice of your data hidden from the training process. If the model\nscores well on data it has never seen, you can trust that it has actually\nlearned something \u2014 not just memorized the training examples.\n\nScenario tests work the same way. The coding agent writes code and passes\ninternal tests in the main repo. These scenario tests then check whether the\nresult behaves correctly from a real user\'s perspective, using only the public\ninterface of the built artifact.\n\nBecause the agent cannot read this repository, it cannot game the tests. A\npassing scenario run means the feature genuinely works.\n\n---\n\n## Why a separate repository?\n\nA single repository would expose the tests to the agent. 
Claude Code reads\nfiles in the working directory; if scenario tests lived in the main repo, the\nagent could (and would) read them when fixing failures, which defeats the\npurpose.\n\nA separate repo also means:\n\n- The test suite can be updated by humans without triggering the autofix loop\n- Scenarios can reference multiple projects over time\n- Access controls are independent \u2014 the scenarios repo can be more restricted\n\n---\n\n## How the CI pipeline works\n\n```\nMain repo PR opened\n        |\n        v\nMain repo CI runs (unit + integration tests)\n        |\n        | passes\n        v\nscenarios-dispatch.yml fires a repository_dispatch event\n        |\n        v\nThis repo: run.yml receives the event\n        |\n        +-- clones main-repo PR branch to ../main-repo\n        |\n        +-- builds the artifact (npm ci && npm run build)\n        |\n        +-- runs: NO_COLOR=1 npx vitest run\n        |\n        +-- captures exit code + output\n        |\n        v\nPosts PASS / FAIL comment on the originating PR\n```\n\nThe PR author sees the scenario result as a comment. No separate status check\nis required, though you can add one via the GitHub Checks API if you prefer.\n\n---\n\n## Adding scenarios\n\n### Rules\n\n1. **Behavioral, not structural.** Test what the tool does, not how it is\n   built internally. Invoke the binary; assert on stdout, exit codes, and\n   filesystem state. Never import from `../main-repo/src`.\n\n2. **End-to-end.** Each test should represent something a real user would\n   actually do. If you would not put it in a demo or docs example, reconsider\n   whether it belongs here.\n\n3. **No source imports.** The entire point of the holdout is that tests cannot\n   see source code. Any `import` that reaches into `../main-repo/src` breaks\n   the pattern.\n\n4. **Independent.** Each test must be able to run in isolation. Use `beforeEach`\n   / `afterEach` to set up and tear down temp directories. 
Do not share mutable\n   state between tests.\n\n5. **Deterministic.** Avoid network calls, timestamps, or random values in\n   assertions unless the feature under test genuinely involves them.\n\n### File layout\n\n```\n$SCENARIOS_REPO/\n\u251C\u2500\u2500 example-scenario.test.ts.template   # Starter \u2014 rename to .test.ts, then edit\n\u251C\u2500\u2500 workflows/\n\u2502   \u2514\u2500\u2500 run.yml                         # CI workflow (do not rename)\n\u251C\u2500\u2500 package.json\n\u2514\u2500\u2500 README.md\n```\n\n> **Rename the starter on first use.** It ships as\n> `example-scenario.test.ts.template` (not `.test.ts`) so it stays out of your\n> *main* project\'s test/lint/build globs when Joycraft scaffolds it there. Once\n> it\'s in this holdout repo, rename it so Vitest discovers it:\n> `mv example-scenario.test.ts.template example-scenario.test.ts`.\n\nAdd new `.test.ts` files at the top level or in subdirectories. Vitest will\ndiscover them automatically.\n\n### Example structure\n\n```ts\nimport { spawnSync } from "node:child_process";\nimport { join } from "node:path";\n\nconst CLI = join(__dirname, "..", "main-repo", "dist", "cli.js");\n\nit("init creates a CLAUDE.md file", () => {\n  const tmp = mkdtempSync(join(tmpdir(), "scenario-"));\n  const { status } = spawnSync("node", [CLI, "init", tmp], { encoding: "utf8" });\n  expect(status).toBe(0);\n  expect(existsSync(join(tmp, "CLAUDE.md"))).toBe(true);\n});\n```\n\n---\n\n## Internal tests vs scenario tests\n\n| | Internal tests (main repo) | Scenario tests (this repo) |\n|---|---|---|\n| Location | `tests/` in main repo | This repo |\n| Visible to agent | Yes | No |\n| What they test | Units, modules, logic | End-to-end behavior |\n| Import source code | Yes | Never |\n| Run on every push | Yes | Yes (via dispatch) |\n| Purpose | Catch regressions fast | Validate real behavior |\n\n---\n\n## Relationship to Joycraft\n\nThis repository was bootstrapped by `npx joycraft init --autofix`. 
Joycraft\nmanages the `run.yml` workflow and keeps it in sync when you run\n`npx joycraft upgrade`. The test files are yours \u2014 Joycraft will never\noverwrite them.\n\nIf the `run.yml` workflow needs updating (e.g., a new version of\n`actions/create-github-app-token`), run `npx joycraft upgrade` in this repo\nand review the diff before applying.\n',
+  "scenarios/example-scenario.test.ts.template": `/**
  * Example Scenario Test
  *
  * This file is a template for scenario tests in your holdout repository.
@@ -446,7 +268,7 @@ var CODEX_SKILLS = {
   "joycraft-design.md": '---\nname: joycraft-design\ndescription: Design discussion before decomposition \u2014 produce a ~200-line design artifact for human review, catching wrong assumptions before they propagate into specs\n---\n\n# Design Discussion\n\nYou are producing a design discussion document for a feature. This sits between research and decomposition \u2014 it captures your understanding so the human can catch wrong assumptions before specs are written.\n\n**Guard clause:** If no brief path is provided and no brief exists at `docs/features/<slug>/brief.md`, say:\n"No feature brief found. Run `$joycraft-new-feature` first to create one, or provide the path to your brief."\nThen stop.\n\n---\n\n## Step 1: Read Inputs\n\nRead the feature brief at the path the user provides. If the user also provides a research document path, read that too. Research is optional \u2014 if none exists, note that you\'ll explore the codebase directly.\n\n## Step 2: Explore the Codebase\n\nSpawn subagents to explore the codebase for patterns relevant to the brief. Focus on:\n\n- Files and functions that will be touched or extended\n- Existing patterns this feature should follow (naming, data flow, error handling)\n- Similar features already implemented that serve as models\n- Boundaries and interfaces the feature must integrate with\n\nGather file paths, function signatures, and code snippets. 
You need concrete evidence, not guesses.\n\n## Step 3: Write the Design Document\n\nDerive the slug from the brief path (`docs/features/<slug>/brief.md`).\nLazy-create the folder `docs/features/<slug>/` if needed.\nWrite the design document to `docs/features/<slug>/design.md`.\n\nThe file MUST start with YAML frontmatter \u2014 the 4-field personal schema:\n\n```yaml\n---\nstatus: active\nowner: <resolved name>\ncreated: YYYY-MM-DD\nfeature: <slug>\n---\n```\n\n**Owner resolution:** look up the owner name in this order \u2014 (1) `git config user.name`, (2) value in your auto-memory `joycraft-owner.txt` if present, (3) ask the user once and persist.\n\nThe document has exactly five sections:\n\n### Section 1: Current State\n\nWhat exists today in the codebase that is relevant to this feature. Include file paths, function signatures, and data flows. Be specific \u2014 reference actual code, not abstractions. If no research doc was provided, note that and describe what you found through direct exploration.\n\n### Section 2: Desired End State\n\nWhat the codebase should look like when this feature is complete. Describe the change at a high level \u2014 new files, modified interfaces, new data flows. Do NOT include implementation steps. This is the "what," not the "how."\n\n### Section 3: Patterns to Follow\n\nExisting patterns in the codebase that this feature should match. Include short code snippets and `file:line` references. Show the pattern, don\'t just name it.\n\nIf this is a greenfield project with no existing patterns, propose conventions and note that no precedent exists.\n\n### Section 4: Resolved Design Decisions\n\nDecisions you have already made, with brief rationale. 
Format each as:\n\n> **Decision:** [what you decided]\n> **Rationale:** [why, referencing existing code or constraints]\n> **Alternative rejected:** [what you considered and why you rejected it]\n\n### Section 5: Open Questions\n\nThings you don\'t know or where multiple valid approaches exist. Each question MUST present 2-3 concrete options with pros and cons. Format:\n\n> **Q: [question]**\n> - **Option A:** [description] \u2014 Pro: [benefit]. Con: [cost].\n> - **Option B:** [description] \u2014 Pro: [benefit]. Con: [cost].\n> - **Option C (if applicable):** [description] \u2014 Pro: [benefit]. Con: [cost].\n\nDo NOT ask vague questions like "what do you think?" Every question must have actionable options the human can choose from.\n\n### Update the Feature Brief\n\nAfter writing the design document, update the parent brief with a back-reference:\n1. Read `docs/features/<slug>/brief.md`\n2. In the header blockquote (the `>` lines at the top), add or update:\n   `> **Design:** docs/features/<slug>/design.md`\n3. If a `> **Design:**` line already exists, replace it \u2014 do NOT add a duplicate\n4. Write the brief back\n\n## Step 4: Reconcile Brief with Findings\n\nYou\'ve just written `docs/features/<slug>/design.md`. Before hand-off, the parent brief at `docs/features/<slug>/brief.md` may now disagree with what you discovered. Re-read it and check each of these sections:\n\n| Brief section | What to look for |\n|---|---|\n| Vision | Did your findings refine or contradict the framing? |\n| Hard Constraints | Are any constraints now obsolete, missing, or refined? |\n| Out of Scope | Did your findings push something in or out of scope? |\n| Decomposition | Are spec counts, names, or dependencies still accurate? |\n| Test Strategy | Do your findings change what or how to test? |\n| Success Criteria | Are the criteria still observable and still match the goal? 
|\n\n**For each section, choose one:**\n\n- **Edit in place** \u2014 small, mechanical updates: line-number corrections, clarifications, additions consistent with brief intent. No user approval needed.\n- **Diff + stop** \u2014 non-trivial changes: counts flipping, decomposition restructure, scope changes, contradiction with original brief intent. Present a diff of the proposed change, STOP, and wait for user approval before continuing.\n\nIf you make changes, note them at the bottom of `design.md` under a "Brief updates" subsection. If the brief is already in sync, note: "Reconciliation checked, no changes required." If no parent brief exists (feature was described inline), note that and skip this step.\n\n**Why this step exists:** the silent-drift gap. Without reconciliation, the brief and downstream artifacts diverge \u2014 and later decomposition is sized against the stale brief. This feature ("single-source-skills") hit exactly this: brief said "11 clean / 9 dirty" until the research re-audit forced a re-decomposition. Don\'t let it happen again.\n\n## Step 5: Present and STOP \u2014 Pre-Approval Hold\n\nPresent the design document to the user. Say:\n\n```\nDesign discussion written to docs/features/<slug>/design.md\n\nPlease review the document above. Specifically:\n1. Are the patterns in Section 3 the right ones to follow, or should I use different ones?\n2. Do you agree with the resolved decisions in Section 4?\n3. Pick an option for each open question in Section 5 (or propose your own).\n\nReply with your feedback. I will NOT proceed to decomposition until you have reviewed and approved this design.\n```\n\n**CRITICAL: Do NOT emit the canonical Handoff block at this point.** The Handoff block emits ONLY after human approval (see "Step 6: Hand Off (Post-Approval Only)" below). 
The entire value of this skill is the pause \u2014 it forces a human checkpoint before mistakes propagate.\n\n## Offer to Capture Deferred Items to Backlog\n\nIf during the design discussion the user mentions deferred work \u2014 "let\'s not do X yet," "save Y for later" \u2014 ASK before writing:\n\n> "This looks like deferred work \u2014 want me to capture it to `docs/backlog/`?"\n\nOnly on user confirmation, write a backlog entry at `docs/backlog/YYYY-MM-DD-<short-name>.md` with backlog frontmatter:\n\n```yaml\n---\nstatus: backlog\nowner: <resolved name>\ncreated: YYYY-MM-DD\nsource: docs/features/<slug>/brief.md\n---\n```\n\n**Never auto-write to `docs/backlog/`.** Every backlog entry is user-confirmed.\n\n## Step 6: Hand Off (Post-Approval Only)\n\nOnce the human approves the design:\n- Update the design document with their corrections and chosen options\n- Move answered questions from "Open Questions" to "Resolved Design Decisions"\n- Present the updated document for final confirmation\n- Once the user gives explicit approval, AND ONLY THEN, emit the canonical Handoff block:\n\n## Recommended Next Steps\n\nNext:\n```bash\n$joycraft-decompose docs/features/<slug>/brief.md\n```\nRun `/clear` in the CLI, or press Cmd+N (Ctrl+N on Windows/Linux) for a new thread in the desktop/IDE app first.\n\nInclude any backlog paths produced as a side effect.\n',
   "joycraft-gather-context.md": "---\nname: joycraft-gather-context\ndescription: First-run onboarding pass that populates the project context layer -- read what context already exists, then offer a gap-only interview and batch-write the missing fact rows and long-form reference docs\n---\n\n# Gather Context\n\nThis is the first-run **read-then-offer** onboarding pass \u2014 the lowest-intervention way to populate the project's context layer. You read what context already exists, summarize coverage, offer a gap-only interview, and write everything in one reviewable batch at the end.\n\nThis skill is self-contained. It composes the same conventions the single-doc skills use, but everything you need is inlined below \u2014 do not call into or import another skill's logic.\n\n## Step 1: Read What Already Exists First\n\nThe user has invoked the first-run onboarding pass (e.g., `$joycraft-gather-context`). Before asking the user anything, scan the project's existing context. Default scan breadth is **README + `docs/` + AGENTS.md only**:\n\n- The README(s) at the repo root and any obvious sub-package READMEs.\n- `docs/**` \u2014 existing design, architecture, or style docs.\n- `docs/context/*` \u2014 the flat operational fact-docs (production-map, dangerous-assumptions, decision-log, institutional-knowledge, troubleshooting) and `docs/context/reference/*` long-form docs.\n- The current AGENTS.md content, including any `## Context Map` section.\n\nThen summarize for the user what context already exists and what's covered.\n\n**Do NOT auto-run a code-inference scan.** Reading the actual source to infer architecture costs significantly more tokens. Offer that deeper/full review ONLY if the user explicitly asks for it, and when you do, note clearly that it costs more tokens. The default pass never reads the codebase to infer context.\n\n## Step 2: Offer a Gap-Only Interview (Don't Force)\n\nFrom the summary, identify genuine gaps: no design-system doc? no production map? 
no decision log? Offer an **optional** interview that targets only those gaps. The user can decline any or all of it \u2014 offer, never force.\n\n**Per-doc skip guard (not all-or-nothing):** Never re-interview for a doc that already has real content. Skip each doc that's already populated individually, and interview only the empty or missing ones. If everything is already covered, say so and offer nothing.\n\n## Step 3: Route by Shape (Inline Test)\n\nFor each thing the user wants to capture, apply this minimal shape test inline \u2014 do not defer to another skill:\n\n- **\"Could this be one row in a table?\"** \u2192 it's an **operational fact**. Route it to one of the five flat fact-docs under `docs/context/`:\n  - `docs/context/production-map.md` \u2014 infrastructure, services, environments, URLs, credentials, safe/unsafe to touch.\n  - `docs/context/dangerous-assumptions.md` \u2014 false assumptions an agent might make.\n  - `docs/context/decision-log.md` \u2014 an architectural/tooling choice and why.\n  - `docs/context/institutional-knowledge.md` \u2014 team conventions, unwritten rules, ownership.\n  - `docs/context/troubleshooting.md` \u2014 when X happens, do Y.\n  Append it as a table row (or list item for institutional-knowledge), removing any italic example rows in that table first.\n\n- **\"Does explaining it take paragraphs?\"** \u2192 it's **long-form reference**. Scaffold `docs/context/reference/<slug>.md` from the matching template in `docs/templates/context/reference/` (`design-system`, `frontend-methodology`, `backend`, `testing`, or the generic `reference-doc` fallback), lazy-creating `docs/context/reference/` on first write.\n\nIf an item is ambiguous, apply the test literally: one row \u2192 fact bucket; paragraphs \u2192 reference doc.\n\n## Step 4: Batch-Write + One Final Confirm\n\nDo NOT write per-answer. Collect ALL of the user's gap answers across the whole interview first. Then, in ONE batch:\n\n1. 
Write all the fact rows into their fact-docs.\n2. Scaffold and write all the reference docs into `docs/context/reference/`.\n3. Add or update the `## Context Map` pointer rows in AGENTS.md \u2014 one row per reference doc, in the form `| docs/context/reference/<slug>.md | <when to read it> |`. Create the `## Context Map` section (header + two-column table) if it doesn't exist; update an existing row in place rather than duplicating it.\n\nPresent the full set of intended changes and get ONE final confirm (\"do it in one go\") before writing. If the user aborts at the final confirm, write nothing \u2014 there are no partial writes in this batch model. The result is one clean, reviewable diff.\n\n## Step 5: Confirm and Hand Off\n\nReport the batch: which fact rows were added, which reference docs were scaffolded, and which Context Map rows were created or updated. Then end with the canonical Handoff block.\n\n## Recommended Next Steps\n\nNext:\n```bash\n$joycraft-session-end\n```\nRun `/clear` in the CLI, or press Cmd+N (Ctrl+N on Windows/Linux) for a new thread in the desktop/IDE app first.\n",
   "joycraft-implement-feature.md": "---\nname: joycraft-implement-feature\ndescription: Run a feature's entire spec queue from one invocation \u2014 sequential chain with per-spec wrap-up, fail-fast, session-end once at the end\n---\n\n# Implement Feature (Whole-Queue Driver)\n\nOne invocation runs a feature's whole spec queue: `$joycraft-implement-feature docs/features/<slug>/`. You drive the queue **sequentially in this conversation** \u2014 Codex has no subagent boundary to give each spec a fresh context, so the chain shares context and compensates with disciplined per-spec wrap-ups. This is ordinary interactive use \u2014 one human invocation, no headless loop, no ToS/cost caveat.\n\n> **Context honesty:** for queues of heavy `isolated`-mode specs, a shared-context chain is the wrong tool \u2014 true per-spec isolation comes from Pi's `joycraft-implement-loop` (fresh process per spec) or guided-manual (`/new` + re-invoke per spec). Say so up front when you see a queue dominated by `isolated` specs, then proceed only if the user confirms.\n\n## Step 1: Load the Queue\n\n1. Resolve the specs directory: if the given path contains a `specs/` subdirectory, use it; otherwise use the path itself. Look for `.joycraft-spec-queue.json` there.\n2. **No queue** \u2192 stop:\n\n   > No spec queue found in [path]. Run `$joycraft-decompose` first \u2014 it writes the queue, the specs, and the wave plan.\n\n3. Read the sibling `README.md` (the wave plan) for the intended order. Waves marked parallel-safe still run sequentially here \u2014 parallelism needs isolation this harness chain doesn't have.\n4. Report the plan before starting: feature slug, M specs, current statuses, the order you'll run them in. If the queue is dominated by `isolated` specs, surface the context-honesty note above and get a confirmation.\n5. If **no `todo` specs remain**, skip to Step 4 and say why.\n\n## Step 2: The Chain \u2014 One Spec at a Time\n\nRepeat until no `todo` specs remain:\n\n1. 
**Find the next ready spec**: the first `todo` whose `depends_on` are all `in-review`/`done` (read the queue JSON).\n2. **None ready but `todo` specs remain** \u2192 fail-fast (Step 3): report which specs are blocked and on what.\n3. **Execute the spec** by following `.agents/skills/joycraft-implement/SKILL.md` end to end \u2014 strict TDD (failing tests first, implement until green, every Acceptance Criterion met), then its per-spec wrap-up: bump to `in-review` in BOTH the queue JSON and the spec's frontmatter, terse discovery stub only if surprised, commit `spec: <spec-name>`. (Treat `isolated` specs the user approved into this chain as `checkpoint`.)\n4. **Verify before advancing**: queue shows `in-review`, `git log` shows the `spec:` commit, tests green. Anything off \u2192 fail-fast (Step 3).\n5. Report one line \u2014 `Spec complete: <name> (spec N of M)` \u2014 and continue.\n\n## Step 3: Fail-Fast\n\nWhen a spec fails (tests not green, or all remaining specs are blocked):\n\n- **Stop the chain.** Start no further specs.\n- Report: which spec failed and why, what reached `in-review`, what remains `todo`. Leave the queue exactly as it is \u2014 never mark anything to cover a failure.\n- Suggest the recovery path: fix in a fresh conversation (`/new`, then `$joycraft-implement <failed-spec>`), then re-run `$joycraft-implement-feature` for the remainder.\n\n## Step 4: Finish \u2014 Session-End Once\n\nWhen no `todo` specs remain, run the once-per-feature finisher yourself: read and follow `.agents/skills/joycraft-session-end/SKILL.md`. 
It owns the gates the chain deliberately skipped: full validation (must pass before anything graduates `in-review \u2192 done`), discovery consolidation, and push/PR per the project's AGENTS.md git autonomy rules.\n\n## Final Report\n\n```\nFeature run: <slug>\n- Specs completed: N of M (now in-review/done) \xB7 failures: [none | <spec> \u2014 <reason>]\n- Session-end: [ran \u2014 see its report | skipped: <reason>]\n- Discoveries: [n stubs consolidated | none]\n```\n",
-  "joycraft-implement-level5.md": "---\nname: joycraft-implement-level5\ndescription: Set up Level 5 autonomous development \u2014 autofix loop, holdout scenario testing, and scenario evolution from specs\n---\n\n# Implement Level 5 \u2014 Autonomous Development Loop\n\nYou are guiding the user through setting up Level 5: the autonomous feedback loop where specs go in, validated software comes out. This is a one-time setup that installs workflows, creates a scenarios repo, and configures the autofix loop.\n\n## Before You Begin\n\nCheck prerequisites:\n\n1. **Project must be initialized.** Look for `docs/.joycraft/state.json` (older installs may still have it at the legacy `.claude/.joycraft/state.json` or a `.joycraft-version` at the repo root). If none exist, tell the user to run `npx joycraft init` first.\n2. **Project should be at Level 4.** Check `docs$joycraft-assessment.md` if it exists. If the project hasn't been assessed yet, suggest running `$joycraft-tune` first. But don't block \u2014 the user may know they're ready.\n3. **Git repo with GitHub remote.** This setup requires GitHub Actions. Check for `.git/` and a GitHub remote.\n\nIf prerequisites aren't met, explain what's needed and stop.\n\n## Step 1: Explain What Level 5 Means\n\nTell the user:\n\n> Level 5 is the autonomous loop. When you push specs, three things happen automatically:\n>\n> 1. **Scenario evolution** \u2014 A separate AI agent reads your specs and writes holdout tests in a private scenarios repo. These tests are invisible to your coding agent.\n> 2. **Autofix** \u2014 When CI fails on a PR, Claude Code automatically attempts a fix (up to 3 times).\n> 3. **Holdout validation** \u2014 When CI passes, your scenarios repo runs behavioral tests against the PR. Results post as PR comments.\n>\n> The key insight: your coding agent never sees the scenario tests. 
This prevents it from gaming the test suite \u2014 like a validation set in machine learning.\n\n## Step 2: Gather Configuration\n\nAsk these questions **one at a time**:\n\n### Question 1: Scenarios repo name\n\n> What should we call your scenarios repo? It'll be a private repo that holds your holdout tests.\n>\n> Default: `{current-repo-name}-scenarios`\n\nAccept the default or the user's choice.\n\n### Question 2: GitHub App\n\n> Level 5 needs a GitHub App to provide a separate identity for autofix pushes (this avoids GitHub's anti-recursion protection). Creating one takes about 2 minutes:\n>\n> 1. Go to https://github.com/settings/apps/new\n> 2. Give it a name (e.g., \"My Project Autofix\")\n> 3. Uncheck \"Webhook > Active\" (not needed)\n> 4. Under **Repository permissions**, set:\n>    - **Contents**: Read & Write\n>    - **Pull requests**: Read & Write\n>    - **Actions**: Read & Write\n> 5. Click **Create GitHub App**\n> 6. Note the **App ID** from the settings page\n> 7. Scroll to **Private keys** > click **Generate a private key** > save the `.pem` file\n> 8. Click **Install App** in the left sidebar > install it on your repo\n>\n> What's your App ID?\n\n## Step 3: Run init-autofix\n\nRun the CLI command with the gathered configuration:\n\n```bash\nnpx joycraft init-autofix --scenarios-repo {name} --app-id {id}\n```\n\nReview the output with the user. 
Confirm files were created.\n\n## Step 4: Walk Through Secret Configuration\n\nGuide the user step by step:\n\n### 4a: Add Secrets to Main Repo\n\n> You should already have the `.pem` file from when you created the app in Step 2.\n\n> Go to your repo's Settings > Secrets and variables > Actions, and add:\n> - `JOYCRAFT_APP_PRIVATE_KEY` \u2014 paste the contents of your `.pem` file\n> - `ANTHROPIC_API_KEY` \u2014 your Anthropic API key\n\n### 4b: Create the Scenarios Repo\n\n> Create the private scenarios repo:\n> ```bash\n> gh repo create {scenarios-repo-name} --private\n> ```\n>\n> Then copy the scenario templates into it:\n> ```bash\n> cp -r docs/templates/scenarios/* ../{scenarios-repo-name}/\n> cd ../{scenarios-repo-name}\n> git add -A && git commit -m \"init: scaffold scenarios repo from Joycraft\"\n> git push\n> ```\n\n### 4c: Add Secrets to Scenarios Repo\n\n> The scenarios repo also needs the App private key:\n> - `JOYCRAFT_APP_PRIVATE_KEY` \u2014 same `.pem` file as the main repo\n> - `ANTHROPIC_API_KEY` \u2014 same key (needed for scenario generation)\n\n## Step 5: Verify Setup\n\nHelp the user verify everything is wired correctly:\n\n1. **Check workflow files exist:** `ls .github/workflows/autofix.yml .github/workflows/scenarios-dispatch.yml .github/workflows/spec-dispatch.yml .github/workflows/scenarios-rerun.yml`\n2. **Check scenario templates were copied:** Verify the scenarios repo has `example-scenario.test.ts`, `workflows/run.yml`, `workflows/generate.yml`, `prompts/scenario-agent.md`\n3. 
**Check the App ID is correct** in the workflow files (not still a placeholder)\n\n## Step 6: Update AGENTS.md\n\nIf the project's AGENTS.md doesn't already have an \"External Validation\" section, add one:\n\n> ## External Validation\n>\n> This project uses holdout scenario tests in a separate private repo.\n>\n> ### NEVER\n> - Access, read, or reference the scenarios repo\n> - Mention scenario test names or contents\n> - Modify the scenarios dispatch workflow to leak test information\n>\n> The scenarios repo is deliberately invisible to you. This is the holdout guarantee.\n\n## Step 7: First Test (Optional)\n\nIf the user wants to test the loop:\n\n> Want to do a quick test? Here's how:\n>\n> 1. Write a simple spec in `docs/features/<slug>/specs/` and push to main \u2014 this triggers scenario generation\n> 2. Create a PR with a small change \u2014 when CI passes, scenarios will run\n> 3. Watch for the scenario test results as a PR comment\n>\n> Or deliberately break something in a PR to test the autofix loop.\n\n## Step 8: Summary\n\nPrint a summary of what was set up:\n\n> **Level 5 is live.** Here's what's running:\n>\n> | Trigger | What Happens |\n> |---------|-------------|\n> | Push specs to `docs/features/<slug>/specs/` | Scenario agent writes holdout tests |\n> | PR fails CI | Claude autofix attempts (up to 3x) |\n> | PR passes CI | Holdout scenarios run against PR |\n> | Scenarios update | Open PRs re-tested with latest scenarios |\n>\n> Your scenarios repo: `{name}`\n> Your coding agent cannot see those tests. The holdout wall is intact.\n\n**Important:** Tell the user:\n\n> **Before you can test the loop**, you need to merge this PR to main first. GitHub's `workflow_run` triggers only activate for workflows that exist on the default branch. 
Once merged, create a new PR with any small change \u2014 that's when you'll see Autofix, Scenarios Dispatch, and Spec Dispatch fire for the first time.\n\nUpdate `docs$joycraft-assessment.md` if it exists \u2014 set the Level 5 score to reflect the new setup.\n",
+  "joycraft-implement-level5.md": "---\nname: joycraft-implement-level5\ndescription: Set up Level 5 autonomous development \u2014 autofix loop, holdout scenario testing, and scenario evolution from specs\n---\n\n# Implement Level 5 \u2014 Autonomous Development Loop\n\nYou are guiding the user through setting up Level 5: the autonomous feedback loop where specs go in, validated software comes out. This is a one-time setup that installs workflows, creates a scenarios repo, and configures the autofix loop.\n\n## Before You Begin\n\nCheck prerequisites:\n\n1. **Project must be initialized.** Look for `docs/.joycraft/state.json` (older installs may still have it at the legacy `.claude/.joycraft/state.json` or a `.joycraft-version` at the repo root). If none exist, tell the user to run `npx joycraft init` first.\n2. **Project should be at Level 4.** Check `docs$joycraft-assessment.md` if it exists. If the project hasn't been assessed yet, suggest running `$joycraft-tune` first. But don't block \u2014 the user may know they're ready.\n3. **Git repo with GitHub remote.** This setup requires GitHub Actions. Check for `.git/` and a GitHub remote.\n\nIf prerequisites aren't met, explain what's needed and stop.\n\n## Step 1: Explain What Level 5 Means\n\nTell the user:\n\n> Level 5 is the autonomous loop. When you push specs, three things happen automatically:\n>\n> 1. **Scenario evolution** \u2014 A separate AI agent reads your specs and writes holdout tests in a private scenarios repo. These tests are invisible to your coding agent.\n> 2. **Autofix** \u2014 When CI fails on a PR, Claude Code automatically attempts a fix (up to 3 times).\n> 3. **Holdout validation** \u2014 When CI passes, your scenarios repo runs behavioral tests against the PR. Results post as PR comments.\n>\n> The key insight: your coding agent never sees the scenario tests. 
This prevents it from gaming the test suite \u2014 like a validation set in machine learning.\n\n## Step 2: Gather Configuration\n\nAsk these questions **one at a time**:\n\n### Question 1: Scenarios repo name\n\n> What should we call your scenarios repo? It'll be a private repo that holds your holdout tests.\n>\n> Default: `{current-repo-name}-scenarios`\n\nAccept the default or the user's choice.\n\n### Question 2: GitHub App\n\n> Level 5 needs a GitHub App to provide a separate identity for autofix pushes (this avoids GitHub's anti-recursion protection). Creating one takes about 2 minutes:\n>\n> 1. Go to https://github.com/settings/apps/new\n> 2. Give it a name (e.g., \"My Project Autofix\")\n> 3. Uncheck \"Webhook > Active\" (not needed)\n> 4. Under **Repository permissions**, set:\n>    - **Contents**: Read & Write\n>    - **Pull requests**: Read & Write\n>    - **Actions**: Read & Write\n> 5. Click **Create GitHub App**\n> 6. Note the **App ID** from the settings page\n> 7. Scroll to **Private keys** > click **Generate a private key** > save the `.pem` file\n> 8. Click **Install App** in the left sidebar > install it on your repo\n>\n> What's your App ID?\n\n## Step 3: Run init-autofix\n\nRun the CLI command with the gathered configuration:\n\n```bash\nnpx joycraft init-autofix --scenarios-repo {name} --app-id {id}\n```\n\nReview the output with the user. 
Confirm files were created.\n\n## Step 4: Walk Through Secret Configuration\n\nGuide the user step by step:\n\n### 4a: Add Secrets to Main Repo\n\n> You should already have the `.pem` file from when you created the app in Step 2.\n\n> Go to your repo's Settings > Secrets and variables > Actions, and add:\n> - `JOYCRAFT_APP_PRIVATE_KEY` \u2014 paste the contents of your `.pem` file\n> - `ANTHROPIC_API_KEY` \u2014 your Anthropic API key\n\n### 4b: Create the Scenarios Repo\n\n> Create the private scenarios repo:\n> ```bash\n> gh repo create {scenarios-repo-name} --private\n> ```\n>\n> Then copy the scenario templates into it. The starter ships as\n> `example-scenario.test.ts.template` (the `.template` suffix keeps it out of\n> the *main* project's test/lint/build globs); rename it to `.test.ts` once it's\n> in the holdout repo so Vitest discovers it:\n> ```bash\n> cp -r docs/templates/scenarios/* ../{scenarios-repo-name}/\n> cd ../{scenarios-repo-name}\n> mv example-scenario.test.ts.template example-scenario.test.ts\n> git add -A && git commit -m \"init: scaffold scenarios repo from Joycraft\"\n> git push\n> ```\n\n### 4c: Add Secrets to Scenarios Repo\n\n> The scenarios repo also needs the App private key:\n> - `JOYCRAFT_APP_PRIVATE_KEY` \u2014 same `.pem` file as the main repo\n> - `ANTHROPIC_API_KEY` \u2014 same key (needed for scenario generation)\n\n## Step 5: Verify Setup\n\nHelp the user verify everything is wired correctly:\n\n1. **Check workflow files exist:** `ls .github/workflows/autofix.yml .github/workflows/scenarios-dispatch.yml .github/workflows/spec-dispatch.yml .github/workflows/scenarios-rerun.yml`\n2. **Check scenario templates were copied:** Verify the scenarios repo has `example-scenario.test.ts` (renamed from the `.template` starter), `workflows/run.yml`, `workflows/generate.yml`, `prompts/scenario-agent.md`\n3. 
**Check the App ID is correct** in the workflow files (not still a placeholder)\n\n## Step 6: Update AGENTS.md\n\nIf the project's AGENTS.md doesn't already have an \"External Validation\" section, add one:\n\n> ## External Validation\n>\n> This project uses holdout scenario tests in a separate private repo.\n>\n> ### NEVER\n> - Access, read, or reference the scenarios repo\n> - Mention scenario test names or contents\n> - Modify the scenarios dispatch workflow to leak test information\n>\n> The scenarios repo is deliberately invisible to you. This is the holdout guarantee.\n\n## Step 7: First Test (Optional)\n\nIf the user wants to test the loop:\n\n> Want to do a quick test? Here's how:\n>\n> 1. Write a simple spec in `docs/features/<slug>/specs/` and push to main \u2014 this triggers scenario generation\n> 2. Create a PR with a small change \u2014 when CI passes, scenarios will run\n> 3. Watch for the scenario test results as a PR comment\n>\n> Or deliberately break something in a PR to test the autofix loop.\n\n## Step 8: Summary\n\nPrint a summary of what was set up:\n\n> **Level 5 is live.** Here's what's running:\n>\n> | Trigger | What Happens |\n> |---------|-------------|\n> | Push specs to `docs/features/<slug>/specs/` | Scenario agent writes holdout tests |\n> | PR fails CI | Claude autofix attempts (up to 3x) |\n> | PR passes CI | Holdout scenarios run against PR |\n> | Scenarios update | Open PRs re-tested with latest scenarios |\n>\n> Your scenarios repo: `{name}`\n> Your coding agent cannot see those tests. The holdout wall is intact.\n\n**Important:** Tell the user:\n\n> **Before you can test the loop**, you need to merge this PR to main first. GitHub's `workflow_run` triggers only activate for workflows that exist on the default branch. 
Once merged, create a new PR with any small change \u2014 that's when you'll see Autofix, Scenarios Dispatch, and Spec Dispatch fire for the first time.\n\nUpdate `docs$joycraft-assessment.md` if it exists \u2014 set the Level 5 score to reflect the new setup.\n",
   "joycraft-implement.md": "---\nname: joycraft-implement\ndescription: Execute atomic specs with TDD \u2014 read spec, write failing tests, implement until green, wrap up and continue the queue\n---\n\n# Implement Atomic Spec\n\nYou have exactly one atomic spec file to execute. Your job is to implement it using strict TDD \u2014 tests first, confirm they fail, then implement until green.\n\n## Step 1: Parse Arguments\n\nThe user MUST provide a path. No path = stop immediately.\n\n**If no path was provided:**\n\n> No spec path provided. Provide a spec file or a feature directory:\n> `$joycraft-implement docs/features/<slug>/specs/spec-name.md`\n> or `$joycraft-implement docs/features/<slug>/`\n\n**If the path is a directory** (ends with `/` or does not end with `.md`):\n\nLook for `specs/.joycraft-spec-queue.json` inside that directory. Read it. Find the **first `todo` spec whose dependencies are satisfied** (a dependency is satisfied once it is `in-review` or `done`). This matches what `joycraft-next-spec` serves. That single spec file is your target. Do NOT read any other specs.\n\n> Using spec queue: found [spec-file-name] as the next spec.\n\nIf the directory has no queue or no `todo` specs:\n\n> No remaining specs found in [directory].\n\n**If the path is a file** ending in `.md`:\n\nUse it directly as the spec to implement.\n\n## Step 2: Read the Sibling README.md FIRST (if present)\n\nBefore reading the spec itself, check for a sibling `README.md` in the same folder as the spec \u2014 i.e., `<spec-path>/../README.md`. This file is the wave-plan + spec-table that `$joycraft-decompose` writes per feature.\n\n- **If present:** Read the README first. It tells you the spec's position in the wave plan, its dependencies, and which sibling specs (in the same folder) need to be done before this one.\n- **If absent:** That's fine \u2014 proceed normally. 
The convention is forward-only and many legacy spec folders pre-date it.\n\n### Warn on Unmet Dependencies\n\nIf the README shows that this spec depends on other specs in the same folder, check whether those dependencies are satisfied. A dependency is satisfied once its frontmatter `status:` is `in-review` or `done` (see `docs/reference/spec-status-lifecycle.md`) \u2014 a checkpoint chain progresses on `in-review` without waiting for session-end to graduate it to `done`. A dependency still at `todo` is unmet.\n\nIf any dependency is **not** complete, tell the user:\n\n> \"This spec lists unmet dependencies in the sibling README.md: [list]. Proceed anyway, or stop?\"\n\nWait for confirmation before continuing. The user might be deliberately running out of order (a hotfix, an exploration, etc.) \u2014 your job is to surface the warning, not to gate.\n\n## Step 3: Read and Understand the Spec\n\n1. **Read the spec file.** The spec is your execution contract \u2014 the Acceptance Criteria and Test Plan define \"done.\"\n2. **Check the spec's Status field.** If it says \"Complete,\" warn the user and ask if they want to re-implement or skip.\n3. **Read the Acceptance Criteria** \u2014 these are your success conditions.\n4. **Read the Test Plan** \u2014 this tells you exactly what tests to write and in what order.\n5. **Read the Constraints** \u2014 these are hard boundaries you must not violate.\n\n### Finding Additional Context\n\nSpecs are designed to be self-contained, but if you need more context:\n\n- **Parent brief:** Linked in the spec's body (`> **Parent Brief:**` line). The new convention is `docs/features/<slug>/brief.md`. Read it for broader feature context.\n- **Related specs:** Live in the same directory (typically `docs/features/<slug>/specs/`). 
The sibling `README.md` (read in Step 2 above) is the index.\n- **Affected Files:** The spec's Affected Files table tells you which files to create or modify.\n\n\n### Before writing code against an external API:\n\n\u26A0\uFE0F If the spec references a third-party SDK or package, read its official documentation and type definitions FIRST. Never write a `declare module` stub for a package that actually exists \u2014 use the real package as a devDependency instead. The stub will make typecheck pass but the code will fail at runtime.\n\n## Step 4: Execute the TDD Cycle\n\n**This is not optional. Write tests FIRST.**\n\n### 4a. Write Tests (Red Phase)\n\nUsing the spec's Test Plan:\n\n1. Write ALL tests listed in the Test Plan. Each Acceptance Criterion must have at least one test.\n2. Tests should call the actual function/endpoint \u2014 not a reimplementation or mock of the underlying library.\n3. Run the tests. **They MUST fail.** If any test passes immediately:\n   - Flag it \u2014 either the test isn't testing the right thing, or the code already exists.\n   - Investigate before proceeding. A test that passes before implementation is a test that proves nothing.\n\n### 4b. Implement (Green Phase)\n\n1. Follow the spec's Approach section for implementation strategy.\n2. Implement the minimum code needed to make tests pass.\n3. Run tests after each meaningful change \u2014 use the spec's Smoke Test for fast feedback.\n4. Continue until ALL tests pass.\n\n### 4c. Verify Acceptance Criteria\n\nWalk through every Acceptance Criterion in the spec:\n\n- [ ] Is each one met?\n- [ ] Does the build pass?\n- [ ] Do all tests pass?\n\nIf any criterion is not met, keep implementing. Do not move on until all criteria are green.\n\n## Step 5: Handle Edge Cases\n\nCheck the spec's Edge Cases table. 
For each scenario:\n\n- Verify the expected behavior is handled.\n- If the spec says \"warn the user\" or \"prompt,\" make sure that path works.\n\n## Step 6: Wrap Up and Continue (mode-aware \u2014 do the wrap-up yourself)\n\nWhen the spec is implemented and all its tests pass, wrap up and advance according to the spec's **execution mode**. Read the `mode:` field from the spec's frontmatter (written by `joycraft-decompose`). If the spec has **no `mode:` field**, default to **`batch`** (back-compat with pre-mode specs). If the value is unrecognized, treat it as `batch` and note the unrecognized value.\n\n**You perform the wrap-up. You find the next spec. Do not stop to tell the human to run `$joycraft-spec-done` or to paste the next file path \u2014 those hand-backs carry zero information and break the feature's momentum.**\n\n### 6a. Per-spec wrap-up\n\n| Spec `mode:` | Wrap-up you perform now |\n|--------------|------------------------|\n| **batch** | **Status bump only**: set the spec to `in-review` in both systems (see below). No commit, no discovery stub \u2014 batch wraps once at feature end. (The bump is required: the queue treats a dependency as satisfied at `in-review`, so without it dependent specs would look blocked.) |\n| **checkpoint** / **isolated** | The full `joycraft-spec-done` wrap-up, performed by you (canonical definition: `.agents/skills/joycraft-spec-done/SKILL.md`): **(1)** bump status to `in-review` in both systems, **(2)** terse 2-line discovery stub at `docs/discoveries/YYYY-MM-DD-topic.md` ONLY if something contradicted the spec \u2014 usually skip, **(3)** commit `spec: <spec-name>` (implementation + status edits + stub, nothing unrelated), **(4)** no validation re-run, no push, no PR \u2014 those belong to `joycraft-session-end`. 
|\n\n**Both systems** means: the queue JSON (`joycraft-mark-done <spec-id> --to in-review <specs-dir>` if `.pi/scripts/joycraft/` is installed, else edit `.joycraft-spec-queue.json` directly) AND the spec file's `status:` frontmatter. Never `done` \u2014 the agent doesn't self-certify (`docs/reference/spec-status-lifecycle.md`).\n\n### 6b. Continue the queue (batch and checkpoint)\n\nRe-read `.joycraft-spec-queue.json` in the spec's directory and find the next `todo` spec whose dependencies are all `in-review`/`done` (same rule as Step 1). Then:\n\n- **Next ready spec exists** \u2192 announce one line \u2014 `Continuing: <next-spec> (spec N of M)` \u2014 and go back to Step 2 with it, in this same conversation.\n- **Remaining `todo` specs are all blocked** \u2192 stop and report which specs are blocked and on what.\n- **No `todo` specs remain** \u2192 this was the feature's last spec; go to 6d.\n- **No queue** (you were invoked with a bare spec file outside a queue) \u2192 report the spec complete and stop; there is nothing to continue from.\n\n### 6c. isolated \u2014 fresh context per spec\n\nA conversation cannot clear its own context, so after the wrap-up the fresh context comes from outside:\n\n- **Driver (recommended):** `$joycraft-implement-feature docs/features/<slug>/` runs the remaining queue with a fresh-context subagent per spec \u2014 in-session, interactive, no headless loop.\n- **Guided-manual:** tell the human to run `/clear` in the CLI, or press Cmd+N (Ctrl+N on Windows/Linux) for a new thread in the desktop/IDE app, then re-invoke `$joycraft-implement <next-spec>`. (Always fine, no ToS/cost surprise.)\n- **Pi:** the `joycraft-implement-loop` driver automates it \u2014 a fresh `pi -p` process per spec. Nothing for you to do beyond the wrap-up; the loop advances.\n- **Headless (`claude -p` / `codex exec` loop):** opt-in only. 
**Surface the caveat, don't bury it:** unattended headless loops draw metered, full-rate API usage and carry a ToS posture the user must **knowingly opt into** (Anthropic meters `claude -p` from a separate full-rate pool; routing subscription OAuth through third-party harnesses is prohibited). The responsible default is Pi (BYO API key / open weights). Do not silently auto-run a subscription-backed headless loop.\n\n### 6d. Feature's last spec (any mode)\n\nRun the once-per-feature finisher yourself: invoke `$joycraft-session-end` (or read and follow `.agents/skills/joycraft-session-end/SKILL.md`). It carries its own gates \u2014 validation is mandatory and must pass before specs graduate `in-review \u2192 done`, and push/PR honor the project's AGENTS.md git autonomy rules \u2014 so running it automatically is safe.\n\n### Report\n\nAfter each spec's wrap-up, report tersely before continuing:\n\n```\nSpec complete: [spec name] \xB7 mode: [mode] \xB7 tests: [N] passing \xB7 [wrapped up + committed | status bumped (batch)]\n[Continuing: <next-spec> (spec N of M) | Feature complete \u2014 running session-end | Blocked: <specs + reasons>]\n```\n",
   "joycraft-interview.md": '---\nname: joycraft-interview\ndescription: Brainstorm freely about what you want to build \u2014 yap, explore ideas, and get a structured summary you can use later\n---\n\n# Interview \u2014 Idea Exploration\n\nYou are helping the user brainstorm and explore what they want to build. This is a lightweight, low-pressure conversation \u2014 not a formal spec process. Let them yap.\n\n## How to Run the Interview\n\n### 1. Open the Floor\n\nStart with something like:\n"What are you thinking about building? Just talk \u2014 I\'ll listen and ask questions as we go."\n\nLet the user talk freely. Do not interrupt their flow. Do not push toward structure yet.\n\n### 2. Ask Clarifying Questions\n\nAs they talk, weave in questions naturally \u2014 don\'t fire them all at once:\n\n- **What problem does this solve?** Who feels the pain today?\n- **What does "done" look like?** If this worked perfectly, what would a user see?\n- **What are the constraints?** Time, tech, team, budget \u2014 what boxes are we in?\n- **What\'s NOT in scope?** What\'s tempting but should be deferred?\n- **What are the edge cases?** What could go wrong? What\'s the weird input?\n- **What exists already?** Are we building on something or starting fresh?\n\n### 3. Play Back Understanding\n\nAfter the user has gotten their ideas out, reflect back:\n"So if I\'m hearing you right, you want to [summary]. The core problem is [X], and done looks like [Y]. Is that right?"\n\nLet them correct and refine. Iterate until they say "yes, that\'s it."\n\n### 4. Write a Draft Brief\n\nDerive a slug `YYYY-MM-DD-<topic>` (today\'s date + kebab-case topic \u2014 no `-draft` suffix).\nCreate a draft file at `docs/features/<slug>/brief.md`. 
Lazy-create `docs/features/<slug>/` if it doesn\'t exist.\n\nThe file MUST start with YAML frontmatter \u2014 the 4-field personal schema with `status: draft`:\n\n```yaml\n---\nstatus: draft\nowner: <resolved name>\ncreated: YYYY-MM-DD\nfeature: <slug>\n---\n```\n\n**Owner resolution:** look up the owner name in this order \u2014 (1) `git config user.name`, (2) value in your auto-memory `joycraft-owner.txt` if present, (3) ask the user once and persist. If you can\'t get a name, leave the field as `<resolved name>` and note it for the user.\n\nUse this format for the body:\n\n```markdown\n# [Topic] \u2014 Draft Brief\n\n> **Date:** YYYY-MM-DD\n> **Origin:** $joycraft-interview session\n\n---\n\n## The Idea\n[2-3 paragraphs capturing what the user described \u2014 their words, their framing]\n\n## Problem\n[What pain or gap this addresses]\n\n## What "Done" Looks Like\n[The user\'s description of success \u2014 observable outcomes]\n\n## Constraints\n- [constraint 1]\n- [constraint 2]\n\n## Open Questions\n- [things that came up but weren\'t resolved]\n- [decisions that need more thought]\n\n## Out of Scope (for now)\n- [things explicitly deferred \u2014 see also: deferred work goes to `docs/backlog/`]\n\n## Raw Notes\n[Any additional context, quotes, or tangents worth preserving]\n```\n\n### 5. Offer to Capture Deferred Items to Backlog\n\nIf during the conversation deferred work surfaces (a tangent, a "later" item, a "out-of-scope but tempting" idea), ASK the user:\n\n> "This looks like deferred work \u2014 want me to capture it to `docs/backlog/`?"\n\nOnly on user confirmation, write a backlog entry at `docs/backlog/YYYY-MM-DD-<short-name>.md` with backlog frontmatter:\n\n```yaml\n---\nstatus: backlog\nowner: <resolved name>\ncreated: YYYY-MM-DD\nsource: docs/features/<slug>/brief.md\n---\n```\n\n**Never auto-write to `docs/backlog/`.** Every backlog entry is user-confirmed.\n\n### 6. 
Hand Off\n\nAfter writing the draft (and any backlog entries), present the canonical Handoff block.\nInclude any backlog paths produced as a side effect.\n\n## Recommended Next Steps\n\nNext:\n```bash\n$joycraft-new-feature docs/features/<slug>/brief.md\n```\nRun `/clear` in the CLI, or press Cmd+N (Ctrl+N on Windows/Linux) for a new thread in the desktop/IDE app first.\n\nIf the idea sounds complex \u2014 touches many files, involves architectural decisions, or the user is working in an unfamiliar area \u2014 nudge them toward research and design (e.g., `$joycraft-research` then `$joycraft-design`). But present it as a recommendation, not a gate.\n\n## Guidelines\n\n- **This is NOT $joycraft-new-feature.** Do not push toward formal briefs, decomposition tables, or atomic specs. The point is exploration.\n- **Let the user lead.** Your job is to listen, clarify, and capture \u2014 not to structure or direct.\n- **Mark everything as DRAFT.** The output is a starting point, not a commitment.\n- **Keep it short.** The draft brief should be 1-2 pages max. Capture the essence, not every detail.\n- **Multiple interviews are fine.** The user might run this several times as their thinking evolves. Each creates a new dated draft.\n',
   "joycraft-lockdown.md": "---\nname: joycraft-lockdown\ndescription: Generate constrained execution boundaries for an implementation session -- NEVER rules and deny patterns to prevent agent overreach\n---\n\n# Lockdown Mode\n\nThe user wants to constrain agent behavior for an implementation session. Your job is to interview them about what should be off-limits, then generate AGENTS.md NEVER rules and Codex configuration deny patterns they can review and apply.\n\n## When Is Lockdown Useful?\n\nLockdown is most valuable for:\n- **Complex tech stacks** (hardware, firmware, multi-device) where agents can cause real damage\n- **Long-running autonomous sessions** where you won't be monitoring every action\n- **Production-adjacent work** where accidental network calls or package installs are risky\n\nFor simple feature work on a well-tested codebase, lockdown is usually overkill. Mention this context to the user so they can decide.\n\n## Step 1: Check for Tests\n\nBefore starting the interview, search the codebase for test files or directories (look for `tests/`, `test/`, `__tests__/`, `spec/`, or files matching `*.test.*`, `*.spec.*`).\n\nIf no tests are found, tell the user:\n\n> Lockdown mode is most useful when you already have tests in place -- it prevents the agent from modifying them while constraining behavior to writing code and running tests. Consider running `$joycraft-new-feature` first to set up a test-driven workflow, then come back to lock it down.\n\nIf the user wants to proceed anyway, continue with the interview.\n\n## Step 2: Interview -- What to Lock Down\n\nAsk these three questions, one at a time. Wait for the user's response before proceeding to the next question.\n\n### Question 1: Read-Only Files\n\n> What test files or directories should be off-limits for editing? 
(e.g., `tests/`, `__tests__/`, `spec/`, specific test files)\n>\n> I'll generate NEVER rules to prevent editing these.\n\nIf the user isn't sure, suggest the test directories you found in Step 1.\n\n### Question 2: Allowed Commands\n\n> What commands should the agent be allowed to run? Defaults:\n> - Write and edit source code files\n> - Run the project's smoke test command\n> - Run the full test suite\n>\n> Any other commands to explicitly allow? Or should I restrict to just these?\n\n### Question 3: Denied Commands\n\n> What commands should be denied? Defaults:\n> - Package installs (`npm install`, `pip install`, `cargo add`, `go get`, etc.)\n> - Network tools (`curl`, `wget`, `ping`, `ssh`)\n> - Direct log file reading\n>\n> Any specific commands to add or remove from this list?\n\n**Edge case -- user wants to allow some network access:** If the user mentions API tests or specific endpoints that need network access, exclude those from the deny list and note the exception in the output.\n\n**Edge case -- user wants to lock down file writes:** If the user wants to prevent ALL file writes, warn them:\n\n> Denying all file writes would prevent the agent from doing any work. 
I recommend keeping source code writes allowed and only locking down test files, config files, or other sensitive directories.\n\n## Step 3: Generate Boundaries\n\nBased on the interview responses, generate output in this exact format:\n\n```\n## Lockdown boundaries generated\n\nReview these suggestions and add them to your project:\n\n### AGENTS.md -- add to NEVER section:\n\n- Edit any file in `[user's test directories]`\n- Run `[denied package manager commands]`\n- Use `[denied network tools]`\n- Read log files directly -- interact with logs only through test assertions\n- [Any additional NEVER rules based on user responses]\n\n### Codex configuration -- suggested deny patterns:\n\nAdd these to your Codex sandbox configuration to restrict command execution:\n\n[\"[command1]\", \"[command2]\", \"[command3]\"]\n\n---\n\nCopy these into your project manually, or tell me to apply them now (I'll show you the exact changes for approval first).\n```\n\nAdjust the content based on the actual interview responses:\n- Only include deny patterns for commands the user confirmed should be denied\n- Only include NEVER rules for directories/files the user specified\n- If the user allowed certain network tools or package managers, exclude those\n\n## Recommended Execution Model\n\nAfter generating the boundaries above, also recommend a Codex execution configuration. Include this section in your output:\n\n```\n### Recommended Execution Configuration\n\nCodex runs in a sandboxed environment by default. 
To maximize safety during lockdown:\n\n| Your situation | Configuration | Why |\n|---|---|---|\n| Autonomous spec execution | Sandbox with deny patterns above | Only pre-approved commands run |\n| Long session with some trust | Default sandbox | Network-disabled sandbox prevents external access |\n| Interactive development | Default with manual review | Review outputs before applying |\n\n**For lockdown mode, we recommend the default sandboxed execution** combined with the deny patterns above. Codex's sandbox already disables network access by default -- the deny patterns add file-level and command-level restrictions on top.\n\nIf you need network access for specific commands (e.g., API tests), configure explicit network allowances in your Codex setup rather than disabling the sandbox entirely.\n```\n\n## Step 4: Offer to Apply\n\nIf the user asks you to apply the changes:\n\n1. **For AGENTS.md:** Read the existing AGENTS.md, find the Behavioral Boundaries section, and show the user the exact diff for the NEVER section. Ask for confirmation before writing.\n2. **For Codex configuration:** Show the user what the deny patterns will look like after adding the new restrictions. Ask for confirmation before writing.\n\n**Never auto-apply. Always show the exact changes and wait for explicit approval.**\n",
@@ -468,7 +290,7 @@ var PI_SKILLS = {
   "joycraft-design.md": '---\nname: joycraft-design\ndescription: Design discussion before decomposition \u2014 produce a ~200-line design artifact for human review, catching wrong assumptions before they propagate into specs\n---\n\n# Design Discussion\n\nYou are producing a design discussion document for a feature. This sits between research and decomposition \u2014 it captures your understanding so the human can catch wrong assumptions before specs are written.\n\n**Guard clause:** If no brief path is provided and no brief exists at `docs/features/<slug>/brief.md`, say:\n"No feature brief found. Run `/skill:joycraft-new-feature` first to create one, or provide the path to your brief."\nThen stop.\n\n---\n\n## Step 1: Read Inputs\n\nRead the feature brief at the path the user provides. If the user also provides a research document path, read that too. Research is optional \u2014 if none exists, note that you\'ll explore the codebase directly.\n\n## Step 2: Explore the Codebase\n\nSpawn subagents to explore the codebase for patterns relevant to the brief. Focus on:\n\n- Files and functions that will be touched or extended\n- Existing patterns this feature should follow (naming, data flow, error handling)\n- Similar features already implemented that serve as models\n- Boundaries and interfaces the feature must integrate with\n\nGather file paths, function signatures, and code snippets. 
You need concrete evidence, not guesses.\n\n## Step 3: Write the Design Document\n\nDerive the slug from the brief path (`docs/features/<slug>/brief.md`).\nLazy-create the folder `docs/features/<slug>/` if needed.\nWrite the design document to `docs/features/<slug>/design.md`.\n\nThe file MUST start with YAML frontmatter \u2014 the 4-field personal schema:\n\n```yaml\n---\nstatus: active\nowner: <resolved name>\ncreated: YYYY-MM-DD\nfeature: <slug>\n---\n```\n\n**Owner resolution:** look up the owner name in this order \u2014 (1) `git config user.name`, (2) value in your auto-memory `joycraft-owner.txt` if present, (3) ask the user once and persist.\n\nThe document has exactly five sections:\n\n### Section 1: Current State\n\nWhat exists today in the codebase that is relevant to this feature. Include file paths, function signatures, and data flows. Be specific \u2014 reference actual code, not abstractions. If no research doc was provided, note that and describe what you found through direct exploration.\n\n### Section 2: Desired End State\n\nWhat the codebase should look like when this feature is complete. Describe the change at a high level \u2014 new files, modified interfaces, new data flows. Do NOT include implementation steps. This is the "what," not the "how."\n\n### Section 3: Patterns to Follow\n\nExisting patterns in the codebase that this feature should match. Include short code snippets and `file:line` references. Show the pattern, don\'t just name it.\n\nIf this is a greenfield project with no existing patterns, propose conventions and note that no precedent exists.\n\n### Section 4: Resolved Design Decisions\n\nDecisions you have already made, with brief rationale. 
Format each as:\n\n> **Decision:** [what you decided]\n> **Rationale:** [why, referencing existing code or constraints]\n> **Alternative rejected:** [what you considered and why you rejected it]\n\n### Section 5: Open Questions\n\nThings you don\'t know or where multiple valid approaches exist. Each question MUST present 2-3 concrete options with pros and cons. Format:\n\n> **Q: [question]**\n> - **Option A:** [description] \u2014 Pro: [benefit]. Con: [cost].\n> - **Option B:** [description] \u2014 Pro: [benefit]. Con: [cost].\n> - **Option C (if applicable):** [description] \u2014 Pro: [benefit]. Con: [cost].\n\nDo NOT ask vague questions like "what do you think?" Every question must have actionable options the human can choose from.\n\n### Update the Feature Brief\n\nAfter writing the design document, update the parent brief with a back-reference:\n1. Read `docs/features/<slug>/brief.md`\n2. In the header blockquote (the `>` lines at the top), add or update:\n   `> **Design:** docs/features/<slug>/design.md`\n3. If a `> **Design:**` line already exists, replace it \u2014 do NOT add a duplicate\n4. Write the brief back\n\n## Step 4: Reconcile Brief with Findings\n\nYou\'ve just written `docs/features/<slug>/design.md`. Before hand-off, the parent brief at `docs/features/<slug>/brief.md` may now disagree with what you discovered. Re-read it and check each of these sections:\n\n| Brief section | What to look for |\n|---|---|\n| Vision | Did your findings refine or contradict the framing? |\n| Hard Constraints | Are any constraints now obsolete, missing, or refined? |\n| Out of Scope | Did your findings push something in or out of scope? |\n| Decomposition | Are spec counts, names, or dependencies still accurate? |\n| Test Strategy | Do your findings change what or how to test? |\n| Success Criteria | Are the criteria still observable and still match the goal? 
|\n\n**For each section, choose one:**\n\n- **Edit in place** \u2014 small, mechanical updates: line-number corrections, clarifications, additions consistent with brief intent. No user approval needed.\n- **Diff + stop** \u2014 non-trivial changes: counts flipping, decomposition restructure, scope changes, contradiction with original brief intent. Present a diff of the proposed change, STOP, and wait for user approval before continuing.\n\nIf you make changes, note them at the bottom of `design.md` under a "Brief updates" subsection. If the brief is already in sync, note: "Reconciliation checked, no changes required." If no parent brief exists (feature was described inline), note that and skip this step.\n\n**Why this step exists:** the silent-drift gap. Without reconciliation, the brief and downstream artifacts diverge \u2014 and later decomposition is sized against the stale brief. This feature ("single-source-skills") hit exactly this: brief said "11 clean / 9 dirty" until the research re-audit forced a re-decomposition. Don\'t let it happen again.\n\n## Step 5: Present and STOP \u2014 Pre-Approval Hold\n\nPresent the design document to the user. Say:\n\n```\nDesign discussion written to docs/features/<slug>/design.md\n\nPlease review the document above. Specifically:\n1. Are the patterns in Section 3 the right ones to follow, or should I use different ones?\n2. Do you agree with the resolved decisions in Section 4?\n3. Pick an option for each open question in Section 5 (or propose your own).\n\nReply with your feedback. I will NOT proceed to decomposition until you have reviewed and approved this design.\n```\n\n**CRITICAL: Do NOT emit the canonical Handoff block at this point.** The Handoff block emits ONLY after human approval (see "Step 6: Hand Off (Post-Approval Only)" below). 
The entire value of this skill is the pause \u2014 it forces a human checkpoint before mistakes propagate.\n\n## Offer to Capture Deferred Items to Backlog\n\nIf during the design discussion the user mentions deferred work \u2014 "let\'s not do X yet," "save Y for later" \u2014 ASK before writing:\n\n> "This looks like deferred work \u2014 want me to capture it to `docs/backlog/`?"\n\nOnly on user confirmation, write a backlog entry at `docs/backlog/YYYY-MM-DD-<short-name>.md` with backlog frontmatter:\n\n```yaml\n---\nstatus: backlog\nowner: <resolved name>\ncreated: YYYY-MM-DD\nsource: docs/features/<slug>/brief.md\n---\n```\n\n**Never auto-write to `docs/backlog/`.** Every backlog entry is user-confirmed.\n\n## Step 6: Hand Off (Post-Approval Only)\n\nOnce the human approves the design:\n- Update the design document with their corrections and chosen options\n- Move answered questions from "Open Questions" to "Resolved Design Decisions"\n- Present the updated document for final confirmation\n- Once the user gives explicit approval, AND ONLY THEN, emit the canonical Handoff block:\n\n## Recommended Next Steps\n\nNext:\n```bash\n/skill:joycraft-decompose docs/features/<slug>/brief.md\n```\nRun /new first.\n\nInclude any backlog paths produced as a side effect.\n',
   "joycraft-gather-context.md": "---\nname: joycraft-gather-context\ndescription: First-run onboarding pass that populates the project context layer -- read what context already exists, then offer a gap-only interview and batch-write the missing fact rows and long-form reference docs\n---\n\n# Gather Context\n\nThis is the first-run **read-then-offer** onboarding pass \u2014 the lowest-intervention way to populate the project's context layer. You read what context already exists, summarize coverage, offer a gap-only interview, and write everything in one reviewable batch at the end.\n\nThis skill is self-contained. It composes the same conventions the single-doc skills use, but everything you need is inlined below \u2014 do not call into or import another skill's logic.\n\n## Step 1: Read What Already Exists First\n\nThe user has invoked the first-run onboarding pass (e.g., `/skill:joycraft-gather-context`). Before asking the user anything, scan the project's existing context. Default scan breadth is **README + `docs/` + AGENTS.md only**:\n\n- The README(s) at the repo root and any obvious sub-package READMEs.\n- `docs/**` \u2014 existing design, architecture, or style docs.\n- `docs/context/*` \u2014 the flat operational fact-docs (production-map, dangerous-assumptions, decision-log, institutional-knowledge, troubleshooting) and `docs/context/reference/*` long-form docs.\n- The current AGENTS.md content, including any `## Context Map` section.\n\nThen summarize for the user what context already exists and what's covered.\n\n**Do NOT auto-run a code-inference scan.** Reading the actual source to infer architecture costs significantly more tokens. Offer that deeper/full review ONLY if the user explicitly asks for it, and when you do, note clearly that it costs more tokens. The default pass never reads the codebase to infer context.\n\n## Step 2: Offer a Gap-Only Interview (Don't Force)\n\nFrom the summary, identify genuine gaps: no design-system doc? 
no production map? no decision log? Offer an **optional** interview that targets only those gaps. The user can decline any or all of it \u2014 offer, never force.\n\n**Per-doc skip guard (not all-or-nothing):** Never re-interview for a doc that already has real content. Skip each doc that's already populated individually, and interview only the empty or missing ones. If everything is already covered, say so and offer nothing.\n\n## Step 3: Route by Shape (Inline Test)\n\nFor each thing the user wants to capture, apply this minimal shape test inline \u2014 do not defer to another skill:\n\n- **\"Could this be one row in a table?\"** \u2192 it's an **operational fact**. Route it to one of the five flat fact-docs under `docs/context/`:\n  - `docs/context/production-map.md` \u2014 infrastructure, services, environments, URLs, credentials, safe/unsafe to touch.\n  - `docs/context/dangerous-assumptions.md` \u2014 false assumptions an agent might make.\n  - `docs/context/decision-log.md` \u2014 an architectural/tooling choice and why.\n  - `docs/context/institutional-knowledge.md` \u2014 team conventions, unwritten rules, ownership.\n  - `docs/context/troubleshooting.md` \u2014 when X happens, do Y.\n  Append it as a table row (or list item for institutional-knowledge), removing any italic example rows in that table first.\n\n- **\"Does explaining it take paragraphs?\"** \u2192 it's **long-form reference**. Scaffold `docs/context/reference/<slug>.md` from the matching template in `docs/templates/context/reference/` (`design-system`, `frontend-methodology`, `backend`, `testing`, or the generic `reference-doc` fallback), lazy-creating `docs/context/reference/` on first write.\n\nIf an item is ambiguous, apply the test literally: one row \u2192 fact bucket; paragraphs \u2192 reference doc.\n\n## Step 4: Batch-Write + One Final Confirm\n\nDo NOT write per-answer. Collect ALL of the user's gap answers across the whole interview first. Then, in ONE batch:\n\n1. 
Write all the fact rows into their fact-docs.\n2. Scaffold and write all the reference docs into `docs/context/reference/`.\n3. Add or update the `## Context Map` pointer rows in AGENTS.md \u2014 one row per reference doc, in the form `| docs/context/reference/<slug>.md | <when to read it> |`. Create the `## Context Map` section (header + two-column table) if it doesn't exist; update an existing row in place rather than duplicating it.\n\nPresent the full set of intended changes and get ONE final confirm (\"do it in one go\") before writing. If the user aborts at the final confirm, write nothing \u2014 there are no partial writes in this batch model. The result is one clean, reviewable diff.\n\n## Step 5: Confirm and Hand Off\n\nReport the batch: which fact rows were added, which reference docs were scaffolded, and which Context Map rows were created or updated. Then end with the canonical Handoff block.\n\n## Recommended Next Steps\n\nNext:\n```bash\n/skill:joycraft-session-end\n```\nRun /new first.\n",
   "joycraft-implement-feature.md": "---\nname: joycraft-implement-feature\ndescription: Run a feature's entire spec queue from one invocation \u2014 delegates to the joycraft-implement-loop driver (fresh pi -p process per spec)\n---\n\n# Implement Feature (Whole-Queue Driver)\n\nOne invocation runs a feature's whole spec queue: `/skill:joycraft-implement-feature docs/features/<slug>/`. On Pi the driver already exists as a script \u2014 `.pi/scripts/joycraft/joycraft-implement-loop` \u2014 and the process boundary it creates (a fresh `pi -p` per spec) is the verified context isolation. **Your job is to point the loop at the right queue and run it, not to reimplement it.**\n\n## Step 1: Load the Queue\n\n1. Resolve the specs directory: if the given path contains a `specs/` subdirectory, use it; otherwise use the path itself. Look for `.joycraft-spec-queue.json` there.\n2. **No queue** \u2192 stop:\n\n   > No spec queue found in [path]. Run `/skill:joycraft-decompose` first \u2014 it writes the queue, the specs, and the wave plan.\n\n3. Read the sibling `README.md` (the wave plan) and report the plan: feature slug, M specs, current statuses, the order the loop will serve them in (`joycraft-next-spec` order: first `todo` whose `depends_on` are all `in-review`/`done`).\n4. 
If **no `todo` specs remain**, report that and suggest `/skill:joycraft-session-end` if the feature was never finished; do not run the loop.\n\n## Step 2: Run the Loop\n\nInvoke the driver via the shell, pointing at the specs dir:\n\n```\njoycraft-implement-loop docs/features/<slug>/specs\n```\n\nWhat it does (so you can narrate it, not reimplement it): `joycraft-next-spec` \u2192 fresh `pi -p \"/skill:joycraft-implement <spec>\"` \u2192 fresh `pi -p \"/skill:joycraft-spec-done <spec>\"` \u2192 repeat; **fail-fast** (exits non-zero naming the failing spec, queue left intact); runs `joycraft-session-end` exactly once when the queue is exhausted.\n\nNotes:\n- The driver spawns `pi -p` subprocesses; nesting it under an already-running Pi session is sound by design but not yet smoke-tested end-to-end \u2014 if the nested `pi -p` misbehaves, fall back to telling the human to run the command above in a separate terminal.\n- **ToS/cost:** this path is for Pi with a BYO API key or open-weight model \u2014 do not route a subscription OAuth through it.\n\n## Step 3: Report\n\nRelay the loop's outcome:\n\n- **Success** \u2192 which specs ran, and session-end's own report (validation, graduation `in-review \u2192 done`, push/PR per AGENTS.md autonomy).\n- **Failure** \u2192 which spec failed (the loop names it), what reached `in-review`, what remains `todo`. Suggest fixing in a fresh session (`/skill:joycraft-implement <failed-spec>`), then re-running the loop for the remainder \u2014 it picks up where it stopped.\n",
-  "joycraft-implement-level5.md": "---\nname: joycraft-implement-level5\ndescription: Set up Level 5 autonomous development \u2014 autofix loop, holdout scenario testing, and scenario evolution from specs\n---\n\n# Implement Level 5 \u2014 Autonomous Development Loop\n\nYou are guiding the user through setting up Level 5: the autonomous feedback loop where specs go in, validated software comes out. This is a one-time setup that installs workflows, creates a scenarios repo, and configures the autofix loop.\n\n## Before You Begin\n\nCheck prerequisites:\n\n1. **Project must be initialized.** Look for `docs/.joycraft/state.json` (older installs may still have it at the legacy `.claude/.joycraft/state.json` or a `.joycraft-version` at the repo root). If none exist, tell the user to run `npx joycraft init` first.\n2. **Project should be at Level 4.** Check `docs/skill:joycraft-assessment.md` if it exists. If the project hasn't been assessed yet, suggest running `/skill:joycraft-tune` first. But don't block \u2014 the user may know they're ready.\n3. **Git repo with GitHub remote.** This setup requires GitHub Actions. Check for `.git/` and a GitHub remote.\n\nIf prerequisites aren't met, explain what's needed and stop.\n\n## Step 1: Explain What Level 5 Means\n\nTell the user:\n\n> Level 5 is the autonomous loop. When you push specs, three things happen automatically:\n>\n> 1. **Scenario evolution** \u2014 A separate AI agent reads your specs and writes holdout tests in a private scenarios repo. These tests are invisible to your coding agent.\n> 2. **Autofix** \u2014 When CI fails on a PR, Claude Code automatically attempts a fix (up to 3 times).\n> 3. **Holdout validation** \u2014 When CI passes, your scenarios repo runs behavioral tests against the PR. Results post as PR comments.\n>\n> The key insight: your coding agent never sees the scenario tests. 
This prevents it from gaming the test suite \u2014 like a validation set in machine learning.\n\n## Step 2: Gather Configuration\n\nAsk these questions **one at a time**:\n\n### Question 1: Scenarios repo name\n\n> What should we call your scenarios repo? It'll be a private repo that holds your holdout tests.\n>\n> Default: `{current-repo-name}-scenarios`\n\nAccept the default or the user's choice.\n\n### Question 2: GitHub App\n\n> Level 5 needs a GitHub App to provide a separate identity for autofix pushes (this avoids GitHub's anti-recursion protection). Creating one takes about 2 minutes:\n>\n> 1. Go to https://github.com/settings/apps/new\n> 2. Give it a name (e.g., \"My Project Autofix\")\n> 3. Uncheck \"Webhook > Active\" (not needed)\n> 4. Under **Repository permissions**, set:\n>    - **Contents**: Read & Write\n>    - **Pull requests**: Read & Write\n>    - **Actions**: Read & Write\n> 5. Click **Create GitHub App**\n> 6. Note the **App ID** from the settings page\n> 7. Scroll to **Private keys** > click **Generate a private key** > save the `.pem` file\n> 8. Click **Install App** in the left sidebar > install it on your repo\n>\n> What's your App ID?\n\n## Step 3: Run init-autofix\n\nRun the CLI command with the gathered configuration:\n\n```bash\nnpx joycraft init-autofix --scenarios-repo {name} --app-id {id}\n```\n\nReview the output with the user. 
Confirm files were created.\n\n## Step 4: Walk Through Secret Configuration\n\nGuide the user step by step:\n\n### 4a: Add Secrets to Main Repo\n\n> You should already have the `.pem` file from when you created the app in Step 2.\n\n> Go to your repo's Settings > Secrets and variables > Actions, and add:\n> - `JOYCRAFT_APP_PRIVATE_KEY` \u2014 paste the contents of your `.pem` file\n> - `ANTHROPIC_API_KEY` \u2014 your Anthropic API key\n\n### 4b: Create the Scenarios Repo\n\n> Create the private scenarios repo:\n> ```bash\n> gh repo create {scenarios-repo-name} --private\n> ```\n>\n> Then copy the scenario templates into it:\n> ```bash\n> cp -r docs/templates/scenarios/* ../{scenarios-repo-name}/\n> cd ../{scenarios-repo-name}\n> git add -A && git commit -m \"init: scaffold scenarios repo from Joycraft\"\n> git push\n> ```\n\n### 4c: Add Secrets to Scenarios Repo\n\n> The scenarios repo also needs the App private key:\n> - `JOYCRAFT_APP_PRIVATE_KEY` \u2014 same `.pem` file as the main repo\n> - `ANTHROPIC_API_KEY` \u2014 same key (needed for scenario generation)\n\n## Step 5: Verify Setup\n\nHelp the user verify everything is wired correctly:\n\n1. **Check workflow files exist:** `ls .github/workflows/autofix.yml .github/workflows/scenarios-dispatch.yml .github/workflows/spec-dispatch.yml .github/workflows/scenarios-rerun.yml`\n2. **Check scenario templates were copied:** Verify the scenarios repo has `example-scenario.test.ts`, `workflows/run.yml`, `workflows/generate.yml`, `prompts/scenario-agent.md`\n3. 
**Check the App ID is correct** in the workflow files (not still a placeholder)\n\n## Step 6: Update AGENTS.md\n\nIf the project's AGENTS.md doesn't already have an \"External Validation\" section, add one:\n\n> ## External Validation\n>\n> This project uses holdout scenario tests in a separate private repo.\n>\n> ### NEVER\n> - Access, read, or reference the scenarios repo\n> - Mention scenario test names or contents\n> - Modify the scenarios dispatch workflow to leak test information\n>\n> The scenarios repo is deliberately invisible to you. This is the holdout guarantee.\n\n## Step 7: First Test (Optional)\n\nIf the user wants to test the loop:\n\n> Want to do a quick test? Here's how:\n>\n> 1. Write a simple spec in `docs/features/<slug>/specs/` and push to main \u2014 this triggers scenario generation\n> 2. Create a PR with a small change \u2014 when CI passes, scenarios will run\n> 3. Watch for the scenario test results as a PR comment\n>\n> Or deliberately break something in a PR to test the autofix loop.\n\n## Step 8: Summary\n\nPrint a summary of what was set up:\n\n> **Level 5 is live.** Here's what's running:\n>\n> | Trigger | What Happens |\n> |---------|-------------|\n> | Push specs to `docs/features/<slug>/specs/` | Scenario agent writes holdout tests |\n> | PR fails CI | Claude autofix attempts (up to 3x) |\n> | PR passes CI | Holdout scenarios run against PR |\n> | Scenarios update | Open PRs re-tested with latest scenarios |\n>\n> Your scenarios repo: `{name}`\n> Your coding agent cannot see those tests. The holdout wall is intact.\n\n**Important:** Tell the user:\n\n> **Before you can test the loop**, you need to merge this PR to main first. GitHub's `workflow_run` triggers only activate for workflows that exist on the default branch. 
Once merged, create a new PR with any small change \u2014 that's when you'll see Autofix, Scenarios Dispatch, and Spec Dispatch fire for the first time.\n\nUpdate `docs/skill:joycraft-assessment.md` if it exists \u2014 set the Level 5 score to reflect the new setup.\n",
+  "joycraft-implement-level5.md": "---\nname: joycraft-implement-level5\ndescription: Set up Level 5 autonomous development \u2014 autofix loop, holdout scenario testing, and scenario evolution from specs\n---\n\n# Implement Level 5 \u2014 Autonomous Development Loop\n\nYou are guiding the user through setting up Level 5: the autonomous feedback loop where specs go in, validated software comes out. This is a one-time setup that installs workflows, creates a scenarios repo, and configures the autofix loop.\n\n## Before You Begin\n\nCheck prerequisites:\n\n1. **Project must be initialized.** Look for `docs/.joycraft/state.json` (older installs may still have it at the legacy `.claude/.joycraft/state.json` or a `.joycraft-version` at the repo root). If none exist, tell the user to run `npx joycraft init` first.\n2. **Project should be at Level 4.** Check `docs/skill:joycraft-assessment.md` if it exists. If the project hasn't been assessed yet, suggest running `/skill:joycraft-tune` first. But don't block \u2014 the user may know they're ready.\n3. **Git repo with GitHub remote.** This setup requires GitHub Actions. Check for `.git/` and a GitHub remote.\n\nIf prerequisites aren't met, explain what's needed and stop.\n\n## Step 1: Explain What Level 5 Means\n\nTell the user:\n\n> Level 5 is the autonomous loop. When you push specs, three things happen automatically:\n>\n> 1. **Scenario evolution** \u2014 A separate AI agent reads your specs and writes holdout tests in a private scenarios repo. These tests are invisible to your coding agent.\n> 2. **Autofix** \u2014 When CI fails on a PR, Claude Code automatically attempts a fix (up to 3 times).\n> 3. **Holdout validation** \u2014 When CI passes, your scenarios repo runs behavioral tests against the PR. Results post as PR comments.\n>\n> The key insight: your coding agent never sees the scenario tests. 
This prevents it from gaming the test suite \u2014 like a validation set in machine learning.\n\n## Step 2: Gather Configuration\n\nAsk these questions **one at a time**:\n\n### Question 1: Scenarios repo name\n\n> What should we call your scenarios repo? It'll be a private repo that holds your holdout tests.\n>\n> Default: `{current-repo-name}-scenarios`\n\nAccept the default or the user's choice.\n\n### Question 2: GitHub App\n\n> Level 5 needs a GitHub App to provide a separate identity for autofix pushes (this avoids GitHub's anti-recursion protection). Creating one takes about 2 minutes:\n>\n> 1. Go to https://github.com/settings/apps/new\n> 2. Give it a name (e.g., \"My Project Autofix\")\n> 3. Uncheck \"Webhook > Active\" (not needed)\n> 4. Under **Repository permissions**, set:\n>    - **Contents**: Read & Write\n>    - **Pull requests**: Read & Write\n>    - **Actions**: Read & Write\n> 5. Click **Create GitHub App**\n> 6. Note the **App ID** from the settings page\n> 7. Scroll to **Private keys** > click **Generate a private key** > save the `.pem` file\n> 8. Click **Install App** in the left sidebar > install it on your repo\n>\n> What's your App ID?\n\n## Step 3: Run init-autofix\n\nRun the CLI command with the gathered configuration:\n\n```bash\nnpx joycraft init-autofix --scenarios-repo {name} --app-id {id}\n```\n\nReview the output with the user. 
Confirm files were created.\n\n## Step 4: Walk Through Secret Configuration\n\nGuide the user step by step:\n\n### 4a: Add Secrets to Main Repo\n\n> You should already have the `.pem` file from when you created the app in Step 2.\n\n> Go to your repo's Settings > Secrets and variables > Actions, and add:\n> - `JOYCRAFT_APP_PRIVATE_KEY` \u2014 paste the contents of your `.pem` file\n> - `ANTHROPIC_API_KEY` \u2014 your Anthropic API key\n\n### 4b: Create the Scenarios Repo\n\n> Create the private scenarios repo:\n> ```bash\n> gh repo create {scenarios-repo-name} --private\n> ```\n>\n> Then copy the scenario templates into it. The starter ships as\n> `example-scenario.test.ts.template` (the `.template` suffix keeps it out of\n> the *main* project's test/lint/build globs); rename it to `.test.ts` once it's\n> in the holdout repo so Vitest discovers it:\n> ```bash\n> cp -r docs/templates/scenarios/* ../{scenarios-repo-name}/\n> cd ../{scenarios-repo-name}\n> mv example-scenario.test.ts.template example-scenario.test.ts\n> git add -A && git commit -m \"init: scaffold scenarios repo from Joycraft\"\n> git push\n> ```\n\n### 4c: Add Secrets to Scenarios Repo\n\n> The scenarios repo also needs the App private key:\n> - `JOYCRAFT_APP_PRIVATE_KEY` \u2014 same `.pem` file as the main repo\n> - `ANTHROPIC_API_KEY` \u2014 same key (needed for scenario generation)\n\n## Step 5: Verify Setup\n\nHelp the user verify everything is wired correctly:\n\n1. **Check workflow files exist:** `ls .github/workflows/autofix.yml .github/workflows/scenarios-dispatch.yml .github/workflows/spec-dispatch.yml .github/workflows/scenarios-rerun.yml`\n2. **Check scenario templates were copied:** Verify the scenarios repo has `example-scenario.test.ts` (renamed from the `.template` starter), `workflows/run.yml`, `workflows/generate.yml`, `prompts/scenario-agent.md`\n3. 
**Check the App ID is correct** in the workflow files (not still a placeholder)\n\n## Step 6: Update AGENTS.md\n\nIf the project's AGENTS.md doesn't already have an \"External Validation\" section, add one:\n\n> ## External Validation\n>\n> This project uses holdout scenario tests in a separate private repo.\n>\n> ### NEVER\n> - Access, read, or reference the scenarios repo\n> - Mention scenario test names or contents\n> - Modify the scenarios dispatch workflow to leak test information\n>\n> The scenarios repo is deliberately invisible to you. This is the holdout guarantee.\n\n## Step 7: First Test (Optional)\n\nIf the user wants to test the loop:\n\n> Want to do a quick test? Here's how:\n>\n> 1. Write a simple spec in `docs/features/<slug>/specs/` and push to main \u2014 this triggers scenario generation\n> 2. Create a PR with a small change \u2014 when CI passes, scenarios will run\n> 3. Watch for the scenario test results as a PR comment\n>\n> Or deliberately break something in a PR to test the autofix loop.\n\n## Step 8: Summary\n\nPrint a summary of what was set up:\n\n> **Level 5 is live.** Here's what's running:\n>\n> | Trigger | What Happens |\n> |---------|-------------|\n> | Push specs to `docs/features/<slug>/specs/` | Scenario agent writes holdout tests |\n> | PR fails CI | Claude autofix attempts (up to 3x) |\n> | PR passes CI | Holdout scenarios run against PR |\n> | Scenarios update | Open PRs re-tested with latest scenarios |\n>\n> Your scenarios repo: `{name}`\n> Your coding agent cannot see those tests. The holdout wall is intact.\n\n**Important:** Tell the user:\n\n> **Before you can test the loop**, you need to merge this PR to main first. GitHub's `workflow_run` triggers only activate for workflows that exist on the default branch. 
Once merged, create a new PR with any small change \u2014 that's when you'll see Autofix, Scenarios Dispatch, and Spec Dispatch fire for the first time.\n\nUpdate `docs/skill:joycraft-assessment.md` if it exists \u2014 set the Level 5 score to reflect the new setup.\n",
   "joycraft-implement.md": "---\nname: joycraft-implement\ndescription: Execute atomic specs with TDD \u2014 read spec, write failing tests, implement until green, wrap up and continue the queue\n---\n\n# Implement Atomic Spec\n\nYou have exactly one atomic spec file to execute. Your job is to implement it using strict TDD \u2014 tests first, confirm they fail, then implement until green.\n\n## Step 1: Parse Arguments\n\nThe user MUST provide a path. No path = stop immediately.\n\n**If no path was provided:**\n\n> No spec path provided. Provide a spec file or a feature directory:\n> `/skill:joycraft-implement docs/features/<slug>/specs/spec-name.md`\n> or `/skill:joycraft-implement docs/features/<slug>/`\n\n**If the path is a directory** (ends with `/` or does not end with `.md`):\n\nLook for `specs/.joycraft-spec-queue.json` inside that directory. Read it. Find the **first `todo` spec whose dependencies are satisfied** (a dependency is satisfied once it is `in-review` or `done`). This matches what `joycraft-next-spec` serves. That single spec file is your target. Do NOT read any other specs.\n\n> Using spec queue: found [spec-file-name] as the next spec.\n\nIf the directory has no queue or no `todo` specs:\n\n> No remaining specs found in [directory].\n\n**If the path is a file** ending in `.md`:\n\nUse it directly as the spec to implement.\n\n## Step 2: Read the Sibling README.md FIRST (if present)\n\nBefore reading the spec itself, check for a sibling `README.md` in the same folder as the spec \u2014 i.e., `<spec-path>/../README.md`. This file is the wave-plan + spec-table that `/skill:joycraft-decompose` writes per feature.\n\n- **If present:** Read the README first. It tells you the spec's position in the wave plan, its dependencies, and which sibling specs (in the same folder) need to be done before this one.\n- **If absent:** That's fine \u2014 proceed normally. 
The convention is forward-only and many legacy spec folders pre-date it.\n\n### Warn on Unmet Dependencies\n\nIf the README shows that this spec depends on other specs in the same folder, check whether those dependencies are satisfied. A dependency is satisfied once its frontmatter `status:` is `in-review` or `done` (see `docs/reference/spec-status-lifecycle.md`) \u2014 a checkpoint chain progresses on `in-review` without waiting for session-end to graduate it to `done`. A dependency still at `todo` is unmet.\n\nIf any dependency is **not** complete, tell the user:\n\n> \"This spec lists unmet dependencies in the sibling README.md: [list]. Proceed anyway, or stop?\"\n\nWait for confirmation before continuing. The user might be deliberately running out of order (a hotfix, an exploration, etc.) \u2014 your job is to surface the warning, not to gate.\n\n## Step 3: Read and Understand the Spec\n\n1. **Read the spec file.** The spec is your execution contract \u2014 the Acceptance Criteria and Test Plan define \"done.\"\n2. **Check the spec's Status field.** If it says \"Complete,\" warn the user and ask if they want to re-implement or skip.\n3. **Read the Acceptance Criteria** \u2014 these are your success conditions.\n4. **Read the Test Plan** \u2014 this tells you exactly what tests to write and in what order.\n5. **Read the Constraints** \u2014 these are hard boundaries you must not violate.\n\n### Finding Additional Context\n\nSpecs are designed to be self-contained, but if you need more context:\n\n- **Parent brief:** Linked in the spec's body (`> **Parent Brief:**` line). The new convention is `docs/features/<slug>/brief.md`. Read it for broader feature context.\n- **Related specs:** Live in the same directory (typically `docs/features/<slug>/specs/`). 
The sibling `README.md` (read in Step 2 above) is the index.\n- **Affected Files:** The spec's Affected Files table tells you which files to create or modify.\n\n\n### Before writing code against an external API:\n\n\u26A0\uFE0F If the spec references a third-party SDK or package, read its official documentation and type definitions FIRST. Never write a `declare module` stub for a package that actually exists \u2014 use the real package as a devDependency instead. The stub will make typecheck pass but the code will fail at runtime.\n\n## Step 4: Execute the TDD Cycle\n\n**This is not optional. Write tests FIRST.**\n\n### 3a. Write Tests (Red Phase)\n\nUsing the spec's Test Plan:\n\n1. Write ALL tests listed in the Test Plan. Each Acceptance Criterion must have at least one test.\n2. Tests should call the actual function/endpoint \u2014 not a reimplementation or mock of the underlying library.\n3. Run the tests. **They MUST fail.** If any test passes immediately:\n   - Flag it \u2014 either the test isn't testing the right thing, or the code already exists.\n   - Investigate before proceeding. A test that passes before implementation is a test that proves nothing.\n\n### 3b. Implement (Green Phase)\n\n1. Follow the spec's Approach section for implementation strategy.\n2. Implement the minimum code needed to make tests pass.\n3. Run tests after each meaningful change \u2014 use the spec's Smoke Test for fast feedback.\n4. Continue until ALL tests pass.\n\n### 3c. Verify Acceptance Criteria\n\nWalk through every Acceptance Criterion in the spec:\n\n- [ ] Is each one met?\n- [ ] Does the build pass?\n- [ ] Do all tests pass?\n\nIf any criterion is not met, keep implementing. Do not move on until all criteria are green.\n\n## Step 5: Handle Edge Cases\n\nCheck the spec's Edge Cases table. 
For each scenario:\n\n- Verify the expected behavior is handled.\n- If the spec says \"warn the user\" or \"prompt,\" make sure that path works.\n\n## Step 6: Wrap Up and Continue (mode-aware \u2014 do the wrap-up yourself)\n\n**Loop-iteration check FIRST.** If this process is one iteration of the `joycraft-implement-loop` driver (you were launched by `pi -p` with a single spec path), STOP after the implementation report \u2014 do **not** wrap up and do **not** continue. The loop runs `/skill:joycraft-spec-done` as its own fresh `pi -p` step and advances the queue itself; wrapping up here would double-run it.\n\nOtherwise (interactive session), when the spec is implemented and all its tests pass, wrap up and advance according to the spec's **execution mode**. Read the `mode:` field from the spec's frontmatter (written by `joycraft-decompose`). If the spec has **no `mode:` field**, default to **`batch`** (back-compat with pre-mode specs). If the value is unrecognized, treat it as `batch` and note the unrecognized value.\n\n**You perform the wrap-up. You find the next spec. Do not stop to tell the human to run `/skill:joycraft-spec-done` or to paste the next file path \u2014 those hand-backs carry zero information and break the feature's momentum.**\n\n### 6a. Per-spec wrap-up\n\n| Spec `mode:` | Wrap-up you perform now |\n|--------------|------------------------|\n| **batch** | **Status bump only**: set the spec to `in-review` in both systems (see below). No commit, no discovery stub \u2014 batch wraps once at feature end. (The bump is required: the queue treats a dependency as satisfied at `in-review`, so without it dependent specs would look blocked.) 
|\n| **checkpoint** / **isolated** | The full `joycraft-spec-done` wrap-up, performed by you (canonical definition: `.pi/skills/joycraft-spec-done/SKILL.md`): **(1)** bump status to `in-review` in both systems, **(2)** terse 2-line discovery stub at `docs/discoveries/YYYY-MM-DD-topic.md` ONLY if something contradicted the spec \u2014 usually skip, **(3)** commit `spec: <spec-name>` (implementation + status edits + stub, nothing unrelated), **(4)** no validation re-run, no push, no PR \u2014 those belong to `joycraft-session-end`. |\n\n**Both systems** means: the queue JSON (`joycraft-mark-done <spec-id> --to in-review <specs-dir>` if `.pi/scripts/joycraft/` is installed, else edit `.joycraft-spec-queue.json` directly) AND the spec file's `status:` frontmatter. Never `done` \u2014 the agent doesn't self-certify (`docs/reference/spec-status-lifecycle.md`).\n\n### 6b. Continue the queue (batch and checkpoint)\n\nRe-read `.joycraft-spec-queue.json` in the spec's directory and find the next `todo` spec whose dependencies are all `in-review`/`done` (same rule as Step 1). Then:\n\n- **Next ready spec exists** \u2192 announce one line \u2014 `Continuing: <next-spec> (spec N of M)` \u2014 and go back to Step 2 with it, in this same conversation.\n- **Remaining `todo` specs are all blocked** \u2192 stop and report which specs are blocked and on what.\n- **No `todo` specs remain** \u2192 this was the feature's last spec; go to 6d.\n- **No queue** (you were invoked with a bare spec file outside a queue) \u2192 report the spec complete and stop; there is nothing to continue from.\n\n### 6c. 
isolated \u2014 fresh context per spec\n\nA conversation cannot clear its own context, so after the wrap-up the fresh context comes from outside:\n\n- **Driver (recommended):** `/skill:joycraft-implement-feature docs/features/<slug>/` runs the remaining queue with a fresh-context subagent per spec \u2014 in-session, interactive, no headless loop.\n- **Guided-manual:** tell the human to run `/new`, then re-invoke `/skill:joycraft-implement <next-spec>`. (Always fine, no ToS/cost surprise.)\n- **Pi:** the `joycraft-implement-loop` driver automates it \u2014 a fresh `pi -p` process per spec. Nothing for you to do beyond the wrap-up; the loop advances.\n- **Headless (`claude -p` / `codex exec` loop):** opt-in only. **Surface the caveat, don't bury it:** unattended headless loops draw metered, full-rate API usage and carry a ToS posture the user must **knowingly opt into** (Anthropic meters `claude -p` from a separate full-rate pool; routing subscription OAuth through third-party harnesses is prohibited). The responsible default is Pi (BYO API key / open weights). Do not silently auto-run a subscription-backed headless loop.\n\n### 6d. Feature's last spec (any mode)\n\nRun the once-per-feature finisher yourself: invoke `/skill:joycraft-session-end` (or read and follow `.pi/skills/joycraft-session-end/SKILL.md`). It carries its own gates \u2014 validation is mandatory and must pass before specs graduate `in-review \u2192 done`, and push/PR honor the project's AGENTS.md git autonomy rules \u2014 so running it automatically is safe.\n\n### Report\n\nAfter each spec's wrap-up, report tersely before continuing:\n\n```\nSpec complete: [spec name] \xB7 mode: [mode] \xB7 tests: [N] passing \xB7 [wrapped up + committed | status bumped (batch)]\n[Continuing: <next-spec> (spec N of M) | Feature complete \u2014 running session-end | Blocked: <specs + reasons>]\n```\n",
   "joycraft-interview.md": '---\nname: joycraft-interview\ndescription: Brainstorm freely about what you want to build \u2014 yap, explore ideas, and get a structured summary you can use later\n---\n\n# Interview \u2014 Idea Exploration\n\nYou are helping the user brainstorm and explore what they want to build. This is a lightweight, low-pressure conversation \u2014 not a formal spec process. Let them yap.\n\n## How to Run the Interview\n\n### 1. Open the Floor\n\nStart with something like:\n"What are you thinking about building? Just talk \u2014 I\'ll listen and ask questions as we go."\n\nLet the user talk freely. Do not interrupt their flow. Do not push toward structure yet.\n\n### 2. Ask Clarifying Questions\n\nAs they talk, weave in questions naturally \u2014 don\'t fire them all at once:\n\n- **What problem does this solve?** Who feels the pain today?\n- **What does "done" look like?** If this worked perfectly, what would a user see?\n- **What are the constraints?** Time, tech, team, budget \u2014 what boxes are we in?\n- **What\'s NOT in scope?** What\'s tempting but should be deferred?\n- **What are the edge cases?** What could go wrong? What\'s the weird input?\n- **What exists already?** Are we building on something or starting fresh?\n\n### 3. Play Back Understanding\n\nAfter the user has gotten their ideas out, reflect back:\n"So if I\'m hearing you right, you want to [summary]. The core problem is [X], and done looks like [Y]. Is that right?"\n\nLet them correct and refine. Iterate until they say "yes, that\'s it."\n\n### 4. Write a Draft Brief\n\nDerive a slug `YYYY-MM-DD-<topic>` (today\'s date + kebab-case topic \u2014 no `-draft` suffix).\nCreate a draft file at `docs/features/<slug>/brief.md`. 
Lazy-create `docs/features/<slug>/` if it doesn\'t exist.\n\nThe file MUST start with YAML frontmatter \u2014 the 4-field personal schema with `status: draft`:\n\n```yaml\n---\nstatus: draft\nowner: <resolved name>\ncreated: YYYY-MM-DD\nfeature: <slug>\n---\n```\n\n**Owner resolution:** look up the owner name in this order \u2014 (1) `git config user.name`, (2) value in your auto-memory `joycraft-owner.txt` if present, (3) ask the user once and persist. If you can\'t get a name, leave the field as `<resolved name>` and note it for the user.\n\nUse this format for the body:\n\n```markdown\n# [Topic] \u2014 Draft Brief\n\n> **Date:** YYYY-MM-DD\n> **Origin:** /skill:joycraft-interview session\n\n---\n\n## The Idea\n[2-3 paragraphs capturing what the user described \u2014 their words, their framing]\n\n## Problem\n[What pain or gap this addresses]\n\n## What "Done" Looks Like\n[The user\'s description of success \u2014 observable outcomes]\n\n## Constraints\n- [constraint 1]\n- [constraint 2]\n\n## Open Questions\n- [things that came up but weren\'t resolved]\n- [decisions that need more thought]\n\n## Out of Scope (for now)\n- [things explicitly deferred \u2014 see also: deferred work goes to `docs/backlog/`]\n\n## Raw Notes\n[Any additional context, quotes, or tangents worth preserving]\n```\n\n### 5. Offer to Capture Deferred Items to Backlog\n\nIf during the conversation deferred work surfaces (a tangent, a "later" item, a "out-of-scope but tempting" idea), ASK the user:\n\n> "This looks like deferred work \u2014 want me to capture it to `docs/backlog/`?"\n\nOnly on user confirmation, write a backlog entry at `docs/backlog/YYYY-MM-DD-<short-name>.md` with backlog frontmatter:\n\n```yaml\n---\nstatus: backlog\nowner: <resolved name>\ncreated: YYYY-MM-DD\nsource: docs/features/<slug>/brief.md\n---\n```\n\n**Never auto-write to `docs/backlog/`.** Every backlog entry is user-confirmed.\n\n### 6. 
Hand Off\n\nAfter writing the draft (and any backlog entries), present the canonical Handoff block.\nInclude any backlog paths produced as a side effect.\n\n## Recommended Next Steps\n\nNext:\n```bash\n/skill:joycraft-new-feature docs/features/<slug>/brief.md\n```\nRun /new first.\n\nIf the idea sounds complex \u2014 touches many files, involves architectural decisions, or the user is working in an unfamiliar area \u2014 nudge them toward research and design (e.g., `/skill:joycraft-research` then `/skill:joycraft-design`). But present it as a recommendation, not a gate.\n\n## Guidelines\n\n- **This is NOT /skill:joycraft-new-feature.** Do not push toward formal briefs, decomposition tables, or atomic specs. The point is exploration.\n- **Let the user lead.** Your job is to listen, clarify, and capture \u2014 not to structure or direct.\n- **Mark everything as DRAFT.** The output is a starting point, not a commitment.\n- **Keep it short.** The draft brief should be 1-2 pages max. Capture the essence, not every detail.\n- **Multiple interviews are fine.** The user might run this several times as their thinking evolves. Each creates a new dated draft.\n',
   "joycraft-lockdown.md": "---\nname: joycraft-lockdown\ndescription: Generate constrained execution boundaries for an implementation session -- NEVER rules and deny patterns to prevent agent overreach\n---\n\n# Lockdown Mode\n\nThe user wants to constrain agent behavior for an implementation session. Your job is to interview them about what should be off-limits, then generate AGENTS.md NEVER rules and Codex configuration deny patterns they can review and apply.\n\n## When Is Lockdown Useful?\n\nLockdown is most valuable for:\n- **Complex tech stacks** (hardware, firmware, multi-device) where agents can cause real damage\n- **Long-running autonomous sessions** where you won't be monitoring every action\n- **Production-adjacent work** where accidental network calls or package installs are risky\n\nFor simple feature work on a well-tested codebase, lockdown is usually overkill. Mention this context to the user so they can decide.\n\n## Step 1: Check for Tests\n\nBefore starting the interview, search the codebase for test files or directories (look for `tests/`, `test/`, `__tests__/`, `spec/`, or files matching `*.test.*`, `*.spec.*`).\n\nIf no tests are found, tell the user:\n\n> Lockdown mode is most useful when you already have tests in place -- it prevents the agent from modifying them while constraining behavior to writing code and running tests. Consider running `/skill:joycraft-new-feature` first to set up a test-driven workflow, then come back to lock it down.\n\nIf the user wants to proceed anyway, continue with the interview.\n\n## Step 2: Interview -- What to Lock Down\n\nAsk these three questions, one at a time. Wait for the user's response before proceeding to the next question.\n\n### Question 1: Read-Only Files\n\n> What test files or directories should be off-limits for editing? 
(e.g., `tests/`, `__tests__/`, `spec/`, specific test files)\n>\n> I'll generate NEVER rules to prevent editing these.\n\nIf the user isn't sure, suggest the test directories you found in Step 1.\n\n### Question 2: Allowed Commands\n\n> What commands should the agent be allowed to run? Defaults:\n> - Write and edit source code files\n> - Run the project's smoke test command\n> - Run the full test suite\n>\n> Any other commands to explicitly allow? Or should I restrict to just these?\n\n### Question 3: Denied Commands\n\n> What commands should be denied? Defaults:\n> - Package installs (`npm install`, `pip install`, `cargo add`, `go get`, etc.)\n> - Network tools (`curl`, `wget`, `ping`, `ssh`)\n> - Direct log file reading\n>\n> Any specific commands to add or remove from this list?\n\n**Edge case -- user wants to allow some network access:** If the user mentions API tests or specific endpoints that need network access, exclude those from the deny list and note the exception in the output.\n\n**Edge case -- user wants to lock down file writes:** If the user wants to prevent ALL file writes, warn them:\n\n> Denying all file writes would prevent the agent from doing any work. 
I recommend keeping source code writes allowed and only locking down test files, config files, or other sensitive directories.\n\n## Step 3: Generate Boundaries\n\nBased on the interview responses, generate output in this exact format:\n\n```\n## Lockdown boundaries generated\n\nReview these suggestions and add them to your project:\n\n### AGENTS.md -- add to NEVER section:\n\n- Edit any file in `[user's test directories]`\n- Run `[denied package manager commands]`\n- Use `[denied network tools]`\n- Read log files directly -- interact with logs only through test assertions\n- [Any additional NEVER rules based on user responses]\n\n### Codex configuration -- suggested deny patterns:\n\nAdd these to your Codex sandbox configuration to restrict command execution:\n\n[\"[command1]\", \"[command2]\", \"[command3]\"]\n\n---\n\nCopy these into your project manually, or tell me to apply them now (I'll show you the exact changes for approval first).\n```\n\nAdjust the content based on the actual interview responses:\n- Only include deny patterns for commands the user confirmed should be denied\n- Only include NEVER rules for directories/files the user specified\n- If the user allowed certain network tools or package managers, exclude those\n\n## Recommended Execution Model\n\nAfter generating the boundaries above, also recommend a Codex execution configuration. Include this section in your output:\n\n```\n### Recommended Execution Configuration\n\nCodex runs in a sandboxed environment by default. 
To maximize safety during lockdown:\n\n| Your situation | Configuration | Why |\n|---|---|---|\n| Autonomous spec execution | Sandbox with deny patterns above | Only pre-approved commands run |\n| Long session with some trust | Default sandbox | Network-disabled sandbox prevents external access |\n| Interactive development | Default with manual review | Review outputs before applying |\n\n**For lockdown mode, we recommend the default sandboxed execution** combined with the deny patterns above. Codex's sandbox already disables network access by default -- the deny patterns add file-level and command-level restrictions on top.\n\nIf you need network access for specific commands (e.g., API tests), configure explicit network allowances in your Codex setup rather than disabling the sandbox entirely.\n```\n\n## Step 4: Offer to Apply\n\nIf the user asks you to apply the changes:\n\n1. **For AGENTS.md:** Read the existing AGENTS.md, find the Behavioral Boundaries section, and show the user the exact diff for the NEVER section. Ask for confirmation before writing.\n2. **For Codex configuration:** Show the user what the deny patterns will look like after adding the new restrictions. Ask for confirmation before writing.\n\n**Never auto-apply. Always show the exact changes and wait for explicit approval.**\n",
@@ -675,4 +497,4 @@ export {
   PI_EXTENSIONS,
   PI_AGENTS
 };
-//# sourceMappingURL=chunk-XOMQIK4U.js.map
+//# sourceMappingURL=chunk-UEG5IO6Q.js.map