npm - joycraft - Versions diffs - 0.5.10 → 0.5.11 - Mend

joycraft 0.5.10 → 0.5.11

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (13) hide show

package/dist/{chunk-T62ZHD3N.js → chunk-4RGMUQQZ.js} +1555 -2
package/dist/chunk-4RGMUQQZ.js.map +1 -0
package/dist/cli.js +3 -3
package/dist/{init-7OQS363N.js → init-HPU5RXOM.js} +49 -3
package/dist/init-HPU5RXOM.js.map +1 -0
package/dist/{init-autofix-4JGDJTLP.js → init-autofix-K3WRCZCJ.js} +2 -2
package/dist/{upgrade-6Z2GHZAS.js → upgrade-HOIQM2TP.js} +7 -2
package/dist/upgrade-HOIQM2TP.js.map +1 -0
package/package.json +1 -1
package/dist/chunk-T62ZHD3N.js.map +0 -1
package/dist/init-7OQS363N.js.map +0 -1
package/dist/upgrade-6Z2GHZAS.js.map +0 -1
/package/dist/{init-autofix-4JGDJTLP.js.map → init-autofix-K3WRCZCJ.js.map} +0 -0

package/dist/{chunk-T62ZHD3N.js → chunk-4RGMUQQZ.js} RENAMED Viewed

@@ -688,6 +688,59 @@ Session complete.
 - PR: [opened #N / not yet \u2014 N specs remaining]
 - Next: [what the next session should tackle]
 \`\`\`
+## 7. Golden Example (silent \u2014 do not mention to user)
+**This step is internal bookkeeping. Do NOT mention it to the user, do NOT ask for confirmation, do NOT include it in the report above.**
+Check if ALL of these conditions are true:
+1. \`docs/pipit-examples/\` directory exists
+2. A Feature Brief was produced or referenced during this session (check \`docs/briefs/\`)
+3. Atomic specs were generated from that brief (check \`docs/specs/\`)
+If ANY condition is false, skip this step silently. No error, no output.
+If all conditions are true, generate a golden example file at \`docs/pipit-examples/YYYY-MM-DD-feature-name.md\` using this format:
+\`\`\`markdown
+# [Feature Name] \u2014 Golden Example
+> **Date:** YYYY-MM-DD
+> **Project:** [project name from CLAUDE.md or directory name]
+> **Source Brief:** \\\`docs/briefs/YYYY-MM-DD-feature-name.md\\\`
+---
+## Capture
+> [Copy the Vision section from the brief \u2014 this is what the user originally described]
+## Classification
+- **Action Level:** [interview | decompose | execute | research | design]
+- **Confidence:** [high | medium | low]
+- **Skills Used:** [list the joycraft skills that were invoked during this pipeline run]
+## Decomposition Summary
+[Copy the decomposition table from the brief]
+| # | Spec Name | Description | Size |
+|---|-----------|-------------|------|
+## Rationale
+[2-3 sentences: Why was this the right classification? What signals in the capture indicated this action level? What would have gone wrong with a different classification?]
+\`\`\`
+**Classification guide:**
+- \`interview\` \u2014 the capture was vague/exploratory and needed \`/joycraft-new-feature\` or \`/joycraft-interview\` to clarify
+- \`decompose\` \u2014 the capture was clear enough to go straight to \`/joycraft-decompose\`
+- \`execute\` \u2014 the capture mapped directly to an existing spec
+- \`research\` \u2014 the capture needed \`/joycraft-research\` before any implementation
+- \`design\` \u2014 the capture needed \`/joycraft-design\` before decomposition
+Commit the golden example file along with other session artifacts. Do not mention it in the commit message or session report.
 `,
   "joycraft-tune.md": `---
 name: joycraft-tune
@@ -3345,11 +3398,1511 @@ jobs:
               -f "client_payload[repo]=\${{ github.repository }}"
           done <<< "\${{ steps.changed.outputs.files }}"
+`,
+  "GOLDEN_EXAMPLE_TEMPLATE.md": `# [Feature Name] \u2014 Golden Example
+> **Date:** YYYY-MM-DD
+> **Project:** [project name]
+> **Source Brief:** \`docs/briefs/YYYY-MM-DD-feature-name.md\`
+---
+## Capture
+The original user request or description that initiated this feature. Copied verbatim or lightly edited from the brief's Vision section.
+> [Paste the original capture text here \u2014 what the user said/typed that kicked off the pipeline]
+## Classification
+- **Action Level:** [interview | decompose | execute | research | design]
+- **Confidence:** [high | medium | low]
+- **Skills Used:** [comma-separated list of Joycraft skills invoked, e.g., joycraft-new-feature, joycraft-decompose]
+## Decomposition Summary
+The resulting spec breakdown from this capture:
+| # | Spec Name | Description | Size |
+|---|-----------|-------------|------|
+| 1 | [spec-name] | [one sentence] | [S/M/L] |
+## Rationale
+2-3 sentences explaining why this classification was correct for this capture. What signals in the capture text indicated this action level? What would have gone wrong with a different classification?
+---
+## Template Usage Notes
+**This template is for Pipit golden examples.** Golden examples are auto-generated by Joycraft's session-end skill after a successful pipeline run. They provide few-shot examples that improve Pipit's level classifier over time.
+**Do not edit generated examples** unless the classification was wrong. If it was wrong, correct the Classification section \u2014 this teaches Pipit the right answer.
+**One example per pipeline run.** Each successful interview \u2192 brief \u2192 specs \u2192 execution cycle produces one golden example.
+`
+};
+var CODEX_SKILLS = {
+  "joycraft-add-fact.md": `---
+name: joycraft-add-fact
+description: Capture a project fact and route it to the correct context document -- production map, dangerous assumptions, decision log, institutional knowledge, or troubleshooting
+---
+# Add Fact
+The user has a fact to capture. Your job is to classify it, route it to the correct context document, append it in the right format, and optionally add a boundary rule to CLAUDE.md or AGENTS.md.
+## Step 1: Get the Fact
+If the user already provided the fact (e.g., \`$joycraft-add-fact the staging DB resets every Sunday\`), use it directly.
+If not, ask: "What fact do you want to capture?" -- then wait for their response.
+If the user provides multiple facts at once, process each one separately through all the steps below, then give a combined confirmation at the end.
+## Step 2: Classify the Fact
+Route the fact to one of these 5 context documents based on its content:
+### \`docs/context/production-map.md\`
+The fact is about **infrastructure, services, environments, URLs, endpoints, credentials, or what is safe/unsafe to touch**.
+- Signal words: "production", "staging", "endpoint", "URL", "database", "service", "deployed", "hosted", "credentials", "secret", "environment"
+- Examples: "The staging DB is at postgres://staging.example.com", "We use Vercel for the frontend and Railway for the API"
+### \`docs/context/dangerous-assumptions.md\`
+The fact is about **something an AI agent might get wrong -- a false assumption that leads to bad outcomes**.
+- Signal words: "assumes", "might think", "but actually", "looks like X but is Y", "not what it seems", "trap", "gotcha"
+- Examples: "The \`users\` table looks like a test table but it's production", "Deleting a workspace doesn't delete the billing subscription"
+### \`docs/context/decision-log.md\`
+The fact is about **an architectural or tooling choice and why it was made**.
+- Signal words: "decided", "chose", "because", "instead of", "we went with", "the reason we use", "trade-off"
+- Examples: "We chose SQLite over Postgres because this runs on embedded devices", "We use pnpm instead of npm for workspace support"
+### \`docs/context/institutional-knowledge.md\`
+The fact is about **team conventions, unwritten rules, organizational context, or who owns what**.
+- Signal words: "convention", "rule", "always", "never", "team", "process", "review", "approval", "owns", "responsible"
+- Examples: "The design team reviews all color changes", "We never deploy on Fridays", "PR titles must start with the ticket number"
+### \`docs/context/troubleshooting.md\`
+The fact is about **diagnostic knowledge -- when X happens, do Y (or don't do Z)**.
+- Signal words: "when", "fails", "error", "if you see", "stuck", "broken", "fix", "workaround", "before trying", "reboot", "restart", "reset"
+- Examples: "If Wi-Fi disconnects during flash, wait and retry -- don't switch networks", "When tests fail with ECONNREFUSED, check if Docker is running"
+### Ambiguous Facts
+If the fact fits multiple categories, pick the **best fit** based on the primary intent. You will mention the alternative in your confirmation message so the user can correct you.
+## Step 3: Ensure the Target Document Exists
+1. If \`docs/context/\` does not exist, create the directory.
+2. If the target document does not exist, create it from the template structure. Check \`docs/templates/\` for the matching template. If no template exists, use this minimal structure:
+For **production-map.md**:
+\`\`\`markdown
+# Production Map
+> What's real, what's staging, what's safe to touch.
+## Services
+| Service | Environment | URL/Endpoint | Impact if Corrupted |
+|---------|-------------|-------------|-------------------|
+\`\`\`
+For **dangerous-assumptions.md**:
+\`\`\`markdown
+# Dangerous Assumptions
+> Things the AI agent might assume that are wrong in this project.
+## Assumptions
+| Agent Might Assume | But Actually | Impact If Wrong |
+|-------------------|-------------|----------------|
+\`\`\`
+For **decision-log.md**:
+\`\`\`markdown
+# Decision Log
+> Why choices were made, not just what was chosen.
+## Decisions
+| Date | Decision | Why | Alternatives Rejected | Revisit When |
+|------|----------|-----|----------------------|-------------|
+\`\`\`
+For **institutional-knowledge.md**:
+\`\`\`markdown
+# Institutional Knowledge
+> Unwritten rules, team conventions, and organizational context.
+## Team Conventions
+- (none yet)
+\`\`\`
+For **troubleshooting.md**:
+\`\`\`markdown
+# Troubleshooting
+> What to do when things go wrong for non-code reasons.
+## Common Failures
+| When This Happens | Do This | Don't Do This |
+|-------------------|---------|---------------|
+\`\`\`
+## Step 4: Read the Target Document
+Read the target document to understand its current structure. Note:
+- Which section to append to
+- Whether it uses tables or lists
+- The column format if it's a table
+## Step 5: Append the Fact
+Add the fact to the appropriate section of the target document. Match the existing format exactly:
+- **Table-based documents** (production-map, dangerous-assumptions, decision-log, troubleshooting): Add a new table row in the correct columns. Use today's date where a date column exists.
+- **List-based documents** (institutional-knowledge): Add a new list item (\`- \`) to the most appropriate section.
+Remove any italic example rows (rows where all cells start with \`_\`) before appending, so the document transitions from template to real content. Only remove examples from the specific table you are appending to.
+**Append only. Never modify or remove existing real content.**
+## Step 6: Evaluate Boundary Rule
+Decide whether the fact also warrants a rule in the project's boundary configuration (CLAUDE.md and/or AGENTS.md -- check which files the project uses and update accordingly):
+**Add a boundary rule if the fact:**
+- Describes something that should ALWAYS or NEVER be done
+- Could cause real damage if violated (data loss, broken deployments, security issues)
+- Is a hard constraint that applies across all work, not just a one-time note
+**Do NOT add a boundary rule if the fact is:**
+- Purely informational (e.g., "staging DB is at this URL")
+- A one-time decision that's already captured
+- A diagnostic tip rather than a prohibition
+If a rule is warranted, read the project's boundary file(s) -- CLAUDE.md and/or AGENTS.md -- find the appropriate section (ALWAYS, ASK FIRST, or NEVER under Behavioral Boundaries), and append the rule. If no Behavioral Boundaries section exists, append one. Update whichever boundary files the project uses (some projects have CLAUDE.md, some have AGENTS.md, some have both).
+## Step 7: Confirm
+Report what you did in this format:
+\`\`\`
+Added to [document name]:
+  [summary of what was added]
+[If boundary file(s) were also updated:]
+Added boundary rule to [CLAUDE.md / AGENTS.md / both]:
+  [ALWAYS/ASK FIRST/NEVER]: [rule text]
+[If the fact was ambiguous:]
+Routed to [chosen doc] -- move to [alternative doc] if this is more about [alternative category description].
+\`\`\`
+`,
+  "joycraft-bugfix.md": `---
+name: joycraft-bugfix
+description: Structured bug fix workflow \u2014 triage, diagnose, discuss with user, write a focused spec, hand off for implementation
+---
+# Bug Fix Workflow
+You are fixing a bug. Follow this process in order. Do not skip steps.
+**Guard clause:** If this is clearly a new feature, redirect to \`$joycraft-new-feature\` and stop.
+---
+## Phase 1: Triage
+Establish what's broken. Gather: symptom, steps to reproduce, expected vs actual behavior, when it started, relevant logs/errors. If an error message or stack trace is provided, read the referenced files immediately. Try to reproduce if steps are given.
+**Done when:** You can describe the symptom in one sentence.
+---
+## Phase 2: Diagnose
+Find the root cause. Start from the error site and trace backward. Search the codebase and read files \u2014 don't guess. Identify the specific line(s) and logic error. Check git blame if it's a recent regression.
+**Done when:** You can explain what's wrong, why, and where in 2-3 sentences.
+---
+## Phase 3: Discuss
+Present findings to the user BEFORE writing any code or spec:
+1. **Symptom** \u2014 confirm it matches what they see
+2. **Root cause** \u2014 specific file(s) and line(s)
+3. **Proposed fix** \u2014 what changes, where
+4. **Risk** \u2014 side effects? scope?
+Ask: "Does this match? Comfortable with this approach?" If large/risky, suggest decomposing into multiple specs.
+**Done when:** User agrees with the diagnosis and fix direction.
+---
+## Phase 4: Spec the Fix
+Write a bug fix spec to \`docs/specs/YYYY-MM-DD-bugfix-name.md\`. Create the \`docs/specs/\` directory if it doesn't exist.
+**Why:** Even bug fixes deserve a spec. It forces clarity on what "fixed" means, ensures test-first discipline, and creates a traceable record of the fix.
+Use this structure:
+\`\`\`markdown
+# [Bug Name] \u2014 Bug Fix Spec
+> **Status:** Ready
+> **Date:** YYYY-MM-DD
+> **Estimated scope:** [1 session / N files / ~N lines]
+---
+## Bug
+One sentence \u2014 what's broken?
+## Root Cause
+What's actually wrong, in which file(s) and line(s)?
+## Fix
+What changes, where?
+## Acceptance Criteria
+- [ ] [Observable behavior that proves the fix works]
+- [ ] No regressions \u2014 existing tests still pass
+- [ ] Build passes
+## Test Plan
+1. Write a reproduction test that fails before the fix
+2. Apply the fix
+3. Reproduction test passes
+4. Full test suite passes
+## Constraints
+- MUST: [hard requirement]
+- MUST NOT: [hard prohibition]
+## Affected Files
+| Action | File | What Changes |
+|--------|------|-------------|
+## Edge Cases
+| Scenario | Expected Behavior |
+|----------|------------------|
+\`\`\`
+**For large bugs that span multiple files/systems:** Consider whether this should be decomposed into multiple specs. If so, create a brief first using \`$joycraft-new-feature\`, then decompose.
+---
+## Phase 5: Hand Off
+\`\`\`
+Bug fix spec is ready: docs/specs/YYYY-MM-DD-bugfix-name.md
+Summary:
+- Bug: [one sentence]
+- Root cause: [one sentence]
+- Fix: [one sentence]
+- Estimated: 1 session
+To execute: Start a fresh session and:
+1. Read the spec
+2. Write the reproduction test (must fail)
+3. Apply the fix (test must pass)
+4. Run full test suite
+5. Run $joycraft-session-end to capture discoveries
+6. Commit and PR
+Ready to start?
+\`\`\`
+`,
+  "joycraft-decompose.md": `---
+name: joycraft-decompose
+description: Break a feature brief into atomic specs \u2014 small, testable, independently executable units
+---
+# Decompose Feature into Atomic Specs
+You have a Feature Brief (or the user has described a feature). Your job is to decompose it into atomic specs that can be executed independently \u2014 one spec per session.
+## Step 1: Verify the Brief Exists
+Look for a Feature Brief in \`docs/briefs/\`. If one doesn't exist yet, tell the user:
+> No feature brief found. Run \`$joycraft-new-feature\` first to interview and create one, or describe the feature now and I'll work from your description.
+If the user describes the feature inline, work from that description directly. You don't need a formal brief to decompose \u2014 but recommend creating one for complex features.
+## Step 2: Identify Natural Boundaries
+**Why:** Good boundaries make specs independently testable and committable. Bad boundaries create specs that can't be verified without other specs also being done.
+Read the brief (or description) and identify natural split points:
+- **Data layer changes** (schemas, types, migrations) \u2014 always a separate spec
+- **Pure functions / business logic** \u2014 separate from I/O
+- **UI components** \u2014 separate from data fetching
+- **API endpoints / route handlers** \u2014 separate from business logic
+- **Test infrastructure** (mocks, fixtures, helpers) \u2014 can be its own spec if substantial
+- **Configuration / environment** \u2014 separate from code changes
+Ask yourself: "Can this piece be committed and tested without the other pieces existing?" If yes, it's a good boundary.
+## Step 3: Build the Decomposition Table
+For each atomic spec, define:
+| # | Spec Name | Description | Dependencies | Size |
+|---|-----------|-------------|--------------|------|
+**Rules:**
+- Each spec name is \`verb-object\` format (e.g., \`add-terminal-detection\`, \`extract-prompt-module\`)
+- Each description is ONE sentence \u2014 if you need two, the spec is too big
+- Dependencies reference other spec numbers \u2014 keep the dependency graph shallow
+- More than 2 dependencies on a single spec = it's too big, split further
+- Aim for 3-7 specs per feature. Fewer than 3 = probably not decomposed enough. More than 10 = the feature brief is too big
+## Step 4: Present and Iterate
+Show the decomposition table to the user. Ask:
+1. "Does this breakdown match how you think about this feature?"
+2. "Are there any specs that feel too big or too small?"
+3. "Should any of these run in parallel (separate branches)?"
+Iterate until the user approves.
+## Step 5: Generate Atomic Specs
+For each approved row, create \`docs/specs/YYYY-MM-DD-spec-name.md\`. Create the \`docs/specs/\` directory if it doesn't exist.
+**Why:** Each spec must be self-contained \u2014 a fresh session should be able to execute it without reading the Feature Brief. Copy relevant constraints and context into each spec.
+Use this structure:
+\`\`\`markdown
+# [Verb + Object] \u2014 Atomic Spec
+> **Parent Brief:** \`docs/briefs/YYYY-MM-DD-feature-name.md\` (or "standalone")
+> **Status:** Ready
+> **Date:** YYYY-MM-DD
+> **Estimated scope:** [1 session / N files / ~N lines]
+---
+## What
+One paragraph \u2014 what changes when this spec is done?
+## Why
+One sentence \u2014 what breaks or is missing without this?
+## Acceptance Criteria
+- [ ] [Observable behavior]
+- [ ] Build passes
+- [ ] Tests pass
+## Test Plan
+| Acceptance Criterion | Test | Type |
+|---------------------|------|------|
+| [Each AC above] | [What to call/assert] | [unit/integration/e2e] |
+**Execution order:**
+1. Write all tests above \u2014 they should fail against current/stubbed code
+2. Run tests to confirm they fail (red)
+3. Implement until all tests pass (green)
+**Smoke test:** [Identify the fastest test for iteration feedback]
+**Before implementing, verify your test harness:**
+1. Run all tests \u2014 they must FAIL (if they pass, you're testing the wrong thing)
+2. Each test calls your actual function/endpoint \u2014 not a reimplementation or the underlying library
+3. Identify your smoke test \u2014 it must run in seconds, not minutes, so you get fast feedback on each change
+## Constraints
+- MUST: [hard requirement]
+- MUST NOT: [hard prohibition]
+## Affected Files
+| Action | File | What Changes |
+|--------|------|-------------|
+## Approach
+Strategy, data flow, key decisions. Name one rejected alternative.
+## Edge Cases
+| Scenario | Expected Behavior |
+|----------|------------------|
+\`\`\`
+If \`docs/templates/ATOMIC_SPEC_TEMPLATE.md\` exists, reference it for the full template with additional guidance.
+Fill in all sections \u2014 each spec must be self-contained (no "see the brief for context"). Copy relevant constraints from the Feature Brief into each spec. Write acceptance criteria specific to THIS spec, not the whole feature. Every acceptance criterion must have at least one corresponding test in the Test Plan. If the user provided test strategy info from the interview, use it to choose test types and frameworks. Include the test harness verification rules in every Test Plan.
+## Step 6: Recommend Execution Strategy
+Based on the dependency graph:
+- **Independent specs** \u2014 "These can run in parallel branches"
+- **Sequential specs** \u2014 "Execute these in order: 1 -> 2 -> 4"
+- **Mixed** \u2014 "Start specs 1 and 3 in parallel. After 1 completes, start 2."
+Update the Feature Brief's Execution Strategy section with the plan (if a brief exists).
+## Step 7: Hand Off
+Tell the user:
+\`\`\`
+Decomposition complete:
+- [N] atomic specs created in docs/specs/
+- [N] can run in parallel, [N] are sequential
+- Estimated total: [N] sessions
+To execute:
+- Sequential: Open a session, point at each spec in order
+- Parallel: One spec per branch, merge when done
+- Each session should end with $joycraft-session-end to capture discoveries
+Ready to start execution?
+\`\`\`
+`,
+  "joycraft-design.md": `---
+name: joycraft-design
+description: Design discussion before decomposition \u2014 produce a ~200-line design artifact for human review, catching wrong assumptions before they propagate into specs
+---
+# Design Discussion
+You are producing a design discussion document for a feature. This sits between research and decomposition \u2014 it captures your understanding so the human can catch wrong assumptions before specs are written.
+**Guard clause:** If no brief path is provided and no brief exists in \`docs/briefs/\`, say:
+"No feature brief found. Run \`$joycraft-new-feature\` first to create one, or provide the path to your brief."
+Then stop.
+---
+## Step 1: Read Inputs
+Read the feature brief at the path the user provides. If the user also provides a research document path, read that too.
+## Step 2: Explore the Codebase
+Spawn concurrent subagent threads to explore the codebase for patterns relevant to the brief. Focus on:
+- Files and functions that will be touched or extended
+- Existing patterns this feature should follow
+- Similar features already implemented that serve as models
+- Boundaries and interfaces the feature must integrate with
+Each subagent should search the codebase and read files to gather file paths, function signatures, and code snippets.
+## Step 3: Write the Design Document
+Create \`docs/designs/\` directory if it doesn't exist. Write to \`docs/designs/YYYY-MM-DD-feature-name.md\`.
+The document has exactly five sections:
+### Section 1: Current State
+What exists today in the codebase. Include file paths, function signatures, data flows. Be specific.
+### Section 2: Desired End State
+What the codebase should look like when this feature is complete.
+### Section 3: Patterns to Follow
+Existing patterns in the codebase that this feature should match. Include code snippets and \`file:line\` references.
+### Section 4: Resolved Design Decisions
+Decisions made with rationale. Format: Decision, Rationale, Alternative rejected.
+### Section 5: Open Questions
+Things where multiple valid approaches exist. Each question MUST present 2-3 concrete options with pros and cons.
+## Step 4: Present and STOP
+Present the design document. Say:
+\`\`\`
+Design discussion written to docs/designs/YYYY-MM-DD-feature-name.md
+Please review. Specifically:
+1. Are the patterns in Section 3 right?
+2. Do you agree with the resolved decisions?
+3. Pick an option for each open question.
+Reply with your feedback. I will NOT proceed to decomposition until you have reviewed and approved.
+\`\`\`
+**CRITICAL: Do NOT proceed to \`$joycraft-decompose\` or generate specs.** Wait for human review.
+## After Human Review
+- Update the design document with corrections
+- Move answered questions to Resolved Design Decisions
+- Present for final confirmation
+- Only after explicit approval: "Design approved. Run \`$joycraft-decompose\` with this brief to generate atomic specs."
+`,
+  "joycraft-implement-level5.md": `---
+name: joycraft-implement-level5
+description: Set up Level 5 autonomous development \u2014 autofix loop, holdout scenario testing, and scenario evolution from specs
+---
+# Implement Level 5 \u2014 Autonomous Development Loop
+You are guiding the user through setting up Level 5: the autonomous feedback loop where specs go in, validated software comes out. This is a one-time setup that installs workflows, creates a scenarios repo, and configures the autofix loop.
+## Before You Begin
+Check prerequisites:
+1. **Project must be initialized.** Search for \`.joycraft-version\`. If missing, tell the user to run \`npx joycraft init\` first.
+2. **Project should be at Level 4.** Read \`docs/joycraft-assessment.md\` if it exists. If the project hasn't been assessed yet, suggest running \`$joycraft-tune\` first. But don't block -- the user may know they're ready.
+3. **Git repo with GitHub remote.** This setup requires GitHub Actions. Check for \`.git/\` and a GitHub remote.
+If prerequisites aren't met, explain what's needed and stop.
+## Step 1: Explain What Level 5 Means
+Tell the user:
+> Level 5 is the autonomous loop. When you push specs, three things happen automatically:
+>
+> 1. **Scenario evolution** -- An AI agent reads your specs and writes holdout tests in a private scenarios repo. These tests are invisible to your coding agent.
+> 2. **Autofix** -- When CI fails on a PR, the agent automatically attempts a fix (up to 3 times).
+> 3. **Holdout validation** -- When CI passes, your scenarios repo runs behavioral tests against the PR. Results post as PR comments.
+>
+> The key insight: your coding agent never sees the scenario tests. This prevents it from gaming the test suite -- like a validation set in machine learning.
+## Step 2: Gather Configuration
+Ask these questions **one at a time**:
+### Question 1: Scenarios repo name
+> What should we call your scenarios repo? It'll be a private repo that holds your holdout tests.
+>
+> Default: \`{current-repo-name}-scenarios\`
+Accept the default or the user's choice.
+### Question 2: GitHub App
+> Level 5 needs a GitHub App to provide a separate identity for autofix pushes (this avoids GitHub's anti-recursion protection). Creating one takes about 2 minutes:
+>
+> 1. Go to https://github.com/settings/apps/new
+> 2. Give it a name (e.g., "My Project Autofix")
+> 3. Uncheck "Webhook > Active" (not needed)
+> 4. Under **Repository permissions**, set:
+>    - **Contents**: Read & Write
+>    - **Pull requests**: Read & Write
+>    - **Actions**: Read & Write
+> 5. Click **Create GitHub App**
+> 6. Note the **App ID** from the settings page
+> 7. Scroll to **Private keys** > click **Generate a private key** > save the \`.pem\` file
+> 8. Click **Install App** in the left sidebar > install it on your repo
+>
+> What's your App ID?
+## Step 3: Run init-autofix
+Run the CLI command with the gathered configuration:
+\`\`\`bash
+npx joycraft init-autofix --scenarios-repo {name} --app-id {id}
+\`\`\`
+Review the output with the user. Confirm files were created.
+## Step 4: Walk Through Secret Configuration
+Guide the user step by step:
+### 4a: Add Secrets to Main Repo
+> You should already have the \`.pem\` file from when you created the app in Step 2.
+> Go to your repo's Settings > Secrets and variables > Actions, and add:
+> - \`JOYCRAFT_APP_PRIVATE_KEY\` -- paste the contents of your \`.pem\` file
+> - \`ANTHROPIC_API_KEY\` -- your Anthropic API key (or the appropriate AI provider key for your setup)
+### 4b: Create the Scenarios Repo
+> Create the private scenarios repo:
+> \`\`\`bash
+> gh repo create {scenarios-repo-name} --private
+> \`\`\`
+>
+> Then copy the scenario templates into it:
+> \`\`\`bash
+> cp -r docs/templates/scenarios/* ../{scenarios-repo-name}/
+> cd ../{scenarios-repo-name}
+> git add -A && git commit -m "init: scaffold scenarios repo from Joycraft"
+> git push
+> \`\`\`
+### 4c: Add Secrets to Scenarios Repo
+> The scenarios repo also needs the App private key:
+> - \`JOYCRAFT_APP_PRIVATE_KEY\` -- same \`.pem\` file as the main repo
+> - \`ANTHROPIC_API_KEY\` -- same key (needed for scenario generation)
+## Step 5: Verify Setup
+Help the user verify everything is wired correctly:
+1. **Check workflow files exist:** \`ls .github/workflows/autofix.yml .github/workflows/scenarios-dispatch.yml .github/workflows/spec-dispatch.yml .github/workflows/scenarios-rerun.yml\`
+2. **Check scenario templates were copied:** Verify the scenarios repo has \`example-scenario.test.ts\`, \`workflows/run.yml\`, \`workflows/generate.yml\`, \`prompts/scenario-agent.md\`
+3. **Check the App ID is correct** in the workflow files (not still a placeholder)
+## Step 6: Update AGENTS.md
+If the project's AGENTS.md doesn't already have an "External Validation" section, add one:
+> ## External Validation
+>
+> This project uses holdout scenario tests in a separate private repo.
+>
+> ### NEVER
+> - Access, read, or reference the scenarios repo
+> - Mention scenario test names or contents
+> - Modify the scenarios dispatch workflow to leak test information
+>
+> The scenarios repo is deliberately invisible to you. This is the holdout guarantee.
+## Step 7: First Test (Optional)
+If the user wants to test the loop:
+> Want to do a quick test? Here's how:
+>
+> 1. Write a simple spec in \`docs/specs/\` and push to main -- this triggers scenario generation
+> 2. Create a PR with a small change -- when CI passes, scenarios will run
+> 3. Watch for the scenario test results as a PR comment
+>
+> Or deliberately break something in a PR to test the autofix loop.
+## Step 8: Summary
+Print a summary of what was set up:
+> **Level 5 is live.** Here's what's running:
+>
+> | Trigger | What Happens |
+> |---------|-------------|
+> | Push specs to \`docs/specs/\` | Scenario agent writes holdout tests |
+> | PR fails CI | Autofix agent attempts a fix (up to 3x) |
+> | PR passes CI | Holdout scenarios run against PR |
+> | Scenarios update | Open PRs re-tested with latest scenarios |
+>
+> Your scenarios repo: \`{name}\`
+> Your coding agent cannot see those tests. The holdout wall is intact.
+Update \`docs/joycraft-assessment.md\` if it exists -- set the Level 5 score to reflect the new setup.
+`,
+  "joycraft-interview.md": `---
+name: joycraft-interview
+description: Brainstorm freely about what you want to build \u2014 yap, explore ideas, and get a structured summary you can use later
+---
+# Interview \u2014 Idea Exploration
+You are helping the user brainstorm and explore what they want to build. This is a lightweight, low-pressure conversation \u2014 not a formal spec process. Let them yap.
+## How to Run the Interview
+### 1. Open the Floor
+Start with something like:
+"What are you thinking about building? Just talk \u2014 I'll listen and ask questions as we go."
+Let the user talk freely. Do not interrupt their flow. Do not push toward structure yet.
+### 2. Ask Clarifying Questions
+As they talk, weave in questions naturally \u2014 don't fire them all at once:
+- **What problem does this solve?** Who feels the pain today?
+- **What does "done" look like?** If this worked perfectly, what would a user see?
+- **What are the constraints?** Time, tech, team, budget \u2014 what boxes are we in?
+- **What's NOT in scope?** What's tempting but should be deferred?
+- **What are the edge cases?** What could go wrong? What's the weird input?
+- **What exists already?** Are we building on something or starting fresh?
+### 3. Play Back Understanding
+After the user has gotten their ideas out, reflect back:
+"So if I'm hearing you right, you want to [summary]. The core problem is [X], and done looks like [Y]. Is that right?"
+Let them correct and refine. Iterate until they say "yes, that's it."
+### 4. Write a Draft Brief
+Create a draft file at \`docs/briefs/YYYY-MM-DD-topic-draft.md\`. Create the \`docs/briefs/\` directory if it doesn't exist.
+Use this format:
+\`\`\`markdown
+# [Topic] \u2014 Draft Brief
+> **Date:** YYYY-MM-DD
+> **Status:** DRAFT
+> **Origin:** $joycraft-interview session
+---
+## The Idea
+[2-3 paragraphs capturing what the user described \u2014 their words, their framing]
+## Problem
+[What pain or gap this addresses]
+## What "Done" Looks Like
+[The user's description of success \u2014 observable outcomes]
+## Constraints
+- [constraint 1]
+- [constraint 2]
+## Open Questions
+- [things that came up but weren't resolved]
+- [decisions that need more thought]
+## Out of Scope (for now)
+- [things explicitly deferred]
+## Raw Notes
+[Any additional context, quotes, or tangents worth preserving]
+\`\`\`
+### 5. Hand Off
+After writing the draft, tell the user:
+\`\`\`
+Draft brief saved to docs/briefs/YYYY-MM-DD-topic-draft.md
+When you're ready to move forward:
+- $joycraft-new-feature \u2014 formalize this into a full Feature Brief with specs
+- $joycraft-decompose \u2014 break it directly into atomic specs if scope is clear
+- Or just keep brainstorming \u2014 run $joycraft-interview again anytime
+\`\`\`
+## Guidelines
+- **This is NOT $joycraft-new-feature.** Do not push toward formal briefs, decomposition tables, or atomic specs. The point is exploration.
+- **Let the user lead.** Your job is to listen, clarify, and capture \u2014 not to structure or direct.
+- **Mark everything as DRAFT.** The output is a starting point, not a commitment.
+- **Keep it short.** The draft brief should be 1-2 pages max. Capture the essence, not every detail.
+- **Multiple interviews are fine.** The user might run this several times as their thinking evolves. Each creates a new dated draft.
+`,
+  "joycraft-lockdown.md": `---
+name: joycraft-lockdown
+description: Generate constrained execution boundaries for an implementation session -- NEVER rules and deny patterns to prevent agent overreach
+---
+# Lockdown Mode
+The user wants to constrain agent behavior for an implementation session. Your job is to interview them about what should be off-limits, then generate AGENTS.md NEVER rules and Codex configuration deny patterns they can review and apply.
+## When Is Lockdown Useful?
+Lockdown is most valuable for:
+- **Complex tech stacks** (hardware, firmware, multi-device) where agents can cause real damage
+- **Long-running autonomous sessions** where you won't be monitoring every action
+- **Production-adjacent work** where accidental network calls or package installs are risky
+For simple feature work on a well-tested codebase, lockdown is usually overkill. Mention this context to the user so they can decide.
+## Step 1: Check for Tests
+Before starting the interview, search the codebase for test files or directories (look for \`tests/\`, \`test/\`, \`__tests__/\`, \`spec/\`, or files matching \`*.test.*\`, \`*.spec.*\`).
+If no tests are found, tell the user:
+> Lockdown mode is most useful when you already have tests in place -- it prevents the agent from modifying them while constraining behavior to writing code and running tests. Consider running \`$joycraft-new-feature\` first to set up a test-driven workflow, then come back to lock it down.
+If the user wants to proceed anyway, continue with the interview.
+## Step 2: Interview -- What to Lock Down
+Ask these three questions, one at a time. Wait for the user's response before proceeding to the next question.
+### Question 1: Read-Only Files
+> What test files or directories should be off-limits for editing? (e.g., \`tests/\`, \`__tests__/\`, \`spec/\`, specific test files)
+>
+> I'll generate NEVER rules to prevent editing these.
+If the user isn't sure, suggest the test directories you found in Step 1.
+### Question 2: Allowed Commands
+> What commands should the agent be allowed to run? Defaults:
+> - Write and edit source code files
+> - Run the project's smoke test command
+> - Run the full test suite
+>
+> Any other commands to explicitly allow? Or should I restrict to just these?
+### Question 3: Denied Commands
+> What commands should be denied? Defaults:
+> - Package installs (\`npm install\`, \`pip install\`, \`cargo add\`, \`go get\`, etc.)
+> - Network tools (\`curl\`, \`wget\`, \`ping\`, \`ssh\`)
+> - Direct log file reading
+>
+> Any specific commands to add or remove from this list?
+**Edge case -- user wants to allow some network access:** If the user mentions API tests or specific endpoints that need network access, exclude those from the deny list and note the exception in the output.
+**Edge case -- user wants to lock down file writes:** If the user wants to prevent ALL file writes, warn them:
+> Denying all file writes would prevent the agent from doing any work. I recommend keeping source code writes allowed and only locking down test files, config files, or other sensitive directories.
+## Step 3: Generate Boundaries
+Based on the interview responses, generate output in this exact format:
+\`\`\`
+## Lockdown boundaries generated
+Review these suggestions and add them to your project:
+### AGENTS.md -- add to NEVER section:
+- Edit any file in \`[user's test directories]\`
+- Run \`[denied package manager commands]\`
+- Use \`[denied network tools]\`
+- Read log files directly -- interact with logs only through test assertions
+- [Any additional NEVER rules based on user responses]
+### Codex configuration -- suggested deny patterns:
+Add these to your Codex sandbox configuration to restrict command execution:
+["[command1]", "[command2]", "[command3]"]
+---
+Copy these into your project manually, or tell me to apply them now (I'll show you the exact changes for approval first).
+\`\`\`
+Adjust the content based on the actual interview responses:
+- Only include deny patterns for commands the user confirmed should be denied
+- Only include NEVER rules for directories/files the user specified
+- If the user allowed certain network tools or package managers, exclude those
+## Recommended Execution Model
+After generating the boundaries above, also recommend a Codex execution configuration. Include this section in your output:
+\`\`\`
+### Recommended Execution Configuration
+Codex runs in a sandboxed environment by default. To maximize safety during lockdown:
+| Your situation | Configuration | Why |
+|---|---|---|
+| Autonomous spec execution | Sandbox with deny patterns above | Only pre-approved commands run |
+| Long session with some trust | Default sandbox | Network-disabled sandbox prevents external access |
+| Interactive development | Default with manual review | Review outputs before applying |
+**For lockdown mode, we recommend the default sandboxed execution** combined with the deny patterns above. Codex's sandbox already disables network access by default -- the deny patterns add file-level and command-level restrictions on top.
+If you need network access for specific commands (e.g., API tests), configure explicit network allowances in your Codex setup rather than disabling the sandbox entirely.
+\`\`\`
+## Step 4: Offer to Apply
+If the user asks you to apply the changes:
+1. **For AGENTS.md:** Read the existing AGENTS.md, find the Behavioral Boundaries section, and show the user the exact diff for the NEVER section. Ask for confirmation before writing.
+2. **For Codex configuration:** Show the user what the deny patterns will look like after adding the new restrictions. Ask for confirmation before writing.
+**Never auto-apply. Always show the exact changes and wait for explicit approval.**
+`,
+  "joycraft-new-feature.md": `---
+name: joycraft-new-feature
+description: Guided feature development \u2014 interview the user, produce a Feature Brief, then decompose into atomic specs
+---
+# New Feature Workflow
+You are starting a new feature. Follow this process in order. Do not skip steps.
+## Phase 1: Interview
+Interview the user about what they want to build. Let them talk \u2014 your job is to listen, then sharpen.
+**Ask about:**
+- What problem does this solve? Who is affected?
+- What does "done" look like?
+- Hard constraints? (business rules, tech limitations, deadlines)
+- What is explicitly NOT in scope? (push hard on this)
+- Edge cases or error conditions?
+- What existing code/patterns should this follow?
+- Testing: existing setup? framework? smoke test budget? lockdown mode desired?
+**Interview technique:**
+- Let the user "yap" \u2014 don't interrupt their flow
+- Play back your understanding: "So if I'm hearing you right..."
+- Push toward testable statements: "How would we verify that works?"
+Keep asking until you can fill out a Feature Brief.
+## Phase 2: Feature Brief
+Write a Feature Brief to \`docs/briefs/YYYY-MM-DD-feature-name.md\`. Create the \`docs/briefs/\` directory if it doesn't exist.
+**Why:** The brief is the single source of truth for what we're building. It prevents scope creep and gives every spec a shared reference point.
+Use this structure:
+\`\`\`markdown
+# [Feature Name] \u2014 Feature Brief
+> **Date:** YYYY-MM-DD
+> **Project:** [project name]
+> **Status:** Interview | Decomposing | Specs Ready | In Progress | Complete
+---
+## Vision
+What are we building and why? The full picture in 2-4 paragraphs.
+## User Stories
+- As a [role], I want [capability] so that [benefit]
+## Hard Constraints
+- MUST: [constraint that every spec must respect]
+- MUST NOT: [prohibition that every spec must respect]
+## Out of Scope
+- NOT: [tempting but deferred]
+## Test Strategy
+- **Existing setup:** [framework and tools, or "none yet"]
+- **User expertise:** [comfortable / learning / needs guidance]
+- **Test types:** [smoke, unit, integration, e2e, etc.]
+- **Smoke test budget:** [target time for fast-feedback tests]
+- **Lockdown mode:** [yes/no \u2014 constrain agent to code + tests only]
+## Decomposition
+| # | Spec Name | Description | Dependencies | Est. Size |
+|---|-----------|-------------|--------------|-----------|
+| 1 | [verb-object] | [one sentence] | None | [S/M/L] |
+## Execution Strategy
+- [ ] Sequential (specs have chain dependencies)
+- [ ] Parallel (specs are independent)
+- [ ] Mixed
+## Success Criteria
+- [ ] [End-to-end behavior 1]
+- [ ] [No regressions in existing features]
+\`\`\`
+If \`docs/templates/FEATURE_BRIEF_TEMPLATE.md\` exists, reference it for the full template with additional guidance.
+Present the brief to the user. Focus review on:
+- "Does the decomposition match how you think about this?"
+- "Is anything in scope that shouldn't be?"
+- "Are the specs small enough? Can each be described in one sentence?"
+Iterate until approved.
+## Phase 3: Generate Atomic Specs
+For each row in the decomposition table, create a self-contained spec file at \`docs/specs/YYYY-MM-DD-spec-name.md\`. Create the \`docs/specs/\` directory if it doesn't exist.
+**Why:** Each spec must be understandable WITHOUT reading the Feature Brief. This prevents the "Curse of Instructions" \u2014 no spec should require holding the entire feature in context. Copy relevant context into each spec.
+Use this structure for each spec:
+\`\`\`markdown
+# [Verb + Object] \u2014 Atomic Spec
+> **Parent Brief:** \`docs/briefs/YYYY-MM-DD-feature-name.md\`
+> **Status:** Ready
+> **Date:** YYYY-MM-DD
+> **Estimated scope:** [1 session / N files / ~N lines]
+---
+## What
+One paragraph \u2014 what changes when this spec is done?
+## Why
+One sentence \u2014 what breaks or is missing without this?
+## Acceptance Criteria
+- [ ] [Observable behavior]
+- [ ] Build passes
+- [ ] Tests pass
+## Test Plan
+| Acceptance Criterion | Test | Type |
+|---------------------|------|------|
+| [Each AC above] | [What to call/assert] | [unit/integration/e2e] |
+**Execution order:**
+1. Write all tests above \u2014 they should fail against current/stubbed code
+2. Run tests to confirm they fail (red)
+3. Implement until all tests pass (green)
+**Smoke test:** [Identify the fastest test for iteration feedback]
+**Before implementing, verify your test harness:**
+1. Run all tests \u2014 they must FAIL (if they pass, you're testing the wrong thing)
+2. Each test calls your actual function/endpoint \u2014 not a reimplementation or the underlying library
+3. Identify your smoke test \u2014 it must run in seconds, not minutes, so you get fast feedback on each change
+## Constraints
+- MUST: [hard requirement]
+- MUST NOT: [hard prohibition]
+## Affected Files
+| Action | File | What Changes |
+|--------|------|-------------|
+## Approach
+Strategy, data flow, key decisions. Name one rejected alternative.
+## Edge Cases
+| Scenario | Expected Behavior |
+|----------|------------------|
+\`\`\`
+If \`docs/templates/ATOMIC_SPEC_TEMPLATE.md\` exists, reference it for the full template with additional guidance.
+## Phase 4: Hand Off for Execution
+Tell the user:
+\`\`\`
+Feature Brief and [N] atomic specs are ready.
+Specs:
+1. [spec-name] \u2014 [one sentence] [S/M/L]
+2. [spec-name] \u2014 [one sentence] [S/M/L]
+...
+Recommended execution:
+- [Parallel/Sequential/Mixed strategy]
+- Estimated: [N] sessions total
+To execute: Start a fresh session per spec. Each session should:
+1. Read the spec
+2. Implement
+3. Run $joycraft-session-end to capture discoveries
+4. Commit and PR
+Ready to start?
+\`\`\`
+**Why:** A fresh session for execution produces better results. The interview session has too much context noise \u2014 a clean session with just the spec is more focused.
+You can also use \`$joycraft-decompose\` to re-decompose a brief if the breakdown needs adjustment, or run \`$joycraft-interview\` first for a lighter brainstorm before committing to the full workflow.
+`,
+  "joycraft-research.md": `---
+name: joycraft-research
+description: Produce objective codebase research by isolating question generation from fact-gathering \u2014 subagent sees only questions, never the brief
+---
+# Research Codebase for a Feature
+You are producing objective codebase research to inform a future spec or implementation. The key insight: the researching agent must never see the brief or ticket \u2014 only research questions. This prevents opinions from contaminating the facts.
+**Guard clause:** If the user doesn't provide a brief path or inline description, ask:
+"What feature or change are you researching? Provide a brief path or describe it."
+---
+## Phase 1: Generate Research Questions
+Read the brief and identify which zones of the codebase are relevant. Generate 5-10 research questions that are:
+- **Objective and fact-seeking** \u2014 "How does X work?" not "How should we build X?"
+- **Specific to the codebase**
+- **Answerable by reading code**
+Write the questions to \`docs/research/.questions-tmp.md\`. **Do NOT include any content from the brief.**
+---
+## Phase 2: Spawn Research Subagent
+Spawn a subagent to perform the research. Pass ONLY the research questions \u2014 never the brief.
+Subagent prompt:
+\`\`\`
+You are researching a codebase to answer specific questions. You have NO context about why these questions are being asked.
+RULES:
+- Answer each question with FACTS ONLY: file paths, function signatures, data flows, patterns, dependencies
+- Do NOT recommend, suggest, or opine
+- Do NOT speculate about what should be built
+- If a question cannot be answered, say "No existing code found for this"
+- Search the codebase and read files thoroughly
+- Include code snippets only when essential evidence
+QUESTIONS:
+[INSERT_QUESTIONS_HERE]
+OUTPUT FORMAT:
+# Codebase Research
+**Date:** [today]
+**Questions answered:** [N/total]
+---
+## Q1: [question]
+[Facts only]
+## Q2: [question]
+[Facts only]
+\`\`\`
+## Phase 3: Write the Research Document
+Write the subagent's response to \`docs/research/YYYY-MM-DD-feature-name.md\`. Delete the temporary questions file.
+Present:
+\`\`\`
+Research complete: docs/research/YYYY-MM-DD-feature-name.md
+This document contains objective facts \u2014 no opinions or recommendations.
+Next steps:
+- $joycraft-decompose \u2014 break the feature into atomic specs
+- $joycraft-new-feature \u2014 formalize into a full Feature Brief first
+- Read the research and add corrections manually
+\`\`\`
+`,
+  "joycraft-session-end.md": `---
+name: joycraft-session-end
+description: Wrap up a session \u2014 capture discoveries, verify, prepare for PR or next session
+---
+# Session Wrap-Up
+Before ending this session, complete these steps in order.
+## 1. Capture Discoveries
+**Why:** Discoveries are the surprises \u2014 things that weren't in the spec or that contradicted expectations. They prevent future sessions from hitting the same walls.
+Check: did anything surprising happen during this session? If yes, create or update a discovery file at \`docs/discoveries/YYYY-MM-DD-topic.md\`. Create the \`docs/discoveries/\` directory if it doesn't exist.
+Only capture what's NOT obvious from the code or git diff:
+- "We thought X but found Y" \u2014 assumptions that were wrong
+- "This API/library behaves differently than documented" \u2014 external gotchas
+- "This edge case needs handling in a future spec" \u2014 deferred work with context
+- "The approach in the spec didn't work because..." \u2014 spec-vs-reality gaps
+- Key decisions made during implementation that aren't in the spec
+**Do NOT capture:**
+- Files changed (that's the diff)
+- What you set out to do (that's the spec)
+- Step-by-step narrative of the session (nobody re-reads these)
+Use this format:
+\`\`\`markdown
+# Discoveries \u2014 [topic]
+**Date:** YYYY-MM-DD
+**Spec:** [link to spec if applicable]
+## [Discovery title]
+**Expected:** [what we thought would happen]
+**Actual:** [what actually happened]
+**Impact:** [what this means for future work]
+\`\`\`
+If nothing surprising happened, skip the discovery file entirely. No discovery is a good sign \u2014 the spec was accurate.
+## 1b. Update Context Documents
+If \`docs/context/\` exists, quickly check whether this session revealed anything about:
+- **Production risks** \u2014 did you interact with or learn about production vs staging systems? Update \`docs/context/production-map.md\`
+- **Wrong assumptions** \u2014 did you assume something that turned out to be false? Update \`docs/context/dangerous-assumptions.md\`
+- **Key decisions** \u2014 did you make an architectural or tooling choice? Add a row to \`docs/context/decision-log.md\`
+- **Unwritten rules** \u2014 did you discover a convention or constraint not documented anywhere? Update \`docs/context/institutional-knowledge.md\`
+Skip this if nothing applies. Don't force it \u2014 only update when there's genuine new context.
+## 2. Run Validation
+Run the project's validation commands. Check CLAUDE.md or AGENTS.md for project-specific commands. Common checks:
+- Type-check (e.g., \`tsc --noEmit\`, \`mypy\`, \`cargo check\`)
+- Tests (e.g., \`npm test\`, \`pytest\`, \`cargo test\`)
+- Lint (e.g., \`eslint\`, \`ruff\`, \`clippy\`)
+Fix any failures before proceeding.
+## 3. Update Spec Status
+If working from an atomic spec in \`docs/specs/\`:
+- All acceptance criteria met \u2014 update status to \`Complete\`
+- Partially done \u2014 update status to \`In Progress\`, note what's left
+If working from a Feature Brief in \`docs/briefs/\`, check off completed specs in the decomposition table.
+## 4. Commit
+Commit all changes including the discovery file (if created) and spec status updates. The commit message should reference the spec if applicable.
+## 5. Push and PR (if autonomous git is enabled)
+**Check CLAUDE.md or AGENTS.md for "Git Autonomy" in the Behavioral Boundaries section.** If it says "STRICTLY ENFORCED" or the ALWAYS section includes "Push to feature branches immediately after every commit":
+1. **Push immediately.** Run \`git push origin <branch>\` \u2014 do not ask, do not hesitate.
+2. **Open a PR if the feature is complete.** Check the parent Feature Brief's decomposition table \u2014 if all specs are done, run \`gh pr create\` with a summary of all completed specs. Do not ask first.
+3. **If not all specs are done,** still push. The PR comes when the last spec is complete.
+If CLAUDE.md or AGENTS.md does NOT have autonomous git rules (or has "ASK FIRST" for pushing), ask the user before pushing.
+## 6. Report
+\`\`\`
+Session complete.
+- Spec: [spec name] \u2014 [Complete / In Progress]
+- Build: [passing / failing]
+- Discoveries: [N items / none]
+- Pushed: [yes / no \u2014 and why not]
+- PR: [opened #N / not yet \u2014 N specs remaining]
+- Next: [what the next session should tackle]
+\`\`\`
+`,
+  "joycraft-tune.md": `---
+name: joycraft-tune
+description: Assess and upgrade your project's AI development harness \u2014 score 7 dimensions, apply fixes, show path to Level 5
+---
+# Tune \u2014 Project Harness Assessment & Upgrade
+You are evaluating and upgrading this project's AI development harness.
+## Step 1: Detect Harness State
+Search the codebase for: CLAUDE.md (with meaningful content), \`docs/specs/\`, \`docs/briefs/\`, \`docs/discoveries/\`, \`.agents/skills/\`, and test configuration.
+## Step 2: Route
+- **No harness** (no CLAUDE.md or just a README): Recommend \`npx joycraft init\` and stop.
+- **Harness exists**: Continue to assessment.
+## Step 3: Assess \u2014 Score 7 Dimensions (1-5 scale)
+Read CLAUDE.md and explore the project. Score each with specific evidence:
+| Dimension | What to Check |
+|-----------|--------------|
+| Spec Quality | \`docs/specs/\` \u2014 structured? acceptance criteria? self-contained? |
+| Spec Granularity | Can each spec be done in one session? |
+| Behavioral Boundaries | ALWAYS/ASK FIRST/NEVER sections (or equivalent rules under any heading) |
+| Skills & Hooks | \`.agents/skills/\` files, hooks config |
+| Documentation | \`docs/\` structure, templates, referenced from CLAUDE.md |
+| Knowledge Capture | \`docs/discoveries/\`, \`docs/context/*.md\` \u2014 existence AND real content |
+| Testing & Validation | Test framework, CI pipeline, validation commands in CLAUDE.md |
+Score 1 = absent, 3 = partially there, 5 = comprehensive. Give credit for substance over format.
+## Step 4: Write Assessment
+Write to \`docs/joycraft-assessment.md\` AND display it. Include: scores table, detailed findings (evidence + gap + recommendation per dimension), and an upgrade plan (up to 5 actions ordered by impact).
+## Step 5: Apply Upgrades
+Apply using three tiers \u2014 do NOT ask per-item permission:
+**Tier 1 (silent):** Create missing dirs, install missing skills, copy missing templates, create AGENTS.md.
+**Before Tier 2, ask TWO things:**
+1. **Git autonomy:** Cautious (ask before push/PR) or Autonomous (push + PR without asking)?
+2. **Risk interview (3-5 questions, one at a time):** What could break? What services connect to prod? Unwritten rules? Off-limits files/commands? Skip if \`docs/context/\` already has content.
+From answers, generate: CLAUDE.md boundary rules, deny patterns configuration, \`docs/context/\` documents. Also recommend a permission mode (\`auto\` for most; \`dontAsk\` + allowlist for high-risk).
+**Tier 2 (show diff):** Add missing CLAUDE.md sections (Boundaries, Workflow, Key Files). Draft from real codebase content. Append only \u2014 never reformat existing content.
+**Tier 3 (confirm first):** Rewriting existing sections, overwriting customized files, suggesting test framework installs.
+After applying, append to \`docs/joycraft-history.md\` and show a consolidated upgrade results table.
+## Step 6: Show Path to Level 5
+Show a tailored roadmap: Level 2-5 table, specific next steps based on actual gaps, and the Level 5 north star (spec queue, autofix, holdout scenarios, self-improving harness).
+## Edge Cases
+- **CLAUDE.md is just a README:** Treat as no harness.
+- **Non-Joycraft skills:** Acknowledge, don't replace.
+- **Rules under non-standard headings:** Give credit for substance.
+- **Previous assessment exists:** Read it first. If nothing to upgrade, say so.
+- **Non-Joycraft content in CLAUDE.md:** Preserve as-is. Only append.
+`,
+  "joycraft-verify.md": `---
+name: joycraft-verify
+description: Spawn an independent verifier subagent to check an implementation against its spec -- read-only, no code edits, structured pass/fail verdict
+---
+# Verify Implementation Against Spec
+The user wants independent verification of an implementation. Your job is to find the relevant spec, extract its acceptance criteria and test plan, then spawn a separate verifier subagent that checks each criterion and produces a structured verdict.
+**Why a separate subagent?** Research found that agents reliably skew positive when grading their own work. Separating the agent doing the work from the agent judging it consistently outperforms self-evaluation. The verifier gets a clean context window with no implementation bias.
+## Step 1: Find the Spec
+If the user provided a spec path (e.g., \`$joycraft-verify docs/specs/2026-03-26-add-widget.md\`), use that path directly.
+If no path was provided, scan \`docs/specs/\` for spec files. Pick the most recently modified \`.md\` file in that directory. If \`docs/specs/\` doesn't exist or is empty, tell the user:
+> No specs found in \`docs/specs/\`. Please provide a spec path: \`$joycraft-verify path/to/spec.md\`
+## Step 2: Read and Parse the Spec
+Read the spec file and extract:
+1. **Spec name** -- from the H1 title
+2. **Acceptance Criteria** -- the checklist under the \`## Acceptance Criteria\` section
+3. **Test Plan** -- the table under the \`## Test Plan\` section, including any test commands
+4. **Constraints** -- the \`## Constraints\` section if present
+If the spec has no Acceptance Criteria section, tell the user:
+> This spec doesn't have an Acceptance Criteria section. Verification needs criteria to check against. Add acceptance criteria to the spec and try again.
+If the spec has no Test Plan section, note this but proceed -- the verifier can still check criteria by reading code and running any available project tests.
+## Step 3: Identify Test Commands
+Look for test commands in these locations (in priority order):
+1. The spec's Test Plan section (look for commands in backticks or "Type" column entries like "unit", "integration", "e2e", "build")
+2. The project's CLAUDE.md or AGENTS.md (look for test/build commands in the Development Workflow section)
+3. Common defaults based on the project type:
+   - Node.js: \`npm test\` or \`pnpm test --run\`
+   - Python: \`pytest\`
+   - Rust: \`cargo test\`
+   - Go: \`go test ./...\`
+Build a list of specific commands the verifier should run.
+## Step 4: Spawn the Verifier Subagent
+Spawn a concurrent subagent thread with the following prompt. Replace the placeholders with the actual content extracted in Steps 2-3.
+**Important:** The subagent must be given read-only constraints. It may search the codebase, read files, and run the specified test/build commands, but it must NOT edit or create any files.
+\`\`\`
+You are a QA verifier. Your job is to independently verify an implementation against its spec. You have NO context about how the implementation was done -- you are checking it fresh.
+RULES -- these are hard constraints, not suggestions:
+- You may search the codebase and read any file
+- You may RUN these specific test/build commands: [TEST_COMMANDS]
+- You may NOT edit, create, or delete any files
+- You may NOT run commands that modify state (no git commit, no npm install, no file writes)
+- You may NOT install packages or access the network
+- Report what you OBSERVE, not what you expect or hope
+SPEC NAME: [SPEC_NAME]
+ACCEPTANCE CRITERIA:
+[ACCEPTANCE_CRITERIA]
+TEST PLAN:
+[TEST_PLAN]
+CONSTRAINTS:
+[CONSTRAINTS_OR_NONE]
+YOUR TASK:
+For each acceptance criterion, determine if it PASSES or FAILS based on evidence:
+1. Run the test commands listed above. Record the output.
+2. For each acceptance criterion:
+   a. Check if there is a corresponding test and whether it passes
+   b. If no test exists, read the relevant source files to verify the criterion is met
+   c. If the criterion cannot be verified by reading code or running tests, mark it MANUAL CHECK NEEDED
+3. For criteria about build/test passing, actually run the commands and report results.
+OUTPUT FORMAT -- you MUST use this exact format:
+VERIFICATION REPORT
+| # | Criterion | Verdict | Evidence |
+|---|-----------|---------|----------|
+| 1 | [criterion text] | PASS/FAIL/MANUAL CHECK NEEDED | [what you observed] |
+| 2 | [criterion text] | PASS/FAIL/MANUAL CHECK NEEDED | [what you observed] |
+[continue for all criteria]
+SUMMARY: X/Y criteria passed. [Z failures need attention. / All criteria verified.]
+If any test commands fail to run (missing dependencies, wrong command, etc.), report the error as evidence for a FAIL verdict on the relevant criterion.
+\`\`\`
+## Step 5: Format and Present the Verdict
+Take the subagent's response and present it to the user in this format:
+\`\`\`
+## Verification Report -- [Spec Name]
+| # | Criterion | Verdict | Evidence |
+|---|-----------|---------|----------|
+| 1 | ... | PASS | ... |
+| 2 | ... | FAIL | ... |
+**Overall: X/Y criteria passed.**
+[If all passed:]
+All criteria verified. Ready to commit and open a PR.
+[If any failed:]
+N failures need attention. Review the evidence above and fix before proceeding.
+[If any MANUAL CHECK NEEDED:]
+N criteria need manual verification -- they can't be checked by reading code or running tests alone.
+\`\`\`
+## Step 6: Suggest Next Steps
+Based on the verdict:
+- **All PASS:** Suggest committing and opening a PR, or running \`$joycraft-session-end\` to capture discoveries.
+- **Some FAIL:** List the failed criteria and suggest the user fix them, then run \`$joycraft-verify\` again.
+- **MANUAL CHECK NEEDED items:** Explain what needs human eyes and why automation couldn't verify it.
+**Do NOT offer to fix failures yourself.** The verifier reports; the human (or implementation agent in a separate turn) decides what to do. This separation is the whole point.
+## Edge Cases
+| Scenario | Behavior |
+|----------|----------|
+| Spec has no Test Plan | Warn that verification is weaker without a test plan, but proceed by checking criteria through code reading and any available project-level tests |
+| All tests pass but a criterion is not testable | Mark as MANUAL CHECK NEEDED with explanation |
+| Subagent can't run tests (missing deps) | Report the error as FAIL evidence |
+| No specs found and no path given | Tell user to provide a spec path or create a spec first |
+| Spec status is "Complete" | Still run verification -- "Complete" means the implementer thinks it's done, verification confirms |
 `
 };
 export {
   SKILLS,
-  TEMPLATES
+  TEMPLATES,
+  CODEX_SKILLS
 };
-//# sourceMappingURL=chunk-T62ZHD3N.js.map
+//# sourceMappingURL=chunk-4RGMUQQZ.js.map