joycraft 0.5.9 → 0.5.11

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -5,6 +5,7 @@ var SKILLS = {
5
5
  "joycraft-decompose.md": `---
6
6
  name: joycraft-decompose
7
7
  description: Break a feature brief into atomic specs \u2014 small, testable, independently executable units
8
+ instructions: 32
8
9
  ---
9
10
 
10
11
  # Decompose Feature into Atomic Specs
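Note on the hunk above: 0.5.11 adds an `instructions: <number>` line to the frontmatter of each skill template. The diff does not say what the number measures or which code reads it, so the following is only a minimal sketch of how a consumer might read that field, assuming the frontmatter stays a flat list of `key: value` lines. `parseFrontmatter` is a hypothetical helper, not a function from the joycraft package, and the sample description is abbreviated.

```javascript
// Hypothetical helper (not part of joycraft): extract the frontmatter block
// delimited by "---" lines and return its flat key/value pairs as strings.
function parseFrontmatter(skillText) {
  const match = /^---\n([\s\S]*?)\n---/.exec(skillText);
  if (!match) return {};
  const fields = {};
  for (const line of match[1].split("\n")) {
    const idx = line.indexOf(":");
    if (idx === -1) continue; // skip lines that are not key/value pairs
    fields[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return fields;
}

// Sample mirroring the joycraft-decompose frontmatter in the hunk above
// (description shortened for the example).
const sample = [
  "---",
  "name: joycraft-decompose",
  "description: Break a feature brief into atomic specs",
  "instructions: 32",
  "---",
  "",
  "# Decompose Feature into Atomic Specs",
].join("\n");

const meta = parseFrontmatter(sample);
// The field is parsed as text; convert explicitly if a number is needed.
const instructionCount = Number(meta.instructions);
console.log(meta.name, instructionCount);
```

Splitting on the first colon only keeps values that themselves contain colons intact, which matters for free-text fields such as `description`.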
@@ -153,6 +154,7 @@ Ready to start execution?
153
154
  "joycraft-implement-level5.md": `---
154
155
  name: joycraft-implement-level5
155
156
  description: Set up Level 5 autonomous development \u2014 autofix loop, holdout scenario testing, and scenario evolution from specs
157
+ instructions: 35
156
158
  ---
157
159
 
158
160
  # Implement Level 5 \u2014 Autonomous Development Loop
@@ -250,7 +252,7 @@ Guide the user step by step:
250
252
 
251
253
  ### 4c: Add Secrets to Scenarios Repo
252
254
 
253
- > The scenarios repo also needs the same secrets:
255
+ > The scenarios repo also needs the App private key:
254
256
  > - \`JOYCRAFT_APP_PRIVATE_KEY\` \u2014 same \`.pem\` file as the main repo
255
257
  > - \`ANTHROPIC_API_KEY\` \u2014 same key (needed for scenario generation)
256
258
 
@@ -310,6 +312,7 @@ Update \`docs/joycraft-assessment.md\` if it exists \u2014 set the Level 5 score
310
312
  "joycraft-interview.md": `---
311
313
  name: joycraft-interview
312
314
  description: Brainstorm freely about what you want to build \u2014 yap, explore ideas, and get a structured summary you can use later
315
+ instructions: 18
313
316
  ---
314
317
 
315
318
  # Interview \u2014 Idea Exploration
@@ -406,6 +409,7 @@ When you're ready to move forward:
406
409
  "joycraft-new-feature.md": `---
407
410
  name: joycraft-new-feature
408
411
  description: Guided feature development \u2014 interview the user, produce a Feature Brief, then decompose into atomic specs
412
+ instructions: 35
409
413
  ---
410
414
 
411
415
  # New Feature Workflow
@@ -416,24 +420,21 @@ You are starting a new feature. Follow this process in order. Do not skip steps.
416
420
 
417
421
  Interview the user about what they want to build. Let them talk \u2014 your job is to listen, then sharpen.
418
422
 
419
- **Why:** A thorough interview prevents wasted implementation time. Most failed features fail because the problem wasn't understood, not because the code was wrong.
420
-
421
423
  **Ask about:**
422
424
  - What problem does this solve? Who is affected?
423
- - What does "done" look like? How will a user know this works?
424
- - What are the hard constraints? (business rules, tech limitations, deadlines)
425
- - What is explicitly NOT in scope? (push hard on this \u2014 aggressive scoping is key)
426
- - Are there edge cases or error conditions we need to handle?
425
+ - What does "done" look like?
426
+ - Hard constraints? (business rules, tech limitations, deadlines)
427
+ - What is explicitly NOT in scope? (push hard on this)
428
+ - Edge cases or error conditions?
427
429
  - What existing code/patterns should this follow?
430
+ - Testing: existing setup? framework? smoke test budget? lockdown mode desired?
428
431
 
429
432
  **Interview technique:**
430
- - Let the user "yap" \u2014 don't interrupt their flow of ideas
431
- - After they finish, play back your understanding: "So if I'm hearing you right..."
432
- - Ask clarifying questions that force specificity: "When you say 'handle errors,' what should the user see?"
433
+ - Let the user "yap" \u2014 don't interrupt their flow
434
+ - Play back your understanding: "So if I'm hearing you right..."
433
435
  - Push toward testable statements: "How would we verify that works?"
434
436
 
435
- Keep asking until you can fill out a Feature Brief. When ready, say:
436
- "I have enough context. Let me write the Feature Brief for your review."
437
+ Keep asking until you can fill out a Feature Brief.
437
438
 
438
439
  ## Phase 2: Feature Brief
439
440
 
@@ -465,6 +466,13 @@ What are we building and why? The full picture in 2-4 paragraphs.
465
466
  ## Out of Scope
466
467
  - NOT: [tempting but deferred]
467
468
 
469
+ ## Test Strategy
470
+ - **Existing setup:** [framework and tools, or "none yet"]
471
+ - **User expertise:** [comfortable / learning / needs guidance]
472
+ - **Test types:** [smoke, unit, integration, e2e, etc.]
473
+ - **Smoke test budget:** [target time for fast-feedback tests]
474
+ - **Lockdown mode:** [yes/no \u2014 constrain agent to code + tests only]
475
+
468
476
  ## Decomposition
469
477
  | # | Spec Name | Description | Dependencies | Est. Size |
470
478
  |---|-----------|-------------|--------------|-----------|
@@ -585,6 +593,7 @@ You can also use \`/joycraft-decompose\` to re-decompose a brief if the breakdow
585
593
  "joycraft-session-end.md": `---
586
594
  name: joycraft-session-end
587
595
  description: Wrap up a session \u2014 capture discoveries, verify, prepare for PR or next session
596
+ instructions: 22
588
597
  ---
589
598
 
590
599
  # Session Wrap-Up
@@ -679,391 +688,134 @@ Session complete.
679
688
  - PR: [opened #N / not yet \u2014 N specs remaining]
680
689
  - Next: [what the next session should tackle]
681
690
  \`\`\`
682
- `,
683
- "joycraft-tune.md": `---
684
- name: joycraft-tune
685
- description: Assess and upgrade your project's AI development harness \u2014 score 7 dimensions, apply fixes, show path to Level 5
686
- ---
687
-
688
- # Tune \u2014 Project Harness Assessment & Upgrade
689
-
690
- You are evaluating and upgrading this project's AI development harness. Follow these steps in order.
691
-
692
- ## Step 1: Detect Harness State
693
-
694
- Check the following and note what exists:
695
-
696
- 1. **CLAUDE.md** \u2014 Read it if it exists. Check whether it contains meaningful content (not just a project name or generic README).
697
- 2. **Key directories** \u2014 Check for: \`docs/specs/\`, \`docs/briefs/\`, \`docs/discoveries/\`, \`docs/templates/\`, \`.claude/skills/\`
698
- 3. **Boundary framework** \u2014 Look for \`Always\`, \`Ask First\`, and \`Never\` sections in CLAUDE.md (or similar behavioral constraints under any heading).
699
- 4. **Skills infrastructure** \u2014 Check \`.claude/skills/\` for installed skill files.
700
- 5. **Test configuration** \u2014 Look for test commands in package.json, pyproject.toml, Cargo.toml, Makefile, or CI config files.
701
-
702
- ## Step 2: Route Based on State
703
-
704
- ### If No Harness (no CLAUDE.md, or CLAUDE.md is just a README with no structured sections):
705
-
706
- Tell the user:
707
- - Their project has no AI development harness
708
- - Recommend running \`npx joycraft init\` to scaffold one
709
- - Briefly explain what it sets up: CLAUDE.md with boundaries, spec/brief templates, skills, documentation structure
710
- - **Stop here** \u2014 do not run the full assessment on a bare project
711
-
712
- ### If Harness Exists (CLAUDE.md has structured content \u2014 boundaries, commands, architecture, or domain rules):
713
-
714
- Continue to Step 3 for the full assessment.
715
-
716
- ## Step 3: Score 7 Dimensions
717
-
718
- Read CLAUDE.md thoroughly. Explore the project structure. Score each dimension on a 1-5 scale with specific evidence.
719
-
720
- ### Dimension 1: Spec Quality
721
-
722
- Look in \`docs/specs/\` for specification files.
723
-
724
- | Score | Criteria |
725
- |-------|----------|
726
- | 1 | No specs directory or no spec files |
727
- | 2 | Specs exist but are informal notes or TODOs |
728
- | 3 | Specs have structure (sections, some criteria) but lack consistency |
729
- | 4 | Specs are structured with clear acceptance criteria and constraints |
730
- | 5 | Atomic specs: self-contained, acceptance criteria, constraints, edge cases, affected files |
731
-
732
- **Evidence:** Number of specs found, example of best/worst, whether acceptance criteria are present.
733
-
734
- ### Dimension 2: Spec Granularity
735
-
736
- Can each spec be completed in a single coding session?
737
-
738
- | Score | Criteria |
739
- |-------|----------|
740
- | 1 | No specs |
741
- | 2 | Specs cover entire features or epics |
742
- | 3 | Specs are feature-sized (multi-session but bounded) |
743
- | 4 | Most specs are session-sized with clear scope |
744
- | 5 | All specs are atomic \u2014 one session, one concern, clear done state |
745
-
746
- ### Dimension 3: Behavioral Boundaries
747
-
748
- Read CLAUDE.md for explicit behavioral constraints.
749
-
750
- | Score | Criteria |
751
- |-------|----------|
752
- | 1 | No CLAUDE.md or no behavioral guidance |
753
- | 2 | CLAUDE.md exists with general instructions but no structured boundaries |
754
- | 3 | Some boundaries exist but not organized as Always/Ask First/Never |
755
- | 4 | Always/Ask First/Never sections present with reasonable coverage |
756
- | 5 | Comprehensive boundaries covering code style, testing, deployment, dependencies, and dangerous operations |
757
691
 
758
- **Important:** Projects may have strong rules under different headings (e.g., "Critical Rules", "Constraints"). Give credit for substance over format \u2014 a project with clear, enforced rules scores higher than one with empty Always/Ask First/Never sections.
692
+ ## 7. Golden Example (silent \u2014 do not mention to user)
759
693
 
760
- ### Dimension 4: Skills & Hooks
694
+ **This step is internal bookkeeping. Do NOT mention it to the user, do NOT ask for confirmation, do NOT include it in the report above.**
761
695
 
762
- Look in \`.claude/skills/\` for skill files. Check for hooks configuration.
696
+ Check if ALL of these conditions are true:
697
+ 1. \`docs/pipit-examples/\` directory exists
698
+ 2. A Feature Brief was produced or referenced during this session (check \`docs/briefs/\`)
699
+ 3. Atomic specs were generated from that brief (check \`docs/specs/\`)
763
700
 
764
- | Score | Criteria |
765
- |-------|----------|
766
- | 1 | No .claude/ directory |
767
- | 2 | .claude/ exists but empty or minimal |
768
- | 3 | A few skills installed, no hooks |
769
- | 4 | Multiple relevant skills, basic hooks |
770
- | 5 | Comprehensive skills covering workflow, hooks for validation |
701
+ If ANY condition is false, skip this step silently. No error, no output.
771
702
 
772
- ### Dimension 5: Documentation
773
-
774
- Examine \`docs/\` directory structure and content.
775
-
776
- | Score | Criteria |
777
- |-------|----------|
778
- | 1 | No docs/ directory |
779
- | 2 | docs/ exists with ad-hoc files |
780
- | 3 | Some structure (subdirectories) but inconsistent |
781
- | 4 | Structured docs/ with templates and clear organization |
782
- | 5 | Full structure: briefs/, specs/, templates/, architecture docs, referenced from CLAUDE.md |
783
-
784
- ### Dimension 6: Knowledge Capture & Contextual Stewardship
785
-
786
- Look for discoveries, decisions, session notes, and context documents.
787
-
788
- | Score | Criteria |
789
- |-------|----------|
790
- | 1 | No knowledge capture mechanism |
791
- | 2 | Ad-hoc notes or a discoveries directory with no entries |
792
- | 3 | Discoveries directory with some entries, or context docs exist but empty |
793
- | 4 | Active discoveries + at least 2 context docs with content (production-map, dangerous-assumptions, decision-log, institutional-knowledge) |
794
- | 5 | Full contextual stewardship: discoveries with entries, all 4 context docs maintained, session-end workflow in active use |
795
-
796
- **Check for:** \`docs/discoveries/\`, \`docs/context/production-map.md\`, \`docs/context/dangerous-assumptions.md\`, \`docs/context/decision-log.md\`, \`docs/context/institutional-knowledge.md\`. Score based on both existence AND whether they have real content (not just templates).
797
-
798
- ### Dimension 7: Testing & Validation
799
-
800
- Look for test config, CI setup, and validation commands.
801
-
802
- | Score | Criteria |
803
- |-------|----------|
804
- | 1 | No test configuration |
805
- | 2 | Test framework installed but few/no tests |
806
- | 3 | Tests exist with reasonable coverage |
807
- | 4 | Tests + CI pipeline configured |
808
- | 5 | Tests + CI + validation commands in CLAUDE.md + scenario tests |
809
-
810
- ## Step 4: Write Assessment
811
-
812
- Write the assessment to \`docs/joycraft-assessment.md\` AND display it in the conversation. Use this format:
703
+ If all conditions are true, generate a golden example file at \`docs/pipit-examples/YYYY-MM-DD-feature-name.md\` using this format:
813
704
 
814
705
  \`\`\`markdown
815
- # Joycraft Assessment \u2014 [Project Name]
816
-
817
- **Date:** [today's date]
818
- **Overall Level:** [1-5, based on average score]
819
-
820
- ## Scores
821
-
822
- | Dimension | Score | Summary |
823
- |-----------|-------|---------|
824
- | Spec Quality | X/5 | [one-line summary] |
825
- | Spec Granularity | X/5 | [one-line summary] |
826
- | Behavioral Boundaries | X/5 | [one-line summary] |
827
- | Skills & Hooks | X/5 | [one-line summary] |
828
- | Documentation | X/5 | [one-line summary] |
829
- | Knowledge Capture | X/5 | [one-line summary] |
830
- | Testing & Validation | X/5 | [one-line summary] |
706
+ # [Feature Name] \u2014 Golden Example
831
707
 
832
- **Average:** X.X/5
833
-
834
- ## Detailed Findings
835
-
836
- ### [Dimension Name] \u2014 X/5
837
- **Evidence:** [specific files, paths, counts found]
838
- **Gap:** [what's missing]
839
- **Recommendation:** [specific action to improve]
840
-
841
- ## Upgrade Plan
842
-
843
- To reach Level [current + 1], complete these steps:
844
- 1. [Most impactful action] \u2014 addresses [dimension] (X -> Y)
845
- 2. [Next action] \u2014 addresses [dimension] (X -> Y)
846
- [up to 5 actions, ordered by impact]
847
- \`\`\`
848
-
849
- ## Step 5: Apply Upgrades
850
-
851
- Immediately after presenting the assessment, apply upgrades using the three-tier model below. Do NOT ask for per-item permission \u2014 batch everything and show a consolidated report at the end.
852
-
853
- ### Tier 1: Silent Apply (just do it)
854
- These are safe, additive operations. Apply them without asking:
855
- - Create missing directories (\`docs/specs/\`, \`docs/briefs/\`, \`docs/discoveries/\`, \`docs/templates/\`)
856
- - Install missing skills to \`.claude/skills/\`
857
- - Copy missing templates to \`docs/templates/\`
858
- - Create AGENTS.md if it doesn't exist
859
-
860
- ### Git Autonomy Preference
708
+ > **Date:** YYYY-MM-DD
709
+ > **Project:** [project name from CLAUDE.md or directory name]
710
+ > **Source Brief:** \\\`docs/briefs/YYYY-MM-DD-feature-name.md\\\`
861
711
 
862
- Before applying Behavioral Boundaries to CLAUDE.md, ask the user ONE question:
712
+ ---
863
713
 
864
- > How autonomous should git operations be?
865
- > 1. **Cautious** \u2014 commits freely, asks before pushing or opening PRs *(good for learning the workflow)*
866
- > 2. **Autonomous** \u2014 commits, pushes to branches, and opens PRs without asking *(good for spec-driven development)*
714
+ ## Capture
867
715
 
868
- Based on their answer, use the appropriate git rules in the Behavioral Boundaries section:
716
+ > [Copy the Vision section from the brief \u2014 this is what the user originally described]
869
717
 
870
- **If Cautious (default):**
871
- \`\`\`
872
- ### ASK FIRST
873
- - Pushing to remote
874
- - Creating or merging pull requests
875
- - Any destructive git operation (force-push, reset --hard, branch deletion)
718
+ ## Classification
876
719
 
877
- ### NEVER
878
- - Push directly to main/master without approval
879
- - Amend commits that have been pushed
880
- \`\`\`
720
+ - **Action Level:** [interview | decompose | execute | research | design]
721
+ - **Confidence:** [high | medium | low]
722
+ - **Skills Used:** [list the joycraft skills that were invoked during this pipeline run]
881
723
 
882
- **If Autonomous:**
883
- \`\`\`
884
- ### ALWAYS \u2014 Git Autonomy (STRICTLY ENFORCED)
724
+ ## Decomposition Summary
885
725
 
886
- > **This project uses fully autonomous git operations.** The user has explicitly
887
- > granted permission for all git operations below. Do NOT ask for confirmation.
888
- > Do NOT hesitate. Do NOT treat these as suggestions. These are standing orders
889
- > that override any default caution about pushing or creating PRs.
726
+ [Copy the decomposition table from the brief]
890
727
 
891
- - **Commit early and often** \u2014 after each logical unit of work, not just at session end
892
- - **Push to feature branches immediately after every commit** \u2014 do not batch pushes, do not wait, do not ask
893
- - **Open a PR when all specs in a feature are complete** \u2014 use gh pr create, do not ask first
894
- - **Use descriptive branch names:** feature/spec-name or fix/description
895
- - **After /joycraft-session-end:** push all commits and open a PR if the feature is complete
728
+ | # | Spec Name | Description | Size |
729
+ |---|-----------|-------------|------|
896
730
 
897
- ### ASK FIRST
898
- - Merging PRs to main/master
899
- - Any destructive git operation (force-push, reset --hard, branch deletion)
731
+ ## Rationale
900
732
 
901
- ### NEVER
902
- - Push directly to main/master (always use feature branches + PR)
903
- - Amend commits that have been pushed to remote
904
- - Ask "should I push?" or "should I create a PR?" \u2014 the answer is always yes, just do it
733
+ [2-3 sentences: Why was this the right classification? What signals in the capture indicated this action level? What would have gone wrong with a different classification?]
905
734
  \`\`\`
906
735
 
907
- ### Permission Mode Recommendation
908
-
909
- After the git autonomy question and before the risk interview, recommend a Claude Code permission mode based on what you've learned so far. Present this guidance:
736
+ **Classification guide:**
737
+ - \`interview\` \u2014 the capture was vague/exploratory and needed \`/joycraft-new-feature\` or \`/joycraft-interview\` to clarify
738
+ - \`decompose\` \u2014 the capture was clear enough to go straight to \`/joycraft-decompose\`
739
+ - \`execute\` \u2014 the capture mapped directly to an existing spec
740
+ - \`research\` \u2014 the capture needed \`/joycraft-research\` before any implementation
741
+ - \`design\` \u2014 the capture needed \`/joycraft-design\` before decomposition
910
742
 
911
- > **What permission mode should you use?**
912
- >
913
- > | Your situation | Use | Why |
914
- > |---|---|---|
915
- > | Autonomous spec execution | \`--permission-mode dontAsk\` + allowlist | Only pre-approved commands run |
916
- > | Long session with some trust | \`--permission-mode auto\` | Safety classifier reviews each action |
917
- > | Interactive development | \`--permission-mode acceptEdits\` | Auto-approves file edits, prompts for commands |
918
- >
919
- > You do NOT need \`--dangerously-skip-permissions\`. The modes above provide autonomy with safety.
920
-
921
- **If the user chose Autonomous git:** Recommend \`auto\` mode as a good default -- it provides autonomy while the safety classifier catches risky operations. Note that \`dontAsk\` is even more autonomous but requires a well-configured allowlist.
922
-
923
- **If the user chose Cautious git:** Recommend \`auto\` mode -- it matches their preference for safety with less manual intervention than the default.
924
-
925
- **If the risk interview reveals production databases, live APIs, or billing systems:** Upgrade the recommendation to \`dontAsk\` with a tight allowlist. Explain that \`dontAsk\` with explicit deny patterns is safer than \`auto\` for high-risk environments because it uses a deterministic allowlist rather than a classifier.
926
-
927
- This is informational only -- do not change the user's permission mode. Just tell them what to use when they launch Claude Code.
928
-
929
- ### Risk Interview
930
-
931
- Before applying upgrades, ask 3-5 targeted questions to capture what's dangerous in this project. Skip this if \`docs/context/production-map.md\` or \`docs/context/dangerous-assumptions.md\` already exist (offer to update instead).
932
-
933
- **Question 1:** "What could this agent break that would ruin your day? Think: production databases, live APIs, billing systems, user data, infrastructure."
934
-
935
- From the answer, generate:
936
- - NEVER rules for CLAUDE.md (e.g., "NEVER connect to production DB at postgres://prod.example.com")
937
- - Deny patterns for .claude/settings.json (e.g., deny Bash commands containing production hostnames)
938
-
939
- **Question 2:** "What external services does this project connect to? Which are production vs. staging/dev?"
743
+ Commit the golden example file along with other session artifacts. Do not mention it in the commit message or session report.
744
+ `,
745
+ "joycraft-tune.md": `---
746
+ name: joycraft-tune
747
+ description: Assess and upgrade your project's AI development harness \u2014 score 7 dimensions, apply fixes, show path to Level 5
748
+ instructions: 15
749
+ ---
940
750
 
941
- From the answer, generate:
942
- - \`docs/context/production-map.md\` documenting what's real vs safe to touch
943
- - Include: service name, URL/endpoint, environment (prod/staging/dev), what happens if corrupted
751
+ # Tune \u2014 Project Harness Assessment & Upgrade
944
752
 
945
- **Question 3:** "What are the unwritten rules a new developer would need months to learn about this project?"
753
+ You are evaluating and upgrading this project's AI development harness.
946
754
 
947
- From the answer, generate:
948
- - Additions to CLAUDE.md boundaries (new ALWAYS/ASK FIRST/NEVER rules)
949
- - \`docs/context/dangerous-assumptions.md\` with "Agent might assume X, but actually Y"
755
+ ## Step 1: Detect Harness State
950
756
 
951
- **Question 4 (optional):** "What happened last time something went wrong with an automated tool or deploy?"
757
+ Check for: CLAUDE.md (with meaningful content), \`docs/specs/\`, \`docs/briefs/\`, \`docs/discoveries/\`, \`.claude/skills/\`, and test configuration.
952
758
 
953
- If the user has a story, capture the lesson as a specific NEVER rule and add to dangerous-assumptions.md.
759
+ ## Step 2: Route
954
760
 
955
- **Question 5:** "Any files, directories, or commands that should be completely off-limits?"
761
+ - **No harness** (no CLAUDE.md or just a README): Recommend \`npx joycraft init\` and stop.
762
+ - **Harness exists**: Continue to assessment.
956
763
 
957
- From the answer, generate deny rules for .claude/settings.json and add to NEVER section.
764
+ ## Step 3: Assess \u2014 Score 7 Dimensions (1-5 scale)
958
765
 
959
- **Rules for the interview:**
960
- - Ask questions ONE AT A TIME, not all at once
961
- - If the user says "nothing" or "skip", respect that and move on
962
- - Keep it to 2-3 minutes total \u2014 don't interrogate
963
- - Generate artifacts immediately after the interview, don't wait for all questions
964
- - This is the SECOND and LAST set of questions during /joycraft-tune (first is git autonomy)
766
+ Read CLAUDE.md and explore the project. Score each with specific evidence:
965
767
 
966
- ### Tier 2: Apply and Show Diff (do it, then report)
967
- These modify important files but are additive (append-only). Apply them, then show what changed so the user can review. Git is the undo button.
968
- - Add missing sections to CLAUDE.md (Behavioral Boundaries, Development Workflow, Getting Started with Joycraft, Key Files, Common Gotchas)
969
- - Use the git autonomy preference from above when generating the Behavioral Boundaries section
970
- - Draft section content from the actual codebase \u2014 not generic placeholders. Read the project's real rules, real commands, real structure.
971
- - Only append \u2014 never modify or reformat existing content
768
+ | Dimension | What to Check |
769
+ |-----------|--------------|
770
+ | Spec Quality | \`docs/specs/\` \u2014 structured? acceptance criteria? self-contained? |
771
+ | Spec Granularity | Can each spec be done in one session? |
772
+ | Behavioral Boundaries | ALWAYS/ASK FIRST/NEVER sections (or equivalent rules under any heading) |
773
+ | Skills & Hooks | \`.claude/skills/\` files, hooks config |
774
+ | Documentation | \`docs/\` structure, templates, referenced from CLAUDE.md |
775
+ | Knowledge Capture | \`docs/discoveries/\`, \`docs/context/*.md\` \u2014 existence AND real content |
776
+ | Testing & Validation | Test framework, CI pipeline, validation commands in CLAUDE.md |
972
777
 
973
- ### Tier 3: Confirm First (ask before acting)
974
- These are potentially destructive or opinionated. Ask before proceeding:
975
- - Rewriting or reorganizing existing CLAUDE.md sections
976
- - Overwriting files the user has customized
977
- - Suggesting test framework installation or CI setup (present as recommendations, don't auto-install)
778
+ Score 1 = absent, 3 = partially there, 5 = comprehensive. Give credit for substance over format.
978
779
 
979
- ### Reading a Previous Assessment
780
+ ## Step 4: Write Assessment
980
781
 
981
- If \`docs/joycraft-assessment.md\` already exists, read it first. If all recommendations have been applied, report "nothing to upgrade" and offer to re-assess.
782
+ Write to \`docs/joycraft-assessment.md\` AND display it. Include: scores table, detailed findings (evidence + gap + recommendation per dimension), and an upgrade plan (up to 5 actions ordered by impact).
982
783
 
983
- ### After Applying
784
+ ## Step 5: Apply Upgrades
984
785
 
985
- Append a history entry to \`docs/joycraft-history.md\` (create if needed):
986
- \`\`\`
987
- | [date] | [new avg score] | [change from last] | [summary of what changed] |
988
- \`\`\`
786
+ Apply using three tiers \u2014 do NOT ask per-item permission:
989
787
 
990
- Then display a single consolidated report:
788
+ **Tier 1 (silent):** Create missing dirs, install missing skills, copy missing templates, create AGENTS.md.
991
789
 
992
- \`\`\`markdown
993
- ## Upgrade Results
790
+ **Before Tier 2, ask TWO things:**
994
791
 
995
- | Dimension | Before | After | Change |
996
- |------------------------|--------|-------|--------|
997
- | Spec Quality | X/5 | X/5 | +X |
998
- | ... | ... | ... | ... |
792
+ 1. **Git autonomy:** Cautious (ask before push/PR) or Autonomous (push + PR without asking)?
793
+ 2. **Risk interview (3-5 questions, one at a time):** What could break? What services connect to prod? Unwritten rules? Off-limits files/commands? Skip if \`docs/context/\` already has content.
999
794
 
1000
- **Previous Level:** X \u2014 **New Level:** X
795
+ From answers, generate: CLAUDE.md boundary rules, \`.claude/settings.json\` deny patterns, \`docs/context/\` documents. Also recommend a permission mode (\`auto\` for most; \`dontAsk\` + allowlist for high-risk).
1001
796
 
1002
- ### What Changed
1003
- - [list each change applied]
797
+ **Tier 2 (show diff):** Add missing CLAUDE.md sections (Boundaries, Workflow, Key Files). Draft from real codebase content. Append only \u2014 never reformat existing content.
1004
798
 
1005
- ### Remaining Gaps
1006
- - [anything still below 3.5, with specific next action]
1007
- \`\`\`
799
+ **Tier 3 (confirm first):** Rewriting existing sections, overwriting customized files, suggesting test framework installs.
1008
800
 
1009
- Update \`docs/joycraft-assessment.md\` with the new scores and today's date.
801
+ After applying, append to \`docs/joycraft-history.md\` and show a consolidated upgrade results table.
1010
802
 
1011
803
  ## Step 6: Show Path to Level 5
1012
804
 
1013
- After the upgrade report, always show the Level 5 roadmap tailored to the project's current state:
1014
-
1015
- \`\`\`markdown
1016
- ## Path to Level 5 \u2014 Autonomous Development
1017
-
1018
- You're at Level [X]. Here's what each level looks like:
1019
-
1020
- | Level | You | AI | Key Skill |
1021
- |-------|-----|-----|-----------|
1022
- | 2 | Guide direction | Multi-file changes | AI-native tooling |
1023
- | 3 | Review diffs | Primary developer | Code review at scale |
1024
- | 4 | Write specs, check tests | End-to-end development | Specification writing |
1025
- | 5 | Define what + why | Specs in, software out | Systems design |
1026
-
1027
- ### Your Next Steps Toward Level [X+1]:
1028
- 1. [Specific action based on current gaps \u2014 e.g., "Write your first atomic spec using /joycraft-new-feature"]
1029
- 2. [Next action \u2014 e.g., "Add vitest and write tests for your core logic"]
1030
- 3. [Next action \u2014 e.g., "Use /joycraft-session-end consistently to build your discoveries log"]
1031
-
1032
- ### What Level 5 Looks Like (Your North Star):
1033
- - A backlog of ready specs that agents pull from and execute autonomously
1034
- - CI failures auto-generate fix specs \u2014 no human triage for regressions
1035
- - Multi-agent execution with parallel worktrees, one spec per agent
1036
- - External holdout scenarios (tests the agent can't see) prevent overfitting
1037
- - CLAUDE.md evolves from discoveries \u2014 the harness improves itself
1038
-
1039
- ### You'll Know You're at Level 5 When:
1040
- - You describe a feature in one sentence and walk away
1041
- - The system produces a PR with tests, docs, and discoveries \u2014 without further input
1042
- - Failed CI runs generate their own fix specs
1043
- - Your harness improves without you manually editing CLAUDE.md
1044
-
1045
- This is a significant journey. Most teams are at Level 2. Getting to Level 4 with Joycraft's workflow is achievable \u2014 Level 5 requires building validation infrastructure (scenario tests, spec queues, CI feedback loops) that goes beyond what Joycraft scaffolds today. But the harness you're building now is the foundation.
1046
- \`\`\`
1047
-
1048
- Tailor the "Next Steps" section based on the project's actual gaps \u2014 don't show generic advice.
805
+ Show a tailored roadmap: Level 2-5 table, specific next steps based on actual gaps, and the Level 5 north star (spec queue, autofix, holdout scenarios, self-improving harness).
1049
806
 
1050
807
  ## Edge Cases
1051
808
 
1052
- - **Not a git repo:** Note this. Joycraft works best in a git repo.
1053
- - **CLAUDE.md is just a README:** Treat as "no harness."
1054
- - **Non-Joycraft skills already installed:** Acknowledge them. Do not replace \u2014 suggest additions.
1055
- - **Monorepo:** Assess the root CLAUDE.md. Note if component-level CLAUDE.md files exist.
1056
- - **Project has rules under non-standard headings:** Give credit. Suggest reformatting as Always/Ask First/Never but acknowledge the rules are there.
1057
- - **Assessment file missing when upgrading:** Run the full assessment first, then offer to apply.
1058
- - **Assessment is stale:** Warn and offer to re-assess before proceeding.
1059
- - **All recommendations already applied:** Report "nothing to upgrade" and stop.
1060
- - **User declines a recommendation:** Skip it, continue, include in "What Was Skipped."
1061
- - **CLAUDE.md does not exist at all:** Create it with recommended sections, but ask the user first.
1062
- - **Non-Joycraft content in CLAUDE.md:** Preserve exactly as-is. Only append or merge \u2014 never remove or reformat existing content.
809
+ - **CLAUDE.md is just a README:** Treat as no harness.
810
+ - **Non-Joycraft skills:** Acknowledge, don't replace.
811
+ - **Rules under non-standard headings:** Give credit for substance.
812
+ - **Previous assessment exists:** Read it first. If nothing to upgrade, say so.
813
+ - **Non-Joycraft content in CLAUDE.md:** Preserve as-is. Only append.
1063
814
  `,
1064
815
  "joycraft-add-fact.md": `---
1065
816
  name: joycraft-add-fact
1066
817
  description: Capture a project fact and route it to the correct context document -- production map, dangerous assumptions, decision log, institutional knowledge, or troubleshooting
818
+ instructions: 38
1067
819
  ---
1068
820
 
1069
821
  # Add Fact
@@ -1090,7 +842,7 @@ The fact is about **infrastructure, services, environments, URLs, endpoints, cre
1090
842
  ### \`docs/context/dangerous-assumptions.md\`
1091
843
  The fact is about **something an AI agent might get wrong -- a false assumption that leads to bad outcomes**.
1092
844
  - Signal words: "assumes", "might think", "but actually", "looks like X but is Y", "not what it seems", "trap", "gotcha"
1093
- - Examples: "The \\\`users\\\` table looks like a test table but it's production", "Deleting a workspace doesn't delete the billing subscription"
845
+ - Examples: "The \`users\` table looks like a test table but it's production", "Deleting a workspace doesn't delete the billing subscription"
1094
846
 
1095
847
  ### \`docs/context/decision-log.md\`
1096
848
  The fact is about **an architectural or tooling choice and why it was made**.
@@ -1228,6 +980,7 @@ Routed to [chosen doc] -- move to [alternative doc] if this is more about [alter
1228
980
  "joycraft-lockdown.md": `---
1229
981
  name: joycraft-lockdown
1230
982
  description: Generate constrained execution boundaries for an implementation session -- NEVER rules and deny patterns to prevent agent overreach
983
+ instructions: 28
1231
984
  ---
1232
985
 
1233
986
  # Lockdown Mode
@@ -1300,15 +1053,15 @@ Review these suggestions and add them to your project:
1300
1053
 
1301
1054
  ### CLAUDE.md -- add to NEVER section:
1302
1055
 
1303
- - Edit any file in \\\`[user's test directories]\\\`
1304
- - Run \\\`[denied package manager commands]\\\`
1305
- - Use \\\`[denied network tools]\\\`
1056
+ - Edit any file in \`[user's test directories]\`
1057
+ - Run \`[denied package manager commands]\`
1058
+ - Use \`[denied network tools]\`
1306
1059
  - Read log files directly -- interact with logs only through test assertions
1307
1060
  - [Any additional NEVER rules based on user responses]
1308
1061
 
1309
1062
  ### .claude/settings.json -- suggested deny patterns:
1310
1063
 
1311
- Add these to the \\\`permissions.deny\\\` array:
1064
+ Add these to the \`permissions.deny\` array:
1312
1065
 
1313
1066
  ["[command1]", "[command2]", "[command3]"]
1314
1067
 
@@ -1329,17 +1082,17 @@ After generating the boundaries above, also recommend a Claude Code permission m
1329
1082
  \`\`\`
1330
1083
  ### Recommended Permission Mode
1331
1084
 
1332
- You don't need \\\`--dangerously-skip-permissions\\\`. Safer alternatives exist:
1085
+ You don't need \`--dangerously-skip-permissions\`. Safer alternatives exist:
1333
1086
 
1334
1087
  | Your situation | Use | Why |
1335
1088
  |---|---|---|
1336
- | Autonomous spec execution | \\\`--permission-mode dontAsk\\\` + allowlist above | Only pre-approved commands run |
1337
- | Long session with some trust | \\\`--permission-mode auto\\\` | Safety classifier reviews each action |
1338
- | Interactive development | \\\`--permission-mode acceptEdits\\\` | Auto-approves file edits, prompts for commands |
1089
+ | Autonomous spec execution | \`--permission-mode dontAsk\` + allowlist above | Only pre-approved commands run |
1090
+ | Long session with some trust | \`--permission-mode auto\` | Safety classifier reviews each action |
1091
+ | Interactive development | \`--permission-mode acceptEdits\` | Auto-approves file edits, prompts for commands |
1339
1092
 
1340
- **For lockdown mode, we recommend \\\`--permission-mode dontAsk\\\`** combined with the deny patterns above. This gives you full autonomy for allowed operations while blocking everything else -- no classifier overhead, no prompts, and no safety bypass.
1093
+ **For lockdown mode, we recommend \`--permission-mode dontAsk\`** combined with the deny patterns above. This gives you full autonomy for allowed operations while blocking everything else -- no classifier overhead, no prompts, and no safety bypass.
1341
1094
 
1342
- \\\`--dangerously-skip-permissions\\\` disables ALL safety checks. The modes above give you autonomy without removing the guardrails.
1095
+ \`--dangerously-skip-permissions\` disables ALL safety checks. The modes above give you autonomy without removing the guardrails.
1343
1096
  \`\`\`
1344
1097
 
1345
1098
  ## Step 4: Offer to Apply
@@ -1354,6 +1107,7 @@ If the user asks you to apply the changes:
1354
1107
  "joycraft-verify.md": `---
1355
1108
  name: joycraft-verify
1356
1109
  description: Spawn an independent verifier subagent to check an implementation against its spec -- read-only, no code edits, structured pass/fail verdict
1110
+ instructions: 30
1357
1111
  ---
1358
1112
 
1359
1113
  # Verify Implementation Against Spec
@@ -1497,70 +1251,44 @@ Based on the verdict:
1497
1251
  "joycraft-bugfix.md": `---
1498
1252
  name: joycraft-bugfix
1499
1253
  description: Structured bug fix workflow \u2014 triage, diagnose, discuss with user, write a focused spec, hand off for implementation
1254
+ instructions: 32
1500
1255
  ---
1501
1256
 
1502
1257
  # Bug Fix Workflow
1503
1258
 
1504
1259
  You are fixing a bug. Follow this process in order. Do not skip steps.
1505
1260
 
1506
- **Guard clause:** If the user's request is clearly a new feature \u2014 not a bug, error, or unexpected behavior \u2014 say:
1507
- "This sounds like a new feature rather than a bug fix. Try \`/joycraft-new-feature\` for a guided feature workflow."
1508
- Then stop.
1261
+ **Guard clause:** If this is clearly a new feature, redirect to \`/joycraft-new-feature\` and stop.
1509
1262
 
1510
1263
  ---
1511
1264
 
1512
1265
  ## Phase 1: Triage
1513
1266
 
1514
- Establish what's broken. Your goal is to reproduce the bug or at minimum understand the symptom clearly.
1515
-
1516
- **Ask / gather:**
1517
- - What is the symptom? (error message, unexpected behavior, crash, wrong output)
1518
- - What are the steps to reproduce?
1519
- - What is the expected behavior vs. actual behavior?
1520
- - When did it start? (recent change, always been this way, intermittent)
1521
- - Any relevant logs, screenshots, or error output?
1267
+ Establish what's broken. Gather: symptom, steps to reproduce, expected vs actual behavior, when it started, relevant logs/errors. If an error message or stack trace is provided, read the referenced files immediately. Try to reproduce if steps are given.
1522
1268
 
1523
- **Actions:**
1524
- - If the user provides an error message or stack trace, read the referenced files immediately
1525
- - If steps to reproduce are provided, try to reproduce the bug (run the failing command, test, or request)
1526
- - If the bug is intermittent or hard to reproduce, gather more context: environment, OS, versions, config
1527
-
1528
- **Done when:** You can describe the symptom in one sentence and have either reproduced it or have enough context to diagnose without reproduction.
1269
+ **Done when:** You can describe the symptom in one sentence.
1529
1270
 
1530
1271
  ---
1531
1272
 
1532
1273
  ## Phase 2: Diagnose
1533
1274
 
1534
- Find the root cause. Read code, trace the execution path, identify what's wrong and why.
1535
-
1536
- **Actions:**
1537
- - Start from the error site (stack trace, failing test, broken UI) and trace backward
1538
- - Read the relevant source files \u2014 don't guess based on file names alone
1539
- - Identify the specific line(s), condition, or logic error causing the bug
1540
- - Check git blame or recent commits if the bug was introduced by a recent change
1541
- - Look for related bugs \u2014 is this a symptom of a deeper issue?
1275
+ Find the root cause. Start from the error site and trace backward. Read source files \u2014 don't guess. Identify the specific line(s) and logic error. Check git blame if it's a recent regression.
1542
1276
 
1543
- **Done when:** You can explain the root cause in 2-3 sentences: what's wrong, why it's wrong, and where in the code it happens.
1277
+ **Done when:** You can explain what's wrong, why, and where in 2-3 sentences.
1544
1278
 
1545
1279
  ---
1546
1280
 
1547
1281
  ## Phase 3: Discuss
1548
1282
 
1549
- Present your findings to the user. Do NOT start writing code or a spec yet.
1283
+ Present findings to the user BEFORE writing any code or spec:
1284
+ 1. **Symptom** \u2014 confirm it matches what they see
1285
+ 2. **Root cause** \u2014 specific file(s) and line(s)
1286
+ 3. **Proposed fix** \u2014 what changes, where
1287
+ 4. **Risk** \u2014 side effects? scope?
1550
1288
 
1551
- **Present:**
1552
- 1. **Symptom:** What the user sees (confirm your understanding matches theirs)
1553
- 2. **Root cause:** What's actually wrong in the code and why
1554
- 3. **Proposed fix:** What you think the fix is \u2014 be specific (which files, what changes)
1555
- 4. **Risk assessment:** What could go wrong with this fix? Any side effects?
1556
- 5. **Scope check:** Is this a simple fix or does it touch multiple systems?
1289
+ Ask: "Does this match? Comfortable with this approach?" If large/risky, suggest decomposing into multiple specs.
1557
1290
 
1558
- **Ask:**
1559
- - "Does this match what you're seeing?"
1560
- - "Are you comfortable with this approach, or do you want to explore alternatives?"
1561
- - If the fix is large or risky: "Should we decompose this into smaller specs?"
1562
-
1563
- **Done when:** The user agrees with the diagnosis and proposed fix direction.
1291
+ **Done when:** User agrees with the diagnosis and fix direction.
1564
1292
 
1565
1293
  ---
1566
1294
 
@@ -1670,6 +1398,211 @@ Ready to start?
1670
1398
  \`\`\`
1671
1399
 
1672
1400
  **Why:** A fresh session for implementation produces better results. This diagnostic session has context noise from exploration \u2014 a clean session with just the spec is more focused.
1401
+ `,
1402
+ "joycraft-design.md": `---
1403
+ name: joycraft-design
1404
+ description: Design discussion before decomposition \u2014 produce a ~200-line design artifact for human review, catching wrong assumptions before they propagate into specs
1405
+ ---
1406
+
1407
+ # Design Discussion
1408
+
1409
+ You are producing a design discussion document for a feature. This sits between research and decomposition \u2014 it captures your understanding so the human can catch wrong assumptions before specs are written.
1410
+
1411
+ **Guard clause:** If no brief path is provided and no brief exists in \`docs/briefs/\`, say:
1412
+ "No feature brief found. Run \`/joycraft-new-feature\` first to create one, or provide the path to your brief."
1413
+ Then stop.
1414
+
1415
+ ---
1416
+
1417
+ ## Step 1: Read Inputs
1418
+
1419
+ Read the feature brief at the path the user provides. If the user also provides a research document path, read that too. Research is optional \u2014 if none exists, note that you'll explore the codebase directly.
1420
+
1421
+ ## Step 2: Explore the Codebase
1422
+
1423
+ Spawn subagents to explore the codebase for patterns relevant to the brief. Focus on:
1424
+
1425
+ - Files and functions that will be touched or extended
1426
+ - Existing patterns this feature should follow (naming, data flow, error handling)
1427
+ - Similar features already implemented that serve as models
1428
+ - Boundaries and interfaces the feature must integrate with
1429
+
1430
+ Gather file paths, function signatures, and code snippets. You need concrete evidence, not guesses.
1431
+
1432
+ ## Step 3: Write the Design Document
1433
+
1434
+ Create \`docs/designs/\` directory if it doesn't exist. Write the design document to \`docs/designs/YYYY-MM-DD-feature-name.md\`.
1435
+
1436
+ The document has exactly five sections:
1437
+
1438
+ ### Section 1: Current State
1439
+
1440
+ What exists today in the codebase that is relevant to this feature. Include file paths, function signatures, and data flows. Be specific \u2014 reference actual code, not abstractions. If no research doc was provided, note that and describe what you found through direct exploration.
1441
+
1442
+ ### Section 2: Desired End State
1443
+
1444
+ What the codebase should look like when this feature is complete. Describe the change at a high level \u2014 new files, modified interfaces, new data flows. Do NOT include implementation steps. This is the "what," not the "how."
1445
+
1446
+ ### Section 3: Patterns to Follow
1447
+
1448
+ Existing patterns in the codebase that this feature should match. Include short code snippets and \`file:line\` references. Show the pattern, don't just name it.
1449
+
1450
+ If this is a greenfield project with no existing patterns, propose conventions and note that no precedent exists.
1451
+
1452
+ ### Section 4: Resolved Design Decisions
1453
+
1454
+ Decisions you have already made, with brief rationale. Format each as:
1455
+
1456
+ > **Decision:** [what you decided]
1457
+ > **Rationale:** [why, referencing existing code or constraints]
1458
+ > **Alternative rejected:** [what you considered and why you rejected it]
1459
+
1460
+ ### Section 5: Open Questions
1461
+
1462
+ Things you don't know or where multiple valid approaches exist. Each question MUST present 2-3 concrete options with pros and cons. Format:
1463
+
1464
+ > **Q: [question]**
1465
+ > - **Option A:** [description] \u2014 Pro: [benefit]. Con: [cost].
1466
+ > - **Option B:** [description] \u2014 Pro: [benefit]. Con: [cost].
1467
+ > - **Option C (if applicable):** [description] \u2014 Pro: [benefit]. Con: [cost].
1468
+
1469
+ Do NOT ask vague questions like "what do you think?" Every question must have actionable options the human can choose from.
1470
+
1471
+ ## Step 4: Present and STOP
1472
+
1473
+ Present the design document to the user. Say:
1474
+
1475
+ \`\`\`
1476
+ Design discussion written to docs/designs/YYYY-MM-DD-feature-name.md
1477
+
1478
+ Please review the document above. Specifically:
1479
+ 1. Are the patterns in Section 3 the right ones to follow, or should I use different ones?
1480
+ 2. Do you agree with the resolved decisions in Section 4?
1481
+ 3. Pick an option for each open question in Section 5 (or propose your own).
1482
+
1483
+ Reply with your feedback. I will NOT proceed to decomposition until you have reviewed and approved this design.
1484
+ \`\`\`
1485
+
1486
+ **CRITICAL: Do NOT proceed to \`/joycraft-decompose\` or generate specs.** Wait for the human to review, answer open questions, and correct any wrong assumptions. The entire value of this skill is the pause \u2014 it forces a human checkpoint before mistakes propagate.
1487
+
1488
+ ## After Human Review
1489
+
1490
+ Once the human responds:
1491
+ - Update the design document with their corrections and chosen options
1492
+ - Move answered questions from "Open Questions" to "Resolved Design Decisions"
1493
+ - Present the updated document for final confirmation
1494
+ - Only after explicit approval, tell the user: "Design approved. Run \`/joycraft-decompose\` with this brief to generate atomic specs."
1495
+ `,
1496
+ "joycraft-research.md": `---
1497
+ name: joycraft-research
1498
+ description: Produce objective codebase research by isolating question generation from fact-gathering \u2014 subagent sees only questions, never the brief
1499
+ ---
1500
+
1501
+ # Research Codebase for a Feature
1502
+
1503
+ You are producing objective codebase research to inform a future spec or implementation. The key insight: the researching agent must never see the brief or ticket \u2014 only research questions. This prevents opinions from contaminating the facts.
1504
+
1505
+ **Guard clause:** If the user doesn't provide a brief path or inline description, ask:
1506
+ "What feature or change are you researching? Provide a brief path (e.g., \`docs/briefs/2026-03-30-my-feature.md\`) or describe it in a few sentences."
1507
+
1508
+ ---
1509
+
1510
+ ## Phase 1: Generate Research Questions
1511
+
1512
+ Read the brief file (if a path was provided) or use the user's inline description.
1513
+
1514
+ Identify which zones of the codebase are relevant to this feature. Then generate 5-10 research questions that are:
1515
+
1516
+ - **Objective and fact-seeking** \u2014 "How does X work?" not "How should we build X?"
1517
+ - **Specific to the codebase** \u2014 reference concrete systems, files, or flows
1518
+ - **Answerable by reading code** \u2014 no questions about business strategy or user preferences
1519
+
1520
+ Good examples:
1521
+ - "How does endpoint registration work in the current router?"
1522
+ - "What patterns exist for input validation across existing handlers?"
1523
+ - "Trace the data flow from API request to database write for entity X."
1524
+ - "What test infrastructure exists? Where are fixtures, mocks, and helpers?"
1525
+ - "What dependencies does module Y import, and what does its public API look like?"
1526
+
1527
+ Bad examples (do NOT generate these):
1528
+ - "What's the best way to implement this feature?" (opinion)
1529
+ - "Should we use library X or Y?" (recommendation)
1530
+ - "What would a good architecture look like?" (design, not research)
1531
+
1532
+ Write the questions to a temporary file at \`docs/research/.questions-tmp.md\`. Create the \`docs/research/\` directory if it doesn't exist.
1533
+
1534
+ **Do NOT include any content from the brief in this file \u2014 only the questions.**
1535
+
1536
+ ---
1537
+
1538
+ ## Phase 2: Spawn Research Subagent
1539
+
1540
+ Use Claude Code's Agent tool to spawn a subagent. Pass ONLY the research questions \u2014 never the brief path, brief content, or feature description.
1541
+
1542
+ Build the subagent prompt by reading the questions file you just wrote, then use this template:
1543
+
1544
+ \`\`\`
1545
+ You are researching a codebase to answer specific questions. You have NO context about why these questions are being asked \u2014 you are simply gathering facts.
1546
+
1547
+ RULES \u2014 these are hard constraints:
1548
+ - Answer each question with FACTS ONLY: file paths, function signatures, data flows, patterns, dependencies
1549
+ - Do NOT recommend, suggest, or opine on anything
1550
+ - Do NOT speculate about what should be built or how
1551
+ - If a question cannot be answered (no relevant code exists), say "No existing code found for this"
1552
+ - Use the Read tool and Grep tool to explore the codebase thoroughly
1553
+ - Include code snippets only when they are essential evidence (e.g., a function signature, a config block)
1554
+
1555
+ QUESTIONS:
1556
+ [INSERT_QUESTIONS_HERE]
1557
+
1558
+ OUTPUT FORMAT \u2014 write your findings as a single markdown document using this structure:
1559
+
1560
+ # Codebase Research
1561
+
1562
+ **Date:** [today's date]
1563
+ **Questions answered:** [N/total]
1564
+
1565
+ ---
1566
+
1567
+ ## Q1: [question text]
1568
+
1569
+ [Facts, file paths, function signatures, data flows. No opinions.]
1570
+
1571
+ ## Q2: [question text]
1572
+
1573
+ [Facts, file paths, function signatures, data flows. No opinions.]
1574
+
1575
+ [Continue for all questions]
1576
+ \`\`\`
1577
+
1578
+ ## Phase 3: Write the Research Document
1579
+
1580
+ Take the subagent's response and write it to \`docs/research/YYYY-MM-DD-feature-name.md\`. Derive the feature name from the brief filename or the user's description (lowercase, hyphenated).
1581
+
1582
+ Delete the temporary questions file (\`docs/research/.questions-tmp.md\`).
1583
+
1584
+ Present the research document path to the user:
1585
+
1586
+ \`\`\`
1587
+ Research complete: docs/research/YYYY-MM-DD-feature-name.md
1588
+
1589
+ This document contains objective facts about your codebase \u2014 no opinions or recommendations.
1590
+
1591
+ Next steps:
1592
+ - /joycraft-decompose \u2014 break the feature into atomic specs (research will inform the specs)
1593
+ - /joycraft-new-feature \u2014 formalize into a full Feature Brief first
1594
+ - Read the research and add any corrections or missing context manually
1595
+ \`\`\`
1596
+
1597
+ ## Edge Cases
1598
+
1599
+ | Scenario | Behavior |
1600
+ |----------|----------|
1601
+ | No brief provided | Accept inline description, generate questions from that |
1602
+ | Codebase is empty or new | Research doc reports "no existing patterns found" per question |
1603
+ | User runs research twice for same feature | Overwrites previous research doc (same filename) |
1604
+ | Brief is very short (1-2 sentences) | Still generate questions \u2014 even simple features benefit from understanding existing patterns |
1605
+ | \`docs/research/\` doesn't exist | Create it |
1673
1606
  `
1674
1607
  };
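Each entry in `SKILLS` above opens with a frontmatter block (`name`, `description`, and, for several skills in this release, an `instructions` number). The following is a minimal sketch of reading those fields from one entry, assuming plain single-line `key: value` pairs. `parseFrontmatter` is a hypothetical helper for illustration, not a function in joycraft, and the diff does not say what the `instructions` value means, so it is treated here as an opaque number.

```javascript
// Hypothetical helper (not part of joycraft): read the leading
// "---"-delimited frontmatter of a skill string into a plain object.
function parseFrontmatter(markdown) {
  // Capture everything between the opening and closing "---" lines.
  const match = /^---\n([\s\S]*?)\n---/.exec(markdown);
  if (!match) return {};

  const fields = {};
  for (const line of match[1].split("\n")) {
    const idx = line.indexOf(":"); // split on the first colon only
    if (idx === -1) continue;      // skip lines that are not key: value
    fields[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return fields;
}

// Sample input shaped like the joycraft-bugfix entry in the diff.
const sample = [
  "---",
  "name: joycraft-bugfix",
  "description: Structured bug fix workflow",
  "instructions: 32",
  "---",
  "",
  "# Bug Fix Workflow",
].join("\n");

const meta = parseFrontmatter(sample);
console.log(meta.name, Number(meta.instructions));
```

Splitting on the first colon only keeps descriptions that themselves contain a colon intact. A real YAML parser would be the safer choice if the frontmatter ever grows multi-line values.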
1675
1608
  var TEMPLATES = {
@@ -3465,11 +3398,1511 @@ jobs:
3465
3398
  -f "client_payload[repo]=\${{ github.repository }}"
3466
3399
 
3467
3400
  done <<< "\${{ steps.changed.outputs.files }}"
3401
+ `,
3402
+ "GOLDEN_EXAMPLE_TEMPLATE.md": `# [Feature Name] \u2014 Golden Example
3403
+
3404
+ > **Date:** YYYY-MM-DD
3405
+ > **Project:** [project name]
3406
+ > **Source Brief:** \`docs/briefs/YYYY-MM-DD-feature-name.md\`
3407
+
3408
+ ---
3409
+
3410
+ ## Capture
3411
+
3412
+ The original user request or description that initiated this feature. Copied verbatim or lightly edited from the brief's Vision section.
3413
+
3414
+ > [Paste the original capture text here \u2014 what the user said/typed that kicked off the pipeline]
3415
+
3416
+ ## Classification
3417
+
3418
+ - **Action Level:** [interview | decompose | execute | research | design]
3419
+ - **Confidence:** [high | medium | low]
3420
+ - **Skills Used:** [comma-separated list of Joycraft skills invoked, e.g., joycraft-new-feature, joycraft-decompose]
3421
+
3422
+ ## Decomposition Summary
3423
+
3424
+ The resulting spec breakdown from this capture:
3425
+
3426
+ | # | Spec Name | Description | Size |
3427
+ |---|-----------|-------------|------|
3428
+ | 1 | [spec-name] | [one sentence] | [S/M/L] |
3429
+
3430
+ ## Rationale
3431
+
3432
+ 2-3 sentences explaining why this classification was correct for this capture. What signals in the capture text indicated this action level? What would have gone wrong with a different classification?
3433
+
3434
+ ---
3435
+
3436
+ ## Template Usage Notes
3437
+
3438
+ **This template is for Pipit golden examples.** Golden examples are auto-generated by Joycraft's session-end skill after a successful pipeline run. They provide few-shot examples that improve Pipit's level classifier over time.
3439
+
3440
+ **Do not edit generated examples** unless the classification was wrong. If it was wrong, correct the Classification section \u2014 this teaches Pipit the right answer.
3441
+
3442
+ **One example per pipeline run.** Each successful interview \u2192 brief \u2192 specs \u2192 execution cycle produces one golden example.
3443
+ `
3444
+ };
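The skills and templates above repeatedly name output files as `docs/<kind>/YYYY-MM-DD-feature-name.md` (briefs, designs, research, specs), with the feature name lowercase and hyphenated. A minimal sketch of that naming convention follows. `datedDocPath` is a hypothetical helper, not part of joycraft, and the choice of the UTC date is an assumption made here for determinism.

```javascript
// Hypothetical helper (not part of joycraft): build a dated, slugged
// document path such as docs/research/2026-03-30-my-feature.md.
function datedDocPath(dir, featureName, date = new Date()) {
  // toISOString() is always UTC; the first 10 characters are YYYY-MM-DD.
  const day = date.toISOString().slice(0, 10);

  const slug = featureName
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse each non-alphanumeric run into one hyphen
    .replace(/^-+|-+$/g, "");    // trim leading and trailing hyphens

  return `${dir}/${day}-${slug}.md`;
}

const examplePath = datedDocPath(
  "docs/research",
  "My Feature!",
  new Date("2026-03-30T12:00:00Z")
);
console.log(examplePath);
```

Because the path depends only on the date and the slug, running the same skill twice on the same day for the same feature yields the same filename, which matches the "overwrites previous research doc" behavior listed in the research skill's edge cases.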
3445
+ var CODEX_SKILLS = {
3446
+ "joycraft-add-fact.md": `---
3447
+ name: joycraft-add-fact
3448
+ description: Capture a project fact and route it to the correct context document -- production map, dangerous assumptions, decision log, institutional knowledge, or troubleshooting
3449
+ ---
3450
+
3451
+ # Add Fact
3452
+
3453
+ The user has a fact to capture. Your job is to classify it, route it to the correct context document, append it in the right format, and optionally add a boundary rule to CLAUDE.md or AGENTS.md.
3454
+
3455
+ ## Step 1: Get the Fact
3456
+
3457
+ If the user already provided the fact (e.g., \`$joycraft-add-fact the staging DB resets every Sunday\`), use it directly.
3458
+
3459
+ If not, ask: "What fact do you want to capture?" -- then wait for their response.
3460
+
3461
+ If the user provides multiple facts at once, process each one separately through all the steps below, then give a combined confirmation at the end.
3462
+
3463
+ ## Step 2: Classify the Fact
3464
+
3465
+ Route the fact to one of these 5 context documents based on its content:
3466
+
3467
+ ### \`docs/context/production-map.md\`
3468
+ The fact is about **infrastructure, services, environments, URLs, endpoints, credentials, or what is safe/unsafe to touch**.
3469
+ - Signal words: "production", "staging", "endpoint", "URL", "database", "service", "deployed", "hosted", "credentials", "secret", "environment"
3470
+ - Examples: "The staging DB is at postgres://staging.example.com", "We use Vercel for the frontend and Railway for the API"
3471
+
3472
+ ### \`docs/context/dangerous-assumptions.md\`
3473
+ The fact is about **something an AI agent might get wrong -- a false assumption that leads to bad outcomes**.
3474
+ - Signal words: "assumes", "might think", "but actually", "looks like X but is Y", "not what it seems", "trap", "gotcha"
3475
+ - Examples: "The \`users\` table looks like a test table but it's production", "Deleting a workspace doesn't delete the billing subscription"
3476
+
3477
+ ### \`docs/context/decision-log.md\`
3478
+ The fact is about **an architectural or tooling choice and why it was made**.
3479
+ - Signal words: "decided", "chose", "because", "instead of", "we went with", "the reason we use", "trade-off"
3480
+ - Examples: "We chose SQLite over Postgres because this runs on embedded devices", "We use pnpm instead of npm for workspace support"
3481
+
3482
+ ### \`docs/context/institutional-knowledge.md\`
3483
+ The fact is about **team conventions, unwritten rules, organizational context, or who owns what**.
3484
+ - Signal words: "convention", "rule", "always", "never", "team", "process", "review", "approval", "owns", "responsible"
3485
+ - Examples: "The design team reviews all color changes", "We never deploy on Fridays", "PR titles must start with the ticket number"
3486
+
3487
+ ### \`docs/context/troubleshooting.md\`
3488
+ The fact is about **diagnostic knowledge -- when X happens, do Y (or don't do Z)**.
3489
+ - Signal words: "when", "fails", "error", "if you see", "stuck", "broken", "fix", "workaround", "before trying", "reboot", "restart", "reset"
3490
+ - Examples: "If Wi-Fi disconnects during flash, wait and retry -- don't switch networks", "When tests fail with ECONNREFUSED, check if Docker is running"
3491
+
3492
+ ### Ambiguous Facts
3493
+
3494
+ If the fact fits multiple categories, pick the **best fit** based on the primary intent. You will mention the alternative in your confirmation message so the user can correct you.
3495
+
3496
+ ## Step 3: Ensure the Target Document Exists
3497
+
3498
+ 1. If \`docs/context/\` does not exist, create the directory.
3499
+ 2. If the target document does not exist, create it from the template structure. Check \`docs/templates/\` for the matching template. If no template exists, use this minimal structure:
3500
+
3501
+ For **production-map.md**:
3502
+ \`\`\`markdown
3503
+ # Production Map
3504
+
3505
+ > What's real, what's staging, what's safe to touch.
3506
+
3507
+ ## Services
3508
+
3509
+ | Service | Environment | URL/Endpoint | Impact if Corrupted |
3510
+ |---------|-------------|-------------|-------------------|
3511
+ \`\`\`
3512
+
3513
+ For **dangerous-assumptions.md**:
3514
+ \`\`\`markdown
3515
+ # Dangerous Assumptions
3516
+
3517
+ > Things the AI agent might assume that are wrong in this project.
3518
+
3519
+ ## Assumptions
3520
+
3521
+ | Agent Might Assume | But Actually | Impact If Wrong |
3522
+ |-------------------|-------------|----------------|
3523
+ \`\`\`
3524
+
3525
+ For **decision-log.md**:
3526
+ \`\`\`markdown
3527
+ # Decision Log
3528
+
3529
+ > Why choices were made, not just what was chosen.
3530
+
3531
+ ## Decisions
3532
+
3533
+ | Date | Decision | Why | Alternatives Rejected | Revisit When |
3534
+ |------|----------|-----|----------------------|-------------|
3535
+ \`\`\`
3536
+
3537
+ For **institutional-knowledge.md**:
3538
+ \`\`\`markdown
3539
+ # Institutional Knowledge
3540
+
3541
+ > Unwritten rules, team conventions, and organizational context.
3542
+
3543
+ ## Team Conventions
3544
+
3545
+ - (none yet)
3546
+ \`\`\`
3547
+
3548
+ For **troubleshooting.md**:
3549
+ \`\`\`markdown
3550
+ # Troubleshooting
3551
+
3552
+ > What to do when things go wrong for non-code reasons.
3553
+
3554
+ ## Common Failures
3555
+
3556
+ | When This Happens | Do This | Don't Do This |
3557
+ |-------------------|---------|---------------|
3558
+ \`\`\`
3559
+
3560
+ ## Step 4: Read the Target Document
3561
+
3562
+ Read the target document to understand its current structure. Note:
3563
+ - Which section to append to
3564
+ - Whether it uses tables or lists
3565
+ - The column format if it's a table
3566
+
3567
+ ## Step 5: Append the Fact
3568
+
3569
+ Add the fact to the appropriate section of the target document. Match the existing format exactly:
3570
+
3571
+ - **Table-based documents** (production-map, dangerous-assumptions, decision-log, troubleshooting): Add a new table row in the correct columns. Use today's date where a date column exists.
3572
+ - **List-based documents** (institutional-knowledge): Add a new list item (\`- \`) to the most appropriate section.
3573
+
3574
+ Remove any italic example rows (rows where all cells start with \`_\`) before appending, so the document transitions from template to real content. Only remove examples from the specific table you are appending to.
3575
+
3576
+ **Append only. Never modify or remove existing real content.**
3577
+
3578
+ ## Step 6: Evaluate Boundary Rule
3579
+
3580
+ Decide whether the fact also warrants a rule in the project's boundary configuration (CLAUDE.md and/or AGENTS.md -- check which files the project uses and update accordingly):
3581
+
3582
+ **Add a boundary rule if the fact:**
3583
+ - Describes something that should ALWAYS or NEVER be done
3584
+ - Could cause real damage if violated (data loss, broken deployments, security issues)
3585
+ - Is a hard constraint that applies across all work, not just a one-time note
3586
+
3587
+ **Do NOT add a boundary rule if the fact is:**
3588
+ - Purely informational (e.g., "staging DB is at this URL")
3589
+ - A one-time decision that's already captured
3590
+ - A diagnostic tip rather than a prohibition
3591
+
3592
+ If a rule is warranted, read the project's boundary file(s) -- CLAUDE.md and/or AGENTS.md -- find the appropriate section (ALWAYS, ASK FIRST, or NEVER under Behavioral Boundaries), and append the rule. If no Behavioral Boundaries section exists, append one. Update whichever boundary files the project uses (some projects have CLAUDE.md, some have AGENTS.md, some have both).
3593
+
3594
+ ## Step 7: Confirm
3595
+
3596
+ Report what you did in this format:
3597
+
3598
+ \`\`\`
3599
+ Added to [document name]:
3600
+ [summary of what was added]
3601
+
3602
+ [If boundary file(s) were also updated:]
3603
+ Added boundary rule to [CLAUDE.md / AGENTS.md / both]:
3604
+ [ALWAYS/ASK FIRST/NEVER]: [rule text]
3605
+
3606
+ [If the fact was ambiguous:]
3607
+ Routed to [chosen doc] -- move to [alternative doc] if this is more about [alternative category description].
3608
+ \`\`\`
3609
+ `,
3610
+ "joycraft-bugfix.md": `---
3611
+ name: joycraft-bugfix
3612
+ description: Structured bug fix workflow \u2014 triage, diagnose, discuss with user, write a focused spec, hand off for implementation
3613
+ ---
3614
+
3615
+ # Bug Fix Workflow
3616
+
3617
+ You are fixing a bug. Follow this process in order. Do not skip steps.
3618
+
3619
+ **Guard clause:** If this is clearly a new feature, redirect to \`$joycraft-new-feature\` and stop.
3620
+
3621
+ ---
3622
+
3623
+ ## Phase 1: Triage
3624
+
3625
+ Establish what's broken. Gather: symptom, steps to reproduce, expected vs actual behavior, when it started, relevant logs/errors. If an error message or stack trace is provided, read the referenced files immediately. Try to reproduce if steps are given.
3626
+
3627
+ **Done when:** You can describe the symptom in one sentence.
3628
+
3629
+ ---
3630
+
3631
+ ## Phase 2: Diagnose
3632
+
3633
+ Find the root cause. Start from the error site and trace backward. Search the codebase and read files \u2014 don't guess. Identify the specific line(s) and logic error. Check git blame if it's a recent regression.
3634
+
3635
+ **Done when:** You can explain what's wrong, why, and where in 2-3 sentences.
3636
+
3637
+ ---
3638
+
3639
+ ## Phase 3: Discuss
3640
+
3641
+ Present findings to the user BEFORE writing any code or spec:
3642
+ 1. **Symptom** \u2014 confirm it matches what they see
3643
+ 2. **Root cause** \u2014 specific file(s) and line(s)
3644
+ 3. **Proposed fix** \u2014 what changes, where
3645
+ 4. **Risk** \u2014 side effects? scope?
3646
+
3647
+ Ask: "Does this match? Are you comfortable with this approach?" If the fix is large or risky, suggest decomposing it into multiple specs.
3648
+
3649
+ **Done when:** User agrees with the diagnosis and fix direction.
3650
+
3651
+ ---
3652
+
3653
+ ## Phase 4: Spec the Fix
3654
+
3655
+ Write a bug fix spec to \`docs/specs/YYYY-MM-DD-bugfix-name.md\`. Create the \`docs/specs/\` directory if it doesn't exist.
3656
+
3657
+ **Why:** Even bug fixes deserve a spec. It forces clarity on what "fixed" means, ensures test-first discipline, and creates a traceable record of the fix.
3658
+
3659
+ Use this structure:
3660
+
3661
+ \`\`\`markdown
3662
+ # [Bug Name] \u2014 Bug Fix Spec
3663
+
3664
+ > **Status:** Ready
3665
+ > **Date:** YYYY-MM-DD
3666
+ > **Estimated scope:** [1 session / N files / ~N lines]
3667
+
3668
+ ---
3669
+
3670
+ ## Bug
3671
+ One sentence \u2014 what's broken?
3672
+
3673
+ ## Root Cause
3674
+ What's actually wrong, in which file(s) and line(s)?
3675
+
3676
+ ## Fix
3677
+ What changes, where?
3678
+
3679
+ ## Acceptance Criteria
3680
+ - [ ] [Observable behavior that proves the fix works]
3681
+ - [ ] No regressions \u2014 existing tests still pass
3682
+ - [ ] Build passes
3683
+
3684
+ ## Test Plan
3685
+ 1. Write a reproduction test that fails before the fix
3686
+ 2. Apply the fix
3687
+ 3. Reproduction test passes
3688
+ 4. Full test suite passes
3689
+
3690
+ ## Constraints
3691
+ - MUST: [hard requirement]
3692
+ - MUST NOT: [hard prohibition]
3693
+
3694
+ ## Affected Files
3695
+ | Action | File | What Changes |
3696
+ |--------|------|-------------|
3697
+
3698
+ ## Edge Cases
3699
+ | Scenario | Expected Behavior |
3700
+ |----------|------------------|
3701
+ \`\`\`
3702
+
3703
+ **For large bugs that span multiple files/systems:** Consider whether this should be decomposed into multiple specs. If so, create a brief first using \`$joycraft-new-feature\`, then decompose.
3704
+
3705
+ ---
3706
+
3707
+ ## Phase 5: Hand Off
3708
+
3709
+ \`\`\`
3710
+ Bug fix spec is ready: docs/specs/YYYY-MM-DD-bugfix-name.md
3711
+
3712
+ Summary:
3713
+ - Bug: [one sentence]
3714
+ - Root cause: [one sentence]
3715
+ - Fix: [one sentence]
3716
+ - Estimated: 1 session
3717
+
3718
+ To execute: Start a fresh session and:
3719
+ 1. Read the spec
3720
+ 2. Write the reproduction test (must fail)
3721
+ 3. Apply the fix (test must pass)
3722
+ 4. Run full test suite
3723
+ 5. Run $joycraft-session-end to capture discoveries
3724
+ 6. Commit and PR
3725
+
3726
+ Ready to start?
3727
+ \`\`\`
3728
+ `,
3729
+ "joycraft-decompose.md": `---
3730
+ name: joycraft-decompose
3731
+ description: Break a feature brief into atomic specs \u2014 small, testable, independently executable units
3732
+ ---
3733
+
3734
+ # Decompose Feature into Atomic Specs
3735
+
3736
+ You have a Feature Brief (or the user has described a feature). Your job is to decompose it into atomic specs that can be executed independently \u2014 one spec per session.
3737
+
3738
+ ## Step 1: Verify the Brief Exists
3739
+
3740
+ Look for a Feature Brief in \`docs/briefs/\`. If one doesn't exist yet, tell the user:
3741
+
3742
+ > No feature brief found. Run \`$joycraft-new-feature\` first to interview and create one, or describe the feature now and I'll work from your description.
3743
+
3744
+ If the user describes the feature inline, work from that description directly. You don't need a formal brief to decompose \u2014 but recommend creating one for complex features.
3745
+
3746
+ ## Step 2: Identify Natural Boundaries
3747
+
3748
+ **Why:** Good boundaries make specs independently testable and committable. Bad boundaries create specs that can't be verified without other specs also being done.
3749
+
3750
+ Read the brief (or description) and identify natural split points:
3751
+
3752
+ - **Data layer changes** (schemas, types, migrations) \u2014 always a separate spec
3753
+ - **Pure functions / business logic** \u2014 separate from I/O
3754
+ - **UI components** \u2014 separate from data fetching
3755
+ - **API endpoints / route handlers** \u2014 separate from business logic
3756
+ - **Test infrastructure** (mocks, fixtures, helpers) \u2014 can be its own spec if substantial
3757
+ - **Configuration / environment** \u2014 separate from code changes
3758
+
3759
+ Ask yourself: "Can this piece be committed and tested without the other pieces existing?" If yes, it's a good boundary.
3760
+
3761
+ ## Step 3: Build the Decomposition Table
3762
+
3763
+ For each atomic spec, define:
3764
+
3765
+ | # | Spec Name | Description | Dependencies | Size |
3766
+ |---|-----------|-------------|--------------|------|
3767
+
3768
+ **Rules:**
3769
+ - Each spec name is \`verb-object\` format (e.g., \`add-terminal-detection\`, \`extract-prompt-module\`)
3770
+ - Each description is ONE sentence \u2014 if you need two, the spec is too big
3771
+ - Dependencies reference other spec numbers \u2014 keep the dependency graph shallow
3772
+ - More than 2 dependencies on a single spec = it's too big, split further
3773
+ - Aim for 3-7 specs per feature. Fewer than 3 = probably not decomposed enough. More than 10 = the feature brief is too big
3774
+
3775
+ ## Step 4: Present and Iterate
3776
+
3777
+ Show the decomposition table to the user. Ask:
3778
+ 1. "Does this breakdown match how you think about this feature?"
3779
+ 2. "Are there any specs that feel too big or too small?"
3780
+ 3. "Should any of these run in parallel (separate branches)?"
3781
+
3782
+ Iterate until the user approves.
3783
+
3784
+ ## Step 5: Generate Atomic Specs
3785
+
3786
+ For each approved row, create \`docs/specs/YYYY-MM-DD-spec-name.md\`. Create the \`docs/specs/\` directory if it doesn't exist.
3787
+
3788
+ **Why:** Each spec must be self-contained \u2014 a fresh session should be able to execute it without reading the Feature Brief. Copy relevant constraints and context into each spec.
3789
+
3790
+ Use this structure:
3791
+
3792
+ \`\`\`markdown
3793
+ # [Verb + Object] \u2014 Atomic Spec
3794
+
3795
+ > **Parent Brief:** \`docs/briefs/YYYY-MM-DD-feature-name.md\` (or "standalone")
3796
+ > **Status:** Ready
3797
+ > **Date:** YYYY-MM-DD
3798
+ > **Estimated scope:** [1 session / N files / ~N lines]
3799
+
3800
+ ---
3801
+
3802
+ ## What
3803
+ One paragraph \u2014 what changes when this spec is done?
3804
+
3805
+ ## Why
3806
+ One sentence \u2014 what breaks or is missing without this?
3807
+
3808
+ ## Acceptance Criteria
3809
+ - [ ] [Observable behavior]
3810
+ - [ ] Build passes
3811
+ - [ ] Tests pass
3812
+
3813
+ ## Test Plan
3814
+
3815
+ | Acceptance Criterion | Test | Type |
3816
+ |---------------------|------|------|
3817
+ | [Each AC above] | [What to call/assert] | [unit/integration/e2e] |
3818
+
3819
+ **Execution order:**
3820
+ 1. Write all tests above \u2014 they should fail against current/stubbed code
3821
+ 2. Run tests to confirm they fail (red)
3822
+ 3. Implement until all tests pass (green)
3823
+
3824
+ **Smoke test:** [Identify the fastest test for iteration feedback]
3825
+
3826
+ **Before implementing, verify your test harness:**
3827
+ 1. Run all tests \u2014 they must FAIL (if they pass, you're testing the wrong thing)
3828
+ 2. Each test calls your actual function/endpoint \u2014 not a reimplementation or the underlying library
3829
+ 3. Identify your smoke test \u2014 it must run in seconds, not minutes, so you get fast feedback on each change
3830
+
3831
+ ## Constraints
3832
+ - MUST: [hard requirement]
3833
+ - MUST NOT: [hard prohibition]
3834
+
3835
+ ## Affected Files
3836
+ | Action | File | What Changes |
3837
+ |--------|------|-------------|
3838
+
3839
+ ## Approach
3840
+ Strategy, data flow, key decisions. Name one rejected alternative.
3841
+
3842
+ ## Edge Cases
3843
+ | Scenario | Expected Behavior |
3844
+ |----------|------------------|
3845
+ \`\`\`
3846
+
3847
+ If \`docs/templates/ATOMIC_SPEC_TEMPLATE.md\` exists, reference it for the full template with additional guidance.
3848
+
3849
+ Fill in all sections \u2014 each spec must be self-contained (no "see the brief for context"). Copy relevant constraints from the Feature Brief into each spec. Write acceptance criteria specific to THIS spec, not the whole feature. Every acceptance criterion must have at least one corresponding test in the Test Plan. If the user provided test strategy info from the interview, use it to choose test types and frameworks. Include the test harness verification rules in every Test Plan.
3850
+
3851
+ ## Step 6: Recommend Execution Strategy
3852
+
3853
+ Based on the dependency graph:
3854
+ - **Independent specs** \u2014 "These can run in parallel branches"
3855
+ - **Sequential specs** \u2014 "Execute these in order: 1 -> 2 -> 4"
3856
+ - **Mixed** \u2014 "Start specs 1 and 3 in parallel. After 1 completes, start 2."
3857
+
3858
+ Update the Feature Brief's Execution Strategy section with the plan (if a brief exists).
3859
+
3860
+ ## Step 7: Hand Off
3861
+
3862
+ Tell the user:
3863
+ \`\`\`
3864
+ Decomposition complete:
3865
+ - [N] atomic specs created in docs/specs/
3866
+ - [N] can run in parallel, [N] are sequential
3867
+ - Estimated total: [N] sessions
3868
+
3869
+ To execute:
3870
+ - Sequential: Open a session, point at each spec in order
3871
+ - Parallel: One spec per branch, merge when done
3872
+ - Each session should end with $joycraft-session-end to capture discoveries
3873
+
3874
+ Ready to start execution?
3875
+ \`\`\`
3876
+ `,
3877
+ "joycraft-design.md": `---
3878
+ name: joycraft-design
3879
+ description: Design discussion before decomposition \u2014 produce a ~200-line design artifact for human review, catching wrong assumptions before they propagate into specs
3880
+ ---
3881
+
3882
+ # Design Discussion
3883
+
3884
+ You are producing a design discussion document for a feature. This sits between research and decomposition \u2014 it captures your understanding so the human can catch wrong assumptions before specs are written.
3885
+
3886
+ **Guard clause:** If no brief path is provided and no brief exists in \`docs/briefs/\`, say:
3887
+ "No feature brief found. Run \`$joycraft-new-feature\` first to create one, or provide the path to your brief."
3888
+ Then stop.
3889
+
3890
+ ---
3891
+
3892
+ ## Step 1: Read Inputs
3893
+
3894
+ Read the feature brief at the path the user provides. If the user also provides a research document path, read that too.
3895
+
3896
+ ## Step 2: Explore the Codebase
3897
+
3898
+ Spawn concurrent subagent threads to explore the codebase for patterns relevant to the brief. Focus on:
3899
+
3900
+ - Files and functions that will be touched or extended
3901
+ - Existing patterns this feature should follow
3902
+ - Similar features already implemented that serve as models
3903
+ - Boundaries and interfaces the feature must integrate with
3904
+
3905
+ Each subagent should search the codebase and read files to gather file paths, function signatures, and code snippets.
3906
+
3907
+ ## Step 3: Write the Design Document
3908
+
3909
+ Create \`docs/designs/\` directory if it doesn't exist. Write to \`docs/designs/YYYY-MM-DD-feature-name.md\`.
3910
+
3911
+ The document has exactly five sections:
3912
+
3913
+ ### Section 1: Current State
3914
+ What exists today in the codebase. Include file paths, function signatures, data flows. Be specific.
3915
+
3916
+ ### Section 2: Desired End State
3917
+ What the codebase should look like when this feature is complete.
3918
+
3919
+ ### Section 3: Patterns to Follow
3920
+ Existing patterns in the codebase that this feature should match. Include code snippets and \`file:line\` references.
3921
+
3922
+ ### Section 4: Resolved Design Decisions
3923
+ Decisions made with rationale. Format: Decision, Rationale, Alternative rejected.
3924
+
3925
+ ### Section 5: Open Questions
3926
+ Things where multiple valid approaches exist. Each question MUST present 2-3 concrete options with pros and cons.
3927
+
3928
+ ## Step 4: Present and STOP
3929
+
3930
+ Present the design document. Say:
3931
+ \`\`\`
3932
+ Design discussion written to docs/designs/YYYY-MM-DD-feature-name.md
3933
+
3934
+ Please review. Specifically:
3935
+ 1. Are the patterns in Section 3 right?
3936
+ 2. Do you agree with the resolved decisions?
3937
+ 3. Pick an option for each open question.
3938
+
3939
+ Reply with your feedback. I will NOT proceed to decomposition until you have reviewed and approved.
3940
+ \`\`\`
3941
+
3942
+ **CRITICAL: Do NOT proceed to \`$joycraft-decompose\` or generate specs.** Wait for human review.
3943
+
3944
+ ## After Human Review
3945
+
3946
+ - Update the design document with corrections
3947
+ - Move answered questions to Resolved Design Decisions
3948
+ - Present for final confirmation
3949
+ - Only after explicit approval: "Design approved. Run \`$joycraft-decompose\` with this brief to generate atomic specs."
3950
+ `,
3951
+ "joycraft-implement-level5.md": `---
3952
+ name: joycraft-implement-level5
3953
+ description: Set up Level 5 autonomous development \u2014 autofix loop, holdout scenario testing, and scenario evolution from specs
3954
+ ---
3955
+
3956
+ # Implement Level 5 \u2014 Autonomous Development Loop
3957
+
3958
+ You are guiding the user through setting up Level 5: the autonomous feedback loop where specs go in, validated software comes out. This is a one-time setup that installs workflows, creates a scenarios repo, and configures the autofix loop.
3959
+
3960
+ ## Before You Begin
3961
+
3962
+ Check prerequisites:
3963
+
3964
+ 1. **Project must be initialized.** Search for \`.joycraft-version\`. If missing, tell the user to run \`npx joycraft init\` first.
3965
+ 2. **Project should be at Level 4.** Read \`docs/joycraft-assessment.md\` if it exists. If the project hasn't been assessed yet, suggest running \`$joycraft-tune\` first. But don't block -- the user may know they're ready.
3966
+ 3. **Git repo with GitHub remote.** This setup requires GitHub Actions. Check for \`.git/\` and a GitHub remote.
3967
+
3968
+ If prerequisites aren't met, explain what's needed and stop.
3969
+
3970
+ ## Step 1: Explain What Level 5 Means
3971
+
3972
+ Tell the user:
3973
+
3974
+ > Level 5 is the autonomous loop. When you push specs, three things happen automatically:
3975
+ >
3976
+ > 1. **Scenario evolution** -- An AI agent reads your specs and writes holdout tests in a private scenarios repo. These tests are invisible to your coding agent.
3977
+ > 2. **Autofix** -- When CI fails on a PR, the agent automatically attempts a fix (up to 3 times).
3978
+ > 3. **Holdout validation** -- When CI passes, your scenarios repo runs behavioral tests against the PR. Results post as PR comments.
3979
+ >
3980
+ > The key insight: your coding agent never sees the scenario tests. This prevents it from gaming the test suite -- like a validation set in machine learning.
3981
+
3982
+ ## Step 2: Gather Configuration
3983
+
3984
+ Ask these questions **one at a time**:
3985
+
3986
+ ### Question 1: Scenarios repo name
3987
+
3988
+ > What should we call your scenarios repo? It'll be a private repo that holds your holdout tests.
3989
+ >
3990
+ > Default: \`{current-repo-name}-scenarios\`
3991
+
3992
+ Accept the default or the user's choice.
3993
+
3994
+ ### Question 2: GitHub App
3995
+
3996
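+ > Level 5 needs a GitHub App to provide a separate identity for autofix pushes. Pushes made with the default \`GITHUB_TOKEN\` do not trigger new workflow runs (GitHub's anti-recursion protection), so autofix commits need their own identity. Creating one takes about 2 minutes: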
3997
+ >
3998
+ > 1. Go to https://github.com/settings/apps/new
3999
+ > 2. Give it a name (e.g., "My Project Autofix")
4000
+ > 3. Uncheck "Webhook > Active" (not needed)
4001
+ > 4. Under **Repository permissions**, set:
4002
+ > - **Contents**: Read & Write
4003
+ > - **Pull requests**: Read & Write
4004
+ > - **Actions**: Read & Write
4005
+ > 5. Click **Create GitHub App**
4006
+ > 6. Note the **App ID** from the settings page
4007
+ > 7. Scroll to **Private keys** > click **Generate a private key** > save the \`.pem\` file
4008
+ > 8. Click **Install App** in the left sidebar > install it on your repo
4009
+ >
4010
+ > What's your App ID?
4011
+
4012
+ ## Step 3: Run init-autofix
4013
+
4014
+ Run the CLI command with the gathered configuration:
4015
+
4016
+ \`\`\`bash
4017
+ npx joycraft init-autofix --scenarios-repo {name} --app-id {id}
4018
+ \`\`\`
4019
+
4020
+ Review the output with the user. Confirm files were created.
4021
+
4022
+ ## Step 4: Walk Through Secret Configuration
4023
+
4024
+ Guide the user step by step:
4025
+
4026
+ ### 4a: Add Secrets to Main Repo
4027
+
4028
+ > You should already have the \`.pem\` file from when you created the app in Step 2.
4029
+
4030
+ > Go to your repo's Settings > Secrets and variables > Actions, and add:
4031
+ > - \`JOYCRAFT_APP_PRIVATE_KEY\` -- paste the contents of your \`.pem\` file
4032
+ > - \`ANTHROPIC_API_KEY\` -- your Anthropic API key (or the appropriate AI provider key for your setup)
4033
+
4034
+ ### 4b: Create the Scenarios Repo
4035
+
4036
+ > Create the private scenarios repo:
4037
+ > \`\`\`bash
4038
+ > gh repo create {scenarios-repo-name} --private
4039
+ > \`\`\`
4040
+ >
4041
+ > Then copy the scenario templates into it:
4042
+ > \`\`\`bash
4043
+ > cp -r docs/templates/scenarios/* ../{scenarios-repo-name}/
4044
+ > cd ../{scenarios-repo-name}
4045
+ > git add -A && git commit -m "init: scaffold scenarios repo from Joycraft"
4046
+ > git push
4047
+ > \`\`\`
4048
+
4049
+ ### 4c: Add Secrets to Scenarios Repo
4050
+
4051
+ > The scenarios repo also needs the App private key and the API key:
4052
+ > - \`JOYCRAFT_APP_PRIVATE_KEY\` -- same \`.pem\` file as the main repo
4053
+ > - \`ANTHROPIC_API_KEY\` -- same key (needed for scenario generation)
4054
+
4055
+ ## Step 5: Verify Setup
4056
+
4057
+ Help the user verify everything is wired correctly:
4058
+
4059
+ 1. **Check workflow files exist:** \`ls .github/workflows/autofix.yml .github/workflows/scenarios-dispatch.yml .github/workflows/spec-dispatch.yml .github/workflows/scenarios-rerun.yml\`
4060
+ 2. **Check scenario templates were copied:** Verify the scenarios repo has \`example-scenario.test.ts\`, \`workflows/run.yml\`, \`workflows/generate.yml\`, \`prompts/scenario-agent.md\`
4061
+ 3. **Check the App ID is correct** in the workflow files (not still a placeholder)
4062
+
4063
+ ## Step 6: Update AGENTS.md
4064
+
4065
+ If the project's AGENTS.md doesn't already have an "External Validation" section, add one:
4066
+
4067
+ > ## External Validation
4068
+ >
4069
+ > This project uses holdout scenario tests in a separate private repo.
4070
+ >
4071
+ > ### NEVER
4072
+ > - Access, read, or reference the scenarios repo
4073
+ > - Mention scenario test names or contents
4074
+ > - Modify the scenarios dispatch workflow to leak test information
4075
+ >
4076
+ > The scenarios repo is deliberately invisible to you. This is the holdout guarantee.
4077
+
4078
+ ## Step 7: First Test (Optional)
4079
+
4080
+ If the user wants to test the loop:
4081
+
4082
+ > Want to do a quick test? Here's how:
4083
+ >
4084
+ > 1. Write a simple spec in \`docs/specs/\` and push to main -- this triggers scenario generation
4085
+ > 2. Create a PR with a small change -- when CI passes, scenarios will run
4086
+ > 3. Watch for the scenario test results as a PR comment
4087
+ >
4088
+ > Or deliberately break something in a PR to test the autofix loop.
4089
+
4090
+ ## Step 8: Summary
4091
+
4092
+ Print a summary of what was set up:
4093
+
4094
+ > **Level 5 is live.** Here's what's running:
4095
+ >
4096
+ > | Trigger | What Happens |
4097
+ > |---------|-------------|
4098
+ > | Push specs to \`docs/specs/\` | Scenario agent writes holdout tests |
4099
+ > | PR fails CI | Autofix agent attempts a fix (up to 3x) |
4100
+ > | PR passes CI | Holdout scenarios run against PR |
4101
+ > | Scenarios update | Open PRs re-tested with latest scenarios |
4102
+ >
4103
+ > Your scenarios repo: \`{name}\`
4104
+ > Your coding agent cannot see those tests. The holdout wall is intact.
4105
+
4106
+ Update \`docs/joycraft-assessment.md\` if it exists -- set the Level 5 score to reflect the new setup.
4107
+ `,
4108
+ "joycraft-interview.md": `---
4109
+ name: joycraft-interview
4110
+ description: Brainstorm freely about what you want to build \u2014 yap, explore ideas, and get a structured summary you can use later
4111
+ ---
4112
+
4113
+ # Interview \u2014 Idea Exploration
4114
+
4115
+ You are helping the user brainstorm and explore what they want to build. This is a lightweight, low-pressure conversation \u2014 not a formal spec process. Let them yap.
4116
+
4117
+ ## How to Run the Interview
4118
+
4119
+ ### 1. Open the Floor
4120
+
4121
+ Start with something like:
4122
+ "What are you thinking about building? Just talk \u2014 I'll listen and ask questions as we go."
4123
+
4124
+ Let the user talk freely. Do not interrupt their flow. Do not push toward structure yet.
4125
+
4126
+ ### 2. Ask Clarifying Questions
4127
+
4128
+ As they talk, weave in questions naturally \u2014 don't fire them all at once:
4129
+
4130
+ - **What problem does this solve?** Who feels the pain today?
4131
+ - **What does "done" look like?** If this worked perfectly, what would a user see?
4132
+ - **What are the constraints?** Time, tech, team, budget \u2014 what boxes are we in?
4133
+ - **What's NOT in scope?** What's tempting but should be deferred?
4134
+ - **What are the edge cases?** What could go wrong? What's the weird input?
4135
+ - **What exists already?** Are we building on something or starting fresh?
4136
+
4137
+ ### 3. Play Back Understanding
4138
+
4139
+ After the user has gotten their ideas out, reflect back:
4140
+ "So if I'm hearing you right, you want to [summary]. The core problem is [X], and done looks like [Y]. Is that right?"
4141
+
4142
+ Let them correct and refine. Iterate until they say "yes, that's it."
4143
+
4144
+ ### 4. Write a Draft Brief
4145
+
4146
+ Create a draft file at \`docs/briefs/YYYY-MM-DD-topic-draft.md\`. Create the \`docs/briefs/\` directory if it doesn't exist.
4147
+
4148
+ Use this format:
4149
+
4150
+ \`\`\`markdown
4151
+ # [Topic] \u2014 Draft Brief
4152
+
4153
+ > **Date:** YYYY-MM-DD
4154
+ > **Status:** DRAFT
4155
+ > **Origin:** $joycraft-interview session
4156
+
4157
+ ---
4158
+
4159
+ ## The Idea
4160
+ [2-3 paragraphs capturing what the user described \u2014 their words, their framing]
4161
+
4162
+ ## Problem
4163
+ [What pain or gap this addresses]
4164
+
4165
+ ## What "Done" Looks Like
4166
+ [The user's description of success \u2014 observable outcomes]
4167
+
4168
+ ## Constraints
4169
+ - [constraint 1]
4170
+ - [constraint 2]
4171
+
4172
+ ## Open Questions
4173
+ - [things that came up but weren't resolved]
4174
+ - [decisions that need more thought]
4175
+
4176
+ ## Out of Scope (for now)
4177
+ - [things explicitly deferred]
4178
+
4179
+ ## Raw Notes
4180
+ [Any additional context, quotes, or tangents worth preserving]
4181
+ \`\`\`
4182
+
4183
+ ### 5. Hand Off
4184
+
4185
+ After writing the draft, tell the user:
4186
+
4187
+ \`\`\`
4188
+ Draft brief saved to docs/briefs/YYYY-MM-DD-topic-draft.md
4189
+
4190
+ When you're ready to move forward:
4191
+ - $joycraft-new-feature \u2014 formalize this into a full Feature Brief with specs
4192
+ - $joycraft-decompose \u2014 break it directly into atomic specs if scope is clear
4193
+ - Or just keep brainstorming \u2014 run $joycraft-interview again anytime
4194
+ \`\`\`
4195
+
4196
+ ## Guidelines
4197
+
4198
+ - **This is NOT $joycraft-new-feature.** Do not push toward formal briefs, decomposition tables, or atomic specs. The point is exploration.
4199
+ - **Let the user lead.** Your job is to listen, clarify, and capture \u2014 not to structure or direct.
4200
+ - **Mark everything as DRAFT.** The output is a starting point, not a commitment.
4201
+ - **Keep it short.** The draft brief should be 1-2 pages max. Capture the essence, not every detail.
4202
+ - **Multiple interviews are fine.** The user might run this several times as their thinking evolves. Each creates a new dated draft.
4203
+ `,
4204
+ "joycraft-lockdown.md": `---
4205
+ name: joycraft-lockdown
4206
+ description: Generate constrained execution boundaries for an implementation session -- NEVER rules and deny patterns to prevent agent overreach
4207
+ ---
4208
+
4209
+ # Lockdown Mode
4210
+
4211
+ The user wants to constrain agent behavior for an implementation session. Your job is to interview them about what should be off-limits, then generate AGENTS.md NEVER rules and Codex configuration deny patterns they can review and apply.
4212
+
4213
+ ## When Is Lockdown Useful?
4214
+
4215
+ Lockdown is most valuable for:
4216
+ - **Complex tech stacks** (hardware, firmware, multi-device) where agents can cause real damage
4217
+ - **Long-running autonomous sessions** where you won't be monitoring every action
4218
+ - **Production-adjacent work** where accidental network calls or package installs are risky
4219
+
4220
+ For simple feature work on a well-tested codebase, lockdown is usually overkill. Mention this context to the user so they can decide.
4221
+
4222
+ ## Step 1: Check for Tests
4223
+
4224
+ Before starting the interview, search the codebase for test files or directories (look for \`tests/\`, \`test/\`, \`__tests__/\`, \`spec/\`, or files matching \`*.test.*\`, \`*.spec.*\`).
4225
+
4226
+ If no tests are found, tell the user:
4227
+
4228
+ > Lockdown mode is most useful when you already have tests in place -- it prevents the agent from modifying them while constraining behavior to writing code and running tests. Consider running \`$joycraft-new-feature\` first to set up a test-driven workflow, then come back to lock it down.
4229
+
4230
+ If the user wants to proceed anyway, continue with the interview.
4231
+
4232
+ ## Step 2: Interview -- What to Lock Down
4233
+
4234
+ Ask these three questions, one at a time. Wait for the user's response before proceeding to the next question.
4235
+
4236
+ ### Question 1: Read-Only Files
4237
+
4238
+ > What test files or directories should be off-limits for editing? (e.g., \`tests/\`, \`__tests__/\`, \`spec/\`, specific test files)
4239
+ >
4240
+ > I'll generate NEVER rules to prevent editing these.
4241
+
4242
+ If the user isn't sure, suggest the test directories you found in Step 1.
4243
+
4244
+ ### Question 2: Allowed Commands
4245
+
4246
+ > What commands should the agent be allowed to run? Defaults:
4247
+ > - Write and edit source code files
4248
+ > - Run the project's smoke test command
4249
+ > - Run the full test suite
4250
+ >
4251
+ > Any other commands to explicitly allow? Or should I restrict to just these?
4252
+
4253
+ ### Question 3: Denied Commands
4254
+
4255
+ > What commands should be denied? Defaults:
4256
+ > - Package installs (\`npm install\`, \`pip install\`, \`cargo add\`, \`go get\`, etc.)
4257
+ > - Network tools (\`curl\`, \`wget\`, \`ping\`, \`ssh\`)
4258
+ > - Direct log file reading
4259
+ >
4260
+ > Any specific commands to add or remove from this list?
4261
+
4262
+ **Edge case -- user wants to allow some network access:** If the user mentions API tests or specific endpoints that need network access, exclude those from the deny list and note the exception in the output.
4263
+
4264
+ **Edge case -- user wants to lock down file writes:** If the user wants to prevent ALL file writes, warn them:
4265
+
4266
+ > Denying all file writes would prevent the agent from doing any work. I recommend keeping source code writes allowed and only locking down test files, config files, or other sensitive directories.
4267
+
4268
+ ## Step 3: Generate Boundaries
4269
+
4270
+ Based on the interview responses, generate output in this exact format:
4271
+
4272
+ \`\`\`
4273
+ ## Lockdown boundaries generated
4274
+
4275
+ Review these suggestions and add them to your project:
4276
+
4277
+ ### AGENTS.md -- add to NEVER section:
4278
+
4279
+ - Edit any file in \`[user's test directories]\`
4280
+ - Run \`[denied package manager commands]\`
4281
+ - Use \`[denied network tools]\`
4282
+ - Read log files directly -- interact with logs only through test assertions
4283
+ - [Any additional NEVER rules based on user responses]
4284
+
4285
+ ### Codex configuration -- suggested deny patterns:
4286
+
4287
+ Add these to your Codex sandbox configuration to restrict command execution:
4288
+
4289
+ ["[command1]", "[command2]", "[command3]"]
4290
+
4291
+ ---
4292
+
4293
+ Copy these into your project manually, or tell me to apply them now (I'll show you the exact changes for approval first).
4294
+ \`\`\`
4295
+
4296
+ Adjust the content based on the actual interview responses:
4297
+ - Only include deny patterns for commands the user confirmed should be denied
4298
+ - Only include NEVER rules for directories/files the user specified
4299
+ - If the user allowed certain network tools or package managers, exclude those
4300
+
4301
+ ## Recommended Execution Model
4302
+
4303
+ After generating the boundaries above, also recommend a Codex execution configuration. Include this section in your output:
4304
+
4305
+ \`\`\`
4306
+ ### Recommended Execution Configuration
4307
+
4308
+ Codex runs in a sandboxed environment by default. To maximize safety during lockdown:
4309
+
4310
+ | Your situation | Configuration | Why |
4311
+ |---|---|---|
4312
+ | Autonomous spec execution | Sandbox with deny patterns above | Only pre-approved commands run |
4313
+ | Long session with some trust | Default sandbox | Network-disabled sandbox prevents external access |
4314
+ | Interactive development | Default with manual review | Review outputs before applying |
4315
+
4316
+ **For lockdown mode, we recommend the default sandboxed execution** combined with the deny patterns above. Codex's sandbox already disables network access by default -- the deny patterns add file-level and command-level restrictions on top.
4317
+
4318
+ If you need network access for specific commands (e.g., API tests), configure explicit network allowances in your Codex setup rather than disabling the sandbox entirely.
4319
+ \`\`\`
4320
+
4321
+ ## Step 4: Offer to Apply
4322
+
4323
+ If the user asks you to apply the changes:
4324
+
4325
+ 1. **For AGENTS.md:** Read the existing AGENTS.md, find the Behavioral Boundaries section, and show the user the exact diff for the NEVER section. Ask for confirmation before writing.
4326
+ 2. **For Codex configuration:** Show the user what the deny patterns will look like after adding the new restrictions. Ask for confirmation before writing.
4327
+
4328
+ **Never auto-apply. Always show the exact changes and wait for explicit approval.**
4329
+ `,
4330
+ "joycraft-new-feature.md": `---
4331
+ name: joycraft-new-feature
4332
+ description: Guided feature development \u2014 interview the user, produce a Feature Brief, then decompose into atomic specs
4333
+ ---
4334
+
4335
+ # New Feature Workflow
4336
+
4337
+ You are starting a new feature. Follow this process in order. Do not skip steps.
4338
+
4339
+ ## Phase 1: Interview
4340
+
4341
+ Interview the user about what they want to build. Let them talk \u2014 your job is to listen, then sharpen.
4342
+
4343
+ **Ask about:**
4344
+ - What problem does this solve? Who is affected?
4345
+ - What does "done" look like?
4346
+ - Hard constraints? (business rules, tech limitations, deadlines)
4347
+ - What is explicitly NOT in scope? (push hard on this)
4348
+ - Edge cases or error conditions?
4349
+ - What existing code/patterns should this follow?
4350
+ - Testing: existing setup? framework? smoke test budget? lockdown mode desired?
4351
+
4352
+ **Interview technique:**
4353
+ - Let the user "yap" \u2014 don't interrupt their flow
4354
+ - Play back your understanding: "So if I'm hearing you right..."
4355
+ - Push toward testable statements: "How would we verify that works?"
4356
+
4357
+ Keep asking until you can fill out a Feature Brief.
4358
+
4359
+ ## Phase 2: Feature Brief
4360
+
4361
+ Write a Feature Brief to \`docs/briefs/YYYY-MM-DD-feature-name.md\`. Create the \`docs/briefs/\` directory if it doesn't exist.
4362
+
4363
+ **Why:** The brief is the single source of truth for what we're building. It prevents scope creep and gives every spec a shared reference point.
4364
+
4365
+ Use this structure:
4366
+
4367
+ \`\`\`markdown
4368
+ # [Feature Name] \u2014 Feature Brief
4369
+
4370
+ > **Date:** YYYY-MM-DD
4371
+ > **Project:** [project name]
4372
+ > **Status:** Interview | Decomposing | Specs Ready | In Progress | Complete
4373
+
4374
+ ---
4375
+
4376
+ ## Vision
4377
+ What are we building and why? The full picture in 2-4 paragraphs.
4378
+
4379
+ ## User Stories
4380
+ - As a [role], I want [capability] so that [benefit]
4381
+
4382
+ ## Hard Constraints
4383
+ - MUST: [constraint that every spec must respect]
4384
+ - MUST NOT: [prohibition that every spec must respect]
4385
+
4386
+ ## Out of Scope
4387
+ - NOT: [tempting but deferred]
4388
+
4389
+ ## Test Strategy
4390
+ - **Existing setup:** [framework and tools, or "none yet"]
4391
+ - **User expertise:** [comfortable / learning / needs guidance]
4392
+ - **Test types:** [smoke, unit, integration, e2e, etc.]
4393
+ - **Smoke test budget:** [target time for fast-feedback tests]
4394
+ - **Lockdown mode:** [yes/no \u2014 constrain agent to code + tests only]
4395
+
4396
+ ## Decomposition
4397
+ | # | Spec Name | Description | Dependencies | Est. Size |
4398
+ |---|-----------|-------------|--------------|-----------|
4399
+ | 1 | [verb-object] | [one sentence] | None | [S/M/L] |
4400
+
4401
+ ## Execution Strategy
4402
+ - [ ] Sequential (specs have chain dependencies)
4403
+ - [ ] Parallel (specs are independent)
4404
+ - [ ] Mixed
4405
+
4406
+ ## Success Criteria
4407
+ - [ ] [End-to-end behavior 1]
4408
+ - [ ] [No regressions in existing features]
4409
+ \`\`\`
4410
+
4411
+ If \`docs/templates/FEATURE_BRIEF_TEMPLATE.md\` exists, reference it for the full template with additional guidance.
4412
+
4413
+ Present the brief to the user. Focus review on:
4414
+ - "Does the decomposition match how you think about this?"
4415
+ - "Is anything in scope that shouldn't be?"
4416
+ - "Are the specs small enough? Can each be described in one sentence?"
4417
+
4418
+ Iterate until approved.
4419
+
4420
+ ## Phase 3: Generate Atomic Specs
4421
+
4422
+ For each row in the decomposition table, create a self-contained spec file at \`docs/specs/YYYY-MM-DD-spec-name.md\`. Create the \`docs/specs/\` directory if it doesn't exist.
4423
+
4424
+ **Why:** Each spec must be understandable WITHOUT reading the Feature Brief. This prevents the "Curse of Instructions" \u2014 no spec should require holding the entire feature in context. Copy relevant context into each spec.
4425
+
4426
+ Use this structure for each spec:
4427
+
4428
+ \`\`\`markdown
4429
+ # [Verb + Object] \u2014 Atomic Spec
4430
+
4431
+ > **Parent Brief:** \`docs/briefs/YYYY-MM-DD-feature-name.md\`
4432
+ > **Status:** Ready
4433
+ > **Date:** YYYY-MM-DD
4434
+ > **Estimated scope:** [1 session / N files / ~N lines]
4435
+
4436
+ ---
4437
+
4438
+ ## What
4439
+ One paragraph \u2014 what changes when this spec is done?
4440
+
4441
+ ## Why
4442
+ One sentence \u2014 what breaks or is missing without this?
4443
+
4444
+ ## Acceptance Criteria
4445
+ - [ ] [Observable behavior]
4446
+ - [ ] Build passes
4447
+ - [ ] Tests pass
4448
+
4449
+ ## Test Plan
4450
+
4451
+ | Acceptance Criterion | Test | Type |
4452
+ |---------------------|------|------|
4453
+ | [Each AC above] | [What to call/assert] | [unit/integration/e2e] |
4454
+
4455
+ **Execution order:**
4456
+ 1. Write all tests above \u2014 they should fail against current/stubbed code
4457
+ 2. Run tests to confirm they fail (red)
4458
+ 3. Implement until all tests pass (green)
4459
+
4460
+ **Smoke test:** [Identify the fastest test for iteration feedback]
4461
+
4462
+ **Before implementing, verify your test harness:**
4463
+ 1. Run all tests \u2014 they must FAIL (if they pass, you're testing the wrong thing)
4464
+ 2. Each test calls your actual function/endpoint \u2014 not a reimplementation or the underlying library
4465
+ 3. Identify your smoke test \u2014 it must run in seconds, not minutes, so you get fast feedback on each change
4466
+
4467
+ ## Constraints
4468
+ - MUST: [hard requirement]
4469
+ - MUST NOT: [hard prohibition]
4470
+
4471
+ ## Affected Files
4472
+ | Action | File | What Changes |
4473
+ |--------|------|-------------|
4474
+
4475
+ ## Approach
4476
+ Strategy, data flow, key decisions. Name one rejected alternative.
4477
+
4478
+ ## Edge Cases
4479
+ | Scenario | Expected Behavior |
4480
+ |----------|------------------|
4481
+ \`\`\`
4482
+
4483
+ If \`docs/templates/ATOMIC_SPEC_TEMPLATE.md\` exists, reference it for the full template with additional guidance.
4484
+
4485
+ ## Phase 4: Hand Off for Execution
4486
+
4487
+ Tell the user:
4488
+ \`\`\`
4489
+ Feature Brief and [N] atomic specs are ready.
4490
+
4491
+ Specs:
4492
+ 1. [spec-name] \u2014 [one sentence] [S/M/L]
4493
+ 2. [spec-name] \u2014 [one sentence] [S/M/L]
4494
+ ...
4495
+
4496
+ Recommended execution:
4497
+ - [Parallel/Sequential/Mixed strategy]
4498
+ - Estimated: [N] sessions total
4499
+
4500
+ To execute: Start a fresh session per spec. Each session should:
4501
+ 1. Read the spec
4502
+ 2. Implement
4503
+ 3. Run $joycraft-session-end to capture discoveries
4504
+ 4. Commit and PR
4505
+
4506
+ Ready to start?
4507
+ \`\`\`
4508
+
4509
+ **Why:** A fresh session for execution produces better results. The interview session has too much context noise \u2014 a clean session with just the spec is more focused.
4510
+
4511
+ You can also use \`$joycraft-decompose\` to re-decompose a brief if the breakdown needs adjustment, or run \`$joycraft-interview\` first for a lighter brainstorm before committing to the full workflow.
4512
+ `,
4513
+ "joycraft-research.md": `---
4514
+ name: joycraft-research
4515
+ description: Produce objective codebase research by isolating question generation from fact-gathering \u2014 subagent sees only questions, never the brief
4516
+ ---
4517
+
4518
+ # Research Codebase for a Feature
4519
+
4520
+ You are producing objective codebase research to inform a future spec or implementation. The key insight: the researching agent must never see the brief or ticket \u2014 only research questions. This prevents opinions from contaminating the facts.
4521
+
4522
+ **Guard clause:** If the user doesn't provide a brief path or inline description, ask:
4523
+ "What feature or change are you researching? Provide a brief path or describe it."
4524
+
4525
+ ---
4526
+
4527
+ ## Phase 1: Generate Research Questions
4528
+
4529
+ Read the brief and identify which zones of the codebase are relevant. Generate 5-10 research questions that are:
4530
+ - **Objective and fact-seeking** \u2014 "How does X work?" not "How should we build X?"
4531
+ - **Specific to the codebase**
4532
+ - **Answerable by reading code**
4533
+
4534
+ Write the questions to \`docs/research/.questions-tmp.md\`. **Do NOT include any content from the brief.**
4535
+
4536
+ ---
4537
+
4538
+ ## Phase 2: Spawn Research Subagent
4539
+
4540
+ Spawn a subagent to perform the research. Pass ONLY the research questions \u2014 never the brief.
4541
+
4542
+ Subagent prompt:
4543
+ \`\`\`
4544
+ You are researching a codebase to answer specific questions. You have NO context about why these questions are being asked.
4545
+
4546
+ RULES:
4547
+ - Answer each question with FACTS ONLY: file paths, function signatures, data flows, patterns, dependencies
4548
+ - Do NOT recommend, suggest, or opine
4549
+ - Do NOT speculate about what should be built
4550
+ - If a question cannot be answered, say "No existing code found for this"
4551
+ - Search the codebase and read files thoroughly
4552
+ - Include code snippets only when they are essential evidence
4553
+
4554
+ QUESTIONS:
4555
+ [INSERT_QUESTIONS_HERE]
4556
+
4557
+ OUTPUT FORMAT:
4558
+
4559
+ # Codebase Research
4560
+
4561
+ **Date:** [today]
4562
+ **Questions answered:** [N/total]
4563
+
4564
+ ---
4565
+
4566
+ ## Q1: [question]
4567
+ [Facts only]
4568
+
4569
+ ## Q2: [question]
4570
+ [Facts only]
4571
+ \`\`\`
4572
+
4573
+ ## Phase 3: Write the Research Document
4574
+
4575
+ Write the subagent's response to \`docs/research/YYYY-MM-DD-feature-name.md\`, then delete the temporary questions file (\`docs/research/.questions-tmp.md\`).
4576
+
4577
+ Present:
4578
+ \`\`\`
4579
+ Research complete: docs/research/YYYY-MM-DD-feature-name.md
4580
+
4581
+ This document contains objective facts \u2014 no opinions or recommendations.
4582
+
4583
+ Next steps:
4584
+ - $joycraft-decompose \u2014 break the feature into atomic specs
4585
+ - $joycraft-new-feature \u2014 formalize into a full Feature Brief first
4586
+ - Read the research and add corrections manually
4587
+ \`\`\`
4588
+ `,
4589
+ "joycraft-session-end.md": `---
4590
+ name: joycraft-session-end
4591
+ description: Wrap up a session \u2014 capture discoveries, verify, prepare for PR or next session
4592
+ ---
4593
+
4594
+ # Session Wrap-Up
4595
+
4596
+ Before ending this session, complete these steps in order.
4597
+
4598
+ ## 1. Capture Discoveries
4599
+
4600
+ **Why:** Discoveries are the surprises \u2014 things that weren't in the spec or that contradicted expectations. They prevent future sessions from hitting the same walls.
4601
+
4602
+ Check: did anything surprising happen during this session? If yes, create or update a discovery file at \`docs/discoveries/YYYY-MM-DD-topic.md\`. Create the \`docs/discoveries/\` directory if it doesn't exist.
4603
+
4604
+ Only capture what's NOT obvious from the code or git diff:
4605
+ - "We thought X but found Y" \u2014 assumptions that were wrong
4606
+ - "This API/library behaves differently than documented" \u2014 external gotchas
4607
+ - "This edge case needs handling in a future spec" \u2014 deferred work with context
4608
+ - "The approach in the spec didn't work because..." \u2014 spec-vs-reality gaps
4609
+ - Key decisions made during implementation that aren't in the spec
4610
+
4611
+ **Do NOT capture:**
4612
+ - Files changed (that's the diff)
4613
+ - What you set out to do (that's the spec)
4614
+ - Step-by-step narrative of the session (nobody re-reads these)
4615
+
4616
+ Use this format:
4617
+
4618
+ \`\`\`markdown
4619
+ # Discoveries \u2014 [topic]
4620
+
4621
+ **Date:** YYYY-MM-DD
4622
+ **Spec:** [link to spec if applicable]
4623
+
4624
+ ## [Discovery title]
4625
+ **Expected:** [what we thought would happen]
4626
+ **Actual:** [what actually happened]
4627
+ **Impact:** [what this means for future work]
4628
+ \`\`\`
4629
+
4630
+ If nothing surprising happened, skip the discovery file entirely. Having no discoveries is a good sign \u2014 the spec was accurate.
4631
+
4632
+ ## 1b. Update Context Documents
4633
+
4634
+ If \`docs/context/\` exists, quickly check whether this session revealed anything about:
4635
+
4636
+ - **Production risks** \u2014 did you interact with or learn about production vs staging systems? Update \`docs/context/production-map.md\`
4637
+ - **Wrong assumptions** \u2014 did you assume something that turned out to be false? Update \`docs/context/dangerous-assumptions.md\`
4638
+ - **Key decisions** \u2014 did you make an architectural or tooling choice? Add a row to \`docs/context/decision-log.md\`
4639
+ - **Unwritten rules** \u2014 did you discover a convention or constraint not documented anywhere? Update \`docs/context/institutional-knowledge.md\`
4640
+
4641
+ Skip this if nothing applies. Don't force it \u2014 only update when there's genuine new context.
4642
+
4643
+ ## 2. Run Validation
4644
+
4645
+ Run the project's validation commands. Check CLAUDE.md or AGENTS.md for project-specific commands. Common checks:
4646
+
4647
+ - Type-check (e.g., \`tsc --noEmit\`, \`mypy\`, \`cargo check\`)
4648
+ - Tests (e.g., \`npm test\`, \`pytest\`, \`cargo test\`)
4649
+ - Lint (e.g., \`eslint\`, \`ruff\`, \`clippy\`)
4650
+
4651
+ Fix any failures before proceeding.
4652
+
4653
+ ## 3. Update Spec Status
4654
+
4655
+ If working from an atomic spec in \`docs/specs/\`:
4656
+ - All acceptance criteria met \u2014 update status to \`Complete\`
4657
+ - Partially done \u2014 update status to \`In Progress\`, note what's left
4658
+
4659
+ If working from a Feature Brief in \`docs/briefs/\`, check off completed specs in the decomposition table.
4660
+
4661
+ ## 4. Commit
4662
+
4663
+ Commit all changes including the discovery file (if created) and spec status updates. The commit message should reference the spec if applicable.
4664
+
4665
+ ## 5. Push and PR (if autonomous git is enabled)
4666
+
4667
+ **Check CLAUDE.md or AGENTS.md for "Git Autonomy" in the Behavioral Boundaries section.** If it says "STRICTLY ENFORCED" or the ALWAYS section includes "Push to feature branches immediately after every commit":
4668
+
4669
+ 1. **Push immediately.** Run \`git push origin <branch>\` \u2014 do not ask, do not hesitate.
4670
+ 2. **Open a PR if the feature is complete.** Check the parent Feature Brief's decomposition table \u2014 if all specs are done, run \`gh pr create\` with a summary of all completed specs. Do not ask first.
4671
+ 3. **If not all specs are done,** still push. The PR comes when the last spec is complete.
4672
+
4673
+ If CLAUDE.md or AGENTS.md does NOT have autonomous git rules (or has "ASK FIRST" for pushing), ask the user before pushing.
4674
+
4675
+ ## 6. Report
4676
+
4677
+ \`\`\`
4678
+ Session complete.
4679
+ - Spec: [spec name] \u2014 [Complete / In Progress]
4680
+ - Build: [passing / failing]
4681
+ - Discoveries: [N items / none]
4682
+ - Pushed: [yes / no \u2014 and why not]
4683
+ - PR: [opened #N / not yet \u2014 N specs remaining]
4684
+ - Next: [what the next session should tackle]
4685
+ \`\`\`
4686
+ `,
4687
+ "joycraft-tune.md": `---
4688
+ name: joycraft-tune
4689
+ description: Assess and upgrade your project's AI development harness \u2014 score 7 dimensions, apply fixes, show path to Level 5
4690
+ ---
4691
+
4692
+ # Tune \u2014 Project Harness Assessment & Upgrade
4693
+
4694
+ You are evaluating and upgrading this project's AI development harness.
4695
+
4696
+ ## Step 1: Detect Harness State
4697
+
4698
+ Search the codebase for: CLAUDE.md (with meaningful content), \`docs/specs/\`, \`docs/briefs/\`, \`docs/discoveries/\`, \`.agents/skills/\`, and test configuration.
4699
+
4700
+ ## Step 2: Route
4701
+
4702
+ - **No harness** (no CLAUDE.md or just a README): Recommend \`npx joycraft init\` and stop.
4703
+ - **Harness exists**: Continue to assessment.
4704
+
4705
+ ## Step 3: Assess \u2014 Score 7 Dimensions (1-5 scale)
4706
+
4707
+ Read CLAUDE.md and explore the project. Score each with specific evidence:
4708
+
4709
+ | Dimension | What to Check |
4710
+ |-----------|--------------|
4711
+ | Spec Quality | \`docs/specs/\` \u2014 structured? acceptance criteria? self-contained? |
4712
+ | Spec Granularity | Can each spec be done in one session? |
4713
+ | Behavioral Boundaries | ALWAYS/ASK FIRST/NEVER sections (or equivalent rules under any heading) |
4714
+ | Skills & Hooks | \`.agents/skills/\` files, hooks config |
4715
+ | Documentation | \`docs/\` structure, templates, referenced from CLAUDE.md |
4716
+ | Knowledge Capture | \`docs/discoveries/\`, \`docs/context/*.md\` \u2014 existence AND real content |
4717
+ | Testing & Validation | Test framework, CI pipeline, validation commands in CLAUDE.md |
4718
+
4719
+ Score 1 = absent, 3 = partially there, 5 = comprehensive. Give credit for substance over format.
4720
+
4721
+ ## Step 4: Write Assessment
4722
+
4723
+ Write to \`docs/joycraft-assessment.md\` AND display it. Include: scores table, detailed findings (evidence + gap + recommendation per dimension), and an upgrade plan (up to 5 actions ordered by impact).
4724
+
4725
+ ## Step 5: Apply Upgrades
4726
+
4727
+ Apply using three tiers \u2014 do NOT ask per-item permission:
4728
+
4729
+ **Tier 1 (silent):** Create missing dirs, install missing skills, copy missing templates, create AGENTS.md.
4730
+
4731
+ **Before Tier 2, ask TWO things:**
4732
+
4733
+ 1. **Git autonomy:** Cautious (ask before push/PR) or Autonomous (push + PR without asking)?
4734
+ 2. **Risk interview (3-5 questions, one at a time):** What could break? What services connect to prod? Unwritten rules? Off-limits files/commands? Skip if \`docs/context/\` already has content.
4735
+
4736
+ From answers, generate: CLAUDE.md boundary rules, deny patterns configuration, \`docs/context/\` documents. Also recommend a permission mode (\`auto\` for most; \`dontAsk\` + allowlist for high-risk).
4737
+
4738
+ **Tier 2 (show diff):** Add missing CLAUDE.md sections (Boundaries, Workflow, Key Files). Draft from real codebase content. Append only \u2014 never reformat existing content.
4739
+
4740
+ **Tier 3 (confirm first):** Rewriting existing sections, overwriting customized files, suggesting test framework installs.
4741
+
4742
+ After applying, append to \`docs/joycraft-history.md\` and show a consolidated upgrade results table.
4743
+
4744
+ ## Step 6: Show Path to Level 5
4745
+
4746
+ Show a tailored roadmap: Level 2-5 table, specific next steps based on actual gaps, and the Level 5 north star (spec queue, autofix, holdout scenarios, self-improving harness).
4747
+
4748
+ ## Edge Cases
4749
+
4750
+ - **CLAUDE.md is just a README:** Treat as no harness.
4751
+ - **Non-Joycraft skills:** Acknowledge, don't replace.
4752
+ - **Rules under non-standard headings:** Give credit for substance.
4753
+ - **Previous assessment exists:** Read it first. If nothing to upgrade, say so.
4754
+ - **Non-Joycraft content in CLAUDE.md:** Preserve as-is. Only append.
4755
+ `,
4756
+ "joycraft-verify.md": `---
4757
+ name: joycraft-verify
4758
+ description: Spawn an independent verifier subagent to check an implementation against its spec -- read-only, no code edits, structured pass/fail verdict
4759
+ ---
4760
+
4761
+ # Verify Implementation Against Spec
4762
+
4763
+ The user wants independent verification of an implementation. Your job is to find the relevant spec, extract its acceptance criteria and test plan, then spawn a separate verifier subagent that checks each criterion and produces a structured verdict.
4764
+
4765
+ **Why a separate subagent?** Research found that agents reliably skew positive when grading their own work. Separating the agent doing the work from the agent judging it consistently outperforms self-evaluation. The verifier gets a clean context window with no implementation bias.
4766
+
4767
+ ## Step 1: Find the Spec
4768
+
4769
+ If the user provided a spec path (e.g., \`$joycraft-verify docs/specs/2026-03-26-add-widget.md\`), use that path directly.
4770
+
4771
+ If no path was provided, scan \`docs/specs/\` for spec files. Pick the most recently modified \`.md\` file in that directory. If \`docs/specs/\` doesn't exist or is empty, tell the user:
4772
+
4773
+ > No specs found in \`docs/specs/\`. Please provide a spec path: \`$joycraft-verify path/to/spec.md\`
4774
+
4775
+ ## Step 2: Read and Parse the Spec
4776
+
4777
+ Read the spec file and extract:
4778
+
4779
+ 1. **Spec name** -- from the H1 title
4780
+ 2. **Acceptance Criteria** -- the checklist under the \`## Acceptance Criteria\` section
4781
+ 3. **Test Plan** -- the table under the \`## Test Plan\` section, including any test commands
4782
+ 4. **Constraints** -- the \`## Constraints\` section if present
4783
+
4784
+ If the spec has no Acceptance Criteria section, tell the user:
4785
+
4786
+ > This spec doesn't have an Acceptance Criteria section. Verification needs criteria to check against. Add acceptance criteria to the spec and try again.
4787
+
4788
+ If the spec has no Test Plan section, note this but proceed -- the verifier can still check criteria by reading code and running any available project tests.
4789
+
4790
+ ## Step 3: Identify Test Commands
4791
+
4792
+ Look for test commands in these locations (in priority order):
4793
+
4794
+ 1. The spec's Test Plan section (look for commands in backticks or "Type" column entries like "unit", "integration", "e2e", "build")
4795
+ 2. The project's CLAUDE.md or AGENTS.md (look for test/build commands in the Development Workflow section)
4796
+ 3. Common defaults based on the project type:
4797
+ - Node.js: \`npm test\` or \`pnpm test --run\`
4798
+ - Python: \`pytest\`
4799
+ - Rust: \`cargo test\`
4800
+ - Go: \`go test ./...\`
4801
+
4802
+ Build a list of specific commands the verifier should run.
4803
+
4804
+ ## Step 4: Spawn the Verifier Subagent
4805
+
4806
+ Spawn a concurrent subagent thread with the following prompt. Replace the placeholders with the actual content extracted in Steps 2-3.
4807
+
4808
+ **Important:** The subagent must be given read-only constraints. It may search the codebase, read files, and run the specified test/build commands, but it must NOT edit or create any files.
4809
+
4810
+ \`\`\`
4811
+ You are a QA verifier. Your job is to independently verify an implementation against its spec. You have NO context about how the implementation was done -- you are checking it fresh.
4812
+
4813
+ RULES -- these are hard constraints, not suggestions:
4814
+ - You may search the codebase and read any file
4815
+ - You may RUN these specific test/build commands: [TEST_COMMANDS]
4816
+ - You may NOT edit, create, or delete any files
4817
+ - You may NOT run commands that modify state (no git commit, no npm install, no file writes)
4818
+ - You may NOT install packages or access the network
4819
+ - Report what you OBSERVE, not what you expect or hope
4820
+
4821
+ SPEC NAME: [SPEC_NAME]
4822
+
4823
+ ACCEPTANCE CRITERIA:
4824
+ [ACCEPTANCE_CRITERIA]
4825
+
4826
+ TEST PLAN:
4827
+ [TEST_PLAN]
4828
+
4829
+ CONSTRAINTS:
4830
+ [CONSTRAINTS_OR_NONE]
4831
+
4832
+ YOUR TASK:
4833
+ For each acceptance criterion, determine if it PASSES or FAILS based on evidence:
4834
+
4835
+ 1. Run the test commands listed above. Record the output.
4836
+ 2. For each acceptance criterion:
4837
+ a. Check if there is a corresponding test and whether it passes
4838
+ b. If no test exists, read the relevant source files to verify the criterion is met
4839
+ c. If the criterion cannot be verified by reading code or running tests, mark it MANUAL CHECK NEEDED
4840
+ 3. For criteria about build/test passing, actually run the commands and report results.
4841
+
4842
+ OUTPUT FORMAT -- you MUST use this exact format:
4843
+
4844
+ VERIFICATION REPORT
4845
+
4846
+ | # | Criterion | Verdict | Evidence |
4847
+ |---|-----------|---------|----------|
4848
+ | 1 | [criterion text] | PASS/FAIL/MANUAL CHECK NEEDED | [what you observed] |
4849
+ | 2 | [criterion text] | PASS/FAIL/MANUAL CHECK NEEDED | [what you observed] |
4850
+ [continue for all criteria]
4851
+
4852
+ SUMMARY: X/Y criteria passed. [Z failures need attention. / All criteria verified.]
4853
+
4854
+ If any test commands fail to run (missing dependencies, wrong command, etc.), report the error as evidence for a FAIL verdict on the relevant criterion.
4855
+ \`\`\`
4856
+
4857
+ ## Step 5: Format and Present the Verdict
4858
+
4859
+ Take the subagent's response and present it to the user in this format:
4860
+
4861
+ \`\`\`
4862
+ ## Verification Report -- [Spec Name]
4863
+
4864
+ | # | Criterion | Verdict | Evidence |
4865
+ |---|-----------|---------|----------|
4866
+ | 1 | ... | PASS | ... |
4867
+ | 2 | ... | FAIL | ... |
4868
+
4869
+ **Overall: X/Y criteria passed.**
4870
+
4871
+ [If all passed:]
4872
+ All criteria verified. Ready to commit and open a PR.
4873
+
4874
+ [If any failed:]
4875
+ N failures need attention. Review the evidence above and fix before proceeding.
4876
+
4877
+ [If any MANUAL CHECK NEEDED:]
4878
+ N criteria need manual verification -- they can't be checked by reading code or running tests alone.
4879
+ \`\`\`
4880
+
4881
+ ## Step 6: Suggest Next Steps
4882
+
4883
+ Based on the verdict:
4884
+
4885
+ - **All PASS:** Suggest committing and opening a PR, or running \`$joycraft-session-end\` to capture discoveries.
4886
+ - **Some FAIL:** List the failed criteria and suggest the user fix them, then run \`$joycraft-verify\` again.
4887
+ - **MANUAL CHECK NEEDED items:** Explain what needs human eyes and why automation couldn't verify it.
4888
+
4889
+ **Do NOT offer to fix failures yourself.** The verifier reports; the human (or implementation agent in a separate turn) decides what to do. This separation is the whole point.
4890
+
4891
+ ## Edge Cases
4892
+
4893
+ | Scenario | Behavior |
4894
+ |----------|----------|
4895
+ | Spec has no Test Plan | Warn that verification is weaker without a test plan, but proceed by checking criteria through code reading and any available project-level tests |
4896
+ | All tests pass but a criterion is not testable | Mark as MANUAL CHECK NEEDED with explanation |
4897
+ | Subagent can't run tests (missing deps) | Report the error as FAIL evidence |
4898
+ | No specs found and no path given | Tell user to provide a spec path or create a spec first |
4899
+ | Spec status is "Complete" | Still run verification -- "Complete" means the implementer thinks it's done, verification confirms |
3468
4900
  `
3469
4901
  };
3470
4902
 
3471
4903
  export {
3472
4904
  SKILLS,
3473
- TEMPLATES
4905
+ TEMPLATES,
4906
+ CODEX_SKILLS
3474
4907
  };
3475
- //# sourceMappingURL=chunk-Y6GBN6R4.js.map
4908
+ //# sourceMappingURL=chunk-4RGMUQQZ.js.map
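
Each skill added in this diff is a Markdown string that opens with a `---` front-matter block carrying `name` and `description`. The snippet below is a minimal sketch of how a consumer might read those fields. It assumes the front matter stays in the simple `key: value` form shown above. `parseFrontMatter` and `sampleSkills` are hypothetical names used for illustration and are not part of joycraft, and the sample description is abbreviated from the diff.

```javascript
// Hypothetical helper, not part of joycraft: parses the leading "---"
// front-matter block of a skill string into a plain key/value object.
function parseFrontMatter(skillText) {
  const lines = skillText.split("\n");
  if (lines[0].trim() !== "---") return {}; // no front matter at all
  const fields = {};
  for (let i = 1; i < lines.length; i++) {
    const line = lines[i];
    if (line.trim() === "---") return fields; // closing delimiter reached
    const colon = line.indexOf(":");
    if (colon === -1) continue; // skip lines that are not "key: value"
    // Split on the first colon only, so values may contain colons.
    fields[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
  }
  return {}; // no closing delimiter: treat as having no front matter
}

// Stand-in for one CODEX_SKILLS entry (assumed shape: file name -> Markdown).
const sampleSkills = {
  "joycraft-verify.md": `---
name: joycraft-verify
description: Spawn an independent verifier subagent to check an implementation against its spec
---

# Verify Implementation Against Spec
`,
};

for (const [file, text] of Object.entries(sampleSkills)) {
  const { name, description } = parseFrontMatter(text);
  console.log(`${file}: ${name} (${description})`);
}
```

Splitting on the first colon keeps the sketch dependency-free. A real YAML parser would be the safer choice if the front matter ever grows beyond flat string fields.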