kairn-cli 1.10.0 → 1.11.0

package/dist/cli.js CHANGED
@@ -460,24 +460,79 @@ import Anthropic2 from "@anthropic-ai/sdk";
  import OpenAI2 from "openai";
 
  // src/compiler/prompt.ts
- var SYSTEM_PROMPT = `You are the Kairn environment compiler. Your job is to generate a minimal, optimal Claude Code agent environment from a user's natural language description of what they want their agent to do.
+ var SKELETON_PROMPT = `You are the Kairn skeleton compiler. Your job is to select tools and outline the project structure from a user's natural language description.
 
  You will receive:
  1. The user's intent (what they want to build/do)
  2. A tool registry (available MCP servers, plugins, and hooks)
 
- You must output a JSON object matching the EnvironmentSpec schema.
+ You must output a JSON object matching the SkeletonSpec schema.
 
  ## Core Principles
 
  - **Minimalism over completeness.** Fewer, well-chosen tools beat many generic ones. Each MCP server costs 500-2000 context tokens.
+ - **Workflow-specific, not generic.** Select tools that directly support the user's actual workflow.
+ - **Security by default.** Essential for all projects.
+
+ ## Tool Selection Rules
+
+ - Only select tools directly relevant to the described workflow
+ - Prefer free tools (auth: "none") when quality is comparable
+ - Tier 1 tools (Context7, Sequential Thinking, security-guidance) should be included in most environments
+ - For tools requiring API keys (auth: "api_key"), use \${ENV_VAR} syntax \u2014 never hardcode keys
+ - Maximum 6-8 MCP servers to avoid context bloat
+ - Include a \`reason\` for each selected tool explaining why it fits this workflow
+
+ ## Context Budget (STRICT)
+
+ - MCP servers: maximum 6. Prefer fewer.
+ - Skills: maximum 3. Only include directly relevant ones.
+ - Agents: maximum 5. Orchestration pipeline (/develop) agents.
+ - Hooks: maximum 4 (auto-format, block-destructive, PostCompact, plus one contextual).
+
+ If the workflow doesn't clearly need a tool, DO NOT include it.
+ Each MCP server costs 500-2000 tokens of context window.
+
+ ## Output Schema
+
+ Return ONLY valid JSON matching this structure:
+
+ \`\`\`json
+ {
+ "name": "short-kebab-case-name",
+ "description": "One-line description",
+ "tools": [
+ { "tool_id": "id-from-registry", "reason": "why this tool fits" }
+ ],
+ "outline": {
+ "tech_stack": ["Python", "pandas"],
+ "workflow_type": "data-analysis",
+ "key_commands": ["ingest", "analyze", "report"],
+ "custom_rules": ["data-integrity"],
+ "custom_agents": ["data-reviewer"],
+ "custom_skills": ["ms-data-analysis"]
+ }
+ }
+ \`\`\`
+
+ Return ONLY valid JSON. No markdown fences. No text outside the JSON.`;
+ var HARNESS_PROMPT = `You are the Kairn harness compiler. Your job is to generate the full environment content from a project skeleton.
+
+ You will receive:
+ 1. The skeleton (tool selections + project outline)
+ 2. The user's original intent
+
+ You must generate all harness content: CLAUDE.md, commands, rules, agents, skills, and docs.
+
+ ## Core Principles
+
  - **Workflow-specific, not generic.** Every instruction, command, and rule must relate to the user's actual workflow.
- - **Concise CLAUDE.md.** Under 120 lines. No generic text like "be helpful." Include build/test commands, reference docs/ and skills/.
+ - **Concise CLAUDE.md.** Under 150 lines. No generic text like "be helpful." Include build/test commands, reference docs/ and skills/.
  - **Security by default.** Always include deny rules for destructive commands and secret file access.
 
  ## CLAUDE.md Template (mandatory structure)
 
- The \`claude_md\` field MUST follow this exact structure (max 120 lines):
+ The \`claude_md\` field MUST follow this exact structure (max 150 lines):
 
  \`\`\`
  # {Project Name}
@@ -526,6 +581,25 @@ Use subagents for deep investigation to keep main context clean.
  - Prefer small, focused commits (one feature or fix per commit)
  - Use conventional commits: feat:, fix:, docs:, refactor:, test:
  - Target < 200 lines per PR when possible
+
+ ## Engineering Standards
+ - Lead with answers over reasoning. Be concise.
+ - Use absolute file paths in all references.
+ - No filler, no inner monologue, no time estimates.
+ - Produce load-bearing code \u2014 every line of output should be actionable.
+
+ ## Tool Usage Policy
+ - Prefer Edit tool over sed/awk for file modifications
+ - Prefer Grep tool over rg for searching
+ - Prefer Read tool over cat for file reading
+ - Reserve Bash for: builds, installs, git, network, processes
+ - Read and understand existing code before modifying
+ - Delete unused code completely \u2014 no compatibility shims
+
+ ## Code Philosophy
+ - Do not create abstractions for one-time operations
+ - Complete the task fully \u2014 don't gold-plate, but don't leave it half-done
+ - Prefer editing existing files over creating new ones
  \`\`\`
 
  Do not add generic filler. Every line must be specific to the user's workflow.
@@ -534,20 +608,19 @@ Do not add generic filler. Every line must be specific to the user's workflow.
 
  1. A concise, workflow-specific \`claude_md\` (the CLAUDE.md content)
  2. A \`/project:help\` command that explains the environment
- 3. A \`/project:tasks\` command for task management via TODO.md
- 4. A \`docs/TODO.md\` file for continuity
- 5. A \`docs/DECISIONS.md\` file for architectural decisions
- 6. A \`docs/LEARNINGS.md\` file for non-obvious discoveries
- 7. A \`rules/continuity.md\` rule encouraging updates to DECISIONS.md and LEARNINGS.md
- 8. A \`rules/security.md\` rule with essential security instructions
- 9. settings.json with deny rules for \`rm -rf\`, \`curl|sh\`, reading \`.env\` and \`secrets/\`
- 10. A \`/project:status\` command for code projects (uses ! for live git/test output)
- 11. A \`/project:fix\` command for code projects (uses $ARGUMENTS for issue number)
- 12. A \`docs/SPRINT.md\` file for sprint contracts (acceptance criteria, verification steps)
- 13. A "Verification" section in CLAUDE.md with concrete verify commands for the project
- 14. A "Known Gotchas" section in CLAUDE.md (starts empty, grows with corrections)
- 15. A "Debugging" section in CLAUDE.md (2 lines: paste raw errors, use subagents)
- 16. A "Git Workflow" section in CLAUDE.md (3 rules: small commits, conventional format, <200 lines PR)
+ 3. A \`docs/DECISIONS.md\` file for architectural decisions
+ 4. A \`docs/LEARNINGS.md\` file for non-obvious discoveries
+ 5. A \`rules/continuity.md\` rule encouraging updates to DECISIONS.md and LEARNINGS.md
+ 6. A \`rules/security.md\` rule with essential security instructions
+ 7. settings.json with deny rules for \`rm -rf\`, \`curl|sh\`, reading \`.env\` and \`secrets/\`
+ 8. A \`/project:status\` command for code projects (uses ! for live git/SPRINT.md output)
+ 9. A \`/project:fix\` command for code projects (uses $ARGUMENTS for issue number)
+ 10. A \`docs/SPRINT.md\` file as the living spec/plan (replaces TODO.md \u2014 acceptance criteria, verification steps)
+ 11. A "Verification" section in CLAUDE.md with concrete verify commands for the project
+ 12. A "Known Gotchas" section in CLAUDE.md (starts empty, grows with corrections)
+ 13. A "Debugging" section in CLAUDE.md (2 lines: paste raw errors, use subagents)
+ 14. A "Git Workflow" section in CLAUDE.md (3 rules: small commits, conventional format, <200 lines PR)
+ 15. "Engineering Standards", "Tool Usage Policy", and "Code Philosophy" sections in CLAUDE.md
 
  ## Shell-Integrated Commands
 
@@ -656,37 +729,16 @@ All projects should include a PostCompact hook to restore context after compacti
 
  Merge this into the settings hooks alongside the PreToolUse and PostToolUse hooks.
 
- ## Tool Selection Rules
-
- - Only select tools directly relevant to the described workflow
- - Prefer free tools (auth: "none") when quality is comparable
- - Tier 1 tools (Context7, Sequential Thinking, security-guidance) should be included in most environments
- - For tools requiring API keys (auth: "api_key"), use \${ENV_VAR} syntax \u2014 never hardcode keys
- - Maximum 6-8 MCP servers to avoid context bloat
- - Include a \`reason\` for each selected tool explaining why it fits this workflow
-
- ## Context Budget (STRICT)
-
- - MCP servers: maximum 6. Prefer fewer.
- - CLAUDE.md: maximum 120 lines.
- - Rules: maximum 5 files, each under 20 lines.
- - Skills: maximum 3. Only include directly relevant ones.
- - Agents: maximum 3. QA pipeline + one specialist.
- - Commands: no limit (loaded on demand, zero context cost).
- - Hooks: maximum 4 (auto-format, block-destructive, PostCompact, plus one contextual).
-
- If the workflow doesn't clearly need a tool, DO NOT include it.
- Each MCP server costs 500-2000 tokens of context window.
-
  ## For Code Projects, Additionally Include
 
  - \`/project:plan\` command (plan before coding)
  - \`/project:review\` command (review changes)
  - \`/project:test\` command (run and fix tests)
  - \`/project:commit\` command (conventional commits)
- - \`/project:status\` command (live git status, recent commits, TODO overview using ! prefix)
+ - \`/project:status\` command (live git status, recent commits, SPRINT.md overview using ! prefix)
  - \`/project:fix\` command (takes $ARGUMENTS as issue number, plans fix, implements, tests, commits)
  - \`/project:sprint\` command (define acceptance criteria before coding, writes to docs/SPRINT.md)
+ - \`/project:develop\` command (full development pipeline \u2014 orchestrates @architect \u2192 @planner \u2192 @implementer \u2192 @verifier \u2192 @fixer \u2192 @grill \u2192 @doc-updater through spec, plan, TDD implement, review, and doc update phases)
  - A TDD skill using the 3-phase isolation pattern (RED \u2192 GREEN \u2192 REFACTOR):
  - RED: Write failing test only. Verify it FAILS.
  - GREEN: Write MINIMUM code to pass. Nothing extra.
@@ -696,6 +748,12 @@ Each MCP server costs 500-2000 tokens of context window.
  - \`@qa-orchestrator\` (sonnet) \u2014 delegates to linter and e2e-tester, compiles QA report
  - \`@linter\` (haiku) \u2014 runs formatters, linters, security scanners
  - \`@e2e-tester\` (sonnet, only when Playwright is in tools) \u2014 browser-based QA via Playwright
+ - Development pipeline agents (used by /project:develop):
+ - \`@architect\` (opus) \u2014 conducts spec interview with user, writes confirmed spec to docs/SPRINT.md
+ - \`@planner\` (opus) \u2014 reads spec and codebase, creates step-by-step implementation plan in docs/PLAN.md
+ - \`@implementer\` (sonnet) \u2014 TDD-focused implementation, writes failing tests then minimum code to pass
+ - \`@fixer\` (sonnet) \u2014 targeted bug fixing from verifier/review feedback
+ - \`@doc-updater\` (haiku) \u2014 extracts decisions and learnings from completed work, updates docs/DECISIONS.md and docs/LEARNINGS.md
  - \`/project:spec\` command (interview-based spec creation \u2014 asks 5-8 questions one at a time, writes structured spec to docs/SPRINT.md, does NOT start coding until confirmed)
  - \`/project:prove\` command (runs tests, shows git diff vs main, rates confidence HIGH/MEDIUM/LOW with evidence)
  - \`/project:grill\` command (adversarial code review \u2014 challenges each change with "why this approach?", "what if X input?", rates BLOCKER/SHOULD-FIX/NITPICK, blocks until BLOCKERs resolved)
@@ -741,6 +799,151 @@ If no autonomy level is specified, assume Level 1 (Guided).
 
  Return ONLY valid JSON matching this structure:
 
+ \`\`\`json
+ {
+ "claude_md": "Full CLAUDE.md content (under 150 lines)",
+ "commands": { "help": "...", "develop": "...", "status": "...", "fix": "...", "sprint": "...", "spec": "...", "prove": "...", "grill": "...", "reset": "..." },
+ "rules": { "continuity": "...", "security": "..." },
+ "agents": { "architect": "...", "planner": "...", "implementer": "...", "fixer": "...", "doc-updater": "...", "qa-orchestrator": "...", "linter": "...", "e2e-tester": "..." },
+ "skills": { "skill-name/SKILL": "..." },
+ "docs": { "DECISIONS": "...", "LEARNINGS": "...", "SPRINT": "..." }
+ }
+ \`\`\`
+
+ Return ONLY valid JSON. No markdown fences. No text outside the JSON.`;
+ var SYSTEM_PROMPT = `You are the Kairn environment compiler. Your job is to generate a minimal, optimal Claude Code agent environment from a user's natural language description of what they want their agent to do.
+
+ You will receive:
+ 1. The user's intent (what they want to build/do)
+ 2. A tool registry (available MCP servers, plugins, and hooks)
+
+ You must output a JSON object matching the EnvironmentSpec schema.
+
+ ## Core Principles
+
+ - **Minimalism over completeness.** Fewer, well-chosen tools beat many generic ones. Each MCP server costs 500-2000 context tokens.
+ - **Workflow-specific, not generic.** Every instruction, command, and rule must relate to the user's actual workflow.
+ - **Concise CLAUDE.md.** Under 150 lines. No generic text like "be helpful." Include build/test commands, reference docs/ and skills/.
+ - **Security by default.** Always include deny rules for destructive commands and secret file access.
+
+ ## CLAUDE.md Template (mandatory structure)
+
+ The \`claude_md\` field MUST follow this exact structure (max 150 lines):
+
+ \`\`\`
+ # {Project Name}
+
+ ## Purpose
+ {one-line description}
+
+ ## Tech Stack
+ {bullet list of frameworks/languages}
+
+ ## Commands
+ {concrete build/test/lint/dev commands}
+
+ ## Architecture
+ {brief folder structure, max 10 lines}
+
+ ## Conventions
+ {3-5 specific coding rules}
+
+ ## Key Commands
+ {list /project: commands with descriptions}
+
+ ## Output
+ {where results go, key files}
+
+ ## Verification
+ After implementing any change, verify it works:
+ - {build command} \u2014 must pass with no errors
+ - {test command} \u2014 all tests must pass
+ - {lint command} \u2014 no warnings or errors
+ - {type check command} \u2014 no type errors
+
+ If any verification step fails, fix the issue before moving on.
+ Do NOT skip verification steps.
+
+ ## Known Gotchas
+ <!-- After any correction, add it here: "Update CLAUDE.md so you don't make that mistake again." -->
+ <!-- Prune this section when it exceeds 10 items \u2014 keep only the recurring ones. -->
+ - (none yet \u2014 this section grows as you work)
+
+ ## Debugging
+ When debugging, paste raw error output. Don't summarize \u2014 Claude works better with raw data.
+ Use subagents for deep investigation to keep main context clean.
+
+ ## Git Workflow
+ - Prefer small, focused commits (one feature or fix per commit)
+ - Use conventional commits: feat:, fix:, docs:, refactor:, test:
+ - Target < 200 lines per PR when possible
+
+ ## Engineering Standards
+ - Lead with answers over reasoning. Be concise.
+ - Use absolute file paths in all references.
+ - No filler, no inner monologue, no time estimates.
+ - Produce load-bearing code \u2014 every line of output should be actionable.
+
+ ## Tool Usage Policy
+ - Prefer Edit tool over sed/awk for file modifications
+ - Prefer Grep tool over rg for searching
+ - Prefer Read tool over cat for file reading
+ - Reserve Bash for: builds, installs, git, network, processes
+ - Read and understand existing code before modifying
+ - Delete unused code completely \u2014 no compatibility shims
+
+ ## Code Philosophy
+ - Do not create abstractions for one-time operations
+ - Complete the task fully \u2014 don't gold-plate, but don't leave it half-done
+ - Prefer editing existing files over creating new ones
+ \`\`\`
+
+ Do not add generic filler. Every line must be specific to the user's workflow.
+
+ ## What You Must Always Include
+
+ 1. A concise, workflow-specific \`claude_md\` (the CLAUDE.md content)
+ 2. A \`/project:help\` command that explains the environment
+ 3. A \`docs/DECISIONS.md\` file for architectural decisions
+ 4. A \`docs/LEARNINGS.md\` file for non-obvious discoveries
+ 5. A \`rules/continuity.md\` rule encouraging updates to DECISIONS.md and LEARNINGS.md
+ 6. A \`rules/security.md\` rule with essential security instructions
+ 7. settings.json with deny rules for \`rm -rf\`, \`curl|sh\`, reading \`.env\` and \`secrets/\`
+ 8. A \`/project:status\` command for code projects (uses ! for live git/SPRINT.md output)
+ 9. A \`/project:fix\` command for code projects (uses $ARGUMENTS for issue number)
+ 10. A \`docs/SPRINT.md\` file as the living spec/plan (replaces TODO.md \u2014 acceptance criteria, verification steps)
+ 11. A "Verification" section in CLAUDE.md with concrete verify commands for the project
+ 12. A "Known Gotchas" section in CLAUDE.md (starts empty, grows with corrections)
+ 13. A "Debugging" section in CLAUDE.md (2 lines: paste raw errors, use subagents)
+ 14. A "Git Workflow" section in CLAUDE.md (3 rules: small commits, conventional format, <200 lines PR)
+ 15. "Engineering Standards", "Tool Usage Policy", and "Code Philosophy" sections in CLAUDE.md
+
+ ## Tool Selection Rules
+
+ - Only select tools directly relevant to the described workflow
+ - Prefer free tools (auth: "none") when quality is comparable
+ - Tier 1 tools (Context7, Sequential Thinking, security-guidance) should be included in most environments
+ - For tools requiring API keys (auth: "api_key"), use \${ENV_VAR} syntax \u2014 never hardcode keys
+ - Maximum 6-8 MCP servers to avoid context bloat
+ - Include a \`reason\` for each selected tool explaining why it fits this workflow
+
+ ## Context Budget (STRICT)
+
+ - MCP servers: maximum 6. Prefer fewer.
+ - CLAUDE.md: maximum 150 lines.
+ - Rules: maximum 5 files, each under 20 lines.
+ - Skills: maximum 3. Only include directly relevant ones.
+ - Agents: maximum 5. Orchestration pipeline (/develop) agents.
+ - Commands: no limit (loaded on demand, zero context cost).
+ - Hooks: maximum 4 (auto-format, block-destructive, PostCompact, plus one contextual).
+
+ If the workflow doesn't clearly need a tool, DO NOT include it.
+ Each MCP server costs 500-2000 tokens of context window.
+
+ ## Output Schema
+
+ Return ONLY valid JSON matching this structure:
+
  \`\`\`json
  {
  "name": "short-kebab-case-name",
@@ -749,7 +952,7 @@ Return ONLY valid JSON matching this structure:
  { "tool_id": "id-from-registry", "reason": "why this tool fits" }
  ],
  "harness": {
- "claude_md": "The full CLAUDE.md content (under 120 lines)",
+ "claude_md": "The full CLAUDE.md content (under 150 lines)",
  "settings": {
  "permissions": {
  "allow": ["Bash(npm run *)", "Read", "Write", "Edit"],
@@ -761,14 +964,7 @@ Return ONLY valid JSON matching this structure:
  },
  "commands": {
  "help": "markdown content for /project:help",
- "tasks": "markdown content for /project:tasks",
- "status": "Show project status:\\n\\n!git status --short\\n\\n!git log --oneline -5\\n\\nRead TODO.md and summarize progress.",
- "fix": "Fix issue #$ARGUMENTS:\\n\\n1. Read the issue and understand the problem\\n2. Plan the fix\\n3. Implement the fix\\n4. Run tests:\\n\\n!npm test 2>&1 | tail -20\\n\\n5. Commit with: fix: resolve #$ARGUMENTS",
- "sprint": "Define a sprint contract for the next feature:\\n\\n1. Read docs/TODO.md for context:\\n\\n!cat docs/TODO.md 2>/dev/null\\n\\n2. Write a CONTRACT to docs/SPRINT.md with: feature name, acceptance criteria, verification steps, files to modify, scope estimate.\\n3. Do NOT start coding until contract is confirmed.",
- "spec": "Before building this feature, interview me to create a complete spec.\\n\\nAsk me 5-8 questions, one at a time:\\n1. What specifically should this feature do?\\n2. Who uses it and how?\\n3. What are the edge cases or error states?\\n4. How will we know it works? (acceptance criteria)\\n5. What should it explicitly NOT do? (scope boundaries)\\n6. Any dependencies, APIs, or constraints?\\n7. How does it fit with existing code?\\n8. Priority: speed, quality, or flexibility?\\n\\nAfter my answers, write a structured spec to docs/SPRINT.md:\\n- Feature name\\n- Description (from my answers, not invented)\\n- Acceptance criteria (testable)\\n- Out of scope\\n- Technical approach\\n\\nDo NOT start coding until I confirm the spec.",
- "prove": "Prove the current implementation works.\\n\\n1. Run the full test suite:\\n\\n!npm test 2>&1\\n\\n2. Compare against main:\\n\\n!git diff main --stat 2>/dev/null\\n\\n3. Show evidence:\\n - Test results (pass/fail counts)\\n - Behavioral diff (main vs this branch)\\n - Edge cases tested\\n - Error handling verified\\n\\n4. Rate confidence:\\n - HIGH: All tests pass, edge cases covered, no regressions\\n - MEDIUM: Core works, some edges untested\\n - LOW: Needs more verification\\n\\nIf LOW or MEDIUM, explain what's missing and fix it.",
- "grill": "Review the current changes adversarially.\\n\\n!git diff --staged 2>/dev/null || git diff HEAD 2>/dev/null\\n\\nAct as a senior engineer. For each file changed:\\n\\n1. \\"Why this approach over X?\\"\\n2. \\"What happens if Y input?\\"\\n3. \\"Performance impact of Z?\\"\\n4. \\"This could break if...\\"\\n\\nFor each concern:\\n- Severity: BLOCKER / SHOULD-FIX / NITPICK\\n- The exact scenario that could fail\\n- A suggested alternative\\n\\nDo NOT approve until all BLOCKERs are resolved.",
- "reset": "Stop. Read docs/DECISIONS.md and docs/LEARNINGS.md.\\n\\nConsidering everything we've learned:\\n1. What was the original approach?\\n2. What went wrong or feels inelegant?\\n3. What would the clean solution look like?\\n\\nPropose the new approach. Do NOT implement yet.\\nIf I approve, stash current changes:\\n git stash -m \\"pre-reset: $(date +%Y%m%d-%H%M)\\"\\n\\nThen implement the elegant solution."
+ "develop": "markdown content for /project:develop"
  },
  "rules": {
  "continuity": "markdown content for continuity rule",
@@ -778,15 +974,16 @@ Return ONLY valid JSON matching this structure:
  "skill-name/SKILL": "markdown content with YAML frontmatter"
  },
  "agents": {
- "qa-orchestrator": "---\\nname: qa-orchestrator\\ndescription: Orchestrates QA pipeline\\nmodel: sonnet\\n---\\nRun QA: delegate to @linter for static analysis, @e2e-tester for browser tests. Compile consolidated report.",
- "linter": "---\\nname: linter\\ndescription: Fast static analysis\\nmodel: haiku\\n---\\nRun available linters (eslint, prettier, biome, ruff, mypy, semgrep). Report issues.",
- "e2e-tester": "---\\nname: e2e-tester\\ndescription: Browser-based QA via Playwright\\nmodel: sonnet\\n---\\nTest user flows via Playwright. Verify behavior, not just DOM. Screenshot failures."
+ "architect": "agent markdown with YAML frontmatter",
+ "planner": "agent markdown with YAML frontmatter",
+ "implementer": "agent markdown with YAML frontmatter",
+ "fixer": "agent markdown with YAML frontmatter",
+ "doc-updater": "agent markdown with YAML frontmatter"
  },
  "docs": {
- "TODO": "# TODO\\n\\n- [ ] First task based on workflow",
- "DECISIONS": "# Decisions\\n\\nArchitectural decisions for this project.",
- "LEARNINGS": "# Learnings\\n\\nNon-obvious discoveries and gotchas.",
- "SPRINT": "# Sprint Contract\\n\\nDefine acceptance criteria before starting work."
+ "DECISIONS": "# Decisions\\n\\nArchitectural decisions.",
+ "LEARNINGS": "# Learnings\\n\\nNon-obvious discoveries.",
+ "SPRINT": "# Sprint\\n\\nLiving spec and plan."
  }
  }
  }
@@ -864,7 +1061,7 @@ async function loadRegistry() {
  }
 
  // src/compiler/compile.ts
- function buildUserMessage(intent, registry) {
+ function buildSkeletonMessage(intent, registry) {
    const registrySummary = registry.map(
      (t) => `- ${t.id} (${t.type}, tier ${t.tier}, auth: ${t.auth}): ${t.description} [best_for: ${t.best_for.join(", ")}]`
    ).join("\n");
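The registry summary built in this hunk is one bullet line per tool. A minimal sketch of what that line looks like for a single entry follows; the sample entry and the `summarizeRegistry` name are hypothetical, and only the field names (`id`, `type`, `tier`, `auth`, `description`, `best_for`) come from the diff.

```javascript
// Hypothetical wrapper around the map/join expression shown in the diff.
function summarizeRegistry(registry) {
  return registry.map(
    (t) => `- ${t.id} (${t.type}, tier ${t.tier}, auth: ${t.auth}): ${t.description} [best_for: ${t.best_for.join(", ")}]`
  ).join("\n");
}

// Made-up registry entry, used only to show the output shape.
const sampleRegistry = [
  {
    id: "context7",
    type: "mcp_server",
    tier: 1,
    auth: "none",
    description: "Docs lookup",
    best_for: ["docs", "apis"],
  },
];

console.log(summarizeRegistry(sampleRegistry));
// - context7 (mcp_server, tier 1, auth: none): Docs lookup [best_for: docs, apis]
```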
@@ -876,25 +1073,60 @@ ${intent}
 
  ${registrySummary}
 
- Generate the EnvironmentSpec JSON now.`;
+ Generate the skeleton JSON now.`;
+ }
+ function buildHarnessMessage(intent, skeleton, concise) {
+   const skeletonJson = JSON.stringify(skeleton, null, 2);
+   const conciseNote = concise ? "\n\nIMPORTANT: Be concise. Maximum 80 lines for claude_md. Maximum 5 commands. Keep all content brief." : "";
+   return `## User Intent
+
+ ${intent}
+
+ ## Project Skeleton
+
+ ${skeletonJson}
+
+ Generate the harness content JSON now.${conciseNote}`;
  }
- function parseSpecResponse(text) {
+ function parseSkeletonResponse(text) {
    let cleaned = text.trim();
    if (cleaned.startsWith("```")) {
      cleaned = cleaned.replace(/^```(?:json)?\n?/, "").replace(/\n?```$/, "");
    }
    const jsonMatch = cleaned.match(/\{[\s\S]*\}/);
    if (!jsonMatch) {
+     throw new Error("Pass 1 (skeleton) did not return valid JSON.");
+   }
+   try {
+     const parsed = JSON.parse(jsonMatch[0]);
+     if (!parsed.name || !parsed.tools || !Array.isArray(parsed.tools)) {
+       throw new Error("Skeleton missing required fields: name, tools");
+     }
+     return parsed;
+   } catch (err) {
      throw new Error(
-       "LLM response did not contain valid JSON. Try again or use a different model."
+       `Failed to parse skeleton JSON: ${err instanceof Error ? err.message : String(err)}`
      );
    }
+ }
+ function parseHarnessResponse(text) {
+   let cleaned = text.trim();
+   if (cleaned.startsWith("```")) {
+     cleaned = cleaned.replace(/^```(?:json)?\n?/, "").replace(/\n?```$/, "");
+   }
+   const jsonMatch = cleaned.match(/\{[\s\S]*\}/);
+   if (!jsonMatch) {
+     throw new Error("Pass 2 (harness) did not return valid JSON.");
+   }
    try {
-     return JSON.parse(jsonMatch[0]);
+     const parsed = JSON.parse(jsonMatch[0]);
+     if (!parsed.claude_md || !parsed.commands) {
+       throw new Error("Harness missing required fields: claude_md, commands");
+     }
+     return parsed;
    } catch (err) {
      throw new Error(
-       `Failed to parse LLM response as JSON: ${err instanceof Error ? err.message : String(err)}
- Response started with: ${cleaned.slice(0, 200)}...`
+       `Failed to parse harness JSON: ${err instanceof Error ? err.message : String(err)}`
      );
    }
  }
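The two new parsers share the same steps: trim, strip a surrounding code fence, take the span from the first `{` to the last `}`, parse, then check required fields. The sketch below re-creates those steps for the skeleton case so the behavior can be tried in isolation. `extractJson` and `parseSkeleton` are illustrative names, not kairn-cli exports, and the fence marker is built with `repeat` purely to keep this example readable.

```javascript
// Three backticks, built at runtime (illustrative choice, not from the diff).
const FENCE = "`".repeat(3);

// Hypothetical helper: isolates the JSON text the way the diffed parsers do.
function extractJson(text, label) {
  let cleaned = text.trim();
  if (cleaned.startsWith(FENCE)) {
    // Drop an opening fence (optionally tagged "json") and a closing fence.
    cleaned = cleaned
      .replace(new RegExp("^" + FENCE + "(?:json)?\\n?"), "")
      .replace(new RegExp("\\n?" + FENCE + "$"), "");
  }
  // Greedy match: first "{" through last "}" in the response.
  const jsonMatch = cleaned.match(/\{[\s\S]*\}/);
  if (!jsonMatch) {
    throw new Error(`${label} did not return valid JSON.`);
  }
  return jsonMatch[0];
}

// Hypothetical stand-in for parseSkeletonResponse.
function parseSkeleton(text) {
  const raw = extractJson(text, "Pass 1 (skeleton)");
  try {
    const parsed = JSON.parse(raw);
    if (!parsed.name || !Array.isArray(parsed.tools)) {
      throw new Error("Skeleton missing required fields: name, tools");
    }
    return parsed;
  } catch (err) {
    // Syntax errors and the missing-field error both land here, so the
    // validation message is also wrapped in the "Failed to parse" prefix.
    throw new Error(
      `Failed to parse skeleton JSON: ${err instanceof Error ? err.message : String(err)}`
    );
  }
}

const reply = FENCE + 'json\n{"name":"demo","tools":[]}\n' + FENCE;
console.log(parseSkeleton(reply).name); // demo
```

One consequence worth noting: because the field check sits inside the `try`, a well-formed response that lacks `tools` is reported as "Failed to parse skeleton JSON: Skeleton missing required fields: name, tools".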
@@ -928,15 +1160,17 @@ function classifyError(err, provider) {
    }
    return `${provider} API error: ${msg}`;
  }
- async function callLLM(config, userMessage) {
+ async function callLLM(config, userMessage, options) {
+   const maxTokens = options?.maxTokens ?? 8192;
+   const systemPrompt = options?.systemPrompt ?? SYSTEM_PROMPT;
    const providerName = getProviderName(config.provider);
    if (config.provider === "anthropic") {
      const client2 = new Anthropic2({ apiKey: config.api_key });
      try {
        const response = await client2.messages.create({
          model: config.model,
-         max_tokens: 8192,
-         system: SYSTEM_PROMPT,
+         max_tokens: maxTokens,
+         system: systemPrompt,
          messages: [{ role: "user", content: userMessage }]
        });
        const textBlock = response.content.find((block) => block.type === "text");
@@ -955,9 +1189,9 @@ async function callLLM(config, userMessage) {
    try {
      const response = await client.chat.completions.create({
        model: config.model,
-       max_tokens: 8192,
+       max_tokens: maxTokens,
        messages: [
-         { role: "system", content: SYSTEM_PROMPT },
+         { role: "system", content: systemPrompt },
          { role: "user", content: userMessage }
        ]
      });
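The new `options` argument to `callLLM` is optional, and each field falls back independently through `?.` and `??`. A small sketch of just that defaulting logic follows; `resolveOptions` is a hypothetical helper, and the `SYSTEM_PROMPT` string here is a placeholder for the real constant.

```javascript
// Placeholder for the real SYSTEM_PROMPT constant in dist/cli.js.
const SYSTEM_PROMPT = "default system prompt";

// Hypothetical helper mirroring the two lines added at the top of callLLM.
function resolveOptions(options) {
  return {
    // ?? falls back only on null/undefined, so 0 or "" would be passed through.
    maxTokens: options?.maxTokens ?? 8192,
    systemPrompt: options?.systemPrompt ?? SYSTEM_PROMPT,
  };
}

console.log(resolveOptions());                    // both defaults
console.log(resolveOptions({ maxTokens: 2048 })); // custom budget, default prompt
```

Existing callers that pass only `(config, userMessage)` therefore keep the previous behavior of 8192 tokens and the original system prompt.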
@@ -970,6 +1204,66 @@ async function callLLM(config, userMessage) {
      throw new Error(classifyError(err, providerName));
    }
  }
+ function buildSettings(skeleton, registry) {
+   const selectedTools = skeleton.tools.map((t) => registry.find((r) => r.id === t.tool_id)).filter(Boolean);
+   const allow = ["Read", "Write", "Edit", "Bash(npm run *)", "Bash(npx *)"];
+   const deny = [
+     "Bash(rm -rf *)",
+     "Bash(curl * | sh)",
+     "Bash(wget * | sh)",
+     "Read(./.env)",
+     "Read(./secrets/**)"
+   ];
+   const hooks = {
+     PreToolUse: [
+       {
+         matcher: "Bash",
+         hooks: [
+           {
+             type: "command",
+             command: `CMD=$(cat | jq -r '.tool_input.command // empty') && echo "$CMD" | grep -qiE 'rm\\s+-rf\\s+/|DROP\\s+TABLE|curl.*\\|\\s*sh' && echo 'Blocked destructive command' >&2 && exit 2 || true`
+           }
+         ]
+       }
+     ],
+     PostCompact: [
+       {
+         matcher: "",
+         hooks: [
+           {
+             type: "prompt",
+             prompt: "Re-read CLAUDE.md and docs/SPRINT.md (if it exists) to restore project context after compaction."
+           }
+         ]
+       }
+     ]
+   };
+   const techStack = skeleton.outline.tech_stack.map((t) => t.toLowerCase());
+   if (techStack.some((t) => t.includes("typescript") || t.includes("javascript") || t.includes("react") || t.includes("next"))) {
+     hooks.PostToolUse = [
+       {
+         matcher: "Edit|Write",
+         hooks: [
+           {
+             type: "command",
+             command: `FILE=$(cat | jq -r '.tool_input.file_path // empty') && [ -n "$FILE" ] && npx prettier --write "$FILE" 2>/dev/null || true`
+           }
+         ]
+       }
+     ];
+   }
+   return { permissions: { allow, deny }, hooks };
+ }
+ function buildMcpConfig(skeleton, registry) {
+   const config = {};
+   for (const tool of skeleton.tools) {
+     const reg = registry.find((r) => r.id === tool.tool_id);
+     if (reg?.install.mcp_config) {
+       config[tool.tool_id] = reg.install.mcp_config;
+     }
+   }
+   return config;
+ }
  function validateSpec(spec, onProgress) {
    const warnings = [];
    if (spec.tools.length > 8) {
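The PreToolUse hook added in `buildSettings` pipes the Bash command through `grep -qiE` with the pattern `rm\s+-rf\s+/|DROP\s+TABLE|curl.*\|\s*sh` (the doubled backslashes in the diff are template-literal escapes). The sketch below is an approximate JavaScript translation of that pattern, useful for checking which commands it would flag. `wouldBlock` is a hypothetical name, and JavaScript regex semantics are assumed to be close enough to GNU grep for these single-line cases.

```javascript
// Approximate JS equivalent of the grep -qiE pattern (case-insensitive).
const DESTRUCTIVE = /rm\s+-rf\s+\/|DROP\s+TABLE|curl.*\|\s*sh/i;

// Hypothetical helper: true when the hook's pattern would match the command.
function wouldBlock(command) {
  return DESTRUCTIVE.test(command);
}

const samples = [
  "rm -rf /",                           // matches: rm -rf followed by "/"
  "rm -rf ./build",                     // no match: path starts with "."
  "psql -c 'drop table users'",         // matches: case-insensitive DROP TABLE
  "curl https://example.com/i.sh | sh", // matches: curl piped to sh
  "curl -o out.txt https://example.com" // no match: no pipe to sh
];
for (const cmd of samples) {
  console.log(wouldBlock(cmd) ? "BLOCK" : "allow", cmd);
}
```

Under this reading, the hook only catches `rm -rf` on absolute paths, while the `deny` entry `Bash(rm -rf *)` covers the broader case. The `curl.*\|\s*sh` branch also matches any pipe target that merely starts with `sh`, such as `shasum`.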
@@ -978,7 +1272,7 @@ function validateSpec(spec, onProgress) {
978
1272
  if (spec.harness.claude_md) {
979
1273
  const lines = spec.harness.claude_md.split("\n").length;
980
1274
  if (lines > 150) {
981
- warnings.push(`CLAUDE.md is ${lines} lines (recommended: \u2264100)`);
1275
+ warnings.push(`CLAUDE.md is ${lines} lines (recommended: \u2264150)`);
982
1276
  }
983
1277
  }
984
1278
  if (spec.harness.skills && Object.keys(spec.harness.skills).length > 5) {
@@ -995,17 +1289,52 @@ async function compile(intent, onProgress) {
995
1289
  }
996
1290
  onProgress?.("Loading tool registry...");
997
1291
  const registry = await loadRegistry();
998
- onProgress?.(`Compiling with ${config.provider} (${config.model})...`);
999
- const userMessage = buildUserMessage(intent, registry);
1000
- const responseText = await callLLM(config, userMessage);
1001
- onProgress?.("Parsing environment spec...");
1002
- const parsed = parseSpecResponse(responseText);
1292
+ onProgress?.("Analyzing workflow...");
1293
+ const skeletonMsg = buildSkeletonMessage(intent, registry);
1294
+ const skeletonText = await callLLM(config, skeletonMsg, {
1295
+ maxTokens: 2048,
1296
+ systemPrompt: SKELETON_PROMPT
1297
+ });
1298
+ const skeleton = parseSkeletonResponse(skeletonText);
1299
+ onProgress?.("Generating environment...");
1300
+ const harnessMsg = buildHarnessMessage(intent, skeleton);
1301
+ let harness;
1302
+ try {
1303
+ const harnessText = await callLLM(config, harnessMsg, {
1304
+ maxTokens: 8192,
1305
+ systemPrompt: HARNESS_PROMPT
1306
+ });
1307
+ harness = parseHarnessResponse(harnessText);
1308
+ } catch {
1309
+ onProgress?.("Retrying with concise mode...");
1310
+ const retryMsg = buildHarnessMessage(intent, skeleton, true);
1311
+ const retryText = await callLLM(config, retryMsg, {
1312
+ maxTokens: 8192,
1313
+ systemPrompt: HARNESS_PROMPT
1314
+ });
1315
+ harness = parseHarnessResponse(retryText);
1316
+ }
1317
+ onProgress?.("Configuring tools...");
1318
+ const settings = buildSettings(skeleton, registry);
1319
+ const mcpConfig = buildMcpConfig(skeleton, registry);
1003
1320
  const spec = {
1004
1321
  id: `env_${crypto.randomUUID()}`,
1005
1322
  intent,
1006
1323
  created_at: (/* @__PURE__ */ new Date()).toISOString(),
1007
- ...parsed,
1008
- autonomy_level: parsed.autonomy_level ?? 1
1324
+ name: skeleton.name,
1325
+ description: skeleton.description,
1326
+ autonomy_level: 1,
1327
+ tools: skeleton.tools,
1328
+ harness: {
1329
+ claude_md: harness.claude_md,
1330
+ settings,
1331
+ mcp_config: mcpConfig,
1332
+ commands: harness.commands,
1333
+ rules: harness.rules,
1334
+ skills: harness.skills ?? {},
1335
+ agents: harness.agents ?? {},
1336
+ docs: harness.docs
1337
+ }
1009
1338
  };
1010
1339
  validateSpec(spec, onProgress);
1011
1340
  await ensureDirs();
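Review note: the harness pass in `compile` tries once, then retries in concise mode if the call or the parse throws. A minimal sketch of that control flow follows, using a stubbed model call. `generateWithFallback` and `stubModel` are hypothetical names, and the stub stands in for `callLLM`, whose real behavior is not shown in this hunk.

```javascript
// Hypothetical helper: attempt the full request, then fall back to a concise one.
async function generateWithFallback(callModel, parse) {
  try {
    return parse(await callModel({ concise: false }));
  } catch {
    // Any failure lands here: a rejected call, truncated output, or invalid JSON.
    // A failure in this second attempt is not caught and propagates to the caller.
    return parse(await callModel({ concise: true }));
  }
}

// Stub model: the verbose reply is truncated JSON, the concise reply is valid.
const calls = [];
async function stubModel({ concise }) {
  calls.push(concise);
  return concise ? '{"claude_md":"# Project"}' : '{"claude_md":"# Proj';
}

const harnessPromise = generateWithFallback(stubModel, JSON.parse);
harnessPromise.then((harness) => console.log(harness.claude_md, calls));
```

Because the `catch` has no binding and no filter, the retry also fires for errors unrelated to output length, such as an authentication or network failure, which costs a second request before the error surfaces.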
@@ -1062,10 +1391,10 @@ Read .claude/commands/ and list each one with a one-line description.
1062
1391
  Group them by workflow phase:
1063
1392
 
1064
1393
  PLAN: /project:spec, /project:sprint, /project:plan
1065
- BUILD: (just start coding \u2014 Claude reads CLAUDE.md automatically)
1394
+ BUILD: /project:develop (full pipeline), or just start coding
1066
1395
  VERIFY: /project:prove, /project:grill, /project:test
1067
1396
  SHIP: /project:commit, /project:review
1068
- MANAGE: /project:status, /project:tasks, /project:reset
1397
+ MANAGE: /project:status, /project:reset
1069
1398
 
1070
1399
  ## Your Agents
1071
1400
  Read .claude/agents/ and explain each one with how to invoke it.
@@ -1124,7 +1453,7 @@ var LOOP_COMMAND_CODE = `# Development Loop
1124
1453
  Run an assisted development cycle for the next feature.
1125
1454
 
1126
1455
  ## Phase 1: SPEC
1127
- Review docs/TODO.md and docs/SPRINT.md.
1456
+ Review docs/SPRINT.md.
1128
1457
  If no sprint is defined, run /project:spec to interview the user.
1129
1458
  Wait for user approval of the spec.
1130
1459
 
@@ -1149,7 +1478,7 @@ Fix any BLOCKERs.
1149
1478
 
1150
1479
  ## Phase 6: SHIP
1151
1480
  Run /project:commit.
1152
- Report what was built and what's next from docs/TODO.md.
1481
+ Report what was built and what's next from docs/SPRINT.md.
1153
1482
 
1154
1483
  Then ask: "Continue to next feature?"
1155
1484
  If yes, return to Phase 1.`;
@@ -1158,7 +1487,7 @@ var LOOP_COMMAND_RESEARCH = `# Research Loop
1158
1487
  Run an assisted research cycle.
1159
1488
 
1160
1489
  ## Phase 1: QUESTION
1161
- Review docs/TODO.md for the next research question.
1490
+ Review docs/SPRINT.md for the next research question.
1162
1491
  If none, ask the user what to investigate.
1163
1492
 
1164
1493
  ## Phase 2: RESEARCH
@@ -1174,7 +1503,7 @@ Present the summary. Ask the user for feedback.
1174
1503
  Revise based on feedback.
1175
1504
 
1176
1505
  ## Phase 5: NEXT
1177
- Update docs/TODO.md \u2014 mark question as done, identify follow-ups.
1506
+ Update docs/SPRINT.md \u2014 mark question as done, identify follow-ups.
1178
1507
  Ask: "Continue to next question?"`;
1179
1508
  var PM_AGENT = `---
1180
1509
  name: pm
@@ -1185,7 +1514,7 @@ model: opus
1185
1514
  You are a project manager for this codebase.
1186
1515
 
1187
1516
  Your responsibilities:
1188
- 1. Maintain docs/TODO.md \u2014 keep it prioritized and current
1517
+ 1. Maintain docs/SPRINT.md \u2014 keep it prioritized and current
1189
1518
  2. Write specs to docs/SPRINT.md when asked
1190
1519
  3. Review completed work and suggest what's next
1191
1520
  4. Track decisions in docs/DECISIONS.md
@@ -1204,7 +1533,7 @@ PM-driven development loop with PR delivery.
1204
1533
 
1205
1534
  ## Phase 1: PLAN (@pm)
1206
1535
  Use @pm to:
1207
- - Read docs/TODO.md and docs/SPRINT.md
1536
+ - Read docs/SPRINT.md
1208
1537
  - Select the highest-priority unfinished task
1209
1538
  - Write a spec to docs/SPRINT.md
1210
1539
  - Present the spec for approval
@@ -1234,7 +1563,7 @@ Create a pull request:
1234
1563
  ## Phase 6: NEXT
1235
1564
  Report:
1236
1565
  "PR #{N} ready for review: {link}
1237
- Next priority from TODO.md: {next task}
1566
+ Next priority from SPRINT.md: {next task}
1238
1567
  Continue? (y/n)"
1239
1568
 
1240
1569
  If yes, return to Phase 1 with next task.`;
@@ -1251,7 +1580,7 @@ PRs are opened automatically. You review when ready.
1251
1580
 
1252
1581
  ## The Loop
1253
1582
  Repeat until max features reached or stopped:
1254
- 1. @pm selects next priority from docs/TODO.md
1583
+ 1. @pm selects next priority from docs/SPRINT.md
1255
1584
  2. Create worktree + branch
1256
1585
  3. Implement the feature
1257
1586
  4. Run verification (build, test, lint)
@@ -1323,7 +1652,7 @@ function autonomyLabel(level) {
1323
1652
 
1324
1653
  // src/adapter/claude-code.ts
1325
1654
  var STATUS_LINE = {
1326
- command: `printf '%s | %s tasks' "$(git branch --show-current 2>/dev/null || echo 'no-git')" "$(grep -c '\\- \\[ \\]' docs/TODO.md 2>/dev/null || echo 0)"`
1655
+ command: `printf '%s | %s tasks' "$(git branch --show-current 2>/dev/null || echo 'no-git')" "$(grep -c '\\- \\[ \\]' docs/SPRINT.md 2>/dev/null || echo 0)"`
1327
1656
  };
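Review note: the status line counts unchecked Markdown task items with `grep -c`, which counts matching lines, not individual occurrences. A minimal sketch of the same count in JavaScript follows. `countOpenTasks` and the sample text are hypothetical.

```javascript
// Hypothetical helper: count lines containing an unchecked "- [ ]" task marker,
// which is what grep -c '\- \[ \]' reports for a file.
function countOpenTasks(markdown) {
  return markdown.split("\n").filter((line) => /- \[ \]/.test(line)).length;
}

const sampleSprint = [
  "# Sprint",
  "- [x] Set up repo",
  "- [ ] Add login form",
  "- [ ] Write tests - [ ] and docs",
  "Notes: nothing else open"
].join("\n");

console.log(countOpenTasks(sampleSprint));
```

One behavior worth checking in the shell version: `grep -c` prints `0` and exits with a non-zero status when no line matches, so the `|| echo 0` fallback would add a second `0` when `docs/SPRINT.md` exists but has no open tasks.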
1328
1657
  function isCodeProject(spec) {
1329
1658
  const commands = spec.harness.commands ?? {};