azclaude-copilot 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (108)
  1. package/.claude-plugin/marketplace.json +27 -0
  2. package/.claude-plugin/plugin.json +17 -0
  3. package/LICENSE +21 -0
  4. package/README.md +477 -0
  5. package/bin/cli.js +1027 -0
  6. package/bin/copilot.js +228 -0
  7. package/hooks/README.md +3 -0
  8. package/hooks/hooks.json +40 -0
  9. package/package.json +41 -0
  10. package/templates/CLAUDE.md +51 -0
  11. package/templates/agents/cc-cli-integrator.md +104 -0
  12. package/templates/agents/cc-template-author.md +109 -0
  13. package/templates/agents/cc-test-maintainer.md +101 -0
  14. package/templates/agents/code-reviewer.md +136 -0
  15. package/templates/agents/loop-controller.md +118 -0
  16. package/templates/agents/orchestrator-init.md +196 -0
  17. package/templates/agents/test-writer.md +129 -0
  18. package/templates/capabilities/evolution/cycle2-knowledge.md +87 -0
  19. package/templates/capabilities/evolution/cycle3-topology.md +128 -0
  20. package/templates/capabilities/evolution/detect.md +103 -0
  21. package/templates/capabilities/evolution/evaluate.md +90 -0
  22. package/templates/capabilities/evolution/generate.md +123 -0
  23. package/templates/capabilities/evolution/re-derivation.md +77 -0
  24. package/templates/capabilities/intelligence/debate.md +104 -0
  25. package/templates/capabilities/intelligence/elo.md +122 -0
  26. package/templates/capabilities/intelligence/experiment.md +86 -0
  27. package/templates/capabilities/intelligence/opro.md +84 -0
  28. package/templates/capabilities/intelligence/pipeline.md +149 -0
  29. package/templates/capabilities/level-builders/level1-claudemd.md +52 -0
  30. package/templates/capabilities/level-builders/level2-mcp.md +58 -0
  31. package/templates/capabilities/level-builders/level3-skills.md +276 -0
  32. package/templates/capabilities/level-builders/level4-memory.md +72 -0
  33. package/templates/capabilities/level-builders/level5-agents.md +123 -0
  34. package/templates/capabilities/level-builders/level6-hooks.md +119 -0
  35. package/templates/capabilities/level-builders/level7-extmcp.md +60 -0
  36. package/templates/capabilities/level-builders/level8-orchestrated.md +98 -0
  37. package/templates/capabilities/manifest.md +58 -0
  38. package/templates/capabilities/shared/5-layer-agent.md +206 -0
  39. package/templates/capabilities/shared/completion-rule.md +44 -0
  40. package/templates/capabilities/shared/context-artifacts.md +96 -0
  41. package/templates/capabilities/shared/domain-advisor-generator.md +205 -0
  42. package/templates/capabilities/shared/friction-log.md +43 -0
  43. package/templates/capabilities/shared/multi-cli-paths.md +56 -0
  44. package/templates/capabilities/shared/native-tools.md +199 -0
  45. package/templates/capabilities/shared/plan-tracker.md +69 -0
  46. package/templates/capabilities/shared/pressure-test.md +88 -0
  47. package/templates/capabilities/shared/quality-check.md +83 -0
  48. package/templates/capabilities/shared/reflexes.md +159 -0
  49. package/templates/capabilities/shared/review-reception.md +70 -0
  50. package/templates/capabilities/shared/security.md +174 -0
  51. package/templates/capabilities/shared/semantic-boundary-check.md +140 -0
  52. package/templates/capabilities/shared/session-rhythm.md +42 -0
  53. package/templates/capabilities/shared/tdd.md +54 -0
  54. package/templates/capabilities/shared/vocabulary-transform.md +63 -0
  55. package/templates/commands/add.md +152 -0
  56. package/templates/commands/audit.md +123 -0
  57. package/templates/commands/blueprint.md +115 -0
  58. package/templates/commands/copilot.md +157 -0
  59. package/templates/commands/create.md +156 -0
  60. package/templates/commands/debate.md +75 -0
  61. package/templates/commands/deps.md +112 -0
  62. package/templates/commands/doc.md +100 -0
  63. package/templates/commands/dream.md +120 -0
  64. package/templates/commands/evolve.md +170 -0
  65. package/templates/commands/explain.md +25 -0
  66. package/templates/commands/find.md +100 -0
  67. package/templates/commands/fix.md +122 -0
  68. package/templates/commands/hookify.md +100 -0
  69. package/templates/commands/level-up.md +48 -0
  70. package/templates/commands/loop.md +62 -0
  71. package/templates/commands/migrate.md +119 -0
  72. package/templates/commands/persist.md +73 -0
  73. package/templates/commands/pulse.md +87 -0
  74. package/templates/commands/refactor.md +97 -0
  75. package/templates/commands/reflect.md +107 -0
  76. package/templates/commands/reflexes.md +141 -0
  77. package/templates/commands/setup.md +97 -0
  78. package/templates/commands/ship.md +131 -0
  79. package/templates/commands/snapshot.md +70 -0
  80. package/templates/commands/test.md +86 -0
  81. package/templates/hooks/post-tool-use.js +175 -0
  82. package/templates/hooks/stop.js +85 -0
  83. package/templates/hooks/user-prompt.js +96 -0
  84. package/templates/scripts/env-scan.sh +46 -0
  85. package/templates/scripts/import-graph.sh +88 -0
  86. package/templates/scripts/validate-boundaries.sh +180 -0
  87. package/templates/skills/agent-creator/SKILL.md +91 -0
  88. package/templates/skills/agent-creator/examples/sample-agent.md +80 -0
  89. package/templates/skills/agent-creator/references/agent-engineering-guide.md +596 -0
  90. package/templates/skills/agent-creator/references/quality-checklist.md +42 -0
  91. package/templates/skills/agent-creator/scripts/scaffold.sh +144 -0
  92. package/templates/skills/architecture-advisor/SKILL.md +92 -0
  93. package/templates/skills/architecture-advisor/references/database-decisions.md +61 -0
  94. package/templates/skills/architecture-advisor/references/decision-matrices.md +122 -0
  95. package/templates/skills/architecture-advisor/references/rendering-decisions.md +39 -0
  96. package/templates/skills/architecture-advisor/scripts/detect-scale.sh +67 -0
  97. package/templates/skills/debate/SKILL.md +36 -0
  98. package/templates/skills/debate/references/acemad-protocol.md +72 -0
  99. package/templates/skills/env-scanner/SKILL.md +41 -0
  100. package/templates/skills/security/SKILL.md +44 -0
  101. package/templates/skills/security/references/security-details.md +48 -0
  102. package/templates/skills/session-guard/SKILL.md +33 -0
  103. package/templates/skills/skill-creator/SKILL.md +82 -0
  104. package/templates/skills/skill-creator/examples/sample-skill.md +74 -0
  105. package/templates/skills/skill-creator/references/quality-checklist.md +36 -0
  106. package/templates/skills/skill-creator/references/skill-engineering-guide.md +365 -0
  107. package/templates/skills/skill-creator/scripts/scaffold.sh +75 -0
  108. package/templates/skills/test-first/SKILL.md +41 -0
+++ package/templates/capabilities/intelligence/debate.md
@@ -0,0 +1,104 @@
---
name: intelligence-debate
description: >
  Structured adversarial debate for hard architectural decisions.
  Opt-in only — not the default for every decision.
  Triggers on: "debate", "tradeoff", "should we", "which is better", "decide between".
tokens: ~400
---

## Structured Debate Protocol [AceMAD]

Use when a decision is genuinely uncertain, multi-criteria, and a wrong choice costs real time.
Do NOT use for routine decisions — Claude's direct answer is faster and equally good.

---

### Phase 1: Define the Question
State the decision as a binary or small option set:
"Should we do A or B?"
"Which architecture: X or Y?"
Not: "What should we do about the general problem?"

---

### Phase 2: Advocate A — MAXIMALIST
Argue the strongest possible case FOR the leading option.
- Make the best arguments you can — do not sandbag
- N ≤ 10 arguments maximum
- Each argument: one claim + one piece of evidence
- Label evidence: [VERIFIED] / [PARTIAL] / [UNVERIFIED] / [FALSE]

---

### Phase 3: Advocate B — SKEPTIC
Argue the strongest possible case AGAINST the leading option (or FOR the alternative).
- N ≤ 10 arguments maximum
- T ≤ 5 total rounds across both advocates
- Same evidence labeling rules apply
- Must address the MAXIMALIST's strongest verified argument

---

### Phase 4: FACT CHECK — NON-NEGOTIABLE
Before synthesis, tag every claim:

| Tag | Meaning |
|-----|---------|
| [VERIFIED] | Confirmed by code, docs, or direct test |
| [PARTIAL] | Directionally correct but incomplete |
| [UNVERIFIED] | Asserted without evidence |
| [FALSE] | Contradicted by evidence |

**Disqualified claim language** — any argument containing
"should work" / "probably" / "I believe" / "I think" → mark it [UNVERIFIED] and reduce its weight.

If ≥ 30% of an advocate's claims are UNVERIFIED → confidence stays LOW regardless of score.
Adjusted evidence scores replace raw argument counts.

Truth wins. Not volume.
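The ≥ 30% gate above can be sketched in JavaScript; the function name and return shape are illustrative, not part of the plugin's actual code:

```javascript
// Sketch of the Phase 4 confidence gate. `claims` is an array of evidence
// tags as produced by the fact check.
function confidenceGate(claims) {
  const unverified = claims.filter((t) => t === "UNVERIFIED").length;
  const verified = claims.filter((t) => t === "VERIFIED").length;
  const unverifiedShare = claims.length ? unverified / claims.length : 1;
  return {
    verified,
    unverifiedShare,
    // >= 30% unverified claims pins confidence to LOW regardless of score.
    confidence: unverifiedShare >= 0.3 ? "LOW" : "NORMAL",
  };
}
```

With 1 unverified claim out of 3, the share is ~0.33, so the gate trips; with 1 out of 4 it does not.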

---

### Phase 5: Synthesis

**5a. Draft synthesis** — pick the winner, state the margin (0-100 confidence).

**5b. Second-Order Cognition Check [AceMAD]**:
Did the synthesis address the strongest *verified* claim from the losing side?
If not → revise. Confidence stays LOW until it is addressed.

**5c. Self-refine** — critique the draft:
- Did I favor the side with more words? (higher word count ≠ stronger case)
- Did I ignore any [VERIFIED] claim?
- Is the margin honest?
Max 2 rounds of self-refine.

**5d. Order-Independence Check [PeerRank]**:
If the margin is < 10 points: run the synthesis again with the advocates in reversed order.
If the conclusion reverses → the result is INCONCLUSIVE. Present both sides to the user.
Position bias correlates 0.39 with the first-presented argument.

**5e. Deliver** — state the winner, the confidence, and the one verified claim that decided it.

---

### Phase 6: Length-Independence [Elo-Evolve]
Score on **evidence density** (verified claims per 100 words), NOT raw word count.
Flag any advocate whose argument exceeds 2× the median length — the excess is noise.
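The evidence-density score and the 2× median-length flag can be sketched as follows (function names are illustrative):

```javascript
// Evidence density: verified claims per 100 words of argument.
function evidenceDensity(verifiedClaims, wordCount) {
  return wordCount === 0 ? 0 : (verifiedClaims / wordCount) * 100;
}

// Flag any argument longer than twice the median argument length.
function flagOverlong(lengths) {
  const sorted = [...lengths].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median =
    sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  return lengths.map((len) => len > 2 * median);
}
```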

**Comparative Binary Framing [Elo-Evolve]**: Use pairwise comparison ("which is better: A or B?"),
not absolute ratings ("rate A out of 10"). Pairwise framing reduces noise from σ_abs = 35.65 to σ_comp = 7.85.

---

### Phase 7: Record the Decision
Append to `.claude/memory/decisions.md`:
```
## {Decision Title} — {date}
**Question**: {the decision}
**Options**: {A} vs {B}
**Winner**: {chosen approach} (confidence: {level})
**Deciding claim**: {the one verified claim that settled it}
**Dissent**: {strongest verified argument for the loser}
```
+++ package/templates/capabilities/intelligence/elo.md
@@ -0,0 +1,122 @@
---
name: intelligence-elo
description: >
  ELO quality ranking across options, agents, or skills. Use when comparing
  multiple candidates and you need a defensible rank order.
  Triggers on: "rank these", "which is best", "compare quality".
tokens: ~200
---

## ELO Quality Ranking [PeerRank + Elo-Evolve]

---

### Authoritative Ownership [Elo-Evolve]
The loop controller (the model running this skill) owns the authoritative ELO scores.
Subagents that produce provisional scores feed into this controller — they do not own the final rank.
Provisional scores from subagents are inputs, not decisions.

Self-evaluation reliability: r=0.538 (self-scored) vs r=0.905 (loop-controller reconciliation).
Never let an agent self-rank as authoritative.

---

### Pairwise Comparison Protocol

**Comparative Binary Framing**: Compare pairs, not absolutes.
"Is A better than B for this criterion?" — not "Rate A from 1 to 10."

For N candidates: run N×(N-1)/2 pairwise comparisons.
Each comparison: one winner, one loser. No ties unless the evidence is genuinely equal.
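The N×(N-1)/2 schedule is just every unordered pair of candidates; a sketch (the function name is illustrative):

```javascript
// Every unordered pair of candidates: N*(N-1)/2 comparisons in total.
function pairwiseSchedule(candidates) {
  const pairs = [];
  for (let i = 0; i < candidates.length; i++) {
    for (let j = i + 1; j < candidates.length; j++) {
      pairs.push([candidates[i], candidates[j]]);
    }
  }
  return pairs;
}
```

Three candidates yield three comparisons; five candidates yield ten.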

---

### ELO Schema

For each candidate, track:
```json
{
  "id": "{candidate-id}",
  "elo": 1000,
  "wins": 0,
  "losses": 0,
  "adjusted_evidence_score": 0,
  "fact_check": {
    "verified": 0,
    "partial": 0,
    "unverified": 0,
    "false": 0
  }
}
```

ELO update after each comparison:
- Winner: elo += 32 × (1 - expected_score)
- Loser: elo -= 32 × expected_score
- expected_score = 1 / (1 + 10^((opponent_elo - this_elo) / 400))
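The update above is the standard Elo rule with K = 32. A sketch, operating on candidate objects shaped like the schema above (function names are illustrative):

```javascript
const K = 32; // K-factor from the update rule above

// Probability this candidate beats the opponent, per the Elo formula.
function expectedScore(thisElo, opponentElo) {
  return 1 / (1 + Math.pow(10, (opponentElo - thisElo) / 400));
}

// Apply one pairwise comparison result in place.
function applyComparison(winner, loser) {
  const eWin = expectedScore(winner.elo, loser.elo);
  const eLose = expectedScore(loser.elo, winner.elo);
  winner.elo += K * (1 - eWin);
  loser.elo -= K * eLose;
  winner.wins += 1;
  loser.losses += 1;
}
```

Two candidates starting at 1000 are expected to score 0.5 each, so one win moves the winner to 1016 and the loser to 984.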

---

### Adjusted Evidence Score

Do not rank on raw ELO alone. Adjust for fact-check quality:
- `adjusted_evidence_score = verified_claims / (verified + unverified + false)`
- If adjusted_evidence_score < 0.5: cap ELO at 1100 regardless of wins
- A candidate that wins on volume but loses on evidence quality is not the best candidate
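A sketch of the adjusted score and the 1100-point cap. Note that the formula as written ignores PARTIAL claims, and this sketch follows it; the function names are illustrative:

```javascript
// adjusted_evidence_score = verified / (verified + unverified + false),
// exactly as in the rule above (PARTIAL claims are not counted).
function adjustedEvidenceScore(factCheck) {
  const denom = factCheck.verified + factCheck.unverified + factCheck.false;
  return denom === 0 ? 0 : factCheck.verified / denom;
}

// Below 0.5 evidence quality, Elo is capped at 1100 regardless of wins.
function cappedElo(elo, factCheck) {
  return adjustedEvidenceScore(factCheck) < 0.5 ? Math.min(elo, 1100) : elo;
}
```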

---

### Output

Rank candidates by final ELO, descending.
For the winner: state the one pairwise comparison that determined the rank.
Record in `.claude/memory/decisions.md` with the full ranking table.

---

### Persistent ELO — elo-rankings.json

Write rankings to `.claude/memory/elo-rankings.json` so scores accumulate across sessions:

```json
{
  "last_updated": "YYYY-MM-DD",
  "debate_elo": [
    {
      "position": "{argument-id}",
      "context": "{decision-topic}",
      "elo": 1000,
      "wins": 0,
      "losses": 0,
      "adjusted_evidence_score": 0.0
    }
  ],
  "agent_elo": [
    {
      "agent": "{agent-name}",
      "elo": 1000,
      "tasks_completed": 0,
      "wins": 0,
      "losses": 0,
      "last_task": "YYYY-MM-DD"
    }
  ],
  "pattern_elo": [
    {
      "pattern": "{pattern-description}",
      "source": ".claude/memory/patterns.md",
      "elo": 1000,
      "times_applied": 0,
      "times_succeeded": 0,
      "times_failed": 0
    }
  ]
}
```

**Update rules:**
- `debate_elo`: updated after each `/debate` cycle — one entry per argued position
- `agent_elo`: updated after each pipeline run — the winner is the agent whose output was accepted without revision
- `pattern_elo`: updated after each task — a pattern succeeded if the approach worked, failed if re-derivation was triggered

Read `elo-rankings.json` at the start of each debate or agent spawn. Pass it as context so prior performance informs the current run.
+++ package/templates/capabilities/intelligence/experiment.md
@@ -0,0 +1,86 @@
---
name: intelligence-experiment
description: >
  Experiment agent with worktree isolation. Use when testing a risky approach
  that must not affect the main branch until proven. Triggers on: "try this approach",
  "experiment with", "test this idea safely", "isolated experiment", "worktree".
tokens: ~80
---

## Experiment Agent

Run risky explorations in an isolated git worktree. The main branch is never touched.

---

### When to Use

- Refactoring with an uncertain outcome
- Trying a new library or framework approach
- An architectural change that might break things
- Any "I want to try X but not break the main branch" scenario

If the experiment succeeds → merge. If it fails → discard the worktree. Zero cleanup.

---

### Agent Definition Template

```yaml
---
name: experiment-{task-name}
description: >
  Isolated experiment for {task}. Runs in a worktree, cannot affect the main branch.
model: claude-sonnet-4-6
maxTurns: 30
tools:
  - Read
  - Write
  - Edit
  - Bash
  - Glob
  - Grep
permissionMode: acceptEdits
isolation: worktree
background: false
---
```

The `isolation: worktree` field is what creates a separate git branch automatically.
The agent operates on a copy — the main branch is read-only from the experiment's perspective.
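For orientation, the isolation roughly corresponds to plain `git worktree` commands. This sketch only builds the command strings; the branch and path naming is an assumption, not the plugin's actual scheme:

```javascript
// What `isolation: worktree` roughly corresponds to in plain git.
// Branch/path naming here is illustrative only.
function experimentCommands(task) {
  const branch = `experiment/${task}`;
  const worktreePath = `../${task}-worktree`;
  return {
    // Create an isolated working copy on its own branch.
    create: `git worktree add ${worktreePath} -b ${branch}`,
    // On success, merge the branch back (run from the main checkout).
    merge: `git merge ${branch}`,
    // On failure, throw the whole copy away.
    discard: `git worktree remove --force ${worktreePath}`,
  };
}
```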

---

### Experiment Protocol

**Before starting**: define the hypothesis and success criteria.
```
Hypothesis: {what you believe will happen}
Success: {measurable outcome that proves it worked}
Failure: {measurable outcome that proves it didn't}
```

**After completing** — log the outcome (`printf` is used because plain `echo` does not expand `\n` in every shell):

If the experiment succeeded:
```bash
printf '\n## {task} — %s\n{what worked and why — include hypothesis}\n' "$(date +%Y-%m-%d)" >> .claude/memory/patterns.md
```

If the experiment failed:
```bash
printf '\n## {task} — %s\n{what failed, why, what to try instead}\n' "$(date +%Y-%m-%d)" >> .claude/memory/antipatterns.md
```

**Never discard without logging.** A failed experiment that teaches something is a successful experiment.

---

### Merge Decision

After the experiment completes:
- Tests pass + success criteria met → merge to main
- Tests fail or success criteria not met → discard the worktree, log an antipattern
- Partial success → extract the working parts before discarding

Do NOT merge experiments that pass tests but fail success criteria. Tests prove correctness; success criteria prove value.
+++ package/templates/capabilities/intelligence/opro.md
@@ -0,0 +1,84 @@
---
name: intelligence-opro
description: >
  OPRO + APE prompt optimization. Improves skill instructions by learning
  from history. Use when a skill underperforms or after 10+ uses.
  Triggers on: "optimize prompts", "improve skill", "skill isn't working".
tokens: ~300
---

## Prompt Optimization [OPRO + APE]

---

### Step 1: Read History [OPRO]
```bash
cat .claude/memory/prompt-history.json 2>/dev/null
```

If it exists:
- Top 5 scoring instructions → positive signal (what worked)
- Bottom 2 scoring instructions → negative signal (what failed)
- These signals shape the new instruction variants

If it doesn't exist: create it after this optimization run.

---

### Step 2: Generate 3 Variants [APE]

Write 3 instruction variants for the target skill:

**Variant A — Few-shot enrichment**:
Add 2-3 concrete examples of correct behavior from this project.
The examples are real, not hypothetical.

**Variant B — Chain-of-thought**:
Add explicit reasoning steps before the action.
"First identify X, then check Y, then do Z."

**Variant C — Domain context**:
Add domain-specific vocabulary and constraints at the top.
Ground the instruction in the project's actual language.

Save all 3 to `.claude/memory/prompt-versions/{skill}-{date}/`:
- `variant-a.md`
- `variant-b.md`
- `variant-c.md`

---

### Step 3: Score and Select

Apply each variant to 3 real inputs from the current project (k=3 gate).
Score on:
- Correctness of output (0-10)
- Token efficiency (output length vs. information density)
- Self-applicability (would an unfamiliar agent apply it correctly?)

Keep the winner. Archive the losers — remove them from use, but don't delete the files.

---

### Step 4: Update History [OPRO]

Append to `.claude/memory/prompt-history.json`:
```json
{
  "skill": "{skill-name}",
  "date": "{ISO date}",
  "winner": "{variant-a|b|c}",
  "score": {0-10},
  "why": "{one sentence: what made it better}"
}
```

This entry feeds the next OPRO cycle. The history grows, the signal sharpens.

---

### Step 5: Deploy Winner

Replace the skill body with the winning variant.
Keep the frontmatter (name, description, tokens) unchanged.
Update the token estimate if the new body is significantly different.
+++ package/templates/capabilities/intelligence/pipeline.md
@@ -0,0 +1,149 @@
---
name: intelligence-pipeline
description: >
  Pipeline agent design. Use when 3+ agents chain output to input.
  Ensures clean context passing, no context bleed between agents.
  Triggers on: "chain agents", "pipeline", "agent A feeds agent B".
tokens: ~250
---

## Pipeline Agent Design

Use when the work genuinely requires sequential agents where each feeds the next.
Do NOT use for work one agent can do with tools directly.

---

### Pipeline Building Blocks

Six primitives. Every pipeline is a composition of these.

| Block | What it does | When to use |
|-------|-------------|-------------|
| **Sequential** | A → B → C | Each step needs the previous result |
| **Parallel** | A + B + C → merge | Independent tasks that can run simultaneously |
| **Reflect** | Agent reviews its own output before passing it | High-stakes output, expensive to fix downstream |
| **Debate** | Two agents argue a position → synthesizer picks the winner | Architectural decisions, tradeoffs |
| **Summarize** | Compresses large context before passing to the next agent | When Agent A's output would overflow Agent B's context |
| **Tool-use** | Agent calls scripts/tools and passes the JSON result | Programmatic data (never raw tool transcripts) |

**Compose by need**: most pipelines are Sequential + Summarize. Add Reflect only for high-risk steps.
Adding Debate to every pipeline wastes tokens — reserve it for genuine tradeoffs.

---

### Pre-Built Pipeline Templates

#### Feature Pipeline (new feature implementation)
```
Agents: planner → implementer → reviewer
Planner input: feature description + CLAUDE.md
Planner output: { files_to_change, test_plan, approach }
Implementer input: planner output + tdd.md
Implementer output: { files_changed, tests_written, test_results }
Reviewer input: implementer output + spec
Reviewer output: { spec_compliance: pass|fail, issues: [...] }
Block types: Sequential + Reflect (implementer self-reviews tests before passing)
```

#### Fix Pipeline (bug investigation)
```
Agents: investigator → hypothesizer → fixer
Investigator input: error description + relevant files
Investigator output: { root_cause, affected_files, reproduction_steps }
Hypothesizer input: investigator output
Hypothesizer output: { hypothesis, fix_approach, risk: low|medium|high }
Fixer input: hypothesizer output (high risk → add a Debate block before the fix)
Fixer output: { files_changed, tests_passing, fix_summary }
Block types: Sequential (+ Debate if risk = high)
```

#### Review Pipeline (code review)
```
Agents: spec-checker → quality-checker
Spec-checker input: PR diff + spec/requirements
Spec-checker output: { spec_compliance: pass|fail, violations: [...] }
Quality-checker input: PR diff + spec-checker output
GATE: if spec_compliance = fail → stop, return spec-checker output (do not proceed)
Quality-checker output: { quality_issues: [...], suggestions: [...] }
Block types: Sequential with hard gate
```

#### Architecture Pipeline (major design decision)
```
Agents: analyst → maximalist → skeptic → synthesizer
Analyst input: problem statement + codebase signals
Analyst output: { options: [A, B, C], constraints, tradeoffs }
Maximalist input: analyst output → argues for the best option
Skeptic input: analyst output → argues against the best option
Synthesizer input: maximalist + skeptic outputs → picks the winner with reasoning
Block types: Sequential → Parallel (maximalist + skeptic run together) → Sequential
```

---

### Pipeline Validity Check
Before building a pipeline, confirm:
1. Can a single agent do this with Read/Write/Bash tools? If yes → don't pipeline.
2. Is the output of Agent A genuinely the input of Agent B? If the output needs human review first → don't pipeline.
3. Is each agent's task parallelizable instead? If yes → spawn parallel agents, not a pipeline.

Pipeline = sequential dependency. Parallel = independent tasks. Don't confuse them.

---

### Context Passing Rule
Each agent receives ONLY:
- The output of the previous agent (not the previous agent's full context)
- Its own capability file (the micro-section for its specific task)
- Shared rules if relevant (tdd.md, completion-rule.md)

**NEVER pass the full context window of Agent A to Agent B.**
Agent B's context starts fresh. It gets a summary, not a transcript.

---

### Pipeline Schema
Define before building:
```
Pipeline: {name}

Agent 1: {task}
  Input: {what it receives}
  Output: {what it produces — this is Agent 2's input}

Agent 2: {task}
  Input: {Agent 1's output format}
  Output: {what it produces — this is Agent 3's input}

Agent 3: {task}
  Input: {Agent 2's output format}
  Output: {final result format}
```

The output format of each agent must exactly match the input format of the next.
If they don't match: the pipeline has a bug before it's built.
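That contract can be checked mechanically before any agent runs. The shape of the pipeline definition below is illustrative, not the plugin's actual format:

```javascript
// Verify each agent's declared output fields cover the next agent's
// declared input fields before the pipeline is built.
function validatePipeline(agents) {
  const errors = [];
  for (let i = 0; i < agents.length - 1; i++) {
    const produced = new Set(agents[i].output);
    for (const field of agents[i + 1].input) {
      if (!produced.has(field)) {
        errors.push(
          `${agents[i].name} -> ${agents[i + 1].name}: missing "${field}"`
        );
      }
    }
  }
  return errors; // empty array = the contract holds
}
```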

---

### Knowledge Passing [AutoAgent]
When passing knowledge between agents:
- Pass structured objects (JSON), not prose summaries
- Include only the fields the next agent actually uses
- Compress intermediate results: a script runs → only its summary output is passed

Intermediate tool-call results stay in the script's scope, not in Claude's context.
Only the final structured output crosses the agent boundary.

---

### Pipeline Map
Document in `.claude/memory/pipeline-map.md`:
```
{pipeline-name}:
  A → B → C
  A output: {format}
  B output: {format}
  C output: {format}
  Total token budget: ~{estimate}
```
+++ package/templates/capabilities/level-builders/level1-claudemd.md
@@ -0,0 +1,52 @@
---
name: level1-claudemd
description: >
  Build Level 1: create or improve the project CLAUDE.md.
  Triggers on: "build level 1", "set up CLAUDE.md", "configure project rules".
tokens: ~200
---

## Level 1: CLAUDE.md

CLAUDE.md is the only always-hot file. It must be lightweight (~30 lines).
Do not embed logic here. Use pointers to capability files.

---

### What Goes In CLAUDE.md
1. **Identity** — project name, domain, stack, scale (3 lines max)
2. **Rules** — 2-3 non-negotiable rules only. No lists of guidelines.
3. **Session state** — pointer to goals.md (2 lines)
4. **Task routing** — quick dispatch table + pointer to manifest.md (~10 lines)
5. **Trade-Off Hierarchies** — what wins when priorities conflict (3 lines)
6. **Available commands** — a one-line list

Total: ~30 lines. If it grows past 40 lines, move content to a capability file.

---

### What Does NOT Go In CLAUDE.md
- Detailed instructions (those go in capability files)
- Level-builder logic (that's in level-builders/)
- Memory of past sessions (that's in goals.md, injected via hook)
- Domain knowledge (that's in knowledge-index.md)

---

### Fill the Template
Replace these placeholders in templates/CLAUDE.md:
- `{{PROJECT_NAME}}` — actual project name
- `{{PROJECT_DESCRIPTION}}` — 1-2 sentence description
- `{{DOMAIN}}` — detected domain (developer/writer/researcher/compliance)
- `{{STACK}}` — detected stack
- `{{SCALE}}` — STANDARD/SKIM/MINIMAL/STRUCTURE-ONLY
- `{{TDD_RULE}}` — insert the TDD Iron Law for the developer domain, else remove the line
- `{{PRIORITY_1/2/3}}` — trade-off hierarchy from domain signals

---

### Trade-Off Hierarchies — Pre-fill from Domain Signals
- Developer project → correctness > speed > elegance
- Compliance project → regulatory conformity > completeness > efficiency
- Writer project → clarity > structure > length
- Research project → evidence quality > comprehensiveness > recency
+++ package/templates/capabilities/level-builders/level2-mcp.md
@@ -0,0 +1,58 @@
---
name: level2-mcp
description: >
  Build Level 2: configure MCP servers for this project.
  Triggers on: "build level 2", "add MCP", "configure MCP servers".
tokens: ~150
requires: level1-claudemd
---

## Level 2: MCP Servers

MCP servers extend Claude's tool set. They are deferred-loaded — only connect
when an agent needs that specific capability.

---

### Detect What's Needed
Read signals from the project:
- Database files (sqlite, postgres connection strings) → database MCP
- Browser/web tasks mentioned in the README → browser/playwright MCP
- File system operations beyond the project → filesystem MCP
- External APIs in the dependencies → relevant API MCPs

Only install what the project actually needs. Not a default list.

---

### Configure `.mcp.json`
```json
{
  "mcpServers": {
    "{server-name}": {
      "command": "{command}",
      "args": ["{args}"],
      "env": {
        "{KEY}": "${ENV_VAR}"
      }
    }
  }
}
```

Place `.mcp.json` in the project root.

---

### Deferred Loading Rule
MCPs are NOT loaded at session start. Each subagent connects only when it needs
that capability. Do not list all MCPs in CLAUDE.md — they appear in manifest.md
only if they need to be explicitly dispatched.

---

### Verification
After configuring:
- Run `claude mcp list` to confirm the servers are registered
- Test one tool from each server before declaring Level 2 complete
- Document each server in manifest.md if it needs to be discoverable by agents