azclaude-copilot 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude-plugin/marketplace.json +27 -0
- package/.claude-plugin/plugin.json +17 -0
- package/LICENSE +21 -0
- package/README.md +477 -0
- package/bin/cli.js +1027 -0
- package/bin/copilot.js +228 -0
- package/hooks/README.md +3 -0
- package/hooks/hooks.json +40 -0
- package/package.json +41 -0
- package/templates/CLAUDE.md +51 -0
- package/templates/agents/cc-cli-integrator.md +104 -0
- package/templates/agents/cc-template-author.md +109 -0
- package/templates/agents/cc-test-maintainer.md +101 -0
- package/templates/agents/code-reviewer.md +136 -0
- package/templates/agents/loop-controller.md +118 -0
- package/templates/agents/orchestrator-init.md +196 -0
- package/templates/agents/test-writer.md +129 -0
- package/templates/capabilities/evolution/cycle2-knowledge.md +87 -0
- package/templates/capabilities/evolution/cycle3-topology.md +128 -0
- package/templates/capabilities/evolution/detect.md +103 -0
- package/templates/capabilities/evolution/evaluate.md +90 -0
- package/templates/capabilities/evolution/generate.md +123 -0
- package/templates/capabilities/evolution/re-derivation.md +77 -0
- package/templates/capabilities/intelligence/debate.md +104 -0
- package/templates/capabilities/intelligence/elo.md +122 -0
- package/templates/capabilities/intelligence/experiment.md +86 -0
- package/templates/capabilities/intelligence/opro.md +84 -0
- package/templates/capabilities/intelligence/pipeline.md +149 -0
- package/templates/capabilities/level-builders/level1-claudemd.md +52 -0
- package/templates/capabilities/level-builders/level2-mcp.md +58 -0
- package/templates/capabilities/level-builders/level3-skills.md +276 -0
- package/templates/capabilities/level-builders/level4-memory.md +72 -0
- package/templates/capabilities/level-builders/level5-agents.md +123 -0
- package/templates/capabilities/level-builders/level6-hooks.md +119 -0
- package/templates/capabilities/level-builders/level7-extmcp.md +60 -0
- package/templates/capabilities/level-builders/level8-orchestrated.md +98 -0
- package/templates/capabilities/manifest.md +58 -0
- package/templates/capabilities/shared/5-layer-agent.md +206 -0
- package/templates/capabilities/shared/completion-rule.md +44 -0
- package/templates/capabilities/shared/context-artifacts.md +96 -0
- package/templates/capabilities/shared/domain-advisor-generator.md +205 -0
- package/templates/capabilities/shared/friction-log.md +43 -0
- package/templates/capabilities/shared/multi-cli-paths.md +56 -0
- package/templates/capabilities/shared/native-tools.md +199 -0
- package/templates/capabilities/shared/plan-tracker.md +69 -0
- package/templates/capabilities/shared/pressure-test.md +88 -0
- package/templates/capabilities/shared/quality-check.md +83 -0
- package/templates/capabilities/shared/reflexes.md +159 -0
- package/templates/capabilities/shared/review-reception.md +70 -0
- package/templates/capabilities/shared/security.md +174 -0
- package/templates/capabilities/shared/semantic-boundary-check.md +140 -0
- package/templates/capabilities/shared/session-rhythm.md +42 -0
- package/templates/capabilities/shared/tdd.md +54 -0
- package/templates/capabilities/shared/vocabulary-transform.md +63 -0
- package/templates/commands/add.md +152 -0
- package/templates/commands/audit.md +123 -0
- package/templates/commands/blueprint.md +115 -0
- package/templates/commands/copilot.md +157 -0
- package/templates/commands/create.md +156 -0
- package/templates/commands/debate.md +75 -0
- package/templates/commands/deps.md +112 -0
- package/templates/commands/doc.md +100 -0
- package/templates/commands/dream.md +120 -0
- package/templates/commands/evolve.md +170 -0
- package/templates/commands/explain.md +25 -0
- package/templates/commands/find.md +100 -0
- package/templates/commands/fix.md +122 -0
- package/templates/commands/hookify.md +100 -0
- package/templates/commands/level-up.md +48 -0
- package/templates/commands/loop.md +62 -0
- package/templates/commands/migrate.md +119 -0
- package/templates/commands/persist.md +73 -0
- package/templates/commands/pulse.md +87 -0
- package/templates/commands/refactor.md +97 -0
- package/templates/commands/reflect.md +107 -0
- package/templates/commands/reflexes.md +141 -0
- package/templates/commands/setup.md +97 -0
- package/templates/commands/ship.md +131 -0
- package/templates/commands/snapshot.md +70 -0
- package/templates/commands/test.md +86 -0
- package/templates/hooks/post-tool-use.js +175 -0
- package/templates/hooks/stop.js +85 -0
- package/templates/hooks/user-prompt.js +96 -0
- package/templates/scripts/env-scan.sh +46 -0
- package/templates/scripts/import-graph.sh +88 -0
- package/templates/scripts/validate-boundaries.sh +180 -0
- package/templates/skills/agent-creator/SKILL.md +91 -0
- package/templates/skills/agent-creator/examples/sample-agent.md +80 -0
- package/templates/skills/agent-creator/references/agent-engineering-guide.md +596 -0
- package/templates/skills/agent-creator/references/quality-checklist.md +42 -0
- package/templates/skills/agent-creator/scripts/scaffold.sh +144 -0
- package/templates/skills/architecture-advisor/SKILL.md +92 -0
- package/templates/skills/architecture-advisor/references/database-decisions.md +61 -0
- package/templates/skills/architecture-advisor/references/decision-matrices.md +122 -0
- package/templates/skills/architecture-advisor/references/rendering-decisions.md +39 -0
- package/templates/skills/architecture-advisor/scripts/detect-scale.sh +67 -0
- package/templates/skills/debate/SKILL.md +36 -0
- package/templates/skills/debate/references/acemad-protocol.md +72 -0
- package/templates/skills/env-scanner/SKILL.md +41 -0
- package/templates/skills/security/SKILL.md +44 -0
- package/templates/skills/security/references/security-details.md +48 -0
- package/templates/skills/session-guard/SKILL.md +33 -0
- package/templates/skills/skill-creator/SKILL.md +82 -0
- package/templates/skills/skill-creator/examples/sample-skill.md +74 -0
- package/templates/skills/skill-creator/references/quality-checklist.md +36 -0
- package/templates/skills/skill-creator/references/skill-engineering-guide.md +365 -0
- package/templates/skills/skill-creator/scripts/scaffold.sh +75 -0
- package/templates/skills/test-first/SKILL.md +41 -0
@@ -0,0 +1,104 @@
---
name: intelligence-debate
description: >
  Structured adversarial debate for hard architectural decisions.
  Opt-in only — not the default for every decision.
  Triggers on: "debate", "tradeoff", "should we", "which is better", "decide between".
tokens: ~400
---

## Structured Debate Protocol [AceMAD]

Use when a decision is genuinely uncertain, multi-criteria, and a wrong choice costs real time.
Do NOT use for routine decisions — Claude's direct answer is faster and equally good.

---

### Phase 1: Define the Question
State the decision as a binary or small option set:
"Should we do A or B?"
"Which architecture: X or Y?"
Not: "What should we do about the general problem?"

---

### Phase 2: Advocate A — MAXIMALIST
Argue the strongest possible case FOR the leading option.
- Make the best arguments you can — do not sandbag
- N ≤ 10 arguments maximum
- Each argument: one claim + one piece of evidence
- Label evidence: [VERIFIED] / [PARTIAL] / [UNVERIFIED] / [FALSE]

---

### Phase 3: Advocate B — SKEPTIC
Argue the strongest possible case AGAINST the leading option (or FOR the alternative).
- N ≤ 10 arguments maximum
- T ≤ 5 total rounds across both advocates
- Same evidence labeling rules apply
- Must address MAXIMALIST's strongest verified argument

---

### Phase 4: FACT CHECK — NON-NEGOTIABLE
Before synthesis, tag every claim:

| Tag | Meaning |
|-----|---------|
| [VERIFIED] | Confirmed by code, docs, or direct test |
| [PARTIAL] | Directionally correct but incomplete |
| [UNVERIFIED] | Asserted without evidence |
| [FALSE] | Contradicted by evidence |

**Disqualified claim language** — any argument containing:
"should work" / "probably" / "I believe" / "I think" → mark [UNVERIFIED], reduce weight.

If ≥ 30% of an advocate's claims are UNVERIFIED → confidence stays LOW regardless of score.
Adjusted evidence scores replace raw argument counts.

Truth wins. Not volume.
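The 30% gate above is mechanical enough to sketch. A minimal illustration in JavaScript (the function names and the HIGH/MEDIUM margin thresholds are assumptions for the example; only the ≥ 30% → LOW rule comes from this skill):

```javascript
// Illustrative sketch of the Phase 4 gate. Thresholds for HIGH/MEDIUM are
// assumed; the 30%-unverified LOW gate is the rule stated above.
function unverifiedShare(claims) {
  if (claims.length === 0) return 0;
  return claims.filter((c) => c.tag === "UNVERIFIED").length / claims.length;
}

function confidenceFor(claims, margin) {
  // Gate: too many unverified claims pins confidence to LOW regardless of margin.
  if (unverifiedShare(claims) >= 0.3) return "LOW";
  return margin >= 70 ? "HIGH" : margin >= 40 ? "MEDIUM" : "LOW";
}
```

An advocate with two unverified claims out of three stays LOW even at a 90-point margin.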

---

### Phase 5: Synthesis

**5a. Draft synthesis** — pick the winner, state the margin (0-100 confidence).

**5b. Second-Order Cognition Check [AceMAD]**:
Did the synthesis address the strongest *verified* claim from the losing side?
If not → revise. Confidence stays LOW until addressed.

**5c. Self-refine** — critique the draft:
- Did I favor the side with more words? (higher word-count ≠ stronger case)
- Did I ignore any [VERIFIED] claim?
- Is the margin honest?
Max 2 rounds of self-refine.

**5d. Order-Independence Check [PeerRank]**:
If margin < 10 points: run synthesis again with advocates in reversed order.
If conclusion reverses → result is INCONCLUSIVE. Present both sides to user.
Position bias correlates 0.39 with the first-presented argument.

**5e. Deliver** — state winner, confidence, and the one verified claim that decided it.

---

### Phase 6: Length-Independence [Elo-Evolve]
Score on **evidence-density** (verified claims per 100 words), NOT raw word count.
Flag any advocate whose argument exceeds 2× the median length — the excess is noise.

**Comparative Binary Framing [Elo-Evolve]**: Use pairwise comparison (which is better: A or B?)
not absolute ratings (rate A /10). Pairwise reduces noise from σ_abs=35.65 to σ_comp=7.85.

---

### Phase 7: Record the Decision
Append to `.claude/memory/decisions.md`:
```
## {Decision Title} — {date}
**Question**: {the decision}
**Options**: {A} vs {B}
**Winner**: {chosen approach} (confidence: {level})
**Deciding claim**: {the one verified claim that settled it}
**Dissent**: {strongest verified argument for the loser}
```
@@ -0,0 +1,122 @@
---
name: intelligence-elo
description: >
  ELO quality ranking across options, agents, or skills. Use when comparing
  multiple candidates and need a defensible rank order.
  Triggers on: "rank these", "which is best", "compare quality".
tokens: ~200
---

## ELO Quality Ranking [PeerRank + Elo-Evolve]

---

### Authoritative Ownership [Elo-Evolve]
The loop controller (the model running this skill) owns the authoritative ELO scores.
Subagents that produce provisional scores feed into this controller — they do not own the final rank.
Provisional scores from subagents are inputs, not decisions.

Self-evaluation reliability: r=0.538 (self-scored) vs r=0.905 (loop controller reconciliation).
Never let an agent self-rank as authoritative.

---

### Pairwise Comparison Protocol

**Comparative Binary Framing**: Compare pairs, not absolutes.
"Is A better than B for this criterion?" — not "Rate A from 1 to 10."

For N candidates: run N×(N-1)/2 pairwise comparisons.
Each comparison: one winner, one loser. No ties unless evidence is genuinely equal.

---

### ELO Schema

For each candidate, track:
```json
{
  "id": "{candidate-id}",
  "elo": 1000,
  "wins": 0,
  "losses": 0,
  "adjusted_evidence_score": 0,
  "fact_check": {
    "verified": 0,
    "partial": 0,
    "unverified": 0,
    "false": 0
  }
}
```

ELO update after each comparison:
- Winner: elo += 32 × (1 - expected_score)
- Loser: elo -= 32 × expected_score
- expected_score = 1 / (1 + 10^((opponent_elo - this_elo) / 400))

---

### Adjusted Evidence Score

Do not rank on raw ELO alone. Adjust for fact-check quality:
- `adjusted_evidence_score = verified_claims / (verified + unverified + false)`
- If adjusted_evidence_score < 0.5: cap ELO at 1100 regardless of wins
- A candidate that wins on volume but loses on evidence quality is not the best candidate
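The update rules and the evidence cap can be sketched directly from the formulas above (K=32 as given; the function names are illustrative, not plugin API):

```javascript
const K = 32; // K-factor from the update rules above

function expectedScore(thisElo, opponentElo) {
  return 1 / (1 + Math.pow(10, (opponentElo - thisElo) / 400));
}

function updateElo(winnerElo, loserElo) {
  return {
    winner: winnerElo + K * (1 - expectedScore(winnerElo, loserElo)),
    loser: loserElo - K * expectedScore(loserElo, winnerElo),
  };
}

// Evidence adjustment: note that `partial` is excluded from the denominator,
// matching the formula above.
function adjustedEvidenceScore({ verified, unverified, false: falseClaims }) {
  const denom = verified + unverified + falseClaims;
  return denom === 0 ? 0 : verified / denom;
}

// Cap rule: weak evidence caps ELO at 1100 regardless of wins.
function capElo(elo, evidenceScore) {
  return evidenceScore < 0.5 ? Math.min(elo, 1100) : elo;
}
```

Two evenly matched candidates (both at 1000) move to 1016 and 984 after one comparison.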

---

### Output

Rank candidates by final ELO descending.
For the winner: state the one pairwise comparison that determined the rank.
Record in `.claude/memory/decisions.md` with the full ranking table.

---

### Persistent ELO — elo-rankings.json

Write rankings to `.claude/memory/elo-rankings.json` so scores accumulate across sessions:

```json
{
  "last_updated": "YYYY-MM-DD",
  "debate_elo": [
    {
      "position": "{argument-id}",
      "context": "{decision-topic}",
      "elo": 1000,
      "wins": 0,
      "losses": 0,
      "adjusted_evidence_score": 0.0
    }
  ],
  "agent_elo": [
    {
      "agent": "{agent-name}",
      "elo": 1000,
      "tasks_completed": 0,
      "wins": 0,
      "losses": 0,
      "last_task": "YYYY-MM-DD"
    }
  ],
  "pattern_elo": [
    {
      "pattern": "{pattern-description}",
      "source": ".claude/memory/patterns.md",
      "elo": 1000,
      "times_applied": 0,
      "times_succeeded": 0,
      "times_failed": 0
    }
  ]
}
```

**Update rules:**
- `debate_elo`: updated after each `/debate` cycle — one entry per argued position
- `agent_elo`: updated after each pipeline run — winner = agent whose output was accepted without revision
- `pattern_elo`: updated after each task — pattern succeeded if the approach worked, failed if re-derivation was triggered

Read `elo-rankings.json` at the start of each debate or agent spawn. Pass as context so prior performance informs the current run.
@@ -0,0 +1,86 @@
---
name: intelligence-experiment
description: >
  Experiment agent with worktree isolation. Use when testing a risky approach
  that must not affect the main branch until proven. Triggers on: "try this approach",
  "experiment with", "test this idea safely", "isolated experiment", "worktree".
tokens: ~80
---

## Experiment Agent

Run risky explorations in an isolated git worktree. Main branch is never touched.

---

### When to Use

- Refactoring with uncertain outcome
- Trying a new library or framework approach
- Architectural change that might break things
- Any "I want to try X but not break the main branch" scenario

If the experiment succeeds → merge. If it fails → discard the worktree. Zero cleanup.

---

### Agent Definition Template

```yaml
---
name: experiment-{task-name}
description: >
  Isolated experiment for {task}. Runs in worktree, cannot affect main branch.
model: claude-sonnet-4-6
maxTurns: 30
tools:
  - Read
  - Write
  - Edit
  - Bash
  - Glob
  - Grep
permissionMode: acceptEdits
isolation: worktree
background: false
---
```

The `isolation: worktree` field is what creates a separate git branch automatically.
The agent operates on a copy — main branch is read-only from the experiment's perspective.
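A sketch of the git commands this lifecycle implies. The branch and path conventions (`experiment/{name}`, `.worktrees/{name}`) and the function name are assumptions for illustration, not what the plugin necessarily generates:

```javascript
// Hypothetical command plan for one experiment; naming conventions are assumed.
function worktreePlan(name) {
  const branch = `experiment/${name}`;
  const path = `.worktrees/${name}`;
  return {
    create: `git worktree add ${path} -b ${branch}`, // isolated copy; main untouched
    merge: `git merge ${branch}`, // on success, run from the main checkout
    discard: `git worktree remove --force ${path}`, // on failure: zero cleanup on main
  };
}
```

Because the worktree is a separate checkout, discarding it deletes the experiment without touching main's working tree.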

---

### Experiment Protocol

**Before starting**: define the hypothesis and success criteria.
```
Hypothesis: {what you believe will happen}
Success: {measurable outcome that proves it worked}
Failure: {measurable outcome that proves it didn't}
```

**After completing** — log the outcome:

If experiment succeeded:
```bash
echo -e "\n## {task} — $(date +%Y-%m-%d)\n{what worked and why — include hypothesis}" >> .claude/memory/patterns.md
```

If experiment failed:
```bash
echo -e "\n## {task} — $(date +%Y-%m-%d)\n{what failed, why, what to try instead}" >> .claude/memory/antipatterns.md
```

**Never discard without logging.** A failed experiment that teaches something is a successful experiment.

---

### Merge Decision

After the experiment completes:
- Tests pass + success criteria met → merge to main
- Tests fail or success criteria not met → discard worktree, log antipattern
- Partial success → extract the working parts before discarding

Do NOT merge experiments that pass tests but fail success criteria. Tests prove correctness; success criteria prove value.
@@ -0,0 +1,84 @@
---
name: intelligence-opro
description: >
  OPRO + APE prompt optimization. Improves skill instructions by learning
  from history. Use when a skill underperforms or after 10+ uses.
  Triggers on: "optimize prompts", "improve skill", "skill isn't working".
tokens: ~300
---

## Prompt Optimization [OPRO + APE]

---

### Step 1: Read History [OPRO]
```bash
cat .claude/memory/prompt-history.json 2>/dev/null
```

If it exists:
- Top 5 scoring instructions → positive signal (what worked)
- Bottom 2 scoring instructions → negative signal (what failed)
- These signals shape the new instruction variants

If it doesn't exist: create it after this optimization run.
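The top-5/bottom-2 selection can be sketched as below, assuming the history file holds an array of the entries Step 4 appends (`{ skill, date, winner, score, why }`); with fewer than seven entries the two sets may overlap:

```javascript
// Sketch: split prompt history into positive and negative signals by score.
function selectSignals(history) {
  const sorted = [...history].sort((a, b) => b.score - a.score);
  return {
    positive: sorted.slice(0, 5), // top 5 scoring instructions: what worked
    negative: sorted.slice(-2),   // bottom 2 scoring instructions: what failed
  };
}
```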

---

### Step 2: Generate 3 Variants [APE]

Write 3 instruction variants for the target skill:

**Variant A — Few-shot enrichment**:
Add 2-3 concrete examples of correct behavior from this project.
The examples are real, not hypothetical.

**Variant B — Chain-of-thought**:
Add explicit reasoning steps before the action.
"First identify X, then check Y, then do Z."

**Variant C — Domain context**:
Add domain-specific vocabulary and constraints at the top.
Ground the instruction in the project's actual language.

Save all 3 to `.claude/memory/prompt-versions/{skill}-{date}/`:
- `variant-a.md`
- `variant-b.md`
- `variant-c.md`

---

### Step 3: Score and Select

Apply each variant to 3 real inputs from the current project (k=3 gate).
Score on:
- Correctness of output (0-10)
- Token efficiency (output length vs. information density)
- Self-applicability (would an unfamiliar agent apply it correctly?)

Keep the winner. Archive the losing variants rather than deleting them.

---

### Step 4: Update History [OPRO]

Append to `.claude/memory/prompt-history.json`:
```json
{
  "skill": "{skill-name}",
  "date": "{ISO date}",
  "winner": "{variant-a|b|c}",
  "score": {0-10},
  "why": "{one sentence: what made it better}"
}
```

This entry feeds the next OPRO cycle. The history grows; the signal sharpens.

---

### Step 5: Deploy Winner

Replace the skill body with the winning variant.
Keep the frontmatter (name, description, tokens) unchanged.
Update the token estimate if the new body is significantly different.
@@ -0,0 +1,149 @@
---
name: intelligence-pipeline
description: >
  Pipeline agent design. Use when 3+ agents chain output to input.
  Ensures clean context passing, no context bleed between agents.
  Triggers on: "chain agents", "pipeline", "agent A feeds agent B".
tokens: ~250
---

## Pipeline Agent Design

Use when work genuinely requires sequential agents where each feeds the next.
Do NOT use for work one agent can do with tools directly.

---

### Pipeline Building Blocks

Six primitives. Every pipeline is a composition of these.

| Block | What it does | When to use |
|-------|-------------|-------------|
| **Sequential** | A → B → C | Each step needs the previous result |
| **Parallel** | A + B + C → merge | Independent tasks that can run simultaneously |
| **Reflect** | Agent reviews its own output before passing it | High-stakes output, expensive to fix downstream |
| **Debate** | Two agents argue a position → synthesizer picks winner | Architectural decisions, tradeoffs |
| **Summarize** | Compresses large context before passing to next agent | When Agent A output would overflow Agent B context |
| **Tool-use** | Agent calls scripts/tools and passes JSON result | Programmatic data (never raw tool transcripts) |

**Compose by need**: most pipelines are Sequential + Summarize. Add Reflect only for high-risk steps.
Adding Debate to every pipeline wastes tokens — reserve for genuine tradeoffs.

---

### Pre-Built Pipeline Templates

#### Feature Pipeline (new feature implementation)
```
Agents: planner → implementer → reviewer
Planner input: feature description + CLAUDE.md
Planner output: { files_to_change, test_plan, approach }
Implementer input: planner output + tdd.md
Implementer output: { files_changed, tests_written, test_results }
Reviewer input: implementer output + spec
Reviewer output: { spec_compliance: pass|fail, issues: [...] }
Block types: Sequential + Reflect (implementer self-reviews tests before passing)
```

#### Fix Pipeline (bug investigation)
```
Agents: investigator → hypothesizer → fixer
Investigator input: error description + relevant files
Investigator output: { root_cause, affected_files, reproduction_steps }
Hypothesizer input: investigator output
Hypothesizer output: { hypothesis, fix_approach, risk: low|medium|high }
Fixer input: hypothesizer output (high risk → add Debate block before fix)
Fixer output: { files_changed, tests_passing, fix_summary }
Block types: Sequential (+ Debate if risk = high)
```

#### Review Pipeline (code review)
```
Agents: spec-checker → quality-checker
Spec-checker input: PR diff + spec/requirements
Spec-checker output: { spec_compliance: pass|fail, violations: [...] }
Quality-checker input: PR diff + spec-checker output
GATE: if spec_compliance = fail → stop, return spec-checker output (do not proceed)
Quality-checker output: { quality_issues: [...], suggestions: [...] }
Block types: Sequential with hard gate
```

#### Architecture Pipeline (major design decision)
```
Agents: analyst → maximalist → skeptic → synthesizer
Analyst input: problem statement + codebase signals
Analyst output: { options: [A, B, C], constraints, tradeoffs }
Maximalist input: analyst output → argues for best option
Skeptic input: analyst output → argues against best option
Synthesizer input: maximalist + skeptic outputs → picks winner with reasoning
Block types: Sequential → Parallel (maximalist + skeptic run together) → Sequential
```

---

### Pipeline Validity Check
Before building a pipeline, confirm:
1. Can a single agent do this with Read/Write/Bash tools? If yes → don't pipeline.
2. Is the output of Agent A genuinely the input of Agent B? If output needs human review first → don't pipeline.
3. Is each agent's task parallelizable instead? If yes → spawn parallel agents, not a pipeline.

Pipeline = sequential dependency. Parallel = independent tasks. Don't confuse them.

---

### Context Passing Rule
Each agent receives ONLY:
- The output of the previous agent (not the previous agent's full context)
- Its own capability file (the micro-section for its specific task)
- Shared rules if relevant (tdd.md, completion-rule.md)

**NEVER pass the full context window of Agent A to Agent B.**
Agent B's context starts fresh. It gets a summary, not a transcript.

---

### Pipeline Schema
Define before building:
```
Pipeline: {name}

Agent 1: {task}
  Input: {what it receives}
  Output: {what it produces — this is Agent 2's input}

Agent 2: {task}
  Input: {Agent 1's output format}
  Output: {what it produces — this is Agent 3's input}

Agent 3: {task}
  Input: {Agent 2's output format}
  Output: {final result format}
```

The output format of each agent must exactly match the input format of the next.
If they don't match: the pipeline has a bug before it's built.
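That format-match rule can be checked mechanically before any agent runs. A sketch, treating each stage contract as a list of required field names (a simplification; these helpers are illustrative, not part of the plugin):

```javascript
// Returns a list of contract mismatches; an empty list means formats line up.
function validatePipeline(stages) {
  const errors = [];
  for (let i = 0; i < stages.length - 1; i++) {
    const produced = new Set(stages[i].outputFields);
    for (const field of stages[i + 1].inputFields) {
      if (!produced.has(field)) {
        errors.push(`${stages[i + 1].name} expects "${field}" but ${stages[i].name} does not produce it`);
      }
    }
  }
  return errors;
}
```

Run it on the schema before spawning anything; a non-empty result is the "bug before it's built".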

---

### Knowledge Passing [AutoAgent]
When passing knowledge between agents:
- Pass structured objects (JSON), not prose summaries
- Include only fields the next agent actually uses
- Compress intermediate results: script runs → summary output only

Intermediate tool call results stay in the script's scope, not in Claude's context.
Only the final structured output crosses the agent boundary.

---

### Pipeline Map
Document in `.claude/memory/pipeline-map.md`:
```
{pipeline-name}:
  A → B → C
  A output: {format}
  B output: {format}
  C output: {format}
  Total token budget: ~{estimate}
```
@@ -0,0 +1,52 @@
---
name: level1-claudemd
description: >
  Build Level 1: create or improve the project CLAUDE.md.
  Triggers on: "build level 1", "set up CLAUDE.md", "configure project rules".
tokens: ~200
---

## Level 1: CLAUDE.md

CLAUDE.md is the only always-hot file. It must be lightweight (~30 lines).
Do not embed logic here. Use pointers to capability files.

---

### What Goes In CLAUDE.md
1. **Identity** — project name, domain, stack, scale (3 lines max)
2. **Rules** — 2-3 non-negotiable rules only. No lists of guidelines.
3. **Session state** — pointer to goals.md (2 lines)
4. **Task routing** — quick dispatch table + pointer to manifest.md (~10 lines)
5. **Trade-Off Hierarchies** — what wins when priorities conflict (3 lines)
6. **Available commands** — one line list

Total: ~30 lines. If it grows past 40 lines, move content to a capability file.

---

### What Does NOT Go In CLAUDE.md
- Detailed instructions (those go in capability files)
- Level builder logic (that's in level-builders/)
- Memory of past sessions (that's in goals.md, injected via hook)
- Domain knowledge (that's in knowledge-index.md)

---

### Fill the Template
Replace these placeholders in templates/CLAUDE.md:
- `{{PROJECT_NAME}}` — actual project name
- `{{PROJECT_DESCRIPTION}}` — 1-2 sentence description
- `{{DOMAIN}}` — detected domain (developer/writer/researcher/compliance)
- `{{STACK}}` — detected stack
- `{{SCALE}}` — STANDARD/SKIM/MINIMAL/STRUCTURE-ONLY
- `{{TDD_RULE}}` — insert TDD Iron Law if developer domain, else remove line
- `{{PRIORITY_1/2/3}}` — trade-off hierarchy from domain signals

---

### Trade-Off Hierarchies — Pre-fill from Domain Signals
- Developer project → correctness > speed > elegance
- Compliance project → regulatory conformity > completeness > efficiency
- Writer project → clarity > structure > length
- Research project → evidence quality > comprehensiveness > recency
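The domain-to-hierarchy pre-fill is a simple lookup. A sketch (the function name and the developer fallback for unknown domains are assumptions, not documented plugin behavior):

```javascript
// Pre-fill {{PRIORITY_1/2/3}} from the detected domain. Falling back to the
// developer hierarchy for unknown domains is an assumption for this sketch.
const HIERARCHIES = {
  developer: ["correctness", "speed", "elegance"],
  compliance: ["regulatory conformity", "completeness", "efficiency"],
  writer: ["clarity", "structure", "length"],
  researcher: ["evidence quality", "comprehensiveness", "recency"],
};

function tradeoffHierarchy(domain) {
  return HIERARCHIES[domain] ?? HIERARCHIES.developer;
}
```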
@@ -0,0 +1,58 @@
---
name: level2-mcp
description: >
  Build Level 2: configure MCP servers for this project.
  Triggers on: "build level 2", "add MCP", "configure MCP servers".
tokens: ~150
requires: level1-claudemd
---

## Level 2: MCP Servers

MCP servers extend Claude's tool set. They are deferred-loaded — only connect
when an agent needs that specific capability.

---

### Detect What's Needed
Read signals from the project:
- Database files (sqlite, postgres connection strings) → database MCP
- Browser/web tasks mentioned in README → browser/playwright MCP
- File system operations beyond the project → filesystem MCP
- External APIs in dependencies → relevant API MCPs

Only install what the project actually needs. Not a default list.

---

### Configure `.mcp.json`
```json
{
  "mcpServers": {
    "{server-name}": {
      "command": "{command}",
      "args": ["{args}"],
      "env": {
        "{KEY}": "${ENV_VAR}"
      }
    }
  }
}
```

Place `.mcp.json` in the project root.

---

### Deferred Loading Rule
MCPs are NOT loaded at session start. Each subagent connects only when it needs
that capability. Do not list all MCPs in CLAUDE.md — they appear in manifest.md
if they need to be explicitly dispatched.

---

### Verification
After configuring:
- Run `claude mcp list` to confirm servers are registered
- Test one tool from each server before declaring Level 2 complete
- Document each server in manifest.md if it needs to be discoverable by agents
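Assembling the `.mcp.json` structure from a list of detected needs can be sketched as follows; `mcpConfig` and the placeholder server entry are illustrative assumptions, and the package name placeholder is deliberately left unfilled rather than naming a real package:

```javascript
// Sketch: build the .mcp.json object from detected needs (helper name assumed).
function mcpConfig(servers) {
  const mcpServers = {};
  for (const s of servers) {
    mcpServers[s.name] = { command: s.command, args: s.args, env: s.env ?? {} };
  }
  return { mcpServers }; // serialize and write to .mcp.json in the project root
}

const config = mcpConfig([
  { name: "db", command: "npx", args: ["-y", "{database-mcp-package}"] },
]);
```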