azclaude-copilot 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude-plugin/marketplace.json +27 -0
- package/.claude-plugin/plugin.json +17 -0
- package/LICENSE +21 -0
- package/README.md +477 -0
- package/bin/cli.js +1027 -0
- package/bin/copilot.js +228 -0
- package/hooks/README.md +3 -0
- package/hooks/hooks.json +40 -0
- package/package.json +41 -0
- package/templates/CLAUDE.md +51 -0
- package/templates/agents/cc-cli-integrator.md +104 -0
- package/templates/agents/cc-template-author.md +109 -0
- package/templates/agents/cc-test-maintainer.md +101 -0
- package/templates/agents/code-reviewer.md +136 -0
- package/templates/agents/loop-controller.md +118 -0
- package/templates/agents/orchestrator-init.md +196 -0
- package/templates/agents/test-writer.md +129 -0
- package/templates/capabilities/evolution/cycle2-knowledge.md +87 -0
- package/templates/capabilities/evolution/cycle3-topology.md +128 -0
- package/templates/capabilities/evolution/detect.md +103 -0
- package/templates/capabilities/evolution/evaluate.md +90 -0
- package/templates/capabilities/evolution/generate.md +123 -0
- package/templates/capabilities/evolution/re-derivation.md +77 -0
- package/templates/capabilities/intelligence/debate.md +104 -0
- package/templates/capabilities/intelligence/elo.md +122 -0
- package/templates/capabilities/intelligence/experiment.md +86 -0
- package/templates/capabilities/intelligence/opro.md +84 -0
- package/templates/capabilities/intelligence/pipeline.md +149 -0
- package/templates/capabilities/level-builders/level1-claudemd.md +52 -0
- package/templates/capabilities/level-builders/level2-mcp.md +58 -0
- package/templates/capabilities/level-builders/level3-skills.md +276 -0
- package/templates/capabilities/level-builders/level4-memory.md +72 -0
- package/templates/capabilities/level-builders/level5-agents.md +123 -0
- package/templates/capabilities/level-builders/level6-hooks.md +119 -0
- package/templates/capabilities/level-builders/level7-extmcp.md +60 -0
- package/templates/capabilities/level-builders/level8-orchestrated.md +98 -0
- package/templates/capabilities/manifest.md +58 -0
- package/templates/capabilities/shared/5-layer-agent.md +206 -0
- package/templates/capabilities/shared/completion-rule.md +44 -0
- package/templates/capabilities/shared/context-artifacts.md +96 -0
- package/templates/capabilities/shared/domain-advisor-generator.md +205 -0
- package/templates/capabilities/shared/friction-log.md +43 -0
- package/templates/capabilities/shared/multi-cli-paths.md +56 -0
- package/templates/capabilities/shared/native-tools.md +199 -0
- package/templates/capabilities/shared/plan-tracker.md +69 -0
- package/templates/capabilities/shared/pressure-test.md +88 -0
- package/templates/capabilities/shared/quality-check.md +83 -0
- package/templates/capabilities/shared/reflexes.md +159 -0
- package/templates/capabilities/shared/review-reception.md +70 -0
- package/templates/capabilities/shared/security.md +174 -0
- package/templates/capabilities/shared/semantic-boundary-check.md +140 -0
- package/templates/capabilities/shared/session-rhythm.md +42 -0
- package/templates/capabilities/shared/tdd.md +54 -0
- package/templates/capabilities/shared/vocabulary-transform.md +63 -0
- package/templates/commands/add.md +152 -0
- package/templates/commands/audit.md +123 -0
- package/templates/commands/blueprint.md +115 -0
- package/templates/commands/copilot.md +157 -0
- package/templates/commands/create.md +156 -0
- package/templates/commands/debate.md +75 -0
- package/templates/commands/deps.md +112 -0
- package/templates/commands/doc.md +100 -0
- package/templates/commands/dream.md +120 -0
- package/templates/commands/evolve.md +170 -0
- package/templates/commands/explain.md +25 -0
- package/templates/commands/find.md +100 -0
- package/templates/commands/fix.md +122 -0
- package/templates/commands/hookify.md +100 -0
- package/templates/commands/level-up.md +48 -0
- package/templates/commands/loop.md +62 -0
- package/templates/commands/migrate.md +119 -0
- package/templates/commands/persist.md +73 -0
- package/templates/commands/pulse.md +87 -0
- package/templates/commands/refactor.md +97 -0
- package/templates/commands/reflect.md +107 -0
- package/templates/commands/reflexes.md +141 -0
- package/templates/commands/setup.md +97 -0
- package/templates/commands/ship.md +131 -0
- package/templates/commands/snapshot.md +70 -0
- package/templates/commands/test.md +86 -0
- package/templates/hooks/post-tool-use.js +175 -0
- package/templates/hooks/stop.js +85 -0
- package/templates/hooks/user-prompt.js +96 -0
- package/templates/scripts/env-scan.sh +46 -0
- package/templates/scripts/import-graph.sh +88 -0
- package/templates/scripts/validate-boundaries.sh +180 -0
- package/templates/skills/agent-creator/SKILL.md +91 -0
- package/templates/skills/agent-creator/examples/sample-agent.md +80 -0
- package/templates/skills/agent-creator/references/agent-engineering-guide.md +596 -0
- package/templates/skills/agent-creator/references/quality-checklist.md +42 -0
- package/templates/skills/agent-creator/scripts/scaffold.sh +144 -0
- package/templates/skills/architecture-advisor/SKILL.md +92 -0
- package/templates/skills/architecture-advisor/references/database-decisions.md +61 -0
- package/templates/skills/architecture-advisor/references/decision-matrices.md +122 -0
- package/templates/skills/architecture-advisor/references/rendering-decisions.md +39 -0
- package/templates/skills/architecture-advisor/scripts/detect-scale.sh +67 -0
- package/templates/skills/debate/SKILL.md +36 -0
- package/templates/skills/debate/references/acemad-protocol.md +72 -0
- package/templates/skills/env-scanner/SKILL.md +41 -0
- package/templates/skills/security/SKILL.md +44 -0
- package/templates/skills/security/references/security-details.md +48 -0
- package/templates/skills/session-guard/SKILL.md +33 -0
- package/templates/skills/skill-creator/SKILL.md +82 -0
- package/templates/skills/skill-creator/examples/sample-skill.md +74 -0
- package/templates/skills/skill-creator/references/quality-checklist.md +36 -0
- package/templates/skills/skill-creator/references/skill-engineering-guide.md +365 -0
- package/templates/skills/skill-creator/scripts/scaffold.sh +75 -0
- package/templates/skills/test-first/SKILL.md +41 -0
@@ -0,0 +1,87 @@
+---
+name: evolution-cycle2-knowledge
+description: >
+  Load when patterns.md has 10+ entries that haven't been reviewed. Load when
+  goals.md has stale sessions older than 2 weeks. Load when the same pattern
+  appears in multiple session files and hasn't been consolidated. Load after
+  detect+generate+evaluate as part of /evolve to close the learning loop.
+tokens: ~200
+---
+
+## Cycle 2: Knowledge Consolidation
+
+---
+
+### HARVEST — 3-layer retrieval (same as detect.md Layer 1-3)
+Read only what's recent and relevant:
+- Session summaries from last 5 sessions
+- Friction logs from last 3 sessions
+- Any learnings files modified in last 10 commits
+
+---
+
+### CONSOLIDATE
+Group harvested content by pattern:
+- Same friction appearing in 3+ sessions → candidate for re-derivation
+- Same workflow appearing in 3+ sessions → candidate for skill
+- Same fact referenced in 3+ files → candidate for knowledge-index entry
+
+Rules:
+- Do not consolidate single-occurrence items — they are noise
+- Consolidation writes ONE new file or updates ONE existing file
+- Never write to a file that is > 7 days old without reading it first
+
+---
+
+### PRUNE — Importance Scoring
+
+Before archiving, score each memory file:
+
+```
+importance = (frequency × 3) + (recency × 2) + (impact × 5)
+```
+
+| Factor | How to measure | Scale |
+|--------|---------------|-------|
+| frequency | How many session logs reference this file | 0-10 |
+| recency | Days since last reference (0=today, 10=never) | 0-10 (inverted) |
+| impact | Did this file change a decision? (grep decisions.md) | 0-10 |
+
+**Thresholds:**
+- importance ≥ 30 → promote to core memory (load every session)
+- importance 15-29 → keep, load on demand
+- importance < 15 → archive candidate
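
As a rough illustration of the scoring rule and thresholds above, the sketch below computes the importance value and maps it to a tier. The function names `importance_score` and `importance_tier` are hypothetical and not part of the package. It also assumes that `recency` has already been inverted, so a file referenced today scores 10 and a file never referenced scores 0.

```bash
# Hypothetical helpers illustrating the PRUNE scoring rule; not part of the package.

# importance_score FREQUENCY RECENCY IMPACT
# Each argument is an integer on the 0-10 scale from the table.
# Assumption: recency is already inverted (10 = referenced today, 0 = never).
importance_score() {
  echo $(( $1 * 3 + $2 * 2 + $3 * 5 ))
}

# importance_tier SCORE -> prints core, on-demand, or archive
importance_tier() {
  if [ "$1" -ge 30 ]; then
    echo "core"
  elif [ "$1" -ge 15 ]; then
    echo "on-demand"
  else
    echo "archive"
  fi
}

score=$(importance_score 2 3 1)   # 2*3 + 3*2 + 1*5 = 17
echo "score=$score tier=$(importance_tier "$score")"
```

With all three factors on a 0-10 scale the score ranges from 0 to 100, and impact dominates: an impact of 6 alone already reaches the core threshold of 30.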
+
+Check git history:
+```bash
+git log --oneline -20 -- .claude/memory/
+```
+
+If importance < 15 AND not referenced in last 10 sessions → archive:
+```bash
+mkdir -p .claude/memory/archive
+mv .claude/memory/{file}.md .claude/memory/archive/{file}.md
+```
+
+Do NOT delete — archive. Deleted knowledge cannot be recovered.
+
+---
+
+### ENRICH knowledge-index.md
+If a `knowledge/` directory exists:
+```
+| file | summary | key_questions | tags |
+```
+
+- `key_questions`: 2-3 actual questions this file answers (not tags — real questions the model would ask)
+- Purpose: grep-based retrieval. The model searches key_questions to find which file to read.
+- DO NOT load these files into memory. Use the index to find which file to read on demand.
+
+---
+
+### LOG
+Append to `.claude/memory/sessions/{date}-cycle2.md`:
+- What was harvested
+- What was consolidated
+- What was pruned (archived)
+- What was added to knowledge-index
@@ -0,0 +1,128 @@
+---
+name: evolution-cycle3-topology
+description: >
+  Load when agents overlap in scope and you're not sure which handles what. Load
+  when a pipeline feels slow or agents are passing too much context to each other.
+  Load when manifest.md has capabilities that never get loaded. Load during /level-up
+  or when the user says "too many agents" or "agents are getting confused".
+tokens: ~250
+---
+
+## Cycle 3: Topology Optimization
+
+---
+
+### INVENTORY
+Map the current environment:
+```bash
+ls .claude/capabilities/
+ls .claude/capabilities/evolution/ .claude/capabilities/intelligence/ .claude/capabilities/shared/
+ls .claude/commands/
+ls .claude/agents/ 2>/dev/null
+find .claude/capabilities -name '*.md' -exec wc -l {} +
+```
+
+For each capability file:
+- Record line count
+- Record token estimate (lines × 12 approx)
+- Record last modified date
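
The per-file record described above could be produced along these lines. This is a minimal sketch: `estimate_tokens` and `inventory_file` are hypothetical names, and the 12-tokens-per-line factor is the approximation stated in the list, not a measured value.

```bash
# Hypothetical helpers for the INVENTORY step; not part of the package.

# estimate_tokens LINES -> lines × 12 (the approximation given above)
estimate_tokens() {
  echo $(( $1 * 12 ))
}

# inventory_file PATH -> prints "path lines tokens" for one capability file
inventory_file() {
  lines=$(wc -l < "$1" | tr -d ' ')   # tr strips the padding some wc builds add
  echo "$1 $lines $(estimate_tokens "$lines")"
}

estimate_tokens 150   # a file at the 150-line limit -> 1800
```

The last modified date is left out of the sketch because how it is obtained (filesystem timestamp or git history) is not specified in the source.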
+
+---
+
+### MEASURE
+Score each file on two axes:
+
+**Usage score (1-10)**: How often is this file loaded per session?
+- Check friction logs for references
+- Check session summaries for mentions
+- Score 1 = never referenced, Score 10 = every session
+
+**Quality score (1-10)**: Does it pass Five Quality Criteria?
+- Run Five Quality Criteria from evaluate.md on each file
+- Score = number of criteria passed × 2
+
+**ELO rank**: Sort files by (usage × quality) descending.
+High score → keep and improve. Low score → candidate for pruning.
+
+---
+
+### OPTIMIZE
+For each file scoring below threshold (usage < 3 AND quality < 6):
+
+1. Is it referenced in manifest.md? If not → it's dead weight. Archive it.
+2. Is it too long (> 150 lines)? Split it into two focused files.
+3. Is its description in manifest.md accurate? If not → update manifest row.
+4. Does it overlap with another file? Merge or clarify boundary.
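
The MEASURE arithmetic and the OPTIMIZE threshold can be sketched as three small functions. The names are hypothetical; the only rules encoded are the ones stated above (quality = criteria passed × 2, rank key = usage × quality, threshold = usage < 3 AND quality < 6).

```bash
# Hypothetical helpers for MEASURE and OPTIMIZE; not part of the package.

# quality_score CRITERIA_PASSED (0-5) -> criteria passed × 2
quality_score() { echo $(( $1 * 2 )); }

# rank_score USAGE QUALITY -> usage × quality, the key used for sorting
rank_score() { echo $(( $1 * $2 )); }

# is_optimize_candidate USAGE QUALITY -> exit 0 when usage < 3 AND quality < 6
is_optimize_candidate() {
  [ "$1" -lt 3 ] && [ "$2" -lt 6 ]
}

q=$(quality_score 2)              # 2 of 5 criteria passed -> 4
if is_optimize_candidate 2 "$q"; then
  echo "candidate rank=$(rank_score 2 "$q")"
fi
```

Because the threshold is a conjunction, a rarely used file that passes three or more criteria (quality ≥ 6) is not flagged, and neither is a low-quality file that is loaded often.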
+
+**Pipeline check** (if agents chain):
+- Does each agent receive only what it needs?
+- Is any agent passing its full context window to the next? That's a violation.
+- Fix: pass only the output, not the context that produced it.
+
+---
+
+### RECORD
+Update manifest.md token estimates based on actual line counts.
+Append topology report to `.claude/memory/sessions/{date}-cycle3.md`.
+
+---
+
+### PRUNE AGENTS
+For any custom agent in `.claude/agents/`:
+- Does it have all 5 layers? (see agent-creator skill or shared/5-layer-agent.md)
+- Is it still needed, or has a skill replaced it?
+- If obsolete: archive to `.claude/agents/archive/`
+
+---
+
+### UPDATE PIPELINES
+If 3+ agents chain (A → B → C):
+- Verify B receives only A's output, not A's full context
+- Verify C receives only B's output
+- Document the chain in `.claude/memory/pipeline-map.md`
+
+---
+
+### Topology Map Schema
+
+Write / update `.claude/memory/topology-map.json` after every Cycle 3 run:
+
+```json
+{
+  "last_updated": "YYYY-MM-DD",
+  "pipelines": [
+    {
+      "name": "{pipeline-name}",
+      "chain": ["agent-a", "agent-b", "agent-c"],
+      "uses": 0,
+      "quality_score": 0,
+      "influence_score": 0,
+      "last_used": "YYYY-MM-DD"
+    }
+  ],
+  "agents": [
+    {
+      "name": "{agent-name}",
+      "file": ".claude/agents/{name}.md",
+      "uses": 0,
+      "elo": 1000,
+      "last_used": "YYYY-MM-DD",
+      "status": "active|archived"
+    }
+  ],
+  "capabilities": [
+    {
+      "file": "capabilities/{path}.md",
+      "usage_score": 0,
+      "quality_score": 0,
+      "tokens": 0,
+      "status": "active|archived"
+    }
+  ]
+}
+```
+
+`influence_score` = how many other pipelines depend on this one's output.
+`elo` = updated from intelligence/elo.md pairwise comparisons when agents compete.
+
+After updating topology-map.json → also update the relevant rows in manifest.md token estimates.
@@ -0,0 +1,103 @@
+---
+name: evolution-detect
+description: >
+  Load at the start of /evolve. Load when the environment feels stale or broken.
+  Load when skills aren't triggering correctly, agents are making repeated mistakes,
+  or friction logs mention the same pain 3+ times. Load when you suspect something
+  was built weeks ago and may no longer match how the project actually works.
+tokens: ~250
+---
+
+## DETECT
+
+### Step 1: 3-Layer Memory Retrieval
+
+**Layer 1 — Search** (grep keywords, don't read files yet):
+```bash
+git log --oneline -10
+ls .claude/memory/learnings/ 2>/dev/null
+ls ops/observations/ 2>/dev/null
+grep -rl "friction\|gap\|repeated\|failed" .claude/memory/ 2>/dev/null
+```
+
+**Layer 2 — Filter** (recency and relevance):
+- Last 5 commits — what was recently touched?
+- Friction logs from the last 3 sessions only
+- Memory files whose filenames match current task keywords
+
+**Layer 3 — Read** (only what passed Layer 2):
+- Read matched files — not all of them. Layer 1+2 cost ~100 tokens. Layer 3 costs per file.
+- Session summaries: `tail -n 50 .claude/memory/sessions/*.md`
+- Frontmatter filter: read `topics:` field before reading full session file.
+
+---
+
+### Step 2: Friction Scan
+
+```bash
+ls ops/observations/ 2>/dev/null
+grep -r "harder than\|repeated from\|took longer\|missing" ops/observations/ 2>/dev/null
+```
+
+Classify signals:
+- **Repetition signals** — same task appears in ≥ 2 friction logs
+- **Correction signals** — Claude output corrected by user in ≥ 2 sessions
+- **Speed signals** — "took longer than expected" for same task type
+- **Missing signals** — "environment is missing" repeated
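
One way to test the "≥ 2 friction logs" condition is to count files rather than matching lines, so that a log mentioning the same pain twice is counted once. The sketch below assumes one friction log per file and a fixed search phrase per task; `count_logs_matching` and `is_repetition_signal` are hypothetical names.

```bash
# Hypothetical helpers for classifying repetition signals; not part of the package.

# count_logs_matching PATTERN DIR -> number of files under DIR that contain PATTERN
count_logs_matching() {
  grep -rl -- "$1" "$2" 2>/dev/null | wc -l | tr -d ' '
}

# is_repetition_signal PATTERN DIR -> exit 0 when PATTERN is in at least 2 logs
is_repetition_signal() {
  [ "$(count_logs_matching "$1" "$2")" -ge 2 ]
}

# Demo on a throwaway directory standing in for ops/observations/
demo_dir=$(mktemp -d)
echo "deploy took longer than expected" > "$demo_dir/s1.md"
echo "deploy took longer again"         > "$demo_dir/s2.md"
echo "nothing notable"                  > "$demo_dir/s3.md"
count_logs_matching "took longer" "$demo_dir"   # prints 2
rm -rf "$demo_dir"
```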
+
+Domain-specific friction signals:
+- Developer: test failures, missing scaffolding
+- Writer: continuity errors, structure resets
+- Researcher: unsourced claims, repeated lookups
+- Compliance: wrong vocabulary, missing obligation documentation
+
+Compute friction score alongside structural score. High friction = high priority fix.
+
+---
+
+### Step 3: CONTEXT ROT CLASSIFICATION [CE Pyramid]
+
+Before patching any context problem, classify the rot type:
+
+| Rot Type | Symptom | Fix |
+|----------|---------|-----|
+| Poisoning | Agent believes wrong facts | Remove or correct the false source |
+| Distraction | Irrelevant context consuming window | Filter or summarize |
+| Confusion | Agent receives contradictory instructions | Resolve conflict, enforce one rule |
+| Clash | Context from multiple sources conflicts | Establish priority order |
+
+Rule: **classify first, patch second.** Never patch without a rot type label.
+
+---
+
+### Step 4: SEQUENCE SCAN [AutoAgent]
+
+Look for patterns that repeat across 3+ sessions:
+```bash
+grep -r "description:" .claude/memory/sessions/ 2>/dev/null | head -20
+```
+
+A sequence candidate = same 3+ steps done manually in 3+ separate sessions.
+
+Rules:
+- Flag candidates only — don't build the skill yet
+- If it maps to an existing skill: skip
+- If it's a project-specific one-off: skip
+- If it's a general workflow pattern: flag for GENERATE
+
+---
+
+### Step 5: INTENTION-OUTCOME SCAN [AutoAgent]
+
+```bash
+grep "description:" .claude/memory/*.md 2>/dev/null
+grep -r "friction\|gap\|missed" ops/observations/ 2>/dev/null
+```
+
+Compare what agents were supposed to do (description field) vs. what friction logs say happened.
+
+- Gap = capability described but friction says it doesn't work
+- Only flag gaps that appear in ≥ 2 friction entries (single-occurrence = noise)
+- Gap action must be specific: "add X to file Y" — not "improve agent"
+
+Add flagged gaps to PLAN alongside structural gaps.
@@ -0,0 +1,90 @@
+---
+name: evolution-evaluate
+description: >
+  Load after generate.md has produced a new skill, agent, or capability.
+  Load before promoting, committing, or merging any generated file.
+  Load when a new skill was just written and you are unsure if it is good enough.
+  Never promote generated output without running this — generated files fail silently.
+tokens: ~200
+---
+
+## EVALUATE
+
+Run for every file produced in GENERATE before it ships.
+
+---
+
+### Step 1: Five Quality Criteria [CE Pyramid]
+
+Check all five. Fail any one → fix first, ship second. No exceptions.
+
+| Criterion | Question | Pass condition |
+|-----------|----------|---------------|
+| Relevance | Does this context help with the current task? | Directly used in this task |
+| Sufficiency | Does it contain everything needed to act? | No missing steps or undefined terms |
+| Isolation | Does it stand alone without unstated dependencies? | Self-contained for its scope |
+| Economy | Can any word be removed without losing meaning? | Minimum words for maximum clarity |
+| Provenance | Is the source of this knowledge traceable? | Source referenced or derivation clear |
+
+Pass all 5 → proceed to TAG.
+Fail any 1 → fix it. Do not move forward until all 5 pass.
+
+---
+
+### Step 2: Pass-k Thresholds [DUCTILE]
+
+A single passing test proves nothing.
+
+- **k=3 dev gate**: Apply the generated skill to 3 different real inputs from the current project. k=3 catches prompt sensitivity.
+- **k=10 deploy gate**: Before promoting to shared-skills, apply to 10 inputs. k=10 catches edge-case failures that k=3 misses.
+
+Never mark a skill done after one successful application.
+Never promote to shared-skills without passing k=10.
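
The two gates can be expressed as one check parameterised by k. The source does not say how many of the k applications must succeed, so the sketch below assumes the strictest reading: at least k results are supplied and every one of them is a pass. `passes_gate` is a hypothetical name.

```bash
# Hypothetical pass-k gate; not part of the package.
# Assumption: a gate is cleared only when all supplied results are "pass"
# and at least K results were supplied.

# passes_gate K RESULT... -> exit 0 when the gate is cleared
passes_gate() {
  k=$1
  shift
  [ "$#" -ge "$k" ] || return 1          # too few inputs to count as a k-gate
  for result in "$@"; do
    [ "$result" = "pass" ] || return 1   # any single failure blocks the gate
  done
  return 0
}

if passes_gate 3 pass pass pass; then echo "dev gate: ok"; fi
if ! passes_gate 10 pass pass pass; then echo "deploy gate: not yet"; fi
```

Three passing runs clear the dev gate but not the deploy gate, which matches the rule that promotion to shared-skills needs k=10.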
+
+---
+
+### Step 3: TAG
+
+Classify every generated skill:
+
+**GENERAL skill** — applies across projects:
+- Works for any developer project, any writer project, etc.
+- Does not depend on this project's specific code or structure
+- Add frontmatter: `skill_type: general`
+- After passing k=10: copy to `~/shared-skills/` for reuse across projects
+- Add `discovered_in: {project}` to frontmatter before promoting
+
+**NARROW skill** — specific to this project:
+- Depends on this project's domain, stack, or structure
+- Do NOT promote to shared-skills
+- Stays in `.claude/commands/` only
+- Narrow skills never go to shared-skills, even if they seem reusable
+
+When uncertain: default to NARROW. Promote to GENERAL only with evidence across projects.
+
+---
+
+### Step 4: GOALS UPDATE
+
+After evaluating all generated files:
+1. Update `.claude/memory/goals.md` — mark completed items, add next actions
+2. Top 3 concrete next actions only — not a wish list
+3. Remove blockers that were resolved
+
+---
+
+### Step 5: SESSION SUMMARY
+
+Write `{date}-cycle-{N}.md` to `.claude/memory/sessions/`:
+```yaml
+---
+date: {ISO date}
+cycle: {N}
+topics: [list of what was detected, generated, evaluated]
+gaps_fixed: [list]
+skills_created: [list]
+promoted_to_shared: [list or none]
+---
+
+{2-3 sentence summary of what changed and why}
+```
@@ -0,0 +1,123 @@
+---
+name: evolution-generate
+description: >
+  Load after detect.md has produced a PLAN with specific gaps listed. Load when
+  about to write a new skill, agent, or capability file. Load when detect found
+  a stale doc, a missing capability, or a broken skill and you are about to fix it.
+  Do NOT load before detect — generate without a PLAN produces noise.
+tokens: ~250
+---
+
+## GENERATE
+
+Run for each item in the PLAN that passed DETECT.
+
+---
+
+### Step 1: Contract-First [DUCTILE]
+
+Before writing any skill or agent, define the contract:
+
+```
+Component: {name}
+
+Input: What context this component receives to do its job
+Output: What it produces in the success case
+Failures: What happens when input is wrong or incomplete
+```
+
+The contract is the acceptance test. If you cannot fill all three fields clearly,
+the component is under-specified. Do not generate it yet — clarify first.
+
+---
+
+### Step 2: Documentation-Quality-as-Signal [DUCTILE]
+
+Before generating, read any existing documentation for this capability:
+- < 2 sentences of docs = under-specified component → clarify before building
+- Contradictory docs = Confusion rot → classify and resolve first
+- No docs at all = treat as net-new, contract-first applies fully
+
+Sparse documentation is a signal, not a gap to fill with code.
+
+---
+
+### Step 3: Generate the File
+
+Write the capability file following the architecture rules:
+- ≤ 150 lines
+- YAML frontmatter required (name, description, tokens estimate)
+- One capability per file — if it's growing beyond 150 lines, split it
+- No file reads another file by default — explicit pointers only
+
+**Frontmatter format:**
+```yaml
+---
+name: {capability-name}
+description: >
+  One paragraph: what this does, when it fires, trigger words.
+tokens: ~{estimate}
+---
+```
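
Two of the architecture rules above are mechanical enough to check with a script: the 150-line limit and the three required frontmatter keys. The following is a minimal sketch under the assumption that frontmatter is the block between the first two `---` lines; `check_capability_file` is a hypothetical name and the package may validate files differently, if at all.

```bash
# Hypothetical validator for a generated capability file; not part of the package.

# check_capability_file PATH -> prints "ok" and exits 0 when the file has at most
# 150 lines and its frontmatter contains name, description, and tokens.
check_capability_file() {
  lines=$(wc -l < "$1" | tr -d ' ')
  [ "$lines" -le 150 ] || { echo "too long: $lines lines"; return 1; }
  [ "$(head -n 1 "$1")" = "---" ] || { echo "no frontmatter"; return 1; }
  # Collect the lines between the first and second '---' markers.
  front=$(awk '/^---$/ { n++; next } n == 1 { print } n >= 2 { exit }' "$1")
  for key in name description tokens; do
    echo "$front" | grep -q "^$key:" || { echo "missing frontmatter key: $key"; return 1; }
  done
  echo "ok"
}

demo_file=$(mktemp)
printf '%s\n' '---' 'name: demo' 'description: >' '  Demo capability.' \
  'tokens: ~50' '---' '' '## Demo' > "$demo_file"
check_capability_file "$demo_file"   # prints ok
rm -f "$demo_file"
```

The "one capability per file" and "explicit pointers only" rules need judgment and are deliberately left out of the check.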
+
+---
+
+### Step 4: Self-Applicability Check [QChunker]
+
+After writing the file, read it as if you are an unfamiliar agent seeing it for the first time:
+- Can an unfamiliar agent apply this immediately, without reading other files?
+- Is it self-contained for its scope?
+- Binary: **pass** or **fail**. No partial credit.
+
+If it fails: revise until it passes. Do not ship a file that fails self-applicability.
+
+---
+
+### Step 4.5: Pressure-Test Scenarios (enforcement skills only)
+
+If the skill enforces a process gate (completion, review, TDD, checkpoint):
+Load `shared/pressure-test.md` and add a `## Pressure Tests` section to the skill.
+
+Write one scenario per pressure type: time pressure, sunk cost, authority, false confidence.
+A skill that can be argued out of is not a skill — it's a suggestion.
+
+Skip this step only if the skill is purely guidance (vocabulary, patterns, reference material).
+
+---
+
+### Step 5: Consider Hook Generation [Hookify]
+
+If the detected friction is **behavioral** (user keeps correcting the same mistake,
+Claude keeps producing unwanted output), generate a hook instead of rewriting a prompt.
+
+**When to generate a hook instead of a skill/agent:**
+- Same correction appears 3+ times across sessions → PreToolUse or PostToolUse hook
+- Pattern is mechanical (e.g., "always run tests after edit") → PostToolUse hook
+- Pattern is preventive (e.g., "never use eval()") → PreToolUse hook with matcher
+
+**Hook generation template:**
+```json
+{
+  "hooks": {
+    "PreToolUse": [{
+      "matcher": "Edit|Write",
+      "hooks": [{"type": "command", "command": "bash .claude/hooks/check-pattern.sh"}]
+    }]
+  }
+}
+```
+
+**Rule:** Hooks are for mechanical enforcement. Skills are for guided workflows.
+If the fix requires reasoning, write a skill. If it requires pattern matching, write a hook.
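
The template points at `.claude/hooks/check-pattern.sh` without showing it. A preventive check for the "never use eval()" example might look like the sketch below. It assumes the hook receives the pending tool call as JSON on standard input and that exit status 2 tells Claude Code to block the call and surface stderr; both should be confirmed against the hooks documentation for the installed version. The function name `check_pattern` is hypothetical, and the match is a plain substring search rather than real JSON parsing.

```bash
# Hypothetical body for .claude/hooks/check-pattern.sh; not part of the package.
# Assumptions: the tool call arrives as JSON on stdin; exit status 2 means "block".

# check_pattern -> reads the hook payload from stdin, returns 2 if it contains "eval("
check_pattern() {
  if grep -qF 'eval(' ; then
    echo "Blocked: eval() is not allowed in this project" >&2
    return 2
  fi
  return 0
}

# Demo with a made-up payload; a real hook script would end with: check_pattern; exit $?
printf '%s' '{"tool_name":"Write","tool_input":{"content":"eval(x)"}}' \
  | check_pattern || echo "blocked with status $?"
```

A substring match is enough for a mechanical rule like this one; anything that needs to reason about the edit belongs in a skill, as the rule above says.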
+
+---
+
+### Step 6: Add to Manifest
+
+After generating a new capability file, add one row to `capabilities/manifest.md`:
+```
+| {file-path} | {trigger description} | ~{tokens} |
+```
+
+This is the ONLY change to existing files. Nothing else needs to change.
+The dispatch in CLAUDE.md already says "read manifest.md for unknown capabilities."
@@ -0,0 +1,77 @@
+---
+name: evolution-re-derivation
+description: >
+  Re-Derivation Protocol. Triggered when friction logs exceed threshold.
+  Not a patch — a full architectural rethink of a broken design.
+  Load when: friction logs >= 10 AND same pattern appears >= 5 times.
+tokens: ~150
+---
+
+## RE-DERIVATION PROTOCOL
+
+This fires when patching has stopped working. This is an architectural problem — not a content problem.
+Adding more rules to a broken architecture makes it worse, not better.
+
+### Trigger Threshold
+```bash
+ls ops/observations/ | wc -l
+grep -rl "{pattern}" ops/observations/ | wc -l
+```
+
+If friction logs ≥ 10 AND same pattern appears in ≥ 5 logs:
+→ Do NOT patch again.
+→ Do NOT add another rule.
+→ Run this protocol.
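
The trigger condition is a simple conjunction of the two counts. As a small sketch, with `should_rederive` as a hypothetical name and the counts passed in as plain integers:

```bash
# Hypothetical trigger check for the re-derivation protocol; not part of the package.

# should_rederive TOTAL_LOGS PATTERN_LOGS
# Exit 0 when there are at least 10 friction logs AND the pattern is in at least 5.
should_rederive() {
  [ "$1" -ge 10 ] && [ "$2" -ge 5 ]
}

if should_rederive 12 5; then
  echo "run re-derivation"
else
  echo "keep patching"
fi
```

Both thresholds must hold: five matching logs out of eight do not trigger the protocol, and neither do twelve logs with only four matches.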
+
+---
+
+### Step 1: Summarize Friction
+Read all matching friction logs. Write a 3-sentence summary:
+- What keeps failing
+- What has already been tried
+- What the current design assumed that turned out to be wrong
+
+---
+
+### Step 2: Identify the Pattern
+Name the architectural assumption that is failing.
+Not the symptom — the root assumption.
+
+Example: "We assumed agents would read their full instruction file. They don't — they skim."
+
+---
+
+### Step 3: Propose Architectural Change
+One specific change to the structure, not the content.
+Examples:
+- Split a monolithic file into micro-sections
+- Move a rule from agent instructions to CLAUDE.md (always-hot)
+- Replace agent-spawning with direct skill execution
+- Add a manifest entry instead of embedding logic in CLAUDE.md
+
+---
+
+### Step 4: Present to User Before Changing
+Write a Re-Derivation Proposal:
+```
+## Re-Derivation Proposal
+
+**Pattern**: {what keeps failing}
+**Root assumption that failed**: {what the design assumed}
+**Proposed change**: {one architectural change}
+**Files affected**: {list}
+**Expected result**: {what improves}
+```
+
+Do NOT implement until the user approves. Architectural changes affect everything.
+
+---
+
+### Step 5: Implement if Approved
+Make the change. Archive processed friction logs:
+```bash
+mkdir -p ops/observations/archive
+mv ops/observations/{matching-logs} ops/observations/archive/
+```
+
+Reset friction counter. The architectural change starts a new baseline.