clawpowers 1.0.1 → 1.1.0
- package/README.md +2 -2
- package/docs/launch-images/post1-hero-lobster.jpg +0 -0
- package/docs/launch-images/post2-dashboard.jpg +0 -0
- package/docs/launch-images/post3-superpowers.jpg +0 -0
- package/docs/launch-images/post4-before-after.jpg +0 -0
- package/docs/launch-images/post5-install-now.jpg +0 -0
- package/docs/launch-posts.md +76 -0
- package/package.json +1 -1
- package/skills/cross-project-knowledge/SKILL.md +345 -0
- package/skills/formal-verification-lite/SKILL.md +441 -0
- package/skills/meta-skill-evolution/SKILL.md +325 -0
- package/skills/self-healing-code/SKILL.md +369 -0
- package/skills/systematic-debugging/SKILL.md +76 -0
- package/skills/test-driven-development/SKILL.md +117 -0
- package/skills/using-clawpowers/SKILL.md +17 -5
|
@@ -0,0 +1,325 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: meta-skill-evolution
|
|
3
|
+
description: Recursive self-improvement (RSI) for the coding methodology itself. After every 50 completed tasks, analyze outcome patterns, identify the weakest skill, surgically improve it, and commit the evolution, so that agents improve their own methodology over time.
|
|
4
|
+
version: 1.0.0
|
|
5
|
+
requires:
|
|
6
|
+
tools: [bash, git, node]
|
|
7
|
+
runtime: true
|
|
8
|
+
metrics:
|
|
9
|
+
tracks: [evolutions_triggered, skills_improved, success_rate_delta, version_bumps, evolution_duration]
|
|
10
|
+
improves: [skill_selection_accuracy, weakest_skill_identification, surgical_edit_quality]
|
|
11
|
+
---
|
|
12
|
+
|
|
13
|
+
# Meta-Skill Evolution
|
|
14
|
+
|
|
15
|
+
## When to Use
|
|
16
|
+
|
|
17
|
+
Apply this skill when:
|
|
18
|
+
|
|
19
|
+
- The task counter reaches a multiple of 50 (tracked in `~/.clawpowers/state/task-counter.json`)
|
|
20
|
+
- A skill consistently shows < 70% success rate over the last 20 uses
|
|
21
|
+
- `runtime/feedback/analyze.sh` surfaces a skill with declining trend
|
|
22
|
+
- Bill explicitly requests "evolve the skills" or "improve methodology"
|
|
23
|
+
- A cluster of related task failures points to a methodology gap
|
|
24
|
+
|
|
25
|
+
**Skip when:**
|
|
26
|
+
- Fewer than 50 total tasks have been completed (insufficient signal)
|
|
27
|
+
- The runtime directory `~/.clawpowers/` doesn't exist (static mode)
|
|
28
|
+
- A previous evolution cycle completed within the last 10 tasks (cooling period)
|
|
29
|
+
|
|
30
|
+
**Decision tree:**
|
|
31
|
+
```
|
|
32
|
+
Has task counter hit a multiple of 50?
|
|
33
|
+
├── No → continue working; check counter at next task completion
|
|
34
|
+
└── Yes → Run evolution cycle
|
|
35
|
+
└── Does weakest skill have < 80% success rate?
|
|
36
|
+
├── No → log "all skills healthy", increment counter, skip
|
|
37
|
+
└── Yes → identify weakest section → surgical edit → version bump → commit
|
|
38
|
+
```
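
The decision tree above can also be read as a small pure function. The sketch below is one possible encoding, not the package's implementation: the name `evolutionDecision` and its parameters are assumptions, while the 50-task interval, the 10-task cooling period, and the 80% threshold come from the text. It also assumes "within the last 10 tasks" means fewer than 10 tasks since the previous evolution.

```javascript
// Hypothetical helper (not part of clawpowers): the "When to Use" decision tree
// as a pure function. Thresholds come from the skill text; names are illustrative.
const INTERVAL = 50;       // an evolution cycle is due every 50 completed tasks
const COOLDOWN = 10;       // skip if an evolution completed within the last 10 tasks
const HEALTHY_RATE = 0.8;  // the tree treats a weakest-skill rate of 80% or more as healthy

function evolutionDecision({ taskCount, lastEvolutionAt = null, weakestRate = null }) {
  if (taskCount < INTERVAL || taskCount % INTERVAL !== 0) return 'continue';
  if (lastEvolutionAt !== null && taskCount - lastEvolutionAt < COOLDOWN) return 'cooldown';
  if (weakestRate === null || weakestRate >= HEALTHY_RATE) return 'all-healthy';
  return 'evolve';
}

console.log(evolutionDecision({ taskCount: 100, lastEvolutionAt: 50, weakestRate: 0.6 })); // prints "evolve"
```

Keeping the decision free of file access makes it easy to check each branch in isolation before wiring it to the counter file.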
|
|
39
|
+
|
|
40
|
+
## Core Methodology
|
|
41
|
+
|
|
42
|
+
### Step 1: Trigger and Task Counter
|
|
43
|
+
|
|
44
|
+
Every completed task increments a persistent counter. After each task:
|
|
45
|
+
|
|
46
|
+
```bash
|
|
47
|
+
# Increment task counter
|
|
48
|
+
COUNTER_FILE=~/.clawpowers/state/task-counter.json
|
|
49
|
+
CURRENT=$(cat "$COUNTER_FILE" 2>/dev/null | node -e "console.log(JSON.parse(require('fs').readFileSync(0,'utf8')).count||0)" 2>/dev/null || echo 0)
|
|
50
|
+
NEXT=$((CURRENT + 1))
|
|
51
|
+
echo "{\"count\": $NEXT, \"last_updated\": \"$(date -u +%Y-%m-%dT%H:%M:%SZ)\"}" > "$COUNTER_FILE"
|
|
52
|
+
|
|
53
|
+
# Check if evolution cycle is due
|
|
54
|
+
if (( NEXT % 50 == 0 )); then
|
|
55
|
+
echo "Evolution cycle triggered at task $NEXT"
|
|
56
|
+
# → proceed to Step 2
|
|
57
|
+
fi
|
|
58
|
+
```
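
The counter update in the shell block above can be expressed as a pure function, which makes the fallback for a missing or corrupt file explicit. This is a minimal sketch under assumptions: `nextCounter` is a hypothetical name, and the state shape (`count`, `last_updated`) simply mirrors the JSON the shell snippet writes.

```javascript
// Hypothetical helper: compute the next counter state from the raw file contents.
// The shape { count, last_updated } mirrors what the shell snippet writes.
function nextCounter(rawJson, now = new Date()) {
  let count = 0;
  try {
    const parsed = JSON.parse(rawJson);
    if (Number.isInteger(parsed.count) && parsed.count >= 0) count = parsed.count;
  } catch {
    // Missing, empty, or corrupt file: start from zero, like the shell fallback.
  }
  const next = count + 1;
  return {
    // Drop milliseconds so the timestamp matches the shell's %Y-%m-%dT%H:%M:%SZ format.
    state: { count: next, last_updated: now.toISOString().replace(/\.\d{3}Z$/, 'Z') },
    evolutionDue: next % 50 === 0,
  };
}
```

A caller would read the file, pass its text to `nextCounter`, write `state` back, and proceed to Step 2 when `evolutionDue` is true.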
|
|
59
|
+
|
|
60
|
+
**Recording task completion (run this after every task):**
|
|
61
|
+
```bash
|
|
62
|
+
bash runtime/metrics/collector.sh record \
|
|
63
|
+
--skill <active-skill-name> \
|
|
64
|
+
--outcome success|failure \
|
|
65
|
+
--duration <seconds> \
|
|
66
|
+
--notes "<brief description>"
|
|
67
|
+
```
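
The analysis scripts in the later steps read the fields `skill`, `outcome`, `duration`, `notes`, and `timestamp` from each line of `outcomes.jsonl`. The exact record that `collector.sh` writes is not shown here, so the sketch below is an assumption about a compatible shape; `buildOutcomeRecord` is a hypothetical name.

```javascript
// Hypothetical record builder: field names mirror what the analysis scripts read.
// The real collector.sh output may differ.
function buildOutcomeRecord({ skill, outcome, duration, notes = '' }, now = new Date()) {
  if (outcome !== 'success' && outcome !== 'failure') {
    throw new Error(`outcome must be "success" or "failure", got "${outcome}"`);
  }
  // JSON.stringify escapes quotes and newlines, so each record stays on one JSONL line.
  return JSON.stringify({
    timestamp: now.toISOString().replace(/\.\d{3}Z$/, 'Z'),
    skill,
    outcome,
    duration: Number(duration),
    notes,
  });
}
```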
|
|
68
|
+
|
|
69
|
+
### Step 2: Outcome Pattern Analysis
|
|
70
|
+
|
|
71
|
+
Pull the last 50 task records and compute per-skill success rates:
|
|
72
|
+
|
|
73
|
+
```bash
|
|
74
|
+
# Analyze outcomes for the last 50 tasks
|
|
75
|
+
METRICS_FILE=~/.clawpowers/metrics/outcomes.jsonl
|
|
76
|
+
|
|
77
|
+
# Per-skill success rate (requires node)
|
|
78
|
+
node - <<'EOF'
|
|
79
|
+
const fs = require('fs');
|
|
80
|
+
const lines = fs.readFileSync(process.env.HOME + '/.clawpowers/metrics/outcomes.jsonl', 'utf8')
|
|
81
|
+
.trim().split('\n').filter(Boolean).slice(-50)
|
|
82
|
+
.map(l => JSON.parse(l));
|
|
83
|
+
|
|
84
|
+
const stats = {};
|
|
85
|
+
for (const rec of lines) {
|
|
86
|
+
const s = rec.skill || 'unknown';
|
|
87
|
+
if (!stats[s]) stats[s] = { success: 0, failure: 0, durations: [] };
|
|
88
|
+
stats[s][rec.outcome === 'success' ? 'success' : 'failure']++;
|
|
89
|
+
if (rec.duration) stats[s].durations.push(rec.duration);
|
|
90
|
+
}
|
|
91
|
+
|
|
92
|
+
const report = Object.entries(stats).map(([skill, d]) => {
|
|
93
|
+
const total = d.success + d.failure;
|
|
94
|
+
const rate = total > 0 ? (d.success / total) : null;
|
|
95
|
+
const avgDuration = d.durations.length > 0
|
|
96
|
+
? Math.round(d.durations.reduce((a,b)=>a+b,0) / d.durations.length)
|
|
97
|
+
: null;
|
|
98
|
+
return { skill, success_rate: rate, total_tasks: total, avg_duration_s: avgDuration };
|
|
99
|
+
}).sort((a, b) => (a.success_rate ?? 1) - (b.success_rate ?? 1));
|
|
100
|
+
|
|
101
|
+
console.log(JSON.stringify(report, null, 2));
|
|
102
|
+
EOF
|
|
103
|
+
```
|
|
104
|
+
|
|
105
|
+
**What to look for:**
|
|
106
|
+
- Lowest `success_rate` → weakest skill candidate
|
|
107
|
+
- Rising `avg_duration_s` → methodology is too slow or unclear
|
|
108
|
+
- High failure count on a single skill → systemic gap, not random noise
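
The report above gives a single average duration per skill, which cannot show whether durations are rising. One rough way to check, sketched here under assumptions, is to compare the older half of a skill's records with the newer half. `durationTrend` is a hypothetical helper, and it assumes records are in chronological order, as an append-only JSONL file would be.

```javascript
// Hypothetical helper: ratio of mean duration in the newer half of a skill's
// records to the older half. A value above 1 means tasks are getting slower.
function durationTrend(records, skill) {
  const durations = records
    .filter(r => r.skill === skill && typeof r.duration === 'number')
    .map(r => r.duration);
  if (durations.length < 4) return null; // too few points to split meaningfully
  const mid = Math.floor(durations.length / 2);
  const mean = xs => xs.reduce((a, b) => a + b, 0) / xs.length;
  const older = mean(durations.slice(0, mid));
  const newer = mean(durations.slice(mid));
  return older > 0 ? newer / older : null;
}
```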
|
|
109
|
+
|
|
110
|
+
### Step 3: Identify the Weakest Skill
|
|
111
|
+
|
|
112
|
+
```bash
|
|
113
|
+
# Get weakest skill (lowest success rate with ≥ 3 data points)
|
|
114
|
+
WEAKEST=$(node - <<'EOF'
|
|
115
|
+
const fs = require('fs');
|
|
116
|
+
const lines = fs.readFileSync(process.env.HOME + '/.clawpowers/metrics/outcomes.jsonl', 'utf8')
|
|
117
|
+
.trim().split('\n').filter(Boolean).slice(-50).map(l => JSON.parse(l));
|
|
118
|
+
const stats = {};
|
|
119
|
+
for (const rec of lines) {
|
|
120
|
+
const s = rec.skill || 'unknown';
|
|
121
|
+
if (!stats[s]) stats[s] = { success: 0, failure: 0 };
|
|
122
|
+
stats[s][rec.outcome === 'success' ? 'success' : 'failure']++;
|
|
123
|
+
}
|
|
124
|
+
const ranked = Object.entries(stats)
|
|
125
|
+
.filter(([_, d]) => (d.success + d.failure) >= 3)
|
|
126
|
+
.map(([skill, d]) => ({ skill, rate: d.success / (d.success + d.failure) }))
|
|
127
|
+
.sort((a, b) => a.rate - b.rate);
|
|
128
|
+
console.log(ranked[0]?.skill || '');
|
|
129
|
+
EOF
|
|
130
|
+
)
|
|
131
|
+
|
|
132
|
+
echo "Weakest skill: $WEAKEST"
|
|
133
|
+
|
|
134
|
+
# Read the skill file
|
|
135
|
+
SKILL_FILE="skills/$WEAKEST/SKILL.md"
|
|
136
|
+
if [[ ! -f "$SKILL_FILE" ]]; then
|
|
137
|
+
echo "Skill file not found: $SKILL_FILE — skipping evolution"
|
|
138
|
+
exit 0
|
|
139
|
+
fi
|
|
140
|
+
```
|
|
141
|
+
|
|
142
|
+
### Step 4: Diagnose the Specific Weakness
|
|
143
|
+
|
|
144
|
+
Before editing, analyze *why* the skill is failing. Read the failure notes:
|
|
145
|
+
|
|
146
|
+
```bash
|
|
147
|
+
# Extract failure notes for the weakest skill
|
|
148
|
+
node - <<EOF
|
|
149
|
+
const fs = require('fs');
|
|
150
|
+
const skill = '$WEAKEST';
|
|
151
|
+
const lines = fs.readFileSync(process.env.HOME + '/.clawpowers/metrics/outcomes.jsonl', 'utf8')
|
|
152
|
+
.trim().split('\n').filter(Boolean).slice(-50)
|
|
153
|
+
.map(l => JSON.parse(l))
|
|
154
|
+
.filter(r => r.skill === skill && r.outcome === 'failure' && r.notes);
|
|
155
|
+
lines.forEach(r => console.log(r.timestamp, '|', r.notes));
|
|
156
|
+
EOF
|
|
157
|
+
```
|
|
158
|
+
|
|
159
|
+
**Diagnosis patterns:**
|
|
160
|
+
|
|
161
|
+
| Failure note pattern | Likely weak section | Fix strategy |
|
|
162
|
+
|---------------------|-------------------|-------------|
|
|
163
|
+
| "step X was unclear" | Core Methodology step X | Add concrete example, remove ambiguity |
|
|
164
|
+
| "forgot to check Y" | Anti-Patterns table | Add the missed check as an explicit anti-pattern |
|
|
165
|
+
| "didn't know when to apply" | When to Use decision tree | Sharpen the decision tree with new branch |
|
|
166
|
+
| "ClawPowers commands failed" | ClawPowers Enhancement | Fix command syntax or add error handling |
|
|
167
|
+
| "took too long on Z" | Core Methodology step Z | Add shortcut or restructure step ordering |
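
The table above can be turned into a first-pass classifier for failure notes. This is only a sketch: the regular expressions are guesses at how notes are phrased, and `diagnose` is a hypothetical name. The sections and fix strategies are taken from the table.

```javascript
// Hypothetical classifier: patterns are illustrative; sections and strategies
// are copied from the diagnosis table.
const DIAGNOSIS_RULES = [
  { pattern: /unclear/i, section: 'Core Methodology', fix: 'Add concrete example, remove ambiguity' },
  { pattern: /forgot to check/i, section: 'Anti-Patterns', fix: 'Add the missed check as an explicit anti-pattern' },
  { pattern: /when to apply/i, section: 'When to Use', fix: 'Sharpen the decision tree with new branch' },
  { pattern: /commands? failed/i, section: 'ClawPowers Enhancement', fix: 'Fix command syntax or add error handling' },
  { pattern: /took too long/i, section: 'Core Methodology', fix: 'Add shortcut or restructure step ordering' },
];

function diagnose(note) {
  const rule = DIAGNOSIS_RULES.find(r => r.pattern.test(note));
  return rule ? { section: rule.section, fix: rule.fix } : null;
}
```

Notes that match no rule return `null` and still need to be read by hand; the classifier only narrows down where to look first.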
|
|
168
|
+
|
|
169
|
+
### Step 5: Surgical Edit (Not Wholesale Replacement)
|
|
170
|
+
|
|
171
|
+
**Critical rule:** Edit specific sections, not the entire file. Wholesale rewrites lose working methodology.
|
|
172
|
+
|
|
173
|
+
```bash
|
|
174
|
+
# Read the current skill version
|
|
175
|
+
CURRENT_VERSION=$(grep '^version:' "$SKILL_FILE" | head -1 | awk '{print $2}' | tr -d '"')
|
|
176
|
+
MAJOR=$(echo $CURRENT_VERSION | cut -d. -f1)
|
|
177
|
+
MINOR=$(echo $CURRENT_VERSION | cut -d. -f2)
|
|
178
|
+
PATCH=$(echo $CURRENT_VERSION | cut -d. -f3)
|
|
179
|
+
NEW_VERSION="$MAJOR.$MINOR.$((PATCH + 1))"
|
|
180
|
+
|
|
181
|
+
echo "Evolving $WEAKEST from v$CURRENT_VERSION → v$NEW_VERSION"
|
|
182
|
+
```
|
|
183
|
+
|
|
184
|
+
**Surgical edit guidelines:**
|
|
185
|
+
- If the "When to Use" decision tree is wrong → edit only that block
|
|
186
|
+
- If a Core Methodology step is incomplete → add one concrete example under that step
|
|
187
|
+
- If an Anti-Pattern is missing → append one row to the table
|
|
188
|
+
- If ClawPowers commands are broken → fix only the broken command block
|
|
189
|
+
- Never touch sections that aren't implicated in the failures
|
|
190
|
+
- Max lines changed per evolution cycle: 30 (forces focus)
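
The 30-line budget in the guidelines above can be checked mechanically before committing. The sketch below parses the text printed by `git diff --numstat`, whose rows have the form `<added>\t<deleted>\t<path>`. Counting added plus deleted lines is an assumption about what "lines changed" means (a modified line then counts twice), and the function names are hypothetical.

```javascript
// Hypothetical guard for the 30-line budget. Input is the text printed by
// `git diff --numstat`: one "<added>\t<deleted>\t<path>" row per file.
const MAX_CHANGED_LINES = 30;

function changedLines(numstat) {
  return numstat.split('\n').filter(Boolean).reduce((sum, row) => {
    const [added, deleted] = row.split('\t');
    // Binary files report "-" for both counts; treat them as zero lines.
    return sum + (Number(added) || 0) + (Number(deleted) || 0);
  }, 0);
}

function withinBudget(numstat) {
  return changedLines(numstat) <= MAX_CHANGED_LINES;
}
```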
|
|
191
|
+
|
|
192
|
+
**Apply the edit and bump version:**
|
|
193
|
+
```bash
|
|
194
|
+
# After making the targeted edit in SKILL_FILE:
|
|
195
|
+
sed -i "s/^version: $CURRENT_VERSION/version: $NEW_VERSION/" "$SKILL_FILE"
|
|
196
|
+
```
|
|
197
|
+
|
|
198
|
+
### Step 6: Commit the Evolution
|
|
199
|
+
|
|
200
|
+
```bash
|
|
201
|
+
# Stage and commit
|
|
202
|
+
git add "$SKILL_FILE"
|
|
203
|
+
git commit -m "skill-evolution: $WEAKEST v$CURRENT_VERSION → v$NEW_VERSION
|
|
204
|
+
|
|
205
|
+
Triggered at task $TASK_COUNT. Success rate before: $RATE (as a fraction of 1).
|
|
206
|
+
Section edited: $SECTION_EDITED
|
|
207
|
+
Root cause: $ROOT_CAUSE
|
|
208
|
+
|
|
209
|
+
[meta-skill-evolution]"
|
|
210
|
+
|
|
211
|
+
# Copy the evolved skill to ~/.clawpowers/skills/ if that directory exists
|
|
212
|
+
MANAGED_SKILLS_DIR=~/.clawpowers/skills
|
|
213
|
+
if [[ -d "$MANAGED_SKILLS_DIR" ]]; then
|
|
214
|
+
mkdir -p "$MANAGED_SKILLS_DIR/$WEAKEST"
|
|
215
|
+
cp "$SKILL_FILE" "$MANAGED_SKILLS_DIR/$WEAKEST/SKILL.md"
|
|
216
|
+
fi
|
|
217
|
+
```
|
|
218
|
+
|
|
219
|
+
### Step 7: Log Evolution History
|
|
220
|
+
|
|
221
|
+
Every evolution is appended to a persistent log:
|
|
222
|
+
|
|
223
|
+
```bash
|
|
224
|
+
EVOLUTION_LOG=~/.clawpowers/feedback/evolution-log.jsonl
|
|
225
|
+
mkdir -p "$(dirname "$EVOLUTION_LOG")"
|
|
226
|
+
|
|
227
|
+
cat >> "$EVOLUTION_LOG" <<EOF
|
|
228
|
+
{"timestamp":"$(date -u +%Y-%m-%dT%H:%M:%SZ)","task_count":$TASK_COUNT,"skill":"$WEAKEST","version_from":"$CURRENT_VERSION","version_to":"$NEW_VERSION","success_rate_before":$RATE,"section_edited":"$SECTION_EDITED","root_cause":"$ROOT_CAUSE","commit":"$(git rev-parse --short HEAD)"}
|
|
229
|
+
EOF
|
|
230
|
+
```
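
The heredoc above interpolates shell variables directly into a JSON string, so a double quote or newline in `$ROOT_CAUSE` or `$SECTION_EDITED` would produce an invalid log line. One way around that, sketched here under assumptions, is to let `JSON.stringify` do the escaping. `buildEvolutionEntry` and its camelCase parameter names are hypothetical; the output keys match the log line above.

```javascript
// Hypothetical builder for one evolution-log.jsonl line. Output keys match the
// heredoc in Step 7; JSON.stringify handles quoting of free-text fields.
function buildEvolutionEntry(f, now = new Date()) {
  return JSON.stringify({
    timestamp: now.toISOString().replace(/\.\d{3}Z$/, 'Z'),
    task_count: f.taskCount,
    skill: f.skill,
    version_from: f.versionFrom,
    version_to: f.versionTo,
    success_rate_before: f.rateBefore,
    section_edited: f.sectionEdited,
    root_cause: f.rootCause,
    commit: f.commit,
  });
}
```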
|
|
231
|
+
|
|
232
|
+
**Review evolution history:**
|
|
233
|
+
```bash
|
|
234
|
+
# See all past evolutions
|
|
235
|
+
cat ~/.clawpowers/feedback/evolution-log.jsonl | node -e "
|
|
236
|
+
const lines = require('fs').readFileSync('/dev/stdin','utf8').trim().split('\n').filter(Boolean).map(l => JSON.parse(l));
|
|
237
|
+
lines.forEach(e => console.log(e.timestamp.slice(0,10), e.skill, e.version_from, '→', e.version_to, 'rate:', (e.success_rate_before*100).toFixed(0)+'%'));
|
|
238
|
+
"
|
|
239
|
+
|
|
240
|
+
# Check if an evolution helped (compare rate before vs after)
|
|
241
|
+
# Re-run outcome analysis after 10 more tasks to measure improvement
|
|
242
|
+
```
|
|
243
|
+
|
|
244
|
+
### Step 8: Validate the Evolution
|
|
245
|
+
|
|
246
|
+
After 10 more tasks using the evolved skill, check if the success rate improved:
|
|
247
|
+
|
|
248
|
+
```bash
|
|
249
|
+
# Post-evolution check (run after 10+ tasks)
|
|
250
|
+
NEW_RATE=$(node -e "
|
|
251
|
+
const fs = require('fs');
|
|
252
|
+
const lines = fs.readFileSync(process.env.HOME + '/.clawpowers/metrics/outcomes.jsonl','utf8')
|
|
253
|
+
.trim().split('\n').filter(Boolean)
|
|
254
|
+
.map(l => JSON.parse(l))
|
|
255
|
+
.filter(r => r.skill === '$WEAKEST').slice(-10); // last 10 uses of this skill, not the last 10 tasks overall
|
|
256
|
+
const success = lines.filter(r => r.outcome === 'success').length;
|
|
257
|
+
console.log((success/lines.length).toFixed(2));
|
|
258
|
+
")
|
|
259
|
+
echo "Post-evolution success rate for $WEAKEST: $NEW_RATE"
|
|
260
|
+
|
|
261
|
+
# If rate dropped: revert the evolution
|
|
262
|
+
if node -e "process.exit(parseFloat('$NEW_RATE') < parseFloat('$RATE') ? 1 : 0)"; then
|
|
263
|
+
echo "Evolution improved the skill. Rate: $RATE → $NEW_RATE"
|
|
264
|
+
else
|
|
265
|
+
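echo "WARNING: Evolution did not help. Reverting the evolution commit."
|
|
266
|
+
git revert --no-edit "$(tail -n 1 ~/.clawpowers/feedback/evolution-log.jsonl | node -e "console.log(JSON.parse(require('fs').readFileSync(0,'utf8')).commit)")"  # revert the most recently logged evolution commit; HEAD may have moved since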
|
|
267
|
+
fi
|
|
268
|
+
```
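
The validation step has three possible outcomes, not two: the rate held or improved, the rate dropped, or there is not yet enough data to tell. A sketch of that decision as a pure function follows. `validateEvolution` is a hypothetical name, and it assumes the records passed in are chronological and that the skill has been used at least `window` times since the evolution.

```javascript
// Hypothetical validator: compares the post-evolution success rate of one skill
// against the rate recorded before the evolution.
function validateEvolution(records, skill, rateBefore, window = 10) {
  const recent = records.filter(r => r.skill === skill).slice(-window);
  if (recent.length < window) return { verdict: 'insufficient-data', rateAfter: null };
  const rateAfter = recent.filter(r => r.outcome === 'success').length / recent.length;
  return { verdict: rateAfter < rateBefore ? 'revert' : 'keep', rateAfter };
}
```

Returning `insufficient-data` avoids dividing by zero and avoids reverting (or keeping) an evolution on the basis of one or two tasks.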
|
|
269
|
+
|
|
270
|
+
## ClawPowers Enhancement
|
|
271
|
+
|
|
272
|
+
When `~/.clawpowers/` runtime is initialized:
|
|
273
|
+
|
|
274
|
+
**Full evolution pipeline:**
|
|
275
|
+
|
|
276
|
+
```bash
|
|
277
|
+
# Store evolution state for resumability
|
|
278
|
+
bash runtime/persistence/store.sh set "meta-evolution:current:task_count" "$TASK_COUNT"
|
|
279
|
+
bash runtime/persistence/store.sh set "meta-evolution:current:weakest_skill" "$WEAKEST"
|
|
280
|
+
bash runtime/persistence/store.sh set "meta-evolution:current:phase" "diagnosis|editing|committed|validated"
|
|
281
|
+
|
|
282
|
+
# Record the evolution outcome
|
|
283
|
+
bash runtime/metrics/collector.sh record \
|
|
284
|
+
--skill meta-skill-evolution \
|
|
285
|
+
--outcome success \
|
|
286
|
+
--duration "$DURATION" \
|
|
287
|
+
--notes "$WEAKEST v$CURRENT_VERSION→v$NEW_VERSION rate:$RATE→$NEW_RATE"
|
|
288
|
+
```
|
|
289
|
+
|
|
290
|
+
**Analyze evolution effectiveness over time:**
|
|
291
|
+
|
|
292
|
+
```bash
|
|
293
|
+
bash runtime/feedback/analyze.sh --filter meta-skill-evolution
|
|
294
|
+
# Shows: how many evolutions triggered, average rate improvement per evolution,
|
|
295
|
+
# which skills have been evolved most, correlation between evolution and task success
|
|
296
|
+
```
|
|
297
|
+
|
|
298
|
+
**Track cumulative improvement:**
|
|
299
|
+
```bash
|
|
300
|
+
# Evolution impact report
|
|
301
|
+
cat ~/.clawpowers/feedback/evolution-log.jsonl | node -e "
|
|
302
|
+
const lines = require('fs').readFileSync('/dev/stdin','utf8').trim().split('\n').filter(Boolean).map(l => JSON.parse(l));
|
|
303
|
+
const bySkill = {};
|
|
304
|
+
lines.forEach(e => {
|
|
305
|
+
if (!bySkill[e.skill]) bySkill[e.skill] = [];
|
|
306
|
+
bySkill[e.skill].push(e);
|
|
307
|
+
});
|
|
308
|
+
Object.entries(bySkill).forEach(([skill, evos]) => {
|
|
309
|
+
console.log(skill + ': ' + evos.length + ' evolutions, versions: ' + evos.map(e=>e.version_to).join(', '));
|
|
310
|
+
});
|
|
311
|
+
"
|
|
312
|
+
```
|
|
313
|
+
|
|
314
|
+
## Anti-Patterns
|
|
315
|
+
|
|
316
|
+
| Anti-Pattern | Why It Fails | Correct Approach |
|
|
317
|
+
|-------------|-------------|-----------------|
|
|
318
|
+
| Rewrite the whole skill | Destroys working methodology, no signal on what improved | Surgical edits only — max 30 lines changed |
|
|
319
|
+
| Evolve based on < 3 data points | Statistical noise triggers false evolution | Require ≥ 3 uses before a skill is eligible |
|
|
320
|
+
| Evolve during the cooling period | Too-frequent changes create instability | Enforce 10-task cooldown between evolutions |
|
|
321
|
+
| Skip the validation step | Bad evolutions compound over time | Always measure rate before vs after |
|
|
322
|
+
| Edit non-implicated sections | Changes unrelated things, pollutes signal | Only edit sections linked to failure notes |
|
|
323
|
+
| Forget to bump version | Can't track evolution history | Version bump is mandatory before commit |
|
|
324
|
+
| No evolution log entry | History is lost; can't audit what improved | Always append to evolution-log.jsonl |
|
|
325
|
+
| Evolve the meta-skill-evolution skill first | Circular improvement without baseline | Evolve leaf skills first; evolve this skill only after 5+ other evolutions |
|
|
@@ -0,0 +1,369 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: self-healing-code
|
|
3
|
+
description: On test failure, automatically capture the failure, run hypothesis-driven debugging, generate ≥2 candidate patches, apply and measure each, auto-commit the winner or escalate with full context. Max 3 iteration cycles with coverage guard.
|
|
4
|
+
version: 1.0.0
|
|
5
|
+
requires:
|
|
6
|
+
tools: [bash, git]
|
|
7
|
+
runtime: true
|
|
8
|
+
metrics:
|
|
9
|
+
tracks: [healing_attempts, auto_commits, escalations, patches_generated, coverage_delta, cycles_used]
|
|
10
|
+
improves: [patch_quality, hypothesis_accuracy, escalation_context_completeness]
|
|
11
|
+
---
|
|
12
|
+
|
|
13
|
+
# Self-Healing Code
|
|
14
|
+
|
|
15
|
+
## When to Use
|
|
16
|
+
|
|
17
|
+
Apply this skill when:
|
|
18
|
+
|
|
19
|
+
- A CI run or local test suite produces a failure
|
|
20
|
+
- A previously green test suite goes red after a code change
|
|
21
|
+
- An automated pipeline fails and needs remediation without human intervention
|
|
22
|
+
- Bill runs tests and the output contains `FAILED`, `ERROR`, or non-zero exit code
|
|
23
|
+
|
|
24
|
+
**Skip when:**
|
|
25
|
+
- Tests fail because of a missing environment variable or missing external service (that's a configuration issue, not a code defect)
|
|
26
|
+
- The failure is a flaky test known to fail intermittently — check `~/.clawpowers/state/known-flaky.json` first
|
|
27
|
+
- A previous healing cycle for this exact error is already in progress (check `~/.clawpowers/state/healing-lock.json`)
|
|
28
|
+
|
|
29
|
+
**Decision tree:**
|
|
30
|
+
```
|
|
31
|
+
Did the test suite produce a failure?
|
|
32
|
+
├── No → no action
|
|
33
|
+
└── Yes → Is this a known flaky test?
|
|
34
|
+
├── Yes → skip, add flaky annotation, report
|
|
35
|
+
└── No → Is a healing cycle already running for this error?
|
|
36
|
+
├── Yes → wait for completion or check lock age
|
|
37
|
+
└── No → self-healing-code ← YOU ARE HERE
|
|
38
|
+
```
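
The decision tree above can be sketched as a pure function. The tree says to "check lock age" without giving a limit, so the 30-minute staleness threshold below is purely an assumption, as are the name `healingDecision` and its parameters.

```javascript
// Hypothetical helper: the self-healing "When to Use" tree as a pure function.
// STALE_LOCK_MS is an assumed value; the skill text gives no lock-age limit.
const STALE_LOCK_MS = 30 * 60 * 1000;

function healingDecision({ failed, knownFlaky = false, lockAgeMs = null }) {
  if (!failed) return 'no-action';
  if (knownFlaky) return 'skip-flaky';
  // A fresh lock means another healing cycle owns this error; a stale one is ignored.
  if (lockAgeMs !== null && lockAgeMs < STALE_LOCK_MS) return 'wait';
  return 'heal';
}
```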
|
|
39
|
+
|
|
40
|
+
## Core Methodology
|
|
41
|
+
|
|
42
|
+
### Guardrails (enforce before any healing action)
|
|
43
|
+
|
|
44
|
+
```bash
|
|
45
|
+
# Max cycles — never exceed 3 healing iterations per error
|
|
46
|
+
MAX_CYCLES=3
|
|
47
|
+
HEALING_STATE=~/.clawpowers/state/healing-$ERROR_SIG.json  # ERROR_SIG is the hash computed in Step 1; hashing it again would not match the cleanup glob in Step 5
|
|
48
|
+
CURRENT_CYCLE=$(cat "$HEALING_STATE" 2>/dev/null | node -e "console.log(JSON.parse(require('fs').readFileSync(0,'utf8')).cycle||0)" 2>/dev/null || echo 0)
|
|
49
|
+
|
|
50
|
+
if (( CURRENT_CYCLE >= MAX_CYCLES )); then
|
|
51
|
+
echo "Max cycles ($MAX_CYCLES) reached. Escalating."
|
|
52
|
+
# → go to Step 7: Escalation Package
|
|
53
|
+
fi
|
|
54
|
+
|
|
55
|
+
# Coverage guard — baseline before any patch
|
|
56
|
+
COVERAGE_BASELINE=$(bash runtime/persistence/store.sh get "coverage:baseline:$PROJECT" 2>/dev/null || echo "0")
|
|
57
|
+
```
|
|
58
|
+
|
|
59
|
+
### Step 1: Capture the Failure
|
|
60
|
+
|
|
61
|
+
Collect everything needed to understand and reproduce the failure:
|
|
62
|
+
|
|
63
|
+
```bash
|
|
64
|
+
# Run tests and capture full output
|
|
65
|
+
EXIT_CODE=0
|
|
66
|
+
TEST_OUTPUT=$(bash -c "$TEST_CMD 2>&1") || EXIT_CODE=$?  # capture the real status; a trailing `|| true` would reset $? to 0
|
|
67
|
+
|
|
68
|
+
# Extract structured fields
|
|
69
|
+
TEST_NAME=$(echo "$TEST_OUTPUT" | grep -E "^(FAILED|FAIL|Error in)" | head -1)
|
|
70
|
+
ERROR_MSG=$(echo "$TEST_OUTPUT" | grep -A5 "AssertionError\|Error:\|Exception:" | head -10)
|
|
71
|
+
STACK_TRACE=$(echo "$TEST_OUTPUT" | grep -A20 "Traceback\|at [A-Za-z]" | head -30)
|
|
72
|
+
|
|
73
|
+
# Diff from last green commit
|
|
74
|
+
LAST_GREEN=$(bash runtime/persistence/store.sh get "last-green:$PROJECT" 2>/dev/null || git log --oneline | grep -i "green\|pass\|ci:" | head -1 | awk '{print $1}')
|
|
75
|
+
DIFF_FROM_GREEN=""
|
|
76
|
+
if [[ -n "$LAST_GREEN" ]]; then
|
|
77
|
+
DIFF_FROM_GREEN=$(git diff "$LAST_GREEN" HEAD -- . 2>/dev/null | head -200)
|
|
78
|
+
fi
|
|
79
|
+
|
|
80
|
+
# Error signature hash (for dedup and state tracking)
|
|
81
|
+
ERROR_SIG=$(echo "${TEST_NAME}${ERROR_MSG}" | (md5sum 2>/dev/null || md5) | awk '{print $1}')  # md5sum on Linux, md5 on macOS
|
|
82
|
+
|
|
83
|
+
# Log the capture
|
|
84
|
+
CAPTURE_RECORD=~/.clawpowers/state/healing-$ERROR_SIG-capture.json
|
|
85
|
+
cat > "$CAPTURE_RECORD" <<EOF
|
|
86
|
+
{
|
|
87
|
+
"timestamp": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
|
|
88
|
+
"test_name": $(echo "$TEST_NAME" | node -e "process.stdin.on('data',d=>console.log(JSON.stringify(d.toString().trim())))"),
|
|
89
|
+
"error_msg": $(echo "$ERROR_MSG" | node -e "process.stdin.on('data',d=>console.log(JSON.stringify(d.toString().trim())))"),
|
|
90
|
+
"exit_code": $EXIT_CODE,
|
|
91
|
+
"last_green_commit": "$LAST_GREEN",
|
|
92
|
+
"error_signature": "$ERROR_SIG"
|
|
93
|
+
}
|
|
94
|
+
EOF
|
|
95
|
+
```
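
An error signature is only useful for dedup if the same failure hashes to the same value on every run. Error text often contains volatile details such as memory addresses, so one option is to normalize before hashing. The sketch below is an assumption, not what the package does: `errorSignature` is a hypothetical name, and its output will not equal the shell hash above, which hashes the raw text plus a trailing newline.

```javascript
// Hypothetical signature helper using Node's built-in crypto module.
const crypto = require('crypto');

function errorSignature(testName, errorMsg) {
  const normalized = `${testName}${errorMsg}`
    .replace(/0x[0-9a-fA-F]+/g, '0xADDR') // memory addresses change between runs
    .replace(/\s+/g, ' ')                 // collapse whitespace differences
    .trim();
  return crypto.createHash('md5').update(normalized).digest('hex');
}
```

MD5 is used here only as a short fingerprint, matching the shell snippet; it is not a security measure.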
|
|
96
|
+
|
|
97
|
+
**Capture checklist:**
|
|
98
|
+
- [ ] Test name (exact test identifier)
|
|
99
|
+
- [ ] Full error message (not truncated)
|
|
100
|
+
- [ ] Stack trace (full, not just last frame)
|
|
101
|
+
- [ ] Diff from last green commit
|
|
102
|
+
- [ ] Environment snapshot (language version, key deps)
|
|
103
|
+
|
|
104
|
+
### Step 2: Hypothesis Tree (Systematic Debugging Integration)
|
|
105
|
+
|
|
106
|
+
Apply the `systematic-debugging` methodology to form ranked hypotheses. This is not optional — random patches without hypotheses produce random results.
|
|
107
|
+
|
|
108
|
+
```bash
|
|
109
|
+
# Check persistent hypothesis memory first (see systematic-debugging enhancement)
|
|
110
|
+
KNOWN_HYP=$(bash runtime/persistence/store.sh get "debug:hypothesis:$ERROR_SIG" 2>/dev/null)
|
|
111
|
+
if [[ -n "$KNOWN_HYP" ]]; then
|
|
112
|
+
echo "Known error pattern found. Starting with previously successful hypothesis."
|
|
113
|
+
echo "$KNOWN_HYP"
|
|
114
|
+
fi
|
|
115
|
+
```
|
|
116
|
+
|
|
117
|
+
**Hypothesis template for common failure patterns:**
|
|
118
|
+
|
|
119
|
+
| Failure pattern | Likely hypothesis | Experiment |
|
|
120
|
+
|----------------|------------------|-----------|
|
|
121
|
+
| `AttributeError: 'NoneType' has no attribute X` | Null not guarded in refactored path | Add null check before access |
|
|
122
|
+
| `AssertionError: expected X, got Y` | Logic changed in upstream function | Bisect to find commit, inspect callers |
|
|
123
|
+
| `ConnectionRefusedError` | Service not started or port changed | Check env config, not a code fix |
|
|
124
|
+
| `KeyError: 'field_name'` | Schema changed, consumer not updated | Find all consumers of that key |
|
|
125
|
+
| `TypeError: expected str, got int` | Type coercion removed | Restore coercion or fix caller |
|
|
126
|
+
|
|
127
|
+
Form 2-4 specific hypotheses before generating patches.
|
|
128
|
+
|
|
129
|
+
### Step 3: Generate Candidate Patches (Minimum 2)
|
|
130
|
+
|
|
131
|
+
For each top hypothesis, generate a candidate patch. Generate patches **before** applying any:
|
|
132
|
+
|
|
133
|
+
```bash
|
|
134
|
+
# Stash current state for rollback safety
|
|
135
|
+
git stash push -m "self-healing-pre-patch-$ERROR_SIG-$(date +%s)"
|
|
136
|
+
STASH_REF=$(git stash list | head -1 | awk '{print $1}' | tr -d ':')
|
|
137
|
+
|
|
138
|
+
# Generate patch candidates (store as files, don't apply yet)
|
|
139
|
+
mkdir -p ~/.clawpowers/state/patches/$ERROR_SIG
|
|
140
|
+
```
|
|
141
|
+
|
|
142
|
+
**Patch generation principles:**
|
|
143
|
+
- **Patch A:** Minimal fix — smallest change that addresses the hypothesis (prefer this)
|
|
144
|
+
- **Patch B:** Alternative approach — different mechanism, same outcome
|
|
145
|
+
- **Patch C (if needed):** Defensive fix — add guards to prevent the class of error
|
|
146
|
+
|
|
147
|
+
**Example (Python null guard):**
|
|
148
|
+
```python
|
|
149
|
+
# Patch A — minimal: add None check at the failure site
|
|
150
|
+
# Before:
|
|
151
|
+
result = user.profile.settings["theme"]
|
|
152
|
+
# After:
|
|
153
|
+
result = user.profile.settings.get("theme", "default") if user.profile else "default"
|
|
154
|
+
|
|
155
|
+
# Patch B — alternative: fix upstream to guarantee non-null
|
|
156
|
+
# Before:
|
|
157
|
+
def get_user(user_id):
|
|
158
|
+
return db.query(User).filter_by(id=user_id).first() # can return None
|
|
159
|
+
# After:
|
|
160
|
+
def get_user(user_id):
|
|
161
|
+
user = db.query(User).filter_by(id=user_id).first()
|
|
162
|
+
if user is None:
|
|
163
|
+
raise UserNotFoundError(f"User {user_id} not found")
|
|
164
|
+
return user
|
|
165
|
+
```
|
|
166
|
+
|
|
167
|
+
Write each patch to a file:
|
|
168
|
+
```bash
|
|
169
|
+
# Write patches to staging area
|
|
170
|
+
cat > ~/.clawpowers/state/patches/$ERROR_SIG/patch-a.diff <<'EOF'
|
|
171
|
+
[patch content here]
|
|
172
|
+
EOF
|
|
173
|
+
|
|
174
|
+
# Capture reasoning for each patch
|
|
175
|
+
echo '{"patch":"a","hypothesis":"null not guarded","mechanism":"add get() with default","confidence":"high"}' \
|
|
176
|
+
> ~/.clawpowers/state/patches/$ERROR_SIG/patch-a-meta.json
|
|
177
|
+
```
|
|
178
|
+
|
|
179
|
+
### Step 4: Apply, Test, Measure
|
|
180
|
+
|
|
181
|
+
Apply patches in order, testing each. Stop at the first winner.
|
|
182
|
+
|
|
183
|
+
```bash
|
|
184
|
+
# Measure baseline coverage before any patch
|
|
185
|
+
COVERAGE_BEFORE=$(bash -c "$COVERAGE_CMD 2>&1" | grep -E "TOTAL.*[0-9]+%" | grep -oE "[0-9]+%" | tail -1)
|
|
186
|
+
|
|
187
|
+
for PATCH in a b c; do
|
|
188
|
+
PATCH_FILE=~/.clawpowers/state/patches/$ERROR_SIG/patch-$PATCH.diff
|
|
189
|
+
[[ -f "$PATCH_FILE" ]] || continue
|
|
190
|
+
|
|
191
|
+
echo "=== Applying patch $PATCH ==="
|
|
192
|
+
|
|
193
|
+
# Reset tracked files to the pre-patch tree before each candidate
|
|
194
|
+
# Leave the safety stash alone: a pop/push cycle here would fold the previous patch into it
|
|
195
|
+
git checkout -- . 2>/dev/null || true
|
|
196
|
+
# Note: files newly created by a previous patch are untracked and are not removed by this reset
|
|
197
|
+
|
|
198
|
+
# Apply the patch
|
|
199
|
+
git apply "$PATCH_FILE" 2>/dev/null || patch -p1 < "$PATCH_FILE" 2>/dev/null || { echo "Patch $PATCH did not apply. Skipping."; continue; }
|
|
200
|
+
|
|
201
|
+
# Run full test suite
|
|
202
|
+
TEST_RESULT=$(bash -c "$TEST_CMD 2>&1")
|
|
203
|
+
TEST_EXIT=$?
|
|
204
|
+
|
|
205
|
+
# Measure coverage after patch
|
|
206
|
+
COVERAGE_AFTER=$(bash -c "$COVERAGE_CMD 2>&1" | grep -E "TOTAL.*[0-9]+%" | grep -oE "[0-9]+%" | tail -1)
|
|
207
|
+
|
|
208
|
+
# Coverage guard: never reduce
|
|
209
|
+
COVERAGE_OK=true
|
|
210
|
+
if [[ -n "$COVERAGE_BEFORE" && -n "$COVERAGE_AFTER" ]]; then
|
|
211
|
+
BEFORE_NUM=$(echo "$COVERAGE_BEFORE" | tr -d '%')
|
|
212
|
+
AFTER_NUM=$(echo "$COVERAGE_AFTER" | tr -d '%')
|
|
213
|
+
if (( AFTER_NUM < BEFORE_NUM )); then
|
|
214
|
+
COVERAGE_OK=false
|
|
215
|
+
echo "Coverage dropped: $COVERAGE_BEFORE → $COVERAGE_AFTER. Patch $PATCH rejected."
|
|
216
|
+
fi
|
|
217
|
+
fi
|
|
218
|
+
|
|
219
|
+
if [[ $TEST_EXIT -eq 0 && "$COVERAGE_OK" == "true" ]]; then
|
|
220
|
+
echo "Patch $PATCH PASSED all tests. Coverage: $COVERAGE_BEFORE → $COVERAGE_AFTER"
|
|
221
|
+
WINNING_PATCH=$PATCH
|
|
222
|
+
break
|
|
223
|
+
else
|
|
224
|
+
echo "Patch $PATCH FAILED. Exit: $TEST_EXIT. Coverage OK: $COVERAGE_OK"
|
|
225
|
+
fi
|
|
226
|
+
done
|
|
227
|
+
```
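
The coverage guard in the loop above depends on pulling one percentage out of free-form tool output. The shell pattern `[0-9]+%` only handles integer percentages (on `87.5%` it would extract `5%`). The sketch below shows the same extraction with decimals supported. It assumes a summary row containing `TOTAL`, as the shell snippet does, and the function names are hypothetical.

```javascript
// Hypothetical helpers: extract the last percentage from the last "TOTAL" row
// of a coverage report, then apply the never-reduce guard.
function parseCoverage(output) {
  const totals = output.split('\n').filter(line => /TOTAL.*\d%/.test(line));
  if (totals.length === 0) return null;
  const percents = totals[totals.length - 1].match(/\d+(?:\.\d+)?%/g);
  return parseFloat(percents[percents.length - 1]);
}

function coverageGuard(before, after) {
  // Mirrors the shell logic: if either value is unknown, the guard does not block.
  if (before === null || after === null) return true;
  return after >= before;
}
```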
|
|
228
|
+
|
|
229
|
+
### Step 5: Auto-Commit the Winner
|
|
230
|
+
|
|
231
|
+
If a patch passes all tests and maintains coverage:
|
|
232
|
+
|
|
233
|
+
```bash
|
|
234
|
+
if [[ -n "$WINNING_PATCH" ]]; then
|
|
235
|
+
# Commit with full context
|
|
236
|
+
git add -A
|
|
237
|
+
git commit -m "fix: self-healing patch for ${TEST_NAME}
|
|
238
|
+
|
|
239
|
+
Error signature: $ERROR_SIG
|
|
240
|
+
Patch applied: $WINNING_PATCH
|
|
241
|
+
Hypothesis: $(cat ~/.clawpowers/state/patches/$ERROR_SIG/patch-$WINNING_PATCH-meta.json | node -e "console.log(JSON.parse(require('fs').readFileSync(0,'utf8')).hypothesis)")
|
|
242
|
+
Coverage: $COVERAGE_BEFORE → $COVERAGE_AFTER
|
|
243
|
+
Cycles used: $((CURRENT_CYCLE + 1))/$MAX_CYCLES
|
|
244
|
+
|
|
245
|
+
[self-healing-code]"
|
|
246
|
+
|
|
247
|
+
# Store last-green reference
|
|
248
|
+
bash runtime/persistence/store.sh set "last-green:$PROJECT" "$(git rev-parse HEAD)"
|
|
249
|
+
|
|
250
|
+
# Record success
|
|
251
|
+
bash runtime/metrics/collector.sh record \
|
|
252
|
+
--skill self-healing-code \
|
|
253
|
+
--outcome success \
|
|
254
|
+
--notes "patch-$WINNING_PATCH won, coverage $COVERAGE_BEFORE→$COVERAGE_AFTER, cycle $((CURRENT_CYCLE+1))/$MAX_CYCLES"
|
|
255
|
+
|
|
256
|
+
# Clean up healing state
|
|
257
|
+
rm -rf ~/.clawpowers/state/patches/$ERROR_SIG
|
|
258
|
+
rm -f ~/.clawpowers/state/healing-$ERROR_SIG*.json
|
|
259
|
+
fi
|
|
260
|
+
```
|
|
261
|
+
|
|
262
|
+
### Step 6: Rollback Protocol
|
|
263
|
+
|
|
264
|
+
If no patch wins after all candidates are tried:
|
|
265
|
+
|
|
266
|
+
```bash
|
|
267
|
+
if [[ -z "$WINNING_PATCH" ]]; then
|
|
268
|
+
# Restore to pre-healing state
|
|
269
|
+
git checkout -- .
|
|
270
|
+
git stash pop 2>/dev/null || true  # restore the work stashed in Step 3; dropping the stash would discard it
|
|
271
|
+
echo "All patches failed. State restored to pre-healing baseline."
|
|
272
|
+
|
|
273
|
+
# Increment cycle counter
|
|
274
|
+
NEW_CYCLE=$((CURRENT_CYCLE + 1))
|
|
275
|
+
echo "{\"cycle\": $NEW_CYCLE, \"error_sig\": \"$ERROR_SIG\"}" > "$HEALING_STATE"
|
|
276
|
+
|
|
277
|
+
if (( NEW_CYCLE < MAX_CYCLES )); then
|
|
278
|
+
echo "Cycle $NEW_CYCLE/$MAX_CYCLES complete. Forming new hypotheses."
|
|
279
|
+
# → Loop back to Step 2 with refined hypotheses
|
|
280
|
+
else
|
|
281
|
+
# → Escalate
|
|
282
|
+
echo "Max cycles reached. Escalating with full context."
|
|
283
|
+
fi
|
|
284
|
+
fi
|
|
285
|
+
```
|
|
286
|
+
|
|
287
|
+
### Step 7: Escalation Package
|
|
288
|
+
|
|
289
|
+
When all cycles are exhausted, escalate with enough context that a human can immediately begin debugging:
|
|
290
|
+
|
|
291
|
+
```markdown
|
|
292
|
+
## Self-Healing Escalation Report
|
|
293
|
+
|
|
294
|
+
**Error:** [test_name]
|
|
295
|
+
**Error signature:** [hash]
|
|
296
|
+
**Cycles attempted:** 3/3
|
|
297
|
+
**Time spent:** [duration]
|
|
298
|
+
|
|
299
|
+
### Failure Details
|
|
300
|
+
[Full test output — not truncated]
|
|
301
|
+
|
|
302
|
+
### Patches Attempted
|
|
303
|
+
1. Patch A — [hypothesis] — [outcome]
|
|
304
|
+
2. Patch B — [hypothesis] — [outcome]
|
|
305
|
+
3. Patch C — [hypothesis] — [outcome]
|
|
306
|
+
|
|
307
|
+
### Diff from Last Green
|
|
308
|
+
[git diff output]
|
|
309
|
+
|
|
310
|
+
### Recommended Next Step
|
|
311
|
+
[Best remaining hypothesis with suggested experiment]
|
|
312
|
+
|
|
313
|
+
### Relevant Files
|
|
314
|
+
[files touched by failing test]
|
|
315
|
+
```
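
Filling in the template by hand invites omissions, so the report can be rendered from the data gathered during healing. The sketch below covers only part of the template (header fields, patches, next step). `renderEscalation` and its input shape are assumptions for illustration.

```javascript
// Hypothetical renderer for a subset of the escalation template.
function renderEscalation({ testName, signature, cycles, maxCycles, patches, nextStep }) {
  const attempts = patches
    .map((p, i) => `${i + 1}. Patch ${p.id.toUpperCase()}: ${p.hypothesis} (${p.outcome})`)
    .join('\n');
  return [
    '## Self-Healing Escalation Report',
    '',
    `**Error:** ${testName}`,
    `**Error signature:** ${signature}`,
    `**Cycles attempted:** ${cycles}/${maxCycles}`,
    '',
    '### Patches Attempted',
    attempts,
    '',
    '### Recommended Next Step',
    nextStep,
  ].join('\n');
}
```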
|
|
316
|
+
|
|
317
|
+
```bash
|
|
318
|
+
# Record escalation
|
|
319
|
+
bash runtime/metrics/collector.sh record \
|
|
320
|
+
--skill self-healing-code \
|
|
321
|
+
--outcome failure \
|
|
322
|
+
--notes "escalated: $MAX_CYCLES cycles, $PATCHES_TRIED patches, test: $TEST_NAME"
|
|
323
|
+
```
|
|
324
|
+
|
|
325
|
+
## ClawPowers Enhancement
|
|
326
|
+
|
|
327
|
+
When `~/.clawpowers/` runtime is initialized:
|
|
328
|
+
|
|
329
|
+
**Healing state persistence (resumable across sessions):**
|
|
330
|
+
|
|
331
|
+
```bash
|
|
332
|
+
# Save healing progress
|
|
333
|
+
bash runtime/persistence/store.sh set "healing:$ERROR_SIG:cycle" "$CURRENT_CYCLE"
|
|
334
|
+
bash runtime/persistence/store.sh set "healing:$ERROR_SIG:stash" "$STASH_REF"
|
|
335
|
+
bash runtime/persistence/store.sh set "healing:$ERROR_SIG:patches_tried" "$PATCHES_TRIED"
|
|
336
|
+
|
|
337
|
+
# Resume an interrupted healing session
|
|
338
|
+
ERROR_SIG="<hash>"
|
|
339
|
+
CYCLE=$(bash runtime/persistence/store.sh get "healing:$ERROR_SIG:cycle")
|
|
340
|
+
STASH=$(bash runtime/persistence/store.sh get "healing:$ERROR_SIG:stash")
|
|
341
|
+
echo "Resuming healing cycle $CYCLE for error $ERROR_SIG"
|
|
342
|
+
```
|
|
343
|
+
|
|
344
|
+
**Regression detection:**
|
|
345
|
+
```bash
|
|
346
|
+
# After auto-commit, verify no regressions in related tests
|
|
347
|
+
RELATED_TESTS=$(git diff HEAD~1 HEAD --name-only | xargs grep -l "def test_" 2>/dev/null | head -10)
|
|
348
|
+
bash -c "$TEST_CMD $RELATED_TESTS 2>&1"
|
|
349
|
+
```
|
|
350
|
+
|
|
351
|
+
**Pattern learning (feeds systematic-debugging):**
|
|
352
|
+
```bash
|
|
353
|
+
# After successful heal, store the winning pattern
|
|
354
|
+
bash runtime/persistence/store.sh set "debug:hypothesis:$ERROR_SIG" \
|
|
355
|
+
"$(cat ~/.clawpowers/state/patches/$ERROR_SIG/patch-$WINNING_PATCH-meta.json)"
|
|
356
|
+
```
|
|
357
|
+
|
|
358
|
+
## Anti-Patterns
|
|
359
|
+
|
|
360
|
+
| Anti-Pattern | Why It Fails | Correct Approach |
|
|
361
|
+
|-------------|-------------|-----------------|
|
|
362
|
+
| Apply patches without stashing first | No rollback path if all patches fail | Always stash before first patch |
|
|
363
|
+
| Skip hypothesis formation | Random patches waste all 3 cycles | Form ranked hypotheses before any patch |
|
|
364
|
+
| Generate only 1 patch | Single point of failure | Always generate ≥ 2 patches before applying |
|
|
365
|
+
| Skip coverage check | Patches that delete tests always "pass" | Coverage guard is non-negotiable |
|
|
366
|
+
| Apply patches sequentially without reset | Patches contaminate each other | Reset to clean state between each patch |
|
|
367
|
+
| Commit without full test suite pass | Partial fixes break other tests | Run full suite, not just the failing test |
|
|
368
|
+
| Exceed 3 cycles | Spiraling into a rabbit hole | Hard limit at 3; escalate cleanly |
|
|
369
|
+
| Escalate without full context | Human must re-investigate from scratch | Escalation package must include all evidence |
|