clawpowers 1.0.0 → 1.1.0

@@ -38,6 +38,82 @@ Is the error message self-explanatory?
 
  ## Core Methodology
 
+ ### Persistent Hypothesis Memory
+
+ Before forming any new hypotheses, check whether this error pattern has been seen before. Matching a known bug pattern can be 10-100x faster than a fresh investigation.
+
+ **Step 0: Check the hypothesis memory store**
+
+ ```bash
+ # Compute an error signature hash from the error message
+ # (md5sum on Linux, md5 -q on macOS; both print the same hex digest)
+ ERROR_MSG="ConnectionPool timeout after 50 requests"
+ ERROR_SIG=$(echo "$ERROR_MSG" | { md5sum 2>/dev/null || md5 -q; } | cut -d' ' -f1)
+
+ # Look up prior debugging sessions for this error pattern
+ KNOWN=$(bash runtime/persistence/store.sh get "debug:hypothesis:$ERROR_SIG:winning" 2>/dev/null)
+
+ if [[ -n "$KNOWN" ]]; then
+   echo "=== Known error pattern found ==="
+   echo "Previously solved. Winning hypothesis:"
+   echo "$KNOWN"
+   echo ""
+   # Start directly with the previously successful hypothesis,
+   # but verify it applies to the current context before applying it
+ fi
+ ```
+
+ **Storage format.** Each hypothesis tree is stored as a set of keys under its error signature:
+
+ ```bash
+ # After solving a bug, always persist the result
+ ERROR_SIG=$(echo "$ERROR_MSG" | { md5sum 2>/dev/null || md5 -q; } | cut -d' ' -f1)
+ # START_TS/END_TS: epoch seconds (date +%s) captured when the session started and ended
+ RESOLVE_TIME=$(( END_TS - START_TS ))
+
+ bash runtime/persistence/store.sh set "debug:hypothesis:$ERROR_SIG:error_msg" "$ERROR_MSG"
+ bash runtime/persistence/store.sh set "debug:hypothesis:$ERROR_SIG:hypotheses_tried" "$H1|$H2|$H3"
+ bash runtime/persistence/store.sh set "debug:hypothesis:$ERROR_SIG:winning" "$WINNING_HYPOTHESIS"
+ bash runtime/persistence/store.sh set "debug:hypothesis:$ERROR_SIG:root_cause" "$ROOT_CAUSE"
+ bash runtime/persistence/store.sh set "debug:hypothesis:$ERROR_SIG:fix_summary" "$FIX_SUMMARY"
+ bash runtime/persistence/store.sh set "debug:hypothesis:$ERROR_SIG:time_to_resolution" "$RESOLVE_TIME"
+ bash runtime/persistence/store.sh set "debug:hypothesis:$ERROR_SIG:project" "$(basename "$(git rev-parse --show-toplevel)")"
+ bash runtime/persistence/store.sh set "debug:hypothesis:$ERROR_SIG:timestamp" "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
+ ```
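
To try this key layout without the ClawPowers runtime, `runtime/persistence/store.sh` can be approximated with a few shell functions. This is a sketch under assumptions, not the real script: `store_set`, `store_get`, `store_list`, the `STORE_DIR` variable, and the one-file-per-key layout are hypothetical, and `store_list` only mimics how the snippets here use `list` (a glob over key names).

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for runtime/persistence/store.sh: one file per key
STORE_DIR="${STORE_DIR:-$(mktemp -d)}"

store_set() {  # store_set KEY VALUE
  printf '%s' "$2" > "$STORE_DIR/$1"
}

store_get() {  # store_get KEY: prints the value, non-zero exit if the key is missing
  [[ -f "$STORE_DIR/$1" ]] || return 1
  cat "$STORE_DIR/$1"
}

store_list() {  # store_list GLOB: prints matching keys, one per line
  local pattern="$1" f key
  for f in "$STORE_DIR"/*; do
    [[ -e "$f" ]] || continue
    key=$(basename "$f")
    # Unquoted $pattern on the right of == is matched as a glob
    if [[ "$key" == $pattern ]]; then
      echo "$key"
    fi
  done
}

# Round trip: persist one solved bug, then find it again by signature
ERROR_MSG="ConnectionPool timeout after 50 requests"
ERROR_SIG=$(echo "$ERROR_MSG" | { md5sum 2>/dev/null || md5 -q; } | cut -d' ' -f1)
store_set "debug:hypothesis:$ERROR_SIG:error_msg" "$ERROR_MSG"
store_set "debug:hypothesis:$ERROR_SIG:winning" "Pool max size is below the request fan-out"  # example value
store_list "debug:hypothesis:*:winning"
store_get "debug:hypothesis:$ERROR_SIG:winning"
```

Swapping `bash runtime/persistence/store.sh set|get|list` for these functions lets the Step 0 lookup and the fuzzy search run against a throwaway directory.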
+
+ **Fuzzy search for similar patterns** (when the exact hash doesn't match):
+
+ ```bash
+ # Search all stored error messages for keywords from the current error
+ # (connection/pool/timeout is an example; substitute terms from your own error)
+ bash runtime/persistence/store.sh list "debug:hypothesis:*:error_msg" | while IFS= read -r key; do
+   VALUE=$(bash runtime/persistence/store.sh get "$key")
+   if echo "$VALUE" | grep -qiE 'connection|pool|timeout'; then
+     SIG=$(echo "$key" | awk -F: '{print $3}')   # key format: debug:hypothesis:<sig>:error_msg
+     echo "=== Similar error ==="
+     echo "Error: $VALUE"
+     echo "Winning hypothesis: $(bash runtime/persistence/store.sh get "debug:hypothesis:$SIG:winning")"
+     echo "Time to resolve: $(bash runtime/persistence/store.sh get "debug:hypothesis:$SIG:time_to_resolution")s"
+     echo ""
+   fi
+ done
+ ```
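
The fuzzy search above hard-codes `connection`, `pool` and `timeout`. A hedged sketch of deriving the keywords from the current error instead, then ranking stored messages by how many keywords they share, follows. The `keywords` and `similarity` helpers, the camelCase split, and the four-letter minimum are assumptions for illustration, and the two stored messages are example data.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split camelCase, lower-case, keep unique words of 4+ letters
keywords() {
  echo "$1" | sed 's/\([a-z]\)\([A-Z]\)/\1 \2/g' | tr '[:upper:]' '[:lower:]' \
    | tr -cs '[:alpha:]' '\n' | awk 'length($0) >= 4' | sort -u
}

# Number of keywords two error messages have in common
similarity() {  # similarity CURRENT_MSG STORED_MSG
  comm -12 <(keywords "$1") <(keywords "$2") | wc -l | tr -d ' '
}

CURRENT="ConnectionPool timeout after 75 requests"

# Example stored messages; with the runtime they would come from the *:error_msg keys
while IFS= read -r stored; do
  printf '%s\t%s\n' "$(similarity "$CURRENT" "$stored")" "$stored"
done <<'EOF' | sort -t $'\t' -k1,1nr
Connection pool timeout while waiting for a free slot
JWT token expired before refresh
EOF
```

Here the pool-timeout message shares three keywords (connection, pool, timeout) and sorts first, while the JWT message shares none.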
+
+ **After 10+ debugging sessions, the memory pays dividends** (rough estimates):
+
+ | Scenario | Without memory | With memory |
+ |----------|----------------|-------------|
+ | Exact match on a known error | 30-90 min investigation | < 2 min (known fix) |
+ | Similar error pattern | 20-60 min | 5-10 min (start from best hypothesis) |
+ | Novel error | Same as before | Same — no false acceleration |
+
+ **When to override the memory:**
+ - The error signature matches but the context differs (different library version, different project type)
+ - The previously winning hypothesis was marked as "project-specific"
+ - The fix was a workaround, not a root-cause fix
+
+ ```bash
+ # Flag a fix as project-specific (it won't be suggested for other projects)
+ bash runtime/persistence/store.sh set "debug:hypothesis:$ERROR_SIG:scope" "project-specific"
+ ```
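
The Step 0 lookup returns the winning hypothesis whatever its scope. A lookup that honors the `scope` and `project` keys might look like the sketch below. `lookup_winning` is a hypothetical helper, the signature `abc123` and its stored values are example data, and an associative array (bash 4+) stands in for `runtime/persistence/store.sh` so the example runs on its own.

```bash
#!/usr/bin/env bash
# Stand-in store (bash 4+ associative array); example data, not real ClawPowers output
declare -A STORE=(
  ["debug:hypothesis:abc123:winning"]="Raise the pool max size above the request fan-out"
  ["debug:hypothesis:abc123:project"]="billing-api"
  ["debug:hypothesis:abc123:scope"]="project-specific"
)

# Hypothetical helper: print the winning hypothesis for SIG, unless it is flagged
# project-specific and was recorded in a different project than CURRENT_PROJECT
lookup_winning() {  # lookup_winning SIG CURRENT_PROJECT
  local base="debug:hypothesis:$1" project="$2"
  local winning="${STORE[$base:winning]:-}"
  local scope="${STORE[$base:scope]:-}"
  local origin="${STORE[$base:project]:-}"

  [[ -n "$winning" ]] || return 1    # unknown error pattern
  if [[ "$scope" == "project-specific" && "$origin" != "$project" ]]; then
    return 1                         # don't carry a project-specific fix across projects
  fi
  echo "$winning"
}

lookup_winning abc123 billing-api                                   # same project: suggested
lookup_winning abc123 web-frontend || echo "no reusable hypothesis" # other project: skipped
```

Skipping the stored fix, rather than deleting it, keeps it available when the same error recurs in its original project.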
+
  ### The Scientific Debugging Loop
 
  ```
@@ -188,10 +188,127 @@ test_jwt_token_stuff()
 
  TDD applies at all layers. Start with unit. Add integration when units pass.
 
+ ### Autonomous Mutation Testing
+
+ After the REFACTOR phase is complete and all tests are green, run autonomous mutation testing to verify your tests actually catch bugs — not just pass on correct code.
+
+ **The mutation testing loop:**
+
+ ```
+ GREEN tests → generate mutants → run suite against each → calculate score → fix gaps → re-run
+ ```
+
+ **Step 1: Generate mutants**
+
+ Mutation tools automatically modify your production code in small ways to simulate bugs:
+
+ | Mutation type | Example | What it tests |
+ |---------------|---------|---------------|
+ | Operator swap | `a > b` → `a >= b` | Off-by-one detection |
+ | Condition removal | `if (valid && active)` → `if (active)` | Guard clause tests |
+ | Return value swap | `return true` → `return false` | Output assertion coverage |
+ | Constant mutation | `ttl = 3600` → `ttl = 0` | Boundary value tests |
+ | Statement deletion | Remove a line entirely | Whether tests catch missing logic |
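
To make one row of the table concrete, the sketch below applies the operator-swap mutation by hand with `sed` (`-ge`, i.e. `>=`, becomes `-gt`, i.e. `>`) and runs the same checks against the original and the mutant. The `is_adult` function and the `run_tests` suite are made up for illustration; tools such as mutmut or Stryker generate and run many mutants like this automatically.

```bash
#!/usr/bin/env bash
# Code under test, kept as a string so it can be mutated (hypothetical example)
ORIGINAL='is_adult() { [ "$1" -ge 18 ]; }'

# Operator swap mutation: -ge (>=) becomes -gt (>)
MUTANT=$(printf '%s' "$ORIGINAL" | sed 's/-ge/-gt/')

# Tiny test suite: returns 0 only if every check passes
run_tests() {
  is_adult 30 || return 1     # clearly an adult
  ! is_adult 10 || return 1   # clearly a minor
  is_adult 18 || return 1     # boundary value: this check kills the mutant
}

for version in ORIGINAL MUTANT; do
  eval "${!version}"          # (re)define is_adult from the chosen source text
  if run_tests; then
    echo "$version: tests pass"
  else
    echo "$version: tests fail"
  fi
done
```

The original passes and the mutant fails, so the mutant is killed. Remove the `is_adult 18` boundary check and both versions pass; that surviving mutant is exactly the kind of gap Step 4 asks you to close.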
+
+ **Step 2: Run mutation tools**
+
+ ```bash
+ # Python: mutmut (flag names differ between major versions; check `mutmut run --help`)
+ pip install mutmut
+ mutmut run --paths-to-mutate src/ --tests-dir tests/
+ mutmut results   # shows surviving (undetected) mutants
+
+ # JavaScript/TypeScript: Stryker
+ npx stryker run
+ # Stryker generates a detailed HTML report with surviving mutants
+
+ # Go: go-mutesting
+ go install github.com/zimmski/go-mutesting/cmd/go-mutesting@latest
+ go-mutesting ./...
+
+ # Java: PIT
+ mvn org.pitest:pitest-maven:mutationCoverage
+ ```
+
+ **Step 3: Calculate and interpret the mutation score**
+
+ ```
+ mutation score = (killed mutants / total mutants) × 100
+ ```
+
+ | Score | Assessment | Action |
+ |-------|------------|--------|
+ | ≥ 90% | Excellent | No action needed |
+ | 80–89% | Good | Review surviving mutants; add 1-2 targeted tests |
+ | 70–79% | Marginal | Systematic gap; add boundary and error-path tests |
+ | < 70% | Poor | Tests exist but don't assert enough; add failing-case coverage |
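
The formula and the bands in the table can be combined into one helper. This is a sketch: `mutation_band` is a hypothetical name, and it truncates the score to a whole percentage with shell integer arithmetic, so 89.9% is reported as 89 and lands in the Good band.

```bash
#!/usr/bin/env bash
# Hypothetical helper: mutation score (whole %, truncated) and its band from the table above
mutation_band() {  # mutation_band KILLED TOTAL
  local killed="$1" total="$2" score band
  if (( total == 0 )); then
    echo "no mutants generated" >&2
    return 1
  fi
  score=$(( killed * 100 / total ))
  if   (( score >= 90 )); then band="Excellent"
  elif (( score >= 80 )); then band="Good"
  elif (( score >= 70 )); then band="Marginal"
  else                         band="Poor"
  fi
  echo "${score}% ${band}"
}

mutation_band 41 47   # 41 of 47 mutants killed
mutation_band 30 47
```

With 41 of 47 mutants killed the helper prints `87% Good`, and 30 of 47 prints `63% Poor`; any result in the Excellent or Good band meets the Step 5 target of 80%.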
+
+ **Step 4: Kill surviving mutants**
+
+ For each surviving mutant, the tool shows what change it made. Write a test that would catch that bug:
+
+ ```python
+ # mutmut reports that this mutant survived:
+ #   Original: if score >= passing_threshold:
+ #   Mutant:   if score > passing_threshold:
+ from grading import grade, passing_threshold  # hypothetical module under test
+
+ # Write a test that detects the off-by-one:
+ def test_score_at_exact_threshold_passes():
+     # Fails on the mutant: with `>`, a score equal to the threshold no longer passes
+     assert grade(score=passing_threshold) == "pass"
+     assert grade(score=passing_threshold - 1) == "fail"
+ ```
+
+ ```typescript
+ // Stryker shows this mutant survived:
+ //   Original: return { token, expiresAt, userId }
+ //   Mutant:   return { token, expiresAt, userId: "" }
+ import { auth } from './auth'; // hypothetical module under test
+
+ // Write a test that kills it (Jest/Vitest globals):
+ test('issue() returns correct userId in payload', () => {
+   const result = auth.issue('user-abc', 3600);
+   expect(result.userId).toBe('user-abc'); // was not previously asserted!
+ });
+ ```
+
+ **Step 5: Iterate until score ≥ 80%**
+
+ ```bash
+ # After adding new tests, re-run to measure improvement
+ mutmut run --paths-to-mutate src/ --tests-dir tests/
+ mutmut results   # lists surviving mutants; it does not report a killed count to parse
+
+ # Take the killed/total counts from the run summary (its format varies by mutmut version)
+ KILLED=41   # example values
+ TOTAL=47
+ NEW_SCORE=$(( KILLED * 100 / TOTAL ))
+ echo "Mutation score: ${NEW_SCORE}%"
+ ```
+
+ **Tracking mutation scores over time:**
+
+ ```bash
+ # Record in ClawPowers metrics after each TDD cycle
+ MUTATION_SCORE=87   # e.g. the score measured in Step 5
+ bash runtime/metrics/collector.sh record \
+   --skill test-driven-development \
+   --outcome success \
+   --notes "AuthService: RED×6 witnessed, mutation_score=$MUTATION_SCORE%, 0 surviving mutants after 2 additions"
+ ```
+
+ The TDD cycle with mutation testing:
+
+ ```
+ RED → GREEN → REFACTOR → MUTATE → [score < 80%? → KILL SURVIVORS → RE-MUTATE] → done
+ ```
+
  ## ClawPowers Enhancement
 
  When `~/.clawpowers/` runtime is initialized:
 
+ **Mutation Score History:**
+
+ ```bash
+ # Query historical mutation scores
+ bash runtime/persistence/store.sh list "tdd:mutation:*" | sort
+ # Read the trend: declining scores mean the tests are not keeping up with growing code complexity
+ ```
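
Nothing in this section writes `tdd:mutation:*` keys (the tracking step sends scores to `runtime/metrics/collector.sh`), so the layout here is an assumption: one key per cycle, named `tdd:mutation:<UTC timestamp>` so that `sort` orders the keys chronologically, with the score as the value. The sketch below flags a decline after three consecutive drops; `scores_declining`, the three-drop rule, and the sample history are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical trend check. Input: "<timestamp> <score>" lines, oldest first.
# With the runtime, the lines could be built from the store, e.g.:
#   bash runtime/persistence/store.sh list "tdd:mutation:*" | sort | while IFS= read -r key; do
#     echo "${key#tdd:mutation:} $(bash runtime/persistence/store.sh get "$key")"
#   done
scores_declining() {  # exit 0 if each of the last three scores is lower than the one before
  awk '{ s[NR] = $2 }
       END {
         if (NR < 4) exit 1
         for (i = NR - 2; i <= NR; i++) if (s[i] >= s[i - 1]) exit 1
         exit 0
       }'
}

# Example history, not real measurements
HISTORY='2024-05-01T10:00:00Z 91
2024-05-08T10:00:00Z 89
2024-05-15T10:00:00Z 84
2024-05-22T10:00:00Z 78'

if echo "$HISTORY" | scores_declining; then
  echo "Mutation score declining: tests are not keeping up with code complexity"
fi
```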
+
  **Mutation Analysis Integration:**
 
  After the GREEN phase, optionally run mutation analysis to verify your tests actually catch bugs — not just pass on correct code:
@@ -29,7 +29,7 @@ ClawPowers follows a three-layer approach:
  3. **Outcome Tracking** — If runtime is available, record execution outcomes for self-improvement
 
 
- You have ClawPowers loaded. This gives you 20 skills that go beyond static instructions — they execute tools, persist state across sessions, and track outcomes for self-improvement.
+ You have ClawPowers loaded. This gives you 24 skills that go beyond static instructions — they execute tools, persist state across sessions, and track outcomes for self-improvement. The RSI Intelligence Layer (skills 21-24) enables the agent to improve its own methodology over time.
 
  ## How Skills Work
 
@@ -58,6 +58,12 @@ Skills activate automatically when you recognize a matching task pattern. You do
  | Need to understand how to learn something effectively | `learn-how-to-learn` |
  | Competitive research or trend analysis | `market-intelligence` |
  | Finding leads or prospects | `prospecting` |
+ | Task counter hits 50; skill success rates declining | `meta-skill-evolution` |
+ | Test suite fails; want automatic patch-and-commit | `self-healing-code` |
+ | Starting a task; want to check cross-project patterns first | `cross-project-knowledge` |
+ | After fixing a bug or making an architecture decision; want to store the pattern | `cross-project-knowledge` |
+ | TDD GREEN phase complete; want invariant property tests | `formal-verification-lite` |
+ | Need roundtrip/idempotence/commutativity tests for a pure function | `formal-verification-lite` |
 
  ## Reading a Skill
 
@@ -106,15 +112,15 @@ You never need to check the mode. Skills detect it themselves and adapt their in
  - **Don't stack conflicting skills** — If TDD and subagent-driven-development both apply, let subagent-driven-development drive; it includes TDD internally
  - **Don't ignore ClawPowers enhancements** — When the runtime is available, use it; the static path is a fallback, not the goal
 
- ## Quick Reference: All 20 Skills
+ ## Quick Reference: All 24 Skills
 
  ### Core Development (14)
  1. `subagent-driven-development` — Parallel subagents, two-stage review, worktree isolation
- 2. `test-driven-development` — RED-GREEN-REFACTOR with failure witness and mutation analysis
+ 2. `test-driven-development` — RED-GREEN-REFACTOR with failure witness and autonomous mutation testing
  3. `writing-plans` — Spec to sequenced 2-5 min tasks with dependency graph
  4. `executing-plans` — Tracked execution with resumability and milestone persistence
  5. `brainstorming` — Structured ideation with convergence protocol
- 6. `systematic-debugging` — Hypothesis-driven debugging with evidence collection
+ 6. `systematic-debugging` — Hypothesis-driven debugging with persistent hypothesis memory
  7. `verification-before-completion` — Quality gates before any merge or handoff
  8. `finishing-a-development-branch` — Branch cleanup, changelog, squash, merge prep
  9. `requesting-code-review` — Review request with context, risk areas, reviewer matching
@@ -132,6 +138,12 @@ You never need to check the mode. Skills detect it themselves and adapt their in
  19. `market-intelligence` — Competitive analysis, trend detection, opportunity scoring
  20. `prospecting` — ICP → company search → contact enrichment → outreach prep
 
+ ### RSI Intelligence Layer (4) — NEW
+ 21. `meta-skill-evolution` — Every 50 tasks: analyze outcomes, find weakest skill, surgically improve it, commit with version bump
+ 22. `self-healing-code` — Test failure → hypothesis tree → ≥2 candidate patches → auto-commit winner or escalate
+ 23. `cross-project-knowledge` — Persistent pattern KB across all projects; search before tasks, store after fixes
+ 24. `formal-verification-lite` — Property-based testing (fast-check/Hypothesis) after TDD GREEN; 1000+ iterations per invariant
+
  ## Session Initialization Complete
 
- ClawPowers is ready. Skills activate on pattern recognition. Runtime enhancements available when `~/.clawpowers/` exists.
+ ClawPowers is ready. 24 skills active. Skills activate on pattern recognition. Runtime enhancements available when `~/.clawpowers/` exists. RSI Intelligence Layer (meta-skill-evolution, self-healing-code, cross-project-knowledge, formal-verification-lite) provides persistent learning across sessions and projects.