npm - clawpowers - Versions diffs - 1.0.0 → 1.1.0 - Mend

clawpowers 1.0.0 → 1.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (17)

package/README.md +8 -2
package/docs/demo/clawpowers-demo.cast +197 -0
package/docs/demo/clawpowers-demo.gif +0 -0
package/docs/launch-images/post1-hero-lobster.jpg +0 -0
package/docs/launch-images/post2-dashboard.jpg +0 -0
package/docs/launch-images/post3-superpowers.jpg +0 -0
package/docs/launch-images/post4-before-after.jpg +0 -0
package/docs/launch-images/post5-install-now.jpg +0 -0
package/docs/launch-posts.md +76 -0
package/package.json +3 -2
package/skills/cross-project-knowledge/SKILL.md +345 -0
package/skills/formal-verification-lite/SKILL.md +441 -0
package/skills/meta-skill-evolution/SKILL.md +325 -0
package/skills/self-healing-code/SKILL.md +369 -0
package/skills/systematic-debugging/SKILL.md +76 -0
package/skills/test-driven-development/SKILL.md +117 -0
package/skills/using-clawpowers/SKILL.md +17 -5

package/skills/systematic-debugging/SKILL.md CHANGED

@@ -38,6 +38,82 @@ Is the error message self-explanatory?
 ## Core Methodology
+### Persistent Hypothesis Memory
+Before forming any new hypotheses, check if this error pattern has been seen before. Pattern-matching known bugs is 10-100x faster than fresh investigation.
+**Step 0: Check the hypothesis memory store**
+```bash
+# Compute an error signature hash from the error message
+# (`md5` is the macOS/BSD tool; on Linux use `md5sum | cut -d' ' -f1` instead)
+ERROR_MSG="ConnectionPool timeout after 50 requests"
+ERROR_SIG=$(echo "$ERROR_MSG" | md5)
+# Look up prior debugging sessions for this error pattern
+KNOWN=$(bash runtime/persistence/store.sh get "debug:hypothesis:$ERROR_SIG:winning" 2>/dev/null)
+if [[ -n "$KNOWN" ]]; then
+  echo "=== Known error pattern found ==="
+  echo "Previously solved. Winning hypothesis:"
+  echo "$KNOWN"
+  echo ""
+  # Start directly with the previously successful hypothesis
+  # Verify it applies to the current context before applying
+fi
+```
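The Step 0 lookup can also be sketched in Python. This is a minimal illustration, assuming a plain dict as a hypothetical stand-in for `runtime/persistence/store.sh`; the function names and the stored hypothesis text are placeholders, not part of the package.

```python
import hashlib


def error_signature(error_msg: str) -> str:
    """MD5 hex digest of the message plus a trailing newline.

    The newline mirrors `echo "$ERROR_MSG" | md5`, since echo appends one.
    """
    return hashlib.md5((error_msg + "\n").encode("utf-8")).hexdigest()


def lookup_winning(store: dict, error_msg: str):
    """Return the stored winning hypothesis for this error, or None if unseen."""
    key = f"debug:hypothesis:{error_signature(error_msg)}:winning"
    return store.get(key)


# Hypothetical in-memory store; the real runtime persists through store.sh.
store = {}
msg = "ConnectionPool timeout after 50 requests"
store[f"debug:hypothesis:{error_signature(msg)}:winning"] = "placeholder winning hypothesis"

print(lookup_winning(store, msg))                   # known pattern
print(lookup_winning(store, "some unseen error"))   # no prior session
```

An exact-hash lookup only matches byte-identical messages, so messages that embed request IDs or timestamps would need normalizing before hashing.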
+**Storage format** — every hypothesis tree is stored keyed by error signature:
+```bash
+# After solving a bug, always persist the result
+ERROR_SIG=$(echo "$ERROR_MSG" | md5)
+RESOLVE_TIME=$(( END_TS - START_TS ))
+bash runtime/persistence/store.sh set "debug:hypothesis:$ERROR_SIG:error_msg" "$ERROR_MSG"
+bash runtime/persistence/store.sh set "debug:hypothesis:$ERROR_SIG:hypotheses_tried" "$H1|$H2|$H3"
+bash runtime/persistence/store.sh set "debug:hypothesis:$ERROR_SIG:winning" "$WINNING_HYPOTHESIS"
+bash runtime/persistence/store.sh set "debug:hypothesis:$ERROR_SIG:root_cause" "$ROOT_CAUSE"
+bash runtime/persistence/store.sh set "debug:hypothesis:$ERROR_SIG:fix_summary" "$FIX_SUMMARY"
+bash runtime/persistence/store.sh set "debug:hypothesis:$ERROR_SIG:time_to_resolution" "$RESOLVE_TIME"
+bash runtime/persistence/store.sh set "debug:hypothesis:$ERROR_SIG:project" "$(basename "$(git rev-parse --show-toplevel)")"
+bash runtime/persistence/store.sh set "debug:hypothesis:$ERROR_SIG:timestamp" "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
+```
+**Fuzzy search for similar patterns** (when exact hash doesn't match):
+```bash
+# Search by keyword across all stored hypotheses
+bash runtime/persistence/store.sh list "debug:hypothesis:*:error_msg" | while IFS= read -r key; do
+  VALUE=$(bash runtime/persistence/store.sh get "$key")
+  if echo "$VALUE" | grep -qiE "connection|pool|timeout"; then
+    SIG=$(echo "$key" | awk -F: '{print $3}')
+    echo "=== Similar error ==="
+    echo "Error: $VALUE"
+    echo "Winning hypothesis: $(bash runtime/persistence/store.sh get "debug:hypothesis:$SIG:winning")"
+    echo "Time to resolve: $(bash runtime/persistence/store.sh get "debug:hypothesis:$SIG:time_to_resolution")s"
+    echo ""
+  fi
+done
+```
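The keyword search above can be sketched in Python as well. This assumes the same `debug:hypothesis:<sig>:<field>` key layout and uses a dict as a hypothetical stand-in for the store; `find_similar` and the sample entries are illustrative only.

```python
import re


def find_similar(store: dict, keywords: list) -> list:
    """Return (signature, error_msg, winning) for stored errors matching any keyword.

    Matching is case-insensitive, like `grep -i` in the shell version.
    """
    if not keywords:
        return []
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    matches = []
    for key, value in store.items():
        parts = key.split(":")
        # Only inspect keys shaped like debug:hypothesis:<sig>:error_msg
        if len(parts) == 4 and parts[:2] == ["debug", "hypothesis"] and parts[3] == "error_msg":
            if pattern.search(value):
                sig = parts[2]
                matches.append((sig, value, store.get(f"debug:hypothesis:{sig}:winning")))
    return matches


# Hypothetical sample data with placeholder hypotheses.
sample_store = {
    "debug:hypothesis:abc:error_msg": "ConnectionPool timeout after 50 requests",
    "debug:hypothesis:abc:winning": "placeholder hypothesis A",
    "debug:hypothesis:def:error_msg": "KeyError: 'user_id'",
    "debug:hypothesis:def:winning": "placeholder hypothesis B",
}

for sig, error_msg, winning in find_similar(sample_store, ["connection", "pool", "timeout"]):
    print(sig, error_msg, winning)
```

Splitting the key on `:` works only while signatures contain no colons, which holds for hex digests.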
+**After 10+ debugging sessions, the memory pays dividends:**
+| Scenario | Without memory | With memory |
+|---------|---------------|-------------|
+| Same error exact match | 30-90 min investigation | < 2 min (known fix) |
+| Similar error pattern | 20-60 min | 5-10 min (start from best hypothesis) |
+| Novel error | Same as before | Same — no false acceleration |
+**When to override the memory:**
+- The error signature matches but the context differs (different library version, different project type)
+- The previously winning hypothesis was marked as "project-specific"
+- The fix was a workaround, not a root cause fix
+```bash
+# Flag a fix as project-specific (won't suggest for other projects)
+bash runtime/persistence/store.sh set "debug:hypothesis:$ERROR_SIG:scope" "project-specific"
+```
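One way to act on the `scope` flag is a small gate before a remembered fix is suggested. The sketch below is an assumption about how such a check could look, not the package's implementation; the record is a hypothetical dict holding the `scope` and `project` fields stored above.

```python
def should_suggest(record: dict, current_project: str) -> bool:
    """Decide whether a stored fix should be suggested in the current project.

    A fix flagged "project-specific" is suggested only inside the project
    that recorded it; any other fix is suggested everywhere.
    """
    if record.get("scope") == "project-specific":
        return record.get("project") == current_project
    return True


# Hypothetical records; project names are placeholders.
scoped = {"scope": "project-specific", "project": "billing-api"}
general = {"project": "billing-api"}

print(should_suggest(scoped, "billing-api"))   # same project
print(should_suggest(scoped, "web-frontend"))  # different project
print(should_suggest(general, "web-frontend")) # unscoped fix
```

The other override cases listed above (a different library version, a workaround rather than a root-cause fix) still need human or agent judgment and are not covered by this check.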
 ### The Scientific Debugging Loop
 ```

package/skills/test-driven-development/SKILL.md CHANGED

@@ -188,10 +188,127 @@ test_jwt_token_stuff()
 TDD applies at all layers. Start with unit. Add integration when units pass.
+### Autonomous Mutation Testing
+After the REFACTOR phase is complete and all tests are green, run autonomous mutation testing to verify your tests actually catch bugs — not just pass on correct code.
+**The mutation testing loop:**
+```
+GREEN tests → generate mutants → run suite against each → calculate score → fix gaps → re-run
+```
+**Step 1: Generate mutants**
+Mutation tools automatically modify your production code in small ways to simulate bugs:
+| Mutation type | Example | What it tests |
+|--------------|---------|-------------|
+| Operator swap | `a > b` → `a >= b` | Off-by-one detection |
+| Condition removal | `if (valid && active)` → `if (active)` | Guard clause tests |
+| Return value swap | `return true` → `return false` | Output assertion coverage |
+| Constant mutation | `ttl = 3600` → `ttl = 0` | Boundary value tests |
+| Statement deletion | Remove a line entirely | Whether tests catch missing logic |
+**Step 2: Run mutation tools**
+```bash
+# Python: mutmut
+pip install mutmut
+mutmut run --paths-to-mutate src/ --tests-dir tests/
+mutmut results  # shows surviving (undetected) mutants
+# JavaScript/TypeScript: Stryker
+npx stryker run
+# Stryker generates a detailed HTML report with surviving mutants
+# Go: go-mutesting
+go install github.com/zimmski/go-mutesting/cmd/go-mutesting@latest
+go-mutesting ./...
+# Java: PIT
+mvn org.pitest:pitest-maven:mutationCoverage
+```
+**Step 3: Calculate and interpret the mutation score**
+```
+mutation score = (killed mutants / total mutants) × 100
+```
+| Score | Assessment | Action |
+|-------|-----------|--------|
+| ≥ 90% | Excellent | No action needed |
+| 80–89% | Good | Review surviving mutants; add 1-2 targeted tests |
+| 70–79% | Marginal | Systematic gap; add boundary and error-path tests |
+| < 70% | Poor | Tests exist but don't assert enough; add failing-case coverage |
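The score formula and the assessment table can be expressed as a short Python sketch. The function names are illustrative, and treating the bands as continuous (so 89.5 counts as "Good") is an assumption, because the table only lists whole-number ranges.

```python
def mutation_score(killed: int, total: int) -> float:
    """Mutation score as a percentage: killed mutants / total mutants x 100."""
    if total <= 0:
        raise ValueError("total mutants must be positive")
    return 100.0 * killed / total


def assess(score: float) -> str:
    """Map a mutation score to the assessment bands from the table."""
    if score >= 90:
        return "Excellent"
    if score >= 80:
        return "Good"
    if score >= 70:
        return "Marginal"
    return "Poor"


score = mutation_score(killed=87, total=100)
print(score, assess(score))
```
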
+**Step 4: Kill surviving mutants**
+For each surviving mutant, the tool shows what change it made. Write a test that would catch that bug:
+```python
+# mutmut report shows this mutant survived:
+# Original:  if score >= passing_threshold:
+# Mutant:    if score > passing_threshold:
+# Write a test that detects the off-by-one:
+def test_score_at_exact_threshold_passes():
+    # This test kills the >= vs > mutant
+    assert grade(score=passing_threshold) == "pass"
+    assert grade(score=passing_threshold - 1) == "fail"
+```
+```typescript
+// Stryker shows this mutant survived:
+// Original:  return { token, expiresAt, userId }
+// Mutant:    return { token, expiresAt, userId: "" }
+// Write a test that kills it:
+test('issue() returns correct userId in payload', () => {
+  const result = auth.issue('user-abc', 3600);
+  expect(result.userId).toBe('user-abc');  // was not previously asserted!
+});
+```
+**Step 5: Iterate until score ≥ 80%**
+```bash
+# After adding new tests, re-run to measure improvement
+mutmut run --paths-to-mutate src/ --tests-dir tests/
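+# The grep/awk parsing assumes a "Killed <n> of <m>"-style summary line; adjust it to your mutmut version's output
+NEW_SCORE=$(mutmut results | grep "Killed" | awk '{print $2/$4 * 100}')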
+echo "Mutation score: $NEW_SCORE%"
+```
+**Tracking mutation scores over time:**
+```bash
+# Record in ClawPowers metrics after each TDD cycle
+MUTATION_SCORE=87
+bash runtime/metrics/collector.sh record \
+  --skill test-driven-development \
+  --outcome success \
+  --notes "AuthService: RED×6 witnessed, mutation_score=$MUTATION_SCORE%, 0 surviving mutants after 2 additions"
+```
+The TDD cycle with mutation testing:
+```
+RED → GREEN → REFACTOR → MUTATE → [score < 80%? → KILL SURVIVORS → RE-MUTATE] → done
+```
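The MUTATE and RE-MUTATE part of the cycle can be sketched as a loop in Python. The two callables are hypothetical hooks (one runs the mutation tool and returns a score, the other adds tests for survivors), and the `max_rounds` cap is an assumption added here so the loop always terminates; the source does not specify one.

```python
from typing import Callable, Tuple


def mutate_until_threshold(
    run_mutation: Callable[[], float],
    kill_survivors: Callable[[], None],
    threshold: float = 80.0,
    max_rounds: int = 5,
) -> Tuple[float, int]:
    """Re-run mutation testing until the score reaches the threshold.

    Returns the final score and the number of mutation runs performed.
    Stops early after max_rounds runs even if the score is still too low.
    """
    score = run_mutation()
    rounds = 1
    while score < threshold and rounds < max_rounds:
        kill_survivors()        # write tests targeting surviving mutants
        score = run_mutation()  # re-measure after the new tests
        rounds += 1
    return score, rounds


# Simulated run: scores improve as tests are added (placeholder numbers).
simulated_scores = iter([65.0, 78.0, 85.0])
tests_added = []

final_score, runs = mutate_until_threshold(
    run_mutation=lambda: next(simulated_scores),
    kill_survivors=lambda: tests_added.append("new test"),
)
print(final_score, runs, len(tests_added))
```
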
 ## ClawPowers Enhancement
 When `~/.clawpowers/` runtime is initialized:
+**Mutation Score History:**
+```bash
+# Query historical mutation scores
+bash runtime/persistence/store.sh list "tdd:mutation:*" | sort
+# Shows trend: if scores are declining, tests are growing but not keeping up with code complexity
+```
 **Mutation Analysis Integration:**
 After the GREEN phase, optionally run mutation analysis to verify your tests actually catch bugs — not just pass on correct code:

package/skills/using-clawpowers/SKILL.md CHANGED

@@ -29,7 +29,7 @@ ClawPowers follows a three-layer approach:
 3. **Outcome Tracking** — If runtime is available, record execution outcomes for self-improvement
-You have ClawPowers loaded. This gives you 20 skills that go beyond static instructions — they execute tools, persist state across sessions, and track outcomes for self-improvement.
+You have ClawPowers loaded. This gives you 24 skills that go beyond static instructions — they execute tools, persist state across sessions, and track outcomes for self-improvement. The RSI Intelligence Layer (skills 21-24) enables the agent to improve its own methodology over time.
 ## How Skills Work
@@ -58,6 +58,12 @@ Skills activate automatically when you recognize a matching task pattern. You do
 | Need to understand how to learn something effectively | `learn-how-to-learn` |
 | Competitive research or trend analysis | `market-intelligence` |
 | Finding leads or prospects | `prospecting` |
+| Task counter hits 50; skill success rates declining | `meta-skill-evolution` |
+| Test suite fails; want automatic patch-and-commit | `self-healing-code` |
+| Starting a task; want to check cross-project patterns first | `cross-project-knowledge` |
+| After fixing a bug or architecture decision; want to store the pattern | `cross-project-knowledge` |
+| TDD GREEN phase complete; want invariant property tests | `formal-verification-lite` |
+| Need roundtrip/idempotence/commutativity tests for a pure function | `formal-verification-lite` |
 ## Reading a Skill
@@ -106,15 +112,15 @@ You never need to check the mode. Skills detect it themselves and adapt their in
 - **Don't stack conflicting skills** — If TDD and subagent-driven-development both apply, let subagent-driven-development drive; it includes TDD internally
 - **Don't ignore ClawPowers enhancements** — When the runtime is available, use it; the static path is a fallback, not the goal
-## Quick Reference: All 20 Skills
+## Quick Reference: All 24 Skills
 ### Core Development (14)
 1. `subagent-driven-development` — Parallel subagents, two-stage review, worktree isolation
-2. `test-driven-development` — RED-GREEN-REFACTOR with failure witness and mutation analysis
+2. `test-driven-development` — RED-GREEN-REFACTOR with failure witness and autonomous mutation testing
 3. `writing-plans` — Spec to sequenced 2-5 min tasks with dependency graph
 4. `executing-plans` — Tracked execution with resumability and milestone persistence
 5. `brainstorming` — Structured ideation with convergence protocol
-6. `systematic-debugging` — Hypothesis-driven debugging with evidence collection
+6. `systematic-debugging` — Hypothesis-driven debugging with persistent hypothesis memory
 7. `verification-before-completion` — Quality gates before any merge or handoff
 8. `finishing-a-development-branch` — Branch cleanup, changelog, squash, merge prep
 9. `requesting-code-review` — Review request with context, risk areas, reviewer matching
@@ -132,6 +138,12 @@ You never need to check the mode. Skills detect it themselves and adapt their in
 19. `market-intelligence` — Competitive analysis, trend detection, opportunity scoring
 20. `prospecting` — ICP → company search → contact enrichment → outreach prep
+### RSI Intelligence Layer (4) — NEW
+21. `meta-skill-evolution` — Every 50 tasks: analyze outcomes, find weakest skill, surgically improve it, commit with version bump
+22. `self-healing-code` — Test failure → hypothesis tree → ≥2 candidate patches → auto-commit winner or escalate
+23. `cross-project-knowledge` — Persistent pattern KB across all projects; search before tasks, store after fixes
+24. `formal-verification-lite` — Property-based testing (fast-check/Hypothesis) after TDD GREEN; 1000+ iterations per invariant
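To make the roundtrip, idempotence, and commutativity invariants named for `formal-verification-lite` concrete, here is a minimal property-check sketch. It deliberately uses only the standard library's `random` module in place of Hypothesis or fast-check, so it has no shrinking or smart generation; the helper names and the example properties are illustrative assumptions, not the skill's actual content.

```python
import json
import random


def check_property(prop, gen, iterations=1000, seed=0):
    """Run prop on generated cases; return the first counterexample or None."""
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    for _ in range(iterations):
        case = gen(rng)
        if not prop(case):
            return case
    return None


def int_list(rng):
    """Random list of 0 to 20 integers in [-1000, 1000]."""
    return [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 20))]


def int_pair(rng):
    """Random pair of integers in [-1000, 1000]."""
    return (rng.randint(-1000, 1000), rng.randint(-1000, 1000))


def roundtrip(xs):
    """Encoding then decoding returns the original value."""
    return json.loads(json.dumps(xs)) == xs


def idempotent(xs):
    """Sorting twice gives the same result as sorting once."""
    return sorted(sorted(xs)) == sorted(xs)


def commutative(pair):
    """Integer addition does not depend on operand order."""
    return pair[0] + pair[1] == pair[1] + pair[0]


for name, prop, gen in [
    ("roundtrip", roundtrip, int_list),
    ("idempotent", idempotent, int_list),
    ("commutative", commutative, int_pair),
]:
    counterexample = check_property(prop, gen)
    print(name, "ok" if counterexample is None else f"failed on {counterexample!r}")
```

A real property-based library would also shrink a failing case to a minimal counterexample, which this sketch does not attempt.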
 ## Session Initialization Complete
-ClawPowers is ready. Skills activate on pattern recognition. Runtime enhancements available when `~/.clawpowers/` exists.
+ClawPowers is ready. 24 skills active. Skills activate on pattern recognition. Runtime enhancements available when `~/.clawpowers/` exists. RSI Intelligence Layer (meta-skill-evolution, self-healing-code, cross-project-knowledge, formal-verification-lite) provides persistent learning across sessions and projects.