clawpowers 1.1.4 → 2.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +126 -0
- package/COMPATIBILITY.md +13 -0
- package/KNOWN_LIMITATIONS.md +19 -0
- package/LICENSE +44 -0
- package/LICENSING.md +10 -0
- package/README.md +378 -210
- package/SECURITY.md +52 -0
- package/dist/index.d.ts +1477 -0
- package/dist/index.js +3464 -0
- package/dist/index.js.map +1 -0
- package/native/Cargo.lock +4863 -0
- package/native/Cargo.toml +73 -0
- package/native/crates/canonical/Cargo.toml +24 -0
- package/native/crates/canonical/src/lib.rs +673 -0
- package/native/crates/compression/Cargo.toml +20 -0
- package/native/crates/compression/benches/compression_bench.rs +42 -0
- package/native/crates/compression/src/lib.rs +393 -0
- package/native/crates/evm-eth/Cargo.toml +13 -0
- package/native/crates/evm-eth/src/lib.rs +105 -0
- package/native/crates/fee/Cargo.toml +15 -0
- package/native/crates/fee/src/lib.rs +281 -0
- package/native/crates/index/Cargo.toml +16 -0
- package/native/crates/index/src/lib.rs +277 -0
- package/native/crates/policy/Cargo.toml +17 -0
- package/native/crates/policy/src/lib.rs +614 -0
- package/native/crates/security/Cargo.toml +22 -0
- package/native/crates/security/src/lib.rs +478 -0
- package/native/crates/tokens/Cargo.toml +13 -0
- package/native/crates/tokens/src/lib.rs +534 -0
- package/native/crates/verification/Cargo.toml +23 -0
- package/native/crates/verification/src/lib.rs +333 -0
- package/native/crates/wallet/Cargo.toml +20 -0
- package/native/crates/wallet/src/lib.rs +261 -0
- package/native/crates/x402/Cargo.toml +30 -0
- package/native/crates/x402/src/lib.rs +423 -0
- package/native/ffi/Cargo.toml +34 -0
- package/native/ffi/build.rs +4 -0
- package/native/ffi/index.node +0 -0
- package/native/ffi/src/lib.rs +352 -0
- package/native/ffi/tests/integration.rs +354 -0
- package/native/pyo3/Cargo.toml +26 -0
- package/native/pyo3/pyproject.toml +16 -0
- package/native/pyo3/src/lib.rs +407 -0
- package/native/pyo3/tests/test_smoke.py +180 -0
- package/native/wasm/Cargo.toml +44 -0
- package/native/wasm/pkg/.gitignore +6 -0
- package/native/wasm/pkg/clawpowers_wasm.d.ts +208 -0
- package/native/wasm/pkg/clawpowers_wasm.js +872 -0
- package/native/wasm/pkg/clawpowers_wasm_bg.wasm +0 -0
- package/native/wasm/pkg/clawpowers_wasm_bg.wasm.d.ts +40 -0
- package/native/wasm/pkg/package.json +17 -0
- package/native/wasm/pkg-node/.gitignore +6 -0
- package/native/wasm/pkg-node/clawpowers_wasm.d.ts +143 -0
- package/native/wasm/pkg-node/clawpowers_wasm.js +798 -0
- package/native/wasm/pkg-node/clawpowers_wasm_bg.wasm +0 -0
- package/native/wasm/pkg-node/clawpowers_wasm_bg.wasm.d.ts +40 -0
- package/native/wasm/pkg-node/package.json +13 -0
- package/native/wasm/src/lib.rs +433 -0
- package/package.json +71 -44
- package/src/skills/catalog.ts +435 -0
- package/src/skills/executor.ts +56 -0
- package/src/skills/index.ts +3 -0
- package/src/skills/itp/SKILL.md +112 -0
- package/src/skills/loader.ts +193 -0
- package/.claude-plugin/manifest.json +0 -19
- package/.codex/INSTALL.md +0 -36
- package/.cursor-plugin/manifest.json +0 -21
- package/.opencode/INSTALL.md +0 -52
- package/ARCHITECTURE.md +0 -69
- package/bin/clawpowers.js +0 -625
- package/bin/clawpowers.sh +0 -91
- package/docs/demo/clawpowers-demo.cast +0 -197
- package/docs/demo/clawpowers-demo.gif +0 -0
- package/docs/launch-images/25-skills-breakdown.jpg +0 -0
- package/docs/launch-images/clawpowers-vs-superpowers.jpg +0 -0
- package/docs/launch-images/economic-code-optimization.jpg +0 -0
- package/docs/launch-images/native-vs-bridge-2.jpg +0 -0
- package/docs/launch-images/native-vs-bridge.jpg +0 -0
- package/docs/launch-images/post1-hero-lobster.jpg +0 -0
- package/docs/launch-images/post2-dashboard.jpg +0 -0
- package/docs/launch-images/post3-superpowers.jpg +0 -0
- package/docs/launch-images/post4-before-after.jpg +0 -0
- package/docs/launch-images/post5-install-now.jpg +0 -0
- package/docs/launch-images/ultimate-stack.jpg +0 -0
- package/docs/launch-posts.md +0 -76
- package/docs/quickstart-first-transaction.md +0 -204
- package/gemini-extension.json +0 -32
- package/hooks/session-start +0 -205
- package/hooks/session-start.cmd +0 -43
- package/hooks/session-start.js +0 -163
- package/runtime/demo/README.md +0 -78
- package/runtime/demo/x402-mock-server.js +0 -230
- package/runtime/feedback/analyze.js +0 -621
- package/runtime/feedback/analyze.sh +0 -546
- package/runtime/init.js +0 -210
- package/runtime/init.sh +0 -178
- package/runtime/metrics/collector.js +0 -361
- package/runtime/metrics/collector.sh +0 -308
- package/runtime/payments/ledger.js +0 -305
- package/runtime/payments/ledger.sh +0 -262
- package/runtime/payments/pipeline.js +0 -455
- package/runtime/persistence/store.js +0 -433
- package/runtime/persistence/store.sh +0 -303
- package/skill.json +0 -106
- package/skills/agent-bounties/SKILL.md +0 -553
- package/skills/agent-payments/SKILL.md +0 -479
- package/skills/brainstorming/SKILL.md +0 -233
- package/skills/content-pipeline/SKILL.md +0 -282
- package/skills/cross-project-knowledge/SKILL.md +0 -345
- package/skills/dispatching-parallel-agents/SKILL.md +0 -305
- package/skills/economic-code-optimization/SKILL.md +0 -265
- package/skills/executing-plans/SKILL.md +0 -255
- package/skills/finishing-a-development-branch/SKILL.md +0 -260
- package/skills/formal-verification-lite/SKILL.md +0 -441
- package/skills/learn-how-to-learn/SKILL.md +0 -235
- package/skills/market-intelligence/SKILL.md +0 -323
- package/skills/meta-skill-evolution/SKILL.md +0 -325
- package/skills/prospecting/SKILL.md +0 -454
- package/skills/receiving-code-review/SKILL.md +0 -225
- package/skills/requesting-code-review/SKILL.md +0 -206
- package/skills/security-audit/SKILL.md +0 -353
- package/skills/self-healing-code/SKILL.md +0 -369
- package/skills/subagent-driven-development/SKILL.md +0 -244
- package/skills/systematic-debugging/SKILL.md +0 -355
- package/skills/test-driven-development/SKILL.md +0 -416
- package/skills/using-clawpowers/SKILL.md +0 -160
- package/skills/using-git-worktrees/SKILL.md +0 -261
- package/skills/validator/SKILL.md +0 -281
- package/skills/verification-before-completion/SKILL.md +0 -254
- package/skills/writing-plans/SKILL.md +0 -276
- package/skills/writing-skills/SKILL.md +0 -260
@@ -1,416 +0,0 @@
----
-name: test-driven-development
-description: Enforce RED-GREEN-REFACTOR with mandatory failure witness. Activate whenever writing new code, implementing a feature, or fixing a bug with a reproducible test case.
-version: 1.0.0
-requires:
-  tools: [bash]
-  runtime: false
-metrics:
-  tracks: [red_witnessed, green_achieved, refactor_cycles, mutation_score, test_effectiveness]
-  improves: [test_granularity, refactor_threshold, mutation_analysis_frequency]
----
-
-# Test-Driven Development
-
-## When to Use
-
-Apply this skill when:
-
-- Implementing any new feature or function
-- Fixing a bug that has a reproducible failure condition
-- Refactoring code where behavior must be preserved
-- Building an API endpoint, utility function, or module
-- Implementing a specification with defined inputs and outputs
-
-**Skip when:**
-- Exploratory prototyping where the interface isn't known yet (write the prototype, then TDD the real implementation)
-- One-off scripts with no production path
-- Pure configuration changes (infrastructure, env vars, YAML)
-
-**Decision tree:**
-```
-Do you know what correct behavior looks like?
-├── No → Explore/prototype first, then TDD the real version
-└── Yes → Do you have a reproducible failure condition?
-          ├── Yes (bug fix) → TDD from the failing test
-          └── No (new feature) → TDD from the spec → RED-GREEN-REFACTOR
-```
-
-## Core Methodology
-
-### The Laws of TDD
-
-1. You may not write production code unless it is to make a failing test pass.
-2. You may not write more of a unit test than is sufficient to fail (compilation failures count as failures).
-3. You may not write more production code than is sufficient to pass the currently failing test.
-
-These are not suggestions. Violations produce code that is tested after the fact — which is not TDD.
-
-### The RED Phase
-
-**Objective:** Write a test that fails for the right reason.
-
-```
-Step 1: Write the test before any production code exists
-Step 2: Run the test suite
-Step 3: WITNESS the failure — read the actual error message
-Step 4: Confirm the failure is the expected one (missing implementation, not a typo or syntax error in the test)
-```
-
-**Failure witness requirement:** You must see the test runner output showing failure. Copy-pasting the expected error is not sufficient — run it.
-
-**Example (Python):**
-```python
-# test_auth.py — write this FIRST
-def test_jwt_issue_returns_token_with_expiry():
-    auth = AuthService(secret="test-secret")
-    result = auth.issue(user_id="u123", ttl_seconds=3600)
-
-    assert result["token"] is not None
-    assert result["expires_at"] > time.time()
-    assert result["user_id"] == "u123"
-
-# Run and witness:
-# pytest test_auth.py::test_jwt_issue_returns_token_with_expiry
-# FAILED: ImportError: cannot import name 'AuthService' from 'auth'
-# ← This is the expected RED failure. Correct.
-```
-
-**What a bad RED looks like:**
-```
-# WRONG: Writing AuthService first, then the test
-# WRONG: Test passes on first run (you tested nothing)
-# WRONG: Test fails with wrong error (syntax error in test, not missing implementation)
-```
-
-### The GREEN Phase
-
-**Objective:** Write the minimum production code to make the test pass.
-
-```
-Step 1: Write only what the test requires — nothing more
-Step 2: No edge case handling beyond what the test covers
-Step 3: Run the test suite
-Step 4: WITNESS the green — all targeted tests pass
-Step 5: If other tests broke, fix them (don't disable them)
-```
-
-**Minimum code principle:** If the test only checks that `add(2, 3)` returns `5`, write `return 5` if that makes it green. The next test will force generalization.
-
-**Example:**
-```python
-# auth.py — write this AFTER the test fails
-import jwt
-import time
-
-class AuthService:
-    def __init__(self, secret: str):
-        self.secret = secret
-
-    def issue(self, user_id: str, ttl_seconds: int) -> dict:
-        now = time.time()
-        payload = {"sub": user_id, "iat": now, "exp": now + ttl_seconds}
-        token = jwt.encode(payload, self.secret, algorithm="HS256")
-        return {"token": token, "expires_at": now + ttl_seconds, "user_id": user_id}
-
-# Run and witness:
-# pytest test_auth.py::test_jwt_issue_returns_token_with_expiry
-# PASSED
-```
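The minimum code principle stated above can be sketched in isolation. This is an illustration only, not part of the removed file: `add` and both test names are hypothetical, and the hard-coded first implementation is shown as a comment so the block stays runnable.

```python
# Hypothetical illustration of the minimum-code principle.

# Cycle 1 GREEN: the only test checks add(2, 3) == 5, so the smallest
# implementation that passes would be the hard-coded one:
#
#     def add(a, b):
#         return 5


def test_add_two_and_three_returns_five():
    assert add(2, 3) == 5


# Cycle 2 RED: a second behavior makes the hard-coded version fail.
def test_add_ten_and_one_returns_eleven():
    assert add(10, 1) == 11


# Cycle 2 GREEN: the new test forces the general implementation.
def add(a, b):
    return a + b


if __name__ == "__main__":
    test_add_two_and_three_returns_five()
    test_add_ten_and_one_returns_eleven()
    print("both tests pass")
```

The point of the detour through `return 5` is that the generalization is demanded by a test rather than assumed up front.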
-
-### The REFACTOR Phase
-
-**Objective:** Improve code structure without changing behavior.
-
-```
-Step 1: All tests must be green BEFORE refactoring begins
-Step 2: Identify: duplication, poor naming, complex conditionals, missing abstractions
-Step 3: Refactor ONE thing at a time
-Step 4: Run full test suite after each change
-Step 5: If tests break, revert immediately (don't debug during refactor)
-Step 6: Refactor test code too — tests are first-class code
-```
-
-**What belongs in REFACTOR:**
-- Extract repeated logic into helper functions
-- Rename variables/functions for clarity
-- Simplify nested conditionals
-- Add type annotations
-- Break long functions into smaller ones
-- Move related code into a class
-
-**What does NOT belong in REFACTOR:**
-- New functionality (that's the next RED phase)
-- Performance optimization (benchmark first, optimize second)
-- Changing behavior "while you're in there"
-
-### The Cycle
-
-```
-RED (5-15 min) → GREEN (5-30 min) → REFACTOR (5-20 min) → RED...
-```
-
-Each cycle covers ONE behavior. Not a feature — one behavior of a feature.
-
-For `AuthService`, the full TDD cycle would be:
-1. RED/GREEN/REFACTOR: `issue()` returns token
-2. RED/GREEN/REFACTOR: `issue()` handles invalid user_id
-3. RED/GREEN/REFACTOR: `validate()` returns valid for good token
-4. RED/GREEN/REFACTOR: `validate()` returns invalid for expired token
-5. RED/GREEN/REFACTOR: `validate()` returns invalid for tampered token
-6. RED/GREEN/REFACTOR: `issue()` and `validate()` work end-to-end
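Cycles 3 to 5 in the list above might end up looking roughly like the sketch below. Several things here are assumptions rather than anything the removed file specifies: the `validate()` signature, its `{"valid", "reason"}` return shape, the injectable `now` argument, and the token format. To stay self-contained the sketch signs tokens with the standard-library `hmac` module instead of the PyJWT call used in the GREEN example, so it is a toy stand-in and not a real JWT implementation.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


class AuthService:
    """Toy stand-in for AuthService; NOT a real JWT implementation."""

    def __init__(self, secret: str):
        self.secret = secret.encode()

    def _sign(self, body: bytes) -> str:
        return hmac.new(self.secret, body, hashlib.sha256).hexdigest()

    def issue(self, user_id: str, ttl_seconds: int) -> dict:
        expires_at = time.time() + ttl_seconds
        body = base64.urlsafe_b64encode(
            json.dumps({"sub": user_id, "exp": expires_at}).encode()
        )
        token = body.decode() + "." + self._sign(body)
        return {"token": token, "expires_at": expires_at, "user_id": user_id}

    def validate(self, token: str, now: Optional[float] = None) -> dict:
        # `now` is injectable so tests control time instead of sleeping.
        now = time.time() if now is None else now
        body, _, signature = token.partition(".")
        if not hmac.compare_digest(signature, self._sign(body.encode())):
            return {"valid": False, "reason": "tampered"}
        payload = json.loads(base64.urlsafe_b64decode(body))
        if payload["exp"] <= now:
            return {"valid": False, "reason": "expired"}
        return {"valid": True, "reason": None}


# Cycle 3: one behavior, one test.
def test_jwt_validate_good_token_returns_valid():
    auth = AuthService(secret="test-secret")
    token = auth.issue("u123", 3600)["token"]
    assert auth.validate(token) == {"valid": True, "reason": None}


# Cycle 4: expiry is checked against an injected clock.
def test_jwt_validate_expired_token_returns_invalid_with_reason():
    auth = AuthService(secret="test-secret")
    token = auth.issue("u123", 3600)["token"]
    result = auth.validate(token, now=time.time() + 7200)
    assert result == {"valid": False, "reason": "expired"}


# Cycle 5: any change to the signed body must be rejected.
def test_jwt_validate_tampered_token_returns_invalid_with_reason():
    auth = AuthService(secret="test-secret")
    token = auth.issue("u123", 3600)["token"]
    assert auth.validate("x" + token) == {"valid": False, "reason": "tampered"}
```

Each test was (hypothetically) written and witnessed failing before the matching branch of `validate()` existed, which is why the method has exactly three outcomes and nothing more.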
-
-### Test Naming Convention
-
-Tests are documentation. Name them:
-```
-test_[unit]_[action]_[expected_result]
-test_jwt_issue_with_negative_ttl_raises_value_error()
-test_jwt_validate_expired_token_returns_invalid_with_reason()
-test_jwt_validate_tampered_token_returns_invalid_with_reason()
-```
-
-Not:
-```
-test_1()
-test_auth()
-test_jwt_token_stuff()
-```
-
-### Testing Layers
-
-| Layer | What It Tests | Tool |
-|-------|-------------|------|
-| Unit | Single function, isolated | pytest, jest, go test |
-| Integration | Multiple units together | pytest with real DB |
-| Contract | API interface compliance | pact, dredd |
-| E2E | Full system path | playwright, cypress |
-
-TDD applies at all layers. Start with unit. Add integration when units pass.
-
-### Autonomous Mutation Testing
-
-After the REFACTOR phase is complete and all tests are green, run autonomous mutation testing to verify your tests actually catch bugs — not just pass on correct code.
-
-**The mutation testing loop:**
-
-```
-GREEN tests → generate mutants → run suite against each → calculate score → fix gaps → re-run
-```
-
-**Step 1: Generate mutants**
-
-Mutation tools automatically modify your production code in small ways to simulate bugs:
-
-| Mutation type | Example | What it tests |
-|--------------|---------|-------------|
-| Operator swap | `a > b` → `a >= b` | Off-by-one detection |
-| Condition removal | `if (valid && active)` → `if (active)` | Guard clause tests |
-| Return value swap | `return true` → `return false` | Output assertion coverage |
-| Constant mutation | `ttl = 3600` → `ttl = 0` | Boundary value tests |
-| Statement deletion | Remove a line entirely | Whether tests catch missing logic |
-
-**Step 2: Run mutation tools**
-
-```bash
-# Python: mutmut
-pip install mutmut
-mutmut run --paths-to-mutate src/ --tests-dir tests/
-mutmut results  # shows surviving (undetected) mutants
-
-# JavaScript/TypeScript: Stryker
-npx stryker run
-# Stryker generates a detailed HTML report with surviving mutants
-
-# Go: go-mutesting
-go install github.com/zimmski/go-mutesting/cmd/go-mutesting@latest
-go-mutesting ./...
-
-# Java: PIT
-mvn org.pitest:pitest-maven:mutationCoverage
-```
-
-**Step 3: Calculate and interpret the mutation score**
-
-```
-mutation score = (killed mutants / total mutants) × 100
-```
-
-| Score | Assessment | Action |
-|-------|-----------|--------|
-| ≥ 90% | Excellent | No action needed |
-| 80–89% | Good | Review surviving mutants; add 1-2 targeted tests |
-| 70–79% | Marginal | Systematic gap; add boundary and error-path tests |
-| < 70% | Poor | Tests exist but don't assert enough; add failing-case coverage |
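The score formula and the assessment bands in the table above can be written down directly. A minimal sketch; the function names `mutation_score` and `assess` are hypothetical, and scores between the table's integer bands (for example 89.5) are assigned to the lower band here as an assumption.

```python
def mutation_score(killed: int, total: int) -> float:
    """Percentage of generated mutants that the test suite killed."""
    if total <= 0:
        raise ValueError("total mutants must be positive")
    return 100 * killed / total


def assess(score: float) -> str:
    """Map a mutation score (0-100) onto the assessment bands above."""
    if score >= 90:
        return "Excellent"
    if score >= 80:
        return "Good"
    if score >= 70:
        return "Marginal"
    return "Poor"


if __name__ == "__main__":
    score = mutation_score(killed=87, total=100)
    print(f"{score:.0f}% is {assess(score)}")
```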
-
-**Step 4: Kill surviving mutants**
-
-For each surviving mutant, the tool shows what change it made. Write a test that would catch that bug:
-
-```python
-# mutmut shows this mutant survived:
-# Original:  if score >= passing_threshold:
-# Mutant:    if score > passing_threshold:
-
-# Write a test that detects the off-by-one:
-def test_score_at_exact_threshold_passes():
-    # This test kills the >= vs > mutant
-    assert grade(score=passing_threshold) == "pass"
-    assert grade(score=passing_threshold - 1) == "fail"
-```
-
-```typescript
-// Stryker shows this mutant survived:
-// Original:  return { token, expiresAt, userId }
-// Mutant:    return { token, expiresAt, userId: "" }
-
-// Write a test that kills it:
-test('issue() returns correct userId in payload', () => {
-  const result = auth.issue('user-abc', 3600);
-  expect(result.userId).toBe('user-abc'); // was not previously asserted!
-});
-```
-
-**Step 5: Iterate until score ≥ 80%**
-
-```bash
-# After adding new tests, re-run to measure improvement
-mutmut run --paths-to-mutate src/ --tests-dir tests/
-NEW_SCORE=$(mutmut results | grep "Killed" | awk '{print $2/$4 * 100}')
-echo "Mutation score: $NEW_SCORE%"
-```
-
-**Tracking mutation scores over time:**
-
-```bash
-# Record in ClawPowers metrics after each TDD cycle
-MUTATION_SCORE=87
-bash runtime/metrics/collector.sh record \
-  --skill test-driven-development \
-  --outcome success \
-  --notes "AuthService: RED×6 witnessed, mutation_score=$MUTATION_SCORE%, 0 surviving mutants after 2 additions"
-```
-
-The TDD cycle with mutation testing:
-
-```
-RED → GREEN → REFACTOR → MUTATE → [score < 80%? → KILL SURVIVORS → RE-MUTATE] → done
-```
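What "surviving" and "killed" mean can be shown without any mutation tool. The sketch below hand-writes the `>=` versus `>` mutant from Step 4 and runs two test functions against both versions. Everything in it is hypothetical: `PASSING_THRESHOLD`, `grade_mutant`, and the two test helpers are illustrative names, and a real tool generates and runs the mutants for you.

```python
PASSING_THRESHOLD = 60  # hypothetical value, chosen only for the demo


def grade(score: int) -> str:
    """Original production code."""
    return "pass" if score >= PASSING_THRESHOLD else "fail"


def grade_mutant(score: int) -> str:
    """Hand-written mutant: >= swapped for >."""
    return "pass" if score > PASSING_THRESHOLD else "fail"


def weak_test(fn) -> bool:
    """Only checks values far away from the boundary."""
    return fn(100) == "pass" and fn(0) == "fail"


def boundary_test(fn) -> bool:
    """Checks the exact threshold and the value just below it."""
    return (
        fn(PASSING_THRESHOLD) == "pass"
        and fn(PASSING_THRESHOLD - 1) == "fail"
    )


# A mutant survives when the test still passes against the mutated code.
assert weak_test(grade) and weak_test(grade_mutant)

# A mutant is killed when the test passes on the original but fails on it.
assert boundary_test(grade) and not boundary_test(grade_mutant)
```

The weak test passes on both versions, so it contributes nothing to the mutation score for this line; only the boundary test distinguishes them.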
-
-## ClawPowers Enhancement
-
-When `~/.clawpowers/` runtime is initialized:
-
-**Mutation Score History:**
-
-```bash
-# Query historical mutation scores
-bash runtime/persistence/store.sh list "tdd:mutation:*" | sort
-# Shows trend: if scores are declining, tests are growing but not keeping up with code complexity
-```
-
-**Mutation Analysis Integration:**
-
-After the REFACTOR phase, optionally run mutation analysis to verify your tests actually catch bugs — not just pass on correct code:
-
-```bash
-# Python: mutmut
-pip install mutmut
-mutmut run --paths-to-mutate src/auth.py --tests-dir tests/
-
-# JavaScript: Stryker
-npx stryker run
-
-# Go: go-mutesting
-go-mutesting ./...
-```
-
-Mutation score target: ≥ 80%. Below 70% means your tests would miss real bugs.
-
-**Test Portfolio Lifecycle Tracking:**
-
-```bash
-bash runtime/metrics/collector.sh record \
-  --skill test-driven-development \
-  --outcome success \
-  --notes "AuthService: 6 behaviors, mutation score 87%, 0 stubs"
-```
-
-**Effectiveness Scoring:**
-
-`runtime/feedback/analyze.sh` computes per-feature test effectiveness based on:
-- Mutation score
-- Number of RED phases witnessed (vs skipped)
-- Time from RED to GREEN (indicates test complexity)
-- Defect rate post-merge (bugs found in production = test misses)
-
-Skills with declining effectiveness scores trigger recommendations:
-- If mutation score < 70%: add boundary and error case tests
-- If RED skipped: review test authoring process
-- If GREEN > 60 min: tests are too coarse, decompose
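The three recommendation rules above amount to a small threshold check. The sketch below is not the logic of `runtime/feedback/analyze.sh`, which is not shown in this diff; `CycleMetrics`, its field names, and `recommend` are hypothetical, and only the thresholds and recommendation wording come from the list above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CycleMetrics:
    """Hypothetical per-feature record; field names are assumptions."""

    mutation_score: float  # percent, 0-100
    red_witnessed: bool    # False if the RED phase was skipped
    green_minutes: float   # time from RED to GREEN


def recommend(metrics: CycleMetrics) -> List[str]:
    """Apply the three thresholds from the list above, in order."""
    recommendations = []
    if metrics.mutation_score < 70:
        recommendations.append("add boundary and error case tests")
    if not metrics.red_witnessed:
        recommendations.append("review test authoring process")
    if metrics.green_minutes > 60:
        recommendations.append("tests are too coarse, decompose")
    return recommendations


if __name__ == "__main__":
    print(recommend(CycleMetrics(65.0, red_witnessed=False, green_minutes=90.0)))
```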
-
-## Anti-Patterns
-
-| Anti-Pattern | Why It Fails | Correct Approach |
-|-------------|-------------|-----------------|
-| Write tests after code | Tests are biased toward existing implementation | Tests must be written first — that's the definition |
-| Skip the failure witness | Tests may pass vacuously (wrong assertion, wrong import) | Run the suite, read the failure message |
-| Test implementation details | Tests break on refactor | Test behavior/interface, not internal state |
-| One giant test | Hard to diagnose failures | One test per behavior |
-| Mocking everything | Tests pass but real system fails | Mock only at true system boundaries (network, DB, time) |
-| Skip REFACTOR | Technical debt accumulates | REFACTOR is mandatory, not optional |
-| Write all tests upfront | Spec changes invalidate all tests | Write tests incrementally, one behavior at a time |
-| Disable failing tests | Silences real bugs | Fix the code, never disable the test |
-
-## Examples
-
-### Example 1: Pure Function (simplest case)
-
-```python
-# RED
-def test_celsius_to_fahrenheit_converts_correctly():
-    assert convert_temp(0, "celsius", "fahrenheit") == 32.0
-    assert convert_temp(100, "celsius", "fahrenheit") == 212.0
-
-# GREEN
-def convert_temp(value, from_unit, to_unit):
-    if from_unit == "celsius" and to_unit == "fahrenheit":
-        return value * 9/5 + 32
-    raise ValueError(f"Unsupported conversion: {from_unit} → {to_unit}")
-
-# REFACTOR → add UNIT_CONVERTERS dict, extract conversion logic
-```
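The REFACTOR step that Example 1 only names could look like the sketch below. The exact shape of `UNIT_CONVERTERS` is an assumption (the removed file gives only the name); the behavior is kept identical to the GREEN version, so the original test passes unchanged.

```python
# Assumed layout: (from_unit, to_unit) -> conversion function.
UNIT_CONVERTERS = {
    ("celsius", "fahrenheit"): lambda value: value * 9 / 5 + 32,
}


def convert_temp(value, from_unit, to_unit):
    """Same behavior as the GREEN version, with the logic moved into a table."""
    try:
        converter = UNIT_CONVERTERS[(from_unit, to_unit)]
    except KeyError:
        raise ValueError(
            f"Unsupported conversion: {from_unit} → {to_unit}"
        ) from None
    return converter(value)


# The RED test from Example 1, unchanged: a refactor must keep it green.
def test_celsius_to_fahrenheit_converts_correctly():
    assert convert_temp(0, "celsius", "fahrenheit") == 32.0
    assert convert_temp(100, "celsius", "fahrenheit") == 212.0
```

Adding a new conversion pair would be new behavior, so it belongs in the next RED phase rather than in this refactor.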
-
-### Example 2: Side-effecting Code
-
-```python
-# RED — test the effect, not the implementation
-def test_user_created_event_emitted_on_signup(event_bus):
-    service = UserService(db=FakeDB(), events=event_bus)
-    service.signup(email="a@b.com", password="secure123")
-
-    assert event_bus.has_event("user.created")
-    assert event_bus.last_event("user.created")["email"] == "a@b.com"
-
-# Note: FakeDB is a test double for the DB boundary
-# event_bus is a test double for the event system boundary
-# The UserService logic itself is real — no mocking of it
-```
-
-### Example 3: Bug Fix
-
-```python
-# Reproduce the bug first
-def test_cart_total_with_discount_code_not_negative():
-    cart = Cart()
-    cart.add_item(price=10.00, qty=1)
-    cart.apply_discount(code="HALF_OFF")
-    cart.apply_discount(code="HALF_OFF")  # applying twice — the bug
-
-    assert cart.total() >= 0.0  # was returning -5.00
-
-# Run → FAIL (reproduces the bug)
-# Fix the bug
-# Run → PASS
-# Now the bug is regression-protected
-```
@@ -1,160 +0,0 @@
----
-name: using-clawpowers
-description: Meta-skill explaining ClawPowers, available skills, and how to trigger them. Auto-injected at session start.
-version: 1.0.0
-requires:
-  tools: []
-  runtime: false
-metrics:
-  tracks: [session_starts, skill_activations, platform]
-  improves: [onboarding_clarity, trigger_accuracy]
----
-
-# ClawPowers — Skills Framework
-
-## When to Use
-
-This skill activates automatically at session start. You never invoke it manually.
-
-- **Session start:** Injected by the session hook to provide skill discovery
-- **New user onboarding:** Reference this document when unsure which skill applies
-- **Skill lookup:** Check the trigger map below to find the right skill for your task
-
-## Core Methodology
-
-ClawPowers follows a three-layer approach:
-
-1. **Pattern Recognition** — Match your current task to a skill via the trigger map
-2. **Skill Application** — Read the matched skill's SKILL.md and follow its methodology
-3. **Outcome Tracking** — If runtime is available, record execution outcomes for self-improvement
-
-
-You have ClawPowers loaded. This gives you 26 skills that go beyond static instructions — they execute tools, persist state across sessions, and track outcomes for self-improvement. The RSI Intelligence Layer (skills 21-24) enables the agent to improve its own methodology over time.
-
-## How Skills Work
-
-Skills activate automatically when you recognize a matching task pattern. You don't announce them. You just apply them.
-
-**Pattern → Skill mapping:**
-
-| When you encounter... | Apply this skill |
-|----------------------|-----------------|
-| A complex task that should be broken into parallel workstreams | `subagent-driven-development` |
-| Writing new code, any feature | `test-driven-development` |
-| A request to plan work | `writing-plans` |
-| Executing a plan that already exists | `executing-plans` |
-| "What should we do about X?" or ideation needed | `brainstorming` |
-| A bug, unexpected behavior, or error | `systematic-debugging` |
-| About to complete, merge, or hand off work | `verification-before-completion` |
-| Done with a feature branch, need to merge | `finishing-a-development-branch` |
-| Need someone else to review the code | `requesting-code-review` |
-| Received code review feedback | `receiving-code-review` |
-| Working on multiple branches simultaneously | `using-git-worktrees` |
-| Need to create a new skill | `writing-skills` |
-| Multiple independent tasks that can run concurrently | `dispatching-parallel-agents` |
-| Making a payment or calling a paid API | `agent-payments` |
-| "setup payments" / "enable wallet" / "configure spending" | `agent-payments` → `npx clawpowers payments setup` |
-| "demo x402" / "test payments" / "mock merchant" | `npx clawpowers demo x402` |
-| "payment log" / "spending history" | `npx clawpowers payments log` |
-| Checking code/containers for vulnerabilities | `security-audit` |
-| Writing blog posts, docs, or social content | `content-pipeline` |
-| Need to understand how to learn something effectively | `learn-how-to-learn` |
-| Competitive research or trend analysis | `market-intelligence` |
-| Finding leads or prospects | `prospecting` |
-| Task counter hits 50; skill success rates declining | `meta-skill-evolution` |
-| Test suite fails; want automatic patch-and-commit | `self-healing-code` |
-| Starting a task; want to check cross-project patterns first | `cross-project-knowledge` |
-| After fixing a bug or architecture decision; want to store the pattern | `cross-project-knowledge` |
-| TDD GREEN phase complete; want invariant property tests | `formal-verification-lite` |
-| Need roundtrip/idempotence/commutativity tests for a pure function | `formal-verification-lite` |
-| Complex task where premium resources would improve quality | `economic-code-optimization` |
-| Deciding whether to pay for expert review or premium model | `economic-code-optimization` |
-| Hiring another agent to complete a task with payment escrow | `agent-bounties` |
-| Want skin-in-the-game guarantees before a multi-agent task | `agent-bounties` |
-
-## Reading a Skill
-
-Skills are in `skills/<skill-name>/SKILL.md`. Read them with:
-
-```bash
-# From repo root
-cat skills/systematic-debugging/SKILL.md
-```
-
-Or reference them by path in your context: `skills/systematic-debugging/SKILL.md`
-
-## Runtime Layer
-
-If the runtime is initialized (`~/.clawpowers/` exists), skills can:
-
-1. **Persist state** — `runtime/persistence/store.sh get|set|list`
-2. **Track outcomes** — `runtime/metrics/collector.sh` appends JSON lines
-3. **Analyze performance** — `runtime/feedback/analyze.sh` computes success rates
-
-Check if runtime is available:
-```bash
-[ -d ~/.clawpowers ] && echo "runtime available" || echo "static mode"
-```
-
-Initialize runtime:
-```bash
-npx clawpowers init
-# or directly:
-bash runtime/init.sh
-```
-
-## Graceful Degradation
-
-Skills work in two modes:
-
-- **Static mode** (no runtime): Skills provide methodology guidance. Same capability as competing frameworks.
-- **Runtime mode** (`~/.clawpowers/` initialized): Full capability — persistence, metrics, RSI feedback, resumable workflows.
-
-You never need to check the mode. Skills detect it themselves and adapt their instructions accordingly.
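The mode detection described above reduces to a directory check. The sketch below is a Python rendering of the shell one-liner in the Runtime Layer section, not code from the package; `detect_mode` and its `home` parameter are hypothetical, with `home` added only so the check can be pointed at a temporary directory in tests.

```python
from pathlib import Path
from typing import Optional


def detect_mode(home: Optional[Path] = None) -> str:
    """Return "runtime available" if <home>/.clawpowers is a directory.

    Mirrors: [ -d ~/.clawpowers ] && echo "runtime available" || echo "static mode"
    """
    base = Path.home() if home is None else home
    if (base / ".clawpowers").is_dir():
        return "runtime available"
    return "static mode"


if __name__ == "__main__":
    print(detect_mode())
```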
-
-## Anti-Patterns
-
-- **Don't announce skill usage** — Apply the skill silently, don't say "I'm now using the systematic-debugging skill"
-- **Don't read the skill on every step** — Read once, apply throughout
-- **Don't stack conflicting skills** — If TDD and subagent-driven-development both apply, let subagent-driven-development drive; it includes TDD internally
-- **Don't ignore ClawPowers enhancements** — When the runtime is available, use it; the static path is a fallback, not the goal
-
-## Quick Reference: All 26 Skills
-
-### Core Development (14)
-1. `subagent-driven-development` — Parallel subagents, two-stage review, worktree isolation
-2. `test-driven-development` — RED-GREEN-REFACTOR with failure witness and autonomous mutation testing
-3. `writing-plans` — Spec to sequenced 2-5 min tasks with dependency graph
-4. `executing-plans` — Tracked execution with resumability and milestone persistence
-5. `brainstorming` — Structured ideation with convergence protocol
-6. `systematic-debugging` — Hypothesis-driven debugging with persistent hypothesis memory
-7. `verification-before-completion` — Quality gates before any merge or handoff
-8. `finishing-a-development-branch` — Branch cleanup, changelog, squash, merge prep
-9. `requesting-code-review` — Review request with context, risk areas, reviewer matching
-10. `receiving-code-review` — Constructive processing, pattern database, response protocol
-11. `using-git-worktrees` — Isolated parallel branch development
-12. `using-clawpowers` — This document
-13. `writing-skills` — TDD for skills: test scenarios → fail → write skill → pass
-14. `dispatching-parallel-agents` — Fan-out execution, load balancing, result aggregation
-
-### Extended Capabilities (6)
-15. `agent-payments` — x402 payment protocol, non-custodial wallets, spending limits
-16. `security-audit` — Trivy, gitleaks, npm audit, bandit — actionable report output
-17. `content-pipeline` — Write → humanize → format → publish workflow
-18. `learn-how-to-learn` — 5-layer learning stack, 14 anti-patterns, confidence calibration
-19. `market-intelligence` — Competitive analysis, trend detection, opportunity scoring
-20. `prospecting` — ICP → company search → contact enrichment → outreach prep
-
-### RSI Intelligence Layer (4) — NEW
-21. `meta-skill-evolution` — Every 50 tasks: analyze outcomes, find weakest skill, surgically improve it, commit with version bump
-22. `self-healing-code` — Test failure → hypothesis tree → ≥2 candidate patches → auto-commit winner or escalate
-23. `cross-project-knowledge` — Persistent pattern KB across all projects; search before tasks, store after fixes
-24. `formal-verification-lite` — Property-based testing (fast-check/Hypothesis) after TDD GREEN; 1000+ iterations per invariant
-25. `economic-code-optimization` — Autonomously spend micro-budgets on premium models, compute, expert reviews when ROI justifies it
-
-### Agent Economy Layer (1) — NEW
-26. `agent-bounties` — Post tasks with USDC rewards, escrow both-party collateral via MutualStakeEscrow, verify with automation, release or dispute on-chain
-
-## Session Initialization Complete
-
-ClawPowers is ready. 26 skills active. Skills activate on pattern recognition. Runtime enhancements available when `~/.clawpowers/` exists. RSI Intelligence Layer (meta-skill-evolution, self-healing-code, cross-project-knowledge, formal-verification-lite) provides persistent learning across sessions and projects. Agent Economy Layer (agent-bounties) enables autonomous agent-to-agent hiring with on-chain escrow.