clawpowers 1.0.1 → 1.1.0

@@ -0,0 +1,441 @@
1
+ ---
2
+ name: formal-verification-lite
3
+ description: Goes beyond unit tests with property-based testing. Generate invariant properties for functions, write tests with fast-check/Hypothesis/QuickCheck, run 1000+ examples per property, and track edge cases found. Integrates with TDD after GREEN phase.
4
+ version: 1.0.0
5
+ requires:
6
+ tools: [bash, node]
7
+ runtime: false
8
+ metrics:
9
+ tracks: [properties_discovered, edge_cases_found, false_positive_rate, properties_per_function, shrink_examples_count]
10
+ improves: [property_coverage, edge_case_detection_rate, false_positive_threshold]
11
+ ---
12
+
13
+ # Formal Verification Lite
14
+
15
+ ## When to Use
16
+
17
+ Apply this skill when:
18
+
19
+ - The TDD GREEN phase is complete and you have passing unit tests
20
+ - Implementing pure functions with mathematical properties (sort, serialize, parse, encrypt)
21
+ - Building data transformation pipelines where correctness matters at scale
22
+ - Implementing state machines, parsers, or any function where "all inputs" behavior matters
23
+ - A bug report mentions an edge case that unit tests didn't cover
24
+ - Before shipping code to production that handles external/untrusted input
25
+
26
+ **Skip when:**
27
+ - Functions have no invariant properties (e.g., a function that sends an email — pure I/O)
28
+ - Tests require real external services (use integration tests instead)
29
+ - The function is pure configuration or pure UI rendering
30
+ - You're still in RED phase — finish TDD first, then apply this skill
31
+
32
+ **Decision tree:**
33
+ ```
34
+ Does the function have any of these properties?
35
+ ├── Roundtrip: encode/decode, serialize/deserialize → apply roundtrip pattern
36
+ ├── Idempotence: f(f(x)) == f(x) → apply idempotence pattern
37
+ ├── Commutativity: f(a,b) == f(b,a) → apply commutativity pattern
38
+ ├── Monotonicity: if a ≤ b then f(a) ≤ f(b) → apply monotone pattern
39
+ ├── Length preservation: output.length == input.length → apply structural pattern
40
+ └── None of these → unit tests are sufficient; skip this skill
41
+ ```
42
+
43
+ ## Core Methodology
44
+
45
+ ### The Property-Based Testing Mindset
46
+
47
+ Unit tests check specific examples: `sort([3,1,2]) === [1,2,3]`.
48
+ Property tests check **universal invariants**: `∀ array: sort(array).length === array.length`.
49
+
50
+ The difference:
51
+ - Unit test: verifies 1 input
52
+ - Property test with 1000 iterations: verifies 1000 randomly generated inputs, often including edge cases the developer never thought of
53
+
54
+ **The property you write describes what must ALWAYS be true.** The framework finds inputs that break it.
55
+
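
To make the difference concrete, here is a minimal, stdlib-only sketch of what a property-testing framework automates. The names `buggy_sort`, `check_property`, and `random_int_list` are hypothetical and exist only for this illustration; a real project would use fast-check or Hypothesis, which add smarter generators and shrinking.

```python
import random


def buggy_sort(items):
    # Hypothetical implementation with a bug: going through a set drops duplicates.
    return sorted(set(items))


def check_property(prop, gen, runs=1000, seed=0):
    """Return the first generated input that violates prop, or None if none is found."""
    rng = random.Random(seed)
    for _ in range(runs):
        value = gen(rng)
        if not prop(value):
            return value
    return None


def random_int_list(rng):
    # Small value range on purpose, so duplicates show up often.
    return [rng.randint(-5, 5) for _ in range(rng.randint(0, 8))]


def length_preserved(items):
    return len(buggy_sort(items)) == len(items)


# The example-based unit test passes, because this one input has no duplicates.
assert buggy_sort([3, 1, 2]) == [1, 2, 3]

# The property check generates many inputs and reports one that breaks the invariant.
counterexample = check_property(length_preserved, random_int_list)
print("counterexample:", counterexample)
```

The unit test and the property disagree about the same function: the single example happens to avoid the bug, while the generated inputs reach it.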
56
+ ### Step 1: Identify Properties
57
+
58
+ Before writing any property tests, list the mathematical properties of your function. Use this taxonomy:
59
+
60
+ **Roundtrip (parse/serialize inverse pairs):**
61
+ ```
62
+ parse(serialize(x)) == x
63
+ deserialize(serialize(x)) == x
64
+ decode(encode(x)) == x
65
+ ```
66
+
67
+ **Idempotence (applying twice = applying once):**
68
+ ```
69
+ normalize(normalize(x)) == normalize(x)
70
+ deduplicate(deduplicate(x)) == deduplicate(x)
71
+ trim(trim(x)) == trim(x)
72
+ ```
73
+
74
+ **Commutativity (order doesn't matter):**
75
+ ```
76
+ merge(a, b) == merge(b, a)
77
+ add(a, b) == add(b, a)
78
+ union(setA, setB) == union(setB, setA)
79
+ ```
80
+
81
+ **Monotonicity (order-preserving):**
82
+ ```
83
+ if a <= b then score(a) <= score(b)
84
+ if input.length increases then output.length >= previous output.length
85
+ ```
86
+
87
+ **Structural invariants:**
88
+ ```
89
+ sort(arr).length == arr.length
90
+ sort(arr) contains same elements as arr
91
+ filter(arr, pred).every(pred)
92
+ map(arr, f).length == arr.length
93
+ ```
94
+
95
+ **Conservation laws:**
96
+ ```
97
+ sum(split(total, n)) == total
98
+ partition(arr).flatMap(x=>x).length == arr.length
99
+ ```
100
+
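
As one way to read the conservation law `sum(split(total, n)) == total`, the sketch below defines a hypothetical `split` and checks the law over a small input space. The implementation and the helper `conservation_holds` are assumptions made for illustration; the source does not define `split`.

```python
from typing import List


def split(total: int, n: int) -> List[int]:
    """Split an integer total into n integer parts whose sum is exactly total."""
    if n <= 0:
        raise ValueError("n must be positive")
    base, remainder = divmod(total, n)
    # The first `remainder` parts absorb the leftover units, so nothing is lost to rounding.
    return [base + 1 if i < remainder else base for i in range(n)]


def conservation_holds(total: int, n: int) -> bool:
    parts = split(total, n)
    return sum(parts) == total and len(parts) == n


# Exhaustive check over a small grid; a framework would sample a much larger space.
violations = [
    (total, n)
    for total in range(0, 200)
    for n in range(1, 11)
    if not conservation_holds(total, n)
]
print("violations:", violations)
```

A naive `[total // n] * n` would fail this law whenever `total` is not divisible by `n`, which is exactly the kind of bug a conservation property is meant to catch.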
101
+ ### Step 2: Write Property Tests
102
+
103
+ #### JavaScript / TypeScript — fast-check
104
+
105
+ ```bash
106
+ npm install --save-dev fast-check
107
+ ```
108
+
109
+ ```typescript
110
+ import * as fc from 'fast-check';
111
+ import { serialize, deserialize } from './serializer';
112
+ import { sort } from './sort';
113
+ import { merge } from './merge';
+ import { normalize } from './normalize'; // assumed module path; normalize is used below
114
+
115
+ // === Roundtrip property ===
116
+ test('serialize/deserialize roundtrip', () => {
117
+   fc.assert(
118
+     fc.property(
119
+       fc.record({ // generate random records
120
+         id: fc.uuid(),
121
+         name: fc.string({ minLength: 1 }),
122
+         age: fc.integer({ min: 0, max: 150 }),
123
+         tags: fc.array(fc.string()),
124
+       }),
125
+       (obj) => {
126
+         const result = deserialize(serialize(obj));
127
+         expect(result).toEqual(obj); // must roundtrip perfectly
128
+       }
129
+     ),
130
+     { numRuns: 1000 } // run 1000 random examples
131
+   );
132
+ });
133
+
134
+ // === Structural invariant ===
135
+ test('sort preserves length and elements', () => {
136
+   fc.assert(
137
+     fc.property(
138
+       fc.array(fc.integer()),
139
+       (arr) => {
140
+         const sorted = sort(arr);
141
+         // Property 1: length preserved
142
+         expect(sorted.length).toBe(arr.length);
143
+         // Property 2: elements preserved (multiset equality, compared numerically)
144
+         expect(sorted.slice().sort((a, b) => a - b)).toEqual(arr.slice().sort((a, b) => a - b));
145
+         // Property 3: monotone increasing
146
+         for (let i = 1; i < sorted.length; i++) {
147
+           expect(sorted[i]).toBeGreaterThanOrEqual(sorted[i - 1]);
148
+         }
149
+       }
150
+     ),
151
+     { numRuns: 1000 }
152
+   );
153
+ });
154
+
155
+ // === Commutativity ===
156
+ test('merge is commutative', () => {
157
+   fc.assert(
158
+     fc.property(
159
+       fc.record({ a: fc.integer(), b: fc.string() }),
160
+       fc.record({ a: fc.integer(), b: fc.string() }),
161
+       (objA, objB) => {
162
+         expect(merge(objA, objB)).toEqual(merge(objB, objA));
163
+       }
164
+     ),
165
+     { numRuns: 500 }
166
+   );
167
+ });
168
+
169
+ // === Idempotence ===
170
+ test('normalize is idempotent', () => {
171
+   fc.assert(
172
+     fc.property(
173
+       fc.string(),
174
+       (str) => {
175
+         const once = normalize(str);
176
+         const twice = normalize(normalize(str));
177
+         expect(once).toEqual(twice);
178
+       }
179
+     ),
180
+     { numRuns: 1000 }
181
+   );
182
+ });
183
+ ```
184
+
185
+ #### Python — Hypothesis
186
+
187
+ ```bash
188
+ pip install hypothesis
189
+ ```
190
+
191
+ ```python
192
+ from hypothesis import given, settings, strategies as st
193
+ from hypothesis import HealthCheck
194
+ from mymodule import serialize, deserialize, sort_items, merge_dicts, split
195
+
196
+ # === Roundtrip property ===
197
+ @given(st.fixed_dictionaries({
198
+     'id': st.uuids().map(str),
199
+     'name': st.text(min_size=1, max_size=100),
200
+     'age': st.integers(min_value=0, max_value=150),
201
+     'tags': st.lists(st.text()),
202
+ }))
203
+ @settings(max_examples=1000)
204
+ def test_serialize_deserialize_roundtrip(obj):
205
+     assert deserialize(serialize(obj)) == obj
206
+
207
+ # === Structural invariant ===
208
+ @given(st.lists(st.integers()))
209
+ @settings(max_examples=1000)
210
+ def test_sort_preserves_structure(arr):
211
+     sorted_arr = sort_items(arr)
212
+     assert len(sorted_arr) == len(arr)  # length preserved
213
+     assert sorted(sorted_arr) == sorted(arr)  # elements preserved
214
+     for i in range(1, len(sorted_arr)):  # monotone
215
+         assert sorted_arr[i] >= sorted_arr[i-1]
216
+
217
+ # === Commutativity ===
218
+ @given(
219
+     st.dictionaries(st.text(), st.integers()),
220
+     st.dictionaries(st.text(), st.integers()),
221
+ )
222
+ @settings(max_examples=500)
223
+ def test_merge_commutative(dict_a, dict_b):
224
+     assert merge_dicts(dict_a, dict_b) == merge_dicts(dict_b, dict_a)
225
+
226
+ # === Conservation law ===
227
+ @given(st.integers(min_value=1, max_value=10000), st.integers(min_value=2, max_value=10))
228
+ @settings(max_examples=500)
229
+ def test_split_sum_conserved(total, n):
230
+     parts = split(total, n)
231
+     assert sum(parts) == total
232
+     assert len(parts) == n
233
+ ```
234
+
235
+ #### Go — testing/quick or gopter
236
+
237
+ ```go
238
+ import (
239
+ 	"reflect"
240
+ 	"testing"
241
+ 	"testing/quick"
242
+ )
243
+
244
+ // Roundtrip property (Record, Serialize, and Deserialize are assumed to be defined in this package)
245
+ func TestSerializeRoundtrip(t *testing.T) {
246
+ 	f := func(data Record) bool {
247
+ 		serialized := Serialize(data)
248
+ 		deserialized, err := Deserialize(serialized)
249
+ 		return err == nil && reflect.DeepEqual(data, deserialized)
250
+ 	}
251
+ 	if err := quick.Check(f, &quick.Config{MaxCount: 1000}); err != nil {
252
+ 		t.Error(err)
253
+ 	}
254
+ }
255
+
256
+ // Sort structural invariant (Sort is assumed to be defined in this package)
257
+ func TestSortInvariant(t *testing.T) {
258
+ 	f := func(arr []int) bool {
259
+ 		sorted := Sort(append([]int{}, arr...))
260
+ 		if len(sorted) != len(arr) { return false }
261
+ 		for i := 1; i < len(sorted); i++ {
262
+ 			if sorted[i] < sorted[i-1] { return false }
263
+ 		}
264
+ 		return true
265
+ 	}
266
+ 	if err := quick.Check(f, &quick.Config{MaxCount: 1000}); err != nil { t.Error(err) } // report failures instead of discarding them
267
+ }
268
+ ```
269
+
270
+ ### Step 3: Run with High Iteration Count
271
+
272
+ Default settings in most frameworks are too low (100 examples). Always override:
273
+
274
+ ```bash
275
+ # fast-check: numRuns: 1000+ per property
276
+ # Hypothesis: max_examples=1000 per test
277
+ # QuickCheck: maxSuccess 1000
278
+
279
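+ # Run the property tests; the pattern only selects tests whose names contain "property"
280
+ npx jest --testNamePattern="property"
281
+ # If a failure occurs, fast-check reports the seed, path, and shrunk counterexample; pass the reported seed and path back to fc.assert to replay it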
282
+
283
+ # Python Hypothesis with per-test run statistics
284
+ pytest --hypothesis-show-statistics tests/test_properties.py
285
+ ```
286
+
287
+ **When a property fails:**
288
+
289
+ fast-check and Hypothesis both **shrink** the counterexample: they search for a smaller, usually minimal, failing input. This is the key advantage over manual testing:
290
+
291
+ ```
292
+ Property failed after 47 examples:
293
+ Counterexample: { id: "", name: "a", age: 0, tags: [] }
294
+ ↕ shrunk from: { id: "abc-xyz-...", name: "hello world", age: 42, tags: ["x","y"] }
295
+ ```
296
+
297
+ The shrunk example directly points to the bug: empty string `id` breaks the serializer.
298
+
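
The idea behind shrinking can be sketched with a deliberately simplified greedy loop. The function `shrink_list` and the predicate `violates` are hypothetical names for this illustration; the shrinkers in fast-check and Hypothesis are considerably more sophisticated, so treat this as a model of the concept rather than of either library.

```python
def shrink_list(failing, still_fails):
    """Greedily simplify a failing list of ints while it keeps failing."""
    current = list(failing)
    progress = True
    while progress:
        progress = False
        candidates = []
        # Simplification 1: drop one element.
        for i in range(len(current)):
            candidates.append(current[:i] + current[i + 1:])
        # Simplification 2: move one element a single step toward zero.
        for i, value in enumerate(current):
            if value != 0:
                step = value - 1 if value > 0 else value + 1
                candidates.append(current[:i] + [step] + current[i + 1:])
        # Accept the first simpler candidate that still breaks the property.
        for candidate in candidates:
            if still_fails(candidate):
                current = candidate
                progress = True
                break
    return current


def violates(items):
    # Hypothetical property under test: "the sum stays below 10".
    return sum(items) >= 10


minimal = shrink_list([7, -3, 7, 12, 0, 5], violates)
print("shrunk counterexample:", minimal)
```

The loop stops when no single removal or decrement keeps the failure alive, which is why the result is easier to reason about than the original random input.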
299
+ ### Step 4: Track Properties and Edge Cases
300
+
301
+ After each property-testing run, record what was discovered:
302
+
303
+ ```bash
304
+ # Record properties found and edge cases surfaced (JSONL: one JSON object per line)
305
+ cat >> ~/.clawpowers/memory/property-log.jsonl <<EOF
306
+ {"timestamp": "$(date -u +%Y-%m-%dT%H:%M:%SZ)", "function": "$FUNCTION_NAME", "project": "$(basename "$(git rev-parse --show-toplevel)")", "properties_tested": $PROPERTIES_COUNT, "iterations": $TOTAL_ITERATIONS, "edge_cases_found": $EDGE_CASES, "counterexamples": $COUNTEREXAMPLES_JSON, "false_positives": $FALSE_POSITIVES}
307
+ EOF
317
+ ```
318
+
319
+ **False positive tracking:** If a property test fails because the property was wrong (not the implementation), that's a false positive. Track these: a high false positive rate means your properties, or the pre-conditions on their inputs, need revising.
320
+
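
A possible way to read the log back is sketched below. It assumes one JSON object per line with the `edge_cases_found` and `false_positives` fields from the record above, and it assumes the rate is defined as false positives divided by all property failures (false positives plus genuine edge cases); the source does not fix a formula. The helpers `append_run` and `false_positive_rate` are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def append_run(log_path: Path, record: dict) -> None:
    """Append one run as a single JSON line; json.dumps handles quoting and escaping."""
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")


def false_positive_rate(log_path: Path) -> float:
    """Assumed definition: false positives / (false positives + genuine edge cases)."""
    false_positives = 0
    edge_cases = 0
    with log_path.open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                run = json.loads(line)
                false_positives += run.get("false_positives", 0)
                edge_cases += run.get("edge_cases_found", 0)
    failures = false_positives + edge_cases
    return false_positives / failures if failures else 0.0


# Demonstration against a temporary file rather than the real log.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "property-log.jsonl"
    append_run(log, {"function": "serialize", "edge_cases_found": 3, "false_positives": 1})
    append_run(log, {"function": "merge", "edge_cases_found": 1, "false_positives": 0})
    rate = false_positive_rate(log)

print(f"false positive rate: {rate:.0%}")
```

Writing records through `json.dumps` also avoids broken lines when a function name or counterexample contains quotes.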
321
+ ### Step 5: Integration with TDD
322
+
323
+ After the GREEN phase, add property tests before entering REFACTOR:
324
+
325
+ ```
326
+ RED → GREEN → [formal-verification-lite] → REFACTOR
327
+ ```
328
+
329
+ **Integration flow:**
330
+ 1. TDD GREEN: unit tests pass for specific examples
331
+ 2. formal-verification-lite: identify properties → write property tests → run 1000 iterations
332
+ 3. If property tests surface a bug: go back to GREEN to fix
333
+ 4. If all property tests pass: enter REFACTOR with much stronger evidence that behavior holds across the input space (random testing raises confidence; it does not prove correctness for all inputs)
334
+
335
+ ```python
336
+ # After unit tests pass:
337
+ # test_auth.py: unit tests (specific examples)
+ from auth import AuthService  # assumed module path for the service under test
338
+ def test_jwt_issue_returns_token():
339
+     auth = AuthService(secret="test")
340
+     result = auth.issue("u123", 3600)
341
+     assert result["token"] is not None
342
+
343
+ # test_auth_properties.py: property tests (universal)
+ from hypothesis import given, settings, strategies as st
344
+ @given(
345
+     user_id=st.text(min_size=1, max_size=100),
346
+     ttl=st.integers(min_value=1, max_value=86400)
347
+ )
348
+ @settings(max_examples=500)
349
+ def test_jwt_roundtrip(user_id, ttl):
350
+     auth = AuthService(secret="test")
351
+     token_data = auth.issue(user_id, ttl)
352
+     validated = auth.validate(token_data["token"])
353
+     assert validated["user_id"] == user_id
354
+     assert validated["valid"] is True
355
+ ```
356
+
357
+ ### Common Property Templates
358
+
359
+ ```typescript
360
+ // Template 1: Roundtrip
361
+ fc.assert(fc.property(arbitraryInput, (x) => {
362
+   expect(decode(encode(x))).toEqual(x);
363
+ }), { numRuns: 1000 });
364
+
365
+ // Template 2: Idempotence
366
+ fc.assert(fc.property(arbitraryInput, (x) => {
367
+   expect(f(f(x))).toEqual(f(x));
368
+ }), { numRuns: 1000 });
369
+
370
+ // Template 3: Commutativity
371
+ fc.assert(fc.property(arbitraryA, arbitraryB, (a, b) => {
372
+   expect(f(a, b)).toEqual(f(b, a));
373
+ }), { numRuns: 500 });
374
+
375
+ // Template 4: Monotonicity
376
+ fc.assert(fc.property(fc.tuple(arbitraryNum, arbitraryNum), ([a, b]) => {
377
+   fc.pre(a <= b); // pre-condition
378
+   expect(score(a)).toBeLessThanOrEqual(score(b));
379
+ }), { numRuns: 500 });
380
+
381
+ // Template 5: Conservation
382
+ fc.assert(fc.property(arbitraryArray, (arr) => {
383
+   const parts = partition(arr);
384
+   expect(parts.flat()).toEqual(expect.arrayContaining(arr));
385
+   expect(parts.flat().length).toBe(arr.length);
386
+ }), { numRuns: 1000 });
387
+ ```
388
+
389
+ ## ClawPowers Enhancement
390
+
391
+ When `~/.clawpowers/` runtime is initialized:
392
+
393
+ **Track property discovery over time:**
394
+
395
+ ```bash
396
+ bash runtime/persistence/store.sh set "fvl:$PROJECT:$FUNCTION:properties_count" "$PROPERTIES_COUNT"
397
+ bash runtime/persistence/store.sh set "fvl:$PROJECT:$FUNCTION:edge_cases_found" "$EDGE_CASES"
398
+ bash runtime/persistence/store.sh set "fvl:$PROJECT:last_run" "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
399
+ ```
400
+
401
+ **Metrics recording:**
402
+
403
+ ```bash
404
+ bash runtime/metrics/collector.sh record \
405
+ --skill formal-verification-lite \
406
+ --outcome success \
407
+ --notes "function: $FUNCTION_NAME, properties: $PROPERTIES_COUNT, iterations: $TOTAL_ITERATIONS, edge cases: $EDGE_CASES"
408
+ ```
409
+
410
+ **Analyze property-testing effectiveness:**
411
+
412
+ ```bash
413
+ bash runtime/feedback/analyze.sh --filter formal-verification-lite
414
+ # Reports: edge cases found per 1000 iterations, functions with most property failures,
415
+ # false positive rate trend, which property patterns are most productive
416
+ ```
417
+
418
+ **Cross-project property library:**
419
+ ```bash
420
+ # Store a reusable property template in knowledge base
421
+ search_patterns "roundtrip property" "testing"
422
+ store_pattern "testing" \
423
+ "Roundtrip property for JSON-serializable data structures" \
424
+ "Any function pair encode/decode or serialize/deserialize" \
425
+ "Property: decode(encode(x)) == x with 1000 iterations" \
426
+ "fc.assert(fc.property(fc.jsonValue(), x => expect(decode(encode(x))).toEqual(x)), {numRuns:1000})" \
427
+ "property-based,roundtrip,fast-check,hypothesis"
428
+ ```
429
+
430
+ ## Anti-Patterns
431
+
432
+ | Anti-Pattern | Why It Fails | Correct Approach |
433
+ |-------------|-------------|-----------------|
434
+ | Run with the default iteration count (typically 100) | Rare edge cases are unlikely to be generated | Always set numRuns/max_examples ≥ 1000 |
435
+ | Write tautological properties | Property always passes, catches nothing | `expect(sort(arr).length >= 0)` is useless; test real invariants |
436
+ | Use property tests instead of unit tests | Harder to debug specific examples | Use both: unit tests for known examples, property tests for invariants |
437
+ | Skip shrinking | Large counterexamples are hard to debug | Let the framework shrink; always look at the minimal counterexample |
438
+ | Write properties before GREEN phase | Tests fail for wrong reasons | Complete TDD GREEN first, then add property tests |
439
+ | Test implementation details in properties | Properties break on refactor | Test mathematical relationships, not internal state |
440
+ | High false positive rate (> 10%) | Wastes time on wrong property definitions | Tighten pre-conditions with `fc.pre()` or `assume()` |
441
+ | Apply to I/O-heavy functions | Property tests of side effects are flaky | Property tests are for pure functions only |
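
The "tautological properties" row can be illustrated with a small stdlib-only sketch. The implementation `broken_sort` and the three predicates are hypothetical; the point is only that a tautology passes against obviously wrong code while a real invariant does not.

```python
import random
from collections import Counter


def broken_sort(items):
    # Hypothetical buggy implementation: returns the input unchanged.
    return list(items)


def tautological(items):
    # Always true for any list, so it cannot detect any bug.
    return len(broken_sort(items)) >= 0


def is_ordered(items):
    # Real invariant: every element is <= its successor.
    result = broken_sort(items)
    return all(a <= b for a, b in zip(result, result[1:]))


def same_elements(items):
    # Real invariant: output is a permutation of the input.
    return Counter(broken_sort(items)) == Counter(items)


rng = random.Random(1)
samples = [
    [rng.randint(-100, 100) for _ in range(rng.randint(0, 10))]
    for _ in range(1000)
]

tautology_failures = sum(not tautological(s) for s in samples)
ordering_failures = sum(not is_ordered(s) for s in samples)
print("tautology failures:", tautology_failures)
print("ordering failures:", ordering_failures)
```

Here the permutation invariant also passes, because this particular bug keeps every element; only the ordering invariant exposes it, which is one reason to test several independent properties per function.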