serenecode 0.1.0__py3-none-any.whl

Files changed (39)
  1. serenecode/__init__.py +281 -0
  2. serenecode/adapters/__init__.py +6 -0
  3. serenecode/adapters/coverage_adapter.py +1173 -0
  4. serenecode/adapters/crosshair_adapter.py +1069 -0
  5. serenecode/adapters/hypothesis_adapter.py +1824 -0
  6. serenecode/adapters/local_fs.py +169 -0
  7. serenecode/adapters/module_loader.py +492 -0
  8. serenecode/adapters/mypy_adapter.py +161 -0
  9. serenecode/checker/__init__.py +6 -0
  10. serenecode/checker/compositional.py +2216 -0
  11. serenecode/checker/coverage.py +186 -0
  12. serenecode/checker/properties.py +154 -0
  13. serenecode/checker/structural.py +1504 -0
  14. serenecode/checker/symbolic.py +178 -0
  15. serenecode/checker/types.py +148 -0
  16. serenecode/cli.py +478 -0
  17. serenecode/config.py +711 -0
  18. serenecode/contracts/__init__.py +6 -0
  19. serenecode/contracts/predicates.py +176 -0
  20. serenecode/core/__init__.py +6 -0
  21. serenecode/core/exceptions.py +38 -0
  22. serenecode/core/pipeline.py +807 -0
  23. serenecode/init.py +307 -0
  24. serenecode/models.py +308 -0
  25. serenecode/ports/__init__.py +6 -0
  26. serenecode/ports/coverage_analyzer.py +124 -0
  27. serenecode/ports/file_system.py +95 -0
  28. serenecode/ports/property_tester.py +69 -0
  29. serenecode/ports/symbolic_checker.py +70 -0
  30. serenecode/ports/type_checker.py +66 -0
  31. serenecode/reporter.py +346 -0
  32. serenecode/source_discovery.py +319 -0
  33. serenecode/templates/__init__.py +5 -0
  34. serenecode/templates/content.py +337 -0
  35. serenecode-0.1.0.dist-info/METADATA +298 -0
  36. serenecode-0.1.0.dist-info/RECORD +39 -0
  37. serenecode-0.1.0.dist-info/WHEEL +4 -0
  38. serenecode-0.1.0.dist-info/entry_points.txt +2 -0
  39. serenecode-0.1.0.dist-info/licenses/LICENSE +21 -0
@@ -0,0 +1,337 @@
1
+ """Embedded template content for SERENECODE.md files.
2
+
3
+ This module stores template content as string constants so that
4
+ the init module can generate SERENECODE.md without file I/O.
5
+ """
6
+
7
+ from __future__ import annotations
8
+
9
+ import icontract
10
+
11
+ from serenecode.contracts.predicates import is_valid_template_name
12
+
13
+ _DEFAULT_TEMPLATE = """\
14
+ # SERENECODE.md — Project Conventions
15
+
16
+ This file governs how all code in this project must be written. Any AI coding \
17
+ agent MUST read this file in its entirety before writing or modifying any code.
18
+
19
+ Verified with: `serenecode check src/ --level 4 --allow-code-execution`
20
+
21
+ Levels 3-6 import and execute project modules. Only use \
22
+ `--allow-code-execution` for trusted code.
23
+
24
+ ---
25
+
26
+ ## Complete Example
27
+
28
+ This shows every pattern the checker enforces. Follow this exactly:
29
+
30
+ ```python
31
+ \"\"\"Module docstring describing purpose and architecture role.\"\"\"
32
+
33
+ import icontract
34
+ from dataclasses import dataclass
35
+
36
+
37
+ @icontract.invariant(lambda self: self.balance >= 0, "balance must be non-negative")
38
+ @dataclass(frozen=True)
39
+ class Account:
40
+     \"\"\"An immutable account record.\"\"\"
41
+
42
+     name: str
43
+     balance: float
44
+
45
+
46
+ @icontract.require(lambda items: len(items) > 0, "items must not be empty")
47
+ @icontract.ensure(lambda items, result: min(items) <= result <= max(items), "result within range")
48
+ def compute_mean(items: list[float]) -> float:
49
+     \"\"\"Compute the arithmetic mean.\"\"\"
50
+     return sum(items) / len(items)
51
+ ```
52
+
53
+ ---
54
+
55
+ ## Contract Standards
56
+
57
+ ### Public Functions
58
+
59
+ Every public function MUST have `@icontract.require` (preconditions) and \
60
+ `@icontract.ensure` (postconditions) using icontract decorators.
61
+
62
+ - Every contract decorator MUST include a human-readable description string \
63
+ as the second argument: `@icontract.require(lambda x: x > 0, "x must be positive")`
64
+ - Functions with no meaningful parameters may omit `@icontract.require`.
65
+ - Contracts must be pure boolean expressions — no side effects.
66
+
67
+ ### Private/Helper Functions
68
+
69
+ Private functions (prefixed with `_`) SHOULD have contracts when the function \
70
+ contains non-trivial logic.
71
+
72
+ ### Class Invariants
73
+
74
+ Every class MUST have at least one `@icontract.invariant` defining its \
75
+ representation invariant. Invariants must constrain actual state — \
76
+ tautological invariants like `lambda self: True` provide no verification \
77
+ value and should not be used. If a class is truly stateless (e.g. a \
78
+ Protocol or a stateless adapter), omit the invariant and document why.
79
+
80
+ ---
81
+
82
+ ## Type Annotation Standards
83
+
84
+ - All function signatures MUST have complete type annotations on every \
85
+ parameter kind (including positional-only, keyword-only, variadic, and private \
86
+ helper parameters) and the return type.
87
+ - No use of `Any` in core modules. Use `Protocol`, `Union`, or generics.
88
+ - Generic types must be fully parameterized (`list[str]` not `list`).
89
+ - Use modern type syntax (Python 3.10+): `X | None` not `Optional[X]`.
90
+
91
+ ---
92
+
93
+ ## Documentation Standards
94
+
95
+ - Every module MUST have a module-level docstring.
96
+ - Every public function and class MUST have a docstring.
97
+
98
+ ---
99
+
100
+ ## Architecture Standards
101
+
102
+ ### Hexagonal Architecture
103
+
104
+ ```
105
+ src/yourproject/
106
+ ├── core/ # Pure logic. No I/O. No os/pathlib/subprocess imports.
107
+ ├── ports/ # Protocol interfaces only.
108
+ ├── adapters/ # I/O implementations.
109
+ └── cli.py # Thin entry point.
110
+ ```
111
+
112
+ Core modules (`core/`, models, contracts) MUST NOT import I/O libraries \
113
+ (`os`, `pathlib`, `subprocess`, `requests`, `socket`, `shutil`, `tempfile`, `glob`). \
114
+ Inject dependencies through function parameters.
115
+
116
+ ---
117
+
118
+ ## Naming Conventions
119
+
120
+ - Modules: `snake_case.py`. Classes: `PascalCase`. Functions: `snake_case`.
121
+
122
+ ---
123
+
124
+ ## Exemptions
125
+
126
+ The following are exempt from full contract requirements:
127
+ - `cli.py`, `__init__.py` — Composition roots.
128
+ - `adapters/` — I/O boundary code.
129
+ - `ports/` — Protocol definitions.
130
+ - `templates/`, `tests/fixtures/`, `exceptions.py`
131
+
132
+ These MUST still have type annotations.
133
+ """
134
+
135
+ _STRICT_TEMPLATE = """\
136
+ # SERENECODE.md — Strict Project Conventions
137
+
138
+ This file governs how all code in this project must be written. Any AI coding \
139
+ agent MUST read this file in its entirety before writing or modifying any code. \
140
+ **No exemptions.** Every function — public and private — must have contracts.
141
+
142
+ Verified with: `serenecode check src/ --level 6 --allow-code-execution`
143
+
144
+ Levels 3-6 import and execute project modules. Only use \
145
+ `--allow-code-execution` for trusted code.
146
+
147
+ ---
148
+
149
+ ## Complete Example
150
+
151
+ This shows every pattern the checker enforces. Follow this exactly:
152
+
153
+ ```python
154
+ \"\"\"Module docstring describing purpose and architecture role.
155
+
156
+ This is a core module — no I/O operations are permitted.
157
+ \"\"\"
158
+
159
+ import icontract
160
+ from dataclasses import dataclass
161
+
162
+
163
+ @icontract.invariant(lambda self: self.balance >= 0, "balance must be non-negative")
164
+ @dataclass(frozen=True)
165
+ class Account:
166
+     \"\"\"An immutable account record.\"\"\"
167
+
168
+     name: str
169
+     balance: float
170
+
171
+
172
+ @icontract.require(lambda items: len(items) > 0, "items must not be empty")
173
+ @icontract.ensure(lambda items, result: min(items) <= result <= max(items), "result within range")
174
+ def compute_mean(items: list[float]) -> float:
175
+     \"\"\"Compute the arithmetic mean.\"\"\"
176
+     total = 0.0
177
+     # Loop invariant: total is the sum of items[0..i]
178
+     for item in items:
179
+         total += item
180
+     return total / len(items)
181
+
182
+
183
+ def _validate_positive(value: float) -> bool:
184
+     \"\"\"Check that a value is positive.\"\"\"
185
+     return value > 0
186
+ ```
187
+
188
+ ---
189
+
190
+ ## Contract Standards
191
+
192
+ ### Public Functions
193
+
194
+ Every public function MUST have `@icontract.require` and `@icontract.ensure` \
195
+ with description strings: `@icontract.require(lambda x: x > 0, "x must be positive")`
196
+
197
+ Functions with no meaningful parameters may omit `@icontract.require`.
198
+
199
+ ### Private Functions
200
+
201
+ Private functions (prefixed with `_`) MUST have contracts for all non-trivial \
202
+ logic. Simple one-liner helpers may omit contracts but MUST have type annotations.
203
+
204
+ ### Class Invariants
205
+
206
+ Every class MUST have `@icontract.invariant`. Invariants must constrain \
207
+ actual state — tautological invariants like `lambda self: True` provide no \
208
+ verification value. If a class is truly stateless (Protocol, stateless adapter), \
209
+ omit the invariant and document why.
210
+
211
+ ---
212
+
213
+ ## Type Annotation Standards
214
+
215
+ - All function signatures MUST have complete type annotations on every \
216
+ parameter kind (including positional-only, keyword-only, variadic, and private \
217
+ helper parameters) and the return type.
218
+ - No use of `Any` anywhere — use `Protocol`, `Union`, or generics.
219
+ - Generic types must be fully parameterized (`list[str]` not `list`).
220
+ - Use modern type syntax (Python 3.10+): `X | None` not `Optional[X]`.
221
+
222
+ ---
223
+
224
+ ## Documentation Standards
225
+
226
+ - Every module MUST have a module-level docstring.
227
+ - Every public function and class MUST have a docstring.
228
+
229
+ ---
230
+
231
+ ## Architecture Standards
232
+
233
+ ```
234
+ src/yourproject/
235
+ ├── core/ # Pure logic. No I/O. No os/pathlib/subprocess imports.
236
+ ├── ports/ # Protocol interfaces only.
237
+ ├── adapters/ # I/O implementations.
238
+ └── cli.py # Thin entry point.
239
+ ```
240
+
241
+ Core modules (`core/`, models, contracts, checkers) MUST NOT import I/O \
242
+ libraries (`os`, `pathlib`, `subprocess`, `requests`, `socket`, `shutil`, \
243
+ `tempfile`, `glob`). Inject dependencies through function parameters.
244
+
245
+ ---
246
+
247
+ ## Error Handling Standards
248
+
249
+ Only domain-specific exceptions permitted in core modules. Never raise bare \
250
+ `Exception`, `ValueError`, `TypeError`, `RuntimeError`, `KeyError`, \
251
+ `IndexError`, or `AttributeError` in core.
252
+
253
+ ---
254
+
255
+ ## Loop and Recursion Standards
256
+
257
+ - Every loop MUST include a comment describing the loop invariant.
258
+ - Recursive functions MUST document the variant (decreasing measure).
259
+ - Prefer bounded iteration over unbounded `while`.
260
+
261
+ ---
262
+
263
+ ## Naming Conventions
264
+
265
+ - Modules: `snake_case.py`. Classes: `PascalCase`. Functions: `snake_case`.
266
+
267
+ ---
268
+
269
+ ## No Exemptions
270
+
271
+ Strict mode has NO exempt modules. Every module, including CLI and adapters, \
272
+ must follow all conventions above.
273
+ """
274
+
275
+ _MINIMAL_TEMPLATE = """\
276
+ # SERENECODE.md — Minimal Project Conventions
277
+
278
+ This file defines minimal code conventions for this project. AI coding agents \
279
+ MUST read this before writing any code.
280
+
281
+ Verified with: `serenecode check src/`
282
+
283
+ ---
284
+
285
+ ## Contract Standards
286
+
287
+ Every public function MUST have `@icontract.require` (preconditions) and \
288
+ `@icontract.ensure` (postconditions). Type annotations on all parameters \
289
+ and return values.
290
+
291
+ ```python
292
+ import icontract
293
+
294
+ @icontract.require(lambda items: len(items) > 0)
295
+ @icontract.ensure(lambda items, result: min(items) <= result <= max(items))
296
+ def compute_mean(items: list[float]) -> float:
297
+     return sum(items) / len(items)
298
+ ```
299
+
300
+ Functions with no meaningful parameters may omit `@icontract.require` but \
301
+ MUST still have `@icontract.ensure`.
302
+
303
+ ---
304
+
305
+ ## Exemptions
306
+
307
+ - `cli.py` — Thin CLI layer.
308
+ - `adapters/` — I/O boundary code.
309
+ - `templates/` — Static files.
310
+ - `tests/fixtures/` — Test fixtures.
311
+ """
312
+
313
+ _TEMPLATES = {
314
+     "default": _DEFAULT_TEMPLATE,
315
+     "strict": _STRICT_TEMPLATE,
316
+     "minimal": _MINIMAL_TEMPLATE,
317
+ }
318
+
319
+
320
+ @icontract.require(
321
+     lambda template_name: is_valid_template_name(template_name),
322
+     "template_name must be a valid template name",
323
+ )
324
+ @icontract.ensure(
325
+     lambda result: isinstance(result, str) and len(result) > 0,
326
+     "result must be a non-empty string",
327
+ )
328
+ def get_template(template_name: str) -> str:
329
+     """Return the template content for a named template.
330
+
331
+     Args:
332
+         template_name: One of 'default', 'strict', or 'minimal'.
333
+
334
+     Returns:
335
+         The full SERENECODE.md template content.
336
+     """
337
+     return _TEMPLATES[template_name]
@@ -0,0 +1,298 @@
1
+ Metadata-Version: 2.4
2
+ Name: serenecode
3
+ Version: 0.1.0
4
+ Summary: Verification framework for AI-generated Python — test coverage, property testing, and symbolic execution
5
+ Project-URL: Homepage, https://github.com/helgster77/serenecode
6
+ Project-URL: Repository, https://github.com/helgster77/serenecode
7
+ Project-URL: Issues, https://github.com/helgster77/serenecode/issues
8
+ Author: helgster77
9
+ License-Expression: MIT
10
+ License-File: LICENSE
11
+ Classifier: Development Status :: 3 - Alpha
12
+ Classifier: Intended Audience :: Developers
13
+ Classifier: License :: OSI Approved :: MIT License
14
+ Classifier: Programming Language :: Python :: 3
15
+ Classifier: Programming Language :: Python :: 3.10
16
+ Classifier: Programming Language :: Python :: 3.11
17
+ Classifier: Programming Language :: Python :: 3.12
18
+ Classifier: Programming Language :: Python :: 3.13
19
+ Classifier: Topic :: Software Development :: Quality Assurance
20
+ Classifier: Topic :: Software Development :: Testing
21
+ Classifier: Typing :: Typed
22
+ Requires-Python: >=3.10
23
+ Requires-Dist: click>=8.0
24
+ Requires-Dist: icontract>=2.7.0
25
+ Provides-Extra: dev
26
+ Requires-Dist: crosshair-tool>=0.0.60; extra == 'dev'
27
+ Requires-Dist: hypothesis>=6.0; extra == 'dev'
28
+ Requires-Dist: mypy>=1.0; extra == 'dev'
29
+ Requires-Dist: pytest-cov>=4.0; extra == 'dev'
30
+ Requires-Dist: pytest>=7.0; extra == 'dev'
31
+ Provides-Extra: verify
32
+ Requires-Dist: coverage>=7.0; extra == 'verify'
33
+ Requires-Dist: crosshair-tool>=0.0.60; extra == 'verify'
34
+ Requires-Dist: hypothesis>=6.0; extra == 'verify'
35
+ Requires-Dist: mypy>=1.0; extra == 'verify'
36
+ Description-Content-Type: text/markdown
37
+
38
+ <p align="center">
39
+ <img src="serenecode.jpg" alt="SereneCode" width="500">
40
+ </p>
41
+
42
+ <h3 align="center">A Framework for AI-Driven Development of Verifiable Systems</h3>
43
+
44
+ SereneCode is a verification framework for AI-generated Python. It tells the AI *how* to write verifiable code, checks that the AI followed instructions, and then verifies the code at multiple levels — from test coverage analysis that catches gaps in AI-written tests, through property-based testing that checks contracts against hundreds of random inputs, to symbolic execution that uses an SMT solver to search for *any* input that breaks a contract. You choose the verification depth that matches your project: lightweight for internal tools, balanced for production systems, strict for safety-critical code. AI agents write code fast but can be suboptimal at testing their own work; SereneCode closes that gap by surfacing untested paths, generating test suggestions, and verifying behavior beyond what hand-written tests cover.
45
+
46
+ > **This framework was bootstrapped with AI under its own rules.** SereneCode's SERENECODE.md was written before the first line of code, and the codebase has been developed under those conventions from the start. The current tree passes its own `serenecode check src --level 6 --allow-code-execution`, an internal strict-config Level 6 self-check in the test suite, `mypy src examples/dosage-serenecode/src`, the shipped example's strict Level 6 check, and the full `pytest` suite. The verification output is transparent about scope: exempt modules (adapters, CLI, ports) and functions excluded from deep verification (non-primitive parameter types) are reported as "exempt" rather than silently omitted.
47
+
48
+ ---
49
+
50
+ ## Why This Exists
51
+
52
+ AI writes code fast. But *fast* and *correct* aren't the same thing. When you're building a medical dosage calculator, a financial ledger, or an avionics controller, "it passed my tests" isn't enough. Tests check the inputs you thought of. Formal verification uses an SMT solver to search for *any* input that breaks your contracts.
53
+
54
+ The problem is that formal verification has always been expensive — too slow, too manual, too specialized. SereneCode makes it tractable by controlling the process from the start: a convention file tells the AI to write verification-ready code, a structural linter checks it followed the rules, and CrossHair + Z3 search for contract violations via symbolic execution.
55
+
56
+ SereneCode is designed for **building new verifiable systems from scratch with AI**, not for retrofitting verification onto large existing codebases. The conventions go in before the first line of code, and every module is written with verification in mind from day one. That's what makes it work. SereneCode is a best-effort tool, not a guarantee — see the [Disclaimer](#disclaimer) for important limitations on what it can and cannot assure.
57
+
58
+ ### Choosing the Right Level
59
+
60
+ The cost of verification should be proportional to the cost of a bug. Each level generates a different SERENECODE.md with different requirements for the AI, so the choice shapes how code is *written*, not just how it's checked.
61
+
62
+ | | `--minimal` | **Default** | `--strict` |
63
+ |---|---|---|---|
64
+ | **Verifies through** | L2 (structure + types) | L4 (+ test coverage + properties) | L6 (+ symbolic + compositional) |
65
+ | **What the AI must write** | Contracts on public functions, type annotations | + description strings, class invariants, hexagonal architecture | + contracts on *all* functions, loop invariants, domain exceptions, no exemptions |
66
+ | **What catches bugs** | Runtime contract checks, mypy | + L3 surfaces untested code paths and generates test suggestions; L4 tests contracts against hundreds of random inputs | + SMT solver searches for *any* counterexample within analysis bounds |
67
+ | **Good for** | Internal tools, scripts, prototypes, incremental adoption | Production APIs, business logic, data pipelines | Medical, financial, infrastructure, regulated systems |
68
+ | **The tradeoff** | Low ceremony, but contracts are only checked at the boundaries you wrote them | Moderate overhead; architecture rules keep core logic pure and testable | Significant overhead — every loop gets an invariant comment, every helper gets a contract. Justified when the cost of an undiscovered bug is measured in patient harm, financial loss, or regulatory failure |
69
+
70
+ Pick the level that matches the stakes, and pick it early. Moving up later means retrofitting contracts, invariants, and architecture onto existing code — it's not just flipping a flag. Safety-critical code should be written for `--strict` from the first line.
71
+
72
+ ---
73
+
74
+ ## See It In Action: The Medical Dosage Calculator
75
+
76
+ We built the same medical dosage calculator twice from the same spec — once with plain AI, once with SereneCode — to show the difference.
77
+
78
+ Both versions implement four functions: dose calculation with weight-based dosing and max caps, renal function adjustment with tiered CrCl thresholds, daily safety checks with explicit total-versus-threshold calculations, and contraindication detection across current medications.
79
+
80
+ Both versions implement the same requirements, and the plain version passes its 59-test suite. Here's what SereneCode adds on top:
81
+
82
+ | What can you claim? | Plain AI | SereneCode |
83
+ |---|---|---|
84
+ | **Dose never exceeds maximum** | Covered by unit tests | Encoded as a postcondition; bounded symbolic search found no counterexample within analysis bounds |
85
+ | **Renal adjustment never increases a dose** | Covered by unit tests | `result <= dose_mg` is an executable contract, not just a test expectation |
86
+ | **Safety result is internally consistent** | No validation — you can construct `SafetyResult(total=9999, max=100, is_safe=True)` | Representation invariants make inconsistent `SafetyResult` states unconstructable |
87
+ | **Objects are truly immutable** | `frozen=True` with mutable `set` on Drug | `_Frozen` mixin + immutable `tuple`/`frozenset` fields — fully locked down |
88
+ | **Boundary behavior (CrCl exactly 30.0)** | Covered by explicit tests | Boundary behavior is specified in contracts; bounded symbolic search found no counterexample |
89
+ | **What if someone changes the code later?** | You rely on the tests you remembered to keep | Contracts stay attached to the code and keep checking every contracted call |
90
+ | **Can a solver verify it?** | No executable specification for a solver to target | 120 executable contracts and a clean `serenecode check ... --level 6 --allow-code-execution` run |
91
+ | **Confidence in a safety-critical setting** | Better than ad hoc code, but still test-shaped confidence | Higher: behavior is formally specified, runtime-checked, and solver-checked within analysis bounds — but bounded search is not proof |
92
+
93
+ The plain version relies on 59 tests that check specific scenarios. The SereneCode version adds 120 executable contracts across its domain models and core dosage logic. Those contracts define *what correct means* in code, get checked at runtime, and give CrossHair/Z3 something precise to search against when looking for counterexamples within analysis bounds.
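
To make "executable contract" concrete, here is a minimal stdlib-only sketch of a postcondition that is evaluated on every call. The `ensure` decorator is a hypothetical stand-in for illustration and is not icontract's implementation; `adjust_for_renal`, its 0.5 reduction factor, and the way it uses the 30.0 CrCl threshold are assumptions, not the example project's actual code. Only the property `result <= dose_mg` comes from the table above.

```python
import functools
from typing import Callable


class ContractViolation(Exception):
    """Raised when a postcondition does not hold (hypothetical name)."""


def ensure(condition: Callable[[float, float, float], bool], description: str) -> Callable:
    """Attach a postcondition that is evaluated after every call."""
    def decorator(func: Callable[[float, float], float]) -> Callable[[float, float], float]:
        @functools.wraps(func)
        def wrapper(dose_mg: float, crcl: float) -> float:
            result = func(dose_mg, crcl)
            # The contract travels with the function, so every caller is checked.
            if not condition(dose_mg, crcl, result):
                raise ContractViolation(description)
            return result
        return wrapper
    return decorator


@ensure(lambda dose_mg, crcl, result: result <= dose_mg,
        "renal adjustment must never increase the dose")
def adjust_for_renal(dose_mg: float, crcl: float) -> float:
    """Halve the dose below an assumed CrCl threshold of 30.0."""
    return dose_mg * 0.5 if crcl < 30.0 else dose_mg


@ensure(lambda dose_mg, crcl, result: result <= dose_mg,
        "renal adjustment must never increase the dose")
def buggy_adjust(dose_mg: float, crcl: float) -> float:
    """Deliberately wrong: raises the dose for low CrCl."""
    return dose_mg * 1.5 if crcl < 30.0 else dose_mg
```

Calling `buggy_adjust(100.0, 20.0)` raises `ContractViolation`, while `adjust_for_renal` returns normally. A fuller version would also need a precondition such as `dose_mg >= 0`, because halving a negative dose would violate the postcondition.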
94
+
95
+ > Both examples live in [`examples/dosage-regular/`](examples/dosage-regular/) and [`examples/dosage-serenecode/`](examples/dosage-serenecode/). Read them side by side.
96
+
97
+ The SereneCode dosage example currently passes `serenecode check examples/dosage-serenecode/src --level 6 --allow-code-execution`. Its local `pytest` suite is also green with 74 passing tests.
98
+
99
+ ---
100
+
101
+ ## How It Works
102
+
103
+ ### 1. SERENECODE.md — Your AI Writes Code That's Built for Verification
104
+
105
+ A markdown file in your project root that tells AI coding agents exactly how to write code: what contracts to include, what architecture to follow, what patterns to use. When Claude Code (or another agent) reads this before generating code, it has a concrete target for producing verification-friendly output from the first keystroke.
106
+
107
+ ```bash
108
+ serenecode init # balanced defaults — contracts on public APIs, test coverage, hexagonal architecture
109
+ serenecode init --strict # maximum rigor — contracts on ALL functions (public and private), no exemptions
110
+ serenecode init --minimal # lightweight — public-function contracts only, relaxed architecture rules
111
+ ```
112
+
113
+ This creates a SERENECODE.md tailored to your project and integrates with CLAUDE.md so Claude Code follows the conventions automatically. You write the rules once, and the agent has a stable spec to follow on every iteration.
114
+
115
+ ### 2. The Checker — Instant Feedback
116
+
117
+ A lightweight AST-based linter that validates code follows SERENECODE.md conventions in seconds. Missing a postcondition? No class invariant? I/O imports in a core module? Caught before you waste time on heavy verification.
118
+
119
+ ```bash
120
+ serenecode check src/ --structural # seconds
121
+ ```
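
As a rough illustration of what an AST-based rule of this kind can look like, the sketch below flags I/O imports in a module's source. It is not SereneCode's checker: `find_io_imports` and `FORBIDDEN_IN_CORE` are hypothetical names, and only the list of libraries is taken from the SERENECODE.md templates.

```python
import ast

# I/O libraries named in the SERENECODE.md templates.
FORBIDDEN_IN_CORE = frozenset(
    {"os", "pathlib", "subprocess", "requests", "socket", "shutil", "tempfile", "glob"}
)


def find_io_imports(source: str) -> list[str]:
    """Return the sorted forbidden top-level modules imported by `source`."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 means an absolute import; relative imports are skipped.
            names = [node.module]
        else:
            continue
        for name in names:
            root = name.split(".")[0]  # "os.path" counts as "os"
            if root in FORBIDDEN_IN_CORE:
                found.add(root)
    return sorted(found)
```

Because the source is only parsed and never imported or executed, a check like this is safe to run on untrusted code.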
122
+
123
+ ### 3. The Verifier — Symbolic Verification
124
+
125
+ A six-level verification pipeline that escalates from fast structural checks to bounded symbolic search and cross-module analysis:
126
+
127
+ | Level | What | Speed | Backend |
128
+ |-------|------|-------|---------|
129
+ | **L1** | Structural conventions | Seconds | AST analysis |
130
+ | **L2** | Type correctness | Seconds | mypy --strict |
131
+ | **L3** | Test coverage analysis | Seconds–minutes | coverage.py |
132
+ | **L4** | Property-based testing | Seconds–minutes | Hypothesis |
133
+ | **L5** | Symbolic search (bounded) | Minutes | CrossHair / Z3 |
134
+ | **L6** | Cross-module verification | Seconds | Compositional analysis |
135
+
136
+ ```bash
137
+ serenecode check src/ --level 6 --allow-code-execution # verify it
138
+ ```
139
+
140
+ **L3 Test Coverage** is where SereneCode checks that the AI's tests actually exercise the code it wrote. AI agents can be suboptimal at writing tests — they tend to cover the happy path, skip edge cases, and miss error branches. L3 runs your existing tests under coverage.py tracing, measures per-function line and branch coverage, and reports exactly which lines and branches are untested. For each coverage gap, it generates concrete test suggestions including mock necessity assessments: each dependency is classified as REQUIRED (external I/O — must mock) or OPTIONAL (internal code — consider using the real implementation). This gives the AI agent actionable feedback to improve its own tests rather than leaving coverage gaps undetected. When no tests exist for a module, L3 reports this as informational rather than failing, so the coverage level serves as a baseline measurement before L4 property testing generates new test inputs.
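
The REQUIRED/OPTIONAL distinction can be sketched as a simple classifier. The README does not describe the criteria L3 actually uses, so everything here is an assumption for illustration: the `MockNecessity` enum, the `classify_dependency` function, and the choice to treat the template's I/O library list as "external I/O".

```python
from enum import Enum


class MockNecessity(Enum):
    REQUIRED = "REQUIRED"  # external I/O: must mock
    OPTIONAL = "OPTIONAL"  # internal code: consider the real implementation


# Assumption: these top-level modules count as external I/O.
EXTERNAL_IO_MODULES = frozenset(
    {"os", "pathlib", "subprocess", "requests", "socket", "shutil", "tempfile", "glob"}
)


def classify_dependency(module_name: str) -> MockNecessity:
    """Classify a dependency by the top-level package it belongs to."""
    root = module_name.split(".")[0]
    if root in EXTERNAL_IO_MODULES:
        return MockNecessity.REQUIRED
    return MockNecessity.OPTIONAL
```

Under these assumptions, `classify_dependency("requests.sessions")` yields `REQUIRED` and `classify_dependency("yourproject.core.models")` yields `OPTIONAL`.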
141
+
142
+ The full pipeline is thorough but not instant. Larger systems will take longer, and the deepest runs may surface skipped items when Hypothesis cannot synthesize valid values for complex domain types or when CrossHair hits its time budget. By default, L5 focuses on contracted top-level functions defined in each module and skips modules or signatures that are currently poor fits for direct symbolic execution, such as adapter/composition-root code, helper predicate modules, and object-heavy APIs. Not everything needs L5/L6. Critical paths get full symbolic and compositional verification. Utility functions get property testing. A Level 4 run only counts as achieved when at least one contracted property target was actually exercised.
143
+
144
+ Levels 3-6 import and execute project modules so coverage.py, Hypothesis, and CrossHair can exercise real code. Deep runs therefore require explicit `--allow-code-execution` and should only be used on trusted code.
145
+
146
+ Scoped targets keep their package/import context across verification levels. In practice that means commands like `serenecode check src/core/ --level 4 --allow-code-execution` and `serenecode check src/core/models.py --level 3 --allow-code-execution` use the same local import roots and architectural module paths as a project-wide run instead of breaking relative imports or scoped core-module rules. Those scoped core/exemption rules are matched on path segments, not raw substrings, so names like `notcli.py`, `viewmodels.py`, and `transports/` do not accidentally change policy classification. Standalone files with non-importable names are also targeted correctly for CrossHair via `file.py:line` references.
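
The difference between segment matching and substring matching is easy to see in a small sketch. The function names below are hypothetical and the exempt sets are a subset taken from the default template's exemption list; this is not SereneCode's classification code.

```python
from pathlib import PurePosixPath

EXEMPT_FILE_NAMES = frozenset({"cli.py", "__init__.py"})
EXEMPT_DIR_NAMES = frozenset({"adapters", "ports"})


def is_exempt_by_segment(path: str) -> bool:
    """Match whole path segments, so partial name overlaps do not count."""
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    *directories, filename = parts
    return filename in EXEMPT_FILE_NAMES or any(
        directory in EXEMPT_DIR_NAMES for directory in directories
    )


def is_exempt_by_substring(path: str) -> bool:
    """Naive variant, shown only to illustrate the misclassification."""
    return any(token in path for token in ("cli.py", "adapters/", "ports/"))
```

With these definitions, `src/notcli.py` and `src/transports/http.py` are exempt under the substring variant but not under the segment variant, while `src/cli.py` and `src/ports/file_system.py` are exempt under both.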
147
+
148
+ ---
149
+
150
+ ## The AI Agent Loop
151
+
152
+ SereneCode is designed for AI agents that write code and fix their own mistakes:
153
+
154
+ ```
155
+ AI reads SERENECODE.md → knows how to write verification-ready code
156
+ AI generates code with contracts → postconditions, input preconditions, invariants
157
+ serenecode check --structural → instant: did the AI follow the rules?
158
+ serenecode check --level 5 --allow-code-execution → deep: can the solver find any counterexample?
159
+ AI reads counterexamples → "input x=[-1] violates postcondition"
160
+ AI fixes the code → adjusts implementation or contract
161
+ Repeat until verified → no counterexample found, not just tested
162
+ ```
163
+
164
+ AI-generated code won't always pass verification on the first try — and that's the point. SereneCode gives the coding agent structured feedback on exactly what failed and why: counterexamples, violated contracts, and suggested fixes. The agent uses that feedback to iterate until the code passes. The value isn't in one-shotting perfection — it's in the loop that converges on verified correctness.
165
+
166
+ Works in Claude Code, works in the terminal, works in CI:
167
+
168
+ ```python
169
+ import serenecode
170
+
171
+ result = serenecode.check(path="src/", level=5, allow_code_execution=True)
172
+ for failure in result.failures:
173
+     print(f"{failure.function} @ {failure.file}:{failure.line}")
174
+     for detail in failure.details:
175
+         if detail.counterexample is not None:
176
+             print(detail.counterexample)  # exact input that breaks the code
177
+         if detail.suggestion is not None:
178
+             print(detail.suggestion)  # proposed fix direction
179
+ ```
180
+
181
+ ---
182
+
183
+ ## Built With Its Own Medicine
184
+
185
+ SereneCode isn't just a tool that *tells* you to write verified code. It *is* verified code.
186
+
187
+ The SERENECODE.md convention file was the first artifact created — before any Python was written. The framework has been developed under those conventions with AI as a first-class contributor, and the repository continuously checks itself with:
188
+
189
+ - `pytest` across the full suite (currently 651 passing tests, 16 skipped)
190
+ - `mypy --strict` across `src/` and `examples/dosage-serenecode/src/`
191
+ - SereneCode's own structural, type, property, symbolic, and compositional passes
192
+
193
+ On the current tree, `serenecode check src --level 6 --allow-code-execution` runs all six verification levels. The exempt items include adapter modules (which handle I/O and are integration-tested), port interfaces (Protocols that define abstract contracts), CLI entry points, and functions whose parameter types are too complex for automated strategy generation or symbolic execution. Exempt items are visible in the output — they are not silently omitted.
194
+
195
+ At Level 5, CrossHair and Z3 search for counterexamples across the codebase's symbolic-friendly contracted top-level functions. Functions with non-primitive parameters (custom dataclasses, Protocol implementations, Callable types) are reported as exempt because the solver cannot generate inputs for them. Level 6 adds structural compositional analysis: dependency direction, circular dependency detection, interface compliance, contract presence at module boundaries, aliased cross-module call resolution, and architectural invariants. Interface compliance follows explicit `Protocol` inheritance and checks substitutability, including extra required parameters and incompatible return annotations. Together, they provide both deep per-function verification and system-level structural guarantees — but the structural checks at L6 verify contract *presence*, not logical *sufficiency* across call chains.
196
+
197
+ ---
198
+
199
+ ## Quick Start
200
+
201
+ ```bash
202
+ # Clone and install from source
203
+ git clone https://github.com/helgster77/serenecode.git
204
+ cd serenecode
205
+ uv sync --extra verify
206
+
207
+ # Or with pip:
208
+ # pip install -e ".[verify]"
209
+
210
+ # Initialize a project with conventions
211
+ serenecode init
212
+
213
+ # Let your AI agent write code following SERENECODE.md...
214
+ # Then verify:
215
+ serenecode check src/ --structural
216
+
217
+ # Or go deep:
218
+ serenecode check src/core/ --level 5 --allow-code-execution --format json
219
+ ```
220
+
221
+ JSON output includes top-level `passed`, `level_requested`, and `level_achieved` fields alongside the summary and per-function results.
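
A minimal sketch of consuming that output from a script. Only the three top-level field names come from the description above; the sample payload, the integer level values, and the helper name `summarize_run` are assumptions made for illustration, and the real report also carries the summary and per-function results, whose shape is not shown here.

```python
import json

# Illustrative payload only: the three keys are the documented top-level
# fields; the values (and their types) are assumed for this example.
SAMPLE_OUTPUT = '{"passed": false, "level_requested": 5, "level_achieved": 3}'


def summarize_run(raw_json: str) -> str:
    """Build a one-line summary from the documented top-level fields."""
    report = json.loads(raw_json)
    status = "passed" if report["passed"] else "failed"
    return (
        f"{status}: requested L{report['level_requested']}, "
        f"achieved L{report['level_achieved']}"
    )


print(summarize_run(SAMPLE_OUTPUT))  # failed: requested L5, achieved L3
```

In practice the string would come from the standard output of `serenecode check ... --format json` rather than from a constant.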
222
+
223
+ When you verify a nested package or a single module, SereneCode now preserves the package root and module-path context used by mypy, Hypothesis, CrossHair, and the architectural checks. That lets package-local absolute imports, relative imports, and scoped core-module rules behave the same way they do in project-wide runs.
224
+
225
+ ## CLI Reference
226
+
227
+ ```bash
228
+ serenecode init [<path>] [--strict | --minimal] # set up conventions
229
+ serenecode check [<path>] [--level 1-6] [--allow-code-execution] # run verification
230
+ [--format human|json] # output format
231
+ [--structural] [--verify] # L1 only / L3-6 only
232
+ [--per-condition-timeout N] # L5 CrossHair budgets
233
+ [--per-path-timeout N] [--module-timeout N] # (defaults: 30/10/300s)
234
+ [--workers N] # L5 parallel workers
235
+ serenecode status [<path>] [--format human|json] # verification status
236
+ serenecode report [<path>] [--format human|json|html] # generate reports
237
+ [--output FILE] [--allow-code-execution] # write to file
238
+ ```
239
+
240
+ **Exit codes:** 0 = passed, 1 = structural, 2 = types, 3 = coverage, 4 = properties, 5 = symbolic, 6 = compositional, 10 = internal error or deep verification refused without explicit trust.
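
For CI wrappers, the exit codes listed above can be turned into readable messages. This is a sketch, not part of SereneCode: the dictionary mirrors the list above, while `describe_exit_code` is a hypothetical helper, and reading codes 1 to 6 as "the level that failed" is an interpretation of that list.

```python
# Exit codes as listed in the README; the labels are copied from that list.
EXIT_CODE_MEANINGS = {
    0: "passed",
    1: "structural",
    2: "types",
    3: "coverage",
    4: "properties",
    5: "symbolic",
    6: "compositional",
    10: "internal error or deep verification refused without explicit trust",
}


def describe_exit_code(code: int) -> str:
    """Map a serenecode exit code to its documented label (hypothetical helper)."""
    return EXIT_CODE_MEANINGS.get(code, f"unknown exit code {code}")


print(describe_exit_code(5))  # symbolic
```

A wrapper could feed it the `returncode` of `subprocess.run(["serenecode", "check", "src/", "--structural"])`.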
241
+
242
+ ---
243
+
244
+ ## Honest Limitations
245
+
246
+ SereneCode is honest about what it can and can't do:
247
+
248
+ **"No counterexample found" is not "proven correct."** CrossHair uses bounded symbolic execution backed by Z3 — it explores execution paths within time limits (default: 30 seconds per condition, 10 seconds per path, 300 seconds per module) and searches for counterexamples. When it reports "no counterexample found within analysis bounds," that's strong evidence of correctness for the explored paths, but it's not an unbounded proof in the Coq/Lean sense. For pure functions with simple control flow, the coverage is often effectively exhaustive. For complex code, it's bounded. The tool's output now uses this honest language rather than saying "verified."
249
+
250
+ **Contracts are only as good as you write them.** A function with weak postconditions will pass verification even if the implementation is subtly wrong. SereneCode checks that contracts exist and hold, but can't check that they fully capture your intent. Tautological contracts like `lambda self: True` are now flagged by the conventions and should not be used — they provide no verification value.
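
The gap between a weak and a stronger postcondition can be illustrated without any dependencies. In the sketch below, `ensure` is a hypothetical stand-in written for this example (it is not icontract's decorator and has a different calling convention); both functions share the same wrong implementation, but only the stronger contract notices.

```python
from functools import wraps
from typing import Any, Callable


def ensure(condition: Callable[..., bool]) -> Callable:
    """Stand-in postcondition decorator: calls condition(result, *args)."""
    def decorate(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(*args: Any) -> Any:
            result = func(*args)
            if not condition(result, *args):
                raise AssertionError(f"postcondition failed for {func.__name__}")
            return result
        return wrapper
    return decorate


# Weak contract: only requires a non-negative result.
@ensure(lambda result, x: result >= 0)
def weak_abs(x: int) -> int:
    return 0  # wrong for every x != 0, yet the contract holds


# Stronger contract: the result must also be x or -x.
@ensure(lambda result, x: result >= 0 and result in (x, -x))
def strong_abs(x: int) -> int:
    return 0  # same bug, now caught by the postcondition


print(weak_abs(-3))  # 0 (the bug goes unnoticed)
try:
    strong_abs(-3)
except AssertionError as exc:
    print(exc)  # postcondition failed for strong_abs
```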
251
+
252
+ **Exempt items are visible, not hidden.** Modules exempt from structural checking (adapters, CLI, ports, `__init__.py`) and functions excluded from deep verification (non-primitive parameter types, adapter code) are reported as "exempt" in the output rather than being silently omitted. This makes the verification scope transparent: the tool reports passed, failed, skipped, and exempt counts separately so you can see exactly what was and wasn't deeply verified. Previous versions silently omitted these, inflating the apparent scope.
253
+
254
+ **Runtime checks can be disabled.** icontract decorators are checked on every call by default, but can be disabled via environment variables for performance in production. This is a feature, not a bug — but it means runtime guarantees depend on configuration.
255
+
256
+ **Not everything can be deeply verified.** Functions with complex domain-type parameters (custom dataclasses, Callable, Protocol implementations) are automatically excluded from L4/L5 because the tools cannot generate valid inputs for them — they show up as "exempt" in the output. See "Choosing the Right Level" above for guidance on which verification depth fits your system.
257
+
258
+ **Levels 3-6 execute your code.** Coverage analysis, property-based testing, and symbolic verification import project modules and run their top-level code as part of analysis. Module loading uses `compile()` + `exec()` on target source files and their transitive dependencies. There is no sandboxing or syscall filtering — a malicious `.py` file in the target directory gets full access to the host. Use `--allow-code-execution` or `allow_code_execution=True` only for code you trust. Subprocess-based backends (CrossHair, pytest/coverage) receive module paths and search paths from the source discovery layer; symlink-based directory traversal is blocked (`followlinks=False`), but the trust boundary ultimately relies on the `--allow-code-execution` gate.
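
To see why loading a module this way is equivalent to running it, consider the following sketch of the `compile()` + `exec()` pattern. It is not SereneCode's loader; `load_module_from_source` and the sample source are made up for illustration. The point is that top-level statements run as a side effect of loading, before any function in the module is called.

```python
import types

# Sample "target file": one top-level statement with a side effect,
# plus an ordinary function definition.
SOURCE = '''
SIDE_EFFECTS.append("top-level code ran")

def helper():
    return 42
'''


def load_module_from_source(name: str, source: str, extra_globals: dict) -> types.ModuleType:
    """Compile and execute source text inside a fresh module namespace."""
    module = types.ModuleType(name)
    module.__dict__.update(extra_globals)
    code = compile(source, f"<{name}>", "exec")
    exec(code, module.__dict__)  # runs every top-level statement in SOURCE
    return module


side_effects: list = []
demo = load_module_from_source("demo", SOURCE, {"SIDE_EFFECTS": side_effects})

print(side_effects)   # ['top-level code ran']
print(demo.helper())  # 42
```

Here the side effect is a harmless list append; in an untrusted file it could be any operation the host process is allowed to perform.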
259
+
260
+ **Deep runs can be incomplete by default.** A result can include skipped items even when there are no correctness failures: Hypothesis may not be able to derive strategies for some highly structured project-local types, and CrossHair can time out on solver-heavy modules once the module budget is exhausted. When a run exercises no property-testing targets at all, SereneCode does not claim L4 was achieved. When a scoped run produces no symbolic findings at all, SereneCode does not claim L5 was achieved. A verification level is only marked as achieved when results are non-empty with no failures and no skips; empty results from L3/L4/L5 backends mean "nothing was exercised," not "everything passed." Increase `--per-condition-timeout`, `--per-path-timeout`, or `--module-timeout` when you want to push harder on L5.
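
The "achieved" rule described above can be sketched as a small predicate. This is an illustration of the stated rule, not SereneCode's code: the function name and the string statuses are hypothetical, and exempt items are left out because the text does not say how they count.

```python
from typing import Sequence


def level_achieved(statuses: Sequence[str]) -> bool:
    """Return True only for non-empty results with no failures and no skips."""
    if not statuses:
        # Empty results mean "nothing was exercised", not "everything passed".
        return False
    return not any(status in {"failed", "skipped"} for status in statuses)


print(level_achieved([]))                     # False
print(level_achieved(["passed", "passed"]))   # True
print(level_achieved(["passed", "skipped"]))  # False
```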
261
+
262
+ **Level 6 is structural, not semantic.** Compositional verification (L6) checks that contracts *exist* at module boundaries, that dependency direction is correct, and that interfaces structurally match, including explicit `Protocol` inheritance and signature-shape compatibility. It does not verify that postconditions *logically satisfy* preconditions across call chains — that would require symbolic reasoning across module boundaries, which is a planned future enhancement. L6 catches architectural violations and contract gaps, not logical insufficiency. Source files with syntax errors are now reported as skipped with an actionable message instead of silently producing an empty analysis.
263
+
264
+
265
+
266
+ ---
267
+
268
+ ## Architecture
269
+
270
+ SereneCode follows hexagonal architecture — the same pattern it enforces on your code:
271
+
272
+ ```
273
+ CLI / Library API ← composition roots
274
+
275
+ ├──▸ Pipeline ← orchestrates L1 → L2 → L3 → L4 → L5 → L6
276
+ │ ├──▸ Structural Checker (ast)
277
+ │ ├──▸ Type Checker (mypy)
278
+ │ ├──▸ Coverage Analyzer (coverage.py)
279
+ │ ├──▸ Property Tester (Hypothesis)
280
+ │ ├──▸ Symbolic Checker (CrossHair/Z3)
281
+ │ └──▸ Compositional Checker (ast)
282
+
283
+ ├──▸ Reporter ← human / JSON / HTML
284
+
285
+ └──▸ Adapters → Ports ← Protocol interfaces for all I/O
286
+ ```
287
+
288
+ Core logic is pure. All I/O goes through Protocol-defined ports. The verification engine itself is verifiable.
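
As a rough illustration of the ports-and-adapters split described above, the sketch below defines a port as a `Protocol`, a pure core function that depends only on that port, and an in-memory adapter. All names (`FileReader`, `InMemoryReader`, `count_lines`) are hypothetical and are not taken from SereneCode's `ports/` or `adapters/` packages.

```python
from typing import Dict, Protocol


class FileReader(Protocol):
    """Port: the only view of the file system the core logic gets."""

    def read_text(self, path: str) -> str: ...


def count_lines(reader: FileReader, path: str) -> int:
    """Core logic: performs no I/O itself, only calls the port."""
    return len(reader.read_text(path).splitlines())


class InMemoryReader:
    """Adapter: satisfies FileReader structurally, backed by a dict."""

    def __init__(self, files: Dict[str, str]) -> None:
        self._files = files

    def read_text(self, path: str) -> str:
        return self._files[path]


reader = InMemoryReader({"a.py": "x = 1\ny = 2\n"})
print(count_lines(reader, "a.py"))  # 2
```

Because the core function only sees the port, it can be exercised with a trivial adapter like this one, while a real adapter would wrap actual file access.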
289
+
290
+ ## Disclaimer
291
+
292
+ SereneCode is provided as-is, without warranty of any kind. It is a best-effort tool that helps surface defects through contracts, property-based testing, and bounded symbolic execution — but it cannot guarantee the absence of bugs. "No counterexample found" means the solver did not find one within its analysis bounds, not that none exists. Verification results depend on the quality of the contracts you write, the time budgets you configure, and the inherent limitations of the underlying tools.
293
+
294
+ Users are responsible for the correctness, safety, and regulatory compliance of their own systems. SereneCode is not a substitute for independent code review, domain-expert validation, or any certification process required by your industry. If you are building safety-critical software, use this framework as one layer of assurance among many — not as the only one.
295
+
296
+ ## License
297
+
298
+ MIT