serenecode 0.3.0__tar.gz → 0.5.0__tar.gz

Files changed (170)
  1. {serenecode-0.3.0 → serenecode-0.5.0}/CLAUDE.md +12 -6
  2. {serenecode-0.3.0 → serenecode-0.5.0}/PKG-INFO +83 -37
  3. {serenecode-0.3.0 → serenecode-0.5.0}/README.md +81 -36
  4. serenecode-0.5.0/SPEC.md +290 -0
  5. serenecode-0.5.0/docs/SECURITY.md +37 -0
  6. serenecode-0.5.0/docs/VERIFICATION_LEVELS.md +29 -0
  7. {serenecode-0.3.0 → serenecode-0.5.0}/examples/dosage-serenecode/CLAUDE.md +7 -5
  8. {serenecode-0.3.0 → serenecode-0.5.0}/examples/dosage-serenecode/SERENECODE.md +64 -5
  9. {serenecode-0.3.0 → serenecode-0.5.0}/examples/dosage-serenecode/SPEC.md +2 -0
  10. {serenecode-0.3.0 → serenecode-0.5.0}/pyproject.toml +2 -1
  11. {serenecode-0.3.0 → serenecode-0.5.0}/src/serenecode/__init__.py +57 -1
  12. {serenecode-0.3.0 → serenecode-0.5.0}/src/serenecode/adapters/__init__.py +17 -0
  13. serenecode-0.5.0/src/serenecode/adapters/coverage_adapter.py +718 -0
  14. serenecode-0.5.0/src/serenecode/adapters/coverage_suggestions.py +519 -0
  15. {serenecode-0.3.0 → serenecode-0.5.0}/src/serenecode/adapters/crosshair_adapter.py +65 -472
  16. serenecode-0.5.0/src/serenecode/adapters/hypothesis_adapter.py +841 -0
  17. serenecode-0.5.0/src/serenecode/adapters/hypothesis_strategies.py +965 -0
  18. {serenecode-0.3.0 → serenecode-0.5.0}/src/serenecode/adapters/local_fs.py +1 -0
  19. {serenecode-0.3.0 → serenecode-0.5.0}/src/serenecode/adapters/module_loader.py +2 -0
  20. serenecode-0.5.0/src/serenecode/adapters/unavailable_dead_code_adapter.py +44 -0
  21. serenecode-0.5.0/src/serenecode/adapters/vulture_adapter.py +105 -0
  22. serenecode-0.5.0/src/serenecode/checker/compositional.py +972 -0
  23. serenecode-0.5.0/src/serenecode/checker/compositional_integration.py +934 -0
  24. serenecode-0.5.0/src/serenecode/checker/compositional_parsing.py +729 -0
  25. serenecode-0.5.0/src/serenecode/checker/spec_traceability.py +964 -0
  26. serenecode-0.5.0/src/serenecode/checker/structural.py +1000 -0
  27. serenecode-0.5.0/src/serenecode/checker/structural_helpers.py +999 -0
  28. serenecode-0.5.0/src/serenecode/checker/structural_quality.py +997 -0
  29. serenecode-0.5.0/src/serenecode/checker/symbolic.py +165 -0
  30. {serenecode-0.3.0 → serenecode-0.5.0}/src/serenecode/cli.py +180 -327
  31. serenecode-0.5.0/src/serenecode/cli_helpers.py +471 -0
  32. {serenecode-0.3.0 → serenecode-0.5.0}/src/serenecode/config.py +159 -35
  33. {serenecode-0.3.0 → serenecode-0.5.0}/src/serenecode/contracts/predicates.py +36 -3
  34. {serenecode-0.3.0 → serenecode-0.5.0}/src/serenecode/core/exceptions.py +2 -0
  35. serenecode-0.5.0/src/serenecode/core/module_health.py +527 -0
  36. {serenecode-0.3.0 → serenecode-0.5.0}/src/serenecode/core/pipeline.py +349 -367
  37. serenecode-0.5.0/src/serenecode/core/pipeline_helpers.py +223 -0
  38. {serenecode-0.3.0 → serenecode-0.5.0}/src/serenecode/init.py +119 -84
  39. {serenecode-0.3.0 → serenecode-0.5.0}/src/serenecode/mcp/resources.py +38 -5
  40. {serenecode-0.3.0 → serenecode-0.5.0}/src/serenecode/mcp/schemas.py +21 -4
  41. {serenecode-0.3.0 → serenecode-0.5.0}/src/serenecode/mcp/server.py +107 -26
  42. {serenecode-0.3.0 → serenecode-0.5.0}/src/serenecode/mcp/tools.py +257 -249
  43. serenecode-0.5.0/src/serenecode/mcp/tools_spec.py +458 -0
  44. {serenecode-0.3.0 → serenecode-0.5.0}/src/serenecode/models.py +50 -7
  45. {serenecode-0.3.0 → serenecode-0.5.0}/src/serenecode/ports/coverage_analyzer.py +2 -2
  46. serenecode-0.5.0/src/serenecode/ports/dead_code_analyzer.py +82 -0
  47. {serenecode-0.3.0 → serenecode-0.5.0}/src/serenecode/ports/file_system.py +1 -0
  48. {serenecode-0.3.0 → serenecode-0.5.0}/src/serenecode/reporter.py +214 -139
  49. {serenecode-0.3.0 → serenecode-0.5.0}/src/serenecode/source_discovery.py +105 -2
  50. serenecode-0.5.0/src/serenecode/support/crosshair_parsing.py +435 -0
  51. serenecode-0.5.0/src/serenecode/support/hypothesis_refinement.py +240 -0
  52. {serenecode-0.3.0 → serenecode-0.5.0}/src/serenecode/templates/content.py +132 -13
  53. {serenecode-0.3.0 → serenecode-0.5.0}/tests/e2e/test_check_command.py +3 -0
  54. {serenecode-0.3.0 → serenecode-0.5.0}/tests/e2e/test_cli_branches.py +45 -3
  55. {serenecode-0.3.0 → serenecode-0.5.0}/tests/e2e/test_init_command.py +33 -5
  56. {serenecode-0.3.0 → serenecode-0.5.0}/tests/integration/test_adapter_internals.py +16 -16
  57. serenecode-0.5.0/tests/integration/test_coverage_suggestions.py +24 -0
  58. {serenecode-0.3.0 → serenecode-0.5.0}/tests/integration/test_crosshair_adapter_helpers.py +2 -0
  59. serenecode-0.5.0/tests/integration/test_crosshair_parsing.py +19 -0
  60. {serenecode-0.3.0 → serenecode-0.5.0}/tests/integration/test_example_projects.py +19 -5
  61. {serenecode-0.3.0 → serenecode-0.5.0}/tests/integration/test_hypothesis_adapter.py +49 -0
  62. {serenecode-0.3.0 → serenecode-0.5.0}/tests/integration/test_hypothesis_adapter_helpers.py +27 -7
  63. serenecode-0.5.0/tests/integration/test_hypothesis_refinement.py +16 -0
  64. serenecode-0.5.0/tests/integration/test_hypothesis_strategies.py +11 -0
  65. {serenecode-0.3.0 → serenecode-0.5.0}/tests/integration/test_resources.py +25 -0
  66. {serenecode-0.3.0 → serenecode-0.5.0}/tests/integration/test_schemas.py +37 -1
  67. {serenecode-0.3.0 → serenecode-0.5.0}/tests/integration/test_server.py +4 -1
  68. {serenecode-0.3.0 → serenecode-0.5.0}/tests/integration/test_tools.py +132 -0
  69. serenecode-0.5.0/tests/integration/test_tools_spec.py +11 -0
  70. serenecode-0.5.0/tests/integration/test_unavailable_dead_code_adapter.py +16 -0
  71. serenecode-0.5.0/tests/integration/test_vulture_adapter.py +39 -0
  72. {serenecode-0.3.0 → serenecode-0.5.0}/tests/unit/checker/test_compositional_helpers.py +48 -0
  73. serenecode-0.5.0/tests/unit/checker/test_compositional_integration.py +11 -0
  74. serenecode-0.5.0/tests/unit/checker/test_compositional_parsing.py +10 -0
  75. serenecode-0.5.0/tests/unit/checker/test_module_health.py +323 -0
  76. {serenecode-0.3.0 → serenecode-0.5.0}/tests/unit/checker/test_spec_traceability.py +48 -1
  77. {serenecode-0.3.0 → serenecode-0.5.0}/tests/unit/checker/test_structural.py +16 -12
  78. {serenecode-0.3.0 → serenecode-0.5.0}/tests/unit/checker/test_structural_helpers.py +10 -6
  79. serenecode-0.5.0/tests/unit/checker/test_structural_quality.py +23 -0
  80. serenecode-0.5.0/tests/unit/contracts/__init__.py +0 -0
  81. {serenecode-0.3.0 → serenecode-0.5.0}/tests/unit/contracts/test_predicates.py +2 -2
  82. {serenecode-0.3.0 → serenecode-0.5.0}/tests/unit/contracts/test_predicates_hypothesis.py +1 -1
  83. serenecode-0.5.0/tests/unit/mcp/__init__.py +0 -0
  84. serenecode-0.5.0/tests/unit/mcp/test_tool_module_health.py +89 -0
  85. serenecode-0.5.0/tests/unit/test_cli_helpers.py +27 -0
  86. {serenecode-0.3.0 → serenecode-0.5.0}/tests/unit/test_config.py +30 -0
  87. {serenecode-0.3.0 → serenecode-0.5.0}/tests/unit/test_models.py +43 -1
  88. {serenecode-0.3.0 → serenecode-0.5.0}/tests/unit/test_pipeline.py +11 -7
  89. serenecode-0.5.0/tests/unit/test_pipeline_helpers.py +20 -0
  90. {serenecode-0.3.0 → serenecode-0.5.0}/tests/unit/test_reporter.py +51 -2
  91. {serenecode-0.3.0 → serenecode-0.5.0}/tests/unit/test_templates_content.py +6 -0
  92. {serenecode-0.3.0 → serenecode-0.5.0}/uv.lock +15 -1
  93. serenecode-0.3.0/SERENECODE.md +0 -669
  94. serenecode-0.3.0/src/serenecode/adapters/coverage_adapter.py +0 -1188
  95. serenecode-0.3.0/src/serenecode/adapters/hypothesis_adapter.py +0 -1828
  96. serenecode-0.3.0/src/serenecode/checker/compositional.py +0 -2219
  97. serenecode-0.3.0/src/serenecode/checker/spec_traceability.py +0 -490
  98. serenecode-0.3.0/src/serenecode/checker/structural.py +0 -2909
  99. serenecode-0.3.0/src/serenecode/checker/symbolic.py +0 -178
  100. {serenecode-0.3.0 → serenecode-0.5.0}/.gitignore +0 -0
  101. {serenecode-0.3.0 → serenecode-0.5.0}/LICENSE +0 -0
  102. {serenecode-0.3.0 → serenecode-0.5.0}/examples/DOSAGE_CALC_SPEC.md +0 -0
  103. {serenecode-0.3.0 → serenecode-0.5.0}/examples/dosage-regular/dosage_calc.py +0 -0
  104. {serenecode-0.3.0 → serenecode-0.5.0}/examples/dosage-regular/test_dosage_calc.py +0 -0
  105. {serenecode-0.3.0 → serenecode-0.5.0}/examples/dosage-serenecode/pyproject.toml +0 -0
  106. {serenecode-0.3.0 → serenecode-0.5.0}/examples/dosage-serenecode/src/dosage/__init__.py +0 -0
  107. {serenecode-0.3.0 → serenecode-0.5.0}/examples/dosage-serenecode/src/dosage/core/__init__.py +0 -0
  108. {serenecode-0.3.0 → serenecode-0.5.0}/examples/dosage-serenecode/src/dosage/core/dosage.py +0 -0
  109. {serenecode-0.3.0 → serenecode-0.5.0}/examples/dosage-serenecode/src/dosage/core/models.py +0 -0
  110. {serenecode-0.3.0 → serenecode-0.5.0}/examples/dosage-serenecode/src/dosage/core/safety.py +0 -0
  111. {serenecode-0.3.0 → serenecode-0.5.0}/examples/dosage-serenecode/tests/__init__.py +0 -0
  112. {serenecode-0.3.0 → serenecode-0.5.0}/examples/dosage-serenecode/tests/unit/__init__.py +0 -0
  113. {serenecode-0.3.0 → serenecode-0.5.0}/examples/dosage-serenecode/tests/unit/test_dosage.py +0 -0
  114. {serenecode-0.3.0 → serenecode-0.5.0}/examples/dosage-serenecode/tests/unit/test_models.py +0 -0
  115. {serenecode-0.3.0 → serenecode-0.5.0}/examples/dosage-serenecode/tests/unit/test_safety.py +0 -0
  116. {serenecode-0.3.0 → serenecode-0.5.0}/examples/dosage-serenecode/uv.lock +0 -0
  117. {serenecode-0.3.0 → serenecode-0.5.0}/serenecode.jpg +0 -0
  118. {serenecode-0.3.0 → serenecode-0.5.0}/src/serenecode/adapters/mypy_adapter.py +0 -0
  119. {serenecode-0.3.0 → serenecode-0.5.0}/src/serenecode/checker/__init__.py +0 -0
  120. {serenecode-0.3.0 → serenecode-0.5.0}/src/serenecode/checker/coverage.py +0 -0
  121. {serenecode-0.3.0 → serenecode-0.5.0}/src/serenecode/checker/properties.py +0 -0
  122. {serenecode-0.3.0 → serenecode-0.5.0}/src/serenecode/checker/types.py +0 -0
  123. {serenecode-0.3.0 → serenecode-0.5.0}/src/serenecode/contracts/__init__.py +0 -0
  124. {serenecode-0.3.0 → serenecode-0.5.0}/src/serenecode/core/__init__.py +0 -0
  125. {serenecode-0.3.0 → serenecode-0.5.0}/src/serenecode/mcp/__init__.py +0 -0
  126. {serenecode-0.3.0 → serenecode-0.5.0}/src/serenecode/ports/__init__.py +0 -0
  127. {serenecode-0.3.0 → serenecode-0.5.0}/src/serenecode/ports/property_tester.py +0 -0
  128. {serenecode-0.3.0 → serenecode-0.5.0}/src/serenecode/ports/symbolic_checker.py +0 -0
  129. {serenecode-0.3.0 → serenecode-0.5.0}/src/serenecode/ports/type_checker.py +0 -0
  130. {serenecode-0.3.0/tests → serenecode-0.5.0/src/serenecode/support}/__init__.py +0 -0
  131. {serenecode-0.3.0 → serenecode-0.5.0}/src/serenecode/templates/__init__.py +0 -0
  132. {serenecode-0.3.0/tests/e2e → serenecode-0.5.0/tests}/__init__.py +0 -0
  133. {serenecode-0.3.0 → serenecode-0.5.0}/tests/conftest.py +0 -0
  134. {serenecode-0.3.0/tests/integration → serenecode-0.5.0/tests/e2e}/__init__.py +0 -0
  135. {serenecode-0.3.0 → serenecode-0.5.0}/tests/e2e/test_cli.py +0 -0
  136. {serenecode-0.3.0 → serenecode-0.5.0}/tests/e2e/test_init.py +0 -0
  137. {serenecode-0.3.0 → serenecode-0.5.0}/tests/e2e/test_mcp_command.py +0 -0
  138. {serenecode-0.3.0 → serenecode-0.5.0}/tests/e2e/test_report_command.py +0 -0
  139. {serenecode-0.3.0 → serenecode-0.5.0}/tests/e2e/test_status_command.py +0 -0
  140. {serenecode-0.3.0 → serenecode-0.5.0}/tests/fixtures/edge_cases/aliased_import.py +0 -0
  141. {serenecode-0.3.0 → serenecode-0.5.0}/tests/fixtures/edge_cases/async_functions.py +0 -0
  142. {serenecode-0.3.0 → serenecode-0.5.0}/tests/fixtures/edge_cases/empty_module.py +0 -0
  143. {serenecode-0.3.0 → serenecode-0.5.0}/tests/fixtures/edge_cases/from_import.py +0 -0
  144. {serenecode-0.3.0 → serenecode-0.5.0}/tests/fixtures/invalid/broken_postcondition.py +0 -0
  145. {serenecode-0.3.0 → serenecode-0.5.0}/tests/fixtures/invalid/io_in_core.py +0 -0
  146. {serenecode-0.3.0 → serenecode-0.5.0}/tests/fixtures/invalid/missing_contracts.py +0 -0
  147. {serenecode-0.3.0 → serenecode-0.5.0}/tests/fixtures/invalid/missing_invariant.py +0 -0
  148. {serenecode-0.3.0 → serenecode-0.5.0}/tests/fixtures/invalid/missing_types.py +0 -0
  149. {serenecode-0.3.0 → serenecode-0.5.0}/tests/fixtures/valid/class_with_invariant.py +0 -0
  150. {serenecode-0.3.0 → serenecode-0.5.0}/tests/fixtures/valid/full_module.py +0 -0
  151. {serenecode-0.3.0 → serenecode-0.5.0}/tests/fixtures/valid/simple_function.py +0 -0
  152. {serenecode-0.3.0/tests/unit → serenecode-0.5.0/tests/integration}/__init__.py +0 -0
  153. {serenecode-0.3.0 → serenecode-0.5.0}/tests/integration/test_checkers_real_code.py +0 -0
  154. {serenecode-0.3.0 → serenecode-0.5.0}/tests/integration/test_coverage_adapter.py +0 -0
  155. {serenecode-0.3.0 → serenecode-0.5.0}/tests/integration/test_crosshair_adapter.py +0 -0
  156. {serenecode-0.3.0 → serenecode-0.5.0}/tests/integration/test_file_adapter.py +0 -0
  157. {serenecode-0.3.0 → serenecode-0.5.0}/tests/integration/test_local_fs.py +0 -0
  158. {serenecode-0.3.0 → serenecode-0.5.0}/tests/integration/test_module_loader.py +0 -0
  159. {serenecode-0.3.0 → serenecode-0.5.0}/tests/integration/test_mypy_adapter.py +0 -0
  160. {serenecode-0.3.0/tests/unit/checker → serenecode-0.5.0/tests/unit}/__init__.py +0 -0
  161. {serenecode-0.3.0/tests/unit/contracts → serenecode-0.5.0/tests/unit/checker}/__init__.py +0 -0
  162. {serenecode-0.3.0 → serenecode-0.5.0}/tests/unit/checker/test_compositional.py +0 -0
  163. {serenecode-0.3.0 → serenecode-0.5.0}/tests/unit/checker/test_coverage.py +0 -0
  164. {serenecode-0.3.0 → serenecode-0.5.0}/tests/unit/checker/test_properties.py +0 -0
  165. {serenecode-0.3.0 → serenecode-0.5.0}/tests/unit/checker/test_structural_hypothesis.py +0 -0
  166. {serenecode-0.3.0 → serenecode-0.5.0}/tests/unit/checker/test_symbolic.py +0 -0
  167. {serenecode-0.3.0 → serenecode-0.5.0}/tests/unit/checker/test_types.py +0 -0
  168. {serenecode-0.3.0 → serenecode-0.5.0}/tests/unit/test_api.py +0 -0
  169. {serenecode-0.3.0 → serenecode-0.5.0}/tests/unit/test_models_hypothesis.py +0 -0
  170. {serenecode-0.3.0 → serenecode-0.5.0}/tests/unit/test_source_discovery.py +0 -0
@@ -1,22 +1,28 @@
1
1
  ## Serenecode
2
2
 
3
- All code in this project MUST follow the standards defined in SERENECODE.md. Read SERENECODE.md before writing or modifying any code. Every public function must have icontract preconditions and postconditions. Every class with state must have invariants. Follow the architectural patterns specified in SERENECODE.md.
3
+ All code in this project MUST follow the same standards SereneCode ships to users: the embedded templates in `src/serenecode/templates/content.py` (default / strict / minimal) define the conventions the structural checker enforces. Read the relevant template before writing or modifying any code. Every public function must have icontract preconditions and postconditions. Every class with state must have invariants. Follow the architectural patterns specified there.
4
4
 
5
- ### Verification Commands
5
+ Pre-existing `*_SPEC.md` or PRD files are narrative inputs; only project-root `SPEC.md` with REQ/INT identifiers satisfies SereneCode traceability (`serenecode check --spec`).
6
6
 
7
- After each work iteration (implementing a feature, fixing a bug, refactoring), offer to run verification before considering the task complete.
7
+ ### Verification (prefer MCP while editing)
8
8
 
9
- **Quick structural check (seconds):**
9
+ After each work iteration (implementing a feature, fixing a bug, refactoring), run verification before considering the task complete.
10
+
11
+ **Preferred — MCP tools in the IDE (per-symbol, fast feedback):** use **`serenecode_check_function`** (or `serenecode_check_file`) on the code you just changed. Prefer this over shell `serenecode check` during active editing. Reserve **`serenecode_check`** for whole-tree / CI-style runs. If MCP wiring is unclear, run `serenecode doctor` for install and registration hints.
12
+
13
+ **CLI — batch / CI (when not using MCP for this step):**
14
+
15
+ Quick structural check (seconds):
10
16
  ```bash
11
17
  serenecode check src/ --structural
12
18
  ```
13
19
 
14
- **Full verification with coverage and property testing (minutes):**
20
+ Full verification with coverage and property testing (minutes):
15
21
  ```bash
16
22
  serenecode check src/ --level 4 --allow-code-execution
17
23
  ```
18
24
 
19
- **Full verification including symbolic and compositional (minutes):**
25
+ Full verification including symbolic and compositional (minutes):
20
26
  ```bash
21
27
  serenecode check src/ --level 6 --allow-code-execution
22
28
  ```
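The CLAUDE.md hunk above requires icontract preconditions and postconditions on every public function. As a rough illustration of what a structural check for that rule could look like, here is a minimal sketch built on the standard `ast` module. It is not SereneCode's checker: the helper names (`decorator_name`, `missing_contracts`) and the sample module are hypothetical, and the sketch only looks for decorators named `require` and `ensure` on top-level functions.

```python
import ast

# icontract exposes `require` (precondition) and `ensure` (postcondition) decorators.
CONTRACT_DECORATORS = {"require", "ensure"}


def decorator_name(node):
    """Return the trailing name of a decorator such as @icontract.require(...)."""
    target = node.func if isinstance(node, ast.Call) else node
    if isinstance(target, ast.Attribute):
        return target.attr
    if isinstance(target, ast.Name):
        return target.id
    return ""


def missing_contracts(source):
    """Map each public top-level function to the contract decorators it lacks."""
    report = {}
    for node in ast.parse(source).body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        if node.name.startswith("_"):
            continue  # private helpers are out of scope in this sketch
        present = {decorator_name(d) for d in node.decorator_list}
        missing = CONTRACT_DECORATORS - present
        if missing:
            report[node.name] = missing
    return report


# Hypothetical module text; it is only parsed, never imported or executed.
SAMPLE = '''
import icontract

@icontract.require(lambda dose: dose > 0)
@icontract.ensure(lambda result: result >= 0)
def scale(dose: float) -> float:
    return dose * 2

def unchecked(dose: float) -> float:
    return dose

@icontract.require(lambda x: x > 0)
def half_checked(x: int) -> int:
    return x
'''

if __name__ == "__main__":
    for name, missing in sorted(missing_contracts(SAMPLE).items()):
        print(name, "is missing", sorted(missing))
```

Because the source is parsed rather than imported, a check of this kind needs no code execution, which is consistent with the structural level being described as taking seconds.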
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: serenecode
3
- Version: 0.3.0
3
+ Version: 0.5.0
4
4
  Summary: Verification framework for AI-generated Python — test coverage, property testing, and symbolic execution
5
5
  Project-URL: Homepage, https://github.com/helgster77/serenecode
6
6
  Project-URL: Repository, https://github.com/helgster77/serenecode
@@ -26,6 +26,7 @@ Requires-Dist: crosshair-tool>=0.0.60
26
26
  Requires-Dist: hypothesis>=6.0
27
27
  Requires-Dist: icontract>=2.7.0
28
28
  Requires-Dist: mypy>=1.0
29
+ Requires-Dist: vulture>=2.14
29
30
  Provides-Extra: dev
30
31
  Requires-Dist: pytest-cov>=4.0; extra == 'dev'
31
32
  Requires-Dist: pytest>=7.0; extra == 'dev'
@@ -39,11 +40,18 @@ Description-Content-Type: text/markdown
39
40
 
40
41
  <h3 align="center">A Framework for AI-Driven Development of Verifiable Systems</h3>
41
42
 
42
- SereneCode is a spec-to-verified-implementation framework for AI-generated Python. It ensures that every requirement in your spec is implemented, tested, and formally verified, closing the gap between what you asked for and what the AI built. The workflow starts from a spec with traceable requirements (REQ-xxx), enforces that the AI writes verifiable code with contracts and tests, then verifies at multiple levels, from structural checks and test coverage through property-based testing to symbolic execution with an SMT solver. You choose the verification depth during interactive setup: lightweight for internal tools, balanced for production systems, strict for safety-critical code.
43
+ SereneCode turns the question from "did the model ship code?" to "does it match the spec, the types, and the contracts we agreed on?" It is a Python toolkit and workflow for teams using AI coding assistants: a structured spec (`REQ-xxx`, `INT-xxx`), a project-level `SERENECODE.md` that steers how code is written, and one verification pipeline you can drive **from the MCP server (recommended while editing)** or from the **CLI** (CI, scripts, and full-tree batch runs).
43
44
 
44
- SereneCode also ships a built-in **MCP server** so verification runs *inside* your AI assistant's edit loop, not just at the end. Once registered with Claude Code, Cursor, Cline, or Continue, the agent calls verification tools after every function it writes — getting structured findings, contract suggestions, and counterexamples back as JSON, fixing them mid-turn, and only reporting the work complete when the result is clean. AI agents write code fast but can miss requirements and skip edge cases; SereneCode closes that gap with spec traceability, test-existence enforcement, formal verification, and an MCP-driven inner loop the agent can drive on itself.
45
+ **Recommended workflow**
45
46
 
46
- > **This framework was bootstrapped with AI under its own rules.** SereneCode's SERENECODE.md was written before the first line of code, and the codebase has been developed under those conventions from the start, including the MCP server, which the same AI agents now use to verify their own work mid-edit. The current tree passes its own `serenecode check src --level 6 --allow-code-execution` end-to-end via the bare CLI (718 functions checked, 557 passed, 161 exempt; ~6 minutes wall time), an internal strict-config Level 6 self-check in the test suite (`pytest tests/integration/test_example_projects.py::test_serenecode_repo_passes_strict_level_6`, which exercises L4-L6 against `strict_config` over the full source tree), `mypy src examples/dosage-serenecode/src`, the shipped dosage example's own `serenecode check src --level 6 --allow-code-execution`, and the full `pytest` suite (1,393 passing tests, 16 skipped). The verification output is transparent about scope: exempt modules (adapters, CLI, ports, MCP server, `__init__.py`) and functions excluded from deep verification (non-primitive parameter types) are reported as "exempt" rather than silently omitted.
47
+ 1. **MCP (AI / IDE):** Register `serenecode mcp` once in Cursor, Claude Code, Cline, Continue, or any MCP client. While coding, call **`serenecode_check_function`** (or `serenecode_check_file`) on the symbol you just edited, not a full-project check every time. This is the fastest feedback loop; it matches how the pipeline is meant to be used with assistants.
48
+ 2. **CLI** — Use `serenecode check` for pipelines, pre-merge gates, and automation. After any CLI run, the human-readable report ends with a short reminder pointing at MCP for per-symbol follow-up; run **`serenecode doctor`** to verify the optional `mcp` install and see copy-paste registration commands.
49
+
50
+ **What you get:** problems surface as **reported findings**, not as silent gaps—missing traceability, weak or missing contracts, untested paths, type errors, and (where you enable deeper levels) property-test failures, bounded symbolic counterexamples, and cross-module issues. You choose depth to match risk: seconds for structure and types, more time for coverage, Hypothesis, CrossHair-backed search within analysis bounds, and compositional checks. This does not replace review or domain expertise; it **reduces surprise** by making scope explicit and by re-running the same checks your repo already encodes.
51
+
52
+ **MCP:** register the server once (Claude Code, Cursor, Cline, Continue, or any MCP client) and the agent can pull structured JSON—findings, suggestions, counterexamples—while the cursor is still on the line, instead of discovering issues only after merge.
53
+
54
+ > **This framework was bootstrapped with AI under its own rules.** Convention templates in `src/serenecode/templates/content.py` were defined before the first line of code, and the codebase has been developed under those conventions from the start — including the MCP server, which the same AI agents now use to verify their own work mid-edit. The current tree passes its own `serenecode check src --level 6 --allow-code-execution` end-to-end via the bare CLI (counts vary with the tree; expect hundreds of functions with a mix of passed, exempt, and advisory notes; wall time on the order of minutes for a full L1–L6 run), an internal strict-config Level 6 self-check in the test suite (`pytest tests/integration/test_example_projects.py::test_serenecode_repo_passes_strict_level_6`, which exercises L4–L6 against `strict_config` over the full source tree), `mypy src examples/dosage-serenecode/src`, the shipped dosage example's own `serenecode check src --level 6 --allow-code-execution`, and the full `pytest` suite (on the order of 1,450+ passing tests; a small number are skipped by design). The verification output is transparent about scope: exempt modules (adapters, CLI, ports, MCP server, `__init__.py`) and functions excluded from deep verification (non-primitive parameter types) are reported as "exempt" rather than silently omitted; dead-code and module health findings are **advisories** and appear in the summary unless you use `--fail-on-advisory` to fail CI when any remain.
47
55
 
48
56
  ---
49
57
 
@@ -57,7 +65,7 @@ The problem is that formal verification has always been expensive — too slow,
57
65
 
58
66
  After enough hours pair-programming with coding agents, the same handful of mistakes show up over and over. They're not random bugs — they're systematic patterns that follow from how a language model writes code: optimize for the happy path, infer intent from limited context, finish the visible task, leave implicit assumptions implicit. SereneCode treats each one as a verification target rather than a code-review target.
59
67
 
60
- - **Skipping requirements without realizing it.** Given a spec with twelve requirements, the agent writes code that handles eight of them confidently and quietly omits the four it didn't see a clean place for. There's no error, no TODO, just missing behavior. SereneCode's spec traceability (REQ-xxx tags, `Implements:` / `Verifies:` references, the `serenecode_orphans` and `serenecode_req_status` tools) makes it impossible for a requirement to be silently dropped: every REQ in SPEC.md must be both implemented and tested or it shows up as an orphan.
68
+ - **Skipping requirements or wiring integration points incorrectly.** Given a spec with twelve requirements, the agent writes code that handles eight of them confidently and quietly omits the four it didn't see a clean place for, or it writes the happy-path behavior but connects the wrong components together. SereneCode's spec traceability (`REQ-xxx` and `INT-xxx` tags, `Implements:` / `Verifies:` references, and the MCP status tools) makes it impossible for a requirement or declared integration to be silently dropped: every declared item in `SPEC.md` must be both implemented and tested or it shows up as missing coverage, and deeper verification checks whether tagged integrations are actually present.
61
69
 
62
70
  - **Happy-path tests only.** Asked to "add tests," the agent writes a handful of cases that walk the obvious path through the function. Edge cases (empty input, off-by-one boundaries, the exact threshold value, the negative number, the unicode string) are routinely missed because they require imagining what could go wrong. L3 coverage catches uncovered branches; L4 Hypothesis property testing generates inputs the agent never thought of and runs them against the contracts.
63
71
 
@@ -77,6 +85,8 @@ After enough hours pair-programming with coding agents, the same handful of mist
77
85
 
78
86
  - **Tests that pass but verify nothing.** A `def test_foo(): foo()` with no `assert` runs successfully and counts as "covered" — but it only checks that the function doesn't raise. L1's no-assertions-in-tests check fires on any `test_*` function with no `assert`, `pytest.raises`, `pytest.fail`, or `self.assertX` call.
79
87
 
88
+ - **Module bloat and god classes.** Agents accumulate code in a single module without splitting. A 2000-line file with 30 methods on one class compiles and tests pass, but it's unmaintainable. Module health checks flag files, functions, and classes that exceed configurable thresholds — advisory warnings when they're getting large, hard errors when they exceed the maximum. The agent gets concrete split suggestions derived from AST analysis: which classes could be standalone modules, which function groups share a prefix, and where banner comments suggest logical boundaries.
89
+
80
90
  - **Architectural drift.** Asked to "add a feature," the agent puts I/O in core, business logic in adapters, and circular imports between them. The system still works in tests because everything is loaded together — but the layering rule that made the code testable in the first place is gone. L6 compositional checks enforce dependency direction, interface compliance, and contract presence at module boundaries.
81
91
 
82
92
  None of these failures are unique to AI; humans make them too. What's unique is the *rate* at which an agent produces them and the *confidence* with which the agent reports the work as done. The structural checker, the contracts, the property tester, and the symbolic search exist to make each pattern impossible to ship without an explicit, reviewed override.
@@ -85,7 +95,7 @@ SereneCode is designed for **building new verifiable systems from scratch with A
85
95
 
86
96
  ### Choosing the Right Level
87
97
 
88
- The cost of verification should be proportional to the cost of a bug. Each level generates a different SERENECODE.md with different requirements for the AI, so the choice shapes how code is *written*, not just how it's checked. You make this choice during `serenecode init` — it cannot be changed after implementation starts.
98
+ The cost of verification should be proportional to the cost of a bug. Each level generates a different SERENECODE.md with different requirements for the AI, so the choice shapes how code is *written*, not just how it's checked. You make this choice during `serenecode init` — treat it as the project default; you can revise SERENECODE.md later if the team agrees (the init script warns against casual drift once coding is underway).
89
99
 
90
100
  | | **Minimal** (Level 2) | **Default** (Level 4) | **Strict** (Level 6) |
91
101
  |---|---|---|---|
@@ -97,6 +107,8 @@ The cost of verification should be proportional to the cost of a bug. Each level
97
107
 
98
108
  Pick the level that matches the stakes. Safety-critical code should start at Strict.
99
109
 
110
+ For package layout (Hypothesis strategies vs `*.core.models`), compositional `INT` `Kind: call` semantics (`call` vs `isinstance`, comma-separated targets), and other level-specific expectations, see [docs/VERIFICATION_LEVELS.md](docs/VERIFICATION_LEVELS.md).
111
+
100
112
  ---
101
113
 
102
114
  ## See It In Action: The Medical Dosage Calculator
@@ -130,13 +142,17 @@ The Serenecode dosage example currently passes `serenecode check src/ --level 6
130
142
 
131
143
  ## How It Works
132
144
 
145
+ For **day-to-day AI-assisted coding**, treat **The MCP Server** (§4 below) as the primary path: scoped tools first, full-project checks when you need them. Sections **2–3** describe the same pipeline via the **CLI**, which is ideal for **CI and one-shot `serenecode check` runs**.
146
+
133
147
  ### 1. Interactive Setup — `serenecode init`
134
148
 
135
- Run `serenecode init` and answer two questions:
149
+ Run `serenecode init` and answer the prompts:
150
+
151
+ **Spec:** Do you already have a spec, or will you write one with your coding assistant? Both options set up spec traceability with REQ-xxx requirement identifiers — the difference is the workflow your assistant follows.
136
152
 
137
- **Spec question:** Do you already have a spec, or will you write one with your coding assistant? Both options set up spec traceability with REQ-xxx requirement identifiers; the difference is the workflow your assistant follows.
153
+ **Verification level:** Minimal (L2), Default (L4), or Strict (L6). This determines what conventions your SERENECODE.md will require. (The init text describes this as fixed once implementation has started; you can still edit files manually.)
138
154
 
139
- **Verification level:** Minimal (L2), Default (L4), or Strict (L6). This determines what conventions your SERENECODE.md will require and cannot be changed after implementation starts.
155
+ **MCP server:** Whether to print a one-time MCP registration snippet for your AI tool (recommended).
140
156
 
141
157
  ```bash
142
158
  serenecode init
@@ -148,14 +164,14 @@ This creates SERENECODE.md (project conventions including spec traceability) and
148
164
 
149
165
  A lightweight AST-based checker that validates code follows SERENECODE.md conventions in seconds. Missing a postcondition? No class invariant? No test file for a module? Caught before you waste time on heavy verification.
150
166
 
151
- L1 also catches AI-failure-mode patterns that compile and look correct but represent real bugs: stub residue (`pass`/`...`/`raise NotImplementedError` left as a function body), mutable default arguments, bare `assert` in non-test source, `print()` in core, dangerous calls (`eval`, `exec`, `pickle.loads`, `os.system`, `subprocess` with `shell=True`), `TODO`/`FIXME`/`XXX`/`HACK` markers in tracked files, tests with no assertions, silent exception handlers, and tautological postconditions. Each rule has a per-rule opt-out comment for legitimate exceptions; see SERENECODE.md "Code Quality Standards" for the full list.
167
+ L1 also catches AI-failure-mode patterns that compile and look correct but represent real bugs: stub residue (`pass`/`...`/`raise NotImplementedError` left as a function body), mutable default arguments, bare `assert` in non-test source, `print()` in core, dangerous calls (`eval`, `exec`, `pickle.loads`, `os.system`, `subprocess` with `shell=True`), `TODO`/`FIXME`/`XXX`/`HACK` markers in tracked files, tests with no assertions, silent exception handlers, tautological postconditions, and likely dead code. L1 additionally runs **module health checks**: files exceeding a configurable line threshold, functions that are too long, functions with too many parameters, and classes with too many methods generate warnings (advisory) or errors depending on severity. These target the common AI agent drift pattern of accumulating code in a single module without splitting. Dead-code and module health findings are advisory review items; each rule has a per-rule opt-out comment for legitimate exceptions. See SERENECODE.md "Code Quality Standards" and "Module Health" for the full list. Skip all module health checks with `--skip-module-health`.
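Two of the patterns listed above (stub residue and mutable default arguments) can be sketched with the standard-library `ast` module. This is a minimal illustration of the idea, not SereneCode's actual checker; the function names and the rule labels `mutable-default` and `stub-residue` are hypothetical.

```python
import ast

# Literal node types treated as mutable defaults in this sketch.
MUTABLE_DEFAULTS = (ast.List, ast.Dict, ast.Set)


def _is_stub(body: list[ast.stmt]) -> bool:
    """Return True if a function body is only pass / ... / raise NotImplementedError."""
    stmts = list(body)
    # Ignore a leading docstring so a documented stub is still a stub.
    if (
        stmts
        and isinstance(stmts[0], ast.Expr)
        and isinstance(stmts[0].value, ast.Constant)
        and isinstance(stmts[0].value.value, str)
    ):
        stmts = stmts[1:]
    if len(stmts) != 1:
        return False
    stmt = stmts[0]
    if isinstance(stmt, ast.Pass):
        return True
    if isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Constant):
        return stmt.value.value is Ellipsis
    if isinstance(stmt, ast.Raise) and stmt.exc is not None:
        exc = stmt.exc.func if isinstance(stmt.exc, ast.Call) else stmt.exc
        return isinstance(exc, ast.Name) and exc.id == "NotImplementedError"
    return False


def find_issues(source: str) -> list[tuple[str, str]]:
    """Return sorted (function_name, rule_label) pairs for the two patterns."""
    issues = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        defaults = list(node.args.defaults)
        defaults += [d for d in node.args.kw_defaults if d is not None]
        if any(isinstance(d, MUTABLE_DEFAULTS) for d in defaults):
            issues.append((node.name, "mutable-default"))
        if _is_stub(node.body):
            issues.append((node.name, "stub-residue"))
    return sorted(issues)
```

A real checker would also need the per-rule opt-out comments mentioned above, which this sketch leaves out.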
152
168
 
153
169
  ```bash
154
- serenecode check src/ --structural # structural conventions
155
- serenecode check src/ --spec SPEC.md # + spec traceability
170
+ serenecode check src/ --structural # structural conventions + dead-code review
171
+ serenecode check src/ --spec SPEC.md # override auto-discovered SPEC.md
156
172
  ```
157
173
 
158
- The `--spec` flag verifies that every REQ in the spec has an `Implements: REQ-xxx` tag in the code and a `Verifies: REQ-xxx` tag in the tests. No requirement goes unimplemented or untested.
174
+ When a project-root `SPEC.md` is present, normal verification runs auto-detect it. Spec traceability verifies that every declared REQ and INT in the spec has matching `Implements:` and `Verifies:` references in the codebase. No declared requirement or integration point goes unimplemented or untested without being surfaced.
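The traceability idea can be sketched in a few lines: collect the identifiers declared in the spec, collect the ones referenced by `Implements:` and `Verifies:` tags, and report the difference. This is an illustration only. The identifier pattern (`REQ-` or `INT-` followed by digits), the comma-separated tag syntax, and all function names are assumptions, not SereneCode's parser.

```python
import re

# Assumed identifier shape: REQ-001, INT-012, ...
ID_PATTERN = r"(?:REQ|INT)-\d+"


def declared_ids(spec_text: str) -> set[str]:
    """Every REQ/INT identifier mentioned in the spec text."""
    return set(re.findall(rf"\b{ID_PATTERN}\b", spec_text))


def tagged_ids(text: str, tag: str) -> set[str]:
    """Identifiers listed after a tag such as 'Implements:' or 'Verifies:'."""
    found: set[str] = set()
    tag_re = rf"{tag}:\s*((?:{ID_PATTERN})(?:\s*,\s*(?:{ID_PATTERN}))*)"
    for match in re.finditer(tag_re, text):
        found.update(re.findall(ID_PATTERN, match.group(1)))
    return found


def orphans(spec_text: str, source_text: str, test_text: str) -> dict[str, list[str]]:
    """Declared identifiers with no Implements tag or no Verifies tag."""
    declared = declared_ids(spec_text)
    return {
        "unimplemented": sorted(declared - tagged_ids(source_text, "Implements")),
        "untested": sorted(declared - tagged_ids(test_text, "Verifies")),
    }
```

For the real status of a requirement or integration point, use the CLI or the `serenecode_orphans`, `serenecode_req_status`, and `serenecode_integration_status` tools rather than a text scan like this one.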
159
175
 
160
176
  ### 3. The Verifier — Deep Verification
161
177
 
@@ -168,7 +184,9 @@ A six-level verification pipeline that escalates from fast checks to full symbol
168
184
  | **L3** | Test coverage analysis | Seconds–minutes | coverage.py |
169
185
  | **L4** | Property-based testing | Seconds–minutes | Hypothesis |
170
186
  | **L5** | Symbolic search (bounded) | Minutes | CrossHair / Z3 |
171
- | **L6** | Cross-module verification | Seconds | Compositional analysis |
187
+ | **L6** | Cross-module verification | Seconds–minutes | Compositional analysis |
188
+
189
+ The compositional step alone is often quick, but a full `serenecode check … --level 6` run still executes L1–L5 first and can take minutes on a large tree.
172
190
 
173
191
  ```bash
174
192
  serenecode check src/ --level 6 --allow-code-execution # verify it
@@ -207,10 +225,16 @@ The same `serenecode mcp` stdio server works in Claude Code, Cursor, Cline, Cont
207
225
  | `serenecode_suggest_test` | Test scaffold for an uncovered function |
208
226
  | `serenecode_validate_spec` | Validate a SPEC.md is well-formed |
209
227
  | `serenecode_list_reqs` | List REQ-xxx identifiers in a SPEC.md |
228
+ | `serenecode_list_integrations` | List INT-xxx identifiers in a SPEC.md |
210
229
  | `serenecode_req_status` | Implementation/verification status of one REQ |
230
+ | `serenecode_integration_status` | Implementation/verification status of one INT |
211
231
  | `serenecode_orphans` | REQs with no implementation or no test |
232
+ | `serenecode_dead_code` | Likely dead-code findings that require user review |
233
+ | `serenecode_module_health` | Module health metrics (file size, function lengths, parameter counts, class sizes) for a single file — no verification needed |
234
+
235
+ Tool results mirror the CLI pipeline: structured payloads include **`passed`**, levels, **`verdict`**, and a **summary** with counts (including **`advisory_count`** for advisory findings like dead code and module health warnings). Paths and `project_root` are resolved on the host — they are not sandboxed to one workspace; see [docs/SECURITY.md](docs/SECURITY.md).
212
236
 
213
- **Read-only resources** the agent can fetch without "calling" anything: `serenecode://config` (active SerenecodeConfig as JSON), `serenecode://findings/last-run` (most recent CheckResponse from this server session), `serenecode://exempt-modules` (the exempt path patterns for the active config), `serenecode://reqs` (parsed REQ-xxx list from the project's SPEC.md).
237
+ **Read-only resources** the agent can fetch without "calling" anything: `serenecode://config` (active SerenecodeConfig as JSON), `serenecode://findings/last-run` (most recent CheckResponse from this server session), `serenecode://exempt-modules` (the exempt path patterns for the active config), `serenecode://reqs` (parsed REQ-xxx list from the project's SPEC.md), and `serenecode://integrations` (parsed INT-xxx metadata from the project's SPEC.md).
214
238
 
215
239
  The server-level `--allow-code-execution` flag mirrors the CLI: without it, Levels 3-6 tools return a structured error rather than importing project code. `serenecode init` writes a copy-pasteable MCP setup snippet into the generated CLAUDE.md so newly initialized projects ship with the registration command and recommended workflow. See SERENECODE.md "MCP Integration" for the full descriptions and the agent-side workflow.
216
240
 
@@ -229,13 +253,13 @@ serenecode init → spec mode + verification level
229
253
  claude mcp add serenecode -- uv run \ → register MCP server with the AI
230
254
  serenecode mcp --allow-code-execution tool (Claude Code, Cursor, ...)
231
255
  serenecode spec SPEC.md → validate spec is ready
232
- (REQ-xxx format, no gaps)
256
+ (REQ-xxx / INT-xxx format, no gaps)
233
257
 
234
258
  ─── inner loop (per function, driven by the agent through MCP) ──────────────
235
259
  AI reads SERENECODE.md + SPEC.md → conventions and what to build
236
260
  AI calls serenecode_suggest_contracts → derive @require/@ensure for the
237
261
  function it's about to write
238
- AI writes the function → with Implements: REQ-xxx tag
262
+ AI writes the function → with Implements: REQ-xxx / INT-xxx tags
239
263
  AI calls serenecode_check_function → L1-L4 scoped to that function
240
264
  AI reads structured findings → missing contracts, mutable
241
265
  defaults, weak postconditions,
@@ -246,19 +270,21 @@ AI fixes them and calls verify_fixed → confirms each finding is gone
246
270
 
247
271
  ─── outer loop (per feature, batch verification) ─────────────────────────────
248
272
  serenecode check src/ --spec SPEC.md → did the AI follow conventions?
249
- --structural all REQs covered?
273
+ --structural all REQs/INTs covered? dead code?
250
274
  serenecode check src/ --level 5 \ → deep verification: coverage,
251
275
  --allow-code-execution \ property testing, symbolic search
252
276
  --spec SPEC.md
253
277
  AI calls serenecode_orphans / → which REQs are unimplemented or
254
- serenecode_req_status untested?
278
+ serenecode_req_status / untested?
279
+ serenecode_integration_status / which INTs are missing or broken?
280
+ serenecode_dead_code which symbols need user review?
255
281
  AI fixes the gaps → adds implementations, tests,
256
- stronger contracts
257
- Repeat until verified → all REQs implemented + tested,
282
+ stronger contracts, reviewed cleanup
283
+ Repeat until verified → all REQs/INTs implemented + tested,
258
284
  no counterexamples within bounds
259
285
  ```
260
286
 
261
- The inner loop is what the MCP server enables. Before MCP, the agent had to finish writing, exit its turn, wait for `serenecode check` to run, parse the output, and iterate. With MCP, every function the agent writes gets validated *before* it moves to the next one — `serenecode_check_function` returns structured JSON in milliseconds, the agent fixes any findings inline, and only reports the overall task complete when the result is clean. This collapses an iteration loop that used to span multiple turns into a sequence of tool calls inside a single turn.
287
+ The inner loop is what the MCP server enables. Before MCP, the agent had to finish writing, exit its turn, wait for `serenecode check` to run, parse the output, and iterate. With MCP, every function the agent writes gets validated *before* it moves to the next one — `serenecode_check_function` returns structured JSON without a separate full-repo CLI round-trip (wall time is often seconds per call and grows with level and project size), the agent fixes any findings inline, and only reports the overall task complete when the result is clean. This collapses an iteration loop that used to span multiple turns into a sequence of tool calls inside a single turn.
262
288
 
263
289
  The outer loop still matters: cross-module compositional analysis, full coverage runs, and spec-traceability sweeps over the whole codebase aren't function-scoped, so they live at the batch level. The CLI handles those, and the same pipeline runs identically in CI.
264
290
 
@@ -287,13 +313,13 @@ The library API (`serenecode.check`) and the MCP server (`serenecode_check`, `se
287
313
 
288
314
  SereneCode isn't just a tool that *tells* you to write verified code. It *is* verified code.
289
315
 
290
- The SERENECODE.md convention file was the first artifact created — before any Python was written. The framework has been developed under those conventions with AI as a first-class contributor, and the repository continuously checks itself with:
316
+ Those convention templates were the first artifacts — before any Python was written. The framework has been developed under those conventions with AI as a first-class contributor, and the repository continuously checks itself with:
291
317
 
292
- - `pytest` across the full suite (currently 1,393 passing tests, 16 skipped)
318
+ - `pytest` across the full suite (on the order of 1,450+ passing tests; a small number skipped by design)
293
319
  - `mypy --strict` across `src/` and `examples/dosage-serenecode/src/`
294
320
  - SereneCode's own structural, type, property, symbolic, and compositional passes
295
321
 
296
- On the current tree, the bare CLI invocation `serenecode check src --level 6 --allow-code-execution` runs the full L1-L6 pipeline end-to-end against the framework's own source: 718 functions checked, 557 passed, 161 exempt, 0 failures, ~6 minutes wall time. A separate integration test, `test_serenecode_repo_passes_strict_level_6`, runs the same source tree through `run_pipeline` with `strict_config()` and `start_level=4`, which strips every path-based exemption and forces every adapter, CLI handler, MCP tool, and `__init__.py` through L4-L6. SereneCode also passes that strict-config self-check end-to-end: 0 L1 findings across all 466 strict-checked functions, 0 L3 coverage gaps across the strict-checked subset (~3.5 minutes), and 0 L4-L6 findings. The exempt items in the default-config run include adapter modules (which handle I/O and are integration-tested), port interfaces (Protocols that define abstract contracts), CLI entry points, the MCP server package, and functions whose parameter types are too complex for automated strategy generation or symbolic execution. Exempt items are visible in the output — they are not silently omitted.
322
+ On the current tree, `serenecode check src --level 6 --allow-code-execution` runs the full L1–L6 pipeline against the framework's own source; exact counts of passed / exempt / advisory rows change as the tree grows (wall time often several minutes for a deep run). A separate integration test, `test_serenecode_repo_passes_strict_level_6`, runs the same `src/` tree through `run_pipeline` with `strict_config()` and `start_level=4`, which strips path-based exemptions and forces adapters, CLI, MCP, and similar code through L4–L6. The exempt items in a **default-config** run still include adapter modules, port `Protocol`s, CLI/MCP composition roots, and functions whose parameter types are poor fits for Hypothesis/CrossHair. Advisory items include dead-code findings and module health warnings (file length, function length, parameter count, class method count). Exempt and advisory rows stay visible in the output — they are not silently omitted.
297
323
 
298
324
  At Level 5, CrossHair and Z3 search for counterexamples across the codebase's symbolic-friendly contracted top-level functions. Functions with non-primitive parameters (custom dataclasses, Protocol implementations, Callable types) are reported as exempt because the solver cannot generate inputs for them. Level 6 adds structural compositional analysis: dependency direction, circular dependency detection, interface compliance, contract presence at module boundaries, aliased cross-module call resolution, and architectural invariants. Interface compliance follows explicit `Protocol` inheritance and checks substitutability, including extra required parameters and incompatible return annotations. Together, they provide both deep per-function verification and system-level structural guarantees — but the structural checks at L6 verify contract *presence*, not logical *sufficiency* across call chains.
299
325
 
@@ -302,11 +328,8 @@ At Level 5, CrossHair and Z3 search for counterexamples across the codebase's sy
302
328
  ## Quick Start
303
329
 
304
330
  ```bash
305
- # Install from PyPI (add the [mcp] extra to enable the MCP server).
306
- # Note: the MCP server ships in the next release; until it's published
307
- # to PyPI, install from the source checkout instead:
308
- # git clone https://github.com/helgster77/serenecode && cd serenecode
309
- # uv sync --extra mcp # or: pip install -e '.[mcp]'
331
+ # Install from PyPI with the [mcp] extra so `serenecode mcp` is available,
332
+ # or from a source checkout: uv sync --extra mcp / pip install -e '.[mcp]'
310
333
  pip install 'serenecode[mcp]'
311
334
 
312
335
  # Initialize — interactive setup (spec mode + verification level)
@@ -329,7 +352,7 @@ serenecode check src/ --spec SPEC.md --structural
329
352
  serenecode check src/ --level 5 --allow-code-execution --spec SPEC.md
330
353
  ```
331
354
 
332
- JSON output (via `--format json`) includes top-level `passed`, `level_requested`, and `level_achieved` fields alongside the summary and per-function results.
355
+ JSON output (via `--format json`) includes top-level `passed`, `level_requested`, and `level_achieved`, a `summary` object with counts (`passed`, `failed`, `skipped`, `exempt`, `advisory_count`, `verdict`), a `timestamp`, and per-function `results`. **`verdict`** is `complete`, `failed`, or `incomplete` (for example, skips or not reaching the requested level). Human-readable output includes the same verdict in the header line.
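As a rough sketch of how a script might consume that JSON, the helper below reads only the fields named above. The exact nesting and value types are assumptions drawn from this description, and `summarize` is a hypothetical helper, not part of the SereneCode API.

```python
import json


def summarize(report_json: str) -> str:
    """Build a one-line summary from a `--format json` report.

    Assumes `verdict` and the counts live inside the `summary` object,
    as the description above suggests.
    """
    report = json.loads(report_json)
    summary = report.get("summary", {})
    counts = ", ".join(
        f"{summary.get(key, 0)} {label}"
        for key, label in [
            ("passed", "passed"),
            ("failed", "failed"),
            ("skipped", "skipped"),
            ("exempt", "exempt"),
            ("advisory_count", "advisory"),
        ]
    )
    return (
        f"{summary.get('verdict', 'unknown')}: "
        f"reached level {report.get('level_achieved')} "
        f"of {report.get('level_requested')}; {counts}"
    )
```

A caller could pipe the output of `serenecode check src/ --format json` into this function and log the resulting line.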
333
356
 
334
357
  When you verify a nested package or a single module, Serenecode preserves the package root and module-path context used by mypy, Hypothesis, CrossHair, and the architectural checks. That lets package-local absolute imports, relative imports, and scoped core-module rules behave the same way they do in project-wide runs.
335
358
 
@@ -337,23 +360,30 @@ When you verify a nested package or a single module, Serenecode preserves the pa
337
360
 
338
361
  ```bash
339
362
  serenecode init [<path>] # interactive setup
363
+ serenecode doctor # MCP optional install + registration hints; SPEC.md vs narrative files
340
364
  serenecode spec <SPEC.md> # validate spec readiness
341
365
  [--format human|json]
342
366
  serenecode check [<path>] [--level 1-6] [--allow-code-execution] # run verification
343
367
  [--spec SPEC.md] # spec traceability
368
+ [--project-root DIR] # repo root for imports + config
344
369
  [--format human|json] # output format
345
370
  [--structural] [--verify] # L1 only / L3-6 only
371
+ [--skip-module-health] # skip file/function/param/class size checks
372
+ [--fail-on-advisory] # exit 11 if advisories remain
346
373
  [--per-condition-timeout N] # L5 CrossHair budgets
347
374
  [--per-path-timeout N] [--module-timeout N] # (defaults: 30/10/300s)
348
- [--workers N] # L5 parallel workers
375
+ [--coverage-timeout N] # L3 pytest/coverage subprocess (default 600s)
376
+ [--workers N] # L5 parallel workers (cap 32)
349
377
  serenecode status [<path>] [--format human|json] # verification status
350
378
  serenecode report [<path>] [--format human|json|html] # generate reports
351
379
  [--output FILE] [--allow-code-execution] # write to file
352
380
  serenecode mcp [--allow-code-execution] # boot the MCP server
353
- [--project-root DIR] # over stdio
381
+ [--project-root DIR] # default root; stdio transport
354
382
  ```
355
383
 
356
- **Exit codes:** 0 = passed, 1 = structural, 2 = types, 3 = coverage, 4 = properties, 5 = symbolic, 6 = compositional, 10 = internal error or deep verification refused without explicit trust.
384
+ **Environment (optional):** `SERENECODE_MAX_WORKERS` overrides `--workers`; `SERENECODE_COVERAGE_TIMEOUT` overrides `--coverage-timeout`. **`SERENECODE_DEBUG=1`** logs subprocess environment **key names** (not values) when tools spawn mypy, pytest, CrossHair, etc. Details: [docs/SECURITY.md](docs/SECURITY.md).
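The precedence described above (environment variable over flag, with the worker cap of 32 from the CLI reference) might look like the sketch below. How SereneCode actually treats a malformed value is not documented here; ignoring non-numeric values and enforcing a floor of 1 are assumptions, and `resolve_workers` is a hypothetical name.

```python
import os
from typing import Mapping, Optional

MAX_WORKERS_CAP = 32  # "(cap 32)" in the CLI reference


def resolve_workers(cli_workers: int, env: Optional[Mapping[str, str]] = None) -> int:
    """Pick the worker count: SERENECODE_MAX_WORKERS wins over --workers."""
    env = os.environ if env is None else env
    value = cli_workers
    raw = env.get("SERENECODE_MAX_WORKERS")
    # Assumption: a non-numeric override is ignored rather than treated as an error.
    if raw is not None and raw.strip().isdigit():
        value = int(raw)
    # Assumption: the cap applies to both sources, and at least one worker runs.
    return max(1, min(value, MAX_WORKERS_CAP))
```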
385
+
386
+ **Exit codes:** 0 = passed (and no `--fail-on-advisory` violation), 1–6 = first failing verification level (structural … compositional), 10 = internal error or deep verification refused without `--allow-code-execution`, **11 = advisories remain with `--fail-on-advisory`** (dead code, module health warnings; checks otherwise passed).
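For CI scripts, the exit codes above can be turned into readable messages. The mapping below follows the list as documented (levels 1 to 6 named structural, types, coverage, properties, symbolic, compositional); `describe_exit_code` itself is a hypothetical helper, not something the package ships.

```python
# Names for exit codes 1-6: the first failing verification level.
LEVEL_NAMES = {
    1: "structural",
    2: "types",
    3: "coverage",
    4: "properties",
    5: "symbolic",
    6: "compositional",
}


def describe_exit_code(code: int) -> str:
    """Translate a `serenecode check` exit code into a short message."""
    if code == 0:
        return "passed"
    if code in LEVEL_NAMES:
        return f"failed at L{code} ({LEVEL_NAMES[code]})"
    if code == 10:
        return "internal error, or deep verification refused without --allow-code-execution"
    if code == 11:
        return "checks passed, but advisories remain (--fail-on-advisory)"
    return f"unrecognized exit code {code}"
```

A CI wrapper might call this on the return code of a `subprocess.run` invocation and decide whether code 11 should block a merge or only produce a warning.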
357
387
 
358
388
  ---
359
389
 
@@ -365,11 +395,11 @@ SereneCode is honest about what it can and can't do:
365
395
 
366
396
  **Contracts are only as good as you write them.** A function with weak postconditions will pass verification even if the implementation is subtly wrong. SereneCode checks that contracts exist and hold, but can't check that they fully capture your intent. Tautological contracts like `lambda self: True` are now flagged by the conventions and should not be used — they provide no verification value.
367
397
 
368
- **Exempt items are visible, not hidden.** Modules exempt from structural checking (adapters, CLI, ports, MCP server, `__init__.py`) and functions excluded from deep verification (non-primitive parameter types, adapter code) are reported as "exempt" in the output rather than being silently omitted. This makes the verification scope transparent: the tool reports passed, failed, skipped, and exempt counts separately so you can see exactly what was and wasn't deeply verified. Previous versions silently omitted these, inflating the apparent scope.
398
+ **Exempt items are visible, not hidden.** Modules exempt from structural checking (adapters, CLI, ports, MCP server, `__init__.py`) and functions excluded from deep verification (non-primitive parameter types, adapter code) are reported as "exempt" in the output rather than being silently omitted. Advisory items (dead code, module health warnings) are also visible and counted separately. This makes the verification scope transparent: the tool reports passed, failed, skipped, exempt, and advisory counts separately so you can see exactly what was and wasn't deeply verified.
369
399
 
370
400
  **Runtime checks can be disabled.** icontract decorators are checked on every call by default, but can be disabled via environment variables for performance in production. This is a feature, not a bug — but it means runtime guarantees depend on configuration.
371
401
 
372
- **Not everything can be deeply verified.** Functions with complex domain-type parameters (custom dataclasses, Callable, Protocol implementations) are automatically excluded from L4/L5 because the tools cannot generate valid inputs for them; they show up as "exempt" in the output. See "Choosing the Right Level" above for guidance on which verification depth fits your system.
402
+ **Not everything can be deeply verified at L4 or L5.** Hypothesis (L4) and CrossHair (L5) skip or mark **exempt** targets when inputs cannot be synthesized, when signatures are a poor fit for the backend, or when modules are treated as adapters or composition roots, not only because of “complex types.” Those rows appear as exempt, skipped, or failed preconditions in the output rather than as silent omissions. For L4 layout and strategy behavior, see [docs/VERIFICATION_LEVELS.md](docs/VERIFICATION_LEVELS.md). See "Choosing the Right Level" above for guidance on which verification depth fits your system.
373
403
 
374
404
  **Levels 3-6 execute your code.** Coverage analysis, property-based testing, and symbolic verification import project modules and run their top-level code as part of analysis. Module loading uses `compile()` + `exec()` on target source files and their transitive dependencies. There is no sandboxing or syscall filtering — a malicious `.py` file in the target directory gets full access to the host. Use `--allow-code-execution` or `allow_code_execution=True` only for code you trust. Subprocess-based backends (CrossHair, pytest/coverage) receive module paths and search paths from the source discovery layer; symlink-based directory traversal is blocked (`followlinks=False`), but the trust boundary ultimately relies on the `--allow-code-execution` gate.
375
405
 
@@ -391,7 +421,9 @@ CLI / Library API / MCP ← composition roots (interactive init, spec valida
391
421
 
392
422
  ├──▸ Pipeline ← orchestrates L1 → L2 → L3 → L4 → L5 → L6
393
423
  │ ├──▸ Structural Checker (ast)
394
- │ ├──▸ Spec Traceability (REQ-xxx → Implements/Verifies)
424
+ │ ├──▸ Spec Traceability (REQ-xxx / INT-xxx → Implements/Verifies)
425
+ │ ├──▸ Dead-Code Review (likely unused code → ask before removal)
426
+ │ ├──▸ Module Health (file/function/class size → advisory or error)
395
427
  │ ├──▸ Test Existence (test_<module>.py discovery)
396
428
  │ ├──▸ Type Checker (mypy)
397
429
  │ ├──▸ Coverage Analyzer (coverage.py)
@@ -406,6 +438,20 @@ CLI / Library API / MCP ← composition roots (interactive init, spec valida
406
438
 
407
439
  Core logic is pure. All I/O goes through Protocol-defined ports. The verification engine itself is verifiable.
408
440
 
441
+ ## Comparison: AWS Kiro and Spec Kit / Speckit
442
+
443
+ SereneCode overlaps with these tools on **spec-first, AI-assisted development**—all three push against unstructured “vibe coding.” The difference is **what each product optimizes for**.
444
+
445
+ **AWS Kiro** is an agentic AI IDE from AWS (Bedrock-powered), with spec-driven flows, autonomous agents, “powers,” and multi-repo work inside that environment. SereneCode is **not an IDE**: it is a **Python verification toolkit** (CLI + MCP) that plugs into editors you already use. Kiro optimizes **how you build in their stack**; SereneCode optimizes **repeatable assurance on Python code**—traceability from root `SPEC.md` IDs to implementation and tests, `icontract` contracts, mypy, coverage, Hypothesis, bounded CrossHair search, and compositional checks. The two are **complementary** if you use Kiro to author code but still want repo-local, machine-checkable verification on a Python tree.
446
+
447
+ **Spec Kit** (GitHub’s methodology) and ecosystem tools such as **Speckit** focus on **structured specification and phased delivery**—for example constitution, specify, plan, tasks, and implementation—with agents and templates driving the workflow. “Executable” there often means **process and artifact structure** that steers implementation. SereneCode adds a different meaning of executable: **contracts and checkers that run against your Python**, plus optional solvers and property tests, with explicit **REQ-xxx / INT-xxx** traceability to `Implements:` / `Verifies:` in code and tests. Spec Kit is typically **stack-agnostic**; SereneCode is **Python-specific** by design.
448
+
449
+ If you already use Spec Kit–style phases, SereneCode fits **after** the spec is stable enough to land as root `SPEC.md`—as the **verification layer** that answers whether the code and tests actually match what you declared.
450
+
451
+ ## Security and trust
452
+
453
+ Serenecode runs on **your machine** and, with `--allow-code-execution` (CLI or MCP), **imports and executes** project code—similar trust to running `pytest` or `python -m` on that tree. It is **not** a sandbox. Subprocesses receive a filtered environment to limit credential leakage; see [docs/SECURITY.md](docs/SECURITY.md) for the threat model, MCP behavior, `SERENECODE_DEBUG`, and CI exit code **11** with `--fail-on-advisory`.
454
+
409
455
  ## Disclaimer
410
456
 
411
457
  SereneCode is provided as-is, without warranty of any kind. It is a best-effort tool that helps surface defects through contracts, property-based testing, and bounded symbolic execution — but it cannot guarantee the absence of bugs. "No counterexample found" means the solver did not find one within its analysis bounds, not that none exists. Verification results depend on the quality of the contracts you write, the time budgets you configure, and the inherent limitations of the underlying tools.