serenecode 0.1.1__tar.gz → 0.2.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (128)
  1. serenecode-0.2.0/.env +1 -0
  2. {serenecode-0.1.1 → serenecode-0.2.0}/PKG-INFO +58 -51
  3. {serenecode-0.1.1 → serenecode-0.2.0}/README.md +57 -42
  4. {serenecode-0.1.1 → serenecode-0.2.0}/SERENECODE.md +4 -1
  5. serenecode-0.2.0/examples/dosage-serenecode/CLAUDE.md +51 -0
  6. serenecode-0.2.0/examples/dosage-serenecode/SERENECODE.md +255 -0
  7. serenecode-0.2.0/examples/dosage-serenecode/SPEC.md +160 -0
  8. serenecode-0.2.0/examples/dosage-serenecode/pyproject.toml +11 -0
  9. serenecode-0.2.0/examples/dosage-serenecode/src/dosage/__init__.py +4 -0
  10. serenecode-0.2.0/examples/dosage-serenecode/src/dosage/core/__init__.py +4 -0
  11. serenecode-0.2.0/examples/dosage-serenecode/src/dosage/core/dosage.py +81 -0
  12. serenecode-0.2.0/examples/dosage-serenecode/src/dosage/core/models.py +120 -0
  13. serenecode-0.2.0/examples/dosage-serenecode/src/dosage/core/safety.py +85 -0
  14. serenecode-0.2.0/examples/dosage-serenecode/tests/unit/test_dosage.py +264 -0
  15. serenecode-0.2.0/examples/dosage-serenecode/tests/unit/test_models.py +399 -0
  16. serenecode-0.2.0/examples/dosage-serenecode/tests/unit/test_safety.py +270 -0
  17. serenecode-0.2.0/examples/dosage-serenecode/uv.lock +57 -0
  18. {serenecode-0.1.1 → serenecode-0.2.0}/pyproject.toml +1 -10
  19. {serenecode-0.1.1 → serenecode-0.2.0}/src/serenecode/__init__.py +28 -12
  20. serenecode-0.2.0/src/serenecode/adapters/__init__.py +63 -0
  21. {serenecode-0.1.1 → serenecode-0.2.0}/src/serenecode/adapters/coverage_adapter.py +33 -21
  22. {serenecode-0.1.1 → serenecode-0.2.0}/src/serenecode/adapters/crosshair_adapter.py +22 -14
  23. {serenecode-0.1.1 → serenecode-0.2.0}/src/serenecode/adapters/hypothesis_adapter.py +7 -8
  24. {serenecode-0.1.1 → serenecode-0.2.0}/src/serenecode/adapters/mypy_adapter.py +8 -5
  25. {serenecode-0.1.1 → serenecode-0.2.0}/src/serenecode/checker/compositional.py +7 -4
  26. {serenecode-0.1.1 → serenecode-0.2.0}/src/serenecode/checker/properties.py +2 -2
  27. serenecode-0.2.0/src/serenecode/checker/spec_traceability.py +488 -0
  28. {serenecode-0.1.1 → serenecode-0.2.0}/src/serenecode/checker/structural.py +155 -0
  29. {serenecode-0.1.1 → serenecode-0.2.0}/src/serenecode/checker/symbolic.py +2 -2
  30. {serenecode-0.1.1 → serenecode-0.2.0}/src/serenecode/cli.py +171 -13
  31. {serenecode-0.1.1 → serenecode-0.2.0}/src/serenecode/config.py +61 -23
  32. {serenecode-0.1.1 → serenecode-0.2.0}/src/serenecode/contracts/predicates.py +26 -4
  33. {serenecode-0.1.1 → serenecode-0.2.0}/src/serenecode/core/pipeline.py +200 -58
  34. {serenecode-0.1.1 → serenecode-0.2.0}/src/serenecode/init.py +108 -14
  35. {serenecode-0.1.1 → serenecode-0.2.0}/src/serenecode/models.py +56 -12
  36. {serenecode-0.1.1 → serenecode-0.2.0}/src/serenecode/ports/coverage_analyzer.py +2 -2
  37. {serenecode-0.1.1 → serenecode-0.2.0}/src/serenecode/reporter.py +31 -17
  38. {serenecode-0.1.1 → serenecode-0.2.0}/src/serenecode/source_discovery.py +76 -7
  39. {serenecode-0.1.1 → serenecode-0.2.0}/src/serenecode/templates/content.py +225 -2
  40. {serenecode-0.1.1 → serenecode-0.2.0}/tests/conftest.py +10 -2
  41. {serenecode-0.1.1 → serenecode-0.2.0}/tests/e2e/test_check_command.py +73 -4
  42. serenecode-0.2.0/tests/e2e/test_cli.py +7 -0
  43. serenecode-0.2.0/tests/e2e/test_init.py +7 -0
  44. {serenecode-0.1.1 → serenecode-0.2.0}/tests/fixtures/valid/class_with_invariant.py +0 -1
  45. {serenecode-0.1.1 → serenecode-0.2.0}/tests/integration/test_adapter_internals.py +66 -0
  46. serenecode-0.2.0/tests/integration/test_local_fs.py +7 -0
  47. serenecode-0.2.0/tests/integration/test_module_loader.py +233 -0
  48. serenecode-0.2.0/tests/unit/checker/__init__.py +0 -0
  49. {serenecode-0.1.1 → serenecode-0.2.0}/tests/unit/checker/test_coverage.py +4 -4
  50. serenecode-0.2.0/tests/unit/checker/test_spec_traceability.py +296 -0
  51. {serenecode-0.1.1 → serenecode-0.2.0}/tests/unit/checker/test_structural.py +139 -5
  52. {serenecode-0.1.1 → serenecode-0.2.0}/tests/unit/checker/test_symbolic.py +3 -1
  53. serenecode-0.2.0/tests/unit/contracts/__init__.py +0 -0
  54. {serenecode-0.1.1 → serenecode-0.2.0}/tests/unit/contracts/test_predicates.py +8 -1
  55. {serenecode-0.1.1 → serenecode-0.2.0}/tests/unit/contracts/test_predicates_hypothesis.py +12 -2
  56. {serenecode-0.1.1 → serenecode-0.2.0}/tests/unit/test_models.py +70 -1
  57. {serenecode-0.1.1 → serenecode-0.2.0}/tests/unit/test_pipeline.py +113 -0
  58. {serenecode-0.1.1 → serenecode-0.2.0}/tests/unit/test_reporter.py +8 -2
  59. {serenecode-0.1.1 → serenecode-0.2.0}/tests/unit/test_source_discovery.py +64 -1
  60. {serenecode-0.1.1 → serenecode-0.2.0}/uv.lock +2 -18
  61. serenecode-0.1.1/examples/dosage-serenecode/CLAUDE.md +0 -31
  62. serenecode-0.1.1/examples/dosage-serenecode/SERENECODE.md +0 -52
  63. serenecode-0.1.1/examples/dosage-serenecode/pyproject.toml +0 -22
  64. serenecode-0.1.1/examples/dosage-serenecode/src/__init__.py +0 -1
  65. serenecode-0.1.1/examples/dosage-serenecode/src/core/__init__.py +0 -1
  66. serenecode-0.1.1/examples/dosage-serenecode/src/core/dosage.py +0 -261
  67. serenecode-0.1.1/examples/dosage-serenecode/src/core/models.py +0 -471
  68. serenecode-0.1.1/examples/dosage-serenecode/tests/__init__.py +0 -1
  69. serenecode-0.1.1/examples/dosage-serenecode/tests/conftest.py +0 -84
  70. serenecode-0.1.1/examples/dosage-serenecode/tests/test_dosage.py +0 -336
  71. serenecode-0.1.1/examples/dosage-serenecode/tests/test_models.py +0 -313
  72. serenecode-0.1.1/examples/dosage-serenecode/uv.lock +0 -701
  73. serenecode-0.1.1/src/serenecode/adapters/__init__.py +0 -6
  74. {serenecode-0.1.1 → serenecode-0.2.0}/.gitignore +0 -0
  75. {serenecode-0.1.1 → serenecode-0.2.0}/CLAUDE.md +0 -0
  76. {serenecode-0.1.1 → serenecode-0.2.0}/LICENSE +0 -0
  77. {serenecode-0.1.1 → serenecode-0.2.0}/examples/DOSAGE_CALC_SPEC.md +0 -0
  78. {serenecode-0.1.1 → serenecode-0.2.0}/examples/dosage-regular/dosage_calc.py +0 -0
  79. {serenecode-0.1.1 → serenecode-0.2.0}/examples/dosage-regular/test_dosage_calc.py +0 -0
  80. {serenecode-0.1.1 → serenecode-0.2.0/examples/dosage-serenecode}/tests/__init__.py +0 -0
  81. {serenecode-0.1.1/tests/e2e → serenecode-0.2.0/examples/dosage-serenecode/tests/unit}/__init__.py +0 -0
  82. {serenecode-0.1.1 → serenecode-0.2.0}/serenecode.jpg +0 -0
  83. {serenecode-0.1.1 → serenecode-0.2.0}/src/serenecode/adapters/local_fs.py +0 -0
  84. {serenecode-0.1.1 → serenecode-0.2.0}/src/serenecode/adapters/module_loader.py +0 -0
  85. {serenecode-0.1.1 → serenecode-0.2.0}/src/serenecode/checker/__init__.py +0 -0
  86. {serenecode-0.1.1 → serenecode-0.2.0}/src/serenecode/checker/coverage.py +0 -0
  87. {serenecode-0.1.1 → serenecode-0.2.0}/src/serenecode/checker/types.py +0 -0
  88. {serenecode-0.1.1 → serenecode-0.2.0}/src/serenecode/contracts/__init__.py +0 -0
  89. {serenecode-0.1.1 → serenecode-0.2.0}/src/serenecode/core/__init__.py +0 -0
  90. {serenecode-0.1.1 → serenecode-0.2.0}/src/serenecode/core/exceptions.py +0 -0
  91. {serenecode-0.1.1 → serenecode-0.2.0}/src/serenecode/ports/__init__.py +0 -0
  92. {serenecode-0.1.1 → serenecode-0.2.0}/src/serenecode/ports/file_system.py +0 -0
  93. {serenecode-0.1.1 → serenecode-0.2.0}/src/serenecode/ports/property_tester.py +0 -0
  94. {serenecode-0.1.1 → serenecode-0.2.0}/src/serenecode/ports/symbolic_checker.py +0 -0
  95. {serenecode-0.1.1 → serenecode-0.2.0}/src/serenecode/ports/type_checker.py +0 -0
  96. {serenecode-0.1.1 → serenecode-0.2.0}/src/serenecode/templates/__init__.py +0 -0
  97. {serenecode-0.1.1/tests/integration → serenecode-0.2.0/tests}/__init__.py +0 -0
  98. {serenecode-0.1.1/tests/unit → serenecode-0.2.0/tests/e2e}/__init__.py +0 -0
  99. {serenecode-0.1.1 → serenecode-0.2.0}/tests/e2e/test_init_command.py +0 -0
  100. {serenecode-0.1.1 → serenecode-0.2.0}/tests/e2e/test_report_command.py +0 -0
  101. {serenecode-0.1.1 → serenecode-0.2.0}/tests/e2e/test_status_command.py +0 -0
  102. {serenecode-0.1.1 → serenecode-0.2.0}/tests/fixtures/edge_cases/aliased_import.py +0 -0
  103. {serenecode-0.1.1 → serenecode-0.2.0}/tests/fixtures/edge_cases/async_functions.py +0 -0
  104. {serenecode-0.1.1 → serenecode-0.2.0}/tests/fixtures/edge_cases/empty_module.py +0 -0
  105. {serenecode-0.1.1 → serenecode-0.2.0}/tests/fixtures/edge_cases/from_import.py +0 -0
  106. {serenecode-0.1.1 → serenecode-0.2.0}/tests/fixtures/invalid/broken_postcondition.py +0 -0
  107. {serenecode-0.1.1 → serenecode-0.2.0}/tests/fixtures/invalid/io_in_core.py +0 -0
  108. {serenecode-0.1.1 → serenecode-0.2.0}/tests/fixtures/invalid/missing_contracts.py +0 -0
  109. {serenecode-0.1.1 → serenecode-0.2.0}/tests/fixtures/invalid/missing_invariant.py +0 -0
  110. {serenecode-0.1.1 → serenecode-0.2.0}/tests/fixtures/invalid/missing_types.py +0 -0
  111. {serenecode-0.1.1 → serenecode-0.2.0}/tests/fixtures/valid/full_module.py +0 -0
  112. {serenecode-0.1.1 → serenecode-0.2.0}/tests/fixtures/valid/simple_function.py +0 -0
  113. {serenecode-0.1.1/tests/unit/checker → serenecode-0.2.0/tests/integration}/__init__.py +0 -0
  114. {serenecode-0.1.1 → serenecode-0.2.0}/tests/integration/test_checkers_real_code.py +0 -0
  115. {serenecode-0.1.1 → serenecode-0.2.0}/tests/integration/test_coverage_adapter.py +0 -0
  116. {serenecode-0.1.1 → serenecode-0.2.0}/tests/integration/test_crosshair_adapter.py +0 -0
  117. {serenecode-0.1.1 → serenecode-0.2.0}/tests/integration/test_example_projects.py +0 -0
  118. {serenecode-0.1.1 → serenecode-0.2.0}/tests/integration/test_file_adapter.py +0 -0
  119. {serenecode-0.1.1 → serenecode-0.2.0}/tests/integration/test_hypothesis_adapter.py +0 -0
  120. {serenecode-0.1.1 → serenecode-0.2.0}/tests/integration/test_mypy_adapter.py +0 -0
  121. {serenecode-0.1.1/tests/unit/contracts → serenecode-0.2.0/tests/unit}/__init__.py +0 -0
  122. {serenecode-0.1.1 → serenecode-0.2.0}/tests/unit/checker/test_compositional.py +0 -0
  123. {serenecode-0.1.1 → serenecode-0.2.0}/tests/unit/checker/test_properties.py +0 -0
  124. {serenecode-0.1.1 → serenecode-0.2.0}/tests/unit/checker/test_structural_hypothesis.py +0 -0
  125. {serenecode-0.1.1 → serenecode-0.2.0}/tests/unit/checker/test_types.py +0 -0
  126. {serenecode-0.1.1 → serenecode-0.2.0}/tests/unit/test_api.py +0 -0
  127. {serenecode-0.1.1 → serenecode-0.2.0}/tests/unit/test_config.py +0 -0
  128. {serenecode-0.1.1 → serenecode-0.2.0}/tests/unit/test_models_hypothesis.py +0 -0
serenecode-0.2.0/.env ADDED
@@ -0,0 +1 @@
+ PYPI_TOKEN=pypi-AgEIcHlwaS5vcmcCJDcxN2IyN2ViLWM5YmUtNGI4OS1hNWVlLTkwNTk2ODBjOWE5OAACKlszLCJhOTJmN2Q5MS05MjExLTQxMjYtYTFkOC0wNzM0YWE5OWFmZTAiXQAABiAOdR3FRlN1mzfkEM-TmJ0bO3h7NjwlYmtjgePNyog0Wg
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: serenecode
- Version: 0.1.1
+ Version: 0.2.0
  Summary: Verification framework for AI-generated Python — test coverage, property testing, and symbolic execution
  Project-URL: Homepage, https://github.com/helgster77/serenecode
  Project-URL: Repository, https://github.com/helgster77/serenecode
@@ -27,16 +27,8 @@ Requires-Dist: hypothesis>=6.0
  Requires-Dist: icontract>=2.7.0
  Requires-Dist: mypy>=1.0
  Provides-Extra: dev
- Requires-Dist: crosshair-tool>=0.0.60; extra == 'dev'
- Requires-Dist: hypothesis>=6.0; extra == 'dev'
- Requires-Dist: mypy>=1.0; extra == 'dev'
  Requires-Dist: pytest-cov>=4.0; extra == 'dev'
  Requires-Dist: pytest>=7.0; extra == 'dev'
- Provides-Extra: verify
- Requires-Dist: coverage>=7.0; extra == 'verify'
- Requires-Dist: crosshair-tool>=0.0.60; extra == 'verify'
- Requires-Dist: hypothesis>=6.0; extra == 'verify'
- Requires-Dist: mypy>=1.0; extra == 'verify'
  Description-Content-Type: text/markdown

  <p align="center">
@@ -45,9 +37,9 @@ Description-Content-Type: text/markdown

  <h3 align="center">A Framework for AI-Driven Development of Verifiable Systems</h3>

- SereneCode is a verification framework for AI-generated Python. It tells the AI *how* to write verifiable code, checks that the AI followed instructions, and then verifies the code at multiple levels — from test coverage analysis that catches gaps in AI-written tests, through property-based testing that checks contracts against hundreds of random inputs, to symbolic execution that uses an SMT solver to search for *any* input that breaks a contract. You choose the verification depth that matches your project: lightweight for internal tools, balanced for production systems, strict for safety-critical code. AI agents write code fast but can be suboptimal at testing their own work; SereneCode closes that gap by surfacing untested paths, generating test suggestions, and verifying behavior beyond what hand-written tests cover.
+ SereneCode is a spec-to-verified-implementation framework for AI-generated Python. It ensures that every requirement in your spec is implemented, tested, and formally verified — closing the gap between what you asked for and what the AI built. The workflow starts from a spec with traceable requirements (REQ-xxx), enforces that the AI writes verifiable code with contracts and tests, then verifies at multiple levels, from structural checks and test coverage through property-based testing to symbolic execution with an SMT solver. You choose the verification depth during interactive setup: lightweight for internal tools, balanced for production systems, strict for safety-critical code. AI agents write code fast but can miss requirements and skip edge cases; SereneCode closes that gap with spec traceability, test-existence enforcement, and formal verification.

- > **This framework was bootstrapped with AI under its own rules.** SereneCode's SERENECODE.md was written before the first line of code, and the codebase has been developed under those conventions from the start. The current tree passes its own `serenecode check src --level 6 --allow-code-execution`, an internal strict-config Level 6 self-check in the test suite, `mypy src examples/dosage-serenecode/src`, the shipped example's strict Level 6 check, and the full `pytest` suite. The verification output is transparent about scope: exempt modules (adapters, CLI, ports) and functions excluded from deep verification (non-primitive parameter types) are reported as "exempt" rather than silently omitted.
+ > **This framework was bootstrapped with AI under its own rules.** SereneCode's SERENECODE.md was written before the first line of code, and the codebase has been developed under those conventions from the start. The current tree passes its own `serenecode check src --level 6 --allow-code-execution`, an internal strict-config Level 6 self-check in the test suite, `mypy src examples/dosage-serenecode/src`, the shipped example's check, and the full `pytest` suite (769 passing tests, 16 skipped). The verification output is transparent about scope: exempt modules (adapters, CLI, ports) and functions excluded from deep verification (non-primitive parameter types) are reported as "exempt" rather than silently omitted.

  ---

@@ -61,17 +53,17 @@ SereneCode is designed for **building new verifiable systems from scratch with A

  ### Choosing the Right Level

- The cost of verification should be proportional to the cost of a bug. Each level generates a different SERENECODE.md with different requirements for the AI, so the choice shapes how code is *written*, not just how it's checked.
+ The cost of verification should be proportional to the cost of a bug. Each level generates a different SERENECODE.md with different requirements for the AI, so the choice shapes how code is *written*, not just how it's checked. You make this choice during `serenecode init` — it cannot be changed after implementation starts.

- | | `--minimal` | **Default** | `--strict` |
+ | | **Minimal** (Level 2) | **Default** (Level 4) | **Strict** (Level 6) |
  |---|---|---|---|
  | **Verifies through** | L2 (structure + types) | L4 (+ test coverage + properties) | L6 (+ symbolic + compositional) |
  | **What the AI must write** | Contracts on public functions, type annotations | + description strings, class invariants, hexagonal architecture | + contracts on *all* functions, loop invariants, domain exceptions, no exemptions |
  | **What catches bugs** | Runtime contract checks, mypy | + L3 surfaces untested code paths and generates test suggestions; L4 tests contracts against hundreds of random inputs | + SMT solver searches for *any* counterexample within analysis bounds |
- | **Good for** | Internal tools, scripts, prototypes, incremental adoption | Production APIs, business logic, data pipelines | Medical, financial, infrastructure, regulated systems |
+ | **Good for** | Internal tools, scripts, prototypes | Production APIs, business logic, data pipelines | Medical, financial, infrastructure, regulated systems |
  | **The tradeoff** | Low ceremony, but contracts are only checked at the boundaries you wrote them | Moderate overhead; architecture rules keep core logic pure and testable | Significant overhead — every loop gets an invariant comment, every helper gets a contract. Justified when the cost of an undiscovered bug is measured in patient harm, financial loss, or regulatory failure |

- Pick the level that matches the stakes, and pick it early. Moving up later means retrofitting contracts, invariants, and architecture onto existing code — it's not just flipping a flag. Safety-critical code should be written for `--strict` from the first line.
+ Pick the level that matches the stakes. Safety-critical code should start at Strict.

  ---

@@ -87,44 +79,49 @@ Both versions implement the same requirements, and the plain version passes its
  |---|---|---|
  | **Dose never exceeds maximum** | Covered by unit tests | Encoded as a postcondition; bounded symbolic search found no counterexample within analysis bounds |
  | **Renal adjustment never increases a dose** | Covered by unit tests | `result <= dose_mg` is an executable contract, not just a test expectation |
- | **Safety result is internally consistent** | No validation — you can construct `SafetyResult(total=9999, max=100, is_safe=True)` | Representation invariants make inconsistent `SafetyResult` states unconstructable |
- | **Objects are truly immutable** | `frozen=True` with mutable `set` on Drug | `_Frozen` mixin + immutable `tuple`/`frozenset` fields fully locked down |
+ | **Safety result is internally consistent** | No validation — you can construct `SafetyResult(total=9999, max=100, is_safe=True)` | Postcondition on `check_daily_safety` enforces `is_safe == (total <= max)` — inconsistent results cannot be produced through the contracted API |
+ | **Objects are truly immutable** | `frozen=True` with mutable `set` on Drug | `frozen=True` with class invariants enforcing valid state — mutations raise `FrozenInstanceError` and invariants guarantee internal consistency |
  | **Boundary behavior (CrCl exactly 30.0)** | Covered by explicit tests | Boundary behavior is specified in contracts; bounded symbolic search found no counterexample |
  | **What if someone changes the code later?** | You rely on the tests you remembered to keep | Contracts stay attached to the code and keep checking every contracted call |
- | **Can a solver verify it?** | No executable specification for a solver to target | 120 executable contracts and a clean `serenecode check ... --level 6 --allow-code-execution` run |
+ | **Can a solver verify it?** | No executable specification for a solver to target | 42 executable contracts and a clean `serenecode check ... --level 6 --allow-code-execution` run |
  | **Confidence in a safety-critical setting** | Better than ad hoc code, but still test-shaped confidence | Higher: behavior is formally specified, runtime-checked, and solver-checked within analysis bounds — but bounded search is not proof |

- The plain version relies on 59 tests that check specific scenarios. The SereneCode version adds 120 executable contracts across its domain models and core dosage logic. Those contracts define *what correct means* in code, get checked at runtime, and give CrossHair/Z3 something precise to search against when looking for counterexamples within analysis bounds.
+ The plain version relies on 59 tests that check specific scenarios. The SereneCode version adds 42 executable contracts across its domain models and core dosage logic. Those contracts define *what correct means* in code, get checked at runtime, and give CrossHair/Z3 something precise to search against when looking for counterexamples within analysis bounds.

  > Both examples live in [`examples/dosage-regular/`](examples/dosage-regular/) and [`examples/dosage-serenecode/`](examples/dosage-serenecode/). Read them side by side.

- The SereneCode dosage example currently passes `serenecode check examples/dosage-serenecode/src --level 6 --allow-code-execution`. Its local `pytest` suite is also green with 74 passing tests.
+ The SereneCode dosage example currently passes `serenecode check src/ --level 6 --allow-code-execution` from within the example directory. Its local `pytest` suite is also green with 67 passing tests.

  ---

  ## How It Works

- ### 1. SERENECODE.md — Your AI Writes Code That's Built for Verification
+ ### 1. Interactive Setup — `serenecode init`

- A markdown file in your project root that tells AI coding agents exactly how to write code: what contracts to include, what architecture to follow, what patterns to use. When Claude Code (or another agent) reads this before generating code, it has a concrete target for producing verification-friendly output from the first keystroke.
+ Run `serenecode init` and answer two questions:
+
+ **Spec question:** Do you already have a spec, or will you write one with your coding assistant? Both options set up spec traceability with REQ-xxx requirement identifiers — the difference is the workflow your assistant follows.
+
+ **Verification level:** Minimal (L2), Default (L4), or Strict (L6). This determines what conventions your SERENECODE.md will require and cannot be changed after implementation starts.

  ```bash
- serenecode init # balanced defaults — contracts on public APIs, test coverage, hexagonal architecture
- serenecode init --strict # maximum rigor — contracts on ALL functions (public and private), no exemptions
- serenecode init --minimal # lightweight — public-function contracts only, relaxed architecture rules
+ serenecode init
  ```

- This creates a SERENECODE.md tailored to your project and integrates with CLAUDE.md so Claude Code follows the conventions automatically. You write the rules once, and the agent has a stable spec to follow on every iteration.
+ This creates SERENECODE.md (project conventions including spec traceability) and CLAUDE.md (instructions for your AI coding assistant) tailored to your answers. The conventions become the contract between you, your coding assistant, and the verification tool. SERENECODE.md includes instructions for converting raw specs into SereneCode format (REQ-xxx identifiers), validating them with `serenecode spec SPEC.md`, creating an implementation plan, and building from it — the coding agent handles this workflow automatically.

- ### 2. The Checker — Instant Feedback
+ ### 2. The Checker — Structural Enforcement

- A lightweight AST-based linter that validates code follows SERENECODE.md conventions in seconds. Missing a postcondition? No class invariant? I/O imports in a core module? Caught before you waste time on heavy verification.
+ A lightweight AST-based checker that validates code follows SERENECODE.md conventions in seconds. Missing a postcondition? No class invariant? No test file for a module? Caught before you waste time on heavy verification.

  ```bash
- serenecode check src/ --structural # seconds
+ serenecode check src/ --structural # structural conventions
+ serenecode check src/ --spec SPEC.md # + spec traceability
  ```

- ### 3. The Verifier — Symbolic Verification
+ The `--spec` flag verifies that every REQ in the spec has an `Implements: REQ-xxx` tag in the code and a `Verifies: REQ-xxx` tag in the tests. No requirement goes unimplemented or untested.
+
+ ### 3. The Verifier — Deep Verification

  A six-level verification pipeline that escalates from fast checks to full symbolic verification:

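The property-testing level in this pipeline is built on Hypothesis. A minimal self-contained sketch of the idea, using a hypothetical stand-in for a contracted core function:

```python
from hypothesis import given, strategies as st


def renal_adjust(dose_mg: float, crcl_ml_min: float) -> float:
    """Toy implementation standing in for a contracted core function."""
    return dose_mg * 0.5 if crcl_ml_min < 30.0 else dose_mg


# Hypothesis generates hundreds of random inputs and shrinks any failure
@given(
    st.floats(min_value=0.1, max_value=10_000),
    st.floats(min_value=0.0, max_value=300.0),
)
def test_adjustment_never_increases(dose_mg: float, crcl_ml_min: float) -> None:
    assert renal_adjust(dose_mg, crcl_ml_min) <= dose_mg


test_adjustment_never_increases()  # calling the decorated function runs the whole property
```

This is the same shape of check a contract `result <= dose_mg` expresses; the property pass exercises it across generated inputs rather than hand-picked scenarios.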
@@ -141,7 +138,7 @@
  serenecode check src/ --level 6 --allow-code-execution # verify it
  ```

- **L3 Test Coverage** is where SereneCode checks that the AI's tests actually exercise the code it wrote. AI agents can be suboptimal at writing tests — they tend to cover the happy path, skip edge cases, and miss error branches. L3 runs your existing tests under coverage.py tracing, measures per-function line and branch coverage, and reports exactly which lines and branches are untested. For each coverage gap, it generates concrete test suggestions including mock necessity assessments: each dependency is classified as REQUIRED (external I/O — must mock) or OPTIONAL (internal code — consider using the real implementation). This gives the AI agent actionable feedback to improve its own tests rather than leaving coverage gaps undetected. When no tests exist for a module, L3 reports this as informational rather than failing, so the coverage level serves as a baseline measurement before L4 property testing generates new test inputs.
+ **L3 Test Coverage** is where SereneCode checks that the AI's tests actually exercise the code it wrote. AI agents can be suboptimal at writing tests — they tend to cover the happy path, skip edge cases, and miss error branches. L3 runs your existing tests under coverage.py tracing, measures per-function line and branch coverage, and reports exactly which lines and branches are untested. For each coverage gap, it generates concrete test suggestions including mock necessity assessments: each dependency is classified as REQUIRED (external I/O — must mock) or OPTIONAL (internal code — consider using the real implementation). This gives the AI agent actionable feedback to improve its own tests rather than leaving coverage gaps undetected. When no tests exist for a module, L3 reports this as a failure — missing tests must be written. At L1, the structural checker also verifies that every non-exempt source module has a corresponding `test_<module>.py` file.

  The full pipeline is thorough but not instant. Larger systems will take longer, and the deepest runs may surface skipped items when Hypothesis cannot synthesize valid values for complex domain types or when CrossHair hits its time budget. By default, L5 focuses on contracted top-level functions defined in each module and skips modules or signatures that are currently poor fits for direct symbolic execution, such as adapter/composition-root code, helper predicate modules, and object-heavy APIs. Not everything needs L5/L6. Critical paths get full symbolic and compositional verification. Utility functions get property testing. A Level 4 run only counts as achieved when at least one contracted property target was actually exercised.

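The per-line measurement L3 describes rides on coverage.py's public API. A minimal sketch of detecting an untested branch; the toy module is invented for illustration and is not SereneCode internals:

```python
import importlib.util
import pathlib
import tempfile

import coverage

# A toy module with one branch that the "tests" below never take.
SRC = """def clamp(x, hi):
    if x > hi:
        return hi
    return x
"""

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "toy.py"
    path.write_text(SRC)

    cov = coverage.Coverage(branch=True)
    cov.start()
    spec = importlib.util.spec_from_file_location("toy", path)
    toy = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(toy)   # import under tracing
    toy.clamp(1, 10)               # exercises only the x <= hi path
    cov.stop()

    # analysis2 reports which statement lines never ran
    _, _, _, missing, _ = cov.analysis2(str(path))

print("untested lines:", missing)   # the `return hi` line is reported as missing
```

Mapping missing lines and branch arcs back to individual functions, plus the mock-necessity classification, is the part layered on top by the tool.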
@@ -153,19 +150,21 @@ Scoped targets keep their package/import context across verification levels. In

  ## The AI Agent Loop

- SereneCode is designed for AI agents that write code and fix their own mistakes:
+ SereneCode is designed for spec-driven development with AI agents:

  ```
- AI reads SERENECODE.md → knows how to write verification-ready code
- AI generates code with contracts → postconditions, input preconditions, invariants
- serenecode check --structural → instant: did the AI follow the rules?
- serenecode check --level 5 --allow-code-execution → deep: can the solver find any counterexample?
- AI reads counterexamples → "input x=[-1] violates postcondition"
- AI fixes the code → adjusts implementation or contract
- Repeat until verified → no counterexample found, not just tested
+ serenecode init → interactive setup: spec mode + verification level
+ serenecode spec SPEC.md → validate spec is ready (REQ-xxx format, no gaps)
+ AI reads SERENECODE.md + SPEC.md → knows the conventions and what to build
+ AI implements from spec → Implements: REQ-xxx in docstrings, contracts, tests
+ serenecode check src/ --spec SPEC.md --structural → did the AI follow rules? all REQs covered?
+ serenecode check src/ --level 5 --allow-code-execution --spec SPEC.md → deep verification
+ AI reads findings → missing REQs, counterexamples, untested paths
+ AI fixes the code → adjusts implementation, adds tests, closes gaps
+ Repeat until verified → all REQs implemented + tested + no counterexamples

  ```

- AI-generated code won't always pass verification on the first try — and that's the point. SereneCode gives the coding agent structured feedback on exactly what failed and why: counterexamples, violated contracts, and suggested fixes. The agent uses that feedback to iterate until the code passes. The value isn't in one-shotting perfection — it's in the loop that converges on verified correctness.
+ AI-generated code won't always pass verification on the first try — and that's the point. SereneCode gives the coding agent structured feedback on exactly what failed and why: missing requirement implementations, counterexamples, violated contracts, untested modules, and suggested fixes. When there are many findings, SereneCode suggests the agent spawn subagents to address groups of related issues in parallel. **The value isn't in one-shotting perfection — it's in the loop that converges on verified completeness and correctness.**

  Works in Claude Code, works in the terminal, works in CI:

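In CI, the documented top-level JSON fields (`passed`, `level_requested`, `level_achieved`) are enough to build a gate. A sketch; the `verification_gate` helper is hypothetical, and only the field names come from the README:

```python
import json


def verification_gate(report_json: str, required_level: int) -> bool:
    """Fail the build unless the run passed and reached the required level.

    `passed` and `level_achieved` are the documented top-level fields of
    `serenecode check --format json` output; this helper itself is illustrative.
    """
    report = json.loads(report_json)
    return bool(report["passed"]) and report["level_achieved"] >= required_level


# Stub standing in for `serenecode check src/ --level 5 --format json` output
stub = json.dumps({"passed": True, "level_requested": 5, "level_achieved": 5})
print(verification_gate(stub, required_level=5))
```

Gating on `level_achieved` rather than `level_requested` matters: a run can be requested at Level 5 yet only achieve a lower level when deep targets are skipped.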
@@ -190,7 +189,7 @@ SereneCode isn't just a tool that *tells* you to write verified code. It *is* ve

  The SERENECODE.md convention file was the first artifact created — before any Python was written. The framework has been developed under those conventions with AI as a first-class contributor, and the repository continuously checks itself with:

- - `pytest` across the full suite (currently 651 passing tests, 16 skipped)
+ - `pytest` across the full suite (currently 769 passing tests, 16 skipped)
  - `mypy --strict` across `src/` and `examples/dosage-serenecode/src/`
  - SereneCode's own structural, type, property, symbolic, and compositional passes

@@ -206,26 +205,32 @@ At Level 5, CrossHair and Z3 search for counterexamples across the codebase's sy
  # Install from PyPI
  pip install serenecode

- # Initialize a project with conventions
+ # Initialize interactive setup (spec mode + verification level)
  serenecode init

- # Let your AI agent write code following SERENECODE.md...
- # Then verify:
- serenecode check src/ --structural
+ # Place your spec in the project directory, then start a coding session.
+ # Your agent reads SERENECODE.md, converts the spec to REQ-xxx format,
+ # validates it, creates an implementation plan, and builds from it.
+
+ # Verify structure + spec traceability:
+ serenecode check src/ --spec SPEC.md --structural

- # Or go deep:
- serenecode check src/core/ --level 5 --allow-code-execution --format json
+ # Go deep — test coverage, property testing, symbolic verification:
+ serenecode check src/ --level 5 --allow-code-execution --spec SPEC.md
  ```

- JSON output includes top-level `passed`, `level_requested`, and `level_achieved` fields alongside the summary and per-function results.
+ JSON output (via `--format json`) includes top-level `passed`, `level_requested`, and `level_achieved` fields alongside the summary and per-function results.

- When you verify a nested package or a single module, Serenecode now preserves the package root and module-path context used by mypy, Hypothesis, CrossHair, and the architectural checks. That lets package-local absolute imports, relative imports, and scoped core-module rules behave the same way they do in project-wide runs.
+ When you verify a nested package or a single module, Serenecode preserves the package root and module-path context used by mypy, Hypothesis, CrossHair, and the architectural checks. That lets package-local absolute imports, relative imports, and scoped core-module rules behave the same way they do in project-wide runs.

  ## CLI Reference

  ```bash
- serenecode init [<path>] [--strict | --minimal] # set up conventions
+ serenecode init [<path>] # interactive setup
+ serenecode spec <SPEC.md> # validate spec readiness
+ [--format human|json]
  serenecode check [<path>] [--level 1-6] [--allow-code-execution] # run verification
+ [--spec SPEC.md] # spec traceability
  [--format human|json] # output format
  [--structural] [--verify] # L1 only / L3-6 only
  [--per-condition-timeout N] # L5 CrossHair budgets
@@ -269,10 +274,12 @@ SereneCode is honest about what it can and can't do:
  SereneCode follows hexagonal architecture — the same pattern it enforces on your code:

  ```
- CLI / Library API ← composition roots
+ CLI / Library API ← composition roots (interactive init, spec validation)

  ├──▸ Pipeline ← orchestrates L1 → L2 → L3 → L4 → L5 → L6
  │ ├──▸ Structural Checker (ast)
+ │ ├──▸ Spec Traceability (REQ-xxx → Implements/Verifies)
+ │ ├──▸ Test Existence (test_<module>.py discovery)
  │ ├──▸ Type Checker (mypy)
  │ ├──▸ Coverage Analyzer (coverage.py)
  │ ├──▸ Property Tester (Hypothesis)
@@ -4,9 +4,9 @@

  <h3 align="center">A Framework for AI-Driven Development of Verifiable Systems</h3>

- SereneCode is a verification framework for AI-generated Python. It tells the AI *how* to write verifiable code, checks that the AI followed instructions, and then verifies the code at multiple levels — from test coverage analysis that catches gaps in AI-written tests, through property-based testing that checks contracts against hundreds of random inputs, to symbolic execution that uses an SMT solver to search for *any* input that breaks a contract. You choose the verification depth that matches your project: lightweight for internal tools, balanced for production systems, strict for safety-critical code. AI agents write code fast but can be suboptimal at testing their own work; SereneCode closes that gap by surfacing untested paths, generating test suggestions, and verifying behavior beyond what hand-written tests cover.
+ SereneCode is a spec-to-verified-implementation framework for AI-generated Python. It ensures that every requirement in your spec is implemented, tested, and formally verified — closing the gap between what you asked for and what the AI built. The workflow starts from a spec with traceable requirements (REQ-xxx), enforces that the AI writes verifiable code with contracts and tests, then verifies at multiple levels — from structural checks and test coverage through property-based testing to symbolic execution with an SMT solver. You choose the verification depth during interactive setup: lightweight for internal tools, balanced for production systems, strict for safety-critical code. AI agents write code fast but can miss requirements and skip edge cases; SereneCode closes that gap with spec traceability, test-existence enforcement, and formal verification.

- > **This framework was bootstrapped with AI under its own rules.** SereneCode's SERENECODE.md was written before the first line of code, and the codebase has been developed under those conventions from the start. The current tree passes its own `serenecode check src --level 6 --allow-code-execution`, an internal strict-config Level 6 self-check in the test suite, `mypy src examples/dosage-serenecode/src`, the shipped example's strict Level 6 check, and the full `pytest` suite. The verification output is transparent about scope: exempt modules (adapters, CLI, ports) and functions excluded from deep verification (non-primitive parameter types) are reported as "exempt" rather than silently omitted.
+ > **This framework was bootstrapped with AI under its own rules.** SereneCode's SERENECODE.md was written before the first line of code, and the codebase has been developed under those conventions from the start. The current tree passes its own `serenecode check src --level 6 --allow-code-execution`, an internal strict-config Level 6 self-check in the test suite, `mypy src examples/dosage-serenecode/src`, the shipped example's check, and the full `pytest` suite (769 passing tests, 16 skipped). The verification output is transparent about scope: exempt modules (adapters, CLI, ports) and functions excluded from deep verification (non-primitive parameter types) are reported as "exempt" rather than silently omitted.

  ---

@@ -20,17 +20,17 @@ SereneCode is designed for **building new verifiable systems from scratch with A

  ### Choosing the Right Level

- The cost of verification should be proportional to the cost of a bug. Each level generates a different SERENECODE.md with different requirements for the AI, so the choice shapes how code is *written*, not just how it's checked.
+ The cost of verification should be proportional to the cost of a bug. Each level generates a different SERENECODE.md with different requirements for the AI, so the choice shapes how code is *written*, not just how it's checked. You make this choice during `serenecode init` — it cannot be changed after implementation starts.

- | | `--minimal` | **Default** | `--strict` |
+ | | **Minimal** (Level 2) | **Default** (Level 4) | **Strict** (Level 6) |
  |---|---|---|---|
  | **Verifies through** | L2 (structure + types) | L4 (+ test coverage + properties) | L6 (+ symbolic + compositional) |
  | **What the AI must write** | Contracts on public functions, type annotations | + description strings, class invariants, hexagonal architecture | + contracts on *all* functions, loop invariants, domain exceptions, no exemptions |
  | **What catches bugs** | Runtime contract checks, mypy | + L3 surfaces untested code paths and generates test suggestions; L4 tests contracts against hundreds of random inputs | + SMT solver searches for *any* counterexample within analysis bounds |
- | **Good for** | Internal tools, scripts, prototypes, incremental adoption | Production APIs, business logic, data pipelines | Medical, financial, infrastructure, regulated systems |
+ | **Good for** | Internal tools, scripts, prototypes | Production APIs, business logic, data pipelines | Medical, financial, infrastructure, regulated systems |
  | **The tradeoff** | Low ceremony, but contracts are only checked at the boundaries where you wrote them | Moderate overhead; architecture rules keep core logic pure and testable | Significant overhead — every loop gets an invariant comment, every helper gets a contract. Justified when the cost of an undiscovered bug is measured in patient harm, financial loss, or regulatory failure |

- Pick the level that matches the stakes, and pick it early. Moving up later means retrofitting contracts, invariants, and architecture onto existing code — it's not just flipping a flag. Safety-critical code should be written for `--strict` from the first line.
+ Pick the level that matches the stakes. Safety-critical code should start at Strict.

  ---

@@ -46,44 +46,49 @@ Both versions implement the same requirements, and the plain version passes its
  |---|---|---|
  | **Dose never exceeds maximum** | Covered by unit tests | Encoded as a postcondition; bounded symbolic search found no counterexample within analysis bounds |
  | **Renal adjustment never increases a dose** | Covered by unit tests | `result <= dose_mg` is an executable contract, not just a test expectation |
- | **Safety result is internally consistent** | No validation — you can construct `SafetyResult(total=9999, max=100, is_safe=True)` | Representation invariants make inconsistent `SafetyResult` states unconstructable |
- | **Objects are truly immutable** | `frozen=True` with mutable `set` on Drug | `_Frozen` mixin + immutable `tuple`/`frozenset` fields fully locked down |
+ | **Safety result is internally consistent** | No validation — you can construct `SafetyResult(total=9999, max=100, is_safe=True)` | Postcondition on `check_daily_safety` enforces `is_safe == (total <= max)` — inconsistent results cannot be produced through the contracted API |
+ | **Objects are truly immutable** | `frozen=True` with mutable `set` on Drug | `frozen=True` with class invariants enforcing valid state — mutations raise `FrozenInstanceError` and invariants guarantee internal consistency |
  | **Boundary behavior (CrCl exactly 30.0)** | Covered by explicit tests | Boundary behavior is specified in contracts; bounded symbolic search found no counterexample |
  | **What if someone changes the code later?** | You rely on the tests you remembered to keep | Contracts stay attached to the code and keep checking every contracted call |
- | **Can a solver verify it?** | No executable specification for a solver to target | 120 executable contracts and a clean `serenecode check ... --level 6 --allow-code-execution` run |
+ | **Can a solver verify it?** | No executable specification for a solver to target | 42 executable contracts and a clean `serenecode check ... --level 6 --allow-code-execution` run |
  | **Confidence in a safety-critical setting** | Better than ad hoc code, but still test-shaped confidence | Higher: behavior is formally specified, runtime-checked, and solver-checked within analysis bounds — but bounded search is not proof |

- The plain version relies on 59 tests that check specific scenarios. The SereneCode version adds 120 executable contracts across its domain models and core dosage logic. Those contracts define *what correct means* in code, get checked at runtime, and give CrossHair/Z3 something precise to search against when looking for counterexamples within analysis bounds.
+ The plain version relies on 59 tests that check specific scenarios. The SereneCode version adds 42 executable contracts across its domain models and core dosage logic. Those contracts define *what correct means* in code, get checked at runtime, and give CrossHair/Z3 something precise to search against when looking for counterexamples within analysis bounds.

  > Both examples live in [`examples/dosage-regular/`](examples/dosage-regular/) and [`examples/dosage-serenecode/`](examples/dosage-serenecode/). Read them side by side.

- The Serenecode dosage example currently passes `serenecode check examples/dosage-serenecode/src --level 6 --allow-code-execution`. Its local `pytest` suite is also green with 74 passing tests.
+ The Serenecode dosage example currently passes `serenecode check src/ --level 6 --allow-code-execution` from within the example directory. Its local `pytest` suite is also green with 67 passing tests.

  ---

  ## How It Works

- ### 1. SERENECODE.md — Your AI Writes Code That's Built for Verification
+ ### 1. Interactive Setup — `serenecode init`

- A markdown file in your project root that tells AI coding agents exactly how to write code: what contracts to include, what architecture to follow, what patterns to use. When Claude Code (or another agent) reads this before generating code, it has a concrete target for producing verification-friendly output from the first keystroke.
+ Run `serenecode init` and answer two questions:
+
+ **Spec question:** Do you already have a spec, or will you write one with your coding assistant? Both options set up spec traceability with REQ-xxx requirement identifiers — the difference is the workflow your assistant follows.
+
+ **Verification level:** Minimal (L2), Default (L4), or Strict (L6). This determines what conventions your SERENECODE.md will require and cannot be changed after implementation starts.

  ```bash
- serenecode init # balanced defaults — contracts on public APIs, test coverage, hexagonal architecture
- serenecode init --strict # maximum rigor — contracts on ALL functions (public and private), no exemptions
- serenecode init --minimal # lightweight — public-function contracts only, relaxed architecture rules
+ serenecode init
  ```

- This creates a SERENECODE.md tailored to your project and integrates with CLAUDE.md so Claude Code follows the conventions automatically. You write the rules once, and the agent has a stable spec to follow on every iteration.
+ This creates SERENECODE.md (project conventions including spec traceability) and CLAUDE.md (instructions for your AI coding assistant) tailored to your answers. The conventions become the contract between you, your coding assistant, and the verification tool. SERENECODE.md includes instructions for converting raw specs into SereneCode format (REQ-xxx identifiers), validating them with `serenecode spec SPEC.md`, creating an implementation plan, and building from it — the coding agent handles this workflow automatically.

- ### 2. The Checker — Instant Feedback
+ ### 2. The Checker — Structural Enforcement

- A lightweight AST-based linter that validates code follows SERENECODE.md conventions in seconds. Missing a postcondition? No class invariant? I/O imports in a core module? Caught before you waste time on heavy verification.
+ A lightweight AST-based checker that validates code follows SERENECODE.md conventions in seconds. Missing a postcondition? No class invariant? No test file for a module? Caught before you waste time on heavy verification.

  ```bash
- serenecode check src/ --structural # seconds
+ serenecode check src/ --structural # structural conventions
+ serenecode check src/ --spec SPEC.md # + spec traceability
  ```

- ### 3. The Verifier — Symbolic Verification
+ The `--spec` flag verifies that every REQ in the spec has an `Implements: REQ-xxx` tag in the code and a `Verifies: REQ-xxx` tag in the tests. No requirement goes unimplemented or untested.
+
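A minimal sketch of that tag pairing (the requirement ID, function names, and file paths here are hypothetical):

```python
# src/dosage/core/dosage.py
def total_daily_dose_mg(doses_mg: list[float]) -> float:
    """Sum all doses administered in one day.

    Implements: REQ-003
    """
    return sum(doses_mg)


# tests/unit/test_dosage.py
def test_total_daily_dose_sums_all_doses() -> None:
    """Verifies: REQ-003"""
    assert total_daily_dose_mg([100.0, 50.0, 25.0]) == 175.0
```

The checker matches the two tags against the spec's REQ list, so a requirement with an implementation but no test (or vice versa) is reported as a traceability gap.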
+ ### 3. The Verifier — Deep Verification

  A six-level verification pipeline that escalates from fast checks to full symbolic verification:

@@ -100,7 +105,7 @@
  serenecode check src/ --level 6 --allow-code-execution # verify it
  ```

- **L3 Test Coverage** is where SereneCode checks that the AI's tests actually exercise the code it wrote. AI agents can be suboptimal at writing tests — they tend to cover the happy path, skip edge cases, and miss error branches. L3 runs your existing tests under coverage.py tracing, measures per-function line and branch coverage, and reports exactly which lines and branches are untested. For each coverage gap, it generates concrete test suggestions including mock necessity assessments: each dependency is classified as REQUIRED (external I/O — must mock) or OPTIONAL (internal code — consider using the real implementation). This gives the AI agent actionable feedback to improve its own tests rather than leaving coverage gaps undetected. When no tests exist for a module, L3 reports this as informational rather than failing, so the coverage level serves as a baseline measurement before L4 property testing generates new test inputs.
+ **L3 Test Coverage** is where SereneCode checks that the AI's tests actually exercise the code it wrote. AI agents can be suboptimal at writing tests — they tend to cover the happy path, skip edge cases, and miss error branches. L3 runs your existing tests under coverage.py tracing, measures per-function line and branch coverage, and reports exactly which lines and branches are untested. For each coverage gap, it generates concrete test suggestions including mock necessity assessments: each dependency is classified as REQUIRED (external I/O — must mock) or OPTIONAL (internal code — consider using the real implementation). This gives the AI agent actionable feedback to improve its own tests rather than leaving coverage gaps undetected. When no tests exist for a module, L3 reports this as a failure — missing tests must be written. At L1, the structural checker also verifies that every non-exempt source module has a corresponding `test_<module>.py` file.

  The full pipeline is thorough but not instant. Larger systems will take longer, and the deepest runs may surface skipped items when Hypothesis cannot synthesize valid values for complex domain types or when CrossHair hits its time budget. By default, L5 focuses on contracted top-level functions defined in each module and skips modules or signatures that are currently poor fits for direct symbolic execution, such as adapter/composition-root code, helper predicate modules, and object-heavy APIs. Not everything needs L5/L6. Critical paths get full symbolic and compositional verification. Utility functions get property testing. A Level 4 run only counts as achieved when at least one contracted property target was actually exercised.

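To ground the L3 discussion above, here is the kind of gap it surfaces (hypothetical function and test, not SereneCode output): the happy path is covered, but the error branch is never executed.

```python
def dose_per_kg(total_mg: float, weight_kg: float) -> float:
    """Hypothetical helper: per-kilogram dose."""
    if weight_kg <= 0:
        # This branch is never reached by the test below,
        # so L3 would report it as an untested branch.
        raise ValueError("weight must be positive")
    return total_mg / weight_kg


def test_dose_per_kg_happy_path() -> None:
    assert dose_per_kg(100.0, 50.0) == 2.0  # only the happy path
```

A suggested fix would be a second test that asserts `ValueError` is raised for a non-positive weight, closing the branch gap.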
@@ -112,19 +117,21 @@ Scoped targets keep their package/import context across verification levels. In

  ## The AI Agent Loop

- SereneCode is designed for AI agents that write code and fix their own mistakes:
+ SereneCode is designed for spec-driven development with AI agents:

  ```
- AI reads SERENECODE.md → knows how to write verification-ready code
- AI generates code with contracts → postconditions, input preconditions, invariants
- serenecode check --structural → instant: did the AI follow the rules?
- serenecode check --level 5 --allow-code-execution → deep: can the solver find any counterexample?
- AI reads counterexamples → "input x=[-1] violates postcondition"
- AI fixes the code → adjusts implementation or contract
- Repeat until verified → no counterexample found, not just tested
+ serenecode init → interactive setup: spec mode + verification level
+ serenecode spec SPEC.md → validate spec is ready (REQ-xxx format, no gaps)
+ AI reads SERENECODE.md + SPEC.md → knows the conventions and what to build
+ AI implements from spec → Implements: REQ-xxx in docstrings, contracts, tests
+ serenecode check src/ --spec SPEC.md --structural → did the AI follow rules? all REQs covered?
+ serenecode check src/ --level 5 --allow-code-execution --spec SPEC.md → deep verification
+ AI reads findings → missing REQs, counterexamples, untested paths
+ AI fixes the code → adjusts implementation, adds tests, closes gaps
+ Repeat until verified → all REQs implemented + tested + no counterexamples
  ```

- AI-generated code won't always pass verification on the first try — and that's the point. SereneCode gives the coding agent structured feedback on exactly what failed and why: counterexamples, violated contracts, and suggested fixes. The agent uses that feedback to iterate until the code passes. The value isn't in one-shotting perfection — it's in the loop that converges on verified correctness.
+ AI-generated code won't always pass verification on the first try — and that's the point. SereneCode gives the coding agent structured feedback on exactly what failed and why: missing requirement implementations, counterexamples, violated contracts, untested modules, and suggested fixes. When there are many findings, SereneCode suggests the agent spawn subagents to address groups of related issues in parallel. **The value isn't in one-shotting perfection — it's in the loop that converges on verified completeness and correctness.**

  Works in Claude Code, works in the terminal, works in CI:

@@ -149,7 +156,7 @@ SereneCode isn't just a tool that *tells* you to write verified code. It *is* ve

  The SERENECODE.md convention file was the first artifact created — before any Python was written. The framework has been developed under those conventions with AI as a first-class contributor, and the repository continuously checks itself with:

- - `pytest` across the full suite (currently 651 passing tests, 16 skipped)
+ - `pytest` across the full suite (currently 769 passing tests, 16 skipped)
  - `mypy --strict` across `src/` and `examples/dosage-serenecode/src/`
  - SereneCode's own structural, type, property, symbolic, and compositional passes

@@ -165,26 +172,32 @@ At Level 5, CrossHair and Z3 search for counterexamples across the codebase's sy
  # Install from PyPI
  pip install serenecode

- # Initialize a project with conventions
+ # Initialize interactive setup (spec mode + verification level)
  serenecode init

- # Let your AI agent write code following SERENECODE.md...
- # Then verify:
- serenecode check src/ --structural
+ # Place your spec in the project directory, then start a coding session.
+ # Your agent reads SERENECODE.md, converts the spec to REQ-xxx format,
+ # validates it, creates an implementation plan, and builds from it.
+
+ # Verify structure + spec traceability:
+ serenecode check src/ --spec SPEC.md --structural

- # Or go deep:
- serenecode check src/core/ --level 5 --allow-code-execution --format json
+ # Go deep — test coverage, property testing, symbolic verification:
+ serenecode check src/ --level 5 --allow-code-execution --spec SPEC.md
  ```

- JSON output includes top-level `passed`, `level_requested`, and `level_achieved` fields alongside the summary and per-function results.
+ JSON output (via `--format json`) includes top-level `passed`, `level_requested`, and `level_achieved` fields alongside the summary and per-function results.

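For orientation, the top level of that JSON looks roughly like this. Only the three named fields are confirmed; the values shown are illustrative:

```json
{
  "passed": true,
  "level_requested": 5,
  "level_achieved": 5
}
```

The summary and per-function results appear alongside these fields; their exact keys are not shown here.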
- When you verify a nested package or a single module, Serenecode now preserves the package root and module-path context used by mypy, Hypothesis, CrossHair, and the architectural checks. That lets package-local absolute imports, relative imports, and scoped core-module rules behave the same way they do in project-wide runs.
+ When you verify a nested package or a single module, Serenecode preserves the package root and module-path context used by mypy, Hypothesis, CrossHair, and the architectural checks. That lets package-local absolute imports, relative imports, and scoped core-module rules behave the same way they do in project-wide runs.

  ## CLI Reference

  ```bash
- serenecode init [<path>] [--strict | --minimal] # set up conventions
+ serenecode init [<path>] # interactive setup
+ serenecode spec <SPEC.md> # validate spec readiness
+ [--format human|json]
  serenecode check [<path>] [--level 1-6] [--allow-code-execution] # run verification
+ [--spec SPEC.md] # spec traceability
  [--format human|json] # output format
  [--structural] [--verify] # L1 only / L3-6 only
  [--per-condition-timeout N] # L5 CrossHair budgets
@@ -228,10 +241,12 @@ SereneCode is honest about what it can and can't do:
  SereneCode follows hexagonal architecture — the same pattern it enforces on your code:

  ```
- CLI / Library API ← composition roots
+ CLI / Library API ← composition roots (interactive init, spec validation)

  ├──▸ Pipeline ← orchestrates L1 → L2 → L3 → L4 → L5 → L6
  │ ├──▸ Structural Checker (ast)
+ │ ├──▸ Spec Traceability (REQ-xxx → Implements/Verifies)
+ │ ├──▸ Test Existence (test_<module>.py discovery)
  │ ├──▸ Type Checker (mypy)
  │ ├──▸ Coverage Analyzer (coverage.py)
  │ ├──▸ Property Tester (Hypothesis)
@@ -311,7 +311,7 @@ Every meaningful code change in this project MUST come with verification. Writin

  ### Verification Tiers by Module Type

- **Pure core modules** (`core/`, `checker/`, `models.py`, `contracts/`, `config.py`, `reporter.py`, `source_discovery.py`) should remain friendly to Serenecode's full pipeline:
+ **Pure core modules** (`core/`, `checker/`, `models.py`, `contracts/`, `config.py`, `reporter.py`) should remain friendly to Serenecode's full pipeline:
  1. Structural check — required contracts present on public functions and classes.
  2. `mypy --strict` — zero errors.
  3. Test coverage analysis through Serenecode's coverage adapter.
@@ -319,6 +319,8 @@ Every meaningful code change in this project MUST come with verification. Writin
  5. Symbolic verification through CrossHair for symbolic-friendly contracted top-level functions within analysis bounds.
  6. Example-based unit tests for edge cases, boundary conditions, regressions, and behavior that is important but awkward for automated strategy generation.

+ **Infrastructure modules** (`source_discovery.py`) use filesystem operations to locate and prepare source files. They are not pure core modules but should maintain contracts and full test coverage.
+
  **Adapter and composition-root modules** (`adapters/`, `cli.py`, `__init__.py`, `init.py`) must pass:
  1. `mypy --strict` — zero errors.
  2. Integration or end-to-end tests that exercise real file system, subprocess, and CLI behavior.
@@ -473,6 +475,7 @@ Steps 1-3 may be done together, but steps 4-8 MUST NOT be skipped or deferred.
  The following modules are exempt from full contract requirements due to their nature:
  - `cli.py` — Thin CLI layer, tested via integration tests.
  - `__init__.py` — Composition roots, tested via integration tests.
+ - `init.py` — Composition root for project initialization, tested via e2e tests.
  - `adapters/` — I/O boundary code, tested via integration tests.
  - `templates/` — Static markdown files, not code.
  - `tests/fixtures/` — Intentionally broken or incomplete code used for testing the checker.
@@ -0,0 +1,51 @@
+ ## Serenecode (Strict Mode)
+
+ All code in this project MUST follow the standards defined in SERENECODE.md. Read SERENECODE.md before writing or modifying any code. Every function — public and private — with caller-supplied inputs must have icontract preconditions, and every function must have postconditions. Every class must have invariants. No exemptions.
+
+ ### Verification
+
+ After each work iteration (implementing a feature, fixing a bug, refactoring), you MUST run verification before considering the task complete. Do not skip this.
+
+ **Quick structural check (seconds):**
+ ```bash
+ serenecode check src/ --structural
+ ```
+
+ **Full verification with property testing (minutes):**
+ ```bash
+ serenecode check src/ --level 4 --allow-code-execution
+ ```
+
+ **Full verification including symbolic and compositional (minutes):**
+ ```bash
+ serenecode check src/ --level 6 --allow-code-execution
+ ```
+
+ **Spec traceability check:**
+ ```bash
+ serenecode check src/ --spec SPEC.md
+ ```
+
+ Levels 3-6 import and execute project modules. Only use `--allow-code-execution` for trusted code.
+
+ If verification fails, read the error messages and fix the issues. Each failure includes the function name, file, line number, and a suggested fix. Iterate until all checks pass. Do not commit code that fails verification.
+
+ ### Testing
+
+ You MUST write tests for every function. Do not skip this.
+
+ - Unit tests for core functions in `tests/unit/`
+ - Integration tests for adapters in `tests/integration/`
+ - Property-based tests (Hypothesis) for pure functions
+
+ Run `pytest -q` before considering any task complete. Do not commit code without passing tests.
+
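The property-based bullet above can be sketched as follows. The pure function here is hypothetical; the property mirrors the dosage example's "never increases a dose" rule:

```python
from hypothesis import given, strategies as st


def renal_adjusted_dose(dose_mg: float, crcl_ml_min: float) -> float:
    """Hypothetical pure function: reduce the dose for low clearance."""
    return dose_mg / 2 if crcl_ml_min < 30.0 else dose_mg


@given(
    dose_mg=st.floats(min_value=0.1, max_value=1000.0, allow_nan=False),
    crcl_ml_min=st.floats(min_value=0.0, max_value=200.0, allow_nan=False),
)
def test_adjustment_never_increases_dose(dose_mg: float, crcl_ml_min: float) -> None:
    """Property: the adjusted dose never exceeds the prescribed dose."""
    assert renal_adjusted_dose(dose_mg, crcl_ml_min) <= dose_mg
```

Hypothesis runs the property against hundreds of generated inputs, which is exactly the behavior L4 builds on for contracted functions.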
+ ### Spec-Driven Workflow
+
+ This project has an existing spec document. Follow the Spec Traceability section in SERENECODE.md for the full workflow. The key steps are:
+
+ 1. Read the existing spec and SERENECODE.md before writing any code.
+ 2. If the spec is not already in SereneCode format (REQ-xxx headings), convert it into SPEC.md following the "Preparing a SereneCode-Ready Spec" instructions in SERENECODE.md. Validate with `serenecode spec SPEC.md`.
+ 3. Create an implementation plan mapping each REQ to functions, modules, and contracts. Get user approval before writing code.
+ 4. Implement and tag with `Implements: REQ-xxx`. Test and tag with `Verifies: REQ-xxx`.
+ 5. Run `serenecode check src/ --spec SPEC.md` to verify full traceability.