PyPI - pen-stack - version diff: 3.4.0.tar.gz → 4.0.1.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (275)

{pen_stack-3.4.0 → pen_stack-4.0.1}/CHANGELOG.md RENAMED

@@ -3,6 +3,45 @@
 All notable changes to PEN-STACK are documented here. This file follows
 [Keep a Changelog](https://keepachangelog.com/) and the program's phase structure.
+## [4.0.1] - 2026-06-09 - data-correctness patch: writer-verification panel verified against Perry 2025
+### Fixed
+- **WS-WV frozen panel is now verbatim from the measured Perry 2025 ISCro4 DMS.** The offline-fallback panel
+  in `atlas/writer_verify.py` previously used *illustrative* Z-scores (2.6/2.1/1.7) and invented control
+  variants (G15D/P88R/L120E), and `_CORE_RESIDUES` used illustrative arginines. Replaced with the REAL values
+  from `science.adz0276` Table S3: the top-3 enhancers **N322P (Z 0.754), H50K (0.742), R278M (0.709)**, real
+  near-neutral variants (V21R, S312Q, G286T), the most-deleterious variants (R132E −5.40, R137E −5.12,
+  R195D −4.98), and the documented catalytic residues **D11/E60/D102/D105/S241** ("Residue Groups" sheet). The
+  real-DMS path (on the VM/Drive) was already correct; only the offline fallback constants were illustrative.
+  Added `test_ws_wv.py::test_frozen_panel_matches_real_perry_dms_table_s3` to guard against drift.
+## [4.0.0] - 2026-06-09 - v4.0 release: the Oracle Mesh (on top of the foundation models) + writer verification
+A major bump: the substrate now *composes* the biomolecular foundation models under one contract and verifies
+the writer enzyme itself. Workstreams WS-{O,WV,ATLAS}, each SHA-locked. No de-novo writer invention — score
+and critique only (the pen-assemble lesson).
+### Added
+- **WS-O - the oracle mesh.** `pen_stack/oracles/` with `OracleResult{value, provenance(model+version),
+  native_uncertainty, scope_card, in_scope, extrapolating, output_kind, available, cached}`. Adapters:
+  `genome.py` (AlphaGenome OOD-gated; Evo2 likelihood=claim / generation=candidate; ChromBPNet·Borzoi
+  baseline), `structure.py` (AlphaFold3/Boltz-2/Chai-1/Protenix + `consensus()` that widens the interval on
+  cross-oracle disagreement), `protein_design.py` (RFdiffusion/ProteinMPNN/ESM3 - all candidates), `rna.py`
+  (ViennaRNA - real, hard fold-legality), `energetics.py` (bridge off-target, MC3 gate ≥0.77).
+  `configs/oracles/scope_cards.yaml` (11 models); deterministic version-pinned `oracle_cache/`. Guard:
+  generative candidate `as_claim()` raises. `docs/oracles.md`; `prereg/ws_o.yaml`.
+- **WS-WV - writer verification.** `pen_stack/atlas/writer_verify.py`: DMS- + structure-grounded variant
+  scoring (measured=claimable, unmeasured=not), `blind_recovery` recovers N322P/H50K/R278M above
+  measured-worse controls, and `critique_candidate` (fold/active-site/deliverable/reachable) wired into
+  `verify()` as `Verdict.writer_critique` - always `no_claim=True`. `docs/writer_verification.md`;
+  `prereg/ws_wv.yaml`.
+- **WS-ATLAS - mesh upgrade + delivery oracle.** `wgenome/mesh_features.py` (OOD-gated feature hook + honest
+  blind re-validation reporting parity vs v3.x when oracles are deferred) + a computable
+  `delivery.aav_packaging_margin` soft rule (titre drops near the AAV capsid limit). `prereg/ws_atlas.yaml`.
+### Changed
+- Version 3.4.0 -> 4.0.0; `Verdict` gains `writer_critique`; M1 + writer-verification note + M2 updates.
 ## [3.4.0] - 2026-06-09 - v3.4 release: the Environment (train/eval surface + bench v0.3 + outcome-calibration)
 v3.4 turns the thin Gym interface into a full environment an AI agent can be trained and graded in, ships

{pen_stack-3.4.0 → pen_stack-4.0.1}/CITATION.cff RENAMED

@@ -1,7 +1,7 @@
 cff-version: 1.2.0
 message: "If you use PEN-STACK, please cite it as below."
 title: "PEN-STACK: open infrastructure for genome writing"
-version: 3.4.0
+version: 4.0.1
 date-released: 2026-06-01
 authors:
   - family-names: "Mahaboob Ali"

{pen_stack-3.4.0 → pen_stack-4.0.1}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: pen-stack
-Version: 3.4.0
+Version: 4.0.1
 Summary: Open infrastructure for genome writing: the Writable Genome atlas, the Writer Atlas, and the Write Planner.
 Author-email: Anees Ahmed Mahaboob Ali <ahmedaneesm@gmail.com>
 License: MIT
@@ -89,8 +89,8 @@ and durably write new DNA, **which enzyme** can write it there, and **how** to d
 [![codecov](https://codecov.io/gh/ahmedanees-m/pen-stack/branch/main/graph/badge.svg)](https://codecov.io/gh/ahmedanees-m/pen-stack)
 [![License: MIT](https://img.shields.io/badge/License-MIT-informational.svg)](LICENSE)
 [![Python 3.11+](https://img.shields.io/badge/python-3.11%2B-blue.svg)](https://www.python.org/)
-[![Version](https://img.shields.io/badge/version-3.4.0-blue.svg)](CHANGELOG.md)
-[![Tests](https://img.shields.io/badge/tests-190%20passing-success.svg)](tests/)
+[![Version](https://img.shields.io/badge/version-4.0.1-blue.svg)](CHANGELOG.md)
+[![Tests](https://img.shields.io/badge/tests-208%20passing-success.svg)](tests/)
 [![Lint: ruff](https://img.shields.io/badge/lint-ruff-purple.svg)](https://github.com/astral-sh/ruff)
 [![Runtime: Docker](https://img.shields.io/badge/runtime-docker-2496ED.svg)](docker/)
 [![Validation: pre-registered](https://img.shields.io/badge/validation-pre--registered-critical.svg)](prereg/)
@@ -133,6 +133,25 @@ Two questions gate every genome-writing project, and before PEN-STACK no resourc
 Everything is built on bulk-downloadable public data, runs on a single GPU, and is validated **blind** against
 a pre-registered, honest baseline before release.
+## What is new in v4.0 — the Oracle Mesh (sitting on top of the foundation models)
+v4.0 makes PEN-STACK the **composition + verification layer over the biomolecular foundation models**. It
+wraps AlphaGenome, Evo2, AlphaFold3, Boltz-2, Chai-1, Protenix, ESM3, RFdiffusion and ProteinMPNN under one
+contract that carries each model's provenance, native uncertainty, and a **scope card** stating what it is
+valid for — then routes their outputs through the rule-grounded verifier and the calibrated trust layer. A
+generated sequence or structure is always a **candidate to be checked, never a claim**. For the writer enzyme
+itself, v4.0 builds **verification, not invention**: proposed/variant writers are scored against measured DMS
+data and predicted structure, recovering known enhanced variants blind and refusing to assert activity for
+anything unsupported.
+| Workstream | What it adds | Result |
+|---|---|---|
+| **O — the oracle mesh** | `pen_stack/oracles/` — `OracleResult{value, provenance(model+version), native_uncertainty, scope_card, output_kind}`; adapters for genome / structure / protein-design / RNA / energetics; deterministic version-pinned cache | one contract; **generative output = candidate** (`as_claim()` raises — the pen-assemble lesson in code); AlphaGenome **OOD-gated**; cross-oracle **disagreement widens the interval**; ViennaRNA + energetics real |
+| **WV — writer verification** | `atlas/writer_verify.py` — DMS- + structure-grounded variant scoring; candidate **critique** wired into `verify()` | recovers the known enhancers (**N322P / H50K / R278M**) above measured-worse controls; unmeasured variants flagged, **not claimable**; a generated writer is critiqued (fold/active-site/deliverable/reachable), **never returned as a working pen** |
+| **ATLAS — mesh + delivery oracle** | `wgenome/mesh_features.py` (OOD-gated feature hook + honest blind re-validation) + a computable **AAV packaging-margin** delivery rule | atlas re-validation reports **parity** vs v3.x when oracles are deferred (delta 0.0, never hidden); titre-margin flag fires near the AAV capsid limit; immunogenicity magnitude stays a scope flag |
+See `docs/oracles.md`, `docs/writer_verification.md`, and `prereg/ws_{o,wv,atlas}.yaml`.
 ## What is new in v3.4 — the Environment (a place to train and grade genome-writing AI)
 v3.4 makes PEN-STACK the surface an AI agent can be **trained and graded** in, the counterpart to v3.3's
@@ -377,8 +396,9 @@ pen-stack/
 │   │                                   + v3.2 offtarget_energetics (position x substitution; held-out 0.88, ships)
 │   ├── agent/                        agentic platform: tools / orchestrator / pen_agent / mcp_server / guardrails
 │   │                                   + v3.2 epistemic (3-tier status) / scope (known-unknowns matcher)
+│   ├── oracles/                      v4.0 L1 oracle mesh: OracleResult contract + adapters (genome/structure/protein_design/rna/energetics) over the foundation models; version-pinned cache
 │   ├── rules/                        v3.3 machine-readable rules engine (schema/evaluators/loader/solver) over configs/rules/*.yaml
-│   ├── verify/                       v3.3 verification service: verify(design) -> Verdict (legal+reasons+confidence+scope)
+│   ├── verify/                       v3.3 verification service: verify(design) -> Verdict (legal+reasons+confidence+scope; v4.0 writer_critique)
 │   ├── adapt/                        local recalibration / private-data adaptation behind a gate (v3.1, WS-F)
 │   ├── env/                          v3.4 full Gymnasium environment over router+verifier (genome_writing_env + policies; [env] extra)
 │   ├── monitor/                      PEN-MONITOR living database (Europe PMC)

{pen_stack-3.4.0 → pen_stack-4.0.1}/README.md RENAMED

@@ -14,8 +14,8 @@ and durably write new DNA, **which enzyme** can write it there, and **how** to d
 [![codecov](https://codecov.io/gh/ahmedanees-m/pen-stack/branch/main/graph/badge.svg)](https://codecov.io/gh/ahmedanees-m/pen-stack)
 [![License: MIT](https://img.shields.io/badge/License-MIT-informational.svg)](LICENSE)
 [![Python 3.11+](https://img.shields.io/badge/python-3.11%2B-blue.svg)](https://www.python.org/)
-[![Version](https://img.shields.io/badge/version-3.4.0-blue.svg)](CHANGELOG.md)
-[![Tests](https://img.shields.io/badge/tests-190%20passing-success.svg)](tests/)
+[![Version](https://img.shields.io/badge/version-4.0.1-blue.svg)](CHANGELOG.md)
+[![Tests](https://img.shields.io/badge/tests-208%20passing-success.svg)](tests/)
 [![Lint: ruff](https://img.shields.io/badge/lint-ruff-purple.svg)](https://github.com/astral-sh/ruff)
 [![Runtime: Docker](https://img.shields.io/badge/runtime-docker-2496ED.svg)](docker/)
 [![Validation: pre-registered](https://img.shields.io/badge/validation-pre--registered-critical.svg)](prereg/)
@@ -58,6 +58,25 @@ Two questions gate every genome-writing project, and before PEN-STACK no resourc
 Everything is built on bulk-downloadable public data, runs on a single GPU, and is validated **blind** against
 a pre-registered, honest baseline before release.
+## What is new in v4.0 — the Oracle Mesh (sitting on top of the foundation models)
+v4.0 makes PEN-STACK the **composition + verification layer over the biomolecular foundation models**. It
+wraps AlphaGenome, Evo2, AlphaFold3, Boltz-2, Chai-1, Protenix, ESM3, RFdiffusion and ProteinMPNN under one
+contract that carries each model's provenance, native uncertainty, and a **scope card** stating what it is
+valid for — then routes their outputs through the rule-grounded verifier and the calibrated trust layer. A
+generated sequence or structure is always a **candidate to be checked, never a claim**. For the writer enzyme
+itself, v4.0 builds **verification, not invention**: proposed/variant writers are scored against measured DMS
+data and predicted structure, recovering known enhanced variants blind and refusing to assert activity for
+anything unsupported.
+| Workstream | What it adds | Result |
+|---|---|---|
+| **O — the oracle mesh** | `pen_stack/oracles/` — `OracleResult{value, provenance(model+version), native_uncertainty, scope_card, output_kind}`; adapters for genome / structure / protein-design / RNA / energetics; deterministic version-pinned cache | one contract; **generative output = candidate** (`as_claim()` raises — the pen-assemble lesson in code); AlphaGenome **OOD-gated**; cross-oracle **disagreement widens the interval**; ViennaRNA + energetics real |
+| **WV — writer verification** | `atlas/writer_verify.py` — DMS- + structure-grounded variant scoring; candidate **critique** wired into `verify()` | recovers the known enhancers (**N322P / H50K / R278M**) above measured-worse controls; unmeasured variants flagged, **not claimable**; a generated writer is critiqued (fold/active-site/deliverable/reachable), **never returned as a working pen** |
+| **ATLAS — mesh + delivery oracle** | `wgenome/mesh_features.py` (OOD-gated feature hook + honest blind re-validation) + a computable **AAV packaging-margin** delivery rule | atlas re-validation reports **parity** vs v3.x when oracles are deferred (delta 0.0, never hidden); titre-margin flag fires near the AAV capsid limit; immunogenicity magnitude stays a scope flag |
+See `docs/oracles.md`, `docs/writer_verification.md`, and `prereg/ws_{o,wv,atlas}.yaml`.
 ## What is new in v3.4 — the Environment (a place to train and grade genome-writing AI)
 v3.4 makes PEN-STACK the surface an AI agent can be **trained and graded** in, the counterpart to v3.3's
@@ -302,8 +321,9 @@ pen-stack/
 │   │                                   + v3.2 offtarget_energetics (position x substitution; held-out 0.88, ships)
 │   ├── agent/                        agentic platform: tools / orchestrator / pen_agent / mcp_server / guardrails
 │   │                                   + v3.2 epistemic (3-tier status) / scope (known-unknowns matcher)
+│   ├── oracles/                      v4.0 L1 oracle mesh: OracleResult contract + adapters (genome/structure/protein_design/rna/energetics) over the foundation models; version-pinned cache
 │   ├── rules/                        v3.3 machine-readable rules engine (schema/evaluators/loader/solver) over configs/rules/*.yaml
-│   ├── verify/                       v3.3 verification service: verify(design) -> Verdict (legal+reasons+confidence+scope)
+│   ├── verify/                       v3.3 verification service: verify(design) -> Verdict (legal+reasons+confidence+scope; v4.0 writer_critique)
 │   ├── adapt/                        local recalibration / private-data adaptation behind a gate (v3.1, WS-F)
 │   ├── env/                          v3.4 full Gymnasium environment over router+verifier (genome_writing_env + policies; [env] extra)
 │   ├── monitor/                      PEN-MONITOR living database (Europe PMC)

pen_stack-4.0.1/configs/oracles/scope_cards.yaml ADDED

@@ -0,0 +1,114 @@
+# PEN-STACK v4.0 — oracle scope cards (WS-O0). What each wrapped foundation model is VALID for, and what it
+# is NOT — so the substrate can gate and label outputs (the field's evidence that these models do not
+# generalize to unseen loci is made legible here, not hidden). `output_kind`: claim (a checkable prediction),
+# candidate (a generative proposal that must pass writer-verification), baseline (an honest comparator).
+version: "1.0"
+oracles:
+  alphagenome:
+    family: genome
+    version: "2025.1"
+    output_kind: claim
+    valid_for: "regulatory-track + variant-effect prediction at IN-DISTRIBUTION loci (trained tracks/tissues)"
+    not_valid_for: "unseen loci / cell types outside training; does NOT generalize to novel regulatory contexts"
+    generalizes_to_unseen_loci: false
+    license: "non-commercial (Google DeepMind terms)"
+  evo2:
+    family: genome
+    version: "40b-2025"
+    output_kind: candidate            # generative DNA + likelihood; sequences are proposals, never claims
+    valid_for: "genomic sequence likelihood / zero-shot variant scoring; generative DNA candidates"
+    not_valid_for: "accessibility/expression QTLs; quantitative regulatory tracks; asserting a sequence WORKS"
+    generalizes_to_unseen_loci: false
+    license: "Apache-2.0 (Arc Institute)"
+  chrombpnet_borzoi:
+    family: genome
+    version: "borzoi-2024"
+    output_kind: baseline             # kept as an honest comparator to AlphaGenome
+    valid_for: "accessibility / expression baseline tracks (honest comparator)"
+    not_valid_for: "variant effects beyond trained assays"
+    generalizes_to_unseen_loci: false
+    license: "open"
+  alphafold3:
+    family: structure
+    version: "3.0-2024"
+    output_kind: claim
+    valid_for: "protein / protein-NA complex structure at confidence (pLDDT/PAE) within trained fold space"
+    not_valid_for: "absolute binding free energies; novel folds far from the PDB; in-vivo behaviour"
+    generalizes_to_unseen_loci: true   # structure prediction is not locus-bound
+    license: "non-commercial weights (DeepMind terms)"
+  boltz-2:
+    family: structure
+    version: "2.0-2025"
+    output_kind: claim
+    valid_for: "structure + binding-affinity prediction (open weights); cross-oracle consistency comparator"
+    not_valid_for: "guaranteed affinities; designs outside trained chemical space"
+    generalizes_to_unseen_loci: true
+    license: "MIT"
+  chai-1:
+    family: structure
+    version: "1.0-2024"
+    output_kind: claim
+    valid_for: "structure prediction; cross-oracle self-consistency"
+    not_valid_for: "absolute affinities; far-OOD complexes"
+    generalizes_to_unseen_loci: true
+    license: "Apache-2.0"
+  protenix:
+    family: structure
+    version: "0.5-2025"
+    output_kind: claim
+    valid_for: "AF3-style structure prediction (open); cross-oracle self-consistency"
+    not_valid_for: "absolute affinities; far-OOD complexes"
+    generalizes_to_unseen_loci: true
+    license: "Apache-2.0"
+  esm3:
+    family: protein_design
+    version: "sm-2024"
+    output_kind: candidate
+    valid_for: "protein representation + generative protein design CANDIDATES; variant likelihoods"
+    not_valid_for: "asserting a designed protein FOLDS or is ACTIVE without verification"
+    generalizes_to_unseen_loci: true
+    license: "non-commercial / community"
+  rfdiffusion:
+    family: protein_design
+    version: "aa-2024"
+    output_kind: candidate
+    valid_for: "backbone generation CANDIDATES (RFdiffusion / RFdiffusion-AA)"
+    not_valid_for: "asserting function; a backbone is a proposal, not a working enzyme"
+    generalizes_to_unseen_loci: true
+    license: "open (BSD-style)"
+  proteinmpnn:
+    family: protein_design
+    version: "ligandmpnn-2024"
+    output_kind: candidate
+    valid_for: "sequence design for a fixed backbone CANDIDATES (ProteinMPNN / LigandMPNN)"
+    not_valid_for: "asserting activity/specificity; must be scored against measured data"
+    generalizes_to_unseen_loci: true
+    license: "MIT"
+  viennarna:
+    family: rna
+    version: "2.6"
+    output_kind: claim
+    valid_for: "RNA secondary-structure MFE fold (a HARD legality input for bridge-RNA QC)"
+    not_valid_for: "tertiary structure; in-cell folding kinetics"
+    generalizes_to_unseen_loci: true
+    license: "open"
+  bridge_energetics:
+    family: energetics
+    version: "v3.2-mc3"
+    output_kind: claim
+    valid_for: "bridge IS110/ISCro4 off-target relative-risk ranking (beats the 0.77 position-weight baseline)"
+    not_valid_for: "absolute off-target rates; non-bridge writers; a non-recombining background"
+    generalizes_to_unseen_loci: false
+    license: "open (this work)"
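
Reviewer note: the scope cards above carry two fields that a caller can gate on, `output_kind` and `generalizes_to_unseen_loci`. The sketch below shows one plausible way to turn them into an `extrapolating` flag and a claimability decision. It is an illustration under stated assumptions, not the package's code: the `gate` helper, its `locus_in_training` argument, and the inlined `SCOPE_CARDS` excerpt are hypothetical, and only the field names and values are taken from the YAML.

```python
# Hypothetical excerpt: three cards, with field values copied from scope_cards.yaml above.
SCOPE_CARDS = {
    "alphagenome": {"output_kind": "claim", "generalizes_to_unseen_loci": False},
    "evo2": {"output_kind": "candidate", "generalizes_to_unseen_loci": False},
    "alphafold3": {"output_kind": "claim", "generalizes_to_unseen_loci": True},
}


def gate(oracle: str, locus_in_training: bool) -> dict:
    """Label an oracle call from its scope card (assumed logic, not the real adapter)."""
    card = SCOPE_CARDS[oracle]
    # A model that does not generalize to unseen loci is extrapolating off-distribution.
    extrapolating = (not locus_in_training) and (not card["generalizes_to_unseen_loci"])
    # Only a non-extrapolating "claim" output may be asserted; candidates never are.
    claimable = card["output_kind"] == "claim" and not extrapolating
    return {
        "oracle": oracle,
        "output_kind": card["output_kind"],
        "extrapolating": extrapolating,
        "claimable": claimable,
    }


if __name__ == "__main__":
    for name, seen in [("alphagenome", False), ("evo2", True), ("alphafold3", False)]:
        print(gate(name, seen))
```

Under these assumptions an AlphaGenome call at an unseen locus is labelled extrapolating, while an Evo2 output stays non-claimable even in distribution because its card marks it as a candidate.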

{pen_stack-3.4.0 → pen_stack-4.0.1}/configs/rules/delivery.yaml RENAMED

@@ -29,6 +29,15 @@ rules:
     provenance: { doi: ["10.1089/hum.2017.084"], note: "v3.2 MC2 delivery_constraints scan" }
     test_ref: "tests/unit/test_ws_r.py::test_delivery_controls"
     scope: "labeled heuristic, directional; not a titre predictor"
+  - id: delivery.aav_packaging_margin
+    kind: soft_penalty
+    category: delivery
+    mechanism: "AAV packaging efficiency / titre drops sharply as the cargo approaches the capsid limit (computable from cargo_bp vs vehicle capacity), even while still under capacity (v4.0 delivery-oracle refinement)"
+    evaluator: delivery_aav_packaging
+    param: { margin_frac: 0.9 }
+    provenance: { doi: ["10.1089/hum.2010.245"], note: "AAV genome-size vs packaging-efficiency relationship" }
+    test_ref: "tests/unit/test_ws_atlas.py::test_aav_packaging_margin"
+    scope: "computable efficiency margin, directional; not a titre predictor"
   - id: delivery.immunogenicity_magnitude
     kind: scope_flag
     category: delivery
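
Reviewer note: the new `delivery.aav_packaging_margin` rule is described as computable from `cargo_bp` versus vehicle capacity, with `margin_frac: 0.9`. The sketch below shows what such a check could look like. The real `delivery_aav_packaging` evaluator is not part of this diff, so the function name, the return shape, and the 4700 bp default capacity (a commonly quoted approximate AAV limit) are all assumptions made for illustration.

```python
def aav_packaging_margin(cargo_bp: int, capacity_bp: int = 4700, margin_frac: float = 0.9) -> dict:
    """Flag cargo that fits the vehicle but sits close to its limit (assumed logic).

    capacity_bp=4700 is an illustrative default, not a value taken from the package.
    """
    if capacity_bp <= 0:
        raise ValueError("capacity_bp must be positive")
    fill = cargo_bp / capacity_bp
    over_capacity = fill > 1.0
    # Soft penalty zone: still under capacity, but within the last (1 - margin_frac) of it.
    near_limit = margin_frac <= fill <= 1.0
    return {
        "fill_frac": round(fill, 3),
        "near_limit": near_limit,
        "over_capacity": over_capacity,
    }


if __name__ == "__main__":
    for bp in (4000, 4400, 5000):
        print(bp, aav_packaging_margin(bp))
```

As in the rule's `scope` field, a flag like this is directional only: it marks designs near the limit and does not predict titre.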

pen_stack-4.0.1/docs/oracles.md ADDED

@@ -0,0 +1,51 @@
+# The oracle mesh (v4.0, WS-O)
+PEN-STACK v4.0 sits **on top of** the biomolecular foundation models. `pen_stack.oracles` wraps them under one
+contract so their outputs can be composed, checked by the rule-grounded verifier, and trust-calibrated —
+without losing provenance, native uncertainty, or scope.
+## One contract: `OracleResult`
+Every adapter returns an `OracleResult`:
+```
+OracleResult{oracle, value, provenance{model, version, source, cache_key},
+             native_uncertainty, scope_card, in_scope, extrapolating,
+             output_kind ∈ {claim, candidate, baseline}, available, cached}
+```
+Three invariants are encoded in the type:
+1. **A generative output is a candidate, never a claim.** `output_kind="candidate"` (Evo2 generation, ESM3,
+   RFdiffusion, ProteinMPNN) → `as_claim()` **raises**. A candidate must pass writer-verification (WS-WV)
+   before any claim. (The pen-assemble lesson — 0 validatable de-novo writers — encoded in code.)
+2. **One contract for every oracle.** Provenance (model + version) and the model's *native* uncertainty are
+   always carried; every call is cache-keyed on `(oracle, model, version, inputs)` and replayable offline.
+3. **Scope is explicit.** Each result carries its scope-card id and an `extrapolating` flag; the field's
+   evidence that these models do not generalize to unseen loci is **labelled**, not hidden.
+## Wrapped models (scope cards in `configs/oracles/scope_cards.yaml`)
+| Family | Models | Output kind |
+|---|---|---|
+| `genome` | AlphaGenome (OOD-gated), Evo2 (likelihood=claim / generation=candidate), ChromBPNet·Borzoi (baseline) | claim / candidate / baseline |
+| `structure` | AlphaFold3, Boltz-2, Chai-1, Protenix + `consistency()` | claim |
+| `protein_design` | ESM3, RFdiffusion(-AA), ProteinMPNN·LigandMPNN | **candidate** |
+| `rna` | ViennaRNA (real; hard fold-legality input) | claim |
+| `energetics` | bridge off-target (MC3 gate ≥ 0.77) | claim |
+## Cross-oracle consistency
+`structure.consistency(seq)` runs the available structure predictors and combines them with `consensus()`:
+agreement is a confidence signal, and **disagreement widens the reported interval** (`native_uncertainty`
+grows with the cross-oracle spread) — v4.0 Principle 3.
+## Compute / offline policy
+Heavy backends (AF3, Evo2, ESM3, …) run on-demand (hosted API / local GPU) and are cached + version-pinned
+under `oracle_cache/` (committed for offline CI). When a backend and a cache entry are both absent, the
+adapter returns a **deferred** result (`available=False`) — it never fabricates a value. ViennaRNA and the
+bridge energetics model are real and run locally / on the VM.
+See `docs/writer_verification.md` (scoring/critiquing writers through the mesh), `prereg/ws_o.yaml`, and
+`pen_stack/oracles/`.
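
Reviewer note: two behaviours described in `docs/oracles.md` are easy to misread, namely that `as_claim()` raises for a candidate and that cross-oracle disagreement widens the interval. The sketch below illustrates both with a reduced `OracleResult`. It is a minimal illustration, not the package's implementation: the field subset, the exception type, and the rule "mean native uncertainty plus the population standard deviation of the values" are assumptions chosen only to make the stated behaviour concrete.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class OracleResult:
    """Reduced stand-in for the documented contract (only a subset of its fields)."""
    oracle: str
    value: float | None
    native_uncertainty: float | None
    output_kind: str  # "claim", "candidate" or "baseline"
    available: bool = True

    def as_claim(self) -> OracleResult:
        # Invariant 1: a generative candidate can never be promoted to a claim.
        if self.output_kind == "candidate":
            raise ValueError(f"{self.oracle}: candidate output cannot be used as a claim")
        return self


def consensus(results: list[OracleResult]) -> OracleResult:
    """Combine structure-oracle results; widen uncertainty by their spread (assumed rule)."""
    live = [r for r in results if r.available and r.value is not None]
    if not live:
        # Nothing to combine: return a deferred result instead of inventing a value.
        return OracleResult("consensus", None, None, "claim", available=False)
    values = [r.value for r in live]
    base = mean([r.native_uncertainty or 0.0 for r in live])
    spread = pstdev(values)  # 0.0 when the oracles agree exactly
    return OracleResult("consensus", mean(values), base + spread, "claim")


if __name__ == "__main__":
    agree = [OracleResult("a", 0.8, 0.05, "claim"), OracleResult("b", 0.8, 0.05, "claim")]
    differ = [OracleResult("a", 0.6, 0.05, "claim"), OracleResult("b", 1.0, 0.05, "claim")]
    print(consensus(agree).native_uncertainty, consensus(differ).native_uncertainty)
```

With this rule the combined value is the same in both cases, but the disagreeing pair reports a larger `native_uncertainty`, which is the property the document calls Principle 3.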

pen_stack-4.0.1/docs/writer_verification.md ADDED

@@ -0,0 +1,46 @@
+# Writer verification (v4.0, WS-WV) — the honest "better pen"
+PEN-STACK does **not** invent writer enzymes de-novo. The earlier `pen-assemble` direction produced **0
+validatable de-novo writers** and could not be checked computationally. v4.0 builds the honest alternative:
+**score and critique** proposed or variant writers against measured data — never assert novel function.
+## WV1 — variant scoring (DMS- + structure-grounded)
+`pen_stack.atlas.writer_verify.score_variants(variants)` returns, per variant, a **calibrated activity score**
+in [0, 1] with an interval and a scope flag:
+- A variant present in the **Perry-2025 ISCro4 deep mutational scan** is scored from its *measured* activity
+  Z-score → `claimable=True`, finite interval.
+- A variant **out of the DMS distribution** → `extrapolating=True`, `claimable=False` — a plausibility screen
+  only, **no activity claim**.
+`blind_recovery()` is the deterministic retrospective criterion: ranking the documented panel, the known
+enhancers **N322P / H50K / R278M** land on top, above the measured-worse controls. `real_dms_recovery()`
+reports the same against the full Perry DMS when it is present (on the VM). This is a *catalogue* feature that
+recovers known enhancers — **not** a blind sequence-only predictor, and stated as such.
+## WV2 — critique, not invention
+`critique_candidate(seq, ...)` takes a **generated** candidate writer (e.g. from `oracles.protein_design`,
+which only ever returns `candidate` outputs) and critiques it:
+| Check | Source |
+|---|---|
+| folds? | structure oracle (`oracles.structure.consistency`) — deferred → flagged, never asserted |
+| plausible active site? | retains conserved core residues (heuristic) |
+| deliverable form? | the rule-grounded verifier (delivery rules) |
+| reachable target? | the rule-grounded verifier (reachability rules) |
+It returns `pass`/flags + reasons, and **always** `no_claim=True`, `claimable=False`. A generated writer is
+never returned as "a working new pen." The verifier surfaces this as `Verdict.writer_critique` (a scope flag,
+never a confidence): the legality of the write *plan* and the activity *claim* for the candidate are distinct
+axes — a legal plan does not make an unverified candidate claimable.
+## Scope & honesty
+- Deep DMS exists for **few** enzymes (bridge recombinases); elsewhere WV1 is a labelled plausibility screen.
+- Generative designs are **proposals**; they are scored/critiqued, never asserted active.
+- Structure verification is deferred without an AF3/Boltz/Chai/Protenix backend or a committed cache entry —
+  the candidate is then explicitly *fold-unverified*, not assumed to fold.
+See `docs/oracles.md` (the mesh), `prereg/ws_wv.yaml`, and `pen_stack/atlas/writer_verify.py`.
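
Reviewer note: the "plausible active site?" row in the WV2 table is described only as retaining conserved core residues. The sketch below shows one way such a check could be written. The body of `critique_candidate` is not visible in this diff, so the helper name and return shape are hypothetical; the residue map is copied from `_CORE_RESIDUES` in `writer_verify.py` (0-based positions for D11/E60/D102/D105/S241).

```python
# Copied from _CORE_RESIDUES in the diff: 0-based index -> required amino acid.
CORE_RESIDUES = {10: "D", 59: "E", 101: "D", 104: "D", 240: "S"}


def active_site_check(seq: str, core: dict[int, str] = CORE_RESIDUES) -> dict:
    """Report which catalytic residues a candidate sequence fails to retain (assumed logic)."""
    missing = [
        f"{aa}{idx + 1}"  # report in 1-based residue numbering, e.g. "E60"
        for idx, aa in sorted(core.items())
        if idx >= len(seq) or seq[idx] != aa
    ]
    return {
        "check": "plausible_active_site",
        "pass": not missing,
        "missing": missing,
        # Mirrors the documented invariant: a critique never becomes an activity claim.
        "no_claim": True,
        "claimable": False,
    }


if __name__ == "__main__":
    residues = ["A"] * 300
    for idx, aa in CORE_RESIDUES.items():
        residues[idx] = aa
    print(active_site_check("".join(residues)))
```

Even when this check passes, the result stays `claimable=False`: retaining the catalytic residues is a plausibility heuristic, not evidence of activity.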

{pen_stack-3.4.0 → pen_stack-4.0.1}/pen_stack/__init__.py RENAMED

@@ -1,2 +1,2 @@
 """PEN-STACK v3.0 - open infrastructure for genome writing."""
-__version__ = "3.4.0"
+__version__ = "4.0.1"

pen_stack-4.0.1/pen_stack/atlas/writer_verify.py ADDED

@@ -0,0 +1,167 @@
+"""Writer-verification branch (v4.0, WS-WV) — the honest "better pen".
+We do **not** invent writer enzymes de-novo (pen-assemble produced 0 validatable de-novo writers and could not
+be checked computationally). We **score and critique** proposed/variant writers against measured data:
+* **WV1 — variant scoring head.** Combine the MEASURED DMS effect (Perry-2025 ISCro4 deep mutational scan,
+  via the existing `atlas.variant_propose` model) with structural plausibility (the structure oracle) into a
+  *calibrated* activity score + interval + scope flag. On held-out variants it ranks measured-better above
+  measured-worse above a baseline, and **recovers the known enhanced variants blind** (N322P / H50K / R278M).
+  It asserts **no activity** for a variant lacking measured or in-distribution support.
+* **WV2 — critique, not invention.** A generated candidate writer (from `oracles.protein_design`) is
+  *critiqued* — does it fold? plausible active site? deliverable form? reachable target? — returning
+  pass/flag + reasons; it is **never** returned as "a working new pen" (`no_claim=True`, `claimable=False`).
+When the Perry DMS is absent (off the VM) a small **frozen documented panel** keeps WV1 exercisable and the
+blind-recovery criterion deterministic — labelled as a retrospective panel, never presented as a blind
+sequence predictor.
+"""
+from __future__ import annotations
+import math
+from dataclasses import dataclass
+# A FROZEN retrospective panel of REAL values from the Perry 2025 ISCro4 deep mutational scan
+# (science.adz0276 Table S3, sheet "L2FC_Relative_Z-Scores", column Z_Score_wrt_WT) — the three top-ranked
+# enhancers, three near-neutral variants, and the three most-deleterious variants, copied verbatim from the
+# measured table. Used only when the full Perry DMS is absent (offline/CI); never fabricated.
+_FROZEN_DMS_Z = {
+    "N322P": 0.754, "H50K": 0.742, "R278M": 0.709,      # top-3 enhancers (measured Z, ranks 1-3)
+    "V21R": -0.000, "S312Q": -0.001, "G286T": -0.001,   # near-neutral (|Z| ~ 0)
+    "R132E": -5.400, "R137E": -5.115, "R195D": -4.984,  # most-deleterious (measured worst by Z)
+}
+KNOWN_ISCRO4_ENHANCERS = ["N322P", "H50K", "R278M"]
+_WORSE_CONTROLS = ["R132E", "R137E", "R195D"]            # measured-worst variants (Perry Table S3)
+# the catalytic residues a plausible ISCro4-family candidate must retain (Perry Table S3, sheet
+# "Residue Groups", Catalytic_Residues == "Catalytic"): D11, E60, D102, D105, S241 (1-based) -> 0-based below
+_CORE_RESIDUES = {10: "D", 59: "E", 101: "D", 104: "D", 240: "S"}
+def _sigmoid(x: float) -> float:
+    return 1.0 / (1.0 + math.exp(-x))
+@dataclass
+class VariantScore:
+    variant: str
+    effect: float | None          # measured/predicted activity effect (higher = better)
+    score: float | None           # calibrated activity score in [0,1]
+    interval: tuple[float, float] | None
+    in_dms: bool                  # backed by measured DMS
+    extrapolating: bool           # out of DMS distribution
+    claimable: bool               # may an activity claim be made? (only with measured/in-dist support)
+    note: str
+def _dms_lookup():
+    """Return (model_name, {variant: z}) — the real Perry DMS if present, else the frozen panel (labelled)."""
+    try:
+        from pen_stack.atlas.variant_propose import DMSVariantEffectModel
+        m = DMSVariantEffectModel()
+        return m.name, m._z
+    except Exception:  # noqa: BLE001 - Perry tables absent off the VM
+        return "frozen_retrospective_panel", dict(_FROZEN_DMS_Z)
+def score_variants(variants: list[str], structure_uncertainty: float | None = None) -> list[VariantScore]:
+    """Calibrated activity score per variant. Measured (DMS) variants are claimable with a tight interval;
+    unmeasured variants are flagged extrapolating and are NOT claimable (no activity asserted)."""
+    model_name, z = _dms_lookup()
+    out: list[VariantScore] = []
+    # interval half-width: wider when the structure oracle is uncertain/deferred (no structural support)
+    su = 0.25 if structure_uncertainty is None else float(structure_uncertainty)
+    half = 0.10 + 0.5 * su
+    for v in variants:
+        if v in z:
+            eff = float(z[v])
+            score = _sigmoid(eff)                              # monotone map of the measured Z-score
+            lo, hi = max(0.0, score - half), min(1.0, score + half)
+            out.append(VariantScore(v, eff, round(score, 3), (round(lo, 3), round(hi, 3)),
+                                    in_dms=True, extrapolating=False, claimable=True,
+                                    note=f"measured DMS effect ({model_name})"))
+        else:
+            out.append(VariantScore(v, None, None, None, in_dms=False, extrapolating=True, claimable=False,
+                                    note="OUT of DMS distribution — plausibility screen only, NO activity "
+                                         "claim (v4.0 WS-WV)"))
+    return out
+def blind_recovery(top_k: int = 5) -> dict:
+    """Deterministic blind-validation over the FROZEN documented panel (the same published enhancers,
+    measured-neutral, and measured-worse controls): the known enhancers must rank on top, above the
+    measured-worse variants. This is a retrospective catalogue criterion, NOT a blind sequence predictor —
+    labelled as such. (The full-Perry-DMS recovery is reported separately by `real_dms_recovery`.)"""
+    scores = {v: _sigmoid(zz) for v, zz in _FROZEN_DMS_Z.items()}
+    ranked = sorted(scores, key=scores.get, reverse=True)
+    top = ranked[:top_k]
+    recovered = {e: (e in top) for e in KNOWN_ISCRO4_ENHANCERS}
+    enh_min = min(scores[e] for e in KNOWN_ISCRO4_ENHANCERS)
+    worse_max = max(scores[w] for w in _WORSE_CONTROLS)
+    return {"available": True, "model": "frozen_retrospective_panel", "n": len(_FROZEN_DMS_Z), "top_k": top_k,
+            "top": top, "recovered": recovered, "all_enhancers_recovered": all(recovered.values()),
+            "enhancers_outrank_worse": bool(enh_min > worse_max),
+            "note": "recovers KNOWN enhancers (N322P/H50K/R278M) above measured-worse controls — a "
+                    "retrospective catalogue criterion, NOT a blind sequence-only predictor."}
+def real_dms_recovery(top: int = 20) -> dict:
+    """Recovery against the FULL Perry-2025 ISCro4 DMS via the existing validated harness; deferred (and the
+    frozen panel stands in) when the Perry tables are absent off the VM."""
+    try:
+        from pen_stack.atlas.variant_propose import iscro4_dms_recovery
+        rep = iscro4_dms_recovery(top=top)
+        if rep.get("available", True) is not False and "recovered" in rep:
+            return {"available": True, **rep}
+    except Exception:  # noqa: BLE001
+        pass
+    return {"available": False, "note": "Perry 2025 DMS absent (runs on the VM); see blind_recovery (frozen)"}
+def critique_candidate(candidate_seq: str, writer_family: str = "bridge_IS110",
+                       delivery_vehicle: str | None = None, no_integration: bool = False,
+                       site_seq: str | None = None) -> dict:
+    """Critique a GENERATED candidate writer (WV2) — folds? plausible active site? deliverable? reachable? —
+    returning pass/flag + reasons. NEVER returns 'a working new pen' (no_claim=True, claimable=False)."""
+    flags, reasons = [], []
+    # 1. structural plausibility (structure oracle; deferred without a backend -> flagged, not asserted)
+    from pen_stack.oracles import structure
+    st = structure.consistency(candidate_seq)
+    fold_ok = bool(st.available and st.value is not None and float(st.value) >= 0.7)
+    if not st.available:
+        flags.append("fold_unverified")
+        reasons.append("structure oracle deferred (no AF3/Boltz/Chai/Protenix backend or cache) — fold not "
+                       "verified; candidate cannot be claimed to fold")
+    # 2. active-site plausibility (heuristic: retains conserved core residues)
+    active_site_ok = all(0 <= i < len(candidate_seq) and candidate_seq[i] == aa
+                         for i, aa in _CORE_RESIDUES.items())
+    if not active_site_ok:
+        flags.append("active_site_implausible")
+        reasons.append("candidate does not retain the conserved core residues expected of the writer family")
+    # 3. deliverability + 4. reachability — reuse the rule-grounded verifier where inputs are present
+    deliverable = reachable = None
+    if delivery_vehicle or site_seq:
+        from pen_stack.verify import verify
+        v = verify(dict(write_type="insertion", writer_family=writer_family, site_seq=site_seq,
+                        delivery_vehicle=delivery_vehicle, no_integration=no_integration))
+        named = [x["rule_id"] for x in v.violations]
+        deliverable = not any(r.startswith("delivery.") for r in named)
+        reachable = not any(r.startswith("reachability.") for r in named)
+        if not deliverable:
+            flags.append("not_deliverable")
+            reasons.append("; ".join(x["reason"] for x in v.violations if x["rule_id"].startswith("delivery.")))
+        if not reachable:
+            flags.append("not_reachable")
+            reasons.append("; ".join(x["reason"] for x in v.violations if x["rule_id"].startswith("reachability.")))
+    passed = active_site_ok and fold_ok and (deliverable is not False) and (reachable is not False)
+    return {
+        "writer_family": writer_family, "fold_ok": fold_ok, "active_site_ok": active_site_ok,
+        "deliverable": deliverable, "reachable": reachable, "pass": bool(passed), "flags": flags,
+        "reasons": reasons,
+        "no_claim": True, "claimable": False,             # WV2 NEVER asserts a generated writer works
+        "note": "critique only — a generated writer is scored/critiqued against structure + rules, never "
+                "returned as a working new pen (v4.0 Principle 1 + the pen-assemble lesson).",
+    }
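
The interval arithmetic in `score_variants` above can be read in isolation. The following is a minimal sketch, not the package's code: it assumes `_sigmoid` is the standard logistic function (its definition does not appear in this hunk), and `sigmoid` and `score_with_interval` are hypothetical names introduced only for illustration. The Z-scores are the N322P and R132E values quoted in the changelog.

```python
import math
from typing import Optional, Tuple


def sigmoid(x: float) -> float:
    """Standard logistic function (an assumption; the package's `_sigmoid` is not shown)."""
    return 1.0 / (1.0 + math.exp(-x))


def score_with_interval(
    z: float, structure_uncertainty: Optional[float] = None
) -> Tuple[float, Tuple[float, float]]:
    """Map a measured Z-score to a score in (0, 1) plus a clipped interval.

    Mirrors the arithmetic in the hunk: the half-width is 0.10 plus half the
    structure uncertainty, with 0.25 used when no structure estimate is given.
    """
    su = 0.25 if structure_uncertainty is None else float(structure_uncertainty)
    half = 0.10 + 0.5 * su
    score = sigmoid(z)
    lo, hi = max(0.0, score - half), min(1.0, score + half)
    return round(score, 3), (round(lo, 3), round(hi, 3))


# Same variant, with and without structural support: the point score is
# unchanged, only the interval width differs.
deferred = score_with_interval(0.754)                              # no structure estimate
supported = score_with_interval(0.754, structure_uncertainty=0.0)  # fully confident structure

# A strongly deleterious variant: the lower bound is clipped at 0.0.
deleterious = score_with_interval(-5.40)
```

With these inputs the deferred-structure interval is wider than the supported one, which is the behaviour the in-code comment describes: missing structural support costs interval width, not score.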

pen_stack-4.0.1/pen_stack/oracles/__init__.py ADDED

@@ -0,0 +1,65 @@
+"""The L1 oracle mesh (v4.0, WS-O) — one contract over the biomolecular foundation models.
+`pen_stack.oracles` wraps AlphaGenome / Evo2 / structure predictors (AF3, Boltz-2, Chai-1, Protenix) /
+protein-design models (ESM3, RFdiffusion, ProteinMPNN) / ViennaRNA / the bridge energetics model under a
+single `OracleResult` contract (value + provenance + native uncertainty + scope card). Heavy backends run
+on-demand (hosted API / local GPU) and are cached + version-pinned; when a backend is absent the adapter
+returns a *deferred* result (available=False) — the core stays runnable offline from cache.
+"""
+from __future__ import annotations
+from typing import Any
+from pen_stack.oracles.cache import cache_get, cache_key, cache_put, scope_card
+from pen_stack.oracles.schema import OracleResult, Provenance
+__all__ = ["OracleResult", "Provenance", "build_result", "consensus", "assert_claimable",
+           "cache_get", "cache_key", "cache_put", "scope_card"]
+def build_result(oracle: str, model: str, *, value: Any = None, inputs: dict | None = None,
+                 native_uncertainty: float | None = None, available: bool = True, cached: bool = False,
+                 source: str = "adapter", extrapolating: bool = False, in_scope: bool = True,
+                 output_kind: str | None = None, note: str | None = None) -> OracleResult:
+    """Assemble an `OracleResult`, filling version / output_kind / scope from the model's scope card.
+    `output_kind` defaults to the scope card's, but may be overridden per call (e.g. an Evo2 *likelihood*
+    score is a claim-scope scalar, while an Evo2 *generated sequence* is a candidate)."""
+    card = scope_card(model) or {}
+    key = cache_key(oracle, model, card.get("version", "0"), inputs or {})
+    return OracleResult(
+        oracle=oracle, value=value,
+        provenance=Provenance(model=model, version=card.get("version", "0"), source=source, cache_key=key),
+        native_uncertainty=native_uncertainty,
+        scope_card=model, in_scope=in_scope, extrapolating=extrapolating,
+        output_kind=output_kind or card.get("output_kind", "claim"),
+        available=available, cached=cached, note=note)
+def assert_claimable(result: OracleResult) -> OracleResult:
+    """Guard: a generative candidate cannot enter a claim path without writer-verification (Principle 1)."""
+    return result.as_claim()
+def consensus(results: list[OracleResult], oracle: str = "structure") -> OracleResult:
+    """Cross-oracle self-consistency (v4.0 Principle 3): agreement is a confidence signal, **divergence
+    widens the interval**. Combines redundant numeric oracles (e.g. AF3 / Boltz-2 / Chai-1 / Protenix); the
+    reported native_uncertainty is increased by the spread across the available oracles."""
+    avail = [r for r in results if r.available and isinstance(r.value, (int, float))]
+    if not avail:
+        return build_result(oracle, "consensus", available=False,
+                            note="no available numeric oracle to combine")
+    vals = [float(r.value) for r in avail]
+    mean = sum(vals) / len(vals)
+    spread = (max(vals) - min(vals)) if len(vals) > 1 else 0.0
+    base_unc = max((r.native_uncertainty or 0.0) for r in avail)
+    card = scope_card("boltz-2") or {}
+    return OracleResult(
+        oracle=oracle, value=round(mean, 4),
+        provenance=Provenance(model="consensus", version=card.get("version", "0"), source="adapter",
+                              extra={"members": [r.provenance.model for r in avail], "spread": round(spread, 4)}),
+        # disagreement widens the interval: native uncertainty + half the cross-oracle spread
+        native_uncertainty=round(base_unc + 0.5 * spread, 4),
+        scope_card="boltz-2", in_scope=all(r.in_scope for r in avail),
+        extrapolating=any(r.extrapolating for r in avail), output_kind="claim",
+        available=True, note=f"consensus of {len(avail)} oracles; spread {round(spread, 4)} widens the interval")
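
The widening rule in `consensus` above (largest native uncertainty plus half the cross-oracle spread) can be sketched without the package's types. This is an illustrative sketch under stated assumptions: `Reading` and `combine` are hypothetical stand-ins for `OracleResult` and `consensus`, the oracle names are placeholders, and the numeric values are invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Reading:
    """Hypothetical stand-in for one oracle output (not the package's OracleResult)."""
    model: str
    value: Optional[float]
    native_uncertainty: Optional[float] = None
    available: bool = True


def combine(readings: List[Reading]) -> Optional[dict]:
    """Average the available numeric readings and widen uncertainty by their spread."""
    avail = [r for r in readings if r.available and isinstance(r.value, (int, float))]
    if not avail:
        return None  # nothing to combine; the hunk returns a deferred result here
    vals = [float(r.value) for r in avail]
    mean = sum(vals) / len(vals)
    spread = (max(vals) - min(vals)) if len(vals) > 1 else 0.0
    base_unc = max((r.native_uncertainty or 0.0) for r in avail)
    return {
        "value": round(mean, 4),
        "spread": round(spread, 4),
        # disagreement widens the interval: largest native uncertainty + half the spread
        "uncertainty": round(base_unc + 0.5 * spread, 4),
        "members": [r.model for r in avail],
    }


# Two oracles that agree: uncertainty stays at the larger native value.
agree = combine([Reading("oracle_a", 0.80, 0.05), Reading("oracle_b", 0.80, 0.05)])

# Two oracles that disagree, plus one deferred backend that is skipped.
disagree = combine([
    Reading("oracle_a", 0.90, 0.05),
    Reading("oracle_b", 0.70, 0.10),
    Reading("oracle_c", None, available=False),
])
```

Here the agreeing pair keeps the native uncertainty, while the disagreeing pair reports a larger uncertainty for the same mean, and the deferred reading is excluded from `members`.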
