PyPI - csp5 - Versions diffs - 0.2.5__tar.gz - Mend

csp5 0.2.5__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (65)

csp5-0.2.5/PKG-INFO ADDED

@@ -0,0 +1,197 @@
+Metadata-Version: 2.4
+Name: csp5
+Version: 0.2.5
+Summary: CSP5: pip-installable CASCADE NMR predictor (13C + 1H baselines).
+Author: Benji Rowlands
+Requires-Python: >=3.10
+Description-Content-Type: text/markdown
+Requires-Dist: numpy>=1.24
+Requires-Dist: pandas>=2.0
+Requires-Dist: pyarrow>=12
+Requires-Dist: scipy>=1.13
+Requires-Dist: scikit-learn>=1.6
+Requires-Dist: tqdm>=4.65
+Requires-Dist: rdkit>=2023.9
+Requires-Dist: torch>=2.2
+# CSP5
+`CSP5` is a pip-installable CASCADE predictor package with:
+- batched `13C` and `1H` prediction
+- prediction from precomputed geometries (no re-embedding)
+- shift matching utilities with `dp` (default), `scipy`, and `murty` (k-best)
+Bundled defaults:
+- 13C model: `CSP5 base (13C)` (`model_id`: `csp5-base-13c`)
+- 1H model: `CSP5 base (1H)` (`model_id`: `csp5-base-1h`)
+## Install
+```bash
+pip install CSP5
+```
+## Prediction CLI
+In interactive terminals, `csp5` prints status lines to `stderr` before
+and after prediction. If a run is slow, it prints an additional note that first
+invocation can take longer while dependencies and model weights initialize, plus
+periodic "still working" updates during long runs. Use `--no-status` to silence
+them.
+### From SMILES
+```bash
+csp5 --smiles "CCO" --nucleus 1H
+csp5 --smiles-file smiles.txt --nucleus 13C --batch-size 64
+```
+### From precomputed geometries (parquet structures dataset)
+Input dataset requirements:
+- required columns: `smiles`, `molblock`
+- optional columns: `conformer_rank`, `conformer_id`, `energy`, `energy_method`
+Predict only rank-0 conformers:
+```bash
+csp5 \
+  --structures-path /path/to/structures.parquet \
+  --conformer-rank 0 \
+  --nucleus 1H \
+  --batch-size 64
+```
+Predict using all conformers in the dataset:
+```bash
+csp5 \
+  --structures-path /path/to/structures.parquet \
+  --use-all-conformers \
+  --nucleus 13C
+```
+## Prediction Python API
+```python
+from csp5 import predict_smiles, predict_structures, predict_sdf
+# Standard SMILES mode
+res = predict_smiles(["CCO", "c1ccccc1"], nucleus="1H", batch_size=32)
+print(res.predictions.head())
+# Precomputed-geometry parquet mode
+res2 = predict_structures(
+    "/path/to/structures.parquet",
+    nucleus="1H",
+    conformer_rank=0,
+    use_all_conformers=False,
+)
+# Precomputed-geometry SDF mode
+res3 = predict_sdf("/path/to/embedded.sdf", nucleus="13C")
+```
+## Matching CLI
+`csp5-match` expects one shift per line in each file.
+### Default fast path (`dp`)
+```bash
+csp5-match \
+  --predicted-file predicted.txt \
+  --experimental-file experimental.txt \
+  --solver dp
+```
+### SciPy Hungarian option
+```bash
+csp5-match \
+  --predicted-file predicted.txt \
+  --experimental-file experimental.txt \
+  --solver scipy
+```
+### Murty k-best option
+```bash
+csp5-match \
+  --predicted-file predicted.txt \
+  --experimental-file experimental.txt \
+  --solver murty \
+  --k-best-policy clip \
+  --k-best 25 \
+  --temperature 0.5 \
+  --mae-delta-threshold 0.2
+```
+## Matching Python API
+```python
+from csp5 import match_shifts
+pred = [7.35, 7.30, 1.25]
+exp = [7.34, 7.31, 1.20]
+# DP (default)
+r1 = match_shifts(pred, exp, solver="dp")
+# SciPy Hungarian
+r2 = match_shifts(pred, exp, solver="scipy")
+# Murty k-best
+r3 = match_shifts(pred, exp, solver="murty", k_best=10, k_best_policy="clip")
+print(r3.assignment_entropy, r3.num_competing_assignments)
+```
+## Solver Notes
+- `dp` is the default and is intended for the standard 1D shift objective.
+- `scipy` uses Hungarian assignment on the full padded cost matrix.
+- `murty` is the k-best solver; use this when you need assignment ambiguity analysis.
+- For `murty`, `k_best_policy="clip"` (default) returns all feasible unique assignments
+  when `k_best` is larger than what exists. Use `k_best_policy="strict"` to fail instead.
+- `dp` and `scipy` are top-1 only (`k_best` must be `1`).
+## Output Notes
+- Prediction failures are returned explicitly (`failures`) with reason tags.
+- Prediction output always includes `nucleus`, `model_id`, and `model_name`.
+- For structures-mode predictions, conformer metadata columns are propagated when available.
+## Release
+### Local macOS wheel build
+From repo root:
+```bash
+cd deploy/CSP5
+rm -rf dist build *.egg-info
+MACOSX_DEPLOYMENT_TARGET=11.0 uvx --from build pyproject-build --wheel
+uvx --from twine twine check dist/*
+uvx --from twine twine upload --repository pypi --skip-existing dist/*.whl
+```
+`MACOSX_DEPLOYMENT_TARGET=11.0` keeps wheel tags broadly compatible (for example,
+`macosx_11_0_arm64`) instead of pinning to the host macOS version.
+### Cross-platform publishing (Linux + macOS)
+Use GitHub Actions workflow:
+- file: `.github/workflows/release-csp5.yml`
+- trigger:
+  - push a tag like `csp5-v0.2.5` (build + publish), or
+  - run manually with `publish=true`
+- required repo secret: `PYPI_API_TOKEN`
+The workflow builds:
+- Linux manylinux x86_64 wheels for Python 3.10, 3.11, 3.12, and 3.13
+- macOS arm64 wheels for Python 3.10, 3.11, 3.12, and 3.13
+- one source distribution (sdist)
+Then it uploads all artifacts to PyPI in one step.

csp5-0.2.5/README.md ADDED

@@ -0,0 +1,181 @@
+# CSP5
+`CSP5` is a pip-installable CASCADE predictor package with:
+- batched `13C` and `1H` prediction
+- prediction from precomputed geometries (no re-embedding)
+- shift matching utilities with `dp` (default), `scipy`, and `murty` (k-best)
+Bundled defaults:
+- 13C model: `CSP5 base (13C)` (`model_id`: `csp5-base-13c`)
+- 1H model: `CSP5 base (1H)` (`model_id`: `csp5-base-1h`)
+## Install
+```bash
+pip install CSP5
+```
+## Prediction CLI
+In interactive terminals, `csp5` prints status lines to `stderr` before
+and after prediction. If a run is slow, it prints an additional note that first
+invocation can take longer while dependencies and model weights initialize, plus
+periodic "still working" updates during long runs. Use `--no-status` to silence
+them.
+### From SMILES
+```bash
+csp5 --smiles "CCO" --nucleus 1H
+csp5 --smiles-file smiles.txt --nucleus 13C --batch-size 64
+```
+### From precomputed geometries (parquet structures dataset)
+Input dataset requirements:
+- required columns: `smiles`, `molblock`
+- optional columns: `conformer_rank`, `conformer_id`, `energy`, `energy_method`
+Predict only rank-0 conformers:
+```bash
+csp5 \
+  --structures-path /path/to/structures.parquet \
+  --conformer-rank 0 \
+  --nucleus 1H \
+  --batch-size 64
+```
+Predict using all conformers in the dataset:
+```bash
+csp5 \
+  --structures-path /path/to/structures.parquet \
+  --use-all-conformers \
+  --nucleus 13C
+```
+## Prediction Python API
+```python
+from csp5 import predict_smiles, predict_structures, predict_sdf
+# Standard SMILES mode
+res = predict_smiles(["CCO", "c1ccccc1"], nucleus="1H", batch_size=32)
+print(res.predictions.head())
+# Precomputed-geometry parquet mode
+res2 = predict_structures(
+    "/path/to/structures.parquet",
+    nucleus="1H",
+    conformer_rank=0,
+    use_all_conformers=False,
+)
+# Precomputed-geometry SDF mode
+res3 = predict_sdf("/path/to/embedded.sdf", nucleus="13C")
+```
+## Matching CLI
+`csp5-match` expects one shift per line in each file.
+### Default fast path (`dp`)
+```bash
+csp5-match \
+  --predicted-file predicted.txt \
+  --experimental-file experimental.txt \
+  --solver dp
+```
+### SciPy Hungarian option
+```bash
+csp5-match \
+  --predicted-file predicted.txt \
+  --experimental-file experimental.txt \
+  --solver scipy
+```
+### Murty k-best option
+```bash
+csp5-match \
+  --predicted-file predicted.txt \
+  --experimental-file experimental.txt \
+  --solver murty \
+  --k-best-policy clip \
+  --k-best 25 \
+  --temperature 0.5 \
+  --mae-delta-threshold 0.2
+```
+## Matching Python API
+```python
+from csp5 import match_shifts
+pred = [7.35, 7.30, 1.25]
+exp = [7.34, 7.31, 1.20]
+# DP (default)
+r1 = match_shifts(pred, exp, solver="dp")
+# SciPy Hungarian
+r2 = match_shifts(pred, exp, solver="scipy")
+# Murty k-best
+r3 = match_shifts(pred, exp, solver="murty", k_best=10, k_best_policy="clip")
+print(r3.assignment_entropy, r3.num_competing_assignments)
+```
+## Solver Notes
+- `dp` is the default and is intended for the standard 1D shift objective.
+- `scipy` uses Hungarian assignment on the full padded cost matrix.
+- `murty` is the k-best solver; use this when you need assignment ambiguity analysis.
+- For `murty`, `k_best_policy="clip"` (default) returns all feasible unique assignments
+  when `k_best` is larger than what exists. Use `k_best_policy="strict"` to fail instead.
+- `dp` and `scipy` are top-1 only (`k_best` must be `1`).
+## Output Notes
+- Prediction failures are returned explicitly (`failures`) with reason tags.
+- Prediction output always includes `nucleus`, `model_id`, and `model_name`.
+- For structures-mode predictions, conformer metadata columns are propagated when available.
+## Release
+### Local macOS wheel build
+From repo root:
+```bash
+cd deploy/CSP5
+rm -rf dist build *.egg-info
+MACOSX_DEPLOYMENT_TARGET=11.0 uvx --from build pyproject-build --wheel
+uvx --from twine twine check dist/*
+uvx --from twine twine upload --repository pypi --skip-existing dist/*.whl
+```
+`MACOSX_DEPLOYMENT_TARGET=11.0` keeps wheel tags broadly compatible (for example,
+`macosx_11_0_arm64`) instead of pinning to the host macOS version.
+### Cross-platform publishing (Linux + macOS)
+Use GitHub Actions workflow:
+- file: `.github/workflows/release-csp5.yml`
+- trigger:
+  - push a tag like `csp5-v0.2.5` (build + publish), or
+  - run manually with `publish=true`
+- required repo secret: `PYPI_API_TOKEN`
+The workflow builds:
+- Linux manylinux x86_64 wheels for Python 3.10, 3.11, 3.12, and 3.13
+- macOS arm64 wheels for Python 3.10, 3.11, 3.12, and 3.13
+- one source distribution (sdist)
+Then it uploads all artifacts to PyPI in one step.
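
Reviewer note: the README's "Matching CLI" section says `csp5-match` expects one shift per line in each file. The sketch below shows one way to produce and re-read such files from Python lists. The helper names `write_shift_file` and `read_shift_file` are hypothetical and are not part of the package. The four-decimal formatting is an assumption, since the README only specifies "one shift per line".

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def write_shift_file(path: Path, shifts: Iterable[float]) -> None:
    """Write one chemical shift per line (hypothetical helper, not a csp5 API)."""
    # Four decimals is an assumed precision; the README does not prescribe one.
    text = "".join(f"{float(s):.4f}\n" for s in shifts)
    Path(path).write_text(text, encoding="utf-8")


def read_shift_file(path: Path) -> List[float]:
    """Read the file back, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [float(line) for line in lines if line.strip()]


# Round-trip the predicted shifts used in the README's Python example.
with tempfile.TemporaryDirectory() as tmp:
    predicted = Path(tmp) / "predicted.txt"
    write_shift_file(predicted, [7.35, 7.30, 1.25])
    round_trip = read_shift_file(predicted)
```

Files written this way could then be passed to `--predicted-file` and `--experimental-file` as in the README's CLI examples.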

csp5-0.2.5/pyproject.toml ADDED

@@ -0,0 +1,45 @@
+[build-system]
+requires = ["setuptools>=68", "wheel"]
+build-backend = "setuptools.build_meta"
+[project]
+name = "csp5"
+version = "0.2.5"
+description = "CSP5: pip-installable CASCADE NMR predictor (13C + 1H baselines)."
+readme = "README.md"
+requires-python = ">=3.10"
+authors = [{ name = "Benji Rowlands" }]
+dependencies = [
+  "numpy>=1.24",
+  "pandas>=2.0",
+  "pyarrow>=12",
+  "scipy>=1.13",
+  "scikit-learn>=1.6",
+  "tqdm>=4.65",
+  "rdkit>=2023.9",
+  "torch>=2.2",
+]
+[project.scripts]
+csp5 = "csp5.cli:main"
+csp5-match = "csp5.matching_cli:main"
+[tool.setuptools]
+package-dir = {"" = "src"}
+include-package-data = true
+[tool.setuptools.packages.find]
+where = ["src"]
+[tool.setuptools.package-data]
+"csp5" = [
+  "models/**/*.pt",
+  "_runtime/Predict_SMILES_FF/preprocessor_orig.p",
+  "_runtime/Predict_SMILES_FF/modules/**/*.py",
+  "_runtime/Predict_SMILES_FF/torch_model.py",
+  "_native/*.so",
+  "_native/src/matching_dp.cpp",
+  "_native/src/fastmurty/*.c",
+  "_native/src/fastmurty/*.h",
+  "_native/src/fastmurty/LICENSE",
+]

csp5-0.2.5/setup.cfg ADDED

@@ -0,0 +1,4 @@
+[egg_info]
+tag_build =
+tag_date = 0

csp5-0.2.5/setup.py ADDED

@@ -0,0 +1,86 @@
+from __future__ import annotations
+import os
+import subprocess
+import sys
+from pathlib import Path
+from setuptools import Distribution, setup
+from setuptools.command.build_py import build_py as _build_py
+class BinaryDistribution(Distribution):
+    def has_ext_modules(self) -> bool:  # pragma: no cover
+        return True
+class build_py(_build_py):
+    def run(self) -> None:
+        super().run()
+        self._build_native_shared_libs()
+    def _build_native_shared_libs(self) -> None:
+        if os.name == "nt":
+            raise RuntimeError("Building CSP5 native matching backends on Windows is unsupported")
+        pkg_native_dir = Path(self.build_lib) / "csp5" / "_native"
+        pkg_native_dir.mkdir(parents=True, exist_ok=True)
+        project_root = Path(__file__).resolve().parent
+        native_src = project_root / "src" / "csp5" / "_native" / "src"
+        dp_src = native_src / "matching_dp.cpp"
+        fastmurty_dir = native_src / "fastmurty"
+        dp_out = pkg_native_dir / "libmatching_dp.so"
+        murty_out = pkg_native_dir / "mhtda.so"
+        cxx = os.environ.get("CXX", "g++")
+        cc = os.environ.get("CC", "gcc")
+        dp_cmd = [
+            cxx,
+            "-O3",
+            "-std=c++17",
+            "-fPIC",
+            "-shared",
+            "-o",
+            str(dp_out),
+            str(dp_src),
+        ]
+        fastmurty_sources = [
+            fastmurty_dir / "subproblem.c",
+            fastmurty_dir / "queue.c",
+            fastmurty_dir / "sspDense.c",
+            fastmurty_dir / "sspSparse.c",
+            fastmurty_dir / "murtysplitDense.c",
+            fastmurty_dir / "murtysplitSparse.c",
+            fastmurty_dir / "da.c",
+        ]
+        murty_cmd = [
+            cc,
+            "-O3",
+            "-fPIC",
+            "-shared",
+            "-DSPARSE",
+            "-DNDEBUG",
+            "-o",
+            str(murty_out),
+            *[str(path) for path in fastmurty_sources],
+        ]
+        self.announce("Building native DP backend", level=2)
+        subprocess.check_call(dp_cmd)
+        self.announce("Building native Murty backend", level=2)
+        subprocess.check_call(murty_cmd)
+        if not dp_out.exists():
+            raise RuntimeError(f"Failed to build native DP backend: {dp_out}")
+        if not murty_out.exists():
+            raise RuntimeError(f"Failed to build native Murty backend: {murty_out}")
+setup(
+    distclass=BinaryDistribution,
+    cmdclass={"build_py": build_py},
+)
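
Reviewer note: the `setup.py` above shells out to the compilers named by `CXX` and `CC` (defaulting to `g++` and `gcc`) and only fails once `subprocess.check_call` raises. The sketch below is a possible pre-flight check that reports missing compilers before any build step runs. `resolve_compiler` and `missing_compilers` are hypothetical helpers, not part of the package. It also assumes each variable holds a bare executable name rather than a command with arguments.

```python
import os
import shutil
from typing import List, Mapping, Optional


def resolve_compiler(
    env_var: str, default: str, environ: Optional[Mapping[str, str]] = None
) -> str:
    """Mirror the os.environ.get(VAR, default) lookup used in setup.py."""
    environ = os.environ if environ is None else environ
    return environ.get(env_var, default)


def missing_compilers(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return 'VAR=executable' entries for compilers not found on PATH."""
    wanted = {
        "CXX": resolve_compiler("CXX", "g++", environ),
        "CC": resolve_compiler("CC", "gcc", environ),
    }
    # shutil.which returns None when the executable cannot be located.
    return [f"{var}={exe}" for var, exe in wanted.items() if shutil.which(exe) is None]
```

A build hook could call `missing_compilers()` first and raise a single descriptive error when the returned list is non-empty.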

csp5-0.2.5/src/csp5/__init__.py ADDED

@@ -0,0 +1,52 @@
+"""CSP5 public API.
+This module keeps top-level imports lazy so entrypoints like ``csp5``
+avoid importing optional matching dependencies unless needed.
+"""
+from __future__ import annotations
+from importlib import import_module
+from typing import TYPE_CHECKING, Dict, Tuple
+if TYPE_CHECKING:
+    from .api import PredictionResult, predict_mols, predict_sdf, predict_smiles, predict_structures
+    from .matching import MatchingResult, RankedAssignment, match_shifts
+__all__ = [
+    "PredictionResult",
+    "MatchingResult",
+    "RankedAssignment",
+    "predict_smiles",
+    "predict_mols",
+    "predict_structures",
+    "predict_sdf",
+    "match_shifts",
+]
+_EXPORT_MAP: Dict[str, Tuple[str, str]] = {
+    "PredictionResult": ("csp5.api", "PredictionResult"),
+    "predict_smiles": ("csp5.api", "predict_smiles"),
+    "predict_mols": ("csp5.api", "predict_mols"),
+    "predict_structures": ("csp5.api", "predict_structures"),
+    "predict_sdf": ("csp5.api", "predict_sdf"),
+    "MatchingResult": ("csp5.matching", "MatchingResult"),
+    "RankedAssignment": ("csp5.matching", "RankedAssignment"),
+    "match_shifts": ("csp5.matching", "match_shifts"),
+}
+def __getattr__(name: str):
+    if name not in _EXPORT_MAP:
+        raise AttributeError(f"module 'csp5' has no attribute {name!r}")
+    module_name, attr_name = _EXPORT_MAP[name]
+    module = import_module(module_name)
+    value = getattr(module, attr_name)
+    globals()[name] = value
+    return value
+def __dir__() -> list[str]:
+    return sorted(set(globals()) | set(__all__))
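
Reviewer note: the `__init__.py` above relies on a module-level `__getattr__` (PEP 562) to defer imports until an exported name is first accessed, then caches the value in the module namespace. The sketch below reproduces that pattern on a throwaway module object so it can be run without installing the package. The module name `lazy_demo`, the factory `make_lazy_module`, and the export map targeting `math` and `json` are illustrative assumptions only.

```python
import types
from importlib import import_module
from typing import Dict, Tuple

# Illustrative export map: public name -> (module to import, attribute to fetch).
_DEMO_EXPORTS: Dict[str, Tuple[str, str]] = {
    "sqrt": ("math", "sqrt"),
    "dumps": ("json", "dumps"),
}


def make_lazy_module(name: str, export_map: Dict[str, Tuple[str, str]]) -> types.ModuleType:
    """Build a module whose exports are imported on first attribute access."""
    mod = types.ModuleType(name)

    def __getattr__(attr: str):
        # Only called when normal attribute lookup on the module fails.
        if attr not in export_map:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        module_name, attr_name = export_map[attr]
        value = getattr(import_module(module_name), attr_name)
        setattr(mod, attr, value)  # cache so later lookups bypass __getattr__
        return value

    mod.__getattr__ = __getattr__
    return mod


lazy = make_lazy_module("lazy_demo", _DEMO_EXPORTS)
```

Before the first access, `"sqrt"` is absent from `vars(lazy)`; after `lazy.sqrt(9.0)` it is cached there, which parallels the `globals()[name] = value` line in the diff.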

csp5-0.2.5/src/csp5/_native/__init__.py ADDED

@@ -0,0 +1,6 @@
+"""Native matching backends bundled with CSP5."""
+from .dp_backend import match_indices_dp
+from .murty_backend import murty_k_best
+__all__ = ["match_indices_dp", "murty_k_best"]

csp5-0.2.5/src/csp5/_native/dp_backend.py ADDED

@@ -0,0 +1,86 @@
+"""ctypes wrapper for native DP matching backend."""
+from __future__ import annotations
+import ctypes
+from pathlib import Path
+from typing import List, Sequence, Tuple
+import numpy as np
+_LIB_PATH = Path(__file__).resolve().with_name("libmatching_dp.so")
+if not _LIB_PATH.exists():
+    raise RuntimeError(
+        "Native DP backend missing: libmatching_dp.so was not bundled. "
+        "Reinstall CSP5 from a wheel built for this platform."
+    )
+try:
+    _LIB = ctypes.CDLL(str(_LIB_PATH))
+except OSError as exc:  # pragma: no cover
+    raise RuntimeError(f"Failed to load native DP backend: {_LIB_PATH} ({exc})") from exc
+_MATCH_DP = _LIB.nmrexp_match_indices_dp
+_MATCH_DP.argtypes = [
+    ctypes.POINTER(ctypes.c_double),
+    ctypes.c_int,
+    ctypes.POINTER(ctypes.c_double),
+    ctypes.c_int,
+    ctypes.c_double,
+    ctypes.POINTER(ctypes.c_double),
+    ctypes.POINTER(ctypes.c_int),
+    ctypes.POINTER(ctypes.c_int),
+    ctypes.c_int,
+]
+_MATCH_DP.restype = ctypes.c_int
+def match_indices_dp(
+    pred_vals: Sequence[float] | np.ndarray,
+    obs_vals: Sequence[float] | np.ndarray,
+    *,
+    dummy_cost: float,
+    row_penalties: Sequence[float] | np.ndarray | None = None,
+) -> Tuple[List[int], List[int]]:
+    """Run native DP matcher and return matched row/column indices."""
+    pred_arr = np.asarray(pred_vals, dtype=np.float64).reshape(-1)
+    obs_arr = np.asarray(obs_vals, dtype=np.float64).reshape(-1)
+    n_pred = int(pred_arr.shape[0])
+    n_obs = int(obs_arr.shape[0])
+    if n_pred == 0 or n_obs == 0:
+        return [], []
+    penalties_arr = None
+    penalties_ptr = None
+    if row_penalties is not None:
+        penalties_arr = np.asarray(row_penalties, dtype=np.float64).reshape(-1)
+        if penalties_arr.shape[0] != n_pred:
+            raise ValueError(
+                f"row_penalties length mismatch: got {penalties_arr.shape[0]}, expected {n_pred}"
+            )
+        penalties_ptr = penalties_arr.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
+    out_size = int(min(n_pred, n_obs))
+    out_rows = np.empty(out_size, dtype=np.int32)
+    out_cols = np.empty(out_size, dtype=np.int32)
+    count = _MATCH_DP(
+        pred_arr.ctypes.data_as(ctypes.POINTER(ctypes.c_double)),
+        ctypes.c_int(n_pred),
+        obs_arr.ctypes.data_as(ctypes.POINTER(ctypes.c_double)),
+        ctypes.c_int(n_obs),
+        ctypes.c_double(float(dummy_cost)),
+        penalties_ptr,
+        out_rows.ctypes.data_as(ctypes.POINTER(ctypes.c_int)),
+        out_cols.ctypes.data_as(ctypes.POINTER(ctypes.c_int)),
+        ctypes.c_int(out_size),
+    )
+    if count < 0:
+        raise RuntimeError(f"Native DP backend returned invalid count={count}")
+    if count == 0:
+        return [], []
+    return out_rows[:count].tolist(), out_cols[:count].tolist()
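
Reviewer note: the wrapper above delegates to a native routine whose source (`matching_dp.cpp`) is not shown in this part of the diff. To give a feel for what a DP matcher over 1D shifts with a `dummy_cost` could look like, the sketch below is a pure-Python stand-in. Everything about its objective is an assumption: it pairs sorted values in order, charges `|pred - obs|` per pair, and charges `dummy_cost` per unmatched value. It ignores `row_penalties` and should not be read as the package's algorithm.

```python
from typing import List, Sequence, Tuple


def match_indices_dp_sketch(
    pred_vals: Sequence[float],
    obs_vals: Sequence[float],
    dummy_cost: float,
) -> Tuple[List[int], List[int]]:
    """Order-preserving 1D matching by dynamic programming (illustrative only)."""
    # Sort both lists but remember the original positions.
    pred_order = sorted(range(len(pred_vals)), key=lambda i: pred_vals[i])
    obs_order = sorted(range(len(obs_vals)), key=lambda j: obs_vals[j])
    n, m = len(pred_order), len(obs_order)

    # cost[i][j]: best cost using the first i predicted and first j observed shifts.
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * dummy_cost
    for j in range(1, m + 1):
        cost[0][j] = j * dummy_cost
    for i in range(1, n + 1):
        p = pred_vals[pred_order[i - 1]]
        for j in range(1, m + 1):
            o = obs_vals[obs_order[j - 1]]
            cost[i][j] = min(
                cost[i - 1][j - 1] + abs(p - o),  # pair the two shifts
                cost[i - 1][j] + dummy_cost,      # leave the predicted shift unmatched
                cost[i][j - 1] + dummy_cost,      # leave the observed shift unmatched
            )

    # Backtrack to recover the matched (row, column) pairs in original indexing.
    rows: List[int] = []
    cols: List[int] = []
    i, j = n, m
    while i > 0 and j > 0:
        p = pred_vals[pred_order[i - 1]]
        o = obs_vals[obs_order[j - 1]]
        if cost[i][j] == cost[i - 1][j - 1] + abs(p - o):
            rows.append(pred_order[i - 1])
            cols.append(obs_order[j - 1])
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j] + dummy_cost:
            i -= 1
        else:
            j -= 1
    rows.reverse()
    cols.reverse()
    return rows, cols
```

As with the wrapper in the diff, the number of returned pairs never exceeds `min(len(pred_vals), len(obs_vals))`, and empty inputs yield two empty lists.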