PyPI - gtsr - Versions diffs - 0.0.1__tar.gz - Mend

gtsr 0.0.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (22)

gtsr-0.0.1/GTsRunner.py +5 -0
gtsr-0.0.1/LICENSE +21 -0
gtsr-0.0.1/PKG-INFO +129 -0
gtsr-0.0.1/README.md +105 -0
gtsr-0.0.1/ckpt/__init__.py +1 -0
gtsr-0.0.1/ckpt/all_best.pth +0 -0
gtsr-0.0.1/ckpt/free_best.pth +0 -0
gtsr-0.0.1/ckpt/stability_best.pkl +0 -0
gtsr-0.0.1/gtsr/__init__.py +5 -0
gtsr-0.0.1/gtsr/runner.py +317 -0
gtsr-0.0.1/gtsr.egg-info/PKG-INFO +129 -0
gtsr-0.0.1/gtsr.egg-info/SOURCES.txt +20 -0
gtsr-0.0.1/gtsr.egg-info/dependency_links.txt +1 -0
gtsr-0.0.1/gtsr.egg-info/requires.txt +8 -0
gtsr-0.0.1/gtsr.egg-info/top_level.txt +2 -0
gtsr-0.0.1/setup.cfg +4 -0
gtsr-0.0.1/setup.py +51 -0
gtsr-0.0.1/src/GCN.py +141 -0
gtsr-0.0.1/src/__init__.py +2 -0
gtsr-0.0.1/src/cif_utils.py +368 -0
gtsr-0.0.1/src/data.py +149 -0
gtsr-0.0.1/src/utils.py +58 -0

gtsr-0.0.1/GTsRunner.py ADDED

@@ -0,0 +1,5 @@
+"""Backward-compatible source-tree import for GTsRunner."""
+from gtsr.runner import GTsRunner
+__all__ = ["GTsRunner"]

gtsr-0.0.1/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 coollkr
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

gtsr-0.0.1/PKG-INFO ADDED

@@ -0,0 +1,129 @@
+Metadata-Version: 2.1
+Name: gtsr
+Version: 0.0.1
+Summary: Graph neural network tool for solvent removal from MOF structures
+Author: Xiao-Yan Li Group
+License: MIT
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Operating System :: OS Independent
+Classifier: Topic :: Scientific/Engineering :: Chemistry
+Requires-Python: >=3.9
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: ase>=3.19
+Requires-Dist: numpy>=1.21
+Requires-Dist: pymatgen>=2018.6.11
+Requires-Dist: scikit-learn>=1.0
+Requires-Dist: torch>=1.12
+Requires-Dist: molSimplify==1.8.0
+Requires-Dist: rdkit
+Requires-Dist: networkx
+# GTsR
+<div align="center">
+        <img src="https://raw.githubusercontent.com/Xiao-Yan-Li-group/GTsR/main/webapp/imgs/gtsr_logo.png" alt="GTsR logo" width="500"/>
+</div>
+[![Requires Python 3.9+](https://img.shields.io/badge/Python-3.9-blue.svg?logo=python&logoColor=white)](https://python.org/downloads)
+[![MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://github.com/sxm13/pypi-dev/blob/main/LICENSE)
+**GTsR (GNN Tool for Solvent Removal)** is a tool for solvent identification, solvent removal, and activation-stability prediction in metal-organic frameworks (MOFs).
+GTsR uses graph neural networks to classify each atom in a CIF structure as framework or solvent and writes solvent-free framework CIF files. It also provides a random-forest model that predicts the activation stability of cleaned MOFs from structural, pore, and revised-autocorrelation (RAC) descriptors.
+## Models
+| `checkpoint` | Model file | Purpose |
+| --- | --- | --- |
+| `free` (default) | `ckpt/free_best.pth` | Remove free solvent |
+| `all` | `ckpt/all_best.pth` | Remove all solvent |
+| `stability` | `ckpt/stability_best.pkl` | Predict activation stability |
+The `free` and `all` checkpoints are atom-level GNN classifiers. The `stability` checkpoint is a random forest model bundled with its missing-value imputer.
+## Installation
+```bash
+git clone https://github.com/coollkr/GTsR.git
+cd GTsR
+pip install -e .
+```
+## Usage
+### Solvent Removal
+```python
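+from gtsr import GTsRunner
+# Choose one checkpoint:
+runner = GTsRunner(checkpoint="free")  # remove free (unbound) solvent
+# runner = GTsRunner(checkpoint="all")  # remove all solvent, including bound solvent
+# runner = GTsRunner(checkpoint="path/to/ckpt.pth", device="cpu")  # use your own trained model
+result = runner.clean(
+    cif="input.cif",
+    output="prediction",  # output directory; defaults to <input stem>_gtsr next to the input CIF
+    threshold=0.5,        # solvent-probability cutoff; defaults to the value stored in the checkpoint
+)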
+```
+#### `clean()` Result
+`clean()` returns a dictionary containing the following fields:
+| Field | Description |
+| --- | --- |
+| `input` | Absolute path to the input CIF |
+| `output` | Output directory |
+| `framework` | Path to the cleaned framework CIF |
+| `solvent` | Path to the solvent CIF, or `None` if no file was generated |
+| `checkpoint` | Path to the checkpoint used for prediction |
+| `task` | Task name stored in the checkpoint |
+| `threshold` | Atom-classification threshold |
+| `num_atoms` | Total number of atoms |
+| `num_framework_atoms` | Number of framework atoms |
+| `num_solvent_atoms` | Number of solvent atoms |
+| `probabilities` | Solvent probability for each atom |
+| `labels` | Predicted class label for each atom (`0` = framework, `1` = solvent) |
+| `solvent_smiles` | SMILES strings of identified solvents, or `None` if they could not be extracted |
+### Predict Activation Stability
+```python
+from gtsr import GTsRunner
+runner = GTsRunner(checkpoint="stability")
+score = runner.stability(cif="cleaned_framework.cif")
+if score == 1:
+    print("The cleaned structure is stable.")
+else:
+    print("The cleaned structure is not stable.")
+```
+## Web Interface
+Use the [hosted Streamlit app](https://xiao-yan-li-group.streamlit.app/GTsR), or run the interface locally from the repository root:
+```bash
+streamlit run webapp/Home.py
+```
+## Citation
+Update the following entry when the associated publication becomes available:
+```bibtex
+@article{gtsr-xyl-group,
+  title   = {GTSR: A GNN Based Tool for Solvent Removal from MOF with Stability Check},
+  author  = {Liang, Kairui and Zhao, Guobin and Li, Xiao-Yan},
+  year    = {2026}
+}
+```
+## License
+GTsR is released under the MIT License; see [`LICENSE`](LICENSE).
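The PKG-INFO header block above uses the Python core-metadata format: email-style `Key: value` headers terminated by a blank line, so it can be read with the standard-library `email` parser. A minimal sketch for listing the declared dependencies; the helper name `read_requirements` is hypothetical, and the sample string is a trimmed copy of the headers shown above.

```python
from email.parser import HeaderParser

# Trimmed copy of the PKG-INFO headers above; the blank line ends the header block.
PKG_INFO = """\
Metadata-Version: 2.1
Name: gtsr
Version: 0.0.1
Requires-Python: >=3.9
Requires-Dist: ase>=3.19
Requires-Dist: numpy>=1.21
Requires-Dist: pymatgen>=2018.6.11
Requires-Dist: scikit-learn>=1.0
Requires-Dist: torch>=1.12
Requires-Dist: molSimplify==1.8.0
Requires-Dist: rdkit
Requires-Dist: networkx

# GTsR
"""


def read_requirements(text: str) -> dict[str, object]:
    """Hypothetical helper: pull name, version, and dependencies from PKG-INFO text."""
    msg = HeaderParser().parsestr(text)  # parses headers only; the README body is left as payload
    return {
        "name": msg["Name"],
        "version": msg["Version"],
        "requires_python": msg["Requires-Python"],
        # Requires-Dist repeats once per dependency, so collect every occurrence.
        "requires_dist": msg.get_all("Requires-Dist", []),
    }


info = read_requirements(PKG_INFO)
print(info["name"], info["version"], len(info["requires_dist"]))  # gtsr 0.0.1 8
```

The same eight entries appear in `gtsr.egg-info/requires.txt` (+8 lines in the file list).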

gtsr-0.0.1/README.md ADDED

@@ -0,0 +1,105 @@
+# GTsR
+<div align="center">
+        <img src="https://raw.githubusercontent.com/Xiao-Yan-Li-group/GTsR/main/webapp/imgs/gtsr_logo.png" alt="GTsR logo" width="500"/>
+</div>
+[![Requires Python 3.9+](https://img.shields.io/badge/Python-3.9-blue.svg?logo=python&logoColor=white)](https://python.org/downloads)
+[![MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://github.com/sxm13/pypi-dev/blob/main/LICENSE)
+**GTsR (GNN Tool for Solvent Removal)** is a tool for solvent identification, solvent removal, and activation-stability prediction in metal-organic frameworks (MOFs).
+GTsR uses graph neural networks to classify each atom in a CIF structure as framework or solvent and writes solvent-free framework CIF files. It also provides a random-forest model that predicts the activation stability of cleaned MOFs from structural, pore, and revised-autocorrelation (RAC) descriptors.
+## Models
+| `checkpoint` | Model file | Purpose |
+| --- | --- | --- |
+| `free` (default) | `ckpt/free_best.pth` | Remove free solvent |
+| `all` | `ckpt/all_best.pth` | Remove all solvent |
+| `stability` | `ckpt/stability_best.pkl` | Predict activation stability |
+The `free` and `all` checkpoints are atom-level GNN classifiers. The `stability` checkpoint is a random forest model bundled with its missing-value imputer.
+## Installation
+```bash
+git clone https://github.com/coollkr/GTsR.git
+cd GTsR
+pip install -e .
+```
+## Usage
+### Solvent Removal
+```python
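+from gtsr import GTsRunner
+# Choose one checkpoint:
+runner = GTsRunner(checkpoint="free")  # remove free (unbound) solvent
+# runner = GTsRunner(checkpoint="all")  # remove all solvent, including bound solvent
+# runner = GTsRunner(checkpoint="path/to/ckpt.pth", device="cpu")  # use your own trained model
+result = runner.clean(
+    cif="input.cif",
+    output="prediction",  # output directory; defaults to <input stem>_gtsr next to the input CIF
+    threshold=0.5,        # solvent-probability cutoff; defaults to the value stored in the checkpoint
+)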
+```
+#### `clean()` Result
+`clean()` returns a dictionary containing the following fields:
+| Field | Description |
+| --- | --- |
+| `input` | Absolute path to the input CIF |
+| `output` | Output directory |
+| `framework` | Path to the cleaned framework CIF |
+| `solvent` | Path to the solvent CIF, or `None` if no file was generated |
+| `checkpoint` | Path to the checkpoint used for prediction |
+| `task` | Task name stored in the checkpoint |
+| `threshold` | Atom-classification threshold |
+| `num_atoms` | Total number of atoms |
+| `num_framework_atoms` | Number of framework atoms |
+| `num_solvent_atoms` | Number of solvent atoms |
+| `probabilities` | Solvent probability for each atom |
+| `labels` | Predicted class label for each atom (`0` = framework, `1` = solvent) |
+| `solvent_smiles` | SMILES strings of identified solvents, or `None` if they could not be extracted |
+### Predict Activation Stability
+```python
+from gtsr import GTsRunner
+runner = GTsRunner(checkpoint="stability")
+score = runner.stability(cif="cleaned_framework.cif")
+if score == 1:
+    print("The cleaned structure is stable.")
+else:
+    print("The cleaned structure is not stable.")
+```
+## Web Interface
+Use the [hosted Streamlit app](https://xiao-yan-li-group.streamlit.app/GTsR), or run the interface locally from the repository root:
+```bash
+streamlit run webapp/Home.py
+```
+## Citation
+Update the following entry when the associated publication becomes available:
+```bibtex
+@article{gtsr-xyl-group,
+  title   = {GTSR: A GNN Based Tool for Solvent Removal from MOF with Stability Check},
+  author  = {Liang, Kairui and Zhao, Guobin and Li, Xiao-Yan},
+  year    = {2026}
+}
+```
+## License
+GTsR is released under the MIT License; see [`LICENSE`](LICENSE).
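Because `clean()` returns the per-atom `probabilities` alongside the `labels`, the effect of different thresholds can be checked without re-running the model. A minimal sketch, assuming `probabilities` is a flat list with one value per atom; `relabel` and `sweep` are hypothetical helpers that reproduce the `probability >= threshold` rule and the 0 to 1 range check used in `gtsr/runner.py`.

```python
from __future__ import annotations

from collections.abc import Sequence


def relabel(probabilities: Sequence[float], threshold: float) -> list[int]:
    """Label 1 (solvent) where probability >= threshold, else 0 (framework)."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"threshold must be between 0 and 1, got {threshold}")
    return [1 if p >= threshold else 0 for p in probabilities]


def sweep(probabilities: Sequence[float], thresholds: Sequence[float]) -> dict[float, int]:
    """Number of atoms labelled as solvent at each candidate threshold."""
    return {t: sum(relabel(probabilities, t)) for t in thresholds}


# Illustrative values only; in practice pass result["probabilities"] from clean().
probs = [0.02, 0.10, 0.45, 0.55, 0.91, 0.99]
print(sweep(probs, [0.3, 0.5, 0.9]))  # {0.3: 4, 0.5: 3, 0.9: 2}
```

Re-labelling offline only changes the counts; to regenerate the framework and solvent CIF files at a new cutoff, call `clean()` again with that `threshold`.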

gtsr-0.0.1/ckpt/__init__.py ADDED

@@ -0,0 +1 @@
+"""Bundled GTsR model checkpoints."""

gtsr-0.0.1/ckpt/all_best.pth ADDED

Binary file

gtsr-0.0.1/ckpt/free_best.pth ADDED

Binary file
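The `checkpoint` argument of `GTsRunner` is mapped onto these bundled `.pth` files by `_resolve_checkpoint` in `gtsr/runner.py`: an empty string selects `free`, the names `free` and `all` (case- and whitespace-insensitive) select the bundled files, and anything else is treated as a file path (`"stability"` is intercepted earlier, in `__init__`). The sketch below reproduces that rule with the checkpoint directory passed in explicitly instead of located via `_bundled_model`; `resolve_checkpoint` and `BUNDLED` are illustrative names, not part of the package.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# File names taken from the ckpt/ directory in this diff.
BUNDLED = {"free": "free_best.pth", "all": "all_best.pth"}


def resolve_checkpoint(checkpoint: str | Path, ckpt_dir: Path) -> Path:
    """Sketch of GTsRunner._resolve_checkpoint with an explicit checkpoint directory."""
    name = str(checkpoint).strip().lower()
    if not name:
        path = ckpt_dir / BUNDLED["free"]  # empty string -> default "free" model
    elif name in BUNDLED:
        path = ckpt_dir / BUNDLED[name]  # named bundled model
    else:
        path = Path(checkpoint).expanduser()  # anything else is a user-supplied path
    path = path.resolve()
    if not path.is_file():
        raise FileNotFoundError(f"Checkpoint not found: {path}")
    return path


with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    for filename in BUNDLED.values():
        (demo_dir / filename).touch()  # empty placeholder files for the demo
    print(resolve_checkpoint("", demo_dir).name)  # free_best.pth
    print(resolve_checkpoint(" ALL ", demo_dir).name)  # all_best.pth
```

Because names are checked before paths, a local file literally named `free` or `all` can only be selected with an explicit path such as `./free`.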

gtsr-0.0.1/ckpt/stability_best.pkl ADDED

Binary file
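`_load_stability_model` in `gtsr/runner.py` accepts two pickle layouts for this file: a dict with a required `"model"` entry and an optional `"imputer"` entry, or a bare estimator with no imputer. A stdlib sketch of that unwrapping step, with placeholder strings standing in for the fitted scikit-learn objects; `unpack_stability_bundle` is a hypothetical name.

```python
import pickle
from typing import Any, Optional, Tuple


def unpack_stability_bundle(raw: bytes) -> Tuple[Any, Optional[Any]]:
    """Sketch of the unwrapping in GTsRunner._load_stability_model.

    Only unpickle trusted files: pickle.loads can execute arbitrary code.
    """
    saved = pickle.loads(raw)
    if isinstance(saved, dict):
        # Bundled layout: {"model": estimator, "imputer": optional imputer}.
        return saved["model"], saved.get("imputer")
    # Bare layout: the pickle holds the estimator itself and no imputer.
    return saved, None


# Placeholder strings stand in for the fitted random forest and imputer.
print(unpack_stability_bundle(pickle.dumps({"model": "rf", "imputer": "imputer"})))  # ('rf', 'imputer')
print(unpack_stability_bundle(pickle.dumps("rf")))  # ('rf', None)
```

When an imputer is present, `stability()` applies its `transform` to the feature row before calling `predict`, so missing descriptors (filled with `NaN`) do not reach the random forest directly.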

gtsr-0.0.1/gtsr/__init__.py ADDED

@@ -0,0 +1,5 @@
+"""GTsR solvent-removal prediction API."""
+from .runner import GTsRunner
+__all__ = ["GTsRunner"]

gtsr-0.0.1/gtsr/runner.py ADDED

@@ -0,0 +1,317 @@
+from __future__ import annotations
+import pickle
+from pathlib import Path
+from typing import Any
+import numpy as np
+import torch
+try:
+    from .src.GCN import SolventAtomClassifier
+    from .src.cif_utils import (
+        PoreDiameter,
+        PoreVolume,
+        RACs,
+        cif2graph,
+        cif2pos,
+        flatten_cell,
+        flatten_rac,
+        get_cell,
+        get_sol_smi,
+        label2cif,
+        n_atom,
+        convert2pymatgen
+    )
+    from .src.data import GaussianDistance
+    from .src.utils import load_checkpoint
+except ImportError:
+    from src.GCN import SolventAtomClassifier
+    from src.cif_utils import (
+        PoreDiameter,
+        PoreVolume,
+        RACs,
+        cif2graph,
+        cif2pos,
+        flatten_cell,
+        flatten_rac,
+        get_cell,
+        get_sol_smi,
+        label2cif,
+        n_atom,
+        convert2pymatgen
+    )
+    from src.data import GaussianDistance
+    from src.utils import load_checkpoint
+PACKAGE_DIR = Path(__file__).resolve().parent
+def _bundled_model(filename: str) -> Path:
+    candidates = (
+        PACKAGE_DIR / "ckpt" / filename,
+        PACKAGE_DIR.parent / "ckpt" / filename,
+    )
+    return next((path for path in candidates if path.is_file()), candidates[0])
+def _bundled_checkpoint(name: str) -> Path:
+    return _bundled_model(f"{name}_best.pth")
+CHECKPOINTS = {
+    "free": _bundled_checkpoint("free"),
+    "all": _bundled_checkpoint("all"),
+}
+DEFAULT_CHECKPOINT = CHECKPOINTS["free"]
+STABILITY_MODEL = _bundled_model("stability_best.pkl")
+RAC_FEATURE_NAMES = tuple(
+    f"{prefix}-{property_name}-{depth}"
+    for prefix, property_names in (
+        ("f-sbu", ("chi", "Z", "I", "T", "S")),
+        ("mc", ("chi", "Z", "I", "T", "S")),
+        ("D_mc", ("chi", "Z", "I", "T", "S")),
+        ("f-link", ("chi", "Z", "I", "T", "S")),
+        ("lc", ("chi", "Z", "I", "T", "S", "alpha")),
+        ("D_lc", ("chi", "Z", "I", "T", "S", "alpha")),
+        ("func", ("chi", "Z", "I", "T", "S", "alpha")),
+        ("D_func", ("chi", "Z", "I", "T", "S", "alpha")),
+    )
+    for property_name in property_names
+    for depth in range(4)
+)
+class GTsRunner:
+    def __init__(
+        self,
+        checkpoint: str | Path = "",
+        device: str | torch.device | None = None,
+    ) -> None:
+        checkpoint_name = str(checkpoint).strip().lower()
+        self.device = self._resolve_device(device)
+        self.stability_model = None
+        self.stability_imputer = None
+        if checkpoint_name == "stability":
+            self.checkpoint_path = self._resolve_stability_model()
+            self._load_stability_model()
+            self.checkpoint = None
+            self.task = "stability"
+            return
+        self.checkpoint_path = self._resolve_checkpoint(checkpoint)
+        self.checkpoint = load_checkpoint(self.checkpoint_path, device=self.device)
+        model_config = self.checkpoint.get("model_config")
+        if not isinstance(model_config, dict):
+            raise ValueError(
+                f"Checkpoint does not contain a valid model_config: {self.checkpoint_path}"
+            )
+        self.model = SolventAtomClassifier(**model_config).to(self.device)
+        self.model.load_state_dict(self.checkpoint["state_dict"])
+        self.model.eval()
+        self.radius = float(self.checkpoint.get("radius", 8.0))
+        self.dmin = float(self.checkpoint.get("dmin", 0.0))
+        self.step = float(self.checkpoint.get("step", 0.2))
+        self.default_threshold = float(self.checkpoint.get("threshold", 0.5))
+        self.task = str(self.checkpoint.get("task", "unknown"))
+        self.max_atomic_number = 118
+        self.gdf = GaussianDistance(
+            dmin=self.dmin,
+            dmax=self.radius,
+            step=self.step,
+        )
+    def clean(
+        self,
+        cif: str | Path = "",
+        output: str | Path = "",
+        threshold: float | None = None,
+    ) -> dict[str, Any]:
+        if self.task == "stability":
+            raise RuntimeError(
+                "clean() requires a GNN checkpoint; initialize GTsRunner with "
+                "checkpoint='free' or checkpoint='all'"
+            )
+        cif_path = self._resolve_cif(cif)
+        # Pre-process the validated input CIF with pymatgen before building the graph.
+        convert2pymatgen(str(cif_path))
+        output_dir = self._resolve_output(cif_path, output)
+        cutoff = self.default_threshold if threshold is None else float(threshold)
+        if not 0.0 <= cutoff <= 1.0:
+            raise ValueError(f"threshold must be between 0 and 1, got {cutoff}")
+        tensors = self._build_tensors(cif_path)
+        with torch.inference_mode():
+            probabilities = torch.sigmoid(self.model(*tensors)).cpu().numpy()
+        labels = (probabilities >= cutoff).astype(np.int64)
+        label2cif(cif_path, labels, str(output_dir))
+        stem = cif_path.stem
+        framework_path = output_dir / f"{stem}_gtsr.cif"
+        solvent_path = output_dir / f"{stem}_sol.cif"
+        try:
+            sol_smis = get_sol_smi(solvent_path)
+        except Exception:
+            # No solvent CIF was written, or SMILES extraction failed.
+            sol_smis = None
+        return {
+            "input": str(cif_path),
+            "output": str(output_dir),
+            "framework": str(framework_path),
+            "solvent": str(solvent_path) if solvent_path.exists() else None,
+            "checkpoint": str(self.checkpoint_path),
+            "task": self.task,
+            "threshold": cutoff,
+            "num_atoms": int(labels.size),
+            "num_framework_atoms": int((labels == 0).sum()),
+            "num_solvent_atoms": int((labels == 1).sum()),
+            "probabilities": probabilities.tolist(),
+            "labels": labels.tolist(),
+            "solvent_smiles": sol_smis
+        }
+    def _build_tensors(self, cif_path: Path) -> tuple[torch.Tensor, ...]:
+        graph = cif2graph(cif_path, radius=self.radius)
+        positions = np.asarray(cif2pos(cif_path), dtype=np.float32)
+        numbers = np.asarray(graph["numbers"], dtype=np.int64)
+        if numbers.size == 0:
+            raise ValueError(f"CIF contains no atoms: {cif_path}")
+        if numbers.min() < 1 or numbers.max() > self.max_atomic_number:
+            raise ValueError(
+                f"CIF contains an unsupported atomic number; supported range is "
+                f"1-{self.max_atomic_number}"
+            )
+        if len(positions) != len(numbers):
+            raise ValueError(
+                f"Position/atom mismatch in {cif_path}: "
+                f"{len(positions)} positions for {len(numbers)} atoms"
+            )
+        atom_features = np.eye(
+            self.max_atomic_number + 1,
+            dtype=np.float32,
+        )[numbers]
+        atom_features = np.concatenate([atom_features, positions], axis=1)
+        distances = np.asarray(graph["dij"], dtype=np.float32)
+        neighbor_features = self.gdf.expand(distances).astype(np.float32)
+        index1 = np.asarray(graph["index1"], dtype=np.int64)
+        index2 = np.asarray(graph["index2"], dtype=np.int64)
+        atom_index = np.zeros(len(numbers), dtype=np.int64)
+        tensors = (
+            torch.from_numpy(atom_features),
+            torch.from_numpy(neighbor_features),
+            torch.from_numpy(index1),
+            torch.from_numpy(index2),
+            torch.from_numpy(atom_index),
+        )
+        return tuple(tensor.to(self.device) for tensor in tensors)
+    @staticmethod
+    def _resolve_device(device: str | torch.device | None) -> torch.device:
+        if device is None or str(device).lower() == "auto":
+            return torch.device("cuda" if torch.cuda.is_available() else "cpu")
+        resolved = torch.device(device)
+        if resolved.type == "cuda" and not torch.cuda.is_available():
+            raise RuntimeError("CUDA was requested but is not available")
+        return resolved
+    @staticmethod
+    def _resolve_checkpoint(checkpoint: str | Path) -> Path:
+        checkpoint_name = str(checkpoint).strip().lower()
+        if not checkpoint_name:
+            path = DEFAULT_CHECKPOINT
+        elif checkpoint_name in CHECKPOINTS:
+            path = CHECKPOINTS[checkpoint_name]
+        else:
+            path = Path(checkpoint).expanduser()
+        path = path.resolve()
+        if not path.is_file():
+            raise FileNotFoundError(f"Checkpoint not found: {path}")
+        return path
+    @staticmethod
+    def _resolve_stability_model() -> Path:
+        path = STABILITY_MODEL.resolve()
+        if not path.is_file():
+            raise FileNotFoundError(f"Stability model not found: {path}")
+        return path
+    def _load_stability_model(self) -> None:
+        model_path = self._resolve_stability_model()
+        with model_path.open("rb") as model_file:
+            saved_model = pickle.load(model_file)
+        if isinstance(saved_model, dict):
+            self.stability_model = saved_model["model"]
+            self.stability_imputer = saved_model.get("imputer")
+        else:
+            self.stability_model = saved_model
+            self.stability_imputer = None
+        self._make_stability_model_compatible()
+    def _make_stability_model_compatible(self) -> None:
+        """Fill attributes absent from models saved by older scikit-learn versions."""
+        estimators = [self.stability_model]
+        estimators.extend(getattr(self.stability_model, "estimators_", []))
+        for estimator in estimators:
+            if estimator is not None and not hasattr(estimator, "monotonic_cst"):
+                estimator.monotonic_cst = None
+    @staticmethod
+    def _resolve_cif(cif: str | Path) -> Path:
+        if not cif:
+            raise ValueError("cif must be a path to an input CIF file")
+        path = Path(cif).expanduser().resolve()
+        if not path.is_file():
+            raise FileNotFoundError(f"CIF not found: {path}")
+        return path
+    @staticmethod
+    def _resolve_output(cif_path: Path, output: str | Path) -> Path:
+        path = (
+            Path(output).expanduser()
+            if output
+            else cif_path.parent / f"{cif_path.stem}_gtsr"
+        )
+        path = path.resolve()
+        path.mkdir(parents=True, exist_ok=True)
+        return path
+    def stability(self, cif: str | Path):
+        """Predict activation stability of a (cleaned) CIF; returns the random-forest class label."""
+        cif_path = self._resolve_cif(cif)
+        cif_filename = str(cif_path)
+        cell = flatten_cell(get_cell(cif_filename))
+        pore_diameter = PoreDiameter(cif_filename)
+        pore_volume = PoreVolume(cif_filename)
+        rac = flatten_rac(RACs(cif_filename))
+        features = [
+            n_atom(cif_filename),
+            *cell.values(),
+            pore_diameter["Di"],
+            pore_diameter["Df"],
+            pore_diameter["Dif"],
+            pore_volume["Density"],
+            pore_volume["VF"],
+            *(rac.get(name, np.nan) for name in RAC_FEATURE_NAMES),
+        ]
+        feature_batch = np.asarray([features], dtype=np.float64)
+        if self.stability_model is None:
+            self._load_stability_model()
+        if self.stability_imputer is not None:
+            feature_batch = self.stability_imputer.transform(feature_batch)
+        return self.stability_model.predict(feature_batch)[0]
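`_build_tensors` encodes each atom as a one-hot vector over atomic numbers 0 to 118 concatenated with the three values returned by `cif2pos`, places every atom in graph 0 (`atom_index` is all zeros, one structure per call), and expands each neighbor distance with `GaussianDistance` from `src/data.py`, which is not shown in this diff. The sketch below reproduces the atom features as written and ASSUMES a CGCNN-style Gaussian expansion (centers from `dmin` to `dmax` in steps of `step`, width equal to `step`) for the edge features; `atom_features` and `gaussian_expand` are illustrative names.

```python
import numpy as np

MAX_Z = 118  # same cap as GTsRunner.max_atomic_number


def atom_features(numbers: np.ndarray, positions: np.ndarray) -> np.ndarray:
    """One-hot atomic number (119 columns, index = Z) followed by 3 position columns."""
    one_hot = np.eye(MAX_Z + 1, dtype=np.float32)[numbers]
    return np.concatenate([one_hot, positions.astype(np.float32)], axis=1)


def gaussian_expand(distances: np.ndarray, dmin: float = 0.0, dmax: float = 8.0,
                    step: float = 0.2) -> np.ndarray:
    """ASSUMED CGCNN-style filter: exp(-(d - mu_k)^2 / step^2) for centers mu_k."""
    centers = np.arange(dmin, dmax + step, step)
    return np.exp(-((distances[..., np.newaxis] - centers) ** 2) / step**2)


numbers = np.array([30, 8, 6, 1])  # Zn, O, C, H
positions = np.random.default_rng(0).random((4, 3))
print(atom_features(numbers, positions).shape)  # (4, 122)
print(gaussian_expand(np.array([1.0, 2.5])).shape)
```

With the checkpoint defaults (`radius` = 8.0, `dmin` = 0.0, `step` = 0.2), this assumed expansion yields 41 features per neighbor pair; the true width depends on how `GaussianDistance` is implemented in `src/data.py`.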