PyPI - pxmeter - Versions diffs - 0.1.6__tar.gz → 1.0.0__tar.gz - Mend

pxmeter 0.1.6tar.gz → 1.0.0tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (57)

{pxmeter-0.1.6/pxmeter.egg-info → pxmeter-1.0.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: pxmeter
-Version: 0.1.6
+Version: 1.0.0
 Summary: PXMeter is a comprehensive toolkit for evaluating the quality of structures generated by biomolecular structure prediction models.
 Author: Bytedance Inc.
 Author-email: ai4s-bio@bytedance.com
@@ -9,7 +9,6 @@ Platform: manylinux1
 Requires-Python: >=3.11
 License-File: LICENSE
 Requires-Dist: biotite>=1.2.0
-Requires-Dist: dockq==2.1.3
 Requires-Dist: gemmi==0.7.0
 Requires-Dist: joblib
 Requires-Dist: ml_collections
@@ -22,6 +21,8 @@ Requires-Dist: scipy
 Requires-Dist: tabulate
 Requires-Dist: tqdm
 Requires-Dist: click
+Requires-Dist: pyarrow
+Requires-Dist: PyYAML
 Dynamic: author
 Dynamic: author-email
 Dynamic: license
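
The dependency change in the hunks above can be summarized mechanically. The following is a minimal sketch, not part of the package: the two lists are transcribed only from the `Requires-Dist` lines visible in this diff (entries elided between hunks are omitted), and `requirement_changes` is a hypothetical helper name.

```python
# Requires-Dist entries transcribed from the visible hunks only;
# requirements hidden between hunks are intentionally left out.
OLD_REQUIRES = [
    "biotite>=1.2.0",
    "dockq==2.1.3",
    "gemmi==0.7.0",
    "joblib",
    "ml_collections",
    "scipy",
    "tabulate",
    "tqdm",
    "click",
]
NEW_REQUIRES = [
    "biotite>=1.2.0",
    "gemmi==0.7.0",
    "joblib",
    "ml_collections",
    "scipy",
    "tabulate",
    "tqdm",
    "click",
    "pyarrow",
    "PyYAML",
]


def requirement_changes(old, new):
    """Return (added, removed) requirement strings, sorted case-insensitively."""
    old_set, new_set = set(old), set(new)
    added = sorted(new_set - old_set, key=str.lower)
    removed = sorted(old_set - new_set, key=str.lower)
    return added, removed


added, removed = requirement_changes(OLD_REQUIRES, NEW_REQUIRES)
print("added:  ", added)
print("removed:", removed)
```

Because `dockq` no longer appears in the declared requirements, an environment that relied on it being installed alongside pxmeter would presumably need to install it separately.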

{pxmeter-0.1.6 → pxmeter-1.0.0}/README.md RENAMED

@@ -32,7 +32,7 @@ pip install -r requirements.txt
 pip install -e .
 ```
-PXMeter will automatically download the Chemical Component Dictionary (CCD) upon its first run. To update the CCD files:
+PXMeter directly uses the Chemical Component Dictionary (CCD) bundled with Biotite. To update the CCD files:
 ```bash
 pxm ccd update
@@ -48,12 +48,13 @@ pxm -r examples/7rss.cif -m examples/7rss_protenix_pred.cif -o pxm_output.json
 **Key Parameters**:
 - `-r` or `--ref_cif`: Path to reference CIF file
 - `-m` or `--model_cif`: Path to model CIF file
-- `-o` or `--output`: Path to save evaluation results (default: "pxm_output.json")
+- `-o` or `--output_json`: Path to save evaluation results (default: "pxm_output.json")
 - `--ref_model`: Specify model number of reference CIF (default: 1)
 - `--ref_assembly_id`: Specify the assembly ID for the reference CIF (default: None; uses the Asymmetric Unit for evaluation)
-- `ref_altloc`: Specify the alternative location identifier for the reference CIF (default: "first", uses the first alternative location code for each residue).
+- `--ref_altloc`: Specify the alternative location identifier for the reference CIF (default: "first", uses the first alternative location code for each residue).
 - `--chain_id_to_mol_json`: JSON file defining custom ligands, where keys are chain IDs (label_asym_id) and values are the corresponding ligand SMILES strings.
 - `-l` or `--interested_lig_label_asym_id`: Indicate the `label_asym_id` of ligands for metrics like pocket-aligned RMSD. Multiple ligands should be comma-separated.
+- `-C key.path=value`: Override fields in `pxmeter.configs.run_config.RUN_CONFIG` (repeatable; e.g., `-C metric.lddt.eps=1e-4 -C mapping.mapping_ligand=false`).
 To access the full list of parameters, use the `--help` option.
@@ -80,18 +81,59 @@ For detailed descriptions of additional parameters, use the `help()` function:
 help(evaluate)
 ```
+If you need to modify the runtime settings defined in
+`pxmeter.configs.run_config.RUN_CONFIG` (equivalent to using `-C` on the command line),
+you may directly update the values in `RUN_CONFIG` and then pass it into the evaluate() function.
+```python
+from pxmeter.configs.run_config import RUN_CONFIG
+RUN_CONFIG.mapping.res_id_alignments = False
+metric_result = evaluate(
+    ...,
+    run_config=RUN_CONFIG,
+)
+```
+For a detailed, step-by-step description of the PXMeter runtime evaluation pipeline (mapping, alignment, and metric computation), please refer to the [PXMeter evaluation pipeline details](docs/pxmeter_eval_details.md).
+For a comprehensive overview of the runtime configuration options, recommended defaults, and advanced usage examples, see the [PXMeter run configuration guide](docs/run_config_details.md).
+### Optional: Stereochemistry checks
+Run stereochemistry checks for a single CIF and export a CSV report:
+```bash
+pxm stereocheck -c examples/7rss_protenix_pred.cif -o stereochem_report.csv
+```
+**`pxm stereocheck` Parameters**:
+- `-c` or `--cif` (required): Path to the CIF file
+- `-o` or `--output-csv`: Path to the output CSV report (default: `stereochem_report.csv`)
 ## 📊 Benchmarking
-Refer to [benchmark/README.md](./benchmark/README.md) for evaluation protocols on:
-- RecentPDB dataset
-- PoseBusters V2
-The benchmark data is released under the CC0 license.
-We include code in the `benchmark` directory that evaluates various models using PXMeter and aggregates their metrics.
-This serves as an example of best practices for using the tool. For more details, please refer to our paper:
+PXMeter offers a reproducible workflow covering both dataset creation and model evaluation.
+**Note**: The benchmarking workflow (the `benchmark/` directory) is only available in the source repository and is not shipped with the PyPI package. To run benchmarking, please clone the repository first:
+```bash
+git clone https://github.com/bytedance/PXMeter.git
+cd PXMeter
+```
+- The **[Benchmark Documentation](docs/benchmark.md)** explains how to run evaluations on model predictions and how the aggregated metrics are computed.
+- The **[Dataset Pipeline Overview](docs/datapipeline.md)** describes the complete construction of the RecentPDB low-homology dataset,
+including filtering, homology scans, clustering, and subset labeling.
+The pipeline also allows users to **rebuild the evaluation dataset from scratch using any custom time window**.
+This makes the benchmark fully flexible and adaptable to different release periods or ongoing updates from the PDB.
+- For details on the dataset used in our paper, please refer to the **[legacy dataset documentation](docs/legacy_dataset_reference.md)**, which describes the dataset version and evaluation code used at the time of the initial release.
+## ➡️ Preparing input files
-📄 <a href="https://www.biorxiv.org/content/10.1101/2025.07.17.664878v1">From Dataset Curation to Unified Evaluation: Revisiting
- Structure Prediction Benchmarks with PXMeter</a>
+When working with structural inputs (e.g., converting mmCIF, AlphaFold3, Protenix, or Boltz formats), you may find the following utility helpful:
+[pxm gen-input Usage Guide](docs/gen_input.md), a tool for generating and converting model input files via CLI or Python API.
 ## 💪 Contributing to PXMeter
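
The new `-C key.path=value` option described in the README hunk overrides nested fields by dotted path. The sketch below shows one way such overrides could be parsed and applied to a nested dictionary. It is an illustration under stated assumptions, not pxmeter's implementation: `parse_override` and `apply_overrides` are hypothetical names, values are parsed as JSON with a fallback to plain strings, and the default values in the toy config are placeholders.

```python
import json
from typing import Any


def parse_override(item: str) -> tuple[list[str], Any]:
    """Split 'a.b.c=value' into (['a', 'b', 'c'], parsed_value)."""
    key_path, sep, raw = item.partition("=")
    if not sep or not key_path:
        raise ValueError(f"expected key.path=value, got {item!r}")
    try:
        value = json.loads(raw)  # handles numbers, true/false, null
    except json.JSONDecodeError:
        value = raw  # anything else is kept as a plain string
    return key_path.split("."), value


def apply_overrides(config: dict, items: list[str]) -> dict:
    """Apply each override in order; unknown keys raise KeyError."""
    for item in items:
        keys, value = parse_override(item)
        node = config
        for key in keys[:-1]:
            node = node[key]
        if keys[-1] not in node:
            raise KeyError(f"unknown config field: {'.'.join(keys)}")
        node[keys[-1]] = value
    return config


# Toy config: the field names come from the README example, the default
# values are placeholders rather than pxmeter's real defaults.
config = {
    "metric": {"lddt": {"eps": 1e-10}},
    "mapping": {"mapping_ligand": True},
}
apply_overrides(config, ["metric.lddt.eps=1e-4", "mapping.mapping_ligand=false"])
print(config)
```

Rejecting unknown keys is a design choice in this sketch: it turns a typo in a key path into an error instead of a silently ignored setting.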

{pxmeter-0.1.6 → pxmeter-1.0.0}/pxmeter/calc_metric.py RENAMED

@@ -12,20 +12,16 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
-import copy
 import dataclasses
-import gzip
 import json
 import logging
 import tempfile
 from pathlib import Path
-from typing import Any
+from typing import Any, Optional, Union
-import DockQ.parsers as dockq_parsers
 import numpy as np
 import pandas as pd
 from biotite.structure.io import pdb
-from DockQ.DockQ import run_on_all_native_interfaces
 from ml_collections.config_dict import ConfigDict
 from posebusters import PoseBusters
 from rdkit import Chem
@@ -34,105 +30,18 @@ from pxmeter.configs.run_config import RUN_CONFIG
 from pxmeter.constants import IONS, LIGAND
 from pxmeter.data.ccd import get_ccd_mol_from_chain_atom_array
 from pxmeter.data.struct import Structure
-from pxmeter.metrics.clashes import check_clashes_by_vdw
+from pxmeter.metrics.dockq import compute_dockq
 from pxmeter.metrics.lddt_metrics import LDDT
 from pxmeter.metrics.rmsd_metrics import RMSDMetrics
 logging.getLogger("posebusters").setLevel(logging.ERROR)
-def load_PDB(path, chains=None, small_molecule=False, n_model=0):
-    """
-    Modified from DockQ.DockQ.load_PDB to avoid ResourceWarning warnings.
-    ResourceWarning: Enable tracemalloc to get the object allocation traceback
-    DockQ/DockQ.py:660: ResourceWarning: unclosed file
-    """
-    if chains is None:
-        chains = []
-    try:
-        pdb_parser = dockq_parsers.PDBParser(QUIET=True)
-        with (
-            gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")
-        ) as file_obj:
-            model = pdb_parser.get_structure(
-                "-",
-                file_obj,
-                chains=chains,
-                parse_hetatms=small_molecule,
-                model_number=n_model,
-            )
-    except Exception:
-        pdb_parser = dockq_parsers.MMCIFParser(QUIET=True)
-        with (
-            gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")
-        ) as file_obj:
-            model = pdb_parser.get_structure(
-                "-",
-                file_obj,
-                chains=chains,
-                parse_hetatms=small_molecule,
-                auth_chains=not small_molecule,
-                model_number=n_model,
-            )
-    model.id = path
-    return model
-def compute_dockq(
-    ref_struct: Structure,
-    model_struct: Structure,
-    ref_to_model_chain_map: dict[str, str],
-) -> dict[str, dict[str, Any]]:
-    """
-    Computes the DockQ score between a reference structure and a model structure.
-    Args:
-        ref_struct (Structure): The reference structure.
-        model_struct (Structure): The model structure to be evaluated.
-        ref_to_model_chain_map (dict[str, str]): A dictionary mapping reference chain IDs to model chain IDs.
-    Returns:
-        dict[str, dict[str, Any]]: A dictionary containing the DockQ score and other related metrics.
-    """
-    with tempfile.TemporaryDirectory() as tmp_dir:
-        tmp_dir = Path(tmp_dir)
-        tmp_ref_cif = tmp_dir / "tmp_ref.cif"
-        tmp_model_cif = tmp_dir / "tmp_model.cif"
-        # Calculate DockQ using exclusively valid atoms
-        # Use uni_chain_id as label_asym_id
-        ref_struct.to_cif(tmp_ref_cif, use_uni_chain_id=True)
-        model_struct.to_cif(tmp_model_cif, use_uni_chain_id=True)
-        # small_molecule=False means only polymer is considered
-        model = load_PDB(str(tmp_model_cif), small_molecule=False)
-        native = load_PDB(str(tmp_ref_cif), small_molecule=False)
-        native_chains = [c.id for c in native]
-        model_chains = [c.id for c in model]
-        valid_ref_to_model_chain_map = {}
-        for k, v in ref_to_model_chain_map.items():
-            if (
-                k in ref_struct.uni_chain_id
-                and k in native_chains
-                and v in model_chains
-            ):
-                # some all UNK structure will not be load by load_PDB(), e.g. chain Q in 7q6i
-                valid_ref_to_model_chain_map[k] = v
-                assert v in model_struct.uni_chain_id
-    dockq_result_dict, _total_dockq = run_on_all_native_interfaces(
-        model, native, chain_map=valid_ref_to_model_chain_map
-    )
-    return dockq_result_dict
 def compute_pb_valid(
     ref_struct: Structure,
     model_struct: Structure,
-    ref_lig_label_asym_id: str | list[str],
-) -> pd.DataFrame | None:
+    ref_lig_label_asym_id: Union[str, list[str]],
+) -> Optional[pd.DataFrame]:
     """
     Compute pose-busting validation metrics for a given reference structure, model structure, and reference features.
@@ -152,6 +61,8 @@ def compute_pb_valid(
         ref_lig_label_asym_ids = list(ref_lig_label_asym_id)
     df_list = []
+    buster = PoseBusters(config="redock")
     for lig_label_asym_id in ref_lig_label_asym_ids:
         lig_mask = ref_struct.atom_array.label_asym_id == lig_label_asym_id
@@ -159,13 +70,19 @@ def compute_pb_valid(
         model_lig_chain_id = model_struct.uni_chain_id[lig_mask][0]
         ref_lig_atom_array = ref_struct.atom_array[lig_mask]
-        model_lig_atom_array = copy.deepcopy(model_struct.atom_array[lig_mask])
+        model_lig_atom_array = model_struct.atom_array[lig_mask].copy()
         # reset res_name for model ligand atoms by ref Structure
         model_lig_atom_array.res_name = ref_lig_atom_array.res_name
-        model_cond_atom_array = model_struct.atom_array[~lig_mask]
-        ref_lig_mol = get_ccd_mol_from_chain_atom_array(ref_lig_atom_array)
-        model_lig_mol = get_ccd_mol_from_chain_atom_array(model_lig_atom_array)
+        model_cond_atom_array = model_struct.atom_array[~lig_mask].copy()
+        try:
+            ref_lig_mol = get_ccd_mol_from_chain_atom_array(ref_lig_atom_array)
+            model_lig_mol = get_ccd_mol_from_chain_atom_array(model_lig_atom_array)
+        except Exception:
+            logging.warning(
+                f"Failed to create RDKit molecule for ligand {lig_label_asym_id}. Skipping PoseBusters."
+            )
+            continue
         with tempfile.TemporaryDirectory() as tmp_dir:
             tmp_dir = Path(tmp_dir)
@@ -182,26 +99,34 @@ def compute_pb_valid(
             sdf_writer.close()
             pdb_file = pdb.PDBFile()
-            model_cond_atom_array = copy.deepcopy(model_cond_atom_array)
-            # PDB file only support one letter chain_id
-            model_cond_atom_array.chain_id = [
-                i[0] for i in model_cond_atom_array.chain_id
-            ]
+            # PDB file only support one letter chain_id, 3 letters res_name, 4 letters atom_name
+            model_cond_atom_array.chain_id = np.array(
+                [i[0] if len(i) > 0 else " " for i in model_cond_atom_array.chain_id],
+                dtype="U1",
+            )
+            model_cond_atom_array.res_name = np.array(
+                [i[:3] for i in model_cond_atom_array.res_name], dtype="U3"
+            )
+            model_cond_atom_array.atom_name = np.array(
+                [i[:4] for i in model_cond_atom_array.atom_name], dtype="U4"
+            )
+            model_cond_atom_array.bonds = None
             pdb_file.set_structure(model_cond_atom_array)
             pdb_file.write(model_cond_pdb)
-            buster = PoseBusters(config="redock")
             df = buster.bust(
                 mol_pred=model_lig_sdf,
                 mol_true=ref_lig_sdf,
                 mol_cond=model_cond_pdb,
                 full_report=True,
             )
             # record ligand chain id
             df["ref_lig_chain_id"] = ref_lig_chain_id
             df["model_lig_chain_id"] = model_lig_chain_id
             df_list.append(df)
     df_cat = pd.concat(df_list)
     return df_cat
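
The hunk above truncates `chain_id`, `res_name`, and `atom_name` to the fixed widths of the PDB format before writing the conditioning structure. The same rule can be restated in plain Python. This is a sketch only: `fit_pdb_fields` is a hypothetical helper, and it uses lists where the package uses NumPy arrays with `U1`, `U3`, and `U4` dtypes.

```python
def fit_pdb_fields(chain_ids, res_names, atom_names):
    """Truncate identifiers to PDB column widths (1, 3, and 4 characters)."""
    # An empty chain ID becomes a single space, as in the diff above.
    chains = [c[0] if len(c) > 0 else " " for c in chain_ids]
    residues = [r[:3] for r in res_names]
    atoms = [a[:4] for a in atom_names]
    return chains, residues, atoms


chains, residues, atoms = fit_pdb_fields(
    chain_ids=["A0", "", "B"],
    res_names=["ALA", "LIG1", "HOH"],
    atom_names=["CA", "C1234", "O"],
)
print(chains, residues, atoms)

# Truncation is lossy: distinct multi-character chain IDs can collide.
collided, _, _ = fit_pdb_fields(["A0", "A1"], ["ALA", "ALA"], ["CA", "CA"])
print(collided)
```

The collision case is worth keeping in mind when reading the PoseBusters output, since the written PDB file no longer distinguishes chains that share a first character.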
@@ -232,6 +157,7 @@ class CalcLDDTMetric:
             is_nucleotide_threshold=lddt_config.nucleotide_threshold,
             is_not_nucleotide_threshold=lddt_config.non_nucleotide_threshold,
             eps=lddt_config.eps,
+            stereochecks=lddt_config.stereochecks,
         )
     def get_chains_mask(
@@ -278,7 +204,7 @@ class CalcLDDTMetric:
         merged_chain_2_masks = np.array(merged_chain_2_masks)
         return merged_chain_1_masks, merged_chain_2_masks
-    def get_complex_lddt(self) -> float:
+    def get_complex_lddt(self, atom_mask: Optional[np.ndarray] = None) -> float:
         """
         Calculate the LDDT score for a complex.
@@ -286,6 +212,9 @@ class CalcLDDTMetric:
         and true coordinates of the complex. The LDDT score is a measure of the
         structural similarity between the predicted and true structures.
+        Args:
+            atom_mask (np.ndarray): A mask for the atoms to include in the calculation.
         Returns:
             float: The LDDT score for the complex.
         """
@@ -293,11 +222,15 @@ class CalcLDDTMetric:
         complex_lddt = self.lddt_calculator.run(
             chain_1_masks=None,
             chain_2_masks=None,
+            atom_mask=atom_mask,
         )
         return complex_lddt
     def get_chain_interface_lddt(
-        self, chains: list[str], interfaces: list[tuple[str, str]]
+        self,
+        chains: list[str],
+        interfaces: list[tuple[str, str]],
+        atom_mask: Optional[np.ndarray] = None,
     ) -> list[float]:
         """
         Calculate the LDDT scores for chains and interfaces.
@@ -305,7 +238,9 @@ class CalcLDDTMetric:
         Args:
             chains (list[str]): A list of chain identifiers.
             interfaces (list[tuple[str, str]]): A list of tuples, each containing
-                                                two chain identifiers representing an interface.
+                two chain identifiers representing an interface.
+            atom_mask (np.ndarray, optional): A mask for the atoms to include in the calculation.
+                Defaults to None.
         Returns:
             list[float]: A list of LDDT scores for chains and interfaces.
@@ -317,6 +252,7 @@ class CalcLDDTMetric:
         lddt_list = self.lddt_calculator.run(
             chain_1_masks=merged_chain_1_masks,
             chain_2_masks=merged_chain_2_masks,
+            atom_mask=atom_mask,
         )
         return lddt_list
@@ -343,9 +279,11 @@ class MetricResult:
     interface: dict[tuple[str, str], dict[str, Any]]
     # [ref_chain_id: {metric: value}]
-    pb_valid: dict[str, dict[str, Any]] | None = None
+    pb_valid: Optional[dict[str, dict[str, Any]]] = None
-    ori_model_chain_ids: list[str] | None = None
+    ori_model_chain_ids: Optional[list[str]] = None
+    update_data: Optional[dict[str, Any]] = None
     @staticmethod
     def _get_chain_info(ref_struct: Structure) -> dict[str, dict[str, str]]:
@@ -420,26 +358,31 @@ class MetricResult:
         chains: list[str],
         interfaces: list[tuple[str, str]],
         chain_interface_lddt: list[float],
+        metric_name: str = "lddt",
     ) -> tuple[dict[str, dict[str, float]], dict[tuple[str, str], dict[str, float]]]:
         chain_lddt_dict = {}
         interface_lddt_dict = {}
         num_chains = len(chains)
         for idx, chain_id in enumerate(chains):
-            chain_lddt_dict[chain_id] = {"lddt": chain_interface_lddt[idx]}
+            lddt_value = chain_interface_lddt[idx]
+            if np.isnan(lddt_value):
+                continue
+            chain_lddt_dict[chain_id] = {metric_name: lddt_value}
         for idx, interface in enumerate(interfaces):
             sorted_interface = tuple(
                 sorted(interface)
             )  # Sort chains to ensure consistent order
-            interface_lddt_dict[sorted_interface] = {
-                "lddt": chain_interface_lddt[idx + num_chains]
-            }
+            lddt_value = chain_interface_lddt[idx + num_chains]
+            if np.isnan(lddt_value):
+                continue
+            interface_lddt_dict[sorted_interface] = {metric_name: lddt_value}
         return chain_lddt_dict, interface_lddt_dict
     @staticmethod
     def _post_process_dockq(
         dockq_result_dict: dict[str, Any],
-    ) -> dict[str, float | dict[str, float]]:
+    ) -> dict[str, Union[float, dict[str, float]]]:
         polymer_dockq_metrics = {"F1", "iRMSD", "LRMSD", "fnat", "nat_correct",
                                  "nat_total", "fnonnat", "nonnat_count", "model_total",
                                  "clashes", "len1", "len2", "class1", "class2", "is_het",
@@ -475,8 +418,8 @@ class MetricResult:
     @staticmethod
     def _post_process_pb_valid(
-        pb_valid_result_df: pd.DataFrame | None,
-    ) -> dict[str, dict[str, Any]] | None:
+        pb_valid_result_df: Optional[pd.DataFrame],
+    ) -> Optional[dict[str, dict[str, Any]]]:
         if pb_valid_result_df is None:
             return
@@ -505,14 +448,129 @@ class MetricResult:
             else:
                 tar_dict[key] = value
+    @staticmethod
+    def _calc_stereochecks_summary(
+        atom_mask: np.ndarray,
+        clash_df: pd.DataFrame,
+        bad_bond_df: pd.DataFrame,
+        bad_angle_df: pd.DataFrame,
+    ) -> dict[str, int]:
+        """
+        Aggregate stereochemistry violations within an atom subset.
+        - `clash_atoms`: number of unique atoms involved in clashes (within subset)
+        - `bad_bonds`: number of bad bonds (within subset)
+        - `bad_angles`: number of bad angles (within subset)
+        The `idx*` columns in DataFrames are indices into the mapped atom arrays.
+        """
+        atom_mask = np.asarray(atom_mask, dtype=bool)
+        clash_atoms = 0
+        if clash_df is not None and (not clash_df.empty):
+            idx1 = clash_df["idx1"].to_numpy(dtype=np.int64, copy=False)
+            idx2 = clash_df["idx2"].to_numpy(dtype=np.int64, copy=False)
+            row_mask = atom_mask[idx1] & atom_mask[idx2]
+            if np.any(row_mask):
+                clash_atoms = int(
+                    np.unique(np.concatenate([idx1[row_mask], idx2[row_mask]])).size
+                )
+        bond_cnt = 0
+        if bad_bond_df is not None and (not bad_bond_df.empty):
+            idx1 = bad_bond_df["idx1"].to_numpy(dtype=np.int64, copy=False)
+            idx2 = bad_bond_df["idx2"].to_numpy(dtype=np.int64, copy=False)
+            bond_cnt = int(np.sum(atom_mask[idx1] & atom_mask[idx2]))
+        angle_cnt = 0
+        if bad_angle_df is not None and (not bad_angle_df.empty):
+            idx_a = bad_angle_df["idx_a"].to_numpy(dtype=np.int64, copy=False)
+            idx_b = bad_angle_df["idx_b"].to_numpy(dtype=np.int64, copy=False)
+            idx_c = bad_angle_df["idx_c"].to_numpy(dtype=np.int64, copy=False)
+            angle_cnt = int(
+                np.sum(atom_mask[idx_a] & atom_mask[idx_b] & atom_mask[idx_c])
+            )
+        return {
+            "clash_atoms": clash_atoms,
+            "bad_bonds": bond_cnt,
+            "bad_angles": angle_cnt,
+        }
+    @classmethod
+    def _maybe_add_lddt_stereochecks_summaries(
+        cls,
+        *,
+        lddt_config: ConfigDict,
+        lddt_calculator: LDDT,
+        ref_struct: Structure,
+        chains: list[str],
+        interfaces: list[tuple[str, str]],
+        complex_result_dict: dict[str, Any],
+        chain_result_dict: dict[str, dict[str, Any]],
+        interface_result_dict: dict[tuple[str, str], dict[str, Any]],
+    ) -> None:
+        """Attach stereochemistry violation summaries to output dicts.
+        Only active when `metric.lddt.stereochecks=True` and the underlying
+        stereochemistry checker produced violation tables.
+        """
+        if not lddt_config.stereochecks:
+            return
+        stereo_violation_dfs = getattr(lddt_calculator, "stereo_violation_dfs", None)
+        if stereo_violation_dfs is None:
+            return
+        clash_df, bad_bond_df, bad_angle_df = stereo_violation_dfs
+        n_atoms = len(ref_struct.atom_array)
+        # Complex-level summary
+        complex_result_dict["stereochecks"] = cls._calc_stereochecks_summary(
+            atom_mask=np.ones(n_atoms, dtype=bool),
+            clash_df=clash_df,
+            bad_bond_df=bad_bond_df,
+            bad_angle_df=bad_angle_df,
+        )
+        # Chain-level summary (keyed by reference chain IDs)
+        for chain_id in chains:
+            chain_atom_mask = ref_struct.uni_chain_id == chain_id
+            chain_result_dict.setdefault(chain_id, {})[
+                "stereochecks"
+            ] = cls._calc_stereochecks_summary(
+                atom_mask=chain_atom_mask,
+                clash_df=clash_df,
+                bad_bond_df=bad_bond_df,
+                bad_angle_df=bad_angle_df,
+            )
+        # Interface-level summary (keyed by sorted(reference chain IDs))
+        for chain_1, chain_2 in interfaces:
+            interface_key = tuple(sorted((chain_1, chain_2)))
+            interface_atom_mask = (ref_struct.uni_chain_id == chain_1) | (
+                ref_struct.uni_chain_id == chain_2
+            )
+            interface_result_dict.setdefault(interface_key, {})[
+                "stereochecks"
+            ] = cls._calc_stereochecks_summary(
+                atom_mask=interface_atom_mask,
+                clash_df=clash_df,
+                bad_bond_df=bad_bond_df,
+                bad_angle_df=bad_angle_df,
+            )
     @classmethod
     def from_struct(
         cls,
         ref_struct: Structure,
         model_struct: Structure,
-        ori_model_chain_ids: list[str] | None = None,
-        interested_lig_label_asym_id: str | list[str] | None = None,
+        ori_model_chain_ids: Optional[list[str]] = None,
+        interested_lig_label_asym_id: Optional[Union[str, list[str]]] = None,
         metric_config: ConfigDict = RUN_CONFIG.metric,
+        update_data: Optional[dict[str, Any]] = None,
     ) -> "MetricResult":
         """
         Create a MetricResult instance from given structures and features.
@@ -525,6 +583,8 @@ class MetricResult:
                 specifying the ligand label asym IDs of interest.
             metric_config (dict[str, Any]): A dictionary containing configuration for
                           metrics. Defaults to RUN_CONFIG.metric.
+            update_data (dict[str, Any] | None): A dictionary containing additional data to update.
+                Defaults to None.
         Returns:
             MetricResult: An instance of MetricResult containing the calculated metrics.
@@ -555,16 +615,6 @@ class MetricResult:
         meta_info_dict["ref_to_model_chain_mapping"] = chain_map
         meta_info_dict["ref_chain_info"] = cls._get_chain_info(ref_struct)
-        # Calculate clashes
-        if metric_config.calc_clashes:
-            clashes = check_clashes_by_vdw(
-                model_struct.atom_array,
-                vdw_scale_factor=metric_config.clashes.vdw_scale_factor,
-            )
-            complex_result_dict["clashes"] = len(
-                {x for a, b in clashes for x in (a, b)}
-            )
         # Calculate RMSD (if ligand and pocket specified in ref_features)
         if metric_config.calc_rmsd and interested_lig_label_asym_id:
             rmsd_metrics = RMSDMetrics(
@@ -590,8 +640,21 @@ class MetricResult:
                 model_struct=model_struct,
                 lddt_config=metric_config.lddt,
             )
+            cls._maybe_add_lddt_stereochecks_summaries(
+                lddt_config=metric_config.lddt,
+                lddt_calculator=calc_lddt.lddt_calculator,
+                ref_struct=ref_struct,
+                chains=chains,
+                interfaces=interfaces,
+                complex_result_dict=complex_result_dict,
+                chain_result_dict=chain_result_dict,
+                interface_result_dict=interface_result_dict,
+            )
             complex_lddt = calc_lddt.get_complex_lddt()
-            complex_result_dict["lddt"] = complex_lddt
+            if not np.isnan(complex_lddt):
+                complex_result_dict["lddt"] = complex_lddt
             chain_interface_lddt = calc_lddt.get_chain_interface_lddt(
                 chains, interfaces
@@ -605,12 +668,34 @@ class MetricResult:
             cls._update_src_to_tar_dict(chain_lddt_dict, chain_result_dict)
             cls._update_src_to_tar_dict(interface_lddt_dict, interface_result_dict)
+            if metric_config.lddt.calc_backbone_lddt:
+                backbone_mask = ref_struct.get_backbone_atom_masks(only_rep_atom=True)
+                complex_bb_lddt = calc_lddt.get_complex_lddt(atom_mask=backbone_mask)
+                if not np.isnan(complex_bb_lddt):
+                    complex_result_dict["bb_lddt"] = complex_bb_lddt
+                # It reuses the chains and interfaces from the previous step
+                chain_interface_lddt = calc_lddt.get_chain_interface_lddt(
+                    chains, interfaces, atom_mask=backbone_mask
+                )
+                (
+                    chain_bb_lddt_dict,
+                    interface_bb_lddt_dict,
+                ) = cls._post_process_chain_interface_lddt(
+                    chains, interfaces, chain_interface_lddt, metric_name="bb_lddt"
+                )
+                cls._update_src_to_tar_dict(chain_bb_lddt_dict, chain_result_dict)
+                cls._update_src_to_tar_dict(
+                    interface_bb_lddt_dict, interface_result_dict
+                )
         # Calculate DockQ
         if metric_config.calc_dockq:
             dockq_result_dict = compute_dockq(
                 ref_struct=ref_struct,
                 model_struct=model_struct,
                 ref_to_model_chain_map=chain_map,
+                exclude_hetatms=metric_config.dockq.exclude_hetatms,
             )
             interface_dockq_dict = cls._post_process_dockq(dockq_result_dict)
             cls._update_src_to_tar_dict(interface_dockq_dict, interface_result_dict)
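
The counting rule of `_calc_stereochecks_summary`, added earlier in this file, is that a violation counts only when every atom it involves lies inside the atom subset. A pure-Python restatement on toy data may make this easier to follow. It is a sketch: `summarize_violations` is a hypothetical name, and index tuples stand in for the `idx1`/`idx2` and `idx_a`/`idx_b`/`idx_c` DataFrame columns.

```python
def summarize_violations(atom_mask, clash_pairs, bad_bond_pairs, bad_angle_triples):
    """Count violations whose atoms all fall inside the masked subset."""

    def inside(indices):
        return all(atom_mask[i] for i in indices)

    # Clashes are reported as the number of unique atoms involved.
    clash_atoms = {i for pair in clash_pairs if inside(pair) for i in pair}
    return {
        "clash_atoms": len(clash_atoms),
        "bad_bonds": sum(1 for pair in bad_bond_pairs if inside(pair)),
        "bad_angles": sum(1 for triple in bad_angle_triples if inside(triple)),
    }


# Toy data: six atoms, of which the first four belong to the subset.
atom_mask = [True, True, True, True, False, False]
clashes = [(0, 1), (1, 2), (3, 4)]    # (3, 4) crosses the subset boundary
bad_bonds = [(0, 1), (4, 5)]          # (4, 5) lies entirely outside
bad_angles = [(0, 1, 2), (2, 3, 4)]   # (2, 3, 4) crosses the boundary

summary = summarize_violations(atom_mask, clashes, bad_bonds, bad_angles)
print(summary)
```

In the diff, the interface-level mask is the union of the two chains' atoms, so under this rule violations within either chain are counted at the interface level as well as violations between the chains.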
@@ -635,6 +720,7 @@ class MetricResult:
             interface=interface_result_dict,
             pb_valid=chain_pb_valid_dict,
             ori_model_chain_ids=ori_model_chain_ids,
+            update_data=update_data,
         )
     def to_json_dict(self) -> dict[str, Any]:
@@ -663,7 +749,7 @@ class MetricResult:
             json_dict["ori_model_chain_ids"] = self.ori_model_chain_ids
         return json_dict
-    def to_json(self, json_file: Path, update_data: dict | None = None):
+    def to_json(self, json_file: Path, update_data: Optional[dict] = None):
         """
         Convert the MetricResult instance to a JSON string.
@@ -677,5 +763,8 @@ class MetricResult:
         if update_data:
             json_dict.update(update_data)
+        if self.update_data is not None:
+            json_dict.update(self.update_data)
         with open(json_file, "w", encoding="utf-8") as f:
             json.dump(json_dict, f, indent=4, ensure_ascii=False)
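
After this change, `to_json` applies two updates in sequence: the `update_data` argument first, then the instance's own `update_data` field. The sketch below isolates that ordering with plain dictionaries. `merge_for_json` and the keys in the example are hypothetical and chosen only for illustration.

```python
def merge_for_json(base, call_update=None, instance_update=None):
    """Mirror the update order in to_json: call argument, then instance field."""
    merged = dict(base)
    if call_update:
        merged.update(call_update)
    if instance_update is not None:
        merged.update(instance_update)  # applied last, so it wins on conflicts
    return merged


result = merge_for_json(
    {"entry_id": "7rss", "seed": 0},
    call_update={"seed": 1, "note": "cli"},
    instance_update={"seed": 2},
)
print(result)
```

When both sources set the same key, the value stored on the instance takes precedence over the one passed to the call.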
