PyPI - cavefiller - Versions diffs - 0.2.1__tar.gz → 0.3.1__tar.gz - Mend

cavefiller 0.2.1 tar.gz → 0.3.1 tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (18)

{cavefiller-0.2.1 → cavefiller-0.3.1}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: cavefiller
-Version: 0.2.1
+Version: 0.3.1
 Summary: A tool to find and fill protein cavities with water molecules using KVFinder and Packmol
 Author: CaveFiller Contributors
 Requires-Python: >=3.8
@@ -11,6 +11,7 @@ Requires-Dist: pykvfinder>=0.6.0
 Requires-Dist: rdkit>=2022.9.1
 Requires-Dist: numpy>=1.20.0
 Requires-Dist: biopython>=1.79
+Requires-Dist: tqdm>=4.67.3
 Provides-Extra: dev
 Requires-Dist: pytest>=7.0.0; extra == "dev"
 Requires-Dist: black>=22.0.0; extra == "dev"
@@ -74,18 +75,23 @@ cavefiller [PROTEIN_FILE] [OPTIONS]
 **Options:**
 - `--output-dir PATH`: Directory to save output files (default: `./output`)
+- `--grid-step FLOAT`: Grid spacing for cavity detection in Ångströms (default: 0.6)
 - `--probe-in FLOAT`: Probe In radius for cavity detection in Ångströms (default: 1.4)
 - `--probe-out FLOAT`: Probe Out radius for cavity detection in Ångströms (default: 4.0)
+- `--exterior-trim-distance FLOAT`: Exterior trim distance in Ångströms (default: 2.4)
 - `--volume-cutoff FLOAT`: Minimum cavity volume to consider in Ų (default: 5.0)
 - `--auto-select`: Automatically select all cavities without user interaction
 - `--cavity-ids TEXT`: Comma-separated list of cavity IDs to fill (e.g., '1,2,3')
 - `--waters-per-cavity TEXT`: Comma-separated list of water counts (e.g., '10,15,20'), must match cavity-ids order
 - `--optimize-mmff94 / --no-optimize-mmff94`: Enable/disable MMFF94 with protein fixed (default: enabled)
 - `--mmff-max-iterations INTEGER`: Max MMFF94 iterations (default: 300)
+- `--remove-after-optim / --no-remove-after-optim`: After MMFF94, remove waters that fail post-checks (default: enabled)
+  - Also accepted: `--remove_after_optim / --no_remove_after_optim`
 Recommended usage:
 - Prefer interactive/manual cavity and water-count selection over `--auto-select`. Auto-selection often overfills cavities with too many waters.
 - Keep `--optimize-mmff94` enabled (recommended) to refine water placement after Monte Carlo sampling.
+- Use `--no-remove-after-optim` if you want to keep all waters after MMFF94, even if they clash or move out of cavity bounds.
 ### Examples
@@ -106,7 +112,7 @@ cavefiller protein.pdb --cavity-ids "1,3,5" --waters-per-cavity "10,15,20"
 **Custom cavity detection parameters:**
 ```bash
-cavefiller protein.pdb --probe-in 1.2 --probe-out 5.0 --volume-cutoff 10.0
+cavefiller protein.pdb --grid-step 0.6 --probe-in 1.4 --probe-out 4.0 --exterior-trim-distance 2.4 --volume-cutoff 5.0
 ```
 ## Workflow
@@ -180,7 +186,7 @@ This repository includes GitHub Actions workflow at `.github/workflows/ci-cd.yml
 - Runs `pytest` on every push to `main`
 - Runs `pytest` on every pull request targeting `main`
 - Builds package distributions after tests pass
-- Publishes to PyPI only when you push a version tag like `v0.1.1`
+- Publishes to PyPI only on pushes to `main` where `pyproject.toml` `project.version` changed
 #### One-time setup for automatic PyPI publishing
@@ -200,14 +206,7 @@ No PyPI API token secret is needed when using Trusted Publishing.
    - `pyproject.toml` (`project.version`)
    - `cavefiller/__init__.py` (`__version__`)
 2. Commit and push to `main`.
-3. Create and push a matching tag:
-```bash
-git tag v0.1.1
-git push origin v0.1.1
-```
-The workflow validates that the tag matches `pyproject.toml` (for example, tag `v0.1.1` must match version `0.1.1`) before publishing.
+3. CI will publish that pushed version to PyPI automatically, but only if `pyproject.toml` version changed versus the previous commit on `main`.
 ## License

{cavefiller-0.2.1 → cavefiller-0.3.1}/README.md RENAMED

@@ -55,18 +55,23 @@ cavefiller [PROTEIN_FILE] [OPTIONS]
 **Options:**
 - `--output-dir PATH`: Directory to save output files (default: `./output`)
+- `--grid-step FLOAT`: Grid spacing for cavity detection in Ångströms (default: 0.6)
 - `--probe-in FLOAT`: Probe In radius for cavity detection in Ångströms (default: 1.4)
 - `--probe-out FLOAT`: Probe Out radius for cavity detection in Ångströms (default: 4.0)
+- `--exterior-trim-distance FLOAT`: Exterior trim distance in Ångströms (default: 2.4)
 - `--volume-cutoff FLOAT`: Minimum cavity volume to consider in Ų (default: 5.0)
 - `--auto-select`: Automatically select all cavities without user interaction
 - `--cavity-ids TEXT`: Comma-separated list of cavity IDs to fill (e.g., '1,2,3')
 - `--waters-per-cavity TEXT`: Comma-separated list of water counts (e.g., '10,15,20'), must match cavity-ids order
 - `--optimize-mmff94 / --no-optimize-mmff94`: Enable/disable MMFF94 with protein fixed (default: enabled)
 - `--mmff-max-iterations INTEGER`: Max MMFF94 iterations (default: 300)
+- `--remove-after-optim / --no-remove-after-optim`: After MMFF94, remove waters that fail post-checks (default: enabled)
+  - Also accepted: `--remove_after_optim / --no_remove_after_optim`
 Recommended usage:
 - Prefer interactive/manual cavity and water-count selection over `--auto-select`. Auto-selection often overfills cavities with too many waters.
 - Keep `--optimize-mmff94` enabled (recommended) to refine water placement after Monte Carlo sampling.
+- Use `--no-remove-after-optim` if you want to keep all waters after MMFF94, even if they clash or move out of cavity bounds.
 ### Examples
@@ -87,7 +92,7 @@ cavefiller protein.pdb --cavity-ids "1,3,5" --waters-per-cavity "10,15,20"
 **Custom cavity detection parameters:**
 ```bash
-cavefiller protein.pdb --probe-in 1.2 --probe-out 5.0 --volume-cutoff 10.0
+cavefiller protein.pdb --grid-step 0.6 --probe-in 1.4 --probe-out 4.0 --exterior-trim-distance 2.4 --volume-cutoff 5.0
 ```
 ## Workflow
@@ -161,7 +166,7 @@ This repository includes GitHub Actions workflow at `.github/workflows/ci-cd.yml
 - Runs `pytest` on every push to `main`
 - Runs `pytest` on every pull request targeting `main`
 - Builds package distributions after tests pass
-- Publishes to PyPI only when you push a version tag like `v0.1.1`
+- Publishes to PyPI only on pushes to `main` where `pyproject.toml` `project.version` changed
 #### One-time setup for automatic PyPI publishing
@@ -181,14 +186,7 @@ No PyPI API token secret is needed when using Trusted Publishing.
    - `pyproject.toml` (`project.version`)
    - `cavefiller/__init__.py` (`__version__`)
 2. Commit and push to `main`.
-3. Create and push a matching tag:
-```bash
-git tag v0.1.1
-git push origin v0.1.1
-```
-The workflow validates that the tag matches `pyproject.toml` (for example, tag `v0.1.1` must match version `0.1.1`) before publishing.
+3. CI will publish that pushed version to PyPI automatically, but only if `pyproject.toml` version changed versus the previous commit on `main`.
 ## License
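
The publishing rule described in this README diff (publish only when `project.version` in `pyproject.toml` differs from the previous commit on `main`) can be illustrated with a small sketch. This is not the project's workflow code, which lives in `.github/workflows/ci-cd.yml` and is not part of this diff. The helper names `project_version` and `should_publish` are hypothetical, and the line-based parsing is a simplification that assumes a plain `version = "..."` entry under `[project]`.

```python
import re
from typing import Optional


def project_version(pyproject_text: str) -> Optional[str]:
    """Return the version declared under [project], or None if absent.

    Hypothetical helper: a line-based simplification, not a full TOML parser.
    """
    in_project = False
    for raw_line in pyproject_text.splitlines():
        line = raw_line.strip()
        if line.startswith("["):
            # Track whether we are inside the [project] table.
            in_project = line == "[project]"
            continue
        if in_project:
            match = re.match(r'version\s*=\s*"([^"]+)"', line)
            if match:
                return match.group(1)
    return None


def should_publish(previous_text: str, current_text: str) -> bool:
    """Publish only if the current commit declares a different version."""
    current = project_version(current_text)
    return current is not None and current != project_version(previous_text)


PREVIOUS = '[project]\nname = "cavefiller"\nversion = "0.2.1"\n'
CURRENT = '[project]\nname = "cavefiller"\nversion = "0.3.1"\n'

print(should_publish(PREVIOUS, CURRENT))
print(should_publish(CURRENT, CURRENT))
```

A push that leaves the version untouched would therefore run the tests and the build but skip the publish step, under the assumptions above.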

{cavefiller-0.2.1 → cavefiller-0.3.1}/cavefiller/__init__.py RENAMED

@@ -1,3 +1,3 @@
 """CaveFiller - A tool to find and fill protein cavities with water molecules."""
-__version__ = "0.2.1"
+__version__ = "0.3.1"
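
The README in this release asks for the version to be bumped in both `pyproject.toml` and `cavefiller/__init__.py`. A pre-release check along the following lines could catch a mismatch. It is only a sketch: the function names are hypothetical, nothing in the diff indicates the package ships such a check, and the regular expressions assume the simple one-line assignments seen in this diff.

```python
import re
from typing import Optional


def init_version(init_text: str) -> Optional[str]:
    """Extract __version__ from the text of a package __init__.py."""
    match = re.search(r'^__version__\s*=\s*"([^"]+)"', init_text, re.MULTILINE)
    return match.group(1) if match else None


def pyproject_version(pyproject_text: str) -> Optional[str]:
    """Extract the first `version = "..."` line (assumed to be project.version)."""
    match = re.search(r'^version\s*=\s*"([^"]+)"', pyproject_text, re.MULTILINE)
    return match.group(1) if match else None


def versions_in_sync(pyproject_text: str, init_text: str) -> bool:
    """True only when both files declare the same, non-missing version."""
    declared = pyproject_version(pyproject_text)
    return declared is not None and declared == init_version(init_text)


PYPROJECT = '[project]\nname = "cavefiller"\nversion = "0.3.1"\n'
INIT_NEW = '"""CaveFiller."""\n__version__ = "0.3.1"\n'
INIT_STALE = '"""CaveFiller."""\n__version__ = "0.2.1"\n'

print(versions_in_sync(PYPROJECT, INIT_NEW))
print(versions_in_sync(PYPROJECT, INIT_STALE))
```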

cavefiller-0.3.1/cavefiller/cavity_finder.py ADDED

@@ -0,0 +1,174 @@
+"""Cavity detection using pyKVFinder."""
+from typing import List, Dict, Tuple, Any
+import numpy as np
+# Grid spacing for cavity detection (in Angstroms)
+DEFAULT_GRID_STEP = 0.6
+DEFAULT_PROBE_IN = 1.4
+DEFAULT_PROBE_OUT = 4.0
+DEFAULT_EXTERIOR_TRIM_DISTANCE = 2.4
+DEFAULT_VOLUME_CUTOFF = 5.0
+def _map_volume_keys_to_grid_labels(cavity_data: Any, volume_keys: List[str]) -> Dict[str, int]:
+    """
+    Map pyKVFinder cavity string IDs (KAA, KAB, ...) to integer labels in `cavity_data.cavities`.
+    pyKVFinder 0.9.0 often uses positive cavity labels starting at 2 (label 1 is reserved),
+    while volumes are reported by sequential string IDs. We map by ordered positive labels.
+    """
+    labels = sorted(int(v) for v in np.unique(cavity_data.cavities) if int(v) > 0)
+    if not labels:
+        return {}
+    # Most pyKVFinder outputs are contiguous and aligned with volume-key order.
+    if len(labels) >= len(volume_keys):
+        ordered_labels = labels[: len(volume_keys)]
+    else:
+        ordered_labels = labels + list(range(labels[-1] + 1, labels[-1] + 1 + (len(volume_keys) - len(labels))))
+    return {key: int(ordered_labels[idx]) for idx, key in enumerate(volume_keys)}
+def find_cavities(
+    protein_file: str,
+    probe_in: float = DEFAULT_PROBE_IN,
+    probe_out: float = DEFAULT_PROBE_OUT,
+    step: float = DEFAULT_GRID_STEP,
+    removal_distance: float = DEFAULT_EXTERIOR_TRIM_DISTANCE,
+    volume_cutoff: float = DEFAULT_VOLUME_CUTOFF,
+    output_dir: str = "./output",
+) -> Tuple[List[Dict[str, Any]], Any]:
+    """
+    Find cavities in a protein structure using pyKVFinder.
+    Args:
+        protein_file: Path to the protein PDB file
+        probe_in: Probe In radius for cavity detection (Å)
+        probe_out: Probe Out radius for cavity detection (Å)
+        step: Grid spacing for cavity detection (Å)
+        removal_distance: Exterior trim distance for cavity detection (Å)
+        volume_cutoff: Minimum cavity volume to consider (Ų)
+        output_dir: Directory to save cavity detection results
+    Returns:
+        Tuple of (list of cavity dictionaries, cavity_data object)
+    """
+    try:
+        import pyKVFinder
+    except ImportError:
+        raise ImportError(
+            "pyKVFinder is not installed. Please install it with: pip install pykvfinder"
+        )
+    # Run KVFinder to detect cavities
+    cavity_data = pyKVFinder.run_workflow(
+        input=protein_file,
+        probe_in=probe_in,
+        probe_out=probe_out,
+        step=step,
+        removal_distance=removal_distance,
+        volume_cutoff=volume_cutoff,
+    )
+    # Extract cavity information
+    cavities = []
+    # Get cavity volumes and areas
+    if hasattr(cavity_data, 'volume') and cavity_data.volume is not None:
+        volumes = cavity_data.volume
+        areas = cavity_data.area if hasattr(cavity_data, 'area') else {}
+        # Map string IDs to underlying integer cavity-grid labels.
+        # User-facing IDs remain sequential for compatibility.
+        volume_keys = list(volumes.keys())
+        cavity_grid_id_map = _map_volume_keys_to_grid_labels(cavity_data, volume_keys)
+        # Process each cavity
+        for display_idx, (cavity_str_id, volume) in enumerate(volumes.items(), start=1):
+            if volume >= volume_cutoff:
+                cavity_info = {
+                    "id": display_idx,
+                    "grid_id": cavity_grid_id_map.get(cavity_str_id, display_idx),
+                    "string_id": cavity_str_id,
+                    "volume": volume,
+                    "area": areas.get(cavity_str_id, 0.0) if areas else 0.0,
+                }
+                cavities.append(cavity_info)
+    # Sort cavities by volume (largest first)
+    cavities.sort(key=lambda x: x["volume"], reverse=True)
+    return cavities, cavity_data
+def get_cavity_grid_points(cavity_data: Any, cavity_id: int) -> np.ndarray:
+    """
+    Get the grid points that belong to a specific cavity.
+    Args:
+        cavity_data: The cavity data object from pyKVFinder
+        cavity_id: Integer ID of the cavity (1-indexed)
+    Returns:
+        Array of (x, y, z) coordinates for the cavity grid points
+    """
+    if not hasattr(cavity_data, 'cavities') or cavity_data.cavities is None:
+        return np.array([])
+    # Get cavity grid
+    cavity_grid = cavity_data.cavities
+    # Find all points belonging to this cavity
+    # Note: KVFinder uses 1-indexed cavity IDs in the grid
+    points = np.argwhere(cavity_grid == cavity_id)
+    points = points.astype(float)
+    # Convert grid indices to real coordinates if metadata is available.
+    # pyKVFinder versions expose this either as public P1/P2/P3/P4 or private _vertices/_step.
+    step = float(getattr(cavity_data, "step", getattr(cavity_data, "_step", DEFAULT_GRID_STEP)))
+    vertices = None
+    if hasattr(cavity_data, "surface") and hasattr(cavity_data.surface, "P1"):
+        vertices = np.array(
+            [
+                [cavity_data.surface.P1[i] for i in range(3)],
+                [cavity_data.surface.P2[i] for i in range(3)],
+                [cavity_data.surface.P3[i] for i in range(3)],
+                [cavity_data.surface.P4[i] for i in range(3)],
+            ],
+            dtype=float,
+        )
+    elif hasattr(cavity_data, "P1"):
+        vertices = np.array(
+            [
+                [cavity_data.P1[i] for i in range(3)],
+                [cavity_data.P2[i] for i in range(3)],
+                [cavity_data.P3[i] for i in range(3)],
+                [cavity_data.P4[i] for i in range(3)],
+            ],
+            dtype=float,
+        )
+    elif hasattr(cavity_data, "_vertices"):
+        vertices = np.asarray(cavity_data._vertices, dtype=float)
+    if vertices is not None and vertices.shape[0] >= 4:
+        origin = vertices[0]
+        axes = [vertices[1] - origin, vertices[2] - origin, vertices[3] - origin]
+        unit_axes = []
+        for axis in axes:
+            norm = np.linalg.norm(axis)
+            unit_axes.append(axis / norm if norm > 1e-8 else np.zeros(3, dtype=float))
+        return (
+            origin
+            + points[:, [0]] * (step * unit_axes[0])
+            + points[:, [1]] * (step * unit_axes[1])
+            + points[:, [2]] * (step * unit_axes[2])
+        )
+    if vertices is not None and vertices.shape[0] >= 1:
+        return vertices[0] + points * step
+    # Fallback: return index-space points; downstream code will align to protein frame.
+    return points
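
The label mapping added in `_map_volume_keys_to_grid_labels` can be followed with a dependency-free sketch. The function below mirrors the logic in the diff but takes a plain iterable of grid labels instead of a pyKVFinder result object, so `map_keys_to_labels` and the sample inputs are illustrative assumptions rather than part of the package.

```python
from typing import Dict, Iterable, List


def map_keys_to_labels(grid_labels: Iterable[int], volume_keys: List[str]) -> Dict[str, int]:
    """Pair cavity string IDs with positive grid labels in ascending order."""
    # Keep only positive labels; zero and negative values are not cavities here.
    labels = sorted({int(v) for v in grid_labels if int(v) > 0})
    if not labels:
        return {}
    if len(labels) >= len(volume_keys):
        ordered = labels[: len(volume_keys)]
    else:
        # Fewer labels than keys: extend with the next consecutive integers.
        missing = len(volume_keys) - len(labels)
        ordered = labels + list(range(labels[-1] + 1, labels[-1] + 1 + missing))
    return dict(zip(volume_keys, ordered))


# Grid labels starting at 2, as the docstring in the diff describes.
aligned = map_keys_to_labels([0, 0, 2, 2, 3, 4, -1], ["KAA", "KAB", "KAC"])
print(aligned)

# Only one label found for two keys: the second key gets a synthesized label.
padded = map_keys_to_labels([2], ["KAA", "KAB"])
print(padded)
```

In 0.2.1 the first cavity was always looked up as grid label 1. With this mapping the user-facing `id` stays sequential while `grid_id` carries the label actually present in the grid.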

{cavefiller-0.2.1 → cavefiller-0.3.1}/cavefiller/cli.py RENAMED

@@ -2,8 +2,15 @@
 import typer
 from pathlib import Path
-from typing import Optional, List
-from cavefiller.cavity_finder import find_cavities
+from typing import Optional
+from cavefiller.cavity_finder import (
+    find_cavities,
+    DEFAULT_GRID_STEP,
+    DEFAULT_PROBE_IN,
+    DEFAULT_PROBE_OUT,
+    DEFAULT_EXTERIOR_TRIM_DISTANCE,
+    DEFAULT_VOLUME_CUTOFF,
+)
 from cavefiller.cavity_selector import select_cavities
 from cavefiller.water_filler import fill_cavities_with_water
@@ -21,16 +28,26 @@ def run(
         Path("./output"),
         help="Directory to save output files",
     ),
+    grid_step: float = typer.Option(
+        DEFAULT_GRID_STEP,
+        "--grid-step",
+        help="Grid spacing for cavity detection (Å)",
+    ),
     probe_in: float = typer.Option(
-        1.4,
+        DEFAULT_PROBE_IN,
         help="Probe In radius for cavity detection (Å)",
     ),
     probe_out: float = typer.Option(
-        4.0,
+        DEFAULT_PROBE_OUT,
         help="Probe Out radius for cavity detection (Å)",
     ),
+    exterior_trim_distance: float = typer.Option(
+        DEFAULT_EXTERIOR_TRIM_DISTANCE,
+        "--exterior-trim-distance",
+        help="Exterior trim distance (KVFinder removal distance) (Å)",
+    ),
     volume_cutoff: float = typer.Option(
-        5.0,
+        DEFAULT_VOLUME_CUTOFF,
         help="Minimum cavity volume to consider (Å³)",
     ),
     auto_select: bool = typer.Option(
@@ -54,6 +71,15 @@ def run(
         300,
         help="Maximum MMFF94 iterations when optimization is enabled.",
     ),
+    remove_after_optim: bool = typer.Option(
+        True,
+        "--remove-after-optim/--no-remove-after-optim",
+        "--remove_after_optim/--no_remove_after_optim",
+        help=(
+            "After MMFF94, remove waters that fail cavity/clash validation. "
+            "Disable to keep all optimized waters."
+        ),
+    ),
 ):
     """
     Find cavities in a protein and fill them with explicit water molecules.
@@ -74,8 +100,10 @@ def run(
     typer.echo("Step 1: Finding cavities with KVFinder...")
     cavities, cavity_data = find_cavities(
         str(protein_file),
+        step=grid_step,
         probe_in=probe_in,
         probe_out=probe_out,
+        removal_distance=exterior_trim_distance,
         volume_cutoff=volume_cutoff,
         output_dir=str(output_dir),
     )
@@ -135,6 +163,7 @@ def run(
         waters_per_cavity=waters_dict,
         optimize_mmff94=optimize_mmff94,
         mmff_max_iterations=mmff_max_iterations,
+        remove_after_optim=remove_after_optim,
     )
     typer.echo(f"\n✅ Success! Output saved to: {output_file}")
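
The paired on/off flag with a hyphen spelling and an underscore spelling can be tried out without installing Typer. The sketch below swaps in the standard library's `argparse.BooleanOptionalAction` (Python 3.9+) as a stand-in; it is not the CLI from this diff. One difference worth noting: argparse derives the negative form as `--no-remove_after_optim`, whereas the Typer declaration above spells it `--no_remove_after_optim`. The program name `cavefiller-sketch` is made up for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Stand-in parser mimicking two of the options added in 0.3.1."""
    parser = argparse.ArgumentParser(prog="cavefiller-sketch")
    parser.add_argument(
        "--grid-step",
        type=float,
        default=0.6,
        help="Grid spacing for cavity detection (Å)",
    )
    parser.add_argument(
        "--remove-after-optim",
        "--remove_after_optim",  # alternate spelling, as in the diff
        action=argparse.BooleanOptionalAction,
        default=True,
        help="After MMFF94, remove waters that fail validation",
    )
    return parser


parser = build_parser()
defaults = parser.parse_args([])
disabled = parser.parse_args(["--no-remove-after-optim", "--grid-step", "0.5"])
underscore = parser.parse_args(["--remove_after_optim"])

print(defaults.remove_after_optim, defaults.grid_step)
print(disabled.remove_after_optim, disabled.grid_step)
print(underscore.remove_after_optim)
```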

{cavefiller-0.2.1 → cavefiller-0.3.1}/cavefiller/water_filler.py RENAMED

@@ -30,6 +30,46 @@ HOH_ANGLE_DEG = 104.52
 DEFAULT_MMFF_MAX_ITERS = 300
+def _run_mmff_with_progress(ff: Any, max_iterations: int) -> None:
+    """Run MMFF minimization while reporting per-step progress."""
+    if max_iterations <= 0:
+        return
+    try:
+        from tqdm import tqdm  # type: ignore
+    except Exception:
+        tqdm = None
+    converged = False
+    if tqdm is not None:
+        with tqdm(total=max_iterations, desc="MMFF94", unit="iter") as bar:
+            for _step in range(1, max_iterations + 1):
+                # RDKit returns 0 on convergence, non-zero otherwise.
+                status = ff.Minimize(maxIts=1)
+                bar.update(1)
+                if status == 0:
+                    converged = True
+                    break
+            if converged:
+                bar.set_postfix_str("converged")
+            else:
+                bar.set_postfix_str("max iters")
+    else:
+        print(f"MMFF94 optimization progress: 0/{max_iterations}")
+        for step in range(1, max_iterations + 1):
+            status = ff.Minimize(maxIts=1)
+            if step == 1 or step % 10 == 0 or step == max_iterations or status == 0:
+                print(f"MMFF94 optimization progress: {step}/{max_iterations}")
+            if status == 0:
+                converged = True
+                break
+    if converged:
+        print("MMFF94 optimization converged before max iterations")
+    else:
+        print("MMFF94 optimization reached max iterations")
 def fill_cavities_with_water(
     protein_file: str,
     selected_cavities: List[Dict[str, Any]],
@@ -38,6 +78,7 @@ def fill_cavities_with_water(
     waters_per_cavity: Dict[int, int] = None,
     optimize_mmff94: bool = True,
     mmff_max_iterations: int = DEFAULT_MMFF_MAX_ITERS,
+    remove_after_optim: bool = True,
 ) -> str:
     """Place explicit waters in selected cavities and write a combined PDB."""
     base_name = os.path.splitext(os.path.basename(protein_file))[0]
@@ -51,6 +92,7 @@ def fill_cavities_with_water(
     for cavity in selected_cavities:
         cavity_id = cavity["id"]
+        cavity_grid_id = int(cavity.get("grid_id", cavity_id))
         volume = cavity["volume"]
         if waters_per_cavity and cavity_id in waters_per_cavity:
@@ -63,9 +105,12 @@ def fill_cavities_with_water(
             f"(volume: {volume:.2f} A^3)..."
         )
-        cavity_points = get_cavity_grid_points(cavity_data, cavity_id)
+        cavity_points = get_cavity_grid_points(cavity_data, cavity_grid_id)
         if len(cavity_points) == 0:
-            print(f"  Warning: no grid points found for cavity {cavity_id}")
+            print(
+                f"  Warning: no grid points found for cavity {cavity_id} "
+                f"(grid label {cavity_grid_id})"
+            )
             continue
         cavity_points = _ensure_cavity_points_near_protein(cavity_points, protein_atoms)
@@ -105,12 +150,19 @@ def fill_cavities_with_water(
                 water_origin_cavity_points=all_water_origin_cavity_points,
                 protein_atoms=protein_atoms,
                 max_iterations=mmff_max_iterations,
+                remove_after_optim=remove_after_optim,
             )
             if optimized_positions:
-                print(
-                    f"MMFF94 (fixed protein) kept {len(optimized_positions)}/"
-                    f"{len(all_water_positions)} waters after filtering"
-                )
+                if remove_after_optim:
+                    print(
+                        f"MMFF94 (fixed protein) kept {len(optimized_positions)}/"
+                        f"{len(all_water_positions)} waters after filtering"
+                    )
+                else:
+                    print(
+                        f"MMFF94 (fixed protein) kept all {len(optimized_positions)} waters "
+                        "without post-optimization filtering"
+                    )
                 optimized_waters_mol = build_waters_mol(
                     water_positions=[],
                     chain_id="W",
@@ -278,6 +330,7 @@ def optimize_waters_mmff94_fixed_protein(
     water_origin_cavity_points: List[np.ndarray],
     protein_atoms: List[Tuple[str, np.ndarray]],
     max_iterations: int = DEFAULT_MMFF_MAX_ITERS,
+    remove_after_optim: bool = True,
 ) -> Tuple[List[np.ndarray], List[Tuple[np.ndarray, np.ndarray, np.ndarray]]]:
     """MMFF94 optimize waters with protein atoms fixed in place."""
     if not water_positions:
@@ -328,7 +381,7 @@ def optimize_waters_mmff94_fixed_protein(
             ff.AddFixedPoint(atom_idx)
         ff.Initialize()
-        ff.Minimize(maxIts=max_iterations)
+        _run_mmff_with_progress(ff, max_iterations)
         conf = work_mol.GetConformer()
         optimized_geometries = []
@@ -347,6 +400,9 @@ def optimize_waters_mmff94_fixed_protein(
                 )
             )
+        if not remove_after_optim:
+            return [geom[0] for geom in optimized_geometries], optimized_geometries
         selected_geometries = _select_valid_geometries_after_mmff(
             optimized_geometries=optimized_geometries,
             original_geometries=original_geometries,
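
The stepwise loop in `_run_mmff_with_progress` relies on `Minimize` returning 0 once the minimizer reports convergence. Its control flow can be checked without RDKit by substituting a fake force field. `FakeForceField` and `minimize_stepwise` below are hypothetical names for illustration, and the fake simply declares convergence on a chosen call instead of doing any chemistry.

```python
from typing import Tuple


class FakeForceField:
    """Hypothetical stand-in exposing only a Minimize(maxIts=...) method."""

    def __init__(self, converge_on_call: int) -> None:
        self.converge_on_call = converge_on_call
        self.calls = 0

    def Minimize(self, maxIts: int = 200) -> int:
        # Mirror the convention noted in the diff: 0 means converged.
        self.calls += 1
        return 0 if self.calls >= self.converge_on_call else 1


def minimize_stepwise(ff: FakeForceField, max_iterations: int) -> Tuple[int, bool]:
    """Run one iteration per call; return (steps_taken, converged)."""
    for step in range(1, max_iterations + 1):
        if ff.Minimize(maxIts=1) == 0:
            return step, True
    return max(max_iterations, 0), False


early = minimize_stepwise(FakeForceField(converge_on_call=5), 300)
capped = minimize_stepwise(FakeForceField(converge_on_call=500), 10)
skipped = minimize_stepwise(FakeForceField(converge_on_call=1), 0)

print(early, capped, skipped)
```

One design consequence to keep in mind: issuing many single-iteration calls gives a progress bar, but it may not follow exactly the same path as one `Minimize(maxIts=max_iterations)` call if the minimizer resets internal state between calls. That is an assumption about the optimizer, not something this diff states.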

{cavefiller-0.2.1 → cavefiller-0.3.1}/cavefiller.egg-info/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: cavefiller
-Version: 0.2.1
+Version: 0.3.1
 Summary: A tool to find and fill protein cavities with water molecules using KVFinder and Packmol
 Author: CaveFiller Contributors
 Requires-Python: >=3.8
@@ -11,6 +11,7 @@ Requires-Dist: pykvfinder>=0.6.0
 Requires-Dist: rdkit>=2022.9.1
 Requires-Dist: numpy>=1.20.0
 Requires-Dist: biopython>=1.79
+Requires-Dist: tqdm>=4.67.3
 Provides-Extra: dev
 Requires-Dist: pytest>=7.0.0; extra == "dev"
 Requires-Dist: black>=22.0.0; extra == "dev"
@@ -74,18 +75,23 @@ cavefiller [PROTEIN_FILE] [OPTIONS]
 **Options:**
 - `--output-dir PATH`: Directory to save output files (default: `./output`)
+- `--grid-step FLOAT`: Grid spacing for cavity detection in Ångströms (default: 0.6)
 - `--probe-in FLOAT`: Probe In radius for cavity detection in Ångströms (default: 1.4)
 - `--probe-out FLOAT`: Probe Out radius for cavity detection in Ångströms (default: 4.0)
+- `--exterior-trim-distance FLOAT`: Exterior trim distance in Ångströms (default: 2.4)
 - `--volume-cutoff FLOAT`: Minimum cavity volume to consider in Ų (default: 5.0)
 - `--auto-select`: Automatically select all cavities without user interaction
 - `--cavity-ids TEXT`: Comma-separated list of cavity IDs to fill (e.g., '1,2,3')
 - `--waters-per-cavity TEXT`: Comma-separated list of water counts (e.g., '10,15,20'), must match cavity-ids order
 - `--optimize-mmff94 / --no-optimize-mmff94`: Enable/disable MMFF94 with protein fixed (default: enabled)
 - `--mmff-max-iterations INTEGER`: Max MMFF94 iterations (default: 300)
+- `--remove-after-optim / --no-remove-after-optim`: After MMFF94, remove waters that fail post-checks (default: enabled)
+  - Also accepted: `--remove_after_optim / --no_remove_after_optim`
 Recommended usage:
 - Prefer interactive/manual cavity and water-count selection over `--auto-select`. Auto-selection often overfills cavities with too many waters.
 - Keep `--optimize-mmff94` enabled (recommended) to refine water placement after Monte Carlo sampling.
+- Use `--no-remove-after-optim` if you want to keep all waters after MMFF94, even if they clash or move out of cavity bounds.
 ### Examples
@@ -106,7 +112,7 @@ cavefiller protein.pdb --cavity-ids "1,3,5" --waters-per-cavity "10,15,20"
 **Custom cavity detection parameters:**
 ```bash
-cavefiller protein.pdb --probe-in 1.2 --probe-out 5.0 --volume-cutoff 10.0
+cavefiller protein.pdb --grid-step 0.6 --probe-in 1.4 --probe-out 4.0 --exterior-trim-distance 2.4 --volume-cutoff 5.0
 ```
 ## Workflow
@@ -180,7 +186,7 @@ This repository includes GitHub Actions workflow at `.github/workflows/ci-cd.yml
 - Runs `pytest` on every push to `main`
 - Runs `pytest` on every pull request targeting `main`
 - Builds package distributions after tests pass
-- Publishes to PyPI only when you push a version tag like `v0.1.1`
+- Publishes to PyPI only on pushes to `main` where `pyproject.toml` `project.version` changed
 #### One-time setup for automatic PyPI publishing
@@ -200,14 +206,7 @@ No PyPI API token secret is needed when using Trusted Publishing.
    - `pyproject.toml` (`project.version`)
    - `cavefiller/__init__.py` (`__version__`)
 2. Commit and push to `main`.
-3. Create and push a matching tag:
-```bash
-git tag v0.1.1
-git push origin v0.1.1
-```
-The workflow validates that the tag matches `pyproject.toml` (for example, tag `v0.1.1` must match version `0.1.1`) before publishing.
+3. CI will publish that pushed version to PyPI automatically, but only if `pyproject.toml` version changed versus the previous commit on `main`.
 ## License

{cavefiller-0.2.1 → cavefiller-0.3.1}/cavefiller.egg-info/requires.txt RENAMED

@@ -3,6 +3,7 @@ pykvfinder>=0.6.0
 rdkit>=2022.9.1
 numpy>=1.20.0
 biopython>=1.79
+tqdm>=4.67.3
 [dev]
 pytest>=7.0.0

{cavefiller-0.2.1 → cavefiller-0.3.1}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "cavefiller"
-version = "0.2.1"
+version = "0.3.1"
 description = "A tool to find and fill protein cavities with water molecules using KVFinder and Packmol"
 readme = "README.md"
 requires-python = ">=3.8"
@@ -17,6 +17,7 @@ dependencies = [
     "rdkit>=2022.9.1",
     "numpy>=1.20.0",
     "biopython>=1.79",
+    "tqdm>=4.67.3",
 ]
 [project.optional-dependencies]

{cavefiller-0.2.1 → cavefiller-0.3.1}/tests/test_cavefiller.py RENAMED

@@ -215,6 +215,7 @@ def test_fill_cavities_uses_mmff94_optimizer(monkeypatch):
             water_origin_cavity_points,
             protein_atoms,
             max_iterations,
+            remove_after_optim,
         ):
             called["value"] = True
             called["max_iterations"] = max_iterations
@@ -235,3 +236,46 @@ def test_fill_cavities_uses_mmff94_optimizer(monkeypatch):
         assert called["value"] is True
         assert called["max_iterations"] == 123
         assert os.path.exists(output_file)
+def test_fill_cavities_passes_remove_after_optim_flag(monkeypatch):
+    """Test that fill_cavities_with_water forwards remove_after_optim to MMFF stage."""
+    from cavefiller.cavity_finder import find_cavities
+    from cavefiller import water_filler as wf
+    if not EXAMPLE_PDB.exists():
+        pytest.skip("Example protein file not found")
+    with tempfile.TemporaryDirectory() as tmpdir:
+        cavities, cavity_data = find_cavities(str(EXAMPLE_PDB), output_dir=tmpdir)
+        if len(cavities) == 0:
+            pytest.skip("No cavities found in test protein")
+        called = {"remove_after_optim": None}
+        real_builder = wf._build_water_geometries_from_positions
+        def fake_optimize(
+            protein_mol,
+            water_positions,
+            water_origin_cavity_points,
+            protein_atoms,
+            max_iterations,
+            remove_after_optim,
+        ):
+            called["remove_after_optim"] = remove_after_optim
+            return water_positions, real_builder(water_positions, protein_atoms)
+        monkeypatch.setattr(wf, "optimize_waters_mmff94_fixed_protein", fake_optimize)
+        output_file = wf.fill_cavities_with_water(
+            str(EXAMPLE_PDB),
+            cavities[:1],
+            cavity_data,
+            tmpdir,
+            waters_per_cavity={cavities[0]["id"]: 2},
+            optimize_mmff94=True,
+            remove_after_optim=False,
+        )
+        assert called["remove_after_optim"] is False
+        assert os.path.exists(output_file)

cavefiller-0.2.1/cavefiller/cavity_finder.py DELETED

@@ -1,114 +0,0 @@
-"""Cavity detection using pyKVFinder."""
-import os
-from typing import List, Dict, Tuple, Any
-import numpy as np
-# Grid spacing for cavity detection (in Angstroms)
-DEFAULT_GRID_STEP = 0.6
-def find_cavities(
-    protein_file: str,
-    probe_in: float = 1.4,
-    probe_out: float = 4.0,
-    volume_cutoff: float = 5.0,
-    output_dir: str = "./output",
-) -> Tuple[List[Dict[str, Any]], Any]:
-    """
-    Find cavities in a protein structure using pyKVFinder.
-    Args:
-        protein_file: Path to the protein PDB file
-        probe_in: Probe In radius for cavity detection (Å)
-        probe_out: Probe Out radius for cavity detection (Å)
-        volume_cutoff: Minimum cavity volume to consider (Ų)
-        output_dir: Directory to save cavity detection results
-    Returns:
-        Tuple of (list of cavity dictionaries, cavity_data object)
-    """
-    try:
-        import pyKVFinder
-    except ImportError:
-        raise ImportError(
-            "pyKVFinder is not installed. Please install it with: pip install pykvfinder"
-        )
-    # Run KVFinder to detect cavities
-    cavity_data = pyKVFinder.run_workflow(
-        input=protein_file,
-        probe_in=probe_in,
-        probe_out=probe_out,
-        step=DEFAULT_GRID_STEP,  # Grid step size
-        volume_cutoff=volume_cutoff,
-    )
-    # Extract cavity information
-    cavities = []
-    # Get cavity volumes and areas
-    if hasattr(cavity_data, 'volume') and cavity_data.volume is not None:
-        volumes = cavity_data.volume
-        areas = cavity_data.area if hasattr(cavity_data, 'area') else {}
-        # Create mapping from string IDs to integer IDs
-        # KVFinder uses string IDs like 'KAA', 'KAB', etc., but the grid uses integers
-        cavity_id_map = {}
-        for idx, (cavity_str_id, volume) in enumerate(volumes.items(), start=1):
-            cavity_id_map[cavity_str_id] = idx
-        # Process each cavity
-        for cavity_str_id, volume in volumes.items():
-            if volume >= volume_cutoff:
-                cavity_info = {
-                    "id": cavity_id_map[cavity_str_id],
-                    "string_id": cavity_str_id,
-                    "volume": volume,
-                    "area": areas.get(cavity_str_id, 0.0) if areas else 0.0,
-                }
-                cavities.append(cavity_info)
-    # Sort cavities by volume (largest first)
-    cavities.sort(key=lambda x: x["volume"], reverse=True)
-    return cavities, cavity_data
-def get_cavity_grid_points(cavity_data: Any, cavity_id: int) -> np.ndarray:
-    """
-    Get the grid points that belong to a specific cavity.
-    Args:
-        cavity_data: The cavity data object from pyKVFinder
-        cavity_id: Integer ID of the cavity (1-indexed)
-    Returns:
-        Array of (x, y, z) coordinates for the cavity grid points
-    """
-    if not hasattr(cavity_data, 'cavities') or cavity_data.cavities is None:
-        return np.array([])
-    # Get cavity grid
-    cavity_grid = cavity_data.cavities
-    # Find all points belonging to this cavity
-    # Note: KVFinder uses 1-indexed cavity IDs in the grid
-    points = np.argwhere(cavity_grid == cavity_id)
-    # Convert grid indices to real coordinates if origin metadata is available.
-    # Different pyKVFinder versions expose metadata on either cavity_data or cavity_data.surface.
-    step = getattr(cavity_data, "step", DEFAULT_GRID_STEP)
-    origin = None
-    if hasattr(cavity_data, "surface") and hasattr(cavity_data.surface, "P1"):
-        origin = np.array([cavity_data.surface.P1[i] for i in range(3)], dtype=float)
-    elif hasattr(cavity_data, "P1"):
-        origin = np.array([cavity_data.P1[i] for i in range(3)], dtype=float)
-    points = points.astype(float)
-    if origin is not None:
-        return origin + points * float(step)
-    # Fallback: return index-space points; downstream code will align to protein frame.
-    return points

{cavefiller-0.2.1 → cavefiller-0.3.1}/LICENSE RENAMED

File without changes

{cavefiller-0.2.1 → cavefiller-0.3.1}/cavefiller/cavity_selector.py RENAMED

File without changes

{cavefiller-0.2.1 → cavefiller-0.3.1}/cavefiller.egg-info/SOURCES.txt RENAMED

File without changes

{cavefiller-0.2.1 → cavefiller-0.3.1}/cavefiller.egg-info/dependency_links.txt RENAMED

File without changes

{cavefiller-0.2.1 → cavefiller-0.3.1}/cavefiller.egg-info/entry_points.txt RENAMED

File without changes

{cavefiller-0.2.1 → cavefiller-0.3.1}/cavefiller.egg-info/top_level.txt RENAMED

File without changes

{cavefiller-0.2.1 → cavefiller-0.3.1}/setup.cfg RENAMED

File without changes
