PyPI - parq-blockmodel - Versions diffs - 0.1.1__tar.gz - Mend

parq-blockmodel 0.1.1__tar.gz


Files changed (12)

parq_blockmodel-0.1.1/LICENSE +21 -0
parq_blockmodel-0.1.1/PKG-INFO +40 -0
parq_blockmodel-0.1.1/README.md +16 -0
parq_blockmodel-0.1.1/parq_blockmodel/__init__.py +13 -0
parq_blockmodel-0.1.1/parq_blockmodel/blockmodel.py +399 -0
parq_blockmodel-0.1.1/parq_blockmodel/geometry.py +573 -0
parq_blockmodel-0.1.1/parq_blockmodel/utils/__init__.py +3 -0
parq_blockmodel-0.1.1/parq_blockmodel/utils/demo_block_model.py +68 -0
parq_blockmodel-0.1.1/parq_blockmodel/utils/geometry_utils.py +134 -0
parq_blockmodel-0.1.1/parq_blockmodel/utils/pyvista_utils.py +143 -0
parq_blockmodel-0.1.1/parq_blockmodel/utils/spatial_encoding.py +76 -0
parq_blockmodel-0.1.1/pyproject.toml +42 -0

parq_blockmodel-0.1.1/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2023 Greg Elphick
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

parq_blockmodel-0.1.1/PKG-INFO ADDED

@@ -0,0 +1,40 @@
+Metadata-Version: 2.3
+Name: parq-blockmodel
+Version: 0.1.1
+Summary: A Python package for efficient storage, manipulation, and analysis of mining block models using Parquet files.
+Author: Greg
+Author-email: 11791585+elphick@users.noreply.github.com
+Requires-Python: >=3.10,<3.13
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Provides-Extra: blockmodel
+Provides-Extra: profiling
+Provides-Extra: progress
+Requires-Dist: lark (>=1.2.2,<2.0.0)
+Requires-Dist: numpy (>=1.25.2)
+Requires-Dist: parq-tools (==0.3.1)
+Requires-Dist: pyarrow (>=16.0)
+Requires-Dist: pyvista (>=0.45.2,<0.46.0) ; extra == "blockmodel"
+Requires-Dist: setuptools ; extra == "profiling"
+Requires-Dist: tqdm (>=4.67.1,<5.0.0) ; extra == "progress"
+Requires-Dist: ydata-profiling (>=4.16.1,<5.0.0) ; extra == "profiling"
+Description-Content-Type: text/markdown
+# parq-blockmodel
+[![Run Tests](https://github.com/Elphick/parq-blockmodel/actions/workflows/build_and_test.yml/badge.svg?branch=main)](https://github.com/Elphick/parq-blockmodel/actions/workflows/build_and_test.yml)
+[![PyPI](https://img.shields.io/pypi/v/parq-blockmodel.svg?logo=python&logoColor=white)](https://pypi.org/project/parq-blockmodel/)
+![Coverage](https://raw.githubusercontent.com/elphick/parq-blockmodel/main/docs/source/_static/badges/coverage.svg)
+[![Python Versions](https://img.shields.io/pypi/pyversions/parq-blockmodel.svg)](https://pypi.org/project/parq-blockmodel/)
+[![License](https://img.shields.io/github/license/Elphick/parq-blockmodel.svg?logo=apache&logoColor=white)](https://pypi.org/project/parq-blockmodel/)
+[![Publish Docs](https://github.com/Elphick/parq-blockmodel/actions/workflows/docs_to_gh_pages.yml/badge.svg?branch=main)](https://github.com/Elphick/parq-blockmodel/actions/workflows/docs_to_gh_pages.yml)
+[![Open Issues](https://img.shields.io/github/issues/Elphick/parq-blockmodel.svg)](https://github.com/Elphick/parq-blockmodel/issues)
+[![Open PRs](https://img.shields.io/github/issues-pr/Elphick/parq-blockmodel.svg)](https://github.com/Elphick/parq-blockmodel/pulls)
+## Overview
+A Python package for efficient storage, manipulation, and analysis of mining block models using Parquet files.
+parq-blockmodel provides tools for reading, writing, indexing, and transforming large-scale block model datasets,
+leveraging the performance of Apache Arrow and Parquet for scalable geoscience data workflows.

parq_blockmodel-0.1.1/README.md ADDED

@@ -0,0 +1,16 @@
+# parq-blockmodel
+[![Run Tests](https://github.com/Elphick/parq-blockmodel/actions/workflows/build_and_test.yml/badge.svg?branch=main)](https://github.com/Elphick/parq-blockmodel/actions/workflows/build_and_test.yml)
+[![PyPI](https://img.shields.io/pypi/v/parq-blockmodel.svg?logo=python&logoColor=white)](https://pypi.org/project/parq-blockmodel/)
+![Coverage](https://raw.githubusercontent.com/elphick/parq-blockmodel/main/docs/source/_static/badges/coverage.svg)
+[![Python Versions](https://img.shields.io/pypi/pyversions/parq-blockmodel.svg)](https://pypi.org/project/parq-blockmodel/)
+[![License](https://img.shields.io/github/license/Elphick/parq-blockmodel.svg?logo=apache&logoColor=white)](https://pypi.org/project/parq-blockmodel/)
+[![Publish Docs](https://github.com/Elphick/parq-blockmodel/actions/workflows/docs_to_gh_pages.yml/badge.svg?branch=main)](https://github.com/Elphick/parq-blockmodel/actions/workflows/docs_to_gh_pages.yml)
+[![Open Issues](https://img.shields.io/github/issues/Elphick/parq-blockmodel.svg)](https://github.com/Elphick/parq-blockmodel/issues)
+[![Open PRs](https://img.shields.io/github/issues-pr/Elphick/parq-blockmodel.svg)](https://github.com/Elphick/parq-blockmodel/pulls)
+## Overview
+A Python package for efficient storage, manipulation, and analysis of mining block models using Parquet files.
+parq-blockmodel provides tools for reading, writing, indexing, and transforming large-scale block model datasets,
+leveraging the performance of Apache Arrow and Parquet for scalable geoscience data workflows.

parq_blockmodel-0.1.1/parq_blockmodel/__init__.py ADDED

@@ -0,0 +1,13 @@
+import os
+os.environ["YDATA_SUPPRESS_BANNER"] = "1"
+from importlib import metadata
+from .blockmodel import ParquetBlockModel
+from .geometry import RegularGeometry
+try:
+    __version__ = metadata.version('parq_blockmodel')
+except metadata.PackageNotFoundError:
+    # Package is not installed
+    pass

parq_blockmodel-0.1.1/parq_blockmodel/blockmodel.py ADDED

@@ -0,0 +1,399 @@
+"""
+blockmodel.py
+This module defines the ParquetBlockModel class, which represents a block model stored in a Parquet file.
+Main API:
+- ParquetBlockModel: Class for representing a block model stored in a Parquet file.
+"""
+import logging
+import math
+import shutil
+import warnings
+from pathlib import Path
+from typing import Union, Optional
+import numpy as np
+import pandas as pd
+import pyarrow as pa
+import pyarrow.parquet as pq
+from parq_blockmodel.utils import create_demo_blockmodel, rotation_to_axis_orientation
+from parq_blockmodel.utils.pyvista_utils import df_to_pv_structured_grid, df_to_pv_unstructured_grid
+from parq_tools.lazy_parquet import LazyParquetDataFrame
+from pyarrow.parquet import ParquetFile
+from tqdm import tqdm
+from parq_tools import ParquetProfileReport
+from parq_tools.utils import atomic_output_file
+from parq_blockmodel.geometry import RegularGeometry
+# A 3-element coordinate or size; list[...] takes a single element type.
+Point = Union[tuple[float, float, float], list[float]]
+Triple = Union[tuple[float, float, float], list[float]]
+class ParquetBlockModel:
+    """
+    A class to represent a **regular** Parquet block model.
+    Block ordering is C-style, ordered by x, y, z coordinates.
+    Attributes:
+        blockmodel_path (Path): The file path to the blockmodel Parquet file.  This file is the source of the
+            block model data.  Consider a .pbm.parquet extension to imply a ParquetBlockModel file.
+        name (str): The name of the block model, derived from the file name.
+        block_path (Path): The original file path from which the block model will be created.
+        geometry (RegularGeometry): The geometry of the block model, derived from the Parquet file.
+    """
+    def __init__(self, blockmodel_path: Optional[Path] = None, name: Optional[str] = None,
+                 block_path: Optional[Path] = None,
+                 geometry: Optional[RegularGeometry] = None):
+        if blockmodel_path is None and block_path is not None:
+            # Derive the .pbm.parquet path from block_path
+            blockmodel_path = block_path.with_suffix('.pbm.parquet')
+            shutil.copy(block_path, blockmodel_path)
+        elif blockmodel_path is None:
+            raise ValueError("Either 'blockmodel_path' or 'block_path' must be provided.")
+        self.blockmodel_path = blockmodel_path
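+        # removesuffix drops only a trailing '.pbm'; str.strip would remove any of the characters '.', 'p', 'b', 'm'.
+        self.name = name or blockmodel_path.stem.removesuffix('.pbm')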
+        self.block_path = block_path or blockmodel_path
+        self.pf: ParquetFile = ParquetFile(blockmodel_path)
+        self.report_path: Optional[Path] = None
+        self.geometry: Optional[RegularGeometry] = geometry
+        if self.geometry is None and blockmodel_path.exists():
+            self.geometry = RegularGeometry.from_parquet(self.blockmodel_path)
+        self.data: LazyParquetDataFrame = LazyParquetDataFrame(self.blockmodel_path)
+        self.columns: list[str] = pq.read_schema(self.blockmodel_path).names
+        self._centroid_index: Optional[pd.MultiIndex] = None
+        self.attributes: list[str] = [col for col in self.columns if col not in ["x", "y", "z"]]
+        self._extract_column_dtypes()
+        self._logger = logging.getLogger(__name__)
+        if self.is_sparse:
+            if not self.validate_sparse():
+                raise ValueError("The sparse ParquetBlockModel is invalid. "
+                                 "Sparse centroids must be a subset of the dense grid.")
+    def __repr__(self):
+        return f"ParquetBlockModel(name={self.name}, path={self.blockmodel_path})"
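The constructor above derives the model name from the file stem with the `.pbm` marker removed. A minimal sketch of that derivation, contrasting suffix removal with `str.strip`, which treats its argument as a set of characters rather than a suffix; `derive_name` is a hypothetical helper, not part of the package:

```python
from pathlib import Path


def derive_name(blockmodel_path: Path) -> str:
    """Hypothetical helper: 'model.pbm.parquet' -> 'model'."""
    stem = blockmodel_path.stem        # drops only the final '.parquet'
    return stem.removesuffix(".pbm")   # drops a trailing '.pbm', if present


path = Path("model.pbm.parquet")
suffix_based = derive_name(path)

# str.strip removes any leading or trailing characters found in the given
# set, so it can also eat letters that belong to the name itself.
char_based = path.stem.strip(".pbm")
```

Here `suffix_based` is `"model"` while `char_based` is `"odel"`, because the leading `m` is also in the stripped character set.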
+    def _extract_column_dtypes(self):
+        self.column_dtypes: dict[str, np.dtype] = {}
+        self._column_categorical_ordered: dict[str, bool] = {}
+        schema = pq.read_schema(self.blockmodel_path)
+        for col in self.columns:
+            if col in ["x", "y", "z"]:
+                continue
+            field_type = schema.field(col).type
+            if pa.types.is_dictionary(field_type):
+                self.column_dtypes[col] = pd.CategoricalDtype(ordered=field_type.ordered)
+                self._column_categorical_ordered[col] = field_type.ordered
+            else:
+                self.column_dtypes[col] = field_type.to_pandas_dtype()
+    @property
+    def column_categorical_ordered(self) -> dict[str, bool]:
+        return self._column_categorical_ordered.copy()
+    @property
+    def centroid_index(self) -> pd.MultiIndex:
+        """
+        Get the centroid index of the block model.
+        Returns:
+            pd.MultiIndex: The MultiIndex representing the centroid coordinates (x, y, z).
+        """
+        if self._centroid_index is None:
+            centroid_cols = ["x", "y", "z"]
+            centroids: pd.DataFrame = pq.read_table(self.blockmodel_path, columns=centroid_cols).to_pandas()
+            if centroids.index.names == centroid_cols:
+                index = centroids.index
+            else:
+                if centroids.empty:
+                    raise ValueError("Parquet file is empty or does not contain valid centroid data.")
+                index = centroids.set_index(["x", "y", "z"]).index
+            if not index.is_unique:
+                raise ValueError("The index of the Parquet file is not unique. "
+                                 "Ensure that the centroid coordinates (x, y, z) are unique.")
+            # Only check monotonicity if axes are aligned (not rotated)
+            if not self.geometry.is_rotated and not index.is_monotonic_increasing:
+                raise ValueError("The index of the Parquet file is not sorted in ascending order. "
+                                 "Ensure that the centroid coordinates (x, y, z) are sorted.")
+            self._centroid_index = index
+        return self._centroid_index
+    @property
+    def is_sparse(self) -> bool:
+        dense_index = self.geometry.to_multi_index()
+        return len(self.centroid_index) < len(dense_index)
+    @property
+    def sparsity(self) -> float:
+        dense_index = self.geometry.to_multi_index()
+        return 1.0 - (len(self.centroid_index) / len(dense_index))
+    @property
+    def index_c(self) -> np.ndarray:
+        """Zero-based C-order (x, y, z) indices for the dense grid."""
+        shape = self.geometry.shape
+        return np.arange(np.prod(shape)).reshape(shape, order='C').ravel(order='C')
+    @property
+    def index_f(self) -> np.ndarray:
+        """Zero-based F-order (z, y, x) indices for the dense grid."""
+        shape = self.geometry.shape
+        return np.arange(np.prod(shape)).reshape(shape, order='C').ravel(order='F')
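The two properties above differ only in traversal order over the same C-order block ids. A minimal pure-Python sketch of that relationship, assuming a 3-tuple shape `(nx, ny, nz)`; the function names are hypothetical and the package itself uses NumPy:

```python
from itertools import product


def c_order_indices(shape):
    """C-order flat ids visited in C order (last axis varies fastest)."""
    nx, ny, nz = shape
    return [(i * ny + j) * nz + k
            for i, j, k in product(range(nx), range(ny), range(nz))]


def f_order_indices(shape):
    """C-order flat ids visited in F order (first axis varies fastest)."""
    nx, ny, nz = shape
    return [(i * ny + j) * nz + k
            for k in range(nz) for j in range(ny) for i in range(nx)]


c_ids = c_order_indices((2, 2, 2))
f_ids = f_order_indices((2, 2, 2))
```

For a 2 x 2 x 2 grid, `c_ids` is `[0, 1, 2, 3, 4, 5, 6, 7]` and `f_ids` is `[0, 4, 2, 6, 1, 5, 3, 7]`: the same ids, reordered so that the first axis advances fastest.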
+    def validate_sparse(self) -> bool:
+        dense_index = self.geometry.to_multi_index()
+        # All sparse centroids must be in the dense grid
+        return self.centroid_index.isin(dense_index).all()
+    @classmethod
+    def from_parquet(cls, parquet_path: Path, overwrite: bool = False) -> "ParquetBlockModel":
+        """ Create a ParquetBlockModel instance from a Parquet file.
+        Args:
+            parquet_path (Path): The path to the Parquet file.
+            overwrite (bool): If True, allows overwriting an existing ParquetBlockModel file. Defaults to False.
+        """
+        if parquet_path.suffixes[-2:] == [".pbm", ".parquet"]:
+            if not overwrite:
+                raise ValueError(
+                    f"File {parquet_path} appears to be a compliant ParquetBlockModel file. "
+                    f"Use the constructor directly, or pass overwrite=True to allow mutation."
+                )
+        new_filepath = shutil.copy(parquet_path, parquet_path.resolve().with_suffix(".pbm.parquet"))
+        return cls(name=parquet_path.stem, blockmodel_path=new_filepath, block_path=parquet_path)
+    @classmethod
+    def create_demo_block_model(cls, filename: Path,
+                                shape=(3, 3, 3),
+                                block_size=(1, 1, 1),
+                                corner=(-0.5, -0.5, -0.5),
+                                azimuth: float = 0.0,
+                                dip: float = 0.0,
+                                plunge: float = 0.0) -> "ParquetBlockModel":
+        """
+        Create a demo block model with specified parameters.
+        Args:
+            filename (Path): The file path where the Parquet file will be saved.
+            shape (tuple): The shape of the block model.
+            block_size (tuple): The size of each block.
+            corner (tuple): The coordinates of the corner of the block model.
+            azimuth (float): The azimuth angle in degrees for rotation.
+            dip (float): The dip angle in degrees for rotation.
+            plunge (float): The plunge angle in degrees for rotation.
+        Returns:
+            ParquetBlockModel: An instance of ParquetBlockModel with demo data.
+        """
+        create_demo_blockmodel(shape=shape, block_size=block_size, corner=corner,
+                               azimuth=azimuth, dip=dip, plunge=plunge,
+                               parquet_filepath=filename)
+        # get the orientation of the axes
+        axis_u, axis_v, axis_w = rotation_to_axis_orientation(azimuth=azimuth, dip=dip, plunge=plunge)
+        # create geometry that aligns with the demo block model
+        geometry = RegularGeometry(block_size=block_size, corner=corner, shape=shape,
+                                   axis_u=axis_u, axis_v=axis_v, axis_w=axis_w)
+        return cls(geometry=geometry, block_path=filename)
+    @classmethod
+    def from_geometry(cls, geometry: RegularGeometry, path: Path, name: Optional[str] = None) -> "ParquetBlockModel":
+        centroids_df = geometry.to_dataframe()
+        centroids_df.to_parquet(path, index=False)
+        return cls(blockmodel_path=path, name=name, geometry=geometry)
+    def create_report(self, columns: Optional[list[str]] = None,
+                      column_batch_size: int = 10,
+                      show_progress: bool = True, open_in_browser: bool = False) -> Path:
+        """
+        Create a ydata-profiling report for the block model.
+        The report will be of the same name as the block model, with a '.html' extension.
+        Args:
+            columns: List of column names to include in the profile. If None, all columns are used.
+            column_batch_size: The number of columns to process in each batch. If None, processes all columns at once.
+            show_progress: If True, displays a progress bar during profiling.
+            open_in_browser: If True, opens the report in a web browser after generation.
+        Returns:
+            Path: The path to the generated profile report.
+        """
+        report: ParquetProfileReport = ParquetProfileReport(self.blockmodel_path, columns=columns,
+                                                            batch_size=column_batch_size,
+                                                            show_progress=show_progress).profile()
+        if open_in_browser:
+            report.show(notebook=False)
+        if not columns:
+            self.report_path = self.blockmodel_path.with_suffix('.html')
+        return self.report_path
+    def plot(self, scalar: str, threshold: bool = True, show_edges: bool = True,
+             show_axes: bool = True) -> 'pv.Plotter':
+        import pyvista as pv
+        if scalar not in self.attributes:
+            raise ValueError(f"Column '{scalar}' not found in the ParquetBlockModel.")
+        # Create a PyVista plotter
+        plotter = pv.Plotter()
+        mesh = self.to_pyvista(attributes=[scalar])
+        # Add a thresholded mesh to the plotter
+        if threshold:
+            plotter.add_mesh_threshold(mesh, scalars=scalar, show_edges=show_edges)
+        else:
+            plotter.add_mesh(mesh, scalars=scalar, show_edges=show_edges)
+        plotter.title = self.name
+        if show_axes:
+            plotter.show_axes()
+        return plotter
+    def read(self, columns: Optional[list[str]] = None,
+             with_index: bool = True, dense: bool = False) -> pd.DataFrame:
+        """
+        Read the Parquet file and return a DataFrame.
+        Args:
+            columns: List of column names to read. If None, all columns are read.
+            with_index: If True, includes the index ('x', 'y', 'z') in the DataFrame.
+            dense: If True, reads the data as a dense grid. If False, reads the data as a sparse grid.
+        Returns:
+            pd.DataFrame: The DataFrame containing the block model data.
+        """
+        if columns is None:
+            columns = self.columns
+        df = pq.read_table(self.blockmodel_path, columns=columns).to_pandas()
+        if with_index:
+            df.index = self.centroid_index
+            if dense:
+                dense_index = self.geometry.to_multi_index()
+                if len(df) == len(dense_index):
+                    assert df.index.equals(dense_index)
+                df = df.reindex(dense_index)
+        return df
+    def to_pyvista(self, attributes: Optional[list[str]] = None) -> 'pv.ImageData':
+        if attributes is None:
+            attributes = self.attributes
+        grid = self.geometry.to_pyvista()
+        df = self.read(columns=attributes, with_index=False, dense=True)
+        df['f_order'] = self.index_f
+        df = df.sort_values('f_order')
+        df = df.drop(columns='f_order')
+        for col in attributes:
+            grid.cell_data[col] = df[col].values
+        return grid
+    @staticmethod
+    def _validate_geometry(filepath: Path, geometry: Optional[RegularGeometry] = None) -> None:
+        """
+        Validates the geometry of a Parquet file by checking if the index (centroid) columns are present
+        and have valid values.
+        Args:
+            filepath (Path): Path to the Parquet file.
+            geometry (RegularGeometry, optional): The geometry of the block model. If None, it will be derived from
+             the Parquet file.
+        Raises:
+            ValueError: If any index column is missing or contains invalid values.
+        """
+        index_columns = ['x', 'y', 'z']
+        columns = pq.read_schema(filepath).names
+        if not all(col in columns for col in index_columns):
+            raise ValueError(f"Missing index columns in the dataset: {', '.join(index_columns)}")
+        table = pq.read_table(filepath, columns=index_columns)
+        for col in index_columns:
+            if table[col].null_count > 0:
+                raise ValueError(f"Column '{col}' contains NaN values, which is not allowed in the index columns.")
+        x_values = np.sort(table['x'].to_pandas().unique())
+        y_values = np.sort(table['y'].to_pandas().unique())
+        z_values = np.sort(table['z'].to_pandas().unique())
+        if len(x_values) < 2 or len(y_values) < 2 or len(z_values) < 2:
+            raise ValueError(
+                "The geometry is not regular. At least two unique values are required in each index column.")
+        # Only check regular spacing if not rotated
+        if geometry is None:
+            geometry = RegularGeometry.from_parquet(filepath)
+        if not geometry.is_rotated:
+            def is_regular_spacing(values, tol=1e-8):
+                diffs = np.diff(values)
+                return np.all(np.abs(diffs - diffs[0]) < tol)
+            if not (is_regular_spacing(x_values) and is_regular_spacing(y_values) and is_regular_spacing(z_values)):
+                raise ValueError(
+                    "The geometry is not regular. The index columns must be evenly spaced (regular grid) in x, y, and z.")
+        logging.info(f"Geometry validation completed successfully for {filepath}.")
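The regular-spacing test in the validation above compares every gap between sorted unique coordinates with the first gap. A minimal sketch of the same idea without NumPy, under the assumption that fewer than two unique values counts as not regular; the function mirrors the nested helper's name but is an illustration only:

```python
def is_regular_spacing(values, tol=1e-8):
    """Return True if the sorted unique values are evenly spaced within tol."""
    ordered = sorted(set(values))
    if len(ordered) < 2:
        # Assumption: a single coordinate cannot define a spacing.
        return False
    diffs = [b - a for a, b in zip(ordered, ordered[1:])]
    return all(abs(d - diffs[0]) < tol for d in diffs)


regular = is_regular_spacing([0.5, 1.5, 2.5, 1.5])   # duplicates are ignored
irregular = is_regular_spacing([0.5, 1.5, 3.5])
```

Centroids at 0.5, 1.5, 2.5 pass, while 0.5, 1.5, 3.5 fail because the second gap differs from the first. A sparse model with a fully missing slice would fail this kind of test too, even when its dense grid is regular.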
+    @staticmethod
+    def _validate_and_load_data(df, expected_num_blocks):
+        required_cols = {'x', 'y', 'z'}
+        if not required_cols.issubset(df.columns):
+            if len(df) == expected_num_blocks:
+                warnings.warn("Data loaded without x, y, z columns. "
+                              "Order is assumed to match the block model geometry.")
+            else:
+                raise ValueError("Data missing x, y, z and row count does not match block model.")
+        return df
+    def to_dense_parquet(self, filepath: Path,
+                         chunk_size: int = 100_000, show_progress: bool = False) -> None:
+        """
+        Save the block model to a Parquet file as a dense grid.
+        The file is written in chunks of `chunk_size` blocks.
+        Args:
+            filepath (Path): The file path where the Parquet file will be saved.
+            chunk_size (int): The number of blocks to save in each chunk. Defaults to 100_000.
+            show_progress (bool): If True, show a progress bar. Defaults to False.
+        """
+        columns = self.columns
+        dense_index = self.geometry.to_multi_index()
+        parquet_file = pq.ParquetFile(self.blockmodel_path)
+        total_rows = parquet_file.metadata.num_rows
+        total_batches = max(math.ceil(total_rows / chunk_size), 1)
+        progress = tqdm(total=total_batches, desc="Exporting") if show_progress else None
+        with atomic_output_file(filepath) as tmp_path:
+            writer = None
+            try:
+                for batch in parquet_file.iter_batches(batch_size=chunk_size, columns=columns):
+                    df = pa.Table.from_batches([batch]).to_pandas()
+                    df = df.reindex(dense_index)
+                    table = pa.Table.from_pandas(df)
+                    if writer is None:
+                        writer = pq.ParquetWriter(tmp_path, table.schema)
+                    writer.write_table(table)
+                    if progress:
+                        progress.update(1)
+            finally:
+                if writer is not None:
+                    writer.close()
+                if progress:
+                    progress.close()
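The progress total in the export above is the number of chunks, rounded up and never less than one. A small sketch of that arithmetic; `batch_count` is a hypothetical name and the guard against non-positive chunk sizes is an added assumption:

```python
import math


def batch_count(total_rows: int, chunk_size: int) -> int:
    """Number of chunks needed to cover total_rows, with a minimum of 1."""
    if chunk_size <= 0:
        # Assumption: the original code does not validate chunk_size.
        raise ValueError("chunk_size must be positive")
    return max(math.ceil(total_rows / chunk_size), 1)


partial_last_chunk = batch_count(250_000, 100_000)
empty_file = batch_count(0, 100_000)
```

A file of 250,000 rows with the default chunk size needs 3 chunks, and an empty file still reports 1 so the progress bar has a non-zero total.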