ocf-data-sampler 0.5.11__tar.gz → 0.6.2__tar.gz

This diff shows the changes between two publicly released versions of the package, as they appear in their public registry. It is provided for informational purposes only.
Files changed (78)
  1. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/PKG-INFO +7 -3
  2. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/README.md +5 -1
  3. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/config/model.py +27 -41
  4. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/load/gsp.py +4 -2
  5. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/load/nwp/nwp.py +0 -1
  6. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/load/nwp/providers/utils.py +1 -1
  7. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/load/open_xarray_tensorstore.py +26 -7
  8. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/load/satellite.py +1 -1
  9. ocf_data_sampler-0.6.2/ocf_data_sampler/load/site.py +73 -0
  10. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/numpy_sample/nwp.py +1 -1
  11. ocf_data_sampler-0.6.2/ocf_data_sampler/select/diff_channels.py +25 -0
  12. ocf_data_sampler-0.6.2/ocf_data_sampler/select/dropout.py +59 -0
  13. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/select/location.py +2 -2
  14. ocf_data_sampler-0.6.2/ocf_data_sampler/select/select_spatial_slice.py +110 -0
  15. ocf_data_sampler-0.6.2/ocf_data_sampler/select/select_time_slice.py +107 -0
  16. ocf_data_sampler-0.6.2/ocf_data_sampler/torch_datasets/datasets/picklecache.py +33 -0
  17. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/torch_datasets/datasets/pvnet_uk.py +17 -15
  18. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/torch_datasets/datasets/site.py +27 -38
  19. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/torch_datasets/utils/__init__.py +2 -1
  20. ocf_data_sampler-0.6.2/ocf_data_sampler/torch_datasets/utils/config_normalization_values_to_dicts.py +59 -0
  21. ocf_data_sampler-0.6.2/ocf_data_sampler/torch_datasets/utils/diff_nwp_data.py +20 -0
  22. ocf_data_sampler-0.6.2/ocf_data_sampler/torch_datasets/utils/merge_and_fill_utils.py +50 -0
  23. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/torch_datasets/utils/time_slice_for_dataset.py +22 -30
  24. ocf_data_sampler-0.5.11/ocf_data_sampler/torch_datasets/sample/base.py → ocf_data_sampler-0.6.2/ocf_data_sampler/torch_datasets/utils/torch_batch_utils.py +2 -29
  25. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/utils.py +18 -6
  26. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler.egg-info/PKG-INFO +7 -3
  27. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler.egg-info/SOURCES.txt +4 -4
  28. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/pyproject.toml +1 -1
  29. ocf_data_sampler-0.5.11/ocf_data_sampler/load/site.py +0 -59
  30. ocf_data_sampler-0.5.11/ocf_data_sampler/select/dropout.py +0 -61
  31. ocf_data_sampler-0.5.11/ocf_data_sampler/select/select_spatial_slice.py +0 -216
  32. ocf_data_sampler-0.5.11/ocf_data_sampler/select/select_time_slice.py +0 -143
  33. ocf_data_sampler-0.5.11/ocf_data_sampler/torch_datasets/sample/__init__.py +0 -3
  34. ocf_data_sampler-0.5.11/ocf_data_sampler/torch_datasets/sample/site.py +0 -48
  35. ocf_data_sampler-0.5.11/ocf_data_sampler/torch_datasets/sample/uk_regional.py +0 -262
  36. ocf_data_sampler-0.5.11/ocf_data_sampler/torch_datasets/utils/config_normalization_values_to_dicts.py +0 -57
  37. ocf_data_sampler-0.5.11/ocf_data_sampler/torch_datasets/utils/merge_and_fill_utils.py +0 -29
  38. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/LICENSE +0 -0
  39. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/__init__.py +0 -0
  40. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/config/__init__.py +0 -0
  41. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/config/load.py +0 -0
  42. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/config/save.py +0 -0
  43. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/data/uk_gsp_locations_20220314.csv +0 -0
  44. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/data/uk_gsp_locations_20250109.csv +0 -0
  45. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/load/__init__.py +0 -0
  46. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/load/load_dataset.py +0 -0
  47. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/load/nwp/__init__.py +0 -0
  48. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/load/nwp/providers/__init__.py +0 -0
  49. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/load/nwp/providers/cloudcasting.py +0 -0
  50. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/load/nwp/providers/ecmwf.py +0 -0
  51. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/load/nwp/providers/gfs.py +0 -0
  52. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/load/nwp/providers/icon.py +0 -0
  53. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/load/nwp/providers/ukv.py +0 -0
  54. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/load/utils.py +0 -0
  55. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/numpy_sample/__init__.py +0 -0
  56. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/numpy_sample/collate.py +0 -0
  57. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/numpy_sample/common_types.py +0 -0
  58. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/numpy_sample/datetime_features.py +0 -0
  59. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/numpy_sample/gsp.py +0 -0
  60. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/numpy_sample/satellite.py +0 -0
  61. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/numpy_sample/site.py +0 -0
  62. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/numpy_sample/sun_position.py +0 -0
  63. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/select/__init__.py +0 -0
  64. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/select/fill_time_periods.py +0 -0
  65. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/select/find_contiguous_time_periods.py +0 -0
  66. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/select/geospatial.py +0 -0
  67. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/torch_datasets/datasets/__init__.py +0 -0
  68. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/torch_datasets/utils/add_alterate_coordinate_projections.py +0 -0
  69. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/torch_datasets/utils/spatial_slice_for_dataset.py +0 -0
  70. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/torch_datasets/utils/valid_time_periods.py +0 -0
  71. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler/torch_datasets/utils/validation_utils.py +0 -0
  72. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler.egg-info/dependency_links.txt +0 -0
  73. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler.egg-info/requires.txt +0 -0
  74. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/ocf_data_sampler.egg-info/top_level.txt +0 -0
  75. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/scripts/download_gsp_location_data.py +0 -0
  76. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/scripts/refactor_site.py +0 -0
  77. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/setup.cfg +0 -0
  78. {ocf_data_sampler-0.5.11 → ocf_data_sampler-0.6.2}/tests/test_utils.py +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: ocf-data-sampler
- Version: 0.5.11
+ Version: 0.6.2
  Author: James Fulton, Peter Dudfield
  Author-email: Open Climate Fix team <info@openclimatefix.org>
  License: MIT License
@@ -28,7 +28,7 @@ License: MIT License
  Project-URL: repository, https://github.com/openclimatefix/ocf-data-sampler
  Classifier: Programming Language :: Python :: 3
  Classifier: License :: OSI Approved :: MIT License
- Requires-Python: >=3.11
+ Requires-Python: <3.14,>=3.11
  Description-Content-Type: text/markdown
  Requires-Dist: torch
  Requires-Dist: numpy
@@ -50,7 +50,7 @@ Requires-Dist: zarr>=3
  # ocf-data-sampler

  <!-- ALL-CONTRIBUTORS-BADGE:START - Do not remove or modify this section -->
- [![All Contributors](https://img.shields.io/badge/all_contributors-14-orange.svg?style=flat-square)](#contributors-)
+ [![All Contributors](https://img.shields.io/badge/all_contributors-16-orange.svg?style=flat-square)](#contributors-)
  <!-- ALL-CONTRIBUTORS-BADGE:END -->

  [![tags badge](https://img.shields.io/github/v/tag/openclimatefix/ocf-data-sampler?include_prereleases&sort=semver&color=FFAC5F)](https://github.com/openclimatefix/ocf-data-sampler/tags)
@@ -137,6 +137,10 @@ Thanks goes to these wonderful people ([emoji key](https://allcontributors.org/d
  <td align="center" valign="top" width="14.28%"><a href="https://drona-gyawali.github.io/"><img src="https://avatars.githubusercontent.com/u/170401554?v=4?s=100" width="100px;" alt="Dorna Raj Gyawali"/><br /><sub><b>Dorna Raj Gyawali</b></sub></a><br /><a href="https://github.com/openclimatefix/ocf-data-sampler/commits?author=drona-gyawali" title="Code">💻</a></td>
  <td align="center" valign="top" width="14.28%"><a href="https://github.com/adnanhashmi25"><img src="https://avatars.githubusercontent.com/u/55550094?v=4?s=100" width="100px;" alt="Adnan Hashmi"/><br /><sub><b>Adnan Hashmi</b></sub></a><br /><a href="https://github.com/openclimatefix/ocf-data-sampler/commits?author=adnanhashmi25" title="Code">💻</a></td>
  </tr>
+ <tr>
+ <td align="center" valign="top" width="14.28%"><a href="https://github.com/utsav-pal"><img src="https://avatars.githubusercontent.com/u/159793156?v=4?s=100" width="100px;" alt="utsav-pal"/><br /><sub><b>utsav-pal</b></sub></a><br /><a href="https://github.com/openclimatefix/ocf-data-sampler/commits?author=utsav-pal" title="Code">💻</a></td>
+ <td align="center" valign="top" width="14.28%"><a href="https://github.com/zaryab-ali"><img src="https://avatars.githubusercontent.com/u/85732412?v=4?s=100" width="100px;" alt="zaryab-ali"/><br /><sub><b>zaryab-ali</b></sub></a><br /><a href="https://github.com/openclimatefix/ocf-data-sampler/commits?author=zaryab-ali" title="Code">💻</a></td>
+ </tr>
  </tbody>
  </table>

@@ -1,7 +1,7 @@
  # ocf-data-sampler

  <!-- ALL-CONTRIBUTORS-BADGE:START - Do not remove or modify this section -->
- [![All Contributors](https://img.shields.io/badge/all_contributors-14-orange.svg?style=flat-square)](#contributors-)
+ [![All Contributors](https://img.shields.io/badge/all_contributors-16-orange.svg?style=flat-square)](#contributors-)
  <!-- ALL-CONTRIBUTORS-BADGE:END -->

  [![tags badge](https://img.shields.io/github/v/tag/openclimatefix/ocf-data-sampler?include_prereleases&sort=semver&color=FFAC5F)](https://github.com/openclimatefix/ocf-data-sampler/tags)
@@ -88,6 +88,10 @@ Thanks goes to these wonderful people ([emoji key](https://allcontributors.org/d
  <td align="center" valign="top" width="14.28%"><a href="https://drona-gyawali.github.io/"><img src="https://avatars.githubusercontent.com/u/170401554?v=4?s=100" width="100px;" alt="Dorna Raj Gyawali"/><br /><sub><b>Dorna Raj Gyawali</b></sub></a><br /><a href="https://github.com/openclimatefix/ocf-data-sampler/commits?author=drona-gyawali" title="Code">💻</a></td>
  <td align="center" valign="top" width="14.28%"><a href="https://github.com/adnanhashmi25"><img src="https://avatars.githubusercontent.com/u/55550094?v=4?s=100" width="100px;" alt="Adnan Hashmi"/><br /><sub><b>Adnan Hashmi</b></sub></a><br /><a href="https://github.com/openclimatefix/ocf-data-sampler/commits?author=adnanhashmi25" title="Code">💻</a></td>
  </tr>
+ <tr>
+ <td align="center" valign="top" width="14.28%"><a href="https://github.com/utsav-pal"><img src="https://avatars.githubusercontent.com/u/159793156?v=4?s=100" width="100px;" alt="utsav-pal"/><br /><sub><b>utsav-pal</b></sub></a><br /><a href="https://github.com/openclimatefix/ocf-data-sampler/commits?author=utsav-pal" title="Code">💻</a></td>
+ <td align="center" valign="top" width="14.28%"><a href="https://github.com/zaryab-ali"><img src="https://avatars.githubusercontent.com/u/85732412?v=4?s=100" width="100px;" alt="zaryab-ali"/><br /><sub><b>zaryab-ali</b></sub></a><br /><a href="https://github.com/openclimatefix/ocf-data-sampler/commits?author=zaryab-ali" title="Code">💻</a></td>
+ </tr>
  </tbody>
  </table>

@@ -7,7 +7,7 @@ Prefix with a protocol like s3:// to read from alternative filesystems.
  from collections.abc import Iterator
  from typing import Literal

- from pydantic import BaseModel, Field, RootModel, field_validator, model_validator
+ from pydantic import BaseModel, ConfigDict, Field, RootModel, field_validator, model_validator
  from typing_extensions import override

  NWP_PROVIDERS = [
@@ -23,10 +23,7 @@ NWP_PROVIDERS = [
  class Base(BaseModel):
      """Pydantic Base model where no extras can be added."""

-     class Config:
-         """Config class."""
-
-         extra = "forbid"  # forbid use of extra kwargs
+     model_config = ConfigDict(extra="forbid")


  class General(Base):
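
The hunk above moves the package from pydantic v1's nested `class Config` to the v2 `model_config = ConfigDict(...)` idiom. A minimal, self-contained sketch of the behaviour (the `Example` model below is illustrative, not from the package):

    from pydantic import BaseModel, ConfigDict, ValidationError

    class Base(BaseModel):
        # pydantic v2 style: replaces the v1 nested `class Config` with extra = "forbid"
        model_config = ConfigDict(extra="forbid")

    class Example(Base):
        name: str = "ok"

    try:
        Example(name="ok", unexpected_kwarg=1)
    except ValidationError as err:
        print(err)  # extra inputs are not permitted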
@@ -90,12 +87,17 @@ class DropoutMixin(Base):
          "negative or zero.",
      )

-     dropout_fraction: float|list[float] = Field(
-         default=0,
+     dropout_fraction: float | list[float] = Field(
+         default=0.0,
          description="Either a float(Chance of dropout being applied to each sample) or a list of "
          "floats (probability that dropout of the corresponding timedelta is applied)",
      )

+     dropout_value: float = Field(
+         default=0.0,
+         description="The value to use for dropped out values. "
+         "Idea is to use -1, but to be backwards comptaible we've put the default as 0")
+
      @field_validator("dropout_timedeltas_minutes")
      def dropout_timedeltas_minutes_negative(cls, v: list[int]) -> list[int]:
          """Validate 'dropout_timedeltas_minutes'."""
@@ -106,31 +108,22 @@ class DropoutMixin(Base):


      @field_validator("dropout_fraction")
-     def dropout_fractions(cls, dropout_frac: float|list[float]) -> float|list[float]:
+     def dropout_fractions(cls, dropout_frac: float | list[float]) -> float | list[float]:
          """Validate 'dropout_frac'."""
-         from math import isclose
-         if isinstance(dropout_frac, float):
-             if not (dropout_frac <= 1):
-                 raise ValueError("Input should be less than or equal to 1")
-             elif not (dropout_frac >= 0):
-                 raise ValueError("Input should be greater than or equal to 0")
+         if isinstance(dropout_frac, float | int):
+             if not (0<= dropout_frac <= 1):
+                 raise ValueError("Dropout fractions must be in range [0, 1]")

          elif isinstance(dropout_frac, list):
              if not dropout_frac:
                  raise ValueError("List cannot be empty")

-             if not all(isinstance(i, float) for i in dropout_frac):
-                 raise ValueError("All elements in the list must be floats")
-
              if not all(0 <= i <= 1 for i in dropout_frac):
-                 raise ValueError("Each float in the list must be between 0 and 1")
-
-             if not isclose(sum(dropout_frac), 1.0, rel_tol=1e-9):
-                 raise ValueError("Sum of all floats in the list must be 1.0")
+                 raise ValueError("All dropout fractions must be in range [0, 1]")

+             if not (0 <= sum(dropout_frac) <= 1):
+                 raise ValueError("The sum of dropout fractions must be in range [0, 1]")

-         else:
-             raise TypeError("Must be either a float or a list of floats")
          return dropout_frac


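The rewritten validator relaxes the old rules: a scalar `dropout_fraction` must lie in [0, 1], and a list must be non-empty with each entry and the total in [0, 1]; the previous requirement that the list sums to exactly 1.0 is gone, so any remainder can be treated as the chance of no dropout. A standalone sketch of that logic (not an import from the package):

    def validate_dropout_fraction(dropout_frac):
        if isinstance(dropout_frac, (float, int)):
            if not (0 <= dropout_frac <= 1):
                raise ValueError("Dropout fractions must be in range [0, 1]")
        elif isinstance(dropout_frac, list):
            if not dropout_frac:
                raise ValueError("List cannot be empty")
            if not all(0 <= i <= 1 for i in dropout_frac):
                raise ValueError("All dropout fractions must be in range [0, 1]")
            if not (0 <= sum(dropout_frac) <= 1):
                raise ValueError("The sum of dropout fractions must be in range [0, 1]")
        return dropout_frac

    validate_dropout_fraction(0.1)         # scalar chance of dropout
    validate_dropout_fraction([0.2, 0.3])  # sums to 0.5: valid now, rejected before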
@@ -172,23 +165,6 @@ class NormalisationConstantsMixin(Base):
      """Normalisation constants for multiple channels."""
      normalisation_constants: dict[str, NormalisationValues]

-     @property
-     def channel_means(self) -> dict[str, float]:
-         """Return the channel means."""
-         return {
-             channel: norm_values.mean
-             for channel, norm_values in self.normalisation_constants.items()
-         }
-
-
-     @property
-     def channel_stds(self) -> dict[str, float]:
-         """Return the channel standard deviations."""
-         return {
-             channel: norm_values.std
-             for channel, norm_values in self.normalisation_constants.items()
-         }
-

  class Satellite(TimeWindowMixin, DropoutMixin, SpatialWindowMixin, NormalisationConstantsMixin):
      """Satellite configuration model."""
@@ -363,9 +339,19 @@ class InputData(Base):
      site: Site | None = None
      solar_position: SolarPosition | None = None

+     @model_validator(mode="after")
+     def check_site_or_gsp(self) -> "InputData":
+         """Ensure that either `site` or `gsp` is provided in the input data."""
+         if self.site is None and self.gsp is None:
+             raise ValueError(
+                 "You must provide either `site` or `gsp` in the `input_data`",
+             )
+
+         return self
+

  class Configuration(Base):
      """Configuration model for the dataset."""

      general: General = General()
-     input_data: InputData = InputData()
+     input_data: InputData = Field(default_factory=InputData)
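
Switching to `Field(default_factory=InputData)` defers construction of the default until a `Configuration` is actually built, which matters now that `InputData` can raise. A hedged sketch of the new constraint, assuming the remaining `InputData` fields stay optional (pydantic's ValidationError subclasses ValueError, so this catch works either way):

    from ocf_data_sampler.config.model import InputData

    try:
        InputData()  # neither `site` nor `gsp` supplied
    except ValueError as err:
        print(err)  # You must provide either `site` or `gsp` in the `input_data`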
@@ -32,7 +32,7 @@ def open_gsp(
      boundaries_version: str = "20220314",
      public: bool = False,
  ) -> xr.DataArray:
-     """Open the GSP data and validates its data types.
+     """Open and eagerly load the GSP data and validates its data types.

      Args:
          zarr_path: Path to the GSP zarr data
@@ -93,4 +93,6 @@ def open_gsp(
              dtype = gsp_da.coords[coord].dtype
              raise TypeError(f"{coord} should be {expected_dtype.__name__}, not {dtype}")

-     return gsp_da
+     # Below we load the data eagerly into memory - this makes the dataset faster to sample from, but
+     # at the cost of a little extra memory usage
+     return gsp_da.compute()
@@ -29,7 +29,6 @@ def _validate_nwp_data(data_array: xr.DataArray, provider: str) -> None:
      common_expected_dtypes = {
          "init_time_utc": np.datetime64,
          "step": np.timedelta64,
-         "channel": (np.str_, np.object_),
      }

      geographic_spatial_dtypes = {
@@ -75,7 +75,7 @@ def _tensostore_open_zarr_paths(zarr_path: str | list[str], time_dim: str) -> xr
          zarr_path = sorted(glob(zarr_path))

      if isinstance(zarr_path, list | tuple):
-         ds = open_zarrs(zarr_path, concat_dim=time_dim).sortby(time_dim)
+         ds = open_zarrs(zarr_path, concat_dim=time_dim, data_source="nwp").sortby(time_dim)
      else:
          ds = open_zarr(zarr_path)
      return ds
@@ -14,6 +14,7 @@ References:
  [2] https://www.apache.org/licenses/LICENSE-2.0
  """

+ import logging
  import os.path
  import re

@@ -26,6 +27,7 @@ from xarray_tensorstore import (
      _TensorStoreAdapter,
  )

+ logger = logging.getLogger(__name__)

  def _zarr_spec_from_path(path: str, zarr_format: int) -> ...:
      if re.match(r"\w+\://", path):  # path is a URI
@@ -127,6 +129,7 @@ def open_zarrs(
      concat_dim: str,
      context: ts.Context | None = None,
      mask_and_scale: bool = True,
+     data_source: str = "unknown",
  ) -> xr.Dataset:
      """Open multiple zarrs with TensorStore.

@@ -135,6 +138,7 @@ def open_zarrs(
          concat_dim: Dimension along which to concatenate the data variables.
          context: TensorStore context.
          mask_and_scale: Whether to mask and scale the data.
+         data_source: Which data source is being opened. Used for warning context.

      Returns:
          Concatenated Dataset with all data variables opened via TensorStore.
@@ -143,13 +147,28 @@ def open_zarrs(
          context = ts.Context()

      ds_list = [xr.open_zarr(p, mask_and_scale=mask_and_scale, decode_timedelta=True) for p in paths]
-     ds = xr.concat(
-         ds_list,
-         dim=concat_dim,
-         data_vars="minimal",
-         compat="equals",
-         combine_attrs="drop_conflicts",
-     )
+     try:
+         ds = xr.concat(
+             ds_list,
+             dim=concat_dim,
+             data_vars="minimal",
+             compat="equals",
+             combine_attrs="drop_conflicts",
+             join="exact",
+         )
+     except ValueError:
+         logger.warning(f"Coordinate mismatch found in {data_source} input data. "
+                        f"The coordinates will be overwritten! "
+                        f"This might be fine for satellite data. "
+                        f"Proceed with caution.")
+         ds = xr.concat(
+             ds_list,
+             dim=concat_dim,
+             data_vars="minimal",
+             compat="equals",
+             combine_attrs="drop_conflicts",
+             join="override",
+         )

      if mask_and_scale:
          _raise_if_mask_and_scale_used_for_data_vars(ds)
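
The retry logic above hinges on xarray's `join` modes: `join="exact"` fails fast when coordinates disagree, while `join="override"` keeps the first dataset's coordinates and stacks anyway. A toy demonstration (the datasets here are made up):

    import xarray as xr

    a = xr.Dataset({"v": ("x", [1.0, 2.0])}, coords={"x": [0.0, 1.0], "t": 0})
    b = xr.Dataset({"v": ("x", [3.0, 4.0])}, coords={"x": [0.0, 1.001], "t": 1})

    try:
        xr.concat([a, b], dim="t", join="exact")  # raises: "x" coords differ
    except ValueError:
        # "override" keeps the first dataset's "x" coords and stacks anyway
        ds = xr.concat([a, b], dim="t", join="override")
        print(ds["x"].values)  # [0. 1.]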
@@ -19,7 +19,7 @@ def open_sat_data(zarr_path: str | list[str]) -> xr.DataArray:
      """
      # Open the data
      if isinstance(zarr_path, list | tuple):
-         ds = open_zarrs(zarr_path, concat_dim="time")
+         ds = open_zarrs(zarr_path, concat_dim="time", data_source="satellite")
      else:
          ds = open_zarr(zarr_path)

@@ -0,0 +1,73 @@
+ """Funcitons for loading site data."""
+
+ import numpy as np
+ import pandas as pd
+ import xarray as xr
+
+
+ def open_site(generation_file_path: str, metadata_file_path: str) -> xr.DataArray:
+     """Open a site's generation data and metadata.
+
+     Args:
+         generation_file_path: Path to the site generation netcdf data
+         metadata_file_path: Path to the site csv metadata
+
+     Returns:
+         xr.DataArray: The opened site generation data
+     """
+     generation_ds = xr.open_dataset(generation_file_path)
+     metadata_df = pd.read_csv(metadata_file_path, index_col="site_id")
+
+     if not metadata_df.index.is_unique:
+         raise ValueError("site_id is not unique in metadata")
+
+     # Ensure metadata aligns with the site_id dimension in generation_ds
+     metadata_df = metadata_df.reindex(generation_ds.site_id.values)
+
+     # Assign coordinates to the Dataset using the aligned metadata
+     # Check if variable capacity was passed with the generation data
+     # If not assign static capacity from metadata
+     if hasattr(generation_ds,"capacity_kwp"):
+         generation_ds = generation_ds.assign_coords(
+             latitude=(metadata_df.latitude.to_xarray()),
+             longitude=(metadata_df.longitude.to_xarray()),
+             capacity_kwp=generation_ds.capacity_kwp,
+         )
+     else:
+         generation_ds = generation_ds.assign_coords(
+             latitude=(metadata_df.latitude.to_xarray()),
+             longitude=(metadata_df.longitude.to_xarray()),
+             capacity_kwp=(metadata_df.capacity_kwp.to_xarray()),
+         )
+
+     # Sanity checks, to prevent inf or negative values
+     # Note NaNs are allowed in generation_kw as can have non overlapping time periods for sites
+     if np.isinf(generation_ds.generation_kw.values).all():
+         raise ValueError("generation_kw contains infinite (+/- inf) values")
+     if not (generation_ds.capacity_kwp.values > 0).all():
+         raise ValueError("capacity_kwp contains non-positive values")
+
+     site_da = generation_ds.generation_kw
+
+     # Validate data types directly in loading function
+     if not np.issubdtype(site_da.dtype, np.floating):
+         raise TypeError(f"Generation data should be float, not {site_da.dtype}")
+
+
+     coord_dtypes = {
+         "time_utc": (np.datetime64,),
+         "site_id": (np.integer,),
+         "capacity_kwp": (np.integer, np.floating),
+         "latitude": (np.floating,),
+         "longitude": (np.floating,),
+     }
+     for coord, expected_dtypes in coord_dtypes.items():
+         if not any(np.issubdtype(site_da.coords[coord].dtype, dt) for dt in expected_dtypes):
+             dtype = site_da.coords[coord].dtype
+             allowed = ", ".join(dt.__name__ for dt in expected_dtypes)
+             raise TypeError(f"{coord} should be one of ({allowed}), not {dtype}")
+
+     # Load the data eagerly into memory by calling compute
+     # this makes the dataset faster to sample from, but
+     # at the cost of a little extra memory usage
+     return site_da.compute()
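
Hypothetical usage of the new loader; both file paths below are placeholders rather than files shipped with the package:

    from ocf_data_sampler.load.site import open_site

    site_da = open_site(
        generation_file_path="site_generation.nc",  # netCDF containing generation_kw
        metadata_file_path="site_metadata.csv",     # CSV with a unique site_id column
    )
    print(site_da.dims)          # e.g. ("time_utc", "site_id")
    print(site_da.capacity_kwp)  # capacity coordinate merged in from the metadata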
@@ -28,7 +28,7 @@ def convert_nwp_to_numpy_sample(da: xr.DataArray, t0_idx: int | None = None) ->
          NWPSampleKey.channel_names: da.channel.values,
          NWPSampleKey.init_time_utc: da.init_time_utc.values.astype(float),
          NWPSampleKey.step: (da.step.values / 3600).astype(int),
-         NWPSampleKey.target_time_utc: da.target_time_utc.values.astype(float),
+         NWPSampleKey.target_time_utc: (da.init_time_utc.values + da.step.values).astype(float),
      }

      if t0_idx is not None:
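
The replaced line derives target time as init time plus step, so the sample no longer depends on a stored `target_time_utc` coordinate. The arithmetic in plain numpy, with toy values:

    import numpy as np

    init_time = np.array(["2024-01-01T00:00"], dtype="datetime64[ns]")
    step = np.array([np.timedelta64(3, "h")])  # forecast step of 3 hours
    target_time = init_time + step
    print(target_time)  # ["2024-01-01T03:00:00.000000000"]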
@@ -0,0 +1,25 @@
+ """Takes the diff along the step axis for a given set of channels."""
+
+ import numpy as np
+ import xarray as xr
+
+
+ def diff_channels(da: xr.DataArray, accum_channels: list[str]) -> xr.DataArray:
+     """Perform in-place diff of the given channels of the DataArray in the steps dimension.
+
+     Args:
+         da: The DataArray to slice from
+         accum_channels: Channels which are accumulated and need to be differenced
+     """
+     if da.dims[:2] != ("step", "channel"):
+         raise ValueError("This function assumes the first two dimensions are step then channel")
+
+     all_channels = da.channel.values
+     accum_channel_inds = [i for i, c in enumerate(all_channels) if c in accum_channels]
+
+     # Make a copy of the values to avoid changing the underlying numpy array
+     vals = da.values.copy()
+     vals[:-1, accum_channel_inds] = np.diff(vals[:, accum_channel_inds], axis=0)
+     da.values = vals
+
+     return da.isel(step=slice(0, -1))
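
Hypothetical usage on a toy array; "dswrf" stands in here for an accumulated channel. The diff converts cumulative values into per-step increments and drops the final step:

    import numpy as np
    import xarray as xr
    from ocf_data_sampler.select.diff_channels import diff_channels

    da = xr.DataArray(
        np.array([[0.0, 10.0], [1.0, 30.0], [2.0, 60.0]]),  # 2nd channel is cumulative
        dims=("step", "channel"),
        coords={"channel": ["t2m", "dswrf"]},
    )
    out = diff_channels(da, accum_channels=["dswrf"])
    print(out.sel(channel="dswrf").values)  # [20. 30.] - per-step increments
    print(out.sizes["step"])                # 2: the last step is dropped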
@@ -0,0 +1,59 @@
+ """Functions for simulating dropout in time series data.
+
+ This is used for the following types of data: GSP, Satellite and Site
+ This is not used for NWP
+ """
+
+ import numpy as np
+ import pandas as pd
+ import xarray as xr
+
+
+ def apply_history_dropout(
+     t0: pd.Timestamp,
+     dropout_timedeltas: list[pd.Timedelta],
+     dropout_frac: float | list[float],
+     da: xr.DataArray,
+ ) -> xr.DataArray:
+     """Apply randomly sampled dropout to the historical part of some sequence data.
+
+     Dropped out data is replaced with NaNs
+
+     Args:
+         t0: The forecast init-time.
+         dropout_timedeltas: List of timedeltas relative to t0 to pick from
+         dropout_frac: The probabilit(ies) that each dropout timedelta will be applied. This should
+             be between 0 and 1 inclusive.
+         da: Xarray DataArray with 'time_utc' coordinate
+     """
+     if len(dropout_timedeltas)==0:
+         return da
+
+     if isinstance(dropout_frac, float | int):
+
+         if not (0<=dropout_frac<=1):
+             raise ValueError("`dropout_frac` must be in range [0, 1]")
+
+         # Create list with equal chance for all dropout timedeltas
+         n = len(dropout_timedeltas)
+         dropout_frac = [dropout_frac/n for _ in range(n)]
+     else:
+         if not 0<=sum(dropout_frac)<=1:
+             raise ValueError("The sum of `dropout_frac` must be in range [0, 1]")
+         if len(dropout_timedeltas)!=len(dropout_frac):
+             raise ValueError("`dropout_timedeltas` and `dropout_frac` must have the same length")
+
+         dropout_frac = [*dropout_frac]  # Make copy of the list so we can append to it
+
+     dropout_timedeltas = [*dropout_timedeltas]  # Make copy of the list so we can append to it
+
+     # Add chance of no dropout
+     dropout_frac.append(1-sum(dropout_frac))
+     dropout_timedeltas.append(None)
+
+     timedelta_choice = np.random.choice(dropout_timedeltas, p=dropout_frac)
+
+     if timedelta_choice is None:
+         return da
+     else:
+         return da.where((da.time_utc <= timedelta_choice + t0) | (da.time_utc > t0))
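
Hypothetical usage on a toy series: with total probability 0.5, all history after a randomly chosen cut-off (30 or 60 minutes before t0) becomes NaN, while timestamps after t0 are always kept:

    import numpy as np
    import pandas as pd
    import xarray as xr
    from ocf_data_sampler.select.dropout import apply_history_dropout

    times = pd.date_range("2024-01-01 10:00", periods=7, freq="30min")
    da = xr.DataArray(np.arange(7.0), dims="time_utc", coords={"time_utc": times})

    da_dropped = apply_history_dropout(
        t0=pd.Timestamp("2024-01-01 12:00"),
        dropout_timedeltas=[pd.Timedelta("-30min"), pd.Timedelta("-60min")],
        dropout_frac=0.5,  # split equally: 0.25 per timedelta, 0.5 chance of no dropout
        da=da,
    )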
@@ -37,9 +37,9 @@ class Location:
              return self._projections[coord_system]
          else:
              raise ValueError(
-                 "Requested the coodinate in {coord_system}. This has not yet been added. "
+                 f"Requested the coodinate in {coord_system}. This has not yet been added. "
                  "The current available coordinate systems are "
-                 f"{list(self.self._projections.keys())}",
+                 f"{list(self._projections.keys())}",
              )

      def add_coord_system(self, x: float, y: float, coord_system: int) -> None:
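
The one-character fix above is easy to miss: without the `f` prefix the braces are printed literally instead of interpolated (and the second bug, `self.self._projections`, would have raised an AttributeError before the message was ever built):

    coord_system = "osgb"
    print("Requested the coodinate in {coord_system}.")   # braces kept verbatim
    print(f"Requested the coodinate in {coord_system}.")  # Requested the coodinate in osgb.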
@@ -0,0 +1,110 @@
+ """Select spatial slices."""
+
+ import numpy as np
+ import xarray as xr
+
+ from ocf_data_sampler.select.geospatial import find_coord_system
+ from ocf_data_sampler.select.location import Location
+
+
+ def _get_pixel_index_location(da: xr.DataArray, location: Location) -> tuple[int, int]:
+     """Find pixel index location closest to given Location.
+
+     Args:
+         da: The xarray DataArray.
+         location: The Location object representing the point of interest.
+
+     Returns:
+         The pixel indices.
+
+     Raises:
+         ValueError: If the location is outside the bounds of the DataArray.
+     """
+     target_coords, x_dim, y_dim = find_coord_system(da)
+
+     x, y = location.in_coord_system(target_coords)
+
+     x_vals = da[x_dim].values
+     y_vals = da[y_dim].values
+
+     # Check that requested point lies within the data
+     if not (x_vals[0] < x < x_vals[-1]):
+         raise ValueError(
+             f"{x} is not in the interval {x_vals[0]}: {x_vals[-1]}",
+         )
+     if not (y_vals[0] < y < y_vals[-1]):
+         raise ValueError(
+             f"{y} is not in the interval {y_vals[0]}: {y_vals[-1]}",
+         )
+
+     closest_x = np.argmin(np.abs(x_vals - x))
+     closest_y = np.argmin(np.abs(y_vals - y))
+
+     return closest_x, closest_y
+
+
+ def select_spatial_slice_pixels(
+     da: xr.DataArray,
+     location: Location,
+     width_pixels: int,
+     height_pixels: int,
+ ) -> xr.DataArray:
+     """Select spatial slice based off pixels from location point of interest.
+
+     Args:
+         da: xarray DataArray to slice from
+         location: Location of interest that will be the center of the returned slice
+         height_pixels: Height of the slice in pixels
+         width_pixels: Width of the slice in pixels
+
+     Returns:
+         The selected DataArray slice.
+
+     Raises:
+         ValueError: If the dimensions are not even or the slice is not allowed
+             when padding is required.
+     """
+     if (width_pixels % 2) != 0:
+         raise ValueError("Width must be an even number")
+     if (height_pixels % 2) != 0:
+         raise ValueError("Height must be an even number")
+
+     _, x_dim, y_dim = find_coord_system(da)
+     center_idx_x, center_idx_y = _get_pixel_index_location(da, location)
+
+     half_width = width_pixels // 2
+     half_height = height_pixels // 2
+
+     left_idx = int(center_idx_x - half_width)
+     right_idx = int(center_idx_x + half_width)
+     bottom_idx = int(center_idx_y - half_height)
+     top_idx = int(center_idx_y + half_height)
+
+     data_width_pixels = len(da[x_dim])
+     data_height_pixels = len(da[y_dim])
+
+     # Padding checks
+     slice_unavailable = (
+         left_idx < 0
+         or right_idx > data_width_pixels
+         or bottom_idx < 0
+         or top_idx > data_height_pixels
+     )
+
+     if slice_unavailable:
+         issues = []
+         if left_idx < 0:
+             issues.append(f"left_idx ({left_idx}) < 0")
+         if right_idx > data_width_pixels:
+             issues.append(f"right_idx ({right_idx}) > data_width_pixels ({data_width_pixels})")
+         if bottom_idx < 0:
+             issues.append(f"bottom_idx ({bottom_idx}) < 0")
+         if top_idx > data_height_pixels:
+             issues.append(f"top_idx ({top_idx}) > data_height_pixels ({data_height_pixels})")
+         issue_details = "\n - ".join(issues)
+         raise ValueError(f"Window for location {location} not available: \n - {issue_details}")
+
+     # Standard selection - without padding
+     da = da.isel({x_dim: slice(left_idx, right_idx), y_dim: slice(bottom_idx, top_idx)})
+
+     return da
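
A standalone illustration of the centring arithmetic used above: find the nearest pixel index to the requested coordinate, then take half the (even) window width on each side:

    import numpy as np

    x_vals = np.arange(0.0, 100.0)  # toy 1-D coordinate values
    x = 42.3                        # requested x position
    center_idx = int(np.argmin(np.abs(x_vals - x)))  # nearest pixel: 42
    half_width = 10 // 2            # width_pixels must be even
    left_idx, right_idx = center_idx - half_width, center_idx + half_width
    print(left_idx, right_idx)      # 37 47 -> slice(37, 47) selects 10 pixels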