PVNet_summation 1.0.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release.
This version of PVNet_summation might be problematic. Click here for more details.
- pvnet_summation-1.0.0/LICENSE +21 -0
- pvnet_summation-1.0.0/PKG-INFO +100 -0
- pvnet_summation-1.0.0/PVNet_summation.egg-info/PKG-INFO +100 -0
- pvnet_summation-1.0.0/PVNet_summation.egg-info/SOURCES.txt +22 -0
- pvnet_summation-1.0.0/PVNet_summation.egg-info/dependency_links.txt +1 -0
- pvnet_summation-1.0.0/PVNet_summation.egg-info/requires.txt +16 -0
- pvnet_summation-1.0.0/PVNet_summation.egg-info/top_level.txt +1 -0
- pvnet_summation-1.0.0/README.md +74 -0
- pvnet_summation-1.0.0/pvnet_summation/__init__.py +1 -0
- pvnet_summation-1.0.0/pvnet_summation/data/__init__.py +2 -0
- pvnet_summation-1.0.0/pvnet_summation/data/datamodule.py +213 -0
- pvnet_summation-1.0.0/pvnet_summation/load_model.py +70 -0
- pvnet_summation-1.0.0/pvnet_summation/models/__init__.py +3 -0
- pvnet_summation-1.0.0/pvnet_summation/models/base_model.py +345 -0
- pvnet_summation-1.0.0/pvnet_summation/models/dense_model.py +75 -0
- pvnet_summation-1.0.0/pvnet_summation/optimizers.py +219 -0
- pvnet_summation-1.0.0/pvnet_summation/training/__init__.py +3 -0
- pvnet_summation-1.0.0/pvnet_summation/training/lightning_module.py +247 -0
- pvnet_summation-1.0.0/pvnet_summation/training/plots.py +80 -0
- pvnet_summation-1.0.0/pvnet_summation/training/train.py +185 -0
- pvnet_summation-1.0.0/pvnet_summation/utils.py +87 -0
- pvnet_summation-1.0.0/pyproject.toml +90 -0
- pvnet_summation-1.0.0/setup.cfg +4 -0
- pvnet_summation-1.0.0/tests/test_end2end.py +22 -0
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2023 Open Climate Fix
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
|
@@ -0,0 +1,100 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: PVNet_summation
|
|
3
|
+
Version: 1.0.0
|
|
4
|
+
Summary: PVNet_summation
|
|
5
|
+
Author-email: James Fulton <info@openclimatefix.org>
|
|
6
|
+
Requires-Python: >=3.10
|
|
7
|
+
Description-Content-Type: text/markdown
|
|
8
|
+
License-File: LICENSE
|
|
9
|
+
Requires-Dist: pvnet>=5.0.0
|
|
10
|
+
Requires-Dist: ocf-data-sampler>=0.2.32
|
|
11
|
+
Requires-Dist: numpy
|
|
12
|
+
Requires-Dist: pandas
|
|
13
|
+
Requires-Dist: matplotlib
|
|
14
|
+
Requires-Dist: xarray
|
|
15
|
+
Requires-Dist: torch>=2.0.0
|
|
16
|
+
Requires-Dist: lightning
|
|
17
|
+
Requires-Dist: typer
|
|
18
|
+
Requires-Dist: wandb
|
|
19
|
+
Requires-Dist: huggingface-hub
|
|
20
|
+
Requires-Dist: tqdm
|
|
21
|
+
Requires-Dist: omegaconf
|
|
22
|
+
Requires-Dist: hydra-core
|
|
23
|
+
Requires-Dist: rich
|
|
24
|
+
Requires-Dist: safetensors
|
|
25
|
+
Dynamic: license-file
|
|
26
|
+
|
|
27
|
+
# PVNet summation
|
|
28
|
+
[](https://github.com/openclimatefix/ocf-meta-repo?tab=readme-ov-file#overview-of-ocfs-nowcasting-repositories)
|
|
29
|
+
|
|
30
|
+
This project is used for training a model to sum the GSP predictions of [PVNet](https://github.com/openclimatefix/pvnet) into a national estimate.
|
|
31
|
+
|
|
32
|
+
Using the summation model to sum the GSP predictions rather than doing a simple sum increases the accuracy of the national predictions and can be configured to produce estimates of the uncertainty range of the national estimate. See the [PVNet](https://github.com/openclimatefix/pvnet) repo for more details and our paper.
|
|
33
|
+
|
|
34
|
+
|
|
35
|
+
## Setup / Installation
|
|
36
|
+
|
|
37
|
+
```bash
|
|
38
|
+
git clone https://github.com/openclimatefix/PVNet_summation
|
|
39
|
+
cd PVNet_summation
|
|
40
|
+
pip install .
|
|
41
|
+
```
|
|
42
|
+
|
|
43
|
+
### Additional development dependencies
|
|
44
|
+
|
|
45
|
+
```bash
|
|
46
|
+
pip install ".[dev]"
|
|
47
|
+
```
|
|
48
|
+
|
|
49
|
+
## Getting started with running PVNet summation
|
|
50
|
+
|
|
51
|
+
In order to run PVNet summation, we assume that you are already set up with
|
|
52
|
+
[PVNet](https://github.com/openclimatefix/pvnet) and have a trained PVNet model already available either locally or pushed to HuggingFace.
|
|
53
|
+
|
|
54
|
+
Before running any code, copy the example configuration to a configs directory:
|
|
55
|
+
|
|
56
|
+
```
|
|
57
|
+
cp -r configs.example configs
|
|
58
|
+
```
|
|
59
|
+
|
|
60
|
+
You will be making local amendments to these configs.
|
|
61
|
+
|
|
62
|
+
### Datasets
|
|
63
|
+
|
|
64
|
+
The datasets required are the same as documented in
|
|
65
|
+
[PVNet](https://github.com/openclimatefix/pvnet). The only addition is that you will need PVLive
|
|
66
|
+
data for the national sum i.e. GSP ID 0.
|
|
67
|
+
|
|
68
|
+
|
|
69
|
+
### Training PVNet_summation
|
|
70
|
+
|
|
71
|
+
How PVNet_summation is run is determined by the extensive configuration in the config files. The
|
|
72
|
+
configs are stored in `configs.example`.
|
|
73
|
+
|
|
74
|
+
Make sure to update the following config files before training your model:
|
|
75
|
+
|
|
76
|
+
|
|
77
|
+
1. At the very start of training we loop over all of the input samples and make predictions for them using PVNet. These predictions are saved to disk and will be loaded in the training loop for more efficient training. In `configs/config.yaml` update `sample_save_dir` to set where the predictions will be saved to.
|
|
78
|
+
|
|
79
|
+
2. In `configs/datamodule/default.yaml`:
|
|
80
|
+
- Update `pvnet_model.model_id` and `pvnet_model.revision` to point to the Huggingface commit or local directory where the exported PVNet model is.
|
|
81
|
+
- Update `configuration` to point to a data configuration compatible with the PVNet model whose outputs will be fed into the summation model.
|
|
82
|
+
- Set `train_period` and `val_period` to control the time ranges of the train and val period
|
|
83
|
+
- Optionally set `max_num_train_samples` and `max_num_val_samples` to limit the number of possible train and validation examples which will be used.
|
|
84
|
+
|
|
85
|
+
3. In `configs/model/default.yaml`:
|
|
86
|
+
- Update the hyperparameters and structure of the summation model
|
|
87
|
+
4. In `configs/trainer/default.yaml`:
|
|
88
|
+
- Set `accelerator: 0` if running on a system without a supported GPU
|
|
89
|
+
|
|
90
|
+
|
|
91
|
+
Assuming you have updated the configs, you should now be able to run:
|
|
92
|
+
|
|
93
|
+
```
|
|
94
|
+
python run.py
|
|
95
|
+
```
|
|
96
|
+
|
|
97
|
+
|
|
98
|
+
## Testing
|
|
99
|
+
|
|
100
|
+
You can use `python -m pytest tests` to run tests
|
|
@@ -0,0 +1,100 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: PVNet_summation
|
|
3
|
+
Version: 1.0.0
|
|
4
|
+
Summary: PVNet_summation
|
|
5
|
+
Author-email: James Fulton <info@openclimatefix.org>
|
|
6
|
+
Requires-Python: >=3.10
|
|
7
|
+
Description-Content-Type: text/markdown
|
|
8
|
+
License-File: LICENSE
|
|
9
|
+
Requires-Dist: pvnet>=5.0.0
|
|
10
|
+
Requires-Dist: ocf-data-sampler>=0.2.32
|
|
11
|
+
Requires-Dist: numpy
|
|
12
|
+
Requires-Dist: pandas
|
|
13
|
+
Requires-Dist: matplotlib
|
|
14
|
+
Requires-Dist: xarray
|
|
15
|
+
Requires-Dist: torch>=2.0.0
|
|
16
|
+
Requires-Dist: lightning
|
|
17
|
+
Requires-Dist: typer
|
|
18
|
+
Requires-Dist: wandb
|
|
19
|
+
Requires-Dist: huggingface-hub
|
|
20
|
+
Requires-Dist: tqdm
|
|
21
|
+
Requires-Dist: omegaconf
|
|
22
|
+
Requires-Dist: hydra-core
|
|
23
|
+
Requires-Dist: rich
|
|
24
|
+
Requires-Dist: safetensors
|
|
25
|
+
Dynamic: license-file
|
|
26
|
+
|
|
27
|
+
# PVNet summation
|
|
28
|
+
[](https://github.com/openclimatefix/ocf-meta-repo?tab=readme-ov-file#overview-of-ocfs-nowcasting-repositories)
|
|
29
|
+
|
|
30
|
+
This project is used for training a model to sum the GSP predictions of [PVNet](https://github.com/openclimatefix/pvnet) into a national estimate.
|
|
31
|
+
|
|
32
|
+
Using the summation model to sum the GSP predictions rather than doing a simple sum increases the accuracy of the national predictions and can be configured to produce estimates of the uncertainty range of the national estimate. See the [PVNet](https://github.com/openclimatefix/pvnet) repo for more details and our paper.
|
|
33
|
+
|
|
34
|
+
|
|
35
|
+
## Setup / Installation
|
|
36
|
+
|
|
37
|
+
```bash
|
|
38
|
+
git clone https://github.com/openclimatefix/PVNet_summation
|
|
39
|
+
cd PVNet_summation
|
|
40
|
+
pip install .
|
|
41
|
+
```
|
|
42
|
+
|
|
43
|
+
### Additional development dependencies
|
|
44
|
+
|
|
45
|
+
```bash
|
|
46
|
+
pip install ".[dev]"
|
|
47
|
+
```
|
|
48
|
+
|
|
49
|
+
## Getting started with running PVNet summation
|
|
50
|
+
|
|
51
|
+
In order to run PVNet summation, we assume that you are already set up with
|
|
52
|
+
[PVNet](https://github.com/openclimatefix/pvnet) and have a trained PVNet model already available either locally or pushed to HuggingFace.
|
|
53
|
+
|
|
54
|
+
Before running any code, copy the example configuration to a configs directory:
|
|
55
|
+
|
|
56
|
+
```
|
|
57
|
+
cp -r configs.example configs
|
|
58
|
+
```
|
|
59
|
+
|
|
60
|
+
You will be making local amendments to these configs.
|
|
61
|
+
|
|
62
|
+
### Datasets
|
|
63
|
+
|
|
64
|
+
The datasets required are the same as documented in
|
|
65
|
+
[PVNet](https://github.com/openclimatefix/pvnet). The only addition is that you will need PVLive
|
|
66
|
+
data for the national sum i.e. GSP ID 0.
|
|
67
|
+
|
|
68
|
+
|
|
69
|
+
### Training PVNet_summation
|
|
70
|
+
|
|
71
|
+
How PVNet_summation is run is determined by the extensive configuration in the config files. The
|
|
72
|
+
configs are stored in `configs.example`.
|
|
73
|
+
|
|
74
|
+
Make sure to update the following config files before training your model:
|
|
75
|
+
|
|
76
|
+
|
|
77
|
+
1. At the very start of training we loop over all of the input samples and make predictions for them using PVNet. These predictions are saved to disk and will be loaded in the training loop for more efficient training. In `configs/config.yaml` update `sample_save_dir` to set where the predictions will be saved to.
|
|
78
|
+
|
|
79
|
+
2. In `configs/datamodule/default.yaml`:
|
|
80
|
+
- Update `pvnet_model.model_id` and `pvnet_model.revision` to point to the Huggingface commit or local directory where the exported PVNet model is.
|
|
81
|
+
- Update `configuration` to point to a data configuration compatible with the PVNet model whose outputs will be fed into the summation model.
|
|
82
|
+
- Set `train_period` and `val_period` to control the time ranges of the train and val period
|
|
83
|
+
- Optionally set `max_num_train_samples` and `max_num_val_samples` to limit the number of possible train and validation examples which will be used.
|
|
84
|
+
|
|
85
|
+
3. In `configs/model/default.yaml`:
|
|
86
|
+
- Update the hyperparameters and structure of the summation model
|
|
87
|
+
4. In `configs/trainer/default.yaml`:
|
|
88
|
+
- Set `accelerator: 0` if running on a system without a supported GPU
|
|
89
|
+
|
|
90
|
+
|
|
91
|
+
Assuming you have updated the configs, you should now be able to run:
|
|
92
|
+
|
|
93
|
+
```
|
|
94
|
+
python run.py
|
|
95
|
+
```
|
|
96
|
+
|
|
97
|
+
|
|
98
|
+
## Testing
|
|
99
|
+
|
|
100
|
+
You can use `python -m pytest tests` to run tests
|
|
@@ -0,0 +1,22 @@
|
|
|
1
|
+
LICENSE
|
|
2
|
+
README.md
|
|
3
|
+
pyproject.toml
|
|
4
|
+
PVNet_summation.egg-info/PKG-INFO
|
|
5
|
+
PVNet_summation.egg-info/SOURCES.txt
|
|
6
|
+
PVNet_summation.egg-info/dependency_links.txt
|
|
7
|
+
PVNet_summation.egg-info/requires.txt
|
|
8
|
+
PVNet_summation.egg-info/top_level.txt
|
|
9
|
+
pvnet_summation/__init__.py
|
|
10
|
+
pvnet_summation/load_model.py
|
|
11
|
+
pvnet_summation/optimizers.py
|
|
12
|
+
pvnet_summation/utils.py
|
|
13
|
+
pvnet_summation/data/__init__.py
|
|
14
|
+
pvnet_summation/data/datamodule.py
|
|
15
|
+
pvnet_summation/models/__init__.py
|
|
16
|
+
pvnet_summation/models/base_model.py
|
|
17
|
+
pvnet_summation/models/dense_model.py
|
|
18
|
+
pvnet_summation/training/__init__.py
|
|
19
|
+
pvnet_summation/training/lightning_module.py
|
|
20
|
+
pvnet_summation/training/plots.py
|
|
21
|
+
pvnet_summation/training/train.py
|
|
22
|
+
tests/test_end2end.py
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
pvnet_summation
|
|
@@ -0,0 +1,74 @@
|
|
|
1
|
+
# PVNet summation
|
|
2
|
+
[](https://github.com/openclimatefix/ocf-meta-repo?tab=readme-ov-file#overview-of-ocfs-nowcasting-repositories)
|
|
3
|
+
|
|
4
|
+
This project is used for training a model to sum the GSP predictions of [PVNet](https://github.com/openclimatefix/pvnet) into a national estimate.
|
|
5
|
+
|
|
6
|
+
Using the summation model to sum the GSP predictions rather than doing a simple sum increases the accuracy of the national predictions and can be configured to produce estimates of the uncertainty range of the national estimate. See the [PVNet](https://github.com/openclimatefix/pvnet) repo for more details and our paper.
|
|
7
|
+
|
|
8
|
+
|
|
9
|
+
## Setup / Installation
|
|
10
|
+
|
|
11
|
+
```bash
|
|
12
|
+
git clone https://github.com/openclimatefix/PVNet_summation
|
|
13
|
+
cd PVNet_summation
|
|
14
|
+
pip install .
|
|
15
|
+
```
|
|
16
|
+
|
|
17
|
+
### Additional development dependencies
|
|
18
|
+
|
|
19
|
+
```bash
|
|
20
|
+
pip install ".[dev]"
|
|
21
|
+
```
|
|
22
|
+
|
|
23
|
+
## Getting started with running PVNet summation
|
|
24
|
+
|
|
25
|
+
In order to run PVNet summation, we assume that you are already set up with
|
|
26
|
+
[PVNet](https://github.com/openclimatefix/pvnet) and have a trained PVNet model already available either locally or pushed to HuggingFace.
|
|
27
|
+
|
|
28
|
+
Before running any code, copy the example configuration to a configs directory:
|
|
29
|
+
|
|
30
|
+
```
|
|
31
|
+
cp -r configs.example configs
|
|
32
|
+
```
|
|
33
|
+
|
|
34
|
+
You will be making local amendments to these configs.
|
|
35
|
+
|
|
36
|
+
### Datasets
|
|
37
|
+
|
|
38
|
+
The datasets required are the same as documented in
|
|
39
|
+
[PVNet](https://github.com/openclimatefix/pvnet). The only addition is that you will need PVLive
|
|
40
|
+
data for the national sum i.e. GSP ID 0.
|
|
41
|
+
|
|
42
|
+
|
|
43
|
+
### Training PVNet_summation
|
|
44
|
+
|
|
45
|
+
How PVNet_summation is run is determined by the extensive configuration in the config files. The
|
|
46
|
+
configs are stored in `configs.example`.
|
|
47
|
+
|
|
48
|
+
Make sure to update the following config files before training your model:
|
|
49
|
+
|
|
50
|
+
|
|
51
|
+
1. At the very start of training we loop over all of the input samples and make predictions for them using PVNet. These predictions are saved to disk and will be loaded in the training loop for more efficient training. In `configs/config.yaml` update `sample_save_dir` to set where the predictions will be saved to.
|
|
52
|
+
|
|
53
|
+
2. In `configs/datamodule/default.yaml`:
|
|
54
|
+
- Update `pvnet_model.model_id` and `pvnet_model.revision` to point to the Huggingface commit or local directory where the exported PVNet model is.
|
|
55
|
+
- Update `configuration` to point to a data configuration compatible with the PVNet model whose outputs will be fed into the summation model.
|
|
56
|
+
- Set `train_period` and `val_period` to control the time ranges of the train and val period
|
|
57
|
+
- Optionally set `max_num_train_samples` and `max_num_val_samples` to limit the number of possible train and validation examples which will be used.
|
|
58
|
+
|
|
59
|
+
3. In `configs/model/default.yaml`:
|
|
60
|
+
- Update the hyperparameters and structure of the summation model
|
|
61
|
+
4. In `configs/trainer/default.yaml`:
|
|
62
|
+
- Set `accelerator: 0` if running on a system without a supported GPU
|
|
63
|
+
|
|
64
|
+
|
|
65
|
+
Assuming you have updated the configs, you should now be able to run:
|
|
66
|
+
|
|
67
|
+
```
|
|
68
|
+
python run.py
|
|
69
|
+
```
|
|
70
|
+
|
|
71
|
+
|
|
72
|
+
## Testing
|
|
73
|
+
|
|
74
|
+
You can use `python -m pytest tests` to run tests
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
"""PVNet_summation"""
|
|
@@ -0,0 +1,213 @@
|
|
|
1
|
+
"""Pytorch lightning datamodules for loading pre-saved samples and predictions."""
|
|
2
|
+
|
|
3
|
+
from glob import glob
|
|
4
|
+
from typing import TypeAlias
|
|
5
|
+
|
|
6
|
+
import numpy as np
|
|
7
|
+
import pandas as pd
|
|
8
|
+
import torch
|
|
9
|
+
from lightning.pytorch import LightningDataModule
|
|
10
|
+
from ocf_data_sampler.load.gsp import open_gsp
|
|
11
|
+
from ocf_data_sampler.numpy_sample.common_types import NumpyBatch, NumpySample
|
|
12
|
+
from ocf_data_sampler.torch_datasets.datasets.pvnet_uk import PVNetUKConcurrentDataset
|
|
13
|
+
from ocf_data_sampler.utils import minutes
|
|
14
|
+
from torch.utils.data import DataLoader, Dataset, default_collate
|
|
15
|
+
from typing_extensions import override
|
|
16
|
+
|
|
17
|
+
SumNumpySample: TypeAlias = dict[str, np.ndarray | NumpyBatch]
|
|
18
|
+
SumTensorBatch: TypeAlias = dict[str, torch.Tensor]
|
|
19
|
+
|
|
20
|
+
|
|
21
|
+
class StreamedDataset(PVNetUKConcurrentDataset):
    """A torch dataset for creating concurrent PVNet inputs and national targets."""

    def __init__(
        self,
        config_filename: str,
        start_time: str | None = None,
        end_time: str | None = None,
    ) -> None:
        """A torch dataset for creating concurrent PVNet inputs and national targets.

        Args:
            config_filename: Path to the configuration file
            start_time: Limit the init-times to be after this
            end_time: Limit the init-times to be before this
        """
        # gsp_ids=None -> use all GSP locations
        super().__init__(config_filename, start_time, end_time, gsp_ids=None)

        # Load the national (GSP ID 0) outturn data and normalise it by its
        # effective capacity so that targets are capacity factors
        national = (
            open_gsp(
                zarr_path=self.config.input_data.gsp.zarr_path,
                boundaries_version=self.config.input_data.gsp.boundaries_version,
            )
            .sel(gsp_id=0)
            .compute()
        )
        self.national_gsp_data = national / national.effective_capacity_mwp

    def _get_sample(self, t0: pd.Timestamp) -> SumNumpySample:
        """Generate a concurrent PVNet sample for given init-time.

        Args:
            t0: init-time for sample
        """
        pvnet_inputs: NumpySample = super()._get_sample(t0)

        gsp_config = self.config.input_data.gsp
        step = minutes(gsp_config.time_resolution_minutes)

        # Forecast-horizon timestamps: one step after t0 through the interval end
        valid_times = pd.date_range(
            t0 + step,
            t0 + minutes(gsp_config.interval_end_minutes),
            freq=step,
        )

        total_outturns = self.national_gsp_data.sel(time_utc=valid_times).values
        total_capacity = self.national_gsp_data.sel(time_utc=t0).effective_capacity_mwp.item()

        # Per-location capacities expressed as a fraction of the national capacity
        location_capacities = pvnet_inputs["gsp_effective_capacity_mwp"]
        relative_capacities = location_capacities / total_capacity

        return {
            # NumpyBatch object with batch size = num_locations
            "pvnet_inputs": pvnet_inputs,
            # Shape: [time]
            "target": total_outturns,
            # Shape: [time]
            "valid_times": valid_times.values.astype(int),
            # Scalar: normalised national outturn at t0
            "last_outturn": self.national_gsp_data.sel(time_utc=t0).values,
            # Shape: [num_locations]
            "relative_capacity": relative_capacities,
        }

    @override
    def __getitem__(self, idx: int) -> SumNumpySample:
        return super().__getitem__(idx)

    @override
    def get_sample(self, t0: pd.Timestamp) -> SumNumpySample:
        return super().get_sample(t0)
|
|
93
|
+
|
|
94
|
+
|
|
95
|
+
class StreamedDataModule(LightningDataModule):
    """Datamodule for training pvnet_summation."""

    def __init__(
        self,
        configuration: str,
        train_period: list[str | None] | None = None,
        val_period: list[str | None] | None = None,
        num_workers: int = 0,
        prefetch_factor: int | None = None,
        persistent_workers: bool = False,
    ):
        """Datamodule for creating concurrent PVNet inputs and national targets.

        Args:
            configuration: Path to ocf-data-sampler configuration file.
            train_period: [start, end] date range filter for train dataloader.
                Defaults to no filtering at either end.
            val_period: [start, end] date range filter for val dataloader.
                Defaults to no filtering at either end.
            num_workers: Number of workers to use in multiprocess batch loading.
            prefetch_factor: Number of batches prefetched in advance by each worker
                process.
            persistent_workers: If True, the data loader will not shut down the worker
                processes after a dataset has been consumed once. This allows to
                maintain the workers Dataset instances alive.
        """
        super().__init__()
        self.configuration = configuration
        # Use None as the default sentinel instead of a mutable list default,
        # which would be shared across all instances
        self.train_period = [None, None] if train_period is None else train_period
        self.val_period = [None, None] if val_period is None else val_period

        # batch_size=None: the dataset yields complete samples, so no collation
        self._dataloader_kwargs = dict(
            batch_size=None,
            batch_sampler=None,
            num_workers=num_workers,
            collate_fn=None,
            pin_memory=False,
            drop_last=False,
            timeout=0,
            worker_init_fn=None,
            prefetch_factor=prefetch_factor,
            persistent_workers=persistent_workers,
        )

    def train_dataloader(self, shuffle: bool = False) -> DataLoader:
        """Construct train dataloader"""
        dataset = StreamedDataset(self.configuration, *self.train_period)
        return DataLoader(dataset, shuffle=shuffle, **self._dataloader_kwargs)

    def val_dataloader(self, shuffle: bool = False) -> DataLoader:
        """Construct val dataloader"""
        dataset = StreamedDataset(self.configuration, *self.val_period)
        return DataLoader(dataset, shuffle=shuffle, **self._dataloader_kwargs)
|
|
146
|
+
|
|
147
|
+
|
|
148
|
+
class PresavedDataset(Dataset):
    """Dataset for loading pre-saved PVNet predictions from disk"""

    def __init__(self, sample_dir: str):
        """Dataset for loading pre-saved PVNet predictions from disk.

        Args:
            sample_dir: The directory containing the saved samples
        """
        # Sort so the sample order is deterministic across runs and platforms
        self.sample_filepaths = sorted(glob(f"{sample_dir}/*.pt"))

    def __len__(self) -> int:
        """Return the number of saved samples found in the directory."""
        return len(self.sample_filepaths)

    def __getitem__(self, idx: int) -> dict:
        """Load and return the pre-saved sample at position `idx`."""
        # weights_only=True restricts unpickling to tensor data for safety
        return torch.load(self.sample_filepaths[idx], weights_only=True)
|
|
164
|
+
|
|
165
|
+
|
|
166
|
+
class PresavedDataModule(LightningDataModule):
    """Datamodule for loading pre-saved PVNet predictions."""

    def __init__(
        self,
        sample_dir: str,
        batch_size: int = 16,
        num_workers: int = 0,
        prefetch_factor: int | None = None,
        persistent_workers: bool = False,
    ):
        """Datamodule for loading pre-saved PVNet predictions.

        Args:
            sample_dir: Path to the directory of pre-saved samples.
            batch_size: Batch size.
            num_workers: Number of workers to use in multiprocess batch loading.
            prefetch_factor: Number of data will be prefetched at the end of each
                worker process.
            persistent_workers: If True, the data loader will not shut down the worker
                processes after a dataset has been consumed once. This allows to
                maintain the workers Dataset instances alive.
        """
        super().__init__()
        self.sample_dir = sample_dir

        # Collate into batches unless batching has been disabled entirely
        collate_fn = None if batch_size is None else default_collate

        self._dataloader_kwargs = {
            "batch_size": batch_size,
            "sampler": None,
            "batch_sampler": None,
            "num_workers": num_workers,
            "collate_fn": collate_fn,
            "pin_memory": False,
            "drop_last": False,
            "timeout": 0,
            "worker_init_fn": None,
            "prefetch_factor": prefetch_factor,
            "persistent_workers": persistent_workers,
        }

    def _make_dataloader(self, split: str, shuffle: bool) -> DataLoader:
        # Pre-saved samples live in per-split subdirectories of sample_dir
        dataset = PresavedDataset(f"{self.sample_dir}/{split}")
        return DataLoader(dataset, shuffle=shuffle, **self._dataloader_kwargs)

    def train_dataloader(self, shuffle: bool = True) -> DataLoader:
        """Construct train dataloader"""
        return self._make_dataloader("train", shuffle)

    def val_dataloader(self, shuffle: bool = False) -> DataLoader:
        """Construct val dataloader"""
        return self._make_dataloader("val", shuffle)
|
|
@@ -0,0 +1,70 @@
|
|
|
1
|
+
"""Load a model from its checkpoint directory"""
|
|
2
|
+
|
|
3
|
+
import glob
|
|
4
|
+
import os
|
|
5
|
+
|
|
6
|
+
import hydra
|
|
7
|
+
import torch
|
|
8
|
+
import yaml
|
|
9
|
+
|
|
10
|
+
from pvnet_summation.utils import (
|
|
11
|
+
DATAMODULE_CONFIG_NAME,
|
|
12
|
+
FULL_CONFIG_NAME,
|
|
13
|
+
MODEL_CONFIG_NAME,
|
|
14
|
+
)
|
|
15
|
+
|
|
16
|
+
|
|
17
|
+
def get_model_from_checkpoints(
    checkpoint_dir_path: str,
    val_best: bool = True,
) -> tuple[torch.nn.Module, dict, str | None, str | None]:
    """Load a model from its checkpoint directory

    Args:
        checkpoint_dir_path: Directory holding the saved configs and checkpoints.
        val_best: If True, load the single best-epoch checkpoint; otherwise load
            the last checkpoint.

    Returns:
        tuple:
            model: nn.Module of pretrained model.
            model_config: path to model config used to train the model.
            datamodule_config: path to datamodule used to create samples e.g train/test split info.
            experiment_configs: path to the full experimental config.

    """
    # Re-create the lightning training module from its saved config
    with open(f"{checkpoint_dir_path}/{MODEL_CONFIG_NAME}") as cfg:
        full_model_config = yaml.load(cfg, Loader=yaml.FullLoader)

    lightning_module = hydra.utils.instantiate(full_model_config)

    # Pick the checkpoint file to restore weights from
    if val_best:
        # Only one epoch (best) saved per model
        files = glob.glob(f"{checkpoint_dir_path}/epoch*.ckpt")
        if len(files) != 1:
            raise ValueError(
                f"Found {len(files)} checkpoints @ {checkpoint_dir_path}/epoch*.ckpt. Expected one."
            )
        ckpt_path = files[0]
    else:
        ckpt_path = f"{checkpoint_dir_path}/last.ckpt"

    checkpoint = torch.load(ckpt_path, map_location="cpu", weights_only=True)
    lightning_module.load_state_dict(state_dict=checkpoint["state_dict"])

    # Extract the model from the lightning module
    model = lightning_module.model
    model_config = full_model_config["model"]

    # Check for datamodule config
    # This only exists if the model was trained with presaved samples
    datamodule_config = f"{checkpoint_dir_path}/{DATAMODULE_CONFIG_NAME}"
    if not os.path.isfile(datamodule_config):
        datamodule_config = None

    # Check for experiment config
    # For backwards compatibility - this might not always exist
    experiment_config = f"{checkpoint_dir_path}/{FULL_CONFIG_NAME}"
    if not os.path.isfile(experiment_config):
        experiment_config = None

    return model, model_config, datamodule_config, experiment_config
|